Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next 0/4] afs: Refcount afs_call struct
From: David Howells @ 2017-01-09 14:52 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel


These patches provide some tracepoints for AFS and fix a potential leak by
adding refcounting to the afs_call struct.

The patches are:

 (1) Add some tracepoints for logging incoming calls and monitoring
     notifications from AF_RXRPC and data reception.

 (2) Get rid of afs_wait_mode as it didn't turn out to be as useful as
     initially expected.  It can be brought back later if needed.  This
     clears some stuff out that I don't then need to fix up in (4).

 (3) Allow listen(..., 0) to be used to disable listening.  This makes
     shutting down the AFS cache manager server in the kernel much easier
     and the accounting simpler as we can then be sure that (a) all
     preallocated afs_call structs are relesed and (b) no new incoming
     calls are going to be started.

     For the moment, listening cannot be reenabled.

 (4) Add refcounting to the afs_call struct to fix a potential multiple
     release detected by static checking and add a tracepoint to follow the
     lifecycle of afs_call objects.

The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	rxrpc-rewrite-20170109

David
---
David Howells (4):
      afs: Add some tracepoints
      afs: Kill afs_wait_mode
      rxrpc: Allow listen(sock, 0) to be used to disable listening
      afs: Refcount the afs_call struct


 fs/afs/callback.c          |    2 
 fs/afs/cmservice.c         |   63 ++++++++-----
 fs/afs/fsclient.c          |   80 ++++++++---------
 fs/afs/internal.h          |   96 ++++++++------------
 fs/afs/main.c              |    1 
 fs/afs/rxrpc.c             |  208 +++++++++++++++++++++++---------------------
 fs/afs/vlclient.c          |    8 +-
 fs/afs/vlocation.c         |    4 -
 fs/afs/vnode.c             |   26 +++---
 include/trace/events/afs.h |  184 +++++++++++++++++++++++++++++++++++++++
 net/rxrpc/af_rxrpc.c       |    8 ++
 net/rxrpc/ar-internal.h    |    1 
 net/rxrpc/call_accept.c    |    3 -
 13 files changed, 442 insertions(+), 242 deletions(-)
 create mode 100644 include/trace/events/afs.h

^ permalink raw reply

* [PATCH net-next 1/4] afs: Add some tracepoints
From: David Howells @ 2017-01-09 14:52 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <148397356909.20445.15077871371099721338.stgit@warthog.procyon.org.uk>

Add three tracepoints to the AFS filesystem:

 (1) The afs_recv_data tracepoint logs data segments that are extracted
     from the data received from the peer through afs_extract_data().

 (2) The afs_notify_call tracepoint logs notification from AF_RXRPC of data
     coming in to an asynchronous call.

 (3) The afs_cb_call tracepoint logs incoming calls that have had their
     operation ID extracted and mapped into a supported cache manager
     service call.

To make (3) work, the name strings in the afs_call_type struct objects have
to be annotated with __tracepoint_string.  This is done with the CM_NAME()
macro.

Further, the AFS call state enum needs a name so that it can be used to
declare parameter types.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/cmservice.c         |   22 ++++++---
 fs/afs/internal.h          |   21 +++++---
 fs/afs/main.c              |    1 
 fs/afs/rxrpc.c             |    6 ++
 include/trace/events/afs.h |  109 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 144 insertions(+), 15 deletions(-)
 create mode 100644 include/trace/events/afs.h

diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index d764236072b1..a2e1e02005f6 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -25,11 +25,16 @@ static int afs_deliver_cb_probe_uuid(struct afs_call *);
 static int afs_deliver_cb_tell_me_about_yourself(struct afs_call *);
 static void afs_cm_destructor(struct afs_call *);
 
+#define CM_NAME(name) \
+	const char afs_SRXCB##name##_name[] __tracepoint_string =	\
+		"CB." #name
+
 /*
  * CB.CallBack operation type
  */
+static CM_NAME(CallBack);
 static const struct afs_call_type afs_SRXCBCallBack = {
-	.name		= "CB.CallBack",
+	.name		= afs_SRXCBCallBack_name,
 	.deliver	= afs_deliver_cb_callback,
 	.abort_to_error	= afs_abort_to_error,
 	.destructor	= afs_cm_destructor,
@@ -38,8 +43,9 @@ static const struct afs_call_type afs_SRXCBCallBack = {
 /*
  * CB.InitCallBackState operation type
  */
+static CM_NAME(InitCallBackState);
 static const struct afs_call_type afs_SRXCBInitCallBackState = {
-	.name		= "CB.InitCallBackState",
+	.name		= afs_SRXCBInitCallBackState_name,
 	.deliver	= afs_deliver_cb_init_call_back_state,
 	.abort_to_error	= afs_abort_to_error,
 	.destructor	= afs_cm_destructor,
@@ -48,8 +54,9 @@ static const struct afs_call_type afs_SRXCBInitCallBackState = {
 /*
  * CB.InitCallBackState3 operation type
  */
+static CM_NAME(InitCallBackState3);
 static const struct afs_call_type afs_SRXCBInitCallBackState3 = {
-	.name		= "CB.InitCallBackState3",
+	.name		= afs_SRXCBInitCallBackState3_name,
 	.deliver	= afs_deliver_cb_init_call_back_state3,
 	.abort_to_error	= afs_abort_to_error,
 	.destructor	= afs_cm_destructor,
@@ -58,8 +65,9 @@ static const struct afs_call_type afs_SRXCBInitCallBackState3 = {
 /*
  * CB.Probe operation type
  */
+static CM_NAME(Probe);
 static const struct afs_call_type afs_SRXCBProbe = {
-	.name		= "CB.Probe",
+	.name		= afs_SRXCBProbe_name,
 	.deliver	= afs_deliver_cb_probe,
 	.abort_to_error	= afs_abort_to_error,
 	.destructor	= afs_cm_destructor,
@@ -68,8 +76,9 @@ static const struct afs_call_type afs_SRXCBProbe = {
 /*
  * CB.ProbeUuid operation type
  */
+static CM_NAME(ProbeUuid);
 static const struct afs_call_type afs_SRXCBProbeUuid = {
-	.name		= "CB.ProbeUuid",
+	.name		= afs_SRXCBProbeUuid_name,
 	.deliver	= afs_deliver_cb_probe_uuid,
 	.abort_to_error	= afs_abort_to_error,
 	.destructor	= afs_cm_destructor,
@@ -78,8 +87,9 @@ static const struct afs_call_type afs_SRXCBProbeUuid = {
 /*
  * CB.TellMeAboutYourself operation type
  */
+static CM_NAME(TellMeAboutYourself);
 static const struct afs_call_type afs_SRXCBTellMeAboutYourself = {
-	.name		= "CB.TellMeAboutYourself",
+	.name		= afs_SRXCBTellMeAboutYourself_name,
 	.deliver	= afs_deliver_cb_tell_me_about_yourself,
 	.abort_to_error	= afs_abort_to_error,
 	.destructor	= afs_cm_destructor,
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 6f7a9638ba1a..f71e58fcc2f2 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -68,6 +68,15 @@ struct afs_wait_mode {
 extern const struct afs_wait_mode afs_sync_call;
 extern const struct afs_wait_mode afs_async_call;
 
+enum afs_call_state {
+	AFS_CALL_REQUESTING,	/* request is being sent for outgoing call */
+	AFS_CALL_AWAIT_REPLY,	/* awaiting reply to outgoing call */
+	AFS_CALL_AWAIT_OP_ID,	/* awaiting op ID on incoming call */
+	AFS_CALL_AWAIT_REQUEST,	/* awaiting request data on incoming call */
+	AFS_CALL_REPLYING,	/* replying to incoming call */
+	AFS_CALL_AWAIT_ACK,	/* awaiting final ACK of incoming call */
+	AFS_CALL_COMPLETE,	/* Completed or failed */
+};
 /*
  * a record of an in-progress RxRPC call
  */
@@ -91,15 +100,7 @@ struct afs_call {
 	pgoff_t			first;		/* first page in mapping to deal with */
 	pgoff_t			last;		/* last page in mapping to deal with */
 	size_t			offset;		/* offset into received data store */
-	enum {					/* call state */
-		AFS_CALL_REQUESTING,	/* request is being sent for outgoing call */
-		AFS_CALL_AWAIT_REPLY,	/* awaiting reply to outgoing call */
-		AFS_CALL_AWAIT_OP_ID,	/* awaiting op ID on incoming call */
-		AFS_CALL_AWAIT_REQUEST,	/* awaiting request data on incoming call */
-		AFS_CALL_REPLYING,	/* replying to incoming call */
-		AFS_CALL_AWAIT_ACK,	/* awaiting final ACK of incoming call */
-		AFS_CALL_COMPLETE,	/* Completed or failed */
-	}			state;
+	enum afs_call_state	state;
 	int			error;		/* error code */
 	u32			abort_code;	/* Remote abort ID or 0 */
 	unsigned		request_size;	/* size of request data */
@@ -773,6 +774,8 @@ extern int afs_fsync(struct file *, loff_t, loff_t, int);
 /*
  * debug tracing
  */
+#include <trace/events/afs.h>
+
 extern unsigned afs_debug;
 
 #define dbgprintk(FMT,...) \
diff --git a/fs/afs/main.c b/fs/afs/main.c
index 0b187ef3b5b7..f8188feb03ad 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -15,6 +15,7 @@
 #include <linux/completion.h>
 #include <linux/sched.h>
 #include <linux/random.h>
+#define CREATE_TRACE_POINTS
 #include "internal.h"
 
 MODULE_DESCRIPTION("AFS Client File System");
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 25f05a8d21b1..f26344a8c029 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -416,6 +416,8 @@ static void afs_deliver_to_call(struct afs_call *call)
 			ret = rxrpc_kernel_recv_data(afs_socket, call->rxcall,
 						     NULL, 0, &offset, false,
 						     &call->abort_code);
+			trace_afs_recv_data(call, 0, offset, false, ret);
+
 			if (ret == -EINPROGRESS || ret == -EAGAIN)
 				return;
 			if (ret == 1 || ret < 0) {
@@ -541,6 +543,7 @@ static void afs_wake_up_async_call(struct sock *sk, struct rxrpc_call *rxcall,
 {
 	struct afs_call *call = (struct afs_call *)call_user_ID;
 
+	trace_afs_notify_call(rxcall, call);
 	call->need_attention = true;
 	queue_work(afs_async_calls, &call->async_work);
 }
@@ -689,6 +692,8 @@ static int afs_deliver_cm_op_id(struct afs_call *call)
 	if (!afs_cm_incoming_call(call))
 		return -ENOTSUPP;
 
+	trace_afs_cb_call(call);
+
 	/* pass responsibility for the remainer of this message off to the
 	 * cache manager op */
 	return call->type->deliver(call);
@@ -780,6 +785,7 @@ int afs_extract_data(struct afs_call *call, void *buf, size_t count,
 	ret = rxrpc_kernel_recv_data(afs_socket, call->rxcall,
 				     buf, count, &call->offset,
 				     want_more, &call->abort_code);
+	trace_afs_recv_data(call, count, call->offset, want_more, ret);
 	if (ret == 0 || ret == -EAGAIN)
 		return ret;
 
diff --git a/include/trace/events/afs.h b/include/trace/events/afs.h
new file mode 100644
index 000000000000..845907b04ff4
--- /dev/null
+++ b/include/trace/events/afs.h
@@ -0,0 +1,109 @@
+/* AFS tracepoints
+ *
+ * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM afs
+
+#if !defined(_TRACE_AFS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_AFS_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(afs_recv_data,
+	    TP_PROTO(struct afs_call *call, unsigned count, unsigned offset,
+		     bool want_more, int ret),
+
+	    TP_ARGS(call, count, offset, want_more, ret),
+
+	    TP_STRUCT__entry(
+		    __field(struct rxrpc_call *,	rxcall		)
+		    __field(struct afs_call *,		call		)
+		    __field(enum afs_call_state,	state		)
+		    __field(unsigned int,		count		)
+		    __field(unsigned int,		offset		)
+		    __field(unsigned short,		unmarshall	)
+		    __field(bool,			want_more	)
+		    __field(int,			ret		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->rxcall	= call->rxcall;
+		    __entry->call	= call;
+		    __entry->state	= call->state;
+		    __entry->unmarshall	= call->unmarshall;
+		    __entry->count	= count;
+		    __entry->offset	= offset;
+		    __entry->want_more	= want_more;
+		    __entry->ret	= ret;
+			   ),
+
+	    TP_printk("c=%p ac=%p s=%u u=%u %u/%u wm=%u ret=%d",
+		      __entry->rxcall,
+		      __entry->call,
+		      __entry->state, __entry->unmarshall,
+		      __entry->offset, __entry->count,
+		      __entry->want_more, __entry->ret)
+	    );
+
+TRACE_EVENT(afs_notify_call,
+	    TP_PROTO(struct rxrpc_call *rxcall, struct afs_call *call),
+
+	    TP_ARGS(rxcall, call),
+
+	    TP_STRUCT__entry(
+		    __field(struct rxrpc_call *,	rxcall		)
+		    __field(struct afs_call *,		call		)
+		    __field(enum afs_call_state,	state		)
+		    __field(unsigned short,		unmarshall	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->rxcall	= rxcall;
+		    __entry->call	= call;
+		    __entry->state	= call->state;
+		    __entry->unmarshall	= call->unmarshall;
+			   ),
+
+	    TP_printk("c=%p ac=%p s=%u u=%u",
+		      __entry->rxcall,
+		      __entry->call,
+		      __entry->state, __entry->unmarshall)
+	    );
+
+TRACE_EVENT(afs_cb_call,
+	    TP_PROTO(struct afs_call *call),
+
+	    TP_ARGS(call),
+
+	    TP_STRUCT__entry(
+		    __field(struct rxrpc_call *,	rxcall		)
+		    __field(struct afs_call *,		call		)
+		    __field(const char *,		name		)
+		    __field(u32,			op		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->rxcall	= call->rxcall;
+		    __entry->call	= call;
+		    __entry->name	= call->type->name;
+		    __entry->op		= call->operation_ID;
+			   ),
+
+	    TP_printk("c=%p ac=%p %s o=%u",
+		      __entry->rxcall,
+		      __entry->call,
+		      __entry->name,
+		      __entry->op)
+	    );
+
+#endif /* _TRACE_AFS_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>

^ permalink raw reply related

* [PATCH net-next 2/4] afs: Kill afs_wait_mode
From: David Howells @ 2017-01-09 14:53 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <148397356909.20445.15077871371099721338.stgit@warthog.procyon.org.uk>

The afs_wait_mode struct isn't really necessary.  Client calls only use one
of a choice of two (synchronous or the asynchronous) and incoming calls
don't use the wait at all.  Replace with a boolean parameter.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/callback.c  |    2 +
 fs/afs/fsclient.c  |   80 ++++++++++++++++++++++++++--------------------------
 fs/afs/internal.h  |   68 ++++++++++++--------------------------------
 fs/afs/rxrpc.c     |   51 ++++++++-------------------------
 fs/afs/vlclient.c  |    8 +++--
 fs/afs/vlocation.c |    4 +--
 fs/afs/vnode.c     |   26 ++++++++---------
 7 files changed, 90 insertions(+), 149 deletions(-)

diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 1e9d2f84e5b5..b29447e03ede 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -343,7 +343,7 @@ void afs_dispatch_give_up_callbacks(struct work_struct *work)
 	 *   had callbacks entirely, and the server will call us later to break
 	 *   them
 	 */
-	afs_fs_give_up_callbacks(server, &afs_async_call);
+	afs_fs_give_up_callbacks(server, true);
 }
 
 /*
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 7dc1f6fb3661..ac8e766978dc 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -275,7 +275,7 @@ int afs_fs_fetch_file_status(struct afs_server *server,
 			     struct key *key,
 			     struct afs_vnode *vnode,
 			     struct afs_volsync *volsync,
-			     const struct afs_wait_mode *wait_mode)
+			     bool async)
 {
 	struct afs_call *call;
 	__be32 *bp;
@@ -300,7 +300,7 @@ int afs_fs_fetch_file_status(struct afs_server *server,
 	bp[2] = htonl(vnode->fid.vnode);
 	bp[3] = htonl(vnode->fid.unique);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -464,7 +464,7 @@ static int afs_fs_fetch_data64(struct afs_server *server,
 			       struct key *key,
 			       struct afs_vnode *vnode,
 			       struct afs_read *req,
-			       const struct afs_wait_mode *wait_mode)
+			       bool async)
 {
 	struct afs_call *call;
 	__be32 *bp;
@@ -495,7 +495,7 @@ static int afs_fs_fetch_data64(struct afs_server *server,
 	bp[7] = htonl(lower_32_bits(req->len));
 
 	atomic_inc(&req->usage);
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -505,7 +505,7 @@ int afs_fs_fetch_data(struct afs_server *server,
 		      struct key *key,
 		      struct afs_vnode *vnode,
 		      struct afs_read *req,
-		      const struct afs_wait_mode *wait_mode)
+		      bool async)
 {
 	struct afs_call *call;
 	__be32 *bp;
@@ -513,7 +513,7 @@ int afs_fs_fetch_data(struct afs_server *server,
 	if (upper_32_bits(req->pos) ||
 	    upper_32_bits(req->len) ||
 	    upper_32_bits(req->pos + req->len))
-		return afs_fs_fetch_data64(server, key, vnode, req, wait_mode);
+		return afs_fs_fetch_data64(server, key, vnode, req, async);
 
 	_enter("");
 
@@ -539,7 +539,7 @@ int afs_fs_fetch_data(struct afs_server *server,
 	bp[5] = htonl(lower_32_bits(req->len));
 
 	atomic_inc(&req->usage);
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -568,7 +568,7 @@ static const struct afs_call_type afs_RXFSGiveUpCallBacks = {
  * - the callbacks are held in the server->cb_break ring
  */
 int afs_fs_give_up_callbacks(struct afs_server *server,
-			     const struct afs_wait_mode *wait_mode)
+			     bool async)
 {
 	struct afs_call *call;
 	size_t ncallbacks;
@@ -622,7 +622,7 @@ int afs_fs_give_up_callbacks(struct afs_server *server,
 	ASSERT(ncallbacks > 0);
 	wake_up_nr(&server->cb_break_waitq, ncallbacks);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -673,7 +673,7 @@ int afs_fs_create(struct afs_server *server,
 		  struct afs_fid *newfid,
 		  struct afs_file_status *newstatus,
 		  struct afs_callback *newcb,
-		  const struct afs_wait_mode *wait_mode)
+		  bool async)
 {
 	struct afs_call *call;
 	size_t namesz, reqsz, padsz;
@@ -718,7 +718,7 @@ int afs_fs_create(struct afs_server *server,
 	*bp++ = htonl(mode & S_IALLUGO); /* unix mode */
 	*bp++ = 0; /* segment size */
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -763,7 +763,7 @@ int afs_fs_remove(struct afs_server *server,
 		  struct afs_vnode *vnode,
 		  const char *name,
 		  bool isdir,
-		  const struct afs_wait_mode *wait_mode)
+		  bool async)
 {
 	struct afs_call *call;
 	size_t namesz, reqsz, padsz;
@@ -798,7 +798,7 @@ int afs_fs_remove(struct afs_server *server,
 		bp = (void *) bp + padsz;
 	}
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -844,7 +844,7 @@ int afs_fs_link(struct afs_server *server,
 		struct afs_vnode *dvnode,
 		struct afs_vnode *vnode,
 		const char *name,
-		const struct afs_wait_mode *wait_mode)
+		bool async)
 {
 	struct afs_call *call;
 	size_t namesz, reqsz, padsz;
@@ -883,7 +883,7 @@ int afs_fs_link(struct afs_server *server,
 	*bp++ = htonl(vnode->fid.vnode);
 	*bp++ = htonl(vnode->fid.unique);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -932,7 +932,7 @@ int afs_fs_symlink(struct afs_server *server,
 		   const char *contents,
 		   struct afs_fid *newfid,
 		   struct afs_file_status *newstatus,
-		   const struct afs_wait_mode *wait_mode)
+		   bool async)
 {
 	struct afs_call *call;
 	size_t namesz, reqsz, padsz, c_namesz, c_padsz;
@@ -987,7 +987,7 @@ int afs_fs_symlink(struct afs_server *server,
 	*bp++ = htonl(S_IRWXUGO); /* unix mode */
 	*bp++ = 0; /* segment size */
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -1036,7 +1036,7 @@ int afs_fs_rename(struct afs_server *server,
 		  const char *orig_name,
 		  struct afs_vnode *new_dvnode,
 		  const char *new_name,
-		  const struct afs_wait_mode *wait_mode)
+		  bool async)
 {
 	struct afs_call *call;
 	size_t reqsz, o_namesz, o_padsz, n_namesz, n_padsz;
@@ -1090,7 +1090,7 @@ int afs_fs_rename(struct afs_server *server,
 		bp = (void *) bp + n_padsz;
 	}
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -1145,7 +1145,7 @@ static int afs_fs_store_data64(struct afs_server *server,
 			       pgoff_t first, pgoff_t last,
 			       unsigned offset, unsigned to,
 			       loff_t size, loff_t pos, loff_t i_size,
-			       const struct afs_wait_mode *wait_mode)
+			       bool async)
 {
 	struct afs_vnode *vnode = wb->vnode;
 	struct afs_call *call;
@@ -1194,7 +1194,7 @@ static int afs_fs_store_data64(struct afs_server *server,
 	*bp++ = htonl(i_size >> 32);
 	*bp++ = htonl((u32) i_size);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -1203,7 +1203,7 @@ static int afs_fs_store_data64(struct afs_server *server,
 int afs_fs_store_data(struct afs_server *server, struct afs_writeback *wb,
 		      pgoff_t first, pgoff_t last,
 		      unsigned offset, unsigned to,
-		      const struct afs_wait_mode *wait_mode)
+		      bool async)
 {
 	struct afs_vnode *vnode = wb->vnode;
 	struct afs_call *call;
@@ -1229,7 +1229,7 @@ int afs_fs_store_data(struct afs_server *server, struct afs_writeback *wb,
 
 	if (pos >> 32 || i_size >> 32 || size >> 32 || (pos + size) >> 32)
 		return afs_fs_store_data64(server, wb, first, last, offset, to,
-					   size, pos, i_size, wait_mode);
+					   size, pos, i_size, async);
 
 	call = afs_alloc_flat_call(&afs_RXFSStoreData,
 				   (4 + 6 + 3) * 4,
@@ -1268,7 +1268,7 @@ int afs_fs_store_data(struct afs_server *server, struct afs_writeback *wb,
 	*bp++ = htonl(size);
 	*bp++ = htonl(i_size);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -1330,7 +1330,7 @@ static const struct afs_call_type afs_RXFSStoreData64_as_Status = {
  */
 static int afs_fs_setattr_size64(struct afs_server *server, struct key *key,
 				 struct afs_vnode *vnode, struct iattr *attr,
-				 const struct afs_wait_mode *wait_mode)
+				 bool async)
 {
 	struct afs_call *call;
 	__be32 *bp;
@@ -1369,7 +1369,7 @@ static int afs_fs_setattr_size64(struct afs_server *server, struct key *key,
 	*bp++ = htonl(attr->ia_size >> 32);	/* new file length */
 	*bp++ = htonl((u32) attr->ia_size);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -1378,7 +1378,7 @@ static int afs_fs_setattr_size64(struct afs_server *server, struct key *key,
  */
 static int afs_fs_setattr_size(struct afs_server *server, struct key *key,
 			       struct afs_vnode *vnode, struct iattr *attr,
-			       const struct afs_wait_mode *wait_mode)
+			       bool async)
 {
 	struct afs_call *call;
 	__be32 *bp;
@@ -1389,7 +1389,7 @@ static int afs_fs_setattr_size(struct afs_server *server, struct key *key,
 	ASSERT(attr->ia_valid & ATTR_SIZE);
 	if (attr->ia_size >> 32)
 		return afs_fs_setattr_size64(server, key, vnode, attr,
-					     wait_mode);
+					     async);
 
 	call = afs_alloc_flat_call(&afs_RXFSStoreData_as_Status,
 				   (4 + 6 + 3) * 4,
@@ -1417,7 +1417,7 @@ static int afs_fs_setattr_size(struct afs_server *server, struct key *key,
 	*bp++ = 0;				/* size of write */
 	*bp++ = htonl(attr->ia_size);		/* new file length */
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -1426,14 +1426,14 @@ static int afs_fs_setattr_size(struct afs_server *server, struct key *key,
  */
 int afs_fs_setattr(struct afs_server *server, struct key *key,
 		   struct afs_vnode *vnode, struct iattr *attr,
-		   const struct afs_wait_mode *wait_mode)
+		   bool async)
 {
 	struct afs_call *call;
 	__be32 *bp;
 
 	if (attr->ia_valid & ATTR_SIZE)
 		return afs_fs_setattr_size(server, key, vnode, attr,
-					   wait_mode);
+					   async);
 
 	_enter(",%x,{%x:%u},,",
 	       key_serial(key), vnode->fid.vid, vnode->fid.vnode);
@@ -1459,7 +1459,7 @@ int afs_fs_setattr(struct afs_server *server, struct key *key,
 
 	xdr_encode_AFS_StoreStatus(&bp, attr);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -1661,7 +1661,7 @@ int afs_fs_get_volume_status(struct afs_server *server,
 			     struct key *key,
 			     struct afs_vnode *vnode,
 			     struct afs_volume_status *vs,
-			     const struct afs_wait_mode *wait_mode)
+			     bool async)
 {
 	struct afs_call *call;
 	__be32 *bp;
@@ -1691,7 +1691,7 @@ int afs_fs_get_volume_status(struct afs_server *server,
 	bp[0] = htonl(FSGETVOLUMESTATUS);
 	bp[1] = htonl(vnode->fid.vid);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -1753,7 +1753,7 @@ int afs_fs_set_lock(struct afs_server *server,
 		    struct key *key,
 		    struct afs_vnode *vnode,
 		    afs_lock_type_t type,
-		    const struct afs_wait_mode *wait_mode)
+		    bool async)
 {
 	struct afs_call *call;
 	__be32 *bp;
@@ -1777,7 +1777,7 @@ int afs_fs_set_lock(struct afs_server *server,
 	*bp++ = htonl(vnode->fid.unique);
 	*bp++ = htonl(type);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -1786,7 +1786,7 @@ int afs_fs_set_lock(struct afs_server *server,
 int afs_fs_extend_lock(struct afs_server *server,
 		       struct key *key,
 		       struct afs_vnode *vnode,
-		       const struct afs_wait_mode *wait_mode)
+		       bool async)
 {
 	struct afs_call *call;
 	__be32 *bp;
@@ -1809,7 +1809,7 @@ int afs_fs_extend_lock(struct afs_server *server,
 	*bp++ = htonl(vnode->fid.vnode);
 	*bp++ = htonl(vnode->fid.unique);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
 
 /*
@@ -1818,7 +1818,7 @@ int afs_fs_extend_lock(struct afs_server *server,
 int afs_fs_release_lock(struct afs_server *server,
 			struct key *key,
 			struct afs_vnode *vnode,
-			const struct afs_wait_mode *wait_mode)
+			bool async)
 {
 	struct afs_call *call;
 	__be32 *bp;
@@ -1841,5 +1841,5 @@ int afs_fs_release_lock(struct afs_server *server,
 	*bp++ = htonl(vnode->fid.vnode);
 	*bp++ = htonl(vnode->fid.unique);
 
-	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
+	return afs_make_call(&server->addr, call, GFP_NOFS, async);
 }
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index f71e58fcc2f2..b411670d5f67 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -51,23 +51,6 @@ struct afs_mount_params {
 	struct key		*key;		/* key to use for secure mounting */
 };
 
-/*
- * definition of how to wait for the completion of an operation
- */
-struct afs_wait_mode {
-	/* RxRPC received message notification */
-	rxrpc_notify_rx_t notify_rx;
-
-	/* synchronous call waiter and call dispatched notification */
-	int (*wait)(struct afs_call *call);
-
-	/* asynchronous call completion */
-	void (*async_complete)(void *reply, int error);
-};
-
-extern const struct afs_wait_mode afs_sync_call;
-extern const struct afs_wait_mode afs_async_call;
-
 enum afs_call_state {
 	AFS_CALL_REQUESTING,	/* request is being sent for outgoing call */
 	AFS_CALL_AWAIT_REPLY,	/* awaiting reply to outgoing call */
@@ -82,7 +65,6 @@ enum afs_call_state {
  */
 struct afs_call {
 	const struct afs_call_type *type;	/* type of call */
-	const struct afs_wait_mode *wait_mode;	/* completion wait mode */
 	wait_queue_head_t	waitq;		/* processes awaiting completion */
 	struct work_struct	async_work;	/* asynchronous work processor */
 	struct work_struct	work;		/* actual work processor */
@@ -111,6 +93,7 @@ struct afs_call {
 	bool			incoming;	/* T if incoming call */
 	bool			send_pages;	/* T if data from mapping should be sent */
 	bool			need_attention;	/* T if RxRPC poked us */
+	bool			async;		/* T if asynchronous */
 	u16			service_id;	/* RxRPC service ID to call */
 	__be16			port;		/* target UDP port */
 	u32			operation_ID;	/* operation ID for an incoming call */
@@ -527,50 +510,37 @@ extern int afs_flock(struct file *, int, struct file_lock *);
  */
 extern int afs_fs_fetch_file_status(struct afs_server *, struct key *,
 				    struct afs_vnode *, struct afs_volsync *,
-				    const struct afs_wait_mode *);
-extern int afs_fs_give_up_callbacks(struct afs_server *,
-				    const struct afs_wait_mode *);
+				    bool);
+extern int afs_fs_give_up_callbacks(struct afs_server *, bool);
 extern int afs_fs_fetch_data(struct afs_server *, struct key *,
-			     struct afs_vnode *, struct afs_read *,
-			     const struct afs_wait_mode *);
+			     struct afs_vnode *, struct afs_read *, bool);
 extern int afs_fs_create(struct afs_server *, struct key *,
 			 struct afs_vnode *, const char *, umode_t,
 			 struct afs_fid *, struct afs_file_status *,
-			 struct afs_callback *,
-			 const struct afs_wait_mode *);
+			 struct afs_callback *, bool);
 extern int afs_fs_remove(struct afs_server *, struct key *,
-			 struct afs_vnode *, const char *, bool,
-			 const struct afs_wait_mode *);
+			 struct afs_vnode *, const char *, bool, bool);
 extern int afs_fs_link(struct afs_server *, struct key *, struct afs_vnode *,
-		       struct afs_vnode *, const char *,
-		       const struct afs_wait_mode *);
+		       struct afs_vnode *, const char *, bool);
 extern int afs_fs_symlink(struct afs_server *, struct key *,
 			  struct afs_vnode *, const char *, const char *,
-			  struct afs_fid *, struct afs_file_status *,
-			  const struct afs_wait_mode *);
+			  struct afs_fid *, struct afs_file_status *, bool);
 extern int afs_fs_rename(struct afs_server *, struct key *,
 			 struct afs_vnode *, const char *,
-			 struct afs_vnode *, const char *,
-			 const struct afs_wait_mode *);
+			 struct afs_vnode *, const char *, bool);
 extern int afs_fs_store_data(struct afs_server *, struct afs_writeback *,
-			     pgoff_t, pgoff_t, unsigned, unsigned,
-			     const struct afs_wait_mode *);
+			     pgoff_t, pgoff_t, unsigned, unsigned, bool);
 extern int afs_fs_setattr(struct afs_server *, struct key *,
-			  struct afs_vnode *, struct iattr *,
-			  const struct afs_wait_mode *);
+			  struct afs_vnode *, struct iattr *, bool);
 extern int afs_fs_get_volume_status(struct afs_server *, struct key *,
 				    struct afs_vnode *,
-				    struct afs_volume_status *,
-				    const struct afs_wait_mode *);
+				    struct afs_volume_status *, bool);
 extern int afs_fs_set_lock(struct afs_server *, struct key *,
-			   struct afs_vnode *, afs_lock_type_t,
-			   const struct afs_wait_mode *);
+			   struct afs_vnode *, afs_lock_type_t, bool);
 extern int afs_fs_extend_lock(struct afs_server *, struct key *,
-			      struct afs_vnode *,
-			      const struct afs_wait_mode *);
+			      struct afs_vnode *, bool);
 extern int afs_fs_release_lock(struct afs_server *, struct key *,
-			       struct afs_vnode *,
-			       const struct afs_wait_mode *);
+			       struct afs_vnode *, bool);
 
 /*
  * inode.c
@@ -624,8 +594,7 @@ extern struct socket *afs_socket;
 
 extern int afs_open_socket(void);
 extern void afs_close_socket(void);
-extern int afs_make_call(struct in_addr *, struct afs_call *, gfp_t,
-			 const struct afs_wait_mode *);
+extern int afs_make_call(struct in_addr *, struct afs_call *, gfp_t, bool);
 extern struct afs_call *afs_alloc_flat_call(const struct afs_call_type *,
 					    size_t, size_t);
 extern void afs_flat_call_destructor(struct afs_call *);
@@ -681,11 +650,10 @@ extern int afs_get_MAC_address(u8 *, size_t);
  */
 extern int afs_vl_get_entry_by_name(struct in_addr *, struct key *,
 				    const char *, struct afs_cache_vlocation *,
-				    const struct afs_wait_mode *);
+				    bool);
 extern int afs_vl_get_entry_by_id(struct in_addr *, struct key *,
 				  afs_volid_t, afs_voltype_t,
-				  struct afs_cache_vlocation *,
-				  const struct afs_wait_mode *);
+				  struct afs_cache_vlocation *, bool);
 
 /*
  * vlocation.c
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index f26344a8c029..ec1e41f929d1 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -25,29 +25,11 @@ static void afs_free_call(struct afs_call *);
 static void afs_wake_up_call_waiter(struct sock *, struct rxrpc_call *, unsigned long);
 static int afs_wait_for_call_to_complete(struct afs_call *);
 static void afs_wake_up_async_call(struct sock *, struct rxrpc_call *, unsigned long);
-static int afs_dont_wait_for_call_to_complete(struct afs_call *);
 static void afs_process_async_call(struct work_struct *);
 static void afs_rx_new_call(struct sock *, struct rxrpc_call *, unsigned long);
 static void afs_rx_discard_new_call(struct rxrpc_call *, unsigned long);
 static int afs_deliver_cm_op_id(struct afs_call *);
 
-/* synchronous call management */
-const struct afs_wait_mode afs_sync_call = {
-	.notify_rx	= afs_wake_up_call_waiter,
-	.wait		= afs_wait_for_call_to_complete,
-};
-
-/* asynchronous call management */
-const struct afs_wait_mode afs_async_call = {
-	.notify_rx	= afs_wake_up_async_call,
-	.wait		= afs_dont_wait_for_call_to_complete,
-};
-
-/* asynchronous incoming call management */
-static const struct afs_wait_mode afs_async_incoming_call = {
-	.notify_rx	= afs_wake_up_async_call,
-};
-
 /* asynchronous incoming call initial processing */
 static const struct afs_call_type afs_RXCMxxxx = {
 	.name		= "CB.xxxx",
@@ -315,7 +297,7 @@ static int afs_send_pages(struct afs_call *call, struct msghdr *msg,
  * initiate a call
  */
 int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
-		  const struct afs_wait_mode *wait_mode)
+		  bool async)
 {
 	struct sockaddr_rxrpc srx;
 	struct rxrpc_call *rxcall;
@@ -332,7 +314,7 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
 	       call, call->type->name, key_serial(call->key),
 	       atomic_read(&afs_outstanding_calls));
 
-	call->wait_mode = wait_mode;
+	call->async = async;
 	INIT_WORK(&call->async_work, afs_process_async_call);
 
 	memset(&srx, 0, sizeof(srx));
@@ -347,7 +329,9 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
 	/* create a call */
 	rxcall = rxrpc_kernel_begin_call(afs_socket, &srx, call->key,
 					 (unsigned long) call, gfp,
-					 wait_mode->notify_rx);
+					 (async ?
+					  afs_wake_up_async_call :
+					  afs_wake_up_call_waiter));
 	call->key = NULL;
 	if (IS_ERR(rxcall)) {
 		ret = PTR_ERR(rxcall);
@@ -386,7 +370,10 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
 
 	/* at this point, an async call may no longer exist as it may have
 	 * already completed */
-	return wait_mode->wait(call);
+	if (call->async)
+		return -EINPROGRESS;
+
+	return afs_wait_for_call_to_complete(call);
 
 error_do_abort:
 	rxrpc_kernel_abort_call(afs_socket, rxcall, RX_USER_ABORT, -ret, "KSD");
@@ -549,17 +536,6 @@ static void afs_wake_up_async_call(struct sock *sk, struct rxrpc_call *rxcall,
 }
 
 /*
- * put a call into asynchronous mode
- * - mustn't touch the call descriptor as the call my have completed by the
- *   time we get here
- */
-static int afs_dont_wait_for_call_to_complete(struct afs_call *call)
-{
-	_enter("");
-	return -EINPROGRESS;
-}
-
-/*
  * delete an asynchronous call
  */
 static void afs_delete_async_call(struct work_struct *work)
@@ -587,10 +563,7 @@ static void afs_process_async_call(struct work_struct *work)
 		afs_deliver_to_call(call);
 	}
 
-	if (call->state == AFS_CALL_COMPLETE && call->wait_mode) {
-		if (call->wait_mode->async_complete)
-			call->wait_mode->async_complete(call->reply,
-							call->error);
+	if (call->state == AFS_CALL_COMPLETE) {
 		call->reply = NULL;
 
 		/* kill the call */
@@ -626,10 +599,10 @@ static void afs_charge_preallocation(struct work_struct *work)
 				break;
 
 			INIT_WORK(&call->async_work, afs_process_async_call);
-			call->wait_mode = &afs_async_incoming_call;
 			call->type = &afs_RXCMxxxx;
-			init_waitqueue_head(&call->waitq);
+			call->async = true;
 			call->state = AFS_CALL_AWAIT_OP_ID;
+			init_waitqueue_head(&call->waitq);
 		}
 
 		if (rxrpc_kernel_charge_accept(afs_socket,
diff --git a/fs/afs/vlclient.c b/fs/afs/vlclient.c
index 94bcd97d22b8..a5e4cc561b6c 100644
--- a/fs/afs/vlclient.c
+++ b/fs/afs/vlclient.c
@@ -147,7 +147,7 @@ int afs_vl_get_entry_by_name(struct in_addr *addr,
 			     struct key *key,
 			     const char *volname,
 			     struct afs_cache_vlocation *entry,
-			     const struct afs_wait_mode *wait_mode)
+			     bool async)
 {
 	struct afs_call *call;
 	size_t volnamesz, reqsz, padsz;
@@ -177,7 +177,7 @@ int afs_vl_get_entry_by_name(struct in_addr *addr,
 		memset((void *) bp + volnamesz, 0, padsz);
 
 	/* initiate the call */
-	return afs_make_call(addr, call, GFP_KERNEL, wait_mode);
+	return afs_make_call(addr, call, GFP_KERNEL, async);
 }
 
 /*
@@ -188,7 +188,7 @@ int afs_vl_get_entry_by_id(struct in_addr *addr,
 			   afs_volid_t volid,
 			   afs_voltype_t voltype,
 			   struct afs_cache_vlocation *entry,
-			   const struct afs_wait_mode *wait_mode)
+			   bool async)
 {
 	struct afs_call *call;
 	__be32 *bp;
@@ -211,5 +211,5 @@ int afs_vl_get_entry_by_id(struct in_addr *addr,
 	*bp   = htonl(voltype);
 
 	/* initiate the call */
-	return afs_make_call(addr, call, GFP_KERNEL, wait_mode);
+	return afs_make_call(addr, call, GFP_KERNEL, async);
 }
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index 45a86396fd2d..d7d8dd8c0b31 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -53,7 +53,7 @@ static int afs_vlocation_access_vl_by_name(struct afs_vlocation *vl,
 
 		/* attempt to access the VL server */
 		ret = afs_vl_get_entry_by_name(&addr, key, vl->vldb.name, vldb,
-					       &afs_sync_call);
+					       false);
 		switch (ret) {
 		case 0:
 			goto out;
@@ -111,7 +111,7 @@ static int afs_vlocation_access_vl_by_id(struct afs_vlocation *vl,
 
 		/* attempt to access the VL server */
 		ret = afs_vl_get_entry_by_id(&addr, key, volid, voltype, vldb,
-					     &afs_sync_call);
+					     false);
 		switch (ret) {
 		case 0:
 			goto out;
diff --git a/fs/afs/vnode.c b/fs/afs/vnode.c
index 45aa874f5d32..dcb956143c86 100644
--- a/fs/afs/vnode.c
+++ b/fs/afs/vnode.c
@@ -358,7 +358,7 @@ int afs_vnode_fetch_status(struct afs_vnode *vnode,
 		       server, ntohl(server->addr.s_addr));
 
 		ret = afs_fs_fetch_file_status(server, key, vnode, NULL,
-					       &afs_sync_call);
+					       false);
 
 	} while (!afs_volume_release_fileserver(vnode, server, ret));
 
@@ -421,7 +421,7 @@ int afs_vnode_fetch_data(struct afs_vnode *vnode, struct key *key,
 		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
 
 		ret = afs_fs_fetch_data(server, key, vnode, desc,
-					&afs_sync_call);
+					false);
 
 	} while (!afs_volume_release_fileserver(vnode, server, ret));
 
@@ -477,7 +477,7 @@ int afs_vnode_create(struct afs_vnode *vnode, struct key *key,
 		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
 
 		ret = afs_fs_create(server, key, vnode, name, mode, newfid,
-				    newstatus, newcb, &afs_sync_call);
+				    newstatus, newcb, false);
 
 	} while (!afs_volume_release_fileserver(vnode, server, ret));
 
@@ -533,7 +533,7 @@ int afs_vnode_remove(struct afs_vnode *vnode, struct key *key, const char *name,
 		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
 
 		ret = afs_fs_remove(server, key, vnode, name, isdir,
-				    &afs_sync_call);
+				    false);
 
 	} while (!afs_volume_release_fileserver(vnode, server, ret));
 
@@ -595,7 +595,7 @@ int afs_vnode_link(struct afs_vnode *dvnode, struct afs_vnode *vnode,
 		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
 
 		ret = afs_fs_link(server, key, dvnode, vnode, name,
-				  &afs_sync_call);
+				  false);
 
 	} while (!afs_volume_release_fileserver(dvnode, server, ret));
 
@@ -659,7 +659,7 @@ int afs_vnode_symlink(struct afs_vnode *vnode, struct key *key,
 		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
 
 		ret = afs_fs_symlink(server, key, vnode, name, content,
-				     newfid, newstatus, &afs_sync_call);
+				     newfid, newstatus, false);
 
 	} while (!afs_volume_release_fileserver(vnode, server, ret));
 
@@ -729,7 +729,7 @@ int afs_vnode_rename(struct afs_vnode *orig_dvnode,
 		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
 
 		ret = afs_fs_rename(server, key, orig_dvnode, orig_name,
-				    new_dvnode, new_name, &afs_sync_call);
+				    new_dvnode, new_name, false);
 
 	} while (!afs_volume_release_fileserver(orig_dvnode, server, ret));
 
@@ -795,7 +795,7 @@ int afs_vnode_store_data(struct afs_writeback *wb, pgoff_t first, pgoff_t last,
 		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
 
 		ret = afs_fs_store_data(server, wb, first, last, offset, to,
-					&afs_sync_call);
+					false);
 
 	} while (!afs_volume_release_fileserver(vnode, server, ret));
 
@@ -847,7 +847,7 @@ int afs_vnode_setattr(struct afs_vnode *vnode, struct key *key,
 
 		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
 
-		ret = afs_fs_setattr(server, key, vnode, attr, &afs_sync_call);
+		ret = afs_fs_setattr(server, key, vnode, attr, false);
 
 	} while (!afs_volume_release_fileserver(vnode, server, ret));
 
@@ -894,7 +894,7 @@ int afs_vnode_get_volume_status(struct afs_vnode *vnode, struct key *key,
 
 		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
 
-		ret = afs_fs_get_volume_status(server, key, vnode, vs, &afs_sync_call);
+		ret = afs_fs_get_volume_status(server, key, vnode, vs, false);
 
 	} while (!afs_volume_release_fileserver(vnode, server, ret));
 
@@ -933,7 +933,7 @@ int afs_vnode_set_lock(struct afs_vnode *vnode, struct key *key,
 
 		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
 
-		ret = afs_fs_set_lock(server, key, vnode, type, &afs_sync_call);
+		ret = afs_fs_set_lock(server, key, vnode, type, false);
 
 	} while (!afs_volume_release_fileserver(vnode, server, ret));
 
@@ -971,7 +971,7 @@ int afs_vnode_extend_lock(struct afs_vnode *vnode, struct key *key)
 
 		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
 
-		ret = afs_fs_extend_lock(server, key, vnode, &afs_sync_call);
+		ret = afs_fs_extend_lock(server, key, vnode, false);
 
 	} while (!afs_volume_release_fileserver(vnode, server, ret));
 
@@ -1009,7 +1009,7 @@ int afs_vnode_release_lock(struct afs_vnode *vnode, struct key *key)
 
 		_debug("USING SERVER: %08x\n", ntohl(server->addr.s_addr));
 
-		ret = afs_fs_release_lock(server, key, vnode, &afs_sync_call);
+		ret = afs_fs_release_lock(server, key, vnode, false);
 
 	} while (!afs_volume_release_fileserver(vnode, server, ret));
 

^ permalink raw reply related

* [PATCH net-next 3/4] rxrpc: Allow listen(sock, 0) to be used to disable listening
From: David Howells @ 2017-01-09 14:53 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <148397356909.20445.15077871371099721338.stgit@warthog.procyon.org.uk>

Allow listen() with a backlog of 0 to be used to disable listening on an
AF_RXRPC socket.  This also releases any preallocation, thereby making it
easier for a kernel service to account for all allocated call structures
when shutting down the service.

The socket cannot thereafter have listening reenabled, but must rather be
closed and reopened.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/af_rxrpc.c    |    8 ++++++++
 net/rxrpc/ar-internal.h |    1 +
 net/rxrpc/call_accept.c |    3 ++-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 5f63f6dcaabb..199b46e93e64 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -224,6 +224,14 @@ static int rxrpc_listen(struct socket *sock, int backlog)
 		else
 			sk->sk_max_ack_backlog = old;
 		break;
+	case RXRPC_SERVER_LISTENING:
+		if (backlog == 0) {
+			rx->sk.sk_state = RXRPC_SERVER_LISTEN_DISABLED;
+			sk->sk_max_ack_backlog = 0;
+			rxrpc_discard_prealloc(rx);
+			ret = 0;
+			break;
+		}
 	default:
 		ret = -EBUSY;
 		break;
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 84927c7b5fdf..12be432be9b2 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -60,6 +60,7 @@ enum {
 	RXRPC_CLIENT_BOUND,		/* client local address bound */
 	RXRPC_SERVER_BOUND,		/* server local address bound */
 	RXRPC_SERVER_LISTENING,		/* server listening for connections */
+	RXRPC_SERVER_LISTEN_DISABLED,	/* server listening disabled */
 	RXRPC_CLOSE,			/* socket is being closed */
 };
 
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 832d854c2d5c..7c4c64ab8da2 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -349,7 +349,8 @@ struct rxrpc_call *rxrpc_new_incoming_call(struct rxrpc_local *local,
 
 found_service:
 	spin_lock(&rx->incoming_lock);
-	if (rx->sk.sk_state == RXRPC_CLOSE) {
+	if (rx->sk.sk_state == RXRPC_SERVER_LISTEN_DISABLED ||
+	    rx->sk.sk_state == RXRPC_CLOSE) {
 		trace_rxrpc_abort("CLS", sp->hdr.cid, sp->hdr.callNumber,
 				  sp->hdr.seq, RX_INVALID_OPERATION, ESHUTDOWN);
 		skb->mark = RXRPC_SKB_MARK_LOCAL_ABORT;

^ permalink raw reply related

* [PATCH net-next 4/4] afs: Refcount the afs_call struct
From: David Howells @ 2017-01-09 14:53 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <148397356909.20445.15077871371099721338.stgit@warthog.procyon.org.uk>

A static checker warning occurs in the AFS filesystem:

	fs/afs/cmservice.c:155 SRXAFSCB_CallBack()
	error: dereferencing freed memory 'call'

due to the reply being sent before we access the server it points to.  The
act of sending the reply causes the call to be freed if an error occurs
(but not if it doesn't).

On top of this, the lifetime handling of afs_call structs is fragile
because they get passed around through workqueues without any sort of
refcounting.

Deal with the issues by:

 (1) Fix the maybe/maybe not nature of the reply sending functions with
     regards to whether they release the call struct.

 (2) Refcount the afs_call struct and sort out places that need to get/put
     references.

 (3) Pass a ref through the work queue and release (or pass on) that ref in
     the work function.  Care has to be taken because a work queue may
     already own a ref to the call.

 (4) Do the cleaning up in the put function only.

 (5) Simplify module cleanup by always incrementing afs_outstanding_calls
     whenever a call is allocated.

 (6) Set the backlog to 0 with kernel_listen() at the beginning of the
     process of closing the socket to prevent new incoming calls from
     occurring and to remove the contribution of preallocated calls from
     afs_outstanding_calls before we wait on it.

A tracepoint is also added to monitor the afs_call refcount and lifetime.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Fixes: 08e0e7c82eea: "[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC."
---

 fs/afs/cmservice.c         |   41 ++++++------
 fs/afs/internal.h          |    9 ++-
 fs/afs/rxrpc.c             |  153 +++++++++++++++++++++++++++-----------------
 include/trace/events/afs.h |   75 ++++++++++++++++++++++
 4 files changed, 199 insertions(+), 79 deletions(-)

diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index a2e1e02005f6..e349a3316303 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -24,6 +24,11 @@ static int afs_deliver_cb_callback(struct afs_call *);
 static int afs_deliver_cb_probe_uuid(struct afs_call *);
 static int afs_deliver_cb_tell_me_about_yourself(struct afs_call *);
 static void afs_cm_destructor(struct afs_call *);
+static void SRXAFSCB_CallBack(struct work_struct *);
+static void SRXAFSCB_InitCallBackState(struct work_struct *);
+static void SRXAFSCB_Probe(struct work_struct *);
+static void SRXAFSCB_ProbeUuid(struct work_struct *);
+static void SRXAFSCB_TellMeAboutYourself(struct work_struct *);
 
 #define CM_NAME(name) \
 	const char afs_SRXCB##name##_name[] __tracepoint_string =	\
@@ -38,6 +43,7 @@ static const struct afs_call_type afs_SRXCBCallBack = {
 	.deliver	= afs_deliver_cb_callback,
 	.abort_to_error	= afs_abort_to_error,
 	.destructor	= afs_cm_destructor,
+	.work		= SRXAFSCB_CallBack,
 };
 
 /*
@@ -49,6 +55,7 @@ static const struct afs_call_type afs_SRXCBInitCallBackState = {
 	.deliver	= afs_deliver_cb_init_call_back_state,
 	.abort_to_error	= afs_abort_to_error,
 	.destructor	= afs_cm_destructor,
+	.work		= SRXAFSCB_InitCallBackState,
 };
 
 /*
@@ -60,6 +67,7 @@ static const struct afs_call_type afs_SRXCBInitCallBackState3 = {
 	.deliver	= afs_deliver_cb_init_call_back_state3,
 	.abort_to_error	= afs_abort_to_error,
 	.destructor	= afs_cm_destructor,
+	.work		= SRXAFSCB_InitCallBackState,
 };
 
 /*
@@ -71,6 +79,7 @@ static const struct afs_call_type afs_SRXCBProbe = {
 	.deliver	= afs_deliver_cb_probe,
 	.abort_to_error	= afs_abort_to_error,
 	.destructor	= afs_cm_destructor,
+	.work		= SRXAFSCB_Probe,
 };
 
 /*
@@ -82,6 +91,7 @@ static const struct afs_call_type afs_SRXCBProbeUuid = {
 	.deliver	= afs_deliver_cb_probe_uuid,
 	.abort_to_error	= afs_abort_to_error,
 	.destructor	= afs_cm_destructor,
+	.work		= SRXAFSCB_ProbeUuid,
 };
 
 /*
@@ -93,6 +103,7 @@ static const struct afs_call_type afs_SRXCBTellMeAboutYourself = {
 	.deliver	= afs_deliver_cb_tell_me_about_yourself,
 	.abort_to_error	= afs_abort_to_error,
 	.destructor	= afs_cm_destructor,
+	.work		= SRXAFSCB_TellMeAboutYourself,
 };
 
 /*
@@ -163,6 +174,7 @@ static void SRXAFSCB_CallBack(struct work_struct *work)
 	afs_send_empty_reply(call);
 
 	afs_break_callbacks(call->server, call->count, call->request);
+	afs_put_call(call);
 	_leave("");
 }
 
@@ -284,9 +296,7 @@ static int afs_deliver_cb_callback(struct afs_call *call)
 		return -ENOTCONN;
 	call->server = server;
 
-	INIT_WORK(&call->work, SRXAFSCB_CallBack);
-	queue_work(afs_wq, &call->work);
-	return 0;
+	return afs_queue_call_work(call);
 }
 
 /*
@@ -300,6 +310,7 @@ static void SRXAFSCB_InitCallBackState(struct work_struct *work)
 
 	afs_init_callback_state(call->server);
 	afs_send_empty_reply(call);
+	afs_put_call(call);
 	_leave("");
 }
 
@@ -330,9 +341,7 @@ static int afs_deliver_cb_init_call_back_state(struct afs_call *call)
 		return -ENOTCONN;
 	call->server = server;
 
-	INIT_WORK(&call->work, SRXAFSCB_InitCallBackState);
-	queue_work(afs_wq, &call->work);
-	return 0;
+	return afs_queue_call_work(call);
 }
 
 /*
@@ -404,9 +413,7 @@ static int afs_deliver_cb_init_call_back_state3(struct afs_call *call)
 		return -ENOTCONN;
 	call->server = server;
 
-	INIT_WORK(&call->work, SRXAFSCB_InitCallBackState);
-	queue_work(afs_wq, &call->work);
-	return 0;
+	return afs_queue_call_work(call);
 }
 
 /*
@@ -418,6 +425,7 @@ static void SRXAFSCB_Probe(struct work_struct *work)
 
 	_enter("");
 	afs_send_empty_reply(call);
+	afs_put_call(call);
 	_leave("");
 }
 
@@ -437,9 +445,7 @@ static int afs_deliver_cb_probe(struct afs_call *call)
 	/* no unmarshalling required */
 	call->state = AFS_CALL_REPLYING;
 
-	INIT_WORK(&call->work, SRXAFSCB_Probe);
-	queue_work(afs_wq, &call->work);
-	return 0;
+	return afs_queue_call_work(call);
 }
 
 /*
@@ -462,6 +468,7 @@ static void SRXAFSCB_ProbeUuid(struct work_struct *work)
 		reply.match = htonl(1);
 
 	afs_send_simple_reply(call, &reply, sizeof(reply));
+	afs_put_call(call);
 	_leave("");
 }
 
@@ -520,9 +527,7 @@ static int afs_deliver_cb_probe_uuid(struct afs_call *call)
 
 	call->state = AFS_CALL_REPLYING;
 
-	INIT_WORK(&call->work, SRXAFSCB_ProbeUuid);
-	queue_work(afs_wq, &call->work);
-	return 0;
+	return afs_queue_call_work(call);
 }
 
 /*
@@ -584,7 +589,7 @@ static void SRXAFSCB_TellMeAboutYourself(struct work_struct *work)
 	reply.cap.capcount = htonl(1);
 	reply.cap.caps[0] = htonl(AFS_CAP_ERROR_TRANSLATION);
 	afs_send_simple_reply(call, &reply, sizeof(reply));
-
+	afs_put_call(call);
 	_leave("");
 }
 
@@ -604,7 +609,5 @@ static int afs_deliver_cb_tell_me_about_yourself(struct afs_call *call)
 	/* no unmarshalling required */
 	call->state = AFS_CALL_REPLYING;
 
-	INIT_WORK(&call->work, SRXAFSCB_TellMeAboutYourself);
-	queue_work(afs_wq, &call->work);
-	return 0;
+	return afs_queue_call_work(call);
 }
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index b411670d5f67..65504e218d35 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -66,7 +66,7 @@ enum afs_call_state {
 struct afs_call {
 	const struct afs_call_type *type;	/* type of call */
 	wait_queue_head_t	waitq;		/* processes awaiting completion */
-	struct work_struct	async_work;	/* asynchronous work processor */
+	struct work_struct	async_work;	/* async I/O processor */
 	struct work_struct	work;		/* actual work processor */
 	struct rxrpc_call	*rxcall;	/* RxRPC call handle */
 	struct key		*key;		/* security for this call */
@@ -82,6 +82,7 @@ struct afs_call {
 	pgoff_t			first;		/* first page in mapping to deal with */
 	pgoff_t			last;		/* last page in mapping to deal with */
 	size_t			offset;		/* offset into received data store */
+	atomic_t		usage;
 	enum afs_call_state	state;
 	int			error;		/* error code */
 	u32			abort_code;	/* Remote abort ID or 0 */
@@ -115,6 +116,9 @@ struct afs_call_type {
 
 	/* clean up a call */
 	void (*destructor)(struct afs_call *call);
+
+	/* Work function */
+	void (*work)(struct work_struct *work);
 };
 
 /*
@@ -591,9 +595,12 @@ extern void afs_proc_cell_remove(struct afs_cell *);
  * rxrpc.c
  */
 extern struct socket *afs_socket;
+extern atomic_t afs_outstanding_calls;
 
 extern int afs_open_socket(void);
 extern void afs_close_socket(void);
+extern void afs_put_call(struct afs_call *);
+extern int afs_queue_call_work(struct afs_call *);
 extern int afs_make_call(struct in_addr *, struct afs_call *, gfp_t, bool);
 extern struct afs_call *afs_alloc_flat_call(const struct afs_call_type *,
 					    size_t, size_t);
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index ec1e41f929d1..95f42872b787 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -19,9 +19,8 @@
 struct socket *afs_socket; /* my RxRPC socket */
 static struct workqueue_struct *afs_async_calls;
 static struct afs_call *afs_spare_incoming_call;
-static atomic_t afs_outstanding_calls;
+atomic_t afs_outstanding_calls;
 
-static void afs_free_call(struct afs_call *);
 static void afs_wake_up_call_waiter(struct sock *, struct rxrpc_call *, unsigned long);
 static int afs_wait_for_call_to_complete(struct afs_call *);
 static void afs_wake_up_async_call(struct sock *, struct rxrpc_call *, unsigned long);
@@ -112,9 +111,11 @@ void afs_close_socket(void)
 {
 	_enter("");
 
+	kernel_listen(afs_socket, 0);
+	flush_workqueue(afs_async_calls);
+
 	if (afs_spare_incoming_call) {
-		atomic_inc(&afs_outstanding_calls);
-		afs_free_call(afs_spare_incoming_call);
+		afs_put_call(afs_spare_incoming_call);
 		afs_spare_incoming_call = NULL;
 	}
 
@@ -123,7 +124,6 @@ void afs_close_socket(void)
 			 TASK_UNINTERRUPTIBLE);
 	_debug("no outstanding calls");
 
-	flush_workqueue(afs_async_calls);
 	kernel_sock_shutdown(afs_socket, SHUT_RDWR);
 	flush_workqueue(afs_async_calls);
 	sock_release(afs_socket);
@@ -134,44 +134,79 @@ void afs_close_socket(void)
 }
 
 /*
- * free a call
+ * Allocate a call.
  */
-static void afs_free_call(struct afs_call *call)
+static struct afs_call *afs_alloc_call(const struct afs_call_type *type,
+				       gfp_t gfp)
 {
-	_debug("DONE %p{%s} [%d]",
-	       call, call->type->name, atomic_read(&afs_outstanding_calls));
+	struct afs_call *call;
+	int o;
 
-	ASSERTCMP(call->rxcall, ==, NULL);
-	ASSERT(!work_pending(&call->async_work));
-	ASSERT(call->type->name != NULL);
+	call = kzalloc(sizeof(*call), gfp);
+	if (!call)
+		return NULL;
 
-	kfree(call->request);
-	kfree(call);
+	call->type = type;
+	atomic_set(&call->usage, 1);
+	INIT_WORK(&call->async_work, afs_process_async_call);
+	init_waitqueue_head(&call->waitq);
 
-	if (atomic_dec_and_test(&afs_outstanding_calls))
-		wake_up_atomic_t(&afs_outstanding_calls);
+	o = atomic_inc_return(&afs_outstanding_calls);
+	trace_afs_call(call, afs_call_trace_alloc, 1, o,
+		       __builtin_return_address(0));
+	return call;
 }
 
 /*
- * End a call but do not free it
+ * Dispose of a reference on a call.
  */
-static void afs_end_call_nofree(struct afs_call *call)
+void afs_put_call(struct afs_call *call)
 {
-	if (call->rxcall) {
-		rxrpc_kernel_end_call(afs_socket, call->rxcall);
-		call->rxcall = NULL;
+	int n = atomic_dec_return(&call->usage);
+	int o = atomic_read(&afs_outstanding_calls);
+
+	trace_afs_call(call, afs_call_trace_put, n + 1, o,
+		       __builtin_return_address(0));
+
+	ASSERTCMP(n, >=, 0);
+	if (n == 0) {
+		ASSERT(!work_pending(&call->async_work));
+		ASSERT(call->type->name != NULL);
+
+		if (call->rxcall) {
+			rxrpc_kernel_end_call(afs_socket, call->rxcall);
+			call->rxcall = NULL;
+		}
+		if (call->type->destructor)
+			call->type->destructor(call);
+
+		kfree(call->request);
+		kfree(call);
+
+		o = atomic_dec_return(&afs_outstanding_calls);
+		trace_afs_call(call, afs_call_trace_free, 0, o,
+			       __builtin_return_address(0));
+		if (o == 0)
+			wake_up_atomic_t(&afs_outstanding_calls);
 	}
-	if (call->type->destructor)
-		call->type->destructor(call);
 }
 
 /*
- * End a call and free it
+ * Queue the call for actual work.  Returns 0 unconditionally for convenience.
  */
-static void afs_end_call(struct afs_call *call)
+int afs_queue_call_work(struct afs_call *call)
 {
-	afs_end_call_nofree(call);
-	afs_free_call(call);
+	int u = atomic_inc_return(&call->usage);
+
+	trace_afs_call(call, afs_call_trace_work, u,
+		       atomic_read(&afs_outstanding_calls),
+		       __builtin_return_address(0));
+
+	INIT_WORK(&call->work, call->type->work);
+
+	if (!queue_work(afs_wq, &call->work))
+		afs_put_call(call);
+	return 0;
 }
 
 /*
@@ -182,25 +217,19 @@ struct afs_call *afs_alloc_flat_call(const struct afs_call_type *type,
 {
 	struct afs_call *call;
 
-	call = kzalloc(sizeof(*call), GFP_NOFS);
+	call = afs_alloc_call(type, GFP_NOFS);
 	if (!call)
 		goto nomem_call;
 
-	_debug("CALL %p{%s} [%d]",
-	       call, type->name, atomic_read(&afs_outstanding_calls));
-	atomic_inc(&afs_outstanding_calls);
-
-	call->type = type;
-	call->request_size = request_size;
-	call->reply_max = reply_max;
-
 	if (request_size) {
+		call->request_size = request_size;
 		call->request = kmalloc(request_size, GFP_NOFS);
 		if (!call->request)
 			goto nomem_free;
 	}
 
 	if (reply_max) {
+		call->reply_max = reply_max;
 		call->buffer = kmalloc(reply_max, GFP_NOFS);
 		if (!call->buffer)
 			goto nomem_free;
@@ -210,7 +239,7 @@ struct afs_call *afs_alloc_flat_call(const struct afs_call_type *type,
 	return call;
 
 nomem_free:
-	afs_free_call(call);
+	afs_put_call(call);
 nomem_call:
 	return NULL;
 }
@@ -315,7 +344,6 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
 	       atomic_read(&afs_outstanding_calls));
 
 	call->async = async;
-	INIT_WORK(&call->async_work, afs_process_async_call);
 
 	memset(&srx, 0, sizeof(srx));
 	srx.srx_family = AF_RXRPC;
@@ -378,7 +406,7 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp,
 error_do_abort:
 	rxrpc_kernel_abort_call(afs_socket, rxcall, RX_USER_ABORT, -ret, "KSD");
 error_kill_call:
-	afs_end_call(call);
+	afs_put_call(call);
 	_leave(" = %d", ret);
 	return ret;
 }
@@ -448,7 +476,7 @@ static void afs_deliver_to_call(struct afs_call *call)
 
 done:
 	if (call->state == AFS_CALL_COMPLETE && call->incoming)
-		afs_end_call(call);
+		afs_put_call(call);
 out:
 	_leave("");
 	return;
@@ -505,7 +533,7 @@ static int afs_wait_for_call_to_complete(struct afs_call *call)
 	}
 
 	_debug("call complete");
-	afs_end_call(call);
+	afs_put_call(call);
 	_leave(" = %d", ret);
 	return ret;
 }
@@ -529,14 +557,25 @@ static void afs_wake_up_async_call(struct sock *sk, struct rxrpc_call *rxcall,
 				   unsigned long call_user_ID)
 {
 	struct afs_call *call = (struct afs_call *)call_user_ID;
+	int u;
 
 	trace_afs_notify_call(rxcall, call);
 	call->need_attention = true;
-	queue_work(afs_async_calls, &call->async_work);
+
+	u = __atomic_add_unless(&call->usage, 1, 0);
+	if (u != 0) {
+		trace_afs_call(call, afs_call_trace_wake, u,
+			       atomic_read(&afs_outstanding_calls),
+			       __builtin_return_address(0));
+
+		if (!queue_work(afs_async_calls, &call->async_work))
+			afs_put_call(call);
+	}
 }
 
 /*
- * delete an asynchronous call
+ * Delete an asynchronous call.  The work item carries a ref to the call struct
+ * that we need to release.
  */
 static void afs_delete_async_call(struct work_struct *work)
 {
@@ -544,13 +583,14 @@ static void afs_delete_async_call(struct work_struct *work)
 
 	_enter("");
 
-	afs_free_call(call);
+	afs_put_call(call);
 
 	_leave("");
 }
 
 /*
- * perform processing on an asynchronous call
+ * Perform I/O processing on an asynchronous call.  The work item carries a ref
+ * to the call struct that we either need to release or to pass on.
  */
 static void afs_process_async_call(struct work_struct *work)
 {
@@ -566,15 +606,16 @@ static void afs_process_async_call(struct work_struct *work)
 	if (call->state == AFS_CALL_COMPLETE) {
 		call->reply = NULL;
 
-		/* kill the call */
-		afs_end_call_nofree(call);
-
-		/* we can't just delete the call because the work item may be
-		 * queued */
+		/* We have two refs to release - one from the alloc and one
+		 * queued with the work item - and we can't just deallocate the
+		 * call because the work item may be queued again.
+		 */
 		call->async_work.func = afs_delete_async_call;
-		queue_work(afs_async_calls, &call->async_work);
+		if (!queue_work(afs_async_calls, &call->async_work))
+			afs_put_call(call);
 	}
 
+	afs_put_call(call);
 	_leave("");
 }
 
@@ -594,12 +635,10 @@ static void afs_charge_preallocation(struct work_struct *work)
 
 	for (;;) {
 		if (!call) {
-			call = kzalloc(sizeof(struct afs_call), GFP_KERNEL);
+			call = afs_alloc_call(&afs_RXCMxxxx, GFP_KERNEL);
 			if (!call)
 				break;
 
-			INIT_WORK(&call->async_work, afs_process_async_call);
-			call->type = &afs_RXCMxxxx;
 			call->async = true;
 			call->state = AFS_CALL_AWAIT_OP_ID;
 			init_waitqueue_head(&call->waitq);
@@ -624,9 +663,8 @@ static void afs_rx_discard_new_call(struct rxrpc_call *rxcall,
 {
 	struct afs_call *call = (struct afs_call *)user_call_ID;
 
-	atomic_inc(&afs_outstanding_calls);
 	call->rxcall = NULL;
-	afs_free_call(call);
+	afs_put_call(call);
 }
 
 /*
@@ -635,7 +673,6 @@ static void afs_rx_discard_new_call(struct rxrpc_call *rxcall,
 static void afs_rx_new_call(struct sock *sk, struct rxrpc_call *rxcall,
 			    unsigned long user_call_ID)
 {
-	atomic_inc(&afs_outstanding_calls);
 	queue_work(afs_wq, &afs_charge_preallocation_work);
 }
 
@@ -699,7 +736,6 @@ void afs_send_empty_reply(struct afs_call *call)
 		rxrpc_kernel_abort_call(afs_socket, call->rxcall,
 					RX_USER_ABORT, ENOMEM, "KOO");
 	default:
-		afs_end_call(call);
 		_leave(" [error]");
 		return;
 	}
@@ -738,7 +774,6 @@ void afs_send_simple_reply(struct afs_call *call, const void *buf, size_t len)
 		rxrpc_kernel_abort_call(afs_socket, call->rxcall,
 					RX_USER_ABORT, ENOMEM, "KOO");
 	}
-	afs_end_call(call);
 	_leave(" [error]");
 }
 
diff --git a/include/trace/events/afs.h b/include/trace/events/afs.h
index 845907b04ff4..8b95c16b7045 100644
--- a/include/trace/events/afs.h
+++ b/include/trace/events/afs.h
@@ -16,6 +16,51 @@
 
 #include <linux/tracepoint.h>
 
+/*
+ * Define enums for tracing information.
+ */
+#ifndef __AFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
+#define __AFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
+
+enum afs_call_trace {
+	afs_call_trace_alloc,
+	afs_call_trace_free,
+	afs_call_trace_put,
+	afs_call_trace_wake,
+	afs_call_trace_work,
+};
+
+#endif /* end __AFS_DECLARE_TRACE_ENUMS_ONCE_ONLY */
+
+/*
+ * Declare tracing information enums and their string mappings for display.
+ */
+#define afs_call_traces \
+	EM(afs_call_trace_alloc,		"ALLOC") \
+	EM(afs_call_trace_free,			"FREE ") \
+	EM(afs_call_trace_put,			"PUT  ") \
+	EM(afs_call_trace_wake,			"WAKE ") \
+	E_(afs_call_trace_work,			"WORK ")
+
+/*
+ * Export enum symbols via userspace.
+ */
+#undef EM
+#undef E_
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define E_(a, b) TRACE_DEFINE_ENUM(a);
+
+afs_call_traces;
+
+/*
+ * Now redefine the EM() and E_() macros to map the enums to the strings that
+ * will be printed in the output.
+ */
+#undef EM
+#undef E_
+#define EM(a, b)	{ a, b },
+#define E_(a, b)	{ a, b }
+
 TRACE_EVENT(afs_recv_data,
 	    TP_PROTO(struct afs_call *call, unsigned count, unsigned offset,
 		     bool want_more, int ret),
@@ -103,6 +148,36 @@ TRACE_EVENT(afs_cb_call,
 		      __entry->op)
 	    );
 
+TRACE_EVENT(afs_call,
+	    TP_PROTO(struct afs_call *call, enum afs_call_trace op,
+		     int usage, int outstanding, const void *where),
+
+	    TP_ARGS(call, op, usage, outstanding, where),
+
+	    TP_STRUCT__entry(
+		    __field(struct afs_call *,		call		)
+		    __field(int,			op		)
+		    __field(int,			usage		)
+		    __field(int,			outstanding	)
+		    __field(const void *,		where		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->call = call;
+		    __entry->op = op;
+		    __entry->usage = usage;
+		    __entry->outstanding = outstanding;
+		    __entry->where = where;
+			   ),
+
+	    TP_printk("c=%p %s u=%d o=%d sp=%pSR",
+		      __entry->call,
+		      __print_symbolic(__entry->op, afs_call_traces),
+		      __entry->usage,
+		      __entry->outstanding,
+		      __entry->where)
+	    );
+
 #endif /* _TRACE_AFS_H */
 
 /* This part must be outside protection */

^ permalink raw reply related

* [net-next PATCH 0/3] net: optimize ICMP-reply code path
From: Jesper Dangaard Brouer @ 2017-01-09 15:03 UTC (permalink / raw)
  To: netdev, Eric Dumazet, Jesper Dangaard Brouer; +Cc: xiyou.wangcong

This patchset is optimizing the ICMP-reply code path, for ICMP packets
that gets rate limited. A remote party can easily trigger this code
path by sending packets to port number with no listening service.

Generally the patchset moves the sysctl_icmp_msgs_per_sec ratelimit
checking to earlier in the code path and removes an allocation.

Use-case: The specific case I experienced this being a bottleneck is,
sending UDP packets to a port with no listener, which obviously result
in kernel replying with ICMP Destination Unreachable (type:3), Port
Unreachable (code:3), which cause the bottleneck.

 After Eric and Paolo optimized the UDP socket code, the kernels PPS
processing capabilities is lower for no-listen ports, than normal UDP
sockets.  This is bad for capacity planning when restarting a service.

UDP no-listen benchmark 8xCPUs using pktgen_sample04_many_flows.sh:
 Baseline: 6.6 Mpps
 Patch:   14.7 Mpps
Driver mlx5 at 50Gbit/s.

---

Jesper Dangaard Brouer (3):
      Revert "icmp: avoid allocating large struct on stack"
      net: reduce cycles spend on ICMP replies that gets rate limited
      net: for rate-limited ICMP replies save one atomic operation

 net/ipv4/icmp.c |  125 +++++++++++++++++++++++++++++++++----------------------
 net/ipv6/icmp.c |   68 +++++++++++++++++++++---------
 2 files changed, 123 insertions(+), 70 deletions(-)

^ permalink raw reply

* [net-next PATCH 1/3] Revert "icmp: avoid allocating large struct on stack"
From: Jesper Dangaard Brouer @ 2017-01-09 15:04 UTC (permalink / raw)
  To: netdev, Eric Dumazet, Jesper Dangaard Brouer; +Cc: xiyou.wangcong
In-Reply-To: <20170109150246.30215.63371.stgit@firesoul>

This reverts commit 9a99d4a50cb8 ("icmp: avoid allocating large struct
on stack"), because struct icmp_bxm no really a large struct, and
allocating and free of this small 112 bytes hurts performance.

Fixes: 9a99d4a50cb8 ("icmp: avoid allocating large struct on stack")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/ipv4/icmp.c |   40 +++++++++++++++++-----------------------
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 0777ea949223..b4b9807329a7 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -571,7 +571,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 {
 	struct iphdr *iph;
 	int room;
-	struct icmp_bxm *icmp_param;
+	struct icmp_bxm icmp_param;
 	struct rtable *rt = skb_rtable(skb_in);
 	struct ipcm_cookie ipc;
 	struct flowi4 fl4;
@@ -648,13 +648,9 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 		}
 	}
 
-	icmp_param = kmalloc(sizeof(*icmp_param), GFP_ATOMIC);
-	if (!icmp_param)
-		return;
-
 	sk = icmp_xmit_lock(net);
 	if (!sk)
-		goto out_free;
+		return;
 
 	/*
 	 *	Construct source address and options.
@@ -681,7 +677,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 					  iph->tos;
 	mark = IP4_REPLY_MARK(net, skb_in->mark);
 
-	if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb_in))
+	if (ip_options_echo(&icmp_param.replyopts.opt.opt, skb_in))
 		goto out_unlock;
 
 
@@ -689,22 +685,22 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	 *	Prepare data for ICMP header.
 	 */
 
-	icmp_param->data.icmph.type	 = type;
-	icmp_param->data.icmph.code	 = code;
-	icmp_param->data.icmph.un.gateway = info;
-	icmp_param->data.icmph.checksum	 = 0;
-	icmp_param->skb	  = skb_in;
-	icmp_param->offset = skb_network_offset(skb_in);
+	icmp_param.data.icmph.type	 = type;
+	icmp_param.data.icmph.code	 = code;
+	icmp_param.data.icmph.un.gateway = info;
+	icmp_param.data.icmph.checksum	 = 0;
+	icmp_param.skb	  = skb_in;
+	icmp_param.offset = skb_network_offset(skb_in);
 	inet_sk(sk)->tos = tos;
 	sk->sk_mark = mark;
 	ipc.addr = iph->saddr;
-	ipc.opt = &icmp_param->replyopts.opt;
+	ipc.opt = &icmp_param.replyopts.opt;
 	ipc.tx_flags = 0;
 	ipc.ttl = 0;
 	ipc.tos = -1;
 
 	rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos, mark,
-			       type, code, icmp_param);
+			       type, code, &icmp_param);
 	if (IS_ERR(rt))
 		goto out_unlock;
 
@@ -716,21 +712,19 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	room = dst_mtu(&rt->dst);
 	if (room > 576)
 		room = 576;
-	room -= sizeof(struct iphdr) + icmp_param->replyopts.opt.opt.optlen;
+	room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
 	room -= sizeof(struct icmphdr);
 
-	icmp_param->data_len = skb_in->len - icmp_param->offset;
-	if (icmp_param->data_len > room)
-		icmp_param->data_len = room;
-	icmp_param->head_len = sizeof(struct icmphdr);
+	icmp_param.data_len = skb_in->len - icmp_param.offset;
+	if (icmp_param.data_len > room)
+		icmp_param.data_len = room;
+	icmp_param.head_len = sizeof(struct icmphdr);
 
-	icmp_push_reply(icmp_param, &fl4, &ipc, &rt);
+	icmp_push_reply(&icmp_param, &fl4, &ipc, &rt);
 ende:
 	ip_rt_put(rt);
 out_unlock:
 	icmp_xmit_unlock(sk);
-out_free:
-	kfree(icmp_param);
 out:;
 }
 EXPORT_SYMBOL(icmp_send);

^ permalink raw reply related

* [net-next PATCH 2/3] net: reduce cycles spend on ICMP replies that gets rate limited
From: Jesper Dangaard Brouer @ 2017-01-09 15:04 UTC (permalink / raw)
  To: netdev, Eric Dumazet, Jesper Dangaard Brouer; +Cc: xiyou.wangcong
In-Reply-To: <20170109150246.30215.63371.stgit@firesoul>

This patch split the global and per (inet)peer ICMP-reply limiter
code, and moves the global limit check to earlier in the packet
processing path.  Thus, avoid spending cycles on ICMP replies that
gets limited/suppressed anyhow.

The global ICMP rate limiter icmp_global_allow() is a good solution,
it just happens too late in the process.  The kernel goes through the
full route lookup (return path) for the ICMP message, before taking
the rate limit decision of not sending the ICMP reply.

Details: The kernels global rate limiter for ICMP messages got added
in commit 4cdf507d5452 ("icmp: add a global rate limitation").  It is
a token bucket limiter with a global lock.  It brilliantly avoids
locking congestion by only updating when 20ms (HZ/50) were elapsed. It
can then avoids taking lock when credit is exhausted (when under
pressure) and time constraint for refill is not yet meet.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/ipv4/icmp.c |   71 +++++++++++++++++++++++++++++++++++++------------------
 net/ipv6/icmp.c |   49 ++++++++++++++++++++++++++------------
 2 files changed, 82 insertions(+), 38 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index b4b9807329a7..58d75ca58b83 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -282,6 +282,33 @@ bool icmp_global_allow(void)
 }
 EXPORT_SYMBOL(icmp_global_allow);
 
+static bool icmpv4_mask_allow(struct net *net, int type, int code)
+{
+	if (type > NR_ICMP_TYPES)
+		return true;
+
+	/* Don't limit PMTU discovery. */
+	if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED)
+		return true;
+
+	/* Limit if icmp type is enabled in ratemask. */
+	if (!((1 << type) & net->ipv4.sysctl_icmp_ratemask))
+		return true;
+
+	return false;
+}
+
+static bool icmpv4_global_allow(struct net *net, int type, int code)
+{
+	if (icmpv4_mask_allow(net, type, code))
+		return true;
+
+	if (icmp_global_allow())
+		return true;
+
+	return false;
+}
+
 /*
  *	Send an ICMP frame.
  */
@@ -290,34 +317,22 @@ static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
 			       struct flowi4 *fl4, int type, int code)
 {
 	struct dst_entry *dst = &rt->dst;
+	struct inet_peer *peer;
 	bool rc = true;
+	int vif;
 
-	if (type > NR_ICMP_TYPES)
-		goto out;
-
-	/* Don't limit PMTU discovery. */
-	if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED)
+	if (icmpv4_mask_allow(net, type, code))
 		goto out;
 
 	/* No rate limit on loopback */
 	if (dst->dev && (dst->dev->flags&IFF_LOOPBACK))
 		goto out;
 
-	/* Limit if icmp type is enabled in ratemask. */
-	if (!((1 << type) & net->ipv4.sysctl_icmp_ratemask))
-		goto out;
-
-	rc = false;
-	if (icmp_global_allow()) {
-		int vif = l3mdev_master_ifindex(dst->dev);
-		struct inet_peer *peer;
-
-		peer = inet_getpeer_v4(net->ipv4.peers, fl4->daddr, vif, 1);
-		rc = inet_peer_xrlim_allow(peer,
-					   net->ipv4.sysctl_icmp_ratelimit);
-		if (peer)
-			inet_putpeer(peer);
-	}
+	vif = l3mdev_master_ifindex(dst->dev);
+	peer = inet_getpeer_v4(net->ipv4.peers, fl4->daddr, vif, 1);
+	rc = inet_peer_xrlim_allow(peer, net->ipv4.sysctl_icmp_ratelimit);
+	if (peer)
+		inet_putpeer(peer);
 out:
 	return rc;
 }
@@ -396,6 +411,8 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	struct inet_sock *inet;
 	__be32 daddr, saddr;
 	u32 mark = IP4_REPLY_MARK(net, skb->mark);
+	int type = icmp_param->data.icmph.type;
+	int code = icmp_param->data.icmph.code;
 
 	if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb))
 		return;
@@ -405,6 +422,10 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 		return;
 	inet = inet_sk(sk);
 
+	/* global icmp_msgs_per_sec */
+	if (!icmpv4_global_allow(net, type, code))
+		goto out_unlock;
+
 	icmp_param->data.icmph.checksum = 0;
 
 	inet->tos = ip_hdr(skb)->tos;
@@ -433,8 +454,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	rt = ip_route_output_key(net, &fl4);
 	if (IS_ERR(rt))
 		goto out_unlock;
-	if (icmpv4_xrlim_allow(net, rt, &fl4, icmp_param->data.icmph.type,
-			       icmp_param->data.icmph.code))
+	if (icmpv4_xrlim_allow(net, rt, &fl4, type, code))
 		icmp_push_reply(icmp_param, &fl4, &ipc, &rt);
 	ip_rt_put(rt);
 out_unlock:
@@ -650,7 +670,11 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 
 	sk = icmp_xmit_lock(net);
 	if (!sk)
-		return;
+		goto out;
+
+	/* Check global sysctl_icmp_msgs_per_sec ratelimit */
+	if (!icmpv4_global_allow(net, type, code))
+		goto out_unlock;
 
 	/*
 	 *	Construct source address and options.
@@ -704,6 +728,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	if (IS_ERR(rt))
 		goto out_unlock;
 
+	/* peer icmp_ratelimit */
 	if (!icmpv4_xrlim_allow(net, rt, &fl4, type, code))
 		goto ende;
 
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 3036f665e6c8..b26ae8b5c1ce 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -168,6 +168,30 @@ static bool is_ineligible(const struct sk_buff *skb)
 	return false;
 }
 
+static bool icmpv6_mask_allow(int type)
+{
+	/* Informational messages are not limited. */
+	if (type & ICMPV6_INFOMSG_MASK)
+		return true;
+
+	/* Do not limit pmtu discovery, it would break it. */
+	if (type == ICMPV6_PKT_TOOBIG)
+		return true;
+
+	return false;
+}
+
+static bool icmpv6_global_allow(int type)
+{
+	if (icmpv6_mask_allow(type))
+		return true;
+
+	if (icmp_global_allow())
+		return true;
+
+	return false;
+}
+
 /*
  * Check the ICMP output rate limit
  */
@@ -178,12 +202,7 @@ static bool icmpv6_xrlim_allow(struct sock *sk, u8 type,
 	struct dst_entry *dst;
 	bool res = false;
 
-	/* Informational messages are not limited. */
-	if (type & ICMPV6_INFOMSG_MASK)
-		return true;
-
-	/* Do not limit pmtu discovery, it would break it. */
-	if (type == ICMPV6_PKT_TOOBIG)
+	if (icmpv6_mask_allow(type))
 		return true;
 
 	/*
@@ -200,20 +219,16 @@ static bool icmpv6_xrlim_allow(struct sock *sk, u8 type,
 	} else {
 		struct rt6_info *rt = (struct rt6_info *)dst;
 		int tmo = net->ipv6.sysctl.icmpv6_time;
+		struct inet_peer *peer;
 
 		/* Give more bandwidth to wider prefixes. */
 		if (rt->rt6i_dst.plen < 128)
 			tmo >>= ((128 - rt->rt6i_dst.plen)>>5);
 
-		if (icmp_global_allow()) {
-			struct inet_peer *peer;
-
-			peer = inet_getpeer_v6(net->ipv6.peers,
-					       &fl6->daddr, 1);
-			res = inet_peer_xrlim_allow(peer, tmo);
-			if (peer)
-				inet_putpeer(peer);
-		}
+		peer = inet_getpeer_v6(net->ipv6.peers, &fl6->daddr, 1);
+		res = inet_peer_xrlim_allow(peer, tmo);
+		if (peer)
+			inet_putpeer(peer);
 	}
 	dst_release(dst);
 	return res;
@@ -493,6 +508,10 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 	sk = icmpv6_xmit_lock(net);
 	if (!sk)
 		return;
+
+	if (!icmpv6_global_allow(type))
+		goto out;
+
 	sk->sk_mark = mark;
 	np = inet6_sk(sk);
 

^ permalink raw reply related

* [net-next PATCH 3/3] net: for rate-limited ICMP replies save one atomic operation
From: Jesper Dangaard Brouer @ 2017-01-09 15:04 UTC (permalink / raw)
  To: netdev, Eric Dumazet, Jesper Dangaard Brouer; +Cc: xiyou.wangcong
In-Reply-To: <20170109150246.30215.63371.stgit@firesoul>

It is possible to avoid the atomic operation in icmp{v6,}_xmit_lock,
by checking the sysctl_icmp_msgs_per_sec ratelimit before these calls,
as pointed out by Eric Dumazet, but the BH disabled state must be correct.

The icmp_global_allow() call states it must be called with BH
disabled.  This protection was given by the calls icmp_xmit_lock and
icmpv6_xmit_lock.  Thus, split out local_bh_disable/enable from these
functions and maintain it explicitly at callers.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/ipv4/icmp.c |   34 +++++++++++++++++++++-------------
 net/ipv6/icmp.c |   25 ++++++++++++++++---------
 2 files changed, 37 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 58d75ca58b83..fc310db2708b 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -209,19 +209,17 @@ static struct sock *icmp_sk(struct net *net)
 	return *this_cpu_ptr(net->ipv4.icmp_sk);
 }
 
+/* Called with BH disabled */
 static inline struct sock *icmp_xmit_lock(struct net *net)
 {
 	struct sock *sk;
 
-	local_bh_disable();
-
 	sk = icmp_sk(net);
 
 	if (unlikely(!spin_trylock(&sk->sk_lock.slock))) {
 		/* This can happen if the output path signals a
 		 * dst_link_failure() for an outgoing ICMP packet.
 		 */
-		local_bh_enable();
 		return NULL;
 	}
 	return sk;
@@ -229,7 +227,7 @@ static inline struct sock *icmp_xmit_lock(struct net *net)
 
 static inline void icmp_xmit_unlock(struct sock *sk)
 {
-	spin_unlock_bh(&sk->sk_lock.slock);
+	spin_unlock(&sk->sk_lock.slock);
 }
 
 int sysctl_icmp_msgs_per_sec __read_mostly = 1000;
@@ -417,14 +415,17 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb))
 		return;
 
-	sk = icmp_xmit_lock(net);
-	if (!sk)
-		return;
-	inet = inet_sk(sk);
+	/* Needed by both icmp_global_allow and icmp_xmit_lock */
+	local_bh_disable();
 
 	/* global icmp_msgs_per_sec */
 	if (!icmpv4_global_allow(net, type, code))
-		goto out_unlock;
+		goto out_bh_enable;
+
+	sk = icmp_xmit_lock(net);
+	if (!sk)
+		goto out_bh_enable;
+	inet = inet_sk(sk);
 
 	icmp_param->data.icmph.checksum = 0;
 
@@ -459,6 +460,8 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	ip_rt_put(rt);
 out_unlock:
 	icmp_xmit_unlock(sk);
+out_bh_enable:
+	local_bh_enable();
 }
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
@@ -668,13 +671,16 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 		}
 	}
 
-	sk = icmp_xmit_lock(net);
-	if (!sk)
-		goto out;
+	/* Needed by both icmp_global_allow and icmp_xmit_lock */
+	local_bh_disable();
 
 	/* Check global sysctl_icmp_msgs_per_sec ratelimit */
 	if (!icmpv4_global_allow(net, type, code))
-		goto out_unlock;
+		goto out_bh_enable;
+
+	sk = icmp_xmit_lock(net);
+	if (!sk)
+		goto out_bh_enable;
 
 	/*
 	 *	Construct source address and options.
@@ -750,6 +756,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	ip_rt_put(rt);
 out_unlock:
 	icmp_xmit_unlock(sk);
+out_bh_enable:
+	local_bh_enable();
 out:;
 }
 EXPORT_SYMBOL(icmp_send);
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index b26ae8b5c1ce..230b5aac9f03 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -110,19 +110,17 @@ static const struct inet6_protocol icmpv6_protocol = {
 	.flags		=	INET6_PROTO_NOPOLICY|INET6_PROTO_FINAL,
 };
 
+/* Called with BH disabled */
 static __inline__ struct sock *icmpv6_xmit_lock(struct net *net)
 {
 	struct sock *sk;
 
-	local_bh_disable();
-
 	sk = icmpv6_sk(net);
 	if (unlikely(!spin_trylock(&sk->sk_lock.slock))) {
 		/* This can happen if the output path (f.e. SIT or
 		 * ip6ip6 tunnel) signals dst_link_failure() for an
 		 * outgoing ICMP6 packet.
 		 */
-		local_bh_enable();
 		return NULL;
 	}
 	return sk;
@@ -130,7 +128,7 @@ static __inline__ struct sock *icmpv6_xmit_lock(struct net *net)
 
 static __inline__ void icmpv6_xmit_unlock(struct sock *sk)
 {
-	spin_unlock_bh(&sk->sk_lock.slock);
+	spin_unlock(&sk->sk_lock.slock);
 }
 
 /*
@@ -489,6 +487,13 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 		return;
 	}
 
+	/* Needed by both icmp_global_allow and icmpv6_xmit_lock */
+	local_bh_disable();
+
+	/* Check global sysctl_icmp_msgs_per_sec ratelimit */
+	if (!icmpv6_global_allow(type))
+		goto out_bh_enable;
+
 	mip6_addr_swap(skb);
 
 	memset(&fl6, 0, sizeof(fl6));
@@ -507,10 +512,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 
 	sk = icmpv6_xmit_lock(net);
 	if (!sk)
-		return;
-
-	if (!icmpv6_global_allow(type))
-		goto out;
+		goto out_bh_enable;
 
 	sk->sk_mark = mark;
 	np = inet6_sk(sk);
@@ -571,6 +573,8 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 	dst_release(dst);
 out:
 	icmpv6_xmit_unlock(sk);
+out_bh_enable:
+	local_bh_enable();
 }
 
 /* Slightly more convenient version of icmp6_send.
@@ -684,9 +688,10 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
 	fl6.flowi6_uid = sock_net_uid(net, NULL);
 	security_skb_classify_flow(skb, flowi6_to_flowi(&fl6));
 
+	local_bh_disable();
 	sk = icmpv6_xmit_lock(net);
 	if (!sk)
-		return;
+		goto out_bh_enable;
 	sk->sk_mark = mark;
 	np = inet6_sk(sk);
 
@@ -728,6 +733,8 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
 	dst_release(dst);
 out:
 	icmpv6_xmit_unlock(sk);
+out_bh_enable:
+	local_bh_enable();
 }
 
 void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info)

^ permalink raw reply related

* Re: [PATCH net-next v2] net: dsa: make "label" property optional for dsa2
From: Vivien Didelot @ 2017-01-09 15:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Uwe Kleine-König, Andrey Smirnov
In-Reply-To: <20170109073236.GA1862@nanopsycho>

Hi Jiri,

Jiri Pirko <jiri@resnulli.us> writes:

>>    # cat /etc/udev/rules.d/90-net-dsa.rules
>>    SUBSYSTEM=="net", ACTION=="add", ENV{DEVTYPE}=="dsa", PROGRAM="/lib/udev/dsanitizer $attr{phys_switch_id} $attr{phys_port_id}", NAME="$result"
>
> I know this is kind of confusing, but phys_port_id is to be used to
> indicate same physical port that is shared by multiple netdevices- for
> example sr-iov usecase. For switchdev usecase, you should use
> phys_port_name.
>
> I will add some documentation to kernel regarding this. But I see that
> net/dsa/slave.c already implements .ndo_get_phys_port_id :(
>
> I recently made changes in udev so it names the switch ports according
> to phys_port_name, out of the box, without need for any rules:
> https://github.com/systemd/systemd/pull/4506/commits/c960caa0c2a620fc506c6f0f7b6c40eeace48e4d

Thanks for the details. So if I understand correctly, what will be found
in phys_port_name will be used as is by udev to name the interface?

Extra question: shouldn't phys_port_{id,name} be switchdev attributes in
addition to SWITCHDEV_ATTR_ID_PORT_PARENT_ID?

> I guess that it should be enough for you to implement
> ndo_get_phys_port_name.

Well, if this name must be unique on a system, it's not likely to happen
until we agree that we use an ugly tXsYpZ template where X is a tree ID,
or we assign system-wide unique IDs to switches, which requires a bit of
changes.

I'm thinking, since DSA slaves are switchdev users, can't we all use a
switchdev helper to optionally get a system-wide unique name for a given
switch port? e.g. a template such as "swp%d"? I'd prefer that switchdev
and DSA do not diverge much when it comes to implement such attributes.

But again, this is not related to this patch ;-)

Thanks,

        Vivien

^ permalink raw reply

* Re: [PATCH net-next 0/4] net: dsa: Make dsa_switch_ops const
From: Vivien Didelot @ 2017-01-09 15:11 UTC (permalink / raw)
  To: Florian Fainelli, netdev; +Cc: davem, andrew, Florian Fainelli
In-Reply-To: <20170108225208.15431-1-f.fainelli@gmail.com>

Hi Florian,

Florian Fainelli <f.fainelli@gmail.com> writes:

> This patch series allows us to annotate dsa_switch_ops with a const
> qualifier.
>
> Florian Fainelli (4):
>   net: dsa: b53: Export most operations to other drivers
>   net: dsa: bcm_sf2: Declare our own dsa_switch_ops
>   net: dsa: Encapsulate legacy switch drivers into dsa_switch_driver
>   net: dsa: Make dsa_switch_ops const

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

This is much cleaner now, thanks!

     Vivien

^ permalink raw reply

* Re: [PATCH net-next v2] net: dsa: make "label" property optional for dsa2
From: Jiri Pirko @ 2017-01-09 15:11 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Uwe Kleine-König, Andrey Smirnov
In-Reply-To: <877f6446lp.fsf@weeman.i-did-not-set--mail-host-address--so-tickle-me>

Mon, Jan 09, 2017 at 04:04:50PM CET, vivien.didelot@savoirfairelinux.com wrote:
>Hi Jiri,
>
>Jiri Pirko <jiri@resnulli.us> writes:
>
>>>    # cat /etc/udev/rules.d/90-net-dsa.rules
>>>    SUBSYSTEM=="net", ACTION=="add", ENV{DEVTYPE}=="dsa", PROGRAM="/lib/udev/dsanitizer $attr{phys_switch_id} $attr{phys_port_id}", NAME="$result"
>>
>> I know this is kind of confusing, but phys_port_id is to be used to
>> indicate same physical port that is shared by multiple netdevices- for
>> example sr-iov usecase. For switchdev usecase, you should use
>> phys_port_name.
>>
>> I will add some documentation to kernel regarding this. But I see that
>> net/dsa/slave.c already implements .ndo_get_phys_port_id :(
>>
>> I recently made changes in udev so it names the switch ports according
>> to phys_port_name, out of the box, without need for any rules:
>> https://github.com/systemd/systemd/pull/4506/commits/c960caa0c2a620fc506c6f0f7b6c40eeace48e4d
>
>Thanks for the details. So if I understand correctly, what will be found
>in phys_port_name will be used as is by udev to name the interface?

Yes.


>
>Extra question: shouldn't phys_port_{id,name} be switchdev attributes in

Again, phys_port_id has nothing to do with switches. Should be removed
from dsa because its use there is incorrect.


>addition to SWITCHDEV_ATTR_ID_PORT_PARENT_ID?

Perhaps it could. Now it is a netdev op. I thinks it works nicely.


>
>> I guess that it should be enough for you to implement
>> ndo_get_phys_port_name.
>
>Well, if this name must be unique on a system, it's not likely to happen
>until we agree that we use an ugly tXsYpZ template where X is a tree ID,
>or we assign system-wide unique IDs to switches, which requires a bit of
>changes.

No. That should be unique within one switch. In mlxsw we name it "p1",
"p2", ...

The final netdev names are:
enp3s0np1, enp3s0np2, ...


>
>I'm thinking, since DSA slaves are switchdev users, can't we all use a
>switchdev helper to optionally get a system-wide unique name for a given
>switch port? e.g. a template such as "swp%d"? I'd prefer that switchdev
>and DSA do not diverge much when it comes to implement such attributes.

Again, not system-wide, just switch-wide.


>
>But again, this is not related to this patch ;-)

It is! You are using phys_port_id, which is completely wrong. You should
not use it.

^ permalink raw reply

* [PATCH] vhost: scsi: constify target_core_fabric_ops structures
From: Bhumika Goyal @ 2017-01-09 15:21 UTC (permalink / raw)
  To: julia.lawall, mst, jasowang, kvm, virtualization, netdev,
	linux-kernel
  Cc: Bhumika Goyal

Declare target_core_fabric_ops strucrues as const as they are only
passed as an argument to the functions target_register_template and 
target_unregister_template. The arguments are of type const struct 
target_core_fabric_ops *, so target_core_fabric_ops structures having 
this property can be declared const.
Done using Coccinelle:

@r disable optional_qualifier@
identifier i;
position p;
@@
static struct target_core_fabric_ops i@p={...};

@ok@
position p;
identifier r.i;
@@
(
target_register_template(&i@p)
|
target_unregister_template(&i@p)
)
@bad@
position p!={r.p,ok.p};
identifier r.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
+const
struct target_core_fabric_ops i;

File size before: drivers/vhost/scsi.o
   text	   data	    bss	    dec	    hex	filename
  18063	   2985	     40	  21088	   5260	drivers/vhost/scsi.o

File size after: drivers/vhost/scsi.o
   text	   data	    bss	    dec	    hex	filename
  18479	   2601	     40	  21120	   5280	drivers/vhost/scsi.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
---
 drivers/vhost/scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 253310c..620366d 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -2087,7 +2087,7 @@ static void vhost_scsi_drop_tport(struct se_wwn *wwn)
 	NULL,
 };
 
-static struct target_core_fabric_ops vhost_scsi_ops = {
+static const struct target_core_fabric_ops vhost_scsi_ops = {
 	.module				= THIS_MODULE,
 	.name				= "vhost",
 	.get_fabric_name		= vhost_scsi_get_fabric_name,
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net-next] bridge: multicast to unicast
From: michael-dev @ 2017-01-09 15:25 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Linus Lüssing, Felix Fietkau, netdev, David S . Miller,
	Stephen Hemminger, bridge, linux-kernel, linux-wireless
In-Reply-To: <1483964159.17582.34.camel@sipsolutions.net>

Am 09.01.2017 13:15, schrieb Johannes Berg:
>> That is bridge fdb entries (need to) expire so the bridge might
>> "forget" a still-connected station not sending but only consuming
>> broadcast traffic.
> 
> Ok, that I don't know. Somehow if you address a unicast packet there
> the bridge has to make a decision - so it really should know?

If the bridge has not learned the unicast destination mac address on any 
port, it will flood the packet on all ports except the port it received 
the packet on.

Regards,
M. Braun

^ permalink raw reply

* [PATCHv3 1/6] sh_eth: use correct name for ECMR_MPDE bit
From: Niklas Söderlund @ 2017-01-09 15:34 UTC (permalink / raw)
  To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
  Cc: Geert Uytterhoeven, linux-pm, Niklas Söderlund
In-Reply-To: <20170109153409.13956-1-niklas.soderlund+renesas@ragnatech.se>

This bit was wrongly named due to a typo, Sergei checked the SH7734/63
manuals and this bit should be named MPDE.

Suggested-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
 drivers/net/ethernet/renesas/sh_eth.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h
index 9eec1e185adf..11798d5c4a1d 100644
--- a/drivers/net/ethernet/renesas/sh_eth.h
+++ b/drivers/net/ethernet/renesas/sh_eth.h
@@ -339,7 +339,7 @@ enum FELIC_MODE_BIT {
 	ECMR_DPAD = 0x00200000, ECMR_RZPF = 0x00100000,
 	ECMR_ZPF = 0x00080000, ECMR_PFR = 0x00040000, ECMR_RXF = 0x00020000,
 	ECMR_TXF = 0x00010000, ECMR_MCT = 0x00002000, ECMR_PRCEF = 0x00001000,
-	ECMR_PMDE = 0x00000200, ECMR_RE = 0x00000040, ECMR_TE = 0x00000020,
+	ECMR_MPDE = 0x00000200, ECMR_RE = 0x00000040, ECMR_TE = 0x00000020,
 	ECMR_RTM = 0x00000010, ECMR_ILB = 0x00000008, ECMR_ELB = 0x00000004,
 	ECMR_DM = 0x00000002, ECMR_PRM = 0x00000001,
 };
-- 
2.11.0


^ permalink raw reply related

* [PATCHv3 2/6] sh_eth: add generic wake-on-lan support via magic packet
From: Niklas Söderlund @ 2017-01-09 15:34 UTC (permalink / raw)
  To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
  Cc: Geert Uytterhoeven, linux-pm, Niklas Söderlund
In-Reply-To: <20170109153409.13956-1-niklas.soderlund+renesas@ragnatech.se>

Add generic functionality to support Wake-on-LAN using MagicPacket which
are supported by at least a few versions of sh_eth. Only add
functionality for WoL, no specific sh_eth versions are marked to support
WoL yet.

WoL is enabled in the suspend callback by setting MagicPacket detection
and disabling all interrupts expect MagicPacket. In the resume path the
driver needs to reset the hardware to rearm the WoL logic, this prevents
the driver from simply restoring the registers and to take advantage of
that sh_eth was not suspended to reduce resume time. To reset the
hardware the driver closes and reopens the device just like it would do
in a normal suspend/resume scenario without WoL enabled, but it both
closes and opens the device in the resume callback since the device
needs to be open for WoL to work.

One quirk needed for WoL is that the module clock needs to be prevented
from being switched off by Runtime PM. To keep the clock alive the
suspend callback need to call clk_enable() directly to increase the
usage count of the clock. Then when Runtime PM decreases the clock usage
count it won't reach 0 and be switched off.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
 drivers/net/ethernet/renesas/sh_eth.c | 114 +++++++++++++++++++++++++++++++---
 drivers/net/ethernet/renesas/sh_eth.h |   3 +
 2 files changed, 109 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 8a784dce45fa..542c92b57b35 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -1552,6 +1552,8 @@ static void sh_eth_emac_interrupt(struct net_device *ndev)
 			sh_eth_rcv_snd_enable(ndev);
 		}
 	}
+	if (felic_stat & ECSR_MPD)
+		pm_wakeup_event(&mdp->pdev->dev, 0);
 }
 
 /* error control function */
@@ -2201,6 +2203,33 @@ static int sh_eth_set_ringparam(struct net_device *ndev,
 	return 0;
 }
 
+static void sh_eth_get_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
+{
+	struct sh_eth_private *mdp = netdev_priv(ndev);
+
+	wol->supported = 0;
+	wol->wolopts = 0;
+
+	if (mdp->cd->magic && mdp->clk) {
+		wol->supported = WAKE_MAGIC;
+		wol->wolopts = mdp->wol_enabled ? WAKE_MAGIC : 0;
+	}
+}
+
+static int sh_eth_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
+{
+	struct sh_eth_private *mdp = netdev_priv(ndev);
+
+	if (!mdp->cd->magic || !mdp->clk || wol->wolopts & ~WAKE_MAGIC)
+		return -EOPNOTSUPP;
+
+	mdp->wol_enabled = !!(wol->wolopts & WAKE_MAGIC);
+
+	device_set_wakeup_enable(&mdp->pdev->dev, mdp->wol_enabled);
+
+	return 0;
+}
+
 static const struct ethtool_ops sh_eth_ethtool_ops = {
 	.get_regs_len	= sh_eth_get_regs_len,
 	.get_regs	= sh_eth_get_regs,
@@ -2215,6 +2244,8 @@ static const struct ethtool_ops sh_eth_ethtool_ops = {
 	.set_ringparam	= sh_eth_set_ringparam,
 	.get_link_ksettings = sh_eth_get_link_ksettings,
 	.set_link_ksettings = sh_eth_set_link_ksettings,
+	.get_wol	= sh_eth_get_wol,
+	.set_wol	= sh_eth_set_wol,
 };
 
 /* network device open function */
@@ -3017,6 +3048,11 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
 		goto out_release;
 	}
 
+	/* Get clock, if not found that's OK but Wake-On-Lan is unavailable */
+	mdp->clk = devm_clk_get(&pdev->dev, NULL);
+	if (IS_ERR(mdp->clk))
+		mdp->clk = NULL;
+
 	ndev->base_addr = res->start;
 
 	spin_lock_init(&mdp->lock);
@@ -3111,6 +3147,9 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
 	if (ret)
 		goto out_napi_del;
 
+	if (mdp->cd->magic && mdp->clk)
+		device_set_wakeup_capable(&pdev->dev, 1);
+
 	/* print device information */
 	netdev_info(ndev, "Base address at 0x%x, %pM, IRQ %d.\n",
 		    (u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
@@ -3150,15 +3189,67 @@ static int sh_eth_drv_remove(struct platform_device *pdev)
 
 #ifdef CONFIG_PM
 #ifdef CONFIG_PM_SLEEP
+static int sh_eth_wol_setup(struct net_device *ndev)
+{
+	struct sh_eth_private *mdp = netdev_priv(ndev);
+
+	/* Only allow ECI interrupts */
+	synchronize_irq(ndev->irq);
+	napi_disable(&mdp->napi);
+	sh_eth_write(ndev, DMAC_M_ECI, EESIPR);
+
+	/* Enable MagicPacket */
+	sh_eth_modify(ndev, ECMR, 0, ECMR_MPDE);
+
+	/* Increased clock usage so device won't be suspended */
+	clk_enable(mdp->clk);
+
+	return enable_irq_wake(ndev->irq);
+}
+
+static int sh_eth_wol_restore(struct net_device *ndev)
+{
+	struct sh_eth_private *mdp = netdev_priv(ndev);
+	int ret;
+
+	napi_enable(&mdp->napi);
+
+	/* Disable MagicPacket */
+	sh_eth_modify(ndev, ECMR, ECMR_MPDE, 0);
+
+	/* The device needs to be reset to restore MagicPacket logic
+	 * for next wakeup. If we close and open the device it will
+	 * both be reset and all registers restored. This is what
+	 * happens during suspend and resume without WoL enabled.
+	 */
+	ret = sh_eth_close(ndev);
+	if (ret < 0)
+		return ret;
+	ret = sh_eth_open(ndev);
+	if (ret < 0)
+		return ret;
+
+	/* Restore clock usage count */
+	clk_disable(mdp->clk);
+
+	return disable_irq_wake(ndev->irq);
+}
+
 static int sh_eth_suspend(struct device *dev)
 {
 	struct net_device *ndev = dev_get_drvdata(dev);
+	struct sh_eth_private *mdp = netdev_priv(ndev);
 	int ret = 0;
 
-	if (netif_running(ndev)) {
-		netif_device_detach(ndev);
+	if (!netif_running(ndev))
+		return 0;
+
+	netif_device_detach(ndev);
+
+	if (mdp->wol_enabled)
+		ret = sh_eth_wol_setup(ndev);
+	else
 		ret = sh_eth_close(ndev);
-	}
 
 	return ret;
 }
@@ -3166,14 +3257,21 @@ static int sh_eth_suspend(struct device *dev)
 static int sh_eth_resume(struct device *dev)
 {
 	struct net_device *ndev = dev_get_drvdata(dev);
+	struct sh_eth_private *mdp = netdev_priv(ndev);
 	int ret = 0;
 
-	if (netif_running(ndev)) {
+	if (!netif_running(ndev))
+		return 0;
+
+	if (mdp->wol_enabled)
+		ret = sh_eth_wol_restore(ndev);
+	else
 		ret = sh_eth_open(ndev);
-		if (ret < 0)
-			return ret;
-		netif_device_attach(ndev);
-	}
+
+	if (ret < 0)
+		return ret;
+
+	netif_device_attach(ndev);
 
 	return ret;
 }
diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h
index 11798d5c4a1d..96b5459d131f 100644
--- a/drivers/net/ethernet/renesas/sh_eth.h
+++ b/drivers/net/ethernet/renesas/sh_eth.h
@@ -493,6 +493,7 @@ struct sh_eth_cpu_data {
 	unsigned shift_rd0:1;	/* shift Rx descriptor word 0 right by 16 */
 	unsigned rmiimode:1;	/* EtherC has RMIIMODE register */
 	unsigned rtrate:1;	/* EtherC has RTRATE register */
+	unsigned magic:1;	/* EtherC has ECMR.MPDE and ECSR.MPD */
 };
 
 struct sh_eth_private {
@@ -501,6 +502,7 @@ struct sh_eth_private {
 	const u16 *reg_offset;
 	void __iomem *addr;
 	void __iomem *tsu_addr;
+	struct clk *clk;
 	u32 num_rx_ring;
 	u32 num_tx_ring;
 	dma_addr_t rx_desc_dma;
@@ -529,6 +531,7 @@ struct sh_eth_private {
 	unsigned no_ether_link:1;
 	unsigned ether_link_active_low:1;
 	unsigned is_opened:1;
+	unsigned wol_enabled:1;
 };
 
 static inline void sh_eth_soft_swap(char *src, int len)
-- 
2.11.0

^ permalink raw reply related

* [PATCHv3 5/6] sh_eth: enable wake-on-lan for sh7734
From: Niklas Söderlund @ 2017-01-09 15:34 UTC (permalink / raw)
  To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
  Cc: Geert Uytterhoeven, linux-pm, Niklas Söderlund
In-Reply-To: <20170109153409.13956-1-niklas.soderlund+renesas@ragnatech.se>

This is based on public datasheet for sh7734 which shows it has the
same behavior and registers for WoL as other versions of sh_eth.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
 drivers/net/ethernet/renesas/sh_eth.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 67cb6b3793ca..5292022fe6b1 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -819,6 +819,7 @@ static struct sh_eth_cpu_data sh7734_data = {
 	.hw_crc		= 1,
 	.select_mii	= 1,
 	.shift_rd0	= 1,
+	.magic		= 1,
 };
 
 /* SH7763 */
-- 
2.11.0

^ permalink raw reply related

* [PATCHv3 6/6] sh_eth: enable wake-on-lan for sh7763
From: Niklas Söderlund @ 2017-01-09 15:34 UTC (permalink / raw)
  To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
  Cc: Geert Uytterhoeven, linux-pm, Niklas Söderlund
In-Reply-To: <20170109153409.13956-1-niklas.soderlund+renesas@ragnatech.se>

This is based on public datasheet for sh7763 which shows it has the
same behavior and registers for WoL as other versions of sh_eth.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
 drivers/net/ethernet/renesas/sh_eth.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 5292022fe6b1..ecf0f7997a72 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -847,6 +847,7 @@ static struct sh_eth_cpu_data sh7763_data = {
 	.no_ade		= 1,
 	.tsu		= 1,
 	.irq_flags	= IRQF_SHARED,
+	.magic		= 1,
 };
 
 static struct sh_eth_cpu_data sh7619_data = {
-- 
2.11.0


^ permalink raw reply related

* [PATCHv3 0/6] sh_eth: add wake-on-lan support via magic packet
From: Niklas Söderlund @ 2017-01-09 15:34 UTC (permalink / raw)
  To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
  Cc: Geert Uytterhoeven, linux-pm, Niklas Söderlund

Hi,

This series adds support for Wake-on-Lan using Magic Packet for a few
models of the sh_eth driver. Patch 1/6 fix a naming error, patch 2/6 
adds generic support to control and support WoL while patches 3/6 - 6/6 
enable different models.

Based ontop of net-next master.

Changes since v2.
- Fix bookkeeping for "active_count" and "event_count" reported in 
  /sys/kernel/debug/wakeup_sources. Thanks Geert for noticing this.
- Add new patch 1/6 which corrects the name of ECMR_MPDE bit, suggested 
  by Sergei.
- s/sh7743/sh7734/ in patch 5/6. Thanks Geert for spotting this.
- Spelling improvements suggested by Sergei and Geert.
- Add Tested-by to 3/6 and 4/6.

Changes since v1.
- Split generic WoL functionality and device enablement to different
  patches.
- Enable more devices then Gen2 after feedback from Geert and
  datasheets.
- Do not set mdp->irq_enabled = false and remove specific MagicPacket
  interrupt clearing, instead let sh_eth_error() clear the interrupt as
  for other EMAC interrupts, thanks Sergei for the suggestion.
- Use the original return logic in sh_eth_resume().
- Moved sh_eth_private variable *clk to top of data structure  to avoid
  possible gaps due to alignment restrictions.
- Make wol_enabled in sh_eth_private part of the already existing
  bitfield instead of a bool.
- Do not initiate mdp->wol_enabled to 0, the struct is kzalloc'ed so
  it's already set to 0.

Niklas Söderlund (6):
  sh_eth: use correct name for ECMR_MPDE bit
  sh_eth: add generic wake-on-lan support via magic packet
  sh_eth: enable wake-on-lan for R-Car Gen2 devices
  sh_eth: enable wake-on-lan for r8a7740/armadillo
  sh_eth: enable wake-on-lan for sh7734
  sh_eth: enable wake-on-lan for sh7763

 drivers/net/ethernet/renesas/sh_eth.c | 123 +++++++++++++++++++++++++++++++---
 drivers/net/ethernet/renesas/sh_eth.h |   5 +-
 2 files changed, 117 insertions(+), 11 deletions(-)

-- 
2.11.0

^ permalink raw reply

* [PATCHv3 4/6] sh_eth: enable wake-on-lan for r8a7740/armadillo
From: Niklas Söderlund @ 2017-01-09 15:34 UTC (permalink / raw)
  To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
  Cc: Geert Uytterhoeven, linux-pm, Niklas Söderlund
In-Reply-To: <20170109153409.13956-1-niklas.soderlund+renesas@ragnatech.se>

Geert Uytterhoeven reported WoL worked on his Armadillo board.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/net/ethernet/renesas/sh_eth.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index afaf8785ed72..67cb6b3793ca 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -577,6 +577,7 @@ static struct sh_eth_cpu_data r8a7740_data = {
 	.tsu		= 1,
 	.select_mii	= 1,
 	.shift_rd0	= 1,
+	.magic		= 1,
 };
 
 /* There is CPU dependent code */
-- 
2.11.0

^ permalink raw reply related

* [PATCHv3 3/6] sh_eth: enable wake-on-lan for R-Car Gen2 devices
From: Niklas Söderlund @ 2017-01-09 15:34 UTC (permalink / raw)
  To: Sergei Shtylyov, Simon Horman, netdev, linux-renesas-soc
  Cc: Geert Uytterhoeven, linux-pm, Niklas Söderlund
In-Reply-To: <20170109153409.13956-1-niklas.soderlund+renesas@ragnatech.se>

Tested on Gen2 r8a7791/Koelsch.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/net/ethernet/renesas/sh_eth.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 542c92b57b35..afaf8785ed72 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -623,8 +623,9 @@ static struct sh_eth_cpu_data r8a779x_data = {
 
 	.register_type	= SH_ETH_REG_FAST_RCAR,
 
-	.ecsr_value	= ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD,
-	.ecsipr_value	= ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP,
+	.ecsr_value	= ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD | ECSR_MPD,
+	.ecsipr_value	= ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP |
+			  ECSIPR_MPDIP,
 	.eesipr_value	= 0x01ff009f,
 
 	.tx_check	= EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
@@ -639,6 +640,7 @@ static struct sh_eth_cpu_data r8a779x_data = {
 	.tpauser	= 1,
 	.hw_swap	= 1,
 	.rmiimode	= 1,
+	.magic		= 1,
 };
 #endif /* CONFIG_OF */
 
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH net-next v2] net: dsa: make "label" property optional for dsa2
From: Vivien Didelot @ 2017-01-09 15:45 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Uwe Kleine-König, Andrey Smirnov
In-Reply-To: <20170109151131.GC1862@nanopsycho>

Hi Jiri,

Jiri Pirko <jiri@resnulli.us> writes:

>>Extra question: shouldn't phys_port_{id,name} be switchdev attributes in
>
> Again, phys_port_id has nothing to do with switches. Should be removed
> from dsa because its use there is incorrect.

Florian, since 3a543ef just got in, can it be reverted?

>>> I guess that it should be enough for you to implement
>>> ndo_get_phys_port_name.
>>
>>Well, if this name must be unique on a system, it's not likely to happen
>>until we agree that we use an ugly tXsYpZ template where X is a tree ID,
>>or we assign system-wide unique IDs to switches, which requires a bit of
>>changes.
>
> No. That should be unique within one switch. In mlxsw we name it "p1",
> "p2", ...
>
> The final netdev names are:
> enp3s0np1, enp3s0np2, ...

OK perfect then, "p%d" sounds good. You seems to avoid "p0" in mlxsw, is
there a reason for that?

>>But again, this is not related to this patch ;-)
>
> It is! You are using phys_port_id, which is completely wrong. You should
> not use it.

I can resend this patch without the udev examples in the commit message
if that can be less confusing.

Thanks,

        Vivien

^ permalink raw reply

* Re: [PATCH net-next] bridge: multicast to unicast
From: Johannes Berg @ 2017-01-09 15:47 UTC (permalink / raw)
  To: michael-dev
  Cc: netdev, bridge, linux-wireless, linux-kernel, David S . Miller,
	Felix Fietkau
In-Reply-To: <ebc5050e12c04273fa518e35d7f0b8bc@fami-braun.de>

On Mon, 2017-01-09 at 16:25 +0100, michael-dev wrote:
> Am 09.01.2017 13:15, schrieb Johannes Berg:
> > > That is bridge fdb entries (need to) expire so the bridge might
> > > "forget" a still-connected station not sending but only consuming
> > > broadcast traffic.
> > 
> > Ok, that I don't know. Somehow if you address a unicast packet
> > there
> > the bridge has to make a decision - so it really should know?
> 
> If the bridge has not learned the unicast destination mac address on
> any port, it will flood the packet on all ports except the port it
> received the packet on.

Ok, so this really needs to be done in mac80211.

I'll send out the pull request soon then.

johannes

^ permalink raw reply

* Re: [for-next V2 06/10] net/mlx5: Add interface to get reference to a UAR
From: David Miller @ 2017-01-09 15:47 UTC (permalink / raw)
  To: saeedm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
  Cc: yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA, saeedm-VPRAkNaXOzVWk0Htik3J/w,
	dledford-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, leonro-VPRAkNaXOzVWk0Htik3J/w,
	eli-VPRAkNaXOzVWk0Htik3J/w, matanb-VPRAkNaXOzVWk0Htik3J/w,
	leon-DgEjT+Ai2ygdnm+yROfE0A
In-Reply-To: <CALzJLG-yiZfa6J0WciFPowfF2LWrXAqCKc4ETu4x=ZJdn+CfDA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

From: Saeed Mahameed <saeedm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Date: Mon, 9 Jan 2017 10:31:36 +0200

> We will submit an incremental patch for this, as checkpatch doesn't
> complain about such minor things.

Please fix this and resubmit the series.

Checkpatch not complaining is not an argument for fixing up coding
style issues reported to you in feedback.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCHv2 net-next 0/5] sctp: add support for generating stream reconf chunks
From: David Miller @ 2017-01-09 15:53 UTC (permalink / raw)
  To: nhorman; +Cc: lucien.xin, netdev, linux-sctp, marcelo.leitner, vyasevich
In-Reply-To: <20170109124325.GB20973@hmswarspite.think-freely.org>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 9 Jan 2017 07:43:25 -0500

> These all look reasonably good, but it seems before we accept them,
> there should be an additional patch that actually makes use of the code.
> I presume that is forthcomming?

This all comes from my asking that the original huge set of patches be
split up.

People always just rush this kind of work and never think about laying
out the resubmission properly.

One should always only submit new interfaces along with an actual use
because only with a use can we properly review whether the new
interface is good or not.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox