* [PATCH 2/3] rxrpc: The RXRPC_ACCEPT control message should not have an address
From: David Howells @ 2016-04-12 15:05 UTC
To: linux-afs; +Cc: dhowells, netdev, linux-kernel
In-Reply-To: <20160412150533.20637.23952.stgit@warthog.procyon.org.uk>
When sendmsg() is called with the RXRPC_ACCEPT control message, it shouldn't
also be given an address in msg_name. Return -EINVAL if one is supplied.
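For illustration only, the check can be modelled in plain userspace C. This
is a rough sketch, not the kernel code: the model_* names and types are
hypothetical stand-ins for the socket state and struct msghdr.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's socket states and msghdr. */
enum model_state { MODEL_SERVER_BOUND, MODEL_SERVER_LISTENING };

struct model_msghdr {
	const void *msg_name;	/* target address, or NULL if none was given */
};

/* Mirrors the condition in the patch: an accept request needs a listening
 * socket and must not carry an address.
 */
int model_check_accept(enum model_state state, const struct model_msghdr *msg)
{
	if (state != MODEL_SERVER_LISTENING || msg->msg_name)
		return -EINVAL;
	return 0;
}
```

An accept request on a listening socket passes only when msg_name is NULL;
anything else gets -EINVAL.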
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/ar-output.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/rxrpc/ar-output.c b/net/rxrpc/ar-output.c
index b87fda075b45..044de9bf34a4 100644
--- a/net/rxrpc/ar-output.c
+++ b/net/rxrpc/ar-output.c
@@ -199,7 +199,8 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
return ret;
if (cmd == RXRPC_CMD_ACCEPT) {
- if (rx->sk.sk_state != RXRPC_SERVER_LISTENING)
+ if (rx->sk.sk_state != RXRPC_SERVER_LISTENING ||
+ msg->msg_name)
return -EINVAL;
call = rxrpc_accept_call(rx, user_call_ID);
if (IS_ERR(call))
* [PATCH 1/3] rxrpc: Don't permit use of connect() op and simplify sendmsg() op
From: David Howells @ 2016-04-12 15:05 UTC
To: linux-afs; +Cc: dhowells, netdev, linux-kernel
In-Reply-To: <20160412150533.20637.23952.stgit@warthog.procyon.org.uk>
Simplify the RxRPC user interface and remove the use of connect() to direct
client calls. It is redundant given that sendmsg() can be given the target
address and calls to multiple targets are permitted from a client socket
and also from a service socket.
Simplify sendmsg() also. If we can't find a call immediately, we create
one, as now, but if the call then exists when we try to add it, we return
an error (-EEXIST) rather than using the call we found at the second
attempt. We should never see this situation unless two threads are racing,
trying to create a call with the same ID - which would be an error.
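The lookup-then-create behaviour can be sketched in userspace roughly as
follows. This is a simplified model under stated assumptions: a flat array
replaces the rb-tree, there is no locking, only the data-sending path is
covered, and all model_* names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CALLS 8

/* Hypothetical stand-in for the per-socket set of active user call IDs. */
struct model_sock {
	unsigned long ids[MODEL_MAX_CALLS];
	size_t nr_calls;
};

/* Lookup step: true if the user call ID maps to an active call. */
bool model_find_call(const struct model_sock *rx, unsigned long id)
{
	for (size_t i = 0; i < rx->nr_calls; i++)
		if (rx->ids[i] == id)
			return true;
	return false;
}

/* Insert step: the ID is checked again at insertion time, and finding it
 * now present is treated as an error instead of reusing that call.
 */
int model_new_call(struct model_sock *rx, unsigned long id)
{
	if (model_find_call(rx, id))
		return -EEXIST;
	if (rx->nr_calls == MODEL_MAX_CALLS)
		return -ENOMEM;
	rx->ids[rx->nr_calls++] = id;
	return 0;
}

/* Data path: use the existing call if there is one, else create it. */
int model_send_data(struct model_sock *rx, unsigned long id)
{
	if (model_find_call(rx, id))
		return 0;
	return model_new_call(rx, id);
}
```

In a single thread the -EEXIST branch is only reachable by calling
model_new_call() directly on an ID that is already present, which is the
point: in the kernel it can only trigger if another thread inserts the same
ID between the lookup and the insertion.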
Nor is it required to provide sendmsg() with an address, provided the
control message data holds a user call ID that maps to a currently active
call.
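As a rough userspace illustration of such a follow-up sendmsg(), the helper
below fills in a struct msghdr that carries only the user call ID and no
address. fill_followup_msg() is a hypothetical name; the fallback
definitions assume RXRPC_USER_CALL_ID is 1 (as in rxrpc.h in this patch)
and SOL_RXRPC is 272 where the system headers don't provide them. Nothing
is actually sent.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Assumed fallback values, used only if the headers don't define them. */
#ifndef SOL_RXRPC
#define SOL_RXRPC 272
#endif
#ifndef RXRPC_USER_CALL_ID
#define RXRPC_USER_CALL_ID 1
#endif

/* Hypothetical helper: prepare a message for an already active call.
 * msg_name stays NULL; the call is identified purely by the user call ID
 * carried in the control data.  The control buffer must be suitably
 * aligned for struct cmsghdr.  Returns 0, or -1 if it is too small.
 */
int fill_followup_msg(struct msghdr *msg, void *control, size_t control_len,
		      struct iovec *iov, unsigned long user_call_ID)
{
	struct cmsghdr *cmsg;

	if (control_len < CMSG_SPACE(sizeof(user_call_ID)))
		return -1;

	memset(msg, 0, sizeof(*msg));
	memset(control, 0, control_len);
	msg->msg_name = NULL;		/* no address for an existing call */
	msg->msg_namelen = 0;
	msg->msg_iov = iov;
	msg->msg_iovlen = 1;
	msg->msg_control = control;
	msg->msg_controllen = CMSG_SPACE(sizeof(user_call_ID));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = RXRPC_USER_CALL_ID;
	cmsg->cmsg_len = CMSG_LEN(sizeof(user_call_ID));
	memcpy(CMSG_DATA(cmsg), &user_call_ID, sizeof(user_call_ID));
	return 0;
}
```

The first sendmsg() of a client call would be built the same way, except
that msg_name and msg_namelen would then point at a struct sockaddr_rxrpc
naming the target.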
Signed-off-by: David Howells <dhowells@redhat.com>
---
Documentation/networking/rxrpc.txt | 8 --
include/linux/rxrpc.h | 18 ++-
net/rxrpc/af_rxrpc.c | 185 +++++++++---------------------------
net/rxrpc/ar-call.c | 158 ++++++++++++-------------------
net/rxrpc/ar-connection.c | 17 ---
net/rxrpc/ar-internal.h | 21 ++--
net/rxrpc/ar-output.c | 186 +++++++++++++++++-------------------
7 files changed, 219 insertions(+), 374 deletions(-)
diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt
index 16a924c486bf..a1089f93e4ce 100644
--- a/Documentation/networking/rxrpc.txt
+++ b/Documentation/networking/rxrpc.txt
@@ -216,12 +216,8 @@ Interaction with the user of the RxRPC socket:
be used in all other sendmsgs or recvmsgs associated with that call. The
tag is carried in the control data.
- (*) connect() is used to supply a default destination address for a client
- socket. This may be overridden by supplying an alternate address to the
- first sendmsg() of a call (struct msghdr::msg_name).
-
- (*) If connect() is called on an unbound client, a random local port will
- bound before the operation takes place.
+ (*) connect() is not used. The target address for a client call must be
+ supplied to the first sendmsg() of a call (struct msghdr::msg_name).
(*) A server socket may also be used to make client calls. To do this, the
first sendmsg() of the call must specify the target address. The server's
diff --git a/include/linux/rxrpc.h b/include/linux/rxrpc.h
index a53915cd5581..e4182d3f8c8b 100644
--- a/include/linux/rxrpc.h
+++ b/include/linux/rxrpc.h
@@ -40,16 +40,18 @@ struct sockaddr_rxrpc {
/*
* RxRPC control messages
+ * - data type is specified by default (ie. not abort or accept)
* - terminal messages mean that a user call ID tag can be recycled
+ * - s/r/- indicate whether these are applicable to sendmsg() and/or recvmsg()
*/
-#define RXRPC_USER_CALL_ID 1 /* user call ID specifier */
-#define RXRPC_ABORT 2 /* abort request / notification [terminal] */
-#define RXRPC_ACK 3 /* [Server] RPC op final ACK received [terminal] */
-#define RXRPC_NET_ERROR 5 /* network error received [terminal] */
-#define RXRPC_BUSY 6 /* server busy received [terminal] */
-#define RXRPC_LOCAL_ERROR 7 /* local error generated [terminal] */
-#define RXRPC_NEW_CALL 8 /* [Server] new incoming call notification */
-#define RXRPC_ACCEPT 9 /* [Server] accept request */
+#define RXRPC_USER_CALL_ID 1 /* sr: user call ID specifier */
+#define RXRPC_ABORT 2 /* sr: abort request / notification [terminal] */
+#define RXRPC_ACK 3 /* -r: [Service] RPC op final ACK received [terminal] */
+#define RXRPC_NET_ERROR 5 /* -r: network error received [terminal] */
+#define RXRPC_BUSY 6 /* -r: server busy received [terminal] */
+#define RXRPC_LOCAL_ERROR 7 /* -r: local error generated [terminal] */
+#define RXRPC_NEW_CALL 8 /* -r: [Service] new incoming call notification */
+#define RXRPC_ACCEPT 9 /* s-: [Service] accept request */
/*
* RxRPC security levels
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index e45e94ca030f..dd462352a79c 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -137,33 +137,33 @@ static int rxrpc_bind(struct socket *sock, struct sockaddr *saddr, int len)
lock_sock(&rx->sk);
- if (rx->sk.sk_state != RXRPC_UNCONNECTED) {
+ if (rx->sk.sk_state != RXRPC_UNBOUND) {
ret = -EINVAL;
goto error_unlock;
}
memcpy(&rx->srx, srx, sizeof(rx->srx));
- /* Find or create a local transport endpoint to use */
local = rxrpc_lookup_local(&rx->srx);
if (IS_ERR(local)) {
ret = PTR_ERR(local);
goto error_unlock;
}
- rx->local = local;
- if (srx->srx_service) {
+ if (rx->srx.srx_service) {
write_lock_bh(&local->services_lock);
list_for_each_entry(prx, &local->services, listen_link) {
- if (prx->srx.srx_service == srx->srx_service)
+ if (prx->srx.srx_service == rx->srx.srx_service)
goto service_in_use;
}
+ rx->local = local;
list_add_tail(&rx->listen_link, &local->services);
write_unlock_bh(&local->services_lock);
rx->sk.sk_state = RXRPC_SERVER_BOUND;
} else {
+ rx->local = local;
rx->sk.sk_state = RXRPC_CLIENT_BOUND;
}
@@ -172,8 +172,9 @@ static int rxrpc_bind(struct socket *sock, struct sockaddr *saddr, int len)
return 0;
service_in_use:
- ret = -EADDRINUSE;
write_unlock_bh(&local->services_lock);
+ rxrpc_put_local(local);
+ ret = -EADDRINUSE;
error_unlock:
release_sock(&rx->sk);
error:
@@ -195,11 +196,11 @@ static int rxrpc_listen(struct socket *sock, int backlog)
lock_sock(&rx->sk);
switch (rx->sk.sk_state) {
- case RXRPC_UNCONNECTED:
+ case RXRPC_UNBOUND:
ret = -EADDRNOTAVAIL;
break;
+ case RXRPC_CLIENT_UNBOUND:
case RXRPC_CLIENT_BOUND:
- case RXRPC_CLIENT_CONNECTED:
default:
ret = -EBUSY;
break;
@@ -219,20 +220,18 @@ static int rxrpc_listen(struct socket *sock, int backlog)
/*
* find a transport by address
*/
-static struct rxrpc_transport *rxrpc_name_to_transport(struct socket *sock,
- struct sockaddr *addr,
- int addr_len, int flags,
- gfp_t gfp)
+struct rxrpc_transport *rxrpc_name_to_transport(struct rxrpc_sock *rx,
+ struct sockaddr *addr,
+ int addr_len, int flags,
+ gfp_t gfp)
{
struct sockaddr_rxrpc *srx = (struct sockaddr_rxrpc *) addr;
struct rxrpc_transport *trans;
- struct rxrpc_sock *rx = rxrpc_sk(sock->sk);
struct rxrpc_peer *peer;
_enter("%p,%p,%d,%d", rx, addr, addr_len, flags);
ASSERT(rx->local != NULL);
- ASSERT(rx->sk.sk_state > RXRPC_UNCONNECTED);
if (rx->srx.transport_type != srx->transport_type)
return ERR_PTR(-ESOCKTNOSUPPORT);
@@ -254,7 +253,7 @@ static struct rxrpc_transport *rxrpc_name_to_transport(struct socket *sock,
/**
* rxrpc_kernel_begin_call - Allow a kernel service to begin a call
* @sock: The socket on which to make the call
- * @srx: The address of the peer to contact (defaults to socket setting)
+ * @srx: The address of the peer to contact
* @key: The security context to use (defaults to socket setting)
* @user_call_ID: The ID to use
*
@@ -280,25 +279,14 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock,
lock_sock(&rx->sk);
- if (srx) {
- trans = rxrpc_name_to_transport(sock, (struct sockaddr *) srx,
- sizeof(*srx), 0, gfp);
- if (IS_ERR(trans)) {
- call = ERR_CAST(trans);
- trans = NULL;
- goto out_notrans;
- }
- } else {
- trans = rx->trans;
- if (!trans) {
- call = ERR_PTR(-ENOTCONN);
- goto out_notrans;
- }
- atomic_inc(&trans->usage);
+ trans = rxrpc_name_to_transport(rx, (struct sockaddr *)srx,
+ sizeof(*srx), 0, gfp);
+ if (IS_ERR(trans)) {
+ call = ERR_CAST(trans);
+ trans = NULL;
+ goto out_notrans;
}
- if (!srx)
- srx = &rx->srx;
if (!key)
key = rx->key;
if (key && !key->payload.data[0])
@@ -310,8 +298,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock,
goto out;
}
- call = rxrpc_get_client_call(rx, trans, bundle, user_call_ID, true,
- gfp);
+ call = rxrpc_new_client_call(rx, trans, bundle, user_call_ID, gfp);
rxrpc_put_bundle(trans, bundle);
out:
rxrpc_put_transport(trans);
@@ -360,69 +347,14 @@ void rxrpc_kernel_intercept_rx_messages(struct socket *sock,
EXPORT_SYMBOL(rxrpc_kernel_intercept_rx_messages);
/*
- * connect an RxRPC socket
- * - this just targets it at a specific destination; no actual connection
- * negotiation takes place
+ * We don't permit connection of an RxRPC socket. It's pointless since
+ * sendmsg() takes the target address for a new call and a socket can support
+ * calls to multiple servers simultaneously.
*/
static int rxrpc_connect(struct socket *sock, struct sockaddr *addr,
int addr_len, int flags)
{
- struct sockaddr_rxrpc *srx = (struct sockaddr_rxrpc *) addr;
- struct sock *sk = sock->sk;
- struct rxrpc_transport *trans;
- struct rxrpc_local *local;
- struct rxrpc_sock *rx = rxrpc_sk(sk);
- int ret;
-
- _enter("%p,%p,%d,%d", rx, addr, addr_len, flags);
-
- ret = rxrpc_validate_address(rx, srx, addr_len);
- if (ret < 0) {
- _leave(" = %d [bad addr]", ret);
- return ret;
- }
-
- lock_sock(&rx->sk);
-
- switch (rx->sk.sk_state) {
- case RXRPC_UNCONNECTED:
- /* find a local transport endpoint if we don't have one already */
- ASSERTCMP(rx->local, ==, NULL);
- rx->srx.srx_family = AF_RXRPC;
- rx->srx.srx_service = 0;
- rx->srx.transport_type = srx->transport_type;
- rx->srx.transport_len = sizeof(sa_family_t);
- rx->srx.transport.family = srx->transport.family;
- local = rxrpc_lookup_local(&rx->srx);
- if (IS_ERR(local)) {
- release_sock(&rx->sk);
- return PTR_ERR(local);
- }
- rx->local = local;
- rx->sk.sk_state = RXRPC_CLIENT_BOUND;
- case RXRPC_CLIENT_BOUND:
- break;
- case RXRPC_CLIENT_CONNECTED:
- release_sock(&rx->sk);
- return -EISCONN;
- default:
- release_sock(&rx->sk);
- return -EBUSY; /* server sockets can't connect as well */
- }
-
- trans = rxrpc_name_to_transport(sock, addr, addr_len, flags,
- GFP_KERNEL);
- if (IS_ERR(trans)) {
- release_sock(&rx->sk);
- _leave(" = %ld", PTR_ERR(trans));
- return PTR_ERR(trans);
- }
-
- rx->trans = trans;
- rx->sk.sk_state = RXRPC_CLIENT_CONNECTED;
-
- release_sock(&rx->sk);
- return 0;
+ return -EOPNOTSUPP;
}
/*
@@ -436,7 +368,7 @@ static int rxrpc_connect(struct socket *sock, struct sockaddr *addr,
*/
static int rxrpc_sendmsg(struct socket *sock, struct msghdr *m, size_t len)
{
- struct rxrpc_transport *trans;
+ struct rxrpc_local *local;
struct rxrpc_sock *rx = rxrpc_sk(sock->sk);
int ret;
@@ -453,48 +385,33 @@ static int rxrpc_sendmsg(struct socket *sock, struct msghdr *m, size_t len)
}
}
- trans = NULL;
lock_sock(&rx->sk);
- if (m->msg_name) {
- ret = -EISCONN;
- trans = rxrpc_name_to_transport(sock, m->msg_name,
- m->msg_namelen, 0, GFP_KERNEL);
- if (IS_ERR(trans)) {
- ret = PTR_ERR(trans);
- trans = NULL;
- goto out;
- }
- } else {
- trans = rx->trans;
- if (trans)
- atomic_inc(&trans->usage);
- }
-
switch (rx->sk.sk_state) {
- case RXRPC_SERVER_LISTENING:
- if (!m->msg_name) {
- ret = rxrpc_server_sendmsg(rx, m, len);
- break;
+ case RXRPC_UNBOUND:
+ local = rxrpc_lookup_local(&rx->srx);
+ if (IS_ERR(local)) {
+ ret = PTR_ERR(local);
+ goto error_unlock;
}
- case RXRPC_SERVER_BOUND:
+
+ rx->local = local;
+ rx->sk.sk_state = RXRPC_CLIENT_UNBOUND;
+ /* Fall through */
+
+ case RXRPC_CLIENT_UNBOUND:
case RXRPC_CLIENT_BOUND:
- if (!m->msg_name) {
- ret = -ENOTCONN;
- break;
- }
- case RXRPC_CLIENT_CONNECTED:
- ret = rxrpc_client_sendmsg(rx, trans, m, len);
+ case RXRPC_SERVER_BOUND:
+ case RXRPC_SERVER_LISTENING:
+ ret = rxrpc_do_sendmsg(rx, m, len);
break;
default:
- ret = -ENOTCONN;
+ ret = -EINVAL;
break;
}
-out:
+error_unlock:
release_sock(&rx->sk);
- if (trans)
- rxrpc_put_transport(trans);
_leave(" = %d", ret);
return ret;
}
@@ -521,7 +438,7 @@ static int rxrpc_setsockopt(struct socket *sock, int level, int optname,
if (optlen != 0)
goto error;
ret = -EISCONN;
- if (rx->sk.sk_state != RXRPC_UNCONNECTED)
+ if (rx->sk.sk_state != RXRPC_UNBOUND)
goto error;
set_bit(RXRPC_SOCK_EXCLUSIVE_CONN, &rx->flags);
goto success;
@@ -531,7 +448,7 @@ static int rxrpc_setsockopt(struct socket *sock, int level, int optname,
if (rx->key)
goto error;
ret = -EISCONN;
- if (rx->sk.sk_state != RXRPC_UNCONNECTED)
+ if (rx->sk.sk_state != RXRPC_UNBOUND)
goto error;
ret = rxrpc_request_key(rx, optval, optlen);
goto error;
@@ -541,7 +458,7 @@ static int rxrpc_setsockopt(struct socket *sock, int level, int optname,
if (rx->key)
goto error;
ret = -EISCONN;
- if (rx->sk.sk_state != RXRPC_UNCONNECTED)
+ if (rx->sk.sk_state != RXRPC_UNBOUND)
goto error;
ret = rxrpc_server_keyring(rx, optval, optlen);
goto error;
@@ -551,7 +468,7 @@ static int rxrpc_setsockopt(struct socket *sock, int level, int optname,
if (optlen != sizeof(unsigned int))
goto error;
ret = -EISCONN;
- if (rx->sk.sk_state != RXRPC_UNCONNECTED)
+ if (rx->sk.sk_state != RXRPC_UNBOUND)
goto error;
ret = get_user(min_sec_level,
(unsigned int __user *) optval);
@@ -630,7 +547,7 @@ static int rxrpc_create(struct net *net, struct socket *sock, int protocol,
return -ENOMEM;
sock_init_data(sock, sk);
- sk->sk_state = RXRPC_UNCONNECTED;
+ sk->sk_state = RXRPC_UNBOUND;
sk->sk_write_space = rxrpc_write_space;
sk->sk_max_ack_backlog = sysctl_rxrpc_max_qlen;
sk->sk_destruct = rxrpc_sock_destructor;
@@ -703,14 +620,6 @@ static int rxrpc_release_sock(struct sock *sk)
rx->conn = NULL;
}
- if (rx->bundle) {
- rxrpc_put_bundle(rx->trans, rx->bundle);
- rx->bundle = NULL;
- }
- if (rx->trans) {
- rxrpc_put_transport(rx->trans);
- rx->trans = NULL;
- }
if (rx->local) {
rxrpc_put_local(rx->local);
rx->local = NULL;
diff --git a/net/rxrpc/ar-call.c b/net/rxrpc/ar-call.c
index 571a41fd5a32..9296bdb26c24 100644
--- a/net/rxrpc/ar-call.c
+++ b/net/rxrpc/ar-call.c
@@ -194,6 +194,43 @@ struct rxrpc_call *rxrpc_find_call_hash(
}
/*
+ * find an extant server call
+ * - called in process context with IRQs enabled
+ */
+struct rxrpc_call *rxrpc_find_call_by_user_ID(struct rxrpc_sock *rx,
+ unsigned long user_call_ID)
+{
+ struct rxrpc_call *call;
+ struct rb_node *p;
+
+ _enter("%p,%lx", rx, user_call_ID);
+
+ read_lock(&rx->call_lock);
+
+ p = rx->calls.rb_node;
+ while (p) {
+ call = rb_entry(p, struct rxrpc_call, sock_node);
+
+ if (user_call_ID < call->user_call_ID)
+ p = p->rb_left;
+ else if (user_call_ID > call->user_call_ID)
+ p = p->rb_right;
+ else
+ goto found_extant_call;
+ }
+
+ read_unlock(&rx->call_lock);
+ _leave(" = NULL");
+ return NULL;
+
+found_extant_call:
+ rxrpc_get_call(call);
+ read_unlock(&rx->call_lock);
+ _leave(" = %p [%d]", call, atomic_read(&call->usage));
+ return call;
+}
+
+/*
* allocate a new call
*/
static struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp)
@@ -309,51 +346,27 @@ static struct rxrpc_call *rxrpc_alloc_client_call(
* set up a call for the given data
* - called in process context with IRQs enabled
*/
-struct rxrpc_call *rxrpc_get_client_call(struct rxrpc_sock *rx,
+struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx,
struct rxrpc_transport *trans,
struct rxrpc_conn_bundle *bundle,
unsigned long user_call_ID,
- int create,
gfp_t gfp)
{
- struct rxrpc_call *call, *candidate;
- struct rb_node *p, *parent, **pp;
+ struct rxrpc_call *call, *xcall;
+ struct rb_node *parent, **pp;
- _enter("%p,%d,%d,%lx,%d",
- rx, trans ? trans->debug_id : -1, bundle ? bundle->debug_id : -1,
- user_call_ID, create);
+ _enter("%p,%d,%d,%lx",
+ rx, trans->debug_id, bundle ? bundle->debug_id : -1,
+ user_call_ID);
- /* search the extant calls first for one that matches the specified
- * user ID */
- read_lock(&rx->call_lock);
-
- p = rx->calls.rb_node;
- while (p) {
- call = rb_entry(p, struct rxrpc_call, sock_node);
-
- if (user_call_ID < call->user_call_ID)
- p = p->rb_left;
- else if (user_call_ID > call->user_call_ID)
- p = p->rb_right;
- else
- goto found_extant_call;
+ call = rxrpc_alloc_client_call(rx, trans, bundle, gfp);
+ if (IS_ERR(call)) {
+ _leave(" = %ld", PTR_ERR(call));
+ return call;
}
- read_unlock(&rx->call_lock);
-
- if (!create || !trans)
- return ERR_PTR(-EBADSLT);
-
- /* not yet present - create a candidate for a new record and then
- * redo the search */
- candidate = rxrpc_alloc_client_call(rx, trans, bundle, gfp);
- if (IS_ERR(candidate)) {
- _leave(" = %ld", PTR_ERR(candidate));
- return candidate;
- }
-
- candidate->user_call_ID = user_call_ID;
- __set_bit(RXRPC_CALL_HAS_USERID, &candidate->flags);
+ call->user_call_ID = user_call_ID;
+ __set_bit(RXRPC_CALL_HAS_USERID, &call->flags);
write_lock(&rx->call_lock);
@@ -361,19 +374,16 @@ struct rxrpc_call *rxrpc_get_client_call(struct rxrpc_sock *rx,
parent = NULL;
while (*pp) {
parent = *pp;
- call = rb_entry(parent, struct rxrpc_call, sock_node);
+ xcall = rb_entry(parent, struct rxrpc_call, sock_node);
- if (user_call_ID < call->user_call_ID)
+ if (user_call_ID < xcall->user_call_ID)
pp = &(*pp)->rb_left;
- else if (user_call_ID > call->user_call_ID)
+ else if (user_call_ID > xcall->user_call_ID)
pp = &(*pp)->rb_right;
else
- goto found_extant_second;
+ goto found_user_ID_now_present;
}
- /* second search also failed; add the new call */
- call = candidate;
- candidate = NULL;
rxrpc_get_call(call);
rb_link_node(&call->sock_node, parent, pp);
@@ -389,20 +399,16 @@ struct rxrpc_call *rxrpc_get_client_call(struct rxrpc_sock *rx,
_leave(" = %p [new]", call);
return call;
- /* we found the call in the list immediately */
-found_extant_call:
- rxrpc_get_call(call);
- read_unlock(&rx->call_lock);
- _leave(" = %p [extant %d]", call, atomic_read(&call->usage));
- return call;
-
- /* we found the call on the second time through the list */
-found_extant_second:
- rxrpc_get_call(call);
+ /* We unexpectedly found the user ID in the list after taking
+ * the call_lock. This shouldn't happen unless the user races
+ * with itself and tries to add the same user ID twice at the
+ * same time in different threads.
+ */
+found_user_ID_now_present:
write_unlock(&rx->call_lock);
- rxrpc_put_call(candidate);
- _leave(" = %p [second %d]", call, atomic_read(&call->usage));
- return call;
+ rxrpc_put_call(call);
+ _leave(" = -EEXIST [%p]", call);
+ return ERR_PTR(-EEXIST);
}
/*
@@ -564,46 +570,6 @@ old_call:
}
/*
- * find an extant server call
- * - called in process context with IRQs enabled
- */
-struct rxrpc_call *rxrpc_find_server_call(struct rxrpc_sock *rx,
- unsigned long user_call_ID)
-{
- struct rxrpc_call *call;
- struct rb_node *p;
-
- _enter("%p,%lx", rx, user_call_ID);
-
- /* search the extant calls for one that matches the specified user
- * ID */
- read_lock(&rx->call_lock);
-
- p = rx->calls.rb_node;
- while (p) {
- call = rb_entry(p, struct rxrpc_call, sock_node);
-
- if (user_call_ID < call->user_call_ID)
- p = p->rb_left;
- else if (user_call_ID > call->user_call_ID)
- p = p->rb_right;
- else
- goto found_extant_call;
- }
-
- read_unlock(&rx->call_lock);
- _leave(" = NULL");
- return NULL;
-
- /* we found the call in the list immediately */
-found_extant_call:
- rxrpc_get_call(call);
- read_unlock(&rx->call_lock);
- _leave(" = %p [%d]", call, atomic_read(&call->usage));
- return call;
-}
-
-/*
* detach a call from a socket and set up for release
*/
void rxrpc_release_call(struct rxrpc_call *call)
diff --git a/net/rxrpc/ar-connection.c b/net/rxrpc/ar-connection.c
index 97f4fae74bca..5307dba4a13a 100644
--- a/net/rxrpc/ar-connection.c
+++ b/net/rxrpc/ar-connection.c
@@ -78,11 +78,6 @@ struct rxrpc_conn_bundle *rxrpc_get_bundle(struct rxrpc_sock *rx,
_enter("%p{%x},%x,%hx,",
rx, key_serial(key), trans->debug_id, service_id);
- if (rx->trans == trans && rx->bundle) {
- atomic_inc(&rx->bundle->usage);
- return rx->bundle;
- }
-
/* search the extant bundles first for one that matches the specified
* user ID */
spin_lock(&trans->client_lock);
@@ -136,10 +131,6 @@ struct rxrpc_conn_bundle *rxrpc_get_bundle(struct rxrpc_sock *rx,
rb_insert_color(&bundle->node, &trans->bundles);
spin_unlock(&trans->client_lock);
_net("BUNDLE new on trans %d", trans->debug_id);
- if (!rx->bundle && rx->sk.sk_state == RXRPC_CLIENT_CONNECTED) {
- atomic_inc(&bundle->usage);
- rx->bundle = bundle;
- }
_leave(" = %p [new]", bundle);
return bundle;
@@ -148,10 +139,6 @@ found_extant_bundle:
atomic_inc(&bundle->usage);
spin_unlock(&trans->client_lock);
_net("BUNDLE old on trans %d", trans->debug_id);
- if (!rx->bundle && rx->sk.sk_state == RXRPC_CLIENT_CONNECTED) {
- atomic_inc(&bundle->usage);
- rx->bundle = bundle;
- }
_leave(" = %p [extant %d]", bundle, atomic_read(&bundle->usage));
return bundle;
@@ -161,10 +148,6 @@ found_extant_second:
spin_unlock(&trans->client_lock);
kfree(candidate);
_net("BUNDLE old2 on trans %d", trans->debug_id);
- if (!rx->bundle && rx->sk.sk_state == RXRPC_CLIENT_CONNECTED) {
- atomic_inc(&bundle->usage);
- rx->bundle = bundle;
- }
_leave(" = %p [second %d]", bundle, atomic_read(&bundle->usage));
return bundle;
}
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index f0b807a163fa..bbf2443af875 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -39,9 +39,9 @@ struct rxrpc_crypt {
* sk_state for RxRPC sockets
*/
enum {
- RXRPC_UNCONNECTED = 0,
+ RXRPC_UNBOUND = 0,
+ RXRPC_CLIENT_UNBOUND, /* Unbound socket used as client */
RXRPC_CLIENT_BOUND, /* client local address bound */
- RXRPC_CLIENT_CONNECTED, /* client is connected */
RXRPC_SERVER_BOUND, /* server local address bound */
RXRPC_SERVER_LISTENING, /* server listening for connections */
RXRPC_CLOSE, /* socket is being closed */
@@ -55,8 +55,6 @@ struct rxrpc_sock {
struct sock sk;
rxrpc_interceptor_t interceptor; /* kernel service Rx interceptor function */
struct rxrpc_local *local; /* local endpoint */
- struct rxrpc_transport *trans; /* transport handler */
- struct rxrpc_conn_bundle *bundle; /* virtual connection bundle */
struct rxrpc_connection *conn; /* exclusive virtual connection */
struct list_head listen_link; /* link in the local endpoint's listen list */
struct list_head secureq; /* calls awaiting connection security clearance */
@@ -477,6 +475,10 @@ extern u32 rxrpc_epoch;
extern atomic_t rxrpc_debug_id;
extern struct workqueue_struct *rxrpc_workqueue;
+struct rxrpc_transport *rxrpc_name_to_transport(struct rxrpc_sock *,
+ struct sockaddr *,
+ int, int, gfp_t);
+
/*
* ar-accept.c
*/
@@ -502,14 +504,15 @@ extern rwlock_t rxrpc_call_lock;
struct rxrpc_call *rxrpc_find_call_hash(struct rxrpc_host_header *,
void *, sa_family_t, const void *);
-struct rxrpc_call *rxrpc_get_client_call(struct rxrpc_sock *,
+struct rxrpc_call *rxrpc_find_call_by_user_ID(struct rxrpc_sock *,
+ unsigned long);
+struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *,
struct rxrpc_transport *,
struct rxrpc_conn_bundle *,
- unsigned long, int, gfp_t);
+ unsigned long, gfp_t);
struct rxrpc_call *rxrpc_incoming_call(struct rxrpc_sock *,
struct rxrpc_connection *,
struct rxrpc_host_header *);
-struct rxrpc_call *rxrpc_find_server_call(struct rxrpc_sock *, unsigned long);
void rxrpc_release_call(struct rxrpc_call *);
void rxrpc_release_calls_on_socket(struct rxrpc_sock *);
void __rxrpc_put_call(struct rxrpc_call *);
@@ -581,9 +584,7 @@ int rxrpc_get_server_data_key(struct rxrpc_connection *, const void *, time_t,
extern unsigned int rxrpc_resend_timeout;
int rxrpc_send_packet(struct rxrpc_transport *, struct sk_buff *);
-int rxrpc_client_sendmsg(struct rxrpc_sock *, struct rxrpc_transport *,
- struct msghdr *, size_t);
-int rxrpc_server_sendmsg(struct rxrpc_sock *, struct msghdr *, size_t);
+int rxrpc_do_sendmsg(struct rxrpc_sock *, struct msghdr *, size_t);
/*
* ar-peer.c
diff --git a/net/rxrpc/ar-output.c b/net/rxrpc/ar-output.c
index 51cb10062a8d..b87fda075b45 100644
--- a/net/rxrpc/ar-output.c
+++ b/net/rxrpc/ar-output.c
@@ -30,13 +30,13 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
/*
* extract control messages from the sendmsg() control buffer
*/
-static int rxrpc_sendmsg_cmsg(struct rxrpc_sock *rx, struct msghdr *msg,
+static int rxrpc_sendmsg_cmsg(struct msghdr *msg,
unsigned long *user_call_ID,
enum rxrpc_command *command,
- u32 *abort_code,
- bool server)
+ u32 *abort_code)
{
struct cmsghdr *cmsg;
+ bool got_user_ID = false;
int len;
*command = RXRPC_CMD_SEND_DATA;
@@ -68,6 +68,7 @@ static int rxrpc_sendmsg_cmsg(struct rxrpc_sock *rx, struct msghdr *msg,
CMSG_DATA(cmsg);
}
_debug("User Call ID %lx", *user_call_ID);
+ got_user_ID = true;
break;
case RXRPC_ABORT:
@@ -88,8 +89,6 @@ static int rxrpc_sendmsg_cmsg(struct rxrpc_sock *rx, struct msghdr *msg,
*command = RXRPC_CMD_ACCEPT;
if (len != 0)
return -EINVAL;
- if (!server)
- return -EISCONN;
break;
default:
@@ -97,6 +96,8 @@ static int rxrpc_sendmsg_cmsg(struct rxrpc_sock *rx, struct msghdr *msg,
}
}
+ if (!got_user_ID)
+ return -EINVAL;
_leave(" = 0");
return 0;
}
@@ -124,55 +125,96 @@ static void rxrpc_send_abort(struct rxrpc_call *call, u32 abort_code)
}
/*
+ * Create a new client call for sendmsg().
+ */
+static struct rxrpc_call *
+rxrpc_new_client_call_for_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg,
+ unsigned long user_call_ID)
+{
+ struct rxrpc_conn_bundle *bundle;
+ struct rxrpc_transport *trans;
+ struct rxrpc_call *call;
+ struct key *key;
+ long ret;
+
+ DECLARE_SOCKADDR(struct sockaddr_rxrpc *, srx, msg->msg_name);
+
+ _enter("");
+
+ if (!msg->msg_name)
+ return ERR_PTR(-EDESTADDRREQ);
+
+ trans = rxrpc_name_to_transport(rx, msg->msg_name, msg->msg_namelen, 0,
+ GFP_KERNEL);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ goto out;
+ }
+
+ key = rx->key;
+ if (key && !rx->key->payload.data[0])
+ key = NULL;
+ bundle = rxrpc_get_bundle(rx, trans, key, srx->srx_service, GFP_KERNEL);
+ if (IS_ERR(bundle)) {
+ ret = PTR_ERR(bundle);
+ goto out_trans;
+ }
+
+ call = rxrpc_new_client_call(rx, trans, bundle, user_call_ID,
+ GFP_KERNEL);
+ rxrpc_put_bundle(trans, bundle);
+ rxrpc_put_transport(trans);
+ if (IS_ERR(call)) {
+ ret = PTR_ERR(call);
+ goto out;
+ }
+
+ _leave(" = %p\n", call);
+ return call;
+
+out_trans:
+ rxrpc_put_transport(trans);
+out:
+ _leave(" = %ld", ret);
+ return ERR_PTR(ret);
+}
+
+/*
* send a message forming part of a client call through an RxRPC socket
* - caller holds the socket locked
* - the socket may be either a client socket or a server socket
*/
-int rxrpc_client_sendmsg(struct rxrpc_sock *rx, struct rxrpc_transport *trans,
- struct msghdr *msg, size_t len)
+int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
{
- struct rxrpc_conn_bundle *bundle;
enum rxrpc_command cmd;
struct rxrpc_call *call;
unsigned long user_call_ID = 0;
- struct key *key;
- u16 service_id;
u32 abort_code = 0;
int ret;
_enter("");
- ASSERT(trans != NULL);
-
- ret = rxrpc_sendmsg_cmsg(rx, msg, &user_call_ID, &cmd, &abort_code,
- false);
+ ret = rxrpc_sendmsg_cmsg(msg, &user_call_ID, &cmd, &abort_code);
if (ret < 0)
return ret;
- bundle = NULL;
- if (trans) {
- service_id = rx->srx.srx_service;
- if (msg->msg_name) {
- DECLARE_SOCKADDR(struct sockaddr_rxrpc *, srx,
- msg->msg_name);
- service_id = srx->srx_service;
- }
- key = rx->key;
- if (key && !rx->key->payload.data[0])
- key = NULL;
- bundle = rxrpc_get_bundle(rx, trans, key, service_id,
- GFP_KERNEL);
- if (IS_ERR(bundle))
- return PTR_ERR(bundle);
+ if (cmd == RXRPC_CMD_ACCEPT) {
+ if (rx->sk.sk_state != RXRPC_SERVER_LISTENING)
+ return -EINVAL;
+ call = rxrpc_accept_call(rx, user_call_ID);
+ if (IS_ERR(call))
+ return PTR_ERR(call);
+ rxrpc_put_call(call);
+ return 0;
}
- call = rxrpc_get_client_call(rx, trans, bundle, user_call_ID,
- abort_code == 0, GFP_KERNEL);
- if (trans)
- rxrpc_put_bundle(trans, bundle);
- if (IS_ERR(call)) {
- _leave(" = %ld", PTR_ERR(call));
- return PTR_ERR(call);
+ call = rxrpc_find_call_by_user_ID(rx, user_call_ID);
+ if (!call) {
+ if (cmd != RXRPC_CMD_SEND_DATA)
+ return -EBADSLT;
+ call = rxrpc_new_client_call_for_sendmsg(rx, msg, user_call_ID);
+ if (IS_ERR(call))
+ return PTR_ERR(call);
}
_debug("CALL %d USR %lx ST %d on CONN %p",
@@ -180,14 +222,21 @@ int rxrpc_client_sendmsg(struct rxrpc_sock *rx, struct rxrpc_transport *trans,
if (call->state >= RXRPC_CALL_COMPLETE) {
/* it's too late for this call */
- ret = -ESHUTDOWN;
+ ret = -ECONNRESET;
} else if (cmd == RXRPC_CMD_SEND_ABORT) {
rxrpc_send_abort(call, abort_code);
+ ret = 0;
} else if (cmd != RXRPC_CMD_SEND_DATA) {
ret = -EINVAL;
- } else if (call->state != RXRPC_CALL_CLIENT_SEND_REQUEST) {
+ } else if (!call->in_clientflag &&
+ call->state != RXRPC_CALL_CLIENT_SEND_REQUEST) {
/* request phase complete for this client call */
ret = -EPROTO;
+ } else if (call->in_clientflag &&
+ call->state != RXRPC_CALL_SERVER_ACK_REQUEST &&
+ call->state != RXRPC_CALL_SERVER_SEND_REPLY) {
+ /* Reply phase not begun or not complete for service call. */
+ ret = -EPROTO;
} else {
ret = rxrpc_send_data(rx, call, msg, len);
}
@@ -266,67 +315,6 @@ void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code)
EXPORT_SYMBOL(rxrpc_kernel_abort_call);
/*
- * send a message through a server socket
- * - caller holds the socket locked
- */
-int rxrpc_server_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
-{
- enum rxrpc_command cmd;
- struct rxrpc_call *call;
- unsigned long user_call_ID = 0;
- u32 abort_code = 0;
- int ret;
-
- _enter("");
-
- ret = rxrpc_sendmsg_cmsg(rx, msg, &user_call_ID, &cmd, &abort_code,
- true);
- if (ret < 0)
- return ret;
-
- if (cmd == RXRPC_CMD_ACCEPT) {
- call = rxrpc_accept_call(rx, user_call_ID);
- if (IS_ERR(call))
- return PTR_ERR(call);
- rxrpc_put_call(call);
- return 0;
- }
-
- call = rxrpc_find_server_call(rx, user_call_ID);
- if (!call)
- return -EBADSLT;
- if (call->state >= RXRPC_CALL_COMPLETE) {
- ret = -ESHUTDOWN;
- goto out;
- }
-
- switch (cmd) {
- case RXRPC_CMD_SEND_DATA:
- if (call->state != RXRPC_CALL_CLIENT_SEND_REQUEST &&
- call->state != RXRPC_CALL_SERVER_ACK_REQUEST &&
- call->state != RXRPC_CALL_SERVER_SEND_REPLY) {
- /* Tx phase not yet begun for this call */
- ret = -EPROTO;
- break;
- }
-
- ret = rxrpc_send_data(rx, call, msg, len);
- break;
-
- case RXRPC_CMD_SEND_ABORT:
- rxrpc_send_abort(call, abort_code);
- break;
- default:
- BUG();
- }
-
- out:
- rxrpc_put_call(call);
- _leave(" = %d", ret);
- return ret;
-}
-
-/*
* send a packet through the transport endpoint
*/
int rxrpc_send_packet(struct rxrpc_transport *trans, struct sk_buff *skb)
* [PATCH 0/3] RxRPC: 2nd rewrite part 2
From: David Howells @ 2016-04-12 15:05 UTC
To: linux-afs; +Cc: dhowells, netdev, linux-kernel
Here's the next part of the AF_RXRPC rewrite. In this set I make some
changes to the user interface for AF_RXRPC:
(1) connect() is no longer supported on an AF_RXRPC socket. It is
redundant given that sendmsg() can be given the target address;
indeed, even on a connected client socket, sendmsg() can still be used
with an address other than the connection address.
(2) listen() is required to allow a service socket to begin accepting
incoming calls. Previously, bind() with a service ID set in the
address caused the socket to begin listening, and listen() only
adjusted the backlog parameter on the socket.
(3) The maximum backlog size can be adjusted through a sysctl - though it
is still limited to the range 4-32. At some point I would like to
have some preallocated rxrpc_call structs prepared for incoming calls,
using the backlog to limit the preallocation. Passing INT_MAX to
listen() requests the maximum allowed.
(4) Calling sendmsg() on a socket that is not yet bound shifts the socket
to be a purely client socket and binds a random local UDP port.
(5) sendmsg() with a RXRPC_ACCEPT control message must not also have an
address specified in msg_name. It doesn't make sense to supply an
address here.
(6) If sendmsg() is asked to make a call with a particular user call ID
which doesn't yet exist, the user call ID must not come into existence
whilst sendmsg() is off creating a new call. Previously it would just
add its data to the call.
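To make (5) concrete, here is a rough userspace sketch of how an accept request might be assembled with msg_name left NULL. The EX_* constants are local stand-ins assumed to match SOL_RXRPC, RXRPC_USER_CALL_ID and RXRPC_ACCEPT in the kernel headers, and build_accept_msg() is a hypothetical helper, not something this series provides.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Local stand-ins, assumed to match the kernel's SOL_RXRPC,
 * RXRPC_USER_CALL_ID and RXRPC_ACCEPT values.
 */
#define EX_SOL_RXRPC		272
#define EX_RXRPC_USER_CALL_ID	1
#define EX_RXRPC_ACCEPT		9

/* Room for one cmsg carrying the user call ID and one empty cmsg. */
union ex_accept_ctl {
	char buf[CMSG_SPACE(sizeof(unsigned long)) + CMSG_SPACE(0)];
	struct cmsghdr align;	/* forces suitable alignment of buf */
};

/* Fill in *msg so that it asks to accept a call under user_call_id. */
static void build_accept_msg(struct msghdr *msg, union ex_accept_ctl *ctl,
			     unsigned long user_call_id)
{
	struct cmsghdr *cmsg;

	memset(msg, 0, sizeof(*msg));
	memset(ctl, 0, sizeof(*ctl));	/* zeroed before walking the cmsgs */

	/* No target address: supplying one would now draw -EINVAL. */
	msg->msg_name = NULL;
	msg->msg_namelen = 0;
	msg->msg_control = ctl->buf;
	msg->msg_controllen = sizeof(ctl->buf);

	/* First control message: the user call ID to attach to the call. */
	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = EX_SOL_RXRPC;
	cmsg->cmsg_type = EX_RXRPC_USER_CALL_ID;
	cmsg->cmsg_len = CMSG_LEN(sizeof(user_call_id));
	memcpy(CMSG_DATA(cmsg), &user_call_id, sizeof(user_call_id));

	/* Second control message: the accept request itself, no payload. */
	cmsg = CMSG_NXTHDR(msg, cmsg);
	cmsg->cmsg_level = EX_SOL_RXRPC;
	cmsg->cmsg_type = EX_RXRPC_ACCEPT;
	cmsg->cmsg_len = CMSG_LEN(0);
}
```

The resulting msghdr would then be handed to sendmsg() on the listening service socket; that call is left out so the sketch does not depend on an AF_RXRPC-capable kernel.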
I would also like to consider making further changes, but I think they'd
probably be too much of a change:
(*) Require a control message of RXRPC_NEW_CALL to be passed to sendmsg()
when beginning a new call to make it clear that we're instituting a
new user call ID, not expecting the user call ID to already exist with
the socket. This would make (6) above cleaner.
(*) Provide RXRPC_LOCALLY_ABORTED and RXRPC_REMOTELY_ABORTED control
messages for recvmsg() to return instead of RXRPC_ABORT (which would
then be for sendmsg() only). Another way to do this is to return an
additional control message that, say, indicates that the termination
was remote.
(*) Allow userspace to presupply user call IDs for incoming calls to use.
These would be used instead of RXRPC_ACCEPT. A control message would
be required: one for sendmsg() to supply a user ID (RXRPC_PREACCEPT
say) and then RXRPC_NEW_CALL would be given a parameter through
recvmsg() to indicate the number of user call IDs available.
The patches can be found here also:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite
Tagged thusly:
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-rewrite-20160412
This is based on net-next/master
David
---
David Howells (3):
rxrpc: Don't permit use of connect() op and simplify sendmsg() op
rxrpc: The RXRPC_ACCEPT control message should not have an address
rxrpc: Use the listen() system call to move to listening state
Documentation/networking/rxrpc.txt | 8 -
fs/afs/rxrpc.c | 34 +++---
include/linux/rxrpc.h | 18 ++-
net/rxrpc/af_rxrpc.c | 209 ++++++++++--------------------------
net/rxrpc/ar-call.c | 158 +++++++++++----------------
net/rxrpc/ar-connection.c | 17 ---
net/rxrpc/ar-internal.h | 22 ++--
net/rxrpc/ar-output.c | 187 +++++++++++++++-----------------
net/rxrpc/misc.c | 6 +
net/rxrpc/sysctl.c | 10 ++
10 files changed, 269 insertions(+), 400 deletions(-)
* Re: TCP reaching to maximum throughput after a long time
From: Ben Greear @ 2016-04-12 15:04 UTC (permalink / raw)
To: Machani, Yaniv, netdev
Cc: Eric Dumazet, David S. Miller, Eric Dumazet, Neal Cardwell,
Yuchung Cheng, Nandita Dukkipati, open list, Kama, Meirav
In-Reply-To: <1460472764.6473.589.camel@edumazet-glaptop3.roam.corp.google.com>
On 04/12/2016 07:52 AM, Eric Dumazet wrote:
> On Tue, 2016-04-12 at 12:17 +0000, Machani, Yaniv wrote:
>> Hi,
>> After updating from Kernel 3.14 to Kernel 4.4 we have seen a TCP performance degradation over Wi-Fi.
>> In 3.14 kernel, TCP got to its max throughput after less than a second, while in the 4.4 it is taking ~20-30 seconds.
>> UDP TX/RX and TCP RX performance is as expected.
>> We are using a Beagle Bone Black and a WiLink8 device.
>>
>> Were there any related changes that might cause such behavior ?
>> Kernel configuration and sysctl values were compared, but no significant differences have been found.
If you are using 'Cubic' TCP congestion control, then please try something different.
It was broken last I checked, at least when used with the ath10k driver.
https://marc.info/?l=linux-netdev&m=144405216005715&w=2
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* [PATCH tip/core/rcu 5/6] rcu: Remove superfluous versions of rcu_read_lock_sched_held()
From: Paul E. McKenney @ 2016-04-12 15:02 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
bobby.prani, Boqun Feng, Paul E. McKenney
In-Reply-To: <20160412150157.GA19367@linux.vnet.ibm.com>
From: Boqun Feng <boqun.feng@gmail.com>
Currently, we have four versions of rcu_read_lock_sched_held(), depending
on the combined choices on PREEMPT_COUNT and DEBUG_LOCK_ALLOC. However,
there is an existing function preemptible() that already distinguishes
between the PREEMPT_COUNT=y and PREEMPT_COUNT=n cases, and allows these
four implementations to be consolidated down to two.
This commit therefore uses preemptible() to achieve this consolidation.
Note that there could be a small performance regression in the case
of CONFIG_DEBUG_LOCK_ALLOC=y && PREEMPT_COUNT=n. However, given the
overhead associated with CONFIG_DEBUG_LOCK_ALLOC=y, this should be
down in the noise.
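The equivalence being relied on can be checked with a small userspace model. This is only a sketch: the ex_* functions are hypothetical stand-ins that take the preempt count and IRQ state as plain arguments, mirroring the PREEMPT_COUNT=y definition of preemptible(). With PREEMPT_COUNT=n, preemptible() is the constant 0, so !preemptible() is always 1, which matches the stubs being removed.

```c
#include <stdbool.h>

/* Userspace model of the PREEMPT_COUNT=y definition of preemptible(). */
static bool ex_preemptible(int preempt_count, bool irqs_disabled)
{
	return preempt_count == 0 && !irqs_disabled;
}

/* The open-coded test that the patch replaces. */
static bool ex_old_check(int preempt_count, bool irqs_disabled)
{
	return preempt_count != 0 || irqs_disabled;
}

/* Returns true if both forms agree over a small range of inputs. */
static bool ex_checks_agree(void)
{
	for (int pc = 0; pc < 4; pc++)
		for (int irq = 0; irq <= 1; irq++)
			if (ex_old_check(pc, irq) != !ex_preemptible(pc, irq))
				return false;
	return true;
}
```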
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
include/linux/rcupdate.h | 17 +----------------
kernel/rcu/update.c | 4 ++--
2 files changed, 3 insertions(+), 18 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 45de591657a6..5f1533e3d032 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -508,14 +508,7 @@ int rcu_read_lock_bh_held(void);
* CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
* critical section unless it can prove otherwise.
*/
-#ifdef CONFIG_PREEMPT_COUNT
int rcu_read_lock_sched_held(void);
-#else /* #ifdef CONFIG_PREEMPT_COUNT */
-static inline int rcu_read_lock_sched_held(void)
-{
- return 1;
-}
-#endif /* #else #ifdef CONFIG_PREEMPT_COUNT */
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
@@ -532,18 +525,10 @@ static inline int rcu_read_lock_bh_held(void)
return 1;
}
-#ifdef CONFIG_PREEMPT_COUNT
static inline int rcu_read_lock_sched_held(void)
{
- return preempt_count() != 0 || irqs_disabled();
+ return !preemptible();
}
-#else /* #ifdef CONFIG_PREEMPT_COUNT */
-static inline int rcu_read_lock_sched_held(void)
-{
- return 1;
-}
-#endif /* #else #ifdef CONFIG_PREEMPT_COUNT */
-
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
#ifdef CONFIG_PROVE_RCU
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index ca828b41c938..3ccdc8eebc5a 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -67,7 +67,7 @@ static int rcu_normal_after_boot;
module_param(rcu_normal_after_boot, int, 0);
#endif /* #ifndef CONFIG_TINY_RCU */
-#if defined(CONFIG_DEBUG_LOCK_ALLOC) && defined(CONFIG_PREEMPT_COUNT)
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
/**
* rcu_read_lock_sched_held() - might we be in RCU-sched read-side critical section?
*
@@ -111,7 +111,7 @@ int rcu_read_lock_sched_held(void)
return 0;
if (debug_locks)
lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
- return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
+ return lockdep_opinion || !preemptible();
}
EXPORT_SYMBOL(rcu_read_lock_sched_held);
#endif
--
2.5.2
* [PATCH tip/core/rcu 1/6] rcu: Consolidate dumping of ftrace buffer
From: Paul E. McKenney @ 2016-04-12 15:02 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
bobby.prani, Paul E. McKenney
In-Reply-To: <20160412150157.GA19367@linux.vnet.ibm.com>
This commit consolidates a couple of definitions and several calls for
single-shot ftrace-buffer dumping.
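The once-per-callsite pattern can be tried out in userspace with C11 atomics. This is a sketch of the same idea, not the kernel macro: EX_DO_ONCE(), ex_fake_dump() and the two callsite functions are made-up names, and the dump is replaced by a counter.

```c
#include <stdatomic.h>

/* Run fn() at most once for each place this macro is expanded. */
#define EX_DO_ONCE(fn)						\
do {								\
	static atomic_int ex_been_here;				\
								\
	if (!atomic_load(&ex_been_here) &&			\
	    !atomic_exchange(&ex_been_here, 1))			\
		fn();						\
} while (0)

static int ex_dump_count;

/* Stand-in for ftrace_dump(); it only counts invocations. */
static void ex_fake_dump(void)
{
	ex_dump_count++;
}

/* First callsite, reached 'times' times. */
static void ex_callsite_a(int times)
{
	for (int i = 0; i < times; i++)
		EX_DO_ONCE(ex_fake_dump);
}

/* Second callsite, with its own independent static flag. */
static void ex_callsite_b(int times)
{
	for (int i = 0; i < times; i++)
		EX_DO_ONCE(ex_fake_dump);
}
```

The plain load in front of the exchange keeps the common already-fired case to a read, so repeated hits do not keep writing to the flag.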
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
include/linux/rcupdate.h | 13 +++++++++++++
kernel/rcu/rcutorture.c | 17 +++--------------
kernel/rcu/tree.c | 4 ++--
3 files changed, 18 insertions(+), 16 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 2657aff2725b..45de591657a6 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -1144,4 +1144,17 @@ static inline void rcu_sysidle_force_exit(void)
#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
+/*
+ * Dump the ftrace buffer, but only one time per callsite per boot.
+ */
+#define rcu_ftrace_dump(oops_dump_mode) \
+do { \
+ static atomic_t ___rfd_beenhere = ATOMIC_INIT(0); \
+ \
+ if (!atomic_read(&___rfd_beenhere) && \
+ !atomic_xchg(&___rfd_beenhere, 1)) \
+ ftrace_dump(oops_dump_mode); \
+} while (0)
+
+
#endif /* __LINUX_RCUPDATE_H */
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 250ea67c1615..463867c43221 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1082,17 +1082,6 @@ rcu_torture_fakewriter(void *arg)
return 0;
}
-static void rcutorture_trace_dump(void)
-{
- static atomic_t beenhere = ATOMIC_INIT(0);
-
- if (atomic_read(&beenhere))
- return;
- if (atomic_xchg(&beenhere, 1) != 0)
- return;
- ftrace_dump(DUMP_ALL);
-}
-
/*
* RCU torture reader from timer handler. Dereferences rcu_torture_current,
* incrementing the corresponding element of the pipeline array. The
@@ -1142,7 +1131,7 @@ static void rcu_torture_timer(unsigned long unused)
if (pipe_count > 1) {
do_trace_rcu_torture_read(cur_ops->name, &p->rtort_rcu, ts,
started, completed);
- rcutorture_trace_dump();
+ rcu_ftrace_dump(DUMP_ALL);
}
__this_cpu_inc(rcu_torture_count[pipe_count]);
completed = completed - started;
@@ -1215,7 +1204,7 @@ rcu_torture_reader(void *arg)
if (pipe_count > 1) {
do_trace_rcu_torture_read(cur_ops->name, &p->rtort_rcu,
ts, started, completed);
- rcutorture_trace_dump();
+ rcu_ftrace_dump(DUMP_ALL);
}
__this_cpu_inc(rcu_torture_count[pipe_count]);
completed = completed - started;
@@ -1333,7 +1322,7 @@ rcu_torture_stats_print(void)
rcu_torture_writer_state,
gpnum, completed, flags);
show_rcu_gp_kthreads();
- rcutorture_trace_dump();
+ rcu_ftrace_dump(DUMP_ALL);
}
rtcv_snap = rcu_torture_current_version;
}
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9a535a86e732..531a328076bd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -637,7 +637,7 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
idle_task(smp_processor_id());
trace_rcu_dyntick(TPS("Error on entry: not idle task"), oldval, 0);
- ftrace_dump(DUMP_ORIG);
+ rcu_ftrace_dump(DUMP_ORIG);
WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
current->pid, current->comm,
idle->pid, idle->comm); /* must be idle task! */
@@ -799,7 +799,7 @@ static void rcu_eqs_exit_common(long long oldval, int user)
trace_rcu_dyntick(TPS("Error on exit: not idle task"),
oldval, rdtp->dynticks_nesting);
- ftrace_dump(DUMP_ORIG);
+ rcu_ftrace_dump(DUMP_ORIG);
WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
current->pid, current->comm,
idle->pid, idle->comm); /* must be idle task! */
--
2.5.2
* [PATCH tip/core/rcu 0/6] Miscellaneous fixes for 4.7
From: Paul E. McKenney @ 2016-04-12 15:01 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
bobby.prani
Hello!
This series provides miscellaneous fixes for RCU:
1. Consolidate RCU's single-shot dumping of the ftrace buffer.
This is a bit widespread, so the upcoming documentation,
expedited-grace-period, and torture patchsets are also based
on this patch.
2. Awaken grace-period kthread at RCU CPU stall-warning time.
3. Move the FQS timeout farther into the future only if the current
wakeup actually resulted in an FQS scan.
4. Awaken grace-period kthread if too long since FQS.
5. Remove superfluous versions of rcu_read_lock_sched_held(),
courtesy of Boqun Feng.
6. Dump ftrace buffer when kicking grace-period kthread.
Thanx, Paul
------------------------------------------------------------------------
include/linux/rcupdate.h | 30 +++++++++----------
kernel/rcu/rcutorture.c | 17 +----------
kernel/rcu/tree.c | 71 ++++++++++++++++++++++++++++++++++++++---------
kernel/rcu/tree.h | 2 +
kernel/rcu/update.c | 4 +-
5 files changed, 79 insertions(+), 45 deletions(-)
* [PATCH tip/core/rcu 4/6] rcu: Awaken grace-period kthread if too long since FQS
From: Paul E. McKenney @ 2016-04-12 15:02 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
bobby.prani, Paul E. McKenney
In-Reply-To: <20160412150157.GA19367@linux.vnet.ibm.com>
Recent kernels can fail to awaken the grace-period kthread for
quiescent-state forcing. This commit is a crude hack that does
a wakeup if a scheduling-clock interrupt sees that it has been
too long since force-quiescent-state (FQS) processing.
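The deadline handling can be modelled in userspace as follows. This is a sketch under stated assumptions: EX_HZ is an arbitrary tick rate, struct ex_state is a made-up stand-in for the relevant rcu_state field, and the wakeup is replaced by a counter. ex_time_after() follows the same signed-difference idea as the kernel's time_after(), so it keeps working across counter wraparound.

```c
#include <stdbool.h>

#define EX_HZ 1000UL	/* assumed tick rate, for illustration only */

/* True if a is later than b, even when the counter has wrapped. */
static bool ex_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

struct ex_state {
	unsigned long kick_deadline;	/* models rsp->jiffies_kick_kthreads */
	unsigned int kicks;		/* number of wakeups issued */
};

/* Kick once the deadline has passed, then push the deadline out by EX_HZ. */
static void ex_maybe_kick(struct ex_state *st, unsigned long now)
{
	unsigned long j = st->kick_deadline;

	if (ex_time_after(now, j)) {
		st->kicks++;		/* stands in for wake_up_process() */
		st->kick_deadline = j + EX_HZ;
	}
}
```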
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
kernel/rcu/tree.c | 39 +++++++++++++++++++++++++++++++++++++--
kernel/rcu/tree.h | 2 ++
2 files changed, 39 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6116cfad18ff..a739292be605 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -385,9 +385,11 @@ module_param(qlowmark, long, 0444);
static ulong jiffies_till_first_fqs = ULONG_MAX;
static ulong jiffies_till_next_fqs = ULONG_MAX;
+static bool rcu_kick_kthreads;
module_param(jiffies_till_first_fqs, ulong, 0644);
module_param(jiffies_till_next_fqs, ulong, 0644);
+module_param(rcu_kick_kthreads, bool, 0644);
/*
* How long the grace period must be before we start recruiting
@@ -1251,6 +1253,24 @@ static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
}
}
+/*
+ * If too much time has passed in the current grace period, and if
+ * so configured, go kick the relevant kthreads.
+ */
+static void rcu_stall_kick_kthreads(struct rcu_state *rsp)
+{
+ unsigned long j;
+
+ if (!rcu_kick_kthreads)
+ return;
+ j = READ_ONCE(rsp->jiffies_kick_kthreads);
+ if (time_after(jiffies, j) && rsp->gp_kthread) {
+ WARN_ONCE(1, "Kicking %s grace-period kthread\n", rsp->name);
+ wake_up_process(rsp->gp_kthread);
+ WRITE_ONCE(rsp->jiffies_kick_kthreads, j + HZ);
+ }
+}
+
static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
{
int cpu;
@@ -1262,6 +1282,11 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
struct rcu_node *rnp = rcu_get_root(rsp);
long totqlen = 0;
+ /* Kick and suppress, if so configured. */
+ rcu_stall_kick_kthreads(rsp);
+ if (rcu_cpu_stall_suppress)
+ return;
+
/* Only let one CPU complain about others per time interval. */
raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -1335,6 +1360,11 @@ static void print_cpu_stall(struct rcu_state *rsp)
struct rcu_node *rnp = rcu_get_root(rsp);
long totqlen = 0;
+ /* Kick and suppress, if so configured. */
+ rcu_stall_kick_kthreads(rsp);
+ if (rcu_cpu_stall_suppress)
+ return;
+
/*
* OK, time to rat on ourselves...
* See Documentation/RCU/stallwarn.txt for info on how to debug
@@ -1379,8 +1409,10 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
unsigned long js;
struct rcu_node *rnp;
- if (rcu_cpu_stall_suppress || !rcu_gp_in_progress(rsp))
+ if ((rcu_cpu_stall_suppress && !rcu_kick_kthreads) ||
+ !rcu_gp_in_progress(rsp))
return;
+ rcu_stall_kick_kthreads(rsp);
j = jiffies;
/*
@@ -2119,8 +2151,11 @@ static int __noreturn rcu_gp_kthread(void *arg)
}
ret = 0;
for (;;) {
- if (!ret)
+ if (!ret) {
rsp->jiffies_force_qs = jiffies + j;
+ WRITE_ONCE(rsp->jiffies_kick_kthreads,
+ jiffies + 3 * j);
+ }
trace_rcu_grace_period(rsp->name,
READ_ONCE(rsp->gpnum),
TPS("fqswait"));
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index df668c0f9e64..34d3973f7223 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -513,6 +513,8 @@ struct rcu_state {
unsigned long jiffies_force_qs; /* Time at which to invoke */
/* force_quiescent_state(). */
+ unsigned long jiffies_kick_kthreads; /* Time at which to kick */
+ /* kthreads, if configured. */
unsigned long n_force_qs; /* Number of calls to */
/* force_quiescent_state(). */
unsigned long n_force_qs_lh; /* ~Number of calls leaving */
--
2.5.2
* [PATCH tip/core/rcu 6/6] rcu: Dump ftrace buffer when kicking grace-period kthread
From: Paul E. McKenney @ 2016-04-12 15:02 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
bobby.prani, Paul E. McKenney
In-Reply-To: <20160412150157.GA19367@linux.vnet.ibm.com>
If it is necessary to kick the grace-period kthread, that is a good
time to dump the trace buffer in order to learn why kicking was needed.
This commit therefore does the dump.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
kernel/rcu/tree.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a739292be605..86edb92276d3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1266,6 +1266,7 @@ static void rcu_stall_kick_kthreads(struct rcu_state *rsp)
j = READ_ONCE(rsp->jiffies_kick_kthreads);
if (time_after(jiffies, j) && rsp->gp_kthread) {
WARN_ONCE(1, "Kicking %s grace-period kthread\n", rsp->name);
+ rcu_ftrace_dump(DUMP_ALL);
wake_up_process(rsp->gp_kthread);
WRITE_ONCE(rsp->jiffies_kick_kthreads, j + HZ);
}
--
2.5.2
* [PATCH tip/core/rcu 3/6] rcu: Make FQS schedule advance only if FQS happened
From: Paul E. McKenney @ 2016-04-12 15:02 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
bobby.prani, Paul E. McKenney
In-Reply-To: <20160412150157.GA19367@linux.vnet.ibm.com>
Currently, the force-quiescent-state (FQS) code in rcu_gp_kthread() can
advance the next FQS even if one was not executed last time. This can
happen due to timeout-duration uncertainty. This commit therefore avoids
advancing the FQS schedule unless an FQS was just executed. In the
corner case where an FQS was not executed, but is due now, the code does
a one-jiffy wait.
This change prepares for kthread kicking.
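The two timing paths can be sketched in userspace like this. The helper names and EX_HZ are assumptions made for illustration; the logic mirrors the hunks in this patch, with the jiffies counter and rsp->jiffies_force_qs passed in as plain arguments.

```c
#define EX_HZ 1000UL	/* assumed tick rate, for illustration only */

/* Wraparound-safe "a is later than b". */
static int ex_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* After a real FQS: clamp the configured interval into [1, EX_HZ]. */
static unsigned long ex_clamp_fqs_interval(unsigned long j)
{
	if (j > EX_HZ)
		return EX_HZ;
	if (j < 1)
		return 1;
	return j;
}

/* After a stray wakeup: wait only for what is left of the old interval. */
static unsigned long ex_remaining_wait(unsigned long now,
				       unsigned long force_qs)
{
	if (ex_time_after(now, force_qs))
		return 1;		/* already overdue: one-jiffy wait */
	return force_qs - now;
}
```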
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
kernel/rcu/tree.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a327a253c178..6116cfad18ff 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2146,6 +2146,15 @@ static int __noreturn rcu_gp_kthread(void *arg)
TPS("fqsend"));
cond_resched_rcu_qs();
WRITE_ONCE(rsp->gp_activity, jiffies);
+ ret = 0; /* Force full wait till next FQS. */
+ j = jiffies_till_next_fqs;
+ if (j > HZ) {
+ j = HZ;
+ jiffies_till_next_fqs = HZ;
+ } else if (j < 1) {
+ j = 1;
+ jiffies_till_next_fqs = 1;
+ }
} else {
/* Deal with stray signal. */
cond_resched_rcu_qs();
@@ -2154,14 +2163,12 @@ static int __noreturn rcu_gp_kthread(void *arg)
trace_rcu_grace_period(rsp->name,
READ_ONCE(rsp->gpnum),
TPS("fqswaitsig"));
- }
- j = jiffies_till_next_fqs;
- if (j > HZ) {
- j = HZ;
- jiffies_till_next_fqs = HZ;
- } else if (j < 1) {
- j = 1;
- jiffies_till_next_fqs = 1;
+ ret = 1; /* Keep old FQS timing. */
+ j = jiffies;
+ if (time_after(jiffies, rsp->jiffies_force_qs))
+ j = 1;
+ else
+ j = rsp->jiffies_force_qs - j;
}
}
--
2.5.2
* [PATCH tip/core/rcu 2/6] rcu: Awaken grace-period kthread when stalled
From: Paul E. McKenney @ 2016-04-12 15:02 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
bobby.prani, Paul E. McKenney
In-Reply-To: <20160412150157.GA19367@linux.vnet.ibm.com>
Recent kernels can fail to awaken the grace-period kthread for
quiescent-state forcing. This commit is a crude hack that does
a wakeup any time a stall is detected.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
kernel/rcu/tree.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 531a328076bd..a327a253c178 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1224,8 +1224,10 @@ static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
rsp->gp_flags,
gp_state_getname(rsp->gp_state), rsp->gp_state,
rsp->gp_kthread ? rsp->gp_kthread->state : ~0);
- if (rsp->gp_kthread)
+ if (rsp->gp_kthread) {
sched_show_task(rsp->gp_kthread);
+ wake_up_process(rsp->gp_kthread);
+ }
}
}
--
2.5.2
* Re: [PATCH 2/2] arm64: mm: make pfn always valid with flat memory
From: Catalin Marinas @ 2016-04-12 15:00 UTC (permalink / raw)
To: Xishi Qiu
Cc: Chen Feng, mark.rutland, dan.zhao, mhocko, puck.chen,
ard.biesheuvel, suzhuangluan, will.deacon, linux-kernel, linuxarm,
linux-mm, kirill.shutemov, rientjes, oliver.fu, akpm,
robin.murphy, yudongbin, linux-arm-kernel, saberlily.xia
In-Reply-To: <570B85B6.8000805@huawei.com>
On Mon, Apr 11, 2016 at 07:08:38PM +0800, Xishi Qiu wrote:
> On 2016/4/5 16:22, Chen Feng wrote:
>
> > Make the pfn always valid when using flat memory.
> > If the reserved memory is not aligned to memblock-size,
> > there will be holes in zone.
> >
> > This patch makes the memory in buddy always in the
> > array of mem-map.
> >
> > Signed-off-by: Chen Feng <puck.chen@hisilicon.com>
> > Signed-off-by: Fu Jun <oliver.fu@hisilicon.com>
> > ---
> > arch/arm64/mm/init.c | 7 ++++---
> > 1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index ea989d8..0e1d5b7 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -306,7 +306,8 @@ static void __init free_unused_memmap(void)
>
> How about letting free_unused_memmap() support CONFIG_SPARSEMEM_VMEMMAP?
We would need extra care to check that the memmap was actually allocated
in the first place.
--
Catalin
* Re: [PATCH 1/2] arm64: mem-model: add flatmem model for arm64
From: Catalin Marinas @ 2016-04-12 14:59 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Chen Feng, Mark Rutland, Dan Zhao, mhocko, Yiping Xu, puck.chen,
linux-mm@kvack.org, suzhuangluan, Will Deacon,
linux-kernel@vger.kernel.org, linuxarm, albert.lubing,
linux-arm-kernel@lists.infradead.org, David Rientjes, oliver.fu,
Andrew Morton, Laura Abbott, robin.murphy, kirill.shutemov,
saberlily.xia
In-Reply-To: <CAKv+Gu-cWWUi6fCiveqaZRVhGCpEasCLEs7wq6t+C-x65g4cgQ@mail.gmail.com>
On Mon, Apr 11, 2016 at 12:31:53PM +0200, Ard Biesheuvel wrote:
> On 11 April 2016 at 11:59, Chen Feng <puck.chen@hisilicon.com> wrote:
> > On 2016/4/11 16:00, Ard Biesheuvel wrote:
> >> On 11 April 2016 at 09:55, Chen Feng <puck.chen@hisilicon.com> wrote:
> >>> On 2016/4/11 15:35, Ard Biesheuvel wrote:
> >>>> On 11 April 2016 at 04:49, Chen Feng <puck.chen@hisilicon.com> wrote:
> >>>>> 0 1.5G 2G 3.5G 4G
> >>>>> | | | | |
> >>>>> +--------------+------+---------------+--------------+
> >>>>> | MEM | hole | MEM | IO (regs) |
> >>>>> +--------------+------+---------------+--------------+
> >>> The hole in 1.5G ~ 2G is also allocated mem-map array. And also with the 3.5G ~ 4G.
> >>>
> >>
> >> No, it is not. It may be covered by a section, but that does not mean
> >> sparsemem vmemmap will actually allocate backing for it. The
> >> granularity used by sparsemem vmemmap on a 4k pages kernel is 128 MB,
> >> due to the fact that the backing is performed at PMD granularity.
> >>
> >> Please, could you share the contents of the vmemmap section in
> >> /sys/kernel/debug/kernel_page_tables of your system running with
> >> sparsemem vmemmap enabled? You will need to set CONFIG_ARM64_PTDUMP=y
> >
> > Please see the pg-tables below.
> >
> > With sparse and vmemmap enable.
> >
> > ---[ vmemmap start ]---
> > 0xffffffbdc0200000-0xffffffbdc4800000 70M RW NX SHD AF UXN MEM/NORMAL
> > ---[ vmemmap end ]---
[...]
> > The board is 4GB, and the memmap is 70MB
> > 1G memory --- 14MB mem_map array.
>
> No, this is incorrect. 1 GB corresponds with 16 MB worth of struct
> pages assuming sizeof(struct page) == 64
>
> So you are losing 6 MB to rounding here, which I agree is significant.
> I wonder if it makes sense to use a lower value for SECTION_SIZE_BITS
> on 4k pages kernels, but perhaps we're better off asking the opinion
> of the other cc'ees.
IIRC, SECTION_SIZE_BITS was chosen to be the maximum sane value we were
thinking of at the time, assuming 1GB RAM alignment to be fairly
normal. For the !SPARSEMEM_VMEMMAP case, we should probably be fine with
29 but, as Will said, we need to be careful with the page flags. At a
quick look, we have 25 page flags, 2 bits per zone, NUMA nodes and (48 -
section_size_bits) for the section width. We also need to take into
account 4 more bits for 52-bit PA support (ARMv8.2). So, without NUMA
nodes, we are currently at 49 bits used in page->flags.
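The arithmetic above, together with the earlier 16 MB per 1 GB figure, can be reproduced with a short sketch. The EX_* constants come from the numbers quoted in this thread; the current SECTION_SIZE_BITS value of 30 is an assumption, inferred from the 1GB alignment remark and from the fact that it yields the 49-bit total.

```c
#include <stdint.h>

#define EX_NR_PAGEFLAGS		25	/* page flags, as counted above */
#define EX_ZONES_WIDTH		2	/* zone bits */
#define EX_PA_BITS		48	/* physical address bits */
#define EX_EXTRA_PA_BITS	4	/* extra bits for 52-bit PA */

/* Bits of page->flags consumed when no NUMA node bits are needed. */
static unsigned int ex_flags_bits_used(unsigned int section_size_bits)
{
	unsigned int section_width = EX_PA_BITS - section_size_bits;

	return EX_NR_PAGEFLAGS + EX_ZONES_WIDTH + section_width +
	       EX_EXTRA_PA_BITS;
}

/* Bytes of struct page needed to describe mem_bytes of RAM. */
static uint64_t ex_memmap_bytes(uint64_t mem_bytes, unsigned int page_shift,
				unsigned int struct_page_size)
{
	return (mem_bytes >> page_shift) * struct_page_size;
}
```

With these inputs, a section size of 30 bits gives 25 + 2 + 18 + 4 = 49, and dropping to 29 costs one more bit. For 1 GB of RAM with 4k pages and a 64-byte struct page, the memmap comes to 262144 * 64 bytes, i.e. 16 MB.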
For the SPARSEMEM_VMEMMAP case, we can decrease the SECTION_SIZE_BITS in
the MAX_ORDER limit.
An alternative would be to free the vmemmap holes later (but still keep
the vmemmap mapping alias). Yet another option would be to change the
sparse_mem_map_populate() logic to get the actual section end rather than
always assuming PAGES_PER_SECTION. But I don't think any of these are
worth it if we can safely reduce SECTION_SIZE_BITS.
--
Catalin
* Re: [PATCH] watchdog: dw_wdt: dont build for avr32
From: Andy Shevchenko @ 2016-04-12 14:58 UTC (permalink / raw)
To: Sudip Mukherjee
Cc: Guenter Roeck, Wim Van Sebroeck, Haavard Skinnemoen,
Hans-Christian Egtvedt, linux-kernel@vger.kernel.org,
kernel-testers, linux-watchdog
In-Reply-To: <570CFA90.4060100@gmail.com>
On Tue, Apr 12, 2016 at 4:39 PM, Sudip Mukherjee
<sudipm.mukherjee@gmail.com> wrote:
> On Tuesday 12 April 2016 06:36 PM, Guenter Roeck wrote:
>>
>> On 04/11/2016 10:51 PM, Sudip Mukherjee wrote:
>>>
>>> The build of avr32 allmodconfig fails with the error:
>>> ERROR: "__avr32_udiv64" [drivers/watchdog/kempld_wdt.ko] undefined!
>>>
>> This means there is a direct 64 bit divide operation in the driver,
>> which we should identify and fix.
This driver will quite likely never be used on AVR32. Do we need to
fix this due to some other architectures?
> yes, there is.
>
> in function: kempld_wdt_set_stage_timeout()
> remainder = do_div(stage_timeout64, prescaler);
> Any idea how to fix it?
Not easy; however, the prescaler value is ((1 << 21) - 1), which one
might treat as (1 << 21) at the cost of some precision.
Thus: shift right by 20 bits, add the last bit to the value, and shift
right by 1 more bit.
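One possible reading of that suggestion, as a userspace sketch: the function names are made up, and the exact variant exists only so the approximation can be compared on a host where 64-bit division is available.

```c
#include <stdint.h>

/* Approximate x / ((1 << 21) - 1) by a rounded x / (1 << 21). */
static uint64_t ex_div_prescaler_approx(uint64_t x)
{
	uint64_t t = x >> 20;	/* keep one extra bit below the result */

	t += t & 1;		/* round up if that extra bit is set */
	return t >> 1;
}

/* Reference result using a real 64-bit divide (host-side comparison only). */
static uint64_t ex_div_prescaler_exact(uint64_t x)
{
	return x / ((UINT64_C(1) << 21) - 1);
}
```

The approximate version uses only shifts and an add, so it would not pull in a 64-bit division helper. Whether the lost precision is acceptable for the watchdog timeout is a separate question that this sketch does not answer.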
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH RFT 2/2] macb: kill PHY reset code
From: Nicolas Ferre @ 2016-04-12 14:57 UTC (permalink / raw)
To: Sergei Shtylyov, Andrew Lunn; +Cc: netdev, linux-kernel, Gregory CLEMENT
In-Reply-To: <570CFE13.3040100@cogentembedded.com>
Le 12/04/2016 15:54, Sergei Shtylyov a écrit :
> Hello.
>
> On 4/12/2016 12:22 PM, Nicolas Ferre wrote:
>
>>>> With the 'phylib' now being aware of the "reset-gpios" PHY node property,
>>>> there should be no need to frob the PHY reset in this driver anymore...
>>>>
>>>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
>>>>
>>>> ---
>>>> drivers/net/ethernet/cadence/macb.c | 17 -----------------
>>>> drivers/net/ethernet/cadence/macb.h | 1 -
>>>> 2 files changed, 18 deletions(-)
>>>>
>>>> Index: net-next/drivers/net/ethernet/cadence/macb.c
>>>> ===================================================================
>>>> --- net-next.orig/drivers/net/ethernet/cadence/macb.c
>>>> +++ net-next/drivers/net/ethernet/cadence/macb.c
> [...]
>>>> @@ -2977,18 +2976,6 @@ static int macb_probe(struct platform_de
>>>> else
>>>> macb_get_hwaddr(bp);
>>>>
>>>> - /* Power up the PHY if there is a GPIO reset */
>>>> - phy_node = of_get_next_available_child(np, NULL);
>>>> - if (phy_node) {
>>>> - int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
>>>> -
>>>> - if (gpio_is_valid(gpio)) {
>>>> - bp->reset_gpio = gpio_to_desc(gpio);
>>>> - gpiod_direction_output(bp->reset_gpio, 1);
>>>
>>> Hi Sergei
>>>
>>> The code you are deleting would have ignored the flags in the gpio
>>
>> I don't parse this.
>
>> The code deleted does take the flag into account.
>
> Not really -- you need to call of_get_named_gpio_flags() (with a valid
> last argument) for that.
Yep,
>> And the DT property
>> associated to it seems correct to me (I mean, with proper flag
>> specification).
>
> It apparently is not as it has GPIO_ACTIVE_HIGH and the driver assumes
> active-low reset signal.
Yes, logic was inverted and... anyway, the flag was never used for real...
Thanks Sergei.
No problem for me accepting a patch for the at91-vinco.dts.
Bye,
--
Nicolas Ferre
* Re: TCP reaching to maximum throughput after a long time
From: Eric Dumazet @ 2016-04-12 14:52 UTC (permalink / raw)
To: Machani, Yaniv, netdev
Cc: David S. Miller, Eric Dumazet, Neal Cardwell, Yuchung Cheng,
Nandita Dukkipati, open list, Kama, Meirav
In-Reply-To: <AE1C82FB3D0EC64DB1F752C81CBD110139100057@DFRE01.ent.ti.com>
On Tue, 2016-04-12 at 12:17 +0000, Machani, Yaniv wrote:
> Hi,
> After updating from Kernel 3.14 to Kernel 4.4 we have seen a TCP performance degradation over Wi-Fi.
> In 3.14 kernel, TCP got to its max throughput after less than a second, while in the 4.4 it is taking ~20-30 seconds.
> UDP TX/RX and TCP RX performance is as expected.
> We are using a Beagle Bone Black and a WiLink8 device.
>
> Were there any related changes that might cause such behavior ?
> Kernel configuration and sysctl values were compared, but no significant differences have been found.
>
> See a log of the behavior below :
> -----------------------------------------------------------
> Client connecting to 10.2.46.5, TCP port 5001
> TCP window size: 320 KByte (WARNING: requested 256 KByte)
> ------------------------------------------------------------
> [ 3] local 10.2.46.6 port 49282 connected with 10.2.46.5 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0- 1.0 sec 5.75 MBytes 48.2 Mbits/sec
> [ 3] 1.0- 2.0 sec 6.50 MBytes 54.5 Mbits/sec
> [ 3] 2.0- 3.0 sec 6.50 MBytes 54.5 Mbits/sec
> [ 3] 3.0- 4.0 sec 6.50 MBytes 54.5 Mbits/sec
> [ 3] 4.0- 5.0 sec 6.75 MBytes 56.6 Mbits/sec
> [ 3] 5.0- 6.0 sec 3.38 MBytes 28.3 Mbits/sec
> [ 3] 6.0- 7.0 sec 6.38 MBytes 53.5 Mbits/sec
> [ 3] 7.0- 8.0 sec 6.88 MBytes 57.7 Mbits/sec
> [ 3] 8.0- 9.0 sec 7.12 MBytes 59.8 Mbits/sec
> [ 3] 9.0-10.0 sec 7.12 MBytes 59.8 Mbits/sec
> [ 3] 10.0-11.0 sec 7.12 MBytes 59.8 Mbits/sec
> [ 3] 11.0-12.0 sec 7.25 MBytes 60.8 Mbits/sec
> [ 3] 12.0-13.0 sec 7.12 MBytes 59.8 Mbits/sec
> [ 3] 13.0-14.0 sec 7.25 MBytes 60.8 Mbits/sec
> [ 3] 14.0-15.0 sec 7.62 MBytes 64.0 Mbits/sec
> [ 3] 15.0-16.0 sec 7.88 MBytes 66.1 Mbits/sec
> [ 3] 16.0-17.0 sec 8.12 MBytes 68.2 Mbits/sec
> [ 3] 17.0-18.0 sec 8.25 MBytes 69.2 Mbits/sec
> [ 3] 18.0-19.0 sec 8.50 MBytes 71.3 Mbits/sec
> [ 3] 19.0-20.0 sec 8.88 MBytes 74.4 Mbits/sec
> [ 3] 20.0-21.0 sec 8.75 MBytes 73.4 Mbits/sec
> [ 3] 21.0-22.0 sec 8.62 MBytes 72.4 Mbits/sec
> [ 3] 22.0-23.0 sec 8.75 MBytes 73.4 Mbits/sec
> [ 3] 23.0-24.0 sec 8.50 MBytes 71.3 Mbits/sec
> [ 3] 24.0-25.0 sec 8.62 MBytes 72.4 Mbits/sec
> [ 3] 25.0-26.0 sec 8.62 MBytes 72.4 Mbits/sec
> [ 3] 26.0-27.0 sec 8.62 MBytes 72.4 Mbits/sec
>
CC netdev, where this is better discussed.
This could be due to a lot of different factors: a sender
problem, a receiver problem, ...
TCP behavior depends on the drivers, so maybe a change there can explain
this.
You could capture the first 5000 frames of the flow and post the pcap ?
(-s 128 to capture only the headers)
tcpdump -p -s 128 -i eth0 -c 5000 host 10.2.46.5 -w flow.pcap
Also, while the test is running, you could fetch
ss -temoi dst 10.2.46.5:5001
* Re: mmotm woes, mainly compaction
From: Michal Hocko @ 2016-04-12 14:51 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Andrew Morton, Vlastimil Babka, linux-kernel, linux-mm
In-Reply-To: <20160412121020.GC10771@dhcp22.suse.cz>
On Tue 12-04-16 14:10:20, Michal Hocko wrote:
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6d1da0ceaf1e..d80c9755ffc7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3030,8 +3030,8 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
> * failure could be caused by weak migration mode.
> */
> if (compaction_failed(compact_result)) {
> - if (*migrate_mode == MIGRATE_ASYNC) {
> - *migrate_mode = MIGRATE_SYNC_LIGHT;
> + if (*migrate_mode < MIGRATE_SYNC) {
> + *migrate_mode++;
> return true;
This should be (*migrate_mode)++ of course; *migrate_mode++ increments the
pointer, not the mode it points to.
> }
> return false;
--
Michal Hocko
SUSE Labs
^ permalink raw reply
* Re: [RFC PATCH 12/12] IMA: Use the the system trusted keyrings instead of .ima_mok [ver #4]
From: Mimi Zohar @ 2016-04-12 14:50 UTC (permalink / raw)
To: David Howells; +Cc: linux-security-module, keyrings, linux-kernel
In-Reply-To: <20160407085922.29311.38135.stgit@warthog.procyon.org.uk>
On Thu, 2016-04-07 at 09:59 +0100, David Howells wrote:
> Add a config option (IMA_KEYRINGS_PERMIT_SIGNED_BY_BUILTIN_OR_SECONDARY)
> that, when enabled, allows keys to be added to the IMA keyrings by
> userspace - with the restriction that each must be signed by a key in the
> system trusted keyrings.
>
> EPERM will be returned if this option is disabled, ENOKEY will be returned if
> no authoritative key can be found and EKEYREJECTED will be returned if the
> signature doesn't match. Other errors such as ENOPKG may also be returned.
>
> If this new option is enabled, the builtin system keyring is searched, as is
> the secondary system keyring if that is also enabled. Intermediate keys
> between the builtin system keyring and the key being added can be added to
> the secondary keyring (which replaces .ima_mok) to form a trust chain -
> provided they are also validly signed by a key in one of the trusted keyrings.
>
> The .ima_mok keyring is then removed and the IMA blacklist keyring gets its
> own config option (IMA_BLACKLIST_KEYRING).
>
> Signed-off-by: David Howells <dhowells@redhat.com>
Eventually we'll want to be able to load keys directly on the IMA
keyring, but until that code is added, this is good. Thanks!
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Mimi
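For readers following along, the lookup order described in the changelog can
be sketched in userspace C. This is only a toy model under stated assumptions:
the types, names and error constants are hypothetical, and the fallback rule
(try the second keyring only when the first reports "no key") is modelled on
the restrict_link_by_ima_mok() helper that the quoted diff removes, not on the
actual restrict_link_by_builtin_and_secondary_trusted() implementation:

```c
/* Toy error codes; the kernel would use -ENOKEY and -EKEYREJECTED. */
enum { TOY_OK = 0, TOY_ENOKEY = -1, TOY_EKEYREJECTED = -2 };

/* Hypothetical keyring: a fixed array of trusted signer IDs. */
struct toy_keyring {
	const int *signers;
	int nr_signers;
};

/* Hypothetical key: who signed it and whether the signature verifies. */
struct toy_key {
	int signer_id;
	int sig_valid;
};

/* Check one keyring; a missing (NULL) keyring behaves like an empty one. */
int toy_restrict_link(const struct toy_keyring *ring,
		      const struct toy_key *key)
{
	int i;

	if (!ring)
		return TOY_ENOKEY;

	for (i = 0; i < ring->nr_signers; i++) {
		if (ring->signers[i] == key->signer_id)
			return key->sig_valid ? TOY_OK : TOY_EKEYREJECTED;
	}
	return TOY_ENOKEY;
}

/* Try the built-in keyring first, fall back to the secondary on "no key". */
int toy_restrict_link_builtin_and_secondary(const struct toy_keyring *builtin,
					    const struct toy_keyring *secondary,
					    const struct toy_key *key)
{
	int ret = toy_restrict_link(builtin, key);

	if (ret != TOY_ENOKEY)
		return ret;
	return toy_restrict_link(secondary, key);
}
```

In this model a key signed by an unknown signer ends up with the "no key"
result, while a known signer with a bad signature is rejected without
consulting the second keyring.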
> ---
>
> include/keys/system_keyring.h | 13 ++-----------
> security/integrity/digsig.c | 30 ++++--------------------------
> security/integrity/ima/Kconfig | 35 ++++++++++++++++++++++-------------
> security/integrity/ima/Makefile | 2 +-
> security/integrity/ima/ima_mok.c | 17 ++++-------------
> 5 files changed, 33 insertions(+), 64 deletions(-)
>
> diff --git a/include/keys/system_keyring.h b/include/keys/system_keyring.h
> index 614424029de7..fbd4647767e9 100644
> --- a/include/keys/system_keyring.h
> +++ b/include/keys/system_keyring.h
> @@ -33,28 +33,19 @@ extern int restrict_link_by_builtin_and_secondary_trusted(
> #define restrict_link_by_builtin_and_secondary_trusted restrict_link_by_builtin_trusted
> #endif
>
> -#ifdef CONFIG_IMA_MOK_KEYRING
> -extern struct key *ima_mok_keyring;
> +#ifdef CONFIG_IMA_BLACKLIST_KEYRING
> extern struct key *ima_blacklist_keyring;
>
> -static inline struct key *get_ima_mok_keyring(void)
> -{
> - return ima_mok_keyring;
> -}
> static inline struct key *get_ima_blacklist_keyring(void)
> {
> return ima_blacklist_keyring;
> }
> #else
> -static inline struct key *get_ima_mok_keyring(void)
> -{
> - return NULL;
> -}
> static inline struct key *get_ima_blacklist_keyring(void)
> {
> return NULL;
> }
> -#endif /* CONFIG_IMA_MOK_KEYRING */
> +#endif /* CONFIG_IMA_BLACKLIST_KEYRING */
>
>
> #endif /* _KEYS_SYSTEM_KEYRING_H */
> diff --git a/security/integrity/digsig.c b/security/integrity/digsig.c
> index 98ee4c752cf5..4304372b323f 100644
> --- a/security/integrity/digsig.c
> +++ b/security/integrity/digsig.c
> @@ -42,32 +42,10 @@ static bool init_keyring __initdata = true;
> static bool init_keyring __initdata;
> #endif
>
> -#ifdef CONFIG_SYSTEM_TRUSTED_KEYRING
> -/*
> - * Restrict the addition of keys into the IMA keyring.
> - *
> - * Any key that needs to go in .ima keyring must be signed by CA in
> - * either .system or .ima_mok keyrings.
> - */
> -static int restrict_link_by_ima_mok(struct key *keyring,
> - const struct key_type *type,
> - const union key_payload *payload)
> -{
> - int ret;
> -
> - ret = restrict_link_by_builtin_trusted(keyring, type, payload);
> - if (ret != -ENOKEY)
> - return ret;
> -
> - return restrict_link_by_signature(get_ima_mok_keyring(),
> - type, payload);
> -}
> +#ifdef CONFIG_IMA_KEYRINGS_PERMIT_SIGNED_BY_BUILTIN_OR_SECONDARY
> +#define restrict_link_to_ima restrict_link_by_builtin_and_secondary_trusted
> #else
> -/*
> - * If there's no system trusted keyring, then keys cannot be loaded into
> - * .ima_mok and added keys cannot be marked trusted.
> - */
> -#define restrict_link_by_ima_mok restrict_link_reject
> +#define restrict_link_to_ima restrict_link_by_builtin_trusted
> #endif
>
> int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
> @@ -114,7 +92,7 @@ int __init integrity_init_keyring(const unsigned int id)
> KEY_USR_VIEW | KEY_USR_READ |
> KEY_USR_WRITE | KEY_USR_SEARCH),
> KEY_ALLOC_NOT_IN_QUOTA,
> - restrict_link_by_ima_mok, NULL);
> + restrict_link_to_ima, NULL);
> if (IS_ERR(keyring[id])) {
> err = PTR_ERR(keyring[id]);
> pr_info("Can't allocate %s keyring (%d)\n",
> diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
> index e54a8a8dae94..aab9b0a53edf 100644
> --- a/security/integrity/ima/Kconfig
> +++ b/security/integrity/ima/Kconfig
> @@ -155,23 +155,32 @@ config IMA_TRUSTED_KEYRING
>
> This option is deprecated in favor of INTEGRITY_TRUSTED_KEYRING
>
> -config IMA_MOK_KEYRING
> - bool "Create IMA machine owner keys (MOK) and blacklist keyrings"
> +config IMA_KEYRINGS_PERMIT_SIGNED_BY_BUILTIN_OR_SECONDARY
> + bool "Permit keys validly signed by a built-in or secondary CA cert (EXPERIMENTAL)"
> + depends on SYSTEM_TRUSTED_KEYRING
> + depends on SECONDARY_TRUSTED_KEYRING
> + select INTEGRITY_TRUSTED_KEYRING
> + default n
> + help
> + Keys may be added to the IMA or IMA blacklist keyrings, if the
> + key is validly signed by a CA cert in the system built-in or
> + secondary trusted keyrings.
> +
> + Intermediate keys between those the kernel has compiled in and the
> + IMA keys to be added may be added to the system secondary keyring,
> + provided they are validly signed by a key already resident in the
> + built-in or secondary trusted keyrings.
> +
> +config IMA_BLACKLIST_KEYRING
> + bool "Create IMA machine owner blacklist keyrings (EXPERIMENTAL)"
> depends on SYSTEM_TRUSTED_KEYRING
> depends on IMA_TRUSTED_KEYRING
> default n
> help
> - This option creates IMA MOK and blacklist keyrings. IMA MOK is an
> - intermediate keyring that sits between .system and .ima keyrings,
> - effectively forming a simple CA hierarchy. To successfully import a
> - key into .ima_mok it must be signed by a key which CA is in .system
> - keyring. On turn any key that needs to go in .ima keyring must be
> - signed by CA in either .system or .ima_mok keyrings. IMA MOK is empty
> - at kernel boot.
> -
> - IMA blacklist keyring contains all revoked IMA keys. It is consulted
> - before any other keyring. If the search is successful the requested
> - operation is rejected and error is returned to the caller.
> + This option creates an IMA blacklist keyring, which contains all
> + revoked IMA keys. It is consulted before any other keyring. If
> + the search is successful the requested operation is rejected and
> + an error is returned to the caller.
>
> config IMA_LOAD_X509
> bool "Load X509 certificate onto the '.ima' trusted keyring"
> diff --git a/security/integrity/ima/Makefile b/security/integrity/ima/Makefile
> index a8539f9e060f..9aeaedad1e2b 100644
> --- a/security/integrity/ima/Makefile
> +++ b/security/integrity/ima/Makefile
> @@ -8,4 +8,4 @@ obj-$(CONFIG_IMA) += ima.o
> ima-y := ima_fs.o ima_queue.o ima_init.o ima_main.o ima_crypto.o ima_api.o \
> ima_policy.o ima_template.o ima_template_lib.o
> ima-$(CONFIG_IMA_APPRAISE) += ima_appraise.o
> -obj-$(CONFIG_IMA_MOK_KEYRING) += ima_mok.o
> +obj-$(CONFIG_IMA_BLACKLIST_KEYRING) += ima_mok.o
> diff --git a/security/integrity/ima/ima_mok.c b/security/integrity/ima/ima_mok.c
> index 2988726d30d6..74a279957464 100644
> --- a/security/integrity/ima/ima_mok.c
> +++ b/security/integrity/ima/ima_mok.c
> @@ -20,23 +20,14 @@
> #include <keys/system_keyring.h>
>
>
> -struct key *ima_mok_keyring;
> struct key *ima_blacklist_keyring;
>
> /*
> - * Allocate the IMA MOK and blacklist keyrings
> + * Allocate the IMA blacklist keyring
> */
> __init int ima_mok_init(void)
> {
> - pr_notice("Allocating IMA MOK and blacklist keyrings.\n");
> -
> - ima_mok_keyring = keyring_alloc(".ima_mok",
> - KUIDT_INIT(0), KGIDT_INIT(0), current_cred(),
> - (KEY_POS_ALL & ~KEY_POS_SETATTR) |
> - KEY_USR_VIEW | KEY_USR_READ |
> - KEY_USR_WRITE | KEY_USR_SEARCH,
> - KEY_ALLOC_NOT_IN_QUOTA,
> - restrict_link_by_builtin_trusted, NULL);
> + pr_notice("Allocating IMA blacklist keyring.\n");
>
> ima_blacklist_keyring = keyring_alloc(".ima_blacklist",
> KUIDT_INIT(0), KGIDT_INIT(0), current_cred(),
> @@ -46,8 +37,8 @@ __init int ima_mok_init(void)
> KEY_ALLOC_NOT_IN_QUOTA,
> restrict_link_by_builtin_trusted, NULL);
>
> - if (IS_ERR(ima_mok_keyring) || IS_ERR(ima_blacklist_keyring))
> - panic("Can't allocate IMA MOK or blacklist keyrings.");
> + if (IS_ERR(ima_blacklist_keyring))
> + panic("Can't allocate IMA blacklist keyring.");
>
> set_bit(KEY_FLAG_KEEP, &ima_blacklist_keyring->flags);
> return 0;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* Re: [PATCH v6] drivers: amba: properly handle devices with power domains
From: Ulf Hansson @ 2016-04-12 14:49 UTC (permalink / raw)
To: Marek Szyprowski
Cc: linux-samsung-soc, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, Russell King - ARM Linux,
Krzysztof Kozlowski, Bartlomiej Zolnierkiewicz
In-Reply-To: <1460470193-24928-1-git-send-email-m.szyprowski@samsung.com>
[...]
>
> +static int __init amba_deferred_device_init(void)
> +{
> + struct deferred_device *ddev, *tmp;
> +
> + list_for_each_entry_safe(ddev, tmp, &deferred_devices, node) {
> + int ret = amba_device_try_add(ddev->dev, ddev->parent);
> +
> + if (ret == -EPROBE_DEFER)
> + continue;
What happens with devices that still fail to be added here? Should we
schedule periodic work to retry?
> +
> + list_del_init(&ddev->node);
> + kfree(ddev);
> + }
> +
> + return 0;
> +}
> +late_initcall(amba_deferred_device_init);
> +
> static struct amba_device *
> amba_aphb_device_add(struct device *parent, const char *name,
> resource_size_t base, size_t size, int irq1, int irq2,
> --
> 1.9.2
>
I assume there are other buses similar to AMBA that need
enumeration before they can bind an appropriate driver to their devices.
Perhaps that's a good reason to make this new "device add re-try"
mechanism a generic thing supported by the driver core?
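To make the question above concrete, here is a rough userspace sketch of a
single retry pass over a deferred list. It is a toy model, not the AMBA code:
the struct, the function names and the "ready" flag are hypothetical, and only
the numeric value 517 is taken from the kernel's EPROBE_DEFER. It shows that
whatever still defers after the one late pass simply stays on the list:

```c
#include <stddef.h>

#define TOY_EPROBE_DEFER 517	/* same numeric value as EPROBE_DEFER */

/* Hypothetical deferred-device record on a singly linked list. */
struct toy_deferred {
	struct toy_deferred *next;
	int ready;	/* nonzero once the device's dependencies are available */
	int added;	/* set when the add finally succeeded */
};

/* Hypothetical stand-in for amba_device_try_add(). */
int toy_try_add(struct toy_deferred *d)
{
	if (!d->ready)
		return -TOY_EPROBE_DEFER;
	d->added = 1;
	return 0;
}

/*
 * One retry pass: unlink every entry that no longer defers and return
 * the number of entries that are still pending afterwards.
 */
int toy_retry_deferred(struct toy_deferred **head)
{
	struct toy_deferred **link = head;
	int pending = 0;

	while (*link) {
		struct toy_deferred *d = *link;

		if (toy_try_add(d) == -TOY_EPROBE_DEFER) {
			pending++;
			link = &d->next;	/* keep it on the list */
			continue;
		}
		*link = d->next;		/* unlink the finished entry */
		d->next = NULL;
	}
	return pending;
}
```

A nonzero return value is exactly the case asked about: something would have
to call the pass again later, for example from periodic work or from a generic
driver core hook.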
Kind regards
Uffe
^ permalink raw reply
* [PATCH 2/2] arm64: vhe: Verify CPU Exception Levels
From: Suzuki K Poulose @ 2016-04-12 14:46 UTC (permalink / raw)
To: linux-arm-kernel
Cc: marc.zyngier, will.deacon, catalin.marinas, linux-kernel,
mark.rutland, Suzuki K Poulose, Christoffer Dall
In-Reply-To: <1460472361-28419-1-git-send-email-suzuki.poulose@arm.com>
With a VHE-capable CPU, the kernel can run at EL2, and this is decided at
early boot. If some of the CPUs didn't start at EL2 or don't have VHE, we
could have CPUs running at different exception levels, all in the same
kernel! This patch adds an early check for the secondary CPUs to detect
such situations.
For each non-boot CPU add a sanity check to make sure we don't have
different run levels w.r.t the boot CPU. We save the information on
whether the boot CPU is running in hyp mode or not and ensure the
remaining CPUs match it.
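The save-then-compare flow described above can be sketched outside the kernel
as follows. This is only an illustration under assumptions: the names are
hypothetical, the exception level is passed in as a parameter instead of being
read from CurrentEL, and a return code stands in for parking the CPU:

```c
#include <stdbool.h>

/* Recorded once by the boot CPU, read by every secondary CPU. */
static bool toy_boot_cpu_hyp_mode;

/* Boot CPU: remember whether it runs in hyp mode (EL2). */
void toy_save_boot_cpu_run_el(bool boot_in_el2)
{
	toy_boot_cpu_hyp_mode = boot_in_el2;
}

/*
 * Secondary CPU: returns 0 when its mode matches the boot CPU and -1
 * when there is a mismatch (where the real code would panic and park).
 */
int toy_verify_cpu_run_el(bool in_el2)
{
	if (in_el2 != toy_boot_cpu_hyp_mode)
		return -1;
	return 0;
}
```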
Applies on 4.6-rc3.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
arch/arm64/include/asm/virt.h | 14 ++++++++++++++
arch/arm64/kernel/cpufeature.c | 1 +
arch/arm64/kernel/smp.c | 33 +++++++++++++++++++++++++++++++++
3 files changed, 48 insertions(+)
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 9f22dd6..b346d76 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -60,6 +60,20 @@ static inline bool is_kernel_in_hyp_mode(void)
return el == CurrentEL_EL2;
}
+#ifdef CONFIG_ARM64_VHE
+
+extern bool boot_cpu_hyp_mode;
+static inline bool is_boot_cpu_in_hyp_mode(void)
+{
+ return boot_cpu_hyp_mode;
+}
+
+extern void verify_cpu_run_el(void);
+
+#else
+static inline void verify_cpu_run_el(void) {}
+#endif
+
/* The section containing the hypervisor text */
extern char __hyp_text_start[];
extern char __hyp_text_end[];
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 943f514..91088de 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -908,6 +908,7 @@ static u64 __raw_read_system_reg(u32 sys_id)
*/
static void check_early_cpu_features(void)
{
+ verify_cpu_run_el();
verify_cpu_asid_bits();
}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index b2d5f4e..6825225 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -75,6 +75,38 @@ enum ipi_msg_type {
IPI_WAKEUP
};
+#ifdef CONFIG_ARM64_VHE
+
+/* Whether the boot CPU is running in HYP mode or not*/
+bool boot_cpu_hyp_mode;
+
+static inline void save_boot_cpu_run_el(void)
+{
+ boot_cpu_hyp_mode = is_kernel_in_hyp_mode();
+}
+
+/*
+ * Verify that a secondary CPU is running the kernel at the same
+ * EL as that of the boot CPU.
+ */
+void verify_cpu_run_el(void)
+{
+ bool in_el2 = is_kernel_in_hyp_mode();
+ bool boot_cpu_el2 = is_boot_cpu_in_hyp_mode();
+
+ if (in_el2 ^ boot_cpu_el2) {
+ pr_crit("CPU%d: mismatched Exception Level(EL%d) with boot CPU(EL%d)\n",
+ smp_processor_id(),
+ in_el2 ? 2 : 1,
+ boot_cpu_el2 ? 2 : 1);
+ cpu_panic_kernel();
+ }
+}
+
+#else
+static inline void save_boot_cpu_run_el(void) {}
+#endif
+
#ifdef CONFIG_HOTPLUG_CPU
static int op_cpu_kill(unsigned int cpu);
#else
@@ -401,6 +433,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
void __init smp_prepare_boot_cpu(void)
{
cpuinfo_store_boot_cpu();
+ save_boot_cpu_run_el();
set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
}
--
1.7.9.5
^ permalink raw reply related
* [PATCH 1/2] arm64: Add cpu_panic_kernel helper
From: Suzuki K Poulose @ 2016-04-12 14:46 UTC (permalink / raw)
To: linux-arm-kernel
Cc: marc.zyngier, will.deacon, catalin.marinas, linux-kernel,
mark.rutland, Suzuki K Poulose
During the activation of a secondary CPU, we could report serious
configuration issues and hence request to crash the kernel. We currently
do this for the CPU ASID bits check. We will also need it for handling
mismatched exception levels for the CPUs with VHE. Hence, add a
helper so that the same logic can be reused.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
arch/arm64/include/asm/smp.h | 11 +++++++++++
arch/arm64/mm/context.c | 3 +--
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 817a067..433e504 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -113,6 +113,17 @@ static inline void update_cpu_boot_status(int val)
dsb(ishst);
}
+/*
+ * The calling secondary CPU has detected serious configuration mismatch,
+ * which calls for a kernel panic. Update the boot status and park the calling
+ * CPU.
+ */
+static inline void cpu_panic_kernel(void)
+{
+ update_cpu_boot_status(CPU_PANIC_KERNEL);
+ cpu_park_loop();
+}
+
#endif /* ifndef __ASSEMBLY__ */
#endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index c90c3c5..b7b3978 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -75,8 +75,7 @@ void verify_cpu_asid_bits(void)
*/
pr_crit("CPU%d: smaller ASID size(%u) than boot CPU (%u)\n",
smp_processor_id(), asid, asid_bits);
- update_cpu_boot_status(CPU_PANIC_KERNEL);
- cpu_park_loop();
+ cpu_panic_kernel();
}
}
--
1.7.9.5
^ permalink raw reply related
* Re: [PATCH RFT 2/2] macb: kill PHY reset code
From: Nicolas Ferre @ 2016-04-12 14:45 UTC (permalink / raw)
To: Andrew Lunn; +Cc: Sergei Shtylyov, netdev, linux-kernel, Gregory CLEMENT
In-Reply-To: <20160412134001.GB29895@lunn.ch>
Le 12/04/2016 15:40, Andrew Lunn a écrit :
> On Tue, Apr 12, 2016 at 11:22:10AM +0200, Nicolas Ferre wrote:
>> Le 11/04/2016 04:28, Andrew Lunn a écrit :
>>> On Sat, Apr 09, 2016 at 01:25:03AM +0300, Sergei Shtylyov wrote:
>>>> With the 'phylib' now being aware of the "reset-gpios" PHY node property,
>>>> there should be no need to frob the PHY reset in this driver anymore...
>>>>
>>>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
>>>>
>>>> ---
>>>> drivers/net/ethernet/cadence/macb.c | 17 -----------------
>>>> drivers/net/ethernet/cadence/macb.h | 1 -
>>>> 2 files changed, 18 deletions(-)
>>>>
>>>> Index: net-next/drivers/net/ethernet/cadence/macb.c
>>>> ===================================================================
>>>> --- net-next.orig/drivers/net/ethernet/cadence/macb.c
>>>> +++ net-next/drivers/net/ethernet/cadence/macb.c
>>>> @@ -2884,7 +2884,6 @@ static int macb_probe(struct platform_de
>>>> = macb_clk_init;
>>>> int (*init)(struct platform_device *) = macb_init;
>>>> struct device_node *np = pdev->dev.of_node;
>>>> - struct device_node *phy_node;
>>>> const struct macb_config *macb_config = NULL;
>>>> struct clk *pclk, *hclk = NULL, *tx_clk = NULL;
>>>> unsigned int queue_mask, num_queues;
>>>> @@ -2977,18 +2976,6 @@ static int macb_probe(struct platform_de
>>>> else
>>>> macb_get_hwaddr(bp);
>>>>
>>>> - /* Power up the PHY if there is a GPIO reset */
>>>> - phy_node = of_get_next_available_child(np, NULL);
>>>> - if (phy_node) {
>>>> - int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
>>>> -
>>>> - if (gpio_is_valid(gpio)) {
>>>> - bp->reset_gpio = gpio_to_desc(gpio);
>>>> - gpiod_direction_output(bp->reset_gpio, 1);
>>>
>>> Hi Sergei
>>>
>>> The code you are deleting would have ignored the flags in the gpio
>> I don't parse this.
>>
>> The code deleted does take the flag into account. And the DT property
>> associated to it seems correct to me (I mean, with proper flag
>> specification).
>
> Hi Nicolas
>
> of_get_named_gpio() does not do anything with the flags. So for
> example,
>
> gpios = <&gpio0 12 GPIO_ACTIVE_LOW>;
>
> the GPIO_ACTIVE_LOW would be ignored. If you want the flags to be
> respected, you need to use the gpiod API for all calls, in particular,
> you need to use something which calls gpiod_get_index(), since that is
> the only function to call gpiod_parse_flags() to translate
> GPIO_ACTIVE_LOW into a flag within the gpio descriptor.
Ok, I remember what confused me now: this code used to be something like:
devm_gpiod_get_optional(&bp->pdev->dev, "phy-reset", GPIOD_OUT_HIGH);
before it was changed to the chunk above... So, yes, the DT flag
was not handled anyway...
Sorry for the noise and thanks for the clarification.
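For the archive, a toy model of why the flag matters (this is not the gpiolib
API; the struct and both setters are hypothetical). A flag-aware path treats
the requested value as logical and inverts it for an active-low line, while a
path that never parsed the flag drives the raw level directly:

```c
#include <stdbool.h>

/* Hypothetical descriptor: one polarity flag plus the physical level. */
struct toy_gpio_desc {
	bool active_low;	/* models GPIO_ACTIVE_LOW from the device tree */
	int raw_level;		/* physical line level: 0 or 1 */
};

/* Flag-aware set: the logical value is inverted for active-low lines. */
void toy_set_logical(struct toy_gpio_desc *d, int value)
{
	d->raw_level = d->active_low ? !value : !!value;
}

/* Flag-unaware set: models a path where the flag was never parsed. */
void toy_set_raw(struct toy_gpio_desc *d, int value)
{
	d->raw_level = !!value;
}
```

With an active-low reset line, asking for logical 1 through the first helper
drives the pin low, whereas the second helper drives it high regardless of
what the device tree says.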
Bye,
--
Nicolas Ferre
^ permalink raw reply
* Re: [RFC PATCH v1.9 08/14] livepatch: separate enabled and patched states
From: Chris J Arges @ 2016-04-12 14:44 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Jiri Kosina, Jessica Yu, Miroslav Benes, linux-kernel,
live-patching, Vojtech Pavlik
In-Reply-To: <e4dbf8827421c83d45427c28b3a31516c59cc6a5.1458933243.git.jpoimboe@redhat.com>
On Fri, Mar 25, 2016 at 02:34:55PM -0500, Josh Poimboeuf wrote:
> Once we have a consistency model, patches and their objects will be
> enabled and disabled at different times. For example, when a patch is
> disabled, its loaded objects' funcs can remain registered with ftrace
> indefinitely until the unpatching operation is complete and they're no
> longer in use.
>
> It's less confusing if we give them different names: patches can be
> enabled or disabled; objects (and their funcs) can be patched or
> unpatched:
>
> - Enabled means that a patch is logically enabled (but not necessarily
> fully applied).
>
> - Patched means that an object's funcs are registered with ftrace and
> added to the klp_ops func stack.
>
> Also, since these states are binary, represent them with booleans
> instead of ints.
>
Josh,
Awesome work here first of all!
Looking through the patchset a bit I see the following bools:
- functions: patched, transitioning
- objects: patched
- patches: enabled
It seems this reflects the following states at a patch level:
disabled - module inserted, not yet logically enabled
enabled - logically enabled, but not all objects/functions patched
transitioning - objects/functions are being applied or reverted
patched - all objects/functions patched
However each object and function could have the same state and the parent object
just reflects the 'aggregate state'. For example if all funcs in an object are
patched then the object is also patched.
Perhaps we need more states (or maybe there will be more in the future), but
wouldn't it be easier to have something like the following for each patch,
object, and function?
enum klp_state {
KLP_DISABLED,
KLP_ENABLED,
KLP_TRANSITION,
KLP_PATCHED,
};
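As a rough sketch of the 'aggregate state' idea (hypothetical names, and the
rules beyond "all children patched means the parent is patched" are my own
assumptions, not something from the patchset), a parent's state could be
derived from its children like this:

```c
#include <stddef.h>

/* Hypothetical copy of the four proposed states. */
enum toy_klp_state {
	TOY_KLP_DISABLED,
	TOY_KLP_ENABLED,
	TOY_KLP_TRANSITION,
	TOY_KLP_PATCHED,
};

/*
 * Derive a parent's state from its children (assumed rules):
 *  - any child in transition  -> TRANSITION
 *  - all children disabled    -> DISABLED (also used for no children)
 *  - all children patched     -> PATCHED
 *  - anything else            -> ENABLED (partially applied)
 */
enum toy_klp_state toy_aggregate(const enum toy_klp_state *child, size_t n)
{
	size_t i, disabled = 0, patched = 0;

	for (i = 0; i < n; i++) {
		if (child[i] == TOY_KLP_TRANSITION)
			return TOY_KLP_TRANSITION;
		if (child[i] == TOY_KLP_DISABLED)
			disabled++;
		else if (child[i] == TOY_KLP_PATCHED)
			patched++;
	}
	if (disabled == n)
		return TOY_KLP_DISABLED;
	if (patched == n)
		return TOY_KLP_PATCHED;
	return TOY_KLP_ENABLED;
}
```

The same helper could be applied twice, once over an object's funcs and once
over a patch's objects.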
I'm happy to help out too.
--chris
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
> include/linux/livepatch.h | 17 ++++-------
> kernel/livepatch/core.c | 72 +++++++++++++++++++++++------------------------
> 2 files changed, 42 insertions(+), 47 deletions(-)
>
> diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
> index bd830d5..6d45dc7 100644
> --- a/include/linux/livepatch.h
> +++ b/include/linux/livepatch.h
> @@ -28,11 +28,6 @@
>
> #include <asm/livepatch.h>
>
> -enum klp_state {
> - KLP_DISABLED,
> - KLP_ENABLED
> -};
> -
> /**
> * struct klp_func - function structure for live patching
> * @old_name: name of the function to be patched
> @@ -41,8 +36,8 @@ enum klp_state {
> * can be found (optional)
> * @old_addr: the address of the function being patched
> * @kobj: kobject for sysfs resources
> - * @state: tracks function-level patch application state
> * @stack_node: list node for klp_ops func_stack list
> + * @patched: the func has been added to the klp_ops list
> */
> struct klp_func {
> /* external */
> @@ -60,8 +55,8 @@ struct klp_func {
> /* internal */
> unsigned long old_addr;
> struct kobject kobj;
> - enum klp_state state;
> struct list_head stack_node;
> + bool patched;
> };
>
> /**
> @@ -90,7 +85,7 @@ struct klp_reloc {
> * @kobj: kobject for sysfs resources
> * @mod: kernel module associated with the patched object
> * (NULL for vmlinux)
> - * @state: tracks object-level patch application state
> + * @patched: the object's funcs have been add to the klp_ops list
> */
> struct klp_object {
> /* external */
> @@ -101,7 +96,7 @@ struct klp_object {
> /* internal */
> struct kobject kobj;
> struct module *mod;
> - enum klp_state state;
> + bool patched;
> };
>
> /**
> @@ -110,7 +105,7 @@ struct klp_object {
> * @objs: object entries for kernel objects to be patched
> * @list: list node for global list of registered patches
> * @kobj: kobject for sysfs resources
> - * @state: tracks patch-level application state
> + * @enabled: the patch is enabled (but operation may be incomplete)
> */
> struct klp_patch {
> /* external */
> @@ -120,7 +115,7 @@ struct klp_patch {
> /* internal */
> struct list_head list;
> struct kobject kobj;
> - enum klp_state state;
> + bool enabled;
> };
>
> #define klp_for_each_object(patch, obj) \
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index d68fbf6..be1e106 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -298,11 +298,11 @@ unlock:
> rcu_read_unlock();
> }
>
> -static void klp_disable_func(struct klp_func *func)
> +static void klp_unpatch_func(struct klp_func *func)
> {
> struct klp_ops *ops;
>
> - if (WARN_ON(func->state != KLP_ENABLED))
> + if (WARN_ON(!func->patched))
> return;
> if (WARN_ON(!func->old_addr))
> return;
> @@ -322,10 +322,10 @@ static void klp_disable_func(struct klp_func *func)
> list_del_rcu(&func->stack_node);
> }
>
> - func->state = KLP_DISABLED;
> + func->patched = false;
> }
>
> -static int klp_enable_func(struct klp_func *func)
> +static int klp_patch_func(struct klp_func *func)
> {
> struct klp_ops *ops;
> int ret;
> @@ -333,7 +333,7 @@ static int klp_enable_func(struct klp_func *func)
> if (WARN_ON(!func->old_addr))
> return -EINVAL;
>
> - if (WARN_ON(func->state != KLP_DISABLED))
> + if (WARN_ON(func->patched))
> return -EINVAL;
>
> ops = klp_find_ops(func->old_addr);
> @@ -372,7 +372,7 @@ static int klp_enable_func(struct klp_func *func)
> list_add_rcu(&func->stack_node, &ops->func_stack);
> }
>
> - func->state = KLP_ENABLED;
> + func->patched = true;
>
> return 0;
>
> @@ -383,36 +383,36 @@ err:
> return ret;
> }
>
> -static void klp_disable_object(struct klp_object *obj)
> +static void klp_unpatch_object(struct klp_object *obj)
> {
> struct klp_func *func;
>
> klp_for_each_func(obj, func)
> - if (func->state == KLP_ENABLED)
> - klp_disable_func(func);
> + if (func->patched)
> + klp_unpatch_func(func);
>
> - obj->state = KLP_DISABLED;
> + obj->patched = false;
> }
>
> -static int klp_enable_object(struct klp_object *obj)
> +static int klp_patch_object(struct klp_object *obj)
> {
> struct klp_func *func;
> int ret;
>
> - if (WARN_ON(obj->state != KLP_DISABLED))
> + if (WARN_ON(obj->patched))
> return -EINVAL;
>
> if (WARN_ON(!klp_is_object_loaded(obj)))
> return -EINVAL;
>
> klp_for_each_func(obj, func) {
> - ret = klp_enable_func(func);
> + ret = klp_patch_func(func);
> if (ret) {
> - klp_disable_object(obj);
> + klp_unpatch_object(obj);
> return ret;
> }
> }
> - obj->state = KLP_ENABLED;
> + obj->patched = true;
>
> return 0;
> }
> @@ -423,17 +423,17 @@ static int __klp_disable_patch(struct klp_patch *patch)
>
> /* enforce stacking: only the last enabled patch can be disabled */
> if (!list_is_last(&patch->list, &klp_patches) &&
> - list_next_entry(patch, list)->state == KLP_ENABLED)
> + list_next_entry(patch, list)->enabled)
> return -EBUSY;
>
> pr_notice("disabling patch '%s'\n", patch->mod->name);
>
> klp_for_each_object(patch, obj) {
> - if (obj->state == KLP_ENABLED)
> - klp_disable_object(obj);
> + if (obj->patched)
> + klp_unpatch_object(obj);
> }
>
> - patch->state = KLP_DISABLED;
> + patch->enabled = false;
>
> return 0;
> }
> @@ -457,7 +457,7 @@ int klp_disable_patch(struct klp_patch *patch)
> goto err;
> }
>
> - if (patch->state == KLP_DISABLED) {
> + if (!patch->enabled) {
> ret = -EINVAL;
> goto err;
> }
> @@ -475,12 +475,12 @@ static int __klp_enable_patch(struct klp_patch *patch)
> struct klp_object *obj;
> int ret;
>
> - if (WARN_ON(patch->state != KLP_DISABLED))
> + if (WARN_ON(patch->enabled))
> return -EINVAL;
>
> /* enforce stacking: only the first disabled patch can be enabled */
> if (patch->list.prev != &klp_patches &&
> - list_prev_entry(patch, list)->state == KLP_DISABLED)
> + !list_prev_entry(patch, list)->enabled)
> return -EBUSY;
>
> pr_notice_once("tainting kernel with TAINT_LIVEPATCH\n");
> @@ -492,12 +492,12 @@ static int __klp_enable_patch(struct klp_patch *patch)
> if (!klp_is_object_loaded(obj))
> continue;
>
> - ret = klp_enable_object(obj);
> + ret = klp_patch_object(obj);
> if (ret)
> goto unregister;
> }
>
> - patch->state = KLP_ENABLED;
> + patch->enabled = true;
>
> return 0;
>
> @@ -555,20 +555,20 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
> if (ret)
> return -EINVAL;
>
> - if (val != KLP_DISABLED && val != KLP_ENABLED)
> + if (val > 1)
> return -EINVAL;
>
> patch = container_of(kobj, struct klp_patch, kobj);
>
> mutex_lock(&klp_mutex);
>
> - if (val == patch->state) {
> + if (patch->enabled == val) {
> /* already in requested state */
> ret = -EINVAL;
> goto err;
> }
>
> - if (val == KLP_ENABLED) {
> + if (val) {
> ret = __klp_enable_patch(patch);
> if (ret)
> goto err;
> @@ -593,7 +593,7 @@ static ssize_t enabled_show(struct kobject *kobj,
> struct klp_patch *patch;
>
> patch = container_of(kobj, struct klp_patch, kobj);
> - return snprintf(buf, PAGE_SIZE-1, "%d\n", patch->state);
> + return snprintf(buf, PAGE_SIZE-1, "%d\n", patch->enabled);
> }
>
> static struct kobj_attribute enabled_kobj_attr = __ATTR_RW(enabled);
> @@ -684,7 +684,7 @@ static void klp_free_patch(struct klp_patch *patch)
> static int klp_init_func(struct klp_object *obj, struct klp_func *func)
> {
> INIT_LIST_HEAD(&func->stack_node);
> - func->state = KLP_DISABLED;
> + func->patched = false;
>
> /* The format for the sysfs directory is <function,sympos> where sympos
> * is the nth occurrence of this symbol in kallsyms for the patched
> @@ -729,7 +729,7 @@ static int klp_init_object(struct klp_patch *patch, struct klp_object *obj)
> if (!obj->funcs)
> return -EINVAL;
>
> - obj->state = KLP_DISABLED;
> + obj->patched = false;
> obj->mod = NULL;
>
> klp_find_object_module(obj);
> @@ -770,7 +770,7 @@ static int klp_init_patch(struct klp_patch *patch)
>
> mutex_lock(&klp_mutex);
>
> - patch->state = KLP_DISABLED;
> + patch->enabled = false;
>
> ret = kobject_init_and_add(&patch->kobj, &klp_ktype_patch,
> klp_root_kobj, "%s", patch->mod->name);
> @@ -816,7 +816,7 @@ int klp_unregister_patch(struct klp_patch *patch)
> goto out;
> }
>
> - if (patch->state == KLP_ENABLED) {
> + if (patch->enabled) {
> ret = -EBUSY;
> goto out;
> }
> @@ -897,13 +897,13 @@ int klp_module_coming(struct module *mod)
> goto err;
> }
>
> - if (patch->state == KLP_DISABLED)
> + if (!patch->enabled)
> break;
>
> pr_notice("applying patch '%s' to loading module '%s'\n",
> patch->mod->name, obj->mod->name);
>
> - ret = klp_enable_object(obj);
> + ret = klp_patch_object(obj);
> if (ret) {
> pr_warn("failed to apply patch '%s' to module '%s' (%d)\n",
> patch->mod->name, obj->mod->name, ret);
> @@ -954,10 +954,10 @@ void klp_module_going(struct module *mod)
> if (!klp_is_module(obj) || strcmp(obj->name, mod->name))
> continue;
>
> - if (patch->state != KLP_DISABLED) {
> + if (patch->enabled) {
> pr_notice("reverting patch '%s' on unloading module '%s'\n",
> patch->mod->name, obj->mod->name);
> - klp_disable_object(obj);
> + klp_unpatch_object(obj);
> }
>
> klp_free_object_loaded(obj);
> --
> 2.4.3
>
^ permalink raw reply
* Re: [PATCH 1/2] arm64: mem-model: add flatmem model for arm64
From: Catalin Marinas @ 2016-04-12 14:44 UTC (permalink / raw)
To: Will Deacon
Cc: Ard Biesheuvel, Mark Rutland, Dan Zhao, mhocko, Yiping Xu,
puck.chen, linux-mm@kvack.org, Chen Feng, suzhuangluan,
David Rientjes, linux-kernel@vger.kernel.org, linuxarm,
albert.lubing, linux-arm-kernel@lists.infradead.org, oliver.fu,
Andrew Morton, Laura Abbott, robin.murphy, kirill.shutemov,
saberlily.xia
In-Reply-To: <20160411104013.GG15729@arm.com>
On Mon, Apr 11, 2016 at 11:40:13AM +0100, Will Deacon wrote:
> On Mon, Apr 11, 2016 at 12:31:53PM +0200, Ard Biesheuvel wrote:
> > On 11 April 2016 at 11:59, Chen Feng <puck.chen@hisilicon.com> wrote:
> > > Please see the pg-tables below.
> > >
> > >
> > > With sparse and vmemmap enable.
> > >
> > > ---[ vmemmap start ]---
> > > 0xffffffbdc0200000-0xffffffbdc4800000 70M RW NX SHD AF UXN MEM/NORMAL
> > > ---[ vmemmap end ]---
> > >
> >
> > OK, I see what you mean now. Sorry for taking so long to catch up.
> >
> > > The board is 4GB, and the memap is 70MB
> > > 1G memory --- 14MB mem_map array.
> >
> > No, this is incorrect. 1 GB corresponds to 16 MB worth of struct
> > pages, assuming sizeof(struct page) == 64
> >
> > So you are losing 6 MB to rounding here, which I agree is significant.
> > I wonder if it makes sense to use a lower value for SECTION_SIZE_BITS
> > on 4k-page kernels, but perhaps we're better off asking the opinion
> > of the other cc'ees.
>
> You need to be really careful making SECTION_SIZE_BITS smaller because
> it has a direct effect on the use of page->flags and you can end up
> running out of bits fairly easily.
With SPARSEMEM_VMEMMAP, SECTION_SIZE_BITS no longer affects the page
flags since we no longer need to encode the section number in
page->flags.
--
Catalin
^ permalink raw reply
* Re: [PATCH] rcu: Remove some superfluous lines
From: Paul E. McKenney @ 2016-04-12 14:44 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Thomas Gleixner, linux-kernel
In-Reply-To: <20160310135500.GQ6356@twins.programming.kicks-ass.net>
On Thu, Mar 10, 2016 at 02:55:00PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 10, 2016 at 05:41:46AM -0800, Paul E. McKenney wrote:
> > On Thu, Mar 10, 2016 at 09:49:04AM +0100, Peter Zijlstra wrote:
> > >
> > > I think you'll find this condition is superfluous, as the whole function
> > > is under an #ifdef of that same symbol.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> >
> > Right you are! It got moved under that #ifdef in the process of merging
> > the RCU, hotplug, and swait changes, and I failed to notice. Good catch!
> >
> > I will apply this to my tree once -rc1 comes out, as it will apply to
> > -rcu at that point.
> >
> > Or maybe we should remove the #ifdef and add IS_ENABLED() to the other
> > functions under that #ifdef. Thoughts?
>
> I'd go with the #ifdef, it's the conventional pattern.
Longer term, I am moving from #ifdef to IS_ENABLED(), as it makes for
easier detection of compiler errors in oddball combinations of Kconfig
options. But no point in carrying redundant code in the meantime,
so queued for 4.8.
Thanx, Paul
^ permalink raw reply