* [PATCH net-next 01/10] rxrpc: Make sure we initialise the peer hash key
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380525614.23135.7539695651371953351.stgit@warthog.procyon.org.uk>
Peer records created for incoming connections weren't getting their hash
key set. This meant that incoming calls wouldn't see more than one DATA
packet - which is not a problem for AFS CM calls with small request data
blobs.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/peer_object.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c
index 2efe29a4c232..3e6cd174b53d 100644
--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -203,6 +203,7 @@ struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp)
*/
static void rxrpc_init_peer(struct rxrpc_peer *peer, unsigned long hash_key)
{
+ peer->hash_key = hash_key;
rxrpc_assess_MTU_size(peer);
peer->mtu = peer->if_mtu;
@@ -238,7 +239,6 @@ static struct rxrpc_peer *rxrpc_create_peer(struct rxrpc_local *local,
peer = rxrpc_alloc_peer(local, gfp);
if (peer) {
- peer->hash_key = hash_key;
memcpy(&peer->srx, srx, sizeof(*srx));
rxrpc_init_peer(peer, hash_key);
}
^ permalink raw reply related
* [PATCH net-next 02/10] rxrpc: Add missing wakeup on Tx window rotation
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380525614.23135.7539695651371953351.stgit@warthog.procyon.org.uk>
We need to wake up the sender when Tx window rotation due to an incoming
ACK makes space in the buffer otherwise the sender is liable to just hang
endlessly.
This problem isn't noticeable if the Tx phase transfers no more than will
fit in a single window or the Tx window rotates fast enough that it doesn't
get full.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/input.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index afeba98004b1..a707d5952164 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -59,6 +59,8 @@ static void rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to)
spin_unlock(&call->lock);
+ wake_up(&call->waitq);
+
while (list) {
skb = list;
list = skb->next;
^ permalink raw reply related
* [PATCH net-next 03/10] rxrpc: The IDLE ACK packet should use rxrpc_idle_ack_delay
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380525614.23135.7539695651371953351.stgit@warthog.procyon.org.uk>
The IDLE ACK packet should use the rxrpc_idle_ack_delay setting when the
timer is set for it.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/call_event.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 2b976e789562..61432049869b 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -95,7 +95,7 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
break;
case RXRPC_ACK_IDLE:
- if (rxrpc_soft_ack_delay < expiry)
+ if (rxrpc_idle_ack_delay < expiry)
expiry = rxrpc_idle_ack_delay;
break;
^ permalink raw reply related
* [PATCH net-next 04/10] rxrpc: Requeue call for recvmsg if more data
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380525614.23135.7539695651371953351.stgit@warthog.procyon.org.uk>
rxrpc_recvmsg() needs to make sure that the call it has just been
processing gets requeued for further attention if the buffer has been
filled and there's more data to be consumed. The softirq producer only
queues the call and wakes the socket if it fills the first slot in the
window, so userspace might end up sleeping forever otherwise, despite there
being data available.
This is not a problem provided the userspace buffer is big enough or it
empties the buffer completely before more data comes in.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/recvmsg.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 20d0b5c6f81b..16ff56f69256 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -463,6 +463,10 @@ try_again:
flags, &copied);
if (ret == -EAGAIN)
ret = 0;
+
+ if (after(call->rx_top, call->rx_hard_ack) &&
+ call->rxtx_buffer[(call->rx_hard_ack + 1) & RXRPC_RXTX_BUFF_MASK])
+ rxrpc_notify_socket(call);
break;
default:
ret = 0;
^ permalink raw reply related
* [PATCH net-next 05/10] rxrpc: Add missing unlock in rxrpc_call_accept()
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380525614.23135.7539695651371953351.stgit@warthog.procyon.org.uk>
Add a missing unlock in rxrpc_call_accept() in the path taken if there's no
call to wake up.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/call_accept.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index b8acec0d596e..06e328f6b0f0 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -425,9 +425,11 @@ struct rxrpc_call *rxrpc_accept_call(struct rxrpc_sock *rx,
write_lock(&rx->call_lock);
- ret = -ENODATA;
- if (list_empty(&rx->to_be_accepted))
- goto out;
+ if (list_empty(&rx->to_be_accepted)) {
+ write_unlock(&rx->call_lock);
+ kleave(" = -ENODATA [empty]");
+ return ERR_PTR(-ENODATA);
+ }
/* check the user ID isn't already in use */
pp = &rx->calls.rb_node;
^ permalink raw reply related
* [PATCH net-next 06/10] rxrpc: Use skb->len not skb->data_len
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380525614.23135.7539695651371953351.stgit@warthog.procyon.org.uk>
skb->len should be used rather than skb->data_len when referring to the
amount of data in a packet. This will only cause a malfunction in the
following cases:
(1) We receive a jumbo packet (validation and splitting both are wrong).
(2) We see if there's extra ACK info in an ACK packet (we think it's not
there and just ignore it).
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/input.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index a707d5952164..5958ef8ba2a0 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -127,7 +127,7 @@ static bool rxrpc_validate_jumbo(struct sk_buff *skb)
{
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
unsigned int offset = sp->offset;
- unsigned int len = skb->data_len;
+ unsigned int len = skb->len;
int nr_jumbo = 1;
u8 flags = sp->hdr.flags;
@@ -196,7 +196,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb,
u8 ack = 0, flags, annotation = 0;
_enter("{%u,%u},{%u,%u}",
- call->rx_hard_ack, call->rx_top, skb->data_len, seq);
+ call->rx_hard_ack, call->rx_top, skb->len, seq);
_proto("Rx DATA %%%u { #%u f=%02x }",
sp->hdr.serial, seq, sp->hdr.flags);
@@ -233,7 +233,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb,
next_subpacket:
queued = false;
ix = seq & RXRPC_RXTX_BUFF_MASK;
- len = skb->data_len;
+ len = skb->len;
if (flags & RXRPC_JUMBO_PACKET)
len = RXRPC_JUMBO_DATALEN;
@@ -444,7 +444,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
}
offset = sp->offset + nr_acks + 3;
- if (skb->data_len >= offset + sizeof(buf.info)) {
+ if (skb->len >= offset + sizeof(buf.info)) {
if (skb_copy_bits(skb, offset, &buf.info, sizeof(buf.info)) < 0)
return rxrpc_proto_abort("XAI", call, 0);
rxrpc_input_ackinfo(call, skb, &buf.info);
^ permalink raw reply related
* [PATCH net-next 07/10] rxrpc: Allow tx_winsize to grow in response to an ACK
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380525614.23135.7539695651371953351.stgit@warthog.procyon.org.uk>
Allow tx_winsize to grow when the ACK info packet shows a larger receive
window at the other end rather than only permitting it to shrink.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/input.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 5958ef8ba2a0..8e529afcd6c1 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -333,14 +333,16 @@ static void rxrpc_input_ackinfo(struct rxrpc_call *call, struct sk_buff *skb,
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
struct rxrpc_peer *peer;
unsigned int mtu;
+ u32 rwind = ntohl(ackinfo->rwind);
_proto("Rx ACK %%%u Info { rx=%u max=%u rwin=%u jm=%u }",
sp->hdr.serial,
ntohl(ackinfo->rxMTU), ntohl(ackinfo->maxMTU),
- ntohl(ackinfo->rwind), ntohl(ackinfo->jumbo_max));
+ rwind, ntohl(ackinfo->jumbo_max));
- if (call->tx_winsize > ntohl(ackinfo->rwind))
- call->tx_winsize = ntohl(ackinfo->rwind);
+ if (rwind > RXRPC_RXTX_BUFF_SIZE - 1)
+ rwind = RXRPC_RXTX_BUFF_SIZE - 1;
+ call->tx_winsize = rwind;
mtu = min(ntohl(ackinfo->rxMTU), ntohl(ackinfo->maxMTU));
^ permalink raw reply related
* [PATCH net-next 08/10] rxrpc: Adjust the call ref tracepoint to show kernel API refs
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380525614.23135.7539695651371953351.stgit@warthog.procyon.org.uk>
Adjust the call ref tracepoint to show references held on a call by the
kernel API separately as much as possible and add an additional trace to at
the allocation point from the preallocation buffer for an incoming call.
Note that this doesn't show the allocation of a client call for the kernel
separately at the moment.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/af_rxrpc.c | 2 +-
net/rxrpc/ar-internal.h | 2 ++
net/rxrpc/call_accept.c | 3 ++-
net/rxrpc/call_object.c | 2 ++
4 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index caa226dd436e..25d00ded24bc 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -299,7 +299,7 @@ void rxrpc_kernel_end_call(struct socket *sock, struct rxrpc_call *call)
{
_enter("%d{%d}", call->debug_id, atomic_read(&call->usage));
rxrpc_release_call(rxrpc_sk(sock->sk), call);
- rxrpc_put_call(call, rxrpc_call_put);
+ rxrpc_put_call(call, rxrpc_call_put_kernel);
}
EXPORT_SYMBOL(rxrpc_kernel_end_call);
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index b1cb79ec4e96..47c74a581a0f 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -540,8 +540,10 @@ enum rxrpc_call_trace {
rxrpc_call_seen,
rxrpc_call_got,
rxrpc_call_got_userid,
+ rxrpc_call_got_kernel,
rxrpc_call_put,
rxrpc_call_put_userid,
+ rxrpc_call_put_kernel,
rxrpc_call_put_noqueue,
rxrpc_call__nr_trace
};
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 06e328f6b0f0..5fd9d2c89b7f 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -121,7 +121,7 @@ static int rxrpc_service_prealloc_one(struct rxrpc_sock *rx,
call->user_call_ID = user_call_ID;
call->notify_rx = notify_rx;
- rxrpc_get_call(call, rxrpc_call_got);
+ rxrpc_get_call(call, rxrpc_call_got_kernel);
user_attach_call(call, user_call_ID);
rxrpc_get_call(call, rxrpc_call_got_userid);
rb_link_node(&call->sock_node, parent, pp);
@@ -300,6 +300,7 @@ static struct rxrpc_call *rxrpc_alloc_incoming_call(struct rxrpc_sock *rx,
smp_store_release(&b->call_backlog_tail,
(call_tail + 1) & (RXRPC_BACKLOG_MAX - 1));
+ rxrpc_see_call(call);
call->conn = conn;
call->peer = rxrpc_get_peer(conn->params.peer);
return call;
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 18ab13f82f6e..3f9476508204 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -56,8 +56,10 @@ const char rxrpc_call_traces[rxrpc_call__nr_trace][4] = {
[rxrpc_call_seen] = "SEE",
[rxrpc_call_got] = "GOT",
[rxrpc_call_got_userid] = "Gus",
+ [rxrpc_call_got_kernel] = "Gke",
[rxrpc_call_put] = "PUT",
[rxrpc_call_put_userid] = "Pus",
+ [rxrpc_call_put_kernel] = "Pke",
[rxrpc_call_put_noqueue] = "PNQ",
};
^ permalink raw reply related
* [PATCH net-next 09/10] rxrpc: Fix prealloc refcounting
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380525614.23135.7539695651371953351.stgit@warthog.procyon.org.uk>
The preallocated call buffer holds a ref on the calls within that buffer.
The ref was being released in the wrong place - it worked okay for incoming
calls to the AFS cache manager service, but doesn't work right for incoming
calls to a userspace service.
Instead of releasing an extra ref service calls in rxrpc_release_call(),
the ref needs to be released during the acceptance/rejectance process. To
this end:
(1) The prealloc ref is now normally released during
rxrpc_new_incoming_call().
(2) For preallocated kernel API calls, the kernel API's ref needs to be
released when the call is discarded on socket close.
(3) We shouldn't take a second ref in rxrpc_accept_call().
(4) rxrpc_recvmsg_new_call() needs to get a ref of its own when it adds
the call to the to_be_accepted socket queue.
In doing (4) above, we would prefer not to put the call's refcount down to
0 as that entails doing cleanup in softirq context, but it's unlikely as
there are several refs held elsewhere, at least one of which must be put by
someone in process context calling rxrpc_release_call(). However, it's not
a problem if we do have to do that.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/call_accept.c | 9 ++++++++-
net/rxrpc/call_object.c | 3 ---
net/rxrpc/recvmsg.c | 1 +
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 5fd9d2c89b7f..26c293ef98eb 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -221,6 +221,7 @@ void rxrpc_discard_prealloc(struct rxrpc_sock *rx)
if (rx->discard_new_call) {
_debug("discard %lx", call->user_call_ID);
rx->discard_new_call(call, call->user_call_ID);
+ rxrpc_put_call(call, rxrpc_call_put_kernel);
}
rxrpc_call_completed(call);
rxrpc_release_call(rx, call);
@@ -402,6 +403,13 @@ found_service:
if (call->state == RXRPC_CALL_SERVER_ACCEPTING)
rxrpc_notify_socket(call);
+ /* We have to discard the prealloc queue's ref here and rely on a
+ * combination of the RCU read lock and refs held either by the socket
+ * (recvmsg queue, to-be-accepted queue or user ID tree) or the kernel
+ * service to prevent the call from being deallocated too early.
+ */
+ rxrpc_put_call(call, rxrpc_call_put);
+
_leave(" = %p{%d}", call, call->debug_id);
out:
spin_unlock(&rx->incoming_lock);
@@ -469,7 +477,6 @@ struct rxrpc_call *rxrpc_accept_call(struct rxrpc_sock *rx,
}
/* formalise the acceptance */
- rxrpc_get_call(call, rxrpc_call_got);
call->notify_rx = notify_rx;
call->user_call_ID = user_call_ID;
rxrpc_get_call(call, rxrpc_call_got_userid);
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 3f9476508204..9aa1c4b53563 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -464,9 +464,6 @@ void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call)
call->rxtx_buffer[i] = NULL;
}
- /* We have to release the prealloc backlog ref */
- if (rxrpc_is_service_call(call))
- rxrpc_put_call(call, rxrpc_call_put);
_leave("");
}
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 16ff56f69256..a284205b8ecf 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -118,6 +118,7 @@ static int rxrpc_recvmsg_new_call(struct rxrpc_sock *rx,
list_del_init(&call->recvmsg_link);
write_unlock_bh(&rx->recvmsg_lock);
+ rxrpc_get_call(call, rxrpc_call_got);
write_lock(&rx->call_lock);
list_add_tail(&call->accept_link, &rx->to_be_accepted);
write_unlock(&rx->call_lock);
^ permalink raw reply related
* [PATCH net-next 10/10] rxrpc: Correctly initialise, limit and transmit call->rx_winsize
From: David Howells @ 2016-09-13 22:22 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380525614.23135.7539695651371953351.stgit@warthog.procyon.org.uk>
call->rx_winsize should be initialised to the sysctl setting and the sysctl
setting should be limited to the maximum we want to permit. Further, we
need to place this in the ACK info instead of the sysctl setting.
Furthermore, discard the idea of accepting the subpackets of a jumbo packet
that lie beyond the receive window when the first packet of the jumbo is
within the window. Just discard the excess subpackets instead. This
allows the receive window to be opened up right to the buffer size less one
for the dead slot.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/ar-internal.h | 3 ++-
net/rxrpc/call_object.c | 2 +-
net/rxrpc/input.c | 23 ++++++++++++++++-------
net/rxrpc/misc.c | 5 ++++-
net/rxrpc/output.c | 4 ++--
net/rxrpc/sysctl.c | 2 +-
6 files changed, 26 insertions(+), 13 deletions(-)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 47c74a581a0f..e78c40b37db5 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -498,6 +498,7 @@ struct rxrpc_call {
*/
#define RXRPC_RXTX_BUFF_SIZE 64
#define RXRPC_RXTX_BUFF_MASK (RXRPC_RXTX_BUFF_SIZE - 1)
+#define RXRPC_INIT_RX_WINDOW_SIZE 32
struct sk_buff **rxtx_buffer;
u8 *rxtx_annotations;
#define RXRPC_TX_ANNO_ACK 0
@@ -518,7 +519,7 @@ struct rxrpc_call {
rxrpc_seq_t rx_expect_next; /* Expected next packet sequence number */
u8 rx_winsize; /* Size of Rx window */
u8 tx_winsize; /* Maximum size of Tx window */
- u8 nr_jumbo_dup; /* Number of jumbo duplicates */
+ u8 nr_jumbo_bad; /* Number of jumbo dups/exceeds-windows */
/* receive-phase ACK management */
u8 ackr_reason; /* reason to ACK */
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 9aa1c4b53563..22f9b0d1a138 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -152,7 +152,7 @@ struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp)
memset(&call->sock_node, 0xed, sizeof(call->sock_node));
/* Leave space in the ring to handle a maxed-out jumbo packet */
- call->rx_winsize = RXRPC_RXTX_BUFF_SIZE - 1 - 46;
+ call->rx_winsize = rxrpc_rx_window_size;
call->tx_winsize = 16;
call->rx_expect_next = 1;
return call;
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 8e529afcd6c1..75af0bd316c7 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -164,7 +164,7 @@ protocol_error:
* (that information is encoded in the ACK packet).
*/
static void rxrpc_input_dup_data(struct rxrpc_call *call, rxrpc_seq_t seq,
- u8 annotation, bool *_jumbo_dup)
+ u8 annotation, bool *_jumbo_bad)
{
/* Discard normal packets that are duplicates. */
if (annotation == 0)
@@ -174,9 +174,9 @@ static void rxrpc_input_dup_data(struct rxrpc_call *call, rxrpc_seq_t seq,
* more partially duplicate jumbo packets, we refuse to take any more
* jumbos for this call.
*/
- if (!*_jumbo_dup) {
- call->nr_jumbo_dup++;
- *_jumbo_dup = true;
+ if (!*_jumbo_bad) {
+ call->nr_jumbo_bad++;
+ *_jumbo_bad = true;
}
}
@@ -191,7 +191,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb,
unsigned int ix;
rxrpc_serial_t serial = sp->hdr.serial, ack_serial = 0;
rxrpc_seq_t seq = sp->hdr.seq, hard_ack;
- bool immediate_ack = false, jumbo_dup = false, queued;
+ bool immediate_ack = false, jumbo_bad = false, queued;
u16 len;
u8 ack = 0, flags, annotation = 0;
@@ -222,7 +222,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb,
flags = sp->hdr.flags;
if (flags & RXRPC_JUMBO_PACKET) {
- if (call->nr_jumbo_dup > 3) {
+ if (call->nr_jumbo_bad > 3) {
ack = RXRPC_ACK_NOSPACE;
ack_serial = serial;
goto ack;
@@ -259,7 +259,7 @@ next_subpacket:
}
if (call->rxtx_buffer[ix]) {
- rxrpc_input_dup_data(call, seq, annotation, &jumbo_dup);
+ rxrpc_input_dup_data(call, seq, annotation, &jumbo_bad);
if (ack != RXRPC_ACK_DUPLICATE) {
ack = RXRPC_ACK_DUPLICATE;
ack_serial = serial;
@@ -304,6 +304,15 @@ skip:
annotation++;
if (flags & RXRPC_JUMBO_PACKET)
annotation |= RXRPC_RX_ANNO_JLAST;
+ if (after(seq, hard_ack + call->rx_winsize)) {
+ ack = RXRPC_ACK_EXCEEDS_WINDOW;
+ ack_serial = serial;
+ if (!jumbo_bad) {
+ call->nr_jumbo_bad++;
+ jumbo_bad = true;
+ }
+ goto ack;
+ }
_proto("Rx DATA Jumbo %%%u", serial);
goto next_subpacket;
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index fd096f742e4b..8b910780f1ac 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -50,7 +50,10 @@ unsigned int rxrpc_idle_ack_delay = 0.5 * HZ;
* limit is hit, we should generate an EXCEEDS_WINDOW ACK and discard further
* packets.
*/
-unsigned int rxrpc_rx_window_size = RXRPC_RXTX_BUFF_SIZE - 46;
+unsigned int rxrpc_rx_window_size = RXRPC_INIT_RX_WINDOW_SIZE;
+#if (RXRPC_RXTX_BUFF_SIZE - 1) < RXRPC_INIT_RX_WINDOW_SIZE
+#error Need to reduce RXRPC_INIT_RX_WINDOW_SIZE
+#endif
/*
* Maximum Rx MTU size. This indicates to the sender the size of jumbo packet
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 719a4c23f09d..90c7722d5779 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -71,10 +71,10 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
mtu = call->conn->params.peer->if_mtu;
mtu -= call->conn->params.peer->hdrsize;
- jmax = (call->nr_jumbo_dup > 3) ? 1 : rxrpc_rx_jumbo_max;
+ jmax = (call->nr_jumbo_bad > 3) ? 1 : rxrpc_rx_jumbo_max;
pkt->ackinfo.rxMTU = htonl(rxrpc_rx_mtu);
pkt->ackinfo.maxMTU = htonl(mtu);
- pkt->ackinfo.rwind = htonl(rxrpc_rx_window_size);
+ pkt->ackinfo.rwind = htonl(call->rx_winsize);
pkt->ackinfo.jumbo_max = htonl(jmax);
*ackp++ = 0;
diff --git a/net/rxrpc/sysctl.c b/net/rxrpc/sysctl.c
index b7ca8cf13c84..a03c61c672f5 100644
--- a/net/rxrpc/sysctl.c
+++ b/net/rxrpc/sysctl.c
@@ -20,7 +20,7 @@ static const unsigned int one = 1;
static const unsigned int four = 4;
static const unsigned int thirtytwo = 32;
static const unsigned int n_65535 = 65535;
-static const unsigned int n_max_acks = RXRPC_MAXACKS;
+static const unsigned int n_max_acks = RXRPC_RXTX_BUFF_SIZE - 1;
/*
* RxRPC operating parameters.
^ permalink raw reply related
* Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
From: Scott Wood @ 2016-09-13 22:24 UTC (permalink / raw)
To: Y.B. Lu, linux-mmc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
ulf.hansson-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
Arnd Bergmann
Cc: Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Russell King, Bhupesh Sharma,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Santosh Shilimkar,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Jochen Friedrich, X.B. Xie,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Rob Herring, linux-i2c-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Claudiu Manoil, Kumar Gala, Leo Li,
linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org,
linux-clk-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <HE1PR04MB08892E63EB2D579D6F0A7550F8FE0-6LN7OEpIatX1kPMWxTxe+c9NdZoXdze2vxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
On Tue, 2016-09-13 at 07:23 +0000, Y.B. Lu wrote:
> >
> >
> > -----Original Message-----
> > From: linux-mmc-owner@vger.kernel.org [mailto:linux-mmc-
> > owner@vger.kernel.org] On Behalf Of Scott Wood
> > Sent: Tuesday, September 13, 2016 7:25 AM
> > To: Y.B. Lu; linux-mmc@vger.kernel.org; ulf.hansson@linaro.org; Arnd
> > Bergmann
> > Cc: linuxppc-dev@lists.ozlabs.org; devicetree@vger.kernel.org; linux-arm-
> > kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> > clk@vger.kernel.org; linux-i2c@vger.kernel.org; iommu@lists.linux-
> > foundation.org; netdev@vger.kernel.org; Mark Rutland; Rob Herring;
> > Russell King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh
> > Sharma; Qiang Zhao; Kumar Gala; Santosh Shilimkar; Leo Li; X.B. Xie
> > Subject: Re: [v11, 5/8] soc: fsl: add GUTS driver for QorIQ platforms
> >
> > BTW, aren't ls2080a and ls2085a the same die? And is there no non-E
> > version of LS2080A/LS2040A?
> [Lu Yangbo-B47093] I checked all the svr values in chip errata doc "Revision
> level to part marking cross-reference" table.
> I found ls2080a and ls2085a were in two separate doc. And I didn’t find non-
> E version of LS2080A/LS2040A in chip errata doc.
> Do you know is there any other doc we can confirm this?
No. Traditionally we've always had E and non-E versions of each chip, but I
have no knowledge of whether that has changed (I do note that the way that E-
status is indicated in SVR has changed).
But please label LS2080A and LS2085A as the same die (or provide strong
evidence that they are not).
>
> >
> >
> > >
> > > > >
> > > > > + do {
> > > > > + if (!matches->soc_id)
> > > > > + return NULL;
> > > > > + if (glob_match(svr_match, matches->soc_id))
> > > > > + break;
> > > > > + } while (matches++);
> > > > Are you expecting "matches++" to ever evaluate as false?
> > > [Lu Yangbo-B47093] Yes, this is used to match the soc we use in
> > > qoriq_soc array until getting true.
> > > We need to get the name and die information defined in array.
> > I'm not asking whether the glob_match will ever return true. I'm saying
> > that "matches++" will never become NULL.
> [Lu Yangbo-B47093] The matches++ will never become NULL while it will return
> NULL after matching for all the members in array.
"matches++" will never "return NULL". It's just an incrementing address. It
won't be null until you wrap around the address space, and even if the other
loop terminators never kicked in you'd crash long before that happens.
Please rewrite the loop as something like:
while (matches->soc_id) {
if (glob_match(...))
return matches;
matches++;
}
return NULL;
> > > > > + /* Register soc device */
> > > > > + soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
> > > > > + if (!soc_dev_attr) {
> > > > > + ret = -ENOMEM;
> > > > > + goto out_unmap;
> > > > > + }
> > > > Couldn't this be statically allocated?
> > > [Lu Yangbo-B47093] Do you mean we define this struct statically ?
> > >
> > > static struct soc_device_attribute soc_dev_attr;
> > Yes.
> >
> [Lu Yangbo-B47093] It's ok to define it statically. Is there any need to do
> that?
It's simpler.
-Scott
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply
* Re: [Intel-wired-lan] [net-next PATCH v3 2/3] e1000: add initial XDP support
From: Rustad, Mark D @ 2016-09-13 22:41 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Eric Dumazet, Tom Herbert, Brenden Blanco,
Linux Kernel Network Developers, intel-wired-lan,
Jesper Dangaard Brouer, Cong Wang, David S. Miller, William Tu
In-Reply-To: <20160913215208.GA40872@ast-mbp.thefacebook.com>
[-- Attachment #1: Type: text/plain, Size: 4687 bytes --]
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Tue, Sep 13, 2016 at 07:14:27PM +0000, Rustad, Mark D wrote:
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>
>>> On Tue, Sep 13, 2016 at 06:28:03PM +0000, Rustad, Mark D wrote:
>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>>>
>>>>> I've looked through qemu and it appears only emulate e1k and tg3.
>>>>> The latter is still used in the field, so the risk of touching
>>>>> it is higher.
>>>>
>>>> I have no idea what makes you think that e1k is *not* "used in the
>>>> field".
>>>> I grant you it probably is used more virtualized than not these days,
>>>> but it
>>>> certainly exists and is used. You can still buy them new at Newegg for
>>>> goodness sakes!
>>>
>>> the point that it's only used virtualized, since PCI (not PCIE) have
>>> been long dead.
>>
>> My point is precisely the opposite. It is a real device, it exists in real
>> systems and it is used in those systems. I worked on embedded systems that
>> ran Linux and used e1000 devices. I am sure they are still out there
>> because
>> customers are still paying for support of those systems.
>>
>> Yes, PCI(-X) is absent from any current hardware and has been for some
>> years
>> now, but there is an installed base that continues. What part of that
>> installed base updates software? I don't know, but I would not just assume
>> that it is 0. I know that I updated the kernel on those embedded systems
>> that I worked on when I was supporting them. Never at the bleeding edge,
>> but
>> generally hopping from one LTS kernel to another as needed.
>
> I suspect modern linux won't boot on such old pci only systems for other
> reasons not related to networking, since no one really cares to test
> kernels there.
Actually it does boot, because although the motherboard was PCIe, the slots
and the adapters in them were PCI-X. So the core architecture was not so
stale.
> So I think we mostly agree. There is chance that this xdp e1k code will
> find a way to that old system. What are the chances those users will
> be using xdp there? I think pretty close to zero.
For sure they wouldn't be using XDP, but they could suffer regressions in a
changed driver that might find its way there. That is the risk.
> The pci-e nics integrated into motherboards that pretend to be tg3
> (though they're not at all build by broadcom) are significantly more
> common.
> That's why I picked e1k instead of tg3.
That may be true (I really don't know anything about tg3 so I certainly
can't dispute it), so the risk could be smaller with e1k, but there is
still a regression risk for real existing hardware. That is my concern.
> Also note how this patch is not touching anything in the main e1k path
> (because I don't have a hw to test and I suspect Intel's driver team
> doesn't have it either) to make sure there is no breakage on those
> old systems. I created separate e1000_xmit_raw_frame() routine
> instead of adding flags into e1000_xmit_frame() for the same reasons:
> to make sure there is no breakage.
> Same reasoning for not doing an union of page/skb as Alexander suggested.
> I wanted minimal change to e1k that allows development xdp programs in kvm
> without affecting e1k main path. If you see the actual bug in the patch,
> please point out the line.
I can't say that I can, because I am not familiar with the internals of
e1k. When I was using it, I never had cause to even look at the driver
because it just worked. My attentions then were properly elsewhere.
My concern is with messing with a driver that probably no one has an active
testbed routinely running regression tests.
Maybe a new ID should be assigned and the driver forked for this purpose.
At least then only the virtualization case would have to be tested. Of
course the hosts would have to offer that ID, but if this is just for
testing that should be ok, or at least possible.
If someone with more internal knowledge of this driver has enough
confidence to bless such patches, that might be fine, but it is a fallacy
to think that e1k is *only* a virtualization driver today. Not yet anyway.
Maybe around 2020.
That said, I can see that you have tried to keep the original code path
pretty much intact. I would note that you introduced rcu calls into the
!bpf path that would never have been done before. While that should be ok,
I would really like to see it tested, at least for the !bpf case, on real
hardware to be sure. I really can't comment on the workaround issue brought
up by Eric, because I just don't know about them. At least that risk seems
to only be in the bpf case.
^ permalink raw reply
* [PATCH net-next 0/4] rxrpc: Support IPv6
From: David Howells @ 2016-09-13 22:41 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
Here is a set of patches that add IPv6 support. They need to be applied on
top of the just-posted miscellaneous fix patches. They are:
(1) Make autobinding of an unconnected socket work when sendmsg() is
called to initiate a client call.
(2) Don't specify the protocol when creating the client socket, but rather
take the default instead.
(3) Use rxrpc_extract_addr_from_skb() in a couple of places that were
doing the same thing manually. This allows the IPv6 address
extraction to be done in fewer places.
(4) Add IPv6 support. With this, calls can be made to IPv6 servers from
userspace AF_RXRPC programs; AFS, however, can't use IPv6 yet as the
RPC calls need to be upgradeable.
The patches can be found here also:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite
Tagged thusly:
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-rewrite-20160913-2
David
---
David Howells (4):
rxrpc: Create an address for sendmsg() to bind unbound socket with
rxrpc: Don't specify protocol to when creating transport socket
rxrpc: Use rxrpc_extract_addr_from_skb() rather than doing this manually
rxrpc: Add IPv6 support
net/rxrpc/af_rxrpc.c | 27 ++++++++++-
net/rxrpc/conn_object.c | 8 +++
net/rxrpc/local_event.c | 13 ++---
net/rxrpc/local_object.c | 39 +++++++---------
net/rxrpc/output.c | 48 +++++++++-----------
net/rxrpc/peer_event.c | 24 ++++++++++
net/rxrpc/peer_object.c | 109 +++++++++++++++++++++++++++++-----------------
net/rxrpc/proc.c | 30 +++++--------
8 files changed, 179 insertions(+), 119 deletions(-)
^ permalink raw reply
* [PATCH net-next 1/4] rxrpc: Create an address for sendmsg() to bind unbound socket with
From: David Howells @ 2016-09-13 22:41 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380649153.30728.6717292274642860064.stgit@warthog.procyon.org.uk>
Create an address for sendmsg() to bind unbound socket with rather than
using a completely blank address otherwise the transport socket creation
will fail because it will try to use address family 0.
We use the address family specified in the protocol argument when the
AF_RXRPC socket was created and SOCK_DGRAM as the default. For anything
else, bind() must be used.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/af_rxrpc.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 25d00ded24bc..741b0d8d2e8c 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -401,6 +401,18 @@ static int rxrpc_sendmsg(struct socket *sock, struct msghdr *m, size_t len)
switch (rx->sk.sk_state) {
case RXRPC_UNBOUND:
+ rx->srx.srx_family = AF_RXRPC;
+ rx->srx.srx_service = 0;
+ rx->srx.transport_type = SOCK_DGRAM;
+ rx->srx.transport.family = rx->family;
+ switch (rx->family) {
+ case AF_INET:
+ rx->srx.transport_len = sizeof(struct sockaddr_in);
+ break;
+ default:
+ ret = -EAFNOSUPPORT;
+ goto error_unlock;
+ }
local = rxrpc_lookup_local(&rx->srx);
if (IS_ERR(local)) {
ret = PTR_ERR(local);
^ permalink raw reply related
* [PATCH net-next 3/4] rxrpc: Use rxrpc_extract_addr_from_skb() rather than doing this manually
From: David Howells @ 2016-09-13 22:41 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380649153.30728.6717292274642860064.stgit@warthog.procyon.org.uk>
There are two places that want to transmit a packet in response to one just
received and manually pick the address to reply to out of the sk_buff.
Make them use rxrpc_extract_addr_from_skb() instead so that IPv6 is handled
automatically.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/local_event.c | 13 +++++--------
net/rxrpc/output.c | 32 ++++++--------------------------
2 files changed, 11 insertions(+), 34 deletions(-)
diff --git a/net/rxrpc/local_event.c b/net/rxrpc/local_event.c
index cdd58e6e9fbd..f073e932500e 100644
--- a/net/rxrpc/local_event.c
+++ b/net/rxrpc/local_event.c
@@ -15,8 +15,6 @@
#include <linux/net.h>
#include <linux/skbuff.h>
#include <linux/slab.h>
-#include <linux/udp.h>
-#include <linux/ip.h>
#include <net/sock.h>
#include <net/af_rxrpc.h>
#include <generated/utsrelease.h>
@@ -33,7 +31,7 @@ static void rxrpc_send_version_request(struct rxrpc_local *local,
{
struct rxrpc_wire_header whdr;
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
- struct sockaddr_in sin;
+ struct sockaddr_rxrpc srx;
struct msghdr msg;
struct kvec iov[2];
size_t len;
@@ -41,12 +39,11 @@ static void rxrpc_send_version_request(struct rxrpc_local *local,
_enter("");
- sin.sin_family = AF_INET;
- sin.sin_port = udp_hdr(skb)->source;
- sin.sin_addr.s_addr = ip_hdr(skb)->saddr;
+ if (rxrpc_extract_addr_from_skb(&srx, skb) < 0)
+ return;
- msg.msg_name = &sin;
- msg.msg_namelen = sizeof(sin);
+ msg.msg_name = &srx.transport;
+ msg.msg_namelen = srx.transport_len;
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_flags = 0;
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 90c7722d5779..ec3621f2c5c8 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -15,8 +15,6 @@
#include <linux/gfp.h>
#include <linux/skbuff.h>
#include <linux/export.h>
-#include <linux/udp.h>
-#include <linux/ip.h>
#include <net/sock.h>
#include <net/af_rxrpc.h>
#include "ar-internal.h"
@@ -272,10 +270,7 @@ send_fragmentable:
*/
void rxrpc_reject_packets(struct rxrpc_local *local)
{
- union {
- struct sockaddr sa;
- struct sockaddr_in sin;
- } sa;
+ struct sockaddr_rxrpc srx;
struct rxrpc_skb_priv *sp;
struct rxrpc_wire_header whdr;
struct sk_buff *skb;
@@ -292,32 +287,21 @@ void rxrpc_reject_packets(struct rxrpc_local *local)
iov[1].iov_len = sizeof(code);
size = sizeof(whdr) + sizeof(code);
- msg.msg_name = &sa;
+ msg.msg_name = &srx.transport;
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_flags = 0;
- memset(&sa, 0, sizeof(sa));
- sa.sa.sa_family = local->srx.transport.family;
- switch (sa.sa.sa_family) {
- case AF_INET:
- msg.msg_namelen = sizeof(sa.sin);
- break;
- default:
- msg.msg_namelen = 0;
- break;
- }
-
memset(&whdr, 0, sizeof(whdr));
whdr.type = RXRPC_PACKET_TYPE_ABORT;
while ((skb = skb_dequeue(&local->reject_queue))) {
rxrpc_see_skb(skb);
sp = rxrpc_skb(skb);
- switch (sa.sa.sa_family) {
- case AF_INET:
- sa.sin.sin_port = udp_hdr(skb)->source;
- sa.sin.sin_addr.s_addr = ip_hdr(skb)->saddr;
+
+ if (rxrpc_extract_addr_from_skb(&srx, skb) == 0) {
+ msg.msg_namelen = srx.transport_len;
+
code = htonl(skb->priority);
whdr.epoch = htonl(sp->hdr.epoch);
@@ -329,10 +313,6 @@ void rxrpc_reject_packets(struct rxrpc_local *local)
whdr.flags &= RXRPC_CLIENT_INITIATED;
kernel_sendmsg(local->socket, &msg, iov, 2, size);
- break;
-
- default:
- break;
}
rxrpc_free_skb(skb);
^ permalink raw reply related
* [PATCH net-next 2/4] rxrpc: Don't specify protocol to when creating transport socket
From: David Howells @ 2016-09-13 22:41 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380649153.30728.6717292274642860064.stgit@warthog.procyon.org.uk>
Pass 0 as the protocol argument when creating the transport socket rather
than IPPROTO_UDP.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/local_object.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index 782b9adf67cb..8720be2a6250 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -103,8 +103,8 @@ static int rxrpc_open_socket(struct rxrpc_local *local)
_enter("%p{%d}", local, local->srx.transport_type);
/* create a socket to represent the local endpoint */
- ret = sock_create_kern(&init_net, PF_INET, local->srx.transport_type,
- IPPROTO_UDP, &local->socket);
+ ret = sock_create_kern(&init_net, local->srx.transport.family,
+ local->srx.transport_type, 0, &local->socket);
if (ret < 0) {
_leave(" = %d [socket]", ret);
return ret;
^ permalink raw reply related
* [PATCH net-next 4/4] rxrpc: Add IPv6 support
From: David Howells @ 2016-09-13 22:41 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147380649153.30728.6717292274642860064.stgit@warthog.procyon.org.uk>
Add IPv6 support to AF_RXRPC. With this, AF_RXRPC sockets can be created:
service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET6);
instead of:
service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
The AFS filesystem doesn't support IPv6 at the moment, though, since that
requires upgrades to some of the RPC calls.
Note that a good portion of this patch is replacing "%pI4:%u" in print
statements with "%pISpc" which is able to handle both protocols and print
the port.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/af_rxrpc.c | 15 +++++-
net/rxrpc/conn_object.c | 8 +++
net/rxrpc/local_object.c | 35 ++++++---------
net/rxrpc/output.c | 16 +++++++
net/rxrpc/peer_event.c | 24 ++++++++++
net/rxrpc/peer_object.c | 109 +++++++++++++++++++++++++++++-----------------
net/rxrpc/proc.c | 30 +++++--------
7 files changed, 154 insertions(+), 83 deletions(-)
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 741b0d8d2e8c..f61f7b2d1ca4 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -106,19 +106,23 @@ static int rxrpc_validate_address(struct rxrpc_sock *rx,
case AF_INET:
if (srx->transport_len < sizeof(struct sockaddr_in))
return -EINVAL;
- _debug("INET: %x @ %pI4",
- ntohs(srx->transport.sin.sin_port),
- &srx->transport.sin.sin_addr);
tail = offsetof(struct sockaddr_rxrpc, transport.sin.__pad);
break;
case AF_INET6:
+ if (srx->transport_len < sizeof(struct sockaddr_in6))
+ return -EINVAL;
+ tail = offsetof(struct sockaddr_rxrpc, transport) +
+ sizeof(struct sockaddr_in6);
+ break;
+
default:
return -EAFNOSUPPORT;
}
if (tail < len)
memset((void *)srx + tail, 0, len - tail);
+ _debug("INET: %pISp", &srx->transport);
return 0;
}
@@ -409,6 +413,9 @@ static int rxrpc_sendmsg(struct socket *sock, struct msghdr *m, size_t len)
case AF_INET:
rx->srx.transport_len = sizeof(struct sockaddr_in);
break;
+ case AF_INET6:
+ rx->srx.transport_len = sizeof(struct sockaddr_in6);
+ break;
default:
ret = -EAFNOSUPPORT;
goto error_unlock;
@@ -563,7 +570,7 @@ static int rxrpc_create(struct net *net, struct socket *sock, int protocol,
return -EAFNOSUPPORT;
/* we support transport protocol UDP/UDP6 only */
- if (protocol != PF_INET)
+ if (protocol != PF_INET && protocol != PF_INET6)
return -EPROTONOSUPPORT;
if (sock->type != SOCK_DGRAM)
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index ffa9addb97b2..c0ddba787fd4 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -134,6 +134,14 @@ struct rxrpc_connection *rxrpc_find_connection_rcu(struct rxrpc_local *local,
srx.transport.sin.sin_addr.s_addr)
goto not_found;
break;
+ case AF_INET6:
+ if (peer->srx.transport.sin6.sin6_port !=
+ srx.transport.sin6.sin6_port ||
+ memcmp(&peer->srx.transport.sin6.sin6_addr,
+ &srx.transport.sin6.sin6_addr,
+ sizeof(struct in6_addr)) != 0)
+ goto not_found;
+ break;
default:
BUG();
}
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index 8720be2a6250..f5b9bb0d3f98 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -58,6 +58,15 @@ static long rxrpc_local_cmp_key(const struct rxrpc_local *local,
memcmp(&local->srx.transport.sin.sin_addr,
&srx->transport.sin.sin_addr,
sizeof(struct in_addr));
+ case AF_INET6:
+ /* If the choice of UDP6 port is left up to the transport, then
+ * the endpoint record doesn't match.
+ */
+ return ((u16 __force)local->srx.transport.sin6.sin6_port -
+ (u16 __force)srx->transport.sin6.sin6_port) ?:
+ memcmp(&local->srx.transport.sin6.sin6_addr,
+ &srx->transport.sin6.sin6_addr,
+ sizeof(struct in6_addr));
default:
BUG();
}
@@ -100,7 +109,8 @@ static int rxrpc_open_socket(struct rxrpc_local *local)
struct sock *sock;
int ret, opt;
- _enter("%p{%d}", local, local->srx.transport_type);
+ _enter("%p{%d,%d}",
+ local, local->srx.transport_type, local->srx.transport.family);
/* create a socket to represent the local endpoint */
ret = sock_create_kern(&init_net, local->srx.transport.family,
@@ -169,18 +179,8 @@ struct rxrpc_local *rxrpc_lookup_local(const struct sockaddr_rxrpc *srx)
long diff;
int ret;
- if (srx->transport.family == AF_INET) {
- _enter("{%d,%u,%pI4+%hu}",
- srx->transport_type,
- srx->transport.family,
- &srx->transport.sin.sin_addr,
- ntohs(srx->transport.sin.sin_port));
- } else {
- _enter("{%d,%u}",
- srx->transport_type,
- srx->transport.family);
- return ERR_PTR(-EAFNOSUPPORT);
- }
+ _enter("{%d,%d,%pISp}",
+ srx->transport_type, srx->transport.family, &srx->transport);
mutex_lock(&rxrpc_local_mutex);
@@ -233,13 +233,8 @@ struct rxrpc_local *rxrpc_lookup_local(const struct sockaddr_rxrpc *srx)
found:
mutex_unlock(&rxrpc_local_mutex);
- _net("LOCAL %s %d {%d,%u,%pI4+%hu}",
- age,
- local->debug_id,
- local->srx.transport_type,
- local->srx.transport.family,
- &local->srx.transport.sin.sin_addr,
- ntohs(local->srx.transport.sin.sin_port));
+ _net("LOCAL %s %d {%pISp}",
+ age, local->debug_id, &local->srx.transport);
_leave(" = %p", local);
return local;
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index ec3621f2c5c8..d7cd87f17f0d 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -258,6 +258,22 @@ send_fragmentable:
(char *)&opt, sizeof(opt));
}
break;
+
+ case AF_INET6:
+ opt = IPV6_PMTUDISC_DONT;
+ ret = kernel_setsockopt(conn->params.local->socket,
+ SOL_IPV6, IPV6_MTU_DISCOVER,
+ (char *)&opt, sizeof(opt));
+ if (ret == 0) {
+ ret = kernel_sendmsg(conn->params.local->socket, &msg,
+ iov, 1, iov[0].iov_len);
+
+ opt = IPV6_PMTUDISC_DO;
+ kernel_setsockopt(conn->params.local->socket,
+ SOL_IPV6, IPV6_MTU_DISCOVER,
+ (char *)&opt, sizeof(opt));
+ }
+ break;
}
up_write(&conn->params.local->defrag_sem);
diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index c8948936c6fc..74217589cf44 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -66,6 +66,30 @@ static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local,
}
break;
+ case AF_INET6:
+ srx.transport.sin6.sin6_port = serr->port;
+ srx.transport_len = sizeof(struct sockaddr_in6);
+ switch (serr->ee.ee_origin) {
+ case SO_EE_ORIGIN_ICMP6:
+ _net("Rx ICMP6");
+ memcpy(&srx.transport.sin6.sin6_addr,
+ skb_network_header(skb) + serr->addr_offset,
+ sizeof(struct in6_addr));
+ break;
+ case SO_EE_ORIGIN_ICMP:
+ _net("Rx ICMP on v6 sock");
+ memcpy(&srx.transport.sin6.sin6_addr.s6_addr + 12,
+ skb_network_header(skb) + serr->addr_offset,
+ sizeof(struct in_addr));
+ break;
+ default:
+ memcpy(&srx.transport.sin6.sin6_addr,
+ &ipv6_hdr(skb)->saddr,
+ sizeof(struct in6_addr));
+ break;
+ }
+ break;
+
default:
BUG();
}
diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c
index 3e6cd174b53d..dfc07b41a472 100644
--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -16,12 +16,14 @@
#include <linux/skbuff.h>
#include <linux/udp.h>
#include <linux/in.h>
+#include <linux/in6.h>
#include <linux/slab.h>
#include <linux/hashtable.h>
#include <net/sock.h>
#include <net/af_rxrpc.h>
#include <net/ip.h>
#include <net/route.h>
+#include <net/ip6_route.h>
#include "ar-internal.h"
static DEFINE_HASHTABLE(rxrpc_peer_hash, 10);
@@ -50,6 +52,11 @@ static unsigned long rxrpc_peer_hash_key(struct rxrpc_local *local,
size = sizeof(srx->transport.sin.sin_addr);
p = (u16 *)&srx->transport.sin.sin_addr;
break;
+ case AF_INET6:
+ hash_key += (u16 __force)srx->transport.sin.sin_port;
+ size = sizeof(srx->transport.sin6.sin6_addr);
+ p = (u16 *)&srx->transport.sin6.sin6_addr;
+ break;
default:
WARN(1, "AF_RXRPC: Unsupported transport address family\n");
return 0;
@@ -93,6 +100,12 @@ static long rxrpc_peer_cmp_key(const struct rxrpc_peer *peer,
memcmp(&peer->srx.transport.sin.sin_addr,
&srx->transport.sin.sin_addr,
sizeof(struct in_addr));
+ case AF_INET6:
+ return ((u16 __force)peer->srx.transport.sin6.sin6_port -
+ (u16 __force)srx->transport.sin6.sin6_port) ?:
+ memcmp(&peer->srx.transport.sin6.sin6_addr,
+ &srx->transport.sin6.sin6_addr,
+ sizeof(struct in6_addr));
default:
BUG();
}
@@ -130,17 +143,7 @@ struct rxrpc_peer *rxrpc_lookup_peer_rcu(struct rxrpc_local *local,
peer = __rxrpc_lookup_peer_rcu(local, srx, hash_key);
if (peer) {
- switch (srx->transport.family) {
- case AF_INET:
- _net("PEER %d {%d,%u,%pI4+%hu}",
- peer->debug_id,
- peer->srx.transport_type,
- peer->srx.transport.family,
- &peer->srx.transport.sin.sin_addr,
- ntohs(peer->srx.transport.sin.sin_port));
- break;
- }
-
+ _net("PEER %d {%pISp}", peer->debug_id, &peer->srx.transport);
_leave(" = %p {u=%d}", peer, atomic_read(&peer->usage));
}
return peer;
@@ -152,22 +155,49 @@ struct rxrpc_peer *rxrpc_lookup_peer_rcu(struct rxrpc_local *local,
*/
static void rxrpc_assess_MTU_size(struct rxrpc_peer *peer)
{
+ struct dst_entry *dst;
struct rtable *rt;
- struct flowi4 fl4;
+ struct flowi fl;
+ struct flowi4 *fl4 = &fl.u.ip4;
+ struct flowi6 *fl6 = &fl.u.ip6;
peer->if_mtu = 1500;
- rt = ip_route_output_ports(&init_net, &fl4, NULL,
- peer->srx.transport.sin.sin_addr.s_addr, 0,
- htons(7000), htons(7001),
- IPPROTO_UDP, 0, 0);
- if (IS_ERR(rt)) {
- _leave(" [route err %ld]", PTR_ERR(rt));
- return;
+ memset(&fl, 0, sizeof(fl));
+ switch (peer->srx.transport.family) {
+ case AF_INET:
+ rt = ip_route_output_ports(
+ &init_net, fl4, NULL,
+ peer->srx.transport.sin.sin_addr.s_addr, 0,
+ htons(7000), htons(7001), IPPROTO_UDP, 0, 0);
+ if (IS_ERR(rt)) {
+ _leave(" [route err %ld]", PTR_ERR(rt));
+ return;
+ }
+ dst = &rt->dst;
+ break;
+
+ case AF_INET6:
+ fl6->flowi6_iif = LOOPBACK_IFINDEX;
+ fl6->flowi6_scope = RT_SCOPE_UNIVERSE;
+ fl6->flowi6_proto = IPPROTO_UDP;
+ memcpy(&fl6->daddr, &peer->srx.transport.sin6.sin6_addr,
+ sizeof(struct in6_addr));
+ fl6->fl6_dport = htons(7001);
+ fl6->fl6_sport = htons(7000);
+ dst = ip6_route_output(&init_net, NULL, fl6);
+ if (IS_ERR(dst)) {
+ _leave(" [route err %ld]", PTR_ERR(dst));
+ return;
+ }
+ break;
+
+ default:
+ BUG();
}
- peer->if_mtu = dst_mtu(&rt->dst);
- dst_release(&rt->dst);
+ peer->if_mtu = dst_mtu(dst);
+ dst_release(dst);
_leave(" [if_mtu %u]", peer->if_mtu);
}
@@ -207,17 +237,22 @@ static void rxrpc_init_peer(struct rxrpc_peer *peer, unsigned long hash_key)
rxrpc_assess_MTU_size(peer);
peer->mtu = peer->if_mtu;
- if (peer->srx.transport.family == AF_INET) {
+ switch (peer->srx.transport.family) {
+ case AF_INET:
peer->hdrsize = sizeof(struct iphdr);
- switch (peer->srx.transport_type) {
- case SOCK_DGRAM:
- peer->hdrsize += sizeof(struct udphdr);
- break;
- default:
- BUG();
- break;
- }
- } else {
+ break;
+ case AF_INET6:
+ peer->hdrsize = sizeof(struct ipv6hdr);
+ break;
+ default:
+ BUG();
+ }
+
+ switch (peer->srx.transport_type) {
+ case SOCK_DGRAM:
+ peer->hdrsize += sizeof(struct udphdr);
+ break;
+ default:
BUG();
}
@@ -285,11 +320,7 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *local,
struct rxrpc_peer *peer, *candidate;
unsigned long hash_key = rxrpc_peer_hash_key(local, srx);
- _enter("{%d,%d,%pI4+%hu}",
- srx->transport_type,
- srx->transport_len,
- &srx->transport.sin.sin_addr,
- ntohs(srx->transport.sin.sin_port));
+ _enter("{%pISp}", &srx->transport);
/* search the peer list first */
rcu_read_lock();
@@ -326,11 +357,7 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *local,
peer = candidate;
}
- _net("PEER %d {%d,%pI4+%hu}",
- peer->debug_id,
- peer->srx.transport_type,
- &peer->srx.transport.sin.sin_addr,
- ntohs(peer->srx.transport.sin.sin_port));
+ _net("PEER %d {%pISp}", peer->debug_id, &peer->srx.transport);
_leave(" = %p {u=%d}", peer, atomic_read(&peer->usage));
return peer;
diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c
index d529d1b4021c..65cd980767fa 100644
--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -52,11 +52,12 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
struct rxrpc_sock *rx;
struct rxrpc_peer *peer;
struct rxrpc_call *call;
- char lbuff[4 + 4 + 4 + 4 + 5 + 1], rbuff[4 + 4 + 4 + 4 + 5 + 1];
+ char lbuff[50], rbuff[50];
if (v == &rxrpc_calls) {
seq_puts(seq,
- "Proto Local Remote "
+ "Proto Local "
+ " Remote "
" SvID ConnID CallID End Use State Abort "
" UserID\n");
return 0;
@@ -68,9 +69,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
if (rx) {
local = READ_ONCE(rx->local);
if (local)
- sprintf(lbuff, "%pI4:%u",
- &local->srx.transport.sin.sin_addr,
- ntohs(local->srx.transport.sin.sin_port));
+ sprintf(lbuff, "%pISpc", &local->srx.transport);
else
strcpy(lbuff, "no_local");
} else {
@@ -79,14 +78,12 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
peer = call->peer;
if (peer)
- sprintf(rbuff, "%pI4:%u",
- &peer->srx.transport.sin.sin_addr,
- ntohs(peer->srx.transport.sin.sin_port));
+ sprintf(rbuff, "%pISpc", &peer->srx.transport);
else
strcpy(rbuff, "no_connection");
seq_printf(seq,
- "UDP %-22.22s %-22.22s %4x %08x %08x %s %3u"
+ "UDP %-47.47s %-47.47s %4x %08x %08x %s %3u"
" %-8.8s %08x %lx\n",
lbuff,
rbuff,
@@ -145,11 +142,12 @@ static void rxrpc_connection_seq_stop(struct seq_file *seq, void *v)
static int rxrpc_connection_seq_show(struct seq_file *seq, void *v)
{
struct rxrpc_connection *conn;
- char lbuff[4 + 4 + 4 + 4 + 5 + 1], rbuff[4 + 4 + 4 + 4 + 5 + 1];
+ char lbuff[50], rbuff[50];
if (v == &rxrpc_connection_proc_list) {
seq_puts(seq,
- "Proto Local Remote "
+ "Proto Local "
+ " Remote "
" SvID ConnID End Use State Key "
" Serial ISerial\n"
);
@@ -163,16 +161,12 @@ static int rxrpc_connection_seq_show(struct seq_file *seq, void *v)
goto print;
}
- sprintf(lbuff, "%pI4:%u",
- &conn->params.local->srx.transport.sin.sin_addr,
- ntohs(conn->params.local->srx.transport.sin.sin_port));
+ sprintf(lbuff, "%pISpc", &conn->params.local->srx.transport);
- sprintf(rbuff, "%pI4:%u",
- &conn->params.peer->srx.transport.sin.sin_addr,
- ntohs(conn->params.peer->srx.transport.sin.sin_port));
+ sprintf(rbuff, "%pISpc", &conn->params.peer->srx.transport);
print:
seq_printf(seq,
- "UDP %-22.22s %-22.22s %4x %08x %s %3u"
+ "UDP %-47.47s %-47.47s %4x %08x %s %3u"
" %s %08x %08x %08x\n",
lbuff,
rbuff,
^ permalink raw reply related
* Modification to skb->queue_mapping affecting performance
From: Michael Ma @ 2016-09-13 22:59 UTC (permalink / raw)
To: Linux Kernel Network Developers
Hi -
We currently use mqprio on ifb to work around the qdisc root lock
contention on the receiver side. The problem that we found was that
queue_mapping is already set when redirecting from ingress qdisc to
ifb (based on RX selection, I guess?) so the TX queue selection is not
based on priority.
Then we implemented a filter which can set skb->queue_mapping to 0 so
that TX queue selection can be done as expected and flows with
different priorities will go through different TX queues. However with
the queue_mapping recomputed, we found the achievable bandwidth with
small packets (512 bytes) dropped significantly if they're targeting
different queues. From perf profile I don't see any bottleneck from
CPU perspective.
Any thoughts on why modifying queue_mapping will have this kind of
effect? Also is there any better way of achieving receiver side
throttling using HTB while avoiding the qdisc root lock on ifb?
Thanks,
Michael
^ permalink raw reply
* re
From: Mrs. Maria-Elisabeth Schaeffler @ 2016-09-13 23:04 UTC (permalink / raw)
Did you get my message?
^ permalink raw reply
* Re: Modification to skb->queue_mapping affecting performance
From: Eric Dumazet @ 2016-09-13 23:07 UTC (permalink / raw)
To: Michael Ma; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAAmHdhxyrwJXHysUU6KoXeTaRHK8Mxr5PH9j4drR_87T+Wnypg@mail.gmail.com>
On Tue, 2016-09-13 at 15:59 -0700, Michael Ma wrote:
> Hi -
>
> We currently use mqprio on ifb to work around the qdisc root lock
> contention on the receiver side. The problem that we found was that
> queue_mapping is already set when redirecting from ingress qdisc to
> ifb (based on RX selection, I guess?) so the TX queue selection is not
> based on priority.
>
> Then we implemented a filter which can set skb->queue_mapping to 0 so
> that TX queue selection can be done as expected and flows with
> different priorities will go through different TX queues. However with
> the queue_mapping recomputed, we found the achievable bandwidth with
> small packets (512 bytes) dropped significantly if they're targeting
> different queues. From perf profile I don't see any bottleneck from
> CPU perspective.
>
> Any thoughts on why modifying queue_mapping will have this kind of
> effect? Also is there any better way of achieving receiver side
> throttling using HTB while avoiding the qdisc root lock on ifb?
But, how many queues do you have on your NIC, and have you setup ifb to
have a same number of queues ?
There is no qdisc lock contention anymore AFAIK, since each cpu will use
a dedicate IFB queue and tasklet.
^ permalink raw reply
* Re: [Intel-wired-lan] [net-next PATCH v3 2/3] e1000: add initial XDP support
From: Francois Romieu @ 2016-09-13 23:17 UTC (permalink / raw)
To: Rustad, Mark D
Cc: Alexei Starovoitov, Eric Dumazet, Tom Herbert, Brenden Blanco,
Linux Kernel Network Developers, intel-wired-lan, William Tu,
Jesper Dangaard Brouer, Cong Wang, David S. Miller
In-Reply-To: <AB6E7248-F1A3-4C9C-9249-7F25E51BA1E4@intel.com>
Rustad, Mark D <mark.d.rustad@intel.com> :
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
[...]
> > the point that it's only used virtualized, since PCI (not PCIE) have
> > been long dead.
>
> My point is precisely the opposite. It is a real device, it exists in real
> systems and it is used in those systems. I worked on embedded systems that
> ran Linux and used e1000 devices. I am sure they are still out there because
> customers are still paying for support of those systems.
Old PCI is not the bulk of my professional hardware but it's still used here,
both for networking and video. No embedded systems. Mostly dumb file serving
ranging from quad core to i5 and xeon. Recent kernels.
> The day is coming when all the motherboards with PCI(-X) will be gone, but I
> think it is still at least a few years off.
Add some time for the PCI <-> PCIe adapters to disappear as well. :o)
--
Ueimor
^ permalink raw reply
* Re: [Intel-wired-lan] [net-next PATCH v3 2/3] e1000: add initial XDP support
From: Alexei Starovoitov @ 2016-09-13 23:40 UTC (permalink / raw)
To: Rustad, Mark D
Cc: Eric Dumazet, Tom Herbert, Brenden Blanco,
Linux Kernel Network Developers, intel-wired-lan,
Jesper Dangaard Brouer, Cong Wang, David S. Miller, William Tu
In-Reply-To: <151DF165-61A2-428A-BD24-04E7704F9724@intel.com>
On Tue, Sep 13, 2016 at 10:41:12PM +0000, Rustad, Mark D wrote:
> That said, I can see that you have tried to keep the original code path
> pretty much intact. I would note that you introduced rcu calls into the !bpf
> path that would never have been done before. While that should be ok, I
> would really like to see it tested, at least for the !bpf case, on real
> hardware to be sure.
please go ahead and test. rcu_read_lock is zero extra instructions
for everything but preempt or debug kernels.
^ permalink raw reply
* Re: [PATCH v2] bnx2: Reset device during driver initialization
From: Baoquan He @ 2016-09-13 23:50 UTC (permalink / raw)
To: David Miller
Cc: netdev, sony.chacko, Dept-HSGLinuxNICDev, linux-kernel, kexec,
joro, dyoung
In-Reply-To: <20160913.112543.283011433711221158.davem@davemloft.net>
On 09/13/16 at 11:25am, David Miller wrote:
>
> Just to be clear, I did actually apply this v2 of the patch
> rather than the initial version.:)
Thanks a lot!
^ permalink raw reply
* Re: Modification to skb->queue_mapping affecting performance
From: Eric Dumazet @ 2016-09-14 0:09 UTC (permalink / raw)
To: Michael Ma; +Cc: netdev
In-Reply-To: <CAAmHdhw6JXYtYNjcJchieW-KcwRkNnFmXTRTXrLQS_Kh+rfP2w@mail.gmail.com>
On Tue, 2016-09-13 at 16:30 -0700, Michael Ma wrote:
> The RX queue number I found from "ls /sys/class/net/eth0/queues" is
> 64. (is this the correct way of identifying the queue number on NIC?)
> I setup ifb with 24 queues which is equal to the TX queue number of
> eth0 and also the number of CPU cores.
Please do not drop netdev@ from this mail exchange.
ethtool -l eth0
>
> > There is no qdisc lock contention anymore AFAIK, since each cpu will use
> > a dedicate IFB queue and tasklet.
> >
> How is this achieved? I thought qdisc on ifb will still be protected
> by the qdisc root lock in __dev_xmit_skb() so essentially all threads
> processing qdisc are still serialized without using MQ?
You have to properly setup ifb/mq like in :
# netem based setup, installed at receiver side only
ETH=eth0
IFB=ifb10
#DELAY="delay 100ms"
EST="est 1sec 4sec"
#REORDER=1000us
#LOSS="loss 2.0"
TXQ=24 # change this to number of TX queues on the physical NIC
ip link add $IFB numtxqueues $TXQ type ifb
ip link set dev $IFB up
tc qdisc del dev $ETH ingress 2>/dev/null
tc qdisc add dev $ETH ingress 2>/dev/null
tc filter add dev $ETH parent ffff: \
protocol ip u32 match u32 0 0 flowid 1:1 \
action mirred egress redirect dev $IFB
tc qdisc del dev $IFB root 2>/dev/null
tc qdisc add dev $IFB root handle 1: mq
for i in `seq 1 $TXQ`
do
slot=$( printf %x $(( i )) )
tc qd add dev $IFB parent 1:$slot $EST netem \
limit 100000 $DELAY $REORDER $LOSS
done
^ permalink raw reply
* Re: [Intel-wired-lan] [net-next PATCH v3 2/3] e1000: add initial XDP support
From: Rustad, Mark D @ 2016-09-14 0:13 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Eric Dumazet, Tom Herbert, Brenden Blanco,
Linux Kernel Network Developers, intel-wired-lan,
Jesper Dangaard Brouer, Cong Wang, David S. Miller, William Tu
In-Reply-To: <20160913234039.GA42336@ast-mbp.thefacebook.com>
[-- Attachment #1: Type: text/plain, Size: 1064 bytes --]
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Tue, Sep 13, 2016 at 10:41:12PM +0000, Rustad, Mark D wrote:
>> That said, I can see that you have tried to keep the original code path
>> pretty much intact. I would note that you introduced rcu calls into the
>> !bpf
>> path that would never have been done before. While that should be ok, I
>> would really like to see it tested, at least for the !bpf case, on real
>> hardware to be sure.
>
> please go ahead and test. rcu_read_lock is zero extra instructions
> for everything but preempt or debug kernels.
Well, I don't have any hardware in hand to test with, though my former
employer would. I guess my current employer would too! :-) FWIW, the kernel
used in that system I referred to before was a preempt kernel.
The test matrix is large, the tail is long and you can't just gloss these
things over. I understand that it isn't the focus of your work, just as
regression testing the e1000 is not the focus of any of our work any more.
That is precisely why it is a sensitive area.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox