* [PATCH net-next 3/3] udp: keep the sk_receive_queue held when splicing
From: Paolo Abeni @ 2017-05-15 9:01 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Eric Dumazet
In-Reply-To: <cover.1494837879.git.pabeni@redhat.com>
On packet reception, when we are forced to splice the
sk_receive_queue, we can keep the related lock held, so
that we can avoid re-acquiring it, if fwd memory
scheduling is required.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/ipv4/udp.c | 36 ++++++++++++++++++++++++++----------
1 file changed, 26 insertions(+), 10 deletions(-)
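The locking change described above can be modelled in user space. The following is only a minimal sketch under stated assumptions: the demo_* names, the fake lock and the 4096-byte quantum are hypothetical stand-ins, not the kernel's types or the value of SK_MEM_QUANTUM on any particular system. It shows a release helper that takes the queue lock only when the caller does not already hold it, together with the quantum rounding used for the forward-allocated memory.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MEM_QUANTUM 4096	/* assumption: stands in for SK_MEM_QUANTUM */

/* Hypothetical lock that only records who holds it and how often it is taken. */
struct demo_lock {
	bool held;
	unsigned int acquisitions;
};

static void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);	/* a real spinlock would deadlock here */
	l->held = true;
	l->acquisitions++;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct demo_udp_sock {
	struct demo_lock rx_lock;	/* models sk_receive_queue.lock */
	int forward_alloc;		/* models sk_forward_alloc */
	int reclaimed;			/* bytes handed back to the global pool */
};

/* Take the rx lock only if the caller has not taken it already. */
static void demo_rmem_release(struct demo_udp_sock *sk, int size, int partial,
			      bool rx_lock_held)
{
	int amt;

	if (!rx_lock_held)
		demo_lock_acquire(&sk->rx_lock);

	sk->forward_alloc += size;
	/* round down to whole quanta; "partial" keeps the last quantum around */
	amt = (sk->forward_alloc - partial) & ~(DEMO_MEM_QUANTUM - 1);
	sk->forward_alloc -= amt;
	sk->reclaimed += amt;

	if (!rx_lock_held)
		demo_lock_release(&sk->rx_lock);
}
```

In this model, a caller that already holds the lock passes true and the helper performs no extra acquisition, which is the saving the patch is after on the splice path.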
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 492c76b..d698973 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1164,7 +1164,8 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
}
/* fully reclaim rmem/fwd memory allocated for skb */
-static void udp_rmem_release(struct sock *sk, int size, int partial)
+static void udp_rmem_release(struct sock *sk, int size, int partial,
+ int rx_queue_lock_held)
{
struct udp_sock *up = udp_sk(sk);
struct sk_buff_head *sk_queue;
@@ -1181,9 +1182,13 @@ static void udp_rmem_release(struct sock *sk, int size, int partial)
}
up->forward_deficit = 0;
- /* acquire the sk_receive_queue for fwd allocated memory scheduling */
+ /* acquire the sk_receive_queue for fwd allocated memory scheduling,
+ * if the caller doesn't hold it already
+ */
sk_queue = &sk->sk_receive_queue;
- spin_lock(&sk_queue->lock);
+ if (!rx_queue_lock_held)
+ spin_lock(&sk_queue->lock);
+
sk->sk_forward_alloc += size;
amt = (sk->sk_forward_alloc - partial) & ~(SK_MEM_QUANTUM - 1);
@@ -1197,7 +1202,8 @@ static void udp_rmem_release(struct sock *sk, int size, int partial)
/* this can save us from acquiring the rx queue lock on next receive */
skb_queue_splice_tail_init(sk_queue, &up->reader_queue);
- spin_unlock(&sk_queue->lock);
+ if (!rx_queue_lock_held)
+ spin_unlock(&sk_queue->lock);
}
/* Note: called with reader_queue.lock held.
@@ -1207,10 +1213,16 @@ static void udp_rmem_release(struct sock *sk, int size, int partial)
*/
void udp_skb_destructor(struct sock *sk, struct sk_buff *skb)
{
- udp_rmem_release(sk, skb->dev_scratch, 1);
+ udp_rmem_release(sk, skb->dev_scratch, 1, 0);
}
EXPORT_SYMBOL(udp_skb_destructor);
+/* as above, but the caller holds the rx queue lock, too */
+void udp_skb_dtor_locked(struct sock *sk, struct sk_buff *skb)
+{
+ udp_rmem_release(sk, skb->dev_scratch, 1, 1);
+}
+
/* Idea of busylocks is to let producers grab an extra spinlock
* to relieve pressure on the receive_queue spinlock shared by consumer.
* Under flood, this means that only one producer can be in line
@@ -1325,7 +1337,7 @@ void udp_destruct_sock(struct sock *sk)
total += skb->truesize;
kfree_skb(skb);
}
- udp_rmem_release(sk, total, 0);
+ udp_rmem_release(sk, total, 0, 1);
inet_sock_destruct(sk);
}
@@ -1397,7 +1409,7 @@ static int first_packet_length(struct sock *sk)
}
res = skb ? skb->len : -1;
if (total)
- udp_rmem_release(sk, total, 1);
+ udp_rmem_release(sk, total, 1, 0);
spin_unlock_bh(&rcvq->lock);
return res;
}
@@ -1471,16 +1483,20 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
goto busy_check;
}
- /* refill the reader queue and walk it again */
+ /* refill the reader queue and walk it again
+ * keep both queues locked to avoid re-acquiring
+ * the sk_receive_queue lock if fwd memory scheduling
+ * is needed.
+ */
_off = *off;
spin_lock(&sk_queue->lock);
skb_queue_splice_tail_init(sk_queue, queue);
- spin_unlock(&sk_queue->lock);
skb = __skb_try_recv_from_queue(sk, queue, flags,
- udp_skb_destructor,
+ udp_skb_dtor_locked,
peeked, &_off, err,
&last);
+ spin_unlock(&sk_queue->lock);
spin_unlock_bh(&queue->lock);
if (skb) {
*off = _off;
--
2.9.3
^ permalink raw reply related
* [PATCH net-next 2/3] udp: use a separate rx queue for packet reception
From: Paolo Abeni @ 2017-05-15 9:01 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Eric Dumazet
In-Reply-To: <cover.1494837879.git.pabeni@redhat.com>
Under UDP flood, the sk_receive_queue spinlock is heavily contended.
This patch tries to reduce contention on that lock by adding a
second receive queue to UDP sockets; recvmsg() looks first
in that queue and, only if it is empty, tries to fetch the data from
sk_receive_queue. The latter is spliced into the newly added
queue every time the receive path has to acquire the
sk_receive_queue lock.
The accounting of forward allocated memory is still protected with
the sk_receive_queue lock, so udp_rmem_release() needs to acquire
both locks when the forward deficit is flushed.
In specific scenarios we can end up acquiring and releasing the
sk_receive_queue lock multiple times; that will be covered by
the next patch.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
include/linux/udp.h | 3 ++
include/net/udp.h | 9 +---
include/net/udplite.h | 2 +-
net/ipv4/udp.c | 138 ++++++++++++++++++++++++++++++++++++++++++++------
net/ipv6/udp.c | 3 +-
5 files changed, 131 insertions(+), 24 deletions(-)
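The two-queue scheme described above can be illustrated with a small user-space model. This is a sketch only: the demo_* types and the lock counter are hypothetical and single-threaded, and they merely stand in for sk_buff_head and its spinlock. The point it shows is that the reader drains its private queue first and touches the shared queue (and its lock) once per batch rather than once per packet.

```c
#include <assert.h>
#include <stddef.h>

struct demo_pkt {
	struct demo_pkt *next;
	int len;
};

struct demo_queue {
	struct demo_pkt *head, *tail;
};

struct demo_sock {
	struct demo_queue receive_queue;	/* filled by producers */
	struct demo_queue reader_queue;		/* private to the reader */
	unsigned int reader_lock_acquisitions;	/* models spin_lock() on receive_queue */
};

static void demo_enqueue(struct demo_queue *q, struct demo_pkt *p)
{
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
}

static struct demo_pkt *demo_dequeue(struct demo_queue *q)
{
	struct demo_pkt *p = q->head;

	if (p) {
		q->head = p->next;
		if (!q->head)
			q->tail = NULL;
	}
	return p;
}

/* Move every packet from src to the tail of dst and leave src empty. */
static void demo_splice_tail_init(struct demo_queue *src, struct demo_queue *dst)
{
	if (!src->head)
		return;
	if (dst->tail)
		dst->tail->next = src->head;
	else
		dst->head = src->head;
	dst->tail = src->tail;
	src->head = src->tail = NULL;
	assert(dst->head && dst->tail);
}

/* Producer side: append to the shared queue. */
static void demo_deliver(struct demo_sock *sk, struct demo_pkt *p)
{
	demo_enqueue(&sk->receive_queue, p);
}

/* Reader side: private queue first, splice the shared one only when empty. */
static struct demo_pkt *demo_recv(struct demo_sock *sk)
{
	struct demo_pkt *p = demo_dequeue(&sk->reader_queue);

	if (p || !sk->receive_queue.head)
		return p;

	sk->reader_lock_acquisitions++;		/* the only shared-lock hit */
	demo_splice_tail_init(&sk->receive_queue, &sk->reader_queue);
	return demo_dequeue(&sk->reader_queue);
}
```

With four packets queued before the first read, this model takes the shared lock once on the reader side and preserves arrival order.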
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 6cb4061..eaea63b 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -80,6 +80,9 @@ struct udp_sock {
struct sk_buff *skb,
int nhoff);
+ /* udp_recvmsg tries to use this before splicing sk_receive_queue */
+ struct sk_buff_head reader_queue ____cacheline_aligned_in_smp;
+
/* This field is dirtied by udp_recvmsg() */
int forward_deficit;
};
diff --git a/include/net/udp.h b/include/net/udp.h
index 3391dbd..1468dbd 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -249,13 +249,8 @@ void udp_destruct_sock(struct sock *sk);
void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len);
int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb);
void udp_skb_destructor(struct sock *sk, struct sk_buff *skb);
-static inline struct sk_buff *
-__skb_recv_udp(struct sock *sk, unsigned int flags, int noblock, int *peeked,
- int *off, int *err)
-{
- return __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
- udp_skb_destructor, peeked, off, err);
-}
+struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
+ int noblock, int *peeked, int *off, int *err);
static inline struct sk_buff *skb_recv_udp(struct sock *sk, unsigned int flags,
int noblock, int *err)
{
diff --git a/include/net/udplite.h b/include/net/udplite.h
index ea34052..b7a18f6 100644
--- a/include/net/udplite.h
+++ b/include/net/udplite.h
@@ -26,8 +26,8 @@ static __inline__ int udplite_getfrag(void *from, char *to, int offset,
/* Designate sk as UDP-Lite socket */
static inline int udplite_sk_init(struct sock *sk)
{
+ udp_init_sock(sk);
udp_sk(sk)->pcflag = UDPLITE_BIT;
- sk->sk_destruct = udp_destruct_sock;
return 0;
}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ea6e4cf..492c76b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1167,19 +1167,24 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
static void udp_rmem_release(struct sock *sk, int size, int partial)
{
struct udp_sock *up = udp_sk(sk);
+ struct sk_buff_head *sk_queue;
int amt;
if (likely(partial)) {
up->forward_deficit += size;
size = up->forward_deficit;
if (size < (sk->sk_rcvbuf >> 2) &&
- !skb_queue_empty(&sk->sk_receive_queue))
+ !skb_queue_empty(&up->reader_queue))
return;
} else {
size += up->forward_deficit;
}
up->forward_deficit = 0;
+ /* acquire the sk_receive_queue for fwd allocated memory scheduling */
+ sk_queue = &sk->sk_receive_queue;
+ spin_lock(&sk_queue->lock);
+
sk->sk_forward_alloc += size;
amt = (sk->sk_forward_alloc - partial) & ~(SK_MEM_QUANTUM - 1);
sk->sk_forward_alloc -= amt;
@@ -1188,9 +1193,14 @@ static void udp_rmem_release(struct sock *sk, int size, int partial)
__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
atomic_sub(size, &sk->sk_rmem_alloc);
+
+ /* this can save us from acquiring the rx queue lock on next receive */
+ skb_queue_splice_tail_init(sk_queue, &up->reader_queue);
+
+ spin_unlock(&sk_queue->lock);
}
-/* Note: called with sk_receive_queue.lock held.
+/* Note: called with reader_queue.lock held.
* Instead of using skb->truesize here, find a copy of it in skb->dev_scratch
* This avoids a cache line miss while receive_queue lock is held.
* Look at __udp_enqueue_schedule_skb() to find where this copy is done.
@@ -1306,10 +1316,12 @@ EXPORT_SYMBOL_GPL(__udp_enqueue_schedule_skb);
void udp_destruct_sock(struct sock *sk)
{
/* reclaim completely the forward allocated memory */
+ struct udp_sock *up = udp_sk(sk);
unsigned int total = 0;
struct sk_buff *skb;
- while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
+ skb_queue_splice_tail_init(&sk->sk_receive_queue, &up->reader_queue);
+ while ((skb = __skb_dequeue(&up->reader_queue)) != NULL) {
total += skb->truesize;
kfree_skb(skb);
}
@@ -1321,6 +1333,7 @@ EXPORT_SYMBOL_GPL(udp_destruct_sock);
int udp_init_sock(struct sock *sk)
{
+ skb_queue_head_init(&udp_sk(sk)->reader_queue);
sk->sk_destruct = udp_destruct_sock;
return 0;
}
@@ -1338,6 +1351,26 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
}
EXPORT_SYMBOL_GPL(skb_consume_udp);
+static struct sk_buff *__first_packet_length(struct sock *sk,
+ struct sk_buff_head *rcvq,
+ int *total)
+{
+ struct sk_buff *skb;
+
+ while ((skb = skb_peek(rcvq)) != NULL &&
+ udp_lib_checksum_complete(skb)) {
+ __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
+ IS_UDPLITE(sk));
+ __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
+ IS_UDPLITE(sk));
+ atomic_inc(&sk->sk_drops);
+ __skb_unlink(skb, rcvq);
+ *total += skb->truesize;
+ kfree_skb(skb);
+ }
+ return skb;
+}
+
/**
* first_packet_length - return length of first packet in receive queue
* @sk: socket
@@ -1347,22 +1380,20 @@ EXPORT_SYMBOL_GPL(skb_consume_udp);
*/
static int first_packet_length(struct sock *sk)
{
- struct sk_buff_head *rcvq = &sk->sk_receive_queue;
+ struct sk_buff_head *rcvq = &udp_sk(sk)->reader_queue;
+ struct sk_buff_head *sk_queue = &sk->sk_receive_queue;
struct sk_buff *skb;
int total = 0;
int res;
spin_lock_bh(&rcvq->lock);
- while ((skb = skb_peek(rcvq)) != NULL &&
- udp_lib_checksum_complete(skb)) {
- __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
- IS_UDPLITE(sk));
- __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
- IS_UDPLITE(sk));
- atomic_inc(&sk->sk_drops);
- __skb_unlink(skb, rcvq);
- total += skb->truesize;
- kfree_skb(skb);
+ skb = __first_packet_length(sk, rcvq, &total);
+ if (!skb && !skb_queue_empty(sk_queue)) {
+ spin_lock(&sk_queue->lock);
+ skb_queue_splice_tail_init(sk_queue, rcvq);
+ spin_unlock(&sk_queue->lock);
+
+ skb = __first_packet_length(sk, rcvq, &total);
}
res = skb ? skb->len : -1;
if (total)
@@ -1400,6 +1431,79 @@ int udp_ioctl(struct sock *sk, int cmd, unsigned long arg)
}
EXPORT_SYMBOL(udp_ioctl);
+struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
+ int noblock, int *peeked, int *off, int *err)
+{
+ struct sk_buff_head *sk_queue = &sk->sk_receive_queue;
+ struct sk_buff_head *queue;
+ struct sk_buff *last;
+ long timeo;
+ int error;
+
+ queue = &udp_sk(sk)->reader_queue;
+ flags |= noblock ? MSG_DONTWAIT : 0;
+ timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+ do {
+ struct sk_buff *skb;
+
+ error = sock_error(sk);
+ if (error)
+ break;
+
+ error = -EAGAIN;
+ *peeked = 0;
+ do {
+ int _off = *off;
+
+ spin_lock_bh(&queue->lock);
+ skb = __skb_try_recv_from_queue(sk, queue, flags,
+ udp_skb_destructor,
+ peeked, &_off, err,
+ &last);
+ if (skb) {
+ spin_unlock_bh(&queue->lock);
+ *off = _off;
+ return skb;
+ }
+
+ if (skb_queue_empty(sk_queue)) {
+ spin_unlock_bh(&queue->lock);
+ goto busy_check;
+ }
+
+ /* refill the reader queue and walk it again */
+ _off = *off;
+ spin_lock(&sk_queue->lock);
+ skb_queue_splice_tail_init(sk_queue, queue);
+ spin_unlock(&sk_queue->lock);
+
+ skb = __skb_try_recv_from_queue(sk, queue, flags,
+ udp_skb_destructor,
+ peeked, &_off, err,
+ &last);
+ spin_unlock_bh(&queue->lock);
+ if (skb) {
+ *off = _off;
+ return skb;
+ }
+
+busy_check:
+ if (!sk_can_busy_loop(sk))
+ break;
+
+ sk_busy_loop(sk, flags & MSG_DONTWAIT);
+ } while (!skb_queue_empty(sk_queue));
+
+ /* sk_queue is empty, reader_queue may contain peeked packets */
+ } while (timeo &&
+ !__skb_wait_for_more_packets(sk, &error, &timeo,
+ (struct sk_buff *)sk_queue));
+
+ *err = error;
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(__skb_recv_udp);
+
/*
* This should be easy, if there is something there we
* return it, otherwise we block.
@@ -1490,7 +1594,8 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
return err;
csum_copy_err:
- if (!__sk_queue_drop_skb(sk, skb, flags, udp_skb_destructor)) {
+ if (!__sk_queue_drop_skb(sk, &udp_sk(sk)->reader_queue, skb, flags,
+ udp_skb_destructor)) {
UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS, is_udplite);
UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
}
@@ -2325,6 +2430,9 @@ unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
unsigned int mask = datagram_poll(file, sock, wait);
struct sock *sk = sock->sk;
+ if (!skb_queue_empty(&udp_sk(sk)->reader_queue))
+ mask |= POLLIN | POLLRDNORM;
+
sock_rps_record_flow(sk);
/* Check for false positives due to checksum errors */
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 04862ab..f78fdf8 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -455,7 +455,8 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
return err;
csum_copy_err:
- if (!__sk_queue_drop_skb(sk, skb, flags, udp_skb_destructor)) {
+ if (!__sk_queue_drop_skb(sk, &udp_sk(sk)->reader_queue, skb, flags,
+ udp_skb_destructor)) {
if (is_udp4) {
UDP_INC_STATS(sock_net(sk),
UDP_MIB_CSUMERRORS, is_udplite);
--
2.9.3
^ permalink raw reply related
* [PATCH net] ipv6: sr: fix user space compilation error with old glibc
From: David Lebrun @ 2017-05-15 9:18 UTC (permalink / raw)
To: netdev; +Cc: daniel, David Lebrun
When seg6.h is included in a user space program that also includes
netinet/in.h, it results in multiple definitions of structures such as
struct in6_addr. Recent glibc versions have a workaround that consists in
defining __USE_KERNEL_IPV6_DEFS to prevent duplicates. However, such a
program will fail to compile with older glibc versions.
This patch ensures that including seg6.h will work in any case.
Fixes: ea3ebc73b46fbdb049dafd47543bb22efaa09c8e ("uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
include/uapi/linux/seg6.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/uapi/linux/seg6.h b/include/uapi/linux/seg6.h
index 7278511..52b8f46 100644
--- a/include/uapi/linux/seg6.h
+++ b/include/uapi/linux/seg6.h
@@ -15,7 +15,16 @@
#define _UAPI_LINUX_SEG6_H
#include <linux/types.h>
+
+#ifdef __KERNEL__
#include <linux/in6.h> /* For struct in6_addr. */
+#else
+#ifdef __USE_KERNEL_IPV6_DEFS
+#include <linux/in6.h>
+#else
+#include <netinet/in.h>
+#endif
+#endif
/*
* SRH
--
2.10.2
^ permalink raw reply related
* Re: [PATCH net] ipv6: sr: fix user space compilation error with old glibc
From: Daniel Borkmann @ 2017-05-15 10:05 UTC (permalink / raw)
To: David Lebrun; +Cc: netdev
In-Reply-To: <20170515091828.19349-1-david.lebrun@uclouvain.be>
On 05/15/2017 11:18 AM, David Lebrun wrote:
> When seg6.h is included in a user space program that also includes
> netinet/in.h, it results in multiple definitions of structures such as
> struct in6_addr. Recent glibc versions have a workaround that consists in
> defining __USE_KERNEL_IPV6_DEFS to prevent duplicates. However, such a
> program will fail to compile with older glibc versions.
>
> This patch ensures that including seg6.h will work in any case.
>
> Fixes: ea3ebc73b46fbdb049dafd47543bb22efaa09c8e ("uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors")
> Reported-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
> ---
> include/uapi/linux/seg6.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/include/uapi/linux/seg6.h b/include/uapi/linux/seg6.h
> index 7278511..52b8f46 100644
> --- a/include/uapi/linux/seg6.h
> +++ b/include/uapi/linux/seg6.h
> @@ -15,7 +15,16 @@
> #define _UAPI_LINUX_SEG6_H
>
> #include <linux/types.h>
> +
> +#ifdef __KERNEL__
> #include <linux/in6.h> /* For struct in6_addr. */
> +#else
> +#ifdef __USE_KERNEL_IPV6_DEFS
> +#include <linux/in6.h>
> +#else
> +#include <netinet/in.h>
> +#endif
> +#endif
>
> /*
> * SRH
>
When this gets pulled into iproute2's include/linux/seg6.h due to
header rebase, we still have the same effect.
__USE_KERNEL_IPV6_DEFS gets defined by ip/iproute_lwtunnel.c, and
when we then include above header, we end up including linux/in6.h
just like before, same compile error in iproute2.
Or, is there still another fix for iproute2 coming after this has
landed?
Thanks,
Daniel
ip
CC iproute_lwtunnel.o
In file included from ../include/linux/seg6.h:23:0,
from iproute_lwtunnel.c:26:
../include/linux/in6.h:131:26: error: expected identifier before numeric constant
#define IPPROTO_HOPOPTS 0 /* IPv6 hop-by-hop options */
^
In file included from /usr/include/resolv.h:57:0,
from ../include/utils.h:6,
from iproute_lwtunnel.c:32:
/usr/include/netinet/in.h:196:8: error: redefinition of ‘struct in6_addr’
struct in6_addr
^
In file included from ../include/linux/seg6.h:23:0,
from iproute_lwtunnel.c:26:
../include/linux/in6.h:32:8: note: originally defined here
struct in6_addr {
^
In file included from /usr/include/resolv.h:57:0,
from ../include/utils.h:6,
from iproute_lwtunnel.c:32:
/usr/include/netinet/in.h:237:8: error: redefinition of ‘struct sockaddr_in6’
struct sockaddr_in6
^
In file included from ../include/linux/seg6.h:23:0,
from iproute_lwtunnel.c:26:
../include/linux/in6.h:49:8: note: originally defined here
struct sockaddr_in6 {
^
In file included from /usr/include/resolv.h:57:0,
from ../include/utils.h:6,
from iproute_lwtunnel.c:32:
/usr/include/netinet/in.h:273:8: error: redefinition of ‘struct ipv6_mreq’
struct ipv6_mreq
^
In file included from ../include/linux/seg6.h:23:0,
from iproute_lwtunnel.c:26:
../include/linux/in6.h:59:8: note: originally defined here
struct ipv6_mreq {
^
iproute_lwtunnel.c: In function ‘parse_encap_seg6’:
iproute_lwtunnel.c:375:3: warning: passing argument 3 of ‘inet_get_addr’ from incompatible pointer type [enabled by default]
inet_get_addr(s, NULL, &srh->segments[i]);
^
In file included from iproute_lwtunnel.c:32:0:
../include/utils.h:241:5: note: expected ‘struct in6_addr *’ but argument is of type ‘struct in6_addr *’
int inet_get_addr(const char *src, __u32 *dst, struct in6_addr *dst6);
^
make[1]: *** [iproute_lwtunnel.o] Error 1
make: *** [all] Error 2
^ permalink raw reply
* Re: [PATCH v2 net-next 06/12] ep93xx_eth: add GRO support
From: Lennert Buytenhek @ 2017-05-15 10:31 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, netdev, Eric Dumazet, Ryan Mallon,
Hartley Sweeten
In-Reply-To: <20170204232502.22361-7-edumazet@google.com>
On Sat, Feb 04, 2017 at 03:24:56PM -0800, Eric Dumazet wrote:
> Use napi_complete_done() instead of __napi_complete() to :
>
> 1) Get support of gro_flush_timeout if opt-in
> 2) Not rearm interrupts for busy-polling users.
> 3) use standard NAPI API.
> 4) get rid of baroque code and ease maintenance.
>
> [...]
>
> @@ -310,35 +311,17 @@ static int ep93xx_rx(struct net_device *dev, int processed, int budget)
> return processed;
> }
>
> -static int ep93xx_have_more_rx(struct ep93xx_priv *ep)
> -{
> - struct ep93xx_rstat *rstat = ep->descs->rstat + ep->rx_pointer;
> - return !!((rstat->rstat0 & RSTAT0_RFP) && (rstat->rstat1 & RSTAT1_RFP));
> -}
> -
> static int ep93xx_poll(struct napi_struct *napi, int budget)
> {
> struct ep93xx_priv *ep = container_of(napi, struct ep93xx_priv, napi);
> struct net_device *dev = ep->dev;
> - int rx = 0;
> -
> -poll_some_more:
> - rx = ep93xx_rx(dev, rx, budget);
> - if (rx < budget) {
> - int more = 0;
> + int rx;
>
> + rx = ep93xx_rx(dev, budget);
> + if (rx < budget && napi_complete_done(napi, rx)) {
> spin_lock_irq(&ep->rx_lock);
> - __napi_complete(napi);
> wrl(ep, REG_INTEN, REG_INTEN_TX | REG_INTEN_RX);
> - if (ep93xx_have_more_rx(ep)) {
> - wrl(ep, REG_INTEN, REG_INTEN_TX);
> - wrl(ep, REG_INTSTSP, REG_INTSTS_RX);
> - more = 1;
> - }
> spin_unlock_irq(&ep->rx_lock);
> -
> - if (more && napi_reschedule(napi))
> - goto poll_some_more;
> }
>
> if (rx) {
This code was the way it was because the ep93xx hardware is somewhat
braindead. If I remember correctly (but it's been a while since I wrote
this code):
1. ep93xx netdev IRQs are edge-triggered, so if you re-enable IRQs
while there was still work to be done, you will not get another IRQ.
2. Disabling an interrupt source in the interrupt mask register will
cause its interrupt status bit to always return zero, so you cannot
check whether an interrupt status is pending without having the
interrupt source enabled.
(I'll admit that a comment explaining this would have been in order.)
I don't know if we really care about this hardware anymore (I don't),
but the ep93xx platform is still listed as being maintained in the
MAINTAINERS file -- adding Ryan and Hartley.
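The two hardware constraints listed above can be modelled in user space. What follows is a sketch under stated assumptions, not the ep93xx register interface: struct fake_nic, its fields and the helper names are hypothetical. It contrasts a poll routine that re-checks for work after unmasking the interrupt with one that does not, for frames that arrive just before the unmask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an edge-triggered NIC. */
struct fake_nic {
	int rx_pending;		/* frames waiting in the RX ring */
	int late_arrivals;	/* frames landing just before the unmask */
	bool rx_irq_enabled;
};

static int fake_rx(struct fake_nic *nic, int budget)
{
	int done = 0;

	while (done < budget && nic->rx_pending > 0) {
		nic->rx_pending--;
		done++;
	}
	return done;
}

/* Point 2: status is only visible while the source is unmasked. */
static bool fake_have_more_rx(const struct fake_nic *nic)
{
	return nic->rx_irq_enabled && nic->rx_pending > 0;
}

/* Point 1: frames that arrived before the unmask raise no new edge. */
static void fake_enable_irq(struct fake_nic *nic)
{
	nic->rx_pending += nic->late_arrivals;
	nic->late_arrivals = 0;
	nic->rx_irq_enabled = true;
}

/* Unmask, look again, and mask plus re-poll if work slipped in. */
static int poll_with_recheck(struct fake_nic *nic, int budget)
{
	int rx = 0;

	assert(budget > 0);
	for (;;) {
		rx += fake_rx(nic, budget - rx);
		if (rx >= budget)
			return rx;	/* budget used up, caller polls again */
		fake_enable_irq(nic);
		if (!fake_have_more_rx(nic))
			return rx;
		nic->rx_irq_enabled = false;
	}
}

/* Unmask and return without looking again. */
static int poll_without_recheck(struct fake_nic *nic, int budget)
{
	int rx = fake_rx(nic, budget);

	if (rx < budget)
		fake_enable_irq(nic);
	return rx;
}
```

In this model the second variant returns with frames still pending and no interrupt left to announce them, which is the situation the original re-check loop appears to have been guarding against.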
^ permalink raw reply
* Re: [PATCH net] ipv6: sr: fix user space compilation error with old glibc
From: David Lebrun @ 2017-05-15 10:37 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: netdev
In-Reply-To: <59197D53.8030104@iogearbox.net>
On 05/15/2017 12:05 PM, Daniel Borkmann wrote:
>
> Or, is there still another fix for iproute2 coming after this has
> landed?
Yes, I will submit the fix to iproute2 once this one has been applied,
so that I can reference it.
David
^ permalink raw reply
* Re: [PATCH net] ipv6: sr: fix user space compilation error with old glibc
From: Daniel Borkmann @ 2017-05-15 10:43 UTC (permalink / raw)
To: David Lebrun; +Cc: netdev
In-Reply-To: <fa27cb59-a81e-7f74-7d3f-7feb6d8406f1@uclouvain.be>
On 05/15/2017 12:37 PM, David Lebrun wrote:
> On 05/15/2017 12:05 PM, Daniel Borkmann wrote:
>>
>> Or, is there still another fix for iproute2 coming after this has
>> landed?
>
> Yes, I will submit the fix to iproute2 once this one has been applied,
> so that I can reference it.
Okay, thanks!
^ permalink raw reply
* Re: [PATCH net] ipv6: sr: fix user space compilation error with old glibc
From: David Lebrun @ 2017-05-15 10:53 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: netdev
In-Reply-To: <59198665.7060908@iogearbox.net>
On 05/15/2017 12:43 PM, Daniel Borkmann wrote:
>
> Okay, thanks!
Mmmh actually I can fix this without sending a patch to iproute2.
Handling the __USE_KERNEL_IPV6_DEFS case in seg6.h is wrong, as it is
already performed in netinet/in.h. I can fix the issue with a simpler ifdef.
Will send v2.
David
^ permalink raw reply
* Re: [PATCH net] ipv6: sr: fix user space compilation error with old glibc
From: David Lebrun @ 2017-05-15 11:01 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: netdev
In-Reply-To: <d402278a-543f-1cbb-32f9-7d61d8f62f51@uclouvain.be>
On 05/15/2017 12:53 PM, David Lebrun wrote:
> I can fix the issue with a simpler ifdef.
The simpler ifdef works fine, but a patch to iproute2 is still needed.
David
^ permalink raw reply
* [PATCH net v2] ipv6: sr: fix user space compilation error with old glibc
From: David Lebrun @ 2017-05-15 11:03 UTC (permalink / raw)
To: netdev; +Cc: daniel, David Lebrun
When seg6.h is included in a user space program that also includes
netinet/in.h, it results in multiple definitions of structures such as
struct in6_addr. Recent glibc versions have a workaround that consists in
defining __USE_KERNEL_IPV6_DEFS to prevent duplicates. However, such a
program will fail to compile with older glibc versions.
This patch ensures that including seg6.h will work in any case.
v2: do not try to handle __USE_KERNEL_IPV6_DEFS case in seg6.h
Fixes: ea3ebc73b46fbdb049dafd47543bb22efaa09c8e ("uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
include/uapi/linux/seg6.h | 5 +++++
1 file changed, 5 insertions(+)
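The include selection in this version can be tried from user space with a small stand-alone file. This is a sketch only: struct demo_srh is a hypothetical, simplified header used for illustration and is not the layout of struct ipv6_sr_hdr. It shows that, outside the kernel, struct in6_addr comes from netinet/in.h alone, so a program that also includes netinet/in.h elsewhere sees a single definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same selection as the patch: kernel header in-kernel, libc header otherwise. */
#ifdef __KERNEL__
#include <linux/in6.h>
#else
#include <netinet/in.h>
#endif

/* A second include, as an application would do, must stay harmless. */
#include <netinet/in.h>

/* Hypothetical, simplified header carrying a list of IPv6 segments. */
struct demo_srh {
	uint8_t nexthdr;
	uint8_t hdrlen;
	uint8_t type;
	uint8_t segments_left;
	uint8_t first_segment;
	uint8_t flags;
	uint16_t tag;
	struct in6_addr segments[];
};

/* Bytes needed for a header with nsegs segments. */
static size_t demo_srh_size(unsigned int nsegs)
{
	assert(sizeof(struct in6_addr) == 16);
	return sizeof(struct demo_srh) + nsegs * sizeof(struct in6_addr);
}
```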
diff --git a/include/uapi/linux/seg6.h b/include/uapi/linux/seg6.h
index 7278511..4055ff3 100644
--- a/include/uapi/linux/seg6.h
+++ b/include/uapi/linux/seg6.h
@@ -15,7 +15,12 @@
#define _UAPI_LINUX_SEG6_H
#include <linux/types.h>
+
+#ifdef __KERNEL__
#include <linux/in6.h> /* For struct in6_addr. */
+#else
+#include <netinet/in.h>
+#endif
/*
* SRH
--
2.10.2
^ permalink raw reply related
* [PATCH iproute2 net] ip: lwtunnel: remove definition of __USE_KERNEL_IPV6_DEFS
From: David Lebrun @ 2017-05-15 11:13 UTC (permalink / raw)
To: netdev; +Cc: daniel, David Lebrun
When __USE_KERNEL_IPV6_DEFS is set, netinet/in.h will not define structures
such as in6_addr. This is useful when linux/in6.h is also included.
However, older glibc versions do not support this switch.
This patch allows iproute2 to still be compiled with older glibc versions,
along with the kernel net-next patch that updates include/linux/seg6.h.
Fixes: e8493916a8ede9970732e33ea52d30b83071f401 ("iproute: add support for SR-IPv6 lwtunnel encapsulation")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
ip/iproute_lwtunnel.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
index 1395f03..434e60a 100644
--- a/ip/iproute_lwtunnel.c
+++ b/ip/iproute_lwtunnel.c
@@ -20,9 +20,6 @@
#include <linux/lwtunnel.h>
#include <linux/mpls_iptunnel.h>
-#ifndef __USE_KERNEL_IPV6_DEFS
-#define __USE_KERNEL_IPV6_DEFS
-#endif
#include <linux/seg6.h>
#include <linux/seg6_iptunnel.h>
#include <linux/seg6_hmac.h>
--
2.10.2
^ permalink raw reply related
* [PATCH] bridge: netlink: check vlan_default_pvid range
From: Tobias Jungel @ 2017-05-15 11:08 UTC (permalink / raw)
To: Stephen Hemminger, David S. Miller, netdev
Currently it is allowed to set the default pvid of a bridge to a value
above VLAN_VID_MASK (0xfff). This patch checks the passed pvid and
disables the pvid in case it is out of bounds.
Reproduce by calling:
[root@test ~]# ip l a type bridge
[root@test ~]# ip l a type dummy
[root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
[root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999
[root@test ~]# ip l s dummy0 master bridge0
[root@test ~]# bridge vlan
port vlan ids
bridge0 9999 PVID Egress Untagged
dummy0 9999 PVID Egress Untagged
---
net/bridge/br_vlan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
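The range check in this patch can be exercised on its own. A minimal sketch, assuming a 12-bit VID mask of 0xfff as stated above; the DEMO_VLAN_VID_MASK macro and the demo_* function are hypothetical names, not kernel symbols. It returns true for every pvid value that the patched condition would treat as "disable the default pvid".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VLAN_VID_MASK 0x0fff	/* assumption: mirrors VLAN_VID_MASK */

/* True when the patched check would disable the default pvid. */
static bool demo_pvid_disables_default(uint16_t pvid)
{
	return !pvid || pvid >= DEMO_VLAN_VID_MASK;
}

/* Quick self-check of the boundaries. */
static void demo_pvid_selfcheck(void)
{
	assert(demo_pvid_disables_default(0));
	assert(!demo_pvid_disables_default(1));
	assert(!demo_pvid_disables_default(4094));
	assert(demo_pvid_disables_default(4095));
}
```

With this condition the value 9999 from the reproducer falls in the rejected range, and so does 4095 itself, because the comparison is >= rather than >.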
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index b838213..e363d75 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -820,7 +820,7 @@ int __br_vlan_set_default_pvid(struct net_bridge *br, u16 pvid)
int err = 0;
unsigned long *changed;
- if (!pvid) {
+ if (!pvid || pvid >= VLAN_VID_MASK) {
br_vlan_disable_default_pvid(br);
return 0;
}
--
2.9.4
^ permalink raw reply related
* [PATCH 1/2] net-next: stmmac: add adjust_link function
From: Corentin Labbe @ 2017-05-15 11:41 UTC (permalink / raw)
To: peppe.cavallaro, alexandre.torgue; +Cc: netdev, linux-kernel, Corentin Labbe
My dwmac-sun8i series will add some if (has_sun8i) checks to
stmmac_adjust_link().
Since the current stmmac_adjust_link() already has lots of if (has_gmac/gmac4) checks,
it is now better to create an adjust_link() function for each dwmac.
So this patch adds an adjust_link() function pointer and moves code out
of stmmac_adjust_link() into it.
In the process, it removes stmmac_mac_flow_ctrl()/stmmac_hw_fix_mac_speed(),
since they are not used anymore.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
drivers/net/ethernet/stmicro/stmmac/common.h | 3 +
.../net/ethernet/stmicro/stmmac/dwmac1000_core.c | 54 ++++++++++++++
.../net/ethernet/stmicro/stmmac/dwmac100_core.c | 46 ++++++++++++
drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 54 ++++++++++++++
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 83 +---------------------
5 files changed, 158 insertions(+), 82 deletions(-)
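The per-core callback idea described above can be sketched outside the driver. Everything below is hypothetical: the demo_* types, the bit positions and the single-register model are illustration only and are not the stmmac structures or register definitions. It shows a common entry point dispatching through an ops table, with one core-specific implementation that sets or clears speed bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical control bits for one core. */
#define DEMO_CTRL_PS	(1u << 15)	/* port select: set for 10/100 */
#define DEMO_CTRL_FES	(1u << 14)	/* fast ethernet speed: set for 100 */

struct demo_priv;

struct demo_ops {
	/* returns 1 if the link state changed, 0 otherwise */
	int (*adjust_link)(struct demo_priv *priv);
};

struct demo_priv {
	const struct demo_ops *ops;
	uint32_t ctrl;		/* stands in for the MAC control register */
	int speed;		/* last speed programmed into ctrl */
	int phy_speed;		/* speed currently reported by the PHY */
};

/* Core-specific implementation: program the speed bits. */
static int demo_gmac_adjust_link(struct demo_priv *priv)
{
	if (priv->phy_speed == priv->speed)
		return 0;

	switch (priv->phy_speed) {
	case 1000:
		priv->ctrl &= ~DEMO_CTRL_PS;
		break;
	case 100:
		priv->ctrl |= DEMO_CTRL_PS | DEMO_CTRL_FES;
		break;
	case 10:
		priv->ctrl |= DEMO_CTRL_PS;
		priv->ctrl &= ~DEMO_CTRL_FES;	/* clear, do not OR in the complement */
		break;
	default:
		return 0;	/* unknown speed: leave the register alone */
	}
	priv->speed = priv->phy_speed;
	return 1;
}

static const struct demo_ops demo_gmac_ops = {
	.adjust_link = demo_gmac_adjust_link,
};

/* Common entry point: no per-core if/else chain, just a dispatch. */
static int demo_adjust_link(struct demo_priv *priv)
{
	assert(priv->ops);
	return priv->ops->adjust_link ? priv->ops->adjust_link(priv) : 0;
}
```

A new core would then supply its own ops table instead of adding another branch to the common function.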
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index b7ce3fbb5375..451c231006fe 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -469,11 +469,14 @@ struct stmmac_dma_ops {
};
struct mac_device_info;
+struct stmmac_priv;
/* Helpers to program the MAC core */
struct stmmac_ops {
/* MAC core initialization */
void (*core_init)(struct mac_device_info *hw, int mtu);
+ /* adjust link */
+ int (*adjust_link)(struct stmmac_priv *priv);
/* Enable the MAC RX/TX */
void (*set_mac)(void __iomem *ioaddr, bool enable);
/* Enable and verify that the IPC module is supported */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index f3d9305e5f70..5f3aace46c41 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -26,6 +26,7 @@
#include <linux/slab.h>
#include <linux/ethtool.h>
#include <asm/io.h>
+#include "stmmac.h"
#include "stmmac_pcs.h"
#include "dwmac1000.h"
@@ -75,6 +76,58 @@ static void dwmac1000_core_init(struct mac_device_info *hw, int mtu)
#endif
}
+static int dwmac1000_adjust_link(struct stmmac_priv *priv)
+{
+ struct net_device *ndev = priv->dev;
+ struct phy_device *phydev = ndev->phydev;
+ int new_state = 0;
+ u32 tx_cnt = priv->plat->tx_queues_to_use;
+ u32 ctrl;
+
+ ctrl = readl(priv->ioaddr + GMAC_CONTROL);
+
+ if (phydev->duplex != priv->oldduplex) {
+ new_state = 1;
+ if (!(phydev->duplex))
+ ctrl &= ~GMAC_CONTROL_DM;
+ else
+ ctrl |= GMAC_CONTROL_DM;
+ priv->oldduplex = phydev->duplex;
+ }
+
+ if (phydev->pause)
+ priv->hw->mac->flow_ctrl(priv->hw, phydev->duplex, priv->flow_ctrl,
+ priv->pause, tx_cnt);
+
+ if (phydev->speed != priv->speed) {
+ new_state = 1;
+ switch (phydev->speed) {
+ case 1000:
+ ctrl &= ~GMAC_CONTROL_PS;
+ break;
+ case 100:
+ ctrl |= GMAC_CONTROL_PS;
+ ctrl |= GMAC_CONTROL_FES;
+ break;
+ case 10:
+ ctrl |= GMAC_CONTROL_PS;
+ ctrl &= ~GMAC_CONTROL_FES;
+ break;
+ default:
+ netif_warn(priv, link, priv->dev,
+ "broken speed: %d\n", phydev->speed);
+ phydev->speed = SPEED_UNKNOWN;
+ break;
+ }
+ if (phydev->speed != SPEED_UNKNOWN && likely(priv->plat->fix_mac_speed))
+ priv->plat->fix_mac_speed(priv->plat->bsp_priv, phydev->speed);
+ priv->speed = phydev->speed;
+ }
+
+ writel(ctrl, priv->ioaddr + GMAC_CONTROL);
+ return new_state;
+}
+
static int dwmac1000_rx_ipc_enable(struct mac_device_info *hw)
{
void __iomem *ioaddr = hw->pcsr;
@@ -490,6 +543,7 @@ static void dwmac1000_debug(void __iomem *ioaddr, struct stmmac_extra_stats *x,
static const struct stmmac_ops dwmac1000_ops = {
.core_init = dwmac1000_core_init,
+ .adjust_link = dwmac1000_adjust_link,
.set_mac = stmmac_set_mac,
.rx_ipc = dwmac1000_rx_ipc_enable,
.dump_regs = dwmac1000_dump_regs,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index 1b3609105484..ba3d46e65e1a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -27,6 +27,7 @@
#include <linux/crc32.h>
#include <asm/io.h>
#include "dwmac100.h"
+#include "stmmac.h"
static void dwmac100_core_init(struct mac_device_info *hw, int mtu)
{
@@ -40,6 +41,50 @@ static void dwmac100_core_init(struct mac_device_info *hw, int mtu)
#endif
}
+static int dwmac100_adjust_link(struct stmmac_priv *priv)
+{
+ struct net_device *ndev = priv->dev;
+ struct phy_device *phydev = ndev->phydev;
+ int new_state = 0;
+ u32 tx_cnt = priv->plat->tx_queues_to_use;
+ u32 ctrl;
+
+ ctrl = readl(priv->ioaddr + MAC_CTRL_REG);
+ if (phydev->duplex != priv->oldduplex) {
+ new_state = 1;
+ if (!(phydev->duplex))
+ ctrl &= ~MAC_CONTROL_F;
+ else
+ ctrl |= MAC_CONTROL_F;
+ priv->oldduplex = phydev->duplex;
+ }
+
+ if (phydev->pause)
+ priv->hw->mac->flow_ctrl(priv->hw, phydev->duplex, priv->flow_ctrl,
+ priv->pause, tx_cnt);
+
+ if (phydev->speed != priv->speed) {
+ new_state = 1;
+ switch (phydev->speed) {
+ case 100:
+ case 10:
+ ctrl &= ~MAC_CONTROL_PS;
+ break;
+ default:
+ netif_warn(priv, link, priv->dev,
+ "broken speed: %d\n", phydev->speed);
+ phydev->speed = SPEED_UNKNOWN;
+ break;
+ }
+ if (phydev->speed != SPEED_UNKNOWN && likely(priv->plat->fix_mac_speed))
+ priv->plat->fix_mac_speed(priv->plat->bsp_priv, phydev->speed);
+ priv->speed = phydev->speed;
+ }
+
+ writel(ctrl, priv->ioaddr + MAC_CTRL_REG);
+ return new_state;
+}
+
static void dwmac100_dump_mac_regs(struct mac_device_info *hw, u32 *reg_space)
{
void __iomem *ioaddr = hw->pcsr;
@@ -150,6 +195,7 @@ static void dwmac100_pmt(struct mac_device_info *hw, unsigned long mode)
static const struct stmmac_ops dwmac100_ops = {
.core_init = dwmac100_core_init,
+ .adjust_link = dwmac100_adjust_link,
.set_mac = stmmac_set_mac,
.rx_ipc = dwmac100_rx_ipc_enable,
.dump_regs = dwmac100_dump_mac_regs,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index 48793f2e9307..133b6bcd7b61 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -19,6 +19,7 @@
#include <linux/io.h>
#include "stmmac_pcs.h"
#include "dwmac4.h"
+#include "stmmac.h"
static void dwmac4_core_init(struct mac_device_info *hw, int mtu)
{
@@ -59,6 +60,58 @@ static void dwmac4_core_init(struct mac_device_info *hw, int mtu)
writel(value, ioaddr + GMAC_INT_EN);
}
+static int dwmac4_adjust_link(struct stmmac_priv *priv)
+{
+ struct net_device *ndev = priv->dev;
+ struct phy_device *phydev = ndev->phydev;
+ int new_state = 0;
+ u32 tx_cnt = priv->plat->tx_queues_to_use;
+ u32 ctrl;
+
+ ctrl = readl(priv->ioaddr + MAC_CTRL_REG);
+
+ if (phydev->duplex != priv->oldduplex) {
+ new_state = 1;
+ if (!(phydev->duplex))
+ ctrl &= ~GMAC_CONFIG_DM;
+ else
+ ctrl |= GMAC_CONFIG_DM;
+ priv->oldduplex = phydev->duplex;
+ }
+
+ if (phydev->pause)
+ priv->hw->mac->flow_ctrl(priv->hw, phydev->duplex, priv->flow_ctrl,
+ priv->pause, tx_cnt);
+
+ if (phydev->speed != priv->speed) {
+ new_state = 1;
+ switch (phydev->speed) {
+ case 1000:
+ ctrl &= ~GMAC_CONFIG_PS;
+ break;
+ case 100:
+ ctrl |= GMAC_CONFIG_PS;
+ ctrl |= GMAC_CONFIG_FES;
+ break;
+ case 10:
+ ctrl |= GMAC_CONFIG_PS;
+ ctrl &= ~GMAC_CONFIG_FES;
+ break;
+ default:
+ netif_warn(priv, link, priv->dev,
+ "broken speed: %d\n", phydev->speed);
+ phydev->speed = SPEED_UNKNOWN;
+ break;
+ }
+ if (phydev->speed != SPEED_UNKNOWN && likely(priv->plat->fix_mac_speed))
+ priv->plat->fix_mac_speed(priv->plat->bsp_priv, phydev->speed);
+ priv->speed = phydev->speed;
+ }
+
+ writel(ctrl, priv->ioaddr + MAC_CTRL_REG);
+ return new_state;
+}
+
static void dwmac4_rx_queue_enable(struct mac_device_info *hw,
u8 mode, u32 queue)
{
@@ -669,6 +722,7 @@ static void dwmac4_debug(void __iomem *ioaddr, struct stmmac_extra_stats *x,
static const struct stmmac_ops dwmac4_ops = {
.core_init = dwmac4_core_init,
+ .adjust_link = dwmac4_adjust_link,
.set_mac = stmmac_set_mac,
.rx_ipc = dwmac4_rx_ipc_enable,
.rx_queue_enable = dwmac4_rx_queue_enable,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index b05a042cf2c6..fb3e2ddaa7c9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -286,21 +286,6 @@ static inline u32 stmmac_rx_dirty(struct stmmac_priv *priv, u32 queue)
}
/**
- * stmmac_hw_fix_mac_speed - callback for speed selection
- * @priv: driver private structure
- * Description: on some platforms (e.g. ST), some HW system configuration
- * registers have to be set according to the link speed negotiated.
- */
-static inline void stmmac_hw_fix_mac_speed(struct stmmac_priv *priv)
-{
- struct net_device *ndev = priv->dev;
- struct phy_device *phydev = ndev->phydev;
-
- if (likely(priv->plat->fix_mac_speed))
- priv->plat->fix_mac_speed(priv->plat->bsp_priv, phydev->speed);
-}
-
-/**
* stmmac_enable_eee_mode - check and enter in LPI mode
* @priv: driver private structure
* Description: this function is to verify and enter in LPI mode in case of
@@ -759,19 +744,6 @@ static void stmmac_release_ptp(struct stmmac_priv *priv)
}
/**
- * stmmac_mac_flow_ctrl - Configure flow control in all queues
- * @priv: driver private structure
- * Description: It is used for configuring the flow control in all queues
- */
-static void stmmac_mac_flow_ctrl(struct stmmac_priv *priv, u32 duplex)
-{
- u32 tx_cnt = priv->plat->tx_queues_to_use;
-
- priv->hw->mac->flow_ctrl(priv->hw, duplex, priv->flow_ctrl,
- priv->pause, tx_cnt);
-}
-
-/**
* stmmac_adjust_link - adjusts the link parameters
* @dev: net device structure
* Description: this is the helper called by the physical abstraction layer
@@ -793,60 +765,7 @@ static void stmmac_adjust_link(struct net_device *dev)
spin_lock_irqsave(&priv->lock, flags);
if (phydev->link) {
- u32 ctrl = readl(priv->ioaddr + MAC_CTRL_REG);
-
- /* Now we make sure that we can be in full duplex mode.
- * If not, we operate in half-duplex mode. */
- if (phydev->duplex != priv->oldduplex) {
- new_state = 1;
- if (!(phydev->duplex))
- ctrl &= ~priv->hw->link.duplex;
- else
- ctrl |= priv->hw->link.duplex;
- priv->oldduplex = phydev->duplex;
- }
- /* Flow Control operation */
- if (phydev->pause)
- stmmac_mac_flow_ctrl(priv, phydev->duplex);
-
- if (phydev->speed != priv->speed) {
- new_state = 1;
- switch (phydev->speed) {
- case 1000:
- if (priv->plat->has_gmac ||
- priv->plat->has_gmac4)
- ctrl &= ~priv->hw->link.port;
- break;
- case 100:
- if (priv->plat->has_gmac ||
- priv->plat->has_gmac4) {
- ctrl |= priv->hw->link.port;
- ctrl |= priv->hw->link.speed;
- } else {
- ctrl &= ~priv->hw->link.port;
- }
- break;
- case 10:
- if (priv->plat->has_gmac ||
- priv->plat->has_gmac4) {
- ctrl |= priv->hw->link.port;
- ctrl &= ~(priv->hw->link.speed);
- } else {
- ctrl &= ~priv->hw->link.port;
- }
- break;
- default:
- netif_warn(priv, link, priv->dev,
- "broken speed: %d\n", phydev->speed);
- phydev->speed = SPEED_UNKNOWN;
- break;
- }
- if (phydev->speed != SPEED_UNKNOWN)
- stmmac_hw_fix_mac_speed(priv);
- priv->speed = phydev->speed;
- }
-
- writel(ctrl, priv->ioaddr + MAC_CTRL_REG);
+ new_state = priv->hw->mac->adjust_link(priv);
if (!priv->oldlink) {
new_state = 1;
--
2.13.0
^ permalink raw reply related
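The speed and duplex handling that each per-core adjust_link() callback
performs on its control register can be sketched in userspace as follows.
This is only an illustration: the DEMO_* names, the bit positions and the
helper functions are hypothetical and are not taken from the driver headers.
The real bit definitions differ between the dwmac100, dwmac1000 and dwmac4
cores.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. The real values
 * live in each core's header and are not the same across cores.
 */
#define DEMO_CTRL_DM	(1u << 11)	/* duplex mode: 1 = full duplex */
#define DEMO_CTRL_FES	(1u << 14)	/* speed: 0 = 10 Mb/s, 1 = 100 Mb/s */
#define DEMO_CTRL_PS	(1u << 15)	/* port select: 0 = 1000, 1 = 10/100 */

/* Return ctrl updated for the given speed; unknown speeds leave it as is. */
static uint32_t demo_set_speed(uint32_t ctrl, int speed)
{
	switch (speed) {
	case 1000:
		ctrl &= ~DEMO_CTRL_PS;
		break;
	case 100:
		ctrl |= DEMO_CTRL_PS | DEMO_CTRL_FES;
		break;
	case 10:
		ctrl |= DEMO_CTRL_PS;
		ctrl &= ~DEMO_CTRL_FES;	/* clear the bit, do not OR in its complement */
		break;
	default:
		break;
	}
	return ctrl;
}

/* Return ctrl with the duplex bit set for full duplex, cleared otherwise. */
static uint32_t demo_set_duplex(uint32_t ctrl, int full_duplex)
{
	if (full_duplex)
		return ctrl | DEMO_CTRL_DM;
	return ctrl & ~DEMO_CTRL_DM;
}
```

Keeping the bit names inside each core's own callback is what lets the
generic code call a single function pointer instead of consulting a
per-core table of masks.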
* [PATCH 2/2] net-next: stmmac: remove struct mac_link
From: Corentin Labbe @ 2017-05-15 11:41 UTC (permalink / raw)
To: peppe.cavallaro, alexandre.torgue; +Cc: netdev, linux-kernel, Corentin Labbe
In-Reply-To: <20170515114140.1676-1-clabbe.montjoie@gmail.com>
Now that each core implements adjust_link(), struct mac_link is unused.
This patch removes it.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
drivers/net/ethernet/stmicro/stmmac/common.h | 7 -------
drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 3 ---
drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c | 3 ---
drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 3 ---
4 files changed, 16 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 451c231006fe..63350b70e10f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -551,12 +551,6 @@ struct stmmac_hwtimestamp {
extern const struct stmmac_hwtimestamp stmmac_ptp;
extern const struct stmmac_mode_ops dwmac4_ring_mode_ops;
-struct mac_link {
- int port;
- int duplex;
- int speed;
-};
-
struct mii_regs {
unsigned int addr; /* MII Address */
unsigned int data; /* MII Data */
@@ -587,7 +581,6 @@ struct mac_device_info {
const struct stmmac_mode_ops *mode;
const struct stmmac_hwtimestamp *ptp;
struct mii_regs mii; /* MII register Addresses */
- struct mac_link link;
void __iomem *pcsr; /* vpointer to device CSRs */
int multicast_filter_bins;
int unicast_filter_entries;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index 5f3aace46c41..52092ec5f4af 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -585,9 +585,6 @@ struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr, int mcbins,
mac->mac = &dwmac1000_ops;
mac->dma = &dwmac1000_dma_ops;
- mac->link.port = GMAC_CONTROL_PS;
- mac->link.duplex = GMAC_CONTROL_DM;
- mac->link.speed = GMAC_CONTROL_FES;
mac->mii.addr = GMAC_MII_ADDR;
mac->mii.data = GMAC_MII_DATA;
mac->mii.addr_shift = 11;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index ba3d46e65e1a..faddbf3c2916 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -221,9 +221,6 @@ struct mac_device_info *dwmac100_setup(void __iomem *ioaddr, int *synopsys_id)
mac->mac = &dwmac100_ops;
mac->dma = &dwmac100_dma_ops;
- mac->link.port = MAC_CONTROL_PS;
- mac->link.duplex = MAC_CONTROL_F;
- mac->link.speed = 0;
mac->mii.addr = MAC_MII_ADDR;
mac->mii.data = MAC_MII_DATA;
mac->mii.addr_shift = 11;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index 133b6bcd7b61..baf32c91122d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -801,9 +801,6 @@ struct mac_device_info *dwmac4_setup(void __iomem *ioaddr, int mcbins,
if (mac->multicast_filter_bins)
mac->mcast_bits_log2 = ilog2(mac->multicast_filter_bins);
- mac->link.port = GMAC_CONFIG_PS;
- mac->link.duplex = GMAC_CONFIG_DM;
- mac->link.speed = GMAC_CONFIG_FES;
mac->mii.addr = GMAC_MDIO_ADDR;
mac->mii.data = GMAC_MDIO_DATA;
mac->mii.addr_shift = 21;
--
2.13.0
^ permalink raw reply related
* Re: [PATCH net] ipv6: avoid dad-failures for addresses with NODAD
From: David Ahern @ 2017-05-15 11:56 UTC (permalink / raw)
To: Mahesh Bandewar, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, netdev, David Miller
Cc: Eric Dumazet, Mahesh Bandewar
In-Reply-To: <20170513000339.15843-1-mahesh@bandewar.net>
On 5/12/17 6:03 PM, Mahesh Bandewar wrote:
> From: Mahesh Bandewar <maheshb@google.com>
>
> Every address gets added with TENTATIVE flag even for the addresses with
> IFA_F_NODAD flag and dad-work is scheduled for them. During this DAD process
> we realize it's an address with NODAD and complete the process without
> sending any probe. However, the TENTATIVE flag stays on the
> address long enough to cause misinterpretation when we receive an NS.
> While processing the NS, if the address has the TENTATIVE flag, we mark it
> DADFAILED and end up with an address that was originally configured as
> NODAD marked DADFAILED.
>
> We can't avoid scheduling dad_work for addresses with NODAD but we can
> avoid adding TENTATIVE flag to avoid this racy situation.
>
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> ---
> net/ipv6/addrconf.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index b09ac38d8dc4..53f2dc092023 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1022,7 +1022,10 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr,
> INIT_HLIST_NODE(&ifa->addr_lst);
> ifa->scope = scope;
> ifa->prefix_len = pfxlen;
> - ifa->flags = flags | IFA_F_TENTATIVE;
> + ifa->flags = flags;
> + /* No need to add the TENTATIVE flag for addresses with NODAD */
> + if (!(flags & IFA_F_NODAD))
> + ifa->flags |= IFA_F_TENTATIVE;
> ifa->valid_lft = valid_lft;
> ifa->prefered_lft = prefered_lft;
> ifa->cstamp = ifa->tstamp = jiffies;
>
LGTM.
Acked-by: David Ahern <dsahern@gmail.com>
^ permalink raw reply
* [PATCH] usbnet: no address filtering on RNDIS
From: Oliver Neukum @ 2017-05-15 11:58 UTC (permalink / raw)
To: davem, netdev; +Cc: Oliver Neukum
RNDIS does not do multicast filtering, and the filter commands crash a few
devices. Send the filter update only to non-RNDIS devices.
Fixes: b527caee1b91946db844b1dc63d4f726958891c8
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: "Ridgway, Keith" <kridgway@harris.com>
---
drivers/net/usb/cdc_ether.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index f3ae88fdf332..64b1a6bdef98 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -313,9 +313,11 @@ int usbnet_generic_cdc_bind(struct usbnet *dev, struct usb_interface *intf)
/* Some devices don't initialise properly. In particular
* the packet filter is not reset. There are devices that
* don't do reset all the way. So the packet filter should
- * be set to a sane initial value.
+ * be set to a sane initial value, if filtering is supported.
+ * RNDIS does not support it.
*/
- usbnet_cdc_update_filter(dev);
+ if (!rndis)
+ usbnet_cdc_update_filter(dev);
return 0;
--
2.12.0
^ permalink raw reply related
* Re: [PATCH] bridge: netlink: check vlan_default_pvid range
From: Sabrina Dubroca @ 2017-05-15 12:01 UTC (permalink / raw)
To: Tobias Jungel; +Cc: Stephen Hemminger, David S. Miller, netdev
In-Reply-To: <20170515110819.11847-1-tobias.jungel@bisdn.de>
Hi Tobias,
2017-05-15, 13:08:19 +0200, Tobias Jungel wrote:
> Currently it is allowed to set the default pvid of a bridge to a value
> above VLAN_VID_MASK (0xfff). This patch checks the passed pvid and
> disables the pvid in case it is out of bounds.
Could we return an error (-EINVAL) to userspace instead? Silently
disabling the feature seems confusing to me. This would probably be
better in br_validate() (like the IFLA_BR_VLAN_PROTOCOL check), since
there's already such a check when setting default_pvid from sysfs (in
br_vlan_set_default_pvid()).
>
> Reproduce by calling:
>
> [root@test ~]# ip l a type bridge
> [root@test ~]# ip l a type dummy
> [root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
> [root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999
> [root@test ~]# ip l s dummy0 master bridge0
> [root@test ~]# bridge vlan
> port vlan ids
> bridge0 9999 PVID Egress Untagged
>
> dummy0 9999 PVID Egress Untagged
You'll also need to add a Signed-off-by, and a Fixes tag would be nice.
Thanks,
--
Sabrina
^ permalink raw reply
* Re: [PATCH] bridge: netlink: check vlan_default_pvid range
From: Nikolay Aleksandrov @ 2017-05-15 12:05 UTC (permalink / raw)
To: Tobias Jungel, Stephen Hemminger, David S. Miller, netdev
In-Reply-To: <20170515110819.11847-1-tobias.jungel@bisdn.de>
On 5/15/17 2:08 PM, Tobias Jungel wrote:
> Currently it is allowed to set the default pvid of a bridge to a value
> above VLAN_VID_MASK (0xfff). This patch checks the passed pvid and
> disables the pvid in case it is out of bounds.
>
> Reproduce by calling:
>
> [root@test ~]# ip l a type bridge
> [root@test ~]# ip l a type dummy
> [root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
> [root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999
> [root@test ~]# ip l s dummy0 master bridge0
> [root@test ~]# bridge vlan
> port vlan ids
> bridge0 9999 PVID Egress Untagged
>
> dummy0 9999 PVID Egress Untagged
> ---
> net/bridge/br_vlan.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Good catch, but this is not the right fix. Default pvid of 0 disables the
default pvid, but a wrong pvid should result in -EINVAL return instead of
0. Take a look at how the parent default pvid function handles this case
(br_vlan_set_default_pvid).
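As a rough userspace illustration of the behaviour asked for here (0
disables the default pvid, an out-of-range value is rejected with -EINVAL
and leaves the current setting untouched), consider the sketch below.
struct demo_bridge and demo_set_default_pvid() are hypothetical names, not
bridge code. Rejecting 0xfff itself, and not only values above it, is an
assumption based on that VLAN ID being reserved.

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors the VLAN_VID_MASK (0xfff) value quoted in the patch description. */
#define DEMO_VLAN_VID_MASK	0x0fff

/* Hypothetical stand-in for the bridge state the netlink handler updates. */
struct demo_bridge {
	uint16_t default_pvid;	/* 0 means "no default pvid" */
};

/* Validate first, then store; an invalid value never reaches the bridge. */
static int demo_set_default_pvid(struct demo_bridge *br, uint16_t pvid)
{
	if (pvid >= DEMO_VLAN_VID_MASK)
		return -EINVAL;

	br->default_pvid = pvid;
	return 0;
}
```

With this shape, the "vlan_default_pvid 9999" command from the reproducer
would fail instead of silently changing the bridge configuration.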
Also, please add a Signed-off-by and a proper Fixes tag.
For more information about submitting patches you can check
Documentation/SubmittingPatches
Thanks,
Nik
^ permalink raw reply
* Re: [PATCH] usbnet: no address filtering on RNDIS
From: Bjørn Mork @ 2017-05-15 12:20 UTC (permalink / raw)
To: Oliver Neukum; +Cc: davem, netdev
In-Reply-To: <20170515115853.32688-1-oneukum@suse.com>
Oliver Neukum <oneukum@suse.com> writes:
> RNDIS does not do multicast filtering and the commands crash a few devices.
> Make it conditional.
Strange. I thought we already discussed this when the filter reset
request was initially added 3 years ago. Ref
https://patchwork.ozlabs.org/patch/374137/
Bjørn
^ permalink raw reply
* Re: [PATCH net v2] i40e/i40evf: proper update of the page_offset field
From: Mauro Rodrigues @ 2017-05-15 12:45 UTC (permalink / raw)
To: netdev
In-Reply-To: <20170515045200.27789-1-bjorn.topel@gmail.com>
On Mon, May 15, 2017 at 06:52:00AM +0200, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> In f8b45b74cc62 ("i40e/i40evf: Use build_skb to build frames")
> i40e_build_skb updates the page_offset field with an incorrect offset,
> which can lead to data corruption. This patch updates page_offset
> correctly, by properly setting truesize.
>
> Note that the bug only appears on architectures where PAGE_SIZE is
> 8192 or larger.
>
> Fixes: f8b45b74cc62 ("i40e/i40evf: Use build_skb to build frames")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
I tested the fix and it solves the problem for me! Thank you!
> ---
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 3 ++-
> drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index 29321a6167a6..cd894f4023b1 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1854,7 +1854,8 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
> #if (PAGE_SIZE < 8192)
> unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
> #else
> - unsigned int truesize = SKB_DATA_ALIGN(size);
> + unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
> + SKB_DATA_ALIGN(I40E_SKB_PAD + size);
> #endif
> struct sk_buff *skb;
>
> diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> index dfe241a12ad0..12b02e530503 100644
> --- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> @@ -1190,7 +1190,8 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
> #if (PAGE_SIZE < 8192)
> unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
> #else
> - unsigned int truesize = SKB_DATA_ALIGN(size);
> + unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
> + SKB_DATA_ALIGN(I40E_SKB_PAD + size);
> #endif
> struct sk_buff *skb;
>
> --
> 2.11.0
>
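To get a feel for how much the old and new truesize formulas in the quoted
diff differ, here is a small userspace sketch. All three constants are
assumptions chosen for illustration (the kernel derives them from the
architecture and from struct skb_shared_info), and the demo_* helpers are
hypothetical stand-ins for SKB_DATA_ALIGN() and the two expressions.

```c
#include <stddef.h>

/* Assumed values, for illustration only. */
#define DEMO_CACHE_BYTES	64u	/* stand-in for SMP_CACHE_BYTES */
#define DEMO_SHINFO_SIZE	320u	/* stand-in for sizeof(struct skb_shared_info) */
#define DEMO_SKB_PAD		64u	/* stand-in for I40E_SKB_PAD */

/* Round x up to a multiple of the cache line size, like SKB_DATA_ALIGN(). */
static size_t demo_data_align(size_t x)
{
	return (x + DEMO_CACHE_BYTES - 1) & ~((size_t)DEMO_CACHE_BYTES - 1);
}

/* Old expression for PAGE_SIZE >= 8192: accounts for the payload only. */
static size_t demo_truesize_old(size_t size)
{
	return demo_data_align(size);
}

/* New expression: shared info, plus headroom and payload. */
static size_t demo_truesize_new(size_t size)
{
	return demo_data_align(DEMO_SHINFO_SIZE) +
	       demo_data_align(DEMO_SKB_PAD + size);
}
```

Under these assumed constants, a 1500-byte frame gives 1536 bytes with the
old expression and 1920 bytes with the new one.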
^ permalink raw reply
* [RFC PATCH net-next] rxrpc: Support network namespacing
From: David Howells @ 2017-05-15 12:56 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
Support network namespacing in AF_RXRPC with the following changes:
(1) All the local endpoint, peer and call lists, locks, counters, etc. are
moved into the per-namespace record.
(2) All the connection tracking is moved into the per-namespace record
with the exception of the client connection ID tree, which is kept
global so that connection IDs are kept unique per-machine.
Possibly the client connection management should be kept global so
that the number of active client connections is managed globally.
This makes for a slightly tricky cleanup operation and starts to look
like it ought to be control-grouped in some way.
(3) Each namespace gets its own epoch. This allows each network namespace
to pretend to be a separate client machine.
Possibly these should be kept unique per-machine also.
(4) The /proc/net/rxrpc_xxx files are now called /proc/net/rxrpc/xxx and
the contents reflect the namespace.
I'm not sure sticking them in their own directory is a good idea at
this point, though they are just info files.
fs/afs/ should be okay with this patch as it explicitly requires the current
net namespace to be init_net to permit a mount to proceed at the moment. It
will, however, need updating so that cells, IP addresses and DNS records are
per-namespace also.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/Makefile | 1
net/rxrpc/af_rxrpc.c | 35 +++++------
net/rxrpc/ar-internal.h | 65 ++++++++++++++++----
net/rxrpc/call_accept.c | 14 ++--
net/rxrpc/call_object.c | 39 +++++++-----
net/rxrpc/conn_client.c | 153 +++++++++++++++++++++++-----------------------
net/rxrpc/conn_object.c | 55 ++++++++---------
net/rxrpc/conn_service.c | 11 ++-
net/rxrpc/local_object.c | 48 +++++++-------
net/rxrpc/net_ns.c | 85 ++++++++++++++++++++++++++
net/rxrpc/peer_object.c | 26 ++++----
net/rxrpc/proc.c | 40 ++++++++----
12 files changed, 356 insertions(+), 216 deletions(-)
create mode 100644 net/rxrpc/net_ns.c
diff --git a/net/rxrpc/Makefile b/net/rxrpc/Makefile
index b9da4d6b914f..9c68d2f8ba39 100644
--- a/net/rxrpc/Makefile
+++ b/net/rxrpc/Makefile
@@ -19,6 +19,7 @@ rxrpc-y := \
local_event.o \
local_object.o \
misc.o \
+ net_ns.o \
output.o \
peer_event.o \
peer_object.o \
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 7fb59c3f1542..cd34ffbff1d1 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -38,9 +38,6 @@ MODULE_PARM_DESC(debug, "RxRPC debugging mask");
static struct proto rxrpc_proto;
static const struct proto_ops rxrpc_rpc_ops;
-/* local epoch for detecting local-end reset */
-u32 rxrpc_epoch;
-
/* current debugging ID */
atomic_t rxrpc_debug_id;
@@ -155,7 +152,7 @@ static int rxrpc_bind(struct socket *sock, struct sockaddr *saddr, int len)
memcpy(&rx->srx, srx, sizeof(rx->srx));
- local = rxrpc_lookup_local(&rx->srx);
+ local = rxrpc_lookup_local(sock_net(sock->sk), &rx->srx);
if (IS_ERR(local)) {
ret = PTR_ERR(local);
goto error_unlock;
@@ -434,7 +431,7 @@ static int rxrpc_sendmsg(struct socket *sock, struct msghdr *m, size_t len)
ret = -EAFNOSUPPORT;
goto error_unlock;
}
- local = rxrpc_lookup_local(&rx->srx);
+ local = rxrpc_lookup_local(sock_net(sock->sk), &rx->srx);
if (IS_ERR(local)) {
ret = PTR_ERR(local);
goto error_unlock;
@@ -582,9 +579,6 @@ static int rxrpc_create(struct net *net, struct socket *sock, int protocol,
_enter("%p,%d", sock, protocol);
- if (!net_eq(net, &init_net))
- return -EAFNOSUPPORT;
-
/* we support transport protocol UDP/UDP6 only */
if (protocol != PF_INET &&
IS_ENABLED(CONFIG_AF_RXRPC_IPV6) && protocol != PF_INET6)
@@ -780,8 +774,6 @@ static int __init af_rxrpc_init(void)
BUILD_BUG_ON(sizeof(struct rxrpc_skb_priv) > FIELD_SIZEOF(struct sk_buff, cb));
- get_random_bytes(&rxrpc_epoch, sizeof(rxrpc_epoch));
- rxrpc_epoch |= RXRPC_RANDOM_EPOCH;
get_random_bytes(&tmp, sizeof(tmp));
tmp &= 0x3fffffff;
if (tmp == 0)
@@ -809,6 +801,10 @@ static int __init af_rxrpc_init(void)
goto error_security;
}
+ ret = register_pernet_subsys(&rxrpc_net_ops);
+ if (ret)
+ goto error_pernet;
+
ret = proto_register(&rxrpc_proto, 1);
if (ret < 0) {
pr_crit("Cannot register protocol\n");
@@ -839,11 +835,6 @@ static int __init af_rxrpc_init(void)
goto error_sysctls;
}
-#ifdef CONFIG_PROC_FS
- proc_create("rxrpc_calls", 0, init_net.proc_net, &rxrpc_call_seq_fops);
- proc_create("rxrpc_conns", 0, init_net.proc_net,
- &rxrpc_connection_seq_fops);
-#endif
return 0;
error_sysctls:
@@ -855,6 +846,8 @@ static int __init af_rxrpc_init(void)
error_sock:
proto_unregister(&rxrpc_proto);
error_proto:
+ unregister_pernet_subsys(&rxrpc_net_ops);
+error_pernet:
rxrpc_exit_security();
error_security:
destroy_workqueue(rxrpc_workqueue);
@@ -875,14 +868,16 @@ static void __exit af_rxrpc_exit(void)
unregister_key_type(&key_type_rxrpc);
sock_unregister(PF_RXRPC);
proto_unregister(&rxrpc_proto);
- rxrpc_destroy_all_calls();
- rxrpc_destroy_all_connections();
+ unregister_pernet_subsys(&rxrpc_net_ops);
ASSERTCMP(atomic_read(&rxrpc_n_tx_skbs), ==, 0);
ASSERTCMP(atomic_read(&rxrpc_n_rx_skbs), ==, 0);
- rxrpc_destroy_all_locals();
- remove_proc_entry("rxrpc_conns", init_net.proc_net);
- remove_proc_entry("rxrpc_calls", init_net.proc_net);
+ /* Make sure the local and peer records pinned by any dying connections
+ * are released.
+ */
+ rcu_barrier();
+ rxrpc_destroy_client_conn_ids();
+
destroy_workqueue(rxrpc_workqueue);
rxrpc_exit_security();
kmem_cache_destroy(rxrpc_call_jar);
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 7486926e60a8..067dbb3121d0 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -11,6 +11,8 @@
#include <linux/atomic.h>
#include <linux/seqlock.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
#include <net/sock.h>
#include <net/af_rxrpc.h>
#include <rxrpc/packet.h>
@@ -65,6 +67,37 @@ enum {
};
/*
+ * Per-network namespace data.
+ */
+struct rxrpc_net {
+ struct proc_dir_entry *proc_net; /* Subdir in /proc/net */
+ u32 epoch; /* Local epoch for detecting local-end reset */
+ struct list_head calls; /* List of calls active in this namespace */
+ rwlock_t call_lock; /* Lock for ->calls */
+
+ struct list_head conn_proc_list; /* List of conns in this namespace for proc */
+ struct list_head service_conns; /* Service conns in this namespace */
+ rwlock_t conn_lock; /* Lock for ->conn_proc_list, ->service_conns */
+ struct delayed_work service_conn_reaper;
+
+ unsigned int nr_client_conns;
+ unsigned int nr_active_client_conns;
+ bool kill_all_client_conns;
+ spinlock_t client_conn_cache_lock; /* Lock for ->*_client_conns */
+ spinlock_t client_conn_discard_lock; /* Prevent multiple discarders */
+ struct list_head waiting_client_conns;
+ struct list_head active_client_conns;
+ struct list_head idle_client_conns;
+ struct delayed_work client_conn_reaper;
+
+ struct list_head local_endpoints;
+ struct mutex local_mutex; /* Lock for ->local_endpoints */
+
+ spinlock_t peer_hash_lock; /* Lock for ->peer_hash */
+ DECLARE_HASHTABLE (peer_hash, 10);
+};
+
+/*
* Service backlog preallocation.
*
* This contains circular buffers of preallocated peers, connections and calls
@@ -211,6 +244,7 @@ struct rxrpc_security {
struct rxrpc_local {
struct rcu_head rcu;
atomic_t usage;
+ struct rxrpc_net *rxnet; /* The network ns in which this resides */
struct list_head link;
struct socket *socket; /* my UDP socket */
struct work_struct processor;
@@ -601,7 +635,6 @@ struct rxrpc_ack_summary {
* af_rxrpc.c
*/
extern atomic_t rxrpc_n_tx_skbs, rxrpc_n_rx_skbs;
-extern u32 rxrpc_epoch;
extern atomic_t rxrpc_debug_id;
extern struct workqueue_struct *rxrpc_workqueue;
@@ -634,8 +667,6 @@ extern const char *const rxrpc_call_states[];
extern const char *const rxrpc_call_completions[];
extern unsigned int rxrpc_max_call_lifetime;
extern struct kmem_cache *rxrpc_call_jar;
-extern struct list_head rxrpc_calls;
-extern rwlock_t rxrpc_call_lock;
struct rxrpc_call *rxrpc_find_call_by_user_ID(struct rxrpc_sock *, unsigned long);
struct rxrpc_call *rxrpc_alloc_call(gfp_t);
@@ -653,7 +684,7 @@ void rxrpc_see_call(struct rxrpc_call *);
void rxrpc_get_call(struct rxrpc_call *, enum rxrpc_call_trace);
void rxrpc_put_call(struct rxrpc_call *, enum rxrpc_call_trace);
void rxrpc_cleanup_call(struct rxrpc_call *);
-void __exit rxrpc_destroy_all_calls(void);
+void rxrpc_destroy_all_calls(struct rxrpc_net *);
static inline bool rxrpc_is_service_call(const struct rxrpc_call *call)
{
@@ -773,7 +804,8 @@ int rxrpc_connect_call(struct rxrpc_call *, struct rxrpc_conn_parameters *,
void rxrpc_expose_client_call(struct rxrpc_call *);
void rxrpc_disconnect_client_call(struct rxrpc_call *);
void rxrpc_put_client_conn(struct rxrpc_connection *);
-void __exit rxrpc_destroy_all_client_connections(void);
+void rxrpc_discard_expired_client_conns(struct work_struct *);
+void rxrpc_destroy_all_client_connections(struct rxrpc_net *);
/*
* conn_event.c
@@ -784,9 +816,6 @@ void rxrpc_process_connection(struct work_struct *);
* conn_object.c
*/
extern unsigned int rxrpc_connection_expiry;
-extern struct list_head rxrpc_connections;
-extern struct list_head rxrpc_connection_proc_list;
-extern rwlock_t rxrpc_connection_lock;
int rxrpc_extract_addr_from_skb(struct sockaddr_rxrpc *, struct sk_buff *);
struct rxrpc_connection *rxrpc_alloc_connection(gfp_t);
@@ -800,7 +829,8 @@ void rxrpc_see_connection(struct rxrpc_connection *);
void rxrpc_get_connection(struct rxrpc_connection *);
struct rxrpc_connection *rxrpc_get_connection_maybe(struct rxrpc_connection *);
void rxrpc_put_service_conn(struct rxrpc_connection *);
-void __exit rxrpc_destroy_all_connections(void);
+void rxrpc_service_connection_reaper(struct work_struct *);
+void rxrpc_destroy_all_connections(struct rxrpc_net *);
static inline bool rxrpc_conn_is_client(const struct rxrpc_connection *conn)
{
@@ -828,7 +858,7 @@ static inline void rxrpc_put_connection(struct rxrpc_connection *conn)
*/
struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *,
struct sk_buff *);
-struct rxrpc_connection *rxrpc_prealloc_service_connection(gfp_t);
+struct rxrpc_connection *rxrpc_prealloc_service_connection(struct rxrpc_net *, gfp_t);
void rxrpc_new_incoming_connection(struct rxrpc_connection *, struct sk_buff *);
void rxrpc_unpublish_service_conn(struct rxrpc_connection *);
@@ -861,9 +891,9 @@ extern void rxrpc_process_local_events(struct rxrpc_local *);
/*
* local_object.c
*/
-struct rxrpc_local *rxrpc_lookup_local(const struct sockaddr_rxrpc *);
+struct rxrpc_local *rxrpc_lookup_local(struct net *, const struct sockaddr_rxrpc *);
void __rxrpc_put_local(struct rxrpc_local *);
-void __exit rxrpc_destroy_all_locals(void);
+void rxrpc_destroy_all_locals(struct rxrpc_net *);
static inline void rxrpc_get_local(struct rxrpc_local *local)
{
@@ -902,6 +932,17 @@ extern unsigned int rxrpc_resend_timeout;
extern const s8 rxrpc_ack_priority[];
/*
+ * net_ns.c
+ */
+extern unsigned int rxrpc_net_id;
+extern struct pernet_operations rxrpc_net_ops;
+
+static inline struct rxrpc_net *rxrpc_net(struct net *net)
+{
+ return net_generic(net, rxrpc_net_id);
+}
+
+/*
* output.c
*/
int rxrpc_send_ack_packet(struct rxrpc_call *, bool);
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 1752fcf8e8f1..a8515b0d4717 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -38,6 +38,7 @@ static int rxrpc_service_prealloc_one(struct rxrpc_sock *rx,
{
const void *here = __builtin_return_address(0);
struct rxrpc_call *call;
+ struct rxrpc_net *rxnet = rxrpc_net(sock_net(&rx->sk));
int max, tmp;
unsigned int size = RXRPC_BACKLOG_MAX;
unsigned int head, tail, call_head, call_tail;
@@ -79,7 +80,7 @@ static int rxrpc_service_prealloc_one(struct rxrpc_sock *rx,
if (CIRC_CNT(head, tail, size) < max) {
struct rxrpc_connection *conn;
- conn = rxrpc_prealloc_service_connection(gfp);
+ conn = rxrpc_prealloc_service_connection(rxnet, gfp);
if (!conn)
return -ENOMEM;
b->conn_backlog[head] = conn;
@@ -136,9 +137,9 @@ static int rxrpc_service_prealloc_one(struct rxrpc_sock *rx,
write_unlock(&rx->call_lock);
- write_lock(&rxrpc_call_lock);
- list_add_tail(&call->link, &rxrpc_calls);
- write_unlock(&rxrpc_call_lock);
+ write_lock(&rxnet->call_lock);
+ list_add_tail(&call->link, &rxnet->calls);
+ write_unlock(&rxnet->call_lock);
b->call_backlog[call_head] = call;
smp_store_release(&b->call_backlog_head, (call_head + 1) & (size - 1));
@@ -185,6 +186,7 @@ int rxrpc_service_prealloc(struct rxrpc_sock *rx, gfp_t gfp)
void rxrpc_discard_prealloc(struct rxrpc_sock *rx)
{
struct rxrpc_backlog *b = rx->backlog;
+ struct rxrpc_net *rxnet = rxrpc_net(sock_net(&rx->sk));
unsigned int size = RXRPC_BACKLOG_MAX, head, tail;
if (!b)
@@ -209,10 +211,10 @@ void rxrpc_discard_prealloc(struct rxrpc_sock *rx)
tail = b->conn_backlog_tail;
while (CIRC_CNT(head, tail, size) > 0) {
struct rxrpc_connection *conn = b->conn_backlog[tail];
- write_lock(&rxrpc_connection_lock);
+ write_lock(&rxnet->conn_lock);
list_del(&conn->link);
list_del(&conn->proc_link);
- write_unlock(&rxrpc_connection_lock);
+ write_unlock(&rxnet->conn_lock);
kfree(conn);
tail = (tail + 1) & (size - 1);
}
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 47f7f4205653..692110808baa 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -44,8 +44,6 @@ const char *const rxrpc_call_completions[NR__RXRPC_CALL_COMPLETIONS] = {
};
struct kmem_cache *rxrpc_call_jar;
-LIST_HEAD(rxrpc_calls);
-DEFINE_RWLOCK(rxrpc_call_lock);
static void rxrpc_call_timer_expired(unsigned long _call)
{
@@ -207,6 +205,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx,
__releases(&rx->sk.sk_lock.slock)
{
struct rxrpc_call *call, *xcall;
+ struct rxrpc_net *rxnet = rxrpc_net(sock_net(&rx->sk));
struct rb_node *parent, **pp;
const void *here = __builtin_return_address(0);
int ret;
@@ -255,9 +254,9 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx,
write_unlock(&rx->call_lock);
- write_lock(&rxrpc_call_lock);
- list_add_tail(&call->link, &rxrpc_calls);
- write_unlock(&rxrpc_call_lock);
+ write_lock(&rxnet->call_lock);
+ list_add_tail(&call->link, &rxnet->calls);
+ write_unlock(&rxnet->call_lock);
/* From this point on, the call is protected by its own lock. */
release_sock(&rx->sk);
@@ -508,6 +507,7 @@ void rxrpc_release_calls_on_socket(struct rxrpc_sock *rx)
*/
void rxrpc_put_call(struct rxrpc_call *call, enum rxrpc_call_trace op)
{
+ struct rxrpc_net *rxnet;
const void *here = __builtin_return_address(0);
int n;
@@ -520,9 +520,12 @@ void rxrpc_put_call(struct rxrpc_call *call, enum rxrpc_call_trace op)
_debug("call %d dead", call->debug_id);
ASSERTCMP(call->state, ==, RXRPC_CALL_COMPLETE);
- write_lock(&rxrpc_call_lock);
- list_del_init(&call->link);
- write_unlock(&rxrpc_call_lock);
+ if (!list_empty(&call->link)) {
+ rxnet = rxrpc_net(sock_net(&call->socket->sk));
+ write_lock(&rxnet->call_lock);
+ list_del_init(&call->link);
+ write_unlock(&rxnet->call_lock);
+ }
rxrpc_cleanup_call(call);
}
@@ -570,21 +573,23 @@ void rxrpc_cleanup_call(struct rxrpc_call *call)
}
/*
- * Make sure that all calls are gone.
+ * Make sure that all calls are gone from a network namespace. To reach this
+ * point, any open UDP sockets in that namespace must have been closed, so any
+ * outstanding calls cannot be doing I/O.
*/
-void __exit rxrpc_destroy_all_calls(void)
+void rxrpc_destroy_all_calls(struct rxrpc_net *rxnet)
{
struct rxrpc_call *call;
_enter("");
- if (list_empty(&rxrpc_calls))
+ if (list_empty(&rxnet->calls))
return;
- write_lock(&rxrpc_call_lock);
+ write_lock(&rxnet->call_lock);
- while (!list_empty(&rxrpc_calls)) {
- call = list_entry(rxrpc_calls.next, struct rxrpc_call, link);
+ while (!list_empty(&rxnet->calls)) {
+ call = list_entry(rxnet->calls.next, struct rxrpc_call, link);
_debug("Zapping call %p", call);
rxrpc_see_call(call);
@@ -595,10 +600,10 @@ void __exit rxrpc_destroy_all_calls(void)
rxrpc_call_states[call->state],
call->flags, call->events);
- write_unlock(&rxrpc_call_lock);
+ write_unlock(&rxnet->call_lock);
cond_resched();
- write_lock(&rxrpc_call_lock);
+ write_lock(&rxnet->call_lock);
}
- write_unlock(&rxrpc_call_lock);
+ write_unlock(&rxnet->call_lock);
}
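Reviewer note on the rxrpc_put_call() hunk above: the new `if (!list_empty(&call->link))` guard relies on a node that points at itself reporting "empty", both before it is ever added and again after list_del_init(). The guard is presumably there so that a call which was never put on rxnet->calls does not need to look up the namespace at all. That reading is an assumption, as is the premise that call->link is initialised as an empty list head at allocation. A minimal user-space sketch of the list behaviour, using hypothetical `_demo` stand-ins rather than the kernel's <linux/list.h>:

```c
#include <assert.h>

/* Stand-in for struct list_head; illustrative only. */
struct list_head_demo {
	struct list_head_demo *next, *prev;
};

/* An initialised node points at itself, so it counts as "empty". */
static void init_list_head_demo(struct list_head_demo *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty_demo(const struct list_head_demo *h)
{
	return h->next == h;
}

/* Link a node in just before the head, i.e. at the tail of the list. */
static void list_add_tail_demo(struct list_head_demo *n,
			       struct list_head_demo *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink a node and leave it self-pointing again. */
static void list_del_init_demo(struct list_head_demo *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head_demo(e);
}

/* Walk one node through the unlinked -> linked -> unlinked cycle. */
int list_guard_selfcheck(void)
{
	struct list_head_demo calls, link;

	init_list_head_demo(&calls);
	init_list_head_demo(&link);
	assert(list_empty_demo(&link));		/* never added: guard skips */

	list_add_tail_demo(&link, &calls);
	assert(!list_empty_demo(&link));	/* on the list: guard unlinks */

	list_del_init_demo(&link);
	assert(list_empty_demo(&link));		/* a second put would skip */
	assert(list_empty_demo(&calls));
	return 0;
}
```

Calling list_guard_selfcheck() from a test harness exercises all three states.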
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index e8dea0d49e7f..c86f3202f967 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -31,7 +31,7 @@
* may freely grant available channels to new calls and calls may be
* waiting on it for channels to become available.
*
- * The connection is on the rxrpc_active_client_conns list which is kept
+ * The connection is on the rxnet->active_client_conns list which is kept
* in activation order for culling purposes.
*
* rxrpc_nr_active_client_conns is held incremented also.
@@ -46,7 +46,7 @@
* expires, the EXPOSED flag is cleared and the connection transitions to
* the INACTIVE state.
*
- * The connection is on the rxrpc_idle_client_conns list which is kept in
+ * The connection is on the rxnet->idle_client_conns list which is kept in
* order of how soon they'll expire.
*
* There are flags of relevance to the cache:
@@ -85,27 +85,13 @@ __read_mostly unsigned int rxrpc_reap_client_connections = 900;
__read_mostly unsigned int rxrpc_conn_idle_client_expiry = 2 * 60 * HZ;
__read_mostly unsigned int rxrpc_conn_idle_client_fast_expiry = 2 * HZ;
-static unsigned int rxrpc_nr_client_conns;
-static unsigned int rxrpc_nr_active_client_conns;
-static __read_mostly bool rxrpc_kill_all_client_conns;
-
-static DEFINE_SPINLOCK(rxrpc_client_conn_cache_lock);
-static DEFINE_SPINLOCK(rxrpc_client_conn_discard_mutex);
-static LIST_HEAD(rxrpc_waiting_client_conns);
-static LIST_HEAD(rxrpc_active_client_conns);
-static LIST_HEAD(rxrpc_idle_client_conns);
-
/*
* We use machine-unique IDs for our client connections.
*/
DEFINE_IDR(rxrpc_client_conn_ids);
static DEFINE_SPINLOCK(rxrpc_conn_id_lock);
-static void rxrpc_cull_active_client_conns(void);
-static void rxrpc_discard_expired_client_conns(struct work_struct *);
-
-static DECLARE_DELAYED_WORK(rxrpc_client_conn_reap,
- rxrpc_discard_expired_client_conns);
+static void rxrpc_cull_active_client_conns(struct rxrpc_net *);
/*
* Get a connection ID and epoch for a client connection from the global pool.
@@ -116,6 +102,7 @@ static DECLARE_DELAYED_WORK(rxrpc_client_conn_reap,
static int rxrpc_get_client_connection_id(struct rxrpc_connection *conn,
gfp_t gfp)
{
+ struct rxrpc_net *rxnet = conn->params.local->rxnet;
int id;
_enter("");
@@ -131,7 +118,7 @@ static int rxrpc_get_client_connection_id(struct rxrpc_connection *conn,
spin_unlock(&rxrpc_conn_id_lock);
idr_preload_end();
- conn->proto.epoch = rxrpc_epoch;
+ conn->proto.epoch = rxnet->epoch;
conn->proto.cid = id << RXRPC_CIDSHIFT;
set_bit(RXRPC_CONN_HAS_IDR, &conn->flags);
_leave(" [CID %x]", conn->proto.cid);
@@ -183,6 +170,7 @@ static struct rxrpc_connection *
rxrpc_alloc_client_connection(struct rxrpc_conn_parameters *cp, gfp_t gfp)
{
struct rxrpc_connection *conn;
+ struct rxrpc_net *rxnet = cp->local->rxnet;
int ret;
_enter("");
@@ -213,9 +201,9 @@ rxrpc_alloc_client_connection(struct rxrpc_conn_parameters *cp, gfp_t gfp)
if (ret < 0)
goto error_2;
- write_lock(&rxrpc_connection_lock);
- list_add_tail(&conn->proc_link, &rxrpc_connection_proc_list);
- write_unlock(&rxrpc_connection_lock);
+ write_lock(&rxnet->conn_lock);
+ list_add_tail(&conn->proc_link, &rxnet->conn_proc_list);
+ write_unlock(&rxnet->conn_lock);
/* We steal the caller's peer ref. */
cp->peer = NULL;
@@ -243,12 +231,13 @@ rxrpc_alloc_client_connection(struct rxrpc_conn_parameters *cp, gfp_t gfp)
*/
static bool rxrpc_may_reuse_conn(struct rxrpc_connection *conn)
{
+ struct rxrpc_net *rxnet = conn->params.local->rxnet;
int id_cursor, id, distance, limit;
if (test_bit(RXRPC_CONN_DONT_REUSE, &conn->flags))
goto dont_reuse;
- if (conn->proto.epoch != rxrpc_epoch)
+ if (conn->proto.epoch != rxnet->epoch)
goto mark_dont_reuse;
/* The IDR tree gets very expensive on memory if the connection IDs are
@@ -440,12 +429,13 @@ static int rxrpc_get_client_conn(struct rxrpc_call *call,
/*
* Activate a connection.
*/
-static void rxrpc_activate_conn(struct rxrpc_connection *conn)
+static void rxrpc_activate_conn(struct rxrpc_net *rxnet,
+ struct rxrpc_connection *conn)
{
trace_rxrpc_client(conn, -1, rxrpc_client_to_active);
conn->cache_state = RXRPC_CONN_CLIENT_ACTIVE;
- rxrpc_nr_active_client_conns++;
- list_move_tail(&conn->cache_link, &rxrpc_active_client_conns);
+ rxnet->nr_active_client_conns++;
+ list_move_tail(&conn->cache_link, &rxnet->active_client_conns);
}
/*
@@ -460,7 +450,8 @@ static void rxrpc_activate_conn(struct rxrpc_connection *conn)
* channels if it has been culled to make space and then re-requested by a new
* call.
*/
-static void rxrpc_animate_client_conn(struct rxrpc_connection *conn)
+static void rxrpc_animate_client_conn(struct rxrpc_net *rxnet,
+ struct rxrpc_connection *conn)
{
unsigned int nr_conns;
@@ -469,12 +460,12 @@ static void rxrpc_animate_client_conn(struct rxrpc_connection *conn)
if (conn->cache_state == RXRPC_CONN_CLIENT_ACTIVE)
goto out;
- spin_lock(&rxrpc_client_conn_cache_lock);
+ spin_lock(&rxnet->client_conn_cache_lock);
- nr_conns = rxrpc_nr_client_conns;
+ nr_conns = rxnet->nr_client_conns;
if (!test_and_set_bit(RXRPC_CONN_COUNTED, &conn->flags)) {
trace_rxrpc_client(conn, -1, rxrpc_client_count);
- rxrpc_nr_client_conns = nr_conns + 1;
+ rxnet->nr_client_conns = nr_conns + 1;
}
switch (conn->cache_state) {
@@ -494,21 +485,21 @@ static void rxrpc_animate_client_conn(struct rxrpc_connection *conn)
}
out_unlock:
- spin_unlock(&rxrpc_client_conn_cache_lock);
+ spin_unlock(&rxnet->client_conn_cache_lock);
out:
_leave(" [%d]", conn->cache_state);
return;
activate_conn:
_debug("activate");
- rxrpc_activate_conn(conn);
+ rxrpc_activate_conn(rxnet, conn);
goto out_unlock;
wait_for_capacity:
_debug("wait");
trace_rxrpc_client(conn, -1, rxrpc_client_to_waiting);
conn->cache_state = RXRPC_CONN_CLIENT_WAITING;
- list_move_tail(&conn->cache_link, &rxrpc_waiting_client_conns);
+ list_move_tail(&conn->cache_link, &rxnet->waiting_client_conns);
goto out_unlock;
}
@@ -660,18 +651,19 @@ int rxrpc_connect_call(struct rxrpc_call *call,
struct sockaddr_rxrpc *srx,
gfp_t gfp)
{
+ struct rxrpc_net *rxnet = cp->local->rxnet;
int ret;
_enter("{%d,%lx},", call->debug_id, call->user_call_ID);
- rxrpc_discard_expired_client_conns(NULL);
- rxrpc_cull_active_client_conns();
+ rxrpc_discard_expired_client_conns(&rxnet->client_conn_reaper.work);
+ rxrpc_cull_active_client_conns(rxnet);
ret = rxrpc_get_client_conn(call, cp, srx, gfp);
if (ret < 0)
return ret;
- rxrpc_animate_client_conn(call->conn);
+ rxrpc_animate_client_conn(rxnet, call->conn);
rxrpc_activate_channels(call->conn);
ret = rxrpc_wait_for_channel(call, gfp);
@@ -729,6 +721,7 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call)
unsigned int channel = call->cid & RXRPC_CHANNELMASK;
struct rxrpc_connection *conn = call->conn;
struct rxrpc_channel *chan = &conn->channels[channel];
+ struct rxrpc_net *rxnet = rxrpc_net(sock_net(&call->socket->sk));
trace_rxrpc_client(conn, channel, rxrpc_client_chan_disconnect);
call->conn = NULL;
@@ -750,7 +743,7 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call)
/* We must deactivate or idle the connection if it's now
* waiting for nothing.
*/
- spin_lock(&rxrpc_client_conn_cache_lock);
+ spin_lock(&rxnet->client_conn_cache_lock);
if (conn->cache_state == RXRPC_CONN_CLIENT_WAITING &&
list_empty(&conn->waiting_calls) &&
!conn->active_chans)
@@ -787,14 +780,14 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call)
* list. It might even get moved back to the active list whilst we're
* waiting for the lock.
*/
- spin_lock(&rxrpc_client_conn_cache_lock);
+ spin_lock(&rxnet->client_conn_cache_lock);
switch (conn->cache_state) {
case RXRPC_CONN_CLIENT_ACTIVE:
if (list_empty(&conn->waiting_calls)) {
rxrpc_deactivate_one_channel(conn, channel);
if (!conn->active_chans) {
- rxrpc_nr_active_client_conns--;
+ rxnet->nr_active_client_conns--;
goto idle_connection;
}
goto out;
@@ -820,7 +813,7 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call)
}
out:
- spin_unlock(&rxrpc_client_conn_cache_lock);
+ spin_unlock(&rxnet->client_conn_cache_lock);
out_2:
spin_unlock(&conn->channel_lock);
rxrpc_put_connection(conn);
@@ -835,11 +828,11 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call)
trace_rxrpc_client(conn, channel, rxrpc_client_to_idle);
conn->idle_timestamp = jiffies;
conn->cache_state = RXRPC_CONN_CLIENT_IDLE;
- list_move_tail(&conn->cache_link, &rxrpc_idle_client_conns);
- if (rxrpc_idle_client_conns.next == &conn->cache_link &&
- !rxrpc_kill_all_client_conns)
+ list_move_tail(&conn->cache_link, &rxnet->idle_client_conns);
+ if (rxnet->idle_client_conns.next == &conn->cache_link &&
+ !rxnet->kill_all_client_conns)
queue_delayed_work(rxrpc_workqueue,
- &rxrpc_client_conn_reap,
+ &rxnet->client_conn_reaper,
rxrpc_conn_idle_client_expiry);
} else {
trace_rxrpc_client(conn, channel, rxrpc_client_to_inactive);
@@ -857,6 +850,7 @@ rxrpc_put_one_client_conn(struct rxrpc_connection *conn)
{
struct rxrpc_connection *next = NULL;
struct rxrpc_local *local = conn->params.local;
+ struct rxrpc_net *rxnet = local->rxnet;
unsigned int nr_conns;
trace_rxrpc_client(conn, -1, rxrpc_client_cleanup);
@@ -875,18 +869,18 @@ rxrpc_put_one_client_conn(struct rxrpc_connection *conn)
if (test_bit(RXRPC_CONN_COUNTED, &conn->flags)) {
trace_rxrpc_client(conn, -1, rxrpc_client_uncount);
- spin_lock(&rxrpc_client_conn_cache_lock);
- nr_conns = --rxrpc_nr_client_conns;
+ spin_lock(&rxnet->client_conn_cache_lock);
+ nr_conns = --rxnet->nr_client_conns;
if (nr_conns < rxrpc_max_client_connections &&
- !list_empty(&rxrpc_waiting_client_conns)) {
- next = list_entry(rxrpc_waiting_client_conns.next,
+ !list_empty(&rxnet->waiting_client_conns)) {
+ next = list_entry(rxnet->waiting_client_conns.next,
struct rxrpc_connection, cache_link);
rxrpc_get_connection(next);
- rxrpc_activate_conn(next);
+ rxrpc_activate_conn(rxnet, next);
}
- spin_unlock(&rxrpc_client_conn_cache_lock);
+ spin_unlock(&rxnet->client_conn_cache_lock);
}
rxrpc_kill_connection(conn);
@@ -921,10 +915,10 @@ void rxrpc_put_client_conn(struct rxrpc_connection *conn)
/*
* Kill the longest-active client connections to make room for new ones.
*/
-static void rxrpc_cull_active_client_conns(void)
+static void rxrpc_cull_active_client_conns(struct rxrpc_net *rxnet)
{
struct rxrpc_connection *conn;
- unsigned int nr_conns = rxrpc_nr_client_conns;
+ unsigned int nr_conns = rxnet->nr_client_conns;
unsigned int nr_active, limit;
_enter("");
@@ -936,12 +930,12 @@ static void rxrpc_cull_active_client_conns(void)
}
limit = rxrpc_reap_client_connections;
- spin_lock(&rxrpc_client_conn_cache_lock);
- nr_active = rxrpc_nr_active_client_conns;
+ spin_lock(&rxnet->client_conn_cache_lock);
+ nr_active = rxnet->nr_active_client_conns;
while (nr_active > limit) {
- ASSERT(!list_empty(&rxrpc_active_client_conns));
- conn = list_entry(rxrpc_active_client_conns.next,
+ ASSERT(!list_empty(&rxnet->active_client_conns));
+ conn = list_entry(rxnet->active_client_conns.next,
struct rxrpc_connection, cache_link);
ASSERTCMP(conn->cache_state, ==, RXRPC_CONN_CLIENT_ACTIVE);
@@ -953,14 +947,14 @@ static void rxrpc_cull_active_client_conns(void)
trace_rxrpc_client(conn, -1, rxrpc_client_to_waiting);
conn->cache_state = RXRPC_CONN_CLIENT_WAITING;
list_move_tail(&conn->cache_link,
- &rxrpc_waiting_client_conns);
+ &rxnet->waiting_client_conns);
}
nr_active--;
}
- rxrpc_nr_active_client_conns = nr_active;
- spin_unlock(&rxrpc_client_conn_cache_lock);
+ rxnet->nr_active_client_conns = nr_active;
+ spin_unlock(&rxnet->client_conn_cache_lock);
ASSERTCMP(nr_active, >=, 0);
_leave(" [culled]");
}
@@ -972,22 +966,25 @@ static void rxrpc_cull_active_client_conns(void)
* This may be called from conn setup or from a work item so cannot be
* considered non-reentrant.
*/
-static void rxrpc_discard_expired_client_conns(struct work_struct *work)
+void rxrpc_discard_expired_client_conns(struct work_struct *work)
{
struct rxrpc_connection *conn;
+ struct rxrpc_net *rxnet =
+ container_of(to_delayed_work(work),
+ struct rxrpc_net, client_conn_reaper);
unsigned long expiry, conn_expires_at, now;
unsigned int nr_conns;
bool did_discard = false;
- _enter("%c", work ? 'w' : 'n');
+ _enter("");
- if (list_empty(&rxrpc_idle_client_conns)) {
+ if (list_empty(&rxnet->idle_client_conns)) {
_leave(" [empty]");
return;
}
/* Don't double up on the discarding */
- if (!spin_trylock(&rxrpc_client_conn_discard_mutex)) {
+ if (!spin_trylock(&rxnet->client_conn_discard_lock)) {
_leave(" [already]");
return;
}
@@ -995,19 +992,19 @@ static void rxrpc_discard_expired_client_conns(struct work_struct *work)
/* We keep an estimate of what the number of conns ought to be after
* we've discarded some so that we don't overdo the discarding.
*/
- nr_conns = rxrpc_nr_client_conns;
+ nr_conns = rxnet->nr_client_conns;
next:
- spin_lock(&rxrpc_client_conn_cache_lock);
+ spin_lock(&rxnet->client_conn_cache_lock);
- if (list_empty(&rxrpc_idle_client_conns))
+ if (list_empty(&rxnet->idle_client_conns))
goto out;
- conn = list_entry(rxrpc_idle_client_conns.next,
+ conn = list_entry(rxnet->idle_client_conns.next,
struct rxrpc_connection, cache_link);
ASSERT(test_bit(RXRPC_CONN_EXPOSED, &conn->flags));
- if (!rxrpc_kill_all_client_conns) {
+ if (!rxnet->kill_all_client_conns) {
/* If the number of connections is over the reap limit, we
* expedite discard by reducing the expiry timeout. We must,
* however, have at least a short grace period to be able to do
@@ -1030,7 +1027,7 @@ static void rxrpc_discard_expired_client_conns(struct work_struct *work)
conn->cache_state = RXRPC_CONN_CLIENT_INACTIVE;
list_del_init(&conn->cache_link);
- spin_unlock(&rxrpc_client_conn_cache_lock);
+ spin_unlock(&rxnet->client_conn_cache_lock);
/* When we cleared the EXPOSED flag, we took on responsibility for the
* reference that that had on the usage count. We deal with that here.
@@ -1050,14 +1047,14 @@ static void rxrpc_discard_expired_client_conns(struct work_struct *work)
* then things get messier.
*/
_debug("not yet");
- if (!rxrpc_kill_all_client_conns)
+ if (!rxnet->kill_all_client_conns)
queue_delayed_work(rxrpc_workqueue,
- &rxrpc_client_conn_reap,
+ &rxnet->client_conn_reaper,
conn_expires_at - now);
out:
- spin_unlock(&rxrpc_client_conn_cache_lock);
- spin_unlock(&rxrpc_client_conn_discard_mutex);
+ spin_unlock(&rxnet->client_conn_cache_lock);
+ spin_unlock(&rxnet->client_conn_discard_lock);
_leave("");
}
@@ -1065,17 +1062,17 @@ static void rxrpc_discard_expired_client_conns(struct work_struct *work)
* Preemptively destroy all the client connection records rather than waiting
* for them to time out
*/
-void __exit rxrpc_destroy_all_client_connections(void)
+void rxrpc_destroy_all_client_connections(struct rxrpc_net *rxnet)
{
_enter("");
- spin_lock(&rxrpc_client_conn_cache_lock);
- rxrpc_kill_all_client_conns = true;
- spin_unlock(&rxrpc_client_conn_cache_lock);
+ spin_lock(&rxnet->client_conn_cache_lock);
+ rxnet->kill_all_client_conns = true;
+ spin_unlock(&rxnet->client_conn_cache_lock);
- cancel_delayed_work(&rxrpc_client_conn_reap);
+ cancel_delayed_work(&rxnet->client_conn_reaper);
- if (!queue_delayed_work(rxrpc_workqueue, &rxrpc_client_conn_reap, 0))
+ if (!queue_delayed_work(rxrpc_workqueue, &rxnet->client_conn_reaper, 0))
_debug("destroy: queue failed");
_leave("");
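Reviewer note on the conn_client.c changes above: rxrpc_discard_expired_client_conns() now derives the namespace record from the work item it is handed, so it can no longer be called with a NULL pointer. That is why rxrpc_connect_call() passes &rxnet->client_conn_reaper.work. As a rough user-space illustration only, the recovery step could be sketched as follows. The structures are cut-down stand-ins with hypothetical `_demo` names, not the kernel definitions:

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins for the kernel types; layouts are illustrative only. */
struct work_struct_demo {
	int pending;
};

struct delayed_work_demo {
	struct work_struct_demo work;
	unsigned long expires;
};

/* Cut-down, hypothetical version of struct rxrpc_net. */
struct rxrpc_net_demo {
	unsigned int nr_client_conns;
	struct delayed_work_demo client_conn_reaper;
};

/* Same idea as the kernel's container_of(): step back from member to parent. */
#define container_of_demo(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Mirrors to_delayed_work(): work_struct -> enclosing delayed_work. */
static struct delayed_work_demo *
to_delayed_work_demo(struct work_struct_demo *work)
{
	return container_of_demo(work, struct delayed_work_demo, work);
}

/* Recover the per-namespace record from the work item, as the reaper does. */
static struct rxrpc_net_demo *rxnet_from_work(struct work_struct_demo *work)
{
	return container_of_demo(to_delayed_work_demo(work),
				 struct rxrpc_net_demo, client_conn_reaper);
}

/* Two namespaces: each work item must map back to its own record. */
int rxnet_demo_selfcheck(void)
{
	struct rxrpc_net_demo a = { .nr_client_conns = 3 };
	struct rxrpc_net_demo b = { .nr_client_conns = 7 };

	assert(rxnet_from_work(&a.client_conn_reaper.work) == &a);
	assert(rxnet_from_work(&b.client_conn_reaper.work) == &b);
	return 0;
}
```

Because the result is computed purely by pointer arithmetic, a NULL work pointer would yield a bogus record address rather than a clean failure, so direct callers have to pass a real work item.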
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index b0ecb770fdce..ade4d3d0b2a7 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -22,13 +22,6 @@
*/
unsigned int rxrpc_connection_expiry = 10 * 60;
-static void rxrpc_connection_reaper(struct work_struct *work);
-
-LIST_HEAD(rxrpc_connections);
-LIST_HEAD(rxrpc_connection_proc_list);
-DEFINE_RWLOCK(rxrpc_connection_lock);
-static DECLARE_DELAYED_WORK(rxrpc_connection_reap, rxrpc_connection_reaper);
-
static void rxrpc_destroy_connection(struct rcu_head *);
/*
@@ -222,15 +215,17 @@ void rxrpc_disconnect_call(struct rxrpc_call *call)
*/
void rxrpc_kill_connection(struct rxrpc_connection *conn)
{
+ struct rxrpc_net *rxnet = conn->params.local->rxnet;
+
ASSERT(!rcu_access_pointer(conn->channels[0].call) &&
!rcu_access_pointer(conn->channels[1].call) &&
!rcu_access_pointer(conn->channels[2].call) &&
!rcu_access_pointer(conn->channels[3].call));
ASSERT(list_empty(&conn->cache_link));
- write_lock(&rxrpc_connection_lock);
+ write_lock(&rxnet->conn_lock);
list_del_init(&conn->proc_link);
- write_unlock(&rxrpc_connection_lock);
+ write_unlock(&rxnet->conn_lock);
/* Drain the Rx queue. Note that even though we've unpublished, an
* incoming packet could still be being added to our Rx queue, so we
@@ -309,14 +304,17 @@ rxrpc_get_connection_maybe(struct rxrpc_connection *conn)
*/
void rxrpc_put_service_conn(struct rxrpc_connection *conn)
{
+ struct rxrpc_net *rxnet;
const void *here = __builtin_return_address(0);
int n;
n = atomic_dec_return(&conn->usage);
trace_rxrpc_conn(conn, rxrpc_conn_put_service, n, here);
ASSERTCMP(n, >=, 0);
- if (n == 0)
- rxrpc_queue_delayed_work(&rxrpc_connection_reap, 0);
+ if (n == 0) {
+ rxnet = conn->params.local->rxnet;
+ rxrpc_queue_delayed_work(&rxnet->service_conn_reaper, 0);
+ }
}
/*
@@ -348,9 +346,12 @@ static void rxrpc_destroy_connection(struct rcu_head *rcu)
/*
* reap dead service connections
*/
-static void rxrpc_connection_reaper(struct work_struct *work)
+void rxrpc_service_connection_reaper(struct work_struct *work)
{
struct rxrpc_connection *conn, *_p;
+ struct rxrpc_net *rxnet =
+ container_of(to_delayed_work(work),
+ struct rxrpc_net, service_conn_reaper);
unsigned long reap_older_than, earliest, idle_timestamp, now;
LIST_HEAD(graveyard);
@@ -361,8 +362,8 @@ static void rxrpc_connection_reaper(struct work_struct *work)
reap_older_than = now - rxrpc_connection_expiry * HZ;
earliest = ULONG_MAX;
- write_lock(&rxrpc_connection_lock);
- list_for_each_entry_safe(conn, _p, &rxrpc_connections, link) {
+ write_lock(&rxnet->conn_lock);
+ list_for_each_entry_safe(conn, _p, &rxnet->service_conns, link) {
ASSERTCMP(atomic_read(&conn->usage), >, 0);
if (likely(atomic_read(&conn->usage) > 1))
continue;
@@ -393,12 +394,12 @@ static void rxrpc_connection_reaper(struct work_struct *work)
list_move_tail(&conn->link, &graveyard);
}
- write_unlock(&rxrpc_connection_lock);
+ write_unlock(&rxnet->conn_lock);
if (earliest != ULONG_MAX) {
_debug("reschedule reaper %ld", (long) earliest - now);
ASSERT(time_after(earliest, now));
- rxrpc_queue_delayed_work(&rxrpc_connection_reap,
+ rxrpc_queue_delayed_work(&rxnet->service_conn_reaper,
earliest - now);
}
@@ -418,36 +419,30 @@ static void rxrpc_connection_reaper(struct work_struct *work)
* preemptively destroy all the service connection records rather than
* waiting for them to time out
*/
-void __exit rxrpc_destroy_all_connections(void)
+void rxrpc_destroy_all_connections(struct rxrpc_net *rxnet)
{
struct rxrpc_connection *conn, *_p;
bool leak = false;
_enter("");
- rxrpc_destroy_all_client_connections();
+ rxrpc_destroy_all_client_connections(rxnet);
rxrpc_connection_expiry = 0;
- cancel_delayed_work(&rxrpc_connection_reap);
- rxrpc_queue_delayed_work(&rxrpc_connection_reap, 0);
+ cancel_delayed_work(&rxnet->service_conn_reaper);
+ rxrpc_queue_delayed_work(&rxnet->service_conn_reaper, 0);
flush_workqueue(rxrpc_workqueue);
- write_lock(&rxrpc_connection_lock);
- list_for_each_entry_safe(conn, _p, &rxrpc_connections, link) {
+ write_lock(&rxnet->conn_lock);
+ list_for_each_entry_safe(conn, _p, &rxnet->service_conns, link) {
pr_err("AF_RXRPC: Leaked conn %p {%d}\n",
conn, atomic_read(&conn->usage));
leak = true;
}
- write_unlock(&rxrpc_connection_lock);
+ write_unlock(&rxnet->conn_lock);
BUG_ON(leak);
- ASSERT(list_empty(&rxrpc_connection_proc_list));
-
- /* Make sure the local and peer records pinned by any dying connections
- * are released.
- */
- rcu_barrier();
- rxrpc_destroy_client_conn_ids();
+ ASSERT(list_empty(&rxnet->conn_proc_list));
_leave("");
}
diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c
index eef551f40dc2..edfc633f7d5e 100644
--- a/net/rxrpc/conn_service.c
+++ b/net/rxrpc/conn_service.c
@@ -121,7 +121,8 @@ static void rxrpc_publish_service_conn(struct rxrpc_peer *peer,
* Preallocate a service connection. The connection is placed on the proc and
* reap lists so that we don't have to get the lock from BH context.
*/
-struct rxrpc_connection *rxrpc_prealloc_service_connection(gfp_t gfp)
+struct rxrpc_connection *rxrpc_prealloc_service_connection(struct rxrpc_net *rxnet,
+ gfp_t gfp)
{
struct rxrpc_connection *conn = rxrpc_alloc_connection(gfp);
@@ -132,10 +133,10 @@ struct rxrpc_connection *rxrpc_prealloc_service_connection(gfp_t gfp)
conn->state = RXRPC_CONN_SERVICE_PREALLOC;
atomic_set(&conn->usage, 2);
- write_lock(&rxrpc_connection_lock);
- list_add_tail(&conn->link, &rxrpc_connections);
- list_add_tail(&conn->proc_link, &rxrpc_connection_proc_list);
- write_unlock(&rxrpc_connection_lock);
+ write_lock(&rxnet->conn_lock);
+ list_add_tail(&conn->link, &rxnet->service_conns);
+ list_add_tail(&conn->proc_link, &rxnet->conn_proc_list);
+ write_unlock(&rxnet->conn_lock);
trace_rxrpc_conn(conn, rxrpc_conn_new_service,
atomic_read(&conn->usage),
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index ff4864d550b8..17d79fd73ade 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -25,9 +25,6 @@
static void rxrpc_local_processor(struct work_struct *);
static void rxrpc_local_rcu(struct rcu_head *);
-static DEFINE_MUTEX(rxrpc_local_mutex);
-static LIST_HEAD(rxrpc_local_endpoints);
-
/*
* Compare a local to an address. Return -ve, 0 or +ve to indicate less than,
* same or greater than.
@@ -77,13 +74,15 @@ static long rxrpc_local_cmp_key(const struct rxrpc_local *local,
/*
* Allocate a new local endpoint.
*/
-static struct rxrpc_local *rxrpc_alloc_local(const struct sockaddr_rxrpc *srx)
+static struct rxrpc_local *rxrpc_alloc_local(struct rxrpc_net *rxnet,
+ const struct sockaddr_rxrpc *srx)
{
struct rxrpc_local *local;
local = kzalloc(sizeof(struct rxrpc_local), GFP_KERNEL);
if (local) {
atomic_set(&local->usage, 1);
+ local->rxnet = rxnet;
INIT_LIST_HEAD(&local->link);
INIT_WORK(&local->processor, rxrpc_local_processor);
init_rwsem(&local->defrag_sem);
@@ -105,7 +104,7 @@ static struct rxrpc_local *rxrpc_alloc_local(const struct sockaddr_rxrpc *srx)
* create the local socket
* - must be called with rxrpc_local_mutex locked
*/
-static int rxrpc_open_socket(struct rxrpc_local *local)
+static int rxrpc_open_socket(struct rxrpc_local *local, struct net *net)
{
struct sock *sock;
int ret, opt;
@@ -114,7 +113,7 @@ static int rxrpc_open_socket(struct rxrpc_local *local)
local, local->srx.transport_type, local->srx.transport.family);
/* create a socket to represent the local endpoint */
- ret = sock_create_kern(&init_net, local->srx.transport.family,
+ ret = sock_create_kern(net, local->srx.transport.family,
local->srx.transport_type, 0, &local->socket);
if (ret < 0) {
_leave(" = %d [socket]", ret);
@@ -172,9 +171,11 @@ static int rxrpc_open_socket(struct rxrpc_local *local)
/*
* Look up or create a new local endpoint using the specified local address.
*/
-struct rxrpc_local *rxrpc_lookup_local(const struct sockaddr_rxrpc *srx)
+struct rxrpc_local *rxrpc_lookup_local(struct net *net,
+ const struct sockaddr_rxrpc *srx)
{
struct rxrpc_local *local;
+ struct rxrpc_net *rxnet = rxrpc_net(net);
struct list_head *cursor;
const char *age;
long diff;
@@ -183,10 +184,10 @@ struct rxrpc_local *rxrpc_lookup_local(const struct sockaddr_rxrpc *srx)
_enter("{%d,%d,%pISp}",
srx->transport_type, srx->transport.family, &srx->transport);
- mutex_lock(&rxrpc_local_mutex);
+ mutex_lock(&rxnet->local_mutex);
- for (cursor = rxrpc_local_endpoints.next;
- cursor != &rxrpc_local_endpoints;
+ for (cursor = rxnet->local_endpoints.next;
+ cursor != &rxnet->local_endpoints;
cursor = cursor->next) {
local = list_entry(cursor, struct rxrpc_local, link);
@@ -220,11 +221,11 @@ struct rxrpc_local *rxrpc_lookup_local(const struct sockaddr_rxrpc *srx)
goto found;
}
- local = rxrpc_alloc_local(srx);
+ local = rxrpc_alloc_local(rxnet, srx);
if (!local)
goto nomem;
- ret = rxrpc_open_socket(local);
+ ret = rxrpc_open_socket(local, net);
if (ret < 0)
goto sock_error;
@@ -232,7 +233,7 @@ struct rxrpc_local *rxrpc_lookup_local(const struct sockaddr_rxrpc *srx)
age = "new";
found:
- mutex_unlock(&rxrpc_local_mutex);
+ mutex_unlock(&rxnet->local_mutex);
_net("LOCAL %s %d {%pISp}",
age, local->debug_id, &local->srx.transport);
@@ -243,13 +244,13 @@ struct rxrpc_local *rxrpc_lookup_local(const struct sockaddr_rxrpc *srx)
nomem:
ret = -ENOMEM;
sock_error:
- mutex_unlock(&rxrpc_local_mutex);
+ mutex_unlock(&rxnet->local_mutex);
kfree(local);
_leave(" = %d", ret);
return ERR_PTR(ret);
addr_in_use:
- mutex_unlock(&rxrpc_local_mutex);
+ mutex_unlock(&rxnet->local_mutex);
_leave(" = -EADDRINUSE");
return ERR_PTR(-EADDRINUSE);
}
@@ -273,6 +274,7 @@ void __rxrpc_put_local(struct rxrpc_local *local)
static void rxrpc_local_destroyer(struct rxrpc_local *local)
{
struct socket *socket = local->socket;
+ struct rxrpc_net *rxnet = local->rxnet;
_enter("%d", local->debug_id);
@@ -286,9 +288,9 @@ static void rxrpc_local_destroyer(struct rxrpc_local *local)
}
local->dead = true;
- mutex_lock(&rxrpc_local_mutex);
+ mutex_lock(&rxnet->local_mutex);
list_del_init(&local->link);
- mutex_unlock(&rxrpc_local_mutex);
+ mutex_unlock(&rxnet->local_mutex);
ASSERT(RB_EMPTY_ROOT(&local->client_conns));
ASSERT(!local->service);
@@ -357,7 +359,7 @@ static void rxrpc_local_rcu(struct rcu_head *rcu)
/*
* Verify the local endpoint list is empty by this point.
*/
-void __exit rxrpc_destroy_all_locals(void)
+void rxrpc_destroy_all_locals(struct rxrpc_net *rxnet)
{
struct rxrpc_local *local;
@@ -365,15 +367,13 @@ void __exit rxrpc_destroy_all_locals(void)
flush_workqueue(rxrpc_workqueue);
- if (!list_empty(&rxrpc_local_endpoints)) {
- mutex_lock(&rxrpc_local_mutex);
- list_for_each_entry(local, &rxrpc_local_endpoints, link) {
+ if (!list_empty(&rxnet->local_endpoints)) {
+ mutex_lock(&rxnet->local_mutex);
+ list_for_each_entry(local, &rxnet->local_endpoints, link) {
pr_err("AF_RXRPC: Leaked local %p {%d}\n",
local, atomic_read(&local->usage));
}
- mutex_unlock(&rxrpc_local_mutex);
+ mutex_unlock(&rxnet->local_mutex);
BUG();
}
-
- rcu_barrier();
}
diff --git a/net/rxrpc/net_ns.c b/net/rxrpc/net_ns.c
new file mode 100644
index 000000000000..26449a6bb076
--- /dev/null
+++ b/net/rxrpc/net_ns.c
@@ -0,0 +1,85 @@
+/* rxrpc network namespace handling.
+ *
+ * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/proc_fs.h>
+#include "ar-internal.h"
+
+unsigned int rxrpc_net_id;
+
+/*
+ * Initialise a per-network namespace record.
+ */
+static __net_init int rxrpc_init_net(struct net *net)
+{
+ struct rxrpc_net *rxnet = rxrpc_net(net);
+ int ret;
+
+ get_random_bytes(&rxnet->epoch, sizeof(rxnet->epoch));
+ rxnet->epoch |= RXRPC_RANDOM_EPOCH;
+
+ INIT_LIST_HEAD(&rxnet->calls);
+ rwlock_init(&rxnet->call_lock);
+
+ INIT_LIST_HEAD(&rxnet->conn_proc_list);
+ INIT_LIST_HEAD(&rxnet->service_conns);
+ rwlock_init(&rxnet->conn_lock);
+ INIT_DELAYED_WORK(&rxnet->service_conn_reaper,
+ rxrpc_service_connection_reaper);
+
+ rxnet->nr_client_conns = 0;
+ rxnet->nr_active_client_conns = 0;
+ rxnet->kill_all_client_conns = false;
+ spin_lock_init(&rxnet->client_conn_cache_lock);
+ spin_lock_init(&rxnet->client_conn_discard_lock);
+ INIT_LIST_HEAD(&rxnet->waiting_client_conns);
+ INIT_LIST_HEAD(&rxnet->active_client_conns);
+ INIT_LIST_HEAD(&rxnet->idle_client_conns);
+ INIT_DELAYED_WORK(&rxnet->client_conn_reaper,
+ rxrpc_discard_expired_client_conns);
+
+ INIT_LIST_HEAD(&rxnet->local_endpoints);
+ mutex_init(&rxnet->local_mutex);
+ hash_init(rxnet->peer_hash);
+ spin_lock_init(&rxnet->peer_hash_lock);
+
+ ret = -ENOMEM;
+ rxnet->proc_net = proc_net_mkdir(net, "rxrpc", net->proc_net);
+ if (!rxnet->proc_net)
+ goto err_proc;
+
+ proc_create("calls", 0444, rxnet->proc_net, &rxrpc_call_seq_fops);
+ proc_create("conns", 0444, rxnet->proc_net, &rxrpc_connection_seq_fops);
+ return 0;
+
+ proc_remove(rxnet->proc_net);
+err_proc:
+ return ret;
+}
+
+/*
+ * Clean up a per-network namespace record.
+ */
+static __net_exit void rxrpc_exit_net(struct net *net)
+{
+ struct rxrpc_net *rxnet = rxrpc_net(net);
+
+ rxrpc_destroy_all_calls(rxnet);
+ rxrpc_destroy_all_connections(rxnet);
+ rxrpc_destroy_all_locals(rxnet);
+ proc_remove(rxnet->proc_net);
+}
+
+struct pernet_operations rxrpc_net_ops = {
+ .init = rxrpc_init_net,
+ .exit = rxrpc_exit_net,
+ .id = &rxrpc_net_id,
+ .size = sizeof(struct rxrpc_net),
+};
diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c
index 862eea6b266c..cfed3b27adf0 100644
--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -26,9 +26,6 @@
#include <net/ip6_route.h>
#include "ar-internal.h"
-static DEFINE_HASHTABLE(rxrpc_peer_hash, 10);
-static DEFINE_SPINLOCK(rxrpc_peer_hash_lock);
-
/*
* Hash a peer key.
*/
@@ -124,8 +121,9 @@ static struct rxrpc_peer *__rxrpc_lookup_peer_rcu(
unsigned long hash_key)
{
struct rxrpc_peer *peer;
+ struct rxrpc_net *rxnet = local->rxnet;
- hash_for_each_possible_rcu(rxrpc_peer_hash, peer, hash_link, hash_key) {
+ hash_for_each_possible_rcu(rxnet->peer_hash, peer, hash_link, hash_key) {
if (rxrpc_peer_cmp_key(peer, local, srx, hash_key) == 0) {
if (atomic_read(&peer->usage) == 0)
return NULL;
@@ -301,13 +299,14 @@ struct rxrpc_peer *rxrpc_lookup_incoming_peer(struct rxrpc_local *local,
struct rxrpc_peer *prealloc)
{
struct rxrpc_peer *peer;
+ struct rxrpc_net *rxnet = local->rxnet;
unsigned long hash_key;
hash_key = rxrpc_peer_hash_key(local, &prealloc->srx);
prealloc->local = local;
rxrpc_init_peer(prealloc, hash_key);
- spin_lock(&rxrpc_peer_hash_lock);
+ spin_lock(&rxnet->peer_hash_lock);
/* Need to check that we aren't racing with someone else */
peer = __rxrpc_lookup_peer_rcu(local, &prealloc->srx, hash_key);
@@ -315,10 +314,10 @@ struct rxrpc_peer *rxrpc_lookup_incoming_peer(struct rxrpc_local *local,
peer = NULL;
if (!peer) {
peer = prealloc;
- hash_add_rcu(rxrpc_peer_hash, &peer->hash_link, hash_key);
+ hash_add_rcu(rxnet->peer_hash, &peer->hash_link, hash_key);
}
- spin_unlock(&rxrpc_peer_hash_lock);
+ spin_unlock(&rxnet->peer_hash_lock);
return peer;
}
@@ -329,6 +328,7 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *local,
struct sockaddr_rxrpc *srx, gfp_t gfp)
{
struct rxrpc_peer *peer, *candidate;
+ struct rxrpc_net *rxnet = local->rxnet;
unsigned long hash_key = rxrpc_peer_hash_key(local, srx);
_enter("{%pISp}", &srx->transport);
@@ -350,17 +350,17 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *local,
return NULL;
}
- spin_lock_bh(&rxrpc_peer_hash_lock);
+ spin_lock_bh(&rxnet->peer_hash_lock);
/* Need to check that we aren't racing with someone else */
peer = __rxrpc_lookup_peer_rcu(local, srx, hash_key);
if (peer && !rxrpc_get_peer_maybe(peer))
peer = NULL;
if (!peer)
- hash_add_rcu(rxrpc_peer_hash,
+ hash_add_rcu(rxnet->peer_hash,
&candidate->hash_link, hash_key);
- spin_unlock_bh(&rxrpc_peer_hash_lock);
+ spin_unlock_bh(&rxnet->peer_hash_lock);
if (peer)
kfree(candidate);
@@ -379,11 +379,13 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *local,
*/
void __rxrpc_put_peer(struct rxrpc_peer *peer)
{
+ struct rxrpc_net *rxnet = peer->local->rxnet;
+
ASSERT(hlist_empty(&peer->error_targets));
- spin_lock_bh(&rxrpc_peer_hash_lock);
+ spin_lock_bh(&rxnet->peer_hash_lock);
hash_del_rcu(&peer->hash_link);
- spin_unlock_bh(&rxrpc_peer_hash_lock);
+ spin_unlock_bh(&rxnet->peer_hash_lock);
kfree_rcu(peer, rcu);
}
diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c
index b9bcfbfb095c..e92d8405b15a 100644
--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -30,19 +30,25 @@ static const char *const rxrpc_conn_states[RXRPC_CONN__NR_STATES] = {
*/
static void *rxrpc_call_seq_start(struct seq_file *seq, loff_t *_pos)
{
+ struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
+
rcu_read_lock();
- read_lock(&rxrpc_call_lock);
- return seq_list_start_head(&rxrpc_calls, *_pos);
+ read_lock(&rxnet->call_lock);
+ return seq_list_start_head(&rxnet->calls, *_pos);
}
static void *rxrpc_call_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
- return seq_list_next(v, &rxrpc_calls, pos);
+ struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
+
+ return seq_list_next(v, &rxnet->calls, pos);
}
static void rxrpc_call_seq_stop(struct seq_file *seq, void *v)
{
- read_unlock(&rxrpc_call_lock);
+ struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
+
+ read_unlock(&rxnet->call_lock);
rcu_read_unlock();
}
@@ -52,10 +58,11 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
struct rxrpc_sock *rx;
struct rxrpc_peer *peer;
struct rxrpc_call *call;
+ struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
rxrpc_seq_t tx_hard_ack, rx_hard_ack;
char lbuff[50], rbuff[50];
- if (v == &rxrpc_calls) {
+ if (v == &rxnet->calls) {
seq_puts(seq,
"Proto Local "
" Remote "
@@ -113,7 +120,8 @@ static const struct seq_operations rxrpc_call_seq_ops = {
static int rxrpc_call_seq_open(struct inode *inode, struct file *file)
{
- return seq_open(file, &rxrpc_call_seq_ops);
+ return seq_open_net(inode, file, &rxrpc_call_seq_ops,
+ sizeof(struct seq_net_private));
}
const struct file_operations rxrpc_call_seq_fops = {
@@ -129,27 +137,34 @@ const struct file_operations rxrpc_call_seq_fops = {
*/
static void *rxrpc_connection_seq_start(struct seq_file *seq, loff_t *_pos)
{
- read_lock(&rxrpc_connection_lock);
- return seq_list_start_head(&rxrpc_connection_proc_list, *_pos);
+ struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
+
+ read_lock(&rxnet->conn_lock);
+ return seq_list_start_head(&rxnet->conn_proc_list, *_pos);
}
static void *rxrpc_connection_seq_next(struct seq_file *seq, void *v,
loff_t *pos)
{
- return seq_list_next(v, &rxrpc_connection_proc_list, pos);
+ struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
+
+ return seq_list_next(v, &rxnet->conn_proc_list, pos);
}
static void rxrpc_connection_seq_stop(struct seq_file *seq, void *v)
{
- read_unlock(&rxrpc_connection_lock);
+ struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
+
+ read_unlock(&rxnet->conn_lock);
}
static int rxrpc_connection_seq_show(struct seq_file *seq, void *v)
{
struct rxrpc_connection *conn;
+ struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
char lbuff[50], rbuff[50];
- if (v == &rxrpc_connection_proc_list) {
+ if (v == &rxnet->conn_proc_list) {
seq_puts(seq,
"Proto Local "
" Remote "
@@ -197,7 +212,8 @@ static const struct seq_operations rxrpc_connection_seq_ops = {
static int rxrpc_connection_seq_open(struct inode *inode, struct file *file)
{
- return seq_open(file, &rxrpc_connection_seq_ops);
+ return seq_open_net(inode, file, &rxrpc_connection_seq_ops,
+ sizeof(struct seq_net_private));
}
const struct file_operations rxrpc_connection_seq_fops = {
* [PATCH net-next] net: rps: don't skip offline cpus when set rps_cpus
From: Liping Zhang @ 2017-05-15 12:55 UTC (permalink / raw)
To: davem; +Cc: netdev, therbert, Liping Zhang
From: Liping Zhang <zlpnobody@gmail.com>
On our 4-core system, I can sometimes enable all CPUs to process packets.
But if every CPU except core 0 happens to be offline at that moment, I
get the following result, which is really annoying for my script:
# echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
# cat /sys/class/net/eth0/queues/rx-0/rps_cpus
1
Since we won't steer packets to offline cpus anyway, it's reasonable
to add all configured cpus to the rps_map, even if they are offline for
the time being.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
---
net/core/net-sysfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 65ea0ff..8adb36d 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -731,7 +731,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
}
i = 0;
- for_each_cpu_and(cpu, mask, cpu_online_mask)
+ for_each_cpu_and(cpu, mask, cpu_possible_mask)
map->cpus[i++] = cpu;
if (i)
--
2.9.3
* Re: stable/linux-4.10.y build: 203 builds: 3 failed, 200 passed, 3 errors, 5 warnings (v4.10.16)
From: Arnd Bergmann @ 2017-05-15 12:57 UTC (permalink / raw)
To: kernelci.org bot
Cc: Kernel Build Reports Mailman List, Eric Dumazet, gregkh, stable,
Networking
In-Reply-To: <5918603a.118d1c0a.d6026.6b1a@mx.google.com>
On Sun, May 14, 2017 at 3:48 PM, kernelci.org bot <bot@kernelci.org> wrote:
>
> stable/linux-4.10.y build: 203 builds: 3 failed, 200 passed, 3 errors, 5 warnings (v4.10.16)
> Full Build Summary: https://kernelci.org/build/stable/branch/linux-4.10.y/kernel/v4.10.16/
> Tree: stable
> Branch: linux-4.10.y
> Git Describe: v4.10.16
> Git Commit: 6e8e9958691907e8d7eb3b2107619dddbdaeb175
> Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> Built: 4 unique architectures
>
> Build Failures Detected:
>
> arm: gcc version 5.3.1 20160412 (Linaro GCC 5.3-2016.05)
> spear3xx_defconfig FAIL
> spear6xx_defconfig FAIL
> tct_hammer_defconfig FAIL
> Errors summary:
> 3 net/core/skbuff.c:1575:37: error: 'sock_edemux' undeclared (first use in this function)
> Warnings summary:
This is a regression against v4.10.15, caused by the backport of
c21b48cc1bbf ("net: adjust skb->truesize in ___pskb_trim()") by Eric
Dumazet.
Part of another commit that Eric did earlier fixes it:
158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()")
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1534,7 +1534,7 @@ void sock_efree(struct sk_buff *skb);
#ifdef CONFIG_INET
void sock_edemux(struct sk_buff *skb);
#else
-#define sock_edemux(skb) sock_efree(skb)
+#define sock_edemux sock_efree
#endif
int sock_setsockopt(struct socket *sock, int level, int op,
This commit is not marked 'Cc: stable' upstream, but is referenced
in the one that was backported and looks like it might be appropriate
for stable as well. Eric, can you clarify?
Arnd
* Re: [PATCH 2/3] bpf: Track alignment of MAP pointers in verifier.
From: Daniel Borkmann @ 2017-05-15 13:10 UTC (permalink / raw)
To: David Miller; +Cc: ast, alexei.starovoitov, netdev
In-Reply-To: <20170514.210005.2210026226038557615.davem@davemloft.net>
On 05/15/2017 03:00 AM, David Miller wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Sun, 14 May 2017 16:31:10 +0200
>
>> On 05/13/2017 04:28 AM, David Miller wrote:
>>> @@ -823,10 +825,27 @@ static int check_pkt_ptr_alignment(const struct
>>> bpf_reg_state *reg,
>>> }
>>>
>>> static int check_val_ptr_alignment(const struct bpf_reg_state *reg,
>>> - int size, bool strict)
>>> + int off, int size, bool strict)
>>> {
>>> - if (strict && size != 1) {
>>> - verbose("Unknown alignment. Only byte-sized access allowed in value
>>> - access.\n");
>>> + int reg_off;
>>> +
>>> + /* Byte size accesses are always allowed. */
>>> + if (!strict || size == 1)
>>> + return 0;
>>> +
>>> + reg_off = reg->off;
>>> + if (reg->id) {
>>> + if (reg->aux_off_align % size) {
>>> + verbose("Value access is only %u byte aligned, %d byte access not
>>> allowed\n",
>>> + reg->aux_off_align, size);
>>> + return -EACCES;
>>> + }
>>> + reg_off += reg->aux_off;
>>> + }
>>
>> What are the semantics of using id here? In ptr_to_pkt, we have it,
>> so that eventually, in find_good_pkt_pointers() we can match on id
>> and update the range for all such regs with the same id. I'm just
>> wondering as the side effect of this is that this makes state
>> pruning worse.
>
> Ok. I was advancing reg->id so that it can be used here as the signal
> that there is "auxiliary" components to the pointer, and thus we need
> to take reg->aux_off_align and reg->aux_off into account.
>
> I did not realize the state pruning component of reg->id so I'll need
> to look more deeply into this.
>
> We could use something other than reg->id to decide if there are
> variable components to the pointer if necessary.
>
>> Also, reg->off is currently only used in ptr_to_pkt types and
>> checked as well in check_packet_access(). Now as semantics change,
>> do we need to check for it as well in check_map_access_adj() which
>> we currently don't do?
>
> It should not be necessary. Both before and after my changes we
> validate the access range using the reg->min_value and reg->max_value.
>
>>> - /* a constant was added to pkt_ptr.
>>> + /* a constant was added to the pointer.
>>> * Remember it while keeping the same 'id'
>>> */
>>> dst_reg->off += imm;
>>
>> Can this now overflow for map type? Also in the UNKNOWN_VALUE case
>> below since overflow checks are then only enforced in ptr_to_pkt case?
>
> Indeed, we will have to do "something". The reg->off used to be u16
> and is now a u32 with my changes. So it can handle something larger
> than MAX_PACKET_OFF.
>
> But we still have to handle overflow. I am not so sure what range of
> offsets is reasonable for these MAP pointers, can you make a
> suggestion?
The worst-case maximum value size currently allowed is KMALLOC_MAX_SIZE
(see array_map_alloc()), so we might need to take that into account.
>>> } else {
>>> - bool had_id;
>>> -
>>> - if (src_reg->type == PTR_TO_PACKET) {
>>> + if (is_packet && src_reg->type == PTR_TO_PACKET) {
>>> /* R6=pkt(id=0,off=0,r=62) R7=imm22; r7 += r6 */
>>> tmp_reg = *dst_reg; /* save r7 state */
>>> *dst_reg = *src_reg; /* copy pkt_ptr state r6 into r7 */
>>
>> I believe clang could probably generate something similar also for
>> map value pointers.
>
> Ok, it should be easy to make that part work both with packet pointers
> and MAPs.
>
> Thanks for your feedback, I'll try to refine this some more.
Ok, thanks!