* [PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending
@ 2013-02-04 23:14 Hannes Frederic Sowa
  2013-03-09 21:43 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Frederic Sowa @ 2013-02-04 23:14 UTC (permalink / raw)
  To: netdev; +Cc: yannick, eric.dumazet, xiyou.wangcong, davem

I just sketched up a patch to account unix domain dgram socket buffers
to the receiving sock. This problem has been brought up by Yannick
Koehler here: http://article.gmane.org/gmane.linux.network/256128

I am still missing proper poll() handling and am working out how to
introduce the sysctl unix_dgram_*mem* vectors (I need to figure out
correct socket lock handling). Eric mentioned that calling sock_rfree
without the socket lock is wrong, but I hope this is only the case if
memory accounting is taking place (which it currently isn't with this
patch)? Otherwise I am glad to hear advice on how to handle the
POLLOUT|POLLWRNORM|... case.

Is sticking the unix address into the skb's UNIXCB a viable solution?

(Patch should work with 3.8-rc6)

Thanks:

[PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending

In case of unix datagram sockets, skb memory was only accounted in the
sending socket's sk_wmem_alloc. Hence, if one receiver stopped receiving
frames on its socket, the sending socket's send buffer space could get
exhausted and the socket would block sending datagrams to other
destinations, too.

This patch places the refcounted peer's unix address for AF_UNIX
SOCK_DGRAM sockets into the skb's UNIXCB, so a reference from the skb
to the receiving struct sock can be set, which enables proper skb
destructor handling for rmem and wmem. Buffer memory is then accounted
to the receiving socket. If the socket's rmem is exhausted, the normal
blocking and timeout behaviour kicks in.

Based on the patches from Yannick Koehler and Cong Wang.
Reported-by: Yannick Koehler <yannick@koehler.name> CC: Yannick Koehler <yannick@koehler.name> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 61 +++++++++++++++++++++++++++++++++++++++++---------- 2 files changed, 51 insertions(+), 11 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 0a996a3..a618a2e 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -31,6 +31,7 @@ struct unix_skb_parms { struct pid *pid; /* Skb credentials */ const struct cred *cred; struct scm_fp_list *fp; /* Passed files */ + struct unix_address *peer_address; #ifdef CONFIG_SECURITY_NETWORK u32 secid; /* Security ID */ #endif diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 5b5c876..73d1436 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -184,6 +184,12 @@ static inline int unix_recvq_full(struct sock const *sk) return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog; } +static inline bool unix_rmem_full(struct sock const *sk, + struct sk_buff const *skb) +{ + return sk_rmem_alloc_get(sk) + skb->truesize > sk->sk_rcvbuf; +} + struct sock *unix_peer_get(struct sock *s) { struct sock *peer; @@ -637,6 +643,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock) goto out; sock_init_data(sock, sk); + if (sk->sk_type == SOCK_DGRAM || sk->sk_type == SOCK_SEQPACKET) + sock_set_flag(sk, SOCK_USE_WRITE_QUEUE); lockdep_set_class(&sk->sk_receive_queue.lock, &af_unix_sk_receive_queue_lock_key); @@ -1338,7 +1346,7 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) unix_notinflight(scm->fp->fp[i]); } -static void unix_destruct_scm(struct sk_buff *skb) +static inline void __unix_skb_destruct(struct sk_buff *skb) { struct scm_cookie scm; memset(&scm, 0, sizeof(scm)); @@ -1350,6 +1358,19 @@ static void unix_destruct_scm(struct sk_buff *skb) /* Alas, it calls VFS */ /* So fscking what? fput() had been SMP-safe since the last Summer */ scm_destroy(&scm); + if (UNIXCB(skb).peer_address) + unix_release_addr(UNIXCB(skb).peer_address); +} + +static void unix_skb_destruct_r(struct sk_buff *skb) +{ + __unix_skb_destruct(skb); + sock_rfree(skb); +} + +static void unix_skb_destruct_w(struct sk_buff *skb) +{ + __unix_skb_destruct(skb); sock_wfree(skb); } @@ -1400,7 +1421,7 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen if (scm->fp && send_fds) err = unix_attach_fds(scm, skb); - skb->destructor = unix_destruct_scm; + skb->destructor = unix_skb_destruct_w; return err; } @@ -1422,6 +1443,19 @@ static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock, } } +static void unix_skb_set_owner_r(struct sk_buff *skb, struct sock *oldsk, + struct sock *newsk) +{ + WARN_ON(!sock_flag(oldsk, SOCK_USE_WRITE_QUEUE)); + WARN_ON(skb->sk != oldsk); + sock_wfree(skb); + skb->sk = newsk; + skb->destructor = unix_skb_destruct_r; + atomic_add(skb->truesize, &newsk->sk_rmem_alloc); + WARN_ON(sk_has_account(newsk)); + sk_mem_charge(newsk, skb->truesize); /* nop */ +} + /* * Send AF_UNIX data. 
*/ @@ -1486,6 +1520,11 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock, if (skb == NULL) goto out; + if (u->addr) { + UNIXCB(skb).peer_address = u->addr; + atomic_inc(&UNIXCB(skb).peer_address->refcnt); + } + err = unix_scm_to_skb(siocb->scm, skb, true); if (err < 0) goto out_free; @@ -1561,7 +1600,8 @@ restart: goto out_unlock; } - if (unix_peer(other) != sk && unix_recvq_full(other)) { + if ((unix_peer(other) != sk && unix_recvq_full(other)) || + unix_rmem_full(other, skb)) { if (!timeo) { err = -EAGAIN; goto out_unlock; @@ -1579,6 +1619,7 @@ restart: if (sock_flag(other, SOCK_RCVTSTAMP)) __net_timestamp(skb); maybe_add_creds(skb, sock, other); + unix_skb_set_owner_r(skb, sk, other); skb_queue_tail(&other->sk_receive_queue, skb); if (max_level > unix_sk(other)->recursion_level) unix_sk(other)->recursion_level = max_level; @@ -1751,14 +1792,12 @@ static int unix_seqpacket_recvmsg(struct kiocb *iocb, struct socket *sock, return unix_dgram_recvmsg(iocb, sock, msg, size, flags); } -static void unix_copy_addr(struct msghdr *msg, struct sock *sk) +static void unix_copy_addr(struct msghdr *msg, struct unix_address *ua) { - struct unix_sock *u = unix_sk(sk); - msg->msg_namelen = 0; - if (u->addr) { - msg->msg_namelen = u->addr->len; - memcpy(msg->msg_name, u->addr->name, u->addr->len); + if (ua) { + msg->msg_namelen = ua->len; + memcpy(msg->msg_name, ua->name, ua->len); } } @@ -1804,7 +1843,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock, POLLOUT | POLLWRNORM | POLLWRBAND); if (msg->msg_name) - unix_copy_addr(msg, skb->sk); + unix_copy_addr(msg, UNIXCB(skb).peer_address); if (size > skb->len - skip) size = skb->len - skip; @@ -2004,7 +2043,7 @@ again: /* Copy address just once */ if (sunaddr) { - unix_copy_addr(msg, skb->sk); + unix_copy_addr(msg, unix_sk(skb->sk)->addr); sunaddr = NULL; } -- 1.8.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
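The failure mode described in the patch message above can be reproduced from
user space with a short C program along the following lines. This is only a
sketch: the socket paths and message sizes are arbitrary,
net.unix.max_dgram_qlen is assumed to have been raised beforehand so the
receive-queue length check is not the limiting factor, and the exact point at
which sends start failing depends on the rmem/wmem sysctl defaults.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

static int bind_dgram(const char *path)
{
	struct sockaddr_un a = { .sun_family = AF_UNIX };
	int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

	strncpy(a.sun_path, path, sizeof(a.sun_path) - 1);
	unlink(path);
	if (fd < 0 || bind(fd, (struct sockaddr *)&a, sizeof(a)) < 0) {
		perror(path);
		return -1;
	}
	return fd;
}

int main(void)
{
	int stalled = bind_dgram("/tmp/stalled.sock");	/* never calls recv() */
	int idle    = bind_dgram("/tmp/idle.sock");	/* queue stays empty */
	int sender  = socket(AF_UNIX, SOCK_DGRAM, 0);
	struct sockaddr_un to = { .sun_family = AF_UNIX };
	char buf[4096] = { 0 };
	int i;

	if (stalled < 0 || idle < 0 || sender < 0)
		return 1;

	/* Park datagrams on the stalled receiver until sending would block;
	 * before the patch these skbs stay charged to the sender's wmem. */
	strcpy(to.sun_path, "/tmp/stalled.sock");
	for (i = 0; i < 1024; i++)
		if (sendto(sender, buf, sizeof(buf), MSG_DONTWAIT,
			   (struct sockaddr *)&to, sizeof(to)) < 0)
			break;
	printf("queued %d datagrams to the stalled receiver\n", i);

	/* On an unpatched kernel this send can fail with EAGAIN although the
	 * second receiver's queue is empty; with the patch the parked skbs
	 * are charged to the stalled receiver's rmem and this send should
	 * succeed. */
	strcpy(to.sun_path, "/tmp/idle.sock");
	if (sendto(sender, buf, sizeof(buf), MSG_DONTWAIT,
		   (struct sockaddr *)&to, sizeof(to)) < 0)
		printf("send to idle receiver failed: %s\n", strerror(errno));
	else
		printf("send to idle receiver succeeded\n");

	close(stalled);
	close(idle);
	close(sender);
	return 0;
}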
* Re: [PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending
  2013-02-04 23:14 [PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending Hannes Frederic Sowa
@ 2013-03-09 21:43 ` Hannes Frederic Sowa
  2013-03-10  4:31   ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Frederic Sowa @ 2013-03-09 21:43 UTC (permalink / raw)
  To: netdev, yannick, eric.dumazet, xiyou.wangcong, davem

On Tue, Feb 05, 2013 at 12:14:14AM +0100, Hannes Frederic Sowa wrote:
> I just sketched up a patch to account unix domain dgram socket buffers
> to the receiving sock. This problem has been brought up by Yannick
> Koehler here: http://article.gmane.org/gmane.linux.network/256128
>
> I am still missing proper poll() handling and am working out how to
> introduce the sysctl unix_dgram_*mem* vectors (I need to figure out
> correct socket lock handling). Eric mentioned that calling sock_rfree
> without the socket lock is wrong, but I hope this is only the case if
> memory accounting is taking place (which it currently isn't with this
> patch)? Otherwise I am glad to hear advice on how to handle the
> POLLOUT|POLLWRNORM|... case.
>
> Is sticking the unix address into the skb's UNIXCB a viable solution?

I had this patch left over from the last net-next submission timeframe.
In the meantime I did some updates I would love to have a review on.
It now includes poll handling, for example.

This patch lacks documentation updates for max_dgram_qlen. I'll update
the documentation on the next submission.

Btw, iproute from current git has the ability to report socket memory
for unix domain sockets, too. So these changes should be better
observable.

[PATCH net-next RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending

In case of unix datagram sockets, skb memory was only accounted in the
sending socket's sk_wmem_alloc. Hence, if one receiver stopped receiving
frames on its socket, the sending socket's send buffer space could get
exhausted and the socket would block sending datagrams to other
destinations, too.

This patch places the refcounted peer's unix address for AF_UNIX
SOCK_DGRAM sockets into the skb's UNIXCB, so a reference from the skb
to the receiving struct sock can be set, which enables proper skb
destructor handling for rmem and wmem. Buffer memory is then accounted
to the receiving socket. If the socket's rmem is exhausted, the normal
blocking and timeout behaviour kicks in.

Resource exhaustion protection for unix dgram sockets is now based only
on socket rmem checking. Unix dgram sockets do not rely on
sk_max_ack_backlog anymore. The controls for this are
/proc/sys/net/core/{r,w}mem_{default,max}.

This patch also changes the reporting of the unix dgram rqueue size: it
now reports not only the size of the first fragment but the amount of
readable memory for the socket.

Based on the patches from Yannick Koehler and Cong Wang.
Reported-by: Yannick Koehler <yannick@koehler.name> CC: Yannick Koehler <yannick@koehler.name> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 73 ++++++++++++++++++++++++++++++++++++++++----------- net/unix/diag.c | 4 ++- 3 files changed, 62 insertions(+), 16 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 0a996a3..3855fcc 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -31,6 +31,7 @@ struct unix_skb_parms { struct pid *pid; /* Skb credentials */ const struct cred *cred; struct scm_fp_list *fp; /* Passed files */ + struct unix_address *peer_address; /* only used for dgram */ #ifdef CONFIG_SECURITY_NETWORK u32 secid; /* Security ID */ #endif diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 51be64f..bb00856 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -184,6 +184,12 @@ static inline int unix_recvq_full(struct sock const *sk) return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog; } +static inline bool unix_rmem_full(struct sock const *sk, + struct sk_buff const *skb) +{ + return sk_rmem_alloc_get(sk) + skb->truesize > sk->sk_rcvbuf; +} + struct sock *unix_peer_get(struct sock *s) { struct sock *peer; @@ -319,6 +325,11 @@ static inline int unix_writable(struct sock *sk) return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; } +static inline bool unix_other_writable(struct sock *sk) +{ + return (atomic_read(&sk->sk_rmem_alloc) << 2) <= sk->sk_rcvbuf; +} + static void unix_write_space(struct sock *sk) { struct socket_wq *wq; @@ -635,6 +646,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock) goto out; sock_init_data(sock, sk); + if (sk->sk_type == SOCK_DGRAM || sk->sk_type == SOCK_SEQPACKET) + sock_set_flag(sk, SOCK_USE_WRITE_QUEUE); lockdep_set_class(&sk->sk_receive_queue.lock, &af_unix_sk_receive_queue_lock_key); @@ -1032,14 +1045,18 @@ out: static long unix_wait_for_peer(struct sock *other, long timeo) { struct unix_sock *u = unix_sk(other); - int sched; + bool sched; DEFINE_WAIT(wait); prepare_to_wait_exclusive(&u->peer_wait, &wait, TASK_INTERRUPTIBLE); sched = !sock_flag(other, SOCK_DEAD) && - !(other->sk_shutdown & RCV_SHUTDOWN) && - unix_recvq_full(other); + !(other->sk_shutdown & RCV_SHUTDOWN); + + if (other->sk_type == SOCK_DGRAM || other->sk_type == SOCK_SEQPACKET) + sched = sched && unix_other_writable(other); + else + sched = sched && unix_recvq_full(other); unix_state_unlock(other); @@ -1336,7 +1353,7 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) unix_notinflight(scm->fp->fp[i]); } -static void unix_destruct_scm(struct sk_buff *skb) +static inline void __unix_skb_destruct(struct sk_buff *skb) { struct scm_cookie scm; memset(&scm, 0, sizeof(scm)); @@ -1348,6 +1365,19 @@ static void unix_destruct_scm(struct sk_buff *skb) /* Alas, it calls VFS */ /* So fscking what? 
fput() had been SMP-safe since the last Summer */ scm_destroy(&scm); + if (UNIXCB(skb).peer_address) + unix_release_addr(UNIXCB(skb).peer_address); +} + +static void unix_skb_destruct_r(struct sk_buff *skb) +{ + __unix_skb_destruct(skb); + sock_rfree(skb); +} + +static void unix_skb_destruct_w(struct sk_buff *skb) +{ + __unix_skb_destruct(skb); sock_wfree(skb); } @@ -1398,7 +1428,7 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen if (scm->fp && send_fds) err = unix_attach_fds(scm, skb); - skb->destructor = unix_destruct_scm; + skb->destructor = unix_skb_destruct_w; return err; } @@ -1420,6 +1450,15 @@ static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock, } } +static void unix_skb_set_owner_r(struct sk_buff *skb, struct sock *oldsk, + struct sock *newsk) +{ + sock_wfree(skb); + skb->sk = newsk; + skb->destructor = unix_skb_destruct_r; + atomic_add(skb->truesize, &newsk->sk_rmem_alloc); +} + /* * Send AF_UNIX data. */ @@ -1484,6 +1523,11 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock, if (skb == NULL) goto out; + if (u->addr) { + UNIXCB(skb).peer_address = u->addr; + atomic_inc(&UNIXCB(skb).peer_address->refcnt); + } + err = unix_scm_to_skb(siocb->scm, skb, true); if (err < 0) goto out_free; @@ -1559,7 +1603,7 @@ restart: goto out_unlock; } - if (unix_peer(other) != sk && unix_recvq_full(other)) { + if (unix_rmem_full(other, skb)) { if (!timeo) { err = -EAGAIN; goto out_unlock; @@ -1577,6 +1621,7 @@ restart: if (sock_flag(other, SOCK_RCVTSTAMP)) __net_timestamp(skb); maybe_add_creds(skb, sock, other); + unix_skb_set_owner_r(skb, sk, other); skb_queue_tail(&other->sk_receive_queue, skb); if (max_level > unix_sk(other)->recursion_level) unix_sk(other)->recursion_level = max_level; @@ -1749,14 +1794,12 @@ static int unix_seqpacket_recvmsg(struct kiocb *iocb, struct socket *sock, return unix_dgram_recvmsg(iocb, sock, msg, size, flags); } -static void unix_copy_addr(struct msghdr *msg, struct sock *sk) +static void unix_copy_addr(struct msghdr *msg, struct unix_address *ua) { - struct unix_sock *u = unix_sk(sk); - msg->msg_namelen = 0; - if (u->addr) { - msg->msg_namelen = u->addr->len; - memcpy(msg->msg_name, u->addr->name, u->addr->len); + if (ua) { + msg->msg_namelen = ua->len; + memcpy(msg->msg_name, ua->name, ua->len); } } @@ -1802,7 +1845,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock, POLLOUT | POLLWRNORM | POLLWRBAND); if (msg->msg_name) - unix_copy_addr(msg, skb->sk); + unix_copy_addr(msg, UNIXCB(skb).peer_address); if (size > skb->len - skip) size = skb->len - skip; @@ -2002,7 +2045,7 @@ again: /* Copy address just once */ if (sunaddr) { - unix_copy_addr(msg, skb->sk); + unix_copy_addr(msg, unix_sk(skb->sk)->addr); sunaddr = NULL; } @@ -2225,7 +2268,7 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock, if (other) { if (unix_peer(other) != sk) { sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); - if (unix_recvq_full(other)) + if (!unix_other_writable(other)) writable = 0; } sock_put(other); diff --git a/net/unix/diag.c b/net/unix/diag.c index d591091..41a38ec 100644 --- a/net/unix/diag.c +++ b/net/unix/diag.c @@ -102,7 +102,9 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb) rql.udiag_rqueue = sk->sk_receive_queue.qlen; rql.udiag_wqueue = sk->sk_max_ack_backlog; } else { - rql.udiag_rqueue = (u32) unix_inq_len(sk); + rql.udiag_rqueue = (u32) (sk->sk_type == SOCK_DGRAM ? 
+ sk_rmem_alloc_get(sk) : + unix_inq_len(sk)); rql.udiag_wqueue = (u32) unix_outq_len(sk); } -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending
  2013-03-09 21:43 ` Hannes Frederic Sowa
@ 2013-03-10  4:31   ` Eric Dumazet
  2013-03-10  4:40     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2013-03-10 4:31 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev, yannick, xiyou.wangcong, davem

On Sat, 2013-03-09 at 22:43 +0100, Hannes Frederic Sowa wrote:
> I had this patch left over from the last net-next submission timeframe.
> In the meantime I did some updates I would love to have a review on.
> It now includes poll handling, for example.
>
> This patch lacks documentation updates for max_dgram_qlen. I'll update
> the documentation on the next submission.
>
> Btw, iproute from current git has the ability to report socket memory
> for unix domain sockets, too. So these changes should be better
> observable.

I am busy this week attending the Netfilter Workshop in Copenhagen.

Do you have a user test program?

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending
  2013-03-10  4:31 ` Eric Dumazet
@ 2013-03-10  4:40   ` Hannes Frederic Sowa
  2013-03-11 19:37     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Frederic Sowa @ 2013-03-10 4:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, yannick, xiyou.wangcong, davem

On Sun, Mar 10, 2013 at 05:31:01AM +0100, Eric Dumazet wrote:
> On Sat, 2013-03-09 at 22:43 +0100, Hannes Frederic Sowa wrote:
>
> > I had this patch left over from the last net-next submission timeframe.
> > In the meantime I did some updates I would love to have a review on.
> > It now includes poll handling, for example.
> >
> > This patch lacks documentation updates for max_dgram_qlen. I'll update
> > the documentation on the next submission.
> >
> > Btw, iproute from current git has the ability to report socket memory
> > for unix domain sockets, too. So these changes should be better
> > observable.
>
> I am busy this week attending the Netfilter Workshop in Copenhagen.
>
> Do you have a user test program?

I used a couple of perl scripts. I'll bring them into shape and will post
them here, hopefully tomorrow.

Have great fun in Copenhagen!

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending
  2013-03-10  4:40 ` Hannes Frederic Sowa
@ 2013-03-11 19:37   ` Hannes Frederic Sowa
  2013-03-26  0:17     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Frederic Sowa @ 2013-03-11 19:37 UTC (permalink / raw)
  To: Eric Dumazet, netdev, yannick, xiyou.wangcong, davem

On Sun, Mar 10, 2013 at 05:40:43AM +0100, Hannes Frederic Sowa wrote:
> On Sun, Mar 10, 2013 at 05:31:01AM +0100, Eric Dumazet wrote:
> > Do you have a user test program?
>
> I used a couple of perl scripts. I'll bring them into shape and will post
> them here, hopefully tomorrow.

Did not have enough time yesterday, so now here they are. I just copied
and pasted them together (the code is ugly, sorry). The last test
(test_send_multiple) should fail on current kernels and should work on
kernels with this patch applied (please increase max_dgram_qlen so it is
not the limiting factor):

https://gist.github.com/hannes/5136858

or

git clone https://gist.github.com/5136858.git

While doing this I added a testcase for poll on self-connected sockets.
The socket was reported as writable here, while in fact it is not. I
fixed this case by removing the 'if (unix_peer(other) != sk)' check in
unix_dgram_poll because it is not necessary anymore.

Two other changes are rather cosmetic: I replaced
atomic_read(&sk->sk_{r,w}mem_alloc) with sk_{r,w}mem_alloc_get.

The thing I am a bit nervous about is the handling of unix_write_space.
I disabled the call in sock_wfree (by setting SOCK_USE_WRITE_QUEUE)
because I am unsure if we can call sk_wake_async without the socket
lock, which would interfere with the unix_state spinlocks. Otherwise I
think it is called later on without the socket lock, too. So I am
tempted to remove this hunk from the patch, because otherwise sleeping
processes in sock_alloc_send_pskb would not be woken up.

[PATCH net-next RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending

In case of unix datagram sockets, skb memory was only accounted in the
sending socket's sk_wmem_alloc. Hence, if one receiver stopped receiving
frames on its socket, the sending socket's send buffer space could get
exhausted and the socket would block sending datagrams to other
destinations, too.

This patch places the refcounted peer's unix address for AF_UNIX
SOCK_DGRAM sockets into the skb's UNIXCB, so a reference from the skb
to the receiving struct sock can be set, which enables proper skb
destructor handling for rmem and wmem. Buffer memory is then accounted
to the receiving socket. If the socket's rmem is exhausted, the normal
blocking and timeout behaviour kicks in.

Resource exhaustion protection for unix dgram sockets is now based only
on socket rmem checking. Unix dgram sockets do not rely on
sk_max_ack_backlog anymore. The controls for this are
/proc/sys/net/core/{r,w}mem_{default,max}.

This patch also changes the reporting of the unix dgram rqueue size: it
now reports not only the size of the first fragment but the amount of
readable memory for the socket.

Based on the patches from Yannick Koehler and Cong Wang.
Reported-by: Yannick Koehler <yannick@koehler.name> CC: Yannick Koehler <yannick@koehler.name> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 81 ++++++++++++++++++++++++++++++++++++++------------- net/unix/diag.c | 4 ++- 3 files changed, 65 insertions(+), 21 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 0a996a3..3855fcc 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -31,6 +31,7 @@ struct unix_skb_parms { struct pid *pid; /* Skb credentials */ const struct cred *cred; struct scm_fp_list *fp; /* Passed files */ + struct unix_address *peer_address; /* only used for dgram */ #ifdef CONFIG_SECURITY_NETWORK u32 secid; /* Security ID */ #endif diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 51be64f..7caf111 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -184,6 +184,12 @@ static inline int unix_recvq_full(struct sock const *sk) return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog; } +static inline bool unix_rmem_full(struct sock const *sk, + struct sk_buff const *skb) +{ + return sk_rmem_alloc_get(sk) + skb->truesize > sk->sk_rcvbuf; +} + struct sock *unix_peer_get(struct sock *s) { struct sock *peer; @@ -316,7 +322,12 @@ found: static inline int unix_writable(struct sock *sk) { - return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; + return (sk_wmem_alloc_get(sk) << 2) <= sk->sk_sndbuf; +} + +static inline bool unix_other_writable(struct sock *sk) +{ + return (sk_rmem_alloc_get(sk) << 2) <= sk->sk_rcvbuf; } static void unix_write_space(struct sock *sk) @@ -635,6 +646,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock) goto out; sock_init_data(sock, sk); + if (sk->sk_type == SOCK_DGRAM || sk->sk_type == SOCK_SEQPACKET) + sock_set_flag(sk, SOCK_USE_WRITE_QUEUE); lockdep_set_class(&sk->sk_receive_queue.lock, &af_unix_sk_receive_queue_lock_key); @@ -1032,14 +1045,18 @@ out: static long unix_wait_for_peer(struct sock *other, long timeo) { struct unix_sock *u = unix_sk(other); - int sched; + bool sched; DEFINE_WAIT(wait); prepare_to_wait_exclusive(&u->peer_wait, &wait, TASK_INTERRUPTIBLE); sched = !sock_flag(other, SOCK_DEAD) && - !(other->sk_shutdown & RCV_SHUTDOWN) && - unix_recvq_full(other); + !(other->sk_shutdown & RCV_SHUTDOWN); + + if (other->sk_type == SOCK_DGRAM || other->sk_type == SOCK_SEQPACKET) + sched = sched && unix_other_writable(other); + else + sched = sched && unix_recvq_full(other); unix_state_unlock(other); @@ -1336,7 +1353,7 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) unix_notinflight(scm->fp->fp[i]); } -static void unix_destruct_scm(struct sk_buff *skb) +static inline void __unix_skb_destruct(struct sk_buff *skb) { struct scm_cookie scm; memset(&scm, 0, sizeof(scm)); @@ -1348,6 +1365,19 @@ static void unix_destruct_scm(struct sk_buff *skb) /* Alas, it calls VFS */ /* So fscking what? 
fput() had been SMP-safe since the last Summer */ scm_destroy(&scm); + if (UNIXCB(skb).peer_address) + unix_release_addr(UNIXCB(skb).peer_address); +} + +static void unix_skb_destruct_r(struct sk_buff *skb) +{ + __unix_skb_destruct(skb); + sock_rfree(skb); +} + +static void unix_skb_destruct_w(struct sk_buff *skb) +{ + __unix_skb_destruct(skb); sock_wfree(skb); } @@ -1398,7 +1428,7 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen if (scm->fp && send_fds) err = unix_attach_fds(scm, skb); - skb->destructor = unix_destruct_scm; + skb->destructor = unix_skb_destruct_w; return err; } @@ -1420,6 +1450,15 @@ static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock, } } +static void unix_skb_set_owner_r(struct sk_buff *skb, struct sock *oldsk, + struct sock *newsk) +{ + sock_wfree(skb); + skb->sk = newsk; + skb->destructor = unix_skb_destruct_r; + atomic_add(skb->truesize, &newsk->sk_rmem_alloc); +} + /* * Send AF_UNIX data. */ @@ -1484,6 +1523,11 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock, if (skb == NULL) goto out; + if (u->addr) { + UNIXCB(skb).peer_address = u->addr; + atomic_inc(&UNIXCB(skb).peer_address->refcnt); + } + err = unix_scm_to_skb(siocb->scm, skb, true); if (err < 0) goto out_free; @@ -1559,7 +1603,7 @@ restart: goto out_unlock; } - if (unix_peer(other) != sk && unix_recvq_full(other)) { + if (unix_rmem_full(other, skb)) { if (!timeo) { err = -EAGAIN; goto out_unlock; @@ -1577,6 +1621,7 @@ restart: if (sock_flag(other, SOCK_RCVTSTAMP)) __net_timestamp(skb); maybe_add_creds(skb, sock, other); + unix_skb_set_owner_r(skb, sk, other); skb_queue_tail(&other->sk_receive_queue, skb); if (max_level > unix_sk(other)->recursion_level) unix_sk(other)->recursion_level = max_level; @@ -1749,14 +1794,12 @@ static int unix_seqpacket_recvmsg(struct kiocb *iocb, struct socket *sock, return unix_dgram_recvmsg(iocb, sock, msg, size, flags); } -static void unix_copy_addr(struct msghdr *msg, struct sock *sk) +static void unix_copy_addr(struct msghdr *msg, struct unix_address *ua) { - struct unix_sock *u = unix_sk(sk); - msg->msg_namelen = 0; - if (u->addr) { - msg->msg_namelen = u->addr->len; - memcpy(msg->msg_name, u->addr->name, u->addr->len); + if (ua) { + msg->msg_namelen = ua->len; + memcpy(msg->msg_name, ua->name, ua->len); } } @@ -1802,7 +1845,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock, POLLOUT | POLLWRNORM | POLLWRBAND); if (msg->msg_name) - unix_copy_addr(msg, skb->sk); + unix_copy_addr(msg, UNIXCB(skb).peer_address); if (size > skb->len - skip) size = skb->len - skip; @@ -2002,7 +2045,7 @@ again: /* Copy address just once */ if (sunaddr) { - unix_copy_addr(msg, skb->sk); + unix_copy_addr(msg, unix_sk(skb->sk)->addr); sunaddr = NULL; } @@ -2223,11 +2266,9 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock, writable = unix_writable(sk); other = unix_peer_get(sk); if (other) { - if (unix_peer(other) != sk) { - sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); - if (unix_recvq_full(other)) - writable = 0; - } + sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); + if (!unix_other_writable(other)) + writable = 0; sock_put(other); } diff --git a/net/unix/diag.c b/net/unix/diag.c index d591091..41a38ec 100644 --- a/net/unix/diag.c +++ b/net/unix/diag.c @@ -102,7 +102,9 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb) rql.udiag_rqueue = sk->sk_receive_queue.qlen; rql.udiag_wqueue = sk->sk_max_ack_backlog; } else { - 
rql.udiag_rqueue = (u32) unix_inq_len(sk); + rql.udiag_rqueue = (u32) (sk->sk_type == SOCK_DGRAM ? + sk_rmem_alloc_get(sk) : + unix_inq_len(sk)); rql.udiag_wqueue = (u32) unix_outq_len(sk); } -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
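The poll() behaviour on self-connected datagram sockets mentioned in the
message above can be exercised with a small C program along these lines. This
is only a sketch: the socket path and message size are arbitrary, and the
printed expectation describes the behaviour intended with the revised
unix_dgram_poll.

#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
	struct sockaddr_un a = { .sun_family = AF_UNIX };
	struct pollfd pfd;
	char buf[1024] = { 0 };
	int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
	int sent = 0;

	strcpy(a.sun_path, "/tmp/self.sock");
	unlink(a.sun_path);
	if (fd < 0 || bind(fd, (struct sockaddr *)&a, sizeof(a)) < 0 ||
	    connect(fd, (struct sockaddr *)&a, sizeof(a)) < 0) {
		perror("bind/connect");
		return 1;
	}

	/* Queue datagrams onto our own receive queue until sending would block. */
	while (send(fd, buf, sizeof(buf), MSG_DONTWAIT) >= 0)
		sent++;
	printf("queued %d datagrams to self\n", sent);

	/* At this point another send would fail with EAGAIN, so reporting
	 * POLLOUT would be misleading; with the revised unix_dgram_poll the
	 * socket should not be flagged writable here. */
	pfd.fd = fd;
	pfd.events = POLLOUT;
	if (poll(&pfd, 1, 100) > 0 && (pfd.revents & POLLOUT))
		printf("poll reports POLLOUT despite a full queue\n");
	else
		printf("poll does not report POLLOUT\n");

	close(fd);
	return 0;
}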
* Re: [PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending
  2013-03-11 19:37 ` Hannes Frederic Sowa
@ 2013-03-26  0:17   ` Hannes Frederic Sowa
  2013-03-26 15:53     ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Frederic Sowa @ 2013-03-26 0:17 UTC (permalink / raw)
  To: Eric Dumazet, netdev, yannick, xiyou.wangcong, davem

On Mon, Mar 11, 2013 at 08:37:04PM +0100, Hannes Frederic Sowa wrote:
> On Sun, Mar 10, 2013 at 05:40:43AM +0100, Hannes Frederic Sowa wrote:
> > On Sun, Mar 10, 2013 at 05:31:01AM +0100, Eric Dumazet wrote:
> > > Do you have a user test program?
> >
> > I used a couple of perl scripts. I'll bring them into shape and will post
> > them here, hopefully tomorrow.
>
> Did not have enough time yesterday, so now here they are. I just copied
> and pasted them together (the code is ugly, sorry). The last test
> (test_send_multiple) should fail on current kernels and should work on
> kernels with this patch applied (please increase max_dgram_qlen so it is
> not the limiting factor):
>
> https://gist.github.com/hannes/5136858
>
> or
>
> git clone https://gist.github.com/5136858.git

This is the newest version of this patch. I only stripped out the
setting of the SOCK_USE_WRITE_QUEUE socket flag (I was unsure whether
sock_wfree otherwise needed the socket lock, but I am pretty confident
it is not needed). Perhaps someone finds some time to have a look at
this patch. Also, if you come up with things my test case does not
cover yet, please let me know and I will catch up with the tests.

Patch is based on net-next.

Thanks!

[PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending

In case of unix datagram sockets, skb memory was only accounted in the
sending socket's sk_wmem_alloc. Hence, if one receiver stopped receiving
frames on its socket, the sending socket's send buffer space could get
exhausted and the socket would block sending datagrams to other
destinations, too.

This patch places the refcounted peer's unix address for AF_UNIX
SOCK_DGRAM sockets into the skb's UNIXCB, so a reference from the skb
to the receiving struct sock can be set, which enables proper skb
destructor handling for rmem and wmem. Buffer memory is then accounted
to the receiving socket. If the socket's rmem is exhausted, the normal
blocking and timeout behaviour kicks in.

Resource exhaustion protection for unix dgram sockets is now based only
on socket rmem checking. Unix dgram sockets do not rely on
sk_max_ack_backlog anymore. The controls for this are
/proc/sys/net/core/{r,w}mem_{default,max}.

This patch also changes the reporting of the unix dgram rqueue size: it
now reports not only the size of the first fragment but the amount of
readable memory for the socket.

Based on the patches from Yannick Koehler and Cong Wang.
Reported-by: Yannick Koehler <yannick@koehler.name> CC: Yannick Koehler <yannick@koehler.name> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 79 ++++++++++++++++++++++++++++++++++++++------------- net/unix/diag.c | 4 ++- 3 files changed, 63 insertions(+), 21 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 0a996a3..3855fcc 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -31,6 +31,7 @@ struct unix_skb_parms { struct pid *pid; /* Skb credentials */ const struct cred *cred; struct scm_fp_list *fp; /* Passed files */ + struct unix_address *peer_address; /* only used for dgram */ #ifdef CONFIG_SECURITY_NETWORK u32 secid; /* Security ID */ #endif diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 51be64f..741c88c 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -184,6 +184,12 @@ static inline int unix_recvq_full(struct sock const *sk) return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog; } +static inline bool unix_rmem_full(struct sock const *sk, + struct sk_buff const *skb) +{ + return sk_rmem_alloc_get(sk) + skb->truesize > sk->sk_rcvbuf; +} + struct sock *unix_peer_get(struct sock *s) { struct sock *peer; @@ -316,7 +322,12 @@ found: static inline int unix_writable(struct sock *sk) { - return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; + return (sk_wmem_alloc_get(sk) << 2) <= sk->sk_sndbuf; +} + +static inline bool unix_other_writable(struct sock *sk) +{ + return (sk_rmem_alloc_get(sk) << 2) <= sk->sk_rcvbuf; } static void unix_write_space(struct sock *sk) @@ -1032,14 +1043,18 @@ out: static long unix_wait_for_peer(struct sock *other, long timeo) { struct unix_sock *u = unix_sk(other); - int sched; + bool sched; DEFINE_WAIT(wait); prepare_to_wait_exclusive(&u->peer_wait, &wait, TASK_INTERRUPTIBLE); sched = !sock_flag(other, SOCK_DEAD) && - !(other->sk_shutdown & RCV_SHUTDOWN) && - unix_recvq_full(other); + !(other->sk_shutdown & RCV_SHUTDOWN); + + if (other->sk_type == SOCK_DGRAM || other->sk_type == SOCK_SEQPACKET) + sched = sched && unix_other_writable(other); + else + sched = sched && unix_recvq_full(other); unix_state_unlock(other); @@ -1336,7 +1351,7 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) unix_notinflight(scm->fp->fp[i]); } -static void unix_destruct_scm(struct sk_buff *skb) +static inline void __unix_skb_destruct(struct sk_buff *skb) { struct scm_cookie scm; memset(&scm, 0, sizeof(scm)); @@ -1348,6 +1363,19 @@ static void unix_destruct_scm(struct sk_buff *skb) /* Alas, it calls VFS */ /* So fscking what? 
fput() had been SMP-safe since the last Summer */ scm_destroy(&scm); + if (UNIXCB(skb).peer_address) + unix_release_addr(UNIXCB(skb).peer_address); +} + +static void unix_skb_destruct_r(struct sk_buff *skb) +{ + __unix_skb_destruct(skb); + sock_rfree(skb); +} + +static void unix_skb_destruct_w(struct sk_buff *skb) +{ + __unix_skb_destruct(skb); sock_wfree(skb); } @@ -1398,7 +1426,7 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen if (scm->fp && send_fds) err = unix_attach_fds(scm, skb); - skb->destructor = unix_destruct_scm; + skb->destructor = unix_skb_destruct_w; return err; } @@ -1420,6 +1448,15 @@ static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock, } } +static void unix_skb_set_owner_r(struct sk_buff *skb, struct sock *oldsk, + struct sock *newsk) +{ + sock_wfree(skb); + skb->sk = newsk; + skb->destructor = unix_skb_destruct_r; + atomic_add(skb->truesize, &newsk->sk_rmem_alloc); +} + /* * Send AF_UNIX data. */ @@ -1484,6 +1521,11 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock, if (skb == NULL) goto out; + if (u->addr) { + UNIXCB(skb).peer_address = u->addr; + atomic_inc(&UNIXCB(skb).peer_address->refcnt); + } + err = unix_scm_to_skb(siocb->scm, skb, true); if (err < 0) goto out_free; @@ -1559,7 +1601,7 @@ restart: goto out_unlock; } - if (unix_peer(other) != sk && unix_recvq_full(other)) { + if (unix_rmem_full(other, skb)) { if (!timeo) { err = -EAGAIN; goto out_unlock; @@ -1577,6 +1619,7 @@ restart: if (sock_flag(other, SOCK_RCVTSTAMP)) __net_timestamp(skb); maybe_add_creds(skb, sock, other); + unix_skb_set_owner_r(skb, sk, other); skb_queue_tail(&other->sk_receive_queue, skb); if (max_level > unix_sk(other)->recursion_level) unix_sk(other)->recursion_level = max_level; @@ -1749,14 +1792,12 @@ static int unix_seqpacket_recvmsg(struct kiocb *iocb, struct socket *sock, return unix_dgram_recvmsg(iocb, sock, msg, size, flags); } -static void unix_copy_addr(struct msghdr *msg, struct sock *sk) +static void unix_copy_addr(struct msghdr *msg, struct unix_address *ua) { - struct unix_sock *u = unix_sk(sk); - msg->msg_namelen = 0; - if (u->addr) { - msg->msg_namelen = u->addr->len; - memcpy(msg->msg_name, u->addr->name, u->addr->len); + if (ua) { + msg->msg_namelen = ua->len; + memcpy(msg->msg_name, ua->name, ua->len); } } @@ -1802,7 +1843,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock, POLLOUT | POLLWRNORM | POLLWRBAND); if (msg->msg_name) - unix_copy_addr(msg, skb->sk); + unix_copy_addr(msg, UNIXCB(skb).peer_address); if (size > skb->len - skip) size = skb->len - skip; @@ -2002,7 +2043,7 @@ again: /* Copy address just once */ if (sunaddr) { - unix_copy_addr(msg, skb->sk); + unix_copy_addr(msg, unix_sk(skb->sk)->addr); sunaddr = NULL; } @@ -2223,11 +2264,9 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock, writable = unix_writable(sk); other = unix_peer_get(sk); if (other) { - if (unix_peer(other) != sk) { - sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); - if (unix_recvq_full(other)) - writable = 0; - } + sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); + if (!unix_other_writable(other)) + writable = 0; sock_put(other); } diff --git a/net/unix/diag.c b/net/unix/diag.c index d591091..41a38ec 100644 --- a/net/unix/diag.c +++ b/net/unix/diag.c @@ -102,7 +102,9 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb) rql.udiag_rqueue = sk->sk_receive_queue.qlen; rql.udiag_wqueue = sk->sk_max_ack_backlog; } else { - 
rql.udiag_rqueue = (u32) unix_inq_len(sk); + rql.udiag_rqueue = (u32) (sk->sk_type == SOCK_DGRAM ? + sk_rmem_alloc_get(sk) : + unix_inq_len(sk)); rql.udiag_wqueue = (u32) unix_outq_len(sk); } -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending
  2013-03-26  0:17 ` Hannes Frederic Sowa
@ 2013-03-26 15:53   ` Eric Dumazet
  2013-03-26 16:42     ` Hannes Frederic Sowa
  2013-04-07 22:47     ` Hannes Frederic Sowa
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2013-03-26 15:53 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev, yannick, xiyou.wangcong, davem

On Tue, 2013-03-26 at 01:17 +0100, Hannes Frederic Sowa wrote:

> This is the newest version of this patch. I only stripped out the
> setting of the SOCK_USE_WRITE_QUEUE socket flag (I was unsure whether
> sock_wfree otherwise needed the socket lock, but I am pretty confident
> it is not needed). Perhaps someone finds some time to have a look at
> this patch. Also, if you come up with things my test case does not
> cover yet, please let me know and I will catch up with the tests.
>
> Patch is based on net-next.
>
> Thanks!
>
> [PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending
>
> In case of unix datagram sockets, skb memory was only accounted in the
> sending socket's sk_wmem_alloc. Hence, if one receiver stopped receiving
> frames on its socket, the sending socket's send buffer space could get
> exhausted and the socket would block sending datagrams to other
> destinations, too.
>
> This patch places the refcounted peer's unix address for AF_UNIX
> SOCK_DGRAM sockets into the skb's UNIXCB, so a reference from the skb
> to the receiving struct sock can be set, which enables proper skb
> destructor handling for rmem and wmem. Buffer memory is then accounted
> to the receiving socket. If the socket's rmem is exhausted, the normal
> blocking and timeout behaviour kicks in.
>
> Resource exhaustion protection for unix dgram sockets is now based only
> on socket rmem checking. Unix dgram sockets do not rely on
> sk_max_ack_backlog anymore. The controls for this are
> /proc/sys/net/core/{r,w}mem_{default,max}.
>
> This patch also changes the reporting of the unix dgram rqueue size: it
> now reports not only the size of the first fragment but the amount of
> readable memory for the socket.
>
> Based on the patches from Yannick Koehler and Cong Wang.

This opens the possibility for a sender to flood a receiver, instead of
being blocked by its own sndbuf.

Do we want such a regression? How many applications might rely on the
existing behavior?

It's not clear what is the appropriate way to handle this.

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending
  2013-03-26 15:53 ` Eric Dumazet
@ 2013-03-26 16:42   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 9+ messages in thread
From: Hannes Frederic Sowa @ 2013-03-26 16:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, yannick, xiyou.wangcong, davem

On Tue, Mar 26, 2013 at 08:53:38AM -0700, Eric Dumazet wrote:
> > This patch also changes the reporting of the unix dgram rqueue size: it
> > now reports not only the size of the first fragment but the amount of
> > readable memory for the socket.
> >
> > Based on the patches from Yannick Koehler and Cong Wang.
>
> This opens the possibility for a sender to flood a receiver, instead of
> being blocked by its own sndbuf.

Hm, the sender should get blocked by the receiver's rcvbuf. This opens
the possibility to flood many receivers at once. But somehow this is
the purpose of this patch. Or am I missing something?

> Do we want such a regression? How many applications might rely on the
> existing behavior?

I tried not to break existing applications. The only way I can think of
that problems could arise would be applications redoing the buffer
calculations in userspace?

I think it is a bug that a unix dgram socket can trick another dgram
socket into a situation where it cannot accept frames anymore (in case
of a ping-pong protocol).

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH RFC] unix: account skb memory to receiving socket's sk_rmem_alloc on sending
  2013-03-26 15:53 ` Eric Dumazet
  2013-03-26 16:42 ` Hannes Frederic Sowa
@ 2013-04-07 22:47   ` Hannes Frederic Sowa
  1 sibling, 0 replies; 9+ messages in thread
From: Hannes Frederic Sowa @ 2013-04-07 22:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, yannick, xiyou.wangcong, davem

I am still unsure if I should just drop this patch from my todo list.
Eric, may I ask you for additional input again? I'll try to answer your
question specifically now with code samples. If you do think it is not
worth the effort, I will finally drop the patch from my queue.

Thanks a lot!

On Tue, Mar 26, 2013 at 08:53:38AM -0700, Eric Dumazet wrote:
> This opens the possibility for a sender to flood a receiver, instead of
> being blocked by its own sndbuf.

In unix_dgram_sendmsg there are two points where the sending process
could be prevented from delivering the unix dgram msg:

a) The first one is at sock_alloc_send_pskb and checks if sk_sndbuf is
smaller than the maximum buffer size. I didn't change anything here, so
it only blocks if the sndbuf has filled up.

b) The second one is where we check if the receiving socket has fewer
than sk_max_ack_backlog outstanding datagrams. This was checked by
unix_recvq_full. I changed the code so that it now checks the status of
the other socket's receive buffer instead of the number of outstanding
datagrams in the receiver's queue. If the rcvbuf is full, it stops the
delivery of the message to the receiver (unchanged blocking/O_NONBLOCK
behaviour):

| @@ -1559,7 +1601,7 @@ restart:
|  		goto out_unlock;
|  	}
|
| -	if (unix_peer(other) != sk && unix_recvq_full(other)) {
| +	if (unix_rmem_full(other, skb)) {
|  		if (!timeo) {
|  			err = -EAGAIN;
|  			goto out_unlock;
|

This is unix_rmem_full:

| +static inline bool unix_rmem_full(struct sock const *sk,
| +				   struct sk_buff const *skb)
| +{
| +	return sk_rmem_alloc_get(sk) + skb->truesize > sk->sk_rcvbuf;
| +}
| +

These checks ensure that a sending socket cannot flood a receiver with
messages but instead has to back off. The maximum rcvbuf size is taken
from /proc/sys/net/core/rmem_{default,max}, so we already have a safe
default setting (we could actually add separate
net/unix/rmem_{default,max} knobs).

This patch would help prevent a server socket in a request/response
kind of protocol from being unable to answer further requests because
its send buffer has filled up while other clients did not read their
messages yet. Instead it could handle this situation for each client
properly.

I also implemented the necessary changes for ->poll(). I tried to come
up with a list of what could change for user-space applications but
actually found only this one:

a) the SIOCOUTQ ioctl will report a different value: it won't report
the number of bytes not yet received by the other socket but the number
of bytes not yet delivered. I think this is rather harmless, as the
memory overhead is accounted too, and an application which relies on
this feature would also have problems as soon as the kernel-internal
data structures grow.

I hope I did not forget an important aspect of this change.

Thanks again,

  Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread
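For reference, the kind of user-space buffer introspection affected by point
a) above is an ioctl along these lines (a sketch only; fd is assumed to be an
already connected AF_UNIX datagram socket):

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/sockios.h>	/* SIOCOUTQ */

/* Before the patch SIOCOUTQ on a unix dgram socket covers bytes the peers
 * have not read yet; with the patch it only covers bytes not yet delivered
 * to a peer's receive queue. */
static void dump_outq(int fd)
{
	int outq = 0;

	if (ioctl(fd, SIOCOUTQ, &outq) == 0)
		printf("undelivered bytes: %d\n", outq);
	else
		perror("SIOCOUTQ");
}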