netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] UDP memory accounting and limitation (take 9)
@ 2007-11-28 18:48 Hideo AOKI
  2007-11-28 18:52 ` [PATCH 1/4] udp: fix send buffer check Hideo AOKI
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Hideo AOKI @ 2007-11-28 18:48 UTC (permalink / raw)
  To: Herbert Xu, netdev
  Cc: haoki, David Miller, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita

Hello,

This is the latest patch set of UDP memory accounting and limitation.

I implemented memory schedule functions for datagram protocols
like stream protocols. Moreover, to call memory schedule functions
from IP layer, I added a wrapper function, sk_wmem_schedule(), as
I proposed last week:

http://www.spinics.net/lists/netdev/msg47950.html

I would like to know whether this is acceptable.

In addition, I restructured patch set. I merged UDP IPv4 memory
accounting code at once and re-divided into introducing sysctls patch
and accounting patch. I think that new structure is easier to review
than previous ones.

The patch set is for net-2.6.

By the way, sk_forward_alloc support is my future work. I'm going
to do it if this patch set is acceptable.


Changelog take 8 -> take 9:
 * introduced mem_schdeule functions for datargram protocols
 * removed protocol check function, from patch set
 * restructured patch set

Changelog take 7 -> take 8:

 * sk_datagram_pages(): avoided using divide instruction
 * udp_recvmsg(): fixed referring released truesize in accounting


Changelog take 6 -> take 7:
 * renamed /proc/sys/net/ipv4/udp_rmem to
   /proc/sys/net/ipv4/udp_rmem_min
 * renamed /proc/sys/net/ipv4/udp_wmem to
   /proc/sys/net/ipv4/udp_wmem_min
 * rebased to net-2.6


Changelog take 5 -> take 6:

 * removed minimal limit of /proc/sys/net/ipv4/udp_mem
 * added udp_init() for default value calculation of parameters
 * added /proc/sys/net/ipv4/udp_rmem and
   /proc/sys/net/ipv4/udp_rmem
 * added limitation code to ip_ufo_append_data()
 * improved accounting for receiving packet
 * fixed typos
 * rebased to 2.6.24-rc1


Changelog take 4 -> take 5:

 * removing unnessesary EXPORT_SYMBOLs
 * adding minimal limit of /proc/sys/net/ipv4/udp_mem
 * bugfix of UDP limit affecting protocol other than UDP
 * introducing __ip_check_max_skb_pages()
 * using CTL_UNNUMBERED
 * adding udp_mem usage to Documentation/networking/ip_sysctl.txt


Best regards,
Hideo Aoki

--
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] udp: fix send buffer check
  2007-11-28 18:48 [PATCH 0/4] UDP memory accounting and limitation (take 9) Hideo AOKI
@ 2007-11-28 18:52 ` Hideo AOKI
  2007-11-28 18:52 ` [PATCH 2/4] datagram: mem_scheudle functions Hideo AOKI
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Hideo AOKI @ 2007-11-28 18:52 UTC (permalink / raw)
  To: Herbert Xu, netdev, haoki
  Cc: David Miller, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita

This patch introduces sndbuf size check before memory allocation for
send buffer.

signed-off-by: Satoshi Oshima <satoshi.oshima.fk@hitachi.com>
signed-off-by: Hideo Aoki <haoki@redhat.com>
---

 ip_output.c |    5 +++++
 1 file changed, 5 insertions(+)

diff -pruN net-2.6/net/ipv4/ip_output.c net-2.6-udp-take9a2-p1/net/ipv4/ip_output.c
--- net-2.6/net/ipv4/ip_output.c	2007-11-14 10:49:06.000000000 -0500
+++ net-2.6-udp-take9a2-p1/net/ipv4/ip_output.c	2007-11-27 11:11:37.000000000 -0500
@@ -1004,6 +1004,11 @@ alloc_new_skb:
 					frag = &skb_shinfo(skb)->frags[i];
 				}
 			} else if (i < MAX_SKB_FRAGS) {
+				if (atomic_read(&sk->sk_wmem_alloc) + PAGE_SIZE
+				    > 2 * sk->sk_sndbuf) {
+					err = -ENOBUFS;
+					goto error;
+				}
 				if (copy > PAGE_SIZE)
 					copy = PAGE_SIZE;
 				page = alloc_pages(sk->sk_allocation, 0);
-- 
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/4] datagram: mem_scheudle functions
  2007-11-28 18:48 [PATCH 0/4] UDP memory accounting and limitation (take 9) Hideo AOKI
  2007-11-28 18:52 ` [PATCH 1/4] udp: fix send buffer check Hideo AOKI
@ 2007-11-28 18:52 ` Hideo AOKI
  2007-12-01 12:09   ` Herbert Xu
  2007-11-28 18:53 ` [PATCH 3/4] udp: add udp_mem, udp_rmem_min and udp_wmem_min Hideo AOKI
  2007-11-28 18:53 ` [PATCH 4/4] udp: memory accounting in IPv4 Hideo AOKI
  3 siblings, 1 reply; 17+ messages in thread
From: Hideo AOKI @ 2007-11-28 18:52 UTC (permalink / raw)
  To: Herbert Xu, netdev
  Cc: David Miller, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita,
	haoki

This patch introduces datagram memory accounting functions. Owing to
call memory schedule functions from IP layer, sk_wmem_schedule() is
also added.

Cc: Satoshi Oshima <satoshi.oshima.fk@hitachi.com>
signed-off-by: Hideo Aoki <haoki@redhat.com>
---

 include/net/sock.h  |   31 +++++++++++++++++++++++++++++++
 net/core/datagram.c |   34 ++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff -pruN net-2.6-udp-take9a2-p1/include/net/sock.h net-2.6-udp-take9a2-p2/include/net/sock.h
--- net-2.6-udp-take9a2-p1/include/net/sock.h	2007-11-20 10:29:40.000000000 -0500
+++ net-2.6-udp-take9a2-p2/include/net/sock.h	2007-11-27 11:11:38.000000000 -0500
@@ -778,6 +778,37 @@ static inline int sk_stream_wmem_schedul
 	       sk_stream_mem_schedule(sk, size, 0);
 }

+extern int sk_datagram_mem_schedule(struct sock *sk, int size, int kind);
+
+#define SK_DATAGRAM_MEM_QUANTUM ((unsigned int)PAGE_SIZE)
+
+static inline int sk_datagram_pages(int amt)
+{
+	/* Cast to unsigned as an optimization, since amt is always positive. */
+	return DIV_ROUND_UP((unsigned int)amt, SK_DATAGRAM_MEM_QUANTUM);
+}
+
+static inline int sk_datagram_rmem_schedule(struct sock *sk,
+					    struct sk_buff *skb)
+{
+	return sk_datagram_mem_schedule(sk, skb->truesize, 1);
+}
+
+static inline int sk_datagram_wmem_schedule(struct sock *sk, int size)
+{
+	return sk_datagram_mem_schedule(sk, size, 0);
+}
+
+static inline int sk_wmem_schedule(struct sock *sk, int size)
+{
+	if (sk->sk_type == SOCK_STREAM)
+		return sk_stream_wmem_schedule(sk, size);
+	else if (sk->sk_type == SOCK_DGRAM)
+		return sk_datagram_wmem_schedule(sk, size);
+	else
+		return 1;
+}
+
 /* Used by processes to "lock" a socket state, so that
  * interrupts and bottom half handlers won't change it
  * from under us. It essentially blocks any incoming
diff -pruN net-2.6-udp-take9a2-p1/net/core/datagram.c net-2.6-udp-take9a2-p2/net/core/datagram.c
--- net-2.6-udp-take9a2-p1/net/core/datagram.c	2007-11-14 10:49:06.000000000 -0500
+++ net-2.6-udp-take9a2-p2/net/core/datagram.c	2007-11-28 12:10:05.000000000 -0500
@@ -484,6 +484,39 @@ fault:
 }

 /**
+ * 	sk_datagram_mem_schedule - memory accounting for datagram protocls
+ *	@sk: socket
+ *	@size: memory size to allocate
+ *	@kind: allocation type
+ *
+ *	If kind is 0, it means wmem allocation. Otherwise it means rmem
+ *	allocation.
+ */
+int sk_datagram_mem_schedule(struct sock *sk, int size, int kind)
+{
+	int amt;
+	struct proto *prot = sk->sk_prot;
+
+	/* Don't account and limit memory if protocol doesn't support. */
+	if (prot->memory_allocated == NULL)
+		return 1;
+
+	amt = sk_datagram_pages(size);
+	atomic_add(amt, prot->memory_allocated);
+	if (kind &&
+	    (atomic_read(prot->memory_allocated) < prot->sysctl_mem[0] ||
+	     atomic_read(&sk->sk_rmem_alloc) + size < prot->sysctl_rmem[0]))
+		return 1;
+	else if (atomic_read(prot->memory_allocated) < prot->sysctl_mem[0] ||
+		 atomic_read(&sk->sk_wmem_alloc) + size < prot->sysctl_wmem[0])
+		return 1;
+
+	/* Undo changes. */
+	atomic_sub(amt, prot->memory_allocated);
+	return 0;
+}
+
+/**
  * 	datagram_poll - generic datagram poll
  *	@file: file struct
  *	@sock: socket
@@ -542,3 +575,4 @@ EXPORT_SYMBOL(skb_copy_and_csum_datagram
 EXPORT_SYMBOL(skb_copy_datagram_iovec);
 EXPORT_SYMBOL(skb_free_datagram);
 EXPORT_SYMBOL(skb_recv_datagram);
+EXPORT_SYMBOL(sk_datagram_mem_schedule);
-- 
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 3/4] udp: add udp_mem, udp_rmem_min and udp_wmem_min
  2007-11-28 18:48 [PATCH 0/4] UDP memory accounting and limitation (take 9) Hideo AOKI
  2007-11-28 18:52 ` [PATCH 1/4] udp: fix send buffer check Hideo AOKI
  2007-11-28 18:52 ` [PATCH 2/4] datagram: mem_scheudle functions Hideo AOKI
@ 2007-11-28 18:53 ` Hideo AOKI
  2007-11-28 18:53 ` [PATCH 4/4] udp: memory accounting in IPv4 Hideo AOKI
  3 siblings, 0 replies; 17+ messages in thread
From: Hideo AOKI @ 2007-11-28 18:53 UTC (permalink / raw)
  To: Herbert Xu, netdev
  Cc: David Miller, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita,
	haoki

This patch adds sysctl parameters for customizing UDP memory accounting:
     /proc/sys/net/ipv4/udp_mem
     /proc/sys/net/ipv4/udp_rmem_min
     /proc/sys/net/ipv4/udp_wmem_min

Udp_mem indicates number of pages which can be used for all UDP sockets.
Each UDP packet is dropped, when the number of pages for socket buffer is
beyond udp_mem and the socket already consumes minimum buffer.

This patch is also introduced memory_allocated variable for UDP protocol.

Cc: Satoshi Oshima <satoshi.oshima.fk@hitachi.com>
signed-off-by: Hideo Aoki <haoki@redhat.com>
---

 Documentation/networking/ip-sysctl.txt |   18 ++++++++++++++++++
 include/net/udp.h                      |    9 +++++++++
 net/ipv4/af_inet.c                     |    3 +++
 net/ipv4/proc.c                        |    3 ++-
 net/ipv4/sysctl_net_ipv4.c             |   31 +++++++++++++++++++++++++++++++
 net/ipv4/udp.c                         |   27 +++++++++++++++++++++++++++
 6 files changed, 90 insertions(+), 1 deletion(-)

diff -pruN net-2.6-udp-take9a2-p2/Documentation/networking/ip-sysctl.txt net-2.6-udp-take9a2-p3/Documentation/networking/ip-sysctl.txt
--- net-2.6-udp-take9a2-p2/Documentation/networking/ip-sysctl.txt	2007-11-14 10:48:49.000000000 -0500
+++ net-2.6-udp-take9a2-p3/Documentation/networking/ip-sysctl.txt	2007-11-28 12:11:02.000000000 -0500
@@ -446,6 +446,24 @@ tcp_dma_copybreak - INTEGER
 	and CONFIG_NET_DMA is enabled.
 	Default: 4096

+UDP variables:
+
+udp_mem - INTEGER
+	Number of pages allowed for queueing by all UDP sockets.
+	Default is calculated at boot time from amount of available memory.
+
+udp_rmem_min - INTEGER
+	Minimal size of receive buffer used by UDP sockets. Each UDP socket
+	is able to use the size for receiving data, even if total pages of UDP
+	sockets exceed udp_mem. The unit is byte.
+	Default: 4096
+
+udp_wmem_min - INTEGER
+	Minimal size of send buffer used by UDP sockets. Each UDP socket is
+	able to use the size for sending data, even if total pages of UDP
+	sockets exceed udp_mem. The unit is byte.
+	Default: 4096
+
 CIPSOv4 Variables:

 cipso_cache_enable - BOOLEAN
diff -pruN net-2.6-udp-take9a2-p2/include/net/udp.h net-2.6-udp-take9a2-p3/include/net/udp.h
--- net-2.6-udp-take9a2-p2/include/net/udp.h	2007-11-14 10:49:05.000000000 -0500
+++ net-2.6-udp-take9a2-p3/include/net/udp.h	2007-11-28 12:11:02.000000000 -0500
@@ -65,6 +65,13 @@ extern rwlock_t udp_hash_lock;

 extern struct proto udp_prot;

+extern atomic_t udp_memory_allocated;
+
+/* sysctl variables for udp */
+extern int sysctl_udp_mem;
+extern int sysctl_udp_rmem_min;
+extern int sysctl_udp_wmem_min;
+
 struct sk_buff;

 /*
@@ -173,4 +180,6 @@ extern void udp_proc_unregister(struct u
 extern int  udp4_proc_init(void);
 extern void udp4_proc_exit(void);
 #endif
+
+extern void udp_init(void);
 #endif	/* _UDP_H */
diff -pruN net-2.6-udp-take9a2-p2/net/ipv4/af_inet.c net-2.6-udp-take9a2-p3/net/ipv4/af_inet.c
--- net-2.6-udp-take9a2-p2/net/ipv4/af_inet.c	2007-11-14 10:49:06.000000000 -0500
+++ net-2.6-udp-take9a2-p3/net/ipv4/af_inet.c	2007-11-28 12:11:02.000000000 -0500
@@ -1418,6 +1418,9 @@ static int __init inet_init(void)
 	/* Setup TCP slab cache for open requests. */
 	tcp_init();

+	/* Setup UDP memory threshold */
+	udp_init();
+
 	/* Add UDP-Lite (RFC 3828) */
 	udplite4_register();

diff -pruN net-2.6-udp-take9a2-p2/net/ipv4/proc.c net-2.6-udp-take9a2-p3/net/ipv4/proc.c
--- net-2.6-udp-take9a2-p2/net/ipv4/proc.c	2007-11-14 10:49:07.000000000 -0500
+++ net-2.6-udp-take9a2-p3/net/ipv4/proc.c	2007-11-28 12:11:02.000000000 -0500
@@ -56,7 +56,8 @@ static int sockstat_seq_show(struct seq_
 		   sock_prot_inuse(&tcp_prot), atomic_read(&tcp_orphan_count),
 		   tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
 		   atomic_read(&tcp_memory_allocated));
-	seq_printf(seq, "UDP: inuse %d\n", sock_prot_inuse(&udp_prot));
+	seq_printf(seq, "UDP: inuse %d mem %d\n", sock_prot_inuse(&udp_prot),
+		   atomic_read(&udp_memory_allocated));
 	seq_printf(seq, "UDPLITE: inuse %d\n", sock_prot_inuse(&udplite_prot));
 	seq_printf(seq, "RAW: inuse %d\n", sock_prot_inuse(&raw_prot));
 	seq_printf(seq,  "FRAG: inuse %d memory %d\n",
diff -pruN net-2.6-udp-take9a2-p2/net/ipv4/sysctl_net_ipv4.c net-2.6-udp-take9a2-p3/net/ipv4/sysctl_net_ipv4.c
--- net-2.6-udp-take9a2-p2/net/ipv4/sysctl_net_ipv4.c	2007-11-20 10:29:40.000000000 -0500
+++ net-2.6-udp-take9a2-p3/net/ipv4/sysctl_net_ipv4.c	2007-11-28 12:11:02.000000000 -0500
@@ -18,6 +18,7 @@
 #include <net/ip.h>
 #include <net/route.h>
 #include <net/tcp.h>
+#include <net/udp.h>
 #include <net/cipso_ipv4.h>
 #include <net/inet_frag.h>

@@ -885,6 +886,36 @@ ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
 	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "udp_mem",
+		.data		= &sysctl_udp_mem,
+		.maxlen		= sizeof(sysctl_udp_mem),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero
+	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "udp_rmem_min",
+		.data		= &sysctl_udp_rmem_min,
+		.maxlen		= sizeof(sysctl_udp_rmem_min),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero
+	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "udp_wmem_min",
+		.data		= &sysctl_udp_wmem_min,
+		.maxlen		= sizeof(sysctl_udp_wmem_min),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero
+	},
 	{ .ctl_name = 0 }
 };

diff -pruN net-2.6-udp-take9a2-p2/net/ipv4/udp.c net-2.6-udp-take9a2-p3/net/ipv4/udp.c
--- net-2.6-udp-take9a2-p2/net/ipv4/udp.c	2007-11-14 10:49:07.000000000 -0500
+++ net-2.6-udp-take9a2-p3/net/ipv4/udp.c	2007-11-28 12:11:02.000000000 -0500
@@ -82,6 +82,7 @@
 #include <asm/system.h>
 #include <asm/uaccess.h>
 #include <asm/ioctls.h>
+#include <linux/bootmem.h>
 #include <linux/types.h>
 #include <linux/fcntl.h>
 #include <linux/module.h>
@@ -114,6 +115,11 @@ DEFINE_SNMP_STAT(struct udp_mib, udp_sta
 struct hlist_head udp_hash[UDP_HTABLE_SIZE];
 DEFINE_RWLOCK(udp_hash_lock);

+atomic_t udp_memory_allocated;
+int sysctl_udp_mem __read_mostly;
+int sysctl_udp_rmem_min __read_mostly;
+int sysctl_udp_wmem_min __read_mostly;
+
 static inline int __udp_lib_lport_inuse(__u16 num,
 					const struct hlist_head udptable[])
 {
@@ -1449,6 +1455,10 @@ struct proto udp_prot = {
 	.hash		   = udp_lib_hash,
 	.unhash		   = udp_lib_unhash,
 	.get_port	   = udp_v4_get_port,
+	.memory_allocated  = &udp_memory_allocated,
+	.sysctl_mem	   = &sysctl_udp_mem,
+	.sysctl_wmem	   = &sysctl_udp_wmem_min,
+	.sysctl_rmem	   = &sysctl_udp_rmem_min,
 	.obj_size	   = sizeof(struct udp_sock),
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_udp_setsockopt,
@@ -1644,6 +1654,23 @@ void udp4_proc_exit(void)
 }
 #endif /* CONFIG_PROC_FS */

+void __init udp_init(void)
+{
+	unsigned long limit;
+
+	/* Set the pressure threshold up by the same strategy of TCP. It is a
+	 * fraction of global memory that is up to 1/2 at 256 MB, decreasing
+	 * toward zero with the amount of memory, with a floor of 128 pages.
+	 */
+	limit = min(nr_all_pages, 1UL<<(28-PAGE_SHIFT)) >> (20-PAGE_SHIFT);
+	limit = (limit * (nr_all_pages >> (20-PAGE_SHIFT))) >> (PAGE_SHIFT-11);
+	limit = max(limit, 128UL);
+	sysctl_udp_mem = limit / 2 * 3;
+
+	sysctl_udp_rmem_min = SK_DATAGRAM_MEM_QUANTUM;
+	sysctl_udp_wmem_min = SK_DATAGRAM_MEM_QUANTUM;
+}
+
 EXPORT_SYMBOL(udp_disconnect);
 EXPORT_SYMBOL(udp_hash);
 EXPORT_SYMBOL(udp_hash_lock);
-- 
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 4/4] udp: memory accounting in IPv4
  2007-11-28 18:48 [PATCH 0/4] UDP memory accounting and limitation (take 9) Hideo AOKI
                   ` (2 preceding siblings ...)
  2007-11-28 18:53 ` [PATCH 3/4] udp: add udp_mem, udp_rmem_min and udp_wmem_min Hideo AOKI
@ 2007-11-28 18:53 ` Hideo AOKI
  2007-12-01 12:21   ` Herbert Xu
  3 siblings, 1 reply; 17+ messages in thread
From: Hideo AOKI @ 2007-11-28 18:53 UTC (permalink / raw)
  To: Herbert Xu, netdev
  Cc: David Miller, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita,
	haoki

This patch adds UDP memory usage accounting in IPv4.

Cc: Satoshi Oshima <satoshi.oshima.fk@hitachi.com>
signed-off-by: Hideo Aoki <haoki@redhat.com>
---

 af_inet.c   |   30 +++++++++++++++++++++++++++++-
 ip_output.c |   49 ++++++++++++++++++++++++++++++++++++++++++-------
 udp.c       |   16 ++++++++++++++++
 3 files changed, 87 insertions(+), 8 deletions(-)

diff -pruN net-2.6-udp-take9a2-p3/net/ipv4/af_inet.c net-2.6-udp-take9a2-p4/net/ipv4/af_inet.c
--- net-2.6-udp-take9a2-p3/net/ipv4/af_inet.c	2007-11-28 12:11:02.000000000 -0500
+++ net-2.6-udp-take9a2-p4/net/ipv4/af_inet.c	2007-11-28 12:11:04.000000000 -0500
@@ -126,13 +126,41 @@ extern void ip_mc_drop_socket(struct soc
 static struct list_head inetsw[SOCK_MAX];
 static DEFINE_SPINLOCK(inetsw_lock);

+/**
+ *	__skb_queue_purge_and_sub_memory_allocated
+ *		- empty a list and subtruct memory allocation counter
+ *	@sk:   sk
+ *	@list: list to empty
+ *	Delete all buffers on an &sk_buff list and subtruct the
+ *	truesize of the sk_buff for memory accounting. Each buffer
+ *	is removed from the list and one reference dropped. This
+ *	function does not take the list lock and the caller must
+ *	hold the relevant locks to use it.
+ */
+static inline void __skb_queue_purge_and_sub_memory_allocated(struct sock *sk,
+					struct sk_buff_head *list)
+{
+	struct sk_buff *skb;
+	int purged_skb_size = 0;
+	while ((skb = __skb_dequeue(list)) != NULL) {
+		purged_skb_size += sk_datagram_pages(skb->truesize);
+		kfree_skb(skb);
+	}
+	atomic_sub(purged_skb_size, sk->sk_prot->memory_allocated);
+}
+
 /* New destruction routine */

 void inet_sock_destruct(struct sock *sk)
 {
 	struct inet_sock *inet = inet_sk(sk);

-	__skb_queue_purge(&sk->sk_receive_queue);
+	if (sk->sk_prot->memory_allocated && sk->sk_type != SOCK_STREAM)
+		__skb_queue_purge_and_sub_memory_allocated(sk,
+				&sk->sk_receive_queue);
+	else
+		__skb_queue_purge(&sk->sk_receive_queue);
+
 	__skb_queue_purge(&sk->sk_error_queue);

 	if (sk->sk_type == SOCK_STREAM && sk->sk_state != TCP_CLOSE) {
diff -pruN net-2.6-udp-take9a2-p3/net/ipv4/ip_output.c net-2.6-udp-take9a2-p4/net/ipv4/ip_output.c
--- net-2.6-udp-take9a2-p3/net/ipv4/ip_output.c	2007-11-27 11:11:37.000000000 -0500
+++ net-2.6-udp-take9a2-p4/net/ipv4/ip_output.c	2007-11-28 12:11:09.000000000 -0500
@@ -75,6 +75,7 @@
 #include <net/icmp.h>
 #include <net/checksum.h>
 #include <net/inetpeer.h>
+#include <net/udp.h>
 #include <linux/igmp.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter_bridge.h>
@@ -707,16 +708,19 @@ static inline int ip_ufo_append_data(str
 {
 	struct sk_buff *skb;
 	int err;
+	int size = 0;

 	/* There is support for UDP fragmentation offload by network
 	 * device, so create one single skb packet containing complete
 	 * udp datagram
 	 */
 	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
-		skb = sock_alloc_send_skb(sk,
-			hh_len + fragheaderlen + transhdrlen + 20,
-			(flags & MSG_DONTWAIT), &err);
+		size = hh_len + fragheaderlen + transhdrlen + 20;
+		if (!sk_wmem_schedule(sk, size))
+			return -ENOBUFS;

+		skb = sock_alloc_send_skb(sk, size, (flags & MSG_DONTWAIT),
+					  &err);
 		if (skb == NULL)
 			return err;

@@ -737,8 +741,12 @@ static inline int ip_ufo_append_data(str
 		sk->sk_sndmsg_off = 0;
 	}

-	err = skb_append_datato_frags(sk,skb, getfrag, from,
-			       (length - transhdrlen));
+	size = length - transhdrlen;
+	if (!sk_wmem_schedule(sk, size)) {
+		err = -ENOBUFS;
+		goto fail;
+	}
+	err = skb_append_datato_frags(sk, skb, getfrag, from, size);
 	if (!err) {
 		/* specify the length of each IP datagram fragment*/
 		skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
@@ -750,6 +758,7 @@ static inline int ip_ufo_append_data(str
 	/* There is not enough support do UFO ,
 	 * so follow normal path
 	 */
+fail:
 	kfree_skb(skb);
 	return err;
 }
@@ -908,6 +917,12 @@ alloc_new_skb:
 			if (datalen == length + fraggap)
 				alloclen += rt->u.dst.trailer_len;

+			if (!sk_wmem_schedule(sk, alloclen + hh_len + 15 +
+					      sizeof(struct sk_buff))) {
+				err = -ENOBUFS;
+				goto error;
+			}
+
 			if (transhdrlen) {
 				skb = sock_alloc_send_skb(sk,
 						alloclen + hh_len + 15,
@@ -1004,6 +1019,10 @@ alloc_new_skb:
 					frag = &skb_shinfo(skb)->frags[i];
 				}
 			} else if (i < MAX_SKB_FRAGS) {
+				if (!sk_wmem_schedule(sk, PAGE_SIZE)) {
+					err = -ENOBUFS;
+					goto error;
+				}
 				if (atomic_read(&sk->sk_wmem_alloc) + PAGE_SIZE
 				    > 2 * sk->sk_sndbuf) {
 					err = -ENOBUFS;
@@ -1119,6 +1138,12 @@ ssize_t	ip_append_page(struct sock *sk,
 			fraggap = skb_prev->len - maxfraglen;

 			alloclen = fragheaderlen + hh_len + fraggap + 15;
+
+			if (!sk_wmem_schedule(sk, alloclen +
+					      sizeof(struct sk_buff))) {
+				err = -ENOBUFS;
+				goto error;
+			}
 			skb = sock_wmalloc(sk, alloclen, 1, sk->sk_allocation);
 			if (unlikely(!skb)) {
 				err = -ENOBUFS;
@@ -1213,13 +1238,14 @@ int ip_push_pending_frames(struct sock *
 	struct iphdr *iph;
 	__be16 df = 0;
 	__u8 ttl;
-	int err = 0;
+	int err = 0, send_page_size;

 	if ((skb = __skb_dequeue(&sk->sk_write_queue)) == NULL)
 		goto out;
 	tail_skb = &(skb_shinfo(skb)->frag_list);

 	/* move skb->data to ip header from ext header */
+	send_page_size = sk_datagram_pages(skb->truesize);
 	if (skb->data < skb_network_header(skb))
 		__skb_pull(skb, skb_network_offset(skb));
 	while ((tmp_skb = __skb_dequeue(&sk->sk_write_queue)) != NULL) {
@@ -1229,6 +1255,7 @@ int ip_push_pending_frames(struct sock *
 		skb->len += tmp_skb->len;
 		skb->data_len += tmp_skb->len;
 		skb->truesize += tmp_skb->truesize;
+		send_page_size += sk_datagram_pages(tmp_skb->truesize);
 		__sock_put(tmp_skb->sk);
 		tmp_skb->destructor = NULL;
 		tmp_skb->sk = NULL;
@@ -1284,6 +1311,8 @@ int ip_push_pending_frames(struct sock *
 	/* Netfilter gets whole the not fragmented skb. */
 	err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL,
 		      skb->dst->dev, dst_output);
+	if (sk->sk_prot->memory_allocated)
+		atomic_sub(send_page_size, sk->sk_prot->memory_allocated);
 	if (err) {
 		if (err > 0)
 			err = inet->recverr ? net_xmit_errno(err) : 0;
@@ -1306,9 +1335,15 @@ error:
 void ip_flush_pending_frames(struct sock *sk)
 {
 	struct sk_buff *skb;
+	int num_flush_mem = 0;

-	while ((skb = __skb_dequeue_tail(&sk->sk_write_queue)) != NULL)
+	while ((skb = __skb_dequeue_tail(&sk->sk_write_queue)) != NULL) {
+		num_flush_mem += sk_datagram_pages(skb->truesize);
 		kfree_skb(skb);
+	}
+
+	if (sk->sk_prot->memory_allocated)
+		atomic_sub(num_flush_mem, sk->sk_prot->memory_allocated);

 	ip_cork_release(inet_sk(sk));
 }
diff -pruN net-2.6-udp-take9a2-p3/net/ipv4/udp.c net-2.6-udp-take9a2-p4/net/ipv4/udp.c
--- net-2.6-udp-take9a2-p3/net/ipv4/udp.c	2007-11-28 12:11:02.000000000 -0500
+++ net-2.6-udp-take9a2-p4/net/ipv4/udp.c	2007-11-28 12:11:09.000000000 -0500
@@ -833,6 +833,7 @@ int udp_recvmsg(struct kiocb *iocb, stru
 	unsigned int ulen, copied;
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
+	int truesize;

 	/*
 	 *	Check any passed addresses
@@ -897,14 +898,18 @@ try_again:
 		err = ulen;

 out_free:
+	truesize = skb->truesize;
 	skb_free_datagram(sk, skb);
+	atomic_sub(sk_datagram_pages(truesize), sk->sk_prot->memory_allocated);
 out:
 	return err;

 csum_copy_err:
 	UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);

+	truesize = skb->truesize;
 	skb_kill_datagram(sk, skb, flags);
+	atomic_sub(sk_datagram_pages(truesize), sk->sk_prot->memory_allocated);

 	if (noblock)
 		return -EAGAIN;
@@ -946,6 +951,7 @@ int udp_queue_rcv_skb(struct sock * sk,
 {
 	struct udp_sock *up = udp_sk(sk);
 	int rc;
+	int scheduled = 0;

 	/*
 	 *	Charge it to the socket, dropping if the queue is full.
@@ -1022,6 +1028,13 @@ int udp_queue_rcv_skb(struct sock * sk,
 			goto drop;
 	}

+	if (sk_datagram_rmem_schedule(sk, skb))
+		scheduled = skb->truesize;
+	else {
+		UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS, up->pcflag);
+		goto drop;
+	}
+
 	if ((rc = sock_queue_rcv_skb(sk,skb)) < 0) {
 		/* Note that an ENOMEM error is charged twice */
 		if (rc == -ENOMEM)
@@ -1035,6 +1048,9 @@ int udp_queue_rcv_skb(struct sock * sk,
 drop:
 	UDP_INC_STATS_BH(UDP_MIB_INERRORS, up->pcflag);
 	kfree_skb(skb);
+	if (scheduled)
+		atomic_sub(sk_datagram_pages(scheduled),
+			   sk->sk_prot->memory_allocated);
 	return -1;
 }

-- 
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] datagram: mem_scheudle functions
  2007-11-28 18:52 ` [PATCH 2/4] datagram: mem_scheudle functions Hideo AOKI
@ 2007-12-01 12:09   ` Herbert Xu
  2007-12-04  0:10     ` Hideo AOKI
  0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2007-12-01 12:09 UTC (permalink / raw)
  To: Hideo AOKI
  Cc: netdev, David Miller, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita

On Wed, Nov 28, 2007 at 01:52:59PM -0500, Hideo AOKI wrote:
>
> +static inline int sk_wmem_schedule(struct sock *sk, int size)
> +{
> +	if (sk->sk_type == SOCK_STREAM)
> +		return sk_stream_wmem_schedule(sk, size);
> +	else if (sk->sk_type == SOCK_DGRAM)
> +		return sk_datagram_wmem_schedule(sk, size);
> +	else
> +		return 1;
> +}

Why do we need this function? As far as I can see we always know
whether it's a stream or datagram socket at compile time so doing
a run-time test is pointless.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] udp: memory accounting in IPv4
  2007-11-28 18:53 ` [PATCH 4/4] udp: memory accounting in IPv4 Hideo AOKI
@ 2007-12-01 12:21   ` Herbert Xu
  2007-12-01 13:08     ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2007-12-01 12:21 UTC (permalink / raw)
  To: Hideo AOKI
  Cc: netdev, David Miller, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita

On Wed, Nov 28, 2007 at 01:53:36PM -0500, Hideo AOKI wrote:
>
> +/**
> + *	__skb_queue_purge_and_sub_memory_allocated
> + *		- empty a list and subtruct memory allocation counter
> + *	@sk:   sk
> + *	@list: list to empty
> + *	Delete all buffers on an &sk_buff list and subtruct the
> + *	truesize of the sk_buff for memory accounting. Each buffer
> + *	is removed from the list and one reference dropped. This
> + *	function does not take the list lock and the caller must
> + *	hold the relevant locks to use it.
> + */
> +static inline void __skb_queue_purge_and_sub_memory_allocated(struct sock *sk,
> +					struct sk_buff_head *list)
> +{
> +	struct sk_buff *skb;
> +	int purged_skb_size = 0;
> +	while ((skb = __skb_dequeue(list)) != NULL) {
> +		purged_skb_size += sk_datagram_pages(skb->truesize);
> +		kfree_skb(skb);
> +	}
> +	atomic_sub(purged_skb_size, sk->sk_prot->memory_allocated);
> +}

Thanks, this is a lot better than before!

However, I'm still a little concerned about the effect of two more
atomic op's per packet that we're adding here.  Hang on a sec, that
should've been Dave's line since atomic ops are cheap on x86 :)

But seriously, it's not so much that we have two more atomic op's
per packet, but we have two more writes to a single global counter
for each packet.  This is going to really suck on SMP.

So what I'd like to see is a scheme that's similar to sk_forward_alloc.
The idea is that each socket allocates memory using mem_schedule and
then stores it in sk_forward_alloc.  Each packet then only has to
add to/subtract from sk_forward_alloc.

There is one big problem with this though, UDP is not serialised like
TCP.  So you can't just use sk_forward_alloc since it's not an atomic_t.

We'll need to think about this one a bit more.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] udp: memory accounting in IPv4
  2007-12-01 12:21   ` Herbert Xu
@ 2007-12-01 13:08     ` Eric Dumazet
  2007-12-01 13:16       ` Herbert Xu
  2007-12-04  0:14       ` Hideo AOKI
  0 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2007-12-01 13:08 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Hideo AOKI, netdev, David Miller, Satoshi Oshima, Bill Fink,
	Andi Kleen, Evgeniy Polyakov, Stephen Hemminger, yoshfuji,
	Yumiko Sugita

Herbert Xu a écrit :
> On Wed, Nov 28, 2007 at 01:53:36PM -0500, Hideo AOKI wrote:
>> +/**
>> + *	__skb_queue_purge_and_sub_memory_allocated
>> + *		- empty a list and subtruct memory allocation counter
>> + *	@sk:   sk
>> + *	@list: list to empty
>> + *	Delete all buffers on an &sk_buff list and subtruct the
>> + *	truesize of the sk_buff for memory accounting. Each buffer
>> + *	is removed from the list and one reference dropped. This
>> + *	function does not take the list lock and the caller must
>> + *	hold the relevant locks to use it.
>> + */
>> +static inline void __skb_queue_purge_and_sub_memory_allocated(struct sock *sk,
>> +					struct sk_buff_head *list)
>> +{
>> +	struct sk_buff *skb;
>> +	int purged_skb_size = 0;
>> +	while ((skb = __skb_dequeue(list)) != NULL) {
>> +		purged_skb_size += sk_datagram_pages(skb->truesize);
>> +		kfree_skb(skb);
>> +	}
>> +	atomic_sub(purged_skb_size, sk->sk_prot->memory_allocated);
>> +}
> 
> Thanks, this is a lot better than before!
> 
> However, I'm still a little concerned about the effect of two more
> atomic op's per packet that we're adding here.  Hang on a sec, that
> should've been Dave's line since atomic ops are cheap on x86 :)
> 
> But seriously, it's not so much that we have two more atomic op's
> per packet, but we have two more writes to a single global counter
> for each packet.  This is going to really suck on SMP.
> 
> So what I'd like to see is a scheme that's similar to sk_forward_alloc.
> The idea is that each socket allocates memory using mem_schedule and
> then stores it in sk_forward_alloc.  Each packet then only has to
> add to/subtract from sk_forward_alloc.
> 
> There is one big problem with this though, UDP is not serialised like
> TCP.  So you can't just use sk_forward_alloc since it's not an atomic_t.
> 
> We'll need to think about this one a bit more.

I agree adding yet another atomics ops is a big problem.

Another idea, coupled with recent work on percpu storage done by Christoph 
Lameter, would be to use kind of a percpu_counter :

We dont really need strong and precise memory accounting (UDP , but TCP as 
well), just some kind of limit to avoid memory to be too much used.

That is, updating a percpu variable, and doing some updates to a global 
counter only when this percpu variable escapes from a given range.

Lot of contended cache lines could benefit from this relaxing (count of 
sockets...)

I would wait first that Christoph work is done, so that we dont need atomic 
ops on local cpu storage (and no need to disable preemption too).


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] udp: memory accounting in IPv4
  2007-12-01 13:08     ` Eric Dumazet
@ 2007-12-01 13:16       ` Herbert Xu
  2007-12-04  0:14       ` Hideo AOKI
  1 sibling, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2007-12-01 13:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hideo AOKI, netdev, David Miller, Satoshi Oshima, Bill Fink,
	Andi Kleen, Evgeniy Polyakov, Stephen Hemminger, yoshfuji,
	Yumiko Sugita

On Sat, Dec 01, 2007 at 02:08:31PM +0100, Eric Dumazet wrote:
> 
> I agree adding yet another atomics ops is a big problem.
> 
> Another idea, coupled with recent work on percpu storage done by Christoph 
> Lameter, would be to use kind of a percpu_counter :

Yes that's an interesting idea.

> We dont really need strong and precise memory accounting (UDP , but TCP as 
> well), just some kind of limit to avoid memory to be too much used.

BTW it's no big deal for TCP because it's completely serialised so it
doesn't use atomic ops for the accounting.  More importantly, it uses
sk_forward_alloc so not every packet needs to touch the global counter.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] datagram: mem_scheudle functions
  2007-12-01 12:09   ` Herbert Xu
@ 2007-12-04  0:10     ` Hideo AOKI
  2007-12-15 14:45       ` Herbert Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Hideo AOKI @ 2007-12-04  0:10 UTC (permalink / raw)
  To: Herbert Xu
  Cc: netdev, David Miller, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita

Herbert Xu wrote:
> On Wed, Nov 28, 2007 at 01:52:59PM -0500, Hideo AOKI wrote:
>> +static inline int sk_wmem_schedule(struct sock *sk, int size)
>> +{
>> +	if (sk->sk_type == SOCK_STREAM)
>> +		return sk_stream_wmem_schedule(sk, size);
>> +	else if (sk->sk_type == SOCK_DGRAM)
>> +		return sk_datagram_wmem_schedule(sk, size);
>> +	else
>> +		return 1;
>> +}
> 
> Why do we need this function? As far as I can see we always know
> whether it's a stream or datagram socket at compile time so doing
> a run-time test is pointless.

Because we have to call wmem_schedule function in ip_append_data()
which is used by several protocols both stream and datagram.
I just thought adding the sk_wmem_schedule() was only way to call
proper function from ip_append_data().

Please let me know if I misunderstand or there is better way to
call wmem_schedule functions.

Best regards,
Hideo

-- 
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] udp: memory accounting in IPv4
  2007-12-01 13:08     ` Eric Dumazet
  2007-12-01 13:16       ` Herbert Xu
@ 2007-12-04  0:14       ` Hideo AOKI
  2007-12-04  0:26         ` Herbert Xu
  1 sibling, 1 reply; 17+ messages in thread
From: Hideo AOKI @ 2007-12-04  0:14 UTC (permalink / raw)
  To: Eric Dumazet, Herbert Xu
  Cc: netdev, David Miller, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita,
	haoki

Eric Dumazet wrote:
> Herbert Xu a écrit :
>> However, I'm still a little concerned about the effect of two more
>> atomic op's per packet that we're adding here.  Hang on a sec, that
>> should've been Dave's line since atomic ops are cheap on x86 :)
>>
>> But seriously, it's not so much that we have two more atomic op's
>> per packet, but we have two more writes to a single global counter
>> for each packet.  This is going to really suck on SMP.
>>
>> So what I'd like to see is a scheme that's similar to sk_forward_alloc.
>> The idea is that each socket allocates memory using mem_schedule and
>> then stores it in sk_forward_alloc.  Each packet then only has to
>> add to/subtract from sk_forward_alloc.
>>
>> There is one big problem with this though, UDP is not serialised like
>> TCP.  So you can't just use sk_forward_alloc since it's not an atomic_t.
>>
>> We'll need to think about this one a bit more.
> 
> I agree adding yet another atomics ops is a big problem.
> 
> Another idea, coupled with recent work on percpu storage done by
> Christoph Lameter, would be to use kind of a percpu_counter :
> 
> We dont really need strong and precise memory accounting (UDP , but TCP
> as well), just some kind of limit to avoid memory to be too much used.
> 
> That is, updating a percpu variable, and doing some updates to a global
> counter only when this percpu variable escapes from a given range.
> 
> Lot of contended cache lines could benefit from this relaxing (count of
> sockets...)
> 
> I would wait first that Christoph work is done, so that we dont need
> atomic ops on local cpu storage (and no need to disable preemption too).

Thank you for your comments.
I understood your concern of atomic operations.

Let me try to use sk_forward_alloc at first, while percpu storage
is an interesting idea.

Many thanks,
Hideo

-- 
Hitachi Computer Products (America) Inc.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] udp: memory accounting in IPv4
  2007-12-04  0:14       ` Hideo AOKI
@ 2007-12-04  0:26         ` Herbert Xu
  2007-12-06  4:28           ` Hideo AOKI
  0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2007-12-04  0:26 UTC (permalink / raw)
  To: Hideo AOKI
  Cc: Eric Dumazet, netdev, David Miller, Satoshi Oshima, Bill Fink,
	Andi Kleen, Evgeniy Polyakov, Stephen Hemminger, yoshfuji,
	Yumiko Sugita

On Mon, Dec 03, 2007 at 07:14:26PM -0500, Hideo AOKI wrote:
>
> Let me try to use sk_forward_alloc at first, while percpu storage
> is an interesting idea.

Actually I don't think sk_forward_alloc would work for UDP because
it runs lockless (unlike TCP which is run under a the socket lock).

So it's either going to be the atomic op or per-cpu counters.  For
me the atomic op isn't the issue, it's the SMP cache-line bouncing
that's more important so having something that did atomic ops on a
socket counter which then feeds into the global counter would solve
my concerns.

But let's wait and see what Dave has to say about this too.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] udp: memory accounting in IPv4
  2007-12-04  0:26         ` Herbert Xu
@ 2007-12-06  4:28           ` Hideo AOKI
  2007-12-10  9:22             ` Herbert Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Hideo AOKI @ 2007-12-06  4:28 UTC (permalink / raw)
  To: Herbert Xu, netdev, David Miller
  Cc: Eric Dumazet, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita,
	haoki

Herbert Xu wrote:
> On Mon, Dec 03, 2007 at 07:14:26PM -0500, Hideo AOKI wrote:
>> Let me try to use sk_forward_alloc at first, while percpu storage
>> is an interesting idea.
> 
> Actually I don't think sk_forward_alloc would work for UDP because
> it runs lockless (unlike TCP which is run under a the socket lock).
> 
> So it's either going to be the atomic op or per-cpu counters.  For
> me the atomic op isn't the issue, it's the SMP cache-line bouncing
> that's more important so having something that did atomic ops on a
> socket counter which then feeds into the global counter would solve
> my concerns.
> 
> But let's wait and see what Dave has to say about this too.

Hello,

I suppose that he also wants to have per-socket accounting to avoid
global counter access.

To achieve this, I think I have three ideas at present. I'd
appreciate if you let me know acceptable idea or any suggestions.

1. Using sk_forward_alloc and adding socket lock

   UDP already uses a socket lock to send message. However, it doesn't
   use the lock to receive message. I wonder if we can also use the
   lock when sk_forward_alloc is updated in receive processing.
   I understand performance issue might occur, but ...


2. Adding new atomic_t variable for memory accounting

   Datagram protocols will use the variable for per-socket memory
   accounting. Stream protocols continue to use sk_forward_alloc.


3. Replacing current sk_forward_alloc with union type

   Stream protocols use the union as int, and datagram protocols use
   it as atomic_t.

Best regards,
Hideo

-- 
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] udp: memory accounting in IPv4
  2007-12-06  4:28           ` Hideo AOKI
@ 2007-12-10  9:22             ` Herbert Xu
  2007-12-11  1:28               ` Hideo AOKI
  0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2007-12-10  9:22 UTC (permalink / raw)
  To: Hideo AOKI
  Cc: netdev, David Miller, Eric Dumazet, Satoshi Oshima, Bill Fink,
	Andi Kleen, Evgeniy Polyakov, Stephen Hemminger, yoshfuji,
	Yumiko Sugita

On Wed, Dec 05, 2007 at 11:28:34PM -0500, Hideo AOKI wrote:
>
> 1. Using sk_forward_alloc and adding socket lock
> 
>    UDP already uses a socket lock to send message. However, it doesn't
>    use the lock to receive message. I wonder if we can also use the
>    lock when sk_forward_alloc is updated in receive processing.
>    I understand performance issue might occur, but ...

Having discussed this with Dave we've agreed that this is the
best way to go.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] udp: memory accounting in IPv4
  2007-12-10  9:22             ` Herbert Xu
@ 2007-12-11  1:28               ` Hideo AOKI
  0 siblings, 0 replies; 17+ messages in thread
From: Hideo AOKI @ 2007-12-11  1:28 UTC (permalink / raw)
  To: Herbert Xu, David Miller
  Cc: netdev, Eric Dumazet, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita,
	haoki

Herbert Xu wrote:
> On Wed, Dec 05, 2007 at 11:28:34PM -0500, Hideo AOKI wrote:
>> 1. Using sk_forward_alloc and adding socket lock
>>
>>    UDP already uses a socket lock to send message. However, it doesn't
>>    use the lock to receive message. I wonder if we can also use the
>>    lock when sk_forward_alloc is updated in receive processing.
>>    I understand performance issue might occur, but ...
> 
> Having discussed this with Dave we've agreed that this is the
> best way to go.
> 
> Thanks,

Hello,

Thank you so much for reviewing.

I chose this solution and developed new patch set.
I'm testing the patch set right now.
I'll submit it to netdev as soon as I finish the test.

Best regards,
Hideo

-- 
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] datagram: mem_scheudle functions
  2007-12-04  0:10     ` Hideo AOKI
@ 2007-12-15 14:45       ` Herbert Xu
  2007-12-18 17:02         ` Hideo AOKI
  0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2007-12-15 14:45 UTC (permalink / raw)
  To: Hideo AOKI
  Cc: netdev, David Miller, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita

On Mon, Dec 03, 2007 at 07:10:45PM -0500, Hideo AOKI wrote:
>
> Because we have to call wmem_schedule function in ip_append_data()
> which is used by several protocols both stream and datagram.
> I just thought adding the sk_wmem_schedule() was only way to call
> proper function from ip_append_data().

ip_append_data can't possibly work for stream protocols.  So
which stream protocol is calling it?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] datagram: mem_scheudle functions
  2007-12-15 14:45       ` Herbert Xu
@ 2007-12-18 17:02         ` Hideo AOKI
  0 siblings, 0 replies; 17+ messages in thread
From: Hideo AOKI @ 2007-12-18 17:02 UTC (permalink / raw)
  To: Herbert Xu
  Cc: netdev, David Miller, Satoshi Oshima, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita

Hello,

I really apologize for not replying to this mail sooner.

Herbert Xu wrote:
> On Mon, Dec 03, 2007 at 07:10:45PM -0500, Hideo AOKI wrote:
>> Because we have to call wmem_schedule function in ip_append_data()
>> which is used by several protocols both stream and datagram.
>> I just thought adding the sk_wmem_schedule() was only way to call
>> proper function from ip_append_data().
> 
> ip_append_data can't possibly work for stream protocols.  So
> which stream protocol is calling it?

I think TCP could call ip_append_data(). Here is possible call graph.

tcp_v4_rcv()
  -> tcp_v4_timewait_ack()
       -> tcp_v4_send_ack()
            -> ip_send_reply()
                 -> *ip_append_data()*

tcp_v4_reqsk_send_ack()
  -> tcp_v4_send_ack()
       -> ip_send_reply()
            -> *ip_append_data()*

tcp_v4_do_rcv()
  -> tcp_v4_send_reset()
       -> ip_send_reply()
            -> *ip_append_data()*


Regards,
Hideo

-- 
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2007-12-18 17:04 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-28 18:48 [PATCH 0/4] UDP memory accounting and limitation (take 9) Hideo AOKI
2007-11-28 18:52 ` [PATCH 1/4] udp: fix send buffer check Hideo AOKI
2007-11-28 18:52 ` [PATCH 2/4] datagram: mem_scheudle functions Hideo AOKI
2007-12-01 12:09   ` Herbert Xu
2007-12-04  0:10     ` Hideo AOKI
2007-12-15 14:45       ` Herbert Xu
2007-12-18 17:02         ` Hideo AOKI
2007-11-28 18:53 ` [PATCH 3/4] udp: add udp_mem, udp_rmem_min and udp_wmem_min Hideo AOKI
2007-11-28 18:53 ` [PATCH 4/4] udp: memory accounting in IPv4 Hideo AOKI
2007-12-01 12:21   ` Herbert Xu
2007-12-01 13:08     ` Eric Dumazet
2007-12-01 13:16       ` Herbert Xu
2007-12-04  0:14       ` Hideo AOKI
2007-12-04  0:26         ` Herbert Xu
2007-12-06  4:28           ` Hideo AOKI
2007-12-10  9:22             ` Herbert Xu
2007-12-11  1:28               ` Hideo AOKI

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).