netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] UDP memory accounting and limitation (take 6)
@ 2007-10-29 21:18 Hideo AOKI
  2007-10-29 21:22 ` [PATCH 1/5] fix send buffer check Hideo AOKI
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Hideo AOKI @ 2007-10-29 21:18 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Satoshi Oshima, Herbert Xu, Andi Kleen, Stephen Hemminger,
	Evgeniy Polyakov, yoshfuji, Yumiko Sugita

Hello,

This is the latest patch set of UDP memory accounting and limitation.

The number of pages for socket buffer is limited up to
/proc/sys/net/ipv4/udp_mem. I removed the minimal limit number
to use the feature from the former patch set (take5). And udp_init()
is introduced to calculate default value.

In addition, I added /proc/sys/net/ipv4/udp_rmem and
/proc/sys/net/ipv4/udp_wmem to be able to allocate minimum buffer to
each socket.

As a result, UDP packet is drooped when the number of pages for
socket buffer is beyond the limit and the socket already consumes
minimum buffer.

Detailed change log is below.

Changelog take 5 -> take 6:

  * removed minimal limit of /proc/sys/net/udp_mem
  * added udp_init() for default value calculation of parameters
  * added /proc/sys/net/udp_rmem and /proc/sys/net/udp_rmem
  * added limitation code to ip_ufo_append_data()
  * improved accounting for receiving packet
  * fixed typos
  * rebased to 2.6.24-rc1


Changelog take 4 -> take 5:

  * removing unnessesary EXPORT_SYMBOLs
  * adding minimal limit of /proc/sys/net/udp_mem
  * bugfix of UDP limit affecting protocol other than UDP
  * introducing __ip_check_max_skb_pages()
  * using CTL_UNNUMBERED
  * adding udp_mem usage to Documentation/networking/ip_sysctl.txt


Best regards,
Hideo Aoki

--
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5] fix send buffer check
  2007-10-29 21:18 [PATCH 0/5] UDP memory accounting and limitation (take 6) Hideo AOKI
@ 2007-10-29 21:22 ` Hideo AOKI
  2007-11-09 12:24   ` Herbert Xu
  2007-10-29 21:23 ` [PATCH 2/5] accounting unit and variable Hideo AOKI
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Hideo AOKI @ 2007-10-29 21:22 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Satoshi Oshima, Herbert Xu, Andi Kleen, Stephen Hemminger,
	Evgeniy Polyakov, yoshfuji, Yumiko Sugita

[-- Attachment #1: Type: text/plain, Size: 138 bytes --]

This patch introduces sndbuf size check before memory allocation for
send buffer.

--
Hideo Aoki
Hitachi Computer Products (America) Inc.

[-- Attachment #2: ip_append_data_sndbuf_check.patch --]
[-- Type: text/x-patch, Size: 802 bytes --]


Signed-off-by: Satoshi Oshima <satoshi.oshima.fk@hitachi.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>

 ip_output.c |    5 +++++
 1 file changed, 5 insertions(+)

diff -pruN linux-2.6.24-rc1/net/ipv4/ip_output.c linux-2.6.24-rc1-mem003-ipv4-dev-p1/net/ipv4/ip_output.c
--- linux-2.6.24-rc1/net/ipv4/ip_output.c	2007-10-24 11:34:34.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p1/net/ipv4/ip_output.c	2007-10-24 11:46:43.000000000 -0400
@@ -1004,6 +1004,11 @@ alloc_new_skb:
 					frag = &skb_shinfo(skb)->frags[i];
 				}
 			} else if (i < MAX_SKB_FRAGS) {
+				if (atomic_read(&sk->sk_wmem_alloc) + PAGE_SIZE
+				    > 2 * sk->sk_sndbuf) {
+					err = -ENOBUFS;
+					goto error;
+				}
 				if (copy > PAGE_SIZE)
 					copy = PAGE_SIZE;
 				page = alloc_pages(sk->sk_allocation, 0);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/5] accounting unit and variable
  2007-10-29 21:18 [PATCH 0/5] UDP memory accounting and limitation (take 6) Hideo AOKI
  2007-10-29 21:22 ` [PATCH 1/5] fix send buffer check Hideo AOKI
@ 2007-10-29 21:23 ` Hideo AOKI
  2007-11-09 12:34   ` Herbert Xu
  2007-10-29 21:23 ` [PATCH 3/5] memory accounting Hideo AOKI
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Hideo AOKI @ 2007-10-29 21:23 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Satoshi Oshima, Herbert Xu, Andi Kleen, Stephen Hemminger,
	Evgeniy Polyakov, yoshfuji, Yumiko Sugita

[-- Attachment #1: Type: text/plain, Size: 139 bytes --]

This patch introduces global variable for UDP memory accounting.
The unit is page.

--
Hideo Aoki
Hitachi Computer Products (America) Inc.

[-- Attachment #2: udp_memory_account_common.patch --]
[-- Type: text/x-patch, Size: 2960 bytes --]


Signed-off-by: Satoshi Oshima <satoshi.oshima.fk@hitachi.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>

 include/net/sock.h |    7 +++++++
 include/net/udp.h  |    2 ++
 net/ipv4/proc.c    |    3 ++-
 net/ipv4/udp.c     |    2 ++
 4 files changed, 13 insertions(+), 1 deletion(-)

diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p1/include/net/sock.h linux-2.6.24-rc1-mem003-ipv4-dev-p2/include/net/sock.h
--- linux-2.6.24-rc1-mem003-ipv4-dev-p1/include/net/sock.h	2007-10-24 11:34:32.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p2/include/net/sock.h	2007-10-24 11:47:51.000000000 -0400
@@ -727,6 +727,13 @@ static inline int sk_stream_wmem_schedul
 	       sk_stream_mem_schedule(sk, size, 0);
 }
 
+#define SK_DATAGRAM_MEM_QUANTUM ((int)PAGE_SIZE)
+
+static inline int sk_datagram_pages(int amt)
+{
+	return DIV_ROUND_UP(amt, SK_DATAGRAM_MEM_QUANTUM);
+}
+
 /* Used by processes to "lock" a socket state, so that
  * interrupts and bottom half handlers won't change it
  * from under us. It essentially blocks any incoming
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p1/include/net/udp.h linux-2.6.24-rc1-mem003-ipv4-dev-p2/include/net/udp.h
--- linux-2.6.24-rc1-mem003-ipv4-dev-p1/include/net/udp.h	2007-10-10 11:49:42.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p2/include/net/udp.h	2007-10-24 11:47:51.000000000 -0400
@@ -65,6 +65,8 @@ extern rwlock_t udp_hash_lock;
 
 extern struct proto udp_prot;
 
+extern atomic_t udp_memory_allocated;
+
 struct sk_buff;
 
 /*
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p1/net/ipv4/proc.c linux-2.6.24-rc1-mem003-ipv4-dev-p2/net/ipv4/proc.c
--- linux-2.6.24-rc1-mem003-ipv4-dev-p1/net/ipv4/proc.c	2007-10-24 11:34:34.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p2/net/ipv4/proc.c	2007-10-24 11:47:51.000000000 -0400
@@ -67,7 +67,8 @@ static int sockstat_seq_show(struct seq_
 		   fold_prot_inuse(&tcp_prot), atomic_read(&tcp_orphan_count),
 		   tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
 		   atomic_read(&tcp_memory_allocated));
-	seq_printf(seq, "UDP: inuse %d\n", fold_prot_inuse(&udp_prot));
+	seq_printf(seq, "UDP: inuse %d mem %d\n", fold_prot_inuse(&udp_prot),
+		   atomic_read(&udp_memory_allocated));
 	seq_printf(seq, "UDPLITE: inuse %d\n", fold_prot_inuse(&udplite_prot));
 	seq_printf(seq, "RAW: inuse %d\n", fold_prot_inuse(&raw_prot));
 	seq_printf(seq,  "FRAG: inuse %d memory %d\n",
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p1/net/ipv4/udp.c linux-2.6.24-rc1-mem003-ipv4-dev-p2/net/ipv4/udp.c
--- linux-2.6.24-rc1-mem003-ipv4-dev-p1/net/ipv4/udp.c	2007-10-24 11:34:35.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p2/net/ipv4/udp.c	2007-10-24 12:27:37.000000000 -0400
@@ -114,6 +114,8 @@ DEFINE_SNMP_STAT(struct udp_mib, udp_sta
 struct hlist_head udp_hash[UDP_HTABLE_SIZE];
 DEFINE_RWLOCK(udp_hash_lock);
 
+atomic_t udp_memory_allocated;
+
 static inline int __udp_lib_lport_inuse(__u16 num,
 					const struct hlist_head udptable[])
 {

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/5] memory accounting
  2007-10-29 21:18 [PATCH 0/5] UDP memory accounting and limitation (take 6) Hideo AOKI
  2007-10-29 21:22 ` [PATCH 1/5] fix send buffer check Hideo AOKI
  2007-10-29 21:23 ` [PATCH 2/5] accounting unit and variable Hideo AOKI
@ 2007-10-29 21:23 ` Hideo AOKI
  2007-11-09 13:07   ` Herbert Xu
  2007-10-29 21:23 ` [PATCH 4/5] memory limitation by using udp_mem Hideo AOKI
  2007-10-29 21:23 ` [PATCH 5/5] introduce udp_rmem and udp_wmem Hideo AOKI
  4 siblings, 1 reply; 20+ messages in thread
From: Hideo AOKI @ 2007-10-29 21:23 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Satoshi Oshima, Herbert Xu, Andi Kleen, Stephen Hemminger,
	Evgeniy Polyakov, yoshfuji, Yumiko Sugita

[-- Attachment #1: Type: text/plain, Size: 109 bytes --]

This patch adds UDP memory usage accounting in IPv4.

--
Hideo Aoki
Hitachi Computer Products (America) Inc.

[-- Attachment #2: udp_memory_account_ipv4.patch --]
[-- Type: text/x-patch, Size: 6589 bytes --]


Signed-off-by: Satoshi Oshima <satoshi.oshima.fk@hitachi.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>

 af_inet.c   |   30 +++++++++++++++++++++++++++++-
 ip_output.c |   25 ++++++++++++++++++++++---
 udp.c       |    9 +++++++++
 3 files changed, 60 insertions(+), 4 deletions(-)

diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p2/net/ipv4/af_inet.c linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/af_inet.c
--- linux-2.6.24-rc1-mem003-ipv4-dev-p2/net/ipv4/af_inet.c	2007-10-24 11:34:34.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/af_inet.c	2007-10-24 19:10:27.000000000 -0400
@@ -126,13 +126,41 @@ extern void ip_mc_drop_socket(struct soc
 static struct list_head inetsw[SOCK_MAX];
 static DEFINE_SPINLOCK(inetsw_lock);
 
+/**
+ *	__skb_queue_purge_and_sub_memory_allocated
+ *		- empty a list and subtruct memory allocation counter
+ *	@sk:   sk
+ *	@list: list to empty
+ *	Delete all buffers on an &sk_buff list and subtruct the
+ *	truesize of the sk_buff for memory accounting. Each buffer
+ *	is removed from the list and one reference dropped. This
+ *	function does not take the list lock and the caller must
+ *	hold the relevant locks to use it.
+ */
+static inline void __skb_queue_purge_and_sub_memory_allocated(struct sock *sk,
+					struct sk_buff_head *list)
+{
+	struct sk_buff *skb;
+	int purged_skb_size = 0;
+	while ((skb = __skb_dequeue(list)) != NULL) {
+		purged_skb_size += sk_datagram_pages(skb->truesize);
+		kfree_skb(skb);
+	}
+	atomic_sub(purged_skb_size, sk->sk_prot->memory_allocated);
+}
+
 /* New destruction routine */
 
 void inet_sock_destruct(struct sock *sk)
 {
 	struct inet_sock *inet = inet_sk(sk);
 
-	__skb_queue_purge(&sk->sk_receive_queue);
+	if (sk->sk_prot->memory_allocated && sk->sk_type != SOCK_STREAM)
+		__skb_queue_purge_and_sub_memory_allocated(sk,
+				&sk->sk_receive_queue);
+	else
+		__skb_queue_purge(&sk->sk_receive_queue);
+
 	__skb_queue_purge(&sk->sk_error_queue);
 
 	if (sk->sk_type == SOCK_STREAM && sk->sk_state != TCP_CLOSE) {
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p2/net/ipv4/ip_output.c linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/ip_output.c
--- linux-2.6.24-rc1-mem003-ipv4-dev-p2/net/ipv4/ip_output.c	2007-10-24 11:46:43.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/ip_output.c	2007-10-24 12:31:47.000000000 -0400
@@ -743,6 +743,8 @@ static inline int ip_ufo_append_data(str
 		/* specify the length of each IP datagram fragment*/
 		skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
 		skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
+		atomic_add(sk_datagram_pages(skb->truesize),
+			   sk->sk_prot->memory_allocated);
 		__skb_queue_tail(&sk->sk_write_queue, skb);
 
 		return 0;
@@ -924,6 +926,9 @@ alloc_new_skb:
 			}
 			if (skb == NULL)
 				goto error;
+			if (sk->sk_prot->memory_allocated)
+				atomic_add(sk_datagram_pages(skb->truesize),
+					   sk->sk_prot->memory_allocated);
 
 			/*
 			 *	Fill in the control structures
@@ -1023,6 +1028,8 @@ alloc_new_skb:
 				frag = &skb_shinfo(skb)->frags[i];
 				skb->truesize += PAGE_SIZE;
 				atomic_add(PAGE_SIZE, &sk->sk_wmem_alloc);
+				if (sk->sk_prot->memory_allocated)
+					atomic_inc(sk->sk_prot->memory_allocated);
 			} else {
 				err = -EMSGSIZE;
 				goto error;
@@ -1123,7 +1130,9 @@ ssize_t	ip_append_page(struct sock *sk, 
 			if (unlikely(!skb)) {
 				err = -ENOBUFS;
 				goto error;
-			}
+			} else if (sk->sk_prot->memory_allocated)
+				atomic_add(sk_datagram_pages(skb->truesize),
+					   sk->sk_prot->memory_allocated);
 
 			/*
 			 *	Fill in the control structures
@@ -1202,13 +1211,14 @@ int ip_push_pending_frames(struct sock *
 	struct iphdr *iph;
 	__be16 df = 0;
 	__u8 ttl;
-	int err = 0;
+	int err = 0, send_page_size;
 
 	if ((skb = __skb_dequeue(&sk->sk_write_queue)) == NULL)
 		goto out;
 	tail_skb = &(skb_shinfo(skb)->frag_list);
 
 	/* move skb->data to ip header from ext header */
+	send_page_size = sk_datagram_pages(skb->truesize);
 	if (skb->data < skb_network_header(skb))
 		__skb_pull(skb, skb_network_offset(skb));
 	while ((tmp_skb = __skb_dequeue(&sk->sk_write_queue)) != NULL) {
@@ -1218,6 +1228,7 @@ int ip_push_pending_frames(struct sock *
 		skb->len += tmp_skb->len;
 		skb->data_len += tmp_skb->len;
 		skb->truesize += tmp_skb->truesize;
+		send_page_size += sk_datagram_pages(tmp_skb->truesize);
 		__sock_put(tmp_skb->sk);
 		tmp_skb->destructor = NULL;
 		tmp_skb->sk = NULL;
@@ -1273,6 +1284,8 @@ int ip_push_pending_frames(struct sock *
 	/* Netfilter gets whole the not fragmented skb. */
 	err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL,
 		      skb->dst->dev, dst_output);
+	if (sk->sk_prot->memory_allocated)
+		atomic_sub(send_page_size, sk->sk_prot->memory_allocated);
 	if (err) {
 		if (err > 0)
 			err = inet->recverr ? net_xmit_errno(err) : 0;
@@ -1302,9 +1315,15 @@ void ip_flush_pending_frames(struct sock
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct sk_buff *skb;
+	int num_flush_mem = 0;
 
-	while ((skb = __skb_dequeue_tail(&sk->sk_write_queue)) != NULL)
+	while ((skb = __skb_dequeue_tail(&sk->sk_write_queue)) != NULL) {
+		num_flush_mem += sk_datagram_pages(skb->truesize);
 		kfree_skb(skb);
+	}
+
+	if (sk->sk_prot->memory_allocated)
+		atomic_sub(num_flush_mem, sk->sk_prot->memory_allocated);
 
 	inet->cork.flags &= ~IPCORK_OPT;
 	kfree(inet->cork.opt);
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p2/net/ipv4/udp.c linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/udp.c
--- linux-2.6.24-rc1-mem003-ipv4-dev-p2/net/ipv4/udp.c	2007-10-24 12:27:37.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/udp.c	2007-10-26 19:43:20.000000000 -0400
@@ -894,6 +894,9 @@ try_again:
 
 out_free:
 	skb_free_datagram(sk, skb);
+	atomic_sub(sk_datagram_pages(skb->truesize),
+		   sk->sk_prot->memory_allocated);
+
 out:
 	return err;
 
@@ -901,6 +904,8 @@ csum_copy_err:
 	UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
 
 	skb_kill_datagram(sk, skb, flags);
+	atomic_sub(sk_datagram_pages(skb->truesize),
+		   sk->sk_prot->memory_allocated);
 
 	if (noblock)
 		return -EAGAIN;
@@ -1025,6 +1030,9 @@ int udp_queue_rcv_skb(struct sock * sk, 
 		goto drop;
 	}
 
+	atomic_add(sk_datagram_pages(skb->truesize),
+		   sk->sk_prot->memory_allocated);
+
 	UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
 	return 0;
 
@@ -1449,6 +1457,7 @@ struct proto udp_prot = {
 	.hash		   = udp_lib_hash,
 	.unhash		   = udp_lib_unhash,
 	.get_port	   = udp_v4_get_port,
+	.memory_allocated  = &udp_memory_allocated,
 	.obj_size	   = sizeof(struct udp_sock),
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_udp_setsockopt,

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/5] memory limitation by using udp_mem
  2007-10-29 21:18 [PATCH 0/5] UDP memory accounting and limitation (take 6) Hideo AOKI
                   ` (2 preceding siblings ...)
  2007-10-29 21:23 ` [PATCH 3/5] memory accounting Hideo AOKI
@ 2007-10-29 21:23 ` Hideo AOKI
  2007-10-29 21:23 ` [PATCH 5/5] introduce udp_rmem and udp_wmem Hideo AOKI
  4 siblings, 0 replies; 20+ messages in thread
From: Hideo AOKI @ 2007-10-29 21:23 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Satoshi Oshima, Herbert Xu, Andi Kleen, Stephen Hemminger,
	Evgeniy Polyakov, yoshfuji, Yumiko Sugita

[-- Attachment #1: Type: text/plain, Size: 105 bytes --]

This patch introduces memory limitation for UDP.

--
Hideo Aoki
Hitachi Computer Products (America) Inc.

[-- Attachment #2: udp_memory_limit_ipv4.patch --]
[-- Type: text/x-patch, Size: 8842 bytes --]


Signed-off-by: Satoshi Oshima <satoshi.oshima.fk@hitachi.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>

 Documentation/networking/ip-sysctl.txt |    6 ++++
 include/net/udp.h                      |    3 ++
 net/ipv4/af_inet.c                     |    3 ++
 net/ipv4/ip_output.c                   |   47 ++++++++++++++++++++++++++++++---
 net/ipv4/sysctl_net_ipv4.c             |   11 +++++++
 net/ipv4/udp.c                         |   24 ++++++++++++++++
 6 files changed, 91 insertions(+), 3 deletions(-)

diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p3/Documentation/networking/ip-sysctl.txt linux-2.6.24-rc1-mem003-ipv4-dev-p4/Documentation/networking/ip-sysctl.txt
--- linux-2.6.24-rc1-mem003-ipv4-dev-p3/Documentation/networking/ip-sysctl.txt	2007-10-24 11:33:58.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p4/Documentation/networking/ip-sysctl.txt	2007-10-26 20:35:52.000000000 -0400
@@ -446,6 +446,12 @@ tcp_dma_copybreak - INTEGER
 	and CONFIG_NET_DMA is enabled.
 	Default: 4096
 
+UDP variables:
+
+udp_mem - INTEGER
+	Number of pages allowed for queueing by all UDP sockets.
+	Default is calculated at boot time from amount of available memory.
+
 CIPSOv4 Variables:
 
 cipso_cache_enable - BOOLEAN
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p3/include/net/udp.h linux-2.6.24-rc1-mem003-ipv4-dev-p4/include/net/udp.h
--- linux-2.6.24-rc1-mem003-ipv4-dev-p3/include/net/udp.h	2007-10-24 11:47:51.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p4/include/net/udp.h	2007-10-26 20:35:52.000000000 -0400
@@ -66,6 +66,7 @@ extern rwlock_t udp_hash_lock;
 extern struct proto udp_prot;
 
 extern atomic_t udp_memory_allocated;
+extern int sysctl_udp_mem;
 
 struct sk_buff;
 
@@ -175,4 +176,6 @@ extern void udp_proc_unregister(struct u
 extern int  udp4_proc_init(void);
 extern void udp4_proc_exit(void);
 #endif
+
+extern void udp_init(void);
 #endif	/* _UDP_H */
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/af_inet.c linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/af_inet.c
--- linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/af_inet.c	2007-10-24 19:10:27.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/af_inet.c	2007-10-26 20:35:52.000000000 -0400
@@ -1446,6 +1446,9 @@ static int __init inet_init(void)
 	/* Setup TCP slab cache for open requests. */
 	tcp_init();
 
+	/* Setup UDP memory threshold */
+	udp_init();
+
 	/* Add UDP-Lite (RFC 3828) */
 	udplite4_register();
 
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/ip_output.c linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/ip_output.c
--- linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/ip_output.c	2007-10-24 12:31:47.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/ip_output.c	2007-10-29 09:36:32.000000000 -0400
@@ -75,6 +75,7 @@
 #include <net/icmp.h>
 #include <net/checksum.h>
 #include <net/inetpeer.h>
+#include <net/udp.h>
 #include <linux/igmp.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter_bridge.h>
@@ -699,6 +700,20 @@ csum_page(struct page *page, int offset,
 	return csum;
 }
 
+static inline int __ip_check_max_skb_pages(struct sock *sk, int size)
+{
+	switch(sk->sk_protocol) {
+	case IPPROTO_UDP:
+		if (atomic_read(sk->sk_prot->memory_allocated) + size
+		    > sk->sk_prot->sysctl_mem[0])
+			return -ENOBUFS;
+		/* Fall through */	
+	default:
+		break;
+	}
+	return 0;
+}
+
 static inline int ip_ufo_append_data(struct sock *sk,
 			int getfrag(void *from, char *to, int offset, int len,
 			       int odd, struct sk_buff *skb),
@@ -707,16 +722,20 @@ static inline int ip_ufo_append_data(str
 {
 	struct sk_buff *skb;
 	int err;
+	int size = 0;
 
 	/* There is support for UDP fragmentation offload by network
 	 * device, so create one single skb packet containing complete
 	 * udp datagram
 	 */
 	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
-		skb = sock_alloc_send_skb(sk,
-			hh_len + fragheaderlen + transhdrlen + 20,
-			(flags & MSG_DONTWAIT), &err);
+		size = hh_len + fragheaderlen + transhdrlen + 20;
+		err = __ip_check_max_skb_pages(sk, sk_datagram_pages(size));
+		if (err)
+			return err;
 
+		skb = sock_alloc_send_skb(sk, size, (flags & MSG_DONTWAIT),
+					  &err);
 		if (skb == NULL)
 			return err;
 
@@ -737,6 +756,10 @@ static inline int ip_ufo_append_data(str
 		sk->sk_sndmsg_off = 0;
 	}
 
+	err = __ip_check_max_skb_pages(sk, sk_datagram_pages(size + length - 
+							     transhdrlen));
+	if (err)
+		goto fail;
 	err = skb_append_datato_frags(sk,skb, getfrag, from,
 			       (length - transhdrlen));
 	if (!err) {
@@ -752,6 +775,7 @@ static inline int ip_ufo_append_data(str
 	/* There is not enough support do UFO ,
 	 * so follow normal path
 	 */
+fail:
 	kfree_skb(skb);
 	return err;
 }
@@ -910,6 +934,12 @@ alloc_new_skb:
 			if (datalen == length + fraggap)
 				alloclen += rt->u.dst.trailer_len;
 
+			err = __ip_check_max_skb_pages(sk,
+				sk_datagram_pages(SKB_DATA_ALIGN(alloclen + hh_len + 15)
+				+ sizeof(struct sk_buff)));
+			if (err)
+				goto error;
+
 			if (transhdrlen) {
 				skb = sock_alloc_send_skb(sk,
 						alloclen + hh_len + 15,
@@ -1009,6 +1039,11 @@ alloc_new_skb:
 					frag = &skb_shinfo(skb)->frags[i];
 				}
 			} else if (i < MAX_SKB_FRAGS) {
+				err = __ip_check_max_skb_pages(sk,
+					sk_datagram_pages(PAGE_SIZE));
+				if (err)
+					goto error;
+
 				if (atomic_read(&sk->sk_wmem_alloc) + PAGE_SIZE
 				    > 2 * sk->sk_sndbuf) {
 					err = -ENOBUFS;
@@ -1126,6 +1161,12 @@ ssize_t	ip_append_page(struct sock *sk, 
 			fraggap = skb_prev->len - maxfraglen;
 
 			alloclen = fragheaderlen + hh_len + fraggap + 15;
+
+			err = __ip_check_max_skb_pages(sk,
+				sk_datagram_pages(alloclen + sizeof(struct sk_buff)));
+			if (err)
+				goto error;
+
 			skb = sock_wmalloc(sk, alloclen, 1, sk->sk_allocation);
 			if (unlikely(!skb)) {
 				err = -ENOBUFS;
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/sysctl_net_ipv4.c linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/sysctl_net_ipv4.c
--- linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/sysctl_net_ipv4.c	2007-10-24 11:34:34.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/sysctl_net_ipv4.c	2007-10-26 20:35:52.000000000 -0400
@@ -18,6 +18,7 @@
 #include <net/ip.h>
 #include <net/route.h>
 #include <net/tcp.h>
+#include <net/udp.h>
 #include <net/cipso_ipv4.h>
 #include <net/inet_frag.h>
 
@@ -885,6 +886,16 @@ ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
 	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "udp_mem",
+		.data		= &sysctl_udp_mem,
+		.maxlen		= sizeof(sysctl_udp_mem),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero
+	},
 	{ .ctl_name = 0 }
 };
 
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/udp.c linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/udp.c
--- linux-2.6.24-rc1-mem003-ipv4-dev-p3/net/ipv4/udp.c	2007-10-26 19:43:20.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/udp.c	2007-10-26 20:35:52.000000000 -0400
@@ -82,6 +82,7 @@
 #include <asm/system.h>
 #include <asm/uaccess.h>
 #include <asm/ioctls.h>
+#include <linux/bootmem.h>
 #include <linux/types.h>
 #include <linux/fcntl.h>
 #include <linux/module.h>
@@ -115,6 +116,7 @@ struct hlist_head udp_hash[UDP_HTABLE_SI
 DEFINE_RWLOCK(udp_hash_lock);
 
 atomic_t udp_memory_allocated;
+int sysctl_udp_mem __read_mostly;
 
 static inline int __udp_lib_lport_inuse(__u16 num,
 					const struct hlist_head udptable[])
@@ -1023,6 +1025,13 @@ int udp_queue_rcv_skb(struct sock * sk, 
 			goto drop;
 	}
 
+	if ((atomic_read(sk->sk_prot->memory_allocated)
+		       + sk_datagram_pages(skb->truesize))
+		> sk->sk_prot->sysctl_mem[0]) {
+		UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS, up->pcflag);
+		goto drop;
+	}
+
 	if ((rc = sock_queue_rcv_skb(sk,skb)) < 0) {
 		/* Note that an ENOMEM error is charged twice */
 		if (rc == -ENOMEM)
@@ -1458,6 +1467,7 @@ struct proto udp_prot = {
 	.unhash		   = udp_lib_unhash,
 	.get_port	   = udp_v4_get_port,
 	.memory_allocated  = &udp_memory_allocated,
+	.sysctl_mem	   = &sysctl_udp_mem,
 	.obj_size	   = sizeof(struct udp_sock),
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_udp_setsockopt,
@@ -1652,6 +1662,20 @@ void udp4_proc_exit(void)
 }
 #endif /* CONFIG_PROC_FS */
 
+void __init udp_init(void)
+{
+	unsigned long limit;
+
+	/* Set the pressure threshold up by the same strategy of TCP. It is a
+	 * fraction of global memory that is up to 1/2 at 256 MB, decreasing
+	 * toward zero with the amount of memory, with a floor of 128 pages.
+	 */
+	limit = min(nr_all_pages, 1UL<<(28-PAGE_SHIFT)) >> (20-PAGE_SHIFT);
+	limit = (limit * (nr_all_pages >> (20-PAGE_SHIFT))) >> (PAGE_SHIFT-11);
+	limit = max(limit, 128UL);
+	sysctl_udp_mem = limit / 2 * 3;
+}
+
 EXPORT_SYMBOL(udp_disconnect);
 EXPORT_SYMBOL(udp_hash);
 EXPORT_SYMBOL(udp_hash_lock);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 5/5] introduce udp_rmem and udp_wmem
  2007-10-29 21:18 [PATCH 0/5] UDP memory accounting and limitation (take 6) Hideo AOKI
                   ` (3 preceding siblings ...)
  2007-10-29 21:23 ` [PATCH 4/5] memory limitation by using udp_mem Hideo AOKI
@ 2007-10-29 21:23 ` Hideo AOKI
  2007-10-30  4:52   ` Bill Fink
  4 siblings, 1 reply; 20+ messages in thread
From: Hideo AOKI @ 2007-10-29 21:23 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Satoshi Oshima, Herbert Xu, Andi Kleen, Stephen Hemminger,
	Evgeniy Polyakov, yoshfuji, Yumiko Sugita

[-- Attachment #1: Type: text/plain, Size: 262 bytes --]

This patch added /proc/sys/net/udp_rmem and /proc/sys/net/udp_rmem.
Each UDP packet is drooped when the number of pages for socket buffer
is beyond the limit and the socket already consumes minimum buffer.

--
Hideo Aoki
Hitachi Computer Products (America) Inc.

[-- Attachment #2: add_udp_rmem_wmem.patch --]
[-- Type: text/x-patch, Size: 5434 bytes --]


Cc: Satoshi Oshima <satoshi.oshima.fk@hitachi.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>

 Documentation/networking/ip-sysctl.txt |   12 ++++++++++++
 include/net/udp.h                      |    4 ++++
 net/ipv4/ip_output.c                   |    4 +++-
 net/ipv4/sysctl_net_ipv4.c             |   20 ++++++++++++++++++++
 net/ipv4/udp.c                         |   13 +++++++++++--
 5 files changed, 50 insertions(+), 3 deletions(-)

diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p4/Documentation/networking/ip-sysctl.txt linux-2.6.24-rc1-mem003-ipv4-dev-p5/Documentation/networking/ip-sysctl.txt
--- linux-2.6.24-rc1-mem003-ipv4-dev-p4/Documentation/networking/ip-sysctl.txt	2007-10-26 20:35:52.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p5/Documentation/networking/ip-sysctl.txt	2007-10-29 09:44:05.000000000 -0400
@@ -452,6 +452,18 @@ udp_mem - INTEGER
 	Number of pages allowed for queueing by all UDP sockets.
 	Default is calculated at boot time from amount of available memory.
 
+udp_rmem - INTEGER
+	Minimal size of receive buffer used by UDP sockets. Each UDP socket
+	is able to use the size for receiving data, even if total pages of UDP
+	sockets exceed udp_mem. The unit is byte.
+	Default: 4096
+
+udp_wmem - INTEGER
+	Minimal size of send buffer used by UDP sockets. Each UDP socket is
+	able to use the size for sending data, even if total pages of UDP
+	sockets exceed udp_mem. The unit is byte.
+	Default: 4096
+
 CIPSOv4 Variables:
 
 cipso_cache_enable - BOOLEAN
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p4/include/net/udp.h linux-2.6.24-rc1-mem003-ipv4-dev-p5/include/net/udp.h
--- linux-2.6.24-rc1-mem003-ipv4-dev-p4/include/net/udp.h	2007-10-26 20:35:52.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p5/include/net/udp.h	2007-10-29 09:44:05.000000000 -0400
@@ -66,7 +66,11 @@ extern rwlock_t udp_hash_lock;
 extern struct proto udp_prot;
 
 extern atomic_t udp_memory_allocated;
+
+/* sysctl variables for udp */
 extern int sysctl_udp_mem;
+extern int sysctl_udp_rmem;
+extern int sysctl_udp_wmem;
 
 struct sk_buff;
 
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/ip_output.c linux-2.6.24-rc1-mem003-ipv4-dev-p5/net/ipv4/ip_output.c
--- linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/ip_output.c	2007-10-29 09:36:32.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p5/net/ipv4/ip_output.c	2007-10-29 09:44:05.000000000 -0400
@@ -705,7 +705,9 @@ static inline int __ip_check_max_skb_pag
 	switch(sk->sk_protocol) {
 	case IPPROTO_UDP:
 		if (atomic_read(sk->sk_prot->memory_allocated) + size
-		    > sk->sk_prot->sysctl_mem[0])
+		    > sk->sk_prot->sysctl_mem[0] &&
+		    atomic_read(&sk->sk_wmem_alloc) + size
+		    > sk->sk_prot->sysctl_wmem[0])
 			return -ENOBUFS;
 		/* Fall through */	
 	default:
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/sysctl_net_ipv4.c linux-2.6.24-rc1-mem003-ipv4-dev-p5/net/ipv4/sysctl_net_ipv4.c
--- linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/sysctl_net_ipv4.c	2007-10-26 20:35:52.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p5/net/ipv4/sysctl_net_ipv4.c	2007-10-29 09:44:05.000000000 -0400
@@ -896,6 +896,26 @@ ctl_table ipv4_table[] = {
 		.strategy	= &sysctl_intvec,
 		.extra1		= &zero
 	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "udp_rmem",
+		.data		= &sysctl_udp_rmem,
+		.maxlen		= sizeof(sysctl_udp_rmem),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero
+	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "udp_wmem",
+		.data		= &sysctl_udp_wmem,
+		.maxlen		= sizeof(sysctl_udp_wmem),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero
+	},
 	{ .ctl_name = 0 }
 };
 
diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/udp.c linux-2.6.24-rc1-mem003-ipv4-dev-p5/net/ipv4/udp.c
--- linux-2.6.24-rc1-mem003-ipv4-dev-p4/net/ipv4/udp.c	2007-10-26 20:35:52.000000000 -0400
+++ linux-2.6.24-rc1-mem003-ipv4-dev-p5/net/ipv4/udp.c	2007-10-29 09:44:05.000000000 -0400
@@ -117,6 +117,8 @@ DEFINE_RWLOCK(udp_hash_lock);
 
 atomic_t udp_memory_allocated;
 int sysctl_udp_mem __read_mostly;
+int sysctl_udp_rmem __read_mostly;
+int sysctl_udp_wmem __read_mostly;
 
 static inline int __udp_lib_lport_inuse(__u16 num,
 					const struct hlist_head udptable[])
@@ -1026,8 +1028,10 @@ int udp_queue_rcv_skb(struct sock * sk, 
 	}
 
 	if ((atomic_read(sk->sk_prot->memory_allocated)
-		       + sk_datagram_pages(skb->truesize))
-		> sk->sk_prot->sysctl_mem[0]) {
+	     + sk_datagram_pages(skb->truesize))
+	    > sk->sk_prot->sysctl_mem[0] &&
+	    atomic_read(&sk->sk_rmem_alloc) + skb->truesize 
+	    > sk->sk_prot->sysctl_rmem[0]) {
 		UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS, up->pcflag);
 		goto drop;
 	}
@@ -1468,6 +1472,8 @@ struct proto udp_prot = {
 	.get_port	   = udp_v4_get_port,
 	.memory_allocated  = &udp_memory_allocated,
 	.sysctl_mem	   = &sysctl_udp_mem,
+	.sysctl_wmem	   = &sysctl_udp_wmem,
+	.sysctl_rmem	   = &sysctl_udp_rmem,
 	.obj_size	   = sizeof(struct udp_sock),
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_udp_setsockopt,
@@ -1674,6 +1680,9 @@ void __init udp_init(void)
 	limit = (limit * (nr_all_pages >> (20-PAGE_SHIFT))) >> (PAGE_SHIFT-11);
 	limit = max(limit, 128UL);
 	sysctl_udp_mem = limit / 2 * 3;
+
+	sysctl_udp_rmem = SK_DATAGRAM_MEM_QUANTUM;
+	sysctl_udp_wmem = SK_DATAGRAM_MEM_QUANTUM;
 }
 
 EXPORT_SYMBOL(udp_disconnect);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] introduce udp_rmem and udp_wmem
  2007-10-29 21:23 ` [PATCH 5/5] introduce udp_rmem and udp_wmem Hideo AOKI
@ 2007-10-30  4:52   ` Bill Fink
  2007-11-02 15:42     ` Hideo AOKI
  0 siblings, 1 reply; 20+ messages in thread
From: Bill Fink @ 2007-10-30  4:52 UTC (permalink / raw)
  To: Hideo AOKI
  Cc: David Miller, netdev, Satoshi Oshima, Herbert Xu, Andi Kleen,
	Stephen Hemminger, Evgeniy Polyakov, yoshfuji, Yumiko Sugita

On Mon, 29 Oct 2007, Hideo AOKI wrote:

> This patch added /proc/sys/net/udp_rmem and /proc/sys/net/udp_rmem.
> Each UDP packet is drooped when the number of pages for socket buffer
> is beyond the limit and the socket already consumes minimum buffer.

I think you meant /proc/sys/net/ipv4/udp_{r,w}mem above.

Patch not in-lined making replying more difficult.

Cutting and pasting:

> diff -pruN linux-2.6.24-rc1-mem003-ipv4-dev-p4/Documentation/networking/ip-sysctl.txt linux-2.6.24-rc1-mem003-ipv4-dev-p5/Documentation/networking/ip-sysctl.txt
> --- linux-2.6.24-rc1-mem003-ipv4-dev-p4/Documentation/networking/ip-sysctl.txt	2007-10-26 20:35:52.000000000 -0400
> +++ linux-2.6.24-rc1-mem003-ipv4-dev-p5/Documentation/networking/ip-sysctl.txt	2007-10-29 09:44:05.000000000 -0400
> @@ -452,6 +452,18 @@ udp_mem - INTEGER
>  	Number of pages allowed for queueing by all UDP sockets.
>  	Default is calculated at boot time from amount of available memory.
>  
> +udp_rmem - INTEGER
> +	Minimal size of receive buffer used by UDP sockets. Each UDP socket
> +	is able to use the size for receiving data, even if total pages of UDP
> +	sockets exceed udp_mem. The unit is byte.
> +	Default: 4096
> +
> +udp_wmem - INTEGER
> +	Minimal size of send buffer used by UDP sockets. Each UDP socket is
> +	able to use the size for sending data, even if total pages of UDP
> +	sockets exceed udp_mem. The unit is byte.
> +	Default: 4096
> +
>  CIPSOv4 Variables:
>  
>  cipso_cache_enable - BOOLEAN

I think either the above should be renamed to udp_{r,w}mem_min, or
they should be changed to a 3-tuple like tcp_{r,w}mem, and the code
refactored accordingly (but then what to do about
/proc/sys/net/core/{r,w}mem_max).

						-Bill

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] introduce udp_rmem and udp_wmem
  2007-10-30  4:52   ` Bill Fink
@ 2007-11-02 15:42     ` Hideo AOKI
  0 siblings, 0 replies; 20+ messages in thread
From: Hideo AOKI @ 2007-11-02 15:42 UTC (permalink / raw)
  To: Bill Fink
  Cc: David Miller, netdev, Satoshi Oshima, Herbert Xu, Andi Kleen,
	Stephen Hemminger, Evgeniy Polyakov, yoshfuji, Yumiko Sugita

Hello,

I'm sorry to not respond quickly.

Bill Fink wrote:
> On Mon, 29 Oct 2007, Hideo AOKI wrote:
> 
>> This patch added /proc/sys/net/udp_rmem and /proc/sys/net/udp_rmem.
>> Each UDP packet is drooped when the number of pages for socket buffer
>> is beyond the limit and the socket already consumes minimum buffer.
> 
> I think you meant /proc/sys/net/ipv4/udp_{r,w}mem above.

You're right. New parameters are added to /proc/sys/net/ipv4.

> Patch not in-lined making replying more difficult.

I apologize for the inconvenient. I'll send patch as in-line next time.

>>> +udp_rmem - INTEGER
>> +	Minimal size of receive buffer used by UDP sockets. Each UDP socket
>> +	is able to use the size for receiving data, even if total pages of UDP
>> +	sockets exceed udp_mem. The unit is byte.
>> +	Default: 4096
>> +
>> +udp_wmem - INTEGER
>> +	Minimal size of send buffer used by UDP sockets. Each UDP socket is
>> +	able to use the size for sending data, even if total pages of UDP
>> +	sockets exceed udp_mem. The unit is byte.
>> +	Default: 4096
>> +
>>  CIPSOv4 Variables:
>>  
>>  cipso_cache_enable - BOOLEAN
> 
> I think either the above should be renamed to udp_{r,w}mem_min, or
> they should be changed to a 3-tuple like tcp_{r,w}mem, and the code
> refactored accordingly (but then what to do about
> /proc/sys/net/core/{r,w}mem_max).
> 
> 						-Bill

I understood. Then, I'll rename them to udp_{r,w}mem_min next take.
Please let me know if there is any suggestions.

Many thanks,
Hideo

-- 
Hitachi Computer Products (America) Inc.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] fix send buffer check
  2007-10-29 21:22 ` [PATCH 1/5] fix send buffer check Hideo AOKI
@ 2007-11-09 12:24   ` Herbert Xu
  2007-11-14  3:25     ` Hideo AOKI
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2007-11-09 12:24 UTC (permalink / raw)
  To: Hideo AOKI
  Cc: David Miller, netdev, Satoshi Oshima, Andi Kleen,
	Stephen Hemminger, Evgeniy Polyakov, yoshfuji, Yumiko Sugita

On Mon, Oct 29, 2007 at 05:22:53PM -0400, Hideo AOKI wrote:
> This patch introduces sndbuf size check before memory allocation for
> send buffer.

Looks good, what about IPv6?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] accounting unit and variable
  2007-10-29 21:23 ` [PATCH 2/5] accounting unit and variable Hideo AOKI
@ 2007-11-09 12:34   ` Herbert Xu
  2007-11-14  3:27     ` Hideo AOKI
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2007-11-09 12:34 UTC (permalink / raw)
  To: Hideo AOKI
  Cc: David Miller, netdev, Satoshi Oshima, Andi Kleen,
	Stephen Hemminger, Evgeniy Polyakov, yoshfuji, Yumiko Sugita

On Mon, Oct 29, 2007 at 05:23:10PM -0400, Hideo AOKI wrote:
>  
> +#define SK_DATAGRAM_MEM_QUANTUM ((int)PAGE_SIZE)
> +
> +static inline int sk_datagram_pages(int amt)
> +{
> +	return DIV_ROUND_UP(amt, SK_DATAGRAM_MEM_QUANTUM);
> +}

Does this really have to be int? Unsigned would let the compiler
optimise this to a simple shift.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/5] memory accounting
  2007-10-29 21:23 ` [PATCH 3/5] memory accounting Hideo AOKI
@ 2007-11-09 13:07   ` Herbert Xu
  0 siblings, 0 replies; 20+ messages in thread
From: Herbert Xu @ 2007-11-09 13:07 UTC (permalink / raw)
  To: Hideo AOKI
  Cc: David Miller, netdev, Satoshi Oshima, Andi Kleen,
	Stephen Hemminger, Evgeniy Polyakov, yoshfuji, Yumiko Sugita

On Mon, Oct 29, 2007 at 05:23:21PM -0400, Hideo AOKI wrote:
> This patch adds UDP memory usage accounting in IPv4.
> 
> --
> Hideo Aoki
> Hitachi Computer Products (America) Inc.

> 
> Signed-off-by: Satoshi Oshima <satoshi.oshima.fk@hitachi.com>
> Signed-off-by: Hideo Aoki <haoki@redhat.com>

This looks fine for now.  Although in future I'd prefer this
to use the forward alloc model used by stream sockets.  In fact
we should be able to share most of that code too.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5] fix send buffer check
  2007-11-14  2:39 [PATCH 0/5] UDP memory accounting and limitation (take 7) Hideo AOKI
@ 2007-11-14  2:47 ` Hideo AOKI
  0 siblings, 0 replies; 20+ messages in thread
From: Hideo AOKI @ 2007-11-14  2:47 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: Hideo AOKI, Satoshi Oshima, Herbert Xu, Bill Fink, Andi Kleen,
	Evgeniy Polyakov, Stephen Hemminger, yoshfuji, Yumiko Sugita

This patch introduces sndbuf size check before memory allocation for
send buffer.

signed-off-by: Satoshi Oshima <satoshi.oshima.fk@hitachi.com>
signed-off-by: Hideo Aoki <haoki@redhat.com>
---

  ip_output.c |    5 +++++
  1 file changed, 5 insertions(+)

diff -pruN net-2.6/net/ipv4/ip_output.c net-2.6-udp-p1/net/ipv4/ip_output.c
--- net-2.6/net/ipv4/ip_output.c	2007-11-13 08:19:57.000000000 -0500
+++ net-2.6-udp-p1/net/ipv4/ip_output.c	2007-11-13 16:10:03.000000000 -0500
@@ -1004,6 +1004,11 @@ alloc_new_skb:
  					frag = &skb_shinfo(skb)->frags[i];
  				}
  			} else if (i < MAX_SKB_FRAGS) {
+				if (atomic_read(&sk->sk_wmem_alloc) + PAGE_SIZE
+				    > 2 * sk->sk_sndbuf) {
+					err = -ENOBUFS;
+					goto error;
+				}
  				if (copy > PAGE_SIZE)
  					copy = PAGE_SIZE;
  				page = alloc_pages(sk->sk_allocation, 0);
--
Hideo Aoki
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] fix send buffer check
  2007-11-09 12:24   ` Herbert Xu
@ 2007-11-14  3:25     ` Hideo AOKI
  0 siblings, 0 replies; 20+ messages in thread
From: Hideo AOKI @ 2007-11-14  3:25 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, netdev, Satoshi Oshima, Andi Kleen,
	Stephen Hemminger, Evgeniy Polyakov, yoshfuji, Yumiko Sugita

Hello,

I'm sorry that my response is always late.

Herbert Xu wrote:
> On Mon, Oct 29, 2007 at 05:22:53PM -0400, Hideo AOKI wrote:
>> This patch introduces sndbuf size check before memory allocation for
>> send buffer.
> 
> Looks good, what about IPv6?

I'm going to develop IPv6 part if IPv4 patch set is accepted.

Many thanks,
Hideo

--
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] accounting unit and variable
  2007-11-09 12:34   ` Herbert Xu
@ 2007-11-14  3:27     ` Hideo AOKI
  2007-11-14  3:45       ` Herbert Xu
  2007-11-14  3:55       ` David Miller
  0 siblings, 2 replies; 20+ messages in thread
From: Hideo AOKI @ 2007-11-14  3:27 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, netdev, Satoshi Oshima, Andi Kleen,
	Stephen Hemminger, Evgeniy Polyakov, yoshfuji, Yumiko Sugita

Hello,

Herbert Xu wrote:
> On Mon, Oct 29, 2007 at 05:23:10PM -0400, Hideo AOKI wrote:
>>  
>> +#define SK_DATAGRAM_MEM_QUANTUM ((int)PAGE_SIZE)
>> +
>> +static inline int sk_datagram_pages(int amt)
>> +{
>> +	return DIV_ROUND_UP(amt, SK_DATAGRAM_MEM_QUANTUM);
>> +}
> 
> Does this really have to be int? Unsigned would let the compiler
> optimise this to a simple shift.

Thank you for the comment.

This inline function is used to calculate the first argument of atomic_add()
and atomic_sub(). Since the argument is int, I believe that using int is
better than using unsigned int.

Best regards,
Hideo

--
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] accounting unit and variable
  2007-11-14  3:27     ` Hideo AOKI
@ 2007-11-14  3:45       ` Herbert Xu
  2007-11-14  3:55       ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: Herbert Xu @ 2007-11-14  3:45 UTC (permalink / raw)
  To: Hideo AOKI
  Cc: David Miller, netdev, Satoshi Oshima, Andi Kleen,
	Stephen Hemminger, Evgeniy Polyakov, yoshfuji, Yumiko Sugita

On Tue, Nov 13, 2007 at 10:27:13PM -0500, Hideo AOKI wrote:
> 
> Herbert Xu wrote:
> >> 
> >>+#define SK_DATAGRAM_MEM_QUANTUM ((int)PAGE_SIZE)
> >>+
> >>+static inline int sk_datagram_pages(int amt)
> >>+{
> >>+	return DIV_ROUND_UP(amt, SK_DATAGRAM_MEM_QUANTUM);
> >>+}
> >
> >Does this really have to be int? Unsigned would let the compiler
> >optimise this to a simple shift.
> 
> Thank you for the comment.
> 
> This inline function is used to calculate the first argument of atomic_add()
> and atomic_sub(). Since the argument is int, I believe that using int is
> better than using unsigned int.

That doesn't really answer my question.  Is this quantity ever
negative? If not you should make it unsigned for the reason I
gave above.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] accounting unit and variable
  2007-11-14  3:27     ` Hideo AOKI
  2007-11-14  3:45       ` Herbert Xu
@ 2007-11-14  3:55       ` David Miller
  2007-11-14 15:32         ` Hideo AOKI
  1 sibling, 1 reply; 20+ messages in thread
From: David Miller @ 2007-11-14  3:55 UTC (permalink / raw)
  To: haoki
  Cc: herbert, netdev, satoshi.oshima.fk, andi, shemminger, johnpol,
	yoshfuji, yumiko.sugita.yf

From: Hideo AOKI <haoki@redhat.com>
Date: Tue, 13 Nov 2007 22:27:13 -0500

> Herbert Xu wrote:
> > On Mon, Oct 29, 2007 at 05:23:10PM -0400, Hideo AOKI wrote:
> >>  
> >> +#define SK_DATAGRAM_MEM_QUANTUM ((int)PAGE_SIZE)
> >> +
> >> +static inline int sk_datagram_pages(int amt)
> >> +{
> >> +	return DIV_ROUND_UP(amt, SK_DATAGRAM_MEM_QUANTUM);
> >> +}
> > 
> > Does this really have to be int? Unsigned would let the compiler
> > optimise this to a simple shift.
> 
> Thank you for the comment.
> 
> This inline function is used to calculate the first argument of atomic_add()
> and atomic_sub(). Since the argument is int, I believe that using int is
> better than using unsigned int.

If you know the values will always be positive, as you will know here,
it is OK to us unsigned int here and avoids the unacceptable expensive
divide instruction.

Please fix this.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] accounting unit and variable
  2007-11-14  3:55       ` David Miller
@ 2007-11-14 15:32         ` Hideo AOKI
  2007-11-14 23:30           ` Hideo AOKI
  0 siblings, 1 reply; 20+ messages in thread
From: Hideo AOKI @ 2007-11-14 15:32 UTC (permalink / raw)
  To: David Miller, herbert
  Cc: netdev, satoshi.oshima.fk, andi, shemminger, johnpol, yoshfuji,
	yumiko.sugita.yf

David Miller wrote:
> From: Hideo AOKI <haoki@redhat.com>
> Date: Tue, 13 Nov 2007 22:27:13 -0500
> 
>> Herbert Xu wrote:
>>> On Mon, Oct 29, 2007 at 05:23:10PM -0400, Hideo AOKI wrote:
>>>>  
>>>> +#define SK_DATAGRAM_MEM_QUANTUM ((int)PAGE_SIZE)
>>>> +
>>>> +static inline int sk_datagram_pages(int amt)
>>>> +{
>>>> +	return DIV_ROUND_UP(amt, SK_DATAGRAM_MEM_QUANTUM);
>>>> +}
>>> Does this really have to be int? Unsigned would let the compiler
>>> optimise this to a simple shift.
>> Thank you for the comment.
>>
>> This inline function is used to calculate the first argument of atomic_add()
>> and atomic_sub(). Since the argument is int, I believe that using int is
>> better than using unsigned int.
> 
> If you know the values will always be positive, as you will know here,
> it is OK to us unsigned int here and avoids the unacceptable expensive
> divide instruction.
> 
> Please fix this.

Hello,

Thanks for your comments. I finally understood.

I'll fix it and resubmit the patch as soon as possible.

Regards,
Hideo

--
Hitachi Computer Products (America) Inc.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] accounting unit and variable
  2007-11-14 15:32         ` Hideo AOKI
@ 2007-11-14 23:30           ` Hideo AOKI
  2007-11-15  1:09             ` Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Hideo AOKI @ 2007-11-14 23:30 UTC (permalink / raw)
  To: David Miller, herbert
  Cc: Hideo AOKI, netdev, satoshi.oshima.fk, andi, shemminger, johnpol,
	yoshfuji, yumiko.sugita.yf

Hideo AOKI wrote:
> David Miller wrote:
>> From: Hideo AOKI <haoki@redhat.com>
>> Date: Tue, 13 Nov 2007 22:27:13 -0500
>>
>>> Herbert Xu wrote:
>>>> On Mon, Oct 29, 2007 at 05:23:10PM -0400, Hideo AOKI wrote:
>>>>>  
>>>>> +#define SK_DATAGRAM_MEM_QUANTUM ((int)PAGE_SIZE)
>>>>> +
>>>>> +static inline int sk_datagram_pages(int amt)
>>>>> +{
>>>>> +    return DIV_ROUND_UP(amt, SK_DATAGRAM_MEM_QUANTUM);
>>>>> +}
>>>> Does this really have to be int? Unsigned would let the compiler
>>>> optimise this to a simple shift.
>>> Thank you for the comment.
>>>
>>> This inline function is used to calculate the first argument of 
>>> atomic_add()
>>> and atomic_sub(). Since the argument is int, I believe that using int is
>>> better than using unsigned int.
>>
>> If you know the values will always be positive, as you will know here,
>> it is OK to us unsigned int here and avoids the unacceptable expensive
>> divide instruction.
>>
>> Please fix this.
> 
> Thanks for your comments. I finally understood.
> 
> I'll fix it and resubmit the patch as soon as possible.

Let me propose the following code to use a shift instruction instead
of a divide instruction. I confirmed that the code could remove a
divide instruction, however, I would like to have comment on this
implementation.

+#define SK_DATAGRAM_MEM_QUANTUM ((unsigned int)PAGE_SIZE)
+
+static inline int sk_datagram_pages(int amt)
+{
+	/* Cast to unsigned as an optimization, since amt is always positive. */
+	return DIV_ROUND_UP((unsigned int)amt, SK_DATAGRAM_MEM_QUANTUM);
+}
+

Please let me know if there is proper coding style.

I'm re-testing the whole patch set right now. If there is no problem,
I'll resubmit new patch set, which includes this fix, tomorrow.

Many thanks,
Hideo

-- 
Hitachi Computer Products (America) Inc.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] accounting unit and variable
  2007-11-14 23:30           ` Hideo AOKI
@ 2007-11-15  1:09             ` Herbert Xu
  2007-11-15 21:37               ` Hideo AOKI
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2007-11-15  1:09 UTC (permalink / raw)
  To: Hideo AOKI
  Cc: David Miller, netdev, satoshi.oshima.fk, andi, shemminger,
	johnpol, yoshfuji, yumiko.sugita.yf

On Wed, Nov 14, 2007 at 06:30:51PM -0500, Hideo AOKI wrote:
>
> +#define SK_DATAGRAM_MEM_QUANTUM ((unsigned int)PAGE_SIZE)
> +
> +static inline int sk_datagram_pages(int amt)
> +{
> +	/* Cast to unsigned as an optimization, since amt is always 
> positive. */
> +	return DIV_ROUND_UP((unsigned int)amt, SK_DATAGRAM_MEM_QUANTUM);
> +}
> +

Thanks, this looks OK to me.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] accounting unit and variable
  2007-11-15  1:09             ` Herbert Xu
@ 2007-11-15 21:37               ` Hideo AOKI
  0 siblings, 0 replies; 20+ messages in thread
From: Hideo AOKI @ 2007-11-15 21:37 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, netdev, satoshi.oshima.fk, andi, shemminger,
	johnpol, yoshfuji, yumiko.sugita.yf

Herbert Xu wrote:
> On Wed, Nov 14, 2007 at 06:30:51PM -0500, Hideo AOKI wrote:
>> +#define SK_DATAGRAM_MEM_QUANTUM ((unsigned int)PAGE_SIZE)
>> +
>> +static inline int sk_datagram_pages(int amt)
>> +{
>> +	/* Cast to unsigned as an optimization, since amt is always 
>> positive. */
>> +	return DIV_ROUND_UP((unsigned int)amt, SK_DATAGRAM_MEM_QUANTUM);
>> +}
>> +
> 
> Thanks, this looks OK to me.

Hello,

Thank you for reviewing. Then, I'll send take 8 patch set later.

Regards,
Hideo

-- 
Hitachi Computer Products (America) Inc.



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2007-11-15 21:42 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-29 21:18 [PATCH 0/5] UDP memory accounting and limitation (take 6) Hideo AOKI
2007-10-29 21:22 ` [PATCH 1/5] fix send buffer check Hideo AOKI
2007-11-09 12:24   ` Herbert Xu
2007-11-14  3:25     ` Hideo AOKI
2007-10-29 21:23 ` [PATCH 2/5] accounting unit and variable Hideo AOKI
2007-11-09 12:34   ` Herbert Xu
2007-11-14  3:27     ` Hideo AOKI
2007-11-14  3:45       ` Herbert Xu
2007-11-14  3:55       ` David Miller
2007-11-14 15:32         ` Hideo AOKI
2007-11-14 23:30           ` Hideo AOKI
2007-11-15  1:09             ` Herbert Xu
2007-11-15 21:37               ` Hideo AOKI
2007-10-29 21:23 ` [PATCH 3/5] memory accounting Hideo AOKI
2007-11-09 13:07   ` Herbert Xu
2007-10-29 21:23 ` [PATCH 4/5] memory limitation by using udp_mem Hideo AOKI
2007-10-29 21:23 ` [PATCH 5/5] introduce udp_rmem and udp_wmem Hideo AOKI
2007-10-30  4:52   ` Bill Fink
2007-11-02 15:42     ` Hideo AOKI
  -- strict thread matches above, loose matches on Subject: below --
2007-11-14  2:39 [PATCH 0/5] UDP memory accounting and limitation (take 7) Hideo AOKI
2007-11-14  2:47 ` [PATCH 1/5] fix send buffer check Hideo AOKI

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).