netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Could we avoid touching dst->refcount in some cases ?
@ 2008-11-24  8:57 Eric Dumazet
  2008-11-24  9:42 ` Andi Kleen
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2008-11-24  8:57 UTC (permalink / raw)
  To: Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 484 bytes --]

tbench has hard time incrementing decrementing the route cache refcount
shared by all communications on localhost.

On real world, we also have this problem on RTP servers sending many UDP
frames to mediagateways, especially big ones handling thousand of streams.

Given that route entries are using RCU, we probably can avoid incrementing
their refcount in case of connected sockets ?

Here is a (untested and probably not working at all) patch on UDP part to
illustrate the idea :


[-- Attachment #2: avoid_touching_refcount.patch --]
[-- Type: text/plain, Size: 1271 bytes --]

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index da869ce..c385f13 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -553,6 +553,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int ulen = len;
 	struct ipcm_cookie ipc;
 	struct rtable *rt = NULL;
+	int rt_release = 0;
 	int free = 0;
 	int connected = 0;
 	__be32 daddr, faddr, saddr;
@@ -656,8 +657,9 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		connected = 0;
 	}
 
+	rcu_read_lock();
 	if (connected)
-		rt = (struct rtable*)sk_dst_check(sk, 0);
+		rt = (struct rtable *)__sk_dst_check(sk, 0);
 
 	if (rt == NULL) {
 		struct flowi fl = { .oif = ipc.oif,
@@ -681,11 +683,14 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		}
 
 		err = -EACCES;
+		rt_release = 1;
 		if ((rt->rt_flags & RTCF_BROADCAST) &&
 		    !sock_flag(sk, SOCK_BROADCAST))
 			goto out;
-		if (connected)
-			sk_dst_set(sk, dst_clone(&rt->u.dst));
+		if (connected) {
+			sk_dst_set(sk, &rt->u.dst);
+			rt_release = 0;
+		}
 	}
 
 	if (msg->msg_flags&MSG_CONFIRM)
@@ -730,7 +735,9 @@ do_append_data:
 	release_sock(sk);
 
 out:
-	ip_rt_put(rt);
+	if (rt_release)
+		ip_rt_put(rt);
+	rcu_read_unlock();
 	if (free)
 		kfree(ipc.opt);
 	if (!err)

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2008-12-18  6:17 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-24  8:57 [RFC] Could we avoid touching dst->refcount in some cases ? Eric Dumazet
2008-11-24  9:42 ` Andi Kleen
2008-11-24 10:14   ` Eric Dumazet
2008-11-24 11:24     ` [PATCH] net: avoid a pair of dst_hold()/dst_release() in ip_append_data() Eric Dumazet
2008-11-24 13:59       ` [PATCH] net: avoid a pair of dst_hold()/dst_release() in ip_push_pending_frames() Eric Dumazet
2008-11-25  0:07         ` David Miller
2008-11-24 23:55       ` [PATCH] net: avoid a pair of dst_hold()/dst_release() in ip_append_data() David Miller
2008-11-25  2:22         ` Andi Kleen
2008-11-24 11:27     ` [RFC] Could we avoid touching dst->refcount in some cases ? Andi Kleen
2008-11-24 23:36       ` David Miller
2008-11-24 23:39     ` David Miller
2008-11-25  4:43       ` Eric Dumazet
2008-11-25  5:00         ` David Miller
2008-11-26  0:00           ` [PATCH] net: release skb->dst in sock_queue_rcv_skb() Eric Dumazet
2008-11-26  0:23             ` David Miller
2008-11-26  2:04             ` David Miller
2008-11-26  7:39               ` Eric Dumazet
2008-11-26  9:08                 ` David Miller
2008-12-17 11:25             ` net-next: broken IP_PKTINFO [was Re: [PATCH] net: release skb->dst in sock_queue_rcv_skb()] Mark McLoughlin
2008-12-18  3:34               ` net-next: broken IP_PKTINFO David Miller
2008-12-18  5:59                 ` Eric Dumazet
2008-12-18  6:17                   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).