From: Mike Waychison <Michael.Waychison@Sun.COM>
To: "Lever, Charles" <Charles.Lever@netapp.com>
Cc: Olaf Kirch <okir@suse.de>, nfs@lists.sourceforge.net
Subject: [PATCH] xprt sharing (was Re: xprt_bindresvport)
Date: Wed, 08 Dec 2004 13:17:53 -0500
Message-ID: <41B74551.5040908@sun.com>
In-Reply-To: <482A3FA0050D21419C269D13989C61130435EC6F@lavender-fe.eng.netapp.com>

[-- Attachment #1: Type: text/plain, Size: 2244 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lever, Charles wrote:
>>the current xprt_bindresvport implementation will search for a
>>privileged port by counting down from 800 to 0. I think this is a bug,
>>because it will potentially interfere with services trying to bind to
>>low ports as well.
> 
> 
> is this idle speculation, or do you actually have a test case that
> fails?  :^)
> 

Well, I haven't seen this 'interfere' with services yet, but I can
imagine a service wanting to grab some port at a later time, only to
find it already in use by nfs.

> 
>>The bindresvport implementation in glibc picks from the 600-1023
>>range.
> 
> 
> we should review what other RPC implementations do (namely the reference
> implementation, Solaris).
> 
> but also notice this cuts the usable port range in half (from ~800 to
> ~420).  we need some form of mitigation to ensure we aren't limiting the
> number of NFS mounts a client can have.


This has been bugging me for a while: we are limiting ourselves to a
single nfs mount per port.  From what I can tell, Solaris shares the
transports between nfs mounts from the same server, and saves itself a
lot of trouble with running out of port numbers in doing so.

The attached patch does the same for Linux against 2.6.9.  We share
xprts between mounts whose connection parameters match, which
effectively removes the limit on the number of nfs mounts from any one
server.

The only thing left to worry about is userspace talking to the
portmapper or mountd over tcp, which leaves the reserved ports in
TIME_WAIT state.  This can limit the rate at which we can perform many
mounts.

- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBt0VQdQs4kOxk3/MRAjo8AKCJjoqPxk1R3ev+o+UyquE0kiw0LQCfebi8
6mfhMSHLidFslVyt6jFeNis=
=0uCI
-----END PGP SIGNATURE-----

[-- Attachment #2: xprt_sharing.diff --]
[-- Type: text/x-patch, Size: 5311 bytes --]

This patch allows for sharing of xprts.  This is done by keeping a list of
current xprts and passing them back to the caller of xprt_create_proto if they
match the specifications required (IP X port X protocol X timeout).

We do this multiplexing at the xprt layer as it handles transport creation and
destruction.

This patch has only been tested in a test environment, where it was able to
handle a couple hundred distinct nfs mounts from the same server over a single
tcp stream.

This effectively gets rid of the 800 nfs mounts max problem, as long as you
aren't mounting from many (800) nfs servers.
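
As a rough illustration of the matching and refcounting described above, here
is a minimal userspace sketch.  It is not the kernel code: the names
shared_xprt, timeout_params, xprt_get, xprt_put and xprt_live_count are
hypothetical stand-ins, locking is omitted (the patch serializes on a
semaphore), and a NULL timeout is not mapped to the protocol default the way
the patch does it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct rpc_timeout. */
struct timeout_params {
	unsigned long initval, maxval, increment;
	unsigned int retries;
	int exponential;
};

/* Hypothetical userspace stand-in for a shared struct rpc_xprt. */
struct shared_xprt {
	uint32_t addr;			/* peer IPv4 address */
	uint16_t port;			/* peer port */
	int proto;			/* transport protocol number */
	struct timeout_params to;
	int count;			/* refcount (plain int: no locking here) */
	struct shared_xprt *next;	/* link in the shared list */
};

static struct shared_xprt *shared_list;
static int live_xprts;			/* transports currently allocated */

static int same_timeout(const struct timeout_params *a,
			const struct timeout_params *b)
{
	return a->initval == b->initval && a->maxval == b->maxval &&
	       a->increment == b->increment && a->retries == b->retries &&
	       a->exponential == b->exponential;
}

/* Return a matching transport with its refcount bumped, or create one. */
struct shared_xprt *xprt_get(uint32_t addr, uint16_t port, int proto,
			     const struct timeout_params *to)
{
	struct shared_xprt *x;

	for (x = shared_list; x != NULL; x = x->next) {
		if (x->addr == addr && x->port == port && x->proto == proto &&
		    same_timeout(&x->to, to)) {
			x->count++;
			return x;
		}
	}

	/* No match: make a new one and put it on the shared list. */
	x = calloc(1, sizeof(*x));
	if (x == NULL)
		return NULL;
	x->addr = addr;
	x->port = port;
	x->proto = proto;
	x->to = *to;
	x->count = 1;
	x->next = shared_list;
	shared_list = x;
	live_xprts++;
	return x;
}

/* Drop one reference; returns 1 if this was the last one and x was freed. */
int xprt_put(struct shared_xprt *x)
{
	struct shared_xprt **pp;

	assert(x->count > 0);
	if (--x->count > 0)
		return 0;

	for (pp = &shared_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == x) {
			*pp = x->next;
			break;
		}
	}
	free(x);
	live_xprts--;
	return 1;
}

int xprt_live_count(void)
{
	return live_xprts;
}
```

In this sketch, two xprt_get() calls with the same address, port, protocol and
timeout hand back the same object, so any number of mounts from one server
consume a single transport.  The transport is only torn down when the last
holder calls xprt_put().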

Signed-off-by: Mike Waychison <michael.waychison@sun.com>
Index: linux-2.6.9-nfs_portsharing/include/linux/sunrpc/xprt.h
===================================================================
--- linux-2.6.9-nfs_portsharing.orig/include/linux/sunrpc/xprt.h	2004-10-18 14:54:40.000000000 -0700
+++ linux-2.6.9-nfs_portsharing/include/linux/sunrpc/xprt.h	2004-10-26 13:22:41.000000000 -0700
@@ -15,6 +15,8 @@
 #include <linux/sunrpc/sched.h>
 #include <linux/sunrpc/xdr.h>
 
+#include <asm/atomic.h>
+
 /*
  * The transport code maintains an estimate on the maximum number of out-
  * standing RPC requests, using a smoothed version of the congestion
@@ -194,6 +196,9 @@ struct rpc_xprt {
 	void			(*old_write_space)(struct sock *);
 
 	wait_queue_head_t	cong_wait;
+
+	atomic_t		count;		/* shared xprt refcount */
+	struct list_head	shared;		/* link to shared list */
 };
 
 #ifdef __KERNEL__
Index: linux-2.6.9-nfs_portsharing/net/sunrpc/xprt.c
===================================================================
--- linux-2.6.9-nfs_portsharing.orig/net/sunrpc/xprt.c	2004-10-18 14:54:39.000000000 -0700
+++ linux-2.6.9-nfs_portsharing/net/sunrpc/xprt.c	2004-10-26 15:27:56.713490488 -0700
@@ -78,6 +78,12 @@
 #define XPRT_MAX_RESVPORT	(800)
 
 /*
+ * List of shared xprts, protected by shared_xprt_sem
+ */
+static DECLARE_MUTEX(shared_xprt_sem);
+static LIST_HEAD(shared_xprt_list);
+
+/*
  * Local functions
  */
 static void	xprt_request_init(struct rpc_task *, struct rpc_xprt *);
@@ -1395,6 +1401,30 @@ xprt_release(struct rpc_task *task)
 }
 
 /*
+ * Compare two rpc_timeout to see if they are the same.
+ */
+static int
+xprt_is_same_timeout(struct rpc_timeout *left, struct rpc_timeout *right)
+{
+	return left->to_initval     == right->to_initval
+            && left->to_maxval      == right->to_maxval
+            && left->to_increment   == right->to_increment
+            && left->to_retries     == right->to_retries
+            && left->to_exponential == right->to_exponential;
+}
+/*
+ * Check to see if the timeout is the default timeout.
+ */
+static int
+xprt_is_default_timeout(struct rpc_timeout *to, int proto)
+{
+	struct rpc_timeout defaultto;
+
+	xprt_default_timeout(&defaultto, proto);
+	return xprt_is_same_timeout(&defaultto, to);
+}
+
+/*
  * Set default timeout parameters
  */
 void
@@ -1472,6 +1502,8 @@ xprt_setup(int proto, struct sockaddr_in
 	xprt->timer.data = (unsigned long) xprt;
 	xprt->last_used = jiffies;
 	xprt->port = XPRT_MAX_RESVPORT;
+	INIT_LIST_HEAD(&xprt->shared);
+	atomic_set(&xprt->count, 1);
 
 	/* Set timeout parameters */
 	if (to) {
@@ -1617,8 +1649,8 @@ failed:
 /*
  * Create an RPC client transport given the protocol and peer address.
  */
-struct rpc_xprt *
-xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to)
+static struct rpc_xprt *
+__xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to)
 {
 	struct rpc_xprt	*xprt;
 
@@ -1631,6 +1663,43 @@ xprt_create_proto(int proto, struct sock
 }
 
 /*
+ * Create an RPC client transport that is shared given the protocol and peer
+ * address.
+ */
+struct rpc_xprt *
+xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to)
+{
+	struct rpc_xprt *xprt;
+
+	down(&shared_xprt_sem);
+	/* walk the list and find an existing matching xprt */
+	list_for_each_entry(xprt, &shared_xprt_list, shared) {
+		/* Filter out mismatches */
+		if (sap->sin_addr.s_addr != xprt->addr.sin_addr.s_addr)
+			continue;
+		if (sap->sin_port != xprt->addr.sin_port)
+			continue;
+		if (xprt->prot != proto)
+			continue;
+		if (to == NULL && !xprt_is_default_timeout(&xprt->timeout, proto))
+			continue;
+		if (to && !xprt_is_same_timeout(&xprt->timeout, to))
+			continue;
+
+		atomic_inc(&xprt->count);
+		goto out;
+	}
+
+	/* make a new one */
+	xprt = __xprt_create_proto(proto, sap, to);
+	if (!IS_ERR(xprt))
+		list_add(&xprt->shared, &shared_xprt_list);
+out:
+	up(&shared_xprt_sem);
+	return xprt;
+}
+
+/*
  * Prepare for transport shutdown.
  */
 void
@@ -1658,8 +1727,8 @@ xprt_clear_backlog(struct rpc_xprt *xprt
 /*
  * Destroy an RPC transport, killing off all requests.
  */
-int
-xprt_destroy(struct rpc_xprt *xprt)
+static int
+__xprt_destroy(struct rpc_xprt *xprt)
 {
 	dprintk("RPC:      destroying transport %p\n", xprt);
 	xprt_shutdown(xprt);
@@ -1670,3 +1739,20 @@ xprt_destroy(struct rpc_xprt *xprt)
 
 	return 0;
 }
+
+/*
+ * Destroy a shared RPC transport.
+ * (XXX: what about the remaining live requests?)
+ */
+int
+xprt_destroy(struct rpc_xprt *xprt)
+{
+	int ret = 0;
+	down(&shared_xprt_sem);
+	if (atomic_dec_and_test(&xprt->count)) {
+		list_del_init(&xprt->shared);
+		ret = __xprt_destroy(xprt);
+	}
+	up(&shared_xprt_sem);
+	return ret;
+}
