* RE: xprt_bindresvport
@ 2004-12-08 14:33 Lever, Charles
  2004-12-08 18:17 ` [PATCH] xprt sharing (was Re: xprt_bindresvport) Mike Waychison
  2004-12-09 11:01 ` xprt_bindresvport Olaf Kirch
  0 siblings, 2 replies; 19+ messages in thread
From: Lever, Charles @ 2004-12-08 14:33 UTC (permalink / raw)
  To: Olaf Kirch; +Cc: nfs

> the current xprt_bindresvport implementation will search for a privileged
> port by counting down from 800 to 0. I think this is a bug, because it
> will potentially interfere with services trying to bind to low ports as
> well.

is this idle speculation, or do you actually have a test case that
fails?  :^)

> The bindresvport implementation in glibc picks from the 600-1023
> range.

we should review what other RPC implementations do (namely the reference
implementation, Solaris).

but also notice this cuts the usable port range in half (from ~800 to
~420).  we need some form of mitigation to ensure we aren't limiting the
number of NFS mounts a client can have.

> I also think it would be good to start at a "random" port. Otherwise,
> when you reboot, the server may still have a TCB for the old connection
> and send you an ACK probe when you try to connect (if all goes well), and
> the client's TCP stack will RST and fail the connect. If things go
> not-so-well you have a packet filter somewhere in between that eats the
> ACK probe because its connection tracking engine thinks the connection
> is half-open and shouldn't see any SYN-less ACKs yet.

i don't agree.  you're just making the bad behavior more rare by
choosing ports at random; you are not actually addressing the root
problem you describe.  is it possible to fix the RPC client to retry the
connection in this case?  and/or fix the server to recognize and remove
the context for the old connection?

introducing randomness here will make reproducing problems in this area
very difficult.


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now. 
http://productguide.itmanagersjournal.com/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-08 14:33 xprt_bindresvport Lever, Charles
@ 2004-12-08 18:17 ` Mike Waychison
  2004-12-09 11:31   ` Olaf Kirch
  2004-12-09 11:01 ` xprt_bindresvport Olaf Kirch
  1 sibling, 1 reply; 19+ messages in thread
From: Mike Waychison @ 2004-12-08 18:17 UTC (permalink / raw)
  To: Lever, Charles; +Cc: Olaf Kirch, nfs

[-- Attachment #1: Type: text/plain, Size: 2244 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lever, Charles wrote:
>>the current xprt_bindresvport implementation will search for a privileged
>>port by counting down from 800 to 0. I think this is a bug, because it
>>will potentially interfere with services trying to bind to low ports as
>>well.
> 
> 
> is this idle speculation, or do you actually have a test case that
> fails?  :^)
> 

Well, I haven't seen this 'interfere' with services yet, but I can imagine
that it is plausible for a service to want to grab some port at a later
time only to find it in use by nfs.

> 
>>The bindresvport implementation in glibc picks from the 600-1023
>>range.
> 
> 
> we should review what other RPC implementations do (namely the reference
> implementation, Solaris).
> 
> but also notice this cuts the usable port range in half (from ~800 to
> ~420).  we need some form of mitigation to ensure we aren't limiting the
> number of NFS mounts a client can have.


This has been bugging me for a while.  The fact that we are limiting
ourselves to a single nfs mount per port.  From what I can tell, Solaris
shares the transports between nfs mounts from the same server and saves
themselves a lot of trouble with running out of port numbers in doing so.

The attached patch does the same for Linux against 2.6.9.  We share
xprts from existing connections, effectively removing any limit on the
number of nfs mounts we have in the system.

The only thing to worry about now is any talking to the portmapper or
mountd from userspace using tcp, which will put the reserved ports in
TIME_WAIT state.  This can limit the 'speed' at which we mount many mounts.

- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBt0VQdQs4kOxk3/MRAjo8AKCJjoqPxk1R3ev+o+UyquE0kiw0LQCfebi8
6mfhMSHLidFslVyt6jFeNis=
=0uCI
-----END PGP SIGNATURE-----

[-- Attachment #2: xprt_sharing.diff --]
[-- Type: text/x-patch, Size: 5311 bytes --]

This patch allows for sharing of xprts.  This is done by keeping a list of
current xprts and passing them back to the caller of xprt_create_proto if they
match the specifications required (IP X port X protocol X timeout).

We do this multiplexing at the xprt layer as it handles transport creation and
destruction.

This patch has only been tested in a test environment, where it was able to
handle a couple hundred distinct nfs mounts from the same server over a single
tcp stream.

This effectively gets rid of the 800 nfs mounts max problem, as long as you
aren't mounting from many (800) nfs servers.

Signed-off-by: Mike Waychison <michael.waychison@sun.com>
Index: linux-2.6.9-nfs_portsharing/include/linux/sunrpc/xprt.h
===================================================================
--- linux-2.6.9-nfs_portsharing.orig/include/linux/sunrpc/xprt.h	2004-10-18 14:54:40.000000000 -0700
+++ linux-2.6.9-nfs_portsharing/include/linux/sunrpc/xprt.h	2004-10-26 13:22:41.000000000 -0700
@@ -15,6 +15,8 @@
 #include <linux/sunrpc/sched.h>
 #include <linux/sunrpc/xdr.h>
 
+#include <asm/atomic.h>
+
 /*
  * The transport code maintains an estimate on the maximum number of out-
  * standing RPC requests, using a smoothed version of the congestion
@@ -194,6 +196,9 @@ struct rpc_xprt {
 	void			(*old_write_space)(struct sock *);
 
 	wait_queue_head_t	cong_wait;
+
+	atomic_t		count;		/* shared xprt refcount */
+	struct list_head	shared;		/* link to shared list */
 };
 
 #ifdef __KERNEL__
Index: linux-2.6.9-nfs_portsharing/net/sunrpc/xprt.c
===================================================================
--- linux-2.6.9-nfs_portsharing.orig/net/sunrpc/xprt.c	2004-10-18 14:54:39.000000000 -0700
+++ linux-2.6.9-nfs_portsharing/net/sunrpc/xprt.c	2004-10-26 15:27:56.713490488 -0700
@@ -78,6 +78,12 @@
 #define XPRT_MAX_RESVPORT	(800)
 
 /*
+ * List of shared xprt
+ */
+static DECLARE_MUTEX(shared_xprt_sem);
+static LIST_HEAD(shared_xprt_list);
+
+/*
  * Local functions
  */
 static void	xprt_request_init(struct rpc_task *, struct rpc_xprt *);
@@ -1395,6 +1401,30 @@ xprt_release(struct rpc_task *task)
 }
 
 /*
+ * Compare two rpc_timeout to see if they are the same.
+ */
+static int
+xprt_is_same_timeout(struct rpc_timeout *left, struct rpc_timeout *right)
+{
+	return left->to_initval     == right->to_initval
+            && left->to_maxval      == right->to_maxval
+            && left->to_increment   == right->to_increment
+            && left->to_retries     == right->to_retries
+            && left->to_exponential == right->to_exponential;
+}
+/*
+ * Check to see if the timeout is the default timeout.
+ */
+static int
+xprt_is_default_timeout(struct rpc_timeout *to, int proto)
+{
+	struct rpc_timeout defaultto;
+
+	xprt_default_timeout(&defaultto, proto);
+	return xprt_is_same_timeout(&defaultto, to);
+}
+
+/*
  * Set default timeout parameters
  */
 void
@@ -1472,6 +1502,8 @@ xprt_setup(int proto, struct sockaddr_in
 	xprt->timer.data = (unsigned long) xprt;
 	xprt->last_used = jiffies;
 	xprt->port = XPRT_MAX_RESVPORT;
+	INIT_LIST_HEAD(&xprt->shared);
+	atomic_set(&xprt->count, 1);
 
 	/* Set timeout parameters */
 	if (to) {
@@ -1617,8 +1649,8 @@ failed:
 /*
  * Create an RPC client transport given the protocol and peer address.
  */
-struct rpc_xprt *
-xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to)
+static struct rpc_xprt *
+__xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to)
 {
 	struct rpc_xprt	*xprt;
 
@@ -1631,6 +1663,43 @@ xprt_create_proto(int proto, struct sock
 }
 
 /*
+ * Create an RPC client transport that is shared given the protocol and peer
+ * address.
+ */
+struct rpc_xprt *
+xprt_create_proto(int proto, struct sockaddr_in *sap, struct rpc_timeout *to)
+{
+	struct rpc_xprt *xprt;
+
+	down(&shared_xprt_sem);
+	/* walk the list and find an existing matching xprt */
+	list_for_each_entry(xprt, &shared_xprt_list, shared) {
+		/* Filter out mismatches */
+		if (sap->sin_addr.s_addr != xprt->addr.sin_addr.s_addr)
+			continue;
+		if (sap->sin_port != xprt->addr.sin_port)
+			continue;
+		if (xprt->prot != proto)
+			continue;
+		if (to == NULL && !xprt_is_default_timeout(&xprt->timeout, proto))
+			continue;
+		if (to && !xprt_is_same_timeout(&xprt->timeout, to))
+			continue;
+
+		atomic_inc(&xprt->count);
+		goto out;
+	}
+
+	/* make a new one */
+	xprt = __xprt_create_proto(proto, sap, to);
+	if (!IS_ERR(xprt))
+		list_add(&xprt->shared, &shared_xprt_list);
+out:
+	up(&shared_xprt_sem);
+	return xprt;
+}
+
+/*
  * Prepare for transport shutdown.
  */
 void
@@ -1658,8 +1727,8 @@ xprt_clear_backlog(struct rpc_xprt *xprt
 /*
  * Destroy an RPC transport, killing off all requests.
  */
-int
-xprt_destroy(struct rpc_xprt *xprt)
+static int
+__xprt_destroy(struct rpc_xprt *xprt)
 {
 	dprintk("RPC:      destroying transport %p\n", xprt);
 	xprt_shutdown(xprt);
@@ -1670,3 +1739,20 @@ xprt_destroy(struct rpc_xprt *xprt)
 
 	return 0;
 }
+
+/*
+ * Destroy a shared RPC transport.
+ * (XXX: what about the remaining live requests?)
+ */
+int
+xprt_destroy(struct rpc_xprt *xprt)
+{
+	int ret = 0;
+	down(&shared_xprt_sem);
+	if (atomic_dec_and_test(&xprt->count)) {
+		list_del_init(&xprt->shared);
+		ret = __xprt_destroy(xprt);
+	}
+	up(&shared_xprt_sem);
+	return ret;
+}
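
The sharing logic in the patch above can be sketched in userspace as a lookup-or-create with a reference count. This is only an illustration under stated assumptions: `struct shared_xprt`, `xprt_get` and `xprt_put` are hypothetical names, the timeout comparison is left out, and the locking that the patch does with `shared_xprt_sem` is omitted because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct rpc_xprt. */
struct shared_xprt {
	uint32_t addr;            /* server IPv4 address */
	uint16_t port;            /* server port */
	int proto;                /* transport protocol number */
	int refcount;             /* how many users share this transport */
	struct shared_xprt *next; /* link in the shared list */
};

/* The patch protects its list with a semaphore; omitted in this sketch. */
static struct shared_xprt *xprt_list;

/* Return a matching transport with its refcount bumped, or create one. */
static struct shared_xprt *xprt_get(uint32_t addr, uint16_t port, int proto)
{
	struct shared_xprt *x;

	for (x = xprt_list; x; x = x->next) {
		if (x->addr == addr && x->port == port && x->proto == proto) {
			x->refcount++;
			return x;
		}
	}

	x = calloc(1, sizeof(*x));
	if (!x)
		return NULL;
	x->addr = addr;
	x->port = port;
	x->proto = proto;
	x->refcount = 1;
	x->next = xprt_list;
	xprt_list = x;
	return x;
}

/* Drop one reference; unlink and free on the last put. Returns 1 if freed. */
static int xprt_put(struct shared_xprt *x)
{
	struct shared_xprt **pp;

	assert(x && x->refcount > 0);
	if (--x->refcount > 0)
		return 0;

	for (pp = &xprt_list; *pp; pp = &(*pp)->next) {
		if (*pp == x) {
			*pp = x->next;
			break;
		}
	}
	free(x);
	return 1;
}
```

Two calls to `xprt_get` with the same (address, port, protocol) triple return the same object, so a second mount from the same server would consume no additional local port in this model.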

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH] xprt sharing (was Re: xprt_bindresvport)
@ 2004-12-08 19:08 Lever, Charles
  2004-12-08 21:58 ` Mike Waychison
  2004-12-09 11:22 ` Olaf Kirch
  0 siblings, 2 replies; 19+ messages in thread
From: Lever, Charles @ 2004-12-08 19:08 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Olaf Kirch, nfs

> > but also notice this cuts the usable port range in half (from ~800 to
> > ~420).  we need some form of mitigation to ensure we aren't limiting the
> > number of NFS mounts a client can have.
>
> This has been bugging me for a while.  The fact that we are limiting
> ourselves to a single nfs mount per port.  From what I can tell, Solaris
> shares the transports between nfs mounts from the same server and saves
> themselves a lot of trouble with running out of port numbers in doing so.
>
> The attached patch does the same for Linux against 2.6.9.  We share
> xprts from existing connections, effectively removing any limit on the
> number of nfs mounts we have in the system.
>
> The only thing to worry about now is any talking to the portmapper or
> mountd from userspace using tcp, which will put the reserved ports in
> TIME_WAIT state.  This can limit the 'speed' at which we mount many mounts.

we're looking at a similar solution.  we want to make sure we don't
limit the scalability of everyone's mount point by making them all
funnel through a single slot table.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-08 19:08 [PATCH] xprt sharing (was Re: xprt_bindresvport) Lever, Charles
@ 2004-12-08 21:58 ` Mike Waychison
  2004-12-09 11:22 ` Olaf Kirch
  1 sibling, 0 replies; 19+ messages in thread
From: Mike Waychison @ 2004-12-08 21:58 UTC (permalink / raw)
  To: Lever, Charles; +Cc: Olaf Kirch, nfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lever, Charles wrote:
>>>but also notice this cuts the usable port range in half (from ~800 to
>>>~420).  we need some form of mitigation to ensure we aren't limiting the
>>>number of NFS mounts a client can have.
>>
>>This has been bugging me for a while.  The fact that we are limiting
>>ourselves to a single nfs mount per port.  From what I can tell, Solaris
>>shares the transports between nfs mounts from the same server and saves
>>themselves a lot of trouble with running out of port numbers in doing so.
>>
>>The attached patch does the same for Linux against 2.6.9.  We share
>>xprts from existing connections, effectively removing any limit on the
>>number of nfs mounts we have in the system.
>>
>>The only thing to worry about now is any talking to the portmapper or
>>mountd from userspace using tcp, which will put the reserved ports in
>>TIME_WAIT state.  This can limit the 'speed' at which we mount many mounts.
> 
> 
> we're looking at a similar solution.  we want to make sure we don't
> limit the scalability of everyone's mount point by making them all
> funnel through a single slot table.
> 

Can you post any work in progress for this?   The xprt patch I posted
was written a while ago, and I just realized this afternoon that it
doesn't seem to do the right thing for tcp sockets that are autoclosed.

If you have a similar patch that works, it would save me the trouble ;)


- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBt3jodQs4kOxk3/MRArXNAKCaahtv7uNfhX2n2yaz/N3D18t0vgCfSOLa
rOARY+qtJrFfWOtb0m18cSk=
=HlXX
-----END PGP SIGNATURE-----



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH] xprt sharing (was Re: xprt_bindresvport)
@ 2004-12-08 22:00 Lever, Charles
  0 siblings, 0 replies; 19+ messages in thread
From: Lever, Charles @ 2004-12-08 22:00 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Olaf Kirch, nfs

i don't have a patch like this, but trond may have something.

> -----Original Message-----
> From: Mike Waychison [mailto:Michael.Waychison@Sun.COM]
> Sent: Wednesday, December 08, 2004 4:58 PM
> To: Lever, Charles
> Cc: Olaf Kirch; nfs@lists.sourceforge.net
> Subject: Re: [PATCH] xprt sharing (was Re: [NFS] xprt_bindresvport)
>
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Lever, Charles wrote:
> >>>but also notice this cuts the usable port range in half (from ~800 to
> >>>~420).  we need some form of mitigation to ensure we aren't limiting the
> >>>number of NFS mounts a client can have.
> >>
> >>This has been bugging me for a while.  The fact that we are limiting
> >>ourselves to a single nfs mount per port.  From what I can tell, Solaris
> >>shares the transports between nfs mounts from the same server and saves
> >>themselves a lot of trouble with running out of port numbers in doing so.
> >>
> >>The attached patch does the same for Linux against 2.6.9.  We share
> >>xprts from existing connections, effectively removing any limit on the
> >>number of nfs mounts we have in the system.
> >>
> >>The only thing to worry about now is any talking to the portmapper or
> >>mountd from userspace using tcp, which will put the reserved ports in
> >>TIME_WAIT state.  This can limit the 'speed' at which we mount many mounts.
> >
> >
> > we're looking at a similar solution.  we want to make sure we don't
> > limit the scalability of everyone's mount point by making them all
> > funnel through a single slot table.
> >
>
> Can you post any work in progress for this?   The xprt patch I posted
> was written a while ago, and I just realized this afternoon that it
> doesn't seem to do the right thing for tcp sockets that are autoclosed.
>
> If you have a similar patch that works, it would save me the trouble ;)
>
>
> - --
> Mike Waychison
> Sun Microsystems, Inc.
> 1 (650) 352-5299 voice
> 1 (416) 202-8336 voice
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> NOTICE:  The opinions expressed in this email are held by me,
> and may not represent the views of Sun Microsystems, Inc.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.5 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>
> iD8DBQFBt3jodQs4kOxk3/MRArXNAKCaahtv7uNfhX2n2yaz/N3D18t0vgCfSOLa
> rOARY+qtJrFfWOtb0m18cSk=
> =HlXX
> -----END PGP SIGNATURE-----
>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
@ 2004-12-09  8:54 Peter Åstrand
  2004-12-09 11:14 ` Olaf Kirch
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Åstrand @ 2004-12-09  8:54 UTC (permalink / raw)
  To: nfs

On Wed, 8 Dec 2004, Mike Waychison wrote:

> This has been bugging me for a while.  The fact that we are limiting
> ourselves to a single nfs mount per port.  From what I can tell, Solaris
> shares the transports between nfs mounts from the same server and saves
> themselves a lot of trouble with running out of port numbers in doing so.
> 
> The attached patch does the same for Linux against 2.6.9.  We share
> xprts from existing connections, effectively removing any limit on the
> number of nfs mounts we have in the system.

Good work! 


> The only thing to worry about now is any talking to the portmapper or
> mountd from userspace using tcp, which will put the reserved ports in
> TIME_WAIT state.  This can limit the 'speed' at which we mount many
> mounts.

How about using SO_REUSEADDR?

-- 
Peter Åstrand		Chief Developer
Cendio			www.thinlinc.com
Teknikringen 3		www.cendio.se
583 30 Linköping        Phone: +46-13-21 46 00





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xprt_bindresvport
  2004-12-08 14:33 xprt_bindresvport Lever, Charles
  2004-12-08 18:17 ` [PATCH] xprt sharing (was Re: xprt_bindresvport) Mike Waychison
@ 2004-12-09 11:01 ` Olaf Kirch
  1 sibling, 0 replies; 19+ messages in thread
From: Olaf Kirch @ 2004-12-09 11:01 UTC (permalink / raw)
  To: Lever, Charles; +Cc: nfs

On Wed, Dec 08, 2004 at 06:33:15AM -0800, Lever, Charles wrote:
> > port by counting down from 800 to 0. I think this is a bug, because it
> > will potentially interfere with services trying to bind to low ports as
> > well.
> 
> is this idle speculation, or do you actually have a test case that
> fails?  :^)

I've seen many cases of glibc's bindresvport (which allocates from
600-1023) snatching ports used by other services, e.g. cups. The way
we solve this in user space is by having a file in /etc with a blacklist
of ports bindresvport isn't allowed to touch.

I admit that I haven't seen NFS step on someone else's toes yet. But it
all depends on the number of mounts you have, and on coincidence.

At any rate, it really looks like a bug to me. Why should our scan include
ports in the low range (which is a no-no) but refuse to touch ports in the
800-1023 range? It's just an inverted test, IMO.

> i don't agree.  you're just making the bad behavior more rare by
> choosing ports at random; you are not actually addressing the root
> problem you describe.  is it possible to fix the RPC client to retry the
> connection in this case?  and/or fix the server to recognize and remove
> the context for the old connection?

You cannot fix the RPC client. It will not see any packets from the
server, so it simply sits there until it times out and retries the
connection. This can cause system boot to take quite a long time.

You can fix the firewall; this is currently being tossed around on the
netfilter list. Unfortunately not all routers out there run the latest and
greatest netfilter code, so we don't really have much control over that.

> introducing randomness here will make reproducing problems in this area
> very difficult.

I agree. And I would only enable randomization if we change the port
range to 600-up or something like this.
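
To make the port-range discussion concrete, here is a minimal userspace sketch of a downward scan confined to 600-1023 with wraparound. It is not the kernel's xprt_bindresvport: the names `RESV_MIN`, `RESV_MAX`, `pick_resvport` and the two `demo_*` predicates are hypothetical, and the lower bound of 600 is simply borrowed from the glibc range mentioned above. The starting port is a parameter, so the caller can pass a fixed value (reproducible) or a random one.

```c
#include <assert.h>
#include <stdbool.h>

#define RESV_MIN   600   /* assumed lower bound, as in glibc's bindresvport */
#define RESV_MAX   1023  /* highest privileged port */
#define RESV_COUNT (RESV_MAX - RESV_MIN + 1)

/* Hypothetical callback: returns true if the port could be bound. */
typedef bool (*port_free_fn)(int port);

/*
 * Scan downward from `start`, wrapping from RESV_MIN back to RESV_MAX.
 * Returns the first free port, or -1 if every port in the range is taken.
 */
static int pick_resvport(int start, port_free_fn is_free)
{
	int port = start;
	int tried;

	assert(start >= RESV_MIN && start <= RESV_MAX);
	for (tried = 0; tried < RESV_COUNT; tried++) {
		if (is_free(port))
			return port;
		port = (port == RESV_MIN) ? RESV_MAX : port - 1;
	}
	return -1;
}

/* Demo predicates standing in for a real bind() attempt. */
static bool demo_above_900_free(int port) { return port > 900; }
static bool demo_none_free(int port)      { (void)port; return false; }
```

With these bounds the range holds 1023 - 600 + 1 = 424 ports, which matches the "~420" figure quoted earlier in the thread.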

Olaf
-- 
Olaf Kirch     | Things that make Monday morning interesting, #2:
okir@suse.de   |        "We have 8,000 NFS mount points, why do we keep
---------------+ 	 running out of privileged ports?"



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-09  8:54 Peter Åstrand
@ 2004-12-09 11:14 ` Olaf Kirch
  0 siblings, 0 replies; 19+ messages in thread
From: Olaf Kirch @ 2004-12-09 11:14 UTC (permalink / raw)
  To: Peter Åstrand; +Cc: nfs

On Thu, Dec 09, 2004 at 09:54:58AM +0100, Peter Åstrand wrote:
> > The only thing to worry about now is any talking to the portmapper or
> > mountd from userspace using tcp, which will put the reserved ports in
> > TIME_WAIT state.  This can limit the 'speed' at which we mount many
> > mounts.
>
> How about using SO_REUSEADDR?

You cannot use it safely on active (i.e. client) sockets. Consider this:

Application A:
	set SO_REUSEADDR
	bind to port 1234
	connect to server foo, port 2049

Application B:
	set SO_REUSEADDR
	bind to port 1234 (succeeds because of REUSEADDR)
	connect to server foo, port 2049: fails with EADDRNOTAVAIL,
		because there already is a connection from
		client:1234 -> foo:2049
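
The scenario above can be tried on the loopback interface with a small userspace program. This is a sketch under assumptions, not a definitive test: it presumes Linux-style SO_REUSEADDR semantics (other systems may already refuse the second bind), it uses an ephemeral port instead of 1234 so it needs no privileges, and the helper names `bound_socket`, `local_port` and `second_connect_errno` are made up for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* TCP socket with SO_REUSEADDR, bound to 127.0.0.1:port (0 = any port). */
static int bound_socket(uint16_t port)
{
	struct sockaddr_in sa;
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sa.sin_port = htons(port);
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
	    bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Local port of a bound socket in host byte order, or 0 on error. */
static uint16_t local_port(int fd)
{
	struct sockaddr_in sa;
	socklen_t len = sizeof(sa);

	assert(fd >= 0);
	if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
		return 0;
	return ntohs(sa.sin_port);
}

/*
 * Returns the errno of application B's connect() (0 if it unexpectedly
 * succeeds), or -1 if the setup could not be completed on this system.
 */
static int second_connect_errno(void)
{
	struct sockaddr_in dst;
	int result = -1;
	int srv = bound_socket(0), a = -1, b = -1;

	if (srv < 0 || listen(srv, 4) < 0)
		goto out;
	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	dst.sin_port = htons(local_port(srv));

	a = bound_socket(0);                     /* application A */
	if (a < 0 || connect(a, (struct sockaddr *)&dst, sizeof(dst)) < 0)
		goto out;

	b = bound_socket(local_port(a));         /* application B, same port */
	if (b < 0)
		goto out;                        /* bind refused on this system */
	result = connect(b, (struct sockaddr *)&dst, sizeof(dst)) == 0 ? 0 : errno;
out:
	if (b >= 0)
		close(b);
	if (a >= 0)
		close(a);
	if (srv >= 0)
		close(srv);
	return result;
}
```

If the system behaves as described in the message, `second_connect_errno()` should report EADDRNOTAVAIL: the bind succeeds, but the connect collides with the existing client:port -> server:port tuple.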

Olaf
-- 
Olaf Kirch     | Things that make Monday morning interesting, #2:
okir@suse.de   |        "We have 8,000 NFS mount points, why do we keep
---------------+ 	 running out of privileged ports?"



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-08 19:08 [PATCH] xprt sharing (was Re: xprt_bindresvport) Lever, Charles
  2004-12-08 21:58 ` Mike Waychison
@ 2004-12-09 11:22 ` Olaf Kirch
  2004-12-09 13:33   ` Trond Myklebust
  1 sibling, 1 reply; 19+ messages in thread
From: Olaf Kirch @ 2004-12-09 11:22 UTC (permalink / raw)
  To: Lever, Charles; +Cc: Mike Waychison, nfs

On Wed, Dec 08, 2004 at 11:08:18AM -0800, Lever, Charles wrote:
> we're looking at a similar solution.  we want to make sure we don't
> limit the scalability of everyone's mount point by making them all
> funnel through a single slot table.

I think you will hit all sorts of limits on the way if you do this,
not just the sunrpc locks.  Using a single TCP connection means you
will have to serialize all sendmsg() calls across all clients, because
otherwise you'll mess up the RPC record framing.
You may also run into the max send/recv buffer sizes of a socket.

I cannot see how this can scale very well.
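
For readers unfamiliar with the framing being referred to: RPC over TCP prefixes each record fragment with a 4-byte big-endian marker whose top bit flags the last fragment and whose low 31 bits give the fragment length. The sketch below encodes and decodes that marker; `rm_frame` and `rm_parse` are hypothetical helper names, not sunrpc functions. Because the receiver trusts the length and reads exactly that many bytes next, bytes from two senders interleaved inside one fragment would be misparsed, which is why the sends have to be serialized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RM_LAST_FRAG 0x80000000u  /* top bit: this is the final fragment */

/*
 * Write marker + payload into `out`. Returns the number of bytes written,
 * or 0 if `out` (capacity `cap`) is too small.
 */
static size_t rm_frame(uint8_t *out, size_t cap,
		       const void *payload, uint32_t len, int last)
{
	uint32_t mark;

	assert(len < RM_LAST_FRAG);   /* length must fit in 31 bits */
	if (cap < (size_t)len + 4)
		return 0;

	mark = len | (last ? RM_LAST_FRAG : 0);
	out[0] = (uint8_t)(mark >> 24);
	out[1] = (uint8_t)(mark >> 16);
	out[2] = (uint8_t)(mark >> 8);
	out[3] = (uint8_t)mark;
	memcpy(out + 4, payload, len);
	return (size_t)len + 4;
}

/* Decode a 4-byte marker: returns the fragment length, sets *last. */
static uint32_t rm_parse(const uint8_t *hdr, int *last)
{
	uint32_t mark = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
			((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];

	*last = (mark & RM_LAST_FRAG) != 0;
	return mark & ~RM_LAST_FRAG;
}
```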

Olaf
-- 
Olaf Kirch     | Things that make Monday morning interesting, #2:
okir@suse.de   |        "We have 8,000 NFS mount points, why do we keep
---------------+ 	 running out of privileged ports?"



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-08 18:17 ` [PATCH] xprt sharing (was Re: xprt_bindresvport) Mike Waychison
@ 2004-12-09 11:31   ` Olaf Kirch
  2004-12-09 13:36     ` Trond Myklebust
  2004-12-09 19:34     ` Dan Stromberg
  0 siblings, 2 replies; 19+ messages in thread
From: Olaf Kirch @ 2004-12-09 11:31 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Lever, Charles, nfs

On Wed, Dec 08, 2004 at 01:17:53PM -0500, Mike Waychison wrote:
> This has been bugging me for a while.  The fact that we are limiting
> ourselves to a single nfs mount per port.  From what I can tell, Solaris
> shares the transports between nfs mounts from the same server and saves
> themselves a lot of trouble with running out of port numbers in doing so.

Shouldn't we allow NFS mounts to use non-privileged ports? Many
environments don't really care about the "security" provided by privileged
ports, but would be more than happy if they can run with a few hundred
NFS mounts.

Olaf
-- 
Olaf Kirch     | Things that make Monday morning interesting, #2:
okir@suse.de   |        "We have 8,000 NFS mount points, why do we keep
---------------+ 	 running out of privileged ports?"



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-09 11:22 ` Olaf Kirch
@ 2004-12-09 13:33   ` Trond Myklebust
  2004-12-09 13:41     ` Olaf Kirch
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2004-12-09 13:33 UTC (permalink / raw)
  To: Olaf Kirch; +Cc: Charles Lever, Mike Waychison, nfs

On Thu, 09.12.2004 at 12:22 (+0100), Olaf Kirch wrote:

> I think you will hit all sorts of limits on the way if you do this,
> not just the sunrpc locks.  Using a single TCP connection means you
> will have to serialize all sendmsg() calls across all clients, because
> otherwise you'll mess up the RPC record framing.
> You may also run into the max send/recv buffer sizes of a socket.
> 
> I cannot see how this can scale very well.

As long as it scales better than 1 privileged port per mountpoint. ;-)

Seriously, though: we *already* have this serialization problem with the
single client per transport case, and so there is nothing that needs to
be added to the locking in order to deal with multiple clients per
transport. IOW contention today is at the per-request level and it would
have to remain so for the shared transport case.

Note also that we could also create pools of several transport sockets
per server: the current locking scheme allows for that too. That would
improve per-request scalability at the same time as it allows us to
limit the privileged port usage. There be a couple of small dragons
there (unless you are running on NFSv4.1 w/ sessions, you cannot replay
a request on a different port for instance) but such a scheme does not
have to be too sophisticated to be useful.
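
A pool of this kind could be as simple as round-robin selection with each request pinned to the socket it was first sent on, so that a retransmission never moves to a different port. The sketch below is only an illustration of that idea: `struct xprt_pool`, `struct rpc_req`, `pool_pick` and the pool size of 4 are all assumptions, not existing sunrpc structures.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4  /* assumed number of sockets kept per server */

/* Hypothetical pool: one local privileged port per pooled socket. */
struct xprt_pool {
	int local_port[POOL_SIZE];
	unsigned next;             /* round-robin cursor */
};

/* Hypothetical request: remembers which pooled socket it was sent on. */
struct rpc_req {
	unsigned xid;
	int slot;                  /* -1 until the first transmission */
};

/*
 * Choose the socket for a request. A new request takes the next slot in
 * round-robin order; a retransmission reuses the slot it was pinned to.
 */
static int pool_pick(struct xprt_pool *pool, struct rpc_req *req)
{
	assert(pool != NULL && req != NULL);
	if (req->slot < 0) {
		req->slot = (int)(pool->next % POOL_SIZE);
		pool->next++;
	}
	return req->slot;
}
```

In this model the privileged-port cost per server is bounded by `POOL_SIZE` regardless of how many mounts share the pool, while different requests can still proceed on different sockets.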

Cheers,
  Trond

-- 
Trond Myklebust <trond.myklebust@fys.uio.no>




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-09 11:31   ` Olaf Kirch
@ 2004-12-09 13:36     ` Trond Myklebust
  2004-12-09 13:44       ` Olaf Kirch
  2004-12-09 19:34     ` Dan Stromberg
  1 sibling, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2004-12-09 13:36 UTC (permalink / raw)
  To: Olaf Kirch; +Cc: Mike Waychison, Charles Lever, nfs

On Thu, 09.12.2004 at 12:31 (+0100), Olaf Kirch wrote:

> Shouldn't we allow NFS mounts to use non-privileged ports? Many
> environments don't really care about the "security" provided by privileged
> ports, but would be more than happy if they can run with a few hundred
> NFS mounts

Most AUTH_SYS based models still require it; however, I agree that use of
strong security makes the privileged port totally redundant.

Cheers,
  Trond
-- 
Trond Myklebust <trond.myklebust@fys.uio.no>




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-09 13:33   ` Trond Myklebust
@ 2004-12-09 13:41     ` Olaf Kirch
  0 siblings, 0 replies; 19+ messages in thread
From: Olaf Kirch @ 2004-12-09 13:41 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Charles Lever, Mike Waychison, nfs

On Thu, Dec 09, 2004 at 08:33:32AM -0500, Trond Myklebust wrote:
> Seriously, though: we *already* have this serialization problem with the
> single client per transport case, and so there is nothing that needs to
> be added to the locking in order to deal with multiple clients per
> transport. IOW contention today is at the per-request level and it would
> have to remain so for the shared transport case.

But two separate mounts with separate sockets do not serialize (at least
they shouldn't). And contention doesn't happen on the client only.
The server needs to serialize sending over TCP as well; the more
sockets you have the less likely it will step on its own toes.

> Note also that we could also create pools of several transport sockets
> per server: the current locking scheme allows for that too. That would
> improve per-request scalability at the same time as it allows us to
> limit the privileged port usage.

Yes, that would help scalability a lot.
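
A minimal sketch of the pooling idea, assuming a plain round-robin
policy. The class name TransportPool, the pool size, and the string
labels standing in for sockets are all hypothetical; nothing here is
taken from the RPC client code.

```python
import itertools


class TransportPool:
    """Hypothetical pool of transports to one server, handed out round-robin."""

    def __init__(self, server, size):
        # Each entry stands in for one connected socket; here it is just a label.
        self.transports = [f"{server}#{i}" for i in range(size)]
        self._next = itertools.cycle(self.transports)

    def pick(self):
        # Spread requests over the pool so they do not all queue on one socket.
        return next(self._next)


pool = TransportPool("server-a", size=4)
picked = [pool.pick() for _ in range(8)]
```

With a pool of four, eight consecutive requests use each transport
twice, and the client holds four ports for that server no matter how
many mounts point at it.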

Olaf
-- 
Olaf Kirch     | Things that make Monday morning interesting, #2:
okir@suse.de   |        "We have 8,000 NFS mount points, why do we keep
---------------+ 	 running out of privileged ports?"



* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-09 13:36     ` Trond Myklebust
@ 2004-12-09 13:44       ` Olaf Kirch
  2004-12-09 16:20         ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: Olaf Kirch @ 2004-12-09 13:44 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Mike Waychison, Charles Lever, nfs

On Thu, Dec 09, 2004 at 08:36:55AM -0500, Trond Myklebust wrote:
> > Shouldn't we allow NFS mounts to use non-privileged ports? Many
> > environments don't really care about the "security" provided by privileged
> > ports, but would be more than happy if they can run with a few hundred
> > NFS mounts
> 
> Most AUTH_SYS based models still require it; however, I agree that use of
> strong security makes the privileged port totally redundant.

The Linux auth_sys works fine with unprivileged ports if you allow it to;
so why shouldn't we make that configurable on the client too? It'd sure
help some installations that for some reason or other have an excessive
number of exported file systems (see .sig below :).

Olaf
-- 
Olaf Kirch     | Things that make Monday morning interesting, #2:
okir@suse.de   |        "We have 8,000 NFS mount points, why do we keep
---------------+ 	 running out of privileged ports?"



* RE: [PATCH] xprt sharing (was Re: xprt_bindresvport)
@ 2004-12-09 14:03 Lever, Charles
  0 siblings, 0 replies; 19+ messages in thread
From: Lever, Charles @ 2004-12-09 14:03 UTC (permalink / raw)
  To: Olaf Kirch, Trond Myklebust; +Cc: Mike Waychison, nfs

> But two separate mounts with separate sockets do not serialize (at least
> they shouldn't).

there's so much spin locking and BKL activity in both the RPC client and
the NFS client that effectively, there is significant serialization
today.

> > Note also that we could also create pools of several transport sockets
> > per server: the current locking scheme allows for that too. That would
> > improve per-request scalability at the same time as it allows us to
> > limit the privileged port usage.
>
> Yes, that would help scalability a lot.

that's one direction we would like to take after the transport switch
API is integrated into 2.6.



* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-09 13:44       ` Olaf Kirch
@ 2004-12-09 16:20         ` Trond Myklebust
  0 siblings, 0 replies; 19+ messages in thread
From: Trond Myklebust @ 2004-12-09 16:20 UTC (permalink / raw)
  To: Olaf Kirch; +Cc: Mike Waychison, Charles Lever, nfs

On Thu, 09.12.2004 at 14:44 (+0100), Olaf Kirch wrote:

> The Linux auth_sys works fine with unprivileged ports if you allow it to;
> so why shouldn't we make that configurable on the client too? It'd sure
> help some installations that for some reason or other have an excessive
> number of exported file systems (see .sig below :).

_Another_ mount option? Urgh... 8-)

Seriously: if you have 8000 NFS mount points, you do not want 8000
different superblocks, 8000 different RPC client structs, and 8000
different sockets.
Apart from being a ridiculous waste of resources, that does little to
pop the contention bubble. It just ends up pushing it down to the
(shared) device layer.

So while I am open to the idea of making use of unprivileged ports, I do
not accept it as a substitute for socket sharing.
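
To put rough numbers on the socket-sharing argument, here is a small
sketch of a per-server transport cache with reference counts. The
TransportCache class and the figure of 10 servers are assumptions made
for illustration; only the 8000 mounts come from the discussion above.

```python
class TransportCache:
    """Hypothetical cache that shares one transport among mounts of a server."""

    def __init__(self):
        self._by_server = {}  # server name -> [transport, reference count]

    def get(self, server):
        entry = self._by_server.get(server)
        if entry is None:
            # object() stands in for a newly connected socket.
            entry = [object(), 0]
            self._by_server[server] = entry
        entry[1] += 1
        return entry[0]

    def put(self, server):
        entry = self._by_server[server]
        entry[1] -= 1
        if entry[1] == 0:
            # Last mount is gone, so the socket (and its port) can be released.
            del self._by_server[server]

    def sockets_in_use(self):
        return len(self._by_server)


cache = TransportCache()
# 8000 mounts spread over 10 servers (the server count is illustrative only).
for i in range(8000):
    cache.get(f"server-{i % 10}")
sockets = cache.sockets_in_use()
```

Under these assumptions the client needs 10 sockets rather than 8000, so
port usage scales with the number of servers instead of the number of
mounts.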

Cheers,
  Trond
-- 
Trond Myklebust <trond.myklebust@fys.uio.no>




* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-09 11:31   ` Olaf Kirch
  2004-12-09 13:36     ` Trond Myklebust
@ 2004-12-09 19:34     ` Dan Stromberg
  2004-12-09 21:33       ` Trond Myklebust
  1 sibling, 1 reply; 19+ messages in thread
From: Dan Stromberg @ 2004-12-09 19:34 UTC (permalink / raw)
  To: Olaf Kirch; +Cc: strombrg, Mike Waychison, Lever, Charles, nfs


On Thu, 2004-12-09 at 12:31 +0100, Olaf Kirch wrote:
> On Wed, Dec 08, 2004 at 01:17:53PM -0500, Mike Waychison wrote:
> > This has been bugging me for a while.  The fact that we are limiting
> > ourselves to a single nfs mount per port.  From what I can tell, Solaris
> > shares the transports between nfs mounts from the same server and saves
> > themselves a lot of trouble with running out of port numbers in doing so.
> 
> Shouldn't we allow NFS mounts to use non-privileged ports? Many
> environments don't really care about the "security" provided by privileged
> ports, but would be more than happy if they can run with a few hundred
> NFS mounts

IMO, this is a good time to apply the principle: "Be liberal in what you
accept, and conservative in what you send".

Last I heard, Windows didn't even have a concept of a reserved port.

When I wrote a BSD-compatible printsystem in Python, I made it accept
connections from any port, but generate connections only from reserved
ports.

It'd probably be worthwhile to have options to make NFS (and my
printsystem) generate any port (not just reserved ones), and accept only
reserved ports - but the default probably should be to accept any port,
and send only reserved ports - not because reserved ports are effective
at all, but because it'll avoid never ending questions about why NFS
isn't working.
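
The "send only reserved ports" half of that default could look roughly
like the sketch below. It assumes the 600-1023 range attributed to glibc
earlier in this thread and a simple count-down search; the function
names reserved_port_order and bind_reserved are hypothetical, and
binding a real socket to these ports normally requires root.

```python
import errno


def reserved_port_order(low=600, high=1023):
    """Ports to try, counting down from high to low inclusive."""
    return range(high, low - 1, -1)


def bind_reserved(sock, ports):
    """Bind sock to the first free port in ports and return that port."""
    for port in ports:
        try:
            sock.bind(("", port))
            return port
        except OSError as exc:
            if exc.errno != errno.EADDRINUSE:
                # For example EACCES when the caller is not privileged.
                raise
    raise OSError(errno.EADDRINUSE, "no free reserved port")
```

A caller would pass a socket.socket() instance as sock. The default
range holds 424 ports, which is the ceiling on one-socket-per-mount
setups that the rest of this thread is worried about.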




* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-09 19:34     ` Dan Stromberg
@ 2004-12-09 21:33       ` Trond Myklebust
  2004-12-09 22:29         ` Dan Stromberg
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2004-12-09 21:33 UTC (permalink / raw)
  To: Dan Stromberg; +Cc: Olaf Kirch, Mike Waychison, Charles Lever, nfs

On Thu, 09.12.2004 at 11:34 (-0800), Dan Stromberg wrote:

> It'd probably be worthwhile to have options to make NFS (and my
> printsystem) generate any port (not just reserved ones), and accept only
> reserved ports - but the default probably should be to accept any port,
> and send only reserved ports - not because reserved ports are effective
> at all, but because it'll avoid never ending questions about why NFS
> isn't working.

Sure. The questions will no longer read "why isn't NFS working". They'll
read "why can any Tom, Dick and Harry suddenly read my private email
directly from the NFS server?". 8-)

The standard AUTH_SYS/AUTH_UNIX authentication scheme only checks the
source IP address, and trusts the client 100% when it comes to supplying
the correct uid/gid/.... (see RFC1831). By placing the additional
requirement that the source must be a privileged port, one is at least
able to prevent ordinary users on an authorized client from being able
to spoof NFS requests.
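
As a rough illustration of that check, the sketch below models a server
that trusts the uid in an AUTH_SYS credential once the source address
and source port pass. The function accept_auth_sys and its parameters
are hypothetical and do not correspond to any real server's code; the
only hard fact used is that privileged ports are those below 1024.

```python
IPPORT_RESERVED = 1024  # ports below this need privilege to bind on Unix


def accept_auth_sys(src_ip, src_port, claimed_uid, allowed_clients,
                    require_reserved=True):
    """Return the uid the server will act as, or None if the call is refused."""
    if src_ip not in allowed_clients:
        return None
    if require_reserved and src_port >= IPPORT_RESERVED:
        # An ordinary user on an authorized client cannot bind below 1024,
        # so a forged request from a user process is refused here.
        return None
    # AUTH_SYS: the uid is taken on trust from the client.
    return claimed_uid


clients = {"192.0.2.10"}
from_kernel = accept_auth_sys("192.0.2.10", 800, 1000, clients)
from_user = accept_auth_sys("192.0.2.10", 40000, 0, clients)
```

Here the request from port 800 is accepted as uid 1000, while the
request from port 40000 claiming uid 0 is refused. With
require_reserved=False the second request would be accepted as uid 0,
which is the exposure described above.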

Cheers,
  Trond

-- 
Trond Myklebust <trond.myklebust@fys.uio.no>




* Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)
  2004-12-09 21:33       ` Trond Myklebust
@ 2004-12-09 22:29         ` Dan Stromberg
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Stromberg @ 2004-12-09 22:29 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: strombrg, Olaf Kirch, Mike Waychison, Charles Lever, nfs


On Thu, 2004-12-09 at 16:33 -0500, Trond Myklebust wrote:
> On Thu, 09.12.2004 at 11:34 (-0800), Dan Stromberg wrote:
> 
> > It'd probably be worthwhile to have options to make NFS (and my
> > printsystem) generate any port (not just reserved ones), and accept only
> > reserved ports - but the default probably should be to accept any port,
> > and send only reserved ports - not because reserved ports are effective
> > at all, but because it'll avoid never ending questions about why NFS
> > isn't working.
> 
> Sure. The questions will no longer read "why isn't NFS working". They'll
> read "why can any Tom, Dick and Harry suddenly read my private email
> directly from the NFS server?". 8-)
> 
> The standard AUTH_SYS/AUTH_UNIX authentication scheme only checks the
> source IP address, and trusts the client 100% when it comes to supplying
> the correct uid/gid/.... (see RFC1831). By placing the additional
> requirement that the source must be a privileged port, one is at least
> able to prevent ordinary users on an authorized client from being able
> to spoof NFS requests.
> 
> Cheers,
>   Trond

I bow before your superior geekiness.  :)




Thread overview: 19+ messages
2004-12-08 14:33 xprt_bindresvport Lever, Charles
2004-12-08 18:17 ` [PATCH] xprt sharing (was Re: xprt_bindresvport) Mike Waychison
2004-12-09 11:31   ` Olaf Kirch
2004-12-09 13:36     ` Trond Myklebust
2004-12-09 13:44       ` Olaf Kirch
2004-12-09 16:20         ` Trond Myklebust
2004-12-09 19:34     ` Dan Stromberg
2004-12-09 21:33       ` Trond Myklebust
2004-12-09 22:29         ` Dan Stromberg
2004-12-09 11:01 ` xprt_bindresvport Olaf Kirch
  -- strict thread matches above, loose matches on Subject: below --
2004-12-08 19:08 [PATCH] xprt sharing (was Re: xprt_bindresvport) Lever, Charles
2004-12-08 21:58 ` Mike Waychison
2004-12-09 11:22 ` Olaf Kirch
2004-12-09 13:33   ` Trond Myklebust
2004-12-09 13:41     ` Olaf Kirch
2004-12-08 22:00 Lever, Charles
2004-12-09  8:54 Peter Åstrand
2004-12-09 11:14 ` Olaf Kirch
2004-12-09 14:03 Lever, Charles
