Linux NFS development
 help / color / mirror / Atom feed
From: Tom Tucker <tom@opengridcomputing.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>, linux-nfs@vger.kernel.org
Subject: Re: [RFC,PATCH 11/38] svc: Add xpo_accept transport function
Date: Fri, 30 Nov 2007 15:47:20 -0600	[thread overview]
Message-ID: <1196459240.5432.56.camel@trinity.ogc.int> (raw)
In-Reply-To: <443F726B-E9E6-4859-8B61-2F8A305EF241@oracle.com>


On Fri, 2007-11-30 at 16:01 -0500, Chuck Lever wrote:
> The refactoring here helps clarity.  More below.
> 
> On Nov 29, 2007, at 5:40 PM, Tom Tucker wrote:
> > Previously, the accept logic looked into the socket state to determine
> > whether to call accept or recv when data-ready was indicated on an  
> > endpoint.
> > Since some transports don't use sockets, this logic was changed to  
> > use a flag
> > bit (SK_LISTENER) to identify listening endpoints. A transport  
> > function
> > (xpo_accept) was added to allow each transport to define its own  
> > accept
> > processing. A transport's initialization logic is reponsible for  
> > setting the
> > SK_LISTENER bit. I didn't see any way to do this in transport  
> > independent
> > logic since the passive side of a UDP connection doesn't listen and
> > always recv's.
> >
> > In the svc_recv function, if the SK_LISTENER bit is set, the transport
> > xpo_accept function is called to handle accept processing.
> >
> > Note that all functions are defined even if they don't make sense
> > for a given transport. For example, accept doesn't mean anything for
> > UDP. The fuction is defined anyway and bug checks if called. The
> 
> s/fuction/function/
> 
> :-O

ok.

> 
> > UDP transport should never set the SK_LISTENER bit.
> >
> > The code that poaches connections when the connection
> > limit is hit was moved to a subroutine to make the accept logic path
> > easier to follow. Since this is in the new connection path, it should
> > not be a performance issue.
> >
> > Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
> > ---
> >
> >  include/linux/sunrpc/svc_xprt.h |    1
> >  include/linux/sunrpc/svcsock.h  |    1
> >  net/sunrpc/svcsock.c            |  127 ++++++++++++++++++++ 
> > +------------------
> >  3 files changed, 72 insertions(+), 57 deletions(-)
> >
> > diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/ 
> > svc_xprt.h
> > index 3adc8f3..1527ff1 100644
> > --- a/include/linux/sunrpc/svc_xprt.h
> > +++ b/include/linux/sunrpc/svc_xprt.h
> > @@ -10,6 +10,7 @@
> >  #include <linux/sunrpc/svc.h>
> >
> >  struct svc_xprt_ops {
> > +	struct svc_xprt	*(*xpo_accept)(struct svc_xprt *);
> >  	int		(*xpo_has_wspace)(struct svc_xprt *);
> >  	int		(*xpo_recvfrom)(struct svc_rqst *);
> >  	void		(*xpo_prep_reply_hdr)(struct svc_rqst *);
> > diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/ 
> > svcsock.h
> > index 08e78d0..9882ce0 100644
> > --- a/include/linux/sunrpc/svcsock.h
> > +++ b/include/linux/sunrpc/svcsock.h
> > @@ -36,6 +36,7 @@ struct svc_sock {
> >  #define	SK_DEFERRED	8			/* request on sk_deferred */
> >  #define	SK_OLD		9			/* used for temp socket aging mark+sweep */
> >  #define	SK_DETACHED	10			/* detached from tempsocks list */
> > +#define SK_LISTENER	11			/* listening endpoint */
> >
> >  	atomic_t    	    	sk_reserved;	/* space on outq that is reserved */
> >
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 38ecdd1..661162b 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -912,6 +912,12 @@ static int svc_udp_has_wspace(struct svc_xprt  
> > *xprt)
> >  	return 1;
> >  }
> >
> > +static struct svc_xprt *svc_udp_accept(struct svc_xprt *xprt)
> > +{
> > +	BUG();
> > +	return NULL;
> > +}
> > +
> >  static struct svc_xprt_ops svc_udp_ops = {
> >  	.xpo_recvfrom = svc_udp_recvfrom,
> >  	.xpo_sendto = svc_udp_sendto,
> > @@ -920,6 +926,7 @@ static struct svc_xprt_ops svc_udp_ops = {
> >  	.xpo_free = svc_sock_free,
> >  	.xpo_prep_reply_hdr = svc_udp_prep_reply_hdr,
> >  	.xpo_has_wspace = svc_udp_has_wspace,
> > +	.xpo_accept = svc_udp_accept,
> >  };
> >
> >  static struct svc_xprt_class svc_udp_class = {
> > @@ -1044,9 +1051,9 @@ static inline int svc_port_is_privileged 
> > (struct sockaddr *sin)
> >  /*
> >   * Accept a TCP connection
> >   */
> > -static void
> > -svc_tcp_accept(struct svc_sock *svsk)
> > +static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt)
> >  {
> > +	struct svc_sock *svsk = container_of(xprt, struct svc_sock,  
> > sk_xprt);
> >  	struct sockaddr_storage addr;
> >  	struct sockaddr	*sin = (struct sockaddr *) &addr;
> >  	struct svc_serv	*serv = svsk->sk_server;
> > @@ -1058,7 +1065,7 @@ svc_tcp_accept(struct svc_sock *svsk)
> >
> >  	dprintk("svc: tcp_accept %p sock %p\n", svsk, sock);
> >  	if (!sock)
> > -		return;
> > +		return NULL;
> >
> >  	clear_bit(SK_CONN, &svsk->sk_flags);
> >  	err = kernel_accept(sock, &newsock, O_NONBLOCK);
> > @@ -1069,7 +1076,7 @@ svc_tcp_accept(struct svc_sock *svsk)
> >  		else if (err != -EAGAIN && net_ratelimit())
> >  			printk(KERN_WARNING "%s: accept failed (err %d)!\n",
> >  				   serv->sv_name, -err);
> > -		return;
> > +		return NULL;
> >  	}
> >
> >  	set_bit(SK_CONN, &svsk->sk_flags);
> > @@ -1115,59 +1122,14 @@ svc_tcp_accept(struct svc_sock *svsk)
> >
> >  	svc_sock_received(newsvsk);
> >
> > -	/* make sure that we don't have too many active connections.
> > -	 * If we have, something must be dropped.
> > -	 *
> > -	 * There's no point in trying to do random drop here for
> > -	 * DoS prevention. The NFS clients does 1 reconnect in 15
> > -	 * seconds. An attacker can easily beat that.
> > -	 *
> > -	 * The only somewhat efficient mechanism would be if drop
> > -	 * old connections from the same IP first. But right now
> > -	 * we don't even record the client IP in svc_sock.
> > -	 */
> 
> > -	if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*20) {
> > -		struct svc_sock *svsk = NULL;
> > -		spin_lock_bh(&serv->sv_lock);
> > -		if (!list_empty(&serv->sv_tempsocks)) {
> > -			if (net_ratelimit()) {
> > -				/* Try to help the admin */
> > -				printk(KERN_NOTICE "%s: too many open TCP "
> > -					"sockets, consider increasing the "
> > -					"number of nfsd threads\n",
> > -						   serv->sv_name);
> > -				printk(KERN_NOTICE
> > -				       "%s: last TCP connect from %s\n",
> > -				       serv->sv_name, __svc_print_addr(sin,
> > -							buf, sizeof(buf)));
> > -			}
> > -			/*
> > -			 * Always select the oldest socket. It's not fair,
> > -			 * but so is life
> > -			 */
> > -			svsk = list_entry(serv->sv_tempsocks.prev,
> > -					  struct svc_sock,
> > -					  sk_list);
> > -			set_bit(SK_CLOSE, &svsk->sk_flags);
> > -			atomic_inc(&svsk->sk_inuse);
> > -		}
> > -		spin_unlock_bh(&serv->sv_lock);
> > -
> > -		if (svsk) {
> > -			svc_sock_enqueue(svsk);
> > -			svc_sock_put(svsk);
> > -		}
> > -
> > -	}
> > -
> >  	if (serv->sv_stats)
> >  		serv->sv_stats->nettcpconn++;
> >
> > -	return;
> > +	return &newsvsk->sk_xprt;
> >
> >  failed:
> >  	sock_release(newsock);
> > -	return;
> > +	return NULL;
> >  }
> >
> >  /*
> > @@ -1192,12 +1154,6 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
> >  		return svc_deferred_recv(rqstp);
> >  	}
> >
> > -	if (svsk->sk_sk->sk_state == TCP_LISTEN) {
> > -		svc_tcp_accept(svsk);
> > -		svc_sock_received(svsk);
> > -		return 0;
> > -	}
> > -
> >  	if (test_and_clear_bit(SK_CHNGBUF, &svsk->sk_flags))
> >  		/* sndbuf needs to have room for one request
> >  		 * per thread, otherwise we can stall even when the
> > @@ -1403,6 +1359,7 @@ static struct svc_xprt_ops svc_tcp_ops = {
> >  	.xpo_free = svc_sock_free,
> >  	.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
> >  	.xpo_has_wspace = svc_tcp_has_wspace,
> > +	.xpo_accept = svc_tcp_accept,
> >  };
> >
> >  static struct svc_xprt_class svc_tcp_class = {
> > @@ -1433,6 +1390,7 @@ svc_tcp_init(struct svc_sock *svsk)
> >
> >  	if (sk->sk_state == TCP_LISTEN) {
> >  		dprintk("setting up TCP socket for listening\n");
> > +		set_bit(SK_LISTENER, &svsk->sk_flags);
> >  		sk->sk_data_ready = svc_tcp_listen_data_ready;
> >  		set_bit(SK_CONN, &svsk->sk_flags);
> >  	} else {
> > @@ -1484,6 +1442,55 @@ svc_sock_update_bufs(struct svc_serv *serv)
> >  	spin_unlock_bh(&serv->sv_lock);
> >  }
> >
> > +static void
> > +svc_check_conn_limits(struct svc_serv *serv)
> 
> Style police want the return value type and function declaration on  
> the same line.
> 

yes.

> > +{
> > +	char	buf[RPC_MAX_ADDRBUFLEN];
> > +
> > +	/* make sure that we don't have too many active connections.
> > +	 * If we have, something must be dropped.
> > +	 *
> > +	 * There's no point in trying to do random drop here for
> > +	 * DoS prevention. The NFS clients does 1 reconnect in 15
> > +	 * seconds. An attacker can easily beat that.
> > +	 *
> > +	 * The only somewhat efficient mechanism would be if drop
> > +	 * old connections from the same IP first. But right now
> > +	 * we don't even record the client IP in svc_sock.
> > +	 */
> 
> Just a personal preference: I think this would better serve as a  
> block comment placed just before the function declaration above.
> 

agreed.


> > +	if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*20) {
> 
> Would be nice if the naked constants were defined as macros  
> somewhere.  Some rationale for these values would be nice (does  
> anyone (ie, Greg!) remember why these values were chosen?).
> 

I've noodled on this before and produced nothing but heat. I'm afraid
the best I can come up with is #define THREE and #define TWENTY :-)

> > +		struct svc_sock *svsk = NULL;
> > +		spin_lock_bh(&serv->sv_lock);
> > +		if (!list_empty(&serv->sv_tempsocks)) {
> > +			if (net_ratelimit()) {
> > +				/* Try to help the admin */
> > +				printk(KERN_NOTICE "%s: too many open TCP "
> > +					"sockets, consider increasing the "
> > +					"number of nfsd threads\n",
> > +						   serv->sv_name);
> > +				printk(KERN_NOTICE
> > +				       "%s: last TCP connect from %s\n",
> > +				       serv->sv_name, buf);
> > +			}
> > +			/*
> > +			 * Always select the oldest socket. It's not fair,
> > +			 * but so is life
> > +			 */
> > +			svsk = list_entry(serv->sv_tempsocks.prev,
> > +					  struct svc_sock,
> > +					  sk_list);
> > +			set_bit(SK_CLOSE, &svsk->sk_flags);
> > +			atomic_inc(&svsk->sk_inuse);
> > +		}
> > +		spin_unlock_bh(&serv->sv_lock);
> > +
> > +		if (svsk) {
> > +			svc_sock_enqueue(svsk);
> > +			svc_sock_put(svsk);
> > +		}
> > +	}
> > +}
> > +
> >  /*
> >   * Receive the next request on any socket.  This code is carefully
> >   * organised not to touch any cachelines in the shared svc_serv
> > @@ -1579,6 +1586,12 @@ svc_recv(struct svc_rqst *rqstp, long timeout)
> >  	if (test_bit(SK_CLOSE, &svsk->sk_flags)) {
> >  		dprintk("svc_recv: found SK_CLOSE\n");
> >  		svc_delete_socket(svsk);
> > +	} else if (test_bit(SK_LISTENER, &svsk->sk_flags)) {
> > +		struct svc_xprt *newxpt;
> > +		newxpt = svsk->sk_xprt.xpt_ops->xpo_accept(&svsk->sk_xprt);
> > +		if (newxpt)
> > +			svc_check_conn_limits(svsk->sk_server);
> > +		svc_sock_received(svsk);
> >  	} else {
> >  		dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
> >  			rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


  reply	other threads:[~2007-11-30 21:43 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-11-29 22:39 [RFC,PATCH 00/38] SVC Transport Switch Tom Tucker
     [not found] ` <20071129223917.14563.77633.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-29 22:39   ` [RFC,PATCH 01/38] svc: Add an svc transport class Tom Tucker
2007-11-29 22:39   ` [RFC,PATCH 02/38] svc: Make svc_sock the tcp/udp transport Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 03/38] svc: Change the svc_sock in the rqstp structure to a transport Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 04/38] svc: Add a max payload value to the transport Tom Tucker
     [not found]     ` <20071129224002.14563.96227.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-30 20:22       ` Chuck Lever
2007-11-30 20:51         ` Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 05/38] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 06/38] svc: Add transport specific xpo_release function Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 07/38] svc: Add per-transport delete functions Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 08/38] svc: Add xpo_prep_reply_hdr Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 09/38] svc: Add a transport function that checks for write space Tom Tucker
     [not found]     ` <20071129224012.14563.23130.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-30 20:46       ` Chuck Lever
2007-11-30 21:39         ` Tom Tucker
     [not found]           ` <1196458764.5432.52.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
2007-11-30 22:43             ` Chuck Lever
2007-12-10 20:43               ` Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 10/38] svc: Move close processing to a single place Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 11/38] svc: Add xpo_accept transport function Tom Tucker
     [not found]     ` <20071129224016.14563.67547.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-30 21:01       ` Chuck Lever
2007-11-30 21:47         ` Tom Tucker [this message]
2007-11-29 22:40   ` [RFC,PATCH 12/38] svc: Add a generic transport svc_create_xprt function Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 13/38] svc: Change services to use new svc_create_xprt service Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 14/38] svc: Change sk_inuse to a kref Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 15/38] svc: Move sk_flags to the svc_xprt structure Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 16/38] svc: Move sk_server and sk_pool to svc_xprt Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 17/38] svc: Make close transport independent Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 18/38] svc: Move sk_reserved to svc_xprt Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 19/38] svc: Make the enqueue service transport neutral and export it Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 20/38] svc: Make svc_send transport neutral Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 21/38] svc: Change svc_sock_received to svc_xprt_received and export it Tom Tucker
     [not found]     ` <20071129224037.14563.69171.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-30 21:33       ` Chuck Lever
2007-11-30 23:17         ` Tom Tucker
     [not found]           ` <1196464634.5432.68.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
2007-11-30 23:23             ` Chuck Lever
2007-11-29 22:40   ` [RFC,PATCH 22/38] svc: Remove sk_lastrecv Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 24/38] svc: Make deferral processing xprt independent Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 25/38] svc: Move the sockaddr information to svc_xprt Tom Tucker
     [not found]     ` <20071129224046.14563.59353.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-30 23:20       ` Chuck Lever
2007-11-29 22:40   ` [RFC,PATCH 26/38] svc: Make svc_sock_release svc_xprt_release Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 27/38] svc: Make svc_recv transport neutral Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 28/38] svc: Make svc_age_temp_sockets svc_age_temp_transports Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 29/38] svc: Move common create logic to common code Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 30/38] svc: Removing remaining references to rq_sock in rqstp Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 31/38] svc: Make svc_check_conn_limits xprt independent Tom Tucker
2007-11-29 22:41   ` [RFC,PATCH 32/38] svc: Move the xprt independent code to the svc_xprt.c file Tom Tucker
2007-11-29 22:41   ` [RFC,PATCH 33/38] svc: Add transport hdr size for defer/revisit Tom Tucker
     [not found]     ` <20071129224103.14563.72780.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-30 23:51       ` Chuck Lever
2007-11-29 22:41   ` [RFC,PATCH 34/38] svc: Add /proc/sys/sunrpc/transport files Tom Tucker
     [not found]     ` <20071129224105.14563.48684.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-12-03 16:44       ` Chuck Lever
2007-12-03 16:58         ` J. Bruce Fields
2007-12-03 18:30           ` Chuck Lever
2007-12-03 19:10             ` Tom Tucker
     [not found]               ` <1196709058.5811.21.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
2007-12-03 20:36                 ` Chuck Lever
2007-12-04  0:45                   ` Tom Tucker
2007-12-05  8:44           ` Greg Banks
2007-11-29 22:41   ` [RFC,PATCH 35/38] knfsd: Support adding transports by writing portlist file Tom Tucker
2007-11-29 22:41   ` [RFC,PATCH 36/38] svc: Add svc API that queries for a transport instance Tom Tucker
     [not found]     ` <20071129224109.14563.34563.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-12-01  0:00       ` Chuck Lever
2007-11-29 22:41   ` [RFC,PATCH 37/38] knfsd: Modify write_ports to use svc_find_xprt service Tom Tucker
2007-11-29 22:41   ` [RFC,PATCH 38/38] svc: Add svc_xprt_names service to replace svc_sock_names Tom Tucker
2007-11-29 23:18   ` [RFC,PATCH 00/38] SVC Transport Switch Tom Tucker
  -- strict thread matches above, loose matches on Subject: below --
2007-11-29 22:51 Tom Tucker
     [not found] ` <20071129225142.15107.46200.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-29 22:53   ` [RFC,PATCH 11/38] svc: Add xpo_accept transport function Tom Tucker
2007-11-29 22:55 [RFC,PATCH 00/38] RPC Transport Switch Tom Tucker
     [not found] ` <20071129225510.15275.82660.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-29 22:56   ` [RFC,PATCH 11/38] svc: Add xpo_accept transport function Tom Tucker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1196459240.5432.56.camel@trinity.ogc.int \
    --to=tom@opengridcomputing.com \
    --cc=bfields@fieldses.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox