All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tom Tucker <tom@opengridcomputing.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>, linux-nfs@vger.kernel.org
Subject: Re: [RFC,PATCH 11/38] svc: Add xpo_accept transport function
Date: Fri, 30 Nov 2007 15:47:20 -0600	[thread overview]
Message-ID: <1196459240.5432.56.camel@trinity.ogc.int> (raw)
In-Reply-To: <443F726B-E9E6-4859-8B61-2F8A305EF241@oracle.com>


On Fri, 2007-11-30 at 16:01 -0500, Chuck Lever wrote:
> The refactoring here helps clarity.  More below.
> 
> On Nov 29, 2007, at 5:40 PM, Tom Tucker wrote:
> > Previously, the accept logic looked into the socket state to determine
> > whether to call accept or recv when data-ready was indicated on an  
> > endpoint.
> > Since some transports don't use sockets, this logic was changed to  
> > use a flag
> > bit (SK_LISTENER) to identify listening endpoints. A transport  
> > function
> > (xpo_accept) was added to allow each transport to define its own  
> > accept
> > processing. A transport's initialization logic is reponsible for  
> > setting the
> > SK_LISTENER bit. I didn't see any way to do this in transport  
> > independent
> > logic since the passive side of a UDP connection doesn't listen and
> > always recv's.
> >
> > In the svc_recv function, if the SK_LISTENER bit is set, the transport
> > xpo_accept function is called to handle accept processing.
> >
> > Note that all functions are defined even if they don't make sense
> > for a given transport. For example, accept doesn't mean anything for
> > UDP. The fuction is defined anyway and bug checks if called. The
> 
> s/fuction/function/
> 
> :-O

ok.

> 
> > UDP transport should never set the SK_LISTENER bit.
> >
> > The code that poaches connections when the connection
> > limit is hit was moved to a subroutine to make the accept logic path
> > easier to follow. Since this is in the new connection path, it should
> > not be a performance issue.
> >
> > Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
> > ---
> >
> >  include/linux/sunrpc/svc_xprt.h |    1
> >  include/linux/sunrpc/svcsock.h  |    1
> >  net/sunrpc/svcsock.c            |  127 ++++++++++++++++++++ 
> > +------------------
> >  3 files changed, 72 insertions(+), 57 deletions(-)
> >
> > diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/ 
> > svc_xprt.h
> > index 3adc8f3..1527ff1 100644
> > --- a/include/linux/sunrpc/svc_xprt.h
> > +++ b/include/linux/sunrpc/svc_xprt.h
> > @@ -10,6 +10,7 @@
> >  #include <linux/sunrpc/svc.h>
> >
> >  struct svc_xprt_ops {
> > +	struct svc_xprt	*(*xpo_accept)(struct svc_xprt *);
> >  	int		(*xpo_has_wspace)(struct svc_xprt *);
> >  	int		(*xpo_recvfrom)(struct svc_rqst *);
> >  	void		(*xpo_prep_reply_hdr)(struct svc_rqst *);
> > diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/ 
> > svcsock.h
> > index 08e78d0..9882ce0 100644
> > --- a/include/linux/sunrpc/svcsock.h
> > +++ b/include/linux/sunrpc/svcsock.h
> > @@ -36,6 +36,7 @@ struct svc_sock {
> >  #define	SK_DEFERRED	8			/* request on sk_deferred */
> >  #define	SK_OLD		9			/* used for temp socket aging mark+sweep */
> >  #define	SK_DETACHED	10			/* detached from tempsocks list */
> > +#define SK_LISTENER	11			/* listening endpoint */
> >
> >  	atomic_t    	    	sk_reserved;	/* space on outq that is reserved */
> >
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 38ecdd1..661162b 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -912,6 +912,12 @@ static int svc_udp_has_wspace(struct svc_xprt  
> > *xprt)
> >  	return 1;
> >  }
> >
> > +static struct svc_xprt *svc_udp_accept(struct svc_xprt *xprt)
> > +{
> > +	BUG();
> > +	return NULL;
> > +}
> > +
> >  static struct svc_xprt_ops svc_udp_ops = {
> >  	.xpo_recvfrom = svc_udp_recvfrom,
> >  	.xpo_sendto = svc_udp_sendto,
> > @@ -920,6 +926,7 @@ static struct svc_xprt_ops svc_udp_ops = {
> >  	.xpo_free = svc_sock_free,
> >  	.xpo_prep_reply_hdr = svc_udp_prep_reply_hdr,
> >  	.xpo_has_wspace = svc_udp_has_wspace,
> > +	.xpo_accept = svc_udp_accept,
> >  };
> >
> >  static struct svc_xprt_class svc_udp_class = {
> > @@ -1044,9 +1051,9 @@ static inline int svc_port_is_privileged 
> > (struct sockaddr *sin)
> >  /*
> >   * Accept a TCP connection
> >   */
> > -static void
> > -svc_tcp_accept(struct svc_sock *svsk)
> > +static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt)
> >  {
> > +	struct svc_sock *svsk = container_of(xprt, struct svc_sock,  
> > sk_xprt);
> >  	struct sockaddr_storage addr;
> >  	struct sockaddr	*sin = (struct sockaddr *) &addr;
> >  	struct svc_serv	*serv = svsk->sk_server;
> > @@ -1058,7 +1065,7 @@ svc_tcp_accept(struct svc_sock *svsk)
> >
> >  	dprintk("svc: tcp_accept %p sock %p\n", svsk, sock);
> >  	if (!sock)
> > -		return;
> > +		return NULL;
> >
> >  	clear_bit(SK_CONN, &svsk->sk_flags);
> >  	err = kernel_accept(sock, &newsock, O_NONBLOCK);
> > @@ -1069,7 +1076,7 @@ svc_tcp_accept(struct svc_sock *svsk)
> >  		else if (err != -EAGAIN && net_ratelimit())
> >  			printk(KERN_WARNING "%s: accept failed (err %d)!\n",
> >  				   serv->sv_name, -err);
> > -		return;
> > +		return NULL;
> >  	}
> >
> >  	set_bit(SK_CONN, &svsk->sk_flags);
> > @@ -1115,59 +1122,14 @@ svc_tcp_accept(struct svc_sock *svsk)
> >
> >  	svc_sock_received(newsvsk);
> >
> > -	/* make sure that we don't have too many active connections.
> > -	 * If we have, something must be dropped.
> > -	 *
> > -	 * There's no point in trying to do random drop here for
> > -	 * DoS prevention. The NFS clients does 1 reconnect in 15
> > -	 * seconds. An attacker can easily beat that.
> > -	 *
> > -	 * The only somewhat efficient mechanism would be if drop
> > -	 * old connections from the same IP first. But right now
> > -	 * we don't even record the client IP in svc_sock.
> > -	 */
> 
> > -	if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*20) {
> > -		struct svc_sock *svsk = NULL;
> > -		spin_lock_bh(&serv->sv_lock);
> > -		if (!list_empty(&serv->sv_tempsocks)) {
> > -			if (net_ratelimit()) {
> > -				/* Try to help the admin */
> > -				printk(KERN_NOTICE "%s: too many open TCP "
> > -					"sockets, consider increasing the "
> > -					"number of nfsd threads\n",
> > -						   serv->sv_name);
> > -				printk(KERN_NOTICE
> > -				       "%s: last TCP connect from %s\n",
> > -				       serv->sv_name, __svc_print_addr(sin,
> > -							buf, sizeof(buf)));
> > -			}
> > -			/*
> > -			 * Always select the oldest socket. It's not fair,
> > -			 * but so is life
> > -			 */
> > -			svsk = list_entry(serv->sv_tempsocks.prev,
> > -					  struct svc_sock,
> > -					  sk_list);
> > -			set_bit(SK_CLOSE, &svsk->sk_flags);
> > -			atomic_inc(&svsk->sk_inuse);
> > -		}
> > -		spin_unlock_bh(&serv->sv_lock);
> > -
> > -		if (svsk) {
> > -			svc_sock_enqueue(svsk);
> > -			svc_sock_put(svsk);
> > -		}
> > -
> > -	}
> > -
> >  	if (serv->sv_stats)
> >  		serv->sv_stats->nettcpconn++;
> >
> > -	return;
> > +	return &newsvsk->sk_xprt;
> >
> >  failed:
> >  	sock_release(newsock);
> > -	return;
> > +	return NULL;
> >  }
> >
> >  /*
> > @@ -1192,12 +1154,6 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
> >  		return svc_deferred_recv(rqstp);
> >  	}
> >
> > -	if (svsk->sk_sk->sk_state == TCP_LISTEN) {
> > -		svc_tcp_accept(svsk);
> > -		svc_sock_received(svsk);
> > -		return 0;
> > -	}
> > -
> >  	if (test_and_clear_bit(SK_CHNGBUF, &svsk->sk_flags))
> >  		/* sndbuf needs to have room for one request
> >  		 * per thread, otherwise we can stall even when the
> > @@ -1403,6 +1359,7 @@ static struct svc_xprt_ops svc_tcp_ops = {
> >  	.xpo_free = svc_sock_free,
> >  	.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
> >  	.xpo_has_wspace = svc_tcp_has_wspace,
> > +	.xpo_accept = svc_tcp_accept,
> >  };
> >
> >  static struct svc_xprt_class svc_tcp_class = {
> > @@ -1433,6 +1390,7 @@ svc_tcp_init(struct svc_sock *svsk)
> >
> >  	if (sk->sk_state == TCP_LISTEN) {
> >  		dprintk("setting up TCP socket for listening\n");
> > +		set_bit(SK_LISTENER, &svsk->sk_flags);
> >  		sk->sk_data_ready = svc_tcp_listen_data_ready;
> >  		set_bit(SK_CONN, &svsk->sk_flags);
> >  	} else {
> > @@ -1484,6 +1442,55 @@ svc_sock_update_bufs(struct svc_serv *serv)
> >  	spin_unlock_bh(&serv->sv_lock);
> >  }
> >
> > +static void
> > +svc_check_conn_limits(struct svc_serv *serv)
> 
> Style police want the return value type and function declaration on  
> the same line.
> 

yes.

> > +{
> > +	char	buf[RPC_MAX_ADDRBUFLEN];
> > +
> > +	/* make sure that we don't have too many active connections.
> > +	 * If we have, something must be dropped.
> > +	 *
> > +	 * There's no point in trying to do random drop here for
> > +	 * DoS prevention. The NFS clients does 1 reconnect in 15
> > +	 * seconds. An attacker can easily beat that.
> > +	 *
> > +	 * The only somewhat efficient mechanism would be if drop
> > +	 * old connections from the same IP first. But right now
> > +	 * we don't even record the client IP in svc_sock.
> > +	 */
> 
> Just a personal preference: I think this would better serve as a  
> block comment placed just before the function declaration above.
> 

agreed.


> > +	if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*20) {
> 
> Would be nice if the naked constants were defined as macros  
> somewhere.  Some rationale for these values would be nice (does  
> anyone (ie, Greg!) remember why these values were chosen?).
> 

I've noodled on this before and produced nothing but heat. I'm afraid
the best I can come up with is #define THREE and #define TWENTY :-)

> > +		struct svc_sock *svsk = NULL;
> > +		spin_lock_bh(&serv->sv_lock);
> > +		if (!list_empty(&serv->sv_tempsocks)) {
> > +			if (net_ratelimit()) {
> > +				/* Try to help the admin */
> > +				printk(KERN_NOTICE "%s: too many open TCP "
> > +					"sockets, consider increasing the "
> > +					"number of nfsd threads\n",
> > +						   serv->sv_name);
> > +				printk(KERN_NOTICE
> > +				       "%s: last TCP connect from %s\n",
> > +				       serv->sv_name, buf);
> > +			}
> > +			/*
> > +			 * Always select the oldest socket. It's not fair,
> > +			 * but so is life
> > +			 */
> > +			svsk = list_entry(serv->sv_tempsocks.prev,
> > +					  struct svc_sock,
> > +					  sk_list);
> > +			set_bit(SK_CLOSE, &svsk->sk_flags);
> > +			atomic_inc(&svsk->sk_inuse);
> > +		}
> > +		spin_unlock_bh(&serv->sv_lock);
> > +
> > +		if (svsk) {
> > +			svc_sock_enqueue(svsk);
> > +			svc_sock_put(svsk);
> > +		}
> > +	}
> > +}
> > +
> >  /*
> >   * Receive the next request on any socket.  This code is carefully
> >   * organised not to touch any cachelines in the shared svc_serv
> > @@ -1579,6 +1586,12 @@ svc_recv(struct svc_rqst *rqstp, long timeout)
> >  	if (test_bit(SK_CLOSE, &svsk->sk_flags)) {
> >  		dprintk("svc_recv: found SK_CLOSE\n");
> >  		svc_delete_socket(svsk);
> > +	} else if (test_bit(SK_LISTENER, &svsk->sk_flags)) {
> > +		struct svc_xprt *newxpt;
> > +		newxpt = svsk->sk_xprt.xpt_ops->xpo_accept(&svsk->sk_xprt);
> > +		if (newxpt)
> > +			svc_check_conn_limits(svsk->sk_server);
> > +		svc_sock_received(svsk);
> >  	} else {
> >  		dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
> >  			rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


  reply	other threads:[~2007-11-30 21:43 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-11-29 22:39 [RFC,PATCH 00/38] SVC Transport Switch Tom Tucker
     [not found] ` <20071129223917.14563.77633.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-29 22:39   ` [RFC,PATCH 01/38] svc: Add an svc transport class Tom Tucker
2007-11-29 22:39   ` [RFC,PATCH 02/38] svc: Make svc_sock the tcp/udp transport Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 03/38] svc: Change the svc_sock in the rqstp structure to a transport Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 04/38] svc: Add a max payload value to the transport Tom Tucker
     [not found]     ` <20071129224002.14563.96227.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-30 20:22       ` Chuck Lever
2007-11-30 20:51         ` Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 05/38] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 06/38] svc: Add transport specific xpo_release function Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 07/38] svc: Add per-transport delete functions Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 08/38] svc: Add xpo_prep_reply_hdr Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 09/38] svc: Add a transport function that checks for write space Tom Tucker
     [not found]     ` <20071129224012.14563.23130.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-30 20:46       ` Chuck Lever
2007-11-30 21:39         ` Tom Tucker
     [not found]           ` <1196458764.5432.52.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
2007-11-30 22:43             ` Chuck Lever
2007-12-10 20:43               ` Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 10/38] svc: Move close processing to a single place Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 11/38] svc: Add xpo_accept transport function Tom Tucker
     [not found]     ` <20071129224016.14563.67547.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-30 21:01       ` Chuck Lever
2007-11-30 21:47         ` Tom Tucker [this message]
2007-11-29 22:40   ` [RFC,PATCH 12/38] svc: Add a generic transport svc_create_xprt function Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 13/38] svc: Change services to use new svc_create_xprt service Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 14/38] svc: Change sk_inuse to a kref Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 15/38] svc: Move sk_flags to the svc_xprt structure Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 16/38] svc: Move sk_server and sk_pool to svc_xprt Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 17/38] svc: Make close transport independent Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 18/38] svc: Move sk_reserved to svc_xprt Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 19/38] svc: Make the enqueue service transport neutral and export it Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 20/38] svc: Make svc_send transport neutral Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 21/38] svc: Change svc_sock_received to svc_xprt_received and export it Tom Tucker
     [not found]     ` <20071129224037.14563.69171.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-30 21:33       ` Chuck Lever
2007-11-30 23:17         ` Tom Tucker
     [not found]           ` <1196464634.5432.68.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
2007-11-30 23:23             ` Chuck Lever
2007-11-29 22:40   ` [RFC,PATCH 22/38] svc: Remove sk_lastrecv Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 24/38] svc: Make deferral processing xprt independent Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 25/38] svc: Move the sockaddr information to svc_xprt Tom Tucker
     [not found]     ` <20071129224046.14563.59353.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-30 23:20       ` Chuck Lever
2007-11-29 22:40   ` [RFC,PATCH 26/38] svc: Make svc_sock_release svc_xprt_release Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 27/38] svc: Make svc_recv transport neutral Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 28/38] svc: Make svc_age_temp_sockets svc_age_temp_transports Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 29/38] svc: Move common create logic to common code Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 30/38] svc: Removing remaining references to rq_sock in rqstp Tom Tucker
2007-11-29 22:40   ` [RFC,PATCH 31/38] svc: Make svc_check_conn_limits xprt independent Tom Tucker
2007-11-29 22:41   ` [RFC,PATCH 32/38] svc: Move the xprt independent code to the svc_xprt.c file Tom Tucker
2007-11-29 22:41   ` [RFC,PATCH 33/38] svc: Add transport hdr size for defer/revisit Tom Tucker
     [not found]     ` <20071129224103.14563.72780.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-30 23:51       ` Chuck Lever
2007-11-29 22:41   ` [RFC,PATCH 34/38] svc: Add /proc/sys/sunrpc/transport files Tom Tucker
     [not found]     ` <20071129224105.14563.48684.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-12-03 16:44       ` Chuck Lever
2007-12-03 16:58         ` J. Bruce Fields
2007-12-03 18:30           ` Chuck Lever
2007-12-03 19:10             ` Tom Tucker
     [not found]               ` <1196709058.5811.21.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
2007-12-03 20:36                 ` Chuck Lever
2007-12-04  0:45                   ` Tom Tucker
2007-12-05  8:44           ` Greg Banks
2007-11-29 22:41   ` [RFC,PATCH 35/38] knfsd: Support adding transports by writing portlist file Tom Tucker
2007-11-29 22:41   ` [RFC,PATCH 36/38] svc: Add svc API that queries for a transport instance Tom Tucker
     [not found]     ` <20071129224109.14563.34563.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-12-01  0:00       ` Chuck Lever
2007-11-29 22:41   ` [RFC,PATCH 37/38] knfsd: Modify write_ports to use svc_find_xprt service Tom Tucker
2007-11-29 22:41   ` [RFC,PATCH 38/38] svc: Add svc_xprt_names service to replace svc_sock_names Tom Tucker
2007-11-29 23:18   ` [RFC,PATCH 00/38] SVC Transport Switch Tom Tucker
  -- strict thread matches above, loose matches on Subject: below --
2007-11-29 22:51 Tom Tucker
     [not found] ` <20071129225142.15107.46200.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-29 22:53   ` [RFC,PATCH 11/38] svc: Add xpo_accept transport function Tom Tucker
2007-11-29 22:55 [RFC,PATCH 00/38] RPC Transport Switch Tom Tucker
     [not found] ` <20071129225510.15275.82660.stgit-gUwIgmpLGaKNDNWfRnPdfg@public.gmane.org>
2007-11-29 22:56   ` [RFC,PATCH 11/38] svc: Add xpo_accept transport function Tom Tucker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1196459240.5432.56.camel@trinity.ogc.int \
    --to=tom@opengridcomputing.com \
    --cc=bfields@fieldses.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.