cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
* [Cluster-devel] [RFC PATCH 1/3] NLM lock failover - lock release
@ 2006-06-29 18:11 Wendy Cheng
  2006-06-29 19:11 ` [Cluster-devel] Re: [NFS] " Chuck Lever
  2006-06-29 23:06 ` Trond Myklebust
  0 siblings, 2 replies; 9+ messages in thread
From: Wendy Cheng @ 2006-06-29 18:11 UTC (permalink / raw)
  To: cluster-devel.redhat.com

The patch piggy-backs the logic into "rq_daddr" field of struct svc_rqst 
where NFS server ip address is stored. Upon writing IPv4 address in 
standard dot notation into /proc/fs/nfsd/nlm_unlock, the logic will 
examine NLM's global nlm_files list and subsequently unlock the 
associated file if server ip address matches.

Due to the size of rq_daddr (u32), we would not be able to support IPV6 
for this round of changes. Another to-do item is to enable client:server 
ip pairs to allow NFS V4 failover.




-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: gfs_nlm_unlock.patch
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20060629/0e2f4e2b/attachment.ksh>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Cluster-devel] Re: [NFS] [RFC PATCH 1/3] NLM lock failover - lock release
  2006-06-29 18:11 [Cluster-devel] [RFC PATCH 1/3] NLM lock failover - lock release Wendy Cheng
@ 2006-06-29 19:11 ` Chuck Lever
  2006-06-29 21:49   ` Wendy Cheng
  2006-06-29 23:06 ` Trond Myklebust
  1 sibling, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2006-06-29 19:11 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 6/29/06, Wendy Cheng <wcheng@redhat.com> wrote:
> The patch piggy-backs the logic into "rq_daddr" field of struct svc_rqst
> where NFS server ip address is stored. Upon writing IPv4 address in
> standard dot notation into /proc/fs/nfsd/nlm_unlock, the logic will
> examine NLM's global nlm_files list and subsequently unlock the
> associated file if server ip address matches.
>
> Due to the size of rq_daddr (u32), we would not be able to support IPV6
> for this round of changes. Another to-do item is to enable client:server
> ip pairs to allow NFS V4 failover.

FYI: I have a patch in my IPv6 patchset that increases the size of
this field specifically in order to hold an IPv6 address.  See:

   http://oss.oracle.com/~cel/linux-2.6/2.6.17/patches/52-svc-rq_daddr.diff

for the individual patch, and surrounding patches for more context.

-- 
"We who cut mere stones must always be envisioning cathedrals"
   -- Quarry worker's creed



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Cluster-devel] Re: [NFS] [RFC PATCH 1/3] NLM lock failover - lock release
  2006-06-29 19:11 ` [Cluster-devel] Re: [NFS] " Chuck Lever
@ 2006-06-29 21:49   ` Wendy Cheng
  0 siblings, 0 replies; 9+ messages in thread
From: Wendy Cheng @ 2006-06-29 21:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Chuck Lever wrote:

>On 6/29/06, Wendy Cheng <wcheng@redhat.com> wrote:
>  
>
>>The patch piggy-backs the logic into "rq_daddr" field of struct svc_rqst
>>where NFS server ip address is stored. Upon writing IPv4 address in
>>standard dot notation into /proc/fs/nfsd/nlm_unlock, the logic will
>>examine NLM's global nlm_files list and subsequently unlock the
>>associated file if server ip address matches.
>>
>>Due to the size of rq_daddr (u32), we would not be able to support IPV6
>>for this round of changes. Another to-do item is to enable client:server
>>ip pairs to allow NFS V4 failover.
>>    
>>
>
>FYI: I have a patch in my IPv6 patchset that increases the size of
>this field specifically in order to hold an IPv6 address.  See:
>
>   http://oss.oracle.com/~cel/linux-2.6/2.6.17/patches/52-svc-rq_daddr.diff
>
>for the individual patch, and surrounding patches for more context.
>
>  
>
ok, thanks ...

-- Wendy

-- 
S. Wendy Cheng
wcheng at redhat.com



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Cluster-devel] Re: [NFS] [RFC PATCH 1/3] NLM lock failover - lock release
  2006-06-29 18:11 [Cluster-devel] [RFC PATCH 1/3] NLM lock failover - lock release Wendy Cheng
  2006-06-29 19:11 ` [Cluster-devel] Re: [NFS] " Chuck Lever
@ 2006-06-29 23:06 ` Trond Myklebust
  2006-06-30  3:57   ` Wendy Cheng
  1 sibling, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2006-06-29 23:06 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, 2006-06-29 at 14:11 -0400, Wendy Cheng wrote:
> The patch piggy-backs the logic into "rq_daddr" field of struct svc_rqst 
> where NFS server ip address is stored. Upon writing IPv4 address in 
> standard dot notation into /proc/fs/nfsd/nlm_unlock, the logic will 
> examine NLM's global nlm_files list and subsequently unlock the 
> associated file if server ip address matches.
> 
> Due to the size of rq_daddr (u32), we would not be able to support IPV6 
> for this round of changes. Another to-do item is to enable client:server 
> ip pairs to allow NFS V4 failover.
> 
> 
> 
> 
> plain text document attachment (gfs_nlm_unlock.patch)
>  fs/lockd/svcsubs.c          |   58 ++++++++++++++++++++++++++++++++++++++++----
>  fs/nfsd/nfsctl.c            |   54 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/lockd/bind.h  |    5 +++
>  include/linux/lockd/lockd.h |    3 ++
>  net/sunrpc/svcsock.c        |    5 +++
>  5 files changed, 120 insertions(+), 5 deletions(-)
> 
> --- linux-2.6.17/include/linux/lockd/lockd.h	2006-06-17 21:49:35.000000000 -0400
> +++ linux-2.6.17-1/include/linux/lockd/lockd.h	2006-06-27 10:58:47.000000000 -0400
> @@ -105,6 +105,7 @@ struct nlm_file {
>  	unsigned int		f_count;	/* reference count */
>  	struct semaphore	f_sema;		/* avoid concurrent access */
>  	int		       	f_hash;		/* hash of f_handle */
> +	__u32			f_iaddr;	/* server ip for failover */
>  };
>  
>  /*
> @@ -133,6 +134,7 @@ struct nlm_block {
>  #define NLM_ACT_CHECK		0		/* check for locks */
>  #define NLM_ACT_MARK		1		/* mark & sweep */
>  #define NLM_ACT_UNLOCK		2		/* release all locks */
> +#define NLM_ACT_FO_UNLOCK	3		/* failover release locks */
>  
>  /*
>   * Global variables
> @@ -196,6 +198,7 @@ void		  nlm_release_file(struct nlm_file
>  void		  nlmsvc_mark_resources(void);
>  void		  nlmsvc_free_host_resources(struct nlm_host *);
>  void		  nlmsvc_invalidate_all(void);
> +int 		  nlmsvc_fo_unlock(struct in_addr *);
>  
>  static __inline__ struct inode *
>  nlmsvc_file_inode(struct nlm_file *file)
> --- linux-2.6.17/net/sunrpc/svcsock.c	2006-06-17 21:49:35.000000000 -0400
> +++ linux-2.6.17-1/net/sunrpc/svcsock.c	2006-06-27 10:58:47.000000000 -0400
> @@ -454,6 +454,7 @@ svc_recvfrom(struct svc_rqst *rqstp, str
>  	struct msghdr	msg;
>  	struct socket	*sock;
>  	int		len, alen;
> +	struct sockaddr_in   daddr;
>  
>  	rqstp->rq_addrlen = sizeof(rqstp->rq_addr);
>  	sock = rqstp->rq_sock->sk_sock;
> @@ -474,6 +475,10 @@ svc_recvfrom(struct svc_rqst *rqstp, str
>  	alen = sizeof(rqstp->rq_addr);
>  	sock->ops->getname(sock, (struct sockaddr *)&rqstp->rq_addr, &alen, 1);
>  
> +	/* add server ip for nlm lock failover */
> +	sock->ops->getname(sock, (struct sockaddr *)&daddr, &alen, 0);
> +	rqstp->rq_daddr = daddr.sin_addr.s_addr;
> +

Hmm.... Why would you want to do this on every receive when you could
just store the ip address in the struct svc_sock once and for all?

That said, how do you envisage this working in the cases where the
socket is bound to INADDR_ANY?

Cheers,
  Trond




^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Cluster-devel] Re: [NFS] [RFC PATCH 1/3] NLM lock failover - lock release
  2006-06-29 23:06 ` Trond Myklebust
@ 2006-06-30  3:57   ` Wendy Cheng
       [not found]     ` <message from Wendy Cheng on Thursday June 29>
  2006-06-30  4:39     ` Trond Myklebust
  0 siblings, 2 replies; 9+ messages in thread
From: Wendy Cheng @ 2006-06-30  3:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, 2006-06-29 at 19:06 -0400, Trond Myklebust wrote:

> >  
> > +	/* add server ip for nlm lock failover */
> > +	sock->ops->getname(sock, (struct sockaddr *)&daddr, &alen, 0);
> > +	rqstp->rq_daddr = daddr.sin_addr.s_addr;
> > +
> 
> Hmm.... Why would you want to do this on every receive when you could
> just store the ip address in the struct svc_sock once and for all?

ok, will do that - save latency. Thanks.

> 
> That said, how do you envisage this working in the cases where the
> socket is bound to INADDR_ANY?

This is "our" (server's) address, not peer address - for this request to
arrive "here", it can't be INADDR_ANY. Can it ? Remember "rq_daddr" will
only be used during failover in a clustered NFS servers environment. 


-- Wendy





^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Cluster-devel] Re: [NFS] [RFC PATCH 1/3] NLM lock failover - lock release
       [not found]     ` <message from Wendy Cheng on Thursday June 29>
@ 2006-06-30  4:15       ` Neil Brown
  2006-06-30  4:38         ` Trond Myklebust
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2006-06-30  4:15 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thursday June 29, wcheng at redhat.com wrote:
> On Thu, 2006-06-29 at 19:06 -0400, Trond Myklebust wrote:
> 
> > >  
> > > +	/* add server ip for nlm lock failover */
> > > +	sock->ops->getname(sock, (struct sockaddr *)&daddr, &alen, 0);
> > > +	rqstp->rq_daddr = daddr.sin_addr.s_addr;
> > > +
> > 
> > Hmm.... Why would you want to do this on every receive when you could
> > just store the ip address in the struct svc_sock once and for all?
> 
> ok, will do that - save latency. Thanks.
> 
> > 
> > That said, how do you envisage this working in the cases where the
> > socket is bound to INADDR_ANY?
> 
> This is "our" (server's) address, not peer address - for this request to
> arrive "here", it can't be INADDR_ANY. Can it ? Remember "rq_daddr" will
> only be used during failover in a clustered NFS servers environment. 

The socket can only be bound to INADDR_ANY for UDP, and in that case
we already set rq_daddr correctly.

For a TCP socket, it will be connected, so the local an remote
endpoints will be well defined.

So the code as it stands should work fine.

But yes, it would be best to record the local and remote addresses in
svc_tcp_accept rather than called ->getname twice in svc_recvfrom.

NeilBrown



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Cluster-devel] Re: [NFS] [RFC PATCH 1/3] NLM lock failover - lock release
  2006-06-30  4:15       ` Neil Brown
@ 2006-06-30  4:38         ` Trond Myklebust
       [not found]           ` <message from Trond Myklebust on Friday June 30>
  0 siblings, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2006-06-30  4:38 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Fri, 2006-06-30 at 14:15 +1000, Neil Brown wrote:

> The socket can only be bound to INADDR_ANY for UDP, and in that case
> we already set rq_daddr correctly.

As I understand it, Wendy is considering the case of a multi-homed
server. She wants to record the IP address on which we received the
datagram so that she knows which locks to invalidate in the case of a
migration of that particular IP address onto another server.

My point is that she won't get that information if the socket is bound
to INADDR_ANY, as would be the case for a UDP socket.

Cheers,
  Trond



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Cluster-devel] Re: [NFS] [RFC PATCH 1/3] NLM lock failover - lock release
  2006-06-30  3:57   ` Wendy Cheng
       [not found]     ` <message from Wendy Cheng on Thursday June 29>
@ 2006-06-30  4:39     ` Trond Myklebust
  1 sibling, 0 replies; 9+ messages in thread
From: Trond Myklebust @ 2006-06-30  4:39 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, 2006-06-29 at 23:57 -0400, Wendy Cheng wrote:
> This is "our" (server's) address, not peer address - for this request to
> arrive "here", it can't be INADDR_ANY. Can it ?

Consider the case of a UDP socket.

Cheers,
 Trond



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Cluster-devel] Re: [NFS] [RFC PATCH 1/3] NLM lock failover - lock release
       [not found]           ` <message from Trond Myklebust on Friday June 30>
@ 2006-06-30  5:00             ` Neil Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Neil Brown @ 2006-06-30  5:00 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Friday June 30, trond.myklebust at fys.uio.no wrote:
> On Fri, 2006-06-30 at 14:15 +1000, Neil Brown wrote:
> 
> > The socket can only be bound to INADDR_ANY for UDP, and in that case
> > we already set rq_daddr correctly.
> 
> As I understand it, Wendy is considering the case of a multi-homed
> server. She wants to record the IP address on which we received the
> datagram so that she knows which locks to invalidate in the case of a
> migration of that particular IP address onto another server.
> 
> My point is that she won't get that information if the socket is bound
> to INADDR_ANY, as would be the case for a UDP socket.

Yes.  But that code fragment covered the TCP case only.
The extra called to ->getname was placed in svc_recvfrom which is only
called from svc_tcp_recvfrom, not from svc_udp_recvfrom (Yes, I agree
there is room for confusion there).
svc_udp_recvfrom already has
	rqstp->rq_addr.sin_addr.s_addr = skb->nh.iph->saddr;
	rqstp->rq_daddr = skb->nh.iph->daddr;

so it sets rq_daddr correctly as each packet is received.
rq_daddr was never set for requests received via TCP because it was
never used for tcp (it was used only to set the source address for UDP
replies).

NeilBrown



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2006-06-30  5:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-29 18:11 [Cluster-devel] [RFC PATCH 1/3] NLM lock failover - lock release Wendy Cheng
2006-06-29 19:11 ` [Cluster-devel] Re: [NFS] " Chuck Lever
2006-06-29 21:49   ` Wendy Cheng
2006-06-29 23:06 ` Trond Myklebust
2006-06-30  3:57   ` Wendy Cheng
     [not found]     ` <message from Wendy Cheng on Thursday June 29>
2006-06-30  4:15       ` Neil Brown
2006-06-30  4:38         ` Trond Myklebust
     [not found]           ` <message from Trond Myklebust on Friday June 30>
2006-06-30  5:00             ` Neil Brown
2006-06-30  4:39     ` Trond Myklebust

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).