Linux NFS development
 help / color / mirror / Atom feed
From: Chuck Lever <chuck.lever@oracle.com>
To: Tom Tucker <tom@opengridcomputing.com>
Cc: bfields@fieldses.org, neilb@suse.de, nfs@lists.sourceforge.net,
	gnb@sgi.com
Subject: Re: [RFC,PATCH] svc: Fix the UDP address logic
Date: Tue, 23 Oct 2007 11:49:38 -0400	[thread overview]
Message-ID: <471E1812.6010906@oracle.com> (raw)
In-Reply-To: <471E1564.8040800@opengridcomputing.com>

[-- Attachment #1: Type: text/plain, Size: 4988 bytes --]

Tom Tucker wrote:
> Chuck Lever wrote:
>> Tom Tucker wrote:
>>> When the address information was moved to the svc_xprt structure, a bug
>>> was introduced to UDP that caused an incorrect address to be used
>>> when responding to RPC on the UDP transport. This was the result of
>>> failing to completely implement the generic address logic for the UDP 
>>> transport.
>>>
>>> Thanks to Greg for pointing this out...
>>>
>>> Since I confused myself, it's probably a good idea to describe how 
>>> this is supposed to work. The transport is responsible for setting 
>>> the xpt_local and xpt_remote addresses in the svc_xprt structure as
>>> part of xpo_recvfrom processing. This cannot be done in a generic way 
>>> and in fact varies between TCP, UDP and RDMA. A set of xpo_ functions
>>> (e.g. getlocalname, getremotename) could have been added but this would
>>> have resulted in additional caching and copying of the addresses around.
>>> The generic svc_recv code copies the addresses from the svc_xprt 
>>> structure into the rqstp structure as part of svc_recv processing. 
>>> The xpt_local address should also be set on listening endpoints, for
>>> tcp/rdma this is done as part of endpoint creation and for new 
>>> connections in xpo_accept processing.
>>> This patch was tested with Connectathon over V3 on UDP.
>>>
>>> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>>> ---
>>>
>>>  net/sunrpc/svcsock.c |   16 ++++++++++------
>>>  1 files changed, 10 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index e1a27ee..0f8a3d5 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -473,17 +473,21 @@ svc_write_space(struct sock *sk)
>>>  static inline void svc_udp_get_dest_address(struct svc_rqst *rqstp,
>>>                          struct cmsghdr *cmh)
>>>  {
>>> -    struct svc_sock *svsk =
>>> -        container_of(rqstp->rq_xprt, struct svc_sock, sk_xprt);
>>> +    struct svc_xprt *xprt = rqstp->rq_xprt;
>>> +    struct svc_sock *svsk = container_of(xprt, struct svc_sock, 
>>> sk_xprt);
>>>      switch (svsk->sk_sk->sk_family) {
>>>      case AF_INET: {
>>>          struct in_pktinfo *pki = CMSG_DATA(cmh);
>>> -        rqstp->rq_daddr.addr.s_addr = pki->ipi_spec_dst.s_addr;
>>> +        struct sockaddr_in *sin =
>>> +            (struct sockaddr_in *)&xprt->xpt_local;
>>> +        sin->sin_addr.s_addr = pki->ipi_spec_dst.s_addr;
>>>          break;
>>>          }
>>>      case AF_INET6: {
>>>          struct in6_pktinfo *pki = CMSG_DATA(cmh);
>>> -        ipv6_addr_copy(&rqstp->rq_daddr.addr6, &pki->ipi6_addr);
>>> +        struct sockaddr_in6 *sin6 =
>>> +            (struct sockaddr_in6 *)&xprt->xpt_local;
>>> +        ipv6_addr_copy(&sin6->sin6_addr, &pki->ipi6_addr);
>>>          break;
>>>          }
>>>      }
>>> @@ -506,7 +510,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
>>>      struct cmsghdr *cmh = &buffer.hdr;
>>>      int        err, len;
>>>      struct msghdr msg = {
>>> -        .msg_name = svc_addr(rqstp),
>>> +        .msg_name = &svsk->sk_xprt.xpt_remote,
>>>          .msg_control = cmh,
>>>          .msg_controllen = sizeof(buffer),
>>>          .msg_flags = MSG_DONTWAIT,
>>> @@ -541,7 +545,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
>>>          svc_xprt_received(&svsk->sk_xprt);
>>>          return -EAGAIN;
>>>      }
>>> -    rqstp->rq_addrlen = sizeof(rqstp->rq_addr);
>>> +    svsk->sk_xprt.xpt_remotelen = sizeof(svsk->sk_xprt.xpt_remote);
>>
>> This worries me a bit.  Why isn't rq_addrlen involved here?
> rq_addrlen gets set in the generic code (see svc_recv). The source of 
> the address is the transport, not the rqstp structure. The rqstp 
> structure caches it for use when responding to the request in 
> svc_process. Ironically, I think the HA case that Greg referred to is 
> the only reason we can't get rid of rq_addr/rq_daddr altogether and 
> access the address data directly from svc_xprt. Since this address may 
> change for UDP, we have to store it in the rqstp structure and use it to 
> reply because another request could arrive in the interim and change the 
> address data in svc_xprt.
>>   the xpt_remote field should be a sockaddr_storage, 
> ...it is.
>> and the remote_len field should be the actual length of the address, 
>> not the size of the storage space.
> Perhaps it should be, but a) kernel_recvmsg doesn't give us a length, b) 
> to determine it would require a switch, the degenerate case for which is 
> sizeof(storage),

I don't agree: there is no degenerate case, and sizeof(storage) should 
never ever be used.

> and c) in it's current usage, big-enough is 
> good-enough. A weaker point is "that's the way it's always been" ;-)

Please change it to use an appropriate address length.  Using 
sizeof(struct sockaddr_storage) is broken, or at the very least 
inefficient (this struct is huge).  The use of sizeof() here is 
inconsistent with the design of the IPv6 changes I made last year.

[-- Attachment #2: chuck.lever.vcf --]
[-- Type: text/x-vcard, Size: 327 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
email;internet:chuck dot lever at nospam oracle dot com
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
version:2.1
end:vcard


[-- Attachment #3: Type: text/plain, Size: 314 bytes --]

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

[-- Attachment #4: Type: text/plain, Size: 140 bytes --]

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

  reply	other threads:[~2007-10-23 15:49 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-10-22 21:01 [RFC,PATCH] svc: Fix the UDP address logic Tom Tucker
2007-10-23  1:53 ` Greg Banks
2007-10-23  2:13   ` Tom Tucker
2007-10-23  2:46     ` Greg Banks
2007-10-23 15:13 ` Chuck Lever
2007-10-23 15:38   ` Tom Tucker
2007-10-23 15:49     ` Chuck Lever [this message]
2007-10-23 16:13       ` Tom Tucker
2007-10-23 16:24         ` Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=471E1812.6010906@oracle.com \
    --to=chuck.lever@oracle.com \
    --cc=bfields@fieldses.org \
    --cc=gnb@sgi.com \
    --cc=neilb@suse.de \
    --cc=nfs@lists.sourceforge.net \
    --cc=tom@opengridcomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox