From: Tom Tucker <tom@opengridcomputing.com>
To: chuck.lever@oracle.com
Cc: bfields@fieldses.org, neilb@suse.de, nfs@lists.sourceforge.net,
gnb@sgi.com
Subject: Re: [RFC,PATCH] svc: Fix the UDP address logic
Date: Tue, 23 Oct 2007 11:13:36 -0500 [thread overview]
Message-ID: <471E1DB0.2000407@opengridcomputing.com> (raw)
In-Reply-To: <471E1812.6010906@oracle.com>
Chuck Lever wrote:
> Tom Tucker wrote:
>> Chuck Lever wrote:
>>> Tom Tucker wrote:
>>>> When the address information was moved to the svc_xprt structure, a
>>>> bug
>>>> was introduced to UDP that caused an incorrect address to be used
>>>> when responding to RPC on the UDP transport. This was the result of
>>>> failing to completely implement the generic address logic for the
>>>> UDP transport.
>>>>
>>>> Thanks to Greg for pointing this out...
>>>>
>>>> Since I confused myself, it's probably a good idea to describe how
>>>> this is supposed to work. The transport is responsible for setting
>>>> the xpt_local and xpt_remote addresses in the svc_xprt structure as
>>>> part of xpo_recvfrom processing. This cannot be done in a generic
>>>> way and in fact varies between TCP, UDP and RDMA. A set of xpo_
>>>> functions
>>>> (e.g. getlocalname, getremotename) could have been added but this
>>>> would
>>>> have resulted in additional caching and copying of the addresses
>>>> around.
>>>> The generic svc_recv code copies the addresses from the svc_xprt
>>>> structure into the rqstp structure as part of svc_recv processing.
>>>> The xpt_local address should also be set on listening endpoints, for
>>>> tcp/rdma this is done as part of endpoint creation and for new
>>>> connections in xpo_accept processing.
>>>> This patch was tested with Connectathon over V3 on UDP.
>>>>
>>>> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>>>> ---
>>>>
>>>> net/sunrpc/svcsock.c | 16 ++++++++++------
>>>> 1 files changed, 10 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>>> index e1a27ee..0f8a3d5 100644
>>>> --- a/net/sunrpc/svcsock.c
>>>> +++ b/net/sunrpc/svcsock.c
>>>> @@ -473,17 +473,21 @@ svc_write_space(struct sock *sk)
>>>> static inline void svc_udp_get_dest_address(struct svc_rqst *rqstp,
>>>> struct cmsghdr *cmh)
>>>> {
>>>> - struct svc_sock *svsk =
>>>> - container_of(rqstp->rq_xprt, struct svc_sock, sk_xprt);
>>>> + struct svc_xprt *xprt = rqstp->rq_xprt;
>>>> + struct svc_sock *svsk = container_of(xprt, struct svc_sock,
>>>> sk_xprt);
>>>> switch (svsk->sk_sk->sk_family) {
>>>> case AF_INET: {
>>>> struct in_pktinfo *pki = CMSG_DATA(cmh);
>>>> - rqstp->rq_daddr.addr.s_addr = pki->ipi_spec_dst.s_addr;
>>>> + struct sockaddr_in *sin =
>>>> + (struct sockaddr_in *)&xprt->xpt_local;
>>>> + sin->sin_addr.s_addr = pki->ipi_spec_dst.s_addr;
>>>> break;
>>>> }
>>>> case AF_INET6: {
>>>> struct in6_pktinfo *pki = CMSG_DATA(cmh);
>>>> - ipv6_addr_copy(&rqstp->rq_daddr.addr6, &pki->ipi6_addr);
>>>> + struct sockaddr_in6 *sin6 =
>>>> + (struct sockaddr_in6 *)&xprt->xpt_local;
>>>> + ipv6_addr_copy(&sin6->sin6_addr, &pki->ipi6_addr);
>>>> break;
>>>> }
>>>> }
>>>> @@ -506,7 +510,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
>>>> struct cmsghdr *cmh = &buffer.hdr;
>>>> int err, len;
>>>> struct msghdr msg = {
>>>> - .msg_name = svc_addr(rqstp),
>>>> + .msg_name = &svsk->sk_xprt.xpt_remote,
>>>> .msg_control = cmh,
>>>> .msg_controllen = sizeof(buffer),
>>>> .msg_flags = MSG_DONTWAIT,
>>>> @@ -541,7 +545,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
>>>> svc_xprt_received(&svsk->sk_xprt);
>>>> return -EAGAIN;
>>>> }
>>>> - rqstp->rq_addrlen = sizeof(rqstp->rq_addr);
>>>> + svsk->sk_xprt.xpt_remotelen = sizeof(svsk->sk_xprt.xpt_remote);
>>>
>>> This worries me a bit. Why isn't rq_addrlen involved here?
>> rq_addrlen gets set in the generic code (see svc_recv). The source of
>> the address is the transport, not the rqstp structure. The rqstp
>> structure caches it for use when responding to the request in
>> svc_process. Ironically, I think the HA case that Greg referred to is
>> the only reason we can't get rid of rq_addr/rq_daddr altogether and
>> access the address data directly from svc_xprt. Since this address
>> may change for UDP, we have to store it in the rqstp structure and
>> use it to reply because another request could arrive in the interim
>> and change the address data in svc_xprt.
>>> the xpt_remote field should be a sockaddr_storage,
>> ...it is.
>>> and the remote_len field should be the actual length of the address,
>>> not the size of the storage space.
>> Perhaps it should be, but a) kernel_recvmsg doesn't give us a length,
>> b) to determine it would require a switch, the degenerate case for
>> which is sizeof(storage),
>
> I don't agree: there is no degenerate case, and sizeof(storage) should
> never ever be used.
>
>> and c) in it's current usage, big-enough is good-enough. A weaker
>> point is "that's the way it's always been" ;-)
>
> Please change it to use an appropriate address length. Using
> sizeof(struct sockaddr_storage) is broken, or at the very least
> inefficient (this struct is huge).
Yes, for TCP/UDP 128B is getting copied when it really only needs to be
16 or 28.
> The use of sizeof() here is inconsistent with the design of the IPv6
> changes I made last year.
Fair enough, but in practical terms that means if ss_family is not
AF_INET or AF_INET6, we'll return an error or is there something I'm
missing?
Tom
>
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
next prev parent reply other threads:[~2007-10-23 16:13 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-22 21:01 [RFC,PATCH] svc: Fix the UDP address logic Tom Tucker
2007-10-23 1:53 ` Greg Banks
2007-10-23 2:13 ` Tom Tucker
2007-10-23 2:46 ` Greg Banks
2007-10-23 15:13 ` Chuck Lever
2007-10-23 15:38 ` Tom Tucker
2007-10-23 15:49 ` Chuck Lever
2007-10-23 16:13 ` Tom Tucker [this message]
2007-10-23 16:24 ` Chuck Lever
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=471E1DB0.2000407@opengridcomputing.com \
--to=tom@opengridcomputing.com \
--cc=bfields@fieldses.org \
--cc=chuck.lever@oracle.com \
--cc=gnb@sgi.com \
--cc=neilb@suse.de \
--cc=nfs@lists.sourceforge.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox