From: Chuck Lever <chuck.lever@oracle.com>
To: Tom Tucker <tom@opengridcomputing.com>
Cc: bfields@fieldses.org, neilb@suse.de, nfs@lists.sourceforge.net,
gnb@sgi.com
Subject: Re: [RFC,PATCH] svc: Fix the UDP address logic
Date: Tue, 23 Oct 2007 11:49:38 -0400 [thread overview]
Message-ID: <471E1812.6010906@oracle.com> (raw)
In-Reply-To: <471E1564.8040800@opengridcomputing.com>
[-- Attachment #1: Type: text/plain, Size: 4988 bytes --]
Tom Tucker wrote:
> Chuck Lever wrote:
>> Tom Tucker wrote:
>>> When the address information was moved to the svc_xprt structure, a bug
>>> was introduced to UDP that caused an incorrect address to be used
>>> when responding to RPC on the UDP transport. This was the result of
>>> failing to completely implement the generic address logic for the UDP
>>> transport.
>>>
>>> Thanks to Greg for pointing this out...
>>>
>>> Since I confused myself, it's probably a good idea to describe how
>>> this is supposed to work. The transport is responsible for setting
>>> the xpt_local and xpt_remote addresses in the svc_xprt structure as
>>> part of xpo_recvfrom processing. This cannot be done in a generic way
>>> and in fact varies between TCP, UDP and RDMA. A set of xpo_ functions
>>> (e.g. getlocalname, getremotename) could have been added but this would
>>> have resulted in additional caching and copying of the addresses around.
>>> The generic svc_recv code copies the addresses from the svc_xprt
>>> structure into the rqstp structure as part of svc_recv processing.
>>> The xpt_local address should also be set on listening endpoints, for
>>> tcp/rdma this is done as part of endpoint creation and for new
>>> connections in xpo_accept processing.
>>> This patch was tested with Connectathon over V3 on UDP.
>>>
>>> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>>> ---
>>>
>>> net/sunrpc/svcsock.c | 16 ++++++++++------
>>> 1 files changed, 10 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index e1a27ee..0f8a3d5 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -473,17 +473,21 @@ svc_write_space(struct sock *sk)
>>> static inline void svc_udp_get_dest_address(struct svc_rqst *rqstp,
>>> struct cmsghdr *cmh)
>>> {
>>> - struct svc_sock *svsk =
>>> - container_of(rqstp->rq_xprt, struct svc_sock, sk_xprt);
>>> + struct svc_xprt *xprt = rqstp->rq_xprt;
>>> + struct svc_sock *svsk = container_of(xprt, struct svc_sock,
>>> sk_xprt);
>>> switch (svsk->sk_sk->sk_family) {
>>> case AF_INET: {
>>> struct in_pktinfo *pki = CMSG_DATA(cmh);
>>> - rqstp->rq_daddr.addr.s_addr = pki->ipi_spec_dst.s_addr;
>>> + struct sockaddr_in *sin =
>>> + (struct sockaddr_in *)&xprt->xpt_local;
>>> + sin->sin_addr.s_addr = pki->ipi_spec_dst.s_addr;
>>> break;
>>> }
>>> case AF_INET6: {
>>> struct in6_pktinfo *pki = CMSG_DATA(cmh);
>>> - ipv6_addr_copy(&rqstp->rq_daddr.addr6, &pki->ipi6_addr);
>>> + struct sockaddr_in6 *sin6 =
>>> + (struct sockaddr_in6 *)&xprt->xpt_local;
>>> + ipv6_addr_copy(&sin6->sin6_addr, &pki->ipi6_addr);
>>> break;
>>> }
>>> }
>>> @@ -506,7 +510,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
>>> struct cmsghdr *cmh = &buffer.hdr;
>>> int err, len;
>>> struct msghdr msg = {
>>> - .msg_name = svc_addr(rqstp),
>>> + .msg_name = &svsk->sk_xprt.xpt_remote,
>>> .msg_control = cmh,
>>> .msg_controllen = sizeof(buffer),
>>> .msg_flags = MSG_DONTWAIT,
>>> @@ -541,7 +545,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
>>> svc_xprt_received(&svsk->sk_xprt);
>>> return -EAGAIN;
>>> }
>>> - rqstp->rq_addrlen = sizeof(rqstp->rq_addr);
>>> + svsk->sk_xprt.xpt_remotelen = sizeof(svsk->sk_xprt.xpt_remote);
>>
>> This worries me a bit. Why isn't rq_addrlen involved here?
> rq_addrlen gets set in the generic code (see svc_recv). The source of
> the address is the transport, not the rqstp structure. The rqstp
> structure caches it for use when responding to the request in
> svc_process. Ironically, I think the HA case that Greg referred to is
> the only reason we can't get rid of rq_addr/rq_daddr altogether and
> access the address data directly from svc_xprt. Since this address may
> change for UDP, we have to store it in the rqstp structure and use it to
> reply because another request could arrive in the interim and change the
> address data in svc_xprt.
>> the xpt_remote field should be a sockaddr_storage,
> ...it is.
>> and the remote_len field should be the actual length of the address,
>> not the size of the storage space.
> Perhaps it should be, but a) kernel_recvmsg doesn't give us a length, b)
> to determine it would require a switch, the degenerate case for which is
> sizeof(storage),
I don't agree: there is no degenerate case, and sizeof(storage) should
never ever be used.
> and c) in it's current usage, big-enough is
> good-enough. A weaker point is "that's the way it's always been" ;-)
Please change it to use an appropriate address length. Using
sizeof(struct sockaddr_storage) is broken, or at the very least
inefficient (this struct is huge). The use of sizeof() here is
inconsistent with the design of the IPv6 changes I made last year.
[-- Attachment #2: chuck.lever.vcf --]
[-- Type: text/x-vcard, Size: 327 bytes --]
begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
email;internet:chuck dot lever at nospam oracle dot com
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
version:2.1
end:vcard
[-- Attachment #3: Type: text/plain, Size: 314 bytes --]
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
[-- Attachment #4: Type: text/plain, Size: 140 bytes --]
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
next prev parent reply other threads:[~2007-10-23 15:49 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-22 21:01 [RFC,PATCH] svc: Fix the UDP address logic Tom Tucker
2007-10-23 1:53 ` Greg Banks
2007-10-23 2:13 ` Tom Tucker
2007-10-23 2:46 ` Greg Banks
2007-10-23 15:13 ` Chuck Lever
2007-10-23 15:38 ` Tom Tucker
2007-10-23 15:49 ` Chuck Lever [this message]
2007-10-23 16:13 ` Tom Tucker
2007-10-23 16:24 ` Chuck Lever
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=471E1812.6010906@oracle.com \
--to=chuck.lever@oracle.com \
--cc=bfields@fieldses.org \
--cc=gnb@sgi.com \
--cc=neilb@suse.de \
--cc=nfs@lists.sourceforge.net \
--cc=tom@opengridcomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox