* Legacy NFS client DNS resolver fails since 2.6.37
From: Chuck Lever @ 2012-10-29 1:03 UTC (permalink / raw)
To: Neil Brown; +Cc: Linux NFS Mailing List
Hi Neil-
To use the legacy DNS resolver for resolving hostnames in NFSv4 referrals, I've installed the /sbin/nfs_cache_getent script on my NFS client "degas." I've confirmed it works with a 2.6.36 kernel.
However, since 2.6.37 commit c5b29f885afe890f953f7f23424045cdad31d3e4 "sunrpc: use seconds since boot in expiry cache", the legacy DNS resolver appears not to work. When attempting to follow a referral that uses a server hostname, the client always fails to mount the referred-to server, with an error such as:
[cel@degas example.net]$ ls home
ls: cannot open directory home: No such file or directory
The contents of the dns_resolve cache appear to indicate that there are resolution results in the cache, but the CACHE_VALID flag is not set for that entry:
[root@degas dns_resolve]# cat content
# ip address hostname ttl
# , klimt.example.net 48
klimt.example.net is the hostname that is contained in the referral.
I have a second referral called "ip-address" in the same directory (domainroot), with the same content except the IP address of klimt is used instead of its hostname. Following that second referral always works.
I've tried every stable.0 release up to 3.6.0, and the behavior is roughly the same for each, which suggests that there is no upstream fix for this issue thus far.
Since I've never seen a problem like this reported, I'm wondering if anyone else can confirm this issue.
I have a narrow interest in fixing the legacy DNS resolver in stable kernels, but there may also be a latent problem with the RPC cache implementation that could spell trouble for other consumers, even post-3.6.
A rough outline of how you might reproduce this:
+ Build and install a 2.6.37 or later kernel for your NFS client with CONFIG_NFS_USE_LEGACY_DNS=y.
+ Set up an NFS server with "refer=" exports. See exports(5).
+ On your client, mount the server directory that contains the exports, then try to "cd" through one of the referrals.
If you don't feel up to replicating the above arrangement, can you suggest cache debugging instrumentation that can be added to my client to help nail this? Thanks for any advice!
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: Legacy NFS client DNS resolver fails since 2.6.37
From: NeilBrown @ 2012-10-29 1:59 UTC (permalink / raw)
To: Chuck Lever; +Cc: Linux NFS Mailing List
Hi Chuck,
looks like I messed up.
Every other cache uses absolute timestamps for expiry time. The dns resolver
differs from this and uses relative time stamps (ttl). I obviously didn't
understand this properly when I wrote the patch that broke things.
In particular, using get_expiry() is inappropriate in this context.
Something like this should fix it.
NeilBrown
diff --git a/fs/nfs/dns_resolve.c b/fs/nfs/dns_resolve.c
index 31c26c4..d9415a2 100644
--- a/fs/nfs/dns_resolve.c
+++ b/fs/nfs/dns_resolve.c
@@ -217,7 +217,7 @@ static int nfs_dns_parse(struct cache_detail *cd, char *buf, int buflen)
 {
 	char buf1[NFS_DNS_HOSTNAME_MAXLEN+1];
 	struct nfs_dns_ent key, *item;
-	unsigned long ttl;
+	unsigned int ttl;
 	ssize_t len;
 	int ret = -EINVAL;
 
@@ -240,7 +240,8 @@ static int nfs_dns_parse(struct cache_detail *cd, char *buf, int buflen)
 	key.namelen = len;
 	memset(&key.h, 0, sizeof(key.h));
 
-	ttl = get_expiry(&buf);
+	if (get_int(&buf, &ttl) < 0)
+		goto out;
 	if (ttl == 0)
 		goto out;
 	key.h.expiry_time = ttl + seconds_since_boot();
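
For readers following the arithmetic, here is a minimal user-space sketch of why a relative TTL breaks when it is parsed as an absolute expiry time. It is a model, not kernel code. It assumes that get_expiry() rebases a wall-clock timestamp to seconds since boot by subtracting the boot time, and that an entry counts as expired once its expiry time is earlier than the current seconds-since-boot value. The function names, the boot time, and the uptime below are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed, illustrative values; none of these come from the thread. */
#define BOOT_WALLCLOCK INT64_C(1351468800) /* wall-clock seconds at boot */
#define UPTIME         INT64_C(3600)       /* current seconds since boot */

/*
 * Hypothetical model of a parser for ABSOLUTE expiry times: it takes a
 * wall-clock timestamp and rebases it to seconds since boot.
 */
static int64_t model_absolute_expiry(int64_t wallclock)
{
    if (wallclock < 0)
        return 0;
    return wallclock - BOOT_WALLCLOCK;
}

/* Pre-patch logic: the relative TTL is wrongly rebased as if it were absolute. */
static int64_t expiry_before_patch(int64_t ttl)
{
    return model_absolute_expiry(ttl) + UPTIME;
}

/* Post-patch logic: the TTL is read as a plain integer and added to the uptime. */
static int64_t expiry_after_patch(int64_t ttl)
{
    return ttl + UPTIME;
}

/* In this model an entry is stale once its expiry lies before the current uptime. */
static bool is_expired(int64_t expiry)
{
    return expiry < UPTIME;
}
```

With the TTL of 48 from the cache listing in the original report, expiry_before_patch(48) is far in the past under these assumed values, so the entry is treated as expired as soon as it is inserted, while expiry_after_patch(48) yields 3648, which is 48 seconds in the future.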
* Re: Legacy NFS client DNS resolver fails since 2.6.37
From: Chuck Lever @ 2012-10-29 17:47 UTC (permalink / raw)
To: NeilBrown; +Cc: Linux NFS Mailing List
On Oct 28, 2012, at 9:59 PM, NeilBrown <neilb@suse.de> wrote:
> Hi Chuck,
> looks like I messed up.
> Every other cache uses absolute timestamps for expiry time. The dns resolver
> differs from this and uses relative time stamps (ttl). I obviously didn't
> understand this properly when I wrote the patch that broke things.
> In particular, using get_expiry() is inappropriate in this context.
>
> Something like this should fix it.
I built a 3.7-rc2 kernel with CONFIG_NFS_USE_LEGACY_DNS=y.
Without your patch, following a referral containing a hostname does not work on this kernel. After applying your patch, following the same referral works as expected.
Tested-by: Chuck Lever <chuck.lever@oracle.com>
IMO, this fix should go to all stable kernels >= 2.6.37, and to 3.7-rc.
Good news is that this problem does not affect other RPC cache consumers. Thanks for the quick response!
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com