* NooB Assistance with debugging NFSv4 client requested
From: Tim Watts @ 2010-12-10 15:43 UTC (permalink / raw)
  To: linux-nfs

Hi,

I have an NFSv4 client set up on Ubuntu 10.04.1 LTS x86. The NFSv4
server is running CentOS 5.5 and we use MIT Kerberos and LDAP for
users/groups. This seems to work well with CentOS 5.5 clients.

All works fine with my Ubuntu client, except after a while my client 
acts like it loses its authentication - symptom: home directory mount 
drops to "nobody" - I see the mount as "other" - no write access, can 
read files that have world read bit set etc.

This can happen anywhere between 2 and 48 hours after a full client
reboot. It seems to be triggered by active use of Thunderbird via the
NFSv4-mounted home dir, which suggests it may be load sensitive.

When it happens, if I unmount my home dir (killing the desktop, of
course) and then remount it, the fault is cleared and I can work again.

What doesn't work is just doing a kinit -f or restarting idmapd or gssd.

I have run rpc.gssd in foreground debug mode and that doesn't say much
during the problem times, ditto idmapd. We are using OpenLDAP for passwd
and group lookups, cached locally with nscd.

I have tried upping kernel debugging:

rpcdebug -m nfs -s vfs dircache lookupcache pagecache proc xdr file root \
    callback client mount all

but I'm not sure what I'm looking for.

The symptoms feel like the kernel is losing the ticket or timing it out,
or possibly the ID mapping is failing - is there any way to examine
the state of the kernel ticket cache, or anything else I could be looking
for?
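
As a starting point for the ticket question, here is a minimal sketch
of a userspace check. It only looks at the Kerberos credential cache
file, not at any state held inside the kernel. It assumes MIT
Kerberos (whose klist -s exits non-zero when the cache is unreadable
or expired) and the conventional /tmp/krb5cc_<uid> cache name;
ccache_status is a hypothetical helper name.

```sh
# Hypothetical helper: report whether a credential cache file holds
# unexpired tickets. With no argument it assumes the conventional
# /tmp/krb5cc_<uid> location.
ccache_status() {
    cache="${1:-/tmp/krb5cc_$(id -u)}"
    if ! command -v klist >/dev/null 2>&1; then
        echo "klist-not-found"
        return 2
    fi
    # MIT klist -s is silent; it exits non-zero when the cache cannot
    # be read or its tickets have expired.
    if klist -s -c "$cache" 2>/dev/null; then
        echo "valid"
    else
        echo "missing-or-expired"
        return 1
    fi
}
```

Running ccache_status from cron or a loop, with a timestamp, around
the time the mount drops to "nobody" would show whether the userspace
tickets were still valid at that moment.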

I am tempted to say this is a bug, possibly in the Ubuntu build, but I 
would like to investigate further.

Any pointers much appreciated as to how I might isolate the fault further.

Cheers

Tim
-- 
Tim Watts


* Re: NooB Assistance with debugging NFSv4 client requested
From: Tim Watts @ 2010-12-13 10:22 UTC (permalink / raw)
  To: linux-nfs

Bump - anyone :)

Please note - I'm not requesting anyone fix a deeply obscure problem. 
I'm a sysadmin - I'd just be really grateful for some pointers to help 
me continue debugging this myself...

Ta,

Tim


-- 
Tim Watts
Linux Sysadmin, High Energy Physics, Imperial College London
Tel: 020 759 47809

* Re: NooB Assistance with debugging NFSv4 client requested
From: Tim Watts @ 2010-12-13 11:43 UTC (permalink / raw)
  To: linux-nfs

Hmm,

Wonder if this is a clue (not noticed this before, but running idmapd in 
the foreground, it just whined):

rpc.idmapd: nfscb: read(/var/lib/nfs/rpc_pipefs/nfs/clnt0/idmap): No such file or directory

(Repeated several times).

I'm veering towards this issue being an idmap problem.

Cheers

tim


-- 
Tim Watts
Linux Sysadmin, High Energy Physics, Imperial College London
Tel: 020 759 47809

* Re: NooB Assistance with debugging NFSv4 client requested
From: J. Bruce Fields @ 2010-12-13 18:55 UTC (permalink / raw)
  To: Tim Watts; +Cc: linux-nfs

On Mon, Dec 13, 2010 at 11:43:12AM +0000, Tim Watts wrote:
> Hmm,
> 
> Wonder if this is a clue (not noticed this before, but running
> idmapd in the foreground, it just whined):
> 
> rpc.idmapd: nfscb: read(/var/lib/nfs/rpc_pipefs/nfs/clnt0/idmap): No
> such file or directory
> 
> (Repeated several times).
> 
> I'm veering towards this issue being an idmap problem.

All permissions are enforced on the server side--the effect of an
idmapper failure should just be that "ls -l" shows bad results (e.g.,
"nobody" as the owner of every file) and that commands like "chown"
fail.

If you're actually being prevented from performing normal filesystem
operations, then the most likely culprit is with gssd and krb5.
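
That rule of thumb can be turned into a quick check. A minimal
sketch, assuming a POSIX shell on the client; classify_nfs_fault is a
hypothetical helper, not part of nfs-utils. It tries a plain file
create in the given directory and then looks at the owner and group
that ls displays for the new file.

```sh
# Hypothetical helper applying the rule of thumb above.
# Prints "access-denied" when a plain file create fails (points at
# gssd/krb5), "idmap" when the create works but the owner or group is
# displayed as nobody/nogroup, and "ok" otherwise.
classify_nfs_fault() {
    dir="$1"
    probe="$dir/.fault_probe_$$"
    if ! touch "$probe" 2>/dev/null; then
        echo "access-denied"
        return 1
    fi
    # Fields 3 and 4 of "ls -ld" are the displayed owner and group.
    owner_group=$(ls -ld "$probe" | awk '{print $3, $4}')
    rm -f "$probe"
    case " $owner_group " in
        *" nobody "*|*" nogroup "*) echo "idmap" ;;
        *) echo "ok" ;;
    esac
}
```

Run it against a directory only the affected user may write to, such
as the home directory itself; in a world-writable directory the
create can succeed even when authentication has been lost, so the
result would be misleading there.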

Might be worth filing a bug with ubuntu.

rpc.gssd debugging you've tried.  Sniffing client/server and client/kdc
traffic with wireshark might also show some more information.

I dunno.  Maybe strace rpc.gssd during the failure?
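
For concreteness, the two suggestions above might look like the
sketch below. The helper names and the output paths under /tmp are
arbitrary choices for illustration; the ports assume NFSv4 on 2049
and the KDC on 88. Both commands need root. Setting DRY_RUN=1 prints
the commands instead of running them.

```sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Attach strace to a running rpc.gssd, following children (-f) and
# timestamping each call (-tt). Takes the daemon's PID as argument.
trace_gssd() {
    run strace -f -tt -o /tmp/gssd.strace -p "$1"
}

# Capture full packets for NFS (2049) and Kerberos (88) traffic to a
# file that wireshark can open afterwards.
capture_nfs_krb() {
    run tcpdump -i any -s 0 -w /tmp/nfs-krb.pcap port 2049 or port 88
}
```

Typical use would be trace_gssd "$(pidof -s rpc.gssd)" in one
terminal and capture_nfs_krb in another, left running until the
failure shows up.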

--b.

* Re: NooB Assistance with debugging NFSv4 client requested
From: Tim Watts @ 2010-12-13 19:15 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

Hi Bruce,

On 13/12/10 18:55, J. Bruce Fields wrote:
> On Mon, Dec 13, 2010 at 11:43:12AM +0000, Tim Watts wrote:
>> Hmm,
>>
>> Wonder if this is a clue (not noticed this before, but running
>> idmapd in the foreground, it just whined):
>>
>> rpc.idmapd: nfscb: read(/var/lib/nfs/rpc_pipefs/nfs/clnt0/idmap): No
>> such file or directory
>>
>> (Repeated several times).
>>
>> I'm veering towards this issue being an idmap problem.
>
> All permissions are enforced on the server side--the effect of an
> idmapper failure should just be that "ls -l" shows bad results (e.g.,
> "nobody" as the owner of every file) and that commands like "chown"
> fail.

Thanks very much for your reply.

Thanks for confirming the logic - I certainly now believe I have an 
issue with idmapd - sometimes the user will change to "nobody" and the 
group is good, and sometimes the reverse will happen.

We are using LDAP - with nscd - I might give myself a static /etc/passwd
and /etc/group entry to see if that makes the problem go away.

I'll run idmapd in debug plus under strace to see what's happening.


> If you're actually being prevented from performing normal filesystem
> operations, then the most likely culprit is with gssd and krb5.

I think I *may* have discovered the cause, eventually.

I have a couple of bash functions that maintain a user Kerberos principal
and a root one - the root one's cache being in /tmp/krb5cc_<uid>_root - and
the bash function switches the env var to suit the mode I want to be in.

It seems that having two files of the form /tmp/krb5cc_<uid>* might be
confusing something, so I have given the root one a totally
non-conforming name to see if that was the cause.
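
A minimal sketch of what such a pair of functions could look like
after the rename. This is a hypothetical reconstruction: the function
names and the exact root cache path are placeholders, and the only
assumption carried over from above is that a second file matching
/tmp/krb5cc_<uid>* is what causes the confusion.

```sh
# Hypothetical reconstruction, not the actual functions in use.

# Switch to the ordinary per-user cache with the conventional name.
krb_user() {
    KRB5CCNAME="FILE:/tmp/krb5cc_$(id -u)"
    export KRB5CCNAME
}

# Switch to the root-principal cache. The file name deliberately does
# not start with krb5cc_, so anything scanning /tmp for krb5cc_<uid>*
# files cannot pick it up in place of the user cache.
krb_root() {
    KRB5CCNAME="FILE:/tmp/kerberos5cc_$(id -un)_root"
    export KRB5CCNAME
}
```

After krb_root, kinit and klist operate on the renamed cache; after
krb_user they go back to the default one.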

> Might be worth filing a bug with ubuntu.

I did - it got classified under "nfs4_acl_tools"(!). I don't think 
Ubuntu are big on NFSv4 and I'm happy to try to get to the bottom of 
this, but it's pretty new to me (I've run a lot of NFSv3 before though).

> rpc.gssd debugging you've tried.  Sniffing client/server and client/kdc
> traffic with wireshark might also show some more information.

Yes - that would be worth doing.

> I dunno.  Maybe strace rpc.gssd during the failure?

Yes - might be worth doing that too.

Luckily during the failure conditions, the desktop is usable enough so I 
can at least get around and look at stuff.

Many thanks for the pointers :)

Cheers

Tim

-- 
Tim Watts
Linux Sysadmin, High Energy Physics, Imperial College London
Tel: 020 759 47809

* Re: NooB Assistance with debugging NFSv4 client requested
From: Tim Watts @ 2010-12-16 22:03 UTC (permalink / raw)
  To: linux-nfs

Right - thanks for all the discussions...

The security (krb tickets) seems stable now that I have renamed my root
principal cache file to /tmp/kerberos5cc_tjw_root rather than
/tmp/krb5cc_myuid_root

idmap seems still unstable - often my file groups go to nobody despite 
my file owners being correct.

Our LDAP server setup is new (it replaced NIS) and we see that other
servers (e.g. the mailserver) occasionally don't get an LDAP query
through in a timely manner.

I would like to see idmapd try a bit harder and/or not cache bad/missing
answers for so long, but I think that is roughly where the other problem
lies - thus it is debuggable and solvable.
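
One way to put numbers on the lookup flakiness is to repeat the same
NSS query and count the misses. A minimal sketch, assuming getent is
available on the client; probe_lookup is a hypothetical helper, and
its arguments are the NSS database, the key, the number of tries and
the pause in seconds between tries.

```sh
# Hypothetical probe: run the same NSS lookup repeatedly and count how
# often it fails. Prints the failure count on stdout and one
# timestamped line on stderr per failed lookup.
probe_lookup() {
    db="$1"; key="$2"; tries="${3:-10}"; pause="${4:-1}"
    fails=0
    i=0
    while [ "$i" -lt "$tries" ]; do
        if ! getent "$db" "$key" >/dev/null 2>&1; then
            fails=$((fails + 1))
            echo "$(date '+%F %T') lookup failed: $db $key" >&2
        fi
        i=$((i + 1))
        sleep "$pause"
    done
    echo "$fails"
}
```

For example, probe_lookup group <groupname> 600 1 samples one group
once a second for ten minutes. While nscd is running the lookups are
answered from its cache where possible, so stopping nscd for the
duration of the test gives a clearer view of the LDAP server itself.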

Thanks for the help guys!

Cheers

Tim

-- 

Tim Watts
Linux Sysadmin, High Energy Physics, Imperial College London
Tel: 020 759 47809

* Re: NooB Assistance with debugging NFSv4 client requested
From: J. Bruce Fields @ 2010-12-17 19:16 UTC (permalink / raw)
  To: Tim Watts; +Cc: linux-nfs

On Thu, Dec 16, 2010 at 10:03:53PM +0000, Tim Watts wrote:
> Right - thanks for all the discussions...
> 
> The security (krb tickets) seems stable now that I have renamed my root
> principal cache file to /tmp/kerberos5cc_tjw_root rather than
> /tmp/krb5cc_myuid_root
> 
> idmap seems still unstable - often my file groups go to nobody
> despite my file owners being correct.
> 
> Our LDAP server setup is new (it replaced NIS) and we see that
> other servers (e.g. the mailserver) occasionally don't get an
> LDAP query through in a timely manner.
> 
> I would like to see idmapd try a bit harder and/or not cache
> bad/missing answers for so long, but I think that is roughly where
> the other problem lies - thus it is debuggable and solvable.

Let us know what you figure out.

--b.
