* A weird autofs problem
@ 2007-08-16 23:44 Jason L Tibbitts III
  2007-08-17  3:17 ` Ian Kent
  0 siblings, 1 reply; 6+ messages in thread
From: Jason L Tibbitts III @ 2007-08-16 23:44 UTC (permalink / raw)
  To: autofs

I'm trying to debug a very weird problem I've started seeing with
autofs over the last couple of days.  The machines I'm seeing this on
are all running Fedora Core 5 i386, fully updated (autofs 4.1.4-33 and
kernel-smp 2.6.20-1.2320.fc5).  Yes, I'm in the process of rolling out
F7, but that will still take at least a couple of weeks and in the
meantime I'd like to get to the bottom of this if possible.

We use autofs to automount user home directories under /home.  Some
users are reporting that their home directories are just disappearing
out from under them; attempts to write to them result in "permission
denied".  Only once have I seen a system in this state (users keep
rebooting the machines) and I don't have a screen capture, but I
recall that df showed the user directory as being "there" but showed
dashes instead of numbers for the capacity figures.  Unfortunately I
don't recall what showed up in the Filesystem column.

I can't think of anything which has changed on the affected systems.
Since they're running FC5 nothing is getting updated.  The automount
maps are served via LDAP and as far as I can tell the queries aren't
failing.  The server doesn't report any problems.  Nothing has changed
recently on either the LDAP servers or the NFS servers.

So I'm at a complete loss here.  If I can access a system while the
problem is manifesting I'm sure I can find some more useful
information but until then I was hoping that either someone might have
seen something like this and had an idea of what might be causing it
or someone might have some idea of how I might go about debugging
this.  I have debugging turned on but haven't seen anything useful
yet.

 - J<


* Re: A weird autofs problem
  2007-08-16 23:44 A weird autofs problem Jason L Tibbitts III
@ 2007-08-17  3:17 ` Ian Kent
  2007-08-17 13:35   ` Jim Summers
  2007-08-17 14:27   ` Jason L Tibbitts III
  0 siblings, 2 replies; 6+ messages in thread
From: Ian Kent @ 2007-08-17  3:17 UTC (permalink / raw)
  To: Jason L Tibbitts III; +Cc: autofs

On Thu, 2007-08-16 at 18:44 -0500, Jason L Tibbitts III wrote:
> I'm trying to debug a very weird problem I've started seeing with
> autofs over the last couple of days.  The machines I'm seeing this on
> are all running Fedora Core 5 i386, fully updated (autofs 4.1.4-33 and
> kernel-smp 2.6.20-1.2320.fc5).  Yes, I'm in the process of rolling out
> F7, but that will still take at least a couple of weeks and in the
> meantime I'd like to get to the bottom of this if possible.
> 
> We use autofs to automount user home directories under /home.  Some
> users are reporting that their home directories are just disappearing
> out from under them; attempts to write to them result in "permission
> denied".  Only once have I seen a system in this state (users keep
> rebooting the machines) and I don't have a screen capture, but I
> recall that df showed the user directory as being "there" but showed
> dashes instead of numbers for the capacity figures.  Unfortunately I
> don't recall what showed up in the Filesystem column.

That NFS problem was fixed in 2.6.18 I think.

> 
> I can't think of anything which has changed on the affected systems.
> Since they're running FC5 nothing is getting updated.  The automount
> maps are served via LDAP and as far as I can tell the queries aren't
> failing.  The server doesn't report any problems.  Nothing has changed
> recently on either the LDAP servers or the NFS servers.

Something has changed?

> 
> So I'm at a complete loss here.  If I can access a system while the
> problem is manifesting I'm sure I can find some more useful
> information but until then I was hoping that either someone might have
> seen something like this and had an idea of what might be causing it
> or someone might have some idea of how I might go about debugging
> this.  I have debugging turned on but haven't seen anything useful
> yet.

We'll need you to get a first hand look at a broken system and have a
look around and see what really is wrong. And you know that a debug log
would be good too.

Ian


* Re: A weird autofs problem
  2007-08-17  3:17 ` Ian Kent
@ 2007-08-17 13:35   ` Jim Summers
  2007-08-17 14:27   ` Jason L Tibbitts III
  1 sibling, 0 replies; 6+ messages in thread
From: Jim Summers @ 2007-08-17 13:35 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs

Ian Kent wrote:
> On Thu, 2007-08-16 at 18:44 -0500, Jason L Tibbitts III wrote:
>> I'm trying to debug a very weird problem I've started seeing with
>> autofs over the last couple of days.  The machines I'm seeing this on
>> are all running Fedora Core 5 i386, fully updated (autofs 4.1.4-33 and
>> kernel-smp 2.6.20-1.2320.fc5).  Yes, I'm in the process of rolling out
>> F7, but that will still take at least a couple of weeks and in the
>> meantime I'd like to get to the bottom of this if possible.
>>
>> We use autofs to automount user home directories under /home.  Some
>> users are reporting that their home directories are just disappearing
>> out from under them; attempts to write to them result in "permission
>> denied".  Only once have I seen a system in this state (users keep
>> rebooting the machines) and I don't have a screen capture, but I
>> recall that df showed the user directory as being "there" but showed
>> dashes instead of numbers for the capacity figures.  Unfortunately I
>> don't recall what showed up in the Filesystem column.
> 
> That NFS problem was fixed in 2.6.18 I think.
> 

when i read this, nfs was the first thing that popped into my mind also. 
second would be a network outage.  i have experienced the exact symptoms, but 
on static nfs mounts, after i have brought down the server for extended 
periods, or the net is out for long periods, then a client will do this.  i 
simply recover by umounting then doing a mount -a.

hth



>> I can't think of anything which has changed on the affected systems.
>> Since they're running FC5 nothing is getting updated.  The automount
>> maps are served via LDAP and as far as I can tell the queries aren't
>> failing.  The server doesn't report any problems.  Nothing has changed
>> recently on either the LDAP servers or the NFS servers.
> 
> Something has changed?
> 
>> So I'm at a complete loss here.  If I can access a system while the
>> problem is manifesting I'm sure I can find some more useful
>> information but until then I was hoping that either someone might have
>> seen something like this and had an idea of what might be causing it
>> or someone might have some idea of how I might go about debugging
>> this.  I have debugging turned on but haven't seen anything useful
>> yet.
> 
> We'll need you to get a first hand look at a broken system and have a
> look around and see what really is wrong. And you know that a debug log
> would be good too.
> 
> Ian

-- 
Jim Summers
School of Computer Science-University of Oklahoma
-------------------------------------------------


* Re: A weird autofs problem
  2007-08-17  3:17 ` Ian Kent
  2007-08-17 13:35   ` Jim Summers
@ 2007-08-17 14:27   ` Jason L Tibbitts III
  2007-08-17 18:40     ` Ian Kent
  1 sibling, 1 reply; 6+ messages in thread
From: Jason L Tibbitts III @ 2007-08-17 14:27 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs

>>>>> "IK" == Ian Kent <raven@themaw.net> writes:

IK> That NFS problem was fixed in 2.6.18 I think.

Well, I'm at 2.6.20, so....

IK> Something has changed?

I honestly can't think of anything.

IK> We'll need you to get a first hand look at a broken system and
IK> have a look around and see what really is wrong. And you know that
IK> a debug log would be good too.

OK, here's a machine that's gone bad.  df shows:

nas01:/export/users-00/boyarkin
                             -         -         -   -  /home/boyarkin

Attempting to unmount the directory results in "device is busy" as the
user is currently logged in and has files open on the filesystem.

Unfortunately because the user has not logged out of this machine, I
haven't been able to restart the automounter so debugging is not
enabled.

Simply running "umount -l /home/boyarkin" as root causes the mount to
appear properly.

I've kicked the user out of the machine and restarted the automounter
with debugging on.  I'll continue to watch this machine and see if the
problem reoccurs.
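
A rough sketch of how the broken state described above might be spotted
without waiting for a user report. The dashes in df suggest it could not
get usage figures for the mount, so this walks the NFS entries in
/proc/mounts and tries the same kind of call. The function names are made
up for illustration, octal escapes in mount point names (such as \040 for
a space) are not decoded, and statvfs on a hung NFS mount may itself
block, so treat this as a starting point only:

```python
import os

# Filesystem types treated as NFS; an assumption, adjust as needed.
NFS_TYPES = {"nfs", "nfs4"}


def nfs_mount_points(mounts_text):
    """Return (device, mount_point) pairs for NFS lines in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            found.append((fields[0], fields[1]))
    return found


def is_accessible(path):
    """True if statvfs on the path succeeds, False if it raises an OS error."""
    try:
        os.statvfs(path)
        return True
    except OSError:
        return False


def report(mounts_path="/proc/mounts"):
    """Print one status line per NFS mount found in mounts_path."""
    with open(mounts_path) as fh:
        text = fh.read()
    for device, mount_point in nfs_mount_points(text):
        state = "ok" if is_accessible(mount_point) else "INACCESSIBLE"
        print(f"{state:12} {device} on {mount_point}")


if __name__ == "__main__":
    report()
```

Run from cron or by hand on a client, any line marked INACCESSIBLE would
be a candidate for the "umount -l" treatment mentioned above.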

 - J<


* Re: A weird autofs problem
  2007-08-17 14:27   ` Jason L Tibbitts III
@ 2007-08-17 18:40     ` Ian Kent
  2007-08-17 18:57       ` Jason L Tibbitts III
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Kent @ 2007-08-17 18:40 UTC (permalink / raw)
  To: Jason L Tibbitts III; +Cc: autofs

On Fri, 2007-08-17 at 09:27 -0500, Jason L Tibbitts III wrote:
> >>>>> "IK" == Ian Kent <raven@themaw.net> writes:
> 
> IK> That NFS problem was fixed in 2.6.18 I think.
> 
> Well, I'm at 2.6.20, so....
> 
> IK> Something has changed?
> 
> I honestly can't think of anything.
> 
> IK> We'll need you to get a first hand look at a broken system and
> IK> have a look around and see what really is wrong. And you know that
> IK> a debug log would be good too.
> 
> OK, here's a machine that's gone bad.  df shows:
> 
> nas01:/export/users-00/boyarkin
>                              -         -         -   -  /home/boyarkin
> 
> Attempting to unmount the directory results in "device is busy" as the
> user is currently logged in and has files open on the filesystem.

What about the output of an ls, as you referred to bogus entries being
listed? Is that actually the case? Is there anything in the log that
might give us a clue as to what's going on?

> 
> Unfortunately because the user has not logged out of this machine, I
> haven't been able to restart the automounter so debugging is not
> enabled.
> 
> Simply running "umount -l /home/boyarkin" as root causes the mount to
> appear properly.

Yes, that would do the trick.

> 
> I've kicked the user out of the machine and restarted the automounter
> with debugging on.  I'll continue to watch this machine and see if the
> problem reoccurs.
> 
>  - J<


* Re: A weird autofs problem
  2007-08-17 18:40     ` Ian Kent
@ 2007-08-17 18:57       ` Jason L Tibbitts III
  0 siblings, 0 replies; 6+ messages in thread
From: Jason L Tibbitts III @ 2007-08-17 18:57 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs

>>>>> "IK" == Ian Kent <raven@themaw.net> writes:

IK> What about the output of an ls as you referred bogus entries being
IK> listed.

No, the filesystem is completely inaccessible at this point.

However, I have another hint which seems to let autofs off the hook:

These happened on the NFS server around the time of the first report
of problems:
Aug 15 11:34:36 nas01 nscd: 14741 Error getting context of nscd
Aug 15 11:34:47 nas01 last message repeated 18 times
Aug 15 11:34:47 nas01 nscd: nscd shutdown succeeded
Aug 15 11:34:47 nas01 nscd: 21046 Access Vector Cache (AVC) started
Aug 15 11:34:47 nas01 nscd: nscd startup succeeded

(actually the first line has been appearing in the logs for ages; I
assume it's just more selinux stuff which I still don't understand.)

And then this morning:

Aug 17 10:16:12 nas01 kernel: nscd[14745]: segfault at 0000002afed9e933 rip 000000552aab7cf2 rsp 0000000040c00750 error 4
Aug 17 10:16:27 nas01 kernel: nscd[21050]: segfault at 0000002b99607280 rip 000000552aab7d08 rsp 0000000040c00750 error 4

so nscd definitely screwed up in some manner.  I restarted it:

Aug 17 10:30:27 nas01 nscd: 22346 Access Vector Cache (AVC) started
Aug 17 10:30:27 nas01 nscd: nscd startup succeeded
Aug 17 10:30:27 nas01 nscd: 22346 invalid persistent database file "/var/db/nscd/passwd": verification failed

and so far no reports of problems.
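
For anyone chasing something similar, here is a small sketch of pulling
segfault reports like the two above out of a syslog file so they can be
lined up against the times users report trouble. It assumes the line
layout shown in the kernel messages quoted above; the function name and
the choice of fields returned are illustrative only:

```python
import re

# Matches lines such as:
#   Aug 17 10:16:12 nas01 kernel: nscd[14745]: segfault at 0000002afed9e933 rip ...
SEGFAULT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<prog>[^\[\s]+)\[(?P<pid>\d+)\]:\s+segfault at (?P<addr>[0-9a-f]+)"
)


def find_segfaults(lines, program=None):
    """Return (timestamp, host, program, pid) for each kernel segfault line.

    If program is given, only segfaults of that program are returned.
    """
    hits = []
    for line in lines:
        match = SEGFAULT_RE.match(line)
        if match and (program is None or match.group("prog") == program):
            hits.append(
                (
                    match.group("stamp"),
                    match.group("host"),
                    match.group("prog"),
                    int(match.group("pid")),
                )
            )
    return hits
```

Something like find_segfaults(open("/var/log/messages"), program="nscd")
would be the typical use, with the log path being an assumption about
where syslog writes on the server.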

Sorry to try to lay any blame on autofs.  Just another tribute to how
impenetrable NFS can be.

 - J<
