* [NFS] Help! NFS broken
From: mike @ 2008-12-07 14:48 UTC
To: nfs
I upgraded my Ubuntu Hardy server to Intrepid the night before last.
When I woke up yesterday morning, my main server I use as my ssh
gateway into the others was totally messed up.
My server is FreeBSD. I haven't had to touch it since I set it up.
My clients are 6 Ubuntu servers. All are identical in packages,
configs (I diff'ed /etc), kernel versions, network setup, etc.
Only *one* of the machines is suffering (and of course, one of the
most important ones) - and it isn't even one of the busiest.
I've tried downgrading the kernel on the affected box from 2.6.27-10
to 2.6.27-7; my next attempt will be a kernel .deb from the previous
Ubuntu release...
What is odd is that it works great for a couple of hours after a
reboot, then stops working. I can umount -l /home and then try to
remount it (see below), but the mount never gets anywhere and
eventually dies with a generic message. I ran it under strace -f,
which gave me nothing to work with. The FreeBSD server doesn't log
anything useful either. At this point I can still ping and ssh between
the two machines with no problem; only NFS is affected. While
restarting services manually to debug them, I also noticed that
portmap seemed to throw a kernel error in my logs once in a while. But
I don't see a connection to portmap when I run the mount command, and
I would assume that if portmap is required for mounting NFS shares,
mount would need to contact it. That could be totally irrelevant
though.
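For anyone retracing the portmap question, here is a minimal sketch of
one way to check what the server's portmapper has registered. It
assumes rpcinfo is installed on the client; missing_rpc_programs is a
hypothetical helper name, not part of any package. It matches on the
well-known RPC program numbers rather than service names, since the
name column can differ between systems.

```shell
# Read `rpcinfo -p <server>` output on stdin and print the name of each
# RPC program needed for an NFSv3 mount that is NOT registered.
# Program numbers: 100000 = portmapper/rpcbind, 100003 = nfs,
# 100005 = mountd.
missing_rpc_programs() {
    awk '
        $1 == 100000 { seen["portmapper"] = 1 }
        $1 == 100003 { seen["nfs"] = 1 }
        $1 == 100005 { seen["mountd"] = 1 }
        END {
            if (!("portmapper" in seen)) print "portmapper"
            if (!("nfs" in seen))        print "nfs"
            if (!("mountd" in seen))     print "mountd"
        }'
}

# Usage (run on the client; raid01 is the server name from the mount
# output in this message):
#   rpcinfo -p raid01 | missing_rpc_programs
```

No output means all three programs are registered. If rpcinfo itself
hangs, that already suggests the client cannot reach the portmapper.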
Any help or insight or request for additional information is
appreciated. On-list or off-list is fine. I will pay someone via
Paypal who can help me resolve this quickly...
[root@lvs01 ~]# mount -vvvv /home
mount: fstab path: "/etc/fstab"
mount: mtab path: "/etc/mtab"
mount: lock path: "/etc/mtab~"
mount: temp path: "/etc/mtab.tmp"
mount: spec: "raid01:/home"
mount: node: "/home"
mount: types: "nfs"
mount: opts: "rsize=8192,rsize=8192,tcp,rw,acregmin=30"
mount: external mount: argv[0] = "/sbin/mount.nfs"
mount: external mount: argv[1] = "raid01:/home"
mount: external mount: argv[2] = "/home"
mount: external mount: argv[3] = "-v"
mount: external mount: argv[4] = "-o"
mount: external mount: argv[5] = "rw,rsize=8192,rsize=8192,tcp,acregmin=30"
mount.nfs: timeout set for Sun Dec 7 06:36:39 2008
mount.nfs: text-based options:
'rsize=8192,rsize=8192,tcp,acregmin=30,addr=10.13.220.94'
(It just stalls here; normally the connection is near instant.
Eventually it dies with a generic error message. I can Ctrl-C out of
it too, so it's not completely frozen.)
thanks...
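One detail visible in the output above is that rsize=8192 appears
twice in the option string. Whether the second one was meant to be
wsize is only a guess, and nothing in the thread ties it to the hang.
A minimal sketch for spotting repeated options (dup_mount_opts is a
hypothetical helper, not part of nfs-utils):

```shell
# Print each mount option name that occurs more than once in a
# comma-separated option string.
dup_mount_opts() {
    printf '%s\n' "$1" |
        tr ',' '\n' |      # one option per line
        sed 's/=.*//' |    # keep only the option name
        sort |
        uniq -d            # names seen more than once
}

# Option string copied from the mount -vvvv output above.
dup_mount_opts "rsize=8192,rsize=8192,tcp,rw,acregmin=30"
```

The same function can be pointed at the options column of an
/etc/fstab entry.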
------------------------------------------------------------------------------
SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada.
The future of the web can't happen without you. Join us at MIX09 to help
pave the way to the Next Web now. Learn more and register at
http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
_______________________________________________
Please note that nfs@lists.sourceforge.net is being discontinued.
Please subscribe to linux-nfs@vger.kernel.org instead.
http://vger.kernel.org/vger-lists.html#linux-nfs
* Re: [NFS] Help! NFS broken
From: J. Bruce Fields @ 2008-12-08 23:38 UTC
To: mike; +Cc: nfs
On Sun, Dec 07, 2008 at 06:48:52AM -0800, mike wrote:
> I upgraded my Ubuntu Hardy server to Intrepid the night before last.
> When I woke up yesterday morning, my main server I use as my ssh
> gateway into the others was totally messed up.
>
> My server is FreeBSD. I haven't had to touch it since I set it up.
>
> My clients are 6 Ubuntu servers. All are identical in packages,
> configs (I diff'ed /etc), kernel versions, network setup, etc.
So the "server" in the first paragraph is an NFS client, and its NFS
server is the FreeBSD machine?
And what are the first symptoms? Any threads accessing the NFS
filesystem just hang? A sysrq-T trace on the client showing where
they're hanging might be helpful.
--b.
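A rough sketch of how the trace suggested above might be gathered on a
Linux client. The sysrq commands are shown as comments because they
need root and write to the kernel log. blocked_tasks is a hypothetical
helper that gives a lighter-weight view by listing tasks in
uninterruptible sleep (state D) together with the kernel function they
are waiting in; it is not a replacement for the full trace.

```shell
# Full task dump (as root); the output lands in the kernel log:
#   echo t > /proc/sysrq-trigger
#   dmesg | less

# Read `ps -eo pid,stat,wchan:32,comm` output on stdin and print the
# pid, wait channel and command of every task whose state starts with D.
blocked_tasks() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3, $4 }'
}

# Usage:
#   ps -eo pid,stat,wchan:32,comm | blocked_tasks
```

Processes stuck on a hung NFS mount typically show up in state D, so
an empty list while the mount is hanging would itself be informative.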
* Re: [NFS] Help! NFS broken
From: mike @ 2008-12-09 0:12 UTC
To: J. Bruce Fields; +Cc: nfs
On Mon, Dec 8, 2008 at 3:38 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> So the "server" in the first paragraph is an NFS client, and its NFS
> server is the FreeBSD machine?
Yes
> And what are the first symptoms? Any threads accessing the NFS
> filesystem just hang? A sysrq-T trace on the client showing where
> they're hanging might be helpful.
Honestly, these are production machines. I looked in every place I
could think of for hints and found nothing, and I can't really use
this box for testing either. What is odd is that identically
configured machines (down to the same files in /etc and the same
packages from dpkg -l) have no issue.
As a last resort, I tried different kernel versions (from Ubuntu):
linux-image-2.6.27-10-server - broken
linux-image-2.6.27-7-server - broken
linux-image-2.6.28-2-server - I think I gave up on this one too
quickly, and it was broken (I might be wrong; I wound up deciding to
go back instead of forward)
linux-image-2.6.24-16-server - working for 2 days now, so sticking with it
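A small sketch of how the results above could be turned into a boot-time
check on the affected machine. kernel_known_bad is a hypothetical helper,
and it lists only the two kernels reported as definitely broken here;
2.6.28-2 is left out because that result was uncertain.

```shell
# Return success (0) if the given kernel release string matches one of
# the kernels reported as broken on this machine in the list above.
kernel_known_bad() {
    case "$1" in
        2.6.27-10-server|2.6.27-7-server) return 0 ;;
        *) return 1 ;;
    esac
}

# Warn if the running kernel is one of the known-bad ones.
if kernel_known_bad "$(uname -r)"; then
    echo "warning: running a kernel that hung NFS on this box" >&2
fi
```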
Note that the other nodes (5 of the 6 identical nodes are fine; this
was the one bad one) are all running the default Intrepid kernel at
the moment, linux-image-2.6.27-10-server, and don't seem to be
suffering from any issues.
So it seems to be a combination of those kernels and that machine.
The problem is that the machine's configuration is identical: same
nfs-utils, portmap, etc. From my rsync scan, even the majority of
files in /etc (and anything that should be relevant) are identical
too.
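The comparison described above was done with rsync. As an alternative
sketch, assuming each node's /etc has first been copied into a local
directory, the same kind of check can be done with diff. config_diff
and the example paths are hypothetical.

```shell
# List files that differ, or exist on only one side, between two
# directory trees. Prints nothing when the trees are identical.
config_diff() {
    diff -rq "$1" "$2"
}

# Usage, with hypothetical snapshot directories for a good and a bad
# node:
#   config_diff /tmp/etc-good /tmp/etc-bad
```

Note that diff exits non-zero when it finds differences, which matters
if the function is called from a script running under set -e.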
This might be an Ubuntu bug, or something flaky with NFS on 2.6.27
(maybe 2.6.28 too) in general, but I don't know how I can produce any
worthwhile debugging information, especially since this is in
production. I haven't seen a fix for it; the kernel downgrade appears
to be the workaround for now.
Sorry I can't be more help. When I sent my first email I had the
luxury of a broken setup to debug, but now that I've stabilized it I
have to keep it this way :)