From: Lukas Razik <linux@razik.name>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Jim Rees <rees@umich.edu>,
Trond Myklebust <Trond.Myklebust@netapp.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [BUG?] Maybe NFS bug since 2.6.37 on SPARC64
Date: Thu, 3 Nov 2011 23:09:24 +0000 (GMT)
Message-ID: <1320361764.48851.YahooMailNeo@web24708.mail.ird.yahoo.com>
In-Reply-To: <92DF2E31-FABF-40A5-8F78-89B64363568B@oracle.com>
Chuck Lever <chuck.lever@oracle.com> wrote:
>
>
> On Nov 3, 2011, at 5:37 PM, Lukas Razik wrote:
>
>>> On Nov 3, 2011, at 5:11 PM, Jim Rees wrote:
>>
>>>
>>>> Trond Myklebust wrote:
>>>>
>>>>> [ 442.666622] NFS: failed to create MNT RPC client, status=-60
>>>>> [ 442.666732] NFS: unable to mount server 137.226.167.241, error -60
>>>>> [ 442.666868] VFS: Unable to mount root fs via NFS, trying floppy.
>>>>> [ 442.667032] VFS: Insert root floppy and press ENTER
>>>>>
>>>> Error 60 is ETIMEDOUT on SPARC, so it seems that the problem is
>>>> basically the same one that you see in your 2.6.32 trace (rpcbind:
>>>> server 137.226.167.241 not responding, timed out) except that now it is
>>>> a fatal error.
>>>>
>>>> Any idea why the first RPC calls might be failing here? A switch
>>>> misconfiguration or something like that perhaps?
>>>>
>>>> Wasn't there a change in the way nfs mount options are handled by the
>>>> kernel for nfsroot about the time of 2.6.39? Something about changing
>>>> from default udp to tcp maybe?
>>>
>>> There was a change, but it was changed back to UDP because of problems like
>>> this. Behavior in 3.0 or the latest 2.6.39 stable kernel may be improved.
>>>
>>
>> I don't know if this was a hint to test the newest 2.6.39, but as I wrote in my first email
>> http://thread.gmane.org/gmane.linux.nfs/44596
>> that's the output of linux-2.6.39.4 with "nfsdebug":
>>
>> [ 407.571521] IP-Config: Complete:
>> [ 407.571589] device=eth0, addr=137.226.167.242, mask=255.255.255.224, gw=137.226.167.225,
>> [ 407.571793] host=cluster2, domain=, nis-domain=(none),
>> [ 407.571907] bootserver=255.255.255.255, rootserver=137.226.167.241, rootpath=
>> [ 407.572332] Root-NFS: nfsroot=/srv/nfs/cluster2
>> [ 407.572726] NFS: nfs mount opts='udp,nolock,addr=137.226.167.241'
>> [ 407.572927] NFS: parsing nfs mount option 'udp'
>> [ 407.572995] NFS: parsing nfs mount option 'nolock'
>> [ 407.573071] NFS: parsing nfs mount option 'addr=137.226.167.241'
>> [ 407.573139] NFS: MNTPATH: '/srv/nfs/cluster2'
>> [ 407.573203] NFS: sending MNT request for 137.226.167.241:/srv/nfs/cluster2
>> [ 408.617894] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
>> [ 408.638319] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> [ 442.666622] NFS: failed to create MNT RPC client, status=-60
>> [ 442.666732] NFS: unable to mount server 137.226.167.241, error -60
>> [ 442.666868] VFS: Unable to mount root fs via NFS, trying floppy.
>> [ 442.667032] VFS: Insert root floppy and press ENTER
>>
>> And this behaviour is exactly the same in all the other kernels from 2.6.37 to 2.6.39.4 that I've tested.
>> So if any of you has an idea what I could try, I'll give it a try...
>
> Find out why the very first RPC on your system always fails. As Trond says, the
> only reason this worked on the older kernels is because NFSROOT fell back to a
> default port for NFSD. This is also broken behavior, but in your case it
> happened to work so you never noticed it.
>
> I seem to recall there's a way to set the NFS and RPC debugging flags on the
> kernel command line so more information can be captured during boot. But I
> don't see it under Documentation/.
>
> You could add a line in fs/nfs/nfsroot.c:nfs_root_debug() to set flags also in
> the rpc_debug global variable to gather more information.
>
OK
I've watched the traffic with Wireshark on cluster1 while cluster2 (with linux-2.6.32) boots; cluster2 first looks up the port of RPC program 100003 (NFS) and then of 100005 (MOUNT).
The result is that cluster1 never receives a datagram for the 100003 lookup:
http://net.razik.de/linux/T5120/cluster2_NFSROOT_MOUNT.png
The first ARP request in the screenshot came _after_ the <tag> in this kernel log:
[ 6492.807917] IP-Config: Complete:
[ 6492.807978] device=eth0, addr=137.226.167.242, mask=255.255.255.224, gw=137.226.167.225,
[ 6492.808227] host=cluster2, domain=, nis-domain=(none),
[ 6492.808312] bootserver=255.255.255.255, rootserver=137.226.167.241, rootpath=
[ 6492.808570] Looking up port of RPC 100003/2 on 137.226.167.241
[ 6493.886014] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
[ 6493.905840] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
<tag>
[ 6527.827055] rpcbind: server 137.226.167.241 not responding, timed out
[ 6527.827237] Root-NFS: Unable to get nfsd port number from server, using default
[ 6527.827353] Looking up port of RPC 100005/1 on 137.226.167.241
[ 6527.842212] VFS: Mounted root (nfs filesystem) on device 0:15.
So I don't think it's a problem with the hardware between the machines.
If cluster2 had sent anything before the <tag>, I see no reason why I wouldn't have seen an ARP request from it. I think cluster2 never sends the lookup request for RPC program 100003.
What do you think?
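For reference, here is a minimal sketch (Python, not kernel code) of the datagram that the log line "Looking up port of RPC 100003/2" should put on the wire: an ONC RPC CALL (RFC 5531) to the portmapper, program 100000, procedure PMAPPROC_GETPORT (RFC 1833), sent over UDP to port 111 with no TCP record marking. It assumes that the kernel uses portmapper version 2 with AUTH_NULL credentials; the log does not show which version is used. The helper name build_getport_call and the XID values are hypothetical and only for illustration.

```python
import struct

PMAP_PROG = 100000        # portmapper / rpcbind program number
PMAP_VERS = 2             # assumption: version 2 (PMAP) lookup
PMAPPROC_GETPORT = 3
IPPROTO_UDP = 17

def build_getport_call(xid, prog, vers, prot=IPPROTO_UDP):
    """Hypothetical helper: XDR-encode an RPC CALL to PMAPPROC_GETPORT."""
    header = struct.pack(
        ">10I",
        xid,
        0,                # msg_type = CALL
        2,                # RPC protocol version 2
        PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT,
        0, 0,             # credential: AUTH_NULL flavor, zero-length body
        0, 0,             # verifier:   AUTH_NULL flavor, zero-length body
    )
    # struct mapping {prog, vers, prot, port}; port is ignored in a call
    args = struct.pack(">4I", prog, vers, prot, 0)
    return header + args

nfs_lookup = build_getport_call(0x12345678, 100003, 2)    # "RPC 100003/2"
mount_lookup = build_getport_call(0x12345679, 100005, 1)  # "RPC 100005/1"
print(len(nfs_lookup), nfs_lookup.hex())
```

Under these assumptions, each lookup is a 56-byte UDP payload in which the looked-up program number sits at offset 40 (00 01 86 a3 for 100003). Comparing a capture filtered with udp.port == 111 against these bytes is one way to check whether the 100003 lookup was actually transmitted.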