From: Richard Hirst <rhirst@levanta.com>
To: nfs@lists.sourceforge.net
Subject: Re: block dev minor > 255 and exporting fs
Date: Fri, 7 Oct 2005 10:45:32 +0100
Message-ID: <20051007094532.GW6490@levanta.com>
> Hi. I've noticed that an NFS mount times out when I export a
> filesystem residing on a block device with a "large" minor number,
> i.e. beyond the old limit of 255 from when there were only eight bits
> for the minor number of devices.
When I looked into this I decided the problem lay in userland, not
kernel land. Once you get to minor numbers greater than 255, this
kernel code:
+++ linux-2.6.10/fs/nfsd/nfsfh.c 2005-08-05 17:35:12.128552514 +0100
@@ -351,8 +351,13 @@
    if (!old_valid_dev(ex_dev) && ref_fh_fsid_type == 0) {
        /* for newer device numbers, we must use a newer fsid format */
        ref_fh_version = 1;
        ref_fh_fsid_type = 3;
    }
switches from using a type 0 fsid to a type 3 fsid.
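
To illustrate that switch, here is a minimal userspace sketch. It assumes
old_valid_dev() simply checks that both major and minor are below 256;
old_valid_dev_sketch() and pick_fsid_type() are names made up for this
example, not kernel functions.

```c
/* Userspace stand-in for the kernel's old_valid_dev(): assumed to be
 * true only when both major and minor fit in the old 8-bit fields. */
int old_valid_dev_sketch(unsigned int major, unsigned int minor)
{
    return major < 256 && minor < 256;
}

/* Hypothetical helper mirroring the quoted hunk: start from fsid
 * type 0 and switch to type 3 when the device number is not "old". */
int pick_fsid_type(unsigned int major, unsigned int minor)
{
    int fsid_type = 0;

    if (!old_valid_dev_sketch(major, minor) && fsid_type == 0)
        fsid_type = 3;  /* newer device numbers need the newer format */
    return fsid_type;
}
```

With the devices from the original report, pick_fsid_type(152, 336)
gives 3 for /dev/etherd/e2.1, while pick_fsid_type(152, 0) and
pick_fsid_type(9, 0) stay at 0 for /dev/etherd/e0.0 and /dev/md0.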
Then somewhere in mountd it reads that fsid and tries to interpret
it. Trouble is, nfs-utils only understands fsid types 0 and 1. I'm
a bit vague about this; it was a while ago that I looked at it, but
IIRC the nfs-utils code was here:
nfs-utils-1.0.6/utils/mountd/cache.c, around line 122:
    if (fsidtype < 0 || fsidtype > 1)
        goto out; /* unknown type */
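
As a rough sketch of what that check means for a type 3 fsid, assuming
the test really is as quoted (fsidtype_understood() is a hypothetical
wrapper for illustration, not an nfs-utils function):

```c
/* Hypothetical helper modelled on the quoted check: returns 1 when
 * the fsid type would be accepted, 0 when it would hit "goto out". */
int fsidtype_understood(int fsidtype)
{
    if (fsidtype < 0 || fsidtype > 1)
        return 0;  /* unknown type */
    return 1;
}
```

Types 0 and 1 are accepted, but the type 3 that the kernel picks for a
minor above 255 is rejected, which would fit with the mount timing out.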
Anyway, the fsid type 0 can actually handle up to 16 bits for major
and minor and 16 bits was enough for me, so I hacked my kernel to
use fsid type 0 for minors up to 64K.
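
A minimal sketch of that idea, assuming the type 0 fsid carries the
device as one 32-bit word with the major in the upper 16 bits and the
minor in the lower 16 bits. Byte order on the wire is ignored here, and
fits_type0_16bit() and pack_dev16() are hypothetical names, not the
actual kernel change:

```c
#include <stdint.h>

/* Relaxed version of the "old device" test: accept anything whose
 * major and minor each fit in 16 bits. */
int fits_type0_16bit(unsigned int major, unsigned int minor)
{
    return major <= 0xffff && minor <= 0xffff;
}

/* Pack major/minor into one 32-bit word, 16 bits each.
 * Returns 0 on success, -1 if either value does not fit. */
int pack_dev16(unsigned int major, unsigned int minor, uint32_t *out)
{
    if (!fits_type0_16bit(major, minor))
        return -1;
    *out = ((uint32_t)major << 16) | (uint32_t)minor;
    return 0;
}
```

Under these assumptions, device 152,336 packs to 0x00980150, so it
would still fit a type 0 fsid; a minor of 65536 or more would not.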
Obviously things might have moved on since I looked at those code
versions.
(I'm not subscribed, please CC me on replies)
Richard
>
> If I use a block device with a lower minor number, things work as
> expected, and if I "wrap" a high-numbered device in a trivial md set,
> using /dev/md0 with its minor number of zero, things work as expected.
>
> Without initial success I've looked at the kernel sources to see where
> the nfs server might be using only eight of the twenty bits 2.6 uses
> for minor numbers. Does anyone know where that might be occurring?
>
> The nfs server in my tests is a debian testing machine running
> 2.6.12-1-amd64-generic, and the client is a debian stable system
> running a custom 2.6.13-rc6 kernel, but I've seen this problem on
> other systems a while ago. At that time I found out that 255 was the
> magic minor number after which problems started occurring, if I recall
> correctly. If you don't have block devices with high minor numbers to
> test with, you can replicate this problem using the vblade:
>
> http://sourceforge.net/projects/aoetools/
>
> ... and the aoe driver in any 2.6 kernel from 2.6.11. Anyway, here
> are the details for interested parties. The nfs server is "makki" and
> the client is "kokone".
>
> makki:/home/ecashin# modprobe aoe
> makki:/home/ecashin# ls -l /dev/etherd/e2.1
> brw-rw---- 1 root disk 152, 336 2005-10-05 08:24 /dev/etherd/e2.1
> makki:/home/ecashin# mount /dev/etherd/e2.1 /mnt/aoe/e2.1
> makki:/home/ecashin# grep aoe /etc/exports
> /mnt/aoe/e2.1 *.coraid.com(rw,sync)
> makki:/home/ecashin#
>
> On the client, mount times out.
>
> root@kokone root# mount -t nfs makki:/mnt/aoe/e2.1 /mnt/makki
> mount: makki:/mnt/aoe/e2.1: can't read superblock
> root@kokone root# tail /var/log/everything
> ...
> Oct 5 12:27:16 kokone kernel: nfs: server makki not responding, timed out
> Oct 5 12:27:37 kokone last message repeated 2 times
> root@kokone root#
>
> I can use a trivial one-device linear software RAID on the nfs server
> so that nfs doesn't see the high minor device number. This is just
> using a low-minor-number md device as a wrapper for the
> high-minor-number aoe device.
>
> makki:/home/ecashin# /etc/init.d/nfs-kernel-server stop && /etc/init.d/nfs-common stop
> Stopping NFS kernel daemon: mountd nfsd.
> Unexporting directories for NFS kernel daemon...done.
> Stopping NFS common utilities: statd.
> makki:/home/ecashin# umount /mnt/aoe/e2.1
> makki:/home/ecashin# ls -l /dev/md0
> brw-rw---- 1 root disk 9, 0 2005-10-05 08:40 /dev/md0
> makki:/home/ecashin# mdadm -B --auto=md --force -l linear -n 1 /dev/md0 /dev/etherd/e2.1
> mdadm: array /dev/md0 built and started.
> makki:/home/ecashin# mount /dev/md0 /mnt/aoe/e2.1
> makki:/home/ecashin# ls /mnt/aoe/e2.1
> screen
> makki:/home/ecashin# /etc/init.d/nfs-common start && /etc/init.d/nfs-kernel-server start
> Starting NFS common utilities: statd.
> Exporting directories for NFS kernel daemon...done.
> Starting NFS kernel daemon: nfsd mountd.
> makki:/home/ecashin#
>
> Then on the client, all goes well:
>
> root@kokone root# mount -t nfs makki:/mnt/aoe/e2.1 /mnt/makki
> root@kokone root# ls /mnt/makki
> screen
> root@kokone root# umount /mnt/makki
>
> So I have a nice workaround, but I would rather not need it. Things
> go well *without* the md wrapper if the aoe device has a minor number
> below 256. What part of the nfs server doesn't use all twenty bits
> that 2.6 uses for the device minor number? I remember guessing that
> it was a handle or tag used in the protocol, but that was a long time
> ago.
>
> makki:/home/ecashin# /etc/init.d/nfs-kernel-server stop && /etc/init.d/nfs-common stop
> Stopping NFS kernel daemon: mountd nfsd.
> Unexporting directories for NFS kernel daemon...done.
> Stopping NFS common utilities: statd.
> makki:/home/ecashin# umount /mnt/aoe/e2.1
> makki:/home/ecashin# mdadm -S /dev/md0
> makki:/home/ecashin# sync
> makki:/home/ecashin# ls -l /dev/etherd/e0.0
> brw-rw---- 1 root disk 152, 0 2005-10-05 08:49 /dev/etherd/e0.0
> makki:/home/ecashin# mount /dev/etherd/e0.0 /mnt/aoe/e2.1
> makki:/home/ecashin# /etc/init.d/nfs-common start && /etc/init.d/nfs-kernel-server start
> Starting NFS common utilities: statd.
> Exporting directories for NFS kernel daemon...done.
> Starting NFS kernel daemon: nfsd mountd.
> makki:/home/ecashin#
>
> root@kokone root# mount -t nfs makki:/mnt/aoe/e2.1 /mnt/makki
> root@kokone root# ls /mnt/makki
> screen
> root@kokone root#
>
> --
> Ed L Cashin <ecashin@coraid.com>
>
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by:
> Power Architecture Resource Center: Free content, downloads, discussions,
> and more. http://solutions.newsforge.com/ibmarch.tmpl
> _______________________________________________
> NFS maillist - NFS@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs