Linux NFS development
From: Richard Hirst <rhirst@levanta.com>
To: nfs@lists.sourceforge.net
Subject: Re: block dev minor > 255 and exporting fs
Date: Fri, 7 Oct 2005 10:45:32 +0100
Message-ID: <20051007094532.GW6490@levanta.com>

> Hi.  I've noticed that an NFS mount times out when I export a
> filesystem residing on a block device with a "large" minor number,
> i.e. beyond the old limit of 255 from when there were only eight bits
> for the minor number of devices.

When I looked into this I decided the problem lay in userland, not
kernel land.  Once you get to minor numbers greater than 255, this
kernel code:

+++ linux-2.6.10/fs/nfsd/nfsfh.c        2005-08-05 17:35:12.128552514 +0100
@@ -351,8 +351,13 @@

        if (!old_valid_dev(ex_dev) && ref_fh_fsid_type == 0) {
                /* for newer device numbers, we must use a newer fsid format */
                ref_fh_version = 1;
                ref_fh_fsid_type = 3;
        }


switches from using a type 0 fsid to a type 3 fsid.
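
To make the switch-over concrete, here is a minimal userspace sketch of
that decision. It assumes old_valid_dev() simply tests that both major
and minor fit in 8 bits, and that a 2.6 dev_t has a 12-bit major and a
20-bit minor. All sketch_* names are hypothetical stand-ins, not kernel
identifiers, and the sketch assumes the fsid type would otherwise be 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the 2.6 dev_t layout: 12-bit major, 20-bit minor. */
#define SKETCH_MINORBITS 20u
#define SKETCH_MINORMASK ((1u << SKETCH_MINORBITS) - 1u)

static uint32_t sketch_mkdev(uint32_t major, uint32_t minor)
{
    return (major << SKETCH_MINORBITS) | minor;
}

static uint32_t sketch_major(uint32_t dev) { return dev >> SKETCH_MINORBITS; }
static uint32_t sketch_minor(uint32_t dev) { return dev & SKETCH_MINORMASK; }

/* Assumed behaviour of old_valid_dev(): both numbers fit the old 8-bit limits. */
static bool sketch_old_valid_dev(uint32_t dev)
{
    return sketch_major(dev) < 256 && sketch_minor(dev) < 256;
}

/* fsid type picked for an export that would otherwise use type 0. */
static int sketch_pick_fsid_type(uint32_t dev)
{
    return sketch_old_valid_dev(dev) ? 0 : 3;
}
```

With the devices from the original report, /dev/etherd/e2.1 (152, 336)
would get fsid type 3, while /dev/etherd/e0.0 (152, 0) and /dev/md0
(9, 0) would stay on type 0, which matches which exports worked.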

Then somewhere in mountd it reads that fsid and tries to interpret
it.  Trouble is nfs-utils only understands fsid types 0 and 1.  I'm
a bit vague about this ... it was a while ago that I looked at it, but
IIRC the nfs-utils code was here:

nfs-utils-1.0.6/utils/mountd/cache.c around line 122:

        if (fsidtype < 0 || fsidtype > 1)
                goto out; /* unknown type */


Anyway, the fsid type 0 can actually handle up to 16 bits for major
and minor and 16 bits was enough for me, so I hacked my kernel to
use fsid type 0 for minors up to 64K.
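
As a rough illustration of why 16 bits each is enough here, the sketch
below packs major and minor into one 32-bit word, major in the upper
half and minor in the lower half, and shows a relaxed validity test of
the kind described above. The function names are hypothetical, byte
order on the wire is not modelled, and this is not the actual kernel
change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical relaxed test: accept any major/minor that fits in 16 bits. */
static bool sketch_fits_fsid_type0(uint32_t major, uint32_t minor)
{
    return major < 65536u && minor < 65536u;
}

/*
 * Pack the device into one 32-bit word: major in the upper 16 bits,
 * minor in the lower 16 bits (host byte order only).
 */
static uint32_t sketch_pack_type0_dev(uint32_t major, uint32_t minor)
{
    return (major << 16) | minor;
}
```

For the failing device (152, 336) the relaxed test passes and the packed
word is 0x00980150, so nothing is lost by staying on type 0. A minor of
65536 or more would still need the newer format.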

Obviously things might have moved on since I looked at those code
versions.

(I'm not subscribed, please CC me on replies)

Richard


> 
> If I use a block device with a lower minor number, things work as
> expected, and if I "wrap" a high-numbered device in a trivial md set,
> using /dev/md0 with its minor number of zero, things work as expected.
> 
> Without initial success I've looked at the kernel sources to see where
> the nfs server might be using only eight of the twenty bits 2.6 uses
> for minor numbers.  Does anyone know where that might be occurring?
> 
> The nfs server in my tests is a debian testing machine running
> 2.6.12-1-amd64-generic, and the client is a debian stable system
> running a custom 2.6.13-rc6 kernel, but I've seen this problem on
> other systems a while ago.  At that time I found out that 255 was the
> magic minor number after which problems started occurring, if I recall
> correctly.  If you don't have block devices with high minor numbers to
> test with, you can replicate this problem using the vblade:
> 
>   http://sourceforge.net/projects/aoetools/
> 
> ... and the aoe driver in any 2.6 kernel from 2.6.11.  Anyway, here
> are the details for interested parties.  The nfs server is "makki" and
> the client is "kokone".
> 
>   makki:/home/ecashin# modprobe aoe
>   makki:/home/ecashin# ls -l /dev/etherd/e2.1
>   brw-rw----  1 root disk 152, 336 2005-10-05 08:24 /dev/etherd/e2.1
>   makki:/home/ecashin# mount /dev/etherd/e2.1 /mnt/aoe/e2.1 
>   makki:/home/ecashin# grep aoe /etc/exports
>   /mnt/aoe/e2.1 *.coraid.com(rw,sync)
>   makki:/home/ecashin# 
> 
> On the client, mount times out.
> 
>   root@kokone root# mount -t nfs makki:/mnt/aoe/e2.1 /mnt/makki 
>   mount: makki:/mnt/aoe/e2.1: can't read superblock
>   root@kokone root# tail /var/log/everything
> ...
>   Oct  5 12:27:16 kokone kernel: nfs: server makki not responding, timed out
>   Oct  5 12:27:37 kokone last message repeated 2 times
>   root@kokone root# 
> 
> I can use a trivial one-device linear software RAID on the nfs server
> so that nfs doesn't see the high minor device number.  This is just
> using a low-minor-number md device as a wrapper for the
> high-minor-number aoe device.
> 
>   makki:/home/ecashin# /etc/init.d/nfs-kernel-server stop && /etc/init.d/nfs-common stop
>   Stopping NFS kernel daemon: mountd nfsd.
>   Unexporting directories for NFS kernel daemon...done.
>   Stopping NFS common utilities: statd.
>   makki:/home/ecashin# umount /mnt/aoe/e2.1
>   makki:/home/ecashin# ls -l /dev/md0
>   brw-rw----  1 root disk 9, 0 2005-10-05 08:40 /dev/md0
>   makki:/home/ecashin# mdadm -B --auto=md --force -l linear -n 1 /dev/md0 /dev/etherd/e2.1
>   mdadm: array /dev/md0 built and started.
>   makki:/home/ecashin# mount /dev/md0 /mnt/aoe/e2.1
>   makki:/home/ecashin# ls /mnt/aoe/e2.1
>   screen
>   makki:/home/ecashin# /etc/init.d/nfs-common start && /etc/init.d/nfs-kernel-server start
>   Starting NFS common utilities: statd.
>   Exporting directories for NFS kernel daemon...done.
>   Starting NFS kernel daemon: nfsd mountd.
>   makki:/home/ecashin# 
> 
> Then on the client, all goes well:
> 
>   root@kokone root# mount -t nfs makki:/mnt/aoe/e2.1 /mnt/makki
>   root@kokone root# ls /mnt/makki
>   screen
>   root@kokone root# umount /mnt/makki
> 
> So I have a nice workaround, but I would rather not need it.  Things
> go well *without* the md wrapper if the aoe device has a minor number
> below 256.  What part of the nfs server doesn't use all twenty bits
> that 2.6 uses for the device minor number?  I remember guessing that
> it was a handle or tag used in the protocol, but that was a long time
> ago.
> 
>   makki:/home/ecashin# /etc/init.d/nfs-kernel-server stop && /etc/init.d/nfs-common stop
>   Stopping NFS kernel daemon: mountd nfsd.
>   Unexporting directories for NFS kernel daemon...done.
>   Stopping NFS common utilities: statd.
>   makki:/home/ecashin# umount /mnt/aoe/e2.1 
>   makki:/home/ecashin# mdadm -S /dev/md0
>   makki:/home/ecashin# sync
>   makki:/home/ecashin# ls -l /dev/etherd/e0.0
>   brw-rw----  1 root disk 152, 0 2005-10-05 08:49 /dev/etherd/e0.0
>   makki:/home/ecashin# mount /dev/etherd/e0.0 /mnt/aoe/e2.1
>   makki:/home/ecashin# /etc/init.d/nfs-common start && /etc/init.d/nfs-kernel-server start
>   Starting NFS common utilities: statd.
>   Exporting directories for NFS kernel daemon...done.
>   Starting NFS kernel daemon: nfsd mountd.
>   makki:/home/ecashin# 
> 
>   root@kokone root# mount -t nfs makki:/mnt/aoe/e2.1 /mnt/makki
>   root@kokone root# ls /mnt/makki
>   screen
>   root@kokone root# 
> 
> -- 
>   Ed L Cashin <ecashin@coraid.com>
> 
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by:
> Power Architecture Resource Center: Free content, downloads, discussions,
> and more. http://solutions.newsforge.com/ibmarch.tmpl
> _______________________________________________
> NFS maillist  -  NFS@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs





Thread overview: 5+ messages
2005-10-07  9:45 Richard Hirst [this message]
2005-10-14  7:41 ` block dev minor > 255 and exporting fs Neil Brown
  -- strict thread matches above, loose matches on Subject: below --
2005-10-05 17:32 Ed L Cashin
2005-10-06  6:33 ` Neil Brown
2005-10-06 16:38   ` Ed L Cashin
