tmpfs, NFS, file handles

public inbox for linux-kernel@vger.kernel.org

* tmpfs, NFS, file handles
@ 2002-02-20 16:46 Peter J. Braam
  2002-02-20 16:56 ` Jeff Garzik
  0 siblings, 1 reply; 11+ messages in thread
From: Peter J. Braam @ 2002-02-20 16:46 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: phil

Hi, 

At present one can probably not run NFS (or InterMezzo) on top of
tmpfs.

Is there a suggested solution for fh_to_dentry and dentry_to_fh for
tmpfs?  

An "iget" based solution might work but at present tmpfs inodes are
not hashed.

Thanks for any suggestions!


* Re: tmpfs, NFS, file handles
  2002-02-20 16:46 tmpfs, NFS, file handles Peter J. Braam
@ 2002-02-20 16:56 ` Jeff Garzik
  2002-02-20 19:21   ` Peter J. Braam
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Garzik @ 2002-02-20 16:56 UTC (permalink / raw)
  To: Peter J. Braam; +Cc: linux-kernel, linux-fsdevel, phil

"Peter J. Braam" wrote:
> 
> Hi,
> 
> At present one can probably not run NFS (or InterMezzo) on top of
> tmpfs.
> 
> Is there a suggested solution for fh_to_dentry and dentry_to_fh for
> tmpfs?
> 
> An "iget" based solution might work but at present tmpfs inodes are
> not hashed.

I talked to neil brown about NFS and ramfs... he mentioned using
iunique()

-- 
Jeff Garzik      | "Why is it that attractive girls like you
Building 1024    |  always seem to have a boyfriend?"
MandrakeSoft     | "Because I'm a nympho that owns a brewery?"
                 |             - BBC TV show "Coupling"


* Re: tmpfs, NFS, file handles
  2002-02-20 16:56 ` Jeff Garzik
@ 2002-02-20 19:21   ` Peter J. Braam
  2002-02-20 19:42     ` Trond Myklebust
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Peter J. Braam @ 2002-02-20 19:21 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, linux-fsdevel, phil

Hi, 

> "Peter J. Braam" wrote:
...
> > Is there a suggested solution for fh_to_dentry and dentry_to_fh for
> > tmpfs?
> > 
> > An "iget" based solution might work but at present tmpfs inodes are
> > not hashed.
On Wed, Feb 20, 2002 at 11:56:40AM -0500, Jeff Garzik wrote:
...
> I talked to neil brown about NFS and ramfs... he mentioned using
> iunique()


So do I understand that hashing tmpfs inodes is perhaps the way to go?

Would the following also work? 

 - have a 32 bit counter: set inode->i_ino to count++
 - up the generation number each time the counter wraps. 

Between boot cycles NFS could still get confused; that might be helped
by setting the initial generation to the system time. 

Thoughts anyone? 

- Peter -


* Re: tmpfs, NFS, file handles
  2002-02-20 19:21   ` Peter J. Braam
@ 2002-02-20 19:42     ` Trond Myklebust
  2002-02-20 22:53     ` Neil Brown
  2002-02-21  7:40     ` Christoph Rohland
  2 siblings, 0 replies; 11+ messages in thread
From: Trond Myklebust @ 2002-02-20 19:42 UTC (permalink / raw)
  To: Peter J. Braam; +Cc: linux-kernel, linux-fsdevel

>>>>> " " == Peter J Braam <braam@clusterfs.com> writes:

     > Would the following also work?

     > - have a 32 bit counter: set inode->i_ino to count++

That is exactly what iunique() does except that it also checks for
uniqueness and allows you to specify a minimum value. Sooner or later
your 32-bit counter will wrap round...

     > - up the generation number each time the counter wraps.

     > Between boot cycles NFS could still get confused, that might be
     > helped by setting the initial generation to the system time.

Yep. That is what the 'fat' filesystem does.

Cheers,
  Trond
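
The allocation behaviour described above (a wrapping 32-bit counter
with a uniqueness check and a minimum value) can be sketched in
user-space C. This is only an illustration under stated assumptions,
not the kernel's iunique(): struct ino_space, ino_alloc() and MAX_LIVE
are hypothetical names, and a linear scan stands in for the inode hash
lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LIVE 64            /* hypothetical cap on live numbers in this sketch */

struct ino_space {
    uint32_t counter;          /* last number handed out; wraps at 2^32 */
    uint32_t min_ino;          /* numbers below this are reserved */
    uint32_t live[MAX_LIVE];   /* numbers currently in use */
    size_t   nlive;
};

/* Linear scan standing in for a hash lookup of in-use inode numbers. */
static bool ino_in_use(const struct ino_space *s, uint32_t ino)
{
    for (size_t i = 0; i < s->nlive; i++)
        if (s->live[i] == ino)
            return true;
    return false;
}

/* Hand out the next number >= min_ino that is not still in use. */
static uint32_t ino_alloc(struct ino_space *s)
{
    /* The sketch assumes the usable range is much larger than MAX_LIVE. */
    assert(s->nlive < MAX_LIVE);
    for (;;) {
        s->counter++;                      /* unsigned overflow wraps to 0 */
        if (s->counter < s->min_ino)
            s->counter = s->min_ino;       /* skip the reserved range */
        if (!ino_in_use(s, s->counter)) {
            s->live[s->nlive++] = s->counter;
            return s->counter;
        }
    }
}
```

A bare count++ would hand out a number that is still live once the
counter wraps; the in-use check is what makes the wrap harmless.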


* Re: tmpfs, NFS, file handles
  2002-02-20 19:21   ` Peter J. Braam
  2002-02-20 19:42     ` Trond Myklebust
@ 2002-02-20 22:53     ` Neil Brown
  2002-02-21  4:43       ` David Chow
  2002-02-21  7:40     ` Christoph Rohland
  2 siblings, 1 reply; 11+ messages in thread
From: Neil Brown @ 2002-02-20 22:53 UTC (permalink / raw)
  To: Peter J. Braam; +Cc: Jeff Garzik, linux-kernel, linux-fsdevel, phil

On Wednesday February 20, braam@clusterfs.com wrote:
> Hi, 
> 
> > "Peter J. Braam" wrote:
> ...
> > > Is there a suggested solution for fh_to_dentry and dentry_to_fh for
> > > tmpfs?
> > > 
> > > An "iget" based solution might work but at present tmpfs inodes are
> > > not hashed.
> On Wed, Feb 20, 2002 at 11:56:40AM -0500, Jeff Garzik wrote:
> ...
> > I talked to neil brown about NFS and ramfs... he mentioned using
> > iunique()
... but Trond had a better idea....
> 
> 
> So do I understand that hashing tmpfs inodes is perhaps the way to go?
> 
> Would the following also work? 
> 
>  - have a 32 bit counter: set inode->i_ino to count++
>  - up the generation number each time the counter wraps. 

You don't just need a number in inode->i_ino.  You also need to be
able to find an inode given that number.
So you need to store all the inodes in a hash table.
But you don't want to penalise non-NFS users.

I would probably:
   leave i_ino as set by new_inode
   initialise inode->i_generation to CURRENT_TIME

   in dentry_to_fh,
     check if list_empty(&inode->i_hash)
       if it is, then add the inode to some hash table indexed by the
           address of the inode
       put the address of the inode, i_ino and i_generation in the filehandle

   in fh_to_dentry,
     lookup the given address in the hash table.
     if it is found, check the i_ino and i_generation


That means you are only hashing inodes exported by NFS, and you have
a pretty good guarantee of uniqueness (providing time doesn't go
backwards).

NeilBrown
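
The scheme outlined above can be sketched in user-space C as follows.
Everything here is a stand-in chosen for illustration: struct
inode_sketch, struct fh_sketch, encode_fh_sketch() and
decode_fh_sketch() are hypothetical names, the "hashed" flag plays the
role of the list_empty(&inode->i_hash) test, and the real methods
would of course deal in dentries rather than bare inodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64

struct inode_sketch {
    uint32_t i_ino;
    uint32_t i_generation;
    struct inode_sketch *hash_next;   /* chain link inside one bucket */
    int hashed;                       /* stands in for !list_empty(&inode->i_hash) */
};

struct fh_sketch {                    /* fixed-length file handle payload */
    uintptr_t addr;
    uint32_t  ino;
    uint32_t  generation;
};

static struct inode_sketch *buckets[NBUCKETS];

static size_t bucket_of(uintptr_t addr)
{
    return (size_t)((addr >> 4) % NBUCKETS);
}

/* Hash the inode on first export only, then fill in the handle. */
static void encode_fh_sketch(struct inode_sketch *inode, struct fh_sketch *fh)
{
    assert(inode != NULL && fh != NULL);
    if (!inode->hashed) {
        size_t b = bucket_of((uintptr_t)inode);
        inode->hash_next = buckets[b];
        buckets[b] = inode;
        inode->hashed = 1;
    }
    fh->addr = (uintptr_t)inode;
    fh->ino = inode->i_ino;
    fh->generation = inode->i_generation;
}

/* Never dereference fh->addr directly: only compare it against table entries. */
static struct inode_sketch *decode_fh_sketch(const struct fh_sketch *fh)
{
    struct inode_sketch *p;

    for (p = buckets[bucket_of(fh->addr)]; p != NULL; p = p->hash_next) {
        if ((uintptr_t)p != fh->addr)
            continue;
        if (p->i_ino == fh->ino && p->i_generation == fh->generation)
            return p;
        return NULL;                  /* address reused by a different inode */
    }
    return NULL;                      /* never exported, or stale handle */
}
```

Inodes that are never exported are never hashed, so non-NFS users pay
nothing, and a handle whose address has since been reused fails the
i_ino / i_generation check instead of resolving to the wrong file.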


* Re: tmpfs, NFS, file handles
  2002-02-20 22:53     ` Neil Brown
@ 2002-02-21  4:43       ` David Chow
  2002-02-21  5:04         ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: David Chow @ 2002-02-21  4:43 UTC (permalink / raw)
  To: Neil Brown; +Cc: Peter J. Braam, Jeff Garzik, linux-kernel, linux-fsdevel, phil

On Thu, 2002-02-21 at 06:53, Neil Brown wrote:
> On Wednesday February 20, braam@clusterfs.com wrote:
> > Hi, 
> > 
> > > "Peter J. Braam" wrote:
> > ...
> > > > Is there a suggested solution for fh_to_dentry and dentry_to_fh for
> > > > tmpfs?
> > > > 
> > > > An "iget" based solution might work but at present tmpfs inodes are
> > > > not hashed.
> > On Wed, Feb 20, 2002 at 11:56:40AM -0500, Jeff Garzik wrote:
> > ...
> > > I talked to neil brown about NFS and ramfs... he mentioned using
> > > iunique()
> ... but Trond had a better idea....
> > 
> > 
> > So do I understand that hashing tmpfs inodes is perhaps the way to go?
> > 
> > Would the following also work? 
> > 
> >  - have a 32 bit counter: set inode->i_ino to count++
> >  - up the generation number each time the counter wraps. 
> 
> You don't just need a number in inode->i_ino.  You also need to be
> able to find an inode given that number.
> So you need to store all the inodes in a hash table.
> But you don't want to penalise non-NFS users.
> 
> I would probably:
>    leave i_ino as set by new_inode
>    initialise inode->i_generation to CURRENT_TIME
> 
>    in dentry_to_fh,
>      check if list_empty(&inode->i_hash)
>        if it is, then add the inode to some hash table indexed by the
>            address of the inode
>        put the address of the inode, i_ino and i_generation in the filehandle
> 
>    in fh_to_dentry,
>      lookup the given address in the hash table.
>      if it is found, check the i_ino and i_generation
> 
> 
> That means you are only hashing inodes exported by NFS, and you have
> a pretty good guarantee of uniqueness (providing time doesn't go
> backwards).
> 
> NeilBrown

What I suggest is that nfsd should export a symbol called
generic_fh_to_dentry() so that, like generic_file_read(), it can
handle the generic case for every fs.

Thanks,

David



* Re: tmpfs, NFS, file handles
  2002-02-21  4:43       ` David Chow
@ 2002-02-21  5:04         ` Neil Brown
       [not found]           ` <3C790FB2.50503@rcn.com.hk>
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Brown @ 2002-02-21  5:04 UTC (permalink / raw)
  To: David Chow; +Cc: Peter J. Braam, Jeff Garzik, linux-kernel, linux-fsdevel, phil

On  February 21, davidchow@shaolinmicro.com wrote:
> 
> What I suggest is nfsd should export a symbol called
> generic_fh_to_dentry() such that it will be more generic like
> generic_file_read() to handle generic calls for every fs.

But every filesystem is really very different in this regard.

What would you think this "generic_fh_to_dentry" should do?

We actually already have one.  You set ->fh_to_dentry to NULL, and then
it uses "iget". 

NeilBrown
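
The "leave the method NULL and get the generic path" convention can be
sketched in user-space C. This only illustrates the dispatch pattern;
struct export_ops_sketch, decode_handle(), generic_lookup_by_ino() and
refuse_all() are hypothetical names, and the table lookup merely
stands in for an iget-style lookup by inode number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct obj {
    uint32_t ino;
};

struct export_ops_sketch {
    /* Optional filesystem-specific decoder; NULL selects the fallback. */
    struct obj *(*fh_to_obj)(uint32_t ino);
};

static struct obj table[] = { {1}, {2}, {3} };

/* Generic fallback: find the object by inode number alone. */
static struct obj *generic_lookup_by_ino(uint32_t ino)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].ino == ino)
            return &table[i];
    return NULL;
}

/* Example of a filesystem that cannot find anything by number alone. */
static struct obj *refuse_all(uint32_t ino)
{
    (void)ino;
    return NULL;
}

/* Use the filesystem's own method when it supplies one. */
static struct obj *decode_handle(const struct export_ops_sketch *ops,
                                 uint32_t ino)
{
    assert(ops != NULL);
    if (ops->fh_to_obj != NULL)
        return ops->fh_to_obj(ino);
    return generic_lookup_by_ino(ino);
}
```

The fallback only helps a filesystem whose objects really can be found
from a number alone, which is the sense in which every filesystem is
different here.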


* Re: tmpfs, NFS, file handles
  2002-02-20 19:21   ` Peter J. Braam
  2002-02-20 19:42     ` Trond Myklebust
  2002-02-20 22:53     ` Neil Brown
@ 2002-02-21  7:40     ` Christoph Rohland
  2 siblings, 0 replies; 11+ messages in thread
From: Christoph Rohland @ 2002-02-21  7:40 UTC (permalink / raw)
  To: Peter J. Braam; +Cc: Jeff Garzik, linux-kernel, linux-fsdevel, phil

Hi Peter,

On Wed, 20 Feb 2002, Peter J. Braam wrote:
> Between boot cycles NFS could still get confused, that might be
> helped by setting the initial generation to the system time.

Between boot cycles you lose _all_ tmpfs files. That's what the
'tmp' in tmpfs talks about ;-)

Greetings
		Christoph




* RE: tmpfs, NFS, file handles
@ 2002-02-21 16:16 Lever, Charles
  2002-02-21 22:58 ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Lever, Charles @ 2002-02-21 16:16 UTC (permalink / raw)
  To: 'Neil Brown'
  Cc: Jeff Garzik, linux-kernel, linux-fsdevel, phil, Peter J. Braam

> That means you are only hashing inodes exported by NFS, and you have
> a pretty good guarantee of uniqueness (providing time doesn't go
> backwards).

this may be obvious... apologies.

don't use the TOD directly -- it can go backwards if ntpd or an admin
sets it back.  better to use a monotonically increasing number that
you completely control yourself.

also, if your timer resolution isn't good enough, a window opens 
where two generated "uniquifiers" can be the same for all intents
and purposes.

if there's nothing else we've learned from NFS, it's that using
timestamps is a lousy way of managing cache coherency and file
identity.  ;-)
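
a minimal user-space sketch of the "number you completely control"
idea, in C. the names generation_seed() and generation_next() are
hypothetical, and seeding once from the boot time is an assumption
borrowed from the earlier proposal, not something this message
prescribes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t next_generation;
static int seeded;

/* Seed exactly once (for example at mount time); later clock changes
 * never feed back into the counter. */
static void generation_seed(uint32_t boot_time)
{
    assert(!seeded);
    atomic_store(&next_generation, boot_time);
    seeded = 1;
}

/* Each call returns a distinct value until the 32-bit counter wraps,
 * regardless of timer resolution or of the clock being set back. */
static uint32_t generation_next(void)
{
    return atomic_fetch_add(&next_generation, 1);
}
```

two inodes created within the same clock tick still get different
generations, which a raw timestamp cannot guarantee.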


* RE: tmpfs, NFS, file handles
  2002-02-21 16:16 Lever, Charles
@ 2002-02-21 22:58 ` Neil Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Neil Brown @ 2002-02-21 22:58 UTC (permalink / raw)
  To: Lever, Charles
  Cc: Jeff Garzik, linux-kernel, linux-fsdevel, phil, Peter J. Braam

On Thursday February 21, Charles.Lever@netapp.com wrote:
> > That means you are only hashing inodes exported by NFS, and you have
> > a pretty good guarantee of uniqueness (providing time doesn't go
> > backwards).
> 
> this may be obvious... apologies.
> 
> don't use the TOD directly -- it can go backwards if ntpd or an admin
> sets it back.  better to use a monotonically increasing number that
> you completely control yourself.
> 
> also, if your timer resolution isn't good enough, a window opens 
> where two generated "uniquifiers" can be the same for all intents
> and purposes.

Certainly timeofday by itself isn't enough for the various reasons you
mention.  But it does help to avoid accepting filehandles from before
the last reboot.
In my proposal there were three numbers:
   An address
   A sequentially assigned inode number
   A time of day.

Any two of these is probably adequate most of the time, but could
occasionally result in equal filehandles for different files.  Adding
a third makes collision virtually impossible.

NeilBrown


* Re: tmpfs, NFS, file handles
       [not found]           ` <3C790FB2.50503@rcn.com.hk>
@ 2002-02-24 21:49             ` Neil Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Neil Brown @ 2002-02-24 21:49 UTC (permalink / raw)
  To: David Chow
  Cc: David Chow, Peter J. Braam, Jeff Garzik, linux-kernel,
	linux-fsdevel, phil

On Monday February 25, davidchow@rcn.com.hk wrote:
> 
> 
> Neil Brown wrote:
> 
> >On  February 21, davidchow@shaolinmicro.com wrote:
> >
> >>What I suggest is nfsd should export a symbol called
> >>generic_fh_to_dentry() such that it will be more generic like
> >>generic_file_read() to handle generic calls for every fs.
> >>
> >
> >But every filesystem is really very different in this regard.
> >
> >What would you think this "generic_fh_to_dentry" should do?
> >
> >We actually already have one.  You set ->fh_to_dentry to NULL, and then
> >it uses "iget". 
> >
> >NeilBrown
> >
> You know, we have a serious problem implementing non-block-device 
> filesystems with NFS.  I actually spent quite a long while understanding 
> the nfsd code in 2.4.  Here are the problems that I ran into...

Well, you are not alone.  NFS appears to have been designed with UFS -
the original Unix File System, or close relatives - specifically in
mind.
The further a file system diverges from this, the harder it is to
provide NFS support.

> 
> nfsd calls iget to read inode info directly, even though there is no 
> valid dentry linked to that inode in the dcache (that is, 
> list_empty(inode->i_dentry) holds).  But in our non-block-device 
> filesystem the inode is generated dynamically by lookup(), which means 
> the inode is only valid if we have dentry information and go through 
> the normal lookup(neg-dentry)=>read_inode(ino) procedure.  When we 
> implement a non-block-device fs and want to serve it with nfsd, we 
> suffer from this problem.  Usually a non-block-device fs is tied to 
> some kind of name space, and in kernel terms the name space is the 
> dentry.  I bet most non-block-device filesystems have a similar 
> problem.

The "iget" approach is really a hack that sort-of works for ext2 and
pretty much doesn't work for any other filesystem.
Some time in 2.5, this "iget" approach will go away.  Any filesystem
that wants to be NFS-exportable will have to define explicit methods
for exporting, quite similar to (but subtly different from) the
current fh_to_dentry and dentry_to_fh methods.
In 2.4, you should not even consider supporting iget usage by kNFSd;
you should supply fh_to_dentry and dentry_to_fh.

> 
> I suggest that the file handle mechanism handle this kind of situation 
> so that non-block-device filesystems can work with NFS.  The current NFS 
> has to maintain a stateless design, so there is no easy way to keep 
> the VFS from touching the dcache during request verification.
> 
> I suggest that fh_verify should go through a proper lookup process for 
> inodes that have an empty inode->i_dentry list.  This will make life for 
> non-block-device filesystems much easier.

What do you mean by "a proper lookup process"?  Do you mean having a
full path name from the filesystem root and following that?
That might make it easy for the filesystem, but it would make it
impossible for the NFS server.
Not only does the NFS server not know the name of the file, it cannot
know as the file might have been renamed by some other process since the
NFS server last knew the name.

NFSv4 has a concept of a "volatile" file handle that is supposed to
help with this:  The server can tell the client "that file handle
doesn't work any more", and then the client might re-issue the
filename lookup requests.  But there are still possible issues with
files being renamed between accesses.

> 
> I think all non-block-device filesystems and non-Unix-based filesystems 
> (those that don't identify files by inode number) will have the same 
> problem, because they simply use iunique, and the ino has no meaning to 
> them unless the lookup()=>read_inode() sequence is properly executed.

Yes.  It is a difficult problem.
But to work with NFS (v2 or v3) the filesystem *MUST* be able to
provide a stable, fixed length identifier for a file that is not
changed by rename, truncate, or server restart (or anything else, but
those are often the difficult ones).
The extent to which your filesystem cannot provide such an identifier
is the extent to which it cannot support NFS.
FAT based filesystems are a good example.  We provide 90% support.  If
you try to access a FAT filesystem from Linux (with no_subtree_check),
it will mostly work.
But if you open a file on the client, truncate it, rename it, reboot
the server, and then write to it, it will fail.  The same is not true
of ext2.

I hope this helps.

NeilBrown
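
The requirement for a stable, fixed-length identifier can be
illustrated with a small user-space C sketch. All names here (struct
file_sketch, struct handle_sketch, lookup_by_name(),
lookup_by_handle(), rename_file()) are hypothetical, and the two-entry
table is an assumed stand-in for a real filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct file_sketch {
    char     name[32];
    uint32_t ino;
    uint32_t generation;
};

/* Fixed-length identifier that does not depend on the file's name. */
struct handle_sketch {
    uint32_t ino;
    uint32_t generation;
};

static struct file_sketch files[] = {
    { "a.txt", 10, 1 },
    { "b.txt", 11, 1 },
};
#define NFILES (sizeof files / sizeof files[0])

static struct file_sketch *lookup_by_name(const char *name)
{
    for (size_t i = 0; i < NFILES; i++)
        if (strcmp(files[i].name, name) == 0)
            return &files[i];
    return NULL;
}

static struct file_sketch *lookup_by_handle(struct handle_sketch h)
{
    for (size_t i = 0; i < NFILES; i++)
        if (files[i].ino == h.ino && files[i].generation == h.generation)
            return &files[i];
    return NULL;
}

static void rename_file(struct file_sketch *f, const char *new_name)
{
    assert(strlen(new_name) < sizeof f->name);
    strcpy(f->name, new_name);
}
```

After rename_file(), a lookup by the old name fails while a lookup by
the handle taken earlier still finds the file; bumping the generation
when the number is reused makes the old handle stale rather than
letting it resolve to a different file.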
