From: Nikolaus Rath <Nikolaus@rath.org>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: Does NFS4 need st_gen?
Date: Fri, 21 Oct 2011 13:44:13 -0400
Message-ID: <4EA1AF6D.60603@rath.org>
In-Reply-To: <1319217023.4537.28.camel@lade.trondhjem.org>

On 10/21/2011 01:10 PM, Trond Myklebust wrote:
> On Fri, 2011-10-21 at 12:09 -0400, Nikolaus Rath wrote:
>> On 10/21/2011 12:00 PM, Trond Myklebust wrote:
>>> On Fri, 2011-10-21 at 09:54 -0400, Nikolaus Rath wrote:
>>>> Trond Myklebust <Trond.Myklebust@netapp.com> writes:
>>>>> On Thu, 2011-10-20 at 16:37 -0400, Nikolaus Rath wrote:
>>>>>> "J. Bruce Fields" <bfields@fieldses.org> writes:
>>>>>>> On Thu, Oct 20, 2011 at 01:21:31PM -0400, Nikolaus Rath wrote:
>>>>>>>> I'm working on a FUSE file system that stores file system metadata in an
>>>>>>>> SQL database (http://code.google.com/p/s3ql/). Not having to keep track
>>>>>>>> of inode generation numbers would keep the code much simpler, because I
>>>>>>>> want to delete inode-rows from the SQL table when the last reference to
>>>>>>>> the inode is deleted (so I can't keep track of the generation no).
>>>>>>>
>>>>>>> You can use current time, or a counter, or something, as the generation
>>>>>>> number.
>>>>>>
>>>>>> With current time I'm screwed if the system clock doesn't have
>>>>>> sufficiently fine granularity. With a counter, I either have to remember
>>>>>> counter values per-inode even after the inode is deleted, or the global
>>>>>> counter will overflow at some point (in which case I may just as well
>>>>>> require unique inodes in the first place).
>>>>>
>>>>> The filehandle is between 32 (NFSv2) and 128(NFSv4) bytes long. How long
>>>>> do you expect it to take you to create+destroy between 2^256 and 2^1024
>>>>> inodes? I'm guessing that we'll all be long dead and the universe will
>>>>> have undergone heat death before that happens...
>>>>
>>>> Please stop assuming that I'm stupid or haven't thought about the
>>>> problem at all. The bottleneck is not the length of the NFS file handle,
>>>> but the length of the inode and generation number (both of which are
>>>> restricted to 32bit by FUSE) together with the requirement that not only
>>>> both of them together need to be unique forever, but the inode also
>>>> needs to be unique at any given instant (so they cannot be trivially
>>>> combined to form a 64bit value).
>>>
>>> No. The point is you don't need a generation number if you don't want to
>>> implement one...
>>>
>>> You can use any unique identifier + the inode number, and the unique
>>> identifier is only limited by the size of the filehandle.
>>
>> So how do you choose the unique identifier? It's limited by FUSE to
>> 32bit and therefore can't be a global counter, it can't be a timestamp
>
> AFAICS fuse gives you a 64-bit inode number and a 32-bit generation
> counter.

Yes, with 64bit inodes everything would be fine. But fuse uses 'long'
for inodes, so on 32bit systems you only have 32bit inodes even if ino_t
is 64bit.
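
To make the width problem concrete, here is a minimal sketch in Python. It
only models what happens when a 64bit inode number is stored in an unsigned
'long' of a given width; the helper name as_fuse_ino and the sample values are
made up for illustration and are not part of the FUSE API.

```python
# Hypothetical helper: models storing an inode number in a C 'unsigned long'
# that is long_bits wide (32 on a 32bit system, 64 on most 64bit systems).
def as_fuse_ino(ino, long_bits):
    """Return the part of ino that fits into long_bits bits."""
    return ino & ((1 << long_bits) - 1)


a = 5
b = 5 + (1 << 32)  # differs from a only above bit 31

# With a 64bit long the two numbers stay distinct.
print(as_fuse_ino(a, 64) == as_fuse_ino(b, 64))
# With a 32bit long the upper bits are lost and the two numbers collide.
print(as_fuse_ino(a, 32) == as_fuse_ino(b, 32))
```

So a file system that relies on the upper 32 bits for uniqueness would hand
out colliding inode numbers on a 32bit system.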

> IOW: start allocating inode numbers incrementally from 0 - 2^64, then
> each time you overflow the 64-bit inode number counter, bump the
> generation number. You'll have to skip those inode numbers that are
> already allocated in the subsequent generations, but the total number of
> unique combinations is still likely to be more than large enough not to
> be a worry.

Yes, as I said earlier, it can be done with the available 32 + 32
bits, but it does introduce additional complexity.
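
For reference, the scheme quoted above could be sketched roughly like this in
Python. This is only an illustration under a few assumptions: the class name
InodeAllocator, the in-memory set standing in for the inode table, and the
choice to reserve inode 0 are all made up here, and the bit widths are
parameters so that the wrap-around can be tried with tiny values.

```python
class InodeAllocator:
    """Hands out (inode, generation) pairs that are never reused."""

    def __init__(self, ino_bits=32, gen_bits=32):
        self.ino_limit = 1 << ino_bits
        self.gen_limit = 1 << gen_bits
        self.next_ino = 1     # assumption: inode 0 is reserved
        self.generation = 0
        self.live = set()     # stands in for the inode rows in the database

    def allocate(self):
        if len(self.live) >= self.ino_limit - 1:
            raise RuntimeError("all inode numbers are in use")
        while True:
            if self.next_ino >= self.ino_limit:
                # The inode counter overflowed: start over and bump the
                # generation so that reused numbers get a new pair.
                self.next_ino = 1
                self.generation += 1
                if self.generation >= self.gen_limit:
                    raise RuntimeError("generation counter exhausted")
            ino = self.next_ino
            self.next_ino += 1
            if ino not in self.live:  # skip numbers held by live inodes
                self.live.add(ino)
                return ino, self.generation

    def free(self, ino):
        # Nothing about the deleted inode has to be remembered.
        self.live.discard(ino)


# Tiny widths so the overflow is visible: only inodes 1..3 exist.
alloc = InodeAllocator(ino_bits=2, gen_bits=2)
first = [alloc.allocate() for _ in range(3)]
alloc.free(2)
reused = alloc.allocate()
print(first, reused)
```

In this sketch only the two global counters have to survive the deletion of an
inode; the generation of a live inode would be stored in its own row. The
additional complexity is the skip loop and the persistent counter state.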

>> because the system clock may not have enough resolution, and it can't be
>> a per-inode counter because then I can't discard the counter after the
>> inode has been deleted.
>
> If you need more unique values, then modify fuse to allow your
> filesystem to manage the exportfs interface. The fuse ABI is versioned,
> and can be extended to support new features.

FUSE 3 will have 64bit inodes, and I don't think this feature would make
it into 2.x.

Best,
-Nikolaus
--
»Time flies like an arrow, fruit flies like a Banana.«
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C