From: Li Zefan <lizf@cn.fujitsu.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [RFC][PATCH] Btrfs: New inode number allocator
Date: Thu, 27 Jan 2011 15:10:56 +0800
Message-ID: <4D411A80.6050109@cn.fujitsu.com>
In-Reply-To: <1296070624-sup-5026@think>

Chris Mason wrote:
> Excerpts from Li Zefan's message of 2011-01-25 20:53:00 -0500:
>> (WARNING: this patch is incomplete and not well tested)
>>
>> We used to allocate inode numbers by searching through inode items, but
>> that made allocation slower and slower as more and more files were created.
>>
>> The current code just records the highest objectid in the btree without
>> reusing old inode numbers, so the filesystem will eventually run out of
>> inode numbers as we create and delete files.
>>
>> In this patch, free inode numbers are stored in the fs tree with key:
>>
>> [start, BTRFS_INO_EXTENT_KEY, end]
>
> Thanks a lot for working on this, it isn't an easy problem.
>
> I think Josef's free space cache for the extent allocation tree is the
> model you want to use. They are actually solving exactly the same
> problem:
>
> In the extent allocation tree, a free extent is one with no keys in the
> tree.
>
> In the FS tree, a free inode is one with no keys in the tree.
>
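
To make the "no keys in the tree" definition concrete, here is a minimal
user-space sketch in Python (not kernel code). It assumes the inode numbers
that have keys in the FS tree are available as a sorted list; free_ranges and
its parameters are hypothetical names used only for illustration.

```python
def free_ranges(used, first, last):
    """Yield inclusive (start, end) ranges in [first, last] absent from `used`.

    `used` must be sorted ascending; it stands in for the inode objectids
    that currently have keys in the FS tree.
    """
    next_free = first
    for ino in used:
        if ino < next_free:
            continue  # below the window, or a duplicate
        if ino > last:
            break
        if ino > next_free:
            # Gap between the previous used inode and this one is free.
            yield (next_free, ino - 1)
        next_free = ino + 1
    if next_free <= last:
        yield (next_free, last)
```

For example, with inodes 256, 257 and 260 in use and a window of 256..265,
the free ranges would be 258..259 and 261..265.
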
> He has a cache that gets written on a per block group basis for the free
> extents in that block group. It's a somewhat easier problem to solve in
> the inode number cache because you don't have the same problem where you
> need free blocks to store the free block cache ;)
>
> In his code, the cache stores the generation number of the commit that
> was used to create the cache. If a cache-unaware kernel mounts the
> filesystem and makes changes, we notice on the next mount because the
> cache generation number doesn't match the filesystem generation number.
>
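
A rough sketch of that generation check, again as a user-space Python model
rather than kernel code. InoCache, load_cache and the rebuild callback are
hypothetical names; the only idea taken from the description above is
"trust the cache only if its generation matches the filesystem's".

```python
from dataclasses import dataclass, field


@dataclass
class InoCache:
    generation: int                            # commit generation that wrote the cache
    free: list = field(default_factory=list)   # cached free inode ranges


def load_cache(cache, fs_generation, rebuild):
    """Return a usable cache, rebuilding it when the stored one is stale.

    `rebuild` is a callable that rescans the tree and returns the free
    ranges; it is only called when the generations do not match.
    """
    if cache is not None and cache.generation == fs_generation:
        return cache
    # Missing cache, or the filesystem was modified without updating it
    # (for example by a cache-unaware kernel): discard and rebuild.
    return InoCache(generation=fs_generation, free=rebuild())
```
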
> It will probably be easiest to dedicate a specific objectid to the inode
> number cache in each FS tree (say objectid == -12ULL), and then put the
> caching items directly in the tree under that objectid.
>
> I'd suggest that you also reuse his code to compactly store a range of
> free extents. It wouldn't be hard to have a simple compression scheme
> that stored ranges for huge chunks of free inode numbers and did a
> bitmask for ranges where there are lots of free individual inodes.
>
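
One possible shape for such a scheme, sketched in user-space Python. This is
not the free space cache code; the entry layout, the BITS_PER_BITMAP chunk
size and the MIN_RUN threshold are all assumptions picked for the example.
Long runs of free inode numbers become (start, length) ranges, and scattered
free inodes are folded into per-chunk bitmasks.

```python
BITS_PER_BITMAP = 64   # assumed chunk size, chosen only for the example
MIN_RUN = 8            # assumed threshold: runs at least this long become ranges


def encode_free(free_inos):
    """Encode sorted free inode numbers as range and bitmap entries.

    Returns ("range", start, length) and ("bitmap", base, bits) tuples;
    bit i of `bits` set means inode base + i is free.
    """
    entries = []
    bitmaps = {}   # base -> bits, for inodes that sit in short runs
    i, n = 0, len(free_inos)
    while i < n:
        # Find the end of the run of consecutive numbers starting at i.
        j = i
        while j + 1 < n and free_inos[j + 1] == free_inos[j] + 1:
            j += 1
        start, length = free_inos[i], j - i + 1
        if length >= MIN_RUN:
            entries.append(("range", start, length))
        else:
            for ino in free_inos[i:j + 1]:
                base = ino - ino % BITS_PER_BITMAP
                bitmaps[base] = bitmaps.get(base, 0) | (1 << (ino - base))
        i = j + 1
    entries.extend(("bitmap", base, bits) for base, bits in bitmaps.items())
    return sorted(entries, key=lambda e: e[1])


def decode_free(entries):
    """Inverse of encode_free: return the sorted list of free inode numbers."""
    out = []
    for kind, start, value in entries:
        if kind == "range":
            out.extend(range(start, start + value))
        else:
            out.extend(start + b for b in range(BITS_PER_BITMAP) if (value >> b) & 1)
    return sorted(out)
```

With these assumed parameters, a thousand consecutive free inodes cost one
range entry, and a few isolated free inodes inside the same 64-number chunk
share a single bitmap entry.
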
I'll take your suggestion and try to implement it. Thanks.
(btw, I'll be off from Feb 29th to Mar 7th for Chinese Spring Festival)