From: Li Zefan <lizf@cn.fujitsu.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [RFC][PATCH] Btrfs: New inode number allocator
Date: Thu, 27 Jan 2011 15:10:56 +0800
Message-ID: <4D411A80.6050109@cn.fujitsu.com>
In-Reply-To: <1296070624-sup-5026@think>
Chris Mason wrote:
> Excerpts from Li Zefan's message of 2011-01-25 20:53:00 -0500:
>> (WARNING: this patch is not complete or well tested)
>>
>> We used to allocate inode numbers by searching through inode items, but
>> that made allocation slower and slower as more and more files were created.
>>
>> The current code just records the highest objectid in the btree without
>> reusing old inode numbers, so the filesystem will eventually run out of
>> inode numbers as files are created and deleted.
>>
>> In this patch, free inode numbers are stored in the fs tree with key:
>>
>> [start, BTRFS_INO_EXTENT_KEY, end]
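As a rough illustration of the range scheme above, here is a minimal in-memory sketch in C. It assumes each [start, BTRFS_INO_EXTENT_KEY, end] item is mirrored by one sorted, inclusive range; the names ino_free_map, ino_alloc and ino_free and the fixed MAX_RANGES capacity are hypothetical, and no btree I/O is modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capacity; a real implementation would grow dynamically. */
#define MAX_RANGES 64

/* One free range: inode numbers start..end (inclusive) are unused,
 * mirroring a [start, BTRFS_INO_EXTENT_KEY, end] item. */
struct ino_range {
	uint64_t start;
	uint64_t end;
};

/* Ranges are kept sorted by start and never overlap or touch. */
struct ino_free_map {
	struct ino_range r[MAX_RANGES];
	int nr;
};

/* Hand out the lowest free inode number, or 0 if none is left. */
static uint64_t ino_alloc(struct ino_free_map *m)
{
	uint64_t ino;

	if (m->nr == 0)
		return 0;
	ino = m->r[0].start;
	if (m->r[0].start == m->r[0].end) {
		/* The first range is used up: drop it. */
		memmove(&m->r[0], &m->r[1], (m->nr - 1) * sizeof(m->r[0]));
		m->nr--;
	} else {
		m->r[0].start++;
	}
	return ino;
}

/* Give an inode number back, merging it with adjacent free ranges. */
static void ino_free(struct ino_free_map *m, uint64_t ino)
{
	int i = 0;
	int merge_prev, merge_next;

	/* Skip ranges that start at or before ino; ino must not be in them. */
	while (i < m->nr && m->r[i].start <= ino) {
		assert(ino > m->r[i].end); /* freeing an already free number */
		i++;
	}

	merge_prev = i > 0 && m->r[i - 1].end + 1 == ino;
	merge_next = i < m->nr && m->r[i].start == ino + 1;

	if (merge_prev && merge_next) {
		/* ino bridges two ranges: join them into one. */
		m->r[i - 1].end = m->r[i].end;
		memmove(&m->r[i], &m->r[i + 1],
			(m->nr - i - 1) * sizeof(m->r[0]));
		m->nr--;
	} else if (merge_prev) {
		m->r[i - 1].end = ino;
	} else if (merge_next) {
		m->r[i].start = ino;
	} else {
		/* Isolated number: insert a new one-element range at i. */
		assert(m->nr < MAX_RANGES);
		memmove(&m->r[i + 1], &m->r[i],
			(m->nr - i) * sizeof(m->r[0]));
		m->r[i].start = ino;
		m->r[i].end = ino;
		m->nr++;
	}
}
```

Allocation always takes the lowest free number and freeing merges with neighbouring ranges, so a create/delete workload keeps reusing numbers instead of only pushing up the highest objectid.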
>
> Thanks a lot for working on this, it isn't an easy problem.
>
> I think Josef's free space cache for the extent allocation tree is the
> model you want to use. They are actually solving exactly the same
> problem:
>
> In the extent allocation tree, a free extent is one with no keys in the
> tree.
>
> In the FS tree, a free inode is one with no keys in the tree.
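Because a free inode number is simply one with no item in the tree, free ranges can be derived from the gaps between the inode numbers that are in use. The sketch below assumes those in-use numbers have already been collected in sorted order (standing in for a scan of the inode items); find_free_gaps and struct ino_gap are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ino_gap {
	uint64_t start;
	uint64_t end; /* inclusive */
};

/* Record every gap between the sorted, unique in-use inode numbers as a
 * free range, limited to [first, last]. Returns the number of gaps
 * written (at most max; output is truncated if max is too small). */
static size_t find_free_gaps(const uint64_t *used, size_t nr_used,
			     uint64_t first, uint64_t last,
			     struct ino_gap *gaps, size_t max)
{
	uint64_t next = first; /* lowest number not yet known to be used */
	size_t n = 0;
	size_t i;

	for (i = 0; i < nr_used && n < max; i++) {
		assert(i == 0 || used[i] > used[i - 1]);
		if (used[i] < next)
			continue; /* below 'first' */
		if (used[i] > last)
			break;
		if (used[i] > next) {
			/* No items between next and used[i] - 1: all free. */
			gaps[n].start = next;
			gaps[n].end = used[i] - 1;
			n++;
		}
		next = used[i] + 1;
	}
	/* Everything above the last used number is free. */
	if (n < max && next <= last) {
		gaps[n].start = next;
		gaps[n].end = last;
		n++;
	}
	return n;
}
```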
>
> He has a cache that gets written on a per-block-group basis for the free
> extents in that block group. The inode number cache is a somewhat easier
> problem to solve, because you don't have the extent allocator's problem of
> needing free blocks to store the free block cache ;)
>
> In his code, the cache stores the generation number of the commit that
> was used to create the cache. If a cache-unaware kernel mounts the
> filesystem and makes changes, we notice on the next mount because the
> cache generation number doesn't match the filesystem generation number.
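The generation check can be sketched as follows, assuming a hypothetical header (struct ino_cache_header) that records the transaction generation in which the cache was written, compared against the filesystem generation at mount time. The names are illustrative and are not taken from Josef's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header stored in front of the cached free ranges. */
struct ino_cache_header {
	uint64_t generation;  /* transaction in which the cache was written */
	uint64_t nr_entries;  /* number of cached free ranges that follow */
};

/* Called while committing transaction 'transid' with a fresh cache. */
static void ino_cache_write_header(struct ino_cache_header *hdr,
				   uint64_t transid, uint64_t nr_entries)
{
	assert(transid >= hdr->generation); /* generations only move forward */
	hdr->generation = transid;
	hdr->nr_entries = nr_entries;
}

/* The cache is only trustworthy if nothing has committed since it was
 * written, i.e. the generations still match. */
static bool ino_cache_valid(const struct ino_cache_header *hdr,
			    uint64_t fs_generation)
{
	return hdr->generation == fs_generation;
}

/* On success stores the usable entry count; returns false if the cache
 * is stale and free inode numbers must be rebuilt from the tree. */
static bool ino_cache_load(const struct ino_cache_header *hdr,
			   uint64_t fs_generation, uint64_t *nr_entries)
{
	if (!ino_cache_valid(hdr, fs_generation))
		return false;
	*nr_entries = hdr->nr_entries;
	return true;
}
```

A stale cache is not an error; it only means the free numbers have to be rebuilt from the tree, for example by scanning for gaps between inode items.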
>
> It will probably be easiest to dedicate a specific objectid to the inode
> number cache in each FS tree (say objectid == -12ULL), and then put the
> caching items directly in the tree under that objectid.
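Btree keys sort by objectid, then type, then offset, so putting the cache under a reserved objectid near the top of the 64-bit range keeps all caching items together after every real inode. The simplified struct key, key_cmp, ino_cache_key and INO_CACHE_OBJECTID below are hypothetical stand-ins, shown only to illustrate that ordering.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified btree key: items are ordered by objectid, then type,
 * then offset. */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Hypothetical objectid reserved for the inode number cache. */
#define INO_CACHE_OBJECTID ((uint64_t)-12) /* 0xfffffffffffffff4 */

static int key_cmp(const struct key *a, const struct key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Build the key of the n-th caching item; 'type' is whatever item type
 * the cache would use (hypothetical here). */
static struct key ino_cache_key(uint8_t type, uint64_t n)
{
	struct key k = { INO_CACHE_OBJECTID, type, n };

	return k;
}
```

A lookup can then start at the first caching key and walk forward through the caching items without touching any inode items.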
>
> I'd suggest that you also reuse his code to compactly store ranges of
> free extents. It wouldn't be hard to have a simple compression scheme
> that stored ranges for huge chunks of free inode numbers and used a
> bitmap for ranges where there are lots of free individual inodes.
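One way to sketch that compression trade-off: for each fixed window of inode numbers, count the runs of free numbers and store the window as extent entries (start and end, 16 bytes each) while that is no larger than a bitmap of the whole window, otherwise as a bitmap. The 4096-number window, the size-based threshold and all names are assumptions for illustration; Josef's cache may use different sizes and heuristics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: one window covers 4096 inode numbers. */
#define WINDOW_BITS  4096
#define BITMAP_BYTES (WINDOW_BITS / 8)      /* 512 bytes */
#define EXTENT_BYTES (2 * sizeof(uint64_t)) /* start + end */

enum ino_encoding { ENC_EXTENTS, ENC_BITMAP };

/* Bit i set means inode number (window base + i) is free. */
static int ino_is_free(const uint8_t *free_map, size_t i)
{
	return (free_map[i / 8] >> (i % 8)) & 1;
}

/* Count maximal runs of free inode numbers in the window. */
static size_t count_free_runs(const uint8_t *free_map)
{
	size_t runs = 0;
	int prev = 0;

	for (size_t i = 0; i < WINDOW_BITS; i++) {
		int cur = ino_is_free(free_map, i);

		if (cur && !prev)
			runs++; /* a new run starts here */
		prev = cur;
	}
	return runs;
}

/* Pick whichever representation takes less space for this window. */
static enum ino_encoding choose_encoding(const uint8_t *free_map)
{
	size_t runs = count_free_runs(free_map);

	return runs * EXTENT_BYTES <= BITMAP_BYTES ? ENC_EXTENTS : ENC_BITMAP;
}
```

With these sizes the break-even point is 32 runs per window: one huge free range costs a single extent entry, while alternating free and used numbers would cost 2048 extent entries, so that window goes into a 512-byte bitmap instead.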
>
I'll take your suggestion and try to implement it. Thanks.
(btw, I'll be off from Feb 29th to Mar 7th for the Chinese Spring Festival)