Re: [RFC][PATCH] Btrfs: New inode number allocator

linux-btrfs.vger.kernel.org archive mirror

From: Li Zefan <lizf@cn.fujitsu.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [RFC][PATCH] Btrfs: New inode number allocator
Date: Thu, 27 Jan 2011 15:10:56 +0800
Message-ID: <4D411A80.6050109@cn.fujitsu.com>
In-Reply-To: <1296070624-sup-5026@think>

Chris Mason wrote:
> Excerpts from Li Zefan's message of 2011-01-25 20:53:00 -0500:
>> (WARNING: this patch is not completed or well-tested)
>>
>> We used to allocate inode numbers by searching through inode items, but
>> that made allocation slower and slower as more and more files were created.
>>
>> The current code just records the highest objectid in the btree without
>> reusing old inode numbers, which will make the filesystem run out of
>> inode numbers as we create/delete files.
>>
>> In this patch, free inode numbers are stored in the fs tree with key:
>>
>>     [start, BTRFS_INO_EXTENT_KEY, end]
> 
> Thanks a lot for working on this, it isn't an easy problem.
> 
> I think Josef's free space cache for the extent allocation tree is the
> model you want to use.  They are actually solving exactly the same
> problem:
> 
> In the extent allocation tree, a free extent is one with no keys in the
> tree.
> 
> In the FS tree, a free inode is one with no keys in the tree.
> 
> He has a cache that gets written on a per block group basis for the free
> extents in that block group.  It's a somewhat easier problem to solve in
> the inode number cache because you don't have the same problem where you
> need free blocks to store the free block cache ;)
> 
> In his code, the cache stores the generation number of the commit that
> was used to create the cache.  If a cache unaware kernel mounts the
> filesystem and makes changes, we notice on the next mount because the
> cache generation number doesn't match the filesystem generation number.
> 
> It will probably be easiest to dedicate a specific objectid to the inode
> number cache in each FS tree (say objectid == -12ULL), and then put the
> caching items directly in the tree under that objectid.
> 
> I'd suggest that you also reuse his code to compactly store a range of
> free extents.  It wouldn't be hard to have a simple compression scheme
> that stored ranges for huge chunks of free inode numbers and did a
> bitmask for ranges where there are lots of free individual inodes.
> 

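The generation check described above can be sketched as follows. This is only an illustration of the idea, not code from Josef's free space cache or from this patch: the struct, its field names, and ino_cache_is_valid() are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical header for a per-FS-tree inode number cache.
 * The layout and names are assumptions made for illustration.
 */
struct ino_cache_header {
	uint64_t generation;   /* generation of the commit that wrote the cache */
	uint64_t num_entries;  /* number of free-range/bitmap entries that follow */
};

/*
 * Trust the cache only if it was written by the commit the filesystem
 * is currently at. A cache-unaware kernel that modified the filesystem
 * would have advanced the filesystem generation without rewriting the
 * cache, so the two values would no longer match.
 */
static bool ino_cache_is_valid(const struct ino_cache_header *hdr,
			       uint64_t fs_generation)
{
	return hdr->generation == fs_generation;
}
```

On a mismatch, the cache would presumably be discarded and rebuilt by scanning the FS tree.
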
I'll take your suggestion and try to implement it. Thanks.
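
As a rough sketch of the range-plus-bitmask idea (not the existing free space cache code, which I still need to read; every name here is hypothetical and the 64-bit bitmap width is just an assumption to keep the example small), an entry could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

#define INO_BITMAP_BITS 64  /* assumed bitmap width, for illustration only */

enum ino_entry_type { INO_ENTRY_RANGE, INO_ENTRY_BITMAP };

/* One cached entry: either a run of free inode numbers or a bitmap. */
struct ino_free_entry {
	enum ino_entry_type type;
	uint64_t start;   /* first inode number covered by this entry */
	uint64_t count;   /* RANGE: number of consecutive free numbers */
	uint64_t bitmap;  /* BITMAP: bit i set means start + i is free */
};

/* Return true if @ino is recorded as free in this entry. */
static bool ino_entry_contains_free(const struct ino_free_entry *e,
				    uint64_t ino)
{
	uint64_t off;

	if (ino < e->start)
		return false;
	off = ino - e->start;
	if (e->type == INO_ENTRY_RANGE)
		return off < e->count;
	return off < INO_BITMAP_BITS && ((e->bitmap >> off) & 1);
}

/* Take the lowest free inode number from the entry, if there is one. */
static bool ino_entry_alloc(struct ino_free_entry *e, uint64_t *ino_out)
{
	unsigned int i;

	if (e->type == INO_ENTRY_RANGE) {
		if (e->count == 0)
			return false;
		*ino_out = e->start;
		e->start++;      /* shrink the range from the front */
		e->count--;
		return true;
	}
	for (i = 0; i < INO_BITMAP_BITS; i++) {
		if ((e->bitmap >> i) & 1) {
			e->bitmap &= ~((uint64_t)1 << i);  /* mark as used */
			*ino_out = e->start + i;
			return true;
		}
	}
	return false;
}
```

A range entry covers a huge free chunk with two numbers, while a bitmap entry covers a region where individual inode numbers are freed and reused in a scattered pattern.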

(btw, I'll be off from Feb 29th to Mar 7th for Chinese Spring Festival)

Thread overview: 4+ messages
2011-01-26  1:53 [RFC][PATCH] Btrfs: New inode number allocator Li Zefan
2011-01-26 18:30 ` Goffredo Baroncelli
2011-01-26 19:56 ` Chris Mason
2011-01-27  7:10   ` Li Zefan [this message]
