* Allocator changes
@ 2009-10-21 21:41 Josef Bacik
2009-10-22 17:37 ` Josef Bacik
2009-10-26 9:28 ` Chris Mason
0 siblings, 2 replies; 3+ messages in thread
From: Josef Bacik @ 2009-10-21 21:41 UTC (permalink / raw)
To: chris.mason; +Cc: linux-btrfs
Hello,
I've been thinking recently about how to fix the caching code so it isn't so
damned slow. I had started on writing the free space cache out as bitmaps for
every dirty block group, but then Eric Sandeen pointed out that xfs just has a
couple of btrees to track free space, so I started thinking about doing it that
way, since we do have this nice btree code lying around.
There are a couple of problems with adding another free space root however:
1) Backwards compatibility. This would be yet another rolling format change
that users would not be able to go back from. The problem here is that both the
userspace and kernel sides would have to change, so if old userspace tools made
changes to a new fs there would be no way of signalling that the free space
tree needs to be rebuilt, so it would have to be an incompatible change. I'm
less worried about this, but it is an issue to consider.
2) COW. This part really sucks: we get into the same sort of recursion that we
had with the extent tree, only worse, because in order to COW a block in the
free space tree during allocation we'd have to come back into the allocator.
The way I'm thinking about fixing this is doing something like
path->recursive_locking = 1. This way when we do a tree_lock, if we already
hold the lock then we just set a flag in eb->flags to say we recursively locked
it, and then when we walk back unlocking things we check that flag and clear it
if it's set, otherwise we unlock the eb->lock spinlock.
This lets us get rid of all of the free space cache code and simply use the
free space btree to keep track of our free space, which would also help keep
our memory footprint down. We can still keep the block group code and
find_free_extent mostly in place, so that we can still mark block groups as
read-only and be sure we don't end up allocating any of the area within that
byte range.
Once all that is in place, we can do a per-cpu cluster/pool where we add big
extents that we read out of the btree, which keeps down the number of times we
hit the btree. Then we just flush the per-cpu extents back to the btree every
time we commit the transaction and we're good to go.
This keeps us from having to run the caching kthread at all, and I think it
will greatly help our bootup times. What do you think? Thanks,
Josef
* Re: Allocator changes
2009-10-21 21:41 Allocator changes Josef Bacik
@ 2009-10-22 17:37 ` Josef Bacik
2009-10-26 9:28 ` Chris Mason
1 sibling, 0 replies; 3+ messages in thread
From: Josef Bacik @ 2009-10-22 17:37 UTC (permalink / raw)
To: Josef Bacik; +Cc: chris.mason, linux-btrfs
On Wed, Oct 21, 2009 at 05:41:36PM -0400, Josef Bacik wrote:
> Hello,
>
> I've been thinking recently about how to fix the caching code so it isn't so
> damned slow. I had started on writing the free space cache out as bitmaps for
> every dirty block group, but then Eric Sandeen pointed out that xfs just has a
> couple of btrees to track free space, so I started thinking about doing it that
> way, since we do have this nice btree code lying around.
>
> There are a couple of problems with adding another free space root however:
>
> 1) Backwards compatibility. This would be yet another rolling format change
> that users would not be able to go back from. The problem here is that both the
> userspace and kernel sides would have to change, so if old userspace tools made
> changes to a new fs there would be no way of signalling that the free space
> tree needs to be rebuilt, so it would have to be an incompatible change. I'm
> less worried about this, but it is an issue to consider.
>
> 2) COW. This part really sucks: we get into the same sort of recursion that we
> had with the extent tree, only worse, because in order to COW a block in the
> free space tree during allocation we'd have to come back into the allocator.
> The way I'm thinking about fixing this is doing something like
> path->recursive_locking = 1. This way when we do a tree_lock, if we already
> hold the lock then we just set a flag in eb->flags to say we recursively locked
> it, and then when we walk back unlocking things we check that flag and clear it
> if it's set, otherwise we unlock the eb->lock spinlock.
>
Actually doing this doesn't work, since there's no way to keep track of who is
recursively locking without saving the pid or some such other garbage. So what
I'm going to do instead is have a path->delay_cow flag. Every time we do a
search in the free space tree we'll set delay_cow and mark the nodes in the
path that we need to cow with the operation we need to do (cow, split, new
root, etc.), and then once we find the leaf with the free space we just look
for enough space to allocate blocks for those operations and do the cow
operations then. We also keep the nodes that we want to cow blocked, so that we
keep other people from walking down that path until after we've run the delayed
cow operations. Thanks,
Josef
* Re: Allocator changes
2009-10-21 21:41 Allocator changes Josef Bacik
2009-10-22 17:37 ` Josef Bacik
@ 2009-10-26 9:28 ` Chris Mason
1 sibling, 0 replies; 3+ messages in thread
From: Chris Mason @ 2009-10-26 9:28 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs
On Wed, Oct 21, 2009 at 05:41:36PM -0400, Josef Bacik wrote:
> Hello,
>
> I've been thinking recently about how to fix the caching code so it isn't so
> damned slow. I had started on writing the free space cache out as bitmaps for
> every dirty block group, but then Eric Sandeen pointed out that xfs just has a
> couple of btrees to track free space, so I started thinking about doing it that
> way, since we do have this nice btree code lying around.
>
> There are a couple of problems with adding another free space root however:
The free space root seems like it would have pretty high overhead to me. But I
do like your general idea of writing the free space map out at commit time as
an on-disk cache. I think once we nail down the enospc corners we can go back
and give that a try.
To get around the cow problem, we can log the cache to the log tree.
-chris