From: Nick Piggin <npiggin@suse.de>
To: linux-fsdevel@vger.kernel.org
Subject: [patch] fsblock preview
Date: Mon, 15 Sep 2008 10:30:14 +0200
Message-ID: <20080915083014.GA3407@wotan.suse.de>
In-Reply-To: <20080914221500.GH27080@wotan.suse.de>
OK, vger doesn't seem to like my patch, so I'll have to give a url to it,
sorry.
http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/fsblock/2.6.27-rc5/fsb-preview.patch
I've been doing some work on fsblock again lately, so in case anybody might
find it interesting, here is a "preview" patch. Basically it compiles and
runs OK for me here, under a few stress tests. I wouldn't say it is close to
bug free, and it needs a lot of bits and pieces to polish up like error
handling.
I've also just stripped out the large block size support in the patch I'm
mailing out... I have been developing with ext2 without large block size
support, so those paths have rotted a bit, and besides, they still need
a few more changes to some VM paths.
Since I last posted fsblock, there have been some big changes:
- Using a per block spinlock to protect most access now. This eliminates
some races I had against dirtying vs cleaning, and with fsblock
refcounting and reclaim.
- fsblock_no_cache aka "nobh" mode now works well due to the above. When
/proc/sys/vm/fsblock_no_cache is 1, you never get fsblocks hanging around
longer than they have to. You also would never be subject to the circular
referencing "orphan" pages that buffer heads are subject to.
- RCU is gone. This is actually a good thing because in "nobh" mode, some
workloads will rapidly allocate and free the structures, and that can
be costly with RCU.
- struct fsblock has shrunk to 32 bytes on 64-bit. Less than 1/3 the size
of struct buffer_head. Although absolute size doesn't matter so much now
(because of no_cache mode). I even have an optional feature "bdflush"
that increases the size... although I do want to keep it within 64 bytes
(one cacheline).
- Added an "intermediate" mode which provides a ->data pointer in struct
fsblock_meta, and means it is trivial to transition filesystems to
fsblock (although they would not be able to support superpage blocks).
- Added ext2 intermediate support.
- Had to modify the VM a little bit in order to close races with freeing a
page's fsblock before it can be cleaned (or still has a chance to be
dirtied via mmap). fsblock of course ensures that zero memory allocations
are required in the writeout path.
- Lockless pagecache has been merged in mainline, which means the largest
granularity of synchronisation anywhere in the fsblock core code is on a
per-page basis (buffer uses per-inode private_lock). This is one of the
reasons I am skeptical that keeping pagecache state in extents is better: it
would be rather impressive if it could match the straight line speed or
scalability of fsblock.
- However, I *have* always agreed that it makes sense to keep (some) block
state in extents, because that is going to change much less frequently, and
should be represented with fewer extents provided the filesystem layout is
reasonable. So I've written a (very) basic extent cache for block mappings,
which can be used by filesystems that don't have good in-memory block
mapping structures themselves (like ext2, for example). No reclaim for this
at present, I should just add a simple shrinker.
- bdflush... it's commented out so it won't build by default, but basically
because fsblock properly keeps block dirty state in sync with page dirty
state, I can keep a sorted structure of dirty fsblocks per device, and do
writeout based on that rather than this fragile walking over inodes that
pdflush does. Of course it won't work with delayed allocation, so something
would have to be figured out with that (perhaps allocate all outstanding
blocks before each writeout pass).
The thing I like about bdflush is that it can easily do nice submit
ordering of inter-file as well as file/metadata blocks for writeout. I
don't know if it will come to anything, but at least it is not tightly
coupled with the core fsblock stuff. It's a bit hacky at the moment ;)
- Still not using a private bdev for fsblock filesystems... I never got around
to figuring out how to do this. This means that sometimes funny things will
happen with block_dev device if pages and buffers try to use it. It mostly
works OK but is a hack that I need to fix.
- Finally, for those not listening last time: I'm doing block sizes larger
than page size (up to 16MB IIRC, but easily expandable to much higher) with
fsblock using exactly the same data structures. Although I haven't included
that in the patch here.
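
To make the "basic extent cache for block mappings" item above a bit more
concrete, here is a rough userspace sketch of the general idea: a small
sorted array of extents, with lookup by binary search and merging of
contiguous mappings on insert. This is NOT the code from the patch. Every
name in it (struct extent, struct extent_cache, extent_cache_insert,
extent_cache_lookup, EXTENT_CACHE_MAX) is made up for illustration, and the
fixed-size array stands in for whatever structure a real implementation
would use. There is no locking and no reclaim.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the structures from the fsblock patch. */
#define EXTENT_CACHE_MAX 16

struct extent {
	uint64_t file_start;	/* first logical (file) block covered */
	uint64_t disk_start;	/* physical block that file_start maps to */
	uint64_t len;		/* number of contiguous blocks */
};

struct extent_cache {
	struct extent ext[EXTENT_CACHE_MAX];
	size_t nr;		/* entries in use, sorted, non-overlapping */
};

/*
 * Record a mapping. Returns 0 on success, -1 if the range is empty,
 * overlaps an existing extent, or the cache is full.
 */
static int extent_cache_insert(struct extent_cache *c, uint64_t file_start,
			       uint64_t disk_start, uint64_t len)
{
	size_t i = 0, j;

	assert(c != NULL);
	if (len == 0)
		return -1;

	/* Find the first extent that starts at or after the new one. */
	while (i < c->nr && c->ext[i].file_start < file_start)
		i++;

	/* Reject overlap with either neighbour. */
	if (i > 0 && c->ext[i - 1].file_start + c->ext[i - 1].len > file_start)
		return -1;
	if (i < c->nr && file_start + len > c->ext[i].file_start)
		return -1;

	/* Grow the predecessor if contiguous both in the file and on disk. */
	if (i > 0 &&
	    c->ext[i - 1].file_start + c->ext[i - 1].len == file_start &&
	    c->ext[i - 1].disk_start + c->ext[i - 1].len == disk_start) {
		c->ext[i - 1].len += len;
		return 0;
	}

	if (c->nr == EXTENT_CACHE_MAX)
		return -1;

	/* Shift later extents up by one slot to keep the array sorted. */
	for (j = c->nr; j > i; j--)
		c->ext[j] = c->ext[j - 1];
	c->ext[i] = (struct extent){ file_start, disk_start, len };
	c->nr++;
	return 0;
}

/*
 * Binary search for the extent containing file_block. Returns 1 and
 * fills *disk_block on a hit, 0 on a miss.
 */
static int extent_cache_lookup(const struct extent_cache *c,
			       uint64_t file_block, uint64_t *disk_block)
{
	size_t lo = 0, hi;

	assert(c != NULL && disk_block != NULL);
	hi = c->nr;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct extent *e = &c->ext[mid];

		if (file_block < e->file_start) {
			hi = mid;
		} else if (file_block - e->file_start >= e->len) {
			lo = mid + 1;
		} else {
			*disk_block = e->disk_start +
				      (file_block - e->file_start);
			return 1;
		}
	}
	return 0;
}
```

The point of the sketch is only that a file laid out reasonably on disk
collapses to a handful of entries, so a lookup touches far less state than
a per-block structure would. A miss would fall back to asking the
filesystem for the mapping and then inserting the result.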