From: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 2/2] hpfs: optimize quad buffer loading
Date: Wed, 29 Jan 2014 16:05:02 +0100 (CET) [thread overview]
Message-ID: <alpine.DEB.2.02.1401291551160.15720@artax.karlin.mff.cuni.cz> (raw)
In-Reply-To: <CA+55aFwKyBq_m5E7PLm-FBrn3NSFyXrhmn5SCT=7x93emoAhCw@mail.gmail.com>
On Tue, 28 Jan 2014, Linus Torvalds wrote:
> On Tue, Jan 28, 2014 at 5:01 PM, Mikulas Patocka
> <mikulas@artax.karlin.mff.cuni.cz> wrote:
> >
> > The page cache doesn't handle different-size buffers for one page.
>
> Correct, but that should not be relevant..
>
> > HPFS
> > has some 2kB structures (dnodes, bitmaps) and some 512-byte structures
> > (fnodes, anodes). We can have a 4kB page that contains one 2kB dnode and
> > four 512-byte anodes or fnodes. That is impossible to create with
> > create_empty_buffers.
>
> Damn. You're both right and wrong.
>
> It's true that buffer heads within a page have to be the same size,
> but that's not really relevant - you don't work with pages, so you
> could have two totally independent 2kB buffer heads allocated to
> within one page.
Suppose that 8 consecutive sectors on the disk contain this data:
dnode (4 sectors)
fnode (1 sector)
file content (3 sectors)
--- now, you can't access that fnode using 2kB buffer, if you did and if
you marked that buffer dirty, you damage file content.
So you need different-sized buffers on one page.
> And that's actually how filesystems that virtually map pages do things
> - they just fill the page with (equal-sized) buffer heads indexed on
> the filesystem inode, and the buffer heads don't have to be related to
> each other physically on the disk.
>
> In fact, even the sizes don't even really *have* to be the same (in
> theory the list of page buffers could point to five buffers: one 2k
> and four 512-byte bhs), but all the helper functions to populate the
> buffer head lists etc do assume that.
>
> And way back when, buffer heads had their own hashed lookup, so even
> with the bd_dev approach you could have two non-consecutive
> independent 2kB bh's in the same page.
>
> So you used to be wrong.
>
> But the reason you're right is that we got rid of the buffer head
> hashes, and now use the page-level hashing to look up the page that
> the buffer heads are in, which does mean that now you can't really
> alias different sizes on different pages any more, or have one page
> that contains buffer heads that aren't related to each other
> physically on the disk any more.
Page-level lookup doesn't seem as a problem to me. All you need to do is
to add "blocksize" argument to __find_get_block_slow, change
index = block >> (PAGE_CACHE_SHIFT - bd_inode->i_blkbits);
to
index = block >> (PAGE_CACHE_SHIFT - __ffs(blocksize));
and change
else if (bh->b_blocknr == block)
to
else if (bh->b_blocknr == block && bh->b_size == block)
That would be enough to be able to find a buffer with different block
size on one page.
The worse problem is how to create such buffers.
And how to synchronize it with concurrent access from userspace using
def_blk_aops, that would be very hard.
> So yeah, very annoying, we're *so* close to being able to do this, but
> because the buffer heads are really no longer "primary" data
> structures and don't have any indexing of their own, we can't actually
> do it.
>
> Linus
Mikulas
next prev parent reply other threads:[~2014-01-29 15:05 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-28 23:10 [PATCH 1/2] hpfs: remember free space Mikulas Patocka
2014-01-28 23:11 ` [PATCH 2/2] hpfs: optimize quad buffer loading Mikulas Patocka
2014-01-28 23:44 ` Linus Torvalds
2014-01-29 1:01 ` Mikulas Patocka
2014-01-29 1:51 ` Mikulas Patocka
2014-01-29 2:05 ` Linus Torvalds
2014-01-29 14:50 ` Mikulas Patocka
2014-01-29 2:01 ` Linus Torvalds
2014-01-29 15:05 ` Mikulas Patocka [this message]
2014-01-30 17:59 ` Linus Torvalds
2014-01-31 17:41 ` Mikulas Patocka
2014-01-31 17:52 ` Linus Torvalds
2014-01-31 18:10 ` Mikulas Patocka
2014-01-31 18:40 ` Linus Torvalds
2014-01-31 19:19 ` Mikulas Patocka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.2.02.1401291551160.15720@artax.karlin.mff.cuni.cz \
--to=mikulas@artax.karlin.mff.cuni.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).