linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 2/2] hpfs: optimize quad buffer loading
Date: Wed, 29 Jan 2014 16:05:02 +0100 (CET)	[thread overview]
Message-ID: <alpine.DEB.2.02.1401291551160.15720@artax.karlin.mff.cuni.cz> (raw)
In-Reply-To: <CA+55aFwKyBq_m5E7PLm-FBrn3NSFyXrhmn5SCT=7x93emoAhCw@mail.gmail.com>



On Tue, 28 Jan 2014, Linus Torvalds wrote:

> On Tue, Jan 28, 2014 at 5:01 PM, Mikulas Patocka
> <mikulas@artax.karlin.mff.cuni.cz> wrote:
> >
> > The page cache doesn't handle different-size buffers for one page.
> 
> Correct, but that should not be relevant..
> 
> > HPFS
> > has some 2kB structures (dnodes, bitmaps) and some 512-byte structures
> > (fnodes, anodes). We can have a 4kB page that contains one 2kB dnode and
> > four 512-byte anodes or fnodes. That is impossible to create with
> > create_empty_buffers.
> 
> Damn. You're both right and wrong.
> 
> It's true that buffer heads within a page have to be the same size,
> but that's not really relevant - you don't work with pages, so you
> could have two totally independent 2kB buffer heads allocated to
> within one page.

Suppose that 8 consecutive sectors on the disk contain this data:
dnode (4 sectors)
fnode (1 sector)
file content (3 sectors)
--- now, you can't access that fnode using 2kB buffer, if you did and if 
you marked that buffer dirty, you damage file content.

So you need different-sized buffers on one page.

> And that's actually how filesystems that virtually map pages do things
> - they just fill the page with (equal-sized) buffer heads indexed on
> the filesystem inode, and the buffer heads don't have to be related to
> each other physically on the disk.
> 
> In fact, even the sizes don't even really *have* to be the same (in
> theory the list of page buffers could point to five buffers: one 2k
> and four 512-byte bhs), but all the helper functions to populate the
> buffer head lists etc do assume that.
> 
> And way back when, buffer heads had their own hashed lookup, so even
> with the bd_dev approach you could have two non-consecutive
> independent 2kB bh's in the same page.
> 
> So you used to be wrong.
> 
> But the reason you're right is that we got rid of the buffer head
> hashes, and now use the page-level hashing to look up the page that
> the buffer heads are in, which does mean that now you can't really
> alias different sizes on different pages any more, or have one page
> that contains buffer heads that aren't related to each other
> physically on the disk any more.

Page-level lookup doesn't seem as a problem to me. All you need to do is 
to add "blocksize" argument to __find_get_block_slow, change
index = block >> (PAGE_CACHE_SHIFT - bd_inode->i_blkbits);
to
index = block >> (PAGE_CACHE_SHIFT - __ffs(blocksize));

and change
else if (bh->b_blocknr == block)
to
else if (bh->b_blocknr == block && bh->b_size == block)

That would be enough to be able to find a buffer with different block 
size on one page.


The worse problem is how to create such buffers.

And how to synchronize it with concurrent access from userspace using 
def_blk_aops, that would be very hard.

> So yeah, very annoying, we're *so* close to being able to do this, but
> because the buffer heads are really no longer "primary" data
> structures and don't have any indexing of their own, we can't actually
> do it.
> 
>                Linus

Mikulas

  reply	other threads:[~2014-01-29 15:05 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-28 23:10 [PATCH 1/2] hpfs: remember free space Mikulas Patocka
2014-01-28 23:11 ` [PATCH 2/2] hpfs: optimize quad buffer loading Mikulas Patocka
2014-01-28 23:44   ` Linus Torvalds
2014-01-29  1:01     ` Mikulas Patocka
2014-01-29  1:51       ` Mikulas Patocka
2014-01-29  2:05         ` Linus Torvalds
2014-01-29 14:50           ` Mikulas Patocka
2014-01-29  2:01       ` Linus Torvalds
2014-01-29 15:05         ` Mikulas Patocka [this message]
2014-01-30 17:59           ` Linus Torvalds
2014-01-31 17:41             ` Mikulas Patocka
2014-01-31 17:52               ` Linus Torvalds
2014-01-31 18:10                 ` Mikulas Patocka
2014-01-31 18:40                   ` Linus Torvalds
2014-01-31 19:19                     ` Mikulas Patocka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.02.1401291551160.15720@artax.karlin.mff.cuni.cz \
    --to=mikulas@artax.karlin.mff.cuni.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).