linux-btrfs.vger.kernel.org archive mirror
From: Chris Mason <chris.mason@oracle.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Multi-device update
Date: Wed, 16 Apr 2008 14:04:03 -0400
Message-ID: <200804161404.04202.chris.mason@oracle.com>
In-Reply-To: <87fxtlitle.fsf@basil.nowhere.org>

On Wednesday 16 April 2008, Andi Kleen wrote:
> Chris Mason <chris.mason@oracle.com> writes:
> > On Wednesday 16 April 2008, Andi Kleen wrote:
> >> Chris Mason <chris.mason@oracle.com> writes:
> >> > The async work queues include code to checksum data pages without the
> >> > FS mutex
> >>
> >> Are they able to distribute work to other cores?
> >
> > Yes, it just uses a workqueue.
>
> Unfortunately work queues don't do that by default currently. They
> tend to process on the current CPU only.

Well, I see multiple work queue threads using CPU time, but I haven't spent 
much time optimizing it.  There's definitely room for improvement.

>
> > The current implementation is pretty simple, it
> > surely could be more effective at spreading the work around.
> >
> > I'm testing a variant that only tosses over to the async queue for
> > pdflush, inline reclaim should stay inline.
>
> Longer term I would hope that write checksum will be basically free by
> doing csum-copy at write() time. The only problem is just where to store
> the checksum between the write and the final IO? There's no space in
> struct page.

Doing it at write time is easier (except for mmap) because I can toss the csum
directly into the btree inside btrfs_file_write.  The current code avoids that
complexity and does it all at writeout.
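
For readers unfamiliar with the csum-copy idea discussed above, here is a minimal user-space sketch. It is not the btrfs code: csum_copy() and crc32c_byte() are hypothetical names, and the bitwise CRC32C is chosen only because it is short. The point is that the checksum is accumulated in the same loop that copies the data, so each byte is touched once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
 * Slow but short; real code would use a table or a CPU instruction. */
static uint32_t crc32c_byte(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Hypothetical helper: copy len bytes from src to dst and return the
 * CRC32C of the copied bytes, reading each source byte only once. */
static uint32_t csum_copy(void *dst, const void *src, size_t len)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    uint32_t crc = 0xFFFFFFFFu;

    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++) {
        d[i] = s[i];                    /* the copy ... */
        crc = crc32c_byte(crc, s[i]);   /* ... and the checksum, together */
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The caller still has to keep the returned value somewhere until the final IO, which is exactly the storage question raised in the quoted text.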

One advantage to the current code is that I'm able to optimize tree searches
away by checksumming a bunch of pages at a time.  Multiple pages' worth of
checksums get stored in a single btree item, so at least for btree operations
the current code is fairly optimal.
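
The batching described above can be sketched roughly as follows. This is an illustration only, not the btrfs on-disk format: struct csum_item, the 4096-byte page size, and the 64-entry capacity are all assumptions. One item holds the checksums for a run of contiguous pages, so a single search finds the item and every page in the run is then an array index away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ        4096u   /* assumed page size */
#define CSUMS_PER_ITEM 64u     /* assumed item capacity, illustration only */

/* Hypothetical stand-in for one btree item: checksums for a run of
 * contiguous pages, keyed by the file offset of the first page. */
struct csum_item {
    uint64_t start;                  /* byte offset of first page covered */
    uint32_t count;                  /* number of checksums stored */
    uint32_t csum[CSUMS_PER_ITEM];
};

/* Store checksums for up to npages contiguous pages starting at offset.
 * Returns how many were stored; one item insert covers all of them. */
static uint32_t csum_item_fill(struct csum_item *it, uint64_t offset,
                               const uint32_t *csums, uint32_t npages)
{
    uint32_t n = npages < CSUMS_PER_ITEM ? npages : CSUMS_PER_ITEM;

    assert(offset % PAGE_SZ == 0);
    it->start = offset;
    it->count = n;
    for (uint32_t i = 0; i < n; i++)
        it->csum[i] = csums[i];
    return n;
}

/* Fetch the checksum for the page at offset. Returns 0 on success,
 * -1 if this item does not cover that page. */
static int csum_item_lookup(const struct csum_item *it, uint64_t offset,
                            uint32_t *out)
{
    uint64_t idx;

    if (offset < it->start)
        return -1;
    idx = (offset - it->start) / PAGE_SZ;
    if (idx >= it->count)
        return -1;
    *out = it->csum[idx];
    return 0;
}
```

With per-page items, writing out N pages would cost N tree searches; with a layout like this it costs roughly N / CSUMS_PER_ITEM.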

>
> The same could be also done for read() but that might be a little more
> tricky because it would require delayed error reporting and it might
> be difficult to do this for partial blocks?

Yeah, it doesn't quite fit with how the kernel does reads.  For now it is much 
easier if the retry-other-mirror operation happens long before copy_to_user.
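
A rough user-space sketch of that ordering, under loud assumptions: read_verified() and toy_csum() are hypothetical, the mirrors are plain in-memory buffers, and the inner copy loop merely stands in for copy_to_user. It shows why verifying first is simpler: the caller's buffer is only written once a copy has passed its checksum, so no delayed error reporting is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SZ 16u  /* tiny block size so the sketch is easy to follow */

/* Placeholder checksum (simple polynomial hash); stands in for the
 * filesystem's real checksum function. */
static uint32_t toy_csum(const uint8_t *p, size_t len)
{
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++)
        s = s * 31u + p[i];
    return s;
}

/* Hypothetical read path. mirrors holds nmirrors copies of the block
 * back to back. Try each copy in turn, verify it against the expected
 * checksum, and write to user_buf only after a copy has verified.
 * Returns the index of the mirror used, or -1 if every copy is bad. */
static int read_verified(const uint8_t *mirrors, int nmirrors,
                         uint32_t expected, uint8_t *user_buf)
{
    assert(nmirrors > 0);
    for (int m = 0; m < nmirrors; m++) {
        const uint8_t *blk = mirrors + (size_t)m * BLOCK_SZ;

        if (toy_csum(blk, BLOCK_SZ) != expected)
            continue;                  /* bad copy: retry the next mirror */
        for (size_t i = 0; i < BLOCK_SZ; i++)
            user_buf[i] = blk[i];      /* stand-in for copy_to_user */
        return m;
    }
    return -1;  /* user_buf untouched, error known before any copy */
}
```

A csum-copy on the read side would instead have to copy first and verify afterwards, which is where the delayed error reporting mentioned in the quoted text comes from.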

-chris
