From: Chris Mason <chris.mason@oracle.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Multi-device update
Date: Wed, 16 Apr 2008 14:04:03 -0400
Message-ID: <200804161404.04202.chris.mason@oracle.com>
In-Reply-To: <87fxtlitle.fsf@basil.nowhere.org>
On Wednesday 16 April 2008, Andi Kleen wrote:
> Chris Mason <chris.mason@oracle.com> writes:
> > On Wednesday 16 April 2008, Andi Kleen wrote:
> >> Chris Mason <chris.mason@oracle.com> writes:
> >> > The async work queues include code to checksum data pages without the
> >> > FS mutex
> >>
> >> Are they able to distribute work to other cores?
> >
> > Yes, it just uses a workqueue.
>
> Unfortunately work queues don't do that by default currently. They
> tend to process on the current CPU only.
Well, I see multiple work queue threads using CPU time, but I haven't spent
much time optimizing it. There's definitely room for improvement.
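
As a rough userspace analogy only (this is not the kernel workqueue API, and not btrfs code), handing page checksums to a pool of worker threads might look like the sketch below. The names `checksum_page` and `checksum_pages_async`, the 4096-byte page size, and the use of zlib's CRC-32 as the checksum are all assumptions made for illustration. Whether the workers actually land on different cores is up to the scheduler, which is the point under discussion above.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # assumed page size for this sketch


def checksum_page(page: bytes) -> int:
    """Checksum one page; zlib's CRC-32 is a stand-in for the real checksum."""
    return zlib.crc32(page)


def checksum_pages_async(pages, workers=4):
    """Hand each page to a pool of worker threads; results come back in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(checksum_page, pages))
```

The caller only queues the pages and collects the results, so no lock needs to be held while the checksums are computed.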
>
> > The current implementation is pretty simple, it
> > surely could be more effective at spreading the work around.
> >
> > I'm testing a variant that only tosses over to the async queue for
> > pdflush, inline reclaim should stay inline.
>
> Longer term I would hope that write checksum will be basically free by
> doing csum-copy at write() time. The only problem is just where to store
> the checksum between the write and the final IO? There's no space in
> struct page.
Doing it at write time is easier (except for mmap) because I can toss the csum directly
into the btree inside btrfs_file_write. The current code avoids that
complexity and does it all at writeout.
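
To make the csum-copy idea concrete, here is a minimal sketch of checksumming in the same pass that copies the data, so the bytes are only touched once. It is an illustration under assumptions, not what btrfs_file_write does: `csum_copy` and its chunk size are hypothetical, and zlib's CRC-32 stands in for the real checksum.

```python
import zlib


def csum_copy(src: bytes, dst: bytearray, chunk_size: int = 512) -> int:
    """Copy src into dst chunk by chunk, updating a running CRC in the same pass."""
    crc = 0
    for off in range(0, len(src), chunk_size):
        chunk = src[off:off + chunk_size]
        dst[off:off + len(chunk)] = chunk  # the copy
        crc = zlib.crc32(chunk, crc)       # the checksum, while the chunk is hot
    return crc
```

The returned value still has to be kept somewhere until the final IO, which is the storage question raised in the quoted text.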
One advantage to the current code is that I'm able to optimize tree searches
away by checksumming a bunch of pages at a time. Multiple pages' worth of
checksums get stored in a single btree item, so at least for btree operations
the current code is fairly optimal.
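
The batching argument can be sketched with a toy model. Everything here is hypothetical and only illustrates the counting: `CsumTree` is a dictionary standing in for the btree, `checksum_run` packs the checksums of a contiguous run of pages into one item, and zlib's CRC-32 stands in for the real checksum.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size for this sketch


def checksum_run(data: bytes, start_offset: int):
    """Checksum a contiguous run of pages and return one (key, csums) item for it."""
    csums = [zlib.crc32(data[off:off + PAGE_SIZE])
             for off in range(0, len(data), PAGE_SIZE)]
    return start_offset, csums


class CsumTree:
    """Toy stand-in for the checksum btree that counts insert operations."""

    def __init__(self):
        self.items = {}
        self.searches = 0

    def insert(self, key, csums):
        self.searches += 1  # one tree search per inserted item
        self.items[key] = list(csums)

    def lookup(self, offset):
        """Return the checksum for the page at `offset`, or None if not stored."""
        for start, csums in self.items.items():
            index = (offset - start) // PAGE_SIZE
            if offset >= start and index < len(csums):
                return csums[index]
        return None
```

Calling `tree.insert(*checksum_run(data, 0))` on a 16-page buffer costs one search in this model, where inserting page by page would cost sixteen.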
>
> The same could be also done for read() but that might be a little more
> tricky because it would require delayed error reporting and it might
> be difficult to do this for partial blocks?
Yeah, it doesn't quite fit with how the kernel does reads. For now it is much
easier if the retry-other-mirror operation happens long before copy_to_user.
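
A minimal sketch of that ordering, with verification and mirror retry finished before anything reaches the caller's buffer. The function names are hypothetical, mirrors are modelled as plain byte strings, and zlib's CRC-32 stands in for the real checksum; none of this is the kernel read path.

```python
import zlib


def read_verified(mirrors, expected_csum):
    """Return the first mirror copy whose checksum matches, or raise if none does."""
    for copy in mirrors:
        if zlib.crc32(copy) == expected_csum:
            return copy
    raise OSError("checksum mismatch on every mirror")


def read_into(user_buf: bytearray, mirrors, expected_csum):
    """Verify first, then copy, so the caller's buffer never sees unverified data."""
    data = read_verified(mirrors, expected_csum)
    user_buf[:len(data)] = data  # stands in for copy_to_user
    return len(data)
```

Checksumming during the copy instead would mean a mismatch is only noticed after bad data is already in the user's buffer, which is where the delayed error reporting comes in.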
-chris