* Multi-device update
@ 2008-04-16 15:34 Chris Mason
2008-04-16 16:14 ` Andi Kleen
0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2008-04-16 15:34 UTC (permalink / raw)
To: linux-btrfs
Hello everyone,
I've pushed out another set of changes to the unstable trees, and these
include:
* RAID 1+0 support
mkfs.btrfs -d raid10 -m raid10 /dev/sd...
4 or more drives required.
* async work queues for checksumming writes
* Better back references in the multi-device data structs
The async work queues include code to checksum data pages without the FS mutex
held, greatly increasing streaming write throughput. On my 4 drive system, I
was getting around 120MB/s writes with checksumming on. Now I get 180MB/s,
which is disk speed.
The rest of the week will be spent doing hot add/remove of devices. Happy
testing ;)
-chris
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Multi-device update
2008-04-16 15:34 Multi-device update Chris Mason
@ 2008-04-16 16:14 ` Andi Kleen
2008-04-16 16:54 ` Chris Mason
0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2008-04-16 16:14 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs
Chris Mason <chris.mason@oracle.com> writes:
>
> The async work queues include code to checksum data pages without the FS mutex
Are they able to distribute work to other cores?
-Andi
* Re: Multi-device update
2008-04-16 16:14 ` Andi Kleen
@ 2008-04-16 16:54 ` Chris Mason
2008-04-16 17:43 ` Andi Kleen
0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2008-04-16 16:54 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-btrfs
On Wednesday 16 April 2008, Andi Kleen wrote:
> Chris Mason <chris.mason@oracle.com> writes:
> > The async work queues include code to checksum data pages without the FS
> > mutex
>
> Are they able to distribute work to other cores?
Yes, it just uses a workqueue. The current implementation is pretty simple; it
could surely be more effective at spreading the work around.
I'm testing a variant that only tosses over to the async queue for pdflush,
inline reclaim should stay inline.
-chris
* Re: Multi-device update
2008-04-16 16:54 ` Chris Mason
@ 2008-04-16 17:43 ` Andi Kleen
2008-04-16 18:04 ` Chris Mason
0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2008-04-16 17:43 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs
Chris Mason <chris.mason@oracle.com> writes:
> On Wednesday 16 April 2008, Andi Kleen wrote:
>> Chris Mason <chris.mason@oracle.com> writes:
>> > The async work queues include code to checksum data pages without the FS
>> > mutex
>>
>> Are they able to distribute work to other cores?
>
> Yes, it just uses a workqueue.
Unfortunately work queues don't do that by default currently. They
tend to process on the current CPU only.
> The current implementation is pretty simple; it
> could surely be more effective at spreading the work around.
>
> I'm testing a variant that only tosses over to the async queue for pdflush,
> inline reclaim should stay inline.
Longer term I would hope that write checksum will be basically free by doing
csum-copy at write() time. The only problem is where to store the
checksum between the write and the final IO? There's no space in
struct page.
The same could also be done for read() but that might be a little more
tricky because it would require delayed error reporting and it might
be difficult to do this for partial blocks?
-Andi
* Re: Multi-device update
2008-04-16 17:43 ` Andi Kleen
@ 2008-04-16 18:04 ` Chris Mason
2008-04-16 18:10 ` Andi Kleen
0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2008-04-16 18:04 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-btrfs
On Wednesday 16 April 2008, Andi Kleen wrote:
> Chris Mason <chris.mason@oracle.com> writes:
> > On Wednesday 16 April 2008, Andi Kleen wrote:
> >> Chris Mason <chris.mason@oracle.com> writes:
> >> > The async work queues include code to checksum data pages without the
> >> > FS mutex
> >>
> >> Are they able to distribute work to other cores?
> >
> > Yes, it just uses a workqueue.
>
> Unfortunately work queues don't do that by default currently. They
> tend to process on the current CPU only.
Well, I see multiple work queue threads using CPU time, but I haven't spent
much time optimizing it. There's definitely room for improvement.
>
> > The current implementation is pretty simple; it
> > could surely be more effective at spreading the work around.
> >
> > I'm testing a variant that only tosses over to the async queue for
> > pdflush, inline reclaim should stay inline.
>
> Longer term I would hope that write checksum will be basically free by
> doing csum-copy at write() time. The only problem is where to store
> the checksum between the write and the final IO? There's no space in
> struct page.
At write time is easier (except for mmap) because I can toss the csum directly
into the btree inside btrfs_file_write. The current code avoids that
complexity and does it all at writeout.
One advantage to the current code is that I'm able to optimize tree searches
away by checksumming a bunch of pages at a time. Multiple pages' worth of
checksums get stored in a single btree item, so at least for btree operations
the current code is fairly optimal.
>
> The same could also be done for read() but that might be a little more
> tricky because it would require delayed error reporting and it might
> be difficult to do this for partial blocks?
Yeah, it doesn't quite fit with how the kernel does reads. For now it is much
easier if the retry-other-mirror operation happens long before copy_to_user.
-chris
* Re: Multi-device update
2008-04-16 18:04 ` Chris Mason
@ 2008-04-16 18:10 ` Andi Kleen
2008-04-16 18:14 ` Jens Axboe
0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2008-04-16 18:10 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs
Chris Mason wrote:
> On Wednesday 16 April 2008, Andi Kleen wrote:
>> Chris Mason <chris.mason@oracle.com> writes:
>>> On Wednesday 16 April 2008, Andi Kleen wrote:
>>>> Chris Mason <chris.mason@oracle.com> writes:
>>>>> The async work queues include code to checksum data pages without the
>>>>> FS mutex
>>>> Are they able to distribute work to other cores?
>>> Yes, it just uses a workqueue.
>> Unfortunately work queues don't do that by default currently. They
>> tend to process on the current CPU only.
>
> Well, I see multiple work queue threads using CPU time, but I haven't spent
> much time optimizing it. There's definitely room for improvement.
That's likely because you submit from multiple CPUs. But with a single
submitter running on a single CPU there shouldn't be any load balancing
currently.
-Andi
* Re: Multi-device update
2008-04-16 18:10 ` Andi Kleen
@ 2008-04-16 18:14 ` Jens Axboe
2008-04-16 18:24 ` Andi Kleen
0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2008-04-16 18:14 UTC (permalink / raw)
To: Andi Kleen; +Cc: Chris Mason, linux-btrfs
On Wed, Apr 16 2008, Andi Kleen wrote:
> Chris Mason wrote:
> > On Wednesday 16 April 2008, Andi Kleen wrote:
> >> Chris Mason <chris.mason@oracle.com> writes:
> >>> On Wednesday 16 April 2008, Andi Kleen wrote:
> >>>> Chris Mason <chris.mason@oracle.com> writes:
> >>>>> The async work queues include code to checksum data pages without the
> >>>>> FS mutex
> >>>> Are they able to distribute work to other cores?
> >>> Yes, it just uses a workqueue.
> >> Unfortunately work queues don't do that by default currently. They
> >> tend to process on the current CPU only.
> >
> > Well, I see multiple work queue threads using CPU time, but I haven't spent
> > much time optimizing it. There's definitely room for improvement.
>
> That's likely because you submit from multiple CPUs. But with a single
> submitter running on a single CPU there shouldn't be any load balancing
> currently.
There have been various implementations of queue_work_on() posted
through the years, I've had one version that I've used off and on for a
long time:
http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=c68c42fd6df96f5b3fb5b8b47c571f233d054c71
then you need some balancing decider on top of that, of course.
--
Jens Axboe
* Re: Multi-device update
2008-04-16 18:14 ` Jens Axboe
@ 2008-04-16 18:24 ` Andi Kleen
2008-04-16 18:26 ` Jens Axboe
2008-04-16 18:28 ` Chris Mason
0 siblings, 2 replies; 10+ messages in thread
From: Andi Kleen @ 2008-04-16 18:24 UTC (permalink / raw)
To: Jens Axboe; +Cc: Chris Mason, linux-btrfs
> There have been various implementations of queue_work_on() posted
> through the years, I've had one version that I've used off and on for a
> long time:
queue_work_on is the wrong interface, I think. What you really want is a
pool of non-pinned threads that are then load balanced by the scheduler
(which knows best which CPUs have cycles available)
-Andi
* Re: Multi-device update
2008-04-16 18:24 ` Andi Kleen
@ 2008-04-16 18:26 ` Jens Axboe
2008-04-16 18:28 ` Chris Mason
1 sibling, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2008-04-16 18:26 UTC (permalink / raw)
To: Andi Kleen; +Cc: Chris Mason, linux-btrfs
On Wed, Apr 16 2008, Andi Kleen wrote:
>
> > There have been various implementations of queue_work_on() posted
> > through the years, I've had one version that I've used off and on for a
> > long time:
>
> queue_work_on is the wrong interface, I think. What you really want is a
> pool of non-pinned threads that are then load balanced by the scheduler
> (which knows best which CPUs have cycles available)
Yeah, that actually sounds like the best interface. What I described
typically ends up trying to be too clever, you really want to leave any
scheduling decisions to the scheduler.
--
Jens Axboe
* Re: Multi-device update
2008-04-16 18:24 ` Andi Kleen
2008-04-16 18:26 ` Jens Axboe
@ 2008-04-16 18:28 ` Chris Mason
1 sibling, 0 replies; 10+ messages in thread
From: Chris Mason @ 2008-04-16 18:28 UTC (permalink / raw)
To: Andi Kleen; +Cc: Jens Axboe, linux-btrfs
On Wednesday 16 April 2008, Andi Kleen wrote:
> > There have been various implementations of queue_work_on() posted
> > through the years, I've had one version that I've used off and on for a
> > long time:
>
> queue_work_on is the wrong interface, I think. What you really want is a
> pool of non-pinned threads that are then load balanced by the scheduler
> (which knows best which CPUs have cycles available)
Fair enough, I'll tune things a bit.
-chris