* [RFC] dm-userspace
@ 2006-04-19 19:48 Dan Smith
  2006-04-20 17:50 ` Eric Van Hensbergen
  0 siblings, 1 reply; 19+ messages in thread
From: Dan Smith @ 2006-04-19 19:48 UTC (permalink / raw)
  To: device-mapper development


Hi List,

As you all know, I'm working on a userspace-controlled CoW
implementation using device-mapper.  The idea is to offer a pseudo
device through device-mapper that has CoW behavior, but where the
block allocation decisions are made from userspace.

My thoughts are that it might be best to abandon the concept of the
"dm-cow" target and instead work on a "dm-userspace" target.  The
userspace cow application I'm working on would remain mostly the
same.  Similarly, dm-userspace would look almost identical to my
current dm-cow, but more generic.  The target would simply
present the details of the data passed to the map() function to
userspace, which would respond with a target device and sector of
where to send the request.
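
A minimal sketch of what such a userspace policy could look like.  It
assumes the kernel hands userspace only a sector number and expects a
(device, sector) pair back.  The function name `stripe_map`, the device
paths and the chunk size are illustrative assumptions, not part of any
posted dm-userspace interface, and RAID-0 striping is used only as an
example algorithm.

```python
from typing import List, Tuple


def stripe_map(sector: int, devices: List[str], chunk_sectors: int) -> Tuple[str, int]:
    """Hypothetical userspace policy: answer one map() query with (device, sector).

    Implements plain RAID-0 striping purely as an example algorithm.
    """
    if sector < 0 or chunk_sectors <= 0 or not devices:
        raise ValueError("invalid striping parameters")
    chunk = sector // chunk_sectors          # which chunk of the pseudo device
    within = sector % chunk_sectors          # offset inside that chunk
    device = devices[chunk % len(devices)]   # chunks rotate across the devices
    device_chunk = chunk // len(devices)     # chunk index on the chosen device
    return device, device_chunk * chunk_sectors + within
```

Swapping in a different function with the same signature is all it
would take to try another layout, which is the appeal of keeping the
decision in userspace.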

A generic dm-userspace target would allow for testing of new
algorithms (RAID, CoW, etc) from a userspace application, as well as
some more interesting things involving distributed applications.  Just
like FUSE allows for some neat (although not necessarily
high-performance) tricks, dm-userspace could allow the same thing for
block devices.

Would the device-mapper maintainers be interested in accepting
something like dm-userspace upstream?

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com


* Re: [RFC] dm-userspace
  2006-04-19 19:48 Dan Smith
@ 2006-04-20 17:50 ` Eric Van Hensbergen
  2006-04-20 20:06   ` Dan Smith
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Van Hensbergen @ 2006-04-20 17:50 UTC (permalink / raw)
  To: device-mapper development

On 4/19/06, Dan Smith <danms@us.ibm.com> wrote:
>
> A generic dm-userspace target would allow for testing of new
> algorithms (RAID, CoW, etc) from a userspace application, as well as
> some more interesting things involving distributed applications.  Just
> like FUSE allows for some neat (although not necessarily
> high-performance) tricks, dm-userspace could allow the same thing for
> block devices.
>
> Would the device-mapper maintainers be interested in accepting
> something like dm-userspace upstream?
>

It seems like this would be really useful for prototyping and
debugging new device mapper modules (like the dm-cache ideas I posted
about a few weeks back).  At the very least, I'd be interested in
using it.

A couple of questions:

1) how would you handle permissions?  IIRC FUSE allows normal users to
bind their own FUSE userspace file systems, would something similar
happen for dm-userspace or would even binding a userspace
device-mapper module require root?

2) How close would the userspace API be to the kernel device-mapper
API?  It'd be nice to have something close so that userspace code
could easily be migrated into the kernel (for performance reasons) as
appropriate.

3) When do you think you'll be able to post a patch for RFC?

           -eric

* Re: [RFC] dm-userspace
  2006-04-20 17:50 ` Eric Van Hensbergen
@ 2006-04-20 20:06   ` Dan Smith
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Smith @ 2006-04-20 20:06 UTC (permalink / raw)
  To: device-mapper development


EVH> It seems like this would be really useful for prototyping and
EVH> debugging new device mapper modules (like the dm-cache ideas I
EVH> posted about a few weeks back).

Yes, I think that is a major benefit to going with the dm-userspace
idea over the cow-specific dm-cow.

EVH> 1) how would you handle permissions?  IIRC FUSE allows normal
EVH> users to bind their own FUSE userspace file systems, would
EVH> something similar happen for dm-userspace or would even binding a
EVH> userspace device-mapper module require root?

Hmm, I think that due to the way device-mapper works, this would be
difficult.  Without the ability to create a pseudo-device with a
dm-userspace target, I think you'd be out of luck.

EVH> 2) How close would the userspace API be to the kernel
EVH> device-mapper API?  It'd be nice to have something close so that
EVH> userspace code could easily be migrated into the kernel (for
EVH> performance reasons) as appropriate.

Well, currently I pass basically the same information to userspace.
You get the location of the access and whether it was a read or a
write.  The userspace module passes back a destination location,
device, and whether or not to copy the area from a source location
(which gives you an interface to kcopyd).
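
The exchange described above can be sketched roughly as follows.  The
`Request` and `Response` types, their field names, and the trivial
allocator are assumptions made for illustration; the real message
layout is whatever the patch defines.  The `copy_from` field stands in
for the "copy the area from a source location" flag.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Request:
    block: int       # location of the access (hypothetical field name)
    is_write: bool   # True for a write, False for a read


@dataclass(frozen=True)
class Response:
    dest_device: str                       # where the kernel should send the I/O
    dest_block: int
    copy_from: Optional[Tuple[str, int]]   # (device, block) to copy first, or None


def cow_respond(req: Request, base_dev: str, cow_dev: str,
                allocated: Dict[int, int]) -> Response:
    """Toy copy-on-write policy; `allocated` maps logical block -> CoW block."""
    if req.block in allocated:
        # Already copied: reads and writes both go to the CoW device.
        return Response(cow_dev, allocated[req.block], None)
    if not req.is_write:
        # Untouched block: read straight from the base image.
        return Response(base_dev, req.block, None)
    # First write: allocate a CoW block and ask the kernel to copy the
    # original contents there before the write proceeds.
    new_block = len(allocated)
    allocated[req.block] = new_block
    return Response(cow_dev, new_block, (base_dev, req.block))
```

Note that no block data passes through this code; it only decides
where the data should go.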

I think that we could easily add a layer to be able to run simple
device-mapper modules in userspace with it, similar to how nfsim
works, which may be very useful to people trying to write new
device-mapper targets.  What are people's thoughts on this?

Something I should mention here: to simplify things and reduce
communication, the current module groups contiguous regions of the
disk into fixed-size blocks, so that you can talk about whole chunks at
a time instead of each individual bio request, which may vary in size
and location.  The block size can be no smaller than 512 bytes (one
sector).  I think that most device-mapper work will deal with fixed
blocks of some size, so this shouldn't be a problem.
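
As a rough illustration of the block grouping, the arithmetic below
turns a sector-based request into block numbers.  It assumes 512-byte
sectors and a block size that is a whole multiple of a sector (the
multiple-of-a-sector constraint is stated later in the thread); the
function names are hypothetical.

```python
from typing import List, Tuple

SECTOR_SIZE = 512  # bytes per sector


def block_of(sector: int, block_size: int) -> Tuple[int, int]:
    """Return (block number, sector offset inside that block)."""
    if block_size < SECTOR_SIZE or block_size % SECTOR_SIZE != 0:
        raise ValueError("block size must be a multiple of 512 bytes")
    sectors_per_block = block_size // SECTOR_SIZE
    return sector // sectors_per_block, sector % sectors_per_block


def blocks_touched(sector: int, nr_sectors: int, block_size: int) -> List[int]:
    """List every block that a request of nr_sectors starting at sector covers."""
    if nr_sectors < 1:
        raise ValueError("request must cover at least one sector")
    first, _ = block_of(sector, block_size)
    last, _ = block_of(sector + nr_sectors - 1, block_size)
    return list(range(first, last + 1))
```

A request that straddles a block boundary touches two blocks, so it
would need a mapping for each of them.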

EVH> 3) When do you think you'll be able to post a patch for RFC?

I'm just cleaning some things up at the moment.  I would be glad to
post a patch and a sample userspace app if people would be willing to
take a look at it.

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com


* [RFC] dm-userspace
@ 2006-04-26 22:45 ` Dan Smith
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Smith @ 2006-04-26 22:45 UTC (permalink / raw)
  To: linux-kernel; +Cc: device-mapper development


Xen needs to be able to directly access disk formats such as QEMU's
qcow, VMware's vmdk, and possibly others.  Most of these formats are
based on copy-on-write ideas, and thus have a base image and a bunch
of modified blocks stored elsewhere.  Presenting this to a virtual
machine transparently as a normal block device would be ideal.  The
solution I propose is to use device-mapper for redirecting block
accesses to the appropriate locations within either the base image or
the COW space, with the following constraints:

1. The block-allocation algorithm and formatting scheme should not be
   in the kernel.  This gives the most flexibility and puts the
   complexity in userspace.
2. Actual data flow should happen only in the kernel, and userspace
   should be able to control it without the blocks being passed back
   and forth.

So, I developed a generic device-mapper target called dm-userspace
which allows a userspace application to control the block mapping in a
mostly generic way.  With the functionality it provides, I was able to
write a userspace daemon that handles the mapping of blocks such that
a qcow file could be presented as a single block device, mounted and
accessed as if it were a normal disk.  If/when VMware releases their
vmdk spec under the GPL, adding support for it would be relatively
simple.  This would give us a unified block device to export to the
virtual machine, that would be backed by a complex format such as vmdk
or qcow.

In addition to providing support for the above scenario, dm-userspace
could be used for other things as well.  It's possible that new
device-mapper targets could be developed in userspace using a special
application that used dm-userspace to simulate the kernel
environment.  Additionally, filesystem debuggers may be able to use
dm-userspace to provide interactive control and logging of disk
writes. 

A patch against 2.6.16.9 to add dm-userspace to the kernel is
available here:

  http://static.danplanet.com/dm-userspace/dmu-2.6.16.9.patch

After you have a patched kernel, you can build the (very tiny) helper
library and example program, available here:

  http://static.danplanet.com/dm-userspace/libdmu-0.1.tar.gz

Comments would be appreciated :)

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com


* Re: [RFC] dm-userspace
  2006-04-26 22:45 ` Dan Smith
@ 2006-04-26 22:55   ` Ming Zhang
  -1 siblings, 0 replies; 19+ messages in thread
From: Ming Zhang @ 2006-04-26 22:55 UTC (permalink / raw)
  To: device-mapper development; +Cc: linux-kernel

Just curious: will speed be a problem here, considering that it needs
to contact user space each time it maps a piece of data?  And is the
size unit per sector in dm?

Do you have any benchmark results on the overhead?

ming


On Wed, 2006-04-26 at 15:45 -0700, Dan Smith wrote:
> Xen needs to be able to directly access disk formats such as QEMU's
> qcow, VMware's vmdk, and possibly others.  Most of these formats are
> based on copy-on-write ideas, and thus have a base image and a bunch
> of modified blocks stored elsewhere.  Presenting this to a virtual
> machine transparently as a normal block device would be ideal.  The
> solution I propose is to use device-mapper for redirecting block
> accesses to the appropriate locations within either the base image or
> the COW space, with the following constraints:
> 
> 1. The block-allocation algorithm and formatting scheme should not be
>    in the kernel.  This gives the most flexibility and puts the
>    complexity in userspace.
> 2. Actual data flow should happen only in the kernel, and userspace
>    should be able to control it without the blocks being passed back
>    and forth.
> 
> So, I developed a generic device-mapper target called dm-userspace
> which allows a userspace application to control the block mapping in a
> mostly generic way.  With the functionality it provides, I was able to
> write a userspace daemon that handles the mapping of blocks such that
> a qcow file could be presented as a single block device, mounted and
> accessed as if it were a normal disk.  If/when VMware releases their
> vmdk spec under the GPL, adding support for it would be relatively
> simple.  This would give us a unified block device to export to the
> virtual machine, that would be backed by a complex format such as vmdk
> or qcow.
> 
> In addition to providing support for the above scenario, dm-userspace
> could be used for other things as well.  It's possible that new
> device-mapper targets could be developed in userspace using a special
> application that used dm-userspace to simulate the kernel
> environment.  Additionally, filesystem debuggers may be able to use
> dm-userspace to provide interactive control and logging of disk
> writes. 
> 
> A patch against 2.6.16.9 to add dm-userspace to the kernel is
> available here:
> 
>   http://static.danplanet.com/dm-userspace/dmu-2.6.16.9.patch
> 
> After you have a patched kernel, you can build the (very tiny) helper
> library and example program, available here:
> 
>   http://static.danplanet.com/dm-userspace/libdmu-0.1.tar.gz
> 
> Comments would be appreciated :)
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel


* Re: [RFC] dm-userspace
  2006-04-26 22:55   ` [dm-devel] " Ming Zhang
@ 2006-04-26 23:07     ` Dan Smith
  -1 siblings, 0 replies; 19+ messages in thread
From: Dan Smith @ 2006-04-26 23:07 UTC (permalink / raw)
  To: mingz; +Cc: device-mapper development, linux-kernel


MZ> just curious, will the speed be a problem here? 

I'm glad you asked... :)

MZ> considering each time it needs to contact user space for mapping a
MZ> piece of data. 

Actually, that's not the case.  The idea is for mappings to be cached
in the kernel module so that the communication with userspace only
needs to happen once per block.  The thought is to ask once for a
read, and then remember that mapping until a write to that block
happens, which might change the mapping.  In that case, we ask
userspace again.

Right now, the kernel module expires mappings in a pretty brain-dead
way to make sure the list doesn't get too long.  An intelligent data
structure and expiration method would probably improve performance
quite a bit.
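
A sketch of the caching idea, written as userspace Python rather than
kernel code.  The class name, the `ask_userspace` callback, and the
choice of least-recently-used expiration are all assumptions; LRU is
shown only as one possible "intelligent" expiration method and is not
what the current module does.

```python
from collections import OrderedDict
from typing import Callable, Tuple

Dest = Tuple[str, int]  # (device, block)


class MappingCache:
    """Remember block mappings so userspace is asked as rarely as possible."""

    def __init__(self, ask_userspace: Callable[[int, bool], Dest], capacity: int = 1024):
        self._ask = ask_userspace
        self._capacity = capacity
        self._entries: "OrderedDict[int, Tuple[Dest, bool]]" = OrderedDict()
        self.misses = 0

    def lookup(self, block: int, is_write: bool) -> Dest:
        entry = self._entries.get(block)
        # A mapping obtained for a write also serves reads; a mapping
        # obtained for a read must be refreshed before the first write.
        if entry is not None and (entry[1] or not is_write):
            self._entries.move_to_end(block)
            return entry[0]
        self.misses += 1
        dest = self._ask(block, is_write)
        self._entries[block] = (dest, is_write)
        self._entries.move_to_end(block)
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop the least recently used
        return dest
```

With this structure a block that has been mapped for writing never
triggers another round trip until it is evicted.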

I don't have any benchmark data to post right now.  I did some quick
analysis a while back and found it to be not too bad.  When using loop
devices as a backing store, I achieved performance as high as a little
under 50% of native.

MZ> and the size unit is per sector in dm?

Well, for qcow it is a sector, yes.  The module itself, however, can
use any block size (as long as it is a multiple of a sector).  Before
I started work on qcow support, I wrote a test application that used
2MiB blocks, which is where I got the approximately 50% performance
value I described above.

Our thought is that this would mostly be used for the OS images of
virtual machines.  Those shouldn't change much, which would help to
avoid constantly asking userspace to map blocks.

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com


* Re: [RFC] dm-userspace
  2006-04-26 23:07     ` [dm-devel] " Dan Smith
@ 2006-04-26 23:41       ` Ming Zhang
  -1 siblings, 0 replies; 19+ messages in thread
From: Ming Zhang @ 2006-04-26 23:41 UTC (permalink / raw)
  To: Dan Smith; +Cc: device-mapper development, linux-kernel



On Wed, 2006-04-26 at 16:07 -0700, Dan Smith wrote:
> MZ> just curious, will the speed be a problem here? 
> 
> I'm glad you asked... :)
> 
> MZ> considering each time it needs to contact user space for mapping a
> MZ> piece of data. 
> 
> Actually, that's not the case.  The idea is for mappings to be cached
> in the kernel module so that the communication with userspace only
> needs to happen once per block.  The thought is to ask once for a
> read, and then remember that mapping until a write happens, which
> might change the story.  If so, we ask userspace again.

Sounds reasonable.  I see the caching now.


> 
> Right now, the kernel module expires mappings in a pretty brain-dead
> way to make sure the list doesn't get too long.  An intelligent data
> structure and expiration method would probably improve performance
> quite a bit.
> 
> I don't have any benchmark data to post right now.  I did some quick
> analysis a while back and found it to be not too bad.  When using loop
> devices as a backing store, I achieved performance as high as a little
> under 50% of native.

Oh. :P 50% is a considerable overhead.  Anyway, good start. ;)


> 
> MZ> and the size unit is per sector in dm?
> 
> Well, for qcow it is a sector, yes.  The module itself, however, can
> use any block size (as long as it is a multiple of a sector).  Before
> I started work on qcow support, I wrote a test application that used
> 2MiB blocks, which is where I got the approximately 50% performance
> value I described above.

Was that pure read, or mixed read and write?


> 
> Our thought is that this would mostly be used for the OS images of
> virtual machines, which shouldn't change much, which would help to
> prevent constantly asking userspace to map blocks.
> 

If this is the scenario, then maybe a more aggressive mapping strategy
can be used here.

You might be interested in this: some developers are working on a
general SCSI target layer that passes SCSI CDBs to user space for
processing while keeping the data transfer in kernel space.  Both
projects will face the same overhead, so they might learn from each
other on this.

PS, a trivial thing: userspace_request is used frequently and could
come from a slab cache.


ming


* Re: [RFC] dm-userspace
  2006-04-26 23:41       ` [dm-devel] " Ming Zhang
@ 2006-04-27  2:22         ` Dan Smith
  -1 siblings, 0 replies; 19+ messages in thread
From: Dan Smith @ 2006-04-27  2:22 UTC (permalink / raw)
  To: mingz; +Cc: device-mapper development, linux-kernel


MZ> o. :P 50% is a considerable amount. anyway, good start. ;)

Indeed, it is a considerable performance hit, but I haven't really
done much in the way of a serious performance analysis.

MZ> pure read or read and write mixed?

Actually IIRC, that was the write performance only (I used bonnie++ to
get the numbers).  I believe the read performance is generally good
for large blocks.  If the block is already mapped for write, then you
get the reads for free.  I really should resurrect my older tests and
see if I can produce something more current :)

My previous numbers were gathered by using an additional step of
actually rewriting the device-mapper table periodically, using
dm-linear to statically map blocks that were mapped for writing.  I
think that with a better data structure in dm-userspace (i.e. better
than a linked-list), performance will be better without the need to
constantly suspend and resume the device to change tables.
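
For reference, the table-rewriting step could be sketched like this.
It only generates dm-linear table lines in the standard
"start length linear device offset" form for blocks that already have
a write mapping; how the remaining ranges are expressed, and the helper
names used here, are left as assumptions.

```python
from typing import Dict, List, Tuple


def _format(run: list, sectors_per_block: int) -> str:
    start, device, dest, length = run
    return (f"{start * sectors_per_block} {length * sectors_per_block} "
            f"linear {device} {dest * sectors_per_block}")


def linear_table_lines(mapped: Dict[int, Tuple[str, int]],
                       sectors_per_block: int) -> List[str]:
    """Turn {logical block: (device, dest block)} into dm-linear table lines.

    Neighbouring blocks that are also adjacent on the same destination
    device are merged into a single line.
    """
    lines: List[str] = []
    run = None  # [logical start block, device, dest start block, length in blocks]
    for block in sorted(mapped):
        device, dest = mapped[block]
        if (run is not None and run[1] == device
                and block == run[0] + run[3] and dest == run[2] + run[3]):
            run[3] += 1  # extends the current contiguous run
            continue
        if run is not None:
            lines.append(_format(run, sectors_per_block))
        run = [block, device, dest, 1]
    if run is not None:
        lines.append(_format(run, sectors_per_block))
    return lines
```

Every reload of such a table needs a suspend and resume of the device,
which is the cost the paragraph above hopes to avoid.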

MZ> if this is the scenario, then may be more aggressive mapping can
MZ> be used here.

Right, so the userspace side may be able to improve performance by
mapping blocks in advance.  If it is believed that the next several
blocks will be written to sequentially, the userspace app can push
mappings for those in the same message as the response to the initial
block, which would eliminate several additional requests.
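
A small sketch of that idea, assuming a response may carry a list of
mappings rather than a single one.  The function name, the `allocate`
callback and the list-of-pairs response shape are hypothetical.

```python
from typing import Callable, List, Tuple

Dest = Tuple[str, int]  # (device, block)


def build_response(first_block: int, allocate: Callable[[int], Dest],
                   lookahead: int, total_blocks: int) -> List[Tuple[int, Dest]]:
    """Map the requested block plus up to `lookahead` following blocks."""
    if lookahead < 0 or not 0 <= first_block < total_blocks:
        raise ValueError("invalid request")
    # Never map past the end of the pseudo device.
    last = min(first_block + lookahead, total_blocks - 1)
    return [(block, allocate(block)) for block in range(first_block, last + 1)]
```

With a lookahead of N, a sequential write of N+1 blocks would cost one
request to userspace instead of N+1.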

Perhaps something could be done with certain CoW formats that would
allow the userspace app to push a bunch of mappings that it believes
might be needed, and then have the kernel report back later which were
actually used.  In that case, you could reclaim space in the CoW
device that you incorrectly predicted would be needed.
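
The reclaim step might look something like the following.  This is
purely speculative: the kernel-side "which mappings were used" report
does not exist in the posted patch as far as this thread says, and the
names below are made up for illustration.

```python
from typing import Dict, List, Set


def reclaim_unused(pushed: Dict[int, int], used_blocks: Set[int],
                   free_list: List[int]) -> Dict[int, int]:
    """Keep the speculative mappings the kernel used; free the rest.

    `pushed` maps logical block -> CoW block handed out in advance.
    CoW blocks whose mapping was never used are appended to `free_list`.
    """
    kept: Dict[int, int] = {}
    for block, cow_block in pushed.items():
        if block in used_blocks:
            kept[block] = cow_block
        else:
            free_list.append(cow_block)  # wrongly predicted, reclaim the space
    return kept
```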

MZ> u might have interest on this. some developers are working on a
MZ> general scsi target layer that pass scsi cdb to user space for
MZ> processing while keep data transfer in kernel space. so both of u
MZ> will meet same overhead here. so 2 projects might learn from each
MZ> other on this.

Great!

MZ> ps, trivial thing, the userspace_request is frequently used and
MZ> can use a slab cache.

Ah, ok, good point... thanks ;)

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com


* Re: [dm-devel] [RFC] dm-userspace
@ 2006-04-27  2:22         ` Dan Smith
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Smith @ 2006-04-27  2:22 UTC (permalink / raw)
  To: mingz; +Cc: device-mapper development, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2267 bytes --]

MZ> o. :P 50% is a considerable amount. anyway, good start. ;)

Indeed, it is a considerable performance hit, but I haven't really
done much in the way of a serious performance analysis.

MZ> pure read or read and write mixed?

Actually IIRC, that was the write performance only (I used bonnie++ to
get the numbers).  I believe the read performance is generally good
for large blocks.  If the block is already mapped for write, then you
get the reads for free.  I really should resurrect my older tests and
see if I can produce something more current :)

My previous numbers were gathered by using an additional step of
actually rewriting the device-mapper table periodically, using
dm-linear to statically map blocks that were mapped for writing.  I
think that with a better data structure in dm-userspace (i.e. better
than a linked-list), performance will be better without the need to
constantly suspend and resume the device to change tables.

MZ> if this is the scenario, then may be more aggressive mapping can
MZ> be used here.

Right, so the userspace side may be able to improve performance by
mapping blocks in advance.  If it is believed that the next several
blocks will be written to sequentially, the userspace app can push
mappings for those in the same message as the response to the initial
block, which would eliminate several additional requests.

Perhaps something could be done with certain CoW formats that would
allow the userspace app to push a bunch of mappings that it believes
might be needed, and then have the kernel report back later which were
actually used.  In that case, you could reclaim space in the CoW
device that you incorrectly predicted would be needed.

MZ> u might have interest on this. some developers are working on a
MZ> general scsi target layer that pass scsi cdb to user space for
MZ> processing while keep data transfer in kernel space. so both of u
MZ> will meet same overhead here. so 2 projects might learn from each
MZ> other on this.

Great!

MZ> ps, trivial thing, the userspace_request is frequently used and
MZ> can use a slab cache.

Ah, ok, good point... thanks ;)

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] dm-userspace
  2006-04-27  2:22         ` [dm-devel] " Dan Smith
@ 2006-04-27 13:09           ` Ming Zhang
  -1 siblings, 0 replies; 19+ messages in thread
From: Ming Zhang @ 2006-04-27 13:09 UTC (permalink / raw)
  To: Dan Smith; +Cc: device-mapper development, linux-kernel

On Wed, 2006-04-26 at 19:22 -0700, Dan Smith wrote:
> MZ> o. :P 50% is a considerable amount. anyway, good start. ;)
> 
> Indeed, it is a considerable performance hit, but I haven't really
> done much in the way of a serious performance analysis.
> 
> MZ> pure read or read and write mixed?
> 
> Actually IIRC, that was the write performance only (I used bonnie++ to
> get the numbers).  I believe the read performance is generally good
> for large blocks.  If the block is already mapped for write, then you
> get the reads for free.  I really should resurrect my older tests and
> see if I can produce something more current :)

yes, considering you load a mapping for every 2MB data block, it
should be close to dm-linear for sequential read.
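
To put a number on that, here is a small C sketch that counts how many
mapping requests a sequential transfer would need at a given block
size.  The function name is made up for illustration, and the 4KB
figure in the note below is just an assumed "small" block size, not a
value stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of mapping requests needed to cover `bytes` of sequential I/O
 * that starts on a block boundary, assuming one request per block and
 * no prefetching of mappings.
 */
static uint64_t map_requests(uint64_t bytes, uint64_t block_size)
{
    assert(block_size > 0);
    return (bytes + block_size - 1) / block_size;   /* round up */
}
```

For a 20MB transfer, map_requests(20 << 20, 2 << 20) gives 10 requests
with 2MB blocks, while an assumed 4KB block size would need 5120.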

> 
> My previous numbers were gathered by using an additional step of
> actually rewriting the device-mapper table periodically, using
> dm-linear to statically map blocks that were mapped for writing.  I
> think that with a better data structure in dm-userspace (i.e. better
> than a linked-list), performance will be better without the need to
> constantly suspend and resume the device to change tables.

ic. sounds reasonable.

> 
> MZ> if this is the scenario, then may be more aggressive mapping can
> MZ> be used here.
> 
> Right, so the userspace side may be able to improve performance by
> mapping blocks in advance.  If it is believed that the next several
> blocks will be written to sequentially, the userspace app can push
> mappings for those in the same message as the response to the initial
> block, which would eliminate several additional requests.

this is like the prefetch of mapping information.

> 
> Perhaps something could be done with certain CoW formats that would
> allow the userspace app to push a bunch of mappings that it believes
> might be needed, and then have the kernel report back later which were
> actually used.  In that case, you could reclaim space in the CoW
> device that you incorrectly predicted would be needed.

right. and i think this might be unrelated to the CoW format. it
depends solely on the mapping logic in user space to do the intentional
allocation, tracking, and cleaning.

> 
> MZ> u might have interest on this. some developers are working on a
> MZ> general scsi target layer that pass scsi cdb to user space for
> MZ> processing while keep data transfer in kernel space. so both of u
> MZ> will meet same overhead here. so 2 projects might learn from each
> MZ> other on this.
> 
> Great!

project name is stgt, you can find it at berlios.de, which is down right
now. :P


> 
> MZ> ps, trivial thing, the userspace_request is frequently used and
> MZ> can use a slab cache.
> 
> Ah, ok, good point... thanks ;)
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] dm-userspace
  2006-04-26 22:55   ` [dm-devel] " Ming Zhang
@ 2006-05-09 23:02     ` Dan Smith
  -1 siblings, 0 replies; 19+ messages in thread
From: Dan Smith @ 2006-05-09 23:02 UTC (permalink / raw)
  To: mingz; +Cc: device-mapper development, Xen Developers, linux-kernel


(I'm including the xen-devel list on this, as things are starting to
get interesting).

MZ> do u have any benchmark results about overhead?

So, I've spent some time over the last week working to improve
performance and collect some benchmark data.

I moved to using slab caches for the request and remap objects, which
helped a little.  I also added a poll() method to the control device,
which improved performance significantly.  Finally, I changed the
internal remap storage data structure to a hash table, which had a
very large performance impact (about 8x).
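
For readers unfamiliar with the pattern, this is a minimal sketch of
how a userspace daemon can block in poll() until a descriptor has data
and only then read from it.  The control-device interface itself is
not shown in this message, so the fd here is a stand-in for it and the
helper names (wait_readable, read_request) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until fd is readable.  Returns 1 if data is ready, 0 on
 * timeout, -1 on error (including hangup with nothing left to read).
 * A negative timeout_ms blocks indefinitely.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    assert(fd >= 0);
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* retry if a signal hit */

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/*
 * Sleep until a request is available, then read it.  Returns the
 * number of bytes read, 0 on timeout, or -1 on error.
 */
static ssize_t read_request(int fd, void *buf, size_t len, int timeout_ms)
{
    int ready = wait_readable(fd, timeout_ms);

    if (ready <= 0)
        return ready;
    return read(fd, buf, len);
}
```

The point of the pattern is that the daemon sleeps in the kernel while
no requests are pending instead of spinning on read().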

Copying data to a device backed by dm-userspace presents a worst-case
scenario, especially with a small block-size like what qcow uses.  In
one of my tests, I copy about 20MB of data to a dm-userspace device,
backed by files hooked up to the loopback driver.  I compare this with
a "control" of a single loop-mounted image file (i.e., without
dm-userspace or CoW).  I measured the time to mount, copy, and unmount
the device, which (with the recent performance improvements) are
approximately:

  Normal Loop:        1 second
  dm-userspace/qcow: 10 seconds

For comparison, before adding poll() and the hash table, the
dm-userspace number was over 70 seconds.

One of the most interesting cases for us, however, is providing a
CoW-based VM disk image, which is mostly used for reading, with a
small amount of writing for configuration data.  To test this, I used
Xen to time a fresh FC4 boot (firstboot, where things like SSH keys
are generated and written to disk), comparing an LVM volume as root
against dm-userspace (with loopback files) as root.  The numbers are
approximately:

  LVM root:          26 seconds
  dm-userspace/qcow: 27 seconds

Note that this does not yet include any read-ahead type behavior, nor
does it include priming the kernel module with remaps at create-time
(which results in a few initial compulsory "misses").  Also, I removed
the remap expiration functionality while adding the hash table and
have not yet added it back; restoring it may further improve
performance when there are large numbers of remaps (and bucket
collisions).

Here is a link to a patch against 2.6.16.14:

  http://static.danplanet.com/dm-userspace/dmu-2.6.16.14-patch

Here are links to the userspace library, as well as the cow daemon,
which provides qcow support:

  http://static.danplanet.com/dm-userspace/libdmu-0.2.tar.gz
  http://static.danplanet.com/dm-userspace/cowd-0.1.tar.gz

(Note that the daemon is still rather rough, and the qcow
implementation has some bugs.  However, it works for light testing and
the occasional luck-assisted heavy testing)

As always, comments welcome and appreciated :)

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] dm-userspace
  2006-05-09 23:02     ` [dm-devel] " Dan Smith
@ 2006-05-10 13:27       ` Ming Zhang
  -1 siblings, 0 replies; 19+ messages in thread
From: Ming Zhang @ 2006-05-10 13:27 UTC (permalink / raw)
  To: Dan Smith; +Cc: device-mapper development, Xen Developers, linux-kernel

On Tue, 2006-05-09 at 16:02 -0700, Dan Smith wrote:
> (I'm including the xen-devel list on this, as things are starting to
> get interesting).
> 
> MZ> do u have any benchmark results about overhead?
> 
> So, I've spent some time over the last week working to improve
> performance and collect some benchmark data.
> 
> I moved to using slab caches for the request and remap objects, which
> helped a little.  I also added a poll() method to the control device,
> which improved performance significantly.  Finally, I changed the
> internal remap storage data structure to a hash table, which had a
> very large performance impact (about 8x).


a dumb question: why is a poll needed here?

this is interesting. have u ever checked the average lookup path length
with a single queue versus the hash table? an 8x improvement is quite
impressive.
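
one way to measure that, as a rough sketch in C: build a chained hash
table of remaps and count the nodes visited per successful lookup. with
one bucket it behaves as a single list. the struct names, the modulo
hash, and the sequential block numbers are all assumptions made for
illustration, not the code in the dm-userspace patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct remap {
    uint64_t block;
    struct remap *next;
};

struct remap_table {
    size_t nbuckets;
    struct remap **buckets;
};

static size_t bucket_of(const struct remap_table *t, uint64_t block)
{
    return (size_t)(block % t->nbuckets);   /* assumed trivial hash */
}

static void table_free(struct remap_table *t)
{
    for (size_t b = 0; b < t->nbuckets; b++) {
        struct remap *r = t->buckets[b];
        while (r) {
            struct remap *next = r->next;
            free(r);
            r = next;
        }
    }
    free(t->buckets);
}

/* Insert at the head of the bucket chain; returns 0 or -1 on ENOMEM. */
static int table_insert(struct remap_table *t, uint64_t block)
{
    struct remap *r = malloc(sizeof(*r));
    size_t b = bucket_of(t, block);

    if (!r)
        return -1;
    r->block = block;
    r->next = t->buckets[b];
    t->buckets[b] = r;
    return 0;
}

/* Nodes visited to find block, or 0 if it is not in the table. */
static size_t table_lookup_steps(const struct remap_table *t,
                                 uint64_t block)
{
    size_t steps = 0;

    for (struct remap *r = t->buckets[bucket_of(t, block)]; r; r = r->next) {
        steps++;
        if (r->block == block)
            return steps;
    }
    return 0;
}

/*
 * Insert blocks 0..nblocks-1, look each one up once, and return the
 * average number of nodes visited.  Returns -1.0 on allocation failure.
 */
static double avg_lookup_steps(size_t nbuckets, uint64_t nblocks)
{
    struct remap_table t;
    uint64_t total = 0;

    assert(nbuckets > 0);
    t.nbuckets = nbuckets;
    t.buckets = calloc(nbuckets, sizeof(*t.buckets));
    if (!t.buckets)
        return -1.0;

    for (uint64_t b = 0; b < nblocks; b++) {
        if (table_insert(&t, b) != 0) {
            table_free(&t);
            return -1.0;
        }
    }
    for (uint64_t b = 0; b < nblocks; b++)
        total += table_lookup_steps(&t, b);

    table_free(&t);
    return nblocks ? (double)total / (double)nblocks : 0.0;
}
```

for 1024 evenly spread remaps this gives an average of 512.5 nodes
with a single list and 8.5 with 64 buckets. real workloads will not
spread this evenly, so the numbers only show the shape of the effect.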



> 
> Copying data to a device backed by dm-userspace presents a worst-case
> scenario, especially with a small block-size like what qcow uses.  In
> one of my tests, I copy about 20MB of data to a dm-userspace device,
> backed by files hooked up to the loopback driver.  I compare this with
> a "control" of a single loop-mounted image file (i.e., without
> dm-userspace or CoW).  I measured the time to mount, copy, and unmount
> the device, which (with the recent performance improvements) are
> approximately:
> 
>   Normal Loop:        1 second
>   dm-userspace/qcow: 10 seconds
> 
> For comparison, before adding poll() and the hash table, the
> dm-userspace number was over 70 seconds.

nice improvement!

> 
> One of the most interesting cases for us, however, is providing a
> CoW-based VM disk image, which is mostly used for reading, with a
> small amount of writing for configuration data.  To test this, I used
> Xen to compare a fresh FC4 boot (firstboot, where things like SSH keys
> are generated and written to disk) that used an LVM volume as root to
> using dm-userspace (and loopback-files) as the root.  The numbers are
> approximately:
> 
>   LVM root:          26 seconds
>   dm-userspace/qcow: 27 seconds

this is quite impressive. i think the application takes most of the
time, and some of that time overlaps with io. with so little io here, a
small difference like this is what u would expect. i think this will be
very helpful for diskless san boot.



> 
> Note that this does not yet include any read-ahead type behavior, nor
> does it include priming the kernel module with remaps at create-time
> (which results in a few initial compulsory "misses").  Also, I removed
> the remap expiration functionality while adding the hash table and
> have not yet added it back, so that may further improve performance
> for large amounts of remaps (and bucket collisions).
> 
> Here is a link to a patch against 2.6.16.14:
> 
>   http://static.danplanet.com/dm-userspace/dmu-2.6.16.14-patch
> 
> Here are links to the userspace library, as well as the cow daemon,
> which provides qcow support:
> 
>   http://static.danplanet.com/dm-userspace/libdmu-0.2.tar.gz
>   http://static.danplanet.com/dm-userspace/cowd-0.1.tar.gz
> 
> (Note that the daemon is still rather rough, and the qcow
> implementation has some bugs.  However, it works for light testing and
> the occasional luck-assisted heavy testing)
> 
> As always, comments welcome and appreciated :)
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2006-05-10 13:28 UTC | newest]

Thread overview: 19+ messages
2006-04-26 22:45 [RFC] dm-userspace Dan Smith
2006-04-26 22:45 ` Dan Smith
2006-04-26 22:55 ` Ming Zhang
2006-04-26 22:55   ` [dm-devel] " Ming Zhang
2006-04-26 23:07   ` Dan Smith
2006-04-26 23:07     ` [dm-devel] " Dan Smith
2006-04-26 23:41     ` Ming Zhang
2006-04-26 23:41       ` [dm-devel] " Ming Zhang
2006-04-27  2:22       ` Dan Smith
2006-04-27  2:22         ` [dm-devel] " Dan Smith
2006-04-27 13:09         ` Ming Zhang
2006-04-27 13:09           ` [dm-devel] " Ming Zhang
2006-05-09 23:02   ` Dan Smith
2006-05-09 23:02     ` [dm-devel] " Dan Smith
2006-05-10 13:27     ` Ming Zhang
2006-05-10 13:27       ` [dm-devel] " Ming Zhang
  -- strict thread matches above, loose matches on Subject: below --
2006-04-19 19:48 Dan Smith
2006-04-20 17:50 ` Eric Van Hensbergen
2006-04-20 20:06   ` Dan Smith