snapshot scalability

* snapshot scalability
       [not found] ` <44DB5BDD020000B60000BD96@lucius.provo.novell.com>
@ 2006-08-10 10:46   ` Haripriya S
  2006-09-27  5:24     ` dm-snapshot scalability - chained delta snapshots approach Haripriya S
  0 siblings, 1 reply; 9+ messages in thread
From: Haripriya S @ 2006-08-10 10:46 UTC (permalink / raw)
  To: dm-devel

Hi,

A co-worker recently did some tests on DM snapshots using bonnie, and
here is a rough summary of what he got as write throughput:

No Snapshots     - 373 MB/s
One Snapshot     -  55 MB/s
Two Snapshots    -  16 MB/s
Four Snapshots   -  14 MB/s
Eight Snapshots  - 6.5 MB/s

He is doing some more tests now to verify these results, but I wanted
to quickly check with the dm snapshot community. Are there any currently
known scalability limits on snapshots, and do the numbers mentioned here
look normal?

Thanks and Regards,
Haripriya

* Re: dm-snapshot scalability - chained delta snapshots approach
  2006-08-10 10:46   ` snapshot scalability Haripriya S
@ 2006-09-27  5:24     ` Haripriya S
  2006-09-27  9:17       ` Jan Blunck
  0 siblings, 1 reply; 9+ messages in thread
From: Haripriya S @ 2006-09-27  5:24 UTC (permalink / raw)
  To: dm-devel

Hi,

I had previously put out some performance numbers for origin writes
where performance goes down drastically w.r.t. the number of snapshots.
Going further, we identified one of the reasons for the performance drop
with increase in number of snapshots as the COW copies that happen to
every snapshot COW device when an origin write happens.

We have currently experimented with dm-snapshot code with two different
approaches and have got good performance numbers. I describe the first
approach and the results here and appreciate your opinions and inputs on
this.

Approach 1 - Chained delta snapshots

In the current design, each snapshot COW device contains all the diffs
from the origin as exceptions. In the new scheme, each snapshot COW
device contains only the delta diffs from the previous snapshot. So,
assuming an origin has 16 snapshots, with the current design 16 COW copies
will be done for an origin write that triggers copy-on-write. With the new
scheme, only 1 COW copy will be done, which means the performance will not
degrade so rapidly when the number of snapshots increases.
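
As a rough illustration of the paragraph above, here is a toy count of COW
copies under the two designs. It is not dm-snapshot code; the function names
are made up for this sketch, and it assumes every written chunk is being
written for the first time since the snapshots were taken.

```python
def cow_copies_current(num_snapshots: int, chunks_written: int) -> int:
    """Current design: each first write to an origin chunk copies the old
    data into the COW device of every snapshot."""
    return num_snapshots * chunks_written


def cow_copies_chained(num_snapshots: int, chunks_written: int) -> int:
    """Chained scheme: only the most recent snapshot receives the copy."""
    return chunks_written if num_snapshots > 0 else 0


if __name__ == "__main__":
    for n in (1, 4, 8, 16):
        print(n, cow_copies_current(n, 1000), cow_copies_chained(n, 1000))
```

In this model the copy count grows linearly with the number of snapshots in
the current design and stays constant in the chained one.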

Let's assume the snapshots for a given origin volume are chained in
order of creation. We define two chains: the read chain follows the
snapshot creation order, and the write chain is the reverse order.

Origin write: 
When an origin write happens, for the copy-on-write, the current scheme
creates pending exceptions for every snapshot in the chain. In the new
scheme, we create a copy-on-write exception for that block only in the
first snapshot in the write chain (the most recent snapshot).

Snapshot write:
If the snapshot already contains an exception for the given block, and
it was created due to a copy-on-write, then that block is copied to the
previous snapshot (the next snapshot in the write chain). Otherwise the
exception is created or the block is overwritten.

Snapshot read:
If an exception for the block is found in the current snapshot's COW,
then use that. Else traverse through the read chain and use the first
exception for that block. If the block is not found in any of them, then
use the origin.

Origin read:
No change
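
The origin-write and snapshot-read rules above can be sketched as a small
in-memory model. This is only a sketch under stated assumptions: the class
and method names are invented here, blocks are plain dictionary entries, and
snapshot writes (which need the extra "created by COW" flag) are left out.

```python
class ChainedSnapshots:
    """Toy model of the chained delta scheme (origin writes, snapshot reads)."""

    def __init__(self, origin):
        self.origin = dict(origin)   # block number -> data
        self.cows = []               # one exception dict per snapshot, oldest first
        self.copies = 0              # COW copies performed so far

    def create_snapshot(self) -> int:
        self.cows.append({})
        return len(self.cows) - 1    # index in creation order

    def write_origin(self, block, data) -> None:
        if self.cows:
            newest = self.cows[-1]   # first snapshot in the write chain
            if block not in newest:
                # Save the old data once, in the most recent snapshot only.
                newest[block] = self.origin.get(block)
                self.copies += 1
        self.origin[block] = data

    def read_snapshot(self, snap: int, block):
        # Own COW first, then newer snapshots along the read chain,
        # and finally the origin if no exception exists anywhere.
        for cow in self.cows[snap:]:
            if block in cow:
                return cow[block]
        return self.origin.get(block)
```

For example, with two snapshots and an origin block written after the second
one was created, the older snapshot finds the old data in the newer
snapshot's COW by walking the read chain.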

Advantages:
1. Very simple, adds very few lines of code to existing dm-snap code.
2. Does not change the dm-snapshot architecture, and no changes
required in LVM or EVMS
3. Since the COW copies due to origin writes will always go to the most
recent snapshot, snapshot COW devices can be created with a smaller size.
Whenever the COW allocation increases beyond, say, 90%, a new snapshot can
be created which will take all the subsequent COW copies. This may avoid
making COW devices invalid.
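
A minimal sketch of the policy in item 3, assuming some monitoring tool
already knows how many chunks of the newest COW device are in use. The
function name and the default threshold are illustrative only; how the usage
figures are obtained is not covered here.

```python
def should_start_new_snapshot(used_chunks: int, total_chunks: int,
                              threshold: float = 0.90) -> bool:
    """Return True once the newest COW device is fuller than `threshold`."""
    if total_chunks <= 0:
        raise ValueError("total_chunks must be positive")
    return used_chunks / total_chunks > threshold
```

A caller would create a fresh snapshot when this returns True, so that
subsequent COW copies land in the new, empty COW device.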

Disadvantages:
1. Snapshots which were independent previously are now dependent on
each other. Corruption of one COW device will affect the other snapshots
as well.
2. Will have a small impact on snapshot read performance. Since exceptions
are currently kept in memory (if I understood right), this may not be big.
3. There is a need to change the disk exception structure (we need at
least a bit to indicate that a particular exception was created because
of COW copy, instead of due to a snapshot write). But the comments in
exception-store.c say 
    * There is no backward or forward compatibility implemented,
    * snapshots with different disk versions than the kernel will
    * not be usable.  It is expected that "lvcreate" will blank out
    * the start of a fresh COW device before calling the snapshot
    * constructor.
so this may not be a huge problem.
4. When snapshots are deleted the COW exceptions have to be transferred
to the next snapshot in the write chain.

I have prototype code for this approach which works OK for the
read/write paths, but it has not been tested very thoroughly. There is
still more work to be done in terms of snapshot deletion etc.
Preliminary results using this code suggest that the scalability
of origin writes w.r.t. the number of snapshots has improved tremendously.

Preliminary numbers:

Origin write (using dd)    Chained delta snapshot prototype    Current DM design
   1 snapshot                       933 KB/s                       950 KB/s
   4 snapshots                      932 KB/s                       720 KB/s
   8 snapshots                      927 KB/s                       470 KB/s
  16 snapshots                      905 KB/s                       257 KB/s
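
To make the comparison easier to read, the ratio between the two columns can
be computed directly from the numbers above. This is plain arithmetic on the
posted figures; the variable and function names are chosen for this sketch.

```python
# snapshots -> (prototype KB/s, current design KB/s), as posted above
results_kb_s = {1: (933, 950), 4: (932, 720), 8: (927, 470), 16: (905, 257)}


def speedup(prototype: float, current: float) -> float:
    """Throughput of the prototype relative to the current design."""
    return prototype / current


if __name__ == "__main__":
    for n, (proto, cur) in results_kb_s.items():
        print(f"{n:2d} snapshots: {speedup(proto, cur):.2f}x")
```

With one snapshot the two designs are roughly on par, and at 16 snapshots the
prototype is about 3.5 times faster in this test.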

We would love to hear your comments on this approach.

Thanks and Regards,
Haripriya S.


* Re: dm-snapshot scalability - chained delta snapshots approach
  2006-09-27  5:24     ` dm-snapshot scalability - chained delta snapshots approach Haripriya S
@ 2006-09-27  9:17       ` Jan Blunck
  2006-09-27 10:40         ` rgammans
  2006-10-23 17:16         ` Molle Bestefich
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Blunck @ 2006-09-27  9:17 UTC (permalink / raw)
  To: device-mapper development

On Tue, Sep 26, Haripriya S wrote:

> I had previously put out some performance numbers for origin writes
> where performance goes down drastically w.r.t. the number of snapshots.
> Going further, we identified one of the reasons for the performance drop
> with increase in number of snapshots as the COW copies that happen to
> every snapshot COW device when an origin write happens.

Thanks a lot for your work in this area of the device-mapper. Your performance
numbers show that work is really necessary here.

> We have currently experimented with dm-snapshot code with two different
> approaches and have got good performance numbers. I describe the first
> approach and the results here and appreciate your opinions and inputs on
> this.
> 
> Approach 1 - Chained delta snapshots

This means that every snapshot still has its own exception store. This would
make deletion of snapshots unnecessarily complex. It moves the work (copying of
chunks) to the deletion of the snapshot.

We discussed some of the ideas about snapshots here at the dm summit. The
general ideas are as follows:

- one exception store per origin device that is shared by all snapshots
- don't keep the complete exception tables in memory all the time
- limit kcopyd outstanding requests

This would address the two biggest problems that I see with the snapshot
target. The throughput issues should be addressed by only writing to one
exception store. The memory issues should be addressed by the changes to the
exception table handling. Although that includes a complete redesign of the
exception store code.
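
One possible reading of the "one exception store per origin" idea is sketched
below. The thread does not describe the actual layout, so everything here is
an assumption made for illustration: the class name, the in-memory
dictionaries, and the choice to record, for each saved chunk, the set of
snapshots that share it.

```python
class SharedExceptionStore:
    """Hypothetical shared store: each old chunk is saved once and shared."""

    def __init__(self, origin):
        self.origin = dict(origin)   # block -> current data
        self.snapshots = set()       # ids of existing snapshots
        self.saved = {}              # block -> list of (owner ids, old data)
        self.covered = {}            # block -> ids that already have a copy
        self.copies = 0              # chunk copies performed so far
        self._next_id = 0

    def create_snapshot(self) -> int:
        sid = self._next_id
        self._next_id += 1
        self.snapshots.add(sid)
        return sid

    def write_origin(self, block, data) -> None:
        # Snapshots that still see the origin's current data for this block.
        need = self.snapshots - self.covered.get(block, set())
        if need:
            entry = (frozenset(need), self.origin.get(block))
            self.saved.setdefault(block, []).append(entry)
            self.covered[block] = self.covered.get(block, set()) | need
            self.copies += 1         # one copy, however many snapshots share it
        self.origin[block] = data

    def read_snapshot(self, sid: int, block):
        for owners, data in self.saved.get(block, []):
            if sid in owners:
                return data
        return self.origin.get(block)
```

In this model an origin write costs at most one copy regardless of the
number of snapshots, which is the throughput argument made above.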

There are still ongoing discussions about the snapshot target. It would be
nice if you have additional thoughts about this proposal. I guess it is
similar to one of your prototypes.

Jan

* Re: dm-snapshot scalability - chained delta snapshots approach
  2006-09-27  9:17       ` Jan Blunck
@ 2006-09-27 10:40         ` rgammans
  2006-09-27 14:47           ` Bill Rugolsky Jr.
  2006-10-23 17:16         ` Molle Bestefich
  1 sibling, 1 reply; 9+ messages in thread
From: rgammans @ 2006-09-27 10:40 UTC (permalink / raw)
  To: jblunck, device-mapper development

On Wed, Sep 27, 2006 at 11:17:00AM +0200, Jan Blunck wrote:
> We discussed some of the ideas about snapshots here at the dm summit. The
> general ideas are as follows:
> 
> - one exception store per origin device that is shared by all snapshots
> - don't keep the complete exception tables in memory all the time
> - limit kcopyd outstanding requests
[snip]
> target. The throughput issues should be addressed by only writing to one
> exception store. The memory issues should be addressed by the changes to the

I have a need for a 'snapshot'-type dm mode which has this
characteristic, i.e. it leaves the origin device completely untouched by
any changes.

I was thinking that I'd have to code it myself from scratch, as I couldn't
see any simple way of reusing the existing dm-snap code, especially since
in my case the origin device will always be a physical volume (i.e. hda).

However, if I can make use of a new dm-exception-store, and possibly
even contribute to it, that would be better.

I was considering some sort of B-tree or B+-tree arrangement, as then
we can use the buffer cache (I'm assuming something similar still
exists after the bh -> bio rewrite, but I'm a little behind) to store
the commonly referenced exceptions, which should keep the memory
required by the tables down at times of high memory pressure.

> There are still ongoing discussions about the snapshot target. It would be
> nice if you have additional thoughts about this proposal. I guess it is
> similar to one of your prototypes.

Is this where those discussions are taking place, if I want to help
and participate?

TTFN
-- 
Roger. 	                        Home| http://www.sandman.uklinux.net/
Master of Peng Shui.      (Ancient oriental art of Penguin Arranging)
Work|Independent Sys Consultant | http://www.computer-surgery.co.uk/
 New key Fpr: 1227 ABB1 7545 77A7 6816  2D18 4EBC AA9B 8EE3 1DD3

* Re: dm-snapshot scalability - chained delta snapshots approach
  2006-09-27 10:40         ` rgammans
@ 2006-09-27 14:47           ` Bill Rugolsky Jr.
  0 siblings, 0 replies; 9+ messages in thread
From: Bill Rugolsky Jr. @ 2006-09-27 14:47 UTC (permalink / raw)
  To: device-mapper development

On Wed, Sep 27, 2006 at 11:40:35AM +0100, rgammans@computer-surgery.co.uk wrote:
> I was considering some sort of B-tree or B+-tree arrangement, as then
> we can use the buffer cache (I'm assuming something similar still
> exists after the bh -> bio rewrite, but I'm a little behind) to store
> the commonly referenced exceptions, which should keep the memory
> required by the tables down at times of high memory pressure.
> 
> > There are still ongoing discussions about the snapshot target. It would be
> > nice if you have additional thoughts about this proposal. I guess it is
> > similar to one of your prototypes.
> 
> Is this where those discussions are taking place, if I want to help
> and participate?
 
Daniel Phillips's csnap target was based on a B-tree design:

  http://sources.redhat.com/cluster/csnap/

There are papers there describing the design in great detail.

Unfortunately, his various projects (csnap, ddraid, ...) seem to have
been abandoned.

Regards,

    Bill Rugolsky

* Re: dm-snapshot scalability - chained delta snapshots approach
  2006-09-27  9:17       ` Jan Blunck
  2006-09-27 10:40         ` rgammans
@ 2006-10-23 17:16         ` Molle Bestefich
  2006-10-24  9:52           ` Jan Blunck
  2006-10-26  8:29           ` Haripriya S
  1 sibling, 2 replies; 9+ messages in thread
From: Molle Bestefich @ 2006-10-23 17:16 UTC (permalink / raw)
  To: device-mapper development

Haripriya S wrote:
> Approach 1 - Chained delta snapshots
>
> Advantages:
> 1. Very simple, adds very few lines of code to existing dm-snap code.

Nice.

> 2. Does not change the dm-snapshot architecture, and no changes
> required in LVM or EVMS

Nice.

> 3. Since the COW copies due to origin write will always go to the most
> recent snapshot, snapshot COW devices can be created with less size.
> Whenever the COW allocation increase beyond say 90%, a new snapshot can
> be created which will take all the subsequent COW copies. This may avoid
> making COW devices invalid.

Nice !!!!!

> Disadvantages:
> 1. snapshots which were independent previously are now dependent on
> each other. Corruption of one COW device will affect the other snapshots
> as well.

Fixing dm-snapshot so devices do not get corrupted would make
dm-snapshot immensely more useful.
One way of doing that is to provoke bugs to more quickly become
visible to the user.  I think your patch might accomplish this.
Another way is to keep the code simple.  I'd say your patch does that.

(A third way is extensive testing, and a fourth is mathematically
proving that the code is sane.  But who has the time and energy ;-).)

Overall, what you're doing looks like a good thing for stability.

> 2. Will have a small impact in snapshot read performance,
> currently (if I understood right)

A minor disadvantage compared to the massive improvements seen in write speed.
It can be optimized later.

(E.g. by caching a list of which exceptions exist elsewhere in the chain.)

> 3. There is a need to change the disk exception structure

Hopefully there's a version number on disk which allows incompatible
tools to skip LVs or whatever.

If not, this is a great excuse to create one.

> 4. When snapshots are deleted the COW exceptions have to be transferred
> to the next snapshot in the write chain.

Jan Blunck wrote:
> This means that every snapshot still has its own exception store.
> This would make deletion of snapshots unnecessarily complex.

Complex, how?

Necessary operations (in order listed):
 * Acquire exclusive lock on this snapshot.
 * Check that next snapshot has room for exceptions, abort if not.
 * Acquire exclusive lock on next snapshot.
 * Move all exceptions to next snapshot.
 * Unlock next snapshot.
 * Remove this snapshot.
 * Done...

Sounds simple to me, but maybe I'm missing the point.
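
The steps listed above can be sketched as follows. This is a toy model, not
kernel code: `Snapshot`, `delete_snapshot` and the capacity counted in
exceptions are all invented for illustration. It takes the neighbour's lock
before checking for room (a small deviation from the listed order, to keep
the check and the move atomic), skips blocks the neighbour already has an
exception for, and ignores snapshot-write exceptions.

```python
import threading


class Snapshot:
    def __init__(self, capacity: int):
        self.capacity = capacity     # maximum number of exceptions
        self.exceptions = {}         # block -> saved data
        self.lock = threading.Lock()


def delete_snapshot(chain: list, index: int) -> bool:
    """Delete chain[index]; chain is in creation order (oldest first).

    Returns False and changes nothing if the next snapshot in the write
    chain (the next older one) has no room for the moved exceptions.
    """
    victim = chain[index]
    with victim.lock:
        if index > 0:
            nxt = chain[index - 1]   # next snapshot in the write chain
            with nxt.lock:
                # The older snapshot's own exceptions take precedence.
                to_move = {b: d for b, d in victim.exceptions.items()
                           if b not in nxt.exceptions}
                if len(nxt.exceptions) + len(to_move) > nxt.capacity:
                    return False     # no room: abort
                nxt.exceptions.update(to_move)
        del chain[index]
    return True
```

Deleting the oldest snapshot needs no move at all in this model, since no
other snapshot reads through it.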

> It moves the work (copying of chunks)
> to the deletion of the snapshot.

Snapshot deletion is usually a "low privilege" task, something done
to redeem disk space on a periodic schedule.  It is not something a
user absolutely needs to finish immediately.  Sounds like a very
fair deal to me, but then again, I'm just a user.

> We discussed some of the ideas about snapshots here at the dm summit. The
> general ideas are as follows:
>
> - one exception store per origin device that is shared by all snapshots

Now that sounds complex.

> Although that includes a complete redesign of the exception store code.

Especially when you say stuff like that :-).

> The throughput issues should be addressed by only
> writing to one exception store.

Wouldn't this make debugging more complex, and further add to
the difficulty of snapshot resizing?

* Re: dm-snapshot scalability - chained delta snapshots approach
  2006-10-23 17:16         ` Molle Bestefich
@ 2006-10-24  9:52           ` Jan Blunck
  2006-10-25 12:09             ` Haripriya S
  2006-10-26  8:29           ` Haripriya S
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Blunck @ 2006-10-24  9:52 UTC (permalink / raw)
  To: dm-devel

On Mon, Oct 23, Molle Bestefich wrote:

> >This means that every snapshot still has its own exception store.
> >This would make deletion of snapshots unnecessarily complex.
> 
> Complex, how?
> 
> Necessary operations (in order listed):
> * Acquire exclusive lock on this snapshot.
> * Check that next snapshot has room for exceptions, abort if not.
> * Acquire exclusive lock on next snapshot.
> * Move all exceptions to next snapshot.
> * Unlock next snapshot.
> * Remove this snapshot.
> * Done...
> 
> Sounds simple to me, but maybe I'm missing the point.

Hmm, sounds simple. Somehow I can't remember exactly where I thought the
problem is ...

> >We discussed some of the ideas about snapshots here at the dm summit. The
> >general ideas are as follows:
> >
> >- one exception store per origin device that is shared by all snapshots
> 
> Now that sounds complex.

But that is something already implemented for clustered snapshots although
that is userspace code.

> >Although that includes a complete redesign of the exception store code.
> 
> Especially when you say stuff like that :-).
> 

The chained-snapshots approach needs that too.

> >The throughput issues should be addressed by only
> >writing to one exception store.
> 
> Wouldn't this make debugging more complex, and further add to
> the difficulty of snapshot resizing?

Resizing? Nope, you only need to resize the exception store, that's it. Resizing
the chained-snapshots approach is complex however: in the worst case you have
to move the exception stores to get enough free space.

* Re: dm-snapshot scalability - chained delta snapshots approach
  2006-10-24  9:52           ` Jan Blunck
@ 2006-10-25 12:09             ` Haripriya S
  0 siblings, 0 replies; 9+ messages in thread
From: Haripriya S @ 2006-10-25 12:09 UTC (permalink / raw)
  To: dm-devel

>>> Jan Blunck <jblunck@suse.de> 10/24/06 3:22 PM >>> 
On Mon, Oct 23, Molle Bestefich wrote:

> > >Although that includes a complete redesign of the exception store code.
> >
> > Especially when you say stuff like that :-).
> >

> The chained-snapshots approach needs that too.

In the chained snapshots, the only addition is a way to tell if an
exception is to be preserved or can be written over. So for every disk
exception an additional field is required (Alasdair also recently suggested
that we could use a bitmap to save space). So I would say this is not a
major re-architecture of the disk exception structures but a simple (but
incompatible) format change.
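
A minimal sketch of the bitmap variant mentioned above, with one bit per
exception slot. The class name and the in-memory `bytearray` are assumptions
made for illustration; the real on-disk format is not specified in this
thread.

```python
class CowFlagBitmap:
    """One bit per exception: 1 = created by an origin COW copy (to be
    preserved), 0 = created by a snapshot write (can be written over)."""

    def __init__(self, num_exceptions: int):
        self.bits = bytearray((num_exceptions + 7) // 8)

    def set_cow(self, index: int, is_cow: bool) -> None:
        byte, bit = divmod(index, 8)
        if is_cow:
            self.bits[byte] |= 1 << bit
        else:
            self.bits[byte] &= ~(1 << bit) & 0xFF

    def is_cow(self, index: int) -> bool:
        byte, bit = divmod(index, 8)
        return bool(self.bits[byte] & (1 << bit))
```

Compared with an extra field per exception, this costs one bit per exception
instead of widening every record.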

> > >The throughput issues should be addressed by only
> > >writing to one exception store.
> > 
> > Wouldn't this make debugging more complex, and further add to
> > the difficulty of snapshot resizing?

> Resizing? Nope, you only need to resize the exception store, that's it.
> Resizing the chained-snapshots approach is complex however: in the worst
> case you have to move the exception stores to get enough free space.

I agree that there is work to be done while resizing. It seems simple to
code, though, and can be done similarly to a snapshot delete. If a snapshot
is being resized and will lose exception data, then we need to move the
exceptions and data to the first snapshot after this snapshot in the write
chain which has space to hold that data. Yes, a data move is involved here.
By the way, I couldn't figure out how resize will work with the common
exception store approach. Can you please explain that in detail?

Thanks and Regards,
Haripriya

* Re: dm-snapshot scalability - chained delta snapshots approach
  2006-10-23 17:16         ` Molle Bestefich
  2006-10-24  9:52           ` Jan Blunck
@ 2006-10-26  8:29           ` Haripriya S
  1 sibling, 0 replies; 9+ messages in thread
From: Haripriya S @ 2006-10-26  8:29 UTC (permalink / raw)
  To: device-mapper development

>>> "Molle Bestefich" <molle.bestefich@gmail.com> 10/23/06 10:46 PM >>>

> Complex, how?
> 
> Necessary operations (in order listed):
>  * Acquire exclusive lock on this snapshot.
>  * Check that next snapshot has room for exceptions, abort if not.

If the next snapshot in the write chain does not have
room, then we need to go through the list of earlier 
snapshots and move the exceptions to the first snapshot
which has room. This is because the earlier snapshots depend
on the data being copied in at least one of the later snapshots.
If the next snapshot is the earliest snapshot, then we can abort.

>  * Acquire exclusive lock on next snapshot.
>  * Move all exceptions to next snapshot.
>  * Unlock next snapshot.
>  * Remove this snapshot.
>  * Done...

Yes.

Thanks and Regards,
Haripriya
