* snapshot scalability
From: Haripriya S @ 2006-08-10 10:46 UTC
To: dm-devel

Hi,

A co-worker recently did some tests on DM snapshots using bonnie, and
here is a rough summary of what he got as write throughput:

No Snapshots    - 373 MB/s
One Snapshot    - 55 MB/s
Two Snapshots   - 16 MB/s
Four Snapshots  - 14 MB/s
Eight Snapshots - 6.5 MB/s

He is doing some more tests now to verify these results, but I wanted
to quickly check with the dm snapshot community. Are there any current
known scalability limits on snapshots, and do the numbers mentioned here
look normal?

Thanks and Regards,
Haripriya
* Re: dm-snapshot scalability - chained delta snapshots approach
From: Haripriya S @ 2006-09-27 5:24 UTC
To: dm-devel

Hi,

I had previously put out some performance numbers for origin writes
where performance goes down drastically w.r.t. the number of snapshots.
Going further, we identified one of the reasons for the performance drop
with increase in number of snapshots as the COW copies that happen to
every snapshot COW device when an origin write happens.

We have currently experimented with dm-snapshot code with two different
approaches and have got good performance numbers. I describe the first
approach and the results here and appreciate your opinions and inputs on
this.

Approach 1 - Chained delta snapshots

In the current design, each snapshot COW device contains all the diffs
from the origin as exceptions. In the new scheme, each snapshot COW
device contains only the delta diffs from the previous snapshot. So,
assuming an origin has 16 snapshots, with the current design 16 COW
copies will be done. With the new scheme, only 1 COW copy will be done,
which means the performance will not degrade so rapidly when the number
of snapshots increases.

Let's assume the snapshots for a given origin volume are chained based
on the order of creation. We define two chains: the read chain is the
same as the snapshot creation order, and the write chain is in the
reverse order.

Origin write: When an origin write happens, for the copy-on-write, the
current scheme creates pending exceptions to every snapshot in the
chain. In the new scheme, we create copy-on-write exceptions for that
block only to the first snapshot in the write chain (the most recent
snapshot).
Snapshot write: If the snapshot already contains an exception for the
given block, and it was created due to a copy-on-write, then that block
is copied to the previous snapshot (the next snapshot in the write
chain). Otherwise the exception is created or the block is overwritten.

Snapshot read: If an exception for the block is found in the current
snapshot's COW, then use that. Else traverse through the read chain and
use the first exception for that block. If the block is not found in any
of them, then use the origin.

Origin read: No change.

Advantages:
1. Very simple, adds very few lines of code to existing dm-snap code.
2. Does not change the dm-snapshot architecture, and no changes
   required in LVM or EVMS.
3. Since the COW copies due to origin write will always go to the most
   recent snapshot, snapshot COW devices can be created with a smaller
   size. Whenever the COW allocation increases beyond, say, 90%, a new
   snapshot can be created which will take all the subsequent COW
   copies. This may avoid making COW devices invalid.

Disadvantages:
1. Snapshots which were independent previously are now dependent on
   each other. Corruption of one COW device will affect the other
   snapshots as well.
2. Will have a small impact on snapshot read performance; currently
   (if I understood right), since exceptions are in memory, this may
   not be big.
3. There is a need to change the disk exception structure (we need at
   least a bit to indicate that a particular exception was created
   because of a COW copy, instead of due to a snapshot write). But the
   comments in exception-store.c say

    * There is no backward or forward compatibility implemented,
    * snapshots with different disk versions than the kernel will
    * not be usable. It is expected that "lvcreate" will blank out
    * the start of a fresh COW device before calling the snapshot
    * constructor.

   so this may not be a huge problem.
4. When snapshots are deleted the COW exceptions have to be transferred
   to the next snapshot in the write chain.
I have prototype code for this approach which works OK for the
read/write paths, but has not been tested very thoroughly. There is
still more work to be done in terms of snapshot deletion etc.
Preliminary results using this code have suggested that the scalability
of origin writes w.r.t. snapshots has improved tremendously.

Preliminary numbers:

Origin write (using dd)   Chained delta snapshot prototype   Current DM design
1 snapshot                933 KB/s                           950 KB/s
4 snapshots               932 KB/s                           720 KB/s
8 snapshots               927 KB/s                           470 KB/s
16 snapshots              905 KB/s                           257 KB/s

We would love to hear your comments on this approach.

Thanks and Regards,
Haripriya S.
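As a rough illustration of the origin-write and snapshot-read rules in the message above, here is a small Python model. It is not the prototype mentioned there: the `Origin` class, its method names, and the `demo` helper are made up for this sketch, chunk contents are plain Python values, and snapshot writes and deletion are deliberately left out.

```python
class Origin:
    """Toy model of an origin volume and its snapshots (hypothetical names)."""

    def __init__(self, num_chunks, chained):
        self.chunks = ["initial"] * num_chunks
        self.snapshots = []      # creation order; each entry maps chunk -> saved data
        self.chained = chained   # True: chained delta scheme, False: current design
        self.cow_copies = 0      # number of chunk copies caused by origin writes

    def create_snapshot(self):
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, chunk, data):
        # Chained scheme: only the most recent snapshot receives the COW copy.
        # Current design: every snapshot lacking an exception receives one.
        targets = self.snapshots[-1:] if self.chained else self.snapshots
        for cow in targets:
            if chunk not in cow:
                cow[chunk] = self.chunks[chunk]
                self.cow_copies += 1
        self.chunks[chunk] = data

    def read_snapshot(self, index, chunk):
        # Chained scheme: look in this snapshot, then follow the read chain
        # (later snapshots in creation order); fall back to the origin.
        if self.chained:
            candidates = self.snapshots[index:]
        else:
            candidates = [self.snapshots[index]]
        for cow in candidates:
            if chunk in cow:
                return cow[chunk]
        return self.chunks[chunk]


def demo(chained, num_snapshots=16):
    """Create the snapshots, overwrite one origin chunk, report the cost."""
    origin = Origin(num_chunks=4, chained=chained)
    for _ in range(num_snapshots):
        origin.create_snapshot()
    origin.write(0, "new")
    views = [origin.read_snapshot(i, 0) for i in range(num_snapshots)]
    return origin.cow_copies, views
```

With 16 snapshots, `demo(False)` counts 16 COW copies for a single origin write and `demo(True)` counts 1, while every snapshot still reads the pre-write contents in both cases.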
* Re: dm-snapshot scalability - chained delta snapshots approach
From: Jan Blunck @ 2006-09-27 9:17 UTC
To: device-mapper development

On Tue, Sep 26, Haripriya S wrote:

> I had previously put out some performance numbers for origin writes
> where performance goes down drastically w.r.t. the number of snapshots.
> Going further, we identified one of the reasons for the performance drop
> with increase in number of snapshots as the COW copies that happen to
> every snapshot COW device when an origin write happens.

Thanks a lot for your work in this area of the device-mapper. Your
performance numbers show that work is really necessary here.

> We have currently experimented with dm-snapshot code with two different
> approaches and have got good performance numbers. I describe the first
> approach and the results here and appreciate your opinions and inputs on
> this.
>
> Approach 1 - Chained delta snapshots

This means that every snapshot still has its own exception store. This
would make deletion of snapshots unnecessarily complex. It moves the
work (copying of chunks) to the deletion of the snapshot.

We discussed some of the ideas about snapshots here at the dm summit.
The general ideas are as follows:

- one exception store per origin device that is shared by all snapshots
- don't keep the complete exception tables in memory all the time
- limit kcopyd outstanding requests

This would address the two biggest problems that I see with the snapshot
target. The throughput issues should be addressed by only writing to one
exception store. The memory issues should be addressed by the changes to
the exception table handling. Although that includes a complete redesign
of the exception store code.
There are still ongoing discussions about the snapshot target. It would
be nice if you have additional thoughts about this proposal. I guess it
is similar to one of your prototypes.

Jan
* Re: dm-snapshot scalability - chained delta snapshots approach
From: rgammans @ 2006-09-27 10:40 UTC
To: jblunck, device-mapper development

On Wed, Sep 27, 2006 at 11:17:00AM +0200, Jan Blunck wrote:
> We discussed some of the ideas about snapshots here at the dm summit. The
> general ideas are as follows:
>
> - one exception store per origin device that is shared by all snapshots
> - don't keep the complete exception tables in memory all the time
> - limit kcopyd outstanding requests
[snip]
> target. The throughput issues should be addressed by only writing to one
> exception store. The memory issues should be addressed by the changes to the

I have a need for a 'snapshot' type dm mode which has this
characteristic. E.g., it leaves the origin device completely untouched
by any changes.

I was thinking that I'd have to code it myself from scratch, as I
couldn't see any simple way of reusing the existing dm-snap code,
especially since in my case the origin device will always be a physical
volume (i.e. hda).

However, if I can make use of a new dm-exception-store and possibly even
contribute to it, this would be better.

I was considering some sort of B or B+ -tree type arrangement as then
we can use the buffer-cache (I'm assuming something similar still
exists after the bh -> bio rewrite but I'm a little behind) to store
the commonly referenced exceptions, which should keep the memory
required by the tables down at times of high memory pressure.

> There are still ongoing discussions about the snapshot target. It would be
> nice if you have additional thoughts about this proposal. I guess it is
> similar to one of your prototypes.

Is this where those discussions are taking place if I want to help
and participate?

TTFN
--
Roger.
Home| http://www.sandman.uklinux.net/
Master of Peng Shui. (Ancient oriental art of Penguin Arranging)
Work|Independent Sys Consultant | http://www.computer-surgery.co.uk/
New key Fpr: 1227 ABB1 7545 77A7 6816 2D18 4EBC AA9B 8EE3 1DD3
* Re: dm-snapshot scalability - chained delta snapshots approach
From: Bill Rugolsky Jr. @ 2006-09-27 14:47 UTC
To: device-mapper development

On Wed, Sep 27, 2006 at 11:40:35AM +0100, rgammans@computer-surgery.co.uk wrote:
> I was considering some sort of B or B+ -tree type arrangement as then
> we can use the buffer-cache (I'm assuming something similar still
> exists after the bh -> bio rewrite but I'm a little behind) to store
> the commonly referenced exceptions, which should keep the memory
> required by the tables down at times of high memory pressure.
>
> > There are still ongoing discussions about the snapshot target. It would be
> > nice if you have additional thoughts about this proposal. I guess it is
> > similar to one of your prototypes.
>
> Is this where those discussions are taking place if I want to help
> and participate?

Daniel Phillips's csnap target was based on a B-tree design:

http://sources.redhat.com/cluster/csnap/

There are papers there describing the design in great detail.
Unfortunately, his various projects (csnap, ddraid, ...) seem to have
been abandoned.

Regards,

Bill Rugolsky
* Re: dm-snapshot scalability - chained delta snapshots approach
From: Molle Bestefich @ 2006-10-23 17:16 UTC
To: device-mapper development

Haripriya S wrote:
> Approach 1 - Chained delta snapshots
>
> Advantages:
> 1. Very simple, adds very few lines of code to existing dm-snap code.

Nice.

> 2. Does not change the dm-snapshot architecture, and no changes
> required in LVM or EVMS

Nice.

> 3. Since the COW copies due to origin write will always go to the most
> recent snapshot, snapshot COW devices can be created with a smaller size.
> Whenever the COW allocation increases beyond, say, 90%, a new snapshot can
> be created which will take all the subsequent COW copies. This may avoid
> making COW devices invalid.

Nice !!!!!

> Disadvantages:
> 1. Snapshots which were independent previously are now dependent on
> each other. Corruption of one COW device will affect the other snapshots
> as well.

Fixing dm-snapshot so devices do not get corrupted would make
dm-snapshot immensely more useful.

One way of doing that is to provoke bugs into becoming visible to the
user more quickly. I think your patch might accomplish this.

Another way is to keep the code simple. I'd say your patch does that.

(A third way is extensive testing, and a fourth is mathematically
proving that the code is sane. But who has the time and energy ;-).)

Overall, what you're doing looks like a good thing for stability.

> 2. Will have a small impact on snapshot read performance,
> currently (if I understood right)

Minor disadvantage compared to the massive improvements seen in write
speed. Can be optimized later. (E.g. caching a list of which
exceptions exist elsewhere in the chain.)

> 3. There is a need to change the disk exception structure

Hopefully there's a version number on disk which allows incompatible
tools to skip LVs or whatever. If not, this is a great excuse to
create one.

> 4. When snapshots are deleted the COW exceptions have to be transferred
> to the next snapshot in the write chain.

Jan Blunck wrote:
> This means that every snapshot still has its own exception store.
> This would make deletion of snapshots unnecessarily complex.

Complex, how?

Necessary operations (in order listed):
 * Acquire exclusive lock on this snapshot.
 * Check that next snapshot has room for exceptions, abort if not.
 * Acquire exclusive lock on next snapshot.
 * Move all exceptions to next snapshot.
 * Unlock next snapshot.
 * Remove this snapshot.
 * Done...

Sounds simple to me, but maybe I'm missing the point.

> It moves the work (copying of chunks)
> to the deletion of the snapshot.

Snapshot deletion is usually a "low privilege" task, something done to
redeem disk space on a periodic schedule. It is not something a user
absolutely needs to finish immediately.

Sounds like a very fair deal to me, but then again, I'm just a user.

> We discussed some of the ideas about snapshots here at the dm summit. The
> general ideas are as follows:
>
> - one exception store per origin device that is shared by all snapshots

Now that sounds complex.

> Although that includes a complete redesign of the exception store code.

Especially when you say stuff like that :-).

> The throughput issues should be addressed by only
> writing to one exception store.

Wouldn't this make debugging more complex, and further add to
the difficulty of snapshot resizing?
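The deletion steps listed in this message can be sketched in Python as follows. This is only an illustration under several assumptions that are not in the thread: "next snapshot" is read as the next one in the write chain (the snapshot created just before the one being deleted), capacity is counted in exceptions, every stored exception is treated as a COW copy, and chunks the older snapshot already holds are not moved again. The `Snapshot` class and `delete_snapshot` function are hypothetical names.

```python
import threading


class Snapshot:
    """Toy snapshot with a bounded exception table (hypothetical)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity          # maximum number of exceptions
        self.exceptions = {}              # chunk -> saved data
        self.lock = threading.Lock()

    def free_slots(self):
        return self.capacity - len(self.exceptions)


def delete_snapshot(chain, index):
    """Delete chain[index]; chain is in creation order (oldest first).

    Returns True if the snapshot was removed, False if the operation was
    aborted because the next snapshot in the write chain has no room.
    """
    victim = chain[index]
    with victim.lock:                     # exclusive lock on this snapshot
        if index == 0:
            # Oldest snapshot: nothing older depends on its exceptions.
            del chain[index]
            return True
        target = chain[index - 1]         # next snapshot in the write chain
        with target.lock:                 # exclusive lock on next snapshot
            # Assumption: the room check is done under the target's lock,
            # and only chunks the target does not already hold are moved.
            to_move = {chunk: data
                       for chunk, data in victim.exceptions.items()
                       if chunk not in target.exceptions}
            if len(to_move) > target.free_slots():
                return False              # abort, nothing was changed
            target.exceptions.update(to_move)
        del chain[index]                  # remove this snapshot
    return True
```

Taking the locks in a fixed order (newer snapshot first, then the older one) is a design choice of the sketch so that two concurrent deletions cannot wait on each other in a cycle.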
* Re: dm-snapshot scalability - chained delta snapshots approach
From: Jan Blunck @ 2006-10-24 9:52 UTC
To: dm-devel

On Mon, Oct 23, Molle Bestefich wrote:

> >This means that every snapshot still has its own exception store.
> >This would make deletion of snapshots unnecessarily complex.
>
> Complex, how?
>
> Necessary operations (in order listed):
> * Acquire exclusive lock on this snapshot.
> * Check that next snapshot has room for exceptions, abort if not.
> * Acquire exclusive lock on next snapshot.
> * Move all exceptions to next snapshot.
> * Unlock next snapshot.
> * Remove this snapshot.
> * Done...
>
> Sounds simple to me, but maybe I'm missing the point.

Hmm, sounds simple. Somehow I can't remember exactly where I thought the
problem is ...

> >We discussed some of the ideas about snapshots here at the dm summit. The
> >general ideas are as follows:
> >
> >- one exception store per origin device that is shared by all snapshots
>
> Now that sounds complex.

But that is something already implemented for clustered snapshots,
although that is userspace code.

> >Although that includes a complete redesign of the exception store code.
>
> Especially when you say stuff like that :-).
>

The chained-snapshots approach needs that too.

> >The throughput issues should be addressed by only
> >writing to one exception store.
>
> Wouldn't this make debugging more complex, and further add to
> the difficulty of snapshot resizing?

Resizing? Nope, you only need to resize the exception store, that's it.
Resizing the chained-snapshots approach is complex however: in the worst
case you have to move the exception stores to get enough free space.
* Re: dm-snapshot scalability - chained delta snapshots approach
From: Haripriya S @ 2006-10-25 12:09 UTC
To: dm-devel

>>> Jan Blunck <jblunck@suse.de> 10/24/06 3:22 PM >>>
On Mon, Oct 23, Molle Bestefich wrote:

> > >Although that includes a complete redesign of the exception store code.
> >
> > Especially when you say stuff like that :-).
> >

> The chained-snapshots approach needs that too.

In the chained snapshots, the only addition is a way to tell if an
exception is to be preserved or can be written over. So for every
disk exception an additional field is required (Alasdair also recently
suggested that we could use a bitmap to save space). So I would say this
is not a major re-architecture of the disk exception structures but a
simple (but incompatible) format change.

> > >The throughput issues should be addressed by only
> > >writing to one exception store.
> >
> > Wouldn't this make debugging more complex, and further add to
> > the difficulty of snapshot resizing?

> Resizing? Nope, you only need to resize the exception store, that's it. Resizing
> the chained-snapshots approach is complex however: in the worst case you have
> to move the exception stores to get enough free space.

I agree that there is work to be done while resizing. It seems simple to
code though, and can be done similarly to a snapshot delete. If a snapshot
is being resized and will lose exception data, then we need to move the
exception and data to the first snapshot after this snapshot in the
write chain which has space to hold that data. Yes, data move is
involved here.

BTW, I couldn't figure out how resize will work with the common exception
store approach. Can you please explain that in detail?

Thanks and Regards,
Haripriya
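The "additional field or bitmap" idea in this message could look roughly like the Python sketch below. Everything about the layout is an assumption made for illustration: each exception is packed as two little-endian 64-bit chunk numbers, and a separate bitmap carries one bit per exception saying whether it came from a COW copy (and so must be preserved for older snapshots) or from a snapshot write. The names `ENTRY`, `pack_area` and `unpack_area` are hypothetical and do not describe the real on-disk format.

```python
import struct

# Hypothetical entry layout: (old_chunk, new_chunk) as two little-endian u64.
ENTRY = struct.Struct("<QQ")


def pack_area(exceptions):
    """Pack a list of (old_chunk, new_chunk, is_cow) tuples.

    Returns (bitmap, body): one flag bit per exception plus the packed
    entries, so the entries themselves keep their size.
    """
    bitmap = bytearray((len(exceptions) + 7) // 8)
    body = bytearray()
    for i, (old_chunk, new_chunk, is_cow) in enumerate(exceptions):
        body += ENTRY.pack(old_chunk, new_chunk)
        if is_cow:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap), bytes(body)


def unpack_area(bitmap, body):
    """Inverse of pack_area: rebuild the (old, new, is_cow) tuples."""
    result = []
    for i in range(len(body) // ENTRY.size):
        old_chunk, new_chunk = ENTRY.unpack_from(body, i * ENTRY.size)
        is_cow = bool((bitmap[i // 8] >> (i % 8)) & 1)
        result.append((old_chunk, new_chunk, is_cow))
    return result
```

Compared with widening every entry by a flag field, the bitmap costs one bit per exception, which is the space saving the message alludes to; either way the format is incompatible with stores written without the flag.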
* Re: dm-snapshot scalability - chained delta snapshots approach
From: Haripriya S @ 2006-10-26 8:29 UTC
To: device-mapper development

>>> "Molle Bestefich" <molle.bestefich@gmail.com> 10/23/06 10:46 PM >>>
> Complex, how?
>
> Necessary operations (in order listed):
> * Acquire exclusive lock on this snapshot.
> * Check that next snapshot has room for exceptions, abort if not.

If the next snapshot in the write chain does not have room, then we need
to go through the list of earlier snapshots and move the exceptions to
the first snapshot which has room. This is because the earlier snapshots
depend on the data being copied in at least one of the later snapshots.
If the next snapshot is the earliest snapshot, then we can abort.

> * Acquire exclusive lock on next snapshot.
> * Move all exceptions to next snapshot.
> * Unlock next snapshot.
> * Remove this snapshot.
> * Done...

Yes.

Thanks and Regards,
Haripriya
Thread overview: 9+ messages
2006-08-10 10:46 ` snapshot scalability Haripriya S
2006-09-27 5:24 ` dm-snapshot scalability - chained delta snapshots approach Haripriya S
2006-09-27 9:17 ` Jan Blunck
2006-09-27 10:40 ` rgammans
2006-09-27 14:47 ` Bill Rugolsky Jr.
2006-10-23 17:16 ` Molle Bestefich
2006-10-24 9:52 ` Jan Blunck
2006-10-25 12:09 ` Haripriya S
2006-10-26 8:29 ` Haripriya S