* [PATCH RFC] dm snapshot: shared exception store
@ 2008-08-04 8:22 FUJITA Tomonori
2008-08-06 19:14 ` Mikulas Patocka
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: FUJITA Tomonori @ 2008-08-04 8:22 UTC (permalink / raw)
To: dm-devel; +Cc: agk
This is a new implementation of dm-snapshot.
The important design differences from the current dm-snapshot are:
- It uses one exception store per origin device that is shared by all snapshots.
- It doesn't keep the complete exception tables in memory.
I took the exception store code of Zumastor (http://zumastor.org/).
Zumastor is remote replication software (a local server sends the
delta between two snapshots to a remote server, and then the remote
server applies the delta in an atomic manner. So the data on the
remote server is always consistent).
Zumastor snapshot fulfills the above two requirements, but it is
implemented in user space. The dm kernel module sends the information
of a request to user space and the user space daemon tells the kernel
what to do.
The Zumastor user-space daemon has to handle replication, so the
user-space approach makes sense there, but I think a pure user-space
approach is overkill for snapshots alone. I prefer to implement
snapshots in kernel space (as the current dm-snapshot does). I think
that we can add features for remote replication software like Zumastor
to it, that is, features that give user space a delta between two
snapshots and apply the delta in an atomic manner (via ioctl or
something else).
Note that the code is still in a very early stage. There are lots of
TODO items:
- snapshot deletion support
- writable snapshot support
- protection for unexpected events (probably journaling)
- performance improvement (handling exception cache and format, locking, etc)
- better integration with the current snapshot code
- improvement on error handling
- cleanups
- generating a delta between two snapshots
- applying a delta in an atomic manner
The patch against 2.6.26 is available at:
http://www.kernel.org/pub/linux/kernel/people/tomo/dm-snap/0001-dm-snapshot-dm-snapshot-shared-exception-store.patch
Here's an example (/dev/sdb1 as an origin device and /dev/sdg1 as a cow device):
- creates the origin and cow pair:
flax:~# echo 0 `blockdev --getsize /dev/sdb1` snapshot-origin /dev/sdb1 /dev/sdg1 P2 16 |dmsetup create work
- no snapshot yet:
flax:~# dmsetup status
work: 0 125017767 snapshot-origin : no snapshot
- creates one snapshot (the id of the snapshot is 0):
flax:~# dmsetup message /dev/mapper/work 0 snapshot create 0
- creates one snapshot (the id of the snapshot is 1):
flax:~# dmsetup message /dev/mapper/work 0 snapshot create 1
- there are two snapshots (#0 and #1):
flax:~# dmsetup status
work: 0 125017767 snapshot-origin 0 1
- let's access the snapshots:
flax:~# echo 0 `blockdev --getsize /dev/sdb1` snapshot /dev/sdb1 0|dmsetup create work-snap0
flax:~# echo 0 `blockdev --getsize /dev/sdb1` snapshot /dev/sdb1 1|dmsetup create work-snap1
flax:~# ls /dev/mapper/
control work work-snap0 work-snap1
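The table lines and messages in the example above can also be composed programmatically. What follows is a minimal Python sketch, assuming only the syntax shown in this example. The helper names are hypothetical, and the trailing `P2 16` arguments are copied verbatim from the example without interpreting them. The helpers only build strings; they do not invoke dmsetup.

```python
def origin_table(sectors, origin_dev, cow_dev, extra=("P2", "16")):
    """Table line for the snapshot-origin target, as in the example above.

    `extra` holds the trailing arguments exactly as they appear in the
    example; their meaning is not assumed here.
    """
    return " ".join(["0", str(sectors), "snapshot-origin",
                     origin_dev, cow_dev, *extra])


def snapshot_table(sectors, origin_dev, snap_id):
    """Table line for a snapshot target that exposes snapshot `snap_id`."""
    return f"0 {sectors} snapshot {origin_dev} {snap_id}"


def create_snapshot_message(name, snap_id):
    """Command line that asks the origin device `name` to create a snapshot."""
    return f"dmsetup message /dev/mapper/{name} 0 snapshot create {snap_id}"
```

Each returned string corresponds to one of the commands typed by hand in the example, with the sector count taken from `blockdev --getsize`.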
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-04 8:22 [PATCH RFC] dm snapshot: shared exception store FUJITA Tomonori
@ 2008-08-06 19:14 ` Mikulas Patocka
2008-08-09 5:01 ` FUJITA Tomonori
2008-08-12 12:56 ` Daniel Phillips
2008-08-12 19:07 ` Daniel Phillips
2 siblings, 1 reply; 20+ messages in thread
From: Mikulas Patocka @ 2008-08-06 19:14 UTC (permalink / raw)
To: agk; +Cc: dm-devel

Hi

I looked at it.

Alasdair had some concerns about the interface on the phone call. From my
point of view, the Fujita's interface is OK (using messages to manipulate
the snapshot storage and using targets to access the snapshots). Alasdair,
could you be pls. more specific about it?

What I would propose to change in the upcoming redesign:

- develop it as a separate target, not patch against dm-snapshot. The code
reuse from dm-snapshot is minimal, and keeping the old code around will
likely consume more coding time than the potential code reuse will save.

- drop that limitation on maximum 64 snapshots. If we are going to
redesign it, we should design it without such a limit, so that we wouldn't
have to redesign it again (why we need more than 64 --- for example to
take periodic snapshots every few minutes to record system activity). The
limit on number of snapshots can be dropped if we index b-tree nodes by a
key that contains chunk number and range of snapshot numbers where this
applies.

- do some cache for metadata, don't read the b-tree from the root node
from disk all the time. Ideally the cache should be integrated with page
cache so that its size would tune automatically (I'm not sure if it's
possible to cleanly code it, though).

- the b-tree is good structure, I'd create log-structured filesystem to
hold the b-tree. The advantage is that it will require less
synchronization overhead in clustering. Also, log-structured filesystem
will bring you crash recovery (with minimum coding overhead) and it has
very good write performance.

- deleting the snapshot --- this needs to walk the whole b-tree --- it is
slow. Keeping another b-tree of chunks belonging to the given snapshot
would be overkill. I think the best solution would be to split the device
into large areas and use per-snapshot bitmap that says if the snapshot has
some exceptions allocated in the pertaining area (similar to the
dirty-bitmap of raid1). For short lived snapshots this will save walking
the b-tree. For long-lived snapshots there is no help to speed it up...
But delete performance is not that critical anyway because deleting can be
done asynchronously without user waiting for it.

Mikulas
^ permalink raw reply [flat|nested] 20+ messages in thread
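The idea in the message above of indexing by "chunk number and range of snapshot numbers" can be illustrated with a small model. This is a hypothetical Python sketch, not code from the patch or from any redesign: a sorted list stands in for the on-disk b-tree, the class and method names are invented for illustration, and it assumes the snapshot ranges stored for one chunk never overlap.

```python
import bisect


class RangeKeyedIndex:
    """Maps (chunk, snapshot id) to a cow chunk using range keys.

    Each entry is (chunk, snap_from, snap_to, cow_chunk) and covers every
    snapshot id in the inclusive range [snap_from, snap_to], so the number
    of snapshots is not bounded by a fixed-width bitmap.
    """

    def __init__(self):
        self._entries = []  # kept sorted; stands in for b-tree leaf order

    def insert(self, chunk, snap_from, snap_to, cow_chunk):
        bisect.insort(self._entries, (chunk, snap_from, snap_to, cow_chunk))

    def lookup(self, chunk, snap):
        # Find the last entry whose (chunk, snap_from) is <= (chunk, snap).
        probe = (chunk, snap, float("inf"), float("inf"))
        i = bisect.bisect_right(self._entries, probe) - 1
        if i < 0:
            return None
        c, lo, hi, cow = self._entries[i]
        if c == chunk and lo <= snap <= hi:
            return cow
        return None  # no exception stored: the snapshot shares the origin chunk
```

With this layout, one entry can describe a chunk that is shared by an arbitrarily long run of consecutive snapshots, which is the property the rolling-snapshot use case needs.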
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-06 19:14 ` Mikulas Patocka
@ 2008-08-09 5:01 ` FUJITA Tomonori
2008-08-11 22:12 ` Mikulas Patocka
0 siblings, 1 reply; 20+ messages in thread
From: FUJITA Tomonori @ 2008-08-09 5:01 UTC (permalink / raw)
To: mpatocka; +Cc: dm-devel, agk

On Wed, 6 Aug 2008 15:14:50 -0400 (EDT)
Mikulas Patocka <mpatocka@redhat.com> wrote:

> Hi
>
> I looked at it.

Thanks! I didn't expect anyone to read the patch. I'll submit patches in
a more proper manner next time.

> Alasdair had some concerns about the interface on the phone call. From my
> point of view, the Fujita's interface is OK (using messages to manipulate
> the snapshot storage and using targets to access the snapshots). Alasdair,
> could you be pls. more specific about it?

Yeah, we can't use dmsetup create/destroy to create/delete
snapshots. We need something different.

I have no strong opinion about it. Whatever interface is fine by me as
long as it works.

> What I would propose to change in the upcoming redesign:
>
> - develop it as a separate target, not patch against dm-snapshot. The code
> reuse from dm-snapshot is minimal, and keeping the old code around will
> likely consume more coding time than the potential code reuse will save.

It's fine by me if the maintainer prefers it. Alasdair?

> - drop that limitation on maximum 64 snapshots. If we are going to
> redesign it, we should design it without such a limit, so that we wouldn't
> have to redesign it again (why we need more than 64 --- for example to
> take periodic snapshots every few minutes to record system activity). The
> limit on number of snapshots can be dropped if we index b-tree nodes by a
> key that contains chunk number and range of snapshot numbers where this
> applies.

Unfortunately it's the limitation of the current b-tree
format. As far as I know, there is no code that we can use, which
supports unlimited and writable snapshot.

> - do some cache for metadata, don't read the b-tree from the root node
> from disk all the time.

The current code already does.

> Ideally the cache should be integrated with page
> cache so that its size would tune automatically (I'm not sure if it's
> possible to cleanly code it, though).

Agreed. The current code invents the own cache code. I don't like it
but there is no other option.

> - the b-tree is good structure, I'd create log-structured filesystem to
> hold the b-tree. The advantage is that it will require less
> synchronization overhead in clustering. Also, log-structured filesystem
> will bring you crash recovery (with minimum coding overhead) and it has
> very good write performance.

A log-structured filesystem is pretty complex. Even though we don't
need a complete log-structured filesystem, it's still too complex,
IMO.

A copy-on-Write manner to update the b-tree on disk (as some of the
latest file systems do) is a possible option. Another option is using
journaling as I wrote.

> - deleting the snapshot --- this needs to walk the whole b-tree --- it is
> slow. Keeping another b-tree of chunks belonging to the given snapshot
> would be overkill. I think the best solution would be to split the device
> into large areas and use per-snapshot bitmap that says if the snapshot has
> some exceptions allocated in the pertaining area (similar to the
> dirty-bitmap of raid1). For short lived snapshots this will save walking
> the b-tree. For long-lived snapshots there is no help to speed it up...
> But delete performance is not that critical anyway because deleting can be
> done asynchronously without user waiting for it.

Yeah, it would be nice to delete a snapshot really quickly but it's
not a must.
^ permalink raw reply [flat|nested] 20+ messages in thread
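The per-snapshot area bitmap discussed in the quoted text above can be sketched in a few lines. This is a hypothetical Python illustration, not code from the patch: the class name, the area size, and the use of a Python integer as the bit array are assumptions made for the example.

```python
class SnapshotAreaBitmap:
    """One bit per large area of the device, kept per snapshot.

    A set bit means the snapshot has at least one exception allocated in
    that area, so deletion only has to walk the b-tree for marked areas.
    """

    def __init__(self, total_chunks, chunks_per_area):
        self.chunks_per_area = chunks_per_area
        # Round up so a partial last area still gets a bit.
        self.n_areas = -(-total_chunks // chunks_per_area)
        self.bits = 0

    def mark(self, chunk):
        """Record that an exception was allocated for `chunk`."""
        self.bits |= 1 << (chunk // self.chunks_per_area)

    def areas_to_scan(self):
        """Areas whose b-tree range must be visited when deleting."""
        return [a for a in range(self.n_areas) if (self.bits >> a) & 1]
```

For a short-lived snapshot only a few bits are set, so most of the tree is skipped. For a long-lived snapshot most bits end up set and the bitmap saves little, which matches the caveat in the proposal.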
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-09 5:01 ` FUJITA Tomonori
@ 2008-08-11 22:12 ` Mikulas Patocka
2008-08-11 23:34 ` FUJITA Tomonori
0 siblings, 1 reply; 20+ messages in thread
From: Mikulas Patocka @ 2008-08-11 22:12 UTC (permalink / raw)
To: FUJITA Tomonori; +Cc: dm-devel, agk

> > - drop that limitation on maximum 64 snapshots. If we are going to
> > redesign it, we should design it without such a limit, so that we wouldn't
> > have to redesign it again (why we need more than 64 --- for example to
> > take periodic snapshots every few minutes to record system activity). The
> > limit on number of snapshots can be dropped if we index b-tree nodes by a
> > key that contains chunk number and range of snapshot numbers where this
> > applies.
>
> Unfortunately it's the limitation of the current b-tree
> format. As far as I know, there is no code that we can use, which
> supports unlimited and writable snapshot.

So use different format --- we in RedHat plan redesigning it too. One of
the needed features is "rolling snapshots" --- i.e. you take snapshot
every 5 minutes or so and you keep them around. The result is that you
have complete history of the system activity.

And this 64-snapshot limitation would not allow this. The problem if we
use this format is that we will spend a lot of time developing and
finalizing it --- and then a requirement for rolling snapshots comes ---
and we'll have to throw it away and start from scratch. So I'd rather do
b-tree without limitation on number of snapshots from the beginning.

Another good thing would be the ability to compress several consecutive
chunks into one b-tree entry. But I think with multiple snapshots, there
is no clean way how to do it. Maybe design it without this possibility,
and then use some dirty hack to compress consecutive chunks in most common
cases (such as for example when no one writes to the snapshots).

> > - do some cache for metadata, don't read the b-tree from the root node
> > from disk all the time.
>
> The current code already does.

I see. That GFP_NOFS allocation shouldn't be there, because
- it is not reliable
- it can recurse back into block writing via swapper (use GFP_NOIO to
avoid that)

The correct solution would be to preallocate one or more buffers in the
target constructor. When running, get additional buffers with GFP_NOIO,
but if that fails, use the preallocated buffer. --- this way it can handle
temporary memory shortage without data corruption.

I'll write some generic code for that caching, I think it could be useful
even for other targets, so it'd be best to write it into main dm module.

> > Ideally the cache should be integrated with page
> > cache so that its size would tune automatically (I'm not sure if it's
> > possible to cleanly code it, though).
>
> Agreed. The current code invents the own cache code. I don't like it
> but there is no other option.

Yes. Theoretically you can create your own address_space_operations and
try to integrate it into memory management. Practically, it's hard to say
if it will work (and if it will be maintainable as memory management
changes).

> > - the b-tree is good structure, I'd create log-structured filesystem to
> > hold the b-tree. The advantage is that it will require less
> > synchronization overhead in clustering. Also, log-structured filesystem
> > will bring you crash recovery (with minimum coding overhead) and it has
> > very good write performance.
>
> A log-structured filesystem is pretty complex. Even though we don't
> need a complete log-structured filesystem, it's still too complex,
> IMO.

I think it's not really harder than journaling. Maybe it's even easier,
because in journaling you have replay code that is very hard to test and
debug (ext3 had some replay bug even recently). In log-structured
filesystem there is no replay code, it is always consistent.

(I obviously don't mean to develop the whole filesystem for that --- just
use the main idea that you write always forward into unallocated space)

+ good for performance, majority of operations are writes
+ doesn't need cache-synchronization for cluster
+ can be simultaneously read by more cluster nodes and written by one
cluster node (all other formats require read:write exclusion)

> A copy-on-Write manner to update the b-tree on disk (as some of the
> latest file systems do) is a possible option.

That is what I mean. When we modify a node, one possibility is to write
b-tree blocks back to the root to unallocated space. The other possibility
is to write just one block to new space and mark it in superblock as
"redirected" from the old location. When the array of redirected blocks
fills up, write all b-tree blocks up to the root and erase the array of
redirected blocks (this will improve performance because you don't have to
write the full path up to root on every block update).

Another question is, where the superblock should be located. Just one
superblock at the beginning would be bad for disk seeks, maybe have
superblock at each disk track (approximately ... we don't know where the
tracks are), use some sequence counter to tell which one is the newest,
and write to the one that is near to the data.

> Another option is using journaling as I wrote.
>
> > - deleting the snapshot --- this needs to walk the whole b-tree --- it is
> > slow. Keeping another b-tree of chunks belonging to the given snapshot
> > would be overkill. I think the best solution would be to split the device
> > into large areas and use per-snapshot bitmap that says if the snapshot has
> > some exceptions allocated in the pertaining area (similar to the
> > dirty-bitmap of raid1). For short lived snapshots this will save walking
> > the b-tree. For long-lived snapshots there is no help to speed it up...
> > But delete performance is not that critical anyway because deleting can be
> > done asynchronously without user waiting for it.
>
> Yeah, it would be nice to delete a snapshot really quickly but it's
> not a must.

Mikulas
^ permalink raw reply [flat|nested] 20+ messages in thread
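The superblock placement idea in the message above (several superblock copies, a sequence counter to find the newest, and writing to the copy nearest the data) can be modelled as follows. This is a hypothetical Python sketch, not code from any implementation: superblock copies are plain dictionaries, locations are abstract block numbers, and the field names are invented for the example. A real format would also need a way to detect a torn or corrupt copy, which is left out here.

```python
def newest(copies):
    """Return the superblock copy with the highest sequence counter."""
    return max(copies, key=lambda c: c["seq"])


def nearest(copies, data_loc):
    """Return the copy whose location is closest to the data just written."""
    return min(copies, key=lambda c: abs(c["loc"] - data_loc))


def commit(copies, data_loc, root):
    """Record a new b-tree root in the copy nearest to `data_loc`.

    The sequence counter is taken from the newest copy and incremented, so
    the updated copy becomes the one that a later scan selects.
    """
    target = nearest(copies, data_loc)
    target["seq"] = newest(copies)["seq"] + 1
    target["root"] = root
    return target
```

At start-up, a scan of all copies followed by `newest()` identifies the current root without assuming a fixed superblock location.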
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-11 22:12 ` Mikulas Patocka
@ 2008-08-11 23:34 ` FUJITA Tomonori
2008-08-12 0:15 ` Steve VanDeBogart
2008-08-14 0:14 ` Daniel Phillips
0 siblings, 2 replies; 20+ messages in thread
From: FUJITA Tomonori @ 2008-08-11 23:34 UTC (permalink / raw)
To: dm-devel; +Cc: agk

On Mon, 11 Aug 2008 18:12:08 -0400 (EDT)
Mikulas Patocka <mpatocka@redhat.com> wrote:

> > > - drop that limitation on maximum 64 snapshots. If we are going to
> > > redesign it, we should design it without such a limit, so that we wouldn't
> > > have to redesign it again (why we need more than 64 --- for example to
> > > take periodic snapshots every few minutes to record system activity). The
> > > limit on number of snapshots can be dropped if we index b-tree nodes by a
> > > key that contains chunk number and range of snapshot numbers where this
> > > applies.
> >
> > Unfortunately it's the limitation of the current b-tree
> > format. As far as I know, there is no code that we can use, which
> > supports unlimited and writable snapshot.
>
> So use different format --- we in RedHat plan redesigning it too. One of
> the needed features is "rolling snapshots" --- i.e. you take snapshot
> every 5 minutes or so and you keep them around. The result is that you
> have complete history of the system activity.

I think that implementing a better format is far more difficult than
you think. For example, see the tux3 vs. HAMMER discussion between
Daniel Phillips and Matthew Dillon.

Unless Alasdair tells me that unlimited snapshots is a must, probably
I will not work on it. I'm focusing on integrating a snapshot feature
into dm cleanly.

Of course, I'm happy to use the better snapshot code if it's
available.

> And this 64-snapshot limitation would not allow this. The problem if we
> use this format is that we will spend a lot of time developing and
> finalizing it --- and then a requirement for rolling snapshots comes ---
> and we'll have to throw it away and start from scratch. So I'd rather do
> b-tree without limitation on number of snapshots from the beginning.

The advantage of taking the snapshot code from Zumastor is that it has
worked for a while. I don't expect much effort to stabilize the
snapshot code. The main issue here is how to integrate it into dm
nicely.

I think that we have the version number in the super block to handle
better snapshot formats.

> Another good thing would be the ability to compress several consecutive
> chunks into one b-tree entry. But I think with multiple snapshots, there
> is no clean way how to do it. Maybe design it without this possibility,
> and then use some dirty hack to compress consecutive chunks in most common
> cases (such as for example when no one writes to the snapshots).
>
> > > - do some cache for metadata, don't read the b-tree from the root node
> > > from disk all the time.
> >
> > The current code already does.
>
> I see. That GFP_NOFS allocation shouldn't be there, because
> - it is not reliable
> - it can recurse back into block writing via swapper (use GFP_NOIO to
> avoid that)
>
> The correct solution would be to preallocate one or more buffers in the
> target constructor. When running, get additional buffers with GFP_NOIO,
> but if that fails, use the preallocated buffer. --- this way it can handle
> temporary memory shortage without data corruption.
>
> I'll write some generic code for that caching, I think it could be useful
> even for other targets, so it'd be best to write it into main dm module.

I'm not sure that other dm targets need such a feature but I'm happy to
use it if it is provided. Next time, I'll submit this feature as a
separate patch.

> > > - the b-tree is good structure, I'd create log-structured filesystem to
> > > hold the b-tree. The advantage is that it will require less
> > > synchronization overhead in clustering. Also, log-structured filesystem
> > > will bring you crash recovery (with minimum coding overhead) and it has
> > > very good write performance.
> >
> > A log-structured filesystem is pretty complex. Even though we don't
> > need a complete log-structured filesystem, it's still too complex,
> > IMO.
>
> I think it's not really harder than journaling. Maybe it's even easier,
> because in journaling you have replay code that is very hard to test and
> debug (ext3 had some replay bug even recently). In log-structured
> filesystem there is no replay code, it is always consistent.
>
> (I obviously don't mean to develop the whole filesystem for that --- just
> use the main idea that you write always forward into unallocated space)
>
> + good for performance, majority of operations are writes
> + doesn't need cache-synchronization for cluster
> + can be simultaneously read by more cluster nodes and written by one
> cluster node (all other formats require read:write exclusion)

A log-structured file system is much more difficult than
journaling. And it's not as good as it looks. If log-structured file
systems were really that nice, we would have tons of them. In reality,
we don't. AFAIK, no widely-used operating system (such as Linux, *BSD,
Solaris, Windows, etc) uses a log-structured file system as its
default file system.

> > A copy-on-Write manner to update the b-tree on disk (as some of the
> > latest file systems do) is a possible option.
>
> That is what I mean.

Then, I don't think you are talking about a log-structured file
system. In general, we don't classify a copy-on-write file system like
ZFS as a log-structured file system.

> When we modify a node, one possibility is to write
> b-tree blocks back to the root to unallocated space. The other possibility
> is to write just one block to new space and mark it in superblock as
> "redirected" from the old location. When the array of redirected blocks
> fills up, write all b-tree blocks up to the root and erase the array of
> redirected blocks (this will improve performance because you don't have to
> write the full path up to root on every block update).
>
> Another question is, where the superblock should be located. Just one
> superblock at the beginning would be bad for disk seeks, maybe have
> superblock at each disk track (approximately ... we don't know where the
> tracks are), use some sequence counter to tell which one is the newest,
> and write to the one that is near to the data.
>
> > Another option is using journaling as I wrote.
> >
> > > - deleting the snapshot --- this needs to walk the whole b-tree --- it is
> > > slow. Keeping another b-tree of chunks belonging to the given snapshot
> > > would be overkill. I think the best solution would be to split the device
> > > into large areas and use per-snapshot bitmap that says if the snapshot has
> > > some exceptions allocated in the pertaining area (similar to the
> > > dirty-bitmap of raid1). For short lived snapshots this will save walking
> > > the b-tree. For long-lived snapshots there is no help to speed it up...
> > > But delete performance is not that critical anyway because deleting can be
> > > done asynchronously without user waiting for it.
> >
> > Yeah, it would be nice to delete a snapshot really quickly but it's
> > not a must.
>
> Mikulas
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
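The buffer scheme proposed in the quoted text above (preallocate in the target constructor, try a normal allocation at run time, fall back to the reserve on failure) follows a pattern that can be shown in a small model. This is a hypothetical Python sketch and not kernel code: the `allocate` callable stands in for a GFP_NOIO allocation that may return `None`, and all names are invented for the example.

```python
class FallbackBufferPool:
    """Buffer source that survives temporary allocation failure.

    `allocate` models a run-time allocation that may fail and return None.
    A fixed number of buffers is reserved up front, as a target
    constructor would do, and handed out only when `allocate` fails.
    """

    def __init__(self, allocate, size, reserved=1):
        self._allocate = allocate
        self._free_reserve = [bytearray(size) for _ in range(reserved)]
        # Remember which buffers belong to the reserve so put() can tell.
        self._reserve_ids = {id(b) for b in self._free_reserve}

    def get(self):
        buf = self._allocate()
        if buf is not None:
            return buf
        if self._free_reserve:
            return self._free_reserve.pop()
        return None  # caller has to wait until a reserved buffer is returned

    def put(self, buf):
        if id(buf) in self._reserve_ids:
            self._free_reserve.append(buf)
        # Buffers from the run-time allocation are simply dropped.
```

The point of the reserve is forward progress: as long as every holder eventually calls `put()`, at least one request can always be served even when the run-time allocation keeps failing.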
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-11 23:34 ` FUJITA Tomonori
@ 2008-08-12 0:15 ` Steve VanDeBogart
2008-08-12 12:30 ` Daniel Phillips
2008-08-14 0:14 ` Daniel Phillips
1 sibling, 1 reply; 20+ messages in thread
From: Steve VanDeBogart @ 2008-08-12 0:15 UTC (permalink / raw)
To: device-mapper development; +Cc: agk

On Tue, 12 Aug 2008, FUJITA Tomonori wrote:
> On Mon, 11 Aug 2008 18:12:08 -0400 (EDT) Mikulas Patocka <mpatocka@redhat.com> wrote:
>> - drop that limitation on maximum 64 snapshots. If we are going to
>> redesign it, we should design it without such a limit, so that we wouldn't
>> have to redesign it again (why we need more than 64 --- for example to
>> take periodic snapshots every few minutes to record system activity). The
>> limit on number of snapshots can be dropped if we index b-tree nodes by a
>> key that contains chunk number and range of snapshot numbers where this
>> applies.
>
> Unfortunately it's the limitation of the current b-tree
> format. As far as I know, there is no code that we can use, which
> supports unlimited and writable snapshot.

I've recently worked on the limit of 64 snapshots and the storage cost of
2x64bits per modified chunk. A btree format that fixes these two issues
is described in this post: http://lwn.net/Articles/288896/ If you have
the time / energy, I believe that this format will work well and be
simple and elegant. I can't speak for Daniel Phillips, but I suspect he
is concentrating on tux3 and not on getting this format into Zumastor.

If you don't want to implement "versioned pointers," an earlier format
change is implemented as a patch against Zumastor here:
http://groups.google.com/group/zumastor/browse_thread/thread/523ee7925add3dfc/a5d26a4b48fd8906?lnk=gst&q=#a5d26a4b48fd8906
It removes the 64 snapshot limit and reduces the meta-data storage
requirements, but does not support snapshots of snapshots. This patch has
undergone reasonable testing and can be considered beta level code.

With both of these formats, in the context of the Zumastor codebase, the
number of snapshots is limited by a requirement that all metadata about
a specific chunk fit within a single btree node. This limits the
number of snapshots to approximately a quarter the chunk size. i.e. 4k
chunks would support approximately 500 snapshots. Removing that
restriction would increase the number of supported snapshots by a factor
of eight, at which point the next restriction is encountered.

>> - deleting the snapshot --- this needs to walk the whole b-tree --- it is
>> slow. Keeping another b-tree of chunks belonging to the given snapshot
>> would be overkill. I think the best solution would be to split the device
>> into large areas and use per-snapshot bitmap that says if the snapshot has
>> some exceptions allocated in the pertaining area (similar to the
>> dirty-bitmap of raid1). For short lived snapshots this will save walking
>> the b-tree. For long-lived snapshots there is no help to speed it up...
>> But delete performance is not that critical anyway because deleting can be
>> done asynchronously without user waiting for it.

I don't know if it would be useful for your port, but there are some
patches floating around the Zumastor mailing list that implement
background delete. They're not production ready but are a good start on
the implementation.

--
Steve
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC] dm snapshot: shared exception store 2008-08-12 0:15 ` Steve VanDeBogart @ 2008-08-12 12:30 ` Daniel Phillips 0 siblings, 0 replies; 20+ messages in thread From: Daniel Phillips @ 2008-08-12 12:30 UTC (permalink / raw) To: dm-devel; +Cc: zumastor, agk Hi Steve, On Monday 11 August 2008 17:15, Steve VanDeBogart wrote: > On Tue, 12 Aug 2008, FUJITA Tomonori wrote: > > On Mon, 11 Aug 2008 18:12:08 -0400 (EDT) Mikulas Patocka <mpatocka@redhat.com> wrote: > >> - drop that limitation on maximum 64 snapshots. If we are going to > >> redesign it, we should design it without such a limit, so that we wouldn't > >> have to redesign it again (why we need more than 64 --- for example to > >> take periodic snapshots every few minutes to record system activity). The > >> limit on number of snapshots can be dropped if we index b-tree nodes by a > >> key that contains chunk number and range of snapshot numbers where this > >> applies. > > > > Unfortunately it's the limitation of the current b-tree > > format. As far as I know, there is no code that we can use, which > > supports unlimited and writable snapshot. > > I've recently worked on the limit of 64 snapshots and the storage cost of > 2x64bits per modified chunk. A btree format that fixes these two issue > is described in this post: http://lwn.net/Articles/288896/ If you have > the time / energy, I believe that this format will work well and be > simple and elegant. I can't speak for Daniel Phillips, but I suspect he > is concentrating on tux3 and not on getting this format into Zumastor. It is very much the intention to get the versioned pointer code into ddsnap. There is also this code: http://tux3.org/tux3?f=81a1dd303e2a;file=user/test/dleaf.c which implements a compressed leaf dictionary format that I believe you last saw on a whiteboard a few weeks ago. It now works pretty well, in part thanks to Shapor. The idea is to thoroughly shake out this code in Tux3 then backport to ddsnap. 
But nothing stands in the way of somebody just putting that in now. Incidentally, it did turn out to be possible to make the group entries 32 bits. Demented code to be honest, but the leaf compression is really good while the speed is roughly the same as the existing code, and it has the benefit of supporting 48 bit block numbers while the existing code only supports 32. It also has the pleasant property of most of the memmoves being zero bytes, because I got it right this time and put the leaf dictionary upside down at the top of the block instead of having the exceptions at the top. You are right that I will not be merging this code in the immediate future. Anybody who wants to take that on is more than welcome. It will not be a hard project to integrate that code and the algorithms are quite interesting. Over time, a few other pieces of Tux3 will get merged back into ddsnap, for example, the forward logging atomic update method to eliminate most of the remaining journal overhead. > With both of these formats, in the context of the Zumastor codebase, the > number of snapshots is limited by a requirement that all metadata about > a specific chunk fit within a single btree node. This limits the > number of snapshots to approximately a quarter the chunk size. i.e. 4k > chunks would support approximately 500 snapshots. One eighth the chunk size, you meant. Chunk pointers being 8 bytes, and the leaf directory overhead being insignificant by the time a block has been split down to just a single logical address. > Removing that restriction would increase the number of supported > snapshots by a factor of eight, at which point the next restriction > is encountered. I think the next restriction is the size of the version table in the superblock, which is easily overcome. 
Then the next one after that is the number of bits available in the
block pointer for the version, which can reasonably be 16 with 48 bit
block pointers, giving 2^16 user visible snapshots, which is getting
pretty close to unlimited.

Regards,

Daniel
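The limits discussed in the message above come down to simple arithmetic, which the following minimal sketch works through. It assumes 8-byte chunk pointers, a 48-bit block number and a 16-bit version, as stated in the thread. The bit layout used by `pack_pointer` (version in the high bits) and all function names are hypothetical, chosen only for illustration, and are not the ddsnap or Tux3 on-disk format.

```python
from typing import Tuple

# Figures taken from the thread; the layout itself is an assumption.
POINTER_BYTES = 8    # "Chunk pointers being 8 bytes"
BLOCK_BITS = 48      # block number width
VERSION_BITS = 16    # version width, giving 2^16 snapshots


def max_snapshots_per_leaf(chunk_size: int) -> int:
    """Upper bound when every pointer for one logical chunk must fit in one leaf."""
    return chunk_size // POINTER_BYTES


def pack_pointer(version: int, block: int) -> int:
    """Pack version and block number into 64 bits (hypothetical layout)."""
    if not 0 <= version < (1 << VERSION_BITS):
        raise ValueError("version does not fit in VERSION_BITS")
    if not 0 <= block < (1 << BLOCK_BITS):
        raise ValueError("block does not fit in BLOCK_BITS")
    return (version << BLOCK_BITS) | block


def unpack_pointer(pointer: int) -> Tuple[int, int]:
    """Inverse of pack_pointer: return (version, block)."""
    return pointer >> BLOCK_BITS, pointer & ((1 << BLOCK_BITS) - 1)
```

With a 4k chunk, `max_snapshots_per_leaf(4096)` gives 512, which matches the "approximately 500 snapshots" figure once the one-eighth correction is applied.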
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-11 23:34 ` FUJITA Tomonori
2008-08-12 0:15 ` Steve VanDeBogart
@ 2008-08-14 0:14 ` Daniel Phillips
2008-08-15 8:17 ` FUJITA Tomonori
2008-08-15 13:53 ` Ryusuke Konishi
1 sibling, 2 replies; 20+ messages in thread
From: Daniel Phillips @ 2008-08-14 0:14 UTC (permalink / raw)
To: dm-devel; +Cc: agk

On Monday 11 August 2008 16:34, FUJITA Tomonori wrote:
> On Mon, 11 Aug 2008 18:12:08 -0400 (EDT) Mikulas Patocka <mpatocka@redhat.com> wrote:
> > So use different format --- we in RedHat plan redesigning it too. One of
> > the needed features is "rolling snapshots" --- i.e. you take snapshot

Matt Dillon's Hammer for BSD has rolling snapshots, effectively
infinite snapshots on a per-fsync basis. I strongly suggest that you
think about rustling up some engineers to join a porting team, both
Red Hat and NTT.

> > every 5 minutes or so and you keep them around. The result is that you
> > have complete history of the system activity.
>
> I think that implementing a better format is far more difficult than
> you think. for example, see the tux3 vs. HAMMER discussion between
> Daniel Phillips and Matthew Dillon.
>
> Unless Alasdair tells me that unlimited snapshots is a must, probably
> I will not work on it. I'm focusing integrating a snapshot feature
> into dm cleanly.
>
> Of course, I'm happy to use the better snapshot code if it's
> available.

Very sensible. Over time it will be available, but there is a whole
lot of benefit to starting with code that is known to work.

One thing you need to keep in mind: any time you have a memory-using
daemon doing work on behalf of block IO code you need to implement
anti-deadlock measures of the kind ddsnap implements via bio-throttle.
Other ways have been proposed to solve these deadlocks, but the
bio-throttle approach is the only one that has been observed to work
reliably.

You are also welcome to port the real thing to kernel: ddsnapd.
The scope of that work would be roughly what you have already
accomplished. You would only port the part that responds to kernel
read/write requests. I could take care of designing and implementing a
kernel interface between your port and the rest of ddsnapd that does
such things as respond to control messages and generate block delta
lists.

You can optionally leave the delete code in userspace, except for the
journalling, which has to work together with the journalling that will
be triggered for origin write and snapshot read/write. Since delete can
now run in background (with my new patch) that should not be a
performance issue at all.

It is also possible to interface ddsnapd seamlessly to LVM2. That
mechanism has been built into LVM2 since forever, via dynamically
loadable modules in the LVM2 userspace support. I am not sure what
would be done about LVM2 re replication. Perhaps LVM2 can be taught to
understand that. Otherwise, I do not think it would be hard to create a
suitable out-of-band interface for replication control.

Regards,

Daniel
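The message above does not say how bio-throttle works. The following is only a minimal sketch of the general technique of bounding the number of requests in flight, so that a daemon servicing them needs a bounded amount of memory. It is not ddsnap's bio-throttle code; the `Throttle` class and its attributes are hypothetical.

```python
import threading


class Throttle:
    """Cap the number of requests in flight (illustrative only)."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0   # highest number of concurrent requests observed

    def submit(self, work):
        # The submitter blocks here while the limit is reached, instead of
        # queueing more work than the service side can hold in memory.
        self._slots.acquire()
        with self._lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        try:
            return work()
        finally:
            with self._lock:
                self.in_flight -= 1
            self._slots.release()
```

The design choice shown is that back-pressure is applied to the submitter rather than to the worker, so memory use is bounded before the request is ever accepted.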
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-14 0:14 ` Daniel Phillips
@ 2008-08-15 8:17 ` FUJITA Tomonori
2008-08-15 8:43 ` Daniel Phillips
2008-08-15 13:53 ` Ryusuke Konishi
1 sibling, 1 reply; 20+ messages in thread
From: FUJITA Tomonori @ 2008-08-15 8:17 UTC (permalink / raw)
To: phillips, dm-devel; +Cc: agk

On Wed, 13 Aug 2008 17:14:08 -0700
Daniel Phillips <phillips@phunq.net> wrote:
> requests. I could take care of designing and implementing a kernel
> interface between your port and the rest of ddsnapd that does such
> things as respond to control messages and generate block delta
> lists.

As I said at the first submission, I plan to add such features to the
new dm-snapshot code. Then we can have simple user-space code that
focus on the replication. A daemon program requests delta from the
kernel, and sends it to another daemon program on the remote
server. The daemon on the remote server asks the kernel to apply
delta.

The advantage of this approach, the above replication program can work
with any snapshot implementation, which could live in dm or file
systems like btrfs. File systems could implement the snapshot features
more efficiently than dm.

My question related with this issue is, any chance to modify
Zumastor's ddsnapd in a such way. Well, I guess, it would be better to
ask on Zumastor mailing list.
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-15 8:17 ` FUJITA Tomonori
@ 2008-08-15 8:43 ` Daniel Phillips
2008-08-15 9:25 ` FUJITA Tomonori
0 siblings, 1 reply; 20+ messages in thread
From: Daniel Phillips @ 2008-08-15 8:43 UTC (permalink / raw)
To: FUJITA Tomonori; +Cc: zumastor, dm-devel, agk

On Friday 15 August 2008 01:17, FUJITA Tomonori wrote:
> On Wed, 13 Aug 2008 17:14:08 -0700
> Daniel Phillips <phillips@phunq.net> wrote:
>
> > requests. I could take care of designing and implementing a kernel
> > interface between your port and the rest of ddsnapd that does such
> > things as respond to control messages and generate block delta
> > lists.
>
> As I said at the first submission, I plan to add such features to the
> new dm-snapshot code. Then we can have simple user-space code that
> focus on the replication.

Well, I suppose when you get it working we can always port it back to
ddsnap :-)

Ddsnap already has quite simple userspace code to do the replication,
or it would be simple if it were cleaned up a little. There is nothing
complex about this. But the kernel will have to generate the block
difference list because it needs access to the snapshot store btree to
do this.

> A daemon program requests delta from the
> kernel, and sends it to another daemon program on the remote
> server. The daemon on the remote server asks the kernel to apply
> delta.

The downstream server just writes the delta to the origin, there is no
need to ask the kernel to do this.

> The advantage of this approach, the above replication program can work
> with any snapshot implementation, which could live in dm or file
> systems like btrfs. File systems could implement the snapshot features
> more efficiently than dm.

When you replicate a volume you can just send a list of changed blocks
as ddsnap does.
This is not the case with a filesystem delta, which
has to send the changed blocks of each filesystem object logically,
along with relevant metadata such as changed permissions, ownership,
file sizes etc.

> My question related with this issue is, any chance to modify
> Zumastor's ddsnapd in a such way. Well, I guess, it would be better to
> ask on Zumastor mailing list.

CC added.

Yes, it is planned to modify ddsnap to implement a redirect on write
strategy where you essentially use a snapshot as the origin. This will
be a lot more practical after we have snapshots of snapshots using the
versioned pointer code. Versioned pointers by itself will take a few
months to go in and be stable. Things do not move awfully fast with
this storage work, I think that is some kind of tradition.

There is a lot that can still be done to improve efficiency even before
going to redirect on write. Probably another doubling of throughput is
possible by straightforward techniques such as batching up transfers
better and more improvements to the journalling code, or replacement of
the journal by a logging technique.

Regards,

Daniel
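The volume case described in the message above, where a delta is just a list of changed blocks that the downstream side writes to its origin, can be sketched as follows. This is a toy under stated assumptions: the volume is modelled as an in-memory byte string, and the changed-block list is found by brute-force comparison purely for illustration, whereas the message says the real list has to come from the snapshot store btree. The names `block_delta` and `apply_delta` are hypothetical.

```python
from typing import Dict


def block_delta(old: bytes, new: bytes, block_size: int) -> Dict[int, bytes]:
    """Return {block_number: new_contents} for every block that differs."""
    if len(old) != len(new) or len(old) % block_size:
        raise ValueError("volumes must be the same whole number of blocks")
    delta = {}
    for offset in range(0, len(new), block_size):
        block = new[offset:offset + block_size]
        if old[offset:offset + block_size] != block:
            delta[offset // block_size] = block
    return delta


def apply_delta(volume: bytearray, delta: Dict[int, bytes], block_size: int) -> None:
    """Downstream side: write each changed block over the origin in place."""
    for block_number, data in delta.items():
        start = block_number * block_size
        volume[start:start + block_size] = data
```

Applying the delta needs nothing beyond plain block writes, which is the point made in the message about not needing a kernel interface on the downstream side.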
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-15 8:43 ` Daniel Phillips
@ 2008-08-15 9:25 ` FUJITA Tomonori
2008-08-16 20:14 ` Daniel Phillips
0 siblings, 1 reply; 20+ messages in thread
From: FUJITA Tomonori @ 2008-08-15 9:25 UTC (permalink / raw)
To: dm-devel; +Cc: zumastor, agk

On Fri, 15 Aug 2008 01:43:55 -0700
Daniel Phillips <phillips@phunq.net> wrote:
> > A daemon program requests delta from the
> > kernel, and sends it to another daemon program on the remote
> > server. The daemon on the remote server asks the kernel to apply
> > delta.
>
> The downstream server just writes the delta to the origin, there is no
> need to ask the kernel to do this.
>
> > The advantage of this approach, the above replication program can work
> > with any snapshot implementation, which could live in dm or file
> > systems like btrfs. File systems could implement the snapshot features
> > more efficiently than dm.
>
> When you replicate a volume you can just send a list of changed blocks
> as ddsnap does. This is not the case with a filesystem delta, which
> has to send the changed blocks of each filesystem object logically,
> along with relevant metadata such as changed permissions, ownership,
> file sizes etc.

What I think about is...

User-space replication programs don't know anything about delta. Delta
might be a list of changed blocks, or something more complicated. So
the downstream server can't simply write the delta to the origin. It
always asks the kernel.

User-space replication programs don't care about the format of
delta. A file system can give user-space programs whatever as
delta. All a file system needs to do is applying the delta that it
gave to user space.
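The opaque-delta idea in the message above can be sketched roughly as below: the relay only moves bytes, and only the snapshot implementation produces and interprets them. Everything here is hypothetical. `ToySnapshotStore` stands in for whatever kernel-side implementation would exist, its `get_delta` and `apply_delta` methods are invented for illustration, and JSON is used as the private delta format only because it is convenient in a toy.

```python
import json


class ToySnapshotStore:
    """Stand-in for a snapshot implementation (hypothetical interface)."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})   # block name -> contents
        self.snapshots = {}

    def snapshot(self, name):
        self.snapshots[name] = dict(self.blocks)

    def get_delta(self, old, new) -> bytes:
        # The encoding is private to this implementation.
        a, b = self.snapshots[old], self.snapshots[new]
        changed = {k: v for k, v in b.items() if a.get(k) != v}
        return json.dumps(changed, sort_keys=True).encode()

    def apply_delta(self, delta: bytes) -> None:
        self.blocks.update(json.loads(delta.decode()))


def replicate(upstream, downstream, old, new) -> int:
    """User-space relay: forwards the delta without looking inside it."""
    delta = upstream.get_delta(old, new)
    downstream.apply_delta(delta)
    return len(delta)
```

Because `replicate` never parses the delta, the same relay would work with a different store that encodes its deltas differently, which is the advantage claimed in the message.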
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-15 9:25 ` FUJITA Tomonori
@ 2008-08-16 20:14 ` Daniel Phillips
0 siblings, 0 replies; 20+ messages in thread
From: Daniel Phillips @ 2008-08-16 20:14 UTC (permalink / raw)
To: zumastor; +Cc: dm-devel, agk

On Friday 15 August 2008 02:25, FUJITA Tomonori wrote:
>
> On Fri, 15 Aug 2008 01:43:55 -0700
> Daniel Phillips <phillips@phunq.net> wrote:
>
> > > A daemon program requests delta from the
> > > kernel, and sends it to another daemon program on the remote
> > > server. The daemon on the remote server asks the kernel to apply
> > > delta.
> >
> > The downstream server just writes the delta to the origin, there is no
> > need to ask the kernel to do this.
> >
> > > The advantage of this approach, the above replication program can work
> > > with any snapshot implementation, which could live in dm or file
> > > systems like btrfs. File systems could implement the snapshot features
> > > more efficiently than dm.
> >
> > When you replicate a volume you can just send a list of changed blocks
> > as ddsnap does. This is not the case with a filesystem delta, which
> > has to send the changed blocks of each filesystem object logically,
> > along with relevant metadata such as changed permissions, ownership,
> > file sizes etc.
>
> What I think about is...
>
> User-space replication programs don't know anything about delta. Delta
> might be a list of changed blocks, or something more complicated. So
> the downstream server can't simply write the delta to the origin. It
> always asks the kernel.

Just like a volume, there is no real need to ask the kernel to take
care of the details of the filesystem layout. Say your delta says
"truncate file x to y bytes, change these blocks and move it to
directory z". You can do all of that with standard filesystem calls,
no need to create a new kernel interface. In general, all the logical
elements of a filesystem that are visible to a user should be
modifiable by a user.
The only special thing the filesystem has to provide is the ability
to snapshot.

> User-space replication programs don't care about the format of
> delta. A file system can give user-space programs whatever as
> delta. All a file system needs to do is applying the delta that it
> gave to user space.

We usually try to do it the opposite way: the kernel knows as little
as possible about the format of data it transfers, while userspace
knows the rest. In this case userspace already knows everything it
needs to know, which is just the Posix filesystem interface.

Regards,

Daniel
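The example delta from the message above ("truncate file x to y bytes, change these blocks and move it to directory z") can be applied with ordinary filesystem calls, as this minimal sketch shows. The function name, its parameters and the dictionary used to describe changed blocks are assumptions made for illustration; they are not a format used by ddsnap or any filesystem. `os.pwrite` is available on POSIX systems only.

```python
import os
from typing import Dict


def apply_file_delta(path: str, new_size: int, changed_blocks: Dict[int, bytes],
                     block_size: int, new_path: str) -> None:
    """Apply a logical per-file delta using only standard filesystem calls."""
    os.truncate(path, new_size)                # "truncate file x to y bytes"
    fd = os.open(path, os.O_WRONLY)
    try:
        for block_number, data in changed_blocks.items():
            # "change these blocks": positioned write, no seek needed
            os.pwrite(fd, data, block_number * block_size)
    finally:
        os.close(fd)
    os.rename(path, new_path)                  # "move it to directory z"
```

A fuller version would also carry the metadata mentioned earlier in the thread, such as permissions and ownership, which likewise have standard calls (`os.chmod`, `os.chown`).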
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-14 0:14 ` Daniel Phillips
2008-08-15 8:17 ` FUJITA Tomonori
@ 2008-08-15 13:53 ` Ryusuke Konishi
1 sibling, 0 replies; 20+ messages in thread
From: Ryusuke Konishi @ 2008-08-15 13:53 UTC (permalink / raw)
To: device-mapper development; +Cc: agk

Hello folks,

2008/8/14 Daniel Phillips <phillips@phunq.net>:
> On Monday 11 August 2008 16:34, FUJITA Tomonori wrote:
>> On Mon, 11 Aug 2008 18:12:08 -0400 (EDT) Mikulas Patocka <mpatocka@redhat.com> wrote:
>> > So use different format --- we in RedHat plan redesigning it too. One of
>> > the needed features is "rolling snapshots" --- i.e. you take snapshot
>
> Matt Dillon's Hammer for BSD has rolling snapshots, effectively
> infinite snapshots on a per-fsync basis. I strongly suggest that you
> think about rustling up some engineers to join a porting team, both
> Red Hat and NTT.

I'm a developer of NILFS, a Linux log-structured filesystem with the
rolling snapshots (we call it continuous snapshots); NILFS can take
infinite snapshots every few seconds or on a per-fsync basis.
( It is available from http://www.nilfs.org/ )

Though it works fine and is maintained for the recent kernels, it's
too complicated due to its LFS nature (as Tomonori pointed out).
So, I am happy if we can realize continuous snapshotting filesystem
much simpler on nicely supported dm or block layer features.

We also have to keep in mind that filesystem development requires much
more energy than usual projects. It seems that a lot of development
energy is dispersed around filesystems especially for Linux.

I'm still working on NILFS to make it simple enough, but I am also
glad to cooperate with other engineers if other FS people or Red Hat
people hope for that.

With regards,
Ryusuke Konishi
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-04 8:22 [PATCH RFC] dm snapshot: shared exception store FUJITA Tomonori
2008-08-06 19:14 ` Mikulas Patocka
@ 2008-08-12 12:56 ` Daniel Phillips
2008-08-12 13:14 ` FUJITA Tomonori
2008-08-12 19:07 ` Daniel Phillips
2 siblings, 1 reply; 20+ messages in thread
From: Daniel Phillips @ 2008-08-12 12:56 UTC (permalink / raw)
To: dm-devel; +Cc: agk

Hi tomonori,

An impressive patch. You just want to use getblk for
alloc_chunk_buffer, not vmalloc. That is what the buffer cache is for,
and that is why I emulated it in userspace.

More comments later.

Regards,

Daniel
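The difference between a fresh allocation per chunk and a buffer cache lookup can be sketched as below. This is a toy model only: it is neither the kernel's getblk nor the userspace emulation mentioned in the message above, and the `BufferCache` class is hypothetical. It shows the one property that matters for the argument, namely that asking twice for the same block returns the same shared buffer.

```python
class BufferCache:
    """Toy cache keyed by block number (illustrative only)."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self._buffers = {}

    def getblk(self, block: int) -> bytearray:
        """Return the single shared buffer for this block, creating it if absent."""
        buf = self._buffers.get(block)
        if buf is None:
            buf = bytearray(self.block_size)   # zero-filled, nothing is read
            self._buffers[block] = buf
        return buf

    def cached_blocks(self) -> int:
        return len(self._buffers)
```

With a plain allocator, two callers working on the same metadata chunk would each get a private copy and could diverge; with the cache they see each other's changes.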
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-12 12:56 ` Daniel Phillips
@ 2008-08-12 13:14 ` FUJITA Tomonori
2008-08-12 19:00 ` Daniel Phillips
0 siblings, 1 reply; 20+ messages in thread
From: FUJITA Tomonori @ 2008-08-12 13:14 UTC (permalink / raw)
To: dm-devel; +Cc: agk

On Tue, 12 Aug 2008 05:56:57 -0700
Daniel Phillips <phillips@phunq.net> wrote:
> Hi tomonori,
>
> An impressive patch.

Thanks, and thanks a lot for the nice snapshot code.

> You just want to use getblk for alloc_chunk_buffer,
> not vmalloc.

I think that it means that we cache all the chunks, both btree chunks
and the data chunks (which are passed to the upper layer such as file
systems). I think that we don't want cache the latter in dm.

> That is what the buffer cache is for, and that is why I
> emulated it in userspace.
>
> More comments later.

Thanks. I'll post a new patchset in a more reasonable format shortly
(tomorrow or day after tomorrow, hopefully).
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-12 13:14 ` FUJITA Tomonori
@ 2008-08-12 19:00 ` Daniel Phillips
2008-08-12 23:24 ` FUJITA Tomonori
0 siblings, 1 reply; 20+ messages in thread
From: Daniel Phillips @ 2008-08-12 19:00 UTC (permalink / raw)
To: FUJITA Tomonori; +Cc: dm-devel, agk

On Tuesday 12 August 2008 06:14, FUJITA Tomonori wrote:
> > You just want to use getblk for alloc_chunk_buffer,
> > not vmalloc.
>
> I think that it means that we cache all the chunks, both btree chunks
> and the data chunks (which are passed to the upper layer such as file
> systems). I think that we don't want cache the latter in dm.

That is true. However your code should not be reading data chunks into
memory at all. The only time the snapshot code has to read a data
chunk is when performing the copy from origin to snapshot store in
make_unique. Your code does not directly perform this task as far as I
can see. That would be done in a part of the dm snapshot code your
patch does not touch, which I seem to recall uses the kcopyd mechanism.

Regards,

Daniel
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-12 19:00 ` Daniel Phillips
@ 2008-08-12 23:24 ` FUJITA Tomonori
2008-08-12 23:29 ` FUJITA Tomonori
0 siblings, 1 reply; 20+ messages in thread
From: FUJITA Tomonori @ 2008-08-12 23:24 UTC (permalink / raw)
To: dm-devel; +Cc: agk

On Tue, 12 Aug 2008 12:00:36 -0700
Daniel Phillips <phillips@phunq.net> wrote:
> On Tuesday 12 August 2008 06:14, FUJITA Tomonori wrote:
> > > You just want to use getblk for alloc_chunk_buffer,
> > > not vmalloc.
> >
> > I think that it means that we cache all the chunks, both btree chunks
> > and the data chunks (which are passed to the upper layer such as file
> > systems). I think that we don't want cache the latter in dm.
>
> That is true. However your code should not be reading data chunks into
> memory at all. The only time the snapshot code has to read a data
> chunk is when performing the copy from origin to snapshot store in
> make_unique. Your code does not directly perform this task as far as I
> can see. That would be done in a part of the dm snapshot code your
> patch does not touch, which I seem to recall uses the kcopyd mechanism.

Yes, dm-snapshot doesn't access to data chunks. However, dm does for
dm-snapshot (by using submit_bio friends). So if dm-snapshot uses
__getblk, we use both __getblk and submit_bio with a single device at
the same time. I think that it's the right thing.
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-12 23:24 ` FUJITA Tomonori
@ 2008-08-12 23:29 ` FUJITA Tomonori
2008-08-13 0:28 ` Daniel Phillips
0 siblings, 1 reply; 20+ messages in thread
From: FUJITA Tomonori @ 2008-08-12 23:29 UTC (permalink / raw)
To: dm-devel; +Cc: agk

On Wed, 13 Aug 2008 08:24:41 +0900
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
> On Tue, 12 Aug 2008 12:00:36 -0700
> Daniel Phillips <phillips@phunq.net> wrote:
>
> > On Tuesday 12 August 2008 06:14, FUJITA Tomonori wrote:
> > > > You just want to use getblk for alloc_chunk_buffer,
> > > > not vmalloc.
> > >
> > > I think that it means that we cache all the chunks, both btree chunks
> > > and the data chunks (which are passed to the upper layer such as file
> > > systems). I think that we don't want cache the latter in dm.
> >
> > That is true. However your code should not be reading data chunks into
> > memory at all. The only time the snapshot code has to read a data
> > chunk is when performing the copy from origin to snapshot store in
> > make_unique. Your code does not directly perform this task as far as I
> > can see. That would be done in a part of the dm snapshot code your
> > patch does not touch, which I seem to recall uses the kcopyd mechanism.
>
> Yes, dm-snapshot doesn't access to data chunks. However, dm does for
> dm-snapshot (by using submit_bio friends). So if dm-snapshot uses
> __getblk, we use both __getblk and submit_bio with a single device at
> the same time. I think that it's the right thing.

Oops, I meant, I don't think that it's the right thing.
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-12 23:29 ` FUJITA Tomonori
@ 2008-08-13 0:28 ` Daniel Phillips
0 siblings, 0 replies; 20+ messages in thread
From: Daniel Phillips @ 2008-08-13 0:28 UTC (permalink / raw)
To: FUJITA Tomonori; +Cc: dm-devel, agk

On Tuesday 12 August 2008 16:29, FUJITA Tomonori wrote:
> On Wed, 13 Aug 2008 08:24:41 +0900
> FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
>
> > On Tue, 12 Aug 2008 12:00:36 -0700
> > Daniel Phillips <phillips@phunq.net> wrote:
> >
> > > On Tuesday 12 August 2008 06:14, FUJITA Tomonori wrote:
> > > > > You just want to use getblk for alloc_chunk_buffer,
> > > > > not vmalloc.
> > > >
> > > > I think that it means that we cache all the chunks, both btree chunks
> > > > and the data chunks (which are passed to the upper layer such as file
> > > > systems). I think that we don't want cache the latter in dm.
> > >
> > > That is true. However your code should not be reading data chunks into
> > > memory at all. The only time the snapshot code has to read a data
> > > chunk is when performing the copy from origin to snapshot store in
> > > make_unique. Your code does not directly perform this task as far as I
> > > can see. That would be done in a part of the dm snapshot code your
> > > patch does not touch, which I seem to recall uses the kcopyd mechanism.
> >
> > Yes, dm-snapshot doesn't access to data chunks. However, dm does for
> > dm-snapshot (by using submit_bio friends). So if dm-snapshot uses
> > __getblk, we use both __getblk and submit_bio with a single device at
> > the same time. I think that it's the right thing.
>
> Oops, I meant, I don't think that it's the right thing.

Sure, it is perfectly ok.

Regards,

Daniel
* Re: [PATCH RFC] dm snapshot: shared exception store
2008-08-04 8:22 [PATCH RFC] dm snapshot: shared exception store FUJITA Tomonori
2008-08-06 19:14 ` Mikulas Patocka
2008-08-12 12:56 ` Daniel Phillips
@ 2008-08-12 19:07 ` Daniel Phillips
2 siblings, 0 replies; 20+ messages in thread
From: Daniel Phillips @ 2008-08-12 19:07 UTC (permalink / raw)
To: dm-devel; +Cc: agk

On Monday 04 August 2008 01:22, FUJITA Tomonori wrote:
> Note that the code is still in a very early stage. There are lots of
> TODO items:
>
> ...
>
> - applying a delta to in a atomic manner

This is not necessary because of the way the ddsnap replication
algorithm works. A delta is always written to the downstream origin,
and nothing else accesses the downstream origin while that is taking
place.

Regards,

Daniel
end of thread, other threads:[~2008-08-16 20:14 UTC | newest]
Thread overview: 20+ messages
2008-08-04 8:22 [PATCH RFC] dm snapshot: shared exception store FUJITA Tomonori
2008-08-06 19:14 ` Mikulas Patocka
2008-08-09 5:01 ` FUJITA Tomonori
2008-08-11 22:12 ` Mikulas Patocka
2008-08-11 23:34 ` FUJITA Tomonori
2008-08-12 0:15 ` Steve VanDeBogart
2008-08-12 12:30 ` Daniel Phillips
2008-08-14 0:14 ` Daniel Phillips
2008-08-15 8:17 ` FUJITA Tomonori
2008-08-15 8:43 ` Daniel Phillips
2008-08-15 9:25 ` FUJITA Tomonori
2008-08-16 20:14 ` Daniel Phillips
2008-08-15 13:53 ` Ryusuke Konishi
2008-08-12 12:56 ` Daniel Phillips
2008-08-12 13:14 ` FUJITA Tomonori
2008-08-12 19:00 ` Daniel Phillips
2008-08-12 23:24 ` FUJITA Tomonori
2008-08-12 23:29 ` FUJITA Tomonori
2008-08-13 0:28 ` Daniel Phillips
2008-08-12 19:07 ` Daniel Phillips