* Fwd: block level cow operation
  2013-04-09  9:05 UTC
  From: Prashant Shah
  To: linux-ext4

Hi,

I am trying to implement a copy-on-write operation by reading the
original disk block, writing it to some other location, and only then
allowing the write to pass through (the write operation is blocked
until the read of the original block completes). I tried using
submit_bio() / sb_bread() to read the block and the completion API to
signal the end of the read, but the performance of this is very bad:
disk writes take around 12 times longer. Is there any better way to
improve the performance?

Not waiting for the completion of the read operation and letting the
disk write go through gives good performance, but in under 10% of the
cases the read happens after the write and ends up with the new data
and not the original data.

Regards.
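For reference, a minimal sketch of the synchronous scheme described
above, assuming a 2013-era kernel API (two-argument submit_bio(rw, bio)
and two-argument bi_end_io); the helper name read_orig_block_sync is
hypothetical. Every write stalls behind a full disk read here, which is
consistent with the slowdown reported:

    #include <linux/bio.h>
    #include <linux/blkdev.h>
    #include <linux/completion.h>

    static void read_done(struct bio *bio, int error)
    {
            /* Wake whoever is waiting in read_orig_block_sync(). */
            complete((struct completion *)bio->bi_private);
    }

    /* Read the original block and wait for it; the caller holds up the
     * incoming write until this returns, serializing every write
     * behind a disk read. */
    static int read_orig_block_sync(struct block_device *bdev,
                                    sector_t sector, struct page *page)
    {
            DECLARE_COMPLETION_ONSTACK(done);
            struct bio *bio = bio_alloc(GFP_NOIO, 1);

            bio->bi_bdev = bdev;
            bio->bi_sector = sector;
            bio_add_page(bio, page, PAGE_SIZE, 0);
            bio->bi_private = &done;
            bio->bi_end_io = read_done;

            submit_bio(READ, bio);
            wait_for_completion(&done);
            bio_put(bio);
            return 0;
    }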
* Re: Fwd: block level cow operation
  2013-04-09  9:56 UTC
  From: Lukáš Czerner
  To: Prashant Shah; Cc: linux-ext4

On Tue, 9 Apr 2013, Prashant Shah wrote:

> Hi,
>
> I am trying to implement copy on write operation

Hi,

In ext4? Why are you trying to do that?

> by reading the original disk block and writing it to some other
> location and then allowing the write to pass through (block the
> write operation till the read of the original block completes) I
> tried using submit_bio() / sb_bread() to read the block and using
> the completion API to signal the end of reading the block but the
> performance of this is very bad. It takes around 12 times more time
> for any disk writes. Is there any better way to improve the
> performance?

I am not sure what you're trying to achieve here, but the simplest
answer is yes, there is a way to improve the performance - use the
device mapper to do this. The thin provisioning (thinp) target gives
you block-level COW functionality, which lets you do snapshots
efficiently, for example.

-Lukas

> Not waiting for the completion of the read operation and letting the
> disk write go through gives good performance but under 10% of the
> cases the read happens after the write and ends up with the new data
> and not the original data.
>
> Regards.
* Re: Fwd: block level cow operation
  2013-04-09 14:46 UTC
  From: Dmitry Monakhov
  To: Prashant Shah, linux-ext4

On Tue, 9 Apr 2013 14:35:56 +0530, Prashant Shah
<pshah.mumbai@gmail.com> wrote:
> Hi,
>
> I am trying to implement copy on write operation by reading the
> original disk block and writing it to some other location and then
> allowing the write to pass through (block the write operation till
> the read of the original block completes) I tried using submit_bio()
> / sb_bread() to read the block and using the completion API to
> signal the end of reading the block but the performance of this is
> very bad. It takes around 12 times more time for any disk writes.
> Is there any better way to improve the performance?

Yes, obviously. Instead of synchronous block-by-block handling, which
gives about ~1-3MB/s, you should not block bio/request handling, but
simply defer the original bio. Something like this:

    OUR_MAIN_ENTRY_POINT {
            if (bio->bi_rw == WRITE && cow_required(bio)) {
                    /* Save the original content first; the original
                     * write is resubmitted from cow_end_io(). */
                    cow_bio = create_cow_bio(bio);
                    submit_bio(READ, cow_bio);
                    return;
            }
            /* COW is not required */
            submit_bio(bio->bi_rw, bio);
    }

    create_cow_bio(struct bio *bio)
    {
            /* Build a read bio covering the same sectors as the
             * incoming write; once it completes we will issue the
             * original bio. */
            cow_bio = bio_alloc(...);
            cow_bio->bi_sector = bio->bi_sector;
            ...
            cow_bio->bi_private = bio;
            cow_bio->bi_end_io = cow_end_io;
            return cow_bio;
    }

    cow_end_io(struct bio *cow_bio, int error)
    {
            /* Once we are done saving the original content we may
             * send the original bio.  But end_io may be called from
             * various contexts, even interrupt context, so we are not
             * allowed to call submit_bio() here.  Put the original
             * bio on a list and let our worker thread submit it for
             * us later. */
            add_bio_to_the_list((struct bio *)cow_bio->bi_private);
    }

This approach gives us reasonable performance, roughly 3 times slower
than raw disk throughput. For a reference implementation you may look
at drivers/md/dm-snap or at the Acronis snapapi module (AFAIR it is
open source).

> Not waiting for the completion of the read operation and letting the
> disk write go through gives good performance but under 10% of the
> cases the read happens after the write and ends up with the new data
> and not the original data.

Noooo, never do that. The block layer will not guarantee you any
ordering.
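The end_io comment above leaves the worker thread itself to the
reader. A minimal sketch of that deferred-submission path, assuming a
2013-era kernel API (submit_bio(rw, bio) and the bio_list helpers from
<linux/bio.h>); the cow_deferred_* names are hypothetical:

    #include <linux/bio.h>
    #include <linux/spinlock.h>
    #include <linux/workqueue.h>

    static void cow_deferred_fn(struct work_struct *work);

    static struct bio_list cow_deferred_bios = BIO_EMPTY_LIST;
    static DEFINE_SPINLOCK(cow_deferred_lock);
    static DECLARE_WORK(cow_deferred_work, cow_deferred_fn);

    /* Called from cow_end_io(); must be safe in interrupt context. */
    static void add_bio_to_the_list(struct bio *bio)
    {
            unsigned long flags;

            spin_lock_irqsave(&cow_deferred_lock, flags);
            bio_list_add(&cow_deferred_bios, bio);
            spin_unlock_irqrestore(&cow_deferred_lock, flags);

            schedule_work(&cow_deferred_work);
    }

    /* Runs in process context, where submit_bio() is allowed. */
    static void cow_deferred_fn(struct work_struct *work)
    {
            struct bio *bio;
            unsigned long flags;

            for (;;) {
                    spin_lock_irqsave(&cow_deferred_lock, flags);
                    bio = bio_list_pop(&cow_deferred_bios);
                    spin_unlock_irqrestore(&cow_deferred_lock, flags);
                    if (!bio)
                            break;
                    submit_bio(bio->bi_rw, bio);
            }
    }

bio_list_add() is cheap and is called under a spinlock with interrupts
disabled, so cow_end_io() may safely invoke it from interrupt context,
while the actual submit_bio() happens later in the work item.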
* Re: Fwd: block level cow operation
  2013-04-25 13:00 UTC
  From: Prashant Shah
  To: Dmitry Monakhov; Cc: linux-ext4

Hi,

On Tue, Apr 9, 2013 at 8:16 PM, Dmitry Monakhov <dmonakhov@openvz.org>
wrote:
>
> you should not block bio/request handling, but simply defer the
> original bio. Something like this:
> ...
> This approach gives us reasonable performance, roughly 3 times
> slower than raw disk throughput. For a reference implementation you
> may look at drivers/md/dm-snap or at the Acronis snapapi module
> (AFAIR it is open source).

Thanks. That is what I was looking for. I got the reference code from
the snapapi module, which is open source. It is not specific to any
filesystem.

Regards.
* Re: Fwd: block level cow operation
  2013-05-10 13:14 UTC
  From: Prashant Shah
  To: Dmitry Monakhov; Cc: linux-ext4

Hi,

On Thu, Apr 25, 2013 at 6:30 PM, Prashant Shah
<pshah.mumbai@gmail.com> wrote:
> On Tue, Apr 9, 2013 at 8:16 PM, Dmitry Monakhov
> <dmonakhov@openvz.org> wrote:
>>
>> you should not block bio/request handling, but simply defer the
>> original bio. Something like this:
>> ...

Is this scenario possible? A write bio (bio1) for a particular sector
is under COW and waiting for the read of the original block to
complete. At the same time another write bio (bio2) arrives for the
same sector. The original order is bio1 then bio2, but since bio1 is
delayed by the COW, the order that reaches the queue becomes bio2
followed by bio1, which would make bio1 the final on-disk write.

Regards.
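As Dmitry noted earlier, the block layer does not guarantee ordering,
so the reordering described above is possible unless the driver
serializes it itself. A hypothetical sketch of one way to do that,
reusing the cow_deferred_* names from the earlier sketch: keep a table
of sectors whose COW read is still in flight, and park any new write
that hits one of them behind that COW (this mirrors what dm-snap's
pending-exception hash does):

    #include <linux/bio.h>
    #include <linux/list.h>
    #include <linux/spinlock.h>

    struct pending_cow {
            struct list_head entry;
            sector_t sector;
            struct bio_list waiters;  /* later writes to this sector */
    };

    static LIST_HEAD(pending_cows);
    static DEFINE_SPINLOCK(pending_lock);

    /* Called from the main entry point before a write is allowed
     * through. If the sector is still being copied, queue the bio
     * behind the COW instead of submitting it, so bio2 cannot
     * overtake bio1. */
    static bool defer_if_cow_pending(struct bio *bio)
    {
            struct pending_cow *pc;
            bool deferred = false;

            spin_lock(&pending_lock);
            list_for_each_entry(pc, &pending_cows, entry) {
                    if (pc->sector == bio->bi_sector) {
                            bio_list_add(&pc->waiters, bio);
                            deferred = true;
                            break;
                    }
            }
            spin_unlock(&pending_lock);
            return deferred;
    }

In this scheme, cow_end_io() would remove the pending_cow entry and
splice its waiters onto the deferred list after the original bio,
preserving the submission order for that sector.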
* Re: Fwd: block level cow operation
  2013-04-09 21:02 UTC
  From: Theodore Ts'o
  To: Prashant Shah; Cc: linux-ext4

On Tue, Apr 09, 2013 at 02:35:56PM +0530, Prashant Shah wrote:
> I am trying to implement copy on write operation by reading the
> original disk block and writing it to some other location....

Lukas asked the correct first question, which is: why are you trying
to do this?

If the goal is to make COW snapshots, then there's a lot of accounting
information that you'll need to keep track of, and it is very doubtful
ext4 will be the right place to do things.

If the goal is to do efficient writes into cheap eMMC flash for random
write workloads (i.e., the same problem f2fs is trying to solve), it's
not totally insane to try to adapt ext4 to handle this problem.

#1 You'd need to add support to mballoc so it understands how to
   align its block writes on eMMC erase-block boundaries, and has a
   mode where it hands out sequentially increasing physical blocks,
   ignoring the logical block numbers.

#2 You'd need to intercept the write requests at the writepages() and
   writepage() calls, and that's where the decision would have to be
   made to allocate a new set of block numbers, based on some flag
   set either per filesystem or per open file. As part of the I/O
   completion callback, where today we have code paths that convert
   an uninitialized extent to an initialized extent, we could teach
   that code path to update the logical block mapping.

#3 You'd have to come up with some approach to deal with direct I/O
   (including potentially not supporting COW writes for DIO).

#4 You'd probably only want to do this for indirect-block-mapped
   files, since for a random write workload the extent tree would
   become very inefficient very quickly.

So it's not _insane_, but it's a huge amount of work, it would be
very tricky, and it's not something I would recommend, say, as a term
project for a student.

It would also not be faster on SSDs or HDDs. The only reason to do
something like this would be to deal with the extremely low-cost FTL
of cheap eMMC flash devices (where the BOM cost of eMMC is roughly
two orders of magnitude cheaper than SSDs). So if you are
benchmarking this on an HDD or SSD, don't be surprised if it's much
slower. And if you are benchmarking on eMMC, you have to make sure
that your writes are appropriately erase-block aligned, or any
performance gains would be hopeless.

Regards,

					- Ted
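To make the alignment point in #1 and the closing paragraph concrete,
a small sketch of the arithmetic, assuming a power-of-two erase block;
the 4 MiB figure is an assumed example, not a measured eMMC parameter:

    /* Hypothetical: 4 MiB erase blocks, 4 KiB filesystem blocks,
     * so one erase block spans 1024 fs blocks. */
    #define ERASE_BLOCK_FS_BLOCKS   ((4096 * 1024) / 4096)

    /* Round a physical block number up to the next erase-block
     * boundary so a sequential write burst starts aligned. */
    static inline unsigned long align_to_erase_block(unsigned long pblk)
    {
            return (pblk + ERASE_BLOCK_FS_BLOCKS - 1) &
                   ~((unsigned long)ERASE_BLOCK_FS_BLOCKS - 1);
    }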