* How to find out which pages were copied-on-write? @ 2004-07-06 15:58 Lutz Vieweg 2004-07-09 11:31 ` Robin Holt 0 siblings, 1 reply; 10+ messages in thread From: Lutz Vieweg @ 2004-07-06 15:58 UTC (permalink / raw) To: linux-kernel Hi, in an application that MAP_PRIVATEly mmap()s a file it would be quite helpful for me to find out which pages have been copied-on-write. I found that mincore() does a similar thing by reporting which pages are currently residing in physical memory, but what I want to know is which pages differ from the original file image on disk. Can you recommend a way to do that? (does not need to be portable beyond Linux) Alternatively, it would be sufficient if I could turn a private mapping into a shared one (and possibly do an msync() afterwards if I need to make sure the changes have been written out). Would such a feature need a lot of effort to implement? Yet another feature that I could use if it were available: A "copy-on-read"-mapping. There, a page would become a private copy of a process once _another_ process wrote data to the corresponding file location. But I suspect that feature could be very hard to implement... Regards, Lutz Vieweg ^ permalink raw reply [flat|nested] 10+ messages in thread
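For context, here is a minimal sketch of the situation described above, assuming Linux and a scratch file under /tmp. The path and both function names are illustrative and do not come from the thread. It shows that a write through a MAP_PRIVATE mapping leaves the file untouched, and that mincore() only reports residency: it does not say whether a resident page is a private copy.

```c
#define _DEFAULT_SOURCE /* exposes mincore(), mkstemp(), pread() under strict -std= modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages of [addr, addr + len) that mincore() reports as resident.
 * addr must be page-aligned. Returns -1 on error. */
long count_resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec;
    long resident = 0;

    assert((uintptr_t)addr % page == 0);
    vec = malloc(npages);
    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1; /* only the lowest bit of each entry is defined */
    free(vec);
    return resident;
}

/* Hypothetical demo: write through a MAP_PRIVATE mapping of a scratch file.
 * Returns 1 if the file kept its original (all-zero) contents, 0 if it
 * changed, -1 on error. */
int private_write_leaves_file_untouched(void)
{
    char path[] = "/tmp/cow-demo-XXXXXX";
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 4 * page;
    char buf[16];
    char zero[sizeof buf] = {0};
    int result = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path); /* the file lives on until fd is closed */
    if (ftruncate(fd, (off_t)len) == 0) {
        char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            memset(map + page, 'X', sizeof buf); /* copy-on-write of page 1 */
            /* mincore() sees page 1 as resident but cannot tell it was copied */
            if (count_resident_pages(map, len) >= 1 &&
                pread(fd, buf, sizeof buf, (off_t)page) == (ssize_t)sizeof buf)
                result = (memcmp(buf, zero, sizeof buf) == 0);
            munmap(map, len);
        }
    }
    close(fd);
    return result;
}
```

Calling private_write_leaves_file_untouched() from a test program is enough to exercise both functions.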
* Re: How to find out which pages were copied-on-write? 2004-07-06 15:58 How to find out which pages were copied-on-write? Lutz Vieweg @ 2004-07-09 11:31 ` Robin Holt 2004-07-09 20:42 ` Lutz Vieweg 0 siblings, 1 reply; 10+ messages in thread From: Robin Holt @ 2004-07-09 11:31 UTC (permalink / raw) To: Lutz Vieweg; +Cc: linux-kernel OK, now that I am considering this problem, I am trying to figure out what problem we are trying to solve. By reading your email, I gather that you have a single threaded application which is doing an mmap on a file as a MAP_PRIVATE mapping. The memory area is then handed to a library which may modify some pages. You want to decide after the return if you had success and thereby control the writing of the updated data back to the file. Because of the size of the file, doing a second mapping and comparing/copying pages is unreasonable and you would like to only modify the pages that have actually changed. If that is not what you are trying to do, please give me a similar description of _WHAT_ you are trying to do and not the _HOW_ you think the kernel can make this easier. On Tue, Jul 06, 2004 at 05:58:04PM +0200, Lutz Vieweg wrote: > Hi, > > in an application that MAP_PRIVATEly mmap()s a file it would > be quite helpful for me to find out which pages have been > copied-on-write. > > I found that mincore() does a similar thing by reporting which > pages are currently residing in physical memory, but what > I want to know is which pages differ from the original file > image on disk. > > Can you recommend a way to do that? (does not need to be > portable beyond Linux) > > Alternatively, it would be sufficient if I could turn > a private mapping into a shared one (and possibly do an > msync() afterwards if I need to make sure the changes > have been written out). Would such a feature need a > lot of effort to implement? > > > Yet another feature that I could use if it were available: > A "copy-on-read"-mapping. 
There, a page would become a private > copy of a process once _another_ process wrote data to the > corresponding file location. But I suspect that feature > could be very hard to implement... This is a different way of thinking of copy-on-write. I believe you are thinking of the time when there are two processes sharing the page. When one process takes the write fault, the page is copied by that process and the other process becomes the exclusive owner of the page. Thanks, Robin Holt ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: How to find out which pages were copied-on-write? 2004-07-09 11:31 ` Robin Holt @ 2004-07-09 20:42 ` Lutz Vieweg 2004-07-10 8:11 ` Michael Clark 0 siblings, 1 reply; 10+ messages in thread From: Lutz Vieweg @ 2004-07-09 20:42 UTC (permalink / raw) To: Robin Holt; +Cc: linux-kernel Robin Holt wrote: > OK, now that I am considering this problem, I am trying to figure out > what problem we are trying to solve. > > By reading your email, I gather that you have a single threaded > application which is doing an mmap on a file as a MAP_PRIVATE mapping. > The memory area is then handed to a library which may modify some pages. > You want to decide after the return if you had success and thereby > control the writing of the updated data back to the file. Because of > the size of the file, doing a second mapping and comparing/copying pages > is unreasonable and you would like to only modify the pages that have > actually changed. That's about it, the most important issue is that I want to avoid having an inconsistent file on the disk for long periods, because a) the application could crash and b) another process might want to map the same file (read-only). And since the application is reaching points where the data is consistent, while it is not in between, it would be nice to have a private mapping while it is inconsistent and commit the changes only at the points where the application knows the data is consistent. Turning a private into a shared mapping would be a perfect solution since that would mean another process could map the file at any time and find consistent data. The second best solution would be the one where the application just manually writes out the changed pages at the time of consistence, this would at least reduce the times when the data on disk is inconsistent to a minimum. >>Yet another feature that I could use if it were available: >>A "copy-on-read"-mapping. 
There, a page would become a private >>copy of a process once _another_ process wrote data to the >>corresponding file location. But I suspect that feature >>could be very hard to implement... > > This is a different way of thinking of copy-on-write. I believe you > are thinking of the time when there are two processes sharing the page. > When one process takes the write fault, the page is copied and the by that > process and the other process becomes the exclusive owner of the page. A little different: Think of N processes (N may be 8 or so) that mmap() a file using a new mode "MAP_SNAPSHOT" (which could be read-only if a mix with private copy-on-write pages was too hard to realize), and 1 process mmap()ing the same file using MAP_SHARED. Once the N processes mmap()ed the file using MAP_SNAPSHOT, their "view" of the file content would never change, that is, if the one process that mmap()ed the file with MAP_SHARED writes to a page, that page _is_ written to disk the usual way, but the other N processes get a copy of the page before it has been changed, so they will always see the same data. Once the processes that mmap()ed using MAP_SNAPSHOT unmap the file, the copies of the pages that were changed on disk are simply discarded. That would - similar to the features mentioned above - allow one process to efficiently work on portions of a huge file over a longer period of time, and only at times when the file in total contains consistent data, other processes could be instructed to mmap() them again to obtain a newer version. Regards, Lutz Vieweg ^ permalink raw reply [flat|nested] 10+ messages in thread
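The "second best solution" described above, writing out only the changed pages at a point of consistency, could look roughly like the sketch below. It assumes the application already has a per-page dirty[] array from some other mechanism. commit_dirty_pages() and commit_demo() are hypothetical names, and the commit is not atomic: a crash midway would leave a partially updated file.

```c
#define _DEFAULT_SOURCE /* exposes pwrite(), pread(), mkstemp() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Write every page flagged in dirty[] from the private mapping back to fd,
 * then fsync(). Returns the number of pages written, or -1 on error. */
long commit_dirty_pages(int fd, const char *map, const unsigned char *dirty,
                        size_t npages, size_t page_size)
{
    long written = 0;

    assert(page_size > 0);
    for (size_t i = 0; i < npages; i++) {
        size_t done = 0;

        if (!dirty[i])
            continue;
        while (done < page_size) { /* pwrite() may write less than asked */
            size_t off = i * page_size + done;
            ssize_t n = pwrite(fd, map + off, page_size - done, (off_t)off);

            if (n < 0 && errno == EINTR)
                continue;
            if (n <= 0)
                return -1;
            done += (size_t)n;
        }
        written++;
    }
    if (fsync(fd) != 0)
        return -1;
    return written;
}

/* Hypothetical demo: modify pages 1 and 3 of a 4-page scratch file through a
 * MAP_PRIVATE mapping, then commit them. Returns pages written or -1. */
long commit_demo(void)
{
    char path[] = "/tmp/commit-demo-XXXXXX";
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char dirty[4] = {0, 1, 0, 1}; /* assumed to be tracked elsewhere */
    long result = -1;
    char c = 0;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)(4 * page)) == 0) {
        char *map = mmap(NULL, 4 * page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            map[page] = 'A';
            map[3 * page] = 'B';
            result = commit_dirty_pages(fd, map, dirty, 4, page);
            /* the file itself must now contain the committed byte */
            if (result == 2 && !(pread(fd, &c, 1, (off_t)page) == 1 && c == 'A'))
                result = -1;
            munmap(map, 4 * page);
        }
    }
    close(fd);
    return result;
}
```

Other processes that map the file while commit_dirty_pages() is running can still observe a mix of old and new pages, so this narrows the inconsistent window rather than closing it.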
* Re: How to find out which pages were copied-on-write? 2004-07-09 20:42 ` Lutz Vieweg @ 2004-07-10 8:11 ` Michael Clark 2004-07-12 17:21 ` Lutz Vieweg 0 siblings, 1 reply; 10+ messages in thread From: Michael Clark @ 2004-07-10 8:11 UTC (permalink / raw) To: Lutz Vieweg; +Cc: Robin Holt, linux-kernel HPA's library LPSM sounds like what you're looking for. http://freshmeat.net/projects/lpsm/ Or you can do what you want the hard way using mprotect() and a SIGSEGV handler. ~mc On 07/10/04 04:42, Lutz Vieweg wrote: > Robin Holt wrote: > >> OK, now that I am considering this problem, I am trying to figure out >> what problem we are trying to solve. >> >> By reading your email, I gather that you have a single threaded >> application which is doing an mmap on a file as a MAP_PRIVATE mapping. >> The memory area is then handed to a library which may modify some pages. >> You want to decide after the return if you had success and thereby >> control the writing of the updated data back to the file. Because of >> the size of the file, doing a second mapping and comparing/copying pages >> is unreasonable and you would like to only modify the pages that have >> actually changed. > > > That's about it, the most important issue is that I want to avoid > having an inconsistent file on the disk for long periods, because a) > the application could crash and b) another process might want to map > the same file (read-only). And since the application is reaching points > where the data is consistent, while it is not in between, it would be nice > to have a private mapping while it is inconsistent and commit the changes > only at the points where the application knows the data is consistent. > > Turning a private into a shared mapping would be a perfect solution > since that would mean another process could map the file at any time > and find consistent data. 
The second best solution would be the > one where the application just manually writes out the changed pages > at the time of consistence, this would at least reduce the times when the > data on disk is inconsistent to a minimum. > > > >>> Yet another feature that I could use if it were available: >>> A "copy-on-read"-mapping. There, a page would become a private >>> copy of a process once _another_ process wrote data to the >>> corresponding file location. But I suspect that feature >>> could be very hard to implement... >> >> >> This is a different way of thinking of copy-on-write. I believe you >> are thinking of the time when there are two processes sharing the page. >> When one process takes the write fault, the page is copied and the by >> that >> process and the other process becomes the exclusive owner of the page. > > > A little different: Think of N processes (N may be 8 or so) that mmap() > a file using a new mode "MAP_SNAPSHOT" (which could be read-only if a mix > with private copy-on-write pages was too hard to realize), and 1 process > mmap()ing the same file using MAP_SHARED. Once the N processes mmap()ed > the file using MAP_SNAPSHOT, their "view" of the file content would never > change, that is, if the one process that mmap()ed the file with MAP_SHARED > writes to a page, that page _is_ written to disk the usual way, but the > other N processes get a copy of the page before it has been changed, so > they will always see the same data. > Once the processes that mmap()ed using MAP_SNAPSHOT unmap the file, the > copies of the pages that were changed on disk are simply discarded. > > That would - similar to the features mentioned above - allow one process > to efficiently work on portions of a huge file over a longer period of > time, > and only at times when the file in total contains consistent data, other > processes could be instructed to mmap() them again to obtain a newer > version. 
> > > Regards, > > Lutz Vieweg -- Michael Clark, michael@metaparadigm.com Metaparadigm Pte. Ltd, http://www.metaparadigm.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: How to find out which pages were copied-on-write? 2004-07-10 8:11 ` Michael Clark @ 2004-07-12 17:21 ` Lutz Vieweg 2004-07-13 4:16 ` Michael Clark 0 siblings, 1 reply; 10+ messages in thread From: Lutz Vieweg @ 2004-07-12 17:21 UTC (permalink / raw) To: Michael Clark; +Cc: Robin Holt, linux-kernel Michael Clark wrote: > HPAs library LPSM sounds like what you're looking for. > > http://freshmeat.net/projects/lpsm/ > > Or you can do what you want the hard way using mprotect and a SEGV handler. Certainly a valid idea to consider - doing all those things in userspace... so thanks for the hint! But wouldn't that introduce a significant overhead and undermine all of the nice advantages the kernel might have in scheduling I/O operations? However, I shall really consider and profile the mprotect/sighandler approach... Regards, Lutz Vieweg PS: I'm using my own allocator already, so using the C-library implementation wouldn't gain me much... ^ permalink raw reply [flat|nested] 10+ messages in thread
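A rough sketch of the mprotect()/SIGSEGV approach under discussion, for a single region in a single-threaded program. This is an illustration only and not how LPSM is implemented; every track_* name is hypothetical. It also assumes that calling mprotect() from a signal handler is acceptable: that works on Linux in practice, but mprotect() is not on the POSIX list of async-signal-safe functions.

```c
#define _DEFAULT_SOURCE /* exposes sigaction(), siginfo_t, MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define TRACK_MAX_PAGES 1024 /* arbitrary limit for this sketch */

static char *track_base;
static size_t track_npages;
static size_t track_page_size;
static volatile unsigned char track_dirty[TRACK_MAX_PAGES];

/* On a write fault inside the tracked region: record the page as dirty and
 * make it writable, so the faulting store is retried and succeeds. */
static void track_on_segv(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)track_base;
    size_t idx;

    (void)sig;
    (void)ctx;
    if (track_base == NULL || addr < base ||
        addr >= base + track_npages * track_page_size) {
        signal(SIGSEGV, SIG_DFL); /* a genuine crash: let it happen on retry */
        return;
    }
    idx = (addr - base) / track_page_size;
    track_dirty[idx] = 1;
    if (mprotect(track_base + idx * track_page_size, track_page_size,
                 PROT_READ | PROT_WRITE) != 0)
        signal(SIGSEGV, SIG_DFL);
}

/* Start tracking writes to npages pages at base (page-aligned).
 * Returns 0 on success, -1 on error. */
int track_start(void *base, size_t npages)
{
    struct sigaction sa;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    assert(npages <= TRACK_MAX_PAGES && (uintptr_t)base % page == 0);
    track_base = base;
    track_npages = npages;
    track_page_size = page;
    for (size_t i = 0; i < TRACK_MAX_PAGES; i++)
        track_dirty[i] = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = track_on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    /* Write-protect everything; the first store to each page will fault. */
    return mprotect(base, npages * page, PROT_READ);
}

/* Returns nonzero if page idx has been written since track_start(). */
int track_is_dirty(size_t idx)
{
    return idx < track_npages && track_dirty[idx] != 0;
}
```

Each page faults at most once between two calls to track_start(), so the cost is one signal delivery and one mprotect() call per page dirtied in a commit interval.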
* Re: How to find out which pages were copied-on-write? 2004-07-12 17:21 ` Lutz Vieweg @ 2004-07-13 4:16 ` Michael Clark 2004-07-13 13:04 ` Lutz Vieweg 0 siblings, 1 reply; 10+ messages in thread From: Michael Clark @ 2004-07-13 4:16 UTC (permalink / raw) To: Lutz Vieweg; +Cc: Robin Holt, linux-kernel On 07/13/04 01:21, Lutz Vieweg wrote: > Michael Clark wrote: > >> HPAs library LPSM sounds like what you're looking for. >> >> http://freshmeat.net/projects/lpsm/ >> >> Or you can do what you want the hard way using mprotect and a SEGV >> handler. > > > Certainly a valid idea to consider - doing all those things in > userspace... so > thanks for the hint! > > But wouldn't that introduce a significant overhead and undermine all of the > nice advantages the kernel might have in scheduling I/O operations? Not really. Plain read/write IO is generally faster than mmap IO anyway. You don't use mmap for speed but rather for convenience. > However, I shall really consider and profile the mprotect/sighandler > approach... > > Regards, > > Lutz Vieweg > > PS: I'm using my own allocator already, so using the C-library > implementation > wouldn't gain me much... This wasn't why I suggested it. It has the commit semantics on memory-mapped files that you were asking about (the allocator is optional, I believe). ~mc ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: How to find out which pages were copied-on-write? 2004-07-13 4:16 ` Michael Clark @ 2004-07-13 13:04 ` Lutz Vieweg 2004-07-13 15:02 ` Michael Clark 0 siblings, 1 reply; 10+ messages in thread From: Lutz Vieweg @ 2004-07-13 13:04 UTC (permalink / raw) To: Michael Clark; +Cc: Robin Holt, linux-kernel Michael Clark wrote: >> But wouldn't that introduce a significant overhead and undermine all >> of the >> nice advantages the kernel might have in scheduling I/O operations? > > Not really. Plain read/write IO is generally faster than mmap IO anyway. Well, that was my result, too, when I measured mmap() vs. read()/write() with the 2.4.x kernels, however, I was quite impressed recently when I measured write operations with MAP_SHARED regions under 2.6.7 (CPU x86_64), they were not at all slower than ordinary write()s. (congratulations to the involved kernel hackers on that! :-) > You don't use mmap for speed but rather for convenience. But isn't an advantage with mmap() that there's no need for the kernel to copy what is to be written to a dedicated buffer? The kernel could initiate DMA writes directly from the working memory... Regards, Lutz Vieweg ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: How to find out which pages were copied-on-write? 2004-07-13 13:04 ` Lutz Vieweg @ 2004-07-13 15:02 ` Michael Clark 2004-07-13 15:39 ` Lutz Vieweg 0 siblings, 1 reply; 10+ messages in thread From: Michael Clark @ 2004-07-13 15:02 UTC (permalink / raw) To: Lutz Vieweg; +Cc: Robin Holt, linux-kernel On 07/13/04 21:04, Lutz Vieweg wrote: >> You don't use mmap for speed but rather for convenience. > > > But isn't an advantage with mmap() that there's no need for the kernel > to copy what is to be written to a dedicated buffer? The kernel > could initiate DMA writes directly from the working memory... Yes, but page faults are expensive too. Each time a page is written out it needs to be marked read-only again and will cause a page fault for the next write access from userspace. For certain workloads this can easily add up to more than the cost of copy_(to|from)_user in read/write. read/write also gives you more explicit control over IO batching and scheduling (when to read or write). There is less need for the kernel to employ tricks to effectively coalesce IOs on dirtied pages or sense streaming access patterns. ~mc ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: How to find out which pages were copied-on-write? 2004-07-13 15:02 ` Michael Clark @ 2004-07-13 15:39 ` Lutz Vieweg 2004-07-14 0:25 ` Michael Clark 0 siblings, 1 reply; 10+ messages in thread From: Lutz Vieweg @ 2004-07-13 15:39 UTC (permalink / raw) To: Michael Clark; +Cc: Robin Holt, linux-kernel Michael Clark wrote: > On 07/13/04 21:04, Lutz Vieweg wrote: > >>> You don't use mmap for speed but rather for convenience. >> >> But isn't an advantage with mmap() that there's no need for the kernel >> to copy what is to be written to a dedicated buffer? The kernel >> could initiate DMA writes directly from the working memory... > > Yes, but page faults are expensive too. Each time a page is written > out it needs to be marked read only again and will cause a page fault > for the next write access from userspace. For certain workloads this > can easily add up to more than copy_(to|from)_user in read/write. But I would need exactly the same number of pagefaults if I implemented the "mark-dirty-on-write" logic in userspace using SIGSEGV and signal handlers, as it is done by the LPSM software... > read/write also gives you more explicit control on IO batching and > scheduling (when to read or write). Less need for the kernel to employ > tricks to effectively coaslesce IOs on dirtied pages or sense > streaming access patterns. But if the kernel would turn a private copy of a c-o-w page into a "dirty"-page that is marked for writing out to disk, another process could mmap() the very same page even before it has been written to disk, while if I write out dirty pages using write() in userspace, other processes probably won't notice before all the data has reached the disk, which could take quite some time. And unlike the user space application, the kernel knows which writes go to which physical disk so it can e.g. make better use of striping. Regards, Lutz Vieweg ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: How to find out which pages were copied-on-write? 2004-07-13 15:39 ` Lutz Vieweg @ 2004-07-14 0:25 ` Michael Clark 0 siblings, 0 replies; 10+ messages in thread From: Michael Clark @ 2004-07-14 0:25 UTC (permalink / raw) To: Lutz Vieweg; +Cc: Robin Holt, linux-kernel On 07/13/04 23:39, Lutz Vieweg wrote: > Michael Clark wrote: > >> On 07/13/04 21:04, Lutz Vieweg wrote: >> >>>> You don't use mmap for speed but rather for convenience. >>> >>> >>> But isn't an advantage with mmap() that there's no need for the kernel >>> to copy what is to be written to a dedicated buffer? The kernel >>> could initiate DMA writes directly from the working memory... >> >> >> Yes, but page faults are expensive too. Each time a page is written >> out it needs to be marked read only again and will cause a page fault >> for the next write access from userspace. For certain workloads this >> can easily add up to more than copy_(to|from)_user in read/write. > > > But I would need exactly the same number of pagefaults if I implemented > the "mark-dirty-on-write" logic in userspace using SIGSEGV and signal > handlers, as it is done by the LPSM software... Yes, that's sort of my point. You get the commit semantics you want, albeit with a little more userspace signal overhead and an mprotect call (you've already taken the exception, so the extra signal overhead shouldn't be too much), but perhaps fewer page faults overall than with MAP_SHARED (which is the mmap variant that writes dirtied pages back to backing store), since you control when to mark the page clean again, i.e. you do the writeout at your commit point and not before. Really my point was you don't use mmap for speed but rather for convenience. ~mc ^ permalink raw reply [flat|nested] 10+ messages in thread
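For comparison with the private-mapping variants, a minimal sketch of the MAP_SHARED behaviour mentioned above: stores go to the file's pages and msync(MS_SYNC) waits until they have been written back. The function name and the /tmp scratch path are illustrative assumptions. On Linux the pread() would most likely see the data even without msync(), because the mapping and the file share the page cache; the msync() call concerns durability on the backing store.

```c
#define _DEFAULT_SOURCE /* exposes mkstemp(), pread() under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: write through a MAP_SHARED mapping, msync(), then read
 * the file back with pread(). Returns 1 if the file contains the new data,
 * 0 if not, -1 on error. */
int shared_write_reaches_file(void)
{
    static const char msg[] = "committed";
    char path[] = "/tmp/shared-demo-XXXXXX";
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char buf[sizeof msg];
    int result = -1;
    int fd = mkstemp(path);

    assert(page >= sizeof msg);
    if (fd < 0)
        return -1;
    unlink(path); /* the file lives on until fd is closed */
    if (ftruncate(fd, (off_t)page) == 0) {
        char *map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map, msg, sizeof msg);         /* dirties the shared page */
            if (msync(map, page, MS_SYNC) == 0 && /* wait for the writeback  */
                pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf)
                result = (memcmp(buf, msg, sizeof msg) == 0);
            munmap(map, page);
        }
    }
    close(fd);
    return result;
}
```

With MAP_SHARED the kernel may also write the page back on its own before msync() is called, which is why this variant gives no control over the commit point.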