How to find out which pages were copied-on-write?

* How to find out which pages were copied-on-write?
@ 2004-07-06 15:58 Lutz Vieweg
  2004-07-09 11:31 ` Robin Holt
  0 siblings, 1 reply; 10+ messages in thread
From: Lutz Vieweg @ 2004-07-06 15:58 UTC (permalink / raw)
  To: linux-kernel

Hi,

in an application that MAP_PRIVATEly mmap()s a file it would
be quite helpful for me to find out which pages have been
copied-on-write.

I found that mincore() does a similar thing by reporting which
pages are currently residing in physical memory, but what
I want to know is which pages differ from the original file
image on disk.

Can you recommend a way to do that? (does not need to be
portable beyond Linux)

Alternatively, it would be sufficient if I could turn
a private mapping into a shared one (and possibly do an
msync() afterwards if I need to make sure the changes
have been written out). Would such a feature need a
lot of effort to implement?

Yet another feature that I could use if it were available:
A "copy-on-read"-mapping. There, a page would become a private
copy of a process once _another_ process wrote data to the
corresponding file location. But I suspect that feature
could be very hard to implement...

Regards,

Lutz Vieweg
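
On kernels considerably newer than the ones discussed in this thread,
the question in the subject line can, as far as I know, be answered from
userspace through /proc/self/pagemap. The sketch below assumes the bit
layout given in the kernel's pagemap documentation (bit 63 present,
bit 62 swapped, bit 61 file-page or shared-anon, the last one only since
Linux 3.5). page_is_cow() is a hypothetical helper name, not an existing
API. The idea: in a MAP_PRIVATE file mapping, a page that is backed by
RAM or swap but no longer flagged as a file page has been copied-on-write.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Bit layout assumed from the kernel's pagemap documentation. */
#define PM_PRESENT (1ULL << 63) /* page is in RAM */
#define PM_SWAPPED (1ULL << 62) /* page is in swap */
#define PM_FILE    (1ULL << 61) /* file page or shared-anon (Linux >= 3.5) */

/*
 * Hypothetical helper: returns 1 if the page containing addr (inside a
 * MAP_PRIVATE file mapping of this process) has been copied-on-write,
 * 0 if it has not, and -1 if /proc/self/pagemap cannot be read.
 */
int page_is_cow(const void *addr)
{
    long psz = sysconf(_SC_PAGESIZE);
    uint64_t entry;
    off_t off;
    ssize_t n;
    int fd;

    assert(psz > 0);
    /* One 64-bit entry per virtual page, indexed by page number. */
    off = (off_t)((uintptr_t)addr / (uintptr_t)psz) * (off_t)sizeof(entry);

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    n = pread(fd, &entry, sizeof(entry), off);
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return -1;

    /* Never faulted in (or dropped again): still identical to the file. */
    if (!(entry & (PM_PRESENT | PM_SWAPPED)))
        return 0;
    /* Backed by the page cache: not copied. Otherwise: private copy. */
    return (entry & PM_FILE) ? 0 : 1;
}
```

Calling page_is_cow() for each page of the mapping yields the set of
pages that have been privately copied. A real implementation would
probably keep the descriptor open and read many entries per pread().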

* Re: How to find out which pages were copied-on-write?
  2004-07-06 15:58 How to find out which pages were copied-on-write? Lutz Vieweg
@ 2004-07-09 11:31 ` Robin Holt
  2004-07-09 20:42   ` Lutz Vieweg
  0 siblings, 1 reply; 10+ messages in thread
From: Robin Holt @ 2004-07-09 11:31 UTC (permalink / raw)
  To: Lutz Vieweg; +Cc: linux-kernel

OK, now that I am considering this problem,  I am trying to figure out
what problem we are trying to solve.

By reading your email, I gather that you have a single threaded
application which is doing an mmap on a file as a MAP_PRIVATE mapping.
The memory area is then handed to a library which may modify some pages.
You want to decide after the return if you had success and thereby
control the writing of the updated data back to the file.  Because of
the size of the file, doing a second mapping and comparing/copying pages
is unreasonable and you would like to only modify the pages that have
actually changed.

If that is not what you are trying to do, please give me a similar
description of _WHAT_ you are trying to do and not the _HOW_ you think
the kernel can make this easier.

On Tue, Jul 06, 2004 at 05:58:04PM +0200, Lutz Vieweg wrote:
> Hi,
> 
> in an application that MAP_PRIVATEly mmap()s a file it would
> be quite helpful for me to find out which pages have been
> copied-on-write.
> 
> I found that mincore() does a similar thing by reporting which
> pages are currently residing in physical memory, but what
> I want to know is which pages differ from the original file
> image on disk.
> 
> Can you recommend a way to do that? (does not need to be
> portable beyond Linux)
> 
> Alternatively, it would be sufficient if I could turn
> a private mapping into a shared one (and possibly do an
> msync() afterwards if I need to make sure the changes
> have been written out). Would such a feature need a
> lot of effort to implement?
> 
> 
> Yet another feature that I could use if it were available:
> A "copy-on-read"-mapping. There, a page would become a private
> copy of a process once _another_ process wrote data to the
> corresponding file location. But I suspect that feature
> could be very hard to implement...

This is a different way of thinking of copy-on-write.  I believe you
are thinking of the time when there are two processes sharing the page.
When one process takes the write fault, the page is copied by that
process and the other process becomes the exclusive owner of the page.

Thanks,
Robin Holt

* Re: How to find out which pages were copied-on-write?
  2004-07-09 11:31 ` Robin Holt
@ 2004-07-09 20:42   ` Lutz Vieweg
  2004-07-10  8:11     ` Michael Clark
  0 siblings, 1 reply; 10+ messages in thread
From: Lutz Vieweg @ 2004-07-09 20:42 UTC (permalink / raw)
  To: Robin Holt; +Cc: linux-kernel

Robin Holt wrote:
> OK, now that I am considering this problem,  I am trying to figure out
> what problem we are trying to solve.
> 
> By reading your email, I gather that you have a single threaded
> application which is doing an mmap on a file as a MAP_PRIVATE mapping.
> The memory area is then handed to a library which may modify some pages.
> You want to decide after the return if you had success and thereby
> control the writing of the updated data back to the file.  Because of
> the size of the file, doing a second mapping and comparing/copying pages
> is unreasonable and you would like to only modify the pages that have
> actually changed.

That's about it, the most important issue is that I want to avoid
having an inconsistent file on the disk for long periods, because a)
the application could crash and b) another process might want to map
the same file (read-only). And since the application is reaching points
where the data is consistent, while it is not in between, it would be nice
to have a private mapping while it is inconsistent and commit the changes
only at the points where the application knows the data is consistent.

Turning a private into a shared mapping would be a perfect solution
since that would mean another process could map the file at any time
and find consistent data. The second best solution would be the
one where the application just manually writes out the changed pages
at the time of consistency, this would at least reduce the times when the
data on disk is inconsistent to a minimum.

>>Yet another feature that I could use if it were available:
>>A "copy-on-read"-mapping. There, a page would become a private
>>copy of a process once _another_ process wrote data to the
>>corresponding file location. But I suspect that feature
>>could be very hard to implement...
> 
> This is a different way of thinking of copy-on-write.  I believe you
> are thinking of the time when there are two processes sharing the page.
> When one process takes the write fault, the page is copied by that
> process and the other process becomes the exclusive owner of the page.

A little different: Think of N processes (N may be 8 or so) that mmap()
a file using a new mode "MAP_SNAPSHOT" (which could be read-only if a mix
with private copy-on-write pages was too hard to realize), and 1 process
mmap()ing the same file using MAP_SHARED. Once the N processes mmap()ed
the file using MAP_SNAPSHOT, their "view" of the file content would never
change, that is, if the one process that mmap()ed the file with MAP_SHARED
writes to a page, that page _is_ written to disk the usual way, but the
other N processes get a copy of the page before it has been changed, so
they will always see the same data.
Once the processes that mmap()ed using MAP_SNAPSHOT unmap the file, the
copies of the pages that were changed on disk are simply discarded.

That would - similar to the features mentioned above - allow one process
to efficiently work on portions of a huge file over a longer period of time,
and only at times when the file in total contains consistent data, other
processes could be instructed to mmap() them again to obtain a newer version.

Regards,

Lutz Vieweg
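
The "second best solution" described above (writing out the changed
pages manually at a consistency point) could look roughly like this.
It is only a sketch: commit_dirty_pages() and the dirty[] array are
made-up names, and the code assumes the application already knows which
pages it has modified.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: for every page i with dirty[i] != 0, copy the
 * page from the MAP_PRIVATE mapping at base back to fd at the matching
 * file offset, clear the flag, and finally fsync() the file.
 * Returns 0 on success, -1 on error.
 */
int commit_dirty_pages(int fd, const char *base, unsigned char *dirty,
                       size_t npages, size_t psz)
{
    assert(psz > 0);
    for (size_t i = 0; i < npages; i++) {
        size_t done = 0;

        if (!dirty[i])
            continue;
        /* pwrite() may write less than requested, so loop. */
        while (done < psz) {
            ssize_t n = pwrite(fd, base + i * psz + done, psz - done,
                               (off_t)(i * psz + done));
            if (n <= 0)
                return -1;
            done += (size_t)n;
        }
        dirty[i] = 0;
    }
    /* Wait until the data has been handed to the storage device. */
    return fsync(fd);
}
```

Note that this does not make the commit atomic: a crash in the middle
of the loop still leaves a partly updated file, it only shortens the
window in which that can happen.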

* Re: How to find out which pages were copied-on-write?
  2004-07-09 20:42   ` Lutz Vieweg
@ 2004-07-10  8:11     ` Michael Clark
  2004-07-12 17:21       ` Lutz Vieweg
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Clark @ 2004-07-10  8:11 UTC (permalink / raw)
  To: Lutz Vieweg; +Cc: Robin Holt, linux-kernel

HPA's library LPSM sounds like what you're looking for.

http://freshmeat.net/projects/lpsm/

Or you can do what you want the hard way using mprotect and a SEGV handler.

~mc
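
The "hard way" can be sketched as follows: write-protect the mapping,
catch the first write to each page in a SIGSEGV handler, record the
page as dirty and re-enable writing for it. This is a minimal,
single-threaded sketch, not LPSM's actual implementation. The names
track_start(), page_is_dirty() and MAX_PAGES are invented for
illustration, and the handler calls mprotect(), which POSIX does not
list as async-signal-safe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 1024            /* arbitrary limit for this sketch */

static char *track_base;          /* start of the tracked mapping */
static size_t track_pages;        /* number of tracked pages */
static size_t track_psz;          /* system page size */
static volatile unsigned char dirty[MAX_PAGES];

static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t a = (uintptr_t)si->si_addr;
    uintptr_t b = (uintptr_t)track_base;
    size_t idx;

    (void)sig;
    (void)ctx;
    if (a < b || a >= b + track_pages * track_psz) {
        /* Not ours: restore the default action and fault again. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    idx = (a - b) / track_psz;
    dirty[idx] = 1;
    /* Make the page writable so the faulting store can be retried. */
    mprotect(track_base + idx * track_psz, track_psz,
             PROT_READ | PROT_WRITE);
}

/* Start tracking writes to npages pages at base. Returns 0 or -1. */
int track_start(void *base, size_t npages)
{
    struct sigaction sa;

    track_psz = (size_t)sysconf(_SC_PAGESIZE);
    assert((uintptr_t)base % track_psz == 0); /* mprotect needs alignment */
    if (npages > MAX_PAGES)
        return -1;
    track_base = base;
    track_pages = npages;
    for (size_t i = 0; i < MAX_PAGES; i++)
        dirty[i] = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    /* Reads still work; the first write to each page now faults. */
    return mprotect(base, npages * track_psz, PROT_READ);
}

/* Returns 1 if page idx has been written since track_start(), else 0. */
int page_is_dirty(size_t idx)
{
    return idx < track_pages && dirty[idx];
}
```

Each page costs one fault and one mprotect() call on its first write;
later writes to the same page run at full speed until the application
decides to write-protect the range again.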

On 07/10/04 04:42, Lutz Vieweg wrote:
> Robin Holt wrote:
> 
>> OK, now that I am considering this problem,  I am trying to figure out
>> what problem we are trying to solve.
>>
>> By reading your email, I gather that you have a single threaded
>> application which is doing an mmap on a file as a MAP_PRIVATE mapping.
>> The memory area is then handed to a library which may modify some pages.
>> You want to decide after the return if you had success and thereby
>> control the writing of the updated data back to the file.  Because of
>> the size of the file, doing a second mapping and comparing/copying pages
>> is unreasonable and you would like to only modify the pages that have
>> actually changed.
> 
> 
> That's about it, the most important issue is that I want to avoid
> having an inconsistent file on the disk for long periods, because a)
> the application could crash and b) another process might want to map
> the same file (read-only). And since the application is reaching points
> where the data is consistent, while it is not in between, it would be nice
> to have a private mapping while it is inconsistent and commit the changes
> only at the points where the application knows the data is consistent.
> 
> Turning a private into a shared mapping would be a perfect solution
> since that would mean another process could map the file at any time
> and find consistent data. The second best solution would be the
> one where the application just manually writes out the changed pages
> at the time of consistency, this would at least reduce the times when the
> data on disk is inconsistent to a minimum.
> 
> 
> 
>>> Yet another feature that I could use if it were available:
>>> A "copy-on-read"-mapping. There, a page would become a private
>>> copy of a process once _another_ process wrote data to the
>>> corresponding file location. But I suspect that feature
>>> could be very hard to implement...
>>
>>
>> This is a different way of thinking of copy-on-write.  I believe you
>> are thinking of the time when there are two processes sharing the page.
>> When one process takes the write fault, the page is copied by that
>> process and the other process becomes the exclusive owner of the page.
> 
> 
> A little different: Think of N processes (N may be 8 or so) that mmap()
> a file using a new mode "MAP_SNAPSHOT" (which could be read-only if a mix
> with private copy-on-write pages was too hard to realize), and 1 process
> mmap()ing the same file using MAP_SHARED. Once the N processes mmap()ed
> the file using MAP_SNAPSHOT, their "view" of the file content would never
> change, that is, if the one process that mmap()ed the file with MAP_SHARED
> writes to a page, that page _is_ written to disk the usual way, but the
> other N processes get a copy of the page before it has been changed, so
> they will always see the same data.
> Once the processes that mmap()ed using MAP_SNAPSHOT unmap the file, the
> copies of the pages that were changed on disk are simply discarded.
> 
> That would - similar to the features mentioned above - allow one process
> to efficiently work on portions of a huge file over a longer period of 
> time,
> and only at times when the file in total contains consistent data, other
> processes could be instructed to mmap() them again to obtain a newer 
> version.
> 
> 
> Regards,
> 
> Lutz Vieweg
> 
> 

-- 
Michael Clark,  . . . . . . . . . . . .  michael@metaparadigm.com
Metaparadigm Pte. Ltd . . . . . . . . http://www.metaparadigm.com

                    "Explore Operations Research:
          The Science of Better at www.scienceofbetter.org "

* Re: How to find out which pages were copied-on-write?
  2004-07-10  8:11     ` Michael Clark
@ 2004-07-12 17:21       ` Lutz Vieweg
  2004-07-13  4:16         ` Michael Clark
  0 siblings, 1 reply; 10+ messages in thread
From: Lutz Vieweg @ 2004-07-12 17:21 UTC (permalink / raw)
  To: Michael Clark; +Cc: Robin Holt, linux-kernel

Michael Clark wrote:
> HPA's library LPSM sounds like what you're looking for.
> 
> http://freshmeat.net/projects/lpsm/
> 
> Or you can do what you want the hard way using mprotect and a SEGV handler.

Certainly a valid idea to consider - doing all those things in userspace... so
thanks for the hint!

But wouldn't that introduce a significant overhead and undermine all of the
nice advantages the kernel might have in scheduling I/O operations?

However, I shall really consider and profile the mprotect/sighandler approach...

Regards,

Lutz Vieweg

PS: I'm using my own allocator already, so using the C-library implementation
     wouldn't gain me much...

* Re: How to find out which pages were copied-on-write?
  2004-07-12 17:21       ` Lutz Vieweg
@ 2004-07-13  4:16         ` Michael Clark
  2004-07-13 13:04           ` Lutz Vieweg
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Clark @ 2004-07-13  4:16 UTC (permalink / raw)
  To: Lutz Vieweg; +Cc: Robin Holt, linux-kernel

On 07/13/04 01:21, Lutz Vieweg wrote:
> Michael Clark wrote:
> 
>> HPA's library LPSM sounds like what you're looking for.
>>
>> http://freshmeat.net/projects/lpsm/
>>
>> Or you can do what you want the hard way using mprotect and a SEGV 
>> handler.
> 
> 
> Certainly a valid idea to consider - doing all those things in 
> userspace... so
> thanks for the hint!
> 
> But wouldn't that introduce a significant overhead and undermine all of the
> nice advantages the kernel might have in scheduling I/O operations?

Not really. Plain read/write IO is generally faster than mmap IO anyway.
You don't use mmap for speed but rather for convenience.

> However, I shall really consider and profile the mprotect/sighandler 
> approach...
> 
> Regards,
> 
> Lutz Vieweg
> 
> PS: I'm using my own allocator already, so using the C-library 
> implementation
>     wouldn't gain me much...

This wasn't why I suggested it. It has the commit semantics
on memory mapped files that you were asking about (the allocator
is optional I believe).

~mc

* Re: How to find out which pages were copied-on-write?
  2004-07-13  4:16         ` Michael Clark
@ 2004-07-13 13:04           ` Lutz Vieweg
  2004-07-13 15:02             ` Michael Clark
  0 siblings, 1 reply; 10+ messages in thread
From: Lutz Vieweg @ 2004-07-13 13:04 UTC (permalink / raw)
  To: Michael Clark; +Cc: Robin Holt, linux-kernel

Michael Clark wrote:

>> But wouldn't that introduce a significant overhead and undermine all 
>> of the
>> nice advantages the kernel might have in scheduling I/O operations?
>  
> Not really. Plain read/write IO is generally faster than mmap IO anyway.

Well, that was my result, too, when I measured mmap() vs. read()/write()
with the 2.4.x kernels, however, I was quite impressed recently when
I measured write operations with MAP_SHARED regions under 2.6.7
(CPU x86_64), they were not at all slower than ordinary write()s.
(congratulations to the involved kernel hackers on that! :-)

> You don't use mmap for speed but rather for convenience.

But isn't an advantage with mmap() that there's no need for the kernel
to copy what is to be written to a dedicated buffer? The kernel
could initiate DMA writes directly from the working memory...

Regards,

Lutz Vieweg
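
For reference, the path measured above (stores into a MAP_SHARED region
instead of calling write()) looks roughly like this. shared_store() is
a hypothetical helper written for this illustration. The data is
modified in place in the page cache, and msync() with MS_SYNC waits for
the write-back to finish.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: store len bytes from src at offset off of the
 * file behind fd through a MAP_SHARED mapping, then wait for write-back.
 * file_len is the current size of the file. Returns 0 or -1.
 */
int shared_store(int fd, size_t file_len, size_t off,
                 const void *src, size_t len)
{
    char *p;
    int rc;

    assert(fd >= 0);
    if (off > file_len || len > file_len - off)
        return -1;                /* would run past the end of the file */

    p = mmap(NULL, file_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Plain memory stores; the kernel sees the pages as dirty. */
    memcpy(p + off, src, len);

    /* Block until the dirty pages of the mapping have been written. */
    rc = msync(p, file_len, MS_SYNC);
    munmap(p, file_len);
    return rc;
}
```

Other processes that map or read the same file see the new data as soon
as the stores are done, since they share the same page cache pages;
msync() only controls when the data reaches the disk.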

* Re: How to find out which pages were copied-on-write?
  2004-07-13 13:04           ` Lutz Vieweg
@ 2004-07-13 15:02             ` Michael Clark
  2004-07-13 15:39               ` Lutz Vieweg
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Clark @ 2004-07-13 15:02 UTC (permalink / raw)
  To: Lutz Vieweg; +Cc: Robin Holt, linux-kernel

On 07/13/04 21:04, Lutz Vieweg wrote:
>> You don't use mmap for speed but rather for convenience.
> 
> 
> But isn't an advantage with mmap() that there's no need for the kernel
> to copy what is to be written to a dedicated buffer? The kernel
> could initiate DMA writes directly from the working memory...

Yes, but page faults are expensive too. Each time a page is written
out it needs to be marked read only again and will cause a page fault
for the next write access from userspace. For certain workloads this
can easily add up to more than copy_(to|from)_user in read/write.

read/write also gives you more explicit control on IO batching and
scheduling (when to read or write). Less need for the kernel to employ
tricks to effectively coalesce IOs on dirtied pages or sense
streaming access patterns.

~mc

* Re: How to find out which pages were copied-on-write?
  2004-07-13 15:02             ` Michael Clark
@ 2004-07-13 15:39               ` Lutz Vieweg
  2004-07-14  0:25                 ` Michael Clark
  0 siblings, 1 reply; 10+ messages in thread
From: Lutz Vieweg @ 2004-07-13 15:39 UTC (permalink / raw)
  To: Michael Clark; +Cc: Robin Holt, linux-kernel

Michael Clark wrote:
> On 07/13/04 21:04, Lutz Vieweg wrote:
> 
>>> You don't use mmap for speed but rather for convenience.
>>
>> But isn't an advantage with mmap() that there's no need for the kernel
>> to copy what is to be written to a dedicated buffer? The kernel
>> could initiate DMA writes directly from the working memory...
> 
> Yes, but page faults are expensive too. Each time a page is written
> out it needs to be marked read only again and will cause a page fault
> for the next write access from userspace. For certain workloads this
> can easily add up to more than copy_(to|from)_user in read/write.

But I would need exactly the same number of pagefaults if I implemented
the "mark-dirty-on-write" logic in userspace using SIGSEGV and signal
handlers, as it is done by the LPSM software...

> read/write also gives you more explicit control on IO batching and
> scheduling (when to read or write). Less need for the kernel to employ
> tricks to effectively coalesce IOs on dirtied pages or sense
> streaming access patterns.

But if the kernel would turn a private copy of a c-o-w page into a
"dirty"-page that is marked for writing out to disk, another process
could mmap() the very same page even before it has been written to disk,
while if I write out dirty pages using write() in userspace, other
processes probably won't notice before all the data has reached the disk,
which could take quite some time.

And unlike the user space application, the kernel knows which writes
go to which physical disk so it can e.g. make better use of striping.

Regards,

Lutz Vieweg

* Re: How to find out which pages were copied-on-write?
  2004-07-13 15:39               ` Lutz Vieweg
@ 2004-07-14  0:25                 ` Michael Clark
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Clark @ 2004-07-14  0:25 UTC (permalink / raw)
  To: Lutz Vieweg; +Cc: Robin Holt, linux-kernel

On 07/13/04 23:39, Lutz Vieweg wrote:
> Michael Clark wrote:
> 
>> On 07/13/04 21:04, Lutz Vieweg wrote:
>>
>>>> You don't use mmap for speed but rather for convenience.
>>>
>>>
>>> But isn't an advantage with mmap() that there's no need for the kernel
>>> to copy what is to be written to a dedicated buffer? The kernel
>>> could initiate DMA writes directly from the working memory...
>>
>>
>> Yes, but page faults are expensive too. Each time a page is written
>> out it needs to be marked read only again and will cause a page fault
>> for the next write access from userspace. For certain workloads this
>> can easily add up to more than copy_(to|from)_user in read/write.
> 
> 
> But I would need exactly the same number of pagefaults if I implemented
> the "mark-dirty-on-write" logic in userspace using SIGSEGV and signal
> handlers, as it is done by the LPSM software...

Yes, that's sort of my point. You get the commit semantics you want, at
the cost of a little more userspace signal overhead and an mprotect call
(you've already taken the exception, so the extra signal overhead
shouldn't be too much). You may even see fewer page faults overall than
with MAP_SHARED (which is the mmap variant that writes dirtied pages back
to backing store), because you control when to mark the page clean again,
i.e. you do the writeout at your commit point and not before.

Really my point was you don't use mmap for speed but rather for convenience.

~mc
