From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anton Altaparmakov <aia21@cam.ac.uk>
Subject: Re: Writing out a (file) mmapped page
Date: Tue, 14 Jun 2005 16:36:23 +0100
Message-ID: <1118763383.16269.61.camel@imp.csi.cam.ac.uk>
References: <8e70aacf050612094916d32276@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel@vger.kernel.org
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from ppsw-0.csi.cam.ac.uk ([131.111.8.130]:30884 "EHLO
	ppsw-0.csi.cam.ac.uk") by vger.kernel.org with ESMTP
	id S261169AbVFNPga (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Tue, 14 Jun 2005 11:36:30 -0400
To: Martin Jambor <jamborm@gmail.com>
In-Reply-To: <8e70aacf050612094916d32276@mail.gmail.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Hi,

DISCLAIMER:  I am no VM expert and the below is just my take on things.
And note it is i386 specific; I know next to nothing about the insides
of other architectures...  I am sure that if I have described anything
wrong someone who knows better will correct me.  (-:

On Sun, 2005-06-12 at 18:49 +0200, Martin Jambor wrote:
> I have spent a few hours trying to find out how dirty mmapped pages
> are written out in filesystems using the "generic" functions but so
> far I have not been successful. The main thing that escapes me is the
> following:
> 
> block_write_full_page() writes out only buffers marked dirty or whole
> page when there are no buffers associated with it. Where in kernel are
> buffers either marked dirty or stripped off a mmaped page when the
> page itself becomes dirty?  I would be very grateful for a pointer to
> the source, possibly accompanied by a brief explanation of how it gets
> called.

That is a little trickier than one might expect because it is partially
done in hardware (heavily arch dependent though).

When a write to a writable mmapped page happens, the CPU sets the dirty
bit in the page table entry in hardware, so there is no code where you
can see this happen.  Later on, at msync() or munmap() time, this
hardware dirty bit results in set_page_dirty() being called, which for
buffer based filesystems ends up in __set_page_dirty_buffers(), which
dirties both the buffers and the page.  For example, take the msync case:

mm/msync.c::sys_msync ->
	msync_interval ->
		filemap_sync ->
			sync_page_range ->
				sync_pud_range ->
					sync_pmd_range ->
						sync_pte_range ->
							set_page_dirty

The beauty of short inline functions.  (-:
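
To watch the user-space side of this path, here is a minimal sketch in
C.  It only assumes a Linux system with a writable /tmp; the function
name mmap_write_and_sync and the temporary file template are made up for
illustration.  The store through the mapping is a plain memcpy() with no
system call, and msync() is the call that enters the chain above on the
kernels discussed here.  Note the read-back only shows the data is
visible through the file; it says nothing about when the page actually
went to disk.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store msg into a file through a MAP_SHARED mapping, msync() it, then
 * read it back with pread().  Returns 0 if the file holds msg, -1 on
 * any error.  Function name and /tmp template are hypothetical.
 */
int mmap_write_and_sync(const char *msg)
{
	char path[] = "/tmp/mmap_sync_XXXXXX";
	char readback[128];
	long page = sysconf(_SC_PAGESIZE);
	size_t len;
	char *map;
	int fd, ret = -1;

	assert(msg != NULL);
	len = strlen(msg);
	if (len == 0 || len > sizeof(readback) || page < (long)sizeof(readback))
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* the file lives until close() */

	if (ftruncate(fd, page) != 0)
		goto out_close;

	map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* Plain store, no syscall: the pte ends up dirty behind our back. */
	memcpy(map, msg, len);

	/* Ask the kernel to pick up the dirty state and write the page. */
	if (msync(map, (size_t)page, MS_SYNC) != 0)
		goto out_unmap;

	if (pread(fd, readback, len, 0) == (ssize_t)len &&
	    memcmp(readback, msg, len) == 0)
		ret = 0;

out_unmap:
	munmap(map, (size_t)page);
out_close:
	close(fd);
	return ret;
}
```

Calling mmap_write_and_sync("some text") from a main() should return 0
on success.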

When the page is not writable, the write causes a page fault and the
generic page fault handler is called:

arch/i386/kernel/entry.S defines the page_fault entry:

ENTRY(page_fault)
        pushl $do_page_fault
        jmp error_code

arch/i386/mm/fault.c::do_page_fault -> (We now move to non-arch specific
code.)
	mm/memory.c::handle_mm_fault ->
		handle_pte_fault

Now the code paths diverge depending on what page the fault is hitting
(shared or not, read-only or not, swap page or not, page not present in
memory at all, etc).

In the end, all cases end up generating a writable page (see
pte_mkwrite) and doing a pte_mkdirty (or an equivalent thereof).  The
writable page comes from modifying the existing one, from copying the
existing shared page (also known as COW = Copy On Write), or from
creating a new page if none was present at all.
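
The difference between modifying the existing page and copying it can
be seen from user space.  The following is a rough sketch, assuming
Linux and a writable /tmp; the function name
file_byte_after_mapped_store is invented for this example.  It stores
one byte through either a MAP_SHARED or a MAP_PRIVATE mapping and
reports what the file itself holds afterwards.  With MAP_PRIVATE the
write fault hands the process its own copy, so the file is untouched.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a one-page file starting with 'A', store 'B' through a
 * mapping, and return the first byte the file holds afterwards, or -1
 * on error.  shared != 0 selects MAP_SHARED, otherwise MAP_PRIVATE.
 */
int file_byte_after_mapped_store(int shared)
{
	char path[] = "/tmp/mmap_cow_XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	char *map, byte = 0;
	int fd, ret = -1;

	if (page <= 0)
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* the file lives until close() */

	if (ftruncate(fd, page) != 0 || pwrite(fd, "A", 1, 0) != 1)
		goto out_close;

	map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   shared ? MAP_SHARED : MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	map[0] = 'B';			/* write fault on first touch */
	assert(map[0] == 'B');		/* the mapping sees it either way */

	/* Only a shared mapping has anything to propagate to the file. */
	if (shared && msync(map, (size_t)page, MS_SYNC) != 0)
		goto out_unmap;

	if (pread(fd, &byte, 1, 0) == 1)
		ret = (unsigned char)byte;

out_unmap:
	munmap(map, (size_t)page);
out_close:
	close(fd);
	return ret;
}
```

file_byte_after_mapped_store(1) should give 'B' and
file_byte_after_mapped_store(0) should give 'A'.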

The pte_mkdirty causes the page to be marked dirty in the page tables,
and this gets caught at msync/munmap time and results in set_page_dirty
being called as described above.  (Note the pte_mkdirty results in
exactly the same thing happening to the page as when the CPU does it in
the writable-page-already-present case I described above.)
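
To make the bookkeeping concrete, here is a toy user-space model in C.
It is not kernel code: every toy_* name is invented, the structs only
mimic the idea of a pte dirty bit, a page dirty flag and per-buffer
dirty flags, and four buffers per page is an arbitrary assumption.  It
just shows that both paths (CPU sets the bit, or the fault handler does
pte_mkdirty) leave the same state behind, and that page and buffers only
become dirty once the msync-style walk picks up the pte bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BUFS_PER_PAGE 4	/* assumed: four buffers per page */

struct toy_page {
	bool page_dirty;			/* stands in for the page dirty flag */
	bool buf_dirty[TOY_BUFS_PER_PAGE];	/* stands in for buffer dirty bits */
};

struct toy_pte {
	bool writable;
	bool dirty;				/* the hardware dirty bit */
	struct toy_page *page;
};

/* Models the effect of __set_page_dirty_buffers(): buffers, then page. */
void toy_set_page_dirty(struct toy_page *page)
{
	for (size_t i = 0; i < TOY_BUFS_PER_PAGE; i++)
		page->buf_dirty[i] = true;
	page->page_dirty = true;
}

/* Models a store through the mapping.  Returns true if it faulted. */
bool toy_store(struct toy_pte *pte)
{
	bool faulted = !pte->writable;

	if (faulted)
		pte->writable = true;	/* fault path: like pte_mkwrite */
	pte->dirty = true;		/* CPU, or pte_mkdirty in the fault path */
	return faulted;
}

/* Models the pte walk at msync time: test and clear, then dirty the page. */
void toy_msync(struct toy_pte *pte)
{
	if (pte->dirty) {
		pte->dirty = false;
		toy_set_page_dirty(pte->page);
	}
}

/* Runs store + msync; returns the number of dirty buffers, -1 if odd. */
int toy_scenario(bool start_writable)
{
	struct toy_page page = { 0 };
	struct toy_pte pte = { .writable = start_writable, .dirty = false,
			       .page = &page };
	int n = 0;

	bool faulted = toy_store(&pte);
	assert(faulted == !start_writable);
	if (page.page_dirty)
		return -1;	/* the store alone must not dirty the page */

	toy_msync(&pte);
	for (size_t i = 0; i < TOY_BUFS_PER_PAGE; i++)
		n += page.buf_dirty[i];
	return page.page_dirty ? n : -1;
}
```

In this model toy_scenario(true) and toy_scenario(false) return the same
count, which is the point of the note above.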

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/