From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anton Altaparmakov
Subject: Re: Writing out a (file) mmapped page
Date: Tue, 14 Jun 2005 16:36:23 +0100
Message-ID: <1118763383.16269.61.camel@imp.csi.cam.ac.uk>
References: <8e70aacf050612094916d32276@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel@vger.kernel.org
Return-path:
Received: from ppsw-0.csi.cam.ac.uk ([131.111.8.130]:30884 "EHLO ppsw-0.csi.cam.ac.uk") by vger.kernel.org with ESMTP id S261169AbVFNPga (ORCPT ); Tue, 14 Jun 2005 11:36:30 -0400
To: Martin Jambor
In-Reply-To: <8e70aacf050612094916d32276@mail.gmail.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Hi,

DISCLAIMER: I am no vm expert and the below is just my take on things.
And note it is i386 specific; I know next to nothing about the insides
of other architectures... I am sure that if I have described anything
wrong, someone who knows better will correct me. (-:

On Sun, 2005-06-12 at 18:49 +0200, Martin Jambor wrote:
> I have spent a few hours trying to find out how dirty mmapped pages
> are written out in filesystems using the "generic" functions but so
> far I have not been successful. The main thing that escapes me is the
> following:
>
> block_write_full_page() writes out only buffers marked dirty or whole
> page when there are no buffers associated with it. Where in kernel are
> buffers either marked dirty or stripped off a mmaped page when the
> page itself becomes dirty? I would be very grateful for a pointer to
> the source, possibly accompanied by a brief explanation of how it gets
> called.

That is a little trickier than one might expect because it is partially
done in hardware (and so is heavily arch dependent). When a write to a
writable mmapped page happens, the CPU sets the dirty bit for that page
in the page tables, in hardware. So there is no code where you can see
this happen.
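
To make the scenario concrete, here is a minimal user-space sketch of
what triggers all of this: a plain store into a shared file mapping,
followed by an explicit msync(). The helper name dirty_and_sync() and
the choice to write "hello" into the first page are made up for
illustration; only the system calls are real.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the kernel or any library): create a
 * one-page file at `path`, dirty it through a shared mapping and ask
 * the kernel to write it back. Returns 0 on success, -1 on failure.
 */
int dirty_and_sync(const char *path)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0)
        return -1;
    size_t len = (size_t)pagesz;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* The file must be at least as large as the part we touch. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    /* MAP_SHARED so that stores are meant to reach the file. */
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /*
     * An ordinary memory write, no system call involved. From user
     * space there is nothing more to see here; the dirty tracking
     * happens in the page tables.
     */
    memcpy(map, "hello", 5);

    /* Explicitly request writeback of the mapped range. */
    int ret = msync(map, len, MS_SYNC);

    if (munmap(map, len) < 0)
        ret = -1;
    if (close(fd) < 0)
        ret = -1;
    return ret;
}
```

Calling dirty_and_sync("/tmp/some-file") and then reading the file back
with read() should show the five bytes written through the mapping. The
memcpy() is the point of interest: it is the step for which there is no
kernel code to look at.
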
Later on, at msync() time or munmap() time, this hardware dirty bit
results in set_page_dirty() being called, which for buffer based
filesystems ends up in __set_page_dirty_buffers(), which dirties both
the buffers and the page.

For example, take the msync case:

mm/msync.c::sys_msync ->
	msync_interval ->
	filemap_sync ->
	sync_page_range ->
	sync_pud_range ->
	sync_pmd_range ->
	sync_pte_range ->
	set_page_dirty

The beauty of short inline functions. (-:

When the page is not writable, a page fault occurs and the generic page
fault handler is called:

arch/i386/kernel/entry.S defines the page_fault entry:

ENTRY(page_fault)
	pushl $do_page_fault
	jmp error_code

arch/i386/mm/fault.c::do_page_fault ->
(We now move to non-arch specific code.)
mm/memory.c::handle_mm_fault ->
	handle_pte_fault

Now the code paths diverge depending on what page the fault is hitting
(shared or not, read-only or not, swap page or not, page not present in
memory at all, etc). In the end all cases end up generating a writable
page (see pte_mkwrite), either by modifying the existing one, or by
copying the existing shared page (also known as COW = Copy On Write), or
by creating a new page if none was present at all, and then doing a
pte_mkdirty (or an equivalent thereof).

The pte_mkdirty marks the page dirty in the page tables, and this gets
caught at msync/munmap time and results in set_page_dirty being called
as described above. (Note that the pte_mkdirty results in exactly the
same thing happening to the page as when the CPU does it in the
writable-page-already-present case I described above.)

Best regards,

	Anton
-- 
Anton Altaparmakov (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/