From mboxrd@z Thu Jan  1 00:00:00 1970
From: Nikita Danilov <nikita@clusterfs.com>
Subject: Re: Address space operations questions
Date: Wed, 30 Mar 2005 17:55:16 +0400
Message-ID: <16970.44996.53630.886769@gargle.gargle.HOWL>
References: <8e70aacf05032616151c958eed@mail.gmail.com>
	<8e70aacf05032914306a827923@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel@vger.kernel.org
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from [80.71.243.242] ([80.71.243.242]:37760 "EHLO tau.rusteko.ru")
	by vger.kernel.org with ESMTP id S261898AbVC3NzY (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Wed, 30 Mar 2005 08:55:24 -0500
To: Martin Jambor <jamborm@gmail.com>
In-Reply-To: <8e70aacf05032914306a827923@mail.gmail.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Martin Jambor writes:
 > Hi,
 > 
 > I have problems understanding the purpose of different entries of
 > struct address_space_operations in 2.6 kernels:
 > 
 > 1. What is bmap for and what is it supposed to do?

->bmap() maps a logical block offset within an "object" to a physical
block number. It is used in a few places, notably in the implementation
of the FIBMAP ioctl.
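
As an illustration (not taken from the kernel sources), a user-space
sketch of reaching ->bmap() through FIBMAP could look like this. The
helper name fibmap_block() is made up for the example; FIBMAP itself is
defined in <linux/fs.h>, takes a pointer to an int, and normally needs
CAP_SYS_RAWIO, so expect it to fail for unprivileged callers.

```c
#include <assert.h>
#include <linux/fs.h>    /* FIBMAP */
#include <sys/ioctl.h>

/*
 * Hypothetical helper: ask the file system which physical block backs
 * logical block `logical` of the open file `fd`. Block numbers are in
 * units of the file system block size.
 *
 * Returns the physical block number (0 usually means "not mapped",
 * e.g. a hole), or -1 with errno set if the ioctl fails.
 */
long fibmap_block(int fd, int logical)
{
    int block = logical;    /* in: logical block, out: physical block */

    assert(logical >= 0);   /* logical block offsets are non-negative */

    if (ioctl(fd, FIBMAP, &block) < 0)
        return -1;
    return block;
}
```

A caller would open a regular file, call fibmap_block(fd, 0) for the
first block, and check for -1 before trusting the result.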

 > 
 > 2. What is the difference between sync_page and write_page?

(It is spelt ->writepage() by the way).

->sync_page() is an awful misnomer. Usually, when a page IO operation is
requested by calling ->writepage() or ->readpage(), the file system
queues an IO request (e.g., a disk-based file system may do this by
calling submit_bio()), but the underlying device driver does not proceed
with this IO immediately, because IO scheduling is more efficient when
there are multiple requests in the queue.

Only when something really wants to wait for IO completion
(wait_on_page_{locked,writeback}() are used to wait for read and write
completion respectively) is the IO queue processed. To do this,
wait_on_page_bit() calls ->sync_page() (see block_sync_page(), the
standard implementation of ->sync_page() for disk-based file systems).

So, the semantics of ->sync_page() are roughly "kick the underlying
storage driver to actually perform all IO queued for this page, and,
maybe, for other pages on this device too".
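
To make the queue-then-kick idea concrete, here is a toy user-space
model. It is only a sketch: all plug_* names are invented for this
example, IO "completes" instantly once the queue is kicked, and nothing
here matches real kernel data structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define PLUG_QUEUE_MAX 16

struct plug_page {
    bool locked;    /* set while IO on this page is queued or in flight */
};

struct plug_queue {
    struct plug_page *pending[PLUG_QUEUE_MAX];
    size_t nr_pending;
};

/* Queue IO for a page. Nothing is sent to the "device" yet. */
bool plug_submit(struct plug_queue *q, struct plug_page *page)
{
    if (q->nr_pending == PLUG_QUEUE_MAX)
        return false;
    page->locked = true;
    q->pending[q->nr_pending++] = page;
    return true;
}

/*
 * Rough analogue of ->sync_page(): kick the "device" so that everything
 * queued so far is performed, not just the page the caller cares about.
 * Returns the number of requests processed.
 */
size_t plug_sync_page(struct plug_queue *q)
{
    size_t done = q->nr_pending;

    for (size_t i = 0; i < q->nr_pending; i++)
        q->pending[i]->locked = false;    /* IO "completes" */
    q->nr_pending = 0;
    return done;
}

/*
 * Rough analogue of wait_on_page_locked(): only a waiter forces the
 * queue to be processed. Returns true once the page is unlocked.
 */
bool plug_wait_on_page(struct plug_queue *q, struct plug_page *page)
{
    if (page->locked)
        plug_sync_page(q);
    return !page->locked;
}
```

Submitting two pages and then waiting on the first one unlocks both,
which is the "and, maybe, for other pages too" part of the description.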

 > 
 > 3. What exactly (fs independent) is the relation in between
 > write_page, prepare_write and commit_write? Does prepare make sure a
 > page can be written (like allocating space), commit mark it dirty and
 > write_page write it sometime later on?

->prepare_write() and ->commit_write() are only used by
generic_file_write() (so, one may argue that they shouldn't be placed
into struct address_space_operations at all).

generic_file_write() loops over each page that overlaps the portion of
the file being written:

     a_ops->prepare_write(file, page, from, to);
     copy_from_user(...);
     a_ops->commit_write(file, page, from, to);

If a page is only partially overwritten, ->prepare_write() has to read
the parts of the page that are not covered by the write.
->commit_write() is expected to mark the page (or buffers) and the inode
dirty, and to update the inode size if the write extends the file.
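
The same loop can be played out in user space. What follows is a toy
model under loud assumptions: the toy_* names, the 8-byte "page", and the
in-memory "disk" are all invented for illustration, memcpy() stands in
for copy_from_user(), and there is no locking or error handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8u    /* tiny "page" so the example is easy to trace */
#define TOY_NR_PAGES  4u

struct toy_page {
    unsigned char data[TOY_PAGE_SIZE];
    bool uptodate;    /* page already holds valid file contents */
    bool dirty;       /* page must be written back later */
};

struct toy_file {
    unsigned char disk[TOY_NR_PAGES * TOY_PAGE_SIZE];    /* backing store */
    struct toy_page cache[TOY_NR_PAGES];                 /* "page cache" */
    size_t size;                                         /* "inode size" */
};

/* Analogue of ->prepare_write(): read the parts the write won't cover. */
void toy_prepare_write(struct toy_file *f, size_t idx, size_t from, size_t to)
{
    struct toy_page *p = &f->cache[idx];
    const unsigned char *src = &f->disk[idx * TOY_PAGE_SIZE];

    assert(from <= to && to <= TOY_PAGE_SIZE);
    if (p->uptodate)
        return;
    memcpy(p->data, src, from);                            /* [0, from) */
    memcpy(p->data + to, src + to, TOY_PAGE_SIZE - to);    /* [to, end) */
}

/* Analogue of ->commit_write(): mark dirty, grow the file if needed. */
void toy_commit_write(struct toy_file *f, size_t idx, size_t from, size_t to)
{
    size_t end = idx * TOY_PAGE_SIZE + to;

    (void)from;
    f->cache[idx].uptodate = true;
    f->cache[idx].dirty = true;    /* no IO is started here */
    if (end > f->size)
        f->size = end;
}

/* Analogue of the per-page loop in generic_file_write(). */
size_t toy_file_write(struct toy_file *f, size_t pos, const void *buf,
                      size_t len)
{
    const unsigned char *src = buf;
    size_t written = 0;

    while (written < len && pos < sizeof(f->disk)) {
        size_t idx = pos / TOY_PAGE_SIZE;
        size_t from = pos % TOY_PAGE_SIZE;
        size_t to = from + (len - written);

        if (to > TOY_PAGE_SIZE)
            to = TOY_PAGE_SIZE;

        toy_prepare_write(f, idx, from, to);
        memcpy(f->cache[idx].data + from, src + written, to - from);
        toy_commit_write(f, idx, from, to);

        written += to - from;
        pos += to - from;
    }
    return written;
}
```

Writing 4 bytes at offset 6 touches two pages: the first one needs its
leading 6 bytes read in by toy_prepare_write(), and the second one
extends the file, so toy_commit_write() bumps the size.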

As for block allocation and transaction handling, this is up to the file
system back end.

Usually ->commit_write() doesn't start IO by itself; it just marks pages
dirty. Write-out is done by balance_dirty_pages_ratelimited(): when the
number of dirty pages in the system exceeds some threshold, the kernel
calls ->writepages() of dirty inodes.

->writepage() is used in two places:

    - by the VM scanner to write out a dirty page from the tail of the
    inactive list.  This is a "rare" path, because balance_dirty_pages()
    is supposed to keep the amount of dirty pages under control.

    - by mpage_writepages(): the default implementation of the
    ->writepages() method.
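
A toy model of that write-out path, again only a sketch: the wb_* names
are invented, there is a single "inode" with a fixed number of pages, and
writing back just enough pages to get down to the threshold is a
simplification assumed for the example, not what the kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WB_NR_PAGES 8

struct wb_inode {
    bool dirty[WB_NR_PAGES];    /* per-page dirty flag */
    size_t nr_written;          /* how many pages went to "disk" */
};

/* Analogue of ->writepage(): write out one page and clean it. */
int wb_writepage(struct wb_inode *inode, size_t idx)
{
    assert(idx < WB_NR_PAGES);
    inode->dirty[idx] = false;
    inode->nr_written++;
    return 0;
}

/*
 * Analogue of a default ->writepages(): walk the dirty pages and call
 * ->writepage() on each, up to nr_to_write pages.
 */
size_t wb_writepages(struct wb_inode *inode, size_t nr_to_write)
{
    size_t done = 0;

    for (size_t i = 0; i < WB_NR_PAGES && done < nr_to_write; i++) {
        if (inode->dirty[i] && wb_writepage(inode, i) == 0)
            done++;
    }
    return done;
}

size_t wb_count_dirty(const struct wb_inode *inode)
{
    size_t n = 0;

    for (size_t i = 0; i < WB_NR_PAGES; i++)
        n += inode->dirty[i] ? 1 : 0;
    return n;
}

/*
 * Analogue of balance_dirty_pages(): do nothing while the dirty count
 * is within the threshold, otherwise start write-out.
 */
size_t wb_balance_dirty_pages(struct wb_inode *inode, size_t threshold)
{
    size_t nr_dirty = wb_count_dirty(inode);

    if (nr_dirty <= threshold)
        return 0;
    return wb_writepages(inode, nr_dirty - threshold);
}
```

With 5 dirty pages, a threshold of 6 writes nothing, while a threshold
of 2 pushes the first 3 dirty pages through wb_writepage().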

 > 
 > Thank you very much for any insight,
 > 
 > Martin

Hope this helps.

Nikita.