From mboxrd@z Thu Jan  1 00:00:00 1970
From: Steve French <smfrench@austin.rr.com>
Subject: Re: writethrough of mmapped pages
Date: Sat, 31 Jan 2004 17:45:49 -0600
Sender: linux-fsdevel-owner@vger.kernel.org
Message-ID: <401C3E2D.7040806@austin.rr.com>
References: <401C371B.6020909@austin.rr.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel@vger.kernel.org, linux-cifs-client@lists.samba.org
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from ms-smtp-01.texas.rr.com ([24.93.47.40]:38898 "EHLO
	ms-smtp-01-eri0.texas.rr.com") by vger.kernel.org with ESMTP
	id S265173AbUAaXpy (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Sat, 31 Jan 2004 18:45:54 -0500
To: unlisted-recipients:; (no To-header on input)
List-Id: linux-fsdevel.vger.kernel.org

The experiment described at the bottom (always calling the generic read 
and write but doing filemap_fdatasync and invalidate_remote_inode when 
file not cacheable) failed - same symptom.   It looks like the portion 
of page implicitly extended is not zeroed properly but I am not sure why 
calling filemap_fdatasync early would make a difference.  In the note 
below there is a typo - I meant invalidate_remote_inode not 
invalidate_inode_pages (since when the file is not cacheable I have to 
force the client to reread the pages).

Steve French wrote

> Unfortunately - an experiment I tried yielded very strange results and 
> made things much worse.
> First I disabled caching in my code by setting 
> /proc/fs/cifs/OplockEnabled to zero (simulating losing
> the caching token) and ran the fsx file system exerciser test case 
> which almost immediately failed.
>
> The fsx test case did the following steps before failing:
> 1) create new file
> 2) truncate up to 0x13E76
> 3) write to range 0x17098 - 0x26857
> 4) read 0xC73E through 0x1B801
> 5) check the data just read which turned up incorrect random data 
> starting at offset 0x13000
>
> What I see in my code (from the cifs vfs perspective) is basically :
> 1) create/open
> 2) setattrs (set file size)
> 3) write 0x17098 - 0x26857
> 4) but during the read (since I did not have a caching token and 
> therefore was not calling
> generic_read in my read routine) I tried the experiment of calling 
> filemap_fdatawrite (to
> handle the case in which mmapped data might exist locallly for this 
> file but not be written to the
> server for the range I was about to read from the server) before 
> issuing the read call.  This is
> a disaster because it causes a write of two (!) dirty pages to the 
> server (there should not
> be any).  4) the filemap_fdatawrite caused writepage of one page at 
> 0x0000 (fortunately all zero as would be expected)
> but ... and this is the problem ... it then it did a writepage of  one 
> page at 0x13000 (note that this range should be zero
> from the truncate (truncate_up to 0x13E76) and the rest should be zero 
> implicitly from the write beyond end of file.
> In any case - there never really was a write to this page (at 0x13000) 
> and the server copy of the data was fine before this write which 
> writes junk - locally the
> page cache thinks this page is dirty and the write of a bogus dirty 
> page out is bad.
> 5) then I do the read from the server which returns the data which is 
> fine from 0xC73E but includes the junk starting at offset 0x1B00 (and 
> then the testcase fails)
>
> The logic inside the read routine originally had been to just call 
> generic_file_read and the write routine just called generic_file_write 
> (and all single machine remote cases  worked fine that way for many 
> months) but the change I made to fix the oplock problem changed the 
> read wrapper routine to do:
>
>     if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead)
>         return generic_file_read(file,read_data,read_size,poffset);
>     else /* this calls the server directly to get the data and does 
> not go through the pagecache */ {
>         /* to get mmap data to get flushed in the case in which the 
> caching token lost
>        and we are not going through the pagecache I tried adding a 
> call to
>        filemap_fdatawrite here which caused the problems described in 
> the note */
>         return cifs_read(file,read_data,read_size,poffset);     }
>
> and similarly in my write wrapper routine:
>
>    /* check whether we can cache writes locally */
>    /* if not then check whether we can use the page cache at all */
>    if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll)         
> return generic_file_write(file,write_data, write_size,poffset);
>
>    else if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead) {
>        written = generic_file_write(file,write_data, write_size,poffset);
>        /* since we can not cache writes, need to sync the data */
>        if(file->f_dentry->d_inode->i_mapping == NULL)
>            return -EIO;
>        filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
>            return written;
>    } else { /* this does not go through the page cache but goes 
> directly to the server */
>         return cifs_write(file,write_data,write_size,poffset);
>    }
>
> It seems that I am going to always have to go through the pagecache it 
> seems (remove the call to cifs_write and cifs_read and always call the 
> generic_file_read and generic_file_write), perhaps the next thing to 
> try is: this for read:
>
>     if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead)
>         return generic_file_read(file,read_data,read_size,poffset);
>     else {
>        if(file->f_dentry->d_inode->i_mapping)
>            
> filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);              
> invalidate_inode_pages(file->f_dentry->d_inode);
>         return generic_file_read(file,read_data,read_size,poffset);
>    }
>
> and this for write:
>    /* check whether we can cache writes locally */
>    /* if not then check whether we can use the page cache at all */
>    if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll)         
> return generic_file_write(file,write_data, write_size,poffset);
>    else {
>        written = generic_file_write(file,write_data, write_size,poffset);
>        /* since we can not cache writes, need to sync the data */
>        if(file->f_dentry->d_inode->i_mapping)
>            filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
>       return written;
>    }
>
> Are there better ways to do this?
>