From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steve French Subject: Re: writethrough of mmapped pages Date: Sat, 31 Jan 2004 17:45:49 -0600 Sender: linux-fsdevel-owner@vger.kernel.org Message-ID: <401C3E2D.7040806@austin.rr.com> References: <401C371B.6020909@austin.rr.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: linux-fsdevel@vger.kernel.org, linux-cifs-client@lists.samba.org Return-path: Received: from ms-smtp-01.texas.rr.com ([24.93.47.40]:38898 "EHLO ms-smtp-01-eri0.texas.rr.com") by vger.kernel.org with ESMTP id S265173AbUAaXpy (ORCPT ); Sat, 31 Jan 2004 18:45:54 -0500 To: unlisted-recipients:; (no To-header on input) List-Id: linux-fsdevel.vger.kernel.org The experiment described at the bottom (always calling the generic read and write but doing filemap_fdatasync and invalidate_remote_inode when file not cacheable) failed - same symptom. It looks like the portion of page implicitly extended is not zeroed properly but I am not sure why calling filemap_fdatasync early would make a difference. In the note below there is a typo - I meant invalidate_remote_inode not invalidate_inode_pages (since when the file is not cacheable I have to force the client to reread the pages). Steve French wrote > Unfortunately - an experiment I tried yielded very strange results and > made things much worse. > First I disabled caching in my code by setting > /proc/fs/cifs/OplockEnabled to zero (simulating losing > the caching token) and ran the fsx file system exerciser test case > which almost immediately failed. > > The fsx test case did the following steps before failing: > 1) create new file > 2) truncate up to 0x13E76 > 3) write to range 0x17098 - 0x26857 > 4) read 0xC73E through 0x1B801 > 5) check the data just read which turned up incorrect random data > starting at offset 0x13000 > > What I see in my code (from the cifs vfs perspective) is basically : > 1) create/open > 2) setattrs (set file size) > 3) write 0x17098 - 0x26857 > 4) but during the read (since I did not have a caching token and > therefore was not calling > generic_read in my read routine) I tried the experiment of calling > filemap_fdatawrite (to > handle the case in which mmapped data might exist locallly for this > file but not be written to the > server for the range I was about to read from the server) before > issuing the read call. This is > a disaster because it causes a write of two (!) dirty pages to the > server (there should not > be any). 4) the filemap_fdatawrite caused writepage of one page at > 0x0000 (fortunately all zero as would be expected) > but ... and this is the problem ... it then it did a writepage of one > page at 0x13000 (note that this range should be zero > from the truncate (truncate_up to 0x13E76) and the rest should be zero > implicitly from the write beyond end of file. > In any case - there never really was a write to this page (at 0x13000) > and the server copy of the data was fine before this write which > writes junk - locally the > page cache thinks this page is dirty and the write of a bogus dirty > page out is bad. > 5) then I do the read from the server which returns the data which is > fine from 0xC73E but includes the junk starting at offset 0x1B00 (and > then the testcase fails) > > The logic inside the read routine originally had been to just call > generic_file_read and the write routine just called generic_file_write > (and all single machine remote cases worked fine that way for many > months) but the change I made to fix the oplock problem changed the > read wrapper routine to do: > > if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead) > return generic_file_read(file,read_data,read_size,poffset); > else /* this calls the server directly to get the data and does > not go through the pagecache */ { > /* to get mmap data to get flushed in the case in which the > caching token lost > and we are not going through the pagecache I tried adding a > call to > filemap_fdatawrite here which caused the problems described in > the note */ > return cifs_read(file,read_data,read_size,poffset); } > > and similarly in my write wrapper routine: > > /* check whether we can cache writes locally */ > /* if not then check whether we can use the page cache at all */ > if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll) > return generic_file_write(file,write_data, write_size,poffset); > > else if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead) { > written = generic_file_write(file,write_data, write_size,poffset); > /* since we can not cache writes, need to sync the data */ > if(file->f_dentry->d_inode->i_mapping == NULL) > return -EIO; > filemap_fdatawrite(file->f_dentry->d_inode->i_mapping); > return written; > } else { /* this does not go through the page cache but goes > directly to the server */ > return cifs_write(file,write_data,write_size,poffset); > } > > It seems that I am going to always have to go through the pagecache it > seems (remove the call to cifs_write and cifs_read and always call the > generic_file_read and generic_file_write), perhaps the next thing to > try is: this for read: > > if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead) > return generic_file_read(file,read_data,read_size,poffset); > else { > if(file->f_dentry->d_inode->i_mapping) > > filemap_fdatawrite(file->f_dentry->d_inode->i_mapping); > invalidate_inode_pages(file->f_dentry->d_inode); > return generic_file_read(file,read_data,read_size,poffset); > } > > and this for write: > /* check whether we can cache writes locally */ > /* if not then check whether we can use the page cache at all */ > if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll) > return generic_file_write(file,write_data, write_size,poffset); > else { > written = generic_file_write(file,write_data, write_size,poffset); > /* since we can not cache writes, need to sync the data */ > if(file->f_dentry->d_inode->i_mapping) > filemap_fdatawrite(file->f_dentry->d_inode->i_mapping); > return written; > } > > Are there better ways to do this? >