From: Steve French <smfrench@austin.rr.com>
To: unlisted-recipients:; (no To-header on input)
Cc: linux-fsdevel@vger.kernel.org, linux-cifs-client@lists.samba.org
Subject: Re: writethrough of mmapped pages
Date: Sat, 31 Jan 2004 17:45:49 -0600 [thread overview]
Message-ID: <401C3E2D.7040806@austin.rr.com> (raw)
In-Reply-To: 401C371B.6020909@austin.rr.com
The experiment described at the bottom (always calling the generic read
and write but doing filemap_fdatasync and invalidate_remote_inode when
file not cacheable) failed - same symptom. It looks like the portion
of page implicitly extended is not zeroed properly but I am not sure why
calling filemap_fdatasync early would make a difference. In the note
below there is a typo - I meant invalidate_remote_inode not
invalidate_inode_pages (since when the file is not cacheable I have to
force the client to reread the pages).
Steve French wrote
> Unfortunately - an experiment I tried yielded very strange results and
> made things much worse.
> First I disabled caching in my code by setting
> /proc/fs/cifs/OplockEnabled to zero (simulating losing
> the caching token) and ran the fsx file system exerciser test case
> which almost immediately failed.
>
> The fsx test case did the following steps before failing:
> 1) create new file
> 2) truncate up to 0x13E76
> 3) write to range 0x17098 - 0x26857
> 4) read 0xC73E through 0x1B801
> 5) check the data just read which turned up incorrect random data
> starting at offset 0x13000
>
> What I see in my code (from the cifs vfs perspective) is basically :
> 1) create/open
> 2) setattrs (set file size)
> 3) write 0x17098 - 0x26857
> 4) but during the read (since I did not have a caching token and
> therefore was not calling
> generic_read in my read routine) I tried the experiment of calling
> filemap_fdatawrite (to
> handle the case in which mmapped data might exist locallly for this
> file but not be written to the
> server for the range I was about to read from the server) before
> issuing the read call. This is
> a disaster because it causes a write of two (!) dirty pages to the
> server (there should not
> be any). 4) the filemap_fdatawrite caused writepage of one page at
> 0x0000 (fortunately all zero as would be expected)
> but ... and this is the problem ... it then it did a writepage of one
> page at 0x13000 (note that this range should be zero
> from the truncate (truncate_up to 0x13E76) and the rest should be zero
> implicitly from the write beyond end of file.
> In any case - there never really was a write to this page (at 0x13000)
> and the server copy of the data was fine before this write which
> writes junk - locally the
> page cache thinks this page is dirty and the write of a bogus dirty
> page out is bad.
> 5) then I do the read from the server which returns the data which is
> fine from 0xC73E but includes the junk starting at offset 0x1B00 (and
> then the testcase fails)
>
> The logic inside the read routine originally had been to just call
> generic_file_read and the write routine just called generic_file_write
> (and all single machine remote cases worked fine that way for many
> months) but the change I made to fix the oplock problem changed the
> read wrapper routine to do:
>
> if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead)
> return generic_file_read(file,read_data,read_size,poffset);
> else /* this calls the server directly to get the data and does
> not go through the pagecache */ {
> /* to get mmap data to get flushed in the case in which the
> caching token lost
> and we are not going through the pagecache I tried adding a
> call to
> filemap_fdatawrite here which caused the problems described in
> the note */
> return cifs_read(file,read_data,read_size,poffset); }
>
> and similarly in my write wrapper routine:
>
> /* check whether we can cache writes locally */
> /* if not then check whether we can use the page cache at all */
> if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll)
> return generic_file_write(file,write_data, write_size,poffset);
>
> else if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead) {
> written = generic_file_write(file,write_data, write_size,poffset);
> /* since we can not cache writes, need to sync the data */
> if(file->f_dentry->d_inode->i_mapping == NULL)
> return -EIO;
> filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
> return written;
> } else { /* this does not go through the page cache but goes
> directly to the server */
> return cifs_write(file,write_data,write_size,poffset);
> }
>
> It seems that I am going to always have to go through the pagecache it
> seems (remove the call to cifs_write and cifs_read and always call the
> generic_file_read and generic_file_write), perhaps the next thing to
> try is: this for read:
>
> if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead)
> return generic_file_read(file,read_data,read_size,poffset);
> else {
> if(file->f_dentry->d_inode->i_mapping)
>
> filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
> invalidate_inode_pages(file->f_dentry->d_inode);
> return generic_file_read(file,read_data,read_size,poffset);
> }
>
> and this for write:
> /* check whether we can cache writes locally */
> /* if not then check whether we can use the page cache at all */
> if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll)
> return generic_file_write(file,write_data, write_size,poffset);
> else {
> written = generic_file_write(file,write_data, write_size,poffset);
> /* since we can not cache writes, need to sync the data */
> if(file->f_dentry->d_inode->i_mapping)
> filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
> return written;
> }
>
> Are there better ways to do this?
>
next prev parent reply other threads:[~2004-01-31 23:45 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-01-31 23:15 writethrough of mmapped pages Steve French
2004-01-31 23:45 ` Steve French [this message]
2004-02-01 0:16 ` Steve French
-- strict thread matches above, loose matches on Subject: below --
2004-01-30 23:30 Steve French
2004-01-31 0:02 ` Andrew Morton
2004-01-31 10:21 ` David Woodhouse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=401C3E2D.7040806@austin.rr.com \
--to=smfrench@austin.rr.com \
--cc=linux-cifs-client@lists.samba.org \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.