All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steve French <smfrench@austin.rr.com>
To: linux-fsdevel@vger.kernel.org, linux-cifs-client@lists.samba.org
Subject: Re: writethrough of mmapped pages
Date: Sat, 31 Jan 2004 17:15:39 -0600	[thread overview]
Message-ID: <401C371B.6020909@austin.rr.com> (raw)

 > You mean that you want the filesytem to know whenever an application
 > attempts to modify a mmapped page?  After the modification, rather than
 > before?

That would be helpful.   But alternatively if a simple check for dirty 
pages for an inode
is cheap (ie would not hurt performance to be called every few seconds) 
- then when
the read and read/write caching tokens for an inode (actually file 
handle but in this case
inode is close enough) are lost (oplock broken)  I might be able to have 
a kernel thread
simply sync all of the filesystem's dirty inodes which are not cacheable.

Unfortunately - an experiment I tried yielded very strange results and 
made things much worse.
First I disabled caching in my code by setting 
/proc/fs/cifs/OplockEnabled to zero (simulating losing
the caching token) and ran the fsx file system exerciser test case which 
almost immediately failed.

The fsx test case did the following steps before failing:
1) create new file
2) truncate up to 0x13E76
3) write to range 0x17098 - 0x26857
4) read 0xC73E through 0x1B801
5) check the data just read which turned up incorrect random data 
starting at offset 0x13000

What I see in my code (from the cifs vfs perspective) is basically :
1) create/open
2) setattrs (set file size)
3) write 0x17098 - 0x26857
4) but during the read (since I did not have a caching token and 
therefore was not calling
generic_read in my read routine) I tried the experiment of calling 
filemap_fdatawrite (to
handle the case in which mmapped data might exist locallly for this file 
but not be written to the
server for the range I was about to read from the server) before issuing 
the read call.  This is
a disaster because it causes a write of two (!) dirty pages to the 
server (there should not
be any).  
4) the filemap_fdatawrite caused writepage of one page at 0x0000 
(fortunately all zero as would be expected)
but ... and this is the problem ... it then it did a writepage of  one 
page at 0x13000 (note that this range should be zero
from the truncate (truncate_up to 0x13E76) and the rest should be zero 
implicitly from the write beyond end of file.
In any case - there never really was a write to this page (at 0x13000) 
and the server copy of the data was fine before this write which writes 
junk - locally the
page cache thinks this page is dirty and the write of a bogus dirty page 
out is bad.
5) then I do the read from the server which returns the data which is 
fine from 0xC73E but includes the junk starting at offset 0x1B00 (and 
then the testcase fails)

The logic inside the read routine originally had been to just call 
generic_file_read and the write routine just called generic_file_write 
(and all single machine remote cases  worked fine that way for many 
months) but the change I made to fix the oplock problem changed the read 
wrapper routine to do:

     if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead)
         return generic_file_read(file,read_data,read_size,poffset);
     else /* this calls the server directly to get the data and does not 
go through the pagecache */ {
         /* to get mmap data to get flushed in the case in which the 
caching token lost
        and we are not going through the pagecache I tried adding a call to
        filemap_fdatawrite here which caused the problems described in 
the note */
         return cifs_read(file,read_data,read_size,poffset);  
    }

and similarly in my write wrapper routine:

    /* check whether we can cache writes locally */
    /* if not then check whether we can use the page cache at all */
    if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll) 
         return generic_file_write(file,write_data, write_size,poffset);

    else if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead) {
        written = generic_file_write(file,write_data, write_size,poffset);
        /* since we can not cache writes, need to sync the data */
        if(file->f_dentry->d_inode->i_mapping == NULL)
            return -EIO;
        filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
            return written;
    } else { /* this does not go through the page cache but goes 
directly to the server */
         return cifs_write(file,write_data,write_size,poffset);
    }

It seems that I am going to always have to go through the pagecache it 
seems (remove the call to cifs_write and cifs_read and always call the 
generic_file_read and generic_file_write), perhaps the next thing to try 
is: this for read:

     if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead)
         return generic_file_read(file,read_data,read_size,poffset);
     else {
        if(file->f_dentry->d_inode->i_mapping)
            filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);      
         invalidate_inode_pages(file->f_dentry->d_inode);
         return generic_file_read(file,read_data,read_size,poffset);
    }

and this for write:
    /* check whether we can cache writes locally */
    /* if not then check whether we can use the page cache at all */
    if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll) 
         return generic_file_write(file,write_data, write_size,poffset);
    else {
        written = generic_file_write(file,write_data, write_size,poffset);
        /* since we can not cache writes, need to sync the data */
        if(file->f_dentry->d_inode->i_mapping)
            filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
       return written;
    }

Are there better ways to do this?

             reply	other threads:[~2004-01-31 23:15 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-01-31 23:15 Steve French [this message]
2004-01-31 23:45 ` writethrough of mmapped pages Steve French
2004-02-01  0:16   ` Steve French
  -- strict thread matches above, loose matches on Subject: below --
2004-01-30 23:30 Steve French
2004-01-31  0:02 ` Andrew Morton
2004-01-31 10:21   ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=401C371B.6020909@austin.rr.com \
    --to=smfrench@austin.rr.com \
    --cc=linux-cifs-client@lists.samba.org \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.