writethrough of mmapped pages

All of lore.kernel.org
 help / color / mirror / Atom feed

* writethrough of mmapped pages
@ 2004-01-30 23:30 Steve French
  2004-01-31  0:02 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Steve French @ 2004-01-30 23:30 UTC (permalink / raw)
  To: linux-fsdevel

Is there an easy way to locally force the page cache to do in effect
writethrough of mmapped pages (when I lose my write caching token in
cifs vfs e.g.)?  Standard writes can be hooked easily enough by wrapping
the inode operation for write and only calling generic_write if they are
(still) cacheable locally which is what I do in current code, but
mmapped writes still go through the page cache presumably and thus would
have to be synced if a non-mmmapped read followed an mmapped write of an
overlapping range.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: writethrough of mmapped pages
  2004-01-30 23:30 Steve French
@ 2004-01-31  0:02 ` Andrew Morton
  2004-01-31 10:21   ` David Woodhouse
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2004-01-31  0:02 UTC (permalink / raw)
  To: Steve French; +Cc: linux-fsdevel

Steve French <smfltc@us.ibm.com> wrote:
>
> Is there an easy way to locally force the page cache to do in effect
> writethrough of mmapped pages (when I lose my write caching token in
> cifs vfs e.g.)?  Standard writes can be hooked easily enough by wrapping
> the inode operation for write and only calling generic_write if they are
> (still) cacheable locally which is what I do in current code, but
> mmapped writes still go through the page cache presumably and thus would
> have to be synced if a non-mmmapped read followed an mmapped write of an
> overlapping range.

You mean that you want the filesytem to know whenever an application
attempts to modify a mmapped page?  After the modification, rather than
before?

Not sure how to do that - even if you modify the MM to take a trap on every
write the page still hasn't been modified on entry to the fault handler.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: writethrough of mmapped pages
  2004-01-31  0:02 ` Andrew Morton
@ 2004-01-31 10:21   ` David Woodhouse
  0 siblings, 0 replies; 6+ messages in thread
From: David Woodhouse @ 2004-01-31 10:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Steve French, linux-fsdevel

On Fri, 2004-01-30 at 16:02 -0800, Andrew Morton wrote:
> You mean that you want the filesytem to know whenever an application
> attempts to modify a mmapped page?  After the modification, rather than
> before?
> 
> Not sure how to do that - even if you modify the MM to take a trap on every
> write the page still hasn't been modified on entry to the fault handler.

If I'm ever going to implement shared writable mmap in JFFS2 I'm going
to need part of this actually -- a hook in do_wp_page() so I can
preallocate data structures, and make appropriate reservations on the
medium to make sure we can always satisfy the writepage() without having
to garbage collect or allocate memory. 

Knowing about it before the modification is fine for me though.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: writethrough of mmapped pages
@ 2004-01-31 23:15 Steve French
  2004-01-31 23:45 ` Steve French
  0 siblings, 1 reply; 6+ messages in thread
From: Steve French @ 2004-01-31 23:15 UTC (permalink / raw)
  To: linux-fsdevel, linux-cifs-client

 > You mean that you want the filesytem to know whenever an application
 > attempts to modify a mmapped page?  After the modification, rather than
 > before?

That would be helpful.   But alternatively if a simple check for dirty 
pages for an inode
is cheap (ie would not hurt performance to be called every few seconds) 
- then when
the read and read/write caching tokens for an inode (actually file 
handle but in this case
inode is close enough) are lost (oplock broken)  I might be able to have 
a kernel thread
simply sync all of the filesystem's dirty inodes which are not cacheable.

Unfortunately - an experiment I tried yielded very strange results and 
made things much worse.
First I disabled caching in my code by setting 
/proc/fs/cifs/OplockEnabled to zero (simulating losing
the caching token) and ran the fsx file system exerciser test case which 
almost immediately failed.

The fsx test case did the following steps before failing:
1) create new file
2) truncate up to 0x13E76
3) write to range 0x17098 - 0x26857
4) read 0xC73E through 0x1B801
5) check the data just read which turned up incorrect random data 
starting at offset 0x13000

What I see in my code (from the cifs vfs perspective) is basically :
1) create/open
2) setattrs (set file size)
3) write 0x17098 - 0x26857
4) but during the read (since I did not have a caching token and 
therefore was not calling
generic_read in my read routine) I tried the experiment of calling 
filemap_fdatawrite (to
handle the case in which mmapped data might exist locallly for this file 
but not be written to the
server for the range I was about to read from the server) before issuing 
the read call.  This is
a disaster because it causes a write of two (!) dirty pages to the 
server (there should not
be any).  
4) the filemap_fdatawrite caused writepage of one page at 0x0000 
(fortunately all zero as would be expected)
but ... and this is the problem ... it then it did a writepage of  one 
page at 0x13000 (note that this range should be zero
from the truncate (truncate_up to 0x13E76) and the rest should be zero 
implicitly from the write beyond end of file.
In any case - there never really was a write to this page (at 0x13000) 
and the server copy of the data was fine before this write which writes 
junk - locally the
page cache thinks this page is dirty and the write of a bogus dirty page 
out is bad.
5) then I do the read from the server which returns the data which is 
fine from 0xC73E but includes the junk starting at offset 0x1B00 (and 
then the testcase fails)

The logic inside the read routine originally had been to just call 
generic_file_read and the write routine just called generic_file_write 
(and all single machine remote cases  worked fine that way for many 
months) but the change I made to fix the oplock problem changed the read 
wrapper routine to do:

     if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead)
         return generic_file_read(file,read_data,read_size,poffset);
     else /* this calls the server directly to get the data and does not 
go through the pagecache */ {
         /* to get mmap data to get flushed in the case in which the 
caching token lost
        and we are not going through the pagecache I tried adding a call to
        filemap_fdatawrite here which caused the problems described in 
the note */
         return cifs_read(file,read_data,read_size,poffset);  
    }

and similarly in my write wrapper routine:

    /* check whether we can cache writes locally */
    /* if not then check whether we can use the page cache at all */
    if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll) 
         return generic_file_write(file,write_data, write_size,poffset);

    else if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead) {
        written = generic_file_write(file,write_data, write_size,poffset);
        /* since we can not cache writes, need to sync the data */
        if(file->f_dentry->d_inode->i_mapping == NULL)
            return -EIO;
        filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
            return written;
    } else { /* this does not go through the page cache but goes 
directly to the server */
         return cifs_write(file,write_data,write_size,poffset);
    }

It seems that I am going to always have to go through the pagecache it 
seems (remove the call to cifs_write and cifs_read and always call the 
generic_file_read and generic_file_write), perhaps the next thing to try 
is: this for read:

     if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead)
         return generic_file_read(file,read_data,read_size,poffset);
     else {
        if(file->f_dentry->d_inode->i_mapping)
            filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);      
         invalidate_inode_pages(file->f_dentry->d_inode);
         return generic_file_read(file,read_data,read_size,poffset);
    }

and this for write:
    /* check whether we can cache writes locally */
    /* if not then check whether we can use the page cache at all */
    if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll) 
         return generic_file_write(file,write_data, write_size,poffset);
    else {
        written = generic_file_write(file,write_data, write_size,poffset);
        /* since we can not cache writes, need to sync the data */
        if(file->f_dentry->d_inode->i_mapping)
            filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
       return written;
    }

Are there better ways to do this?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: writethrough of mmapped pages
  2004-01-31 23:15 writethrough of mmapped pages Steve French
@ 2004-01-31 23:45 ` Steve French
  2004-02-01  0:16   ` Steve French
  0 siblings, 1 reply; 6+ messages in thread
From: Steve French @ 2004-01-31 23:45 UTC (permalink / raw)
  Cc: linux-fsdevel, linux-cifs-client

The experiment described at the bottom (always calling the generic read 
and write but doing filemap_fdatasync and invalidate_remote_inode when 
file not cacheable) failed - same symptom.   It looks like the portion 
of page implicitly extended is not zeroed properly but I am not sure why 
calling filemap_fdatasync early would make a difference.  In the note 
below there is a typo - I meant invalidate_remote_inode not 
invalidate_inode_pages (since when the file is not cacheable I have to 
force the client to reread the pages).

Steve French wrote

> Unfortunately - an experiment I tried yielded very strange results and 
> made things much worse.
> First I disabled caching in my code by setting 
> /proc/fs/cifs/OplockEnabled to zero (simulating losing
> the caching token) and ran the fsx file system exerciser test case 
> which almost immediately failed.
>
> The fsx test case did the following steps before failing:
> 1) create new file
> 2) truncate up to 0x13E76
> 3) write to range 0x17098 - 0x26857
> 4) read 0xC73E through 0x1B801
> 5) check the data just read which turned up incorrect random data 
> starting at offset 0x13000
>
> What I see in my code (from the cifs vfs perspective) is basically :
> 1) create/open
> 2) setattrs (set file size)
> 3) write 0x17098 - 0x26857
> 4) but during the read (since I did not have a caching token and 
> therefore was not calling
> generic_read in my read routine) I tried the experiment of calling 
> filemap_fdatawrite (to
> handle the case in which mmapped data might exist locallly for this 
> file but not be written to the
> server for the range I was about to read from the server) before 
> issuing the read call.  This is
> a disaster because it causes a write of two (!) dirty pages to the 
> server (there should not
> be any).  4) the filemap_fdatawrite caused writepage of one page at 
> 0x0000 (fortunately all zero as would be expected)
> but ... and this is the problem ... it then it did a writepage of  one 
> page at 0x13000 (note that this range should be zero
> from the truncate (truncate_up to 0x13E76) and the rest should be zero 
> implicitly from the write beyond end of file.
> In any case - there never really was a write to this page (at 0x13000) 
> and the server copy of the data was fine before this write which 
> writes junk - locally the
> page cache thinks this page is dirty and the write of a bogus dirty 
> page out is bad.
> 5) then I do the read from the server which returns the data which is 
> fine from 0xC73E but includes the junk starting at offset 0x1B00 (and 
> then the testcase fails)
>
> The logic inside the read routine originally had been to just call 
> generic_file_read and the write routine just called generic_file_write 
> (and all single machine remote cases  worked fine that way for many 
> months) but the change I made to fix the oplock problem changed the 
> read wrapper routine to do:
>
>     if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead)
>         return generic_file_read(file,read_data,read_size,poffset);
>     else /* this calls the server directly to get the data and does 
> not go through the pagecache */ {
>         /* to get mmap data to get flushed in the case in which the 
> caching token lost
>        and we are not going through the pagecache I tried adding a 
> call to
>        filemap_fdatawrite here which caused the problems described in 
> the note */
>         return cifs_read(file,read_data,read_size,poffset);     }
>
> and similarly in my write wrapper routine:
>
>    /* check whether we can cache writes locally */
>    /* if not then check whether we can use the page cache at all */
>    if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll)         
> return generic_file_write(file,write_data, write_size,poffset);
>
>    else if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead) {
>        written = generic_file_write(file,write_data, write_size,poffset);
>        /* since we can not cache writes, need to sync the data */
>        if(file->f_dentry->d_inode->i_mapping == NULL)
>            return -EIO;
>        filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
>            return written;
>    } else { /* this does not go through the page cache but goes 
> directly to the server */
>         return cifs_write(file,write_data,write_size,poffset);
>    }
>
> It seems that I am going to always have to go through the pagecache it 
> seems (remove the call to cifs_write and cifs_read and always call the 
> generic_file_read and generic_file_write), perhaps the next thing to 
> try is: this for read:
>
>     if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead)
>         return generic_file_read(file,read_data,read_size,poffset);
>     else {
>        if(file->f_dentry->d_inode->i_mapping)
>            
> filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);              
> invalidate_inode_pages(file->f_dentry->d_inode);
>         return generic_file_read(file,read_data,read_size,poffset);
>    }
>
> and this for write:
>    /* check whether we can cache writes locally */
>    /* if not then check whether we can use the page cache at all */
>    if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll)         
> return generic_file_write(file,write_data, write_size,poffset);
>    else {
>        written = generic_file_write(file,write_data, write_size,poffset);
>        /* since we can not cache writes, need to sync the data */
>        if(file->f_dentry->d_inode->i_mapping)
>            filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
>       return written;
>    }
>
> Are there better ways to do this?
>



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: writethrough of mmapped pages
  2004-01-31 23:45 ` Steve French
@ 2004-02-01  0:16   ` Steve French
  0 siblings, 0 replies; 6+ messages in thread
From: Steve French @ 2004-02-01  0:16 UTC (permalink / raw)
  Cc: linux-fsdevel, linux-cifs-client

Well the filemap_fdatawrite after generic_write seems to cause the 
problem. Removing the call to it there worked around the problem - fsx 
works fine as long as I don't flush the dirty data with 
filemap_fdatawrite after the call to generic_write in my write wrapper 
routine. Removing it fixed the problem (but that doesn't seem to be very 
useful since when I lose the caching token - I really do need to flush 
the data to the server after every write - and if filemap_fdatawrite is 
not safe to call after every write - what routine should I call?)

Steve French wrote:

> The experiment described at the bottom (always calling the generic 
> read and write but doing filemap_fdatasync and invalidate_remote_inode 
> when file not cacheable) failed - same symptom. It looks like the 
> portion of page implicitly extended is not zeroed properly but I am 
> not sure why calling filemap_fdatasync early would make a difference. 
> In the note below there is a typo - I meant invalidate_remote_inode 
> not invalidate_inode_pages (since when the file is not cacheable I 
> have to force the client to reread the pages).
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2004-02-01  0:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-01-31 23:15 writethrough of mmapped pages Steve French
2004-01-31 23:45 ` Steve French
2004-02-01  0:16   ` Steve French
  -- strict thread matches above, loose matches on Subject: below --
2004-01-30 23:30 Steve French
2004-01-31  0:02 ` Andrew Morton
2004-01-31 10:21   ` David Woodhouse

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.