* [PATCH] md: stop using do_sync_mapping_range @ 2009-09-23 13:18 Christoph Hellwig 2009-09-23 13:47 ` Jamie Lokier 2009-09-23 22:52 ` Neil Brown 0 siblings, 2 replies; 6+ messages in thread From: Christoph Hellwig @ 2009-09-23 13:18 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid, linux-fsdevel It's a very awkward way to write out all data and wait for it, so just call filemap_write_and_wait. I still can't figure what the point of all this is, so a comment would surely be helpful. Signed-off-by: Christoph Hellwig <hch@lst.de> Index: vfs-2.6.git/drivers/md/bitmap.c =================================================================== --- vfs-2.6.git.orig/drivers/md/bitmap.c 2009-09-22 14:31:08.698262797 -0300 +++ vfs-2.6.git/drivers/md/bitmap.c 2009-09-22 14:33:01.573762756 -0300 @@ -1621,10 +1621,7 @@ int bitmap_create(mddev_t *mddev) bitmap->offset = mddev->bitmap_offset; if (file) { get_file(file); - do_sync_mapping_range(file->f_mapping, 0, LLONG_MAX, - SYNC_FILE_RANGE_WAIT_BEFORE | - SYNC_FILE_RANGE_WRITE | - SYNC_FILE_RANGE_WAIT_AFTER); + filemap_write_and_wait(file->f_mapping); } /* read superblock from bitmap file (this sets bitmap->chunksize) */ err = bitmap_read_sb(bitmap); ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] md: stop using do_sync_mapping_range 2009-09-23 13:18 [PATCH] md: stop using do_sync_mapping_range Christoph Hellwig @ 2009-09-23 13:47 ` Jamie Lokier 2009-09-26 15:11 ` Christoph Hellwig 2009-09-23 22:52 ` Neil Brown 1 sibling, 1 reply; 6+ messages in thread From: Jamie Lokier @ 2009-09-23 13:47 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Neil Brown, linux-raid, linux-fsdevel Christoph Hellwig wrote: > - do_sync_mapping_range(file->f_mapping, 0, LLONG_MAX, > - SYNC_FILE_RANGE_WAIT_BEFORE | > - SYNC_FILE_RANGE_WRITE | > - SYNC_FILE_RANGE_WAIT_AFTER); > + filemap_write_and_wait(file->f_mapping); > > It's a very awkward way to write out all data and wait for it, so just > call filemap_write_and_wait. I still can't figure what the point of > all this is, so a comment would surely be helpful. The SYNC_FILE_RANGE_WAIT_BEFORE is to make sure that writes which were started before the most recent dirtying are completed before initiating a second write on the same pages to make sure the most recently dirty data is written. Does filemap_write_and_wait() do that? Fwiw, It's partially redundant: it isn't necessary to have a second write when the first write request was queued in the elevator but hadn't yet reached the device. -- Jamie ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] md: stop using do_sync_mapping_range 2009-09-23 13:47 ` Jamie Lokier @ 2009-09-26 15:11 ` Christoph Hellwig 0 siblings, 0 replies; 6+ messages in thread From: Christoph Hellwig @ 2009-09-26 15:11 UTC (permalink / raw) To: Jamie Lokier; +Cc: Christoph Hellwig, Neil Brown, linux-raid, linux-fsdevel On Wed, Sep 23, 2009 at 02:47:15PM +0100, Jamie Lokier wrote: > Christoph Hellwig wrote: > > - do_sync_mapping_range(file->f_mapping, 0, LLONG_MAX, > > - SYNC_FILE_RANGE_WAIT_BEFORE | > > - SYNC_FILE_RANGE_WRITE | > > - SYNC_FILE_RANGE_WAIT_AFTER); > > + filemap_write_and_wait(file->f_mapping); > > > > It's a very awkward way to write out all data and wait for it, so just > > call filemap_write_and_wait. I still can't figure what the point of > > all this is, so a comment would surely be helpful. > > The SYNC_FILE_RANGE_WAIT_BEFORE is to make sure that writes which were > started before the most recent dirtying are completed before > initiating a second write on the same pages to make sure the most > recently dirty data is written. > > Does filemap_write_and_wait() do that? Yes, for WB_SYNC_ALL syncs write_cache_pages or the fs-specific equivalent waits for pages marked writeback before starting starting the writeout. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] md: stop using do_sync_mapping_range 2009-09-23 13:18 [PATCH] md: stop using do_sync_mapping_range Christoph Hellwig 2009-09-23 13:47 ` Jamie Lokier @ 2009-09-23 22:52 ` Neil Brown 2009-09-26 15:13 ` Christoph Hellwig 1 sibling, 1 reply; 6+ messages in thread From: Neil Brown @ 2009-09-23 22:52 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-raid, linux-fsdevel On Wednesday September 23, hch@lst.de wrote: > It's a very awkward way to write out all data and wait for it, so just > call filemap_write_and_wait. I still can't figure what the point of > all this is, so a comment would surely be helpful. When md/bitmap accesses a file, it uses bmap to find addresses and then submit_bh to do IO, so it completely by-passes the page cache. So this code is present to ensure that the page cache has no dirty pages for the file before we start using the file. I don't recall exactly why I used do_sync_mapping_range. I suspect that I looked at what "sys_fsync" used (do_fsync?) and found that wasn't exported, so I looked at what sys_sync_file_range used, found that was exported, and so used that. Looking at the current state of the VFS, I think I would rather use vfs_fsync. So would you be happy with something like the following? Thanks, NeilBrown diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index 6986b00..60e2b32 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -1624,10 +1624,11 @@ int bitmap_create(mddev_t *mddev) bitmap->offset = mddev->bitmap_offset; if (file) { get_file(file); - do_sync_mapping_range(file->f_mapping, 0, LLONG_MAX, - SYNC_FILE_RANGE_WAIT_BEFORE | - SYNC_FILE_RANGE_WRITE | - SYNC_FILE_RANGE_WAIT_AFTER); + /* As future accesses to this file will use bmap, + * and bypass the page cache, we must sync the file + * first. + */ + vfs_fsync(file, file->f_dentry, 1); } /* read superblock from bitmap file (this sets bitmap->chunksize) */ err = bitmap_read_sb(bitmap); ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] md: stop using do_sync_mapping_range 2009-09-23 22:52 ` Neil Brown @ 2009-09-26 15:13 ` Christoph Hellwig 2009-09-26 21:51 ` NeilBrown 0 siblings, 1 reply; 6+ messages in thread From: Christoph Hellwig @ 2009-09-26 15:13 UTC (permalink / raw) To: Neil Brown; +Cc: Christoph Hellwig, linux-raid, linux-fsdevel On Thu, Sep 24, 2009 at 08:52:36AM +1000, Neil Brown wrote: > On Wednesday September 23, hch@lst.de wrote: > > It's a very awkward way to write out all data and wait for it, so just > > call filemap_write_and_wait. I still can't figure what the point of > > all this is, so a comment would surely be helpful. > > When md/bitmap accesses a file, it uses bmap to find addresses and then > submit_bh to do IO, so it completely by-passes the page cache. > So this code is present to ensure that the page cache has no dirty > pages for the file before we start using the file. > > I don't recall exactly why I used do_sync_mapping_range. I suspect > that I looked at what "sys_fsync" used (do_fsync?) and found that > wasn't exported, so I looked at what sys_sync_file_range used, found > that was exported, and so used that. > > Looking at the current state of the VFS, I think I would rather use > vfs_fsync. Using vfs_fsync is better, but using bmap is extremly dangerous, I'm kinda suprised how this managed to get sneaked in without notice. bmap really is just an advisory interface as the mappings by change underneath all the time. Some modern filesystems like btrfs really can't implement it at all. > So would you be happy with something like the following? It's at least a small improvement.. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] md: stop using do_sync_mapping_range 2009-09-26 15:13 ` Christoph Hellwig @ 2009-09-26 21:51 ` NeilBrown 0 siblings, 0 replies; 6+ messages in thread From: NeilBrown @ 2009-09-26 21:51 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Christoph Hellwig, linux-raid, linux-fsdevel On Sun, September 27, 2009 1:13 am, Christoph Hellwig wrote: > On Thu, Sep 24, 2009 at 08:52:36AM +1000, Neil Brown wrote: >> On Wednesday September 23, hch@lst.de wrote: >> > It's a very awkward way to write out all data and wait for it, so just >> > call filemap_write_and_wait. I still can't figure what the point of >> > all this is, so a comment would surely be helpful. >> >> When md/bitmap accesses a file, it uses bmap to find addresses and then >> submit_bh to do IO, so it completely by-passes the page cache. >> So this code is present to ensure that the page cache has no dirty >> pages for the file before we start using the file. >> >> I don't recall exactly why I used do_sync_mapping_range. I suspect >> that I looked at what "sys_fsync" used (do_fsync?) and found that >> wasn't exported, so I looked at what sys_sync_file_range used, found >> that was exported, and so used that. >> >> Looking at the current state of the VFS, I think I would rather use >> vfs_fsync. > > Using vfs_fsync is better, but using bmap is extremly dangerous, I'm > kinda suprised how this managed to get sneaked in without notice. bmap > really is just an advisory interface as the mappings by change > underneath all the time. Some modern filesystems like btrfs really > can't implement it at all. > I used bmap because I wanted writeout characteristics similar to swapout, and mm/swapfile.c uses bmap. I copied the ideas from that code. If bmap is unsafe for bitmap files, then presumably it is unsafe for swap files too. This is exactly the argument that I used when Andrew Morton queried the code on the way in. The swap-over-NFS patches introduce a new interface to the filesystem that supports swapout (i.e. writeout that is guaranteed not to block on mem allocation etc). If they ever get upstream, I will change bitmap files to use that interface when it is present. > >> So would you be happy with something like the following? > > It's at least a small improvement.. Thanks. NeilBrown ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-09-26 21:51 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-09-23 13:18 [PATCH] md: stop using do_sync_mapping_range Christoph Hellwig 2009-09-23 13:47 ` Jamie Lokier 2009-09-26 15:11 ` Christoph Hellwig 2009-09-23 22:52 ` Neil Brown 2009-09-26 15:13 ` Christoph Hellwig 2009-09-26 21:51 ` NeilBrown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).