public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@osdl.org>
To: jmoyer@redhat.com
Cc: Zach Brown <zach.brown@oracle.com>, linux-kernel@vger.kernel.org
Subject: Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path
Date: Wed, 11 Oct 2006 11:37:20 -0700	[thread overview]
Message-ID: <20061011113720.463e331c.akpm@osdl.org> (raw)
In-Reply-To: <m3odsi4x3a.fsf@redhat.com>

On Wed, 11 Oct 2006 12:48:57 -0400
jmoyer@redhat.com wrote:

> ==> Regarding Re: [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path; Andrew Morton <akpm@osdl.org> adds:
> 
> akpm> Patch is below.  The end result looks like:
> 
> 
> akpm> 	/* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
> akpm> 	if (unlikely(file->f_flags & O_DIRECT)) {
> akpm> 		loff_t endbyte;
> akpm> 		ssize_t written_buffered;
> 
> akpm> 		written = generic_file_direct_write(iocb, iov, &nr_segs, pos,
> akpm> 							ppos, count, ocount);
> akpm> 		if (written < 0 || written == count)
> akpm> 			goto out;
> akpm> 		/*
> akpm> 		 * direct-io write to a hole: fall through to buffered I/O
> akpm> 		 * for completing the rest of the request.
> akpm> 		 */
> akpm> 		pos += written;
> akpm> 		count -= written;
> akpm> 		written_buffered = generic_file_buffered_write(iocb, iov,
> akpm> 						nr_segs, pos, ppos, count,
> akpm> 						written);
> 
> akpm> 		/*
> akpm> 		 * We need to ensure that the page cache pages are written to
> akpm> 		 * disk and invalidated to preserve the expected O_DIRECT
> akpm> 		 * semantics.
> akpm> 		 */
> akpm> 		endbyte = pos + written_buffered - 1;
> 
> We probably want to handle the case where generic_file_buffered_write
> returns an error or nothing written.

If generic_file_buffered_write() returned an error, we want to save that
error code somewhere.

If generic_file_buffered_write() didn't actually write anything then that's
pretty weird, but the following code should handle that correctly and do
the right thing.  So perhaps that case is so rare it isn't worth the
test-n-branch to optimise for it?


> akpm> 		err = do_sync_file_range(file, pos, endbyte,
> akpm> 					 SYNC_FILE_RANGE_WAIT_BEFORE|
> akpm> 					 SYNC_FILE_RANGE_WRITE|
> akpm> 					 SYNC_FILE_RANGE_WAIT_AFTER);
> akpm> 		if (err == 0) {
> akpm> 			written += written_buffered;
> akpm> 			invalidate_mapping_pages(mapping,
> akpm> 						 pos >> PAGE_CACHE_SHIFT,
> akpm> 						 endbyte >> PAGE_CACHE_SHIFT);
> 
> generic_file_buffered_write takes written as an argument, and returns that
> amount plus whatever it managed to write.

How weird.

>  As such, you don't want to add
> written_buffered to written.  Instead, you want written = written_buffered.
> The endbyte calculation has to be altered in kind.

OK.

> Incremental, locally tested patch attached.  Comments are welcome as
> always.  Once there is consensus, I'll send this off for testing with
> Oracle again.


So I'd propose:

diff -puN mm/filemap.c~direct-io-sync-and-invalidate-file-region-when-falling-back-to-buffered-write-fix mm/filemap.c
--- a/mm/filemap.c~direct-io-sync-and-invalidate-file-region-when-falling-back-to-buffered-write-fix
+++ a/mm/filemap.c
@@ -2291,19 +2291,30 @@ __generic_file_aio_write_nolock(struct k
 		written_buffered = generic_file_buffered_write(iocb, iov,
 						nr_segs, pos, ppos, count,
 						written);
+		/*
+		 * If generic_file_buffered_write() retuned a synchronous error
+		 * then we want to return the number of bytes which were
+		 * direct-written, or the error code if that was zero.  Note
+		 * that this differs from normal direct-io semantics, which
+		 * will return -EFOO even if some bytes were written.
+		 */
+		if (written_buffered < 0) {
+			err = written_buffered;
+			goto out;
+		}
 
 		/*
 		 * We need to ensure that the page cache pages are written to
 		 * disk and invalidated to preserve the expected O_DIRECT
 		 * semantics.
 		 */
-		endbyte = pos + written_buffered - 1;
+		endbyte = pos + written_buffered - written - 1;
 		err = do_sync_file_range(file, pos, endbyte,
 					 SYNC_FILE_RANGE_WAIT_BEFORE|
 					 SYNC_FILE_RANGE_WRITE|
 					 SYNC_FILE_RANGE_WAIT_AFTER);
 		if (err == 0) {
-			written += written_buffered;
+			written = written_buffered;
 			invalidate_mapping_pages(mapping,
 						 pos >> PAGE_CACHE_SHIFT,
 						 endbyte >> PAGE_CACHE_SHIFT);
_


  reply	other threads:[~2006-10-11 18:38 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-04 17:04 [patch] call truncate_inode_pages in the DIO fallback to buffered I/O path Jeff Moyer
2006-10-04 17:25 ` Andrew Morton
2006-10-04 17:51   ` Zach Brown
2006-10-04 17:53     ` Jeff Moyer
2006-10-04 18:16       ` Andrew Morton
2006-10-04 18:40         ` Zach Brown
2006-10-04 19:16           ` Andrew Morton
2006-10-04 20:53             ` Jeff Moyer
2006-10-04 21:22               ` Jeff Moyer
2006-10-04 23:55               ` Andrew Morton
2006-10-05 19:31                 ` Jeff Moyer
2006-10-06 20:11                   ` Andrew Morton
2006-10-11 16:48                     ` jmoyer
2006-10-11 18:37                       ` Andrew Morton [this message]
2006-10-12 22:01                         ` Jeff Moyer
2006-10-12 22:37                           ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061011113720.463e331c.akpm@osdl.org \
    --to=akpm@osdl.org \
    --cc=jmoyer@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=zach.brown@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox