Re: [PATCH/RFC] Is it OK for 'read' to return nuls for a file that never had nuls in it?

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Neil Brown <neilb@suse.de>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Peter Linich <plinich@cse.unsw.edu.au>
Subject: Re: [PATCH/RFC] Is it OK for 'read' to return nuls for a file   that never had nuls in it?
Date: Tue, 29 May 2007 16:39:37 +1000	[thread overview]
Message-ID: <465BCAA9.3070707@yahoo.com.au> (raw)
In-Reply-To: <18011.51290.257450.26100@notabene.brown>

[-- Attachment #1: Type: text/plain, Size: 2346 bytes --]

Neil Brown wrote:
> [resending with correct To address - please reply to this one]
> 
> It appears that there is a race when reading from a file that is
> concurrently being truncated.  It is possible to read a number of
> bytes that matches the size of the file before the truncate, but the
> actual bytes are all nuls - values that had never been in the file.
> 
> Below is a simple C program to demonstrate this, and a patch that
> might fix it (initial testing is positive, but it might just make the
> window smaller).
> To trial the program run two instances, naming the same file as the
> only argument.  Every '!' indicates a read that found a nul.  I get
> one every few minutes.
> e.g.  cc -o race race.c ; ./race /tmp/testfile & ./race /tmp/tracefile
> 
> My exploration suggests the problem is in do_generic_mapping_read in
> mm/filemap.c. 
> This code:
>    gets the size of the file
>    triggers readahead
>    gets the appropriate page
>    If the page is up-to-date, return data.
> 
> If a truncate happens just before readahead is triggered, then
> the size will be the pre-truncate size of the file, while the page
> could have been read by the readahead and so will be up-to-date and
> full of nuls.
> 
> Note that if do_generic_mapping_read calls readpage explicitly, it
> samples the value of inode->i_size again after the read.  However if
> the readpage is called by the readahead code, i_size is not
> re-sampled.
> 
> I am not 100% confident of every aspect of this explanation (I haven't
> traced all the way through the read-ahead code) but it seems to fit
> the available data including the fact that if I disable read-ahead
> (blockdev --setra 0) then the apparent problem goes away.
> 
> The patch below moves the code for re-sampling i_size from after the
> readpage call to before the "actor" call.
> 
> Questions:
>   - Is this a problem, and should it be fixed (I think "yes").

I think you are right.

>   - Is the patch appropriate, and does it have no negative
>     consequences?.
>     (Obviously some comments should be tidied up to reflect the new
>     reality).

Would it be better (and closer to following the existing logic) if
we sampled i_size before testing each page for uptodateness? It might
also cost a little less in the fastpath case of finding an uptodate
page.

-- 
SUSE Labs, Novell Inc.

[-- Attachment #2: mm-ra-zerofill-fix.patch --]
[-- Type: text/plain, Size: 1235 bytes --]

Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c	2007-05-29 16:34:38.000000000 +1000
+++ linux-2.6/mm/filemap.c	2007-05-29 16:35:42.000000000 +1000
@@ -863,13 +863,11 @@ void do_generic_mapping_read(struct addr
 {
 	struct inode *inode = mapping->host;
 	unsigned long index;
-	unsigned long end_index;
 	unsigned long offset;
 	unsigned long last_index;
 	unsigned long next_index;
 	unsigned long prev_index;
 	unsigned int prev_offset;
-	loff_t isize;
 	struct page *cached_page;
 	int error;
 	struct file_ra_state ra = *_ra;
@@ -882,15 +880,17 @@ void do_generic_mapping_read(struct addr
 	last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
-	isize = i_size_read(inode);
-	if (!isize)
-		goto out;
-
-	end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
 	for (;;) {
+		loff_t isize;
 		struct page *page;
+		unsigned long end_index;
 		unsigned long nr, ret;
 
+		isize = i_size_read(inode);
+		if (!isize)
+			goto out;
+		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
+
 		/* nr is the maximum number of bytes to copy from this page */
 		nr = PAGE_CACHE_SIZE;
 		if (index >= end_index) {

next prev parent reply	other threads:[~2007-05-29  6:39 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-29  6:29 [PATCH/RFC] Is it OK for 'read' to return nuls for a file that never had nuls in it? Neil Brown
2007-05-29  6:29 ` Neil Brown
2007-05-29  6:39 ` Nick Piggin [this message]
2007-05-29  7:00   ` Neil Brown
2007-05-29  7:00     ` Neil Brown
2007-05-29  7:28     ` Nick Piggin
2007-05-29  7:28       ` Nick Piggin
2007-05-31  6:51       ` Neil Brown
2007-05-31  6:51         ` Neil Brown
  -- strict thread matches above, loose matches on Subject: below --
2007-05-29  6:23 Neil Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=465BCAA9.3070707@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=neilb@suse.de \
    --cc=plinich@cse.unsw.edu.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.