data loss in 2.6.9-rc1-mm1

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* data loss in 2.6.9-rc1-mm1
@ 2004-08-27 10:55 Gergely Tamas
  2004-08-27 11:05 ` Anton Altaparmakov
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Gergely Tamas @ 2004-08-27 10:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm

Hi!

I've hit the following data loss problem under 2.6.9-rc1-mm1.

If I copy data from a file to another the target will be smaller then
the source file.

2.6.9-rc1 does not have this problem
2.6.8.1-mm4 does not have this problem
2.6.9-rc1-mm1 _does have_ this problem

I tried this with reiserfs and xfs and it happened with both of them.

See the testcase at the bottom of this mail.

Thanks in advance,
Gergely

--------------------------------------------------

$ uname -r
2.6.9-rc1

$ dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 0.028646 seconds (366045254 bytes/sec)

$ du -sb testfile
10485760        testfile

$ cat testfile > testfile.1

$ du -sb testfile.1
10485760        testfile.1

--------------------------------------------------

$ uname -r
2.6.8.1-mm4

$ dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 0.028632 seconds (366226397 bytes/sec)

$ du -sb testfile
10485760        testfile

$ cat testfile > testfile.1

$ du -sb testfile.1
10485760        testfile.1

--------------------------------------------------

$ uname -r
2.6.9-rc1-mm1

$ dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 0.028418 seconds (368981986 bytes/sec)

$ du -sb testfile
10485760        testfile

$ cat testfile > testfile.1

$ du -sb testfile.1
10481664        testfile.1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 10:55 data loss in 2.6.9-rc1-mm1 Gergely Tamas
@ 2004-08-27 11:05 ` Anton Altaparmakov
  2004-08-27 11:40   ` Gergely Tamas
  2004-08-27 12:35   ` Fabio Coatti
  2004-08-27 11:17 ` Tim Schmielau
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 27+ messages in thread
From: Anton Altaparmakov @ 2004-08-27 11:05 UTC (permalink / raw)
  To: Gergely Tamas; +Cc: lkml, Andrew Morton

On Fri, 2004-08-27 at 11:55, Gergely Tamas wrote:

> I've hit the following data loss problem under 2.6.9-rc1-mm1.
> 
> If I copy data from a file to another the target will be smaller then
> the source file.
[snip]
> $ uname -r
> 2.6.9-rc1-mm1
> 
> $ dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10
> 10+0 records in
> 10+0 records out
> 10485760 bytes transferred in 0.028418 seconds (368981986 bytes/sec)
> 
> $ du -sb testfile
> 10485760        testfile
> 
> $ cat testfile > testfile.1
> 
> $ du -sb testfile.1
> 10481664        testfile.1

The difference is exactly 4096 bytes, i.e. 1 whole page.  Seems like an
off-by-one error somewhere in the file access or page cache code.

It would be interesting to know whether the read is truncated or whether
the write is truncated.  So could you tell us what is returned by:

cat testfile | wc -c

Also your .config would probably be helpful.

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 11:05 ` Anton Altaparmakov
@ 2004-08-27 11:40   ` Gergely Tamas
  2004-08-27 12:35   ` Fabio Coatti
  1 sibling, 0 replies; 27+ messages in thread
From: Gergely Tamas @ 2004-08-27 11:40 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: lkml, Andrew Morton

Hi!

 > The difference is exactly 4096 bytes, i.e. 1 whole page.  Seems like an
 > off-by-one error somewhere in the file access or page cache code.
 > 
 > It would be interesting to know whether the read is truncated or whether
 > the write is truncated.  So could you tell us what is returned by:
 > 
 > cat testfile | wc -c

$ cat testfile | wc -c
10481664

 > Also your .config would probably be helpful.

[ http://dice.mfa.kfki.hu/dot.config-2.6.9-rc1-mm1.gz ]

Thanks,
Gergely

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 11:05 ` Anton Altaparmakov
  2004-08-27 11:40   ` Gergely Tamas
@ 2004-08-27 12:35   ` Fabio Coatti
  1 sibling, 0 replies; 27+ messages in thread
From: Fabio Coatti @ 2004-08-27 12:35 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: Gergely Tamas, lkml, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1410 bytes --]

Alle 13:05, venerdì 27 agosto 2004, Anton Altaparmakov ha scritto:

>
> The difference is exactly 4096 bytes, i.e. 1 whole page.  Seems like an
> off-by-one error somewhere in the file access or page cache code.
>
> It would be interesting to know whether the read is truncated or whether
> the write is truncated.  

I can confirm very bad behaviour on my machine running 2.6.9-rc1-mm1

I've noticed curruption of rpm database, (symptom: error: db4 error(-30989) 
from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found )

So I've tried to rebuild the database, failing with the same error.

I've completely wiped out the database direcory, copied another one from a 
similar machine (just for testing). Now under 2.6.7-mm7 all database is read 
just fine, but under 2.6.9-rc1-mm1 i get the same error DB_PAGE_NOTFOUND, so 
I suspect read error.
If I try to tar the archive, I get several errors, i.e.:
tar: /var/lib/rpm/Packages: File shrank by 4096 bytes; padding with zeros
(2.6.9-rc1-mm1)

Filesystem: jfs
system: P IV 1.70 
Attached you can find my config file for 2.6.9-rc1-mm1.

I can make tests if needed.

-- 
Fabio "Cova" Coatti    http://members.ferrara.linux.it/cova     
Ferrara Linux Users Group           http://ferrara.linux.it
GnuPG fp:9765 A5B6 6843 17BC A646  BE8C FA56 373A 5374 C703
Old SysOps never die... they simply forget their password.

[-- Attachment #2: 269rc1mm1.config.gz --]
[-- Type: application/x-gzip, Size: 6899 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 10:55 data loss in 2.6.9-rc1-mm1 Gergely Tamas
  2004-08-27 11:05 ` Anton Altaparmakov
@ 2004-08-27 11:17 ` Tim Schmielau
  2004-08-27 11:43   ` Gergely Tamas
  2004-08-27 11:37 ` Mikael Pettersson
  2004-08-27 11:55 ` Denis Vlasenko
  3 siblings, 1 reply; 27+ messages in thread
From: Tim Schmielau @ 2004-08-27 11:17 UTC (permalink / raw)
  To: Gergely Tamas; +Cc: linux-kernel, akpm

> $ uname -r
> 2.6.9-rc1-mm1
> 
> $ dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10
> 10+0 records in
> 10+0 records out
> 10485760 bytes transferred in 0.028418 seconds (368981986 bytes/sec)
> 
> $ du -sb testfile
> 10485760        testfile
> 
> $ cat testfile > testfile.1
> 
> $ du -sb testfile.1
> 10481664        testfile.1


What does ls -l testfile.1 give?
What you describe actually can be correct behaviour, since the file is
all zeros.

Although yes, it seems highly improbable someone implemented an 
optimization that cuts away just one of 2560 pages.

Tim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 11:17 ` Tim Schmielau
@ 2004-08-27 11:43   ` Gergely Tamas
  0 siblings, 0 replies; 27+ messages in thread
From: Gergely Tamas @ 2004-08-27 11:43 UTC (permalink / raw)
  To: Tim Schmielau; +Cc: linux-kernel, akpm

Hi!

 > What does ls -l testfile.1 give?

$ ls -l testfile{,.1}
-rw-r--r--  1 dice users 10485760 Aug 27 13:25 testfile
-rw-r--r--  1 dice users 10481664 Aug 27 13:25 testfile.1

 > What you describe actually can be correct behaviour, since the file is
 > all zeros.

Same test with other source file...

$ dd if=linux-2.6.8.1.tar.bz2 of=testfile bs=$((1024*1024)) count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 0.061916 seconds (169354917 bytes/sec)

$ du -sb testfile
10485760        testfile

$ cat testfile > testfile.1

$ du -sb testfile.1
10481664        testfile.1

 > Although yes, it seems highly improbable someone implemented an 
 > optimization that cuts away just one of 2560 pages.

Thanks,
Gergely

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 10:55 data loss in 2.6.9-rc1-mm1 Gergely Tamas
  2004-08-27 11:05 ` Anton Altaparmakov
  2004-08-27 11:17 ` Tim Schmielau
@ 2004-08-27 11:37 ` Mikael Pettersson
  2004-08-27 11:55 ` Denis Vlasenko
  3 siblings, 0 replies; 27+ messages in thread
From: Mikael Pettersson @ 2004-08-27 11:37 UTC (permalink / raw)
  To: Gergely Tamas; +Cc: linux-kernel, akpm

Gergely Tamas writes:
 > Hi!
 > 
 > I've hit the following data loss problem under 2.6.9-rc1-mm1.
 > 
 > If I copy data from a file to another the target will be smaller then
 > the source file.
 > 
 > 2.6.9-rc1 does not have this problem
 > 2.6.8.1-mm4 does not have this problem
 > 2.6.9-rc1-mm1 _does have_ this problem
 > 
 > I tried this with reiserfs and xfs and it happened with both of them.

I also saw weird errors when I (very briefly) ran 2.6.9-rc1-mm1
on my News server.
- scp of a newly created file to another box failed with an I/O error
- md5sums of newly created files were incorrect
- NFS mount request to this box failed with permission denied

I don't have time to investigate further, but I've experienced no
problems with either 2.6.9-rc1 or 2.6.8.1-mm4. All local file
systems were ext3, btw.

/Mikael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 10:55 data loss in 2.6.9-rc1-mm1 Gergely Tamas
                   ` (2 preceding siblings ...)
  2004-08-27 11:37 ` Mikael Pettersson
@ 2004-08-27 11:55 ` Denis Vlasenko
  2004-08-27 13:56   ` Hugh Dickins
  3 siblings, 1 reply; 27+ messages in thread
From: Denis Vlasenko @ 2004-08-27 11:55 UTC (permalink / raw)
  To: Gergely Tamas, linux-kernel; +Cc: akpm

On Friday 27 August 2004 13:55, Gergely Tamas wrote:
> Hi!
>
> I've hit the following data loss problem under 2.6.9-rc1-mm1.
>
> If I copy data from a file to another the target will be smaller then
> the source file.
>
> 2.6.9-rc1 does not have this problem
> 2.6.8.1-mm4 does not have this problem
> 2.6.9-rc1-mm1 _does have_ this problem

I've seen some errors from KDE too. Let me do your test...

# dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10
10+0 records in
10+0 records out
# cat testfile > testfile.1
# ls -l test*
-rw-r--r--    1 root     root     10485760 Aug 27 14:53 testfile
-rw-r--r--    1 root     root     10481664 Aug 27 14:53 testfile.1

-- 
vda

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 11:55 ` Denis Vlasenko
@ 2004-08-27 13:56   ` Hugh Dickins
  2004-08-27 14:18     ` Gergely Tamas
  2004-08-27 18:30     ` Ram Pai
  0 siblings, 2 replies; 27+ messages in thread
From: Hugh Dickins @ 2004-08-27 13:56 UTC (permalink / raw)
  To: Gergely Tamas; +Cc: Denis Vlasenko, Andrew Morton, Ram Pai, linux-kernel

On Fri, 27 Aug 2004, Denis Vlasenko wrote:
> On Friday 27 August 2004 13:55, Gergely Tamas wrote:
> >
> > I've hit the following data loss problem under 2.6.9-rc1-mm1.
> >
> > If I copy data from a file to another the target will be smaller then
> > the source file.
> >
> > 2.6.9-rc1 does not have this problem
> > 2.6.8.1-mm4 does not have this problem
> > 2.6.9-rc1-mm1 _does have_ this problem
> 
> I've seen some errors from KDE too. Let me do your test...
> 
> # dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10
> 10+0 records in
> 10+0 records out
> # cat testfile > testfile.1
> # ls -l test*
> -rw-r--r--    1 root     root     10485760 Aug 27 14:53 testfile
> -rw-r--r--    1 root     root     10481664 Aug 27 14:53 testfile.1

Hmm, 2.6.9-rc1-mm1 looks like not a release to trust your (page
size multiple) data to!  You should find the patch below fixes it
(and, I hope, the issue the erroneous patches were trying to fix).

Signed-off-by: Hugh Dickins <hugh@veritas.com>

--- 2.6.9-rc1-mm1/mm/filemap.c	2004-08-26 12:09:50.000000000 +0100
+++ linux/mm/filemap.c	2004-08-27 14:35:32.113359872 +0100
@@ -722,10 +722,7 @@ void do_generic_mapping_read(struct addr
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
 	isize = i_size_read(inode);
-	if (!isize)
-		goto out;
-
-	end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
+	end_index = isize >> PAGE_CACHE_SHIFT;
 
 	for (;;) {
 		struct page *page;
@@ -733,6 +730,11 @@ void do_generic_mapping_read(struct addr
 
 		if (index > end_index)
 			goto out;
+		if (index == end_index) {
+			nr = isize & ~PAGE_CACHE_MASK;
+			if (nr <= offset)
+				goto out;
+		}
 
 		cond_resched();
 		page_cache_readahead(mapping, &ra, filp, index);
@@ -831,8 +833,8 @@ readpage:
 		 * another truncate extends the file - this is desired though).
 		 */
 		isize = i_size_read(inode);
-		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
-		if (unlikely(!isize || index > end_index)) {
+		end_index = isize >> PAGE_CACHE_SHIFT;
+		if (unlikely(index > end_index)) {
 			page_cache_release(page);
 			goto out;
 		}


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 13:56   ` Hugh Dickins
@ 2004-08-27 14:18     ` Gergely Tamas
  2004-08-27 15:36       ` Fabio Coatti
  2004-08-27 18:30     ` Ram Pai
  1 sibling, 1 reply; 27+ messages in thread
From: Gergely Tamas @ 2004-08-27 14:18 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Denis Vlasenko, Andrew Morton, Ram Pai, linux-kernel

 > Hmm, 2.6.9-rc1-mm1 looks like not a release to trust your (page
 > size multiple) data to!  You should find the patch below fixes it
 > (and, I hope, the issue the erroneous patches were trying to fix).
 > 
 > Signed-off-by: Hugh Dickins <hugh@veritas.com>

$ dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 0.028393 seconds (369307252 bytes/sec)

$ du -sb testfile
10485760        testfile

$ cat testfile > testfile.1

$ du -sb testfile.1
10485760        testfile.1

Seems to be working fine now, thanks.
Gergely

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 14:18     ` Gergely Tamas
@ 2004-08-27 15:36       ` Fabio Coatti
  0 siblings, 0 replies; 27+ messages in thread
From: Fabio Coatti @ 2004-08-27 15:36 UTC (permalink / raw)
  To: Gergely Tamas
  Cc: Hugh Dickins, Denis Vlasenko, Andrew Morton, Ram Pai,
	linux-kernel

Alle 16:18, venerdì 27 agosto 2004, Gergely Tamas ha scritto:
>  > Hmm, 2.6.9-rc1-mm1 looks like not a release to trust your (page
>  > size multiple) data to!  You should find the patch below fixes it
>  > (and, I hope, the issue the erroneous patches were trying to fix).
>  >
>  > Signed-off-by: Hugh Dickins <hugh@veritas.com>


JFYI: It seems that this patch fixes my problem with rpm database as well; 
thanks a lot.


-- 
Fabio "Cova" Coatti    http://members.ferrara.linux.it/cova     
Ferrara Linux Users Group           http://ferrara.linux.it
GnuPG fp:9765 A5B6 6843 17BC A646  BE8C FA56 373A 5374 C703
Old SysOps never die... they simply forget their password.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 13:56   ` Hugh Dickins
  2004-08-27 14:18     ` Gergely Tamas
@ 2004-08-27 18:30     ` Ram Pai
  2004-08-27 19:08       ` Hugh Dickins
  1 sibling, 1 reply; 27+ messages in thread
From: Ram Pai @ 2004-08-27 18:30 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel

On Fri, 2004-08-27 at 06:56, Hugh Dickins wrote:
> On Fri, 27 Aug 2004, Denis Vlasenko wrote:
> > On Friday 27 August 2004 13:55, Gergely Tamas wrote:
> > >
> > > I've hit the following data loss problem under 2.6.9-rc1-mm1.
> > >
> > > If I copy data from a file to another the target will be smaller then
> > > the source file.
> > >
> > > 2.6.9-rc1 does not have this problem
> > > 2.6.8.1-mm4 does not have this problem
> > > 2.6.9-rc1-mm1 _does have_ this problem
> > 
> > I've seen some errors from KDE too. Let me do your test...
> > 
> > # dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10
> > 10+0 records in
> > 10+0 records out
> > # cat testfile > testfile.1
> > # ls -l test*
> > -rw-r--r--    1 root     root     10485760 Aug 27 14:53 testfile
> > -rw-r--r--    1 root     root     10481664 Aug 27 14:53 testfile.1
> 
> Hmm, 2.6.9-rc1-mm1 looks like not a release to trust your (page
> size multiple) data to!  You should find the patch below fixes it
> (and, I hope, the issue the erroneous patches were trying to fix).


Hmm.. now I fail to understand how this code works.

assuming page size is 4096, if the size of the file is 4096, is the
end_index 0 or is it 1?
I had this assumption:  

	file size in bytes			end_index
	-----------------			---------
		1 to 4096			0
		4097 to 2*4096			1
		2*4096+1 to 3*4096		2
		...				..

or is the isize value reported by i_size_read(inode) one less than the
size of the real file?

What am I missing?
RP


> 
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> 
> --- 2.6.9-rc1-mm1/mm/filemap.c	2004-08-26 12:09:50.000000000 +0100
> +++ linux/mm/filemap.c	2004-08-27 14:35:32.113359872 +0100
> @@ -722,10 +722,7 @@ void do_generic_mapping_read(struct addr
>  	offset = *ppos & ~PAGE_CACHE_MASK;
>  
>  	isize = i_size_read(inode);
> -	if (!isize)
> -		goto out;
> -
> -	end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
> +	end_index = isize >> PAGE_CACHE_SHIFT;
>  
>  	for (;;) {
>  		struct page *page;
> @@ -733,6 +730,11 @@ void do_generic_mapping_read(struct addr
>  
>  		if (index > end_index)
>  			goto out;
> +		if (index == end_index) {
> +			nr = isize & ~PAGE_CACHE_MASK;
> +			if (nr <= offset)
> +				goto out;
> +		}
>  
>  		cond_resched();
>  		page_cache_readahead(mapping, &ra, filp, index);
> @@ -831,8 +833,8 @@ readpage:
>  		 * another truncate extends the file - this is desired though).
>  		 */
>  		isize = i_size_read(inode);
> -		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
> -		if (unlikely(!isize || index > end_index)) {
> +		end_index = isize >> PAGE_CACHE_SHIFT;
> +		if (unlikely(index > end_index)) {
>  			page_cache_release(page);
>  			goto out;
>  		}
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 18:30     ` Ram Pai
@ 2004-08-27 19:08       ` Hugh Dickins
  2004-08-27 21:04         ` Ram Pai
  0 siblings, 1 reply; 27+ messages in thread
From: Hugh Dickins @ 2004-08-27 19:08 UTC (permalink / raw)
  To: Ram Pai; +Cc: Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel

On 27 Aug 2004, Ram Pai wrote:
> On Fri, 2004-08-27 at 06:56, Hugh Dickins wrote:
> > 
> > Hmm, 2.6.9-rc1-mm1 looks like not a release to trust your (page
> > size multiple) data to!  You should find the patch below fixes it
> > (and, I hope, the issue the erroneous patches were trying to fix).
> 
> Hmm.. now I fail to understand how this code works.
> 
> assuming page size is 4096, if the size of the file is 4096, is the
> end_index 0 or is it 1?

Before your change and after mine, 1; with your change, 0.

> I had this assumption:  
> 
> 	file size in bytes			end_index
> 	-----------------			---------
> 		1 to 4096			0
> 		4097 to 2*4096			1
> 		2*4096+1 to 3*4096		2
> 		...				..

Well, that's what you changed it to, when you patched from the original
		end_index = isize >> PAGE_CACHE_SHIFT;
to		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;

But the "nr <= offset" check(s) relies on the original convention:
		0 to 4095			0
		4096 to 8191			1
		...				..

> or is the isize value reported by i_size_read(inode) one less than the
> size of the real file?

No!

> What am I missing?

You're expecting end_index to be the index of the last (possibly
incomplete) page of the file.  And that might be a reasonable way
of working it (though the special case of an empty file hints not).
But the nr,offset checks (I say checks because I added another like
the one further down, hopefully to fix the extra readahead issue)
require the original convention.  Just try it out with numbers.

Hugh


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 19:08       ` Hugh Dickins
@ 2004-08-27 21:04         ` Ram Pai
  2004-08-28  4:35           ` Nick Piggin
  0 siblings, 1 reply; 27+ messages in thread
From: Ram Pai @ 2004-08-27 21:04 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2011 bytes --]

On Fri, 2004-08-27 at 12:08, Hugh Dickins wrote:
> On 27 Aug 2004, Ram Pai wrote:
> > On Fri, 2004-08-27 at 06:56, Hugh Dickins wrote:
> > > 
> > > Hmm, 2.6.9-rc1-mm1 looks like not a release to trust your (page
> > > size multiple) data to!  You should find the patch below fixes it
> > > (and, I hope, the issue the erroneous patches were trying to fix).
> > 
> > Hmm.. now I fail to understand how this code works.
> > 
> > assuming page size is 4096, if the size of the file is 4096, is the
> > end_index 0 or is it 1?
> 
> Before your change and after mine, 1; with your change, 0.
> 
> > I had this assumption:  
> > 
> > 	file size in bytes			end_index
> > 	-----------------			---------
> > 		1 to 4096			0
> > 		4097 to 2*4096			1
> > 		2*4096+1 to 3*4096		2
> > 		...				..
> 
> Well, that's what you changed it to, when you patched from the original
> 		end_index = isize >> PAGE_CACHE_SHIFT;
> to		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
> 
> But the "nr <= offset" check(s) relies on the original convention:
> 		0 to 4095			0
> 		4096 to 8191			1
> 		...				..
> 
> > or is the isize value reported by i_size_read(inode) one less than the
> > size of the real file?
> 
> No!
> 
> > What am I missing?
> 
> You're expecting end_index to be the index of the last (possibly
> incomplete) page of the file.  And that might be a reasonable way
> of working it (though the special case of an empty file hints not).
> But the nr,offset checks (I say checks because I added another like
> the one further down, hopefully to fix the extra readahead issue)
> require the original convention.  Just try it out with numbers.

got it!  Everything got changed to the new convention except that
the calculation of 'nr' just before the check "nr <= offset" .

I have generated this patch which takes care of that and hence fixes the
data loss problem as well. I guess it is cleaner too. 

This patch is generated w.r.t 2.6.8.1. If everybody blesses this patch I
will forward it to Andrew.

RP


> 
> Hugh
> 

[-- Attachment #2: pagecache_out_of_indx_access_attempt3.patch --]
[-- Type: text/plain, Size: 1265 bytes --]

--- ram/linux-2.6.8.1/mm/filemap.c	2004-08-14 03:56:25.000000000 -0700
+++ linux-2.6.8.1/mm/filemap.c	2004-08-27 13:43:00.000000000 -0700
@@ -665,14 +665,18 @@ void do_generic_mapping_read(struct addr
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
 	isize = i_size_read(inode);
-	end_index = isize >> PAGE_CACHE_SHIFT;
-	if (index > end_index)
+	if (!isize)
 		goto out;
 
+	end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
+
 	for (;;) {
 		struct page *page;
 		unsigned long nr, ret;
 
+		if (index > end_index)
+			goto out;
+
 		cond_resched();
 		page_cache_readahead(mapping, &ra, filp, index);
 
@@ -688,7 +692,8 @@ page_ok:
 		/* nr is the maximum number of bytes to copy from this page */
 		nr = PAGE_CACHE_SIZE;
 		if (index == end_index) {
-			nr = isize & ~PAGE_CACHE_MASK;
+			BUG_ON(isize==0);
+			nr = ((isize - 1) & ~PAGE_CACHE_MASK) + 1;
 			if (nr <= offset) {
 				page_cache_release(page);
 				goto out;
@@ -770,8 +775,8 @@ readpage:
 		 * another truncate extends the file - this is desired though).
 		 */
 		isize = i_size_read(inode);
-		end_index = isize >> PAGE_CACHE_SHIFT;
-		if (index > end_index) {
+		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
+		if (unlikely(!isize || index > end_index)) {
 			page_cache_release(page);
 			goto out;
 		}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-27 21:04         ` Ram Pai
@ 2004-08-28  4:35           ` Nick Piggin
  2004-08-28  5:01             ` Ram Pai
  0 siblings, 1 reply; 27+ messages in thread
From: Nick Piggin @ 2004-08-28  4:35 UTC (permalink / raw)
  To: Ram Pai
  Cc: Hugh Dickins, Gergely Tamas, Denis Vlasenko, Andrew Morton,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 626 bytes --]

Ram Pai wrote:
> 
> got it!  Everything got changed to the new convention except that
> the calculation of 'nr' just before the check "nr <= offset" .
> 
> I have generated this patch which takes care of that and hence fixes the
> data loss problem as well. I guess it is cleaner too. 
> 
> This patch is generated w.r.t 2.6.8.1. If everybody blesses this patch I
> will forward it to Andrew.

It looks like it should be OK... but at what point does it become
simpler to use my patch which just moves the original calculation
up, and does it again if we have to ->readpage()?

(assuming you agree that it solves the problem)


[-- Attachment #2: mm-gmr-fix.patch --]
[-- Type: text/x-patch, Size: 1565 bytes --]




---

 linux-2.6-npiggin/mm/filemap.c |   29 ++++++++++++++++++-----------
 1 files changed, 18 insertions(+), 11 deletions(-)

diff -puN mm/filemap.c~mm-gmr-fix mm/filemap.c
--- linux-2.6/mm/filemap.c~mm-gmr-fix	2004-08-28 14:14:02.000000000 +1000
+++ linux-2.6-npiggin/mm/filemap.c	2004-08-28 14:32:59.000000000 +1000
@@ -724,6 +724,15 @@ void do_generic_mapping_read(struct addr
 		struct page *page;
 		unsigned long nr, ret;
 
+		/* nr is the maximum number of bytes to copy from this page */
+		nr = PAGE_CACHE_SIZE;
+		if (index == end_index) {
+			nr = isize & ~PAGE_CACHE_MASK;
+			if (nr <= offset)
+				goto out;
+		}
+		nr = nr - offset;
+
 		cond_resched();
 		page_cache_readahead(mapping, &ra, filp, index);
 
@@ -736,17 +745,6 @@ find_page:
 		if (!PageUptodate(page))
 			goto page_not_up_to_date;
 page_ok:
-		/* nr is the maximum number of bytes to copy from this page */
-		nr = PAGE_CACHE_SIZE;
-		if (index == end_index) {
-			nr = isize & ~PAGE_CACHE_MASK;
-			if (nr <= offset) {
-				page_cache_release(page);
-				goto out;
-			}
-		}
-		nr = nr - offset;
-
 		/* If users can be writing to this page using arbitrary
 		 * virtual addresses, take care about potential aliasing
 		 * before reading the page on the kernel side.
@@ -826,6 +824,15 @@ readpage:
 			page_cache_release(page);
 			goto out;
 		}
+		nr = PAGE_CACHE_SIZE;
+		if (index == end_index) {
+			nr = isize & ~PAGE_CACHE_MASK;
+			if (nr <= offset) {
+				page_cache_release(page);
+				goto out;
+			}
+		}
+		nr = nr - offset;
 		goto page_ok;
 
 readpage_error:

_

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-28  4:35           ` Nick Piggin
@ 2004-08-28  5:01             ` Ram Pai
  2004-08-28  5:42               ` Andrew Morton
  2004-08-28  5:54               ` Nick Piggin
  0 siblings, 2 replies; 27+ messages in thread
From: Ram Pai @ 2004-08-28  5:01 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Hugh Dickins, Gergely Tamas, Denis Vlasenko, Andrew Morton,
	linux-kernel

On Fri, 2004-08-27 at 21:35, Nick Piggin wrote:
> Ram Pai wrote:
> > 
> > got it!  Everything got changed to the new convention except that
> > the calculation of 'nr' just before the check "nr <= offset" .
> > 
> > I have generated this patch which takes care of that and hence fixes the
> > data loss problem as well. I guess it is cleaner too. 
> > 
> > This patch is generated w.r.t 2.6.8.1. If everybody blesses this patch I
> > will forward it to Andrew.
> 
> It looks like it should be OK... but at what point does it become
> simpler to use my patch which just moves the original calculation
> up, and does it again if we have to ->readpage()?
> 
> (assuming you agree that it solves the problem)

I agree your patch also solves the problem.

Either way is fine. Even Hugh's patch almost does the same thing as
yours. The only advantage with my page is it does the calculation in
only one place and does not repeat it. Also I feel its more intuitive to
assume that index 0 covers range 0 to 4095 i.e index n covers range
n*PAGE_SIZE to ((n+1)*PAGE_SIZE)-1.  Currently the code assumes index 0
covers range 1 to 4096  i.e index n covers range (n*PAGE_SIZE)+1 to
(n+1)*PAGE_SIZE. 

this is the 4th time we are trying to nail down the same thing. We
better get it right this time. So any correct patch is ok with me.

RP 

 
> 
> 
> ______________________________________________________________________
> 
> 
> ---
> 
>  linux-2.6-npiggin/mm/filemap.c |   29 ++++++++++++++++++-----------
>  1 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff -puN mm/filemap.c~mm-gmr-fix mm/filemap.c
> --- linux-2.6/mm/filemap.c~mm-gmr-fix	2004-08-28 14:14:02.000000000 +1000
> +++ linux-2.6-npiggin/mm/filemap.c	2004-08-28 14:32:59.000000000 +1000
> @@ -724,6 +724,15 @@ void do_generic_mapping_read(struct addr
>  		struct page *page;
>  		unsigned long nr, ret;
>  
> +		/* nr is the maximum number of bytes to copy from this page */
> +		nr = PAGE_CACHE_SIZE;
> +		if (index == end_index) {
> +			nr = isize & ~PAGE_CACHE_MASK;
> +			if (nr <= offset)
> +				goto out;
> +		}
> +		nr = nr - offset;
> +
>  		cond_resched();
>  		page_cache_readahead(mapping, &ra, filp, index);
>  
> @@ -736,17 +745,6 @@ find_page:
>  		if (!PageUptodate(page))
>  			goto page_not_up_to_date;
>  page_ok:
> -		/* nr is the maximum number of bytes to copy from this page */
> -		nr = PAGE_CACHE_SIZE;
> -		if (index == end_index) {
> -			nr = isize & ~PAGE_CACHE_MASK;
> -			if (nr <= offset) {
> -				page_cache_release(page);
> -				goto out;
> -			}
> -		}
> -		nr = nr - offset;
> -
>  		/* If users can be writing to this page using arbitrary
>  		 * virtual addresses, take care about potential aliasing
>  		 * before reading the page on the kernel side.
> @@ -826,6 +824,15 @@ readpage:
>  			page_cache_release(page);
>  			goto out;
>  		}
> +		nr = PAGE_CACHE_SIZE;
> +		if (index == end_index) {
> +			nr = isize & ~PAGE_CACHE_MASK;
> +			if (nr <= offset) {
> +				page_cache_release(page);
> +				goto out;
> +			}
> +		}
> +		nr = nr - offset;
>  		goto page_ok;
>  
>  readpage_error:
> 
> _


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-28  5:01             ` Ram Pai
@ 2004-08-28  5:42               ` Andrew Morton
  2004-08-28  5:54               ` Nick Piggin
  1 sibling, 0 replies; 27+ messages in thread
From: Andrew Morton @ 2004-08-28  5:42 UTC (permalink / raw)
  To: Ram Pai; +Cc: nickpiggin, hugh, dice, vda, linux-kernel

Ram Pai <linuxram@us.ibm.com> wrote:
>
> this is the 4th time we are trying to nail down the same thing. We
>  better get it right this time. So any correct patch is ok with me.

There's no rush and it's a kernel fastpath.  Let's get the optimal code
debugged and merged, please.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-28  5:01             ` Ram Pai
  2004-08-28  5:42               ` Andrew Morton
@ 2004-08-28  5:54               ` Nick Piggin
  2004-08-28  9:44                 ` Rafael J. Wysocki
  2004-08-28 14:52                 ` Hugh Dickins
  1 sibling, 2 replies; 27+ messages in thread
From: Nick Piggin @ 2004-08-28  5:54 UTC (permalink / raw)
  To: Ram Pai
  Cc: Hugh Dickins, Gergely Tamas, Denis Vlasenko, Andrew Morton,
	linux-kernel

Ram Pai wrote:
> On Fri, 2004-08-27 at 21:35, Nick Piggin wrote:
> 
>>Ram Pai wrote:
>>
>>>got it!  Everything got changed to the new convention except that
>>>the calculation of 'nr' just before the check "nr <= offset" .
>>>
>>>I have generated this patch which takes care of that and hence fixes the
>>>data loss problem as well. I guess it is cleaner too. 
>>>
>>>This patch is generated w.r.t 2.6.8.1. If everybody blesses this patch I
>>>will forward it to Andrew.
>>
>>It looks like it should be OK... but at what point does it become
>>simpler to use my patch which just moves the original calculation
>>up, and does it again if we have to ->readpage()?
>>
>>(assuming you agree that it solves the problem)
> 
> 
> I agree your patch also solves the problem.
> 
> Either way is fine. Even Hugh's patch almost does the same thing as
> yours.

Ahh, yep - Hugh just forgot to also move the "nr" calculation
into the ->readpage path, so it hits twice on the fast path.

> The only advantage with my page is it does the calculation in
> only one place and does not repeat it. Also I feel its more intuitive to

Well kind of - but you are having to jump through hoops to get there.
Yours does the following checks:

	/* fast path, read nr_pages from pagecache */
	if (!isize)
		goto out;
	for (i = 0; i < nr_pages; i++) {
		if (index > end_index)
			goto out;
		if (index == end_index) {
			nr = ((isize - 1) & ~PAGE_CACHE_MASK) + 1;
			if (nr <= offset) {
				page_cache_release(page);
				goto out;
			}
		}

		/* slowpath, ->readpage */
		if (unlikely(!isize || index > end_index)) {
			page_cache_release(page);
			goto out;
		}
	}


Mine does:
	if (index > end_index)
		goto out;
	for (i = 0; i < pages_to_read; i++) {
		if (index == end_index) {
			nr = isize & ~PAGE_CACHE_MASK;
			if (nr <= offset)
				goto out;
		}

		/* slowpath, ->readpage */
		if (index > end_index) {
			page_cache_release(page);
			goto out;
		}
		if (index == end_index) {
			nr = isize & ~PAGE_CACHE_MASK;
			if (nr <= offset) {
				page_cache_release(page);
				goto out;
			}
		}
	}

So my fastpath is surely leaner, while the slowpath isn't a clear loser.

What's more, it looks like mine handles the corner case of reading off the
end of a non-PAGE_SIZE file (but within the same page). I think yours will
drop through and do the ->readpage, while mine doesn't...?


> assume that index 0 covers range 0 to 4095 i.e index n covers range
> n*PAGE_SIZE to ((n+1)*PAGE_SIZE)-1.  Currently the code assumes index 0
> covers range 1 to 4096  i.e index n covers range (n*PAGE_SIZE)+1 to
> (n+1)*PAGE_SIZE. 
> 

It is definitely a pretty ugly function all round. I like the 0-4095 thing
better too, but my counter argument to that is that this is the minimal
change, and similar to how it has previously worked.

> this is the 4th time we are trying to nail down the same thing. We
> better get it right this time. So any correct patch is ok with me.
> 

I agree. We'll leave it to someone else to decide, then ;)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-28  5:54               ` Nick Piggin
@ 2004-08-28  9:44                 ` Rafael J. Wysocki
  2004-08-28  9:45                   ` Andrew Morton
  2004-08-28 14:52                 ` Hugh Dickins
  1 sibling, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2004-08-28  9:44 UTC (permalink / raw)
  To: Nick Piggin, Ram Pai
  Cc: Hugh Dickins, Gergely Tamas, Denis Vlasenko, Andrew Morton,
	linux-kernel

Well, guys, to make it 100% clear: if I apply the Nick's patch to the 
2.6.9-rc1-mm1 tree, it will fix the data loss issue.  Is that right?

RJW

On Saturday 28 of August 2004 07:54, Nick Piggin wrote:
> Ram Pai wrote:
> > On Fri, 2004-08-27 at 21:35, Nick Piggin wrote:
> >>Ram Pai wrote:
> >>>got it!  Everything got changed to the new convention except that
> >>>the calculation of 'nr' just before the check "nr <= offset" .
> >>>
> >>>I have generated this patch which takes care of that and hence fixes the
> >>>data loss problem as well. I guess it is cleaner too.
> >>>
> >>>This patch is generated w.r.t 2.6.8.1. If everybody blesses this patch I
> >>>will forward it to Andrew.
> >>
> >>It looks like it should be OK... but at what point does it become
> >>simpler to use my patch which just moves the original calculation
> >>up, and does it again if we have to ->readpage()?
> >>
> >>(assuming you agree that it solves the problem)
> >
> > I agree your patch also solves the problem.
> >
> > Either way is fine. Even Hugh's patch almost does the same thing as
> > yours.
>
> Ahh, yep - Hugh just forgot to also move the "nr" calculation
> into the ->readpage path, so it hits twice on the fast path.
>
> > The only advantage with my page is it does the calculation in
> > only one place and does not repeat it. Also I feel its more intuitive to
>
> Well kind of - but you are having to jump through hoops to get there.
> Yours does the following checks:
>
> 	/* fast path, read nr_pages from pagecache */
> 	if (!isize)
> 		goto out;
> 	for (i = 0; i < nr_pages; i++) {
> 		if (index > end_index)
> 			goto out;
> 		if (index == end_index) {
> 			nr = ((isize - 1) & ~PAGE_CACHE_MASK) + 1;
> 			if (nr <= offset) {
> 				page_cache_release(page);
> 				goto out;
> 			}
> 		}
>
> 		/* slowpath, ->readpage */
> 		if (unlikely(!isize || index > end_index)) {
> 			page_cache_release(page);
> 			goto out;
> 		}
> 	}
>
>
> Mine does:
> 	if (index > end_index)
> 		goto out;
> 	for (i = 0; i < pages_to_read; i++) {
> 		if (index == end_index) {
> 			nr = isize & ~PAGE_CACHE_MASK;
> 			if (nr <= offset)
> 				goto out;
> 		}
>
> 		/* slowpath, ->readpage */
> 		if (index > end_index) {
> 			page_cache_release(page);
> 			goto out;
> 		}
> 		if (index == end_index) {
> 			nr = isize & ~PAGE_CACHE_MASK;
> 			if (nr <= offset) {
> 				page_cache_release(page);
> 				goto out;
> 			}
> 		}
> 	}
>
> So my fastpath is surely leaner, while the slowpath isn't a clear loser.
>
> What's more, it looks like mine handles the corner case of reading off the
> end of a non-PAGE_SIZE file (but within the same page). I think yours will
> drop through and do the ->readpage, while mine doesn't...?
>
> > assume that index 0 covers range 0 to 4095 i.e index n covers range
> > n*PAGE_SIZE to ((n+1)*PAGE_SIZE)-1.  Currently the code assumes index 0
> > covers range 1 to 4096  i.e index n covers range (n*PAGE_SIZE)+1 to
> > (n+1)*PAGE_SIZE.
>
> It is definitely a pretty ugly function all round. I like the 0-4095 thing
> better too, but my counter argument to that is that this is the minimal
> change, and similar to how it has previously worked.
>
> > this is the 4th time we are trying to nail down the same thing. We
> > better get it right this time. So any correct patch is ok with me.
>
> I agree. We'll leave it to someone else to decide, then ;)
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
For a successful technology, reality must take precedence over public 
relations, for nature cannot be fooled.
					-- Richard P. Feynman

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-28  9:44                 ` Rafael J. Wysocki
@ 2004-08-28  9:45                   ` Andrew Morton
  2004-08-28 10:18                     ` Nick Piggin
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2004-08-28  9:45 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: nickpiggin, linuxram, hugh, dice, vda, linux-kernel

"Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> Well, guys, to make it 100% clear: if I apply the Nick's patch to the 
>  2.6.9-rc1-mm1 tree, it will fix the data loss issue.  Is that right?

Should do.  Or revert

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm1/broken-out/re-fix-pagecache-reading-off-by-one-cleanup.patch

and then

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm1/broken-out/re-fix-pagecache-reading-off-by-one.patch



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-28  9:45                   ` Andrew Morton
@ 2004-08-28 10:18                     ` Nick Piggin
  2004-08-28 10:47                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Nick Piggin @ 2004-08-28 10:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rafael J. Wysocki, linuxram, hugh, dice, vda, linux-kernel

Andrew Morton wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
>>Well, guys, to make it 100% clear: if I apply the Nick's patch to the 
>> 2.6.9-rc1-mm1 tree, it will fix the data loss issue.  Is that right?
> 
> 
> Should do.

It passes test cases that would previously fail here, so consider it
lightly tested. Note that the patch is on top of 2.6.9-rc1 though,
it becomes slightly deranged when applying straight onto mm. So don't
do that.

...

>  Or revert
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm1/broken-out/re-fix-pagecache-reading-off-by-one-cleanup.patch
> 
> and then
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm1/broken-out/re-fix-pagecache-reading-off-by-one.patch
> 
> 
> 

Once you have these backed out mine should apply fine, but it only closes
some performance (not correctness) corner cases that the above patches
attempted to.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-28 10:18                     ` Nick Piggin
@ 2004-08-28 10:47                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2004-08-28 10:47 UTC (permalink / raw)
  To: Nick Piggin, Andrew Morton; +Cc: linuxram, hugh, dice, vda, linux-kernel

On Saturday 28 of August 2004 12:18, Nick Piggin wrote:
> Andrew Morton wrote:
> > "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> >>Well, guys, to make it 100% clear: if I apply the Nick's patch to the
> >> 2.6.9-rc1-mm1 tree, it will fix the data loss issue.  Is that right?
> >
> > Should do.
>
> It passes test cases that would previously fail here, so consider it
> lightly tested. Note that the patch is on top of 2.6.9-rc1 though,
> it becomes slightly deranged when applying straight onto mm. So don't
> do that.
>
> ..
>
> >  Or revert
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2
> >.6.9-rc1-mm1/broken-out/re-fix-pagecache-reading-off-by-one-cleanup.patch
> >
> > and then
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2
> >.6.9-rc1-mm1/broken-out/re-fix-pagecache-reading-off-by-one.patch
>
> Once you have these backed out mine should apply fine, but it only closes
> some performance (not correctness) corner cases that the above patches
> attempted to.

OK.  Thanks a lot,

RJW

-- 
For a successful technology, reality must take precedence over public 
relations, for nature cannot be fooled.
					-- Richard P. Feynman

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-28  5:54               ` Nick Piggin
  2004-08-28  9:44                 ` Rafael J. Wysocki
@ 2004-08-28 14:52                 ` Hugh Dickins
  2004-08-29  1:30                   ` Nick Piggin
  1 sibling, 1 reply; 27+ messages in thread
From: Hugh Dickins @ 2004-08-28 14:52 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ram Pai, Gergely Tamas, Denis Vlasenko, Andrew Morton,
	linux-kernel

On Sat, 28 Aug 2004, Nick Piggin wrote:
> 
> Ahh, yep - Hugh just forgot to also move the "nr" calculation
> into the ->readpage path, so it hits twice on the fast path.

Yes, your patch is better than mine.

> What's more, it looks like mine handles the corner case of reading off the
> end of a non-PAGE_SIZE file (but within the same page). I think yours will
> drop through and do the ->readpage, while mine doesn't...?

It's a common case, not a corner case: a short read to end of file,
then app does another read which returns 0 for the end of file: that
wouldn't normally fall through to readpage in Ram's case, but would
do unnecessary page_cache_readahead (won't do much) and find_get_page.

> I agree. We'll leave it to someone else to decide, then ;)

I vote for Nick's patch.

I do have one reservation on do_generic_mapping_read,
common to all these versions, unrelated to the current issue.

I'm surprised to notice that you're careful to re-i_size_read
after readpage, but otherwise rely on the initial i_size_read.
We could go around this loop many many times, faulting user pages
in actor, rescheduling away: the old (e.g. 2.4 or 2.6.0) code was
deficient after readpage, but at least it reassessed i_size each
time around the loop.  I guess if the file is truncated meanwhile,
the common case would be for a find_get_page to fail, and then the
situation be corrected after readpage; perhaps it's more likely to
show up as read returning too little on a large file being steadily
appended.  Maybe you already ruled these cases out as not worth the
overhead of handling, but it does surprise me that the old code was
safer in this respect.

Hugh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-28 14:52                 ` Hugh Dickins
@ 2004-08-29  1:30                   ` Nick Piggin
  2004-08-31  6:25                     ` Ram Pai
  0 siblings, 1 reply; 27+ messages in thread
From: Nick Piggin @ 2004-08-29  1:30 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Ram Pai, Gergely Tamas, Denis Vlasenko, Andrew Morton,
	linux-kernel

Hugh Dickins wrote:
> On Sat, 28 Aug 2004, Nick Piggin wrote:
> 
>>Ahh, yep - Hugh just forgot to also move the "nr" calculation
>>into the ->readpage path, so it hits twice on the fast path.
> 
> 
> Yes, your patch is better than mine.
> 
> 
>>What's more, it looks like mine handles the corner case of reading off the
>>end of a non-PAGE_SIZE file (but within the same page). I think yours will
>>drop through and do the ->readpage, while mine doesn't...?
> 
> 
> It's a common case, not a corner case: a short read to end of file,
> then app does another read which returns 0 for the end of file: that
> wouldn't normally fall through to readpage in Ram's case, but would
> do unnecessary page_cache_readahead (won't do much) and find_get_page.
> 

Ahh, yeah I guess that would be the more common case. I was just
thinking of just randomly reading past the end of the file - in
that case it *would* fall through to ->readpage if the page wasn't
in cache.

But anyway, I think we agree my (our) version should cover that.

> 
>>I agree. We'll leave it to someone else to decide, then ;)
> 
> 
> I vote for Nick's patch.
> 

OK - maybe that can go for a spin in the next -mm. Andrew did you
get it?

> I do have one reservation on do_generic_mapping_read,
> common to all these versions, unrelated to the current issue.
> 
> I'm surprised to notice that you're careful to re-i_size_read
> after readpage, but otherwise rely on the initial i_size_read.
> We could go around this loop many many times, faulting user pages
> in actor, rescheduling away: the old (e.g. 2.4 or 2.6.0) code was
> deficient after readpage, but at least it reassessed i_size each
> time around the loop.  I guess if the file is truncated meanwhile,
> the common case would be for a find_get_page to fail, and then the
> situation be corrected after readpage; perhaps it's more likely to
> show up as read returning too little on a large file being steadily
> appended.  Maybe you already ruled these cases out as not worth the
> overhead of handling, but it does surprise me that the old code was
> safer in this respect.
> 

Yeah I guess it is a case of doing the minimum that is really
needed.

Although considering page_cache_readahead call can do an i_size_read,
it might be nicer to probably change the interface to have it passed down
instead

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-29  1:30                   ` Nick Piggin
@ 2004-08-31  6:25                     ` Ram Pai
  2004-08-31  6:39                       ` Andrew Morton
  0 siblings, 1 reply; 27+ messages in thread
From: Ram Pai @ 2004-08-31  6:25 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Hugh Dickins, Gergely Tamas, Denis Vlasenko, Andrew Morton,
	linux-kernel

On Sat, 2004-08-28 at 18:30, Nick Piggin wrote:
> Hugh Dickins wrote:
> > On Sat, 28 Aug 2004, Nick Piggin wrote:
 ..snip..
> 
> But anyway, I think we agree my (our) version should cover that.
> 
> > 
> >>I agree. We'll leave it to someone else to decide, then ;)
> > 
> > 
> > I vote for Nick's patch.
> > 
> 
> OK - maybe that can go for a spin in the next -mm. Andrew did you
> get it?

So in case my vote counts, add my vote too :)  .

> 
> > I do have one reservation on do_generic_mapping_read,
> > common to all these versions, unrelated to the current issue.
> > 
> > I'm surprised to notice that you're careful to re-i_size_read
> > after readpage, but otherwise rely on the initial i_size_read.
> > We could go around this loop many many times, faulting user pages
> > in actor, rescheduling away: the old (e.g. 2.4 or 2.6.0) code was
> > deficient after readpage, but at least it reassessed i_size each
> > time around the loop.  I guess if the file is truncated meanwhile,
> > the common case would be for a find_get_page to fail, and then the
> > situation be corrected after readpage; perhaps it's more likely to
> > show up as read returning too little on a large file being steadily
> > appended.  Maybe you already ruled these cases out as not worth the
> > overhead of handling, but it does surprise me that the old code was
> > safer in this respect.
> > 
> 
> Yeah I guess it is a case of doing the minimum that is really
> needed.
> 
> Although considering page_cache_readahead call can do an i_size_read,
> it might be nicer to probably change the interface to have it passed down
> instead

We are experimenting some patches to see if sending the i_size parameter
to page_cache_readahead() can help or not. There are couple of
advantages in doing that. (1) Quick ramp-up to max-readahead pages for
sequential workload. (2) being able to read atleast the requested number
of pages in one shot instead of reading one page at a time in case the
readahead window gets closed.

But the biggest performance boost has been seen with large max-readahead
window sizes. Currently most of the underlying block devices default to
32 pages max-readahead  even though the underlying device can handle
much larger reads. We could extract much more sequential read
performance if the max-readahead was set to much higher values like 256
pages which most modern devices are capable off. The problem AFAICT is
that the block device layer defaults the max-readahead value for most
block devices to 32, without consulting the capability of the underlying
block device driver. 

RP

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
  2004-08-31  6:25                     ` Ram Pai
@ 2004-08-31  6:39                       ` Andrew Morton
  0 siblings, 0 replies; 27+ messages in thread
From: Andrew Morton @ 2004-08-31  6:39 UTC (permalink / raw)
  To: Ram Pai; +Cc: nickpiggin, hugh, dice, vda, linux-kernel

Ram Pai <linuxram@us.ibm.com> wrote:
>
> > OK - maybe that can go for a spin in the next -mm. Andrew did you
> > get it?
> 
> So in case my vote counts, add my vote too :)  .
> 

Can someone send me the patch?

> 
> But the biggest performance boost has been seen with large max-readahead
> window sizes. Currently most of the underlying block devices default to
> 32 pages max-readahead  even though the underlying device can handle
> much larger reads. We could extract much more sequential read
> performance if the max-readahead was set to much higher values like 256
> pages which most modern devices are capable off. The problem AFAICT is
> that the block device layer defaults the max-readahead value for most
> block devices to 32, without consulting the capability of the underlying
> block device driver. 

This can be done in startup scripts.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: data loss in 2.6.9-rc1-mm1
@ 2004-08-28 12:05 Joachim Bremer
  0 siblings, 0 replies; 27+ messages in thread
From: Joachim Bremer @ 2004-08-28 12:05 UTC (permalink / raw)
  To: linux-kernel

It may be architecture depend so here are my results:
machine: Tyan Thundewr K8W dual opteron with 2 G of memory, raid 4 on SATA drives
Distri: SuSE 9,1 64-bit, GCC 3.4.1
Stock 2.6.9-rc1-mm1: lots of errors. ldd on 32bit executables fail, reiserfsck fails on last block etc
2.6.9-rc1-mm1 with backed out pagecache patches: completely OK
2.6.9-rc1-mm1 with patch from Hugh Dickins: completely OK
2.6.9-rc1-mm1 Nick Piggin: only partially OK, will fail on reiserfsck with "bread End of file" on last block
of the md -device, random erros on compiling a kernel, mostly something like
"fs/ioctl.o: file not recognized: File truncated"

Joachim

________________________________________________________________
Verschicken Sie romantische, coole und witzige Bilder per SMS!
Jetzt neu bei WEB.DE FreeMail: http://freemail.web.de/?mc=021193

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2004-08-31  6:44 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-08-27 10:55 data loss in 2.6.9-rc1-mm1 Gergely Tamas
2004-08-27 11:05 ` Anton Altaparmakov
2004-08-27 11:40   ` Gergely Tamas
2004-08-27 12:35   ` Fabio Coatti
2004-08-27 11:17 ` Tim Schmielau
2004-08-27 11:43   ` Gergely Tamas
2004-08-27 11:37 ` Mikael Pettersson
2004-08-27 11:55 ` Denis Vlasenko
2004-08-27 13:56   ` Hugh Dickins
2004-08-27 14:18     ` Gergely Tamas
2004-08-27 15:36       ` Fabio Coatti
2004-08-27 18:30     ` Ram Pai
2004-08-27 19:08       ` Hugh Dickins
2004-08-27 21:04         ` Ram Pai
2004-08-28  4:35           ` Nick Piggin
2004-08-28  5:01             ` Ram Pai
2004-08-28  5:42               ` Andrew Morton
2004-08-28  5:54               ` Nick Piggin
2004-08-28  9:44                 ` Rafael J. Wysocki
2004-08-28  9:45                   ` Andrew Morton
2004-08-28 10:18                     ` Nick Piggin
2004-08-28 10:47                       ` Rafael J. Wysocki
2004-08-28 14:52                 ` Hugh Dickins
2004-08-29  1:30                   ` Nick Piggin
2004-08-31  6:25                     ` Ram Pai
2004-08-31  6:39                       ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2004-08-28 12:05 Joachim Bremer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox