* data loss in 2.6.9-rc1-mm1
@ 2004-08-27 10:55 Gergely Tamas
2004-08-27 11:05 ` Anton Altaparmakov
` (3 more replies)
0 siblings, 4 replies; 27+ messages in thread
From: Gergely Tamas @ 2004-08-27 10:55 UTC (permalink / raw)
To: linux-kernel; +Cc: akpm
Hi!
I've hit the following data loss problem under 2.6.9-rc1-mm1.
If I copy data from a file to another the target will be smaller then
the source file.
2.6.9-rc1 does not have this problem
2.6.8.1-mm4 does not have this problem
2.6.9-rc1-mm1 _does have_ this problem
I tried this with reiserfs and xfs and it happened with both of them.
See the testcase at the bottom of this mail.
Thanks in advance,
Gergely
--------------------------------------------------
$ uname -r
2.6.9-rc1
$ dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 0.028646 seconds (366045254 bytes/sec)
$ du -sb testfile
10485760 testfile
$ cat testfile > testfile.1
$ du -sb testfile.1
10485760 testfile.1
--------------------------------------------------
$ uname -r
2.6.8.1-mm4
$ dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 0.028632 seconds (366226397 bytes/sec)
$ du -sb testfile
10485760 testfile
$ cat testfile > testfile.1
$ du -sb testfile.1
10485760 testfile.1
--------------------------------------------------
$ uname -r
2.6.9-rc1-mm1
$ dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 0.028418 seconds (368981986 bytes/sec)
$ du -sb testfile
10485760 testfile
$ cat testfile > testfile.1
$ du -sb testfile.1
10481664 testfile.1
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 10:55 data loss in 2.6.9-rc1-mm1 Gergely Tamas @ 2004-08-27 11:05 ` Anton Altaparmakov 2004-08-27 11:40 ` Gergely Tamas 2004-08-27 12:35 ` Fabio Coatti 2004-08-27 11:17 ` Tim Schmielau ` (2 subsequent siblings) 3 siblings, 2 replies; 27+ messages in thread From: Anton Altaparmakov @ 2004-08-27 11:05 UTC (permalink / raw) To: Gergely Tamas; +Cc: lkml, Andrew Morton On Fri, 2004-08-27 at 11:55, Gergely Tamas wrote: > I've hit the following data loss problem under 2.6.9-rc1-mm1. > > If I copy data from a file to another the target will be smaller then > the source file. [snip] > $ uname -r > 2.6.9-rc1-mm1 > > $ dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10 > 10+0 records in > 10+0 records out > 10485760 bytes transferred in 0.028418 seconds (368981986 bytes/sec) > > $ du -sb testfile > 10485760 testfile > > $ cat testfile > testfile.1 > > $ du -sb testfile.1 > 10481664 testfile.1 The difference is exactly 4096 bytes, i.e. 1 whole page. Seems like an off-by-one error somewhere in the file access or page cache code. It would be interesting to know whether the read is truncated or whether the write is truncated. So could you tell us what is returned by: cat testfile | wc -c Also your .config would probably be helpful. Best regards, Anton -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 11:05 ` Anton Altaparmakov @ 2004-08-27 11:40 ` Gergely Tamas 2004-08-27 12:35 ` Fabio Coatti 1 sibling, 0 replies; 27+ messages in thread From: Gergely Tamas @ 2004-08-27 11:40 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: lkml, Andrew Morton Hi! > The difference is exactly 4096 bytes, i.e. 1 whole page. Seems like an > off-by-one error somewhere in the file access or page cache code. > > It would be interesting to know whether the read is truncated or whether > the write is truncated. So could you tell us what is returned by: > > cat testfile | wc -c $ cat testfile | wc -c 10481664 > Also your .config would probably be helpful. [ http://dice.mfa.kfki.hu/dot.config-2.6.9-rc1-mm1.gz ] Thanks, Gergely ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 11:05 ` Anton Altaparmakov 2004-08-27 11:40 ` Gergely Tamas @ 2004-08-27 12:35 ` Fabio Coatti 1 sibling, 0 replies; 27+ messages in thread From: Fabio Coatti @ 2004-08-27 12:35 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Gergely Tamas, lkml, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 1410 bytes --] Alle 13:05, venerdì 27 agosto 2004, Anton Altaparmakov ha scritto: > > The difference is exactly 4096 bytes, i.e. 1 whole page. Seems like an > off-by-one error somewhere in the file access or page cache code. > > It would be interesting to know whether the read is truncated or whether > the write is truncated. I can confirm very bad behaviour on my machine running 2.6.9-rc1-mm1 I've noticed curruption of rpm database, (symptom: error: db4 error(-30989) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found ) So I've tried to rebuild the database, failing with the same error. I've completely wiped out the database direcory, copied another one from a similar machine (just for testing). Now under 2.6.7-mm7 all database is read just fine, but under 2.6.9-rc1-mm1 i get the same error DB_PAGE_NOTFOUND, so I suspect read error. If I try to tar the archive, I get several errors, i.e.: tar: /var/lib/rpm/Packages: File shrank by 4096 bytes; padding with zeros (2.6.9-rc1-mm1) Filesystem: jfs system: P IV 1.70 Attached you can find my config file for 2.6.9-rc1-mm1. I can make tests if needed. -- Fabio "Cova" Coatti http://members.ferrara.linux.it/cova Ferrara Linux Users Group http://ferrara.linux.it GnuPG fp:9765 A5B6 6843 17BC A646 BE8C FA56 373A 5374 C703 Old SysOps never die... they simply forget their password. [-- Attachment #2: 269rc1mm1.config.gz --] [-- Type: application/x-gzip, Size: 6899 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 10:55 data loss in 2.6.9-rc1-mm1 Gergely Tamas 2004-08-27 11:05 ` Anton Altaparmakov @ 2004-08-27 11:17 ` Tim Schmielau 2004-08-27 11:43 ` Gergely Tamas 2004-08-27 11:37 ` Mikael Pettersson 2004-08-27 11:55 ` Denis Vlasenko 3 siblings, 1 reply; 27+ messages in thread From: Tim Schmielau @ 2004-08-27 11:17 UTC (permalink / raw) To: Gergely Tamas; +Cc: linux-kernel, akpm > $ uname -r > 2.6.9-rc1-mm1 > > $ dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10 > 10+0 records in > 10+0 records out > 10485760 bytes transferred in 0.028418 seconds (368981986 bytes/sec) > > $ du -sb testfile > 10485760 testfile > > $ cat testfile > testfile.1 > > $ du -sb testfile.1 > 10481664 testfile.1 What does ls -l testfile.1 give? What you describe actually can be correct behaviour, since the file is all zeros. Although yes, it seems highly improbable someone implemented an optimization that cuts away just one of 2560 pages. Tim ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 11:17 ` Tim Schmielau @ 2004-08-27 11:43 ` Gergely Tamas 0 siblings, 0 replies; 27+ messages in thread From: Gergely Tamas @ 2004-08-27 11:43 UTC (permalink / raw) To: Tim Schmielau; +Cc: linux-kernel, akpm Hi! > What does ls -l testfile.1 give? $ ls -l testfile{,.1} -rw-r--r-- 1 dice users 10485760 Aug 27 13:25 testfile -rw-r--r-- 1 dice users 10481664 Aug 27 13:25 testfile.1 > What you describe actually can be correct behaviour, since the file is > all zeros. Same test with other source file... $ dd if=linux-2.6.8.1.tar.bz2 of=testfile bs=$((1024*1024)) count=10 10+0 records in 10+0 records out 10485760 bytes transferred in 0.061916 seconds (169354917 bytes/sec) $ du -sb testfile 10485760 testfile $ cat testfile > testfile.1 $ du -sb testfile.1 10481664 testfile.1 > Although yes, it seems highly improbable someone implemented an > optimization that cuts away just one of 2560 pages. Thanks, Gergely ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 10:55 data loss in 2.6.9-rc1-mm1 Gergely Tamas 2004-08-27 11:05 ` Anton Altaparmakov 2004-08-27 11:17 ` Tim Schmielau @ 2004-08-27 11:37 ` Mikael Pettersson 2004-08-27 11:55 ` Denis Vlasenko 3 siblings, 0 replies; 27+ messages in thread From: Mikael Pettersson @ 2004-08-27 11:37 UTC (permalink / raw) To: Gergely Tamas; +Cc: linux-kernel, akpm Gergely Tamas writes: > Hi! > > I've hit the following data loss problem under 2.6.9-rc1-mm1. > > If I copy data from a file to another the target will be smaller then > the source file. > > 2.6.9-rc1 does not have this problem > 2.6.8.1-mm4 does not have this problem > 2.6.9-rc1-mm1 _does have_ this problem > > I tried this with reiserfs and xfs and it happened with both of them. I also saw weird errors when I (very briefly) ran 2.6.9-rc1-mm1 on my News server. - scp of a newly created file to another box failed with an I/O error - md5sums of newly created files were incorrect - NFS mount request to this box failed with permission denied I don't have time to investigate further, but I've experienced no problems with either 2.6.9-rc1 or 2.6.8.1-mm4. All local file systems were ext3, btw. /Mikael ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 10:55 data loss in 2.6.9-rc1-mm1 Gergely Tamas ` (2 preceding siblings ...) 2004-08-27 11:37 ` Mikael Pettersson @ 2004-08-27 11:55 ` Denis Vlasenko 2004-08-27 13:56 ` Hugh Dickins 3 siblings, 1 reply; 27+ messages in thread From: Denis Vlasenko @ 2004-08-27 11:55 UTC (permalink / raw) To: Gergely Tamas, linux-kernel; +Cc: akpm On Friday 27 August 2004 13:55, Gergely Tamas wrote: > Hi! > > I've hit the following data loss problem under 2.6.9-rc1-mm1. > > If I copy data from a file to another the target will be smaller then > the source file. > > 2.6.9-rc1 does not have this problem > 2.6.8.1-mm4 does not have this problem > 2.6.9-rc1-mm1 _does have_ this problem I've seen some errors from KDE too. Let me do your test... # dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10 10+0 records in 10+0 records out # cat testfile > testfile.1 # ls -l test* -rw-r--r-- 1 root root 10485760 Aug 27 14:53 testfile -rw-r--r-- 1 root root 10481664 Aug 27 14:53 testfile.1 -- vda ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 11:55 ` Denis Vlasenko @ 2004-08-27 13:56 ` Hugh Dickins 2004-08-27 14:18 ` Gergely Tamas 2004-08-27 18:30 ` Ram Pai 0 siblings, 2 replies; 27+ messages in thread From: Hugh Dickins @ 2004-08-27 13:56 UTC (permalink / raw) To: Gergely Tamas; +Cc: Denis Vlasenko, Andrew Morton, Ram Pai, linux-kernel On Fri, 27 Aug 2004, Denis Vlasenko wrote: > On Friday 27 August 2004 13:55, Gergely Tamas wrote: > > > > I've hit the following data loss problem under 2.6.9-rc1-mm1. > > > > If I copy data from a file to another the target will be smaller then > > the source file. > > > > 2.6.9-rc1 does not have this problem > > 2.6.8.1-mm4 does not have this problem > > 2.6.9-rc1-mm1 _does have_ this problem > > I've seen some errors from KDE too. Let me do your test... > > # dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10 > 10+0 records in > 10+0 records out > # cat testfile > testfile.1 > # ls -l test* > -rw-r--r-- 1 root root 10485760 Aug 27 14:53 testfile > -rw-r--r-- 1 root root 10481664 Aug 27 14:53 testfile.1 Hmm, 2.6.9-rc1-mm1 looks like not a release to trust your (page size multiple) data to! You should find the patch below fixes it (and, I hope, the issue the erroneous patches were trying to fix). Signed-off-by: Hugh Dickins <hugh@veritas.com> --- 2.6.9-rc1-mm1/mm/filemap.c 2004-08-26 12:09:50.000000000 +0100 +++ linux/mm/filemap.c 2004-08-27 14:35:32.113359872 +0100 @@ -722,10 +722,7 @@ void do_generic_mapping_read(struct addr offset = *ppos & ~PAGE_CACHE_MASK; isize = i_size_read(inode); - if (!isize) - goto out; - - end_index = (isize - 1) >> PAGE_CACHE_SHIFT; + end_index = isize >> PAGE_CACHE_SHIFT; for (;;) { struct page *page; @@ -733,6 +730,11 @@ void do_generic_mapping_read(struct addr if (index > end_index) goto out; + if (index == end_index) { + nr = isize & ~PAGE_CACHE_MASK; + if (nr <= offset) + goto out; + } cond_resched(); page_cache_readahead(mapping, &ra, filp, index); @@ -831,8 +833,8 @@ readpage: * another truncate extends the file - this is desired though). */ isize = i_size_read(inode); - end_index = (isize - 1) >> PAGE_CACHE_SHIFT; - if (unlikely(!isize || index > end_index)) { + end_index = isize >> PAGE_CACHE_SHIFT; + if (unlikely(index > end_index)) { page_cache_release(page); goto out; } ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 13:56 ` Hugh Dickins @ 2004-08-27 14:18 ` Gergely Tamas 2004-08-27 15:36 ` Fabio Coatti 2004-08-27 18:30 ` Ram Pai 1 sibling, 1 reply; 27+ messages in thread From: Gergely Tamas @ 2004-08-27 14:18 UTC (permalink / raw) To: Hugh Dickins; +Cc: Denis Vlasenko, Andrew Morton, Ram Pai, linux-kernel > Hmm, 2.6.9-rc1-mm1 looks like not a release to trust your (page > size multiple) data to! You should find the patch below fixes it > (and, I hope, the issue the erroneous patches were trying to fix). > > Signed-off-by: Hugh Dickins <hugh@veritas.com> $ dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10 10+0 records in 10+0 records out 10485760 bytes transferred in 0.028393 seconds (369307252 bytes/sec) $ du -sb testfile 10485760 testfile $ cat testfile > testfile.1 $ du -sb testfile.1 10485760 testfile.1 Seems to be working fine now, thanks. Gergely ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 14:18 ` Gergely Tamas @ 2004-08-27 15:36 ` Fabio Coatti 0 siblings, 0 replies; 27+ messages in thread From: Fabio Coatti @ 2004-08-27 15:36 UTC (permalink / raw) To: Gergely Tamas Cc: Hugh Dickins, Denis Vlasenko, Andrew Morton, Ram Pai, linux-kernel Alle 16:18, venerdì 27 agosto 2004, Gergely Tamas ha scritto: > > Hmm, 2.6.9-rc1-mm1 looks like not a release to trust your (page > > size multiple) data to! You should find the patch below fixes it > > (and, I hope, the issue the erroneous patches were trying to fix). > > > > Signed-off-by: Hugh Dickins <hugh@veritas.com> JFYI: It seems that this patch fixes my problem with rpm database as well; thanks a lot. -- Fabio "Cova" Coatti http://members.ferrara.linux.it/cova Ferrara Linux Users Group http://ferrara.linux.it GnuPG fp:9765 A5B6 6843 17BC A646 BE8C FA56 373A 5374 C703 Old SysOps never die... they simply forget their password. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 13:56 ` Hugh Dickins 2004-08-27 14:18 ` Gergely Tamas @ 2004-08-27 18:30 ` Ram Pai 2004-08-27 19:08 ` Hugh Dickins 1 sibling, 1 reply; 27+ messages in thread From: Ram Pai @ 2004-08-27 18:30 UTC (permalink / raw) To: Hugh Dickins; +Cc: Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel On Fri, 2004-08-27 at 06:56, Hugh Dickins wrote: > On Fri, 27 Aug 2004, Denis Vlasenko wrote: > > On Friday 27 August 2004 13:55, Gergely Tamas wrote: > > > > > > I've hit the following data loss problem under 2.6.9-rc1-mm1. > > > > > > If I copy data from a file to another the target will be smaller then > > > the source file. > > > > > > 2.6.9-rc1 does not have this problem > > > 2.6.8.1-mm4 does not have this problem > > > 2.6.9-rc1-mm1 _does have_ this problem > > > > I've seen some errors from KDE too. Let me do your test... > > > > # dd if=/dev/zero of=testfile bs=$((1024*1024)) count=10 > > 10+0 records in > > 10+0 records out > > # cat testfile > testfile.1 > > # ls -l test* > > -rw-r--r-- 1 root root 10485760 Aug 27 14:53 testfile > > -rw-r--r-- 1 root root 10481664 Aug 27 14:53 testfile.1 > > Hmm, 2.6.9-rc1-mm1 looks like not a release to trust your (page > size multiple) data to! You should find the patch below fixes it > (and, I hope, the issue the erroneous patches were trying to fix). Hmm.. now I fail to understand how this code works. assuming page size is 4096, if the size of the file is 4096, is the end_index 0 or is it 1? I had this assumption: file size in bytes end_index ----------------- --------- 1 to 4096 0 4097 to 2*4096 1 2*4096+1 to 3*4096 2 ... .. or is the isize value reported by i_size_read(inode) one less than the size of the real file? What am I missing? RP > > Signed-off-by: Hugh Dickins <hugh@veritas.com> > > --- 2.6.9-rc1-mm1/mm/filemap.c 2004-08-26 12:09:50.000000000 +0100 > +++ linux/mm/filemap.c 2004-08-27 14:35:32.113359872 +0100 > @@ -722,10 +722,7 @@ void do_generic_mapping_read(struct addr > offset = *ppos & ~PAGE_CACHE_MASK; > > isize = i_size_read(inode); > - if (!isize) > - goto out; > - > - end_index = (isize - 1) >> PAGE_CACHE_SHIFT; > + end_index = isize >> PAGE_CACHE_SHIFT; > > for (;;) { > struct page *page; > @@ -733,6 +730,11 @@ void do_generic_mapping_read(struct addr > > if (index > end_index) > goto out; > + if (index == end_index) { > + nr = isize & ~PAGE_CACHE_MASK; > + if (nr <= offset) > + goto out; > + } > > cond_resched(); > page_cache_readahead(mapping, &ra, filp, index); > @@ -831,8 +833,8 @@ readpage: > * another truncate extends the file - this is desired though). > */ > isize = i_size_read(inode); > - end_index = (isize - 1) >> PAGE_CACHE_SHIFT; > - if (unlikely(!isize || index > end_index)) { > + end_index = isize >> PAGE_CACHE_SHIFT; > + if (unlikely(index > end_index)) { > page_cache_release(page); > goto out; > } > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 18:30 ` Ram Pai @ 2004-08-27 19:08 ` Hugh Dickins 2004-08-27 21:04 ` Ram Pai 0 siblings, 1 reply; 27+ messages in thread From: Hugh Dickins @ 2004-08-27 19:08 UTC (permalink / raw) To: Ram Pai; +Cc: Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel On 27 Aug 2004, Ram Pai wrote: > On Fri, 2004-08-27 at 06:56, Hugh Dickins wrote: > > > > Hmm, 2.6.9-rc1-mm1 looks like not a release to trust your (page > > size multiple) data to! You should find the patch below fixes it > > (and, I hope, the issue the erroneous patches were trying to fix). > > Hmm.. now I fail to understand how this code works. > > assuming page size is 4096, if the size of the file is 4096, is the > end_index 0 or is it 1? Before your change and after mine, 1; with your change, 0. > I had this assumption: > > file size in bytes end_index > ----------------- --------- > 1 to 4096 0 > 4097 to 2*4096 1 > 2*4096+1 to 3*4096 2 > ... .. Well, that's what you changed it to, when you patched from the original end_index = isize >> PAGE_CACHE_SHIFT; to end_index = (isize - 1) >> PAGE_CACHE_SHIFT; But the "nr <= offset" check(s) relies on the original convention: 0 to 4095 0 4096 to 8191 1 ... .. > or is the isize value reported by i_size_read(inode) one less than the > size of the real file? No! > What am I missing? You're expecting end_index to be the index of the last (possibly incomplete) page of the file. And that might be a reasonable way of working it (though the special case of an empty file hints not). But the nr,offset checks (I say checks because I added another like the one further down, hopefully to fix the extra readahead issue) require the original convention. Just try it out with numbers. Hugh ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 19:08 ` Hugh Dickins @ 2004-08-27 21:04 ` Ram Pai 2004-08-28 4:35 ` Nick Piggin 0 siblings, 1 reply; 27+ messages in thread From: Ram Pai @ 2004-08-27 21:04 UTC (permalink / raw) To: Hugh Dickins; +Cc: Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2011 bytes --] On Fri, 2004-08-27 at 12:08, Hugh Dickins wrote: > On 27 Aug 2004, Ram Pai wrote: > > On Fri, 2004-08-27 at 06:56, Hugh Dickins wrote: > > > > > > Hmm, 2.6.9-rc1-mm1 looks like not a release to trust your (page > > > size multiple) data to! You should find the patch below fixes it > > > (and, I hope, the issue the erroneous patches were trying to fix). > > > > Hmm.. now I fail to understand how this code works. > > > > assuming page size is 4096, if the size of the file is 4096, is the > > end_index 0 or is it 1? > > Before your change and after mine, 1; with your change, 0. > > > I had this assumption: > > > > file size in bytes end_index > > ----------------- --------- > > 1 to 4096 0 > > 4097 to 2*4096 1 > > 2*4096+1 to 3*4096 2 > > ... .. > > Well, that's what you changed it to, when you patched from the original > end_index = isize >> PAGE_CACHE_SHIFT; > to end_index = (isize - 1) >> PAGE_CACHE_SHIFT; > > But the "nr <= offset" check(s) relies on the original convention: > 0 to 4095 0 > 4096 to 8191 1 > ... .. > > > or is the isize value reported by i_size_read(inode) one less than the > > size of the real file? > > No! > > > What am I missing? > > You're expecting end_index to be the index of the last (possibly > incomplete) page of the file. And that might be a reasonable way > of working it (though the special case of an empty file hints not). > But the nr,offset checks (I say checks because I added another like > the one further down, hopefully to fix the extra readahead issue) > require the original convention. Just try it out with numbers. got it! Everything got changed to the new convention except that the calculation of 'nr' just before the check "nr <= offset" . I have generated this patch which takes care of that and hence fixes the data loss problem as well. I guess it is cleaner too. This patch is generated w.r.t 2.6.8.1. If everybody blesses this patch I will forward it to Andrew. RP > > Hugh > [-- Attachment #2: pagecache_out_of_indx_access_attempt3.patch --] [-- Type: text/plain, Size: 1265 bytes --] --- ram/linux-2.6.8.1/mm/filemap.c 2004-08-14 03:56:25.000000000 -0700 +++ linux-2.6.8.1/mm/filemap.c 2004-08-27 13:43:00.000000000 -0700 @@ -665,14 +665,18 @@ void do_generic_mapping_read(struct addr offset = *ppos & ~PAGE_CACHE_MASK; isize = i_size_read(inode); - end_index = isize >> PAGE_CACHE_SHIFT; - if (index > end_index) + if (!isize) goto out; + end_index = (isize - 1) >> PAGE_CACHE_SHIFT; + for (;;) { struct page *page; unsigned long nr, ret; + if (index > end_index) + goto out; + cond_resched(); page_cache_readahead(mapping, &ra, filp, index); @@ -688,7 +692,8 @@ page_ok: /* nr is the maximum number of bytes to copy from this page */ nr = PAGE_CACHE_SIZE; if (index == end_index) { - nr = isize & ~PAGE_CACHE_MASK; + BUG_ON(isize==0); + nr = ((isize - 1) & ~PAGE_CACHE_MASK) + 1; if (nr <= offset) { page_cache_release(page); goto out; @@ -770,8 +775,8 @@ readpage: * another truncate extends the file - this is desired though). */ isize = i_size_read(inode); - end_index = isize >> PAGE_CACHE_SHIFT; - if (index > end_index) { + end_index = (isize - 1) >> PAGE_CACHE_SHIFT; + if (unlikely(!isize || index > end_index)) { page_cache_release(page); goto out; } ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-27 21:04 ` Ram Pai @ 2004-08-28 4:35 ` Nick Piggin 2004-08-28 5:01 ` Ram Pai 0 siblings, 1 reply; 27+ messages in thread From: Nick Piggin @ 2004-08-28 4:35 UTC (permalink / raw) To: Ram Pai Cc: Hugh Dickins, Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel [-- Attachment #1: Type: text/plain, Size: 626 bytes --] Ram Pai wrote: > > got it! Everything got changed to the new convention except that > the calculation of 'nr' just before the check "nr <= offset" . > > I have generated this patch which takes care of that and hence fixes the > data loss problem as well. I guess it is cleaner too. > > This patch is generated w.r.t 2.6.8.1. If everybody blesses this patch I > will forward it to Andrew. It looks like it should be OK... but at what point does it become simpler to use my patch which just moves the original calculation up, and does it again if we have to ->readpage()? (assuming you agree that it solves the problem) [-- Attachment #2: mm-gmr-fix.patch --] [-- Type: text/x-patch, Size: 1565 bytes --] --- linux-2.6-npiggin/mm/filemap.c | 29 ++++++++++++++++++----------- 1 files changed, 18 insertions(+), 11 deletions(-) diff -puN mm/filemap.c~mm-gmr-fix mm/filemap.c --- linux-2.6/mm/filemap.c~mm-gmr-fix 2004-08-28 14:14:02.000000000 +1000 +++ linux-2.6-npiggin/mm/filemap.c 2004-08-28 14:32:59.000000000 +1000 @@ -724,6 +724,15 @@ void do_generic_mapping_read(struct addr struct page *page; unsigned long nr, ret; + /* nr is the maximum number of bytes to copy from this page */ + nr = PAGE_CACHE_SIZE; + if (index == end_index) { + nr = isize & ~PAGE_CACHE_MASK; + if (nr <= offset) + goto out; + } + nr = nr - offset; + cond_resched(); page_cache_readahead(mapping, &ra, filp, index); @@ -736,17 +745,6 @@ find_page: if (!PageUptodate(page)) goto page_not_up_to_date; page_ok: - /* nr is the maximum number of bytes to copy from this page */ - nr = PAGE_CACHE_SIZE; - if (index == end_index) { - nr = isize & ~PAGE_CACHE_MASK; - if (nr <= offset) { - page_cache_release(page); - goto out; - } - } - nr = nr - offset; - /* If users can be writing to this page using arbitrary * virtual addresses, take care about potential aliasing * before reading the page on the kernel side. @@ -826,6 +824,15 @@ readpage: page_cache_release(page); goto out; } + nr = PAGE_CACHE_SIZE; + if (index == end_index) { + nr = isize & ~PAGE_CACHE_MASK; + if (nr <= offset) { + page_cache_release(page); + goto out; + } + } + nr = nr - offset; goto page_ok; readpage_error: _ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-28 4:35 ` Nick Piggin @ 2004-08-28 5:01 ` Ram Pai 2004-08-28 5:42 ` Andrew Morton 2004-08-28 5:54 ` Nick Piggin 0 siblings, 2 replies; 27+ messages in thread From: Ram Pai @ 2004-08-28 5:01 UTC (permalink / raw) To: Nick Piggin Cc: Hugh Dickins, Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel On Fri, 2004-08-27 at 21:35, Nick Piggin wrote: > Ram Pai wrote: > > > > got it! Everything got changed to the new convention except that > > the calculation of 'nr' just before the check "nr <= offset" . > > > > I have generated this patch which takes care of that and hence fixes the > > data loss problem as well. I guess it is cleaner too. > > > > This patch is generated w.r.t 2.6.8.1. If everybody blesses this patch I > > will forward it to Andrew. > > It looks like it should be OK... but at what point does it become > simpler to use my patch which just moves the original calculation > up, and does it again if we have to ->readpage()? > > (assuming you agree that it solves the problem) I agree your patch also solves the problem. Either way is fine. Even Hugh's patch almost does the same thing as yours. The only advantage with my page is it does the calculation in only one place and does not repeat it. Also I feel its more intuitive to assume that index 0 covers range 0 to 4095 i.e index n covers range n*PAGE_SIZE to ((n+1)*PAGE_SIZE)-1. Currently the code assumes index 0 covers range 1 to 4096 i.e index n covers range (n*PAGE_SIZE)+1 to (n+1)*PAGE_SIZE. this is the 4th time we are trying to nail down the same thing. We better get it right this time. So any correct patch is ok with me. RP > > > ______________________________________________________________________ > > > --- > > linux-2.6-npiggin/mm/filemap.c | 29 ++++++++++++++++++----------- > 1 files changed, 18 insertions(+), 11 deletions(-) > > diff -puN mm/filemap.c~mm-gmr-fix mm/filemap.c > --- linux-2.6/mm/filemap.c~mm-gmr-fix 2004-08-28 14:14:02.000000000 +1000 > +++ linux-2.6-npiggin/mm/filemap.c 2004-08-28 14:32:59.000000000 +1000 > @@ -724,6 +724,15 @@ void do_generic_mapping_read(struct addr > struct page *page; > unsigned long nr, ret; > > + /* nr is the maximum number of bytes to copy from this page */ > + nr = PAGE_CACHE_SIZE; > + if (index == end_index) { > + nr = isize & ~PAGE_CACHE_MASK; > + if (nr <= offset) > + goto out; > + } > + nr = nr - offset; > + > cond_resched(); > page_cache_readahead(mapping, &ra, filp, index); > > @@ -736,17 +745,6 @@ find_page: > if (!PageUptodate(page)) > goto page_not_up_to_date; > page_ok: > - /* nr is the maximum number of bytes to copy from this page */ > - nr = PAGE_CACHE_SIZE; > - if (index == end_index) { > - nr = isize & ~PAGE_CACHE_MASK; > - if (nr <= offset) { > - page_cache_release(page); > - goto out; > - } > - } > - nr = nr - offset; > - > /* If users can be writing to this page using arbitrary > * virtual addresses, take care about potential aliasing > * before reading the page on the kernel side. > @@ -826,6 +824,15 @@ readpage: > page_cache_release(page); > goto out; > } > + nr = PAGE_CACHE_SIZE; > + if (index == end_index) { > + nr = isize & ~PAGE_CACHE_MASK; > + if (nr <= offset) { > + page_cache_release(page); > + goto out; > + } > + } > + nr = nr - offset; > goto page_ok; > > readpage_error: > > _ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-28 5:01 ` Ram Pai @ 2004-08-28 5:42 ` Andrew Morton 2004-08-28 5:54 ` Nick Piggin 1 sibling, 0 replies; 27+ messages in thread From: Andrew Morton @ 2004-08-28 5:42 UTC (permalink / raw) To: Ram Pai; +Cc: nickpiggin, hugh, dice, vda, linux-kernel Ram Pai <linuxram@us.ibm.com> wrote: > > this is the 4th time we are trying to nail down the same thing. We > better get it right this time. So any correct patch is ok with me. There's no rush and it's a kernel fastpath. Let's get the optimal code debugged and merged, please. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-28 5:01 ` Ram Pai 2004-08-28 5:42 ` Andrew Morton @ 2004-08-28 5:54 ` Nick Piggin 2004-08-28 9:44 ` Rafael J. Wysocki 2004-08-28 14:52 ` Hugh Dickins 1 sibling, 2 replies; 27+ messages in thread From: Nick Piggin @ 2004-08-28 5:54 UTC (permalink / raw) To: Ram Pai Cc: Hugh Dickins, Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel Ram Pai wrote: > On Fri, 2004-08-27 at 21:35, Nick Piggin wrote: > >>Ram Pai wrote: >> >>>got it! Everything got changed to the new convention except that >>>the calculation of 'nr' just before the check "nr <= offset" . >>> >>>I have generated this patch which takes care of that and hence fixes the >>>data loss problem as well. I guess it is cleaner too. >>> >>>This patch is generated w.r.t 2.6.8.1. If everybody blesses this patch I >>>will forward it to Andrew. >> >>It looks like it should be OK... but at what point does it become >>simpler to use my patch which just moves the original calculation >>up, and does it again if we have to ->readpage()? >> >>(assuming you agree that it solves the problem) > > > I agree your patch also solves the problem. > > Either way is fine. Even Hugh's patch almost does the same thing as > yours. Ahh, yep - Hugh just forgot to also move the "nr" calculation into the ->readpage path, so it hits twice on the fast path. > The only advantage with my page is it does the calculation in > only one place and does not repeat it. Also I feel its more intuitive to Well kind of - but you are having to jump through hoops to get there. Yours does the following checks: /* fast path, read nr_pages from pagecache */ if (!isize) goto out; for (i = 0; i < nr_pages; i++) { if (index > end_index) goto out; if (index == end_index) { nr = ((isize - 1) & ~PAGE_CACHE_MASK) + 1; if (nr <= offset) { page_cache_release(page); goto out; } } /* slowpath, ->readpage */ if (unlikely(!isize || index > end_index)) { page_cache_release(page); goto out; } } Mine does: if (index > end_index) goto out; for (i = 0; i < pages_to_read; i++) { if (index == end_index) { nr = isize & ~PAGE_CACHE_MASK; if (nr <= offset) goto out; } /* slowpath, ->readpage */ if (index > end_index) { page_cache_release(page); goto out; } if (index == end_index) { nr = isize & ~PAGE_CACHE_MASK; if (nr <= offset) { page_cache_release(page); goto out; } } } So my fastpath is surely leaner, while the slowpath isn't a clear loser. What's more, it looks like mine handles the corner case of reading off the end of a non-PAGE_SIZE file (but within the same page). I think yours will drop through and do the ->readpage, while mine doesn't...? > assume that index 0 covers range 0 to 4095 i.e index n covers range > n*PAGE_SIZE to ((n+1)*PAGE_SIZE)-1. Currently the code assumes index 0 > covers range 1 to 4096 i.e index n covers range (n*PAGE_SIZE)+1 to > (n+1)*PAGE_SIZE. > It is definitely a pretty ugly function all round. I like the 0-4095 thing better too, but my counter argument to that is that this is the minimal change, and similar to how it has previously worked. > this is the 4th time we are trying to nail down the same thing. We > better get it right this time. So any correct patch is ok with me. > I agree. We'll leave it to someone else to decide, then ;) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-28 5:54 ` Nick Piggin @ 2004-08-28 9:44 ` Rafael J. Wysocki 2004-08-28 9:45 ` Andrew Morton 2004-08-28 14:52 ` Hugh Dickins 1 sibling, 1 reply; 27+ messages in thread From: Rafael J. Wysocki @ 2004-08-28 9:44 UTC (permalink / raw) To: Nick Piggin, Ram Pai Cc: Hugh Dickins, Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel Well, guys, to make it 100% clear: if I apply the Nick's patch to the 2.6.9-rc1-mm1 tree, it will fix the data loss issue. Is that right? RJW On Saturday 28 of August 2004 07:54, Nick Piggin wrote: > Ram Pai wrote: > > On Fri, 2004-08-27 at 21:35, Nick Piggin wrote: > >>Ram Pai wrote: > >>>got it! Everything got changed to the new convention except that > >>>the calculation of 'nr' just before the check "nr <= offset" . > >>> > >>>I have generated this patch which takes care of that and hence fixes the > >>>data loss problem as well. I guess it is cleaner too. > >>> > >>>This patch is generated w.r.t 2.6.8.1. If everybody blesses this patch I > >>>will forward it to Andrew. > >> > >>It looks like it should be OK... but at what point does it become > >>simpler to use my patch which just moves the original calculation > >>up, and does it again if we have to ->readpage()? > >> > >>(assuming you agree that it solves the problem) > > > > I agree your patch also solves the problem. > > > > Either way is fine. Even Hugh's patch almost does the same thing as > > yours. > > Ahh, yep - Hugh just forgot to also move the "nr" calculation > into the ->readpage path, so it hits twice on the fast path. > > > The only advantage with my page is it does the calculation in > > only one place and does not repeat it. Also I feel its more intuitive to > > Well kind of - but you are having to jump through hoops to get there. > Yours does the following checks: > > /* fast path, read nr_pages from pagecache */ > if (!isize) > goto out; > for (i = 0; i < nr_pages; i++) { > if (index > end_index) > goto out; > if (index == end_index) { > nr = ((isize - 1) & ~PAGE_CACHE_MASK) + 1; > if (nr <= offset) { > page_cache_release(page); > goto out; > } > } > > /* slowpath, ->readpage */ > if (unlikely(!isize || index > end_index)) { > page_cache_release(page); > goto out; > } > } > > > Mine does: > if (index > end_index) > goto out; > for (i = 0; i < pages_to_read; i++) { > if (index == end_index) { > nr = isize & ~PAGE_CACHE_MASK; > if (nr <= offset) > goto out; > } > > /* slowpath, ->readpage */ > if (index > end_index) { > page_cache_release(page); > goto out; > } > if (index == end_index) { > nr = isize & ~PAGE_CACHE_MASK; > if (nr <= offset) { > page_cache_release(page); > goto out; > } > } > } > > So my fastpath is surely leaner, while the slowpath isn't a clear loser. > > What's more, it looks like mine handles the corner case of reading off the > end of a non-PAGE_SIZE file (but within the same page). I think yours will > drop through and do the ->readpage, while mine doesn't...? > > > assume that index 0 covers range 0 to 4095 i.e index n covers range > > n*PAGE_SIZE to ((n+1)*PAGE_SIZE)-1. Currently the code assumes index 0 > > covers range 1 to 4096 i.e index n covers range (n*PAGE_SIZE)+1 to > > (n+1)*PAGE_SIZE. > > It is definitely a pretty ugly function all round. I like the 0-4095 thing > better too, but my counter argument to that is that this is the minimal > change, and similar to how it has previously worked. > > > this is the 4th time we are trying to nail down the same thing. We > > better get it right this time. So any correct patch is ok with me. > > I agree. We'll leave it to someone else to decide, then ;) > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- For a successful technology, reality must take precedence over public relations, for nature cannot be fooled. -- Richard P. Feynman ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-28 9:44 ` Rafael J. Wysocki @ 2004-08-28 9:45 ` Andrew Morton 2004-08-28 10:18 ` Nick Piggin 0 siblings, 1 reply; 27+ messages in thread From: Andrew Morton @ 2004-08-28 9:45 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: nickpiggin, linuxram, hugh, dice, vda, linux-kernel "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > Well, guys, to make it 100% clear: if I apply the Nick's patch to the > 2.6.9-rc1-mm1 tree, it will fix the data loss issue. Is that right? Should do. Or revert ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm1/broken-out/re-fix-pagecache-reading-off-by-one-cleanup.patch and then ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm1/broken-out/re-fix-pagecache-reading-off-by-one.patch ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-28 9:45 ` Andrew Morton @ 2004-08-28 10:18 ` Nick Piggin 2004-08-28 10:47 ` Rafael J. Wysocki 0 siblings, 1 reply; 27+ messages in thread From: Nick Piggin @ 2004-08-28 10:18 UTC (permalink / raw) To: Andrew Morton; +Cc: Rafael J. Wysocki, linuxram, hugh, dice, vda, linux-kernel Andrew Morton wrote: > "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > >>Well, guys, to make it 100% clear: if I apply the Nick's patch to the >> 2.6.9-rc1-mm1 tree, it will fix the data loss issue. Is that right? > > > Should do. It passes test cases that would previously fail here, so consider it lightly tested. Note that the patch is on top of 2.6.9-rc1 though, it becomes slightly deranged when applying straight onto mm. So don't do that. ... > Or revert > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm1/broken-out/re-fix-pagecache-reading-off-by-one-cleanup.patch > > and then > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm1/broken-out/re-fix-pagecache-reading-off-by-one.patch > > > Once you have these backed out mine should apply fine, but it only closes some performance (not correctness) corner cases that the above patches attempted to. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-28 10:18 ` Nick Piggin @ 2004-08-28 10:47 ` Rafael J. Wysocki 0 siblings, 0 replies; 27+ messages in thread From: Rafael J. Wysocki @ 2004-08-28 10:47 UTC (permalink / raw) To: Nick Piggin, Andrew Morton; +Cc: linuxram, hugh, dice, vda, linux-kernel On Saturday 28 of August 2004 12:18, Nick Piggin wrote: > Andrew Morton wrote: > > "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > >>Well, guys, to make it 100% clear: if I apply the Nick's patch to the > >> 2.6.9-rc1-mm1 tree, it will fix the data loss issue. Is that right? > > > > Should do. > > It passes test cases that would previously fail here, so consider it > lightly tested. Note that the patch is on top of 2.6.9-rc1 though, > it becomes slightly deranged when applying straight onto mm. So don't > do that. > > .. > > > Or revert > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2 > >.6.9-rc1-mm1/broken-out/re-fix-pagecache-reading-off-by-one-cleanup.patch > > > > and then > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2 > >.6.9-rc1-mm1/broken-out/re-fix-pagecache-reading-off-by-one.patch > > Once you have these backed out mine should apply fine, but it only closes > some performance (not correctness) corner cases that the above patches > attempted to. OK. Thanks a lot, RJW -- For a successful technology, reality must take precedence over public relations, for nature cannot be fooled. -- Richard P. Feynman ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-28 5:54 ` Nick Piggin 2004-08-28 9:44 ` Rafael J. Wysocki @ 2004-08-28 14:52 ` Hugh Dickins 2004-08-29 1:30 ` Nick Piggin 1 sibling, 1 reply; 27+ messages in thread From: Hugh Dickins @ 2004-08-28 14:52 UTC (permalink / raw) To: Nick Piggin Cc: Ram Pai, Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel On Sat, 28 Aug 2004, Nick Piggin wrote: > > Ahh, yep - Hugh just forgot to also move the "nr" calculation > into the ->readpage path, so it hits twice on the fast path. Yes, your patch is better than mine. > What's more, it looks like mine handles the corner case of reading off the > end of a non-PAGE_SIZE file (but within the same page). I think yours will > drop through and do the ->readpage, while mine doesn't...? It's a common case, not a corner case: a short read to end of file, then app does another read which returns 0 for the end of file: that wouldn't normally fall through to readpage in Ram's case, but would do unnecessary page_cache_readahead (won't do much) and find_get_page. > I agree. We'll leave it to someone else to decide, then ;) I vote for Nick's patch. I do have one reservation on do_generic_mapping_read, common to all these versions, unrelated to the current issue. I'm surprised to notice that you're careful to re-i_size_read after readpage, but otherwise rely on the initial i_size_read. We could go around this loop many many times, faulting user pages in actor, rescheduling away: the old (e.g. 2.4 or 2.6.0) code was deficient after readpage, but at least it reassessed i_size each time around the loop. I guess if the file is truncated meanwhile, the common case would be for a find_get_page to fail, and then the situation be corrected after readpage; perhaps it's more likely to show up as read returning too little on a large file being steadily appended. Maybe you already ruled these cases out as not worth the overhead of handling, but it does surprise me that the old code was safer in this respect. Hugh ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-28 14:52 ` Hugh Dickins @ 2004-08-29 1:30 ` Nick Piggin 2004-08-31 6:25 ` Ram Pai 0 siblings, 1 reply; 27+ messages in thread From: Nick Piggin @ 2004-08-29 1:30 UTC (permalink / raw) To: Hugh Dickins Cc: Ram Pai, Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel Hugh Dickins wrote: > On Sat, 28 Aug 2004, Nick Piggin wrote: > >>Ahh, yep - Hugh just forgot to also move the "nr" calculation >>into the ->readpage path, so it hits twice on the fast path. > > > Yes, your patch is better than mine. > > >>What's more, it looks like mine handles the corner case of reading off the >>end of a non-PAGE_SIZE file (but within the same page). I think yours will >>drop through and do the ->readpage, while mine doesn't...? > > > It's a common case, not a corner case: a short read to end of file, > then app does another read which returns 0 for the end of file: that > wouldn't normally fall through to readpage in Ram's case, but would > do unnecessary page_cache_readahead (won't do much) and find_get_page. > Ahh, yeah I guess that would be the more common case. I was just thinking of just randomly reading past the end of the file - in that case it *would* fall through to ->readpage if the page wasn't in cache. But anyway, I think we agree my (our) version should cover that. > >>I agree. We'll leave it to someone else to decide, then ;) > > > I vote for Nick's patch. > OK - maybe that can go for a spin in the next -mm. Andrew did you get it? > I do have one reservation on do_generic_mapping_read, > common to all these versions, unrelated to the current issue. > > I'm surprised to notice that you're careful to re-i_size_read > after readpage, but otherwise rely on the initial i_size_read. > We could go around this loop many many times, faulting user pages > in actor, rescheduling away: the old (e.g. 2.4 or 2.6.0) code was > deficient after readpage, but at least it reassessed i_size each > time around the loop. I guess if the file is truncated meanwhile, > the common case would be for a find_get_page to fail, and then the > situation be corrected after readpage; perhaps it's more likely to > show up as read returning too little on a large file being steadily > appended. Maybe you already ruled these cases out as not worth the > overhead of handling, but it does surprise me that the old code was > safer in this respect. > Yeah I guess it is a case of doing the minimum that is really needed. Although considering page_cache_readahead call can do an i_size_read, it might be nicer to probably change the interface to have it passed down instead ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-29 1:30 ` Nick Piggin @ 2004-08-31 6:25 ` Ram Pai 2004-08-31 6:39 ` Andrew Morton 0 siblings, 1 reply; 27+ messages in thread From: Ram Pai @ 2004-08-31 6:25 UTC (permalink / raw) To: Nick Piggin Cc: Hugh Dickins, Gergely Tamas, Denis Vlasenko, Andrew Morton, linux-kernel On Sat, 2004-08-28 at 18:30, Nick Piggin wrote: > Hugh Dickins wrote: > > On Sat, 28 Aug 2004, Nick Piggin wrote: ..snip.. > > But anyway, I think we agree my (our) version should cover that. > > > > >>I agree. We'll leave it to someone else to decide, then ;) > > > > > > I vote for Nick's patch. > > > > OK - maybe that can go for a spin in the next -mm. Andrew did you > get it? So in case my vote counts, add my vote too :) . > > > I do have one reservation on do_generic_mapping_read, > > common to all these versions, unrelated to the current issue. > > > > I'm surprised to notice that you're careful to re-i_size_read > > after readpage, but otherwise rely on the initial i_size_read. > > We could go around this loop many many times, faulting user pages > > in actor, rescheduling away: the old (e.g. 2.4 or 2.6.0) code was > > deficient after readpage, but at least it reassessed i_size each > > time around the loop. I guess if the file is truncated meanwhile, > > the common case would be for a find_get_page to fail, and then the > > situation be corrected after readpage; perhaps it's more likely to > > show up as read returning too little on a large file being steadily > > appended. Maybe you already ruled these cases out as not worth the > > overhead of handling, but it does surprise me that the old code was > > safer in this respect. > > > > Yeah I guess it is a case of doing the minimum that is really > needed. > > Although considering page_cache_readahead call can do an i_size_read, > it might be nicer to probably change the interface to have it passed down > instead We are experimenting some patches to see if sending the i_size parameter to page_cache_readahead() can help or not. There are couple of advantages in doing that. (1) Quick ramp-up to max-readahead pages for sequential workload. (2) being able to read atleast the requested number of pages in one shot instead of reading one page at a time in case the readahead window gets closed. But the biggest performance boost has been seen with large max-readahead window sizes. Currently most of the underlying block devices default to 32 pages max-readahead even though the underlying device can handle much larger reads. We could extract much more sequential read performance if the max-readahead was set to much higher values like 256 pages which most modern devices are capable off. The problem AFAICT is that the block device layer defaults the max-readahead value for most block devices to 32, without consulting the capability of the underlying block device driver. RP ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 2004-08-31 6:25 ` Ram Pai @ 2004-08-31 6:39 ` Andrew Morton 0 siblings, 0 replies; 27+ messages in thread From: Andrew Morton @ 2004-08-31 6:39 UTC (permalink / raw) To: Ram Pai; +Cc: nickpiggin, hugh, dice, vda, linux-kernel Ram Pai <linuxram@us.ibm.com> wrote: > > > OK - maybe that can go for a spin in the next -mm. Andrew did you > > get it? > > So in case my vote counts, add my vote too :) . > Can someone send me the patch? > > But the biggest performance boost has been seen with large max-readahead > window sizes. Currently most of the underlying block devices default to > 32 pages max-readahead even though the underlying device can handle > much larger reads. We could extract much more sequential read > performance if the max-readahead was set to much higher values like 256 > pages which most modern devices are capable off. The problem AFAICT is > that the block device layer defaults the max-readahead value for most > block devices to 32, without consulting the capability of the underlying > block device driver. This can be done in startup scripts. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data loss in 2.6.9-rc1-mm1 @ 2004-08-28 12:05 Joachim Bremer 0 siblings, 0 replies; 27+ messages in thread From: Joachim Bremer @ 2004-08-28 12:05 UTC (permalink / raw) To: linux-kernel It may be architecture depend so here are my results: machine: Tyan Thundewr K8W dual opteron with 2 G of memory, raid 4 on SATA drives Distri: SuSE 9,1 64-bit, GCC 3.4.1 Stock 2.6.9-rc1-mm1: lots of errors. ldd on 32bit executables fail, reiserfsck fails on last block etc 2.6.9-rc1-mm1 with backed out pagecache patches: completely OK 2.6.9-rc1-mm1 with patch from Hugh Dickins: completely OK 2.6.9-rc1-mm1 Nick Piggin: only partially OK, will fail on reiserfsck with "bread End of file" on last block of the md -device, random erros on compiling a kernel, mostly something like "fs/ioctl.o: file not recognized: File truncated" Joachim ________________________________________________________________ Verschicken Sie romantische, coole und witzige Bilder per SMS! Jetzt neu bei WEB.DE FreeMail: http://freemail.web.de/?mc=021193 ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2004-08-31 6:44 UTC | newest] Thread overview: 27+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-08-27 10:55 data loss in 2.6.9-rc1-mm1 Gergely Tamas 2004-08-27 11:05 ` Anton Altaparmakov 2004-08-27 11:40 ` Gergely Tamas 2004-08-27 12:35 ` Fabio Coatti 2004-08-27 11:17 ` Tim Schmielau 2004-08-27 11:43 ` Gergely Tamas 2004-08-27 11:37 ` Mikael Pettersson 2004-08-27 11:55 ` Denis Vlasenko 2004-08-27 13:56 ` Hugh Dickins 2004-08-27 14:18 ` Gergely Tamas 2004-08-27 15:36 ` Fabio Coatti 2004-08-27 18:30 ` Ram Pai 2004-08-27 19:08 ` Hugh Dickins 2004-08-27 21:04 ` Ram Pai 2004-08-28 4:35 ` Nick Piggin 2004-08-28 5:01 ` Ram Pai 2004-08-28 5:42 ` Andrew Morton 2004-08-28 5:54 ` Nick Piggin 2004-08-28 9:44 ` Rafael J. Wysocki 2004-08-28 9:45 ` Andrew Morton 2004-08-28 10:18 ` Nick Piggin 2004-08-28 10:47 ` Rafael J. Wysocki 2004-08-28 14:52 ` Hugh Dickins 2004-08-29 1:30 ` Nick Piggin 2004-08-31 6:25 ` Ram Pai 2004-08-31 6:39 ` Andrew Morton -- strict thread matches above, loose matches on Subject: below -- 2004-08-28 12:05 Joachim Bremer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox