All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andi Kleen <ak@suse.de>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: okir@suse.de, nfs@lists.sourceforge.net, neilb@cse.unsw.edu.au,
	linux-kernel@vger.kernel.org
Subject: Re: Still data corruption with LTP doio on 2.6.17rc
Date: Mon, 12 Jun 2006 11:27:10 +0200	[thread overview]
Message-ID: <200606121127.10490.ak@suse.de> (raw)
In-Reply-To: <1149778177.15644.19.camel@lade.trondhjem.org>

On Thursday 08 June 2006 16:49, Trond Myklebust wrote:
> On Thu, 2006-06-08 at 12:44 +0200, Andi Kleen wrote:
> > I'm still seeing data corruption when running LTP over NFS
> > between two 2.6.17rc* hosts. I already saw this before 2.6.16
> > and reported.
> > 
> > Server is running knfsd 2.6.17-rc4-git9, client is running 2.6.17-rc6
> > with nfsroot. Both x86-64 and SUSE 10.0 userland. The file system
> > is exported as async and mounted with
> > /dev/root / nfs rw,vers=2,rsize=4096,wsize=4096,hard,nolock,proto=udp,timeo=11,retrans=2,addr=10.23.204.1 0 0
> > 
> > First I always get lots of
> > 
> > do_vfs_lock: VFS is out of sync with lock manager!
> > 
> > messages on the client. They don't seem to be directly related though. 
> > 
> > I set up ltp-full-20051103 on the NFS root and run it on the client
> > with runltplite.sh. Eventually it reports
> > 
> > 
> > <<<test_start>>>
> > tag=rwtest03 stime=1149754762
> > cmdline="export LTPROOT; rwtest -N rwtest03 -c -q -i 60s -n 2  -f buffered -s mmread,mmwrite -m random -Dv 10%25000:mm-buff-$$"
> > contacts=""
> > analysis=exit
> > initiation_status="ok"
> > <<<test_output>>>
> > 
> > doio(rwtest03) ( 8155) 08:19:23
> > ---------------------
> > *** DATA COMPARISON ERROR ***
> > check_file(/tmp/ltp-2256/mm-buff-8139, 7813848, 81293, U:8155:bigfoot:doio*, 20, 0) failed
> > 
> > Comparison fd is 3, with open flags 0
> > Corrupt regions follow - unprintable chars are represented as '.'
> > -----------------------------------------------------------------
> > corrupt bytes starting at file offset 7813848
> >     1st 32 expected bytes:  U:8155:bigfoot:doio*U:8155:bigfo
> >     1st 32 actual bytes:    ................................
> > 
> > Request number 36
> >           fd 4 is file /tmp/ltp-2256/mm-buff-8139 - open flags are 02 O_RDWR,
> >           write done at file offset 7813848 - pattern is U (0125)
> >           number of requests is 1, strides per request is 1
> >           i/o byte count = 81293
> >           memory alignment is unaligned
> > 
> > syscall:  mmap-write(NULL, 12800000, PROT_WRITE, MAP_SHARED, 4, 0)
> >         file is mmaped to: 0x2b73b87f0000
> >         file-mem=0x2b73b8f63ad8, length=81293, buffer=0x52d540
> > 
> > 
> > est03) ( 8152) 08:19:23
> > ---------------------
> > (parent) pid 8155 exited because of data compare errors
> 
> mmap() is still not 100% safe when the client believes that the file has
> changed on the server and needs to call invalidate_inode_pages2(): if
> there is dirty data on the page when unmap_mapping_range() gets called,
> then that dirty data may be lost (basically, we need a VM helper that
> does unmap+flush+invalidate_page).
> 
> The attached patch may, however reduce the number of calls to
> invalidate_inode_pages2() since it ensures that revalidations only occur
> if and when we're going to read from the page cache (i.e. in places when
> we _need_ the assurance that the page cache is valid).

Sorry for the delay.

I tested your patch with -rc6 and the problem is still there:

Also I did some testing with 2.6.16 and I couldn't reproduce it here
so maybe it was introduced afterwards (however I could swear I've seen
it before, but I can't remember which exact version number it was)

BTW you can probably easily reproduce it yourself by downloading LTP
from ltp.sourceforge.net and compiling/running it on a NFS mount.

-Andi

doio(rwtest03) ( 8621) 07:42:18
---------------------
*** DATA COMPARISON ERROR ***
check_file(/tmp/ltp-2819/mm-buff-8605, 507558, 49196, O:8621:bigfoot:doio*, 20, 0) failed

Comparison fd is 4, with open flags 0
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 532480
    1st 32 expected bytes:  8621:bigfoot:doio*O:8621:bigfoot
    1st 32 actual bytes:    ................................

Request number 1
          fd 3 is file /tmp/ltp-2819/mm-buff-8605 - open flags are 02 O_RDWR,
          write done at file offset 507558 - pattern is O (0117)
          number of requests is 1, strides per request is 1
          i/o byte count = 49196
          memory alignment is unaligned

syscall:  mmap-write(NULL, 12800000, PROT_WRITE, MAP_SHARED, 3, 0)
        file is mmaped to: 0x2b0b89f6f000
        file-mem=0x2b0b89feaea6, length=49196, buffer=0x52d547


doio(rwtest03) ( 8618) 07:42:18
---------------------
(parent) pid 8621 exited because of data compare errors
rwtest(rwtest03) : doio reported errors (r=4)
rwtest03    1  FAIL  :  doio reported errors (r=4)
rwtest03    1  FAIL  :  Test failed
<<<execution_status>>>
duration=67 termination_type=exited termination_id=4 corefile=no
cutime=32 cstime=141
<<<test_end>>>
<<<test_start>>>
tag=rwtest04 stime=1150098204
cmdline="export LTPROOT; rwtest -N rwtest04 -c -q -i 60s -n 2  -f sync -s mmread,mmwrite -m random -Dv 10%25000:m
m-sync-$$"
contacts=""
analysis=exit
initiation_status="ok"
<<<test_output>>>
doio(rwtest04) ( 8640) 07:43:26
---------------------
*** DATA COMPARISON ERROR ***
check_file(/tmp/ltp-2819/mm-sync-8625, 4585176, 82650, T:8640:bigfoot:doio*, 20, 0) failed

Comparison fd is 4, with open flags 0
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 4585176
    1st 32 expected bytes:  T:8640:bigfoot:doio*T:8640:bigfo
    1st 32 actual bytes:    ................................

Request number 54
          fd 3 is file /tmp/ltp-2819/mm-sync-8625 - open flags are 010002 O_RDWR,O_SYNC,
          write done at file offset 4585176 - pattern is T (0124)
          number of requests is 1, strides per request is 1
          i/o byte count = 82650
          memory alignment is unaligned

syscall:  mmap-write(NULL, 12800000, PROT_WRITE, MAP_SHARED, 3, 0)
        file is mmaped to: 0x2addadf68000
        file-mem=0x2addae3c76d8, length=82650, buffer=0x52d541


doio(rwtest04) ( 8638) 07:43:26
---------------------
(parent) pid 8640 exited because of data compare errors

doio(rwtest04) ( 8641) 07:43:26
---------------------
*** DATA COMPARISON ERROR ***
check_file(/tmp/ltp-2819/mm-sync-8625, 7161356, 118706, E:8641:bigfoot:doio*, 20, 0) failed
Comparison fd is 4, with open flags 0
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 7254016
    1st 32 expected bytes:  E:8641:bigfoot:doio*E:8641:bigfo
    1st 32 actual bytes:    ................................

Request number 61
          fd 3 is file /tmp/ltp-2819/mm-sync-8625 - open flags are 010002 O_RDWR,O_SYNC,
          write done at file offset 7161356 - pattern is E (0105)
          number of requests is 1, strides per request is 1
          i/o byte count = 118706
          memory alignment is unaligned

syscall:  mmap-write(NULL, 12800000, PROT_WRITE, MAP_SHARED, 3, 0)
        file is mmaped to: 0x2addadf68000
        file-mem=0x2addae63c60c, length=118706, buffer=0x52d543


doio(rwtest04) ( 8638) 07:43:26
---------------------
(parent) pid 8641 exited because of data compare errors
rwtest(rwtest04) : iogen reported errors (r=141)
rwtest04    1  FAIL  :  Test failed


_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

WARNING: multiple messages have this Message-ID (diff)
From: Andi Kleen <ak@suse.de>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: neilb@cse.unsw.edu.au, nfs@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, okir@suse.de
Subject: Re: Still data corruption with LTP doio on 2.6.17rc
Date: Mon, 12 Jun 2006 11:27:10 +0200	[thread overview]
Message-ID: <200606121127.10490.ak@suse.de> (raw)
In-Reply-To: <1149778177.15644.19.camel@lade.trondhjem.org>

On Thursday 08 June 2006 16:49, Trond Myklebust wrote:
> On Thu, 2006-06-08 at 12:44 +0200, Andi Kleen wrote:
> > I'm still seeing data corruption when running LTP over NFS
> > between two 2.6.17rc* hosts. I already saw this before 2.6.16
> > and reported.
> > 
> > Server is running knfsd 2.6.17-rc4-git9, client is running 2.6.17-rc6
> > with nfsroot. Both x86-64 and SUSE 10.0 userland. The file system
> > is exported as async and mounted with
> > /dev/root / nfs rw,vers=2,rsize=4096,wsize=4096,hard,nolock,proto=udp,timeo=11,retrans=2,addr=10.23.204.1 0 0
> > 
> > First I always get lots of
> > 
> > do_vfs_lock: VFS is out of sync with lock manager!
> > 
> > messages on the client. They don't seem to be directly related though. 
> > 
> > I set up ltp-full-20051103 on the NFS root and run it on the client
> > with runltplite.sh. Eventually it reports
> > 
> > 
> > <<<test_start>>>
> > tag=rwtest03 stime=1149754762
> > cmdline="export LTPROOT; rwtest -N rwtest03 -c -q -i 60s -n 2  -f buffered -s mmread,mmwrite -m random -Dv 10%25000:mm-buff-$$"
> > contacts=""
> > analysis=exit
> > initiation_status="ok"
> > <<<test_output>>>
> > 
> > doio(rwtest03) ( 8155) 08:19:23
> > ---------------------
> > *** DATA COMPARISON ERROR ***
> > check_file(/tmp/ltp-2256/mm-buff-8139, 7813848, 81293, U:8155:bigfoot:doio*, 20, 0) failed
> > 
> > Comparison fd is 3, with open flags 0
> > Corrupt regions follow - unprintable chars are represented as '.'
> > -----------------------------------------------------------------
> > corrupt bytes starting at file offset 7813848
> >     1st 32 expected bytes:  U:8155:bigfoot:doio*U:8155:bigfo
> >     1st 32 actual bytes:    ................................
> > 
> > Request number 36
> >           fd 4 is file /tmp/ltp-2256/mm-buff-8139 - open flags are 02 O_RDWR,
> >           write done at file offset 7813848 - pattern is U (0125)
> >           number of requests is 1, strides per request is 1
> >           i/o byte count = 81293
> >           memory alignment is unaligned
> > 
> > syscall:  mmap-write(NULL, 12800000, PROT_WRITE, MAP_SHARED, 4, 0)
> >         file is mmaped to: 0x2b73b87f0000
> >         file-mem=0x2b73b8f63ad8, length=81293, buffer=0x52d540
> > 
> > 
> > est03) ( 8152) 08:19:23
> > ---------------------
> > (parent) pid 8155 exited because of data compare errors
> 
> mmap() is still not 100% safe when the client believes that the file has
> changed on the server and needs to call invalidate_inode_pages2(): if
> there is dirty data on the page when unmap_mapping_range() gets called,
> then that dirty data may be lost (basically, we need a VM helper that
> does unmap+flush+invalidate_page).
> 
> The attached patch may, however reduce the number of calls to
> invalidate_inode_pages2() since it ensures that revalidations only occur
> if and when we're going to read from the page cache (i.e. in places when
> we _need_ the assurance that the page cache is valid).

Sorry for the delay.

I tested your patch with -rc6 and the problem is still there:

Also I did some testing with 2.6.16 and I couldn't reproduce it here
so maybe it was introduced afterwards (however I could swear I've seen
it before, but I can't remember which exact version number it was)

BTW you can probably easily reproduce it yourself by downloading LTP
from ltp.sourceforge.net and compiling/running it on a NFS mount.

-Andi

doio(rwtest03) ( 8621) 07:42:18
---------------------
*** DATA COMPARISON ERROR ***
check_file(/tmp/ltp-2819/mm-buff-8605, 507558, 49196, O:8621:bigfoot:doio*, 20, 0) failed

Comparison fd is 4, with open flags 0
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 532480
    1st 32 expected bytes:  8621:bigfoot:doio*O:8621:bigfoot
    1st 32 actual bytes:    ................................

Request number 1
          fd 3 is file /tmp/ltp-2819/mm-buff-8605 - open flags are 02 O_RDWR,
          write done at file offset 507558 - pattern is O (0117)
          number of requests is 1, strides per request is 1
          i/o byte count = 49196
          memory alignment is unaligned

syscall:  mmap-write(NULL, 12800000, PROT_WRITE, MAP_SHARED, 3, 0)
        file is mmaped to: 0x2b0b89f6f000
        file-mem=0x2b0b89feaea6, length=49196, buffer=0x52d547


doio(rwtest03) ( 8618) 07:42:18
---------------------
(parent) pid 8621 exited because of data compare errors
rwtest(rwtest03) : doio reported errors (r=4)
rwtest03    1  FAIL  :  doio reported errors (r=4)
rwtest03    1  FAIL  :  Test failed
<<<execution_status>>>
duration=67 termination_type=exited termination_id=4 corefile=no
cutime=32 cstime=141
<<<test_end>>>
<<<test_start>>>
tag=rwtest04 stime=1150098204
cmdline="export LTPROOT; rwtest -N rwtest04 -c -q -i 60s -n 2  -f sync -s mmread,mmwrite -m random -Dv 10%25000:m
m-sync-$$"
contacts=""
analysis=exit
initiation_status="ok"
<<<test_output>>>
doio(rwtest04) ( 8640) 07:43:26
---------------------
*** DATA COMPARISON ERROR ***
check_file(/tmp/ltp-2819/mm-sync-8625, 4585176, 82650, T:8640:bigfoot:doio*, 20, 0) failed

Comparison fd is 4, with open flags 0
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 4585176
    1st 32 expected bytes:  T:8640:bigfoot:doio*T:8640:bigfo
    1st 32 actual bytes:    ................................

Request number 54
          fd 3 is file /tmp/ltp-2819/mm-sync-8625 - open flags are 010002 O_RDWR,O_SYNC,
          write done at file offset 4585176 - pattern is T (0124)
          number of requests is 1, strides per request is 1
          i/o byte count = 82650
          memory alignment is unaligned

syscall:  mmap-write(NULL, 12800000, PROT_WRITE, MAP_SHARED, 3, 0)
        file is mmaped to: 0x2addadf68000
        file-mem=0x2addae3c76d8, length=82650, buffer=0x52d541


doio(rwtest04) ( 8638) 07:43:26
---------------------
(parent) pid 8640 exited because of data compare errors

doio(rwtest04) ( 8641) 07:43:26
---------------------
*** DATA COMPARISON ERROR ***
check_file(/tmp/ltp-2819/mm-sync-8625, 7161356, 118706, E:8641:bigfoot:doio*, 20, 0) failed
Comparison fd is 4, with open flags 0
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 7254016
    1st 32 expected bytes:  E:8641:bigfoot:doio*E:8641:bigfo
    1st 32 actual bytes:    ................................

Request number 61
          fd 3 is file /tmp/ltp-2819/mm-sync-8625 - open flags are 010002 O_RDWR,O_SYNC,
          write done at file offset 7161356 - pattern is E (0105)
          number of requests is 1, strides per request is 1
          i/o byte count = 118706
          memory alignment is unaligned

syscall:  mmap-write(NULL, 12800000, PROT_WRITE, MAP_SHARED, 3, 0)
        file is mmaped to: 0x2addadf68000
        file-mem=0x2addae63c60c, length=118706, buffer=0x52d543


doio(rwtest04) ( 8638) 07:43:26
---------------------
(parent) pid 8641 exited because of data compare errors
rwtest(rwtest04) : iogen reported errors (r=141)
rwtest04    1  FAIL  :  Test failed

  reply	other threads:[~2006-06-12  9:27 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-08 10:44 Still data corruption with LTP doio on 2.6.17rc Andi Kleen
2006-06-08 14:49 ` Trond Myklebust
2006-06-08 14:49   ` Trond Myklebust
2006-06-12  9:27   ` Andi Kleen [this message]
2006-06-12  9:27     ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200606121127.10490.ak@suse.de \
    --to=ak@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@cse.unsw.edu.au \
    --cc=nfs@lists.sourceforge.net \
    --cc=okir@suse.de \
    --cc=trond.myklebust@fys.uio.no \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.