* [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
@ 2012-10-15 2:12 Chen Gang
2012-10-15 4:27 ` Myklebust, Trond
0 siblings, 1 reply; 12+ messages in thread
From: Chen Gang @ 2012-10-15 2:12 UTC (permalink / raw)
To: Jeff Layton, Trond.Myklebust, linux-nfs, linux-kernel
Hello Trond Myklebust, Jeff Layton:
1) Root Cause:
A) begin truncate to smaller, after async read finish starting.
B) async read done come, after truncate operation change inode size.
C) in nfs_inode_attrs_need_update, nfs_size_need_update return true.
i) the bigger size is the original old size of client itself.
ii) the smaller size is the current true size.
iii) nfs_inode_attrs_need_update not consider this situation.
2) Fix nfs_size_need_update:
A) delete it:
i) it is for performance, not necessary (not for correctness).
ii) if it was necessary, it should use "!=" instead of '>'.
iii) it is the simplest way to fix this bug (maybe not best way).
B) consider this situation in it:
i) it is the best way.
ii) it is a little complex (need think of)
iii) sorry for I do not know how to fix it (at least now).
C) not touch it:
i) correct another place (such as nfs_update_inode)
ii) it is a bad idea (at least, I think it is)
iii) we need keep the source code as clearer as possible.
3) Test Result:
A) it is one client and one server separately, under 3.6-rc5 x86_32.
B) use one process (fsx-linux) test (only one user mode thread).
C) only use read, truncate, llseek, fstat operation for one file.
Before delete nfs_size_need_update, it causes issue.
After delete nfs_size_need_update, it is ok.
User Mode Log:
-------------------------------------------------------------------------
<<<test_start>>>
tag=nfsx-linux stime=1350202875
cmdline="export VERSION SOCKET_TYPE; TCbin=$LTPROOT/testcases/bin fsx.sh"
contacts=""
analysis=exit
<<<test_output>>>
Test Options:
VERSION: 2
RHOST: dhcp122.asianux.net
ITERATIONS: 50000
SOCKET_TYPE: udp
NFS_TYPE: nfs
Setting up remote machine: dhcp122.asianux.net
Mounting NFS filesystem dhcp122.asianux.net:/tmp/fsx1447.testdir on
/opt/ltp/testcases/bin/fsx1447 with options '-o proto=udp,vers=2 '
fsx-linux -N 50000 /opt/ltp/testcases/bin/fsx1447/testfile Starting
truncating to largest ever: 0x13e76
truncating to largest ever: 0x2e52c
truncating to largest ever: 0x3c2c2
truncating to largest ever: 0x3f15f
truncating to largest ever: 0x3fcb9
truncating to largest ever: 0x3fe96
truncating to largest ever: 0x3ff9d
Size error: expected 0x36ef9 stat 0x3bbca seek 0x36ef9
LOG DUMP (5652 total operations):
...
5636: 1350203089.781599 READ 0x143b6 thru 0x21ccb (0xd916 bytes)
5637: 1350203090.028214 READ 0x2a629 thru 0x2d0a1 (0x2a79 bytes)
5638: 1350203090.072029 TRUNCATE DOWN from 0x2d0a2 to 0x1bb35
5639: 1350203090.087401 READ 0x11a05 thru 0x1bb34 (0xa130 bytes)
5640: 1350203090.223985 READ 0x508c thru 0xa9da (0x594f bytes)
5641: 1350203090.245717 TRUNCATE DOWN from 0x1bb35 to 0x8830
5642: 1350203090.353502 READ 0x548f thru 0x882f (0x33a1 bytes)
5643: 1350203090.366596 READ 0x5802 thru 0x882f (0x302e bytes)
5644: 1350203090.366629 TRUNCATE UP from 0x8830 to 0x20011
5645: 1350203090.379476 TRUNCATE DOWN from 0x20011 to 0x134f4
5646: 1350203090.396234 READ 0x124a0 thru 0x134f3 (0x1054 bytes)
5647: 1350203090.401805 READ 0x880b thru 0x1189d (0x9093 bytes)
5648: 1350203090.532050 READ 0x134c7 thru 0x134f3 (0x2d bytes)
5649: 1350203090.532057 TRUNCATE UP from 0x134f4 to 0x3bbca
5650: 1350203090.546373 READ 0x2944c thru 0x2c1d6 (0x2d8b bytes)
5651: 1350203090.561228 READ 0xdbe1 thru 0x16260 (0x8680 bytes)
5652: 1350203090.751937 TRUNCATE DOWN from 0x3bbca to 0x36ef9
Correct content saved for comparison
(maybe hexdump "/opt/ltp/testcases/bin/fsx1447/testfile" vs
"/opt/ltp/testcases/bin/fsx1447/testfile.fsxgood")
fsx-linux -N 50000 /opt/ltp/testcases/bin/fsx1447/testfile Finished
Cleaning up testcase
Unmounting /opt/ltp/testcases/bin/fsx1447
Test Failed: Errors have resulted from this test
incrementing stop
<<<execution_status>>>
initiation_status="ok"
duration=218 termination_type=exited termination_id=1 corefile=no
cutime=43 cstime=82
<<<test_end>>>
-------------------------------------------------------------------------
Kernel Mode Log: (using printk which I add)
-------------------------------------------------------------------------
Time: My Mark: Task ptr: comments (include function name):
[ 280.883701] gchen_tag: f5c30000, nfs_read_done call
nfs_refresh_inode, cur=0x3bbca, new=0x3bbca
[ 280.890677] gchen_tag: f5c30000, nfs_read_done call
nfs_refresh_inode, cur=0x3bbca, new=0x3bbca
[ 280.897437] gchen_tag: f5c30000, nfs_read_done call
nfs_refresh_inode, cur=0x3bbca, new=0x3bbca
[ 280.897441] gchen_tag: f5e48c90, nfs_setattr_update_inode, cur=3bbca,
new=36ef9
[ 280.897450] gchen_tag: f5e48c90, nfs_setattr
[ 280.897462] gchen_tag: hit, f5c30000, nfs_refresh_inode_locked,
cur=36ef9, new=3bbca
[ 280.897469] gchen_tag: f5c30000, nfs_update_inode, change size,
cur=36ef9, new=3bbca
[ 280.898129] gchen_tag: f5e48c90, nfs_update_inode, change size,
cur=3bbca, new=36ef9
[ 280.977915] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=2000, pages=55
[ 281.019879] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=a000, pages=53
[ 281.070325] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=e000, pages=45
[ 281.087103] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=12000, pages=43
[ 281.129061] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=16000, pages=41
[ 281.163012] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=18000, pages=37
[ 281.213481] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=1c000, pages=33
[ 281.255727] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=22000, pages=31
[ 281.306177] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=24000, pages=27
[ 281.356888] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=2e000, pages=21
[ 281.398859] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=36000, pages=19
[ 281.585491] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=104d8, new=2000, pages=17
[ 281.644207] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=104d8, new=10000, pages=15
-------------------------------------------------------------------------
Thanks.
--
Chen Gang
Asianux Corporation
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
2012-10-15 2:12 [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size Chen Gang
@ 2012-10-15 4:27 ` Myklebust, Trond
2012-10-15 4:52 ` Chen Gang
0 siblings, 1 reply; 12+ messages in thread
From: Myklebust, Trond @ 2012-10-15 4:27 UTC (permalink / raw)
To: Chen Gang
Cc: Jeff Layton, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2456 bytes --]
On Mon, 2012-10-15 at 10:12 +0800, Chen Gang wrote:
> Hello Trond Myklebust, Jeff Layton:
>
> 1) Root Cause:
> A) begin truncate to smaller, after async read finish starting.
> B) async read done come, after truncate operation change inode size.
> C) in nfs_inode_attrs_need_update, nfs_size_need_update return true.
> i) the bigger size is the original old size of client itself.
> ii) the smaller size is the current true size.
> iii) nfs_inode_attrs_need_update not consider this situation.
>
> 2) Fix nfs_size_need_update:
> A) delete it:
> i) it is for performance, not necessary (not for correctness).
> ii) if it was necessary, it should use "!=" instead of '>'.
> iii) it is the simplest way to fix this bug (maybe not best way).
> B) consider this situation in it:
> i) it is the best way.
> ii) it is a little complex (need think of)
> iii) sorry for I do not know how to fix it (at least now).
> C) not touch it:
> i) correct another place (such as nfs_update_inode)
> ii) it is a bad idea (at least, I think it is)
> iii) we need keep the source code as clearer as possible.
>
> 3) Test Result:
> A) it is one client and one server separately, under 3.6-rc5 x86_32.
> B) use one process (fsx-linux) test (only one user mode thread).
> C) only use read, truncate, llseek, fstat operation for one file.
>
> Before delete nfs_size_need_update, it causes issue.
> After delete nfs_size_need_update, it is ok.
nfs_size_need_update is not about performance. It is a heuristic that is
entirely about ensuring correctness when faced with the fact that most
Linux filesystems are utterly incapable of reporting with modifications
that occur within < 1 second intervals because their mtime/ctime is
limited to 1 second resolutions.
Now, what are the conditions of your test setup? The above bug report is
meaningless unless it includes a description of what is being exported
by the server (including a proper listing of the contents
of /etc/exports and /proc/mounts). It should also include a description
of the NFS client mount options (see /proc/mounts on the client).
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
2012-10-15 4:27 ` Myklebust, Trond
@ 2012-10-15 4:52 ` Chen Gang
2012-10-15 5:39 ` Chen Gang
2012-10-15 12:32 ` Myklebust, Trond
0 siblings, 2 replies; 12+ messages in thread
From: Chen Gang @ 2012-10-15 4:52 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Jeff Layton, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org
于 2012年10月15日 12:27, Myklebust, Trond 写道:
> nfs_size_need_update is not about performance. It is a heuristic that is
> entirely about ensuring correctness when faced with the fact that most
> Linux filesystems are utterly incapable of reporting with modifications
> that occur within < 1 second intervals because their mtime/ctime is
> limited to 1 second resolutions.
>
if truly it was for correctness, why not use "!=" instead of '>' ?
> Now, what are the conditions of your test setup? The above bug report is
> meaningless unless it includes a description of what is being exported
> by the server (including a proper listing of the contents
> of /etc/exports and /proc/mounts). It should also include a description
> of the NFS client mount options (see /proc/mounts on the client).
they are below, if you need additional information, please tell me again.
for server:
(nfsx-linux using rsh auto exportfs in cmd line, not in /etc/exports)
--------------------------------------------------------------------
root@dhcp122:~# exportfs
/tmp/fsx18251.testdir
<world>
/tmp <world>
root@dhcp122:~#
root@dhcp122:~# cat /etc/exports
# /etc/exports: the access control list for filesystems which may be
exported
# to NFS clients. See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes hostname1(rw,sync,no_subtree_check)
hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4 gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes gss/krb5i(rw,sync,no_subtree_check)
#
/tmp *(rw,sync,no_root_squash,no_subtree_check)
root@dhcp122:~#
root@dhcp122:~# cat /proc/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=1229628k,nr_inodes=189901,mode=755 0 0
devpts /dev/pts devpts
rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,relatime,size=516280k,mode=755 0 0
/dev/disk/by-uuid/e843c57e-98ce-44cc-8e02-6d8e8d8a01b6 / ext4
rw,relatime,errors=remount-ro,data=ordered 0 0
cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,relatime,perf_event 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
root@dhcp122:~#
-----------------------------------------------------------------------
for client:
-----------------------------------------------------------------------
root@dhcp159:/opt/ltp/testscripts# cat /proc/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=1103700k,nr_inodes=190392,mode=755 0 0
devpts /dev/pts devpts
rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,relatime,size=465908k,mode=755 0 0
/dev/disk/by-uuid/418ec1f1-ed9d-4cae-9336-6c742accf538 / ext4
rw,relatime,errors=remount-ro,data=ordered 0 0
cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,relatime,perf_event 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
/dev/sda1 /mnt/sda1 ext3
rw,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered 0 0
dhcp122.asianux.net:/tmp/fsx18251.testdir/
/opt/ltp/testcases/bin/fsx18251 nfs
rw,relatime,vers=2,rsize=8192,wsize=8192,namlen=255,hard,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.1.0.139,mountvers=1,mountport=39973,mountproto=udp,local_lock=none,addr=10.1.0.139
0 0
root@dhcp159:/opt/ltp/testscripts#
--
Chen Gang
Asianux Corporation
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
2012-10-15 4:52 ` Chen Gang
@ 2012-10-15 5:39 ` Chen Gang
2012-10-15 12:32 ` Myklebust, Trond
1 sibling, 0 replies; 12+ messages in thread
From: Chen Gang @ 2012-10-15 5:39 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Jeff Layton, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org
于 2012年10月15日 12:52, Chen Gang 写道:
>> Now, what are the conditions of your test setup? The above bug report is
>> > meaningless unless it includes a description of what is being exported
>> > by the server (including a proper listing of the contents
>> > of /etc/exports and /proc/mounts). It should also include a description
>> > of the NFS client mount options (see /proc/mounts on the client).
for exportfs command line is:
rsh -n $RHOST "/usr/sbin/exportfs -i -o no_root_squash,rw *:$TESTDIR"
$RHOST is dhcp122.asianux.net (10.1.0.122, not need input password)
$TESTDIR just the mount dir.
> they are below, if you need additional information, please tell me again.
>
--
Chen Gang
Asianux Corporation
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
2012-10-15 4:52 ` Chen Gang
2012-10-15 5:39 ` Chen Gang
@ 2012-10-15 12:32 ` Myklebust, Trond
2012-10-16 1:37 ` Chen Gang
1 sibling, 1 reply; 12+ messages in thread
From: Myklebust, Trond @ 2012-10-15 12:32 UTC (permalink / raw)
To: Chen Gang
Cc: Jeff Layton, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 5657 bytes --]
On Mon, 2012-10-15 at 12:52 +0800, Chen Gang wrote:
> äº 2012å¹´10æ15æ¥ 12:27, Myklebust, Trond åé:
> > nfs_size_need_update is not about performance. It is a heuristic that is
> > entirely about ensuring correctness when faced with the fact that most
> > Linux filesystems are utterly incapable of reporting with modifications
> > that occur within < 1 second intervals because their mtime/ctime is
> > limited to 1 second resolutions.
> >
>
> if truly it was for correctness, why not use "!=" instead of '>' ?
RPC is not ordered. The fact that we get one RPC reply before another
does not mean that the server sent them in that order.
This is doubly true when you use UDP as the transport protocol.
> > Now, what are the conditions of your test setup? The above bug report is
> > meaningless unless it includes a description of what is being exported
> > by the server (including a proper listing of the contents
> > of /etc/exports and /proc/mounts). It should also include a description
> > of the NFS client mount options (see /proc/mounts on the client).
>
> they are below, if you need additional information, please tell me again.
>
> for server:
> (nfsx-linux using rsh auto exportfs in cmd line, not in /etc/exports)
> --------------------------------------------------------------------
> root@dhcp122:~# exportfs
> /tmp/fsx18251.testdir
> <world>
> /tmp <world>
> root@dhcp122:~#
> root@dhcp122:~# cat /etc/exports
> # /etc/exports: the access control list for filesystems which may be
> exported
> # to NFS clients. See exports(5).
> #
> # Example for NFSv2 and NFSv3:
> # /srv/homes hostname1(rw,sync,no_subtree_check)
> hostname2(ro,sync,no_subtree_check)
> #
> # Example for NFSv4:
> # /srv/nfs4 gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
> # /srv/nfs4/homes gss/krb5i(rw,sync,no_subtree_check)
> #
> /tmp *(rw,sync,no_root_squash,no_subtree_check)
> root@dhcp122:~#
> root@dhcp122:~# cat /proc/mounts
> rootfs / rootfs rw 0 0
> sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
> proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
> udev /dev devtmpfs rw,relatime,size=1229628k,nr_inodes=189901,mode=755 0 0
> devpts /dev/pts devpts
> rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
> tmpfs /run tmpfs rw,nosuid,relatime,size=516280k,mode=755 0 0
> /dev/disk/by-uuid/e843c57e-98ce-44cc-8e02-6d8e8d8a01b6 / ext4
OK. So the export is part of this ext4 filesystem?
> rw,relatime,errors=remount-ro,data=ordered 0 0
> cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0
> cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0
> cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0
> cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
> cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0
> cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0
> cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0
> cgroup /sys/fs/cgroup/perf_event cgroup rw,relatime,perf_event 0 0
> none /sys/fs/fuse/connections fusectl rw,relatime 0 0
> none /sys/kernel/debug debugfs rw,relatime 0 0
> none /sys/kernel/security securityfs rw,relatime 0 0
> none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
> none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
> rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0
> nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
> root@dhcp122:~#
> -----------------------------------------------------------------------
>
> for client:
> -----------------------------------------------------------------------
>
> root@dhcp159:/opt/ltp/testscripts# cat /proc/mounts
> rootfs / rootfs rw 0 0
> sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
> proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
> udev /dev devtmpfs rw,relatime,size=1103700k,nr_inodes=190392,mode=755 0 0
> devpts /dev/pts devpts
> rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
> tmpfs /run tmpfs rw,nosuid,relatime,size=465908k,mode=755 0 0
> /dev/disk/by-uuid/418ec1f1-ed9d-4cae-9336-6c742accf538 / ext4
> rw,relatime,errors=remount-ro,data=ordered 0 0
> cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0
> cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0
> cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0
> cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
> cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0
> cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0
> cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0
> cgroup /sys/fs/cgroup/perf_event cgroup rw,relatime,perf_event 0 0
> none /sys/fs/fuse/connections fusectl rw,relatime 0 0
> none /sys/kernel/debug debugfs rw,relatime 0 0
> none /sys/kernel/security securityfs rw,relatime 0 0
> none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
> none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
> rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0
> nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
> /dev/sda1 /mnt/sda1 ext3
> rw,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered 0 0
> dhcp122.asianux.net:/tmp/fsx18251.testdir/
> /opt/ltp/testcases/bin/fsx18251 nfs
> rw,relatime,vers=2,rsize=8192,wsize=8192,namlen=255,hard,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.1.0.139,mountvers=1,mountport=39973,mountproto=udp,local_lock=none,addr=10.1.0.139
> 0 0
> root@dhcp159:/opt/ltp/testscripts#
...and you are using NFSv2 with UDP?
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
2012-10-15 12:32 ` Myklebust, Trond
@ 2012-10-16 1:37 ` Chen Gang
2012-10-16 2:51 ` Myklebust, Trond
0 siblings, 1 reply; 12+ messages in thread
From: Chen Gang @ 2012-10-16 1:37 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Jeff Layton, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org
于 2012年10月15日 20:32, Myklebust, Trond 写道:
> RPC is not ordered. The fact that we get one RPC reply before another
> does not mean that the server sent them in that order.
>
> This is doubly true when you use UDP as the transport protocol.
1) is it means: nfs_inode_attrs_need_update need not consider async
read_done situation ?
2) for correctness, I do not think "nfs_size_to_loff_t(fattr->size) >
i_size_read(inode)" in nfs_size_need_update is enough. (at least need
use "!=" instead of '>'), do you think so ?
3) another reference:
A) for an old kernel version (such as 2.6.27-rc9), no such issue
(because it did not have nfs_size_need_update).
B) the test tools which I use is from the LTP (Linux Test Project),
they use both udp and tcp to test both the nfsv2 and nfsv3.
C) truly LTP has its limitations: "for stress test, LTP let nfs client
and server under the same machine, which will cause kernel stable
issue", but for net test, LTP use different machine (I got our issue
from LTP net test).
--
Chen Gang
Asianux Corporation
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
2012-10-16 1:37 ` Chen Gang
@ 2012-10-16 2:51 ` Myklebust, Trond
2012-10-16 4:13 ` Chen Gang
0 siblings, 1 reply; 12+ messages in thread
From: Myklebust, Trond @ 2012-10-16 2:51 UTC (permalink / raw)
To: Chen Gang
Cc: Jeff Layton, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2084 bytes --]
On Tue, 2012-10-16 at 09:37 +0800, Chen Gang wrote:
> äº 2012å¹´10æ15æ¥ 20:32, Myklebust, Trond åé:
> > RPC is not ordered. The fact that we get one RPC reply before another
> > does not mean that the server sent them in that order.
> >
> > This is doubly true when you use UDP as the transport protocol.
>
> 1) is it means: nfs_inode_attrs_need_update need not consider async
> read_done situation ?
I don't understand what you mean. This is mainly about the asynchronous
write situation...
> 2) for correctness, I do not think "nfs_size_to_loff_t(fattr->size) >
> i_size_read(inode)" in nfs_size_need_update is enough. (at least need
> use "!=" instead of '>'), do you think so ?
No... If I did, I would have changed this 15 years ago when I was
writing that code. Nothing here is new... 2.6.27-rc9 has the exact same
heuristics.
It boils down to the rule that if you want to ensure that data is not
_lost_, then you have to ensure that the cached file size is not less
than the true file size.
> 3) another reference:
>
> A) for an old kernel version (such as 2.6.27-rc9), no such issue
> (because it did not have nfs_size_need_update).
>
> B) the test tools which I use is from the LTP (Linux Test Project),
> they use both udp and tcp to test both the nfsv2 and nfsv3.
So what combinations are failing?
> C) truly LTP has its limitations: "for stress test, LTP let nfs client
> and server under the same machine, which will cause kernel stable
> issue", but for net test, LTP use different machine (I got our issue
> from LTP net test).
Running the client and server on the same machine is likely to deadlock
due to memory pressure issues. The client needs to be able to _increase_
memory pressure on the server in order to reduce its own pressure. That
doesn't work well when client == server.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
2012-10-16 2:51 ` Myklebust, Trond
@ 2012-10-16 4:13 ` Chen Gang
2012-10-16 10:33 ` Jeff Layton
0 siblings, 1 reply; 12+ messages in thread
From: Chen Gang @ 2012-10-16 4:13 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Jeff Layton, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org
于 2012年10月16日 10:51, Myklebust, Trond 写道:
>>
>> 1) is it means: nfs_inode_attrs_need_update need not consider async
>> read_done situation ?
>
> I don't understand what you mean. This is mainly about the asynchronous
> write situation...
for async read done, it will call nfs_readpage_result -> nfs_read_done
-> nfs_refresh_inode -> nfs_refresh_inode_locked ->
nfs_inode_attrs_need_update -> nfs_size_need_update.
we need consider the situation that "async read_done also call
nfs_size_need_update with an old useless larger file size".
you means, it need not consider async read (only consider async write is
enough), is it correct ?
>
> No... If I did, I would have changed this 15 years ago when I was
> writing that code. Nothing here is new... 2.6.27-rc9 has the exact same
> heuristics.
1) I have read the relative source code of 2.6.27-rc9, it is truly no
nfs_size_need_update function.
2) I have test the 2.6.27-rc9, it truly pass the LTP test of udp+nfsv2.
3) I got the 2.6.27-rc9 source code by this way (please check)
A) get source code from (git clone)
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
B) git archive v2.6.27-rc9 | tar -xf - -C ../2.6.27-rc9/
> It boils down to the rule that if you want to ensure that data is not
> _lost_, then you have to ensure that the cached file size is not less
> than the true file size.
>
1) you means: in some condition, the cached file size can be bigger than
the true file size ? can you give some example (which no negative
effect for correctness) ?
2) What I feel:
A) I am not quite familiar with nfs (so truly need your information);
B) I think it is truly a bug, but maybe nfs_size_need_update is not
the root cause (so need nfs maintainers' audit)
C) if nfs_size_need_update is truly not the root cause, I shall
continue analysing it, after get enough information from nfs maintainers.
>> B) the test tools which I use is from the LTP (Linux Test Project),
>> they use both udp and tcp to test both the nfsv2 and nfsv3.
>
> So what combinations are failing?
for udp + nfsv2 failing (I am not test udp + nfsv3)
>
>> C) truly LTP has its limitations: "for stress test, LTP let nfs client
>> and server under the same machine, which will cause kernel stable
>> issue", but for net test, LTP use different machine (I got our issue
>> from LTP net test).
>
> Running the client and server on the same machine is likely to deadlock
> due to memory pressure issues. The client needs to be able to _increase_
> memory pressure on the server in order to reduce its own pressure. That
> doesn't work well when client == server.
>
truly got confirmation from Jeff Layton, 1-2 months ago;
also thank you for giving confirmation too.
--
Chen Gang
Asianux Corporation
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
2012-10-16 4:13 ` Chen Gang
@ 2012-10-16 10:33 ` Jeff Layton
2012-10-16 11:44 ` Chen Gang
0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2012-10-16 10:33 UTC (permalink / raw)
To: Chen Gang
Cc: Myklebust, Trond, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org
On Tue, 16 Oct 2012 12:13:38 +0800
Chen Gang <gang.chen@asianux.com> wrote:
> 于 2012年10月16日 10:51, Myklebust, Trond 写道:
>
> >>
> >> 1) is it means: nfs_inode_attrs_need_update need not consider async
> >> read_done situation ?
> >
> > I don't understand what you mean. This is mainly about the asynchronous
> > write situation...
>
> for async read done, it will call nfs_readpage_result -> nfs_read_done
> -> nfs_refresh_inode -> nfs_refresh_inode_locked ->
> nfs_inode_attrs_need_update -> nfs_size_need_update.
>
> we need consider the situation that "async read_done also call
> nfs_size_need_update with an old useless larger file size".
>
> you means, it need not consider async read (only consider async write is
> enough), is it correct ?
>
> >
> > No... If I did, I would have changed this 15 years ago when I was
> > writing that code. Nothing here is new... 2.6.27-rc9 has the exact same
> > heuristics.
>
> 1) I have read the relative source code of 2.6.27-rc9, it is truly no
> nfs_size_need_update function.
>
> 2) I have test the 2.6.27-rc9, it truly pass the LTP test of udp+nfsv2.
>
> 3) I got the 2.6.27-rc9 source code by this way (please check)
> A) get source code from (git clone)
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> B) git archive v2.6.27-rc9 | tar -xf - -C ../2.6.27-rc9/
>
>
> > It boils down to the rule that if you want to ensure that data is not
> > _lost_, then you have to ensure that the cached file size is not less
> > than the true file size.
> >
>
> 1) you means: in some condition, the cached file size can be bigger than
> the true file size ? can you give some example (which no negative
> effect for correctness) ?
>
> 2) What I feel:
> A) I am not quite familiar with nfs (so truly need your information);
> B) I think it is truly a bug, but maybe nfs_size_need_update is not
> the root cause (so need nfs maintainers' audit)
> C) if nfs_size_need_update is truly not the root cause, I shall
> continue analysing it, after get enough information from nfs maintainers.
>
>
> >> B) the test tools which I use is from the LTP (Linux Test Project),
> >> they use both udp and tcp to test both the nfsv2 and nfsv3.
> >
> > So what combinations are failing?
>
> for udp + nfsv2 failing (I am not test udp + nfsv3)
>
The problem is a little more fundamental than that. The attr cache
handling logic is some of the trickiest code to deal with in the NFS
client.
In any situation where we get back attributes, we have to decide
whether they are valid or stale. It's always possible for replies or
their handling to be reordered such that an older set of attributes
is processed after a newer set.
Unfortunately, the v2/v3 protocols do not have great support for
helping the client detect this situation, so we do the best we can with
what we do have. Unfortunately when things are changing very quickly we
can still get it wrong, especially with v2/3. [1]
In any case, the logic to determine this is in
nfs_inode_attrs_need_update(). Looking at the size is sort of the "last
resort" after we look at the timestamps and gencount.
The problem with doing what you suggest is that if we get it wrong, the
consequences are worse than the file appearing to be bigger than it is.
It means that written data may be silently lost.
======
[1]: v4 has a change attribute so it's slightly simpler there when the
server supports it. Unrelated Q for Trond: should we be checking the v4
change_attr in nfs_inode_attrs_need_update too?
--
Jeff Layton <jlayton@redhat.com>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
2012-10-16 10:33 ` Jeff Layton
@ 2012-10-16 11:44 ` Chen Gang
2012-10-16 12:13 ` Jeff Layton
0 siblings, 1 reply; 12+ messages in thread
From: Chen Gang @ 2012-10-16 11:44 UTC (permalink / raw)
To: Jeff Layton
Cc: Myklebust, Trond, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org
于 2012年10月16日 18:33, Jeff Layton 写道:
> In any situation where we get back attributes, we have to decide
> whether they are valid or stale. It's always possible for replies or
> their handling to be reordered such that an older set of attributes
> is processed after a newer set.
>
> Unfortunately, the v2/v3 protocols do not have great support for
> helping the client detect this situation, so we do the best we can with
> what we do have. Unfortunately when things are changing very quickly we
> can still get it wrong, especially with v2/3. [1]
>
> In any case, the logic to determine this is in
> nfs_inode_attrs_need_update(). Looking at the size is sort of the "last
> resort" after we look at the timestamps and gencount.
>
I agree with you (what I understood originally is just like what you
said above).
thank for your confirmation.
> The problem with doing what you suggest is that if we get it wrong, the
> consequences are worse than the file appearing to be bigger than it is.
> It means that written data may be silently lost.
>
1) I understand why my suggestion is not quite suitable. I agree you.
2) but, are we truly no ways to solve this issue ? (I do not think so).
3) I think an executable way (but maybe not a good way) is :
A) for each client, check each task id of the client its own (such as
rpc task xid), so can know the order of tasks of the client its own.
B) maybe also need another some synchronization code, but I think it
does not have much negative effect with performance.
> ======
>
> [1]: v4 has a change attribute so it's slightly simpler there when the
> server supports it. Unrelated Q for Trond: should we be checking the v4
> change_attr in nfs_inode_attrs_need_update too?
sorry for I am truly not quite familiar with nfs, I also think it is not
quite relative with current issue, so I have to skip it (although I
think these contents are valulable for Trond)
--
Chen Gang
Asianux Corporation
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
2012-10-16 11:44 ` Chen Gang
@ 2012-10-16 12:13 ` Jeff Layton
2012-10-17 1:37 ` Chen Gang
0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2012-10-16 12:13 UTC (permalink / raw)
To: Chen Gang
Cc: Myklebust, Trond, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org
On Tue, 16 Oct 2012 19:44:38 +0800
Chen Gang <gang.chen@asianux.com> wrote:
>
> 于 2012年10月16日 18:33, Jeff Layton 写道:
> > In any situation where we get back attributes, we have to decide
> > whether they are valid or stale. It's always possible for replies or
> > their handling to be reordered such that an older set of attributes
> > is processed after a newer set.
> >
> > Unfortunately, the v2/v3 protocols do not have great support for
> > helping the client detect this situation, so we do the best we can with
> > what we do have. Unfortunately when things are changing very quickly we
> > can still get it wrong, especially with v2/3. [1]
> >
> > In any case, the logic to determine this is in
> > nfs_inode_attrs_need_update(). Looking at the size is sort of the "last
> > resort" after we look at the timestamps and gencount.
> >
>
> I agree with you (what I understood originally is just like what you
> said above).
>
> thank for your confirmation.
>
> > The problem with doing what you suggest is that if we get it wrong, the
> > consequences are worse than the file appearing to be bigger than it is.
> > It means that written data may be silently lost.
> >
>
> 1) I understand why my suggestion is not quite suitable. I agree you.
>
> 2) but, are we truly no ways to solve this issue ? (I do not think so).
>
Not that I see, but don't let me stop you from trying to find one. ;)
> 3) I think an executable way (but maybe not a good way) is :
>
> A) for each client, check each task id of the client its own (such as
> rpc task xid), so can know the order of tasks of the client its own.
>
We do something like this already. That's what the gencount thing is
all about. It's still possible though to fool that check if two calls
are scheduled closely enough.
Also note that it's not just the reordering of replies that you have to
concern yourself with. The requests themselves can be reordered on the
network. The server is also under no obligation to execute calls in the
order received.
> B) maybe also need another some synchronization code, but I think it
> does not have much negative effect with performance.
>
Yeah, serializing things to fix this is probably a non-starter. NFSv2
and UDP transports are basically legacy code at this point, so there's
not a lot of incentive to do anything drastic here.
>
> > ======
> >
> > [1]: v4 has a change attribute so it's slightly simpler there when the
> > server supports it. Unrelated Q for Trond: should we be checking the v4
> > change_attr in nfs_inode_attrs_need_update too?
>
> sorry for I am truly not quite familiar with nfs, I also think it is not
> quite relative with current issue, so I have to skip it (although I
> think these contents are valulable for Trond)
>
Correct. That was just an aside question for Trond or someone else who
understands the attribute revalidation code better than I do.
--
Jeff Layton <jlayton@redhat.com>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
2012-10-16 12:13 ` Jeff Layton
@ 2012-10-17 1:37 ` Chen Gang
0 siblings, 0 replies; 12+ messages in thread
From: Chen Gang @ 2012-10-17 1:37 UTC (permalink / raw)
To: Jeff Layton
Cc: Myklebust, Trond, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org
于 2012年10月16日 20:13, Jeff Layton 写道:
>>
>> 2) but, are we truly no ways to solve this issue ? (I do not think so).
>>
>
> Not that I see, but don't let me stop you from trying to find one. ;)
>
we can divide the issue to 2 separate parts:
1) the inconsistent attribute by time delay between client and server:
A) it is the nfs v2/v3 design issue, the "user" can understand (not
implementation mistake)
B) we need make the time delay as shorter as we can. (this is the
reason why I call it "performance", although this "word" is still not
quit suitable)
C) "user" can understand, not mean can bear (such as skipping writing
operation attribute changes)
2) the inconsistent attribute by a client itself:
A) it is implementation issue, the "user" can not understand (it is
an implementation mistake)
B) we need solve it (so I call it "correctness" issue).
C) "user" can not understand, not mean can not bear (such as current
issue which I report)
at last, for maintainer:
A) for "performance", we need try our best to do;
B) for "correctness", we need fix it completely;
>> 3) I think an executable way (but maybe not a good way) is :
>>
>> A) for each client, check each task id of the client its own (such as
>> rpc task xid), so can know the order of tasks of the client its own.
>>
>
> We do something like this already. That's what the gencount thing is
> all about. It's still possible though to fool that check if two calls
> are scheduled closely enough.
>
1) I think gencount is not equal to sequence number, the sequence number
can mark all relative tasks of one client in order.
2) I also think, it is not quite complex to make a client itself in
consistency. (it is implementation issue, not design issue)
> Also note that it's not just the reordering of replies that you have to
> concern yourself with. The requests themselves can be reordered on the
> network. The server is also under no obligation to execute calls in the
> order received.
>
1) I agree with you, in nfs_inode_attrs_need_update(), it need consider
this situation (the tasks from server return is not in order).
2) I do not think it can not be accomplished if the tasks of client
itself have sequence number. (maybe, it would be enough to only judge
which task is later between the 2 tasks by sequence number).
>> B) maybe also need another some synchronization code, but I think it
>> does not have much negative effect with performance.
>>
>
> Yeah, serializing things to fix this is probably a non-starter. NFSv2
> and UDP transports are basically legacy code at this point, so there's
> not a lot of incentive to do anything drastic here.
>
1) I agree with what you said above, but maybe you misunderstand of what
I said for the "item B)"
2) the "item B)" is for the completion of "item A)". when we fix this
issue, maybe have to add additional synchronization code which maybe can
cause negative effect with performance, but I think it is not much
("user" can bear).
At last, I suggest we need think of how to fix this implementation bug
in nfs-client region.
--
Chen Gang
Asianux Corporation
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-10-17 1:36 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-15 2:12 [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size Chen Gang
2012-10-15 4:27 ` Myklebust, Trond
2012-10-15 4:52 ` Chen Gang
2012-10-15 5:39 ` Chen Gang
2012-10-15 12:32 ` Myklebust, Trond
2012-10-16 1:37 ` Chen Gang
2012-10-16 2:51 ` Myklebust, Trond
2012-10-16 4:13 ` Chen Gang
2012-10-16 10:33 ` Jeff Layton
2012-10-16 11:44 ` Chen Gang
2012-10-16 12:13 ` Jeff Layton
2012-10-17 1:37 ` Chen Gang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox