* [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size
@ 2012-10-15 2:12 Chen Gang
2012-10-15 4:27 ` Myklebust, Trond
0 siblings, 1 reply; 12+ messages in thread
From: Chen Gang @ 2012-10-15 2:12 UTC (permalink / raw)
To: Jeff Layton, Trond.Myklebust, linux-nfs, linux-kernel
Hello Trond Myklebust, Jeff Layton:
1) Root Cause:
A) begin truncate to smaller, after async read finish starting.
B) async read done come, after truncate operation change inode size.
C) in nfs_inode_attrs_need_update, nfs_size_need_update return true.
i) the bigger size is the original old size of client itself.
ii) the smaller size is the current true size.
iii) nfs_inode_attrs_need_update not consider this situation.
2) Fix nfs_size_need_update:
A) delete it:
i) it is for performance, not necessary (not for correctness).
ii) if it was necessary, it should use "!=" instead of '>'.
iii) it is the simplest way to fix this bug (maybe not best way).
B) consider this situation in it:
i) it is the best way.
ii) it is a little complex (need think of)
iii) sorry for I do not know how to fix it (at least now).
C) not touch it:
i) correct another place (such as nfs_update_inode)
ii) it is a bad idea (at least, I think it is)
iii) we need keep the source code as clearer as possible.
3) Test Result:
A) it is one client and one server separately, under 3.6-rc5 x86_32.
B) use one process (fsx-linux) test (only one user mode thread).
C) only use read, truncate, llseek, fstat operation for one file.
Before delete nfs_size_need_update, it causes issue.
After delete nfs_size_need_update, it is ok.
User Mode Log:
-------------------------------------------------------------------------
<<<test_start>>>
tag=nfsx-linux stime=1350202875
cmdline="export VERSION SOCKET_TYPE; TCbin=$LTPROOT/testcases/bin fsx.sh"
contacts=""
analysis=exit
<<<test_output>>>
Test Options:
VERSION: 2
RHOST: dhcp122.asianux.net
ITERATIONS: 50000
SOCKET_TYPE: udp
NFS_TYPE: nfs
Setting up remote machine: dhcp122.asianux.net
Mounting NFS filesystem dhcp122.asianux.net:/tmp/fsx1447.testdir on
/opt/ltp/testcases/bin/fsx1447 with options '-o proto=udp,vers=2 '
fsx-linux -N 50000 /opt/ltp/testcases/bin/fsx1447/testfile Starting
truncating to largest ever: 0x13e76
truncating to largest ever: 0x2e52c
truncating to largest ever: 0x3c2c2
truncating to largest ever: 0x3f15f
truncating to largest ever: 0x3fcb9
truncating to largest ever: 0x3fe96
truncating to largest ever: 0x3ff9d
Size error: expected 0x36ef9 stat 0x3bbca seek 0x36ef9
LOG DUMP (5652 total operations):
...
5636: 1350203089.781599 READ 0x143b6 thru 0x21ccb (0xd916 bytes)
5637: 1350203090.028214 READ 0x2a629 thru 0x2d0a1 (0x2a79 bytes)
5638: 1350203090.072029 TRUNCATE DOWN from 0x2d0a2 to 0x1bb35
5639: 1350203090.087401 READ 0x11a05 thru 0x1bb34 (0xa130 bytes)
5640: 1350203090.223985 READ 0x508c thru 0xa9da (0x594f bytes)
5641: 1350203090.245717 TRUNCATE DOWN from 0x1bb35 to 0x8830
5642: 1350203090.353502 READ 0x548f thru 0x882f (0x33a1 bytes)
5643: 1350203090.366596 READ 0x5802 thru 0x882f (0x302e bytes)
5644: 1350203090.366629 TRUNCATE UP from 0x8830 to 0x20011
5645: 1350203090.379476 TRUNCATE DOWN from 0x20011 to 0x134f4
5646: 1350203090.396234 READ 0x124a0 thru 0x134f3 (0x1054 bytes)
5647: 1350203090.401805 READ 0x880b thru 0x1189d (0x9093 bytes)
5648: 1350203090.532050 READ 0x134c7 thru 0x134f3 (0x2d bytes)
5649: 1350203090.532057 TRUNCATE UP from 0x134f4 to 0x3bbca
5650: 1350203090.546373 READ 0x2944c thru 0x2c1d6 (0x2d8b bytes)
5651: 1350203090.561228 READ 0xdbe1 thru 0x16260 (0x8680 bytes)
5652: 1350203090.751937 TRUNCATE DOWN from 0x3bbca to 0x36ef9
Correct content saved for comparison
(maybe hexdump "/opt/ltp/testcases/bin/fsx1447/testfile" vs
"/opt/ltp/testcases/bin/fsx1447/testfile.fsxgood")
fsx-linux -N 50000 /opt/ltp/testcases/bin/fsx1447/testfile Finished
Cleaning up testcase
Unmounting /opt/ltp/testcases/bin/fsx1447
Test Failed: Errors have resulted from this test
incrementing stop
<<<execution_status>>>
initiation_status="ok"
duration=218 termination_type=exited termination_id=1 corefile=no
cutime=43 cstime=82
<<<test_end>>>
-------------------------------------------------------------------------
Kernel Mode Log: (using printk which I add)
-------------------------------------------------------------------------
Time: My Mark: Task ptr: comments (include function name):
[ 280.883701] gchen_tag: f5c30000, nfs_read_done call
nfs_refresh_inode, cur=0x3bbca, new=0x3bbca
[ 280.890677] gchen_tag: f5c30000, nfs_read_done call
nfs_refresh_inode, cur=0x3bbca, new=0x3bbca
[ 280.897437] gchen_tag: f5c30000, nfs_read_done call
nfs_refresh_inode, cur=0x3bbca, new=0x3bbca
[ 280.897441] gchen_tag: f5e48c90, nfs_setattr_update_inode, cur=3bbca,
new=36ef9
[ 280.897450] gchen_tag: f5e48c90, nfs_setattr
[ 280.897462] gchen_tag: hit, f5c30000, nfs_refresh_inode_locked,
cur=36ef9, new=3bbca
[ 280.897469] gchen_tag: f5c30000, nfs_update_inode, change size,
cur=36ef9, new=3bbca
[ 280.898129] gchen_tag: f5e48c90, nfs_update_inode, change size,
cur=3bbca, new=36ef9
[ 280.977915] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=2000, pages=55
[ 281.019879] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=a000, pages=53
[ 281.070325] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=e000, pages=45
[ 281.087103] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=12000, pages=43
[ 281.129061] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=16000, pages=41
[ 281.163012] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=18000, pages=37
[ 281.213481] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=1c000, pages=33
[ 281.255727] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=22000, pages=31
[ 281.306177] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=24000, pages=27
[ 281.356888] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=2e000, pages=21
[ 281.398859] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=36ef9, new=36000, pages=19
[ 281.585491] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=104d8, new=2000, pages=17
[ 281.644207] gchen_tag: f5c30000, nfs_update_inode, not change size,
cur=104d8, new=10000, pages=15
-------------------------------------------------------------------------
Thanks.
--
Chen Gang
Asianux Corporation
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size 2012-10-15 2:12 [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size Chen Gang @ 2012-10-15 4:27 ` Myklebust, Trond 2012-10-15 4:52 ` Chen Gang 0 siblings, 1 reply; 12+ messages in thread From: Myklebust, Trond @ 2012-10-15 4:27 UTC (permalink / raw) To: Chen Gang Cc: Jeff Layton, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2456 bytes --] On Mon, 2012-10-15 at 10:12 +0800, Chen Gang wrote: > Hello Trond Myklebust, Jeff Layton: > > 1) Root Cause: > A) begin truncate to smaller, after async read finish starting. > B) async read done come, after truncate operation change inode size. > C) in nfs_inode_attrs_need_update, nfs_size_need_update return true. > i) the bigger size is the original old size of client itself. > ii) the smaller size is the current true size. > iii) nfs_inode_attrs_need_update not consider this situation. > > 2) Fix nfs_size_need_update: > A) delete it: > i) it is for performance, not necessary (not for correctness). > ii) if it was necessary, it should use "!=" instead of '>'. > iii) it is the simplest way to fix this bug (maybe not best way). > B) consider this situation in it: > i) it is the best way. > ii) it is a little complex (need think of) > iii) sorry for I do not know how to fix it (at least now). > C) not touch it: > i) correct another place (such as nfs_update_inode) > ii) it is a bad idea (at least, I think it is) > iii) we need keep the source code as clearer as possible. > > 3) Test Result: > A) it is one client and one server separately, under 3.6-rc5 x86_32. > B) use one process (fsx-linux) test (only one user mode thread). > C) only use read, truncate, llseek, fstat operation for one file. > > Before delete nfs_size_need_update, it causes issue. > After delete nfs_size_need_update, it is ok. nfs_size_need_update is not about performance. It is a heuristic that is entirely about ensuring correctness when faced with the fact that most Linux filesystems are utterly incapable of reporting with modifications that occur within < 1 second intervals because their mtime/ctime is limited to 1 second resolutions. Now, what are the conditions of your test setup? The above bug report is meaningless unless it includes a description of what is being exported by the server (including a proper listing of the contents of /etc/exports and /proc/mounts). It should also include a description of the NFS client mount options (see /proc/mounts on the client). -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size 2012-10-15 4:27 ` Myklebust, Trond @ 2012-10-15 4:52 ` Chen Gang 2012-10-15 5:39 ` Chen Gang 2012-10-15 12:32 ` Myklebust, Trond 0 siblings, 2 replies; 12+ messages in thread From: Chen Gang @ 2012-10-15 4:52 UTC (permalink / raw) To: Myklebust, Trond Cc: Jeff Layton, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org 于 2012年10月15日 12:27, Myklebust, Trond 写道: > nfs_size_need_update is not about performance. It is a heuristic that is > entirely about ensuring correctness when faced with the fact that most > Linux filesystems are utterly incapable of reporting with modifications > that occur within < 1 second intervals because their mtime/ctime is > limited to 1 second resolutions. > if truly it was for correctness, why not use "!=" instead of '>' ? > Now, what are the conditions of your test setup? The above bug report is > meaningless unless it includes a description of what is being exported > by the server (including a proper listing of the contents > of /etc/exports and /proc/mounts). It should also include a description > of the NFS client mount options (see /proc/mounts on the client). they are below, if you need additional information, please tell me again. for server: (nfsx-linux using rsh auto exportfs in cmd line, not in /etc/exports) -------------------------------------------------------------------- root@dhcp122:~# exportfs /tmp/fsx18251.testdir <world> /tmp <world> root@dhcp122:~# root@dhcp122:~# cat /etc/exports # /etc/exports: the access control list for filesystems which may be exported # to NFS clients. See exports(5). # # Example for NFSv2 and NFSv3: # /srv/homes hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check) # # Example for NFSv4: # /srv/nfs4 gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check) # /srv/nfs4/homes gss/krb5i(rw,sync,no_subtree_check) # /tmp *(rw,sync,no_root_squash,no_subtree_check) root@dhcp122:~# root@dhcp122:~# cat /proc/mounts rootfs / rootfs rw 0 0 sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0 proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0 udev /dev devtmpfs rw,relatime,size=1229628k,nr_inodes=189901,mode=755 0 0 devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0 tmpfs /run tmpfs rw,nosuid,relatime,size=516280k,mode=755 0 0 /dev/disk/by-uuid/e843c57e-98ce-44cc-8e02-6d8e8d8a01b6 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0 cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0 cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0 cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0 cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0 cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0 cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0 cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0 cgroup /sys/fs/cgroup/perf_event cgroup rw,relatime,perf_event 0 0 none /sys/fs/fuse/connections fusectl rw,relatime 0 0 none /sys/kernel/debug debugfs rw,relatime 0 0 none /sys/kernel/security securityfs rw,relatime 0 0 none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0 none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0 rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0 nfsd /proc/fs/nfsd nfsd rw,relatime 0 0 root@dhcp122:~# ----------------------------------------------------------------------- for client: ----------------------------------------------------------------------- root@dhcp159:/opt/ltp/testscripts# cat /proc/mounts rootfs / rootfs rw 0 0 sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0 proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0 udev /dev devtmpfs rw,relatime,size=1103700k,nr_inodes=190392,mode=755 0 0 devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0 tmpfs /run tmpfs rw,nosuid,relatime,size=465908k,mode=755 0 0 /dev/disk/by-uuid/418ec1f1-ed9d-4cae-9336-6c742accf538 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0 cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0 cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0 cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0 cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0 cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0 cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0 cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0 cgroup /sys/fs/cgroup/perf_event cgroup rw,relatime,perf_event 0 0 none /sys/fs/fuse/connections fusectl rw,relatime 0 0 none /sys/kernel/debug debugfs rw,relatime 0 0 none /sys/kernel/security securityfs rw,relatime 0 0 none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0 none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0 rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0 nfsd /proc/fs/nfsd nfsd rw,relatime 0 0 /dev/sda1 /mnt/sda1 ext3 rw,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered 0 0 dhcp122.asianux.net:/tmp/fsx18251.testdir/ /opt/ltp/testcases/bin/fsx18251 nfs rw,relatime,vers=2,rsize=8192,wsize=8192,namlen=255,hard,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.1.0.139,mountvers=1,mountport=39973,mountproto=udp,local_lock=none,addr=10.1.0.139 0 0 root@dhcp159:/opt/ltp/testscripts# -- Chen Gang Asianux Corporation ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size 2012-10-15 4:52 ` Chen Gang @ 2012-10-15 5:39 ` Chen Gang 2012-10-15 12:32 ` Myklebust, Trond 1 sibling, 0 replies; 12+ messages in thread From: Chen Gang @ 2012-10-15 5:39 UTC (permalink / raw) To: Myklebust, Trond Cc: Jeff Layton, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org 于 2012年10月15日 12:52, Chen Gang 写道: >> Now, what are the conditions of your test setup? The above bug report is >> > meaningless unless it includes a description of what is being exported >> > by the server (including a proper listing of the contents >> > of /etc/exports and /proc/mounts). It should also include a description >> > of the NFS client mount options (see /proc/mounts on the client). for exportfs command line is: rsh -n $RHOST "/usr/sbin/exportfs -i -o no_root_squash,rw *:$TESTDIR" $RHOST is dhcp122.asianux.net (10.1.0.122, not need input password) $TESTDIR just the mount dir. > they are below, if you need additional information, please tell me again. > -- Chen Gang Asianux Corporation ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size 2012-10-15 4:52 ` Chen Gang 2012-10-15 5:39 ` Chen Gang @ 2012-10-15 12:32 ` Myklebust, Trond 2012-10-16 1:37 ` Chen Gang 1 sibling, 1 reply; 12+ messages in thread From: Myklebust, Trond @ 2012-10-15 12:32 UTC (permalink / raw) To: Chen Gang Cc: Jeff Layton, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 5657 bytes --] On Mon, 2012-10-15 at 12:52 +0800, Chen Gang wrote: > äº 2012å¹´10æ15æ¥ 12:27, Myklebust, Trond åé: > > nfs_size_need_update is not about performance. It is a heuristic that is > > entirely about ensuring correctness when faced with the fact that most > > Linux filesystems are utterly incapable of reporting with modifications > > that occur within < 1 second intervals because their mtime/ctime is > > limited to 1 second resolutions. > > > > if truly it was for correctness, why not use "!=" instead of '>' ? RPC is not ordered. The fact that we get one RPC reply before another does not mean that the server sent them in that order. This is doubly true when you use UDP as the transport protocol. > > Now, what are the conditions of your test setup? The above bug report is > > meaningless unless it includes a description of what is being exported > > by the server (including a proper listing of the contents > > of /etc/exports and /proc/mounts). It should also include a description > > of the NFS client mount options (see /proc/mounts on the client). > > they are below, if you need additional information, please tell me again. > > for server: > (nfsx-linux using rsh auto exportfs in cmd line, not in /etc/exports) > -------------------------------------------------------------------- > root@dhcp122:~# exportfs > /tmp/fsx18251.testdir > <world> > /tmp <world> > root@dhcp122:~# > root@dhcp122:~# cat /etc/exports > # /etc/exports: the access control list for filesystems which may be > exported > # to NFS clients. See exports(5). > # > # Example for NFSv2 and NFSv3: > # /srv/homes hostname1(rw,sync,no_subtree_check) > hostname2(ro,sync,no_subtree_check) > # > # Example for NFSv4: > # /srv/nfs4 gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check) > # /srv/nfs4/homes gss/krb5i(rw,sync,no_subtree_check) > # > /tmp *(rw,sync,no_root_squash,no_subtree_check) > root@dhcp122:~# > root@dhcp122:~# cat /proc/mounts > rootfs / rootfs rw 0 0 > sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0 > proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0 > udev /dev devtmpfs rw,relatime,size=1229628k,nr_inodes=189901,mode=755 0 0 > devpts /dev/pts devpts > rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0 > tmpfs /run tmpfs rw,nosuid,relatime,size=516280k,mode=755 0 0 > /dev/disk/by-uuid/e843c57e-98ce-44cc-8e02-6d8e8d8a01b6 / ext4 OK. So the export is part of this ext4 filesystem? > rw,relatime,errors=remount-ro,data=ordered 0 0 > cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0 > cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0 > cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0 > cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0 > cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0 > cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0 > cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0 > cgroup /sys/fs/cgroup/perf_event cgroup rw,relatime,perf_event 0 0 > none /sys/fs/fuse/connections fusectl rw,relatime 0 0 > none /sys/kernel/debug debugfs rw,relatime 0 0 > none /sys/kernel/security securityfs rw,relatime 0 0 > none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0 > none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0 > rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0 > nfsd /proc/fs/nfsd nfsd rw,relatime 0 0 > root@dhcp122:~# > ----------------------------------------------------------------------- > > for client: > ----------------------------------------------------------------------- > > root@dhcp159:/opt/ltp/testscripts# cat /proc/mounts > rootfs / rootfs rw 0 0 > sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0 > proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0 > udev /dev devtmpfs rw,relatime,size=1103700k,nr_inodes=190392,mode=755 0 0 > devpts /dev/pts devpts > rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0 > tmpfs /run tmpfs rw,nosuid,relatime,size=465908k,mode=755 0 0 > /dev/disk/by-uuid/418ec1f1-ed9d-4cae-9336-6c742accf538 / ext4 > rw,relatime,errors=remount-ro,data=ordered 0 0 > cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0 > cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0 > cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0 > cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0 > cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0 > cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0 > cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0 > cgroup /sys/fs/cgroup/perf_event cgroup rw,relatime,perf_event 0 0 > none /sys/fs/fuse/connections fusectl rw,relatime 0 0 > none /sys/kernel/debug debugfs rw,relatime 0 0 > none /sys/kernel/security securityfs rw,relatime 0 0 > none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0 > none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0 > rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0 > nfsd /proc/fs/nfsd nfsd rw,relatime 0 0 > /dev/sda1 /mnt/sda1 ext3 > rw,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered 0 0 > dhcp122.asianux.net:/tmp/fsx18251.testdir/ > /opt/ltp/testcases/bin/fsx18251 nfs > rw,relatime,vers=2,rsize=8192,wsize=8192,namlen=255,hard,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.1.0.139,mountvers=1,mountport=39973,mountproto=udp,local_lock=none,addr=10.1.0.139 > 0 0 > root@dhcp159:/opt/ltp/testscripts# ...and you are using NFSv2 with UDP? -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size 2012-10-15 12:32 ` Myklebust, Trond @ 2012-10-16 1:37 ` Chen Gang 2012-10-16 2:51 ` Myklebust, Trond 0 siblings, 1 reply; 12+ messages in thread From: Chen Gang @ 2012-10-16 1:37 UTC (permalink / raw) To: Myklebust, Trond Cc: Jeff Layton, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org 于 2012年10月15日 20:32, Myklebust, Trond 写道: > RPC is not ordered. The fact that we get one RPC reply before another > does not mean that the server sent them in that order. > > This is doubly true when you use UDP as the transport protocol. 1) is it means: nfs_inode_attrs_need_update need not consider async read_done situation ? 2) for correctness, I do not think "nfs_size_to_loff_t(fattr->size) > i_size_read(inode)" in nfs_size_need_update is enough. (at least need use "!=" instead of '>'), do you think so ? 3) another reference: A) for an old kernel version (such as 2.6.27-rc9), no such issue (because it did not have nfs_size_need_update). B) the test tools which I use is from the LTP (Linux Test Project), they use both udp and tcp to test both the nfsv2 and nfsv3. C) truly LTP has its limitations: "for stress test, LTP let nfs client and server under the same machine, which will cause kernel stable issue", but for net test, LTP use different machine (I got our issue from LTP net test). -- Chen Gang Asianux Corporation ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size 2012-10-16 1:37 ` Chen Gang @ 2012-10-16 2:51 ` Myklebust, Trond 2012-10-16 4:13 ` Chen Gang 0 siblings, 1 reply; 12+ messages in thread From: Myklebust, Trond @ 2012-10-16 2:51 UTC (permalink / raw) To: Chen Gang Cc: Jeff Layton, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2084 bytes --] On Tue, 2012-10-16 at 09:37 +0800, Chen Gang wrote: > äº 2012å¹´10æ15æ¥ 20:32, Myklebust, Trond åé: > > RPC is not ordered. The fact that we get one RPC reply before another > > does not mean that the server sent them in that order. > > > > This is doubly true when you use UDP as the transport protocol. > > 1) is it means: nfs_inode_attrs_need_update need not consider async > read_done situation ? I don't understand what you mean. This is mainly about the asynchronous write situation... > 2) for correctness, I do not think "nfs_size_to_loff_t(fattr->size) > > i_size_read(inode)" in nfs_size_need_update is enough. (at least need > use "!=" instead of '>'), do you think so ? No... If I did, I would have changed this 15 years ago when I was writing that code. Nothing here is new... 2.6.27-rc9 has the exact same heuristics. It boils down to the rule that if you want to ensure that data is not _lost_, then you have to ensure that the cached file size is not less than the true file size. > 3) another reference: > > A) for an old kernel version (such as 2.6.27-rc9), no such issue > (because it did not have nfs_size_need_update). > > B) the test tools which I use is from the LTP (Linux Test Project), > they use both udp and tcp to test both the nfsv2 and nfsv3. So what combinations are failing? > C) truly LTP has its limitations: "for stress test, LTP let nfs client > and server under the same machine, which will cause kernel stable > issue", but for net test, LTP use different machine (I got our issue > from LTP net test). Running the client and server on the same machine is likely to deadlock due to memory pressure issues. The client needs to be able to _increase_ memory pressure on the server in order to reduce its own pressure. That doesn't work well when client == server. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size 2012-10-16 2:51 ` Myklebust, Trond @ 2012-10-16 4:13 ` Chen Gang 2012-10-16 10:33 ` Jeff Layton 0 siblings, 1 reply; 12+ messages in thread From: Chen Gang @ 2012-10-16 4:13 UTC (permalink / raw) To: Myklebust, Trond Cc: Jeff Layton, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org 于 2012年10月16日 10:51, Myklebust, Trond 写道: >> >> 1) is it means: nfs_inode_attrs_need_update need not consider async >> read_done situation ? > > I don't understand what you mean. This is mainly about the asynchronous > write situation... for async read done, it will call nfs_readpage_result -> nfs_read_done -> nfs_refresh_inode -> nfs_refresh_inode_locked -> nfs_inode_attrs_need_update -> nfs_size_need_update. we need consider the situation that "async read_done also call nfs_size_need_update with an old useless larger file size". you means, it need not consider async read (only consider async write is enough), is it correct ? > > No... If I did, I would have changed this 15 years ago when I was > writing that code. Nothing here is new... 2.6.27-rc9 has the exact same > heuristics. 1) I have read the relative source code of 2.6.27-rc9, it is truly no nfs_size_need_update function. 2) I have test the 2.6.27-rc9, it truly pass the LTP test of udp+nfsv2. 3) I got the 2.6.27-rc9 source code by this way (please check) A) get source code from (git clone) git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git B) git archive v2.6.27-rc9 | tar -xf - -C ../2.6.27-rc9/ > It boils down to the rule that if you want to ensure that data is not > _lost_, then you have to ensure that the cached file size is not less > than the true file size. > 1) you means: in some condition, the cached file size can be bigger than the true file size ? can you give some example (which no negative effect for correctness) ? 2) What I feel: A) I am not quite familiar with nfs (so truly need your information); B) I think it is truly a bug, but maybe nfs_size_need_update is not the root cause (so need nfs maintainers' audit) C) if nfs_size_need_update is truly not the root cause, I shall continue analysing it, after get enough information from nfs maintainers. >> B) the test tools which I use is from the LTP (Linux Test Project), >> they use both udp and tcp to test both the nfsv2 and nfsv3. > > So what combinations are failing? for udp + nfsv2 failing (I am not test udp + nfsv3) > >> C) truly LTP has its limitations: "for stress test, LTP let nfs client >> and server under the same machine, which will cause kernel stable >> issue", but for net test, LTP use different machine (I got our issue >> from LTP net test). > > Running the client and server on the same machine is likely to deadlock > due to memory pressure issues. The client needs to be able to _increase_ > memory pressure on the server in order to reduce its own pressure. That > doesn't work well when client == server. > truly got confirmation from Jeff Layton, 1-2 months ago; also thank you for giving confirmation too. -- Chen Gang Asianux Corporation ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size 2012-10-16 4:13 ` Chen Gang @ 2012-10-16 10:33 ` Jeff Layton 2012-10-16 11:44 ` Chen Gang 0 siblings, 1 reply; 12+ messages in thread From: Jeff Layton @ 2012-10-16 10:33 UTC (permalink / raw) To: Chen Gang Cc: Myklebust, Trond, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Tue, 16 Oct 2012 12:13:38 +0800 Chen Gang <gang.chen@asianux.com> wrote: > 于 2012年10月16日 10:51, Myklebust, Trond 写道: > > >> > >> 1) is it means: nfs_inode_attrs_need_update need not consider async > >> read_done situation ? > > > > I don't understand what you mean. This is mainly about the asynchronous > > write situation... > > for async read done, it will call nfs_readpage_result -> nfs_read_done > -> nfs_refresh_inode -> nfs_refresh_inode_locked -> > nfs_inode_attrs_need_update -> nfs_size_need_update. > > we need consider the situation that "async read_done also call > nfs_size_need_update with an old useless larger file size". > > you means, it need not consider async read (only consider async write is > enough), is it correct ? > > > > > No... If I did, I would have changed this 15 years ago when I was > > writing that code. Nothing here is new... 2.6.27-rc9 has the exact same > > heuristics. > > 1) I have read the relative source code of 2.6.27-rc9, it is truly no > nfs_size_need_update function. > > 2) I have test the 2.6.27-rc9, it truly pass the LTP test of udp+nfsv2. > > 3) I got the 2.6.27-rc9 source code by this way (please check) > A) get source code from (git clone) > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git > B) git archive v2.6.27-rc9 | tar -xf - -C ../2.6.27-rc9/ > > > > It boils down to the rule that if you want to ensure that data is not > > _lost_, then you have to ensure that the cached file size is not less > > than the true file size. > > > > 1) you means: in some condition, the cached file size can be bigger than > the true file size ? can you give some example (which no negative > effect for correctness) ? > > 2) What I feel: > A) I am not quite familiar with nfs (so truly need your information); > B) I think it is truly a bug, but maybe nfs_size_need_update is not > the root cause (so need nfs maintainers' audit) > C) if nfs_size_need_update is truly not the root cause, I shall > continue analysing it, after get enough information from nfs maintainers. > > > >> B) the test tools which I use is from the LTP (Linux Test Project), > >> they use both udp and tcp to test both the nfsv2 and nfsv3. > > > > So what combinations are failing? > > for udp + nfsv2 failing (I am not test udp + nfsv3) > The problem is a little more fundamental than that. The attr cache handling logic is some of the trickiest code to deal with in the NFS client. In any situation where we get back attributes, we have to decide whether they are valid or stale. It's always possible for replies or their handling to be reordered such that an older set of attributes is processed after a newer set. Unfortunately, the v2/v3 protocols do not have great support for helping the client detect this situation, so we do the best we can with what we do have. Unfortunately when things are changing very quickly we can still get it wrong, especially with v2/3. [1] In any case, the logic to determine this is in nfs_inode_attrs_need_update(). Looking at the size is sort of the "last resort" after we look at the timestamps and gencount. The problem with doing what you suggest is that if we get it wrong, the consequences are worse than the file appearing to be bigger than it is. It means that written data may be silently lost. ====== [1]: v4 has a change attribute so it's slightly simpler there when the server supports it. Unrelated Q for Trond: should we be checking the v4 change_attr in nfs_inode_attrs_need_update too? -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size 2012-10-16 10:33 ` Jeff Layton @ 2012-10-16 11:44 ` Chen Gang 2012-10-16 12:13 ` Jeff Layton 0 siblings, 1 reply; 12+ messages in thread From: Chen Gang @ 2012-10-16 11:44 UTC (permalink / raw) To: Jeff Layton Cc: Myklebust, Trond, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org 于 2012年10月16日 18:33, Jeff Layton 写道: > In any situation where we get back attributes, we have to decide > whether they are valid or stale. It's always possible for replies or > their handling to be reordered such that an older set of attributes > is processed after a newer set. > > Unfortunately, the v2/v3 protocols do not have great support for > helping the client detect this situation, so we do the best we can with > what we do have. Unfortunately when things are changing very quickly we > can still get it wrong, especially with v2/3. [1] > > In any case, the logic to determine this is in > nfs_inode_attrs_need_update(). Looking at the size is sort of the "last > resort" after we look at the timestamps and gencount. > I agree with you (what I understood originally is just like what you said above). thank for your confirmation. > The problem with doing what you suggest is that if we get it wrong, the > consequences are worse than the file appearing to be bigger than it is. > It means that written data may be silently lost. > 1) I understand why my suggestion is not quite suitable. I agree you. 2) but, are we truly no ways to solve this issue ? (I do not think so). 3) I think an executable way (but maybe not a good way) is : A) for each client, check each task id of the client its own (such as rpc task xid), so can know the order of tasks of the client its own. B) maybe also need another some synchronization code, but I think it does not have much negative effect with performance. > ====== > > [1]: v4 has a change attribute so it's slightly simpler there when the > server supports it. Unrelated Q for Trond: should we be checking the v4 > change_attr in nfs_inode_attrs_need_update too? sorry for I am truly not quite familiar with nfs, I also think it is not quite relative with current issue, so I have to skip it (although I think these contents are valulable for Trond) -- Chen Gang Asianux Corporation ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size 2012-10-16 11:44 ` Chen Gang @ 2012-10-16 12:13 ` Jeff Layton 2012-10-17 1:37 ` Chen Gang 0 siblings, 1 reply; 12+ messages in thread From: Jeff Layton @ 2012-10-16 12:13 UTC (permalink / raw) To: Chen Gang Cc: Myklebust, Trond, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Tue, 16 Oct 2012 19:44:38 +0800 Chen Gang <gang.chen@asianux.com> wrote: > > 于 2012年10月16日 18:33, Jeff Layton 写道: > > In any situation where we get back attributes, we have to decide > > whether they are valid or stale. It's always possible for replies or > > their handling to be reordered such that an older set of attributes > > is processed after a newer set. > > > > Unfortunately, the v2/v3 protocols do not have great support for > > helping the client detect this situation, so we do the best we can with > > what we do have. Unfortunately when things are changing very quickly we > > can still get it wrong, especially with v2/3. [1] > > > > In any case, the logic to determine this is in > > nfs_inode_attrs_need_update(). Looking at the size is sort of the "last > > resort" after we look at the timestamps and gencount. > > > > I agree with you (what I understood originally is just like what you > said above). > > thank for your confirmation. > > > The problem with doing what you suggest is that if we get it wrong, the > > consequences are worse than the file appearing to be bigger than it is. > > It means that written data may be silently lost. > > > > 1) I understand why my suggestion is not quite suitable. I agree you. > > 2) but, are we truly no ways to solve this issue ? (I do not think so). > Not that I see, but don't let me stop you from trying to find one. ;) > 3) I think an executable way (but maybe not a good way) is : > > A) for each client, check each task id of the client its own (such as > rpc task xid), so can know the order of tasks of the client its own. > We do something like this already. That's what the gencount thing is all about. It's still possible though to fool that check if two calls are scheduled closely enough. Also note that it's not just the reordering of replies that you have to concern yourself with. The requests themselves can be reordered on the network. The server is also under no obligation to execute calls in the order received. > B) maybe also need another some synchronization code, but I think it > does not have much negative effect with performance. > Yeah, serializing things to fix this is probably a non-starter. NFSv2 and UDP transports are basically legacy code at this point, so there's not a lot of incentive to do anything drastic here. > > > ====== > > > > [1]: v4 has a change attribute so it's slightly simpler there when the > > server supports it. Unrelated Q for Trond: should we be checking the v4 > > change_attr in nfs_inode_attrs_need_update too? > > sorry for I am truly not quite familiar with nfs, I also think it is not > quite relative with current issue, so I have to skip it (although I > think these contents are valulable for Trond) > Correct. That was just an aside question for Trond or someone else who understands the attribute revalidation code better than I do. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size 2012-10-16 12:13 ` Jeff Layton @ 2012-10-17 1:37 ` Chen Gang 0 siblings, 0 replies; 12+ messages in thread From: Chen Gang @ 2012-10-17 1:37 UTC (permalink / raw) To: Jeff Layton Cc: Myklebust, Trond, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org 于 2012年10月16日 20:13, Jeff Layton 写道: >> >> 2) but, are we truly no ways to solve this issue ? (I do not think so). >> > > Not that I see, but don't let me stop you from trying to find one. ;) > we can divide the issue to 2 separate parts: 1) the inconsistent attribute by time delay between client and server: A) it is the nfs v2/v3 design issue, the "user" can understand (not implementation mistake) B) we need make the time delay as shorter as we can. (this is the reason why I call it "performance", although this "word" is still not quit suitable) C) "user" can understand, not mean can bear (such as skipping writing operation attribute changes) 2) the inconsistent attribute by a client itself: A) it is implementation issue, the "user" can not understand (it is an implementation mistake) B) we need solve it (so I call it "correctness" issue). C) "user" can not understand, not mean can not bear (such as current issue which I report) at last, for maintainer: A) for "performance", we need try our best to do; B) for "correctness", we need fix it completely; >> 3) I think an executable way (but maybe not a good way) is : >> >> A) for each client, check each task id of the client its own (such as >> rpc task xid), so can know the order of tasks of the client its own. >> > > We do something like this already. That's what the gencount thing is > all about. It's still possible though to fool that check if two calls > are scheduled closely enough. > 1) I think gencount is not equal to sequence number, the sequence number can mark all relative tasks of one client in order. 2) I also think, it is not quite complex to make a client itself in consistency. (it is implementation issue, not design issue) > Also note that it's not just the reordering of replies that you have to > concern yourself with. The requests themselves can be reordered on the > network. The server is also under no obligation to execute calls in the > order received. > 1) I agree with you, in nfs_inode_attrs_need_update(), it need consider this situation (the tasks from server return is not in order). 2) I do not think it can not be accomplished if the tasks of client itself have sequence number. (maybe, it would be enough to only judge which task is later between the 2 tasks by sequence number). >> B) maybe also need another some synchronization code, but I think it >> does not have much negative effect with performance. >> > > Yeah, serializing things to fix this is probably a non-starter. NFSv2 > and UDP transports are basically legacy code at this point, so there's > not a lot of incentive to do anything drastic here. > 1) I agree with what you said above, but maybe you misunderstand of what I said for the "item B)" 2) the "item B)" is for the completion of "item A)". when we fix this issue, maybe have to add additional synchronization code which maybe can cause negative effect with performance, but I think it is not much ("user" can bear). At last, I suggest we need think of how to fix this implementation bug in nfs-client region. -- Chen Gang Asianux Corporation ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-10-17 1:36 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-10-15 2:12 [Bug fix] nfs-client: fix nfs_inode_attrs_need_update for async read_done comes during truncating to smaller size Chen Gang 2012-10-15 4:27 ` Myklebust, Trond 2012-10-15 4:52 ` Chen Gang 2012-10-15 5:39 ` Chen Gang 2012-10-15 12:32 ` Myklebust, Trond 2012-10-16 1:37 ` Chen Gang 2012-10-16 2:51 ` Myklebust, Trond 2012-10-16 4:13 ` Chen Gang 2012-10-16 10:33 ` Jeff Layton 2012-10-16 11:44 ` Chen Gang 2012-10-16 12:13 ` Jeff Layton 2012-10-17 1:37 ` Chen Gang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox