On 2014-10-20 09:02, Zygo Blaxell wrote:
> On Mon, Oct 20, 2014 at 04:38:28AM +0000, Duncan wrote:
>> Russell Coker posted on Sat, 18 Oct 2014 14:54:19 +1100 as excerpted:
>>
>>> # find . -name "*546"
>>> ./1412233213.M638209P10546
>>> # ls -l ./1412233213.M638209P10546
>>> ls: cannot access ./1412233213.M638209P10546: No such file or directory
>>
>> Does your mail server do a lot of renames? Is one perhaps stuck? If so,
>> that sounds like the same thing "Zygo Blaxell" is reporting in the
>> "3.16.3..3.17.1 hang in renameat2()" thread, OP on Sun, 19 Oct 2014
>> 15:25:26 -400, Msg-ID: <20141019192525.GA29401@hungrycats.org>, as linked
>> here:
>>
>>
>> I pointed him at this thread too. I hadn't seen you mention a hung
>> rename, but the other symptoms sound similar.
>
> Not really. It looks like Russell is having an NFS client-side problem,
> I'm having a server-side one (maybe). Also, all Russell's system calls
> seem to be returning promptly, while some of mine are not. Even if
> there were timeouts, an NFS server timeout gives a different error than
> 'No such file or directory'. Finally, the one and only thing I _can_
> do with my bug is 'ls' on the renamed files (for me, the find would get
> stuck before returning any output).
>
> For Russell's issue...most of the stuff I can think of has been
> tried already. I didn't see if there was any attempt to ls the
> file from the NFS server as well as the client side. If ls is OK on
> the server but not the client, it's an NFS issue (possibly interacting
> with some btrfs-specific quirk); otherwise, it's likely a corrupted
> filesystem (mail servers seem to be unusually good at making these).
>
> Most of the I/O time on mail servers tends to land in the fsync() system
> call, and some nasty fsync() btrfs bugs were fixed in 3.17 (i.e. after
> 3.16, and not in the 3.16.x stable update for x <= 5 (the last one
> I've checked)). That said, I'm not familiar with how fsync() translates
> over NFS, so it might not be relevant after all.
>
> If the NFS server's view of the filesystem is OK, check the NFS protocol
> version from /proc/mounts on the client. Sometimes NFS clients will
> get some transient network error during connection and fall back to some
> earlier (and potentially buggier) NFS version. I've seen very different
> behavior in some important corner cases from v4 and v3 clients, for
> example, and if the client is falling all the way back to v2 the bugs
> and their workarounds start to get just plain _weird_ (e.g. filenames
> which produce specific values from some hash function or that contain
> specific character sequences are unusable). v2 is so old it may even
> have issues with 64-bit inode numbers.
>
Just now saw this thread, but IIRC 'No such file or directory' also gets
returned sometimes when trying to automount a share that can't be
enumerated by the client, and also sometimes when there is a stale NFS
file handle.
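
For the /proc/mounts check Zygo suggested, something like the following
should show which protocol version each NFS mount actually ended up with.
This is only a rough sketch: the nfs_versions name is made up for
illustration, and it assumes the usual Linux /proc/mounts layout, where
the filesystem type is the third field, the mount options are the fourth,
and the version appears as vers= (or nfsvers=).

```shell
# Print "mountpoint fstype version" for every NFS mount in a mounts file.
# Reads /proc/mounts by default; pass another path to inspect a saved copy.
# (nfs_versions is a hypothetical helper name, not an existing tool.)
nfs_versions() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        ver = "unknown"                  # reported if no vers= option is listed
        n = split($4, opts, ",")         # field 4 holds comma-separated options
        for (i = 1; i <= n; i++) {
            if (opts[i] ~ /^(vers|nfsvers)=/) {
                ver = opts[i]
                sub(/^[a-z]+=/, "", ver) # keep only the value, e.g. "4.1"
            }
        }
        print $2, $3, ver
    }' "${1:-/proc/mounts}"
}
```

Running nfs_versions with no argument on the client reads /proc/mounts
directly. If a mount that was meant to be v4 reports 3 or 2 there, that
would point at the fallback described above.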