Linux NFS development
* REGRESSION: NFSv4: mkdir returns EEXIST after NFS4ERR_DELAY-then-success;
@ 2026-04-29  5:02 Igor Raits
  2026-04-29  7:12 ` Igor Raits
  0 siblings, 1 reply; 4+ messages in thread
From: Igor Raits @ 2026-04-29  5:02 UTC (permalink / raw)
  To: NeilBrown, Anna Schumaker, Trond Myklebust, linux-nfs
  Cc: Jaroslav Pulchart, Jan Cipa

Hi all,

I think I've run into an NFSv4 client regression and wanted to report
it before I forget the details. Apologies in advance if I'm
mis-reading the code — please correct me if so.

Symptom: an occasional mkdir(2) on an NFSv4 mount returns -EEXIST,
but the directory it was supposed to create is actually present
afterwards. It's reproducible on both NFSv4.0 and NFSv4.2 against an
in-kernel Linux nfsd. Both client and server are running 6.19.14.

Reproducer (random 16-hex-digit names, so name collisions are not the cause):

  N=2000000; base=/var/gdc/export
  for ((i=1; i<=N; i++)); do
      d=$base/$(openssl rand -hex 8)
      mkdir "$d" 2>/dev/null || echo "$(date +%T) failed loop=$i $d"
      rmdir "$d" 2>/dev/null
  done

Failures cluster every ~2-3 minutes, and also reliably trigger on the
first mkdir after a few minutes of mount idleness. Each failed mkdir
takes about 100 ms.

strace shows just one syscall, so userspace isn't retrying:

  $ strace -ttt -e trace=mkdir mkdir "$dir"
  mkdir("/var/gdc/export/954ce422698ef4b1", 0777) = -1 EEXIST (File exists)
  +++ exited with 1 +++

A packet capture for one failure (NFSv4.2; the v4.0 capture has the
same shape):

  client → server  CREATE name=...  → NFS4ERR_DELAY (10008)
  ~100 ms later
  client → server  CREATE name=...  → NFS4_OK            ← dir created
  ~80 µs later
  client → server  CREATE name=...  → NFS4ERR_EXIST (17) ← server is right

Three CREATE RPCs from one mkdir(2). The server looks correct: it
returns DELAY, then OK on the retry, then EXIST when the client asks
again for a name that now exists. The client then surfaces that final
EXIST to userspace even though its own previous retry already
succeeded.

While poking around in fs/nfs/nfs4proc.c I noticed nfs4_proc_mkdir()
looks like this in current master:

  do {
      alias = nfs4_do_mkdir(dir, dentry, sattr, label, &err);
      trace_nfs4_mkdir(dir, &dentry->d_name, err);
      if (err)
          alias = ERR_PTR(nfs4_handle_exception(NFS_SERVER(dir),
                                                err, &exception));
  } while (exception.retry);

If I'm reading this right, on a successful retry (err == 0)
nfs4_handle_exception() is skipped, so exception.retry stays at the
value it had after the previous DELAY iteration (which is 1). The
loop then runs once more, sends another CREATE for the same name,
and that one legitimately gets NFS4ERR_EXIST. Other do-while loops
in the same file (e.g. nfs4_proc_symlink) seem to call
nfs4_handle_exception() unconditionally, which would reset
exception.retry to 0 on success and exit the loop.
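
For comparison, that loop looks roughly like this (quoting loosely from
memory, so the exact argument list may differ from current master):

  do {
      err = _nfs4_proc_symlink(dir, dentry, folio, len, sattr, label);
      trace_nfs4_symlink(dir, &dentry->d_name, err);
      /* runs even when err == 0, so exception.retry is cleared on success */
      err = nfs4_handle_exception(NFS_SERVER(dir), err, &exception);
  } while (exception.retry);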

git blame points at:

  dd862da61e91 ("nfs: fix incorrect handling of large-number NFS errors
                 in nfs4_do_mkdir()")

(stable backport: 062feb506caf). The change makes sense in itself —
the goal of returning the int separately from the dentry is good — but
the `if (err)` gate around nfs4_handle_exception() seems to be what
introduced the retry-state issue. I might be wrong about that, though,
so please take it with a grain of salt.
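
For what it's worth, the kind of change I had in mind is simply making
sure the loop cannot go around again once err == 0, along these lines
(untested sketch, not a proper patch):

  do {
      alias = nfs4_do_mkdir(dir, dentry, sattr, label, &err);
      trace_nfs4_mkdir(dir, &dentry->d_name, err);
      if (!err)
          break;  /* success: don't re-send CREATE off stale retry state */
      alias = ERR_PTR(nfs4_handle_exception(NFS_SERVER(dir),
                                            err, &exception));
  } while (exception.retry);

Explicitly clearing exception.retry on success would presumably work just
as well; I haven't tested either variant.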

Happy to provide pcaps, more traces, or test a patch if useful.
Reproduces on demand here, so iteration should be quick.

Thanks for all the work on NFS,
Igor


end of thread

Thread overview: 4+ messages
2026-04-29  5:02 REGRESSION: NFSv4: mkdir returns EEXIST after NFS4ERR_DELAY-then-success; Igor Raits
2026-04-29  7:12 ` Igor Raits
2026-04-29  9:58   ` NeilBrown
2026-04-29 10:49     ` [PATCH] NFSv4: clear exception state on successful mkdir retry Igor Raits
