Desired RPC client behaviour on socket errors?

From: Jamie Bainbridge <jbainbri@redhat.com>
To: linux-nfs@vger.kernel.org
Cc: harshula@redhat.com
Subject: Desired RPC client behaviour on socket errors?
Date: Fri, 1 May 2015 01:22:35 -0400 (EDT)
Message-ID: <284822107.10169266.1430457755109.JavaMail.zimbra@redhat.com>
In-Reply-To: <107624115.10168203.1430457528206.JavaMail.zimbra@redhat.com>

Commit 3ed5e2a changed how the RPC client handles the socket's return value on connect.

Prior to this commit, any error return was considered instantly fatal and rpc_exit(task,-EIO) was called.

After this commit, the socket returns ECONNREFUSED, ECONNRESET, ECONNABORTED, ENETUNREACH and EHOSTUNREACH are passed back to the caller. This is a good idea and works well.

However, this commit also causes those returns to call rpc_delay(task,3*HZ) and the RPC connect to retry until the RPC times out. The timeout can be modified with soft/timeo/retrans but defaults to 3 minutes.
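To put a rough number on "retry until the RPC times out", here is a small userspace model. It is only a sketch, not kernel code: connect_attempts() is a hypothetical helper, and it assumes every attempt fails immediately so that the only time spent is the 3 second delay between attempts.

```c
#include <assert.h>

/*
 * Hypothetical model, not taken from the kernel source.
 * Assumption: each connect attempt fails instantly, and the caller then
 * waits delay_secs before trying again, until timeout_secs has elapsed.
 * Attempts happen at t = 0, delay_secs, 2 * delay_secs, ... while t < timeout_secs.
 */
static unsigned int connect_attempts(unsigned int timeout_secs,
                                     unsigned int delay_secs)
{
    if (delay_secs == 0)
        return 0; /* a zero delay is not meaningful in this model */

    /* Round up: an attempt starts at every multiple of delay_secs below the timeout. */
    return (timeout_secs + delay_secs - 1) / delay_secs;
}
```

Under those assumptions, connect_attempts(180, 3) works out to 60 attempts for the default 3 minute timeout, where previously there would have been one.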

In practice this means that if a client tries to mount and there is a permanent network error outside the client, a TCP Reset or an ICMP error might get returned, but the mount will hang and the client will keep trying to connect many times until the RPC times out. Previously a mount would fail almost straight away.

It seems 3ed5e2a solves a problem for transient network errors but creates a problem for permanent network errors.

I agree it's probably desirable for a client application (RPC in this instance) to keep trying to connect until a timeout, and it's good the timeout is configurable, but it's bad that the timeout must be tied to all RPC operations. Someone wanting a quick mount timeout must also suffer a quick NFS operation timeout, not to mention the data corruption risk that goes along with soft.

Should the RPC client call rpc_exit() on an xprt connect which returns ECONNREFUSED, ECONNRESET, ECONNABORTED, ENETUNREACH or EHOSTUNREACH, because those returns imply a "more permanent" network issue?
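As a sketch of the policy being asked about, the decision could be modelled in userspace like this. The enum and both function names are hypothetical and do not exist in the SUNRPC code; the model also takes positive errno values, whereas the kernel passes them negated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names for illustration only. */
enum connect_action {
    CONNECT_RETRY, /* delay and try the connect again */
    CONNECT_FAIL   /* give up at once, as rpc_exit() would */
};

/* True for the five returns listed above, which suggest a "more permanent" problem. */
static bool is_persistent_connect_error(int err)
{
    switch (err) {
    case ECONNREFUSED:
    case ECONNRESET:
    case ECONNABORTED:
    case ENETUNREACH:
    case EHOSTUNREACH:
        return true;
    default:
        return false;
    }
}

/* Proposed behaviour: fail fast on those returns, keep retrying on anything else. */
static enum connect_action connect_policy(int err)
{
    return is_persistent_connect_error(err) ? CONNECT_FAIL : CONNECT_RETRY;
}
```

With this model, connect_policy(ECONNREFUSED) gives CONNECT_FAIL, while something like ETIMEDOUT still gives CONNECT_RETRY. Whether the real client should draw the line in the same place is exactly the open question.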

Disclosure: We came across this because a customer is (ab)using NFSv4 Migrations in a strange way. One server in fs_locations is firewalled behind a TCP Reset and one is not. Depending on which security zone a client is in, it can connect to one server but not the other. This enables clients in both security zones to use the same NFS mount configuration.

Cheers,
Jamie
