All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: Marc Dionne <marc.dionne@auristor.com>
Cc: David Howells <dhowells@redhat.com>,
	linux-afs@lists.infradead.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Jeffrey E Altman <jaltman@auristor.com>
Subject: [PATCH v2 07/40] afs: Add comments on abort handling
Date: Wed, 13 Dec 2023 13:49:29 +0000	[thread overview]
Message-ID: <20231213135003.367397-8-dhowells@redhat.com> (raw)
In-Reply-To: <20231213135003.367397-1-dhowells@redhat.com>

Add some comments on AFS abort code handling in the rotation algorithm and
adjust the errors produced to match.

Reported-by: Jeffrey E Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
---
 fs/afs/rotate.c | 101 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 90 insertions(+), 11 deletions(-)

diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c
index a840c3588ebb..a3d127953ac6 100644
--- a/fs/afs/rotate.c
+++ b/fs/afs/rotate.c
@@ -13,6 +13,7 @@
 #include <linux/sched/signal.h>
 #include "internal.h"
 #include "afs_fs.h"
+#include "protocol_uae.h"
 
 /*
  * Begin iteration through a server list, starting with the vnode's last used
@@ -143,6 +144,11 @@ bool afs_select_fileserver(struct afs_operation *op)
 	case -ECONNABORTED:
 		/* The far side rejected the operation on some grounds.  This
 		 * might involve the server being busy or the volume having been moved.
+		 *
+		 * Note that various V* errors should not be sent to a cache manager
+		 * by a fileserver as they should be translated to more modern UAE*
+		 * errors instead.  IBM AFS and OpenAFS fileservers, however, do leak
+		 * these abort codes.
 		 */
 		switch (op->ac.abort_code) {
 		case VNOVOL:
@@ -150,6 +156,11 @@ bool afs_select_fileserver(struct afs_operation *op)
 			 * - May indicate that the VL is wrong - retry once and compare
 			 *   the results.
 			 * - May indicate that the fileserver couldn't attach to the vol.
+			 * - The volume might have been temporarily removed so that it can
+			 *   be replaced by a volume restore.  "vos" might have ended one
+			 *   transaction and has yet to create the next.
+			 * - The volume might not be blessed or might not be in-service
+			 *   (administrative action).
 			 */
 			if (op->flags & AFS_OPERATION_VNOVOL) {
 				op->error = -EREMOTEIO;
@@ -183,16 +194,56 @@ bool afs_select_fileserver(struct afs_operation *op)
 			_leave(" = t [vnovol]");
 			return true;
 
-		case VSALVAGE: /* TODO: Should this return an error or iterate? */
 		case VVOLEXISTS:
-		case VNOSERVICE:
 		case VONLINE:
-		case VDISKFULL:
-		case VOVERQUOTA:
-			op->error = afs_abort_to_error(op->ac.abort_code);
+			/* These should not be returned from the fileserver. */
+			pr_warn("Fileserver returned unexpected abort %d\n",
+				op->ac.abort_code);
+			op->error = -EREMOTEIO;
+			goto next_server;
+
+		case VNOSERVICE:
+			/* Prior to AFS 3.2 VNOSERVICE was returned from the fileserver
+			 * if the volume was neither in-service nor administratively
+			 * blessed.  All usage was replaced by VNOVOL because AFS 3.1 and
+			 * earlier cache managers did not handle VNOSERVICE and assumed
+			 * it was the client OSes errno 105.
+			 *
+			 * Starting with OpenAFS 1.4.8 VNOSERVICE was repurposed as the
+			 * fileserver idle dead time error which was sent in place of
+			 * RX_CALL_TIMEOUT (-3).  The error was intended to be sent if the
+			 * fileserver took too long to send a reply to the client.
+			 * RX_CALL_TIMEOUT would have caused the cache manager to mark the
+			 * server down whereas VNOSERVICE since AFS 3.2 would cause cache
+			 * manager to temporarily (up to 15 minutes) mark the volume
+			 * instance as unusable.
+			 *
+			 * The idle dead logic resulted in cache inconsistency since a
+			 * state changing call that the cache manager assumed was dead
+			 * could still be processed to completion by the fileserver.  This
+			 * logic was removed in OpenAFS 1.8.0 and VNOSERVICE is no longer
+			 * returned.  However, many 1.4.8 through 1.6.24 fileservers are
+			 * still in existence.
+			 *
+			 * AuriStorFS fileservers have never returned VNOSERVICE.
+			 *
+			 * VNOSERVICE should be treated as an alias for RX_CALL_TIMEOUT.
+			 */
+		case RX_CALL_TIMEOUT:
+			op->error = -ETIMEDOUT;
 			goto next_server;
 
+		case VSALVAGING: /* This error should not be leaked to cache managers
+				  * but is from OpenAFS demand attach fileservers.
+				  * It should be treated as an alias for VOFFLINE.
+				  */
+		case VSALVAGE: /* VSALVAGE should be treated as a synonym of VOFFLINE */
 		case VOFFLINE:
+			/* The volume is in use by the volserver or another volume utility
+			 * for an operation that might alter the contents.  The volume is
+			 * expected to come back but it might take a long time (could be
+			 * days).
+			 */
 			if (!test_and_set_bit(AFS_VOLUME_OFFLINE, &op->volume->flags)) {
 				afs_busy(op->volume, op->ac.abort_code);
 				clear_bit(AFS_VOLUME_BUSY, &op->volume->flags);
@@ -207,11 +258,20 @@ bool afs_select_fileserver(struct afs_operation *op)
 			}
 			goto busy;
 
-		case VSALVAGING:
-		case VRESTARTING:
+		case VRESTARTING: /* The fileserver is either shutting down or starting up. */
 		case VBUSY:
-			/* Retry after going round all the servers unless we
-			 * have a file lock we need to maintain.
+			/* The volume is in use by the volserver or another volume
+			 * utility for an operation that is not expected to alter the
+			 * contents of the volume.  VBUSY does not need to be returned
+			 * for a ROVOL or BACKVOL bound to an ITBusy volserver
+			 * transaction.  The fileserver is permitted to continue serving
+			 * content from ROVOLs and BACKVOLs during an ITBusy transaction
+			 * because the content will not change.  However, many fileserver
+			 * releases do return VBUSY for ROVOL and BACKVOL instances under
+			 * many circumstances.
+			 *
+			 * Retry after going round all the servers unless we have a file
+			 * lock we need to maintain.
 			 */
 			if (op->flags & AFS_OPERATION_NO_VSLEEP) {
 				op->error = -EBUSY;
@@ -226,7 +286,7 @@ bool afs_select_fileserver(struct afs_operation *op)
 				if (!afs_sleep_and_retry(op))
 					goto failed;
 
-				 /* Retry with same server & address */
+				/* Retry with same server & address */
 				_leave(" = t [vbusy]");
 				return true;
 			}
@@ -270,10 +330,29 @@ bool afs_select_fileserver(struct afs_operation *op)
 
 			goto restart_from_beginning;
 
+		case VDISKFULL:
+		case UAENOSPC:
+			/* The partition is full.  Only applies to RWVOLs.
+			 * Translate locally and return ENOSPC.
+			 * No replicas to failover to.
+			 */
+			op->error = -ENOSPC;
+			goto failed_but_online;
+
+		case VOVERQUOTA:
+		case UAEDQUOT:
+			/* Volume is full.  Only applies to RWVOLs.
+			 * Translate locally and return EDQUOT.
+			 * No replicas to failover to.
+			 */
+			op->error = -EDQUOT;
+			goto failed_but_online;
+
 		default:
+			op->error = afs_abort_to_error(op->ac.abort_code);
+		failed_but_online:
 			clear_bit(AFS_VOLUME_OFFLINE, &op->volume->flags);
 			clear_bit(AFS_VOLUME_BUSY, &op->volume->flags);
-			op->error = afs_abort_to_error(op->ac.abort_code);
 			goto failed;
 		}
 


  parent reply	other threads:[~2023-12-13 13:50 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-13 13:49 [PATCH v2 00/40] afs: Fix probe handling, server rotation and RO volume callback handling David Howells
2023-12-13 13:49 ` [PATCH v2 01/40] afs: fix the usage of read_seqbegin_or_lock() in afs_lookup_volume_rcu() David Howells
2023-12-13 13:49 ` [PATCH v2 02/40] afs: fix the usage of read_seqbegin_or_lock() in afs_find_server*() David Howells
2023-12-13 13:49 ` [PATCH v2 03/40] afs: use read_seqbegin() in afs_check_validity() and afs_getattr() David Howells
2023-12-13 13:49 ` [PATCH v2 04/40] rxrpc_find_service_conn_rcu: fix the usage of read_seqbegin_or_lock() David Howells
2023-12-13 13:49 ` [PATCH v2 05/40] afs: Remove whitespace before most ')' from the trace header David Howells
2023-12-13 13:49 ` [PATCH v2 06/40] afs: Automatically generate trace tag enums David Howells
2023-12-13 13:49 ` David Howells [this message]
2023-12-13 13:49 ` [PATCH v2 08/40] afs: Turn the afs_addr_list address array into an array of structs David Howells
2023-12-13 13:49 ` [PATCH v2 09/40] rxrpc, afs: Allow afs to pin rxrpc_peer objects David Howells
2023-12-13 13:49 ` [PATCH v2 10/40] afs: Don't skip server addresses for which we didn't get an RTT reading David Howells
2023-12-13 13:49 ` [PATCH v2 11/40] afs: Rename addr_list::failed to probe_failed David Howells
2023-12-13 13:49 ` [PATCH v2 12/40] afs: Handle the VIO and UAEIO aborts explicitly David Howells
2023-12-13 13:49 ` [PATCH v2 13/40] afs: Use op->nr_iterations=-1 to indicate to begin fileserver iteration David Howells
2023-12-13 13:49 ` [PATCH v2 14/40] afs: Wrap most op->error accesses with inline funcs David Howells
2023-12-13 13:49 ` [PATCH v2 15/40] afs: Don't put afs_call in afs_wait_for_call_to_complete() David Howells
2023-12-13 13:49 ` [PATCH v2 16/40] afs: Simplify error handling David Howells
2023-12-13 13:49 ` [PATCH v2 17/40] afs: Add a tracepoint for struct afs_addr_list David Howells
2023-12-13 13:49 ` [PATCH v2 18/40] afs: Rename some fields David Howells
2023-12-13 13:49 ` [PATCH v2 19/40] afs: Use peer + service_id as call address David Howells
2023-12-13 13:49 ` [PATCH v2 20/40] afs: Fold the afs_addr_cursor struct in David Howells
2023-12-13 13:49 ` [PATCH v2 21/40] rxrpc: Create a procfile to display outstanding client conn bundles David Howells
2023-12-13 13:49 ` [PATCH v2 22/40] afs: Add some more info to /proc/net/afs/servers David Howells
2023-12-13 13:49 ` [PATCH v2 23/40] afs: Remove the unimplemented afs_cmp_addr_list() David Howells
2023-12-13 13:49 ` [PATCH v2 24/40] afs: Provide a way to configure address priorities David Howells
2023-12-13 13:49 ` [PATCH v2 25/40] afs: Mark address lists with configured priorities David Howells
2023-12-13 13:49 ` [PATCH v2 26/40] afs: Dispatch fileserver probes in priority order David Howells
2023-12-13 13:49 ` [PATCH v2 27/40] afs: Dispatch vlserver " David Howells
2023-12-13 13:49 ` [PATCH v2 28/40] afs: Keep a record of the current fileserver endpoint state David Howells
2023-12-13 13:49 ` [PATCH v2 29/40] afs: Combine the endpoint state bools into a bitmask David Howells
2023-12-13 13:49 ` [PATCH v2 30/40] afs: Make it possible to find the volumes that are using a server David Howells
2023-12-13 13:49 ` [PATCH v2 31/40] afs: Defer volume record destruction to a workqueue David Howells
2023-12-13 13:49 ` [PATCH v2 32/40] afs: Move the vnode/volume validity checking code into its own file David Howells
2023-12-13 13:49 ` [PATCH v2 33/40] afs: Apply server breaks to mmap'd files in the call processor David Howells
2023-12-13 13:49 ` [PATCH v2 34/40] afs: Fix comment in afs_do_lookup() David Howells
2023-12-13 13:49 ` [PATCH v2 35/40] afs: Don't leave DONTUSE/NEWREPSITE servers out of server list David Howells
2023-12-13 13:49 ` [PATCH v2 36/40] afs: Parse the VolSync record in the reply of a number of RPC ops David Howells
2023-12-13 13:49 ` [PATCH v2 37/40] afs: Overhaul invalidation handling to better support RO volumes David Howells
2023-12-13 13:50 ` [PATCH v2 38/40] afs: Fix fileserver rotation David Howells
2023-12-13 13:50 ` [PATCH v2 39/40] afs: Fix offline and busy message emission David Howells
2023-12-13 13:50 ` [PATCH v2 40/40] afs: trace: Log afs_make_call(), including server address David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231213135003.367397-8-dhowells@redhat.com \
    --to=dhowells@redhat.com \
    --cc=jaltman@auristor.com \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marc.dionne@auristor.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.