From: David Howells <dhowells@redhat.com>
To: Marc Dionne <marc.dionne@auristor.com>
Cc: David Howells <dhowells@redhat.com>,
linux-afs@lists.infradead.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org,
Jeffrey E Altman <jaltman@auristor.com>
Subject: [PATCH 08/41] afs: Add comments on abort handling
Date: Thu, 9 Nov 2023 15:39:31 +0000 [thread overview]
Message-ID: <20231109154004.3317227-9-dhowells@redhat.com> (raw)
In-Reply-To: <20231109154004.3317227-1-dhowells@redhat.com>
Add some comments on AFS abort code handling in the rotation algorithm and
adjust the errors produced to match.
Reported-by: Jeffrey E Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
---
fs/afs/rotate.c | 100 ++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 89 insertions(+), 11 deletions(-)
diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c
index a840c3588ebb..180bcad081dd 100644
--- a/fs/afs/rotate.c
+++ b/fs/afs/rotate.c
@@ -13,6 +13,7 @@
#include <linux/sched/signal.h>
#include "internal.h"
#include "afs_fs.h"
+#include "protocol_uae.h"
/*
* Begin iteration through a server list, starting with the vnode's last used
@@ -143,6 +144,11 @@ bool afs_select_fileserver(struct afs_operation *op)
case -ECONNABORTED:
/* The far side rejected the operation on some grounds. This
* might involve the server being busy or the volume having been moved.
+ *
+ * Note that various V* errors should not be sent to a cache manager
+ * by a fileserver as they should be translated to more modern UAE*
+ * errors instead. IBM AFS and OpenAFS fileservers, however, do leak
+ * these abort codes.
*/
switch (op->ac.abort_code) {
case VNOVOL:
@@ -150,6 +156,11 @@ bool afs_select_fileserver(struct afs_operation *op)
* - May indicate that the VL is wrong - retry once and compare
* the results.
* - May indicate that the fileserver couldn't attach to the vol.
+ * - The volume might have been temporarily removed so that it can
+ * be replaced by a volume restore. "vos" might have ended one
+ * transaction and has yet to create the next.
+ * - The volume might not be blessed or might not be in-service
+ * (administrative action).
*/
if (op->flags & AFS_OPERATION_VNOVOL) {
op->error = -EREMOTEIO;
@@ -183,16 +194,56 @@ bool afs_select_fileserver(struct afs_operation *op)
_leave(" = t [vnovol]");
return true;
- case VSALVAGE: /* TODO: Should this return an error or iterate? */
case VVOLEXISTS:
- case VNOSERVICE:
case VONLINE:
- case VDISKFULL:
- case VOVERQUOTA:
- op->error = afs_abort_to_error(op->ac.abort_code);
+ /* These should not be returned from the fileserver. */
+ pr_warn("Fileserver returned unexpected abort %d\n",
+ op->ac.abort_code);
+ op->error = -EREMOTEIO;
+ goto next_server;
+
+ case VNOSERVICE:
+ /* Prior to AFS 3.2 VNOSERVICE was returned from the fileserver
+ * if the volume was neither in-service nor administratively
+ * blessed. All usage was replaced by VNOVOL because AFS 3.1 and
+ * earlier cache managers did not handle VNOSERVICE and assumed
+ * it was the client OSes errno 105.
+ *
+ * Starting with OpenAFS 1.4.8 VNOSERVICE was repurposed as the
+ * fileserver idle dead time error which was sent in place of
+ * RX_CALL_TIMEOUT (-3). The error was intended to be sent if the
+ * fileserver took too long to send a reply to the client.
+ * RX_CALL_TIMEOUT would have caused the cache manager to mark the
+ * server down whereas VNOSERVICE since AFS 3.2 would cause cache
+ * manager to temporarily (up to 15 minutes) mark the volume
+ * instance as unusable.
+ *
+ * The idle dead logic resulted in cache inconsistency since a
+ * state changing call that the cache manager assumed was dead
+ * could still be processed to completion by the fileserver. This
+ * logic was removed in OpenAFS 1.8.0 and VNOSERVICE is no longer
+ * returned. However, many 1.4.8 through 1.6.24 fileservers are
+ * still in existence.
+ *
+ * AuriStorFS fileservers have never returned VNOSERVICE.
+ *
+ * VNOSERVICE should be treated as an alias for RX_CALL_TIMEOUT..
+ */
+ case RX_CALL_TIMEOUT:
+ op->error = -ETIMEDOUT;
goto next_server;
+ case VSALVAGING: /* This error should not be leaked to cache managers
+ * but is from OpenAFS demand attach fileservers.
+ * It should be treated as an alias for VOFFLINE.
+ */
+ case VSALVAGE: /* VSALVAGE should be treated as a synonym of VOFFLINE */
case VOFFLINE:
+ /* The volume is in use by the volserver or another volume utility
+ * for an operation that might alter the contents. The volume is
+ * expected to come back but it might take a long time (could be
+ * days).
+ */
if (!test_and_set_bit(AFS_VOLUME_OFFLINE, &op->volume->flags)) {
afs_busy(op->volume, op->ac.abort_code);
clear_bit(AFS_VOLUME_BUSY, &op->volume->flags);
@@ -207,11 +258,19 @@ bool afs_select_fileserver(struct afs_operation *op)
}
goto busy;
- case VSALVAGING:
- case VRESTARTING:
+ case VRESTARTING: /* The fileserver is either shutting down or starting up. */
case VBUSY:
- /* Retry after going round all the servers unless we
- * have a file lock we need to maintain.
+ /* The volume is in use by the volserver or another volume utility
+ * for an operation that is not expected to alter the contents of
+ * the volume. VBUSY should not be returned for a ROVOL or
+ * BACKVOL (but many OpenAFS fileserver versions are broken). The
+ * fileserver is supposed to continue serving content from ROVOLs
+ * and BACKVOLs during an ITBusy transaction because the content
+ * cannot change. The volume is expected to come back but it
+ * might take awhile.
+ *
+ * Retry after going round all the servers unless we have a file
+ * lock we need to maintain.
*/
if (op->flags & AFS_OPERATION_NO_VSLEEP) {
op->error = -EBUSY;
@@ -226,7 +285,7 @@ bool afs_select_fileserver(struct afs_operation *op)
if (!afs_sleep_and_retry(op))
goto failed;
- /* Retry with same server & address */
+ /* Retry with same server & address */
_leave(" = t [vbusy]");
return true;
}
@@ -270,10 +329,29 @@ bool afs_select_fileserver(struct afs_operation *op)
goto restart_from_beginning;
+ case VDISKFULL:
+ case UAENOSPC:
+ /* The partition is full. Only applies to RWVOLs.
+ * Translate locally and return ENOSPC.
+ * No replicas to failover to.
+ */
+ op->error = -ENOSPC;
+ goto failed_but_online;
+
+ case VOVERQUOTA:
+ case UAEDQUOT:
+ /* Volume is full. Only applies to RWVOLs.
+ * Translate locally and return EDQUOT.
+ * No replicas to failover to.
+ */
+ op->error = -EDQUOT;
+ goto failed_but_online;
+
default:
+ op->error = afs_abort_to_error(op->ac.abort_code);
+ failed_but_online:
clear_bit(AFS_VOLUME_OFFLINE, &op->volume->flags);
clear_bit(AFS_VOLUME_BUSY, &op->volume->flags);
- op->error = afs_abort_to_error(op->ac.abort_code);
goto failed;
}
next prev parent reply other threads:[~2023-11-09 15:40 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-09 15:39 [PATCH 00/41] afs: Fix probe handling, server rotation and RO volume callback handling David Howells
2023-11-09 15:39 ` [PATCH 01/41] rxrpc: Fix RTT determination to use PING ACKs as a source David Howells
2023-11-09 17:16 ` Jeffrey E Altman
2023-11-09 22:06 ` David Howells
2023-11-10 14:15 ` Jeffrey E Altman
2023-11-10 16:12 ` Jeffrey E Altman
2023-11-10 17:25 ` David Howells
2023-11-10 21:52 ` Jeffrey E Altman
2023-11-10 21:54 ` Jeffrey E Altman
2023-11-09 15:39 ` [PATCH 02/41] rxrpc: Fix two connection reaping bugs David Howells
2023-11-09 17:27 ` Jeffrey E Altman
2023-11-09 17:50 ` patchwork-bot+netdevbpf
2023-11-09 15:39 ` [PATCH 03/41] rxrpc: Fix some minor issues with bundle tracing David Howells
2023-11-09 15:39 ` [PATCH 04/41] afs: Fix afs_server_list to be cleaned up with RCU David Howells
2023-11-09 15:39 ` [PATCH 05/41] afs: Make error on cell lookup failure consistent with OpenAFS David Howells
2023-11-09 15:39 ` [PATCH 06/41] afs: Remove whitespace before most ')' from the trace header David Howells
2023-11-09 15:39 ` [PATCH 07/41] afs: Automatically generate trace tag enums David Howells
2023-11-09 15:39 ` David Howells [this message]
2023-11-09 17:41 ` [PATCH 08/41] afs: Add comments on abort handling Jeffrey E Altman
2023-11-09 15:39 ` [PATCH 09/41] afs: Turn the afs_addr_list address array into an array of structs David Howells
2023-11-09 15:39 ` [PATCH 10/41] rxrpc, afs: Allow afs to pin rxrpc_peer objects David Howells
2023-11-09 17:48 ` Marc Dionne
2023-11-09 15:39 ` [PATCH 11/41] afs: Don't skip server addresses for which we didn't get an RTT reading David Howells
2023-11-09 15:39 ` [PATCH 12/41] afs: Rename addr_list::failed to probe_failed David Howells
2023-11-09 15:39 ` [PATCH 13/41] afs: Handle the VIO abort explicitly David Howells
2023-11-09 18:12 ` Jeffrey E Altman
2023-11-09 15:39 ` [PATCH 14/41] afs: Use op->nr_iterations=-1 to indicate to begin fileserver iteration David Howells
2023-11-09 15:39 ` [PATCH 15/41] afs: Return ENOENT if no cell DNS record can be found David Howells
2023-11-09 15:39 ` [PATCH 16/41] afs: Wrap most op->error accesses with inline funcs David Howells
2023-11-09 15:39 ` [PATCH 17/41] afs: Don't put afs_call in afs_wait_for_call_to_complete() David Howells
2023-11-09 15:39 ` [PATCH 18/41] afs: Simplify error handling David Howells
2023-11-09 15:39 ` [PATCH 19/41] afs: Add a tracepoint for struct afs_addr_list David Howells
2023-11-09 15:39 ` [PATCH 20/41] afs: Rename some fields David Howells
2023-11-09 15:39 ` [PATCH 21/41] afs: Use peer + service_id as call address David Howells
2023-11-09 15:39 ` [PATCH 22/41] afs: Fold the afs_addr_cursor struct in David Howells
2023-11-09 15:39 ` [PATCH 23/41] rxrpc: Create a procfile to display outstanding clien conn bundles David Howells
2023-11-09 18:20 ` Jeffrey E Altman
2023-11-09 15:39 ` [PATCH 24/41] afs: Add some more info to /proc/net/afs/servers David Howells
2023-11-09 15:39 ` [PATCH 25/41] afs: Remove the unimplemented afs_cmp_addr_list() David Howells
2023-11-09 15:39 ` [PATCH 26/41] afs: Provide a way to configure address priorities David Howells
2023-11-09 15:39 ` [PATCH 27/41] afs: Mark address lists with configured priorities David Howells
2023-11-09 15:39 ` [PATCH 28/41] afs: Dispatch fileserver probes in priority order David Howells
2023-11-09 15:39 ` [PATCH 29/41] afs: Dispatch vlserver " David Howells
2023-11-09 15:39 ` [PATCH 30/41] afs: Keep a record of the current fileserver endpoint state David Howells
2023-11-09 15:39 ` [PATCH 31/41] afs: Combine the endpoint state bools into a bitmask David Howells
2023-11-09 15:39 ` [PATCH 32/41] afs: Fix file locking on R/O volumes to operate in local mode David Howells
2023-11-09 15:39 ` [PATCH 33/41] afs: Mark a superblock for an R/O or Backup volume as SB_RDONLY David Howells
2023-11-09 15:39 ` [PATCH 34/41] afs: Make it possible to find the volumes that are using a server David Howells
2023-11-09 15:39 ` [PATCH 35/41] afs: Defer volume record destruction to a workqueue David Howells
2023-11-09 15:39 ` [PATCH 36/41] afs: Move the vnode/volume validity checking code into its own file David Howells
2023-11-09 15:40 ` [PATCH 37/41] afs: Apply server breaks to mmap'd files in the call processor David Howells
2023-11-09 15:40 ` [PATCH 38/41] afs: Parse the VolSync record in the reply of a number of RPC ops David Howells
2023-11-09 15:40 ` [PATCH 39/41] afs: Overhaul invalidation handling to better support RO volumes David Howells
2023-11-09 19:00 ` Jeffrey E Altman
2023-11-13 15:58 ` [PATCH 42/41] afs: Fix the handling of " David Howells
2023-11-09 15:40 ` [PATCH 40/41] afs: Fix fileserver rotation David Howells
2023-11-09 15:40 ` [PATCH 41/41] afs: Fix offline and busy handling David Howells
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231109154004.3317227-9-dhowells@redhat.com \
--to=dhowells@redhat.com \
--cc=jaltman@auristor.com \
--cc=linux-afs@lists.infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=marc.dionne@auristor.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).