linux-xfs.vger.kernel.org archive mirror
* [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support
@ 2017-11-17 21:00 Darrick J. Wong
  2017-11-17 21:00 ` [PATCH 01/27] xfs_scrub: create online filesystem scrub program Darrick J. Wong
                   ` (27 more replies)
  0 siblings, 28 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:00 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

Hi all,

This is the tenth revision of a patchset that adds to XFS userland tools
support for online metadata scrubbing and repair.

We start by creating the basic shell of the program that can do argument
parsing and error reporting, create some abstractions for the XFS ioctls
that we use to iterate and scrub metadata, and then tie together all the
in-kernel scrubbing in separate scrub phases.

Next, we move on to checking the directory tree for connectivity and
naming problems and add the infrastructure to perform an (optional) scan
of the in-use parts of the disk media.  We also implement a minimal
preen -- if the fs checks out, we can try to run fstrim; and some basic
progress reporting if the program is running interactively.

Finally, we add some wrapper scripts to schedule scrubs of all the
mounted filesystems; and the necessary systemd / cron infrastructure
that is needed to automatically scan everything once a week.  All of
this is disabled by default.  The systemd integration allows us to give
scrub exactly the privileges it needs while walling off the rest of the
system.

If you're going to start using this mess, you probably ought to just
pull from my git tree for xfsprogs[1].  This series relies on the
libfrog patches sent earlier.  Kernel support will appear in 4.15-rc1.

Comments and questions are, as always, welcome.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel


* [PATCH 01/27] xfs_scrub: create online filesystem scrub program
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
@ 2017-11-17 21:00 ` Darrick J. Wong
  2017-11-17 21:00 ` [PATCH 02/27] xfs_scrub: common error handling Darrick J. Wong
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:00 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create the foundations of a filesystem scrubbing tool that asks the
kernel to inspect all metadata in the filesystem and (ultimately) to
repair anything that's broken.  Also create the man page for the
utility.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 .gitignore           |    1 
 Makefile             |    3 +
 man/man8/xfs_scrub.8 |  117 ++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/Makefile       |   42 ++++++++++++++++++
 scrub/common.c       |   20 +++++++++
 scrub/common.h       |   23 ++++++++++
 scrub/xfs_scrub.c    |  109 +++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.h    |   23 ++++++++++
 8 files changed, 337 insertions(+), 1 deletion(-)
 create mode 100644 man/man8/xfs_scrub.8
 create mode 100644 scrub/Makefile
 create mode 100644 scrub/common.c
 create mode 100644 scrub/common.h
 create mode 100644 scrub/xfs_scrub.c
 create mode 100644 scrub/xfs_scrub.h


diff --git a/.gitignore b/.gitignore
index e839e2a..a3db640 100644
--- a/.gitignore
+++ b/.gitignore
@@ -68,6 +68,7 @@ cscope.*
 /repair/xfs_repair
 /rtcp/xfs_rtcp
 /spaceman/xfs_spaceman
+/scrub/xfs_scrub
 
 # generated crc files
 /libxfs/crc32selftest
diff --git a/Makefile b/Makefile
index 0dce80a..3bd0796 100644
--- a/Makefile
+++ b/Makefile
@@ -48,7 +48,7 @@ LIBFROG_SUBDIR = libfrog
 DLIB_SUBDIRS = libxlog libxcmd libhandle
 LIB_SUBDIRS = libxfs $(DLIB_SUBDIRS)
 TOOL_SUBDIRS = copy db estimate fsck growfs io logprint mkfs quota \
-		mdrestore repair rtcp m4 man doc debian spaceman
+		mdrestore repair rtcp m4 man doc debian spaceman scrub
 
 ifneq ("$(PKG_PLATFORM)","darwin")
 TOOL_SUBDIRS += fsr
@@ -91,6 +91,7 @@ repair: libxlog libxcmd
 copy: libxlog
 mkfs: libxcmd
 spaceman: libxcmd
+scrub: libhandle libxcmd
 
 ifeq ($(HAVE_BUILDDEFS), yes)
 include $(BUILDRULES)
diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
new file mode 100644
index 0000000..95f4fea
--- /dev/null
+++ b/man/man8/xfs_scrub.8
@@ -0,0 +1,117 @@
+.TH xfs_scrub 8
+.SH NAME
+xfs_scrub \- scrub the contents of an XFS filesystem
+.SH SYNOPSIS
+.B xfs_scrub
+[
+.B \-abemnTvVxy
+]
+.I mount-point
+.br
+.B xfs_scrub \-V
+.SH DESCRIPTION
+.B xfs_scrub
+attempts to check and repair all metadata in a mounted XFS filesystem.
+.PP
+.B xfs_scrub
+asks the kernel to scrub all metadata objects in the filesystem.
+Metadata records are scanned for obviously bad values and then
+cross-referenced against other metadata.
+The goal is to establish a reasonable confidence about the consistency
+of the overall filesystem by examining the consistency of individual
+metadata records against the other metadata in the filesystem across the
+entire filesystem.
+Damaged metadata can be rebuilt from other metadata if there is
+sufficient redundancy (and no other corruption) in the metadata.
+.PP
+This utility does not know how to correct all errors.
+If the tool cannot fix the detected errors, you must unmount the
+filesystem and run
+.B xfs_repair
+to fix the problems.
+If this tool is not run with either of the
+.B \-n
+or
+.B \-y
+options, then it will optimize the filesystem when possible,
+but it will not try to fix errors.
+.SH OPTIONS
+.TP
+.BI \-a " errors"
+Abort if more than this many errors are found on the filesystem.
+.TP
+.B \-b
+Run in background mode.
+If the option is specified once, only run a single scrubbing thread at a
+time.
+If given more than once, an artificial delay of 100us is added to each
+scrub call to reduce CPU overhead even further.
+.TP
+.B \-e
+Specifies what happens when errors are detected.
+If
+.IR shutdown
+is given, the filesystem will be taken offline if errors are found.
+Not all backends can shut down a filesystem.
+If
+.IR continue
+is given, no action is taken if errors are found.
+This is the default.
+.TP
+.BI \-m " file"
+Search this file for mounted filesystems instead of /etc/mtab.
+.TP
+.B \-n
+Dry run, do not modify anything in the filesystem.
+This disables all preening and optimization behaviors, and disables
+calling FITRIM on the free space after a successful run.
+.TP
+.BI \-T
+Print timing and memory usage information for each phase.
+.TP
+.B \-v
+Enable verbose mode, which prints periodic status updates.
+.TP
+.B \-V
+Prints the version number and exits.
+.TP
+.B \-x
+Scrub all file data too.
+The block list will be sorted in disk order for better performance.
+.B xfs_scrub
+will issue O_DIRECT reads to the block device directly.
+If the block device is a SCSI disk, it will issue READ VERIFY commands
+directly to the disk.
+.TP
+.B \-y
+Try to repair all filesystem errors.
+If the errors cannot be fixed online, then the filesystem must be taken
+offline for repair.
+.SH EXIT CODE
+The exit code returned by
+.B xfs_scrub
+is the sum of the following conditions:
+.br
+\	0\	\-\ No errors
+.br
+\	1\	\-\ File system errors left uncorrected
+.br
+\	2\	\-\ File system optimizations possible
+.br
+\	4\	\-\ Operational error
+.br
+\	8\	\-\ Usage or syntax error
+.br
+.SH CAVEATS
+.B xfs_scrub
+is an immature utility!
+This program takes advantage of in-kernel scrubbing to verify a given
+data structure with locks held.
+The kernel must support the BULKSTAT, FSGEOMETRY, FSCOUNTS, GET_RESBLKS,
+GETBMAPX, GETFSMAP, INUMBERS, and SCRUB_METADATA ioctls.
+This can tie up the system for a while.
+.PP
+If errors are found and cannot be repaired, the filesystem must be taken
+offline and repaired.
+.SH SEE ALSO
+.BR xfs_repair (8).
diff --git a/scrub/Makefile b/scrub/Makefile
new file mode 100644
index 0000000..204b0a1
--- /dev/null
+++ b/scrub/Makefile
@@ -0,0 +1,42 @@
+#
+# Copyright (c) 2017 Oracle.  All Rights Reserved.
+#
+
+TOPDIR = ..
+include $(TOPDIR)/include/builddefs
+
+# On linux we get fsmap from the system or define it ourselves
+# so include this based on platform type.  If this reverts to only
+# the autoconf check w/o local definition, change to testing HAVE_GETFSMAP
+SCRUB_PREREQS=$(PKG_PLATFORM)
+
+ifeq ($(SCRUB_PREREQS),linux)
+LTCOMMAND = xfs_scrub
+INSTALL_SCRUB = install-scrub
+endif	# scrub_prereqs
+
+HFILES = \
+common.h \
+xfs_scrub.h
+
+CFILES = \
+common.c \
+xfs_scrub.c
+
+LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
+LTDEPENDENCIES += $(LIBHANDLE) $(LIBFROG)
+LLDFLAGS = -static
+
+default: depend $(LTCOMMAND)
+
+include $(BUILDRULES)
+
+install: default $(INSTALL_SCRUB)
+
+install-scrub:
+	$(INSTALL) -m 755 -d $(PKG_ROOT_SBIN_DIR)
+	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_ROOT_SBIN_DIR)
+
+install-dev:
+
+-include .dep
diff --git a/scrub/common.c b/scrub/common.c
new file mode 100644
index 0000000..57ad182
--- /dev/null
+++ b/scrub/common.c
@@ -0,0 +1,20 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "common.h"
diff --git a/scrub/common.h b/scrub/common.h
new file mode 100644
index 0000000..f29e4d3
--- /dev/null
+++ b/scrub/common.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_COMMON_H_
+#define XFS_SCRUB_COMMON_H_
+
+#endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
new file mode 100644
index 0000000..6344428
--- /dev/null
+++ b/scrub/xfs_scrub.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include "xfs_scrub.h"
+
+/*
+ * XFS Online Metadata Scrub (and Repair)
+ *
+ * The XFS scrubber uses custom XFS ioctls to probe more deeply into the
+ * internals of the filesystem.  It takes advantage of scrubbing ioctls
+ * to check all the records stored in a metadata object and to
+ * cross-reference those records against the other filesystem metadata.
+ *
+ * After the program gathers command line arguments to figure out
+ * exactly what the user wants the program to do, scrub
+ * execution is split up into several separate phases:
+ *
+ * The "find geometry" phase queries XFS for the filesystem geometry.
+ * The block devices for the data, realtime, and log devices are opened.
+ * Kernel ioctls are test-queried to see if they actually work (the scrub
+ * ioctl in particular), and any other filesystem-specific information
+ * is gathered.
+ *
+ * In the "check internal metadata" phase, we call the metadata scrub
+ * ioctl to check the filesystem's internal per-AG btrees.  This
+ * includes the AG superblock, AGF, AGFL, and AGI headers, freespace
+ * btrees, the regular and free inode btrees, the reverse mapping
+ * btrees, and the reference counting btrees.  If the realtime device is
+ * enabled, the realtime bitmap and reverse mapping btrees are checked.
+ * Quotas, if enabled, are also checked in this phase.
+ *
+ * Each AG (and the realtime device) has its metadata checked in a
+ * separate thread for better performance.  Errors in the internal
+ * metadata can be fixed here prior to the inode scan; refer to the
+ * section about the "repair filesystem" phase for more information.
+ *
+ * The "scan all inodes" phase uses BULKSTAT to scan all the inodes in
+ * an AG in disk order.  The BULKSTAT information provides enough
+ * information to construct a file handle that is used to check the
+ * following parts of every file:
+ *
+ *  - The inode record
+ *  - All three block forks (data, attr, CoW)
+ *  - If it's a symlink, the symlink target.
+ *  - If it's a directory, the directory entries.
+ *  - All extended attributes
+ *  - The parent pointer
+ *
+ * Multiple threads are started to check the inodes of each AG in
+ * parallel.  Errors in file metadata can be fixed here; see the section
+ * about the "repair filesystem" phase for more information.
+ *
+ * Next comes the (configurable) "repair filesystem" phase.  The user
+ * can instruct this program to fix all problems encountered; to fix
+ * only optimality problems and leave the corruptions; or not to touch
+ * the filesystem at all.  Any metadata repairs that did not succeed in
+ * the previous two phases are retried here; if there are uncorrectable
+ * errors, xfs_scrub stops here.
+ *
+ * The next phase is the "check directory tree" phase.  In this phase,
+ * every directory is opened (via file handle) to confirm that each
+ * directory is connected to the root.  Directory entries are checked
+ * for ambiguous Unicode normalization mappings, which is to say that we
+ * look for pairs of entries whose utf-8 strings normalize to the same
+ * code point sequence and map to different inodes, because that could
+ * be used to trick a user into opening the wrong file.  The names of
+ * extended attributes are checked for Unicode normalization collisions.
+ *
+ * In the "verify data file integrity" phase, we employ GETFSMAP to read
+ * the reverse-mappings of all AGs and issue direct-reads of the
+ * underlying disk blocks.  We rely on the underlying storage to have
+ * checksummed the data blocks appropriately.  Multiple threads are
+ * started to check each AG in parallel; a separate thread pool is used
+ * to handle the direct reads.
+ *
+ * In the "check summary counters" phase, use GETFSMAP to tally up the
+ * blocks and BULKSTAT to tally up the inodes we saw and compare that to
+ * the statfs output.  This gives the user a rough estimate of how
+ * thorough the scrub was.
+ */
+
+/* Program name; needed for libxcmd error reports. */
+char				*progname = "xfs_scrub";
+
+int
+main(
+	int			argc,
+	char			**argv)
+{
+	fprintf(stderr, "XXX: This program is not complete!\n");
+	return 4;
+}
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
new file mode 100644
index 0000000..f1ce4d2
--- /dev/null
+++ b/scrub/xfs_scrub.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_XFS_SCRUB_H_
+#define XFS_SCRUB_XFS_SCRUB_H_
+
+#endif /* XFS_SCRUB_XFS_SCRUB_H_ */
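
A note on the EXIT CODE section of the man page in this patch: because the
status is described as a sum of conditions, a caller can treat it as a
bitmask.  Below is a minimal sketch of how a wrapper script or test harness
might decode it.  Only the numeric values come from the man page; the
SCRUB_EXIT_* names and describe_scrub_exit() are hypothetical and are not
part of this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values from the xfs_scrub(8) EXIT CODE section; the names are made up. */
#define SCRUB_EXIT_UNCORRECTED	1	/* File system errors left uncorrected */
#define SCRUB_EXIT_PREEN	2	/* File system optimizations possible */
#define SCRUB_EXIT_OPERATIONAL	4	/* Operational error */
#define SCRUB_EXIT_USAGE	8	/* Usage or syntax error */

/*
 * Render an exit status as a comma-separated list in buf.
 * Returns the number of conditions that were set in status.
 */
static int
describe_scrub_exit(
	int		status,
	char		*buf,
	size_t		buflen)
{
	static const struct {
		int		bit;
		const char	*text;
	} conds[] = {
		{ SCRUB_EXIT_UNCORRECTED,	"errors left uncorrected" },
		{ SCRUB_EXIT_PREEN,		"optimizations possible" },
		{ SCRUB_EXIT_OPERATIONAL,	"operational error" },
		{ SCRUB_EXIT_USAGE,		"usage or syntax error" },
	};
	size_t		i;
	int		nr = 0;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(conds) / sizeof(conds[0]); i++) {
		if (!(status & conds[i].bit))
			continue;
		/* Separate entries; strncat always leaves room for the NUL. */
		if (nr > 0)
			strncat(buf, ", ", buflen - strlen(buf) - 1);
		strncat(buf, conds[i].text, buflen - strlen(buf) - 1);
		nr++;
	}
	if (nr == 0)
		strncat(buf, "no errors", buflen - 1);
	return nr;
}
```

For example, a status of 5 (1 + 4) would decode as both "errors left
uncorrected" and "operational error".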



* [PATCH 02/27] xfs_scrub: common error handling
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
  2017-11-17 21:00 ` [PATCH 01/27] xfs_scrub: create online filesystem scrub program Darrick J. Wong
@ 2017-11-17 21:00 ` Darrick J. Wong
  2017-11-17 21:00 ` [PATCH 03/27] xfs_scrub: set up command line argument parsing Darrick J. Wong
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:00 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Standardize how we record and report errors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.c    |  141 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/common.h    |   28 +++++++++++
 scrub/xfs_scrub.c |    8 +++
 scrub/xfs_scrub.h |   12 +++++
 4 files changed, 189 insertions(+)


diff --git a/scrub/common.c b/scrub/common.c
index 57ad182..d739169 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -17,4 +17,145 @@
  * along with this program; if not, write the Free Software Foundation,
  * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
  */
+#include <stdio.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include "platform_defs.h"
+#include "xfs.h"
+#include "xfs_scrub.h"
 #include "common.h"
+
+/*
+ * Reporting Status to the Console
+ *
+ * We aim for a roughly standard reporting format -- the severity of the
+ * status being reported, a textual description of the object being
+ * reported, and whatever the status happens to be.
+ *
+ * Errors are the most severe and reflect filesystem corruption.
+ * Warnings indicate that something is amiss and needs the attention of
+ * the administrator, but does not constitute a corruption.  Information
+ * is merely advisory.
+ */
+
+/* Too many errors? Bail out. */
+bool
+xfs_scrub_excessive_errors(
+	struct scrub_ctx	*ctx)
+{
+	bool			ret;
+
+	pthread_mutex_lock(&ctx->lock);
+	ret = ctx->max_errors > 0 && ctx->errors_found >= ctx->max_errors;
+	pthread_mutex_unlock(&ctx->lock);
+
+	return ret;
+}
+
+/* Print an error string and whatever error is stored in errno. */
+void
+__str_errno(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*file,
+	int			line)
+{
+	char			buf[DESCR_BUFSZ];
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, _("Error: %s: %s."), descr,
+			strerror_r(errno, buf, DESCR_BUFSZ));
+	if (debug)
+		fprintf(stderr, _(" (%s line %d)"), file, line);
+	fprintf(stderr, "\n");
+	ctx->runtime_errors++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Print an error string and some error text. */
+void
+__str_error(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, _("Error: %s: "), descr);
+	va_start(args, format);
+	vfprintf(stderr, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stderr, _(" (%s line %d)"), file, line);
+	fprintf(stderr, "\n");
+	ctx->errors_found++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Print a warning string and some warning text. */
+void
+__str_warn(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, _("Warning: %s: "), descr);
+	va_start(args, format);
+	vfprintf(stderr, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stderr, _(" (%s line %d)"), file, line);
+	fprintf(stderr, "\n");
+	ctx->warnings_found++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Print an informational string and some informational text. */
+void
+__str_info(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*file,
+	int			line,
+	const char		*format,
+	...)
+{
+	va_list			args;
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stdout, _("Info: %s: "), descr);
+	va_start(args, format);
+	vfprintf(stdout, format, args);
+	va_end(args);
+	if (debug)
+		fprintf(stdout, _(" (%s line %d)"), file, line);
+	fprintf(stdout, "\n");
+	fflush(stdout);
+	pthread_mutex_unlock(&ctx->lock);
+}
+
+/* Catch fatal errors from pieces we import from xfs_repair. */
+void __attribute__((noreturn))
+do_error(char const *msg, ...)
+{
+	va_list args;
+
+	fprintf(stderr, _("\nfatal error -- "));
+
+	va_start(args, msg);
+	vfprintf(stderr, msg, args);
+	va_end(args);
+	if (dumpcore)
+		abort();
+	exit(1);
+}
diff --git a/scrub/common.h b/scrub/common.h
index f29e4d3..b7c5f47 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -20,4 +20,32 @@
 #ifndef XFS_SCRUB_COMMON_H_
 #define XFS_SCRUB_COMMON_H_
 
+/*
+ * When reporting a defective metadata object to the console, this
+ * is the size of the buffer to use to store the description of that
+ * item.
+ */
+#define DESCR_BUFSZ	256
+
+bool xfs_scrub_excessive_errors(struct scrub_ctx *ctx);
+
+void __str_errno(struct scrub_ctx *ctx, const char *descr, const char *file,
+		 int line);
+void __str_error(struct scrub_ctx *ctx, const char *descr, const char *file,
+		 int line, const char *format, ...);
+void __str_warn(struct scrub_ctx *ctx, const char *descr, const char *file,
+		int line, const char *format, ...);
+void __str_info(struct scrub_ctx *ctx, const char *descr, const char *file,
+		int line, const char *format, ...);
+void __record_repair(struct scrub_ctx *ctx, const char *descr, const char *file,
+		int line, const char *format, ...);
+void __record_preen(struct scrub_ctx *ctx, const char *descr, const char *file,
+		int line, const char *format, ...);
+
+#define str_errno(ctx, str)		__str_errno(ctx, str, __FILE__, __LINE__)
+#define str_error(ctx, str, ...)	__str_error(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define str_warn(ctx, str, ...)		__str_warn(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define str_info(ctx, str, ...)		__str_info(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define dbg_printf(fmt, ...)		{if (debug > 1) {printf(fmt, __VA_ARGS__);}}
+
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 6344428..d32f26c 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -18,6 +18,8 @@
  * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
  */
 #include <stdio.h>
+#include <pthread.h>
+#include <stdbool.h>
 #include "xfs_scrub.h"
 
 /*
@@ -99,6 +101,12 @@
 /* Program name; needed for libxcmd error reports. */
 char				*progname = "xfs_scrub";
 
+/* Debug level; higher values mean more verbosity. */
+unsigned int			debug;
+
+/* Should we dump core if errors happen? */
+bool				dumpcore;
+
 int
 main(
 	int			argc,
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index f1ce4d2..6c03167 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -20,4 +20,16 @@
 #ifndef XFS_SCRUB_XFS_SCRUB_H_
 #define XFS_SCRUB_XFS_SCRUB_H_
 
+extern unsigned int		debug;
+extern bool			dumpcore;
+
+struct scrub_ctx {
+	/* Mutable scrub state; use lock. */
+	pthread_mutex_t		lock;
+	unsigned long long	max_errors;
+	unsigned long long	runtime_errors;
+	unsigned long long	errors_found;
+	unsigned long long	warnings_found;
+};
+
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */
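
To illustrate the counting scheme in this patch: every reporting helper
bumps a counter while holding ctx->lock, and xfs_scrub_excessive_errors()
compares errors_found against max_errors under the same lock, with a limit
of zero meaning "never abort".  The following is a cut-down sketch of that
pattern only.  struct demo_ctx and the demo_* functions are hypothetical
stand-ins written for this note; they are not the code from the patch and
they omit the console output.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct scrub_ctx; only the fields used in this sketch. */
struct demo_ctx {
	pthread_mutex_t		lock;
	unsigned long long	max_errors;	/* 0 means no limit */
	unsigned long long	errors_found;
};

static void
demo_ctx_init(
	struct demo_ctx		*ctx,
	unsigned long long	max_errors)
{
	assert(ctx != NULL);
	pthread_mutex_init(&ctx->lock, NULL);
	ctx->max_errors = max_errors;
	ctx->errors_found = 0;
}

/* Count one corruption report; the real helper prints before counting. */
static void
demo_record_error(
	struct demo_ctx		*ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->errors_found++;
	pthread_mutex_unlock(&ctx->lock);
}

/* Has the error count reached the limit?  A limit of zero never trips. */
static bool
demo_excessive_errors(
	struct demo_ctx		*ctx)
{
	bool			ret;

	pthread_mutex_lock(&ctx->lock);
	ret = ctx->max_errors > 0 && ctx->errors_found >= ctx->max_errors;
	pthread_mutex_unlock(&ctx->lock);

	return ret;
}
```

Taking the lock for the read as well as the write keeps the comparison
consistent when several scrub threads report errors at once.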



* [PATCH 03/27] xfs_scrub: set up command line argument parsing
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
  2017-11-17 21:00 ` [PATCH 01/27] xfs_scrub: create online filesystem scrub program Darrick J. Wong
  2017-11-17 21:00 ` [PATCH 02/27] xfs_scrub: common error handling Darrick J. Wong
@ 2017-11-17 21:00 ` Darrick J. Wong
  2017-11-17 21:00 ` [PATCH 04/27] xfs_scrub: dispatch the various phases of the scrub program Darrick J. Wong
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:00 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Parse command line options in order to set up the context in which we
will scrub the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.h    |    8 ++
 scrub/xfs_scrub.c |  207 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.h |   34 +++++++++
 3 files changed, 249 insertions(+)


diff --git a/scrub/common.h b/scrub/common.h
index b7c5f47..b601680 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -48,4 +48,12 @@ void __record_preen(struct scrub_ctx *ctx, const char *descr, const char *file,
 #define str_info(ctx, str, ...)		__str_info(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
 #define dbg_printf(fmt, ...)		{if (debug > 1) {printf(fmt, __VA_ARGS__);}}
 
+/* Is this debug tweak enabled? */
+static inline bool
+debug_tweak_on(
+	const char		*name)
+{
+	return debug && getenv(name) != NULL;
+}
+
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index d32f26c..efaccad 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -20,7 +20,12 @@
 #include <stdio.h>
 #include <pthread.h>
 #include <stdbool.h>
+#include <stdlib.h>
+#include "platform_defs.h"
+#include "xfs.h"
+#include "input.h"
 #include "xfs_scrub.h"
+#include "common.h"
 
 /*
  * XFS Online Metadata Scrub (and Repair)
@@ -107,11 +112,213 @@ unsigned int			debug;
 /* Should we dump core if errors happen? */
 bool				dumpcore;
 
+/* Display resource usage at the end of each phase? */
+bool				display_rusage;
+
+/* Background mode; higher values insert more pauses between scrub calls. */
+unsigned int			bg_mode;
+
+/* Maximum number of processors available to us. */
+int				nproc;
+
+/* Number of threads we're allowed to use. */
+unsigned int			nr_threads;
+
+/* Verbosity; higher values print more information. */
+bool				verbose;
+
+/* Should we scrub the data blocks? */
+bool				scrub_data;
+
+/* Size of a memory page. */
+long				page_size;
+
+static void __attribute__((noreturn))
+usage(void)
+{
+	fprintf(stderr, _("Usage: %s [OPTIONS] mountpoint\n"), progname);
+	fprintf(stderr, _("-a:\tStop after this many errors are found.\n"));
+	fprintf(stderr, _("-b:\tBackground mode.\n"));
+	fprintf(stderr, _("-e:\tWhat to do if errors are found.\n"));
+	fprintf(stderr, _("-m:\tPath to /etc/mtab.\n"));
+	fprintf(stderr, _("-n:\tDry run.  Do not modify anything.\n"));
+	fprintf(stderr, _("-T:\tDisplay timing/usage information.\n"));
+	fprintf(stderr, _("-v:\tVerbose output.\n"));
+	fprintf(stderr, _("-V:\tPrint version.\n"));
+	fprintf(stderr, _("-x:\tScrub file data too.\n"));
+	fprintf(stderr, _("-y:\tRepair all errors.\n"));
+
+	exit(16);
+}
+
 int
 main(
 	int			argc,
 	char			**argv)
 {
+	int			c;
+	char			*mtab = NULL;
+	char			*repairstr = "";
+	struct scrub_ctx	ctx = {0};
+	unsigned long long	total_errors;
+	bool			moveon = true;
+	static bool		injected;
+	int			ret = 0;
+
 	fprintf(stderr, "XXX: This program is not complete!\n");
 	return 4;
+
+	progname = basename(argv[0]);
+	setlocale(LC_ALL, "");
+	bindtextdomain(PACKAGE, LOCALEDIR);
+	textdomain(PACKAGE);
+
+	pthread_mutex_init(&ctx.lock, NULL);
+	ctx.mode = SCRUB_MODE_DEFAULT;
+	ctx.error_action = ERRORS_CONTINUE;
+	while ((c = getopt(argc, argv, "a:bde:m:nTvxVy")) != EOF) {
+		switch (c) {
+		case 'a':
+			ctx.max_errors = cvt_u64(optarg, 10);
+			if (errno) {
+				perror(optarg);
+				usage();
+			}
+			break;
+		case 'b':
+			nr_threads = 1;
+			bg_mode++;
+			break;
+		case 'd':
+			debug++;
+			dumpcore = true;
+			break;
+		case 'e':
+			if (!strcmp("continue", optarg))
+				ctx.error_action = ERRORS_CONTINUE;
+			else if (!strcmp("shutdown", optarg))
+				ctx.error_action = ERRORS_SHUTDOWN;
+			else
+				usage();
+			break;
+		case 'm':
+			mtab = optarg;
+			break;
+		case 'n':
+			if (ctx.mode != SCRUB_MODE_DEFAULT) {
+				fprintf(stderr,
+_("Only one of the options -n or -y may be specified.\n"));
+				return 1;
+			}
+			ctx.mode = SCRUB_MODE_DRY_RUN;
+			break;
+		case 'T':
+			display_rusage = true;
+			break;
+		case 'v':
+			verbose = true;
+			break;
+		case 'V':
+			fprintf(stdout, _("%s version %s\n"), progname,
+					VERSION);
+			fflush(stdout);
+			exit(0);
+		case 'x':
+			scrub_data = true;
+			break;
+		case 'y':
+			if (ctx.mode != SCRUB_MODE_DEFAULT) {
+				fprintf(stderr,
+_("Only one of the options -n or -y may be specified.\n"));
+				return 1;
+			}
+			ctx.mode = SCRUB_MODE_REPAIR;
+			break;
+		case '?':
+			/* fall through */
+		default:
+			usage();
+		}
+	}
+
+	/* Override thread count if debugging */
+	if (debug_tweak_on("XFS_SCRUB_THREADS")) {
+		unsigned int	x;
+
+		x = cvt_u32(getenv("XFS_SCRUB_THREADS"), 10);
+		if (errno) {
+			perror("nr_threads");
+			usage();
+		}
+		nr_threads = x;
+	}
+
+	if (optind != argc - 1)
+		usage();
+
+	ctx.mntpoint = strdup(argv[optind]);
+
+	/*
+	 * If the user did not specify an explicit mount table, try to use
+	 * /proc/mounts if it is available, else /etc/mtab.  We prefer
+	 * /proc/mounts because it is kernel controlled, while /etc/mtab
+	 * may contain garbage that userspace tools like pam_mounts wrote
+	 * into it.
+	 */
+	if (!mtab) {
+		if (access(_PATH_PROC_MOUNTS, R_OK) == 0)
+			mtab = _PATH_PROC_MOUNTS;
+		else
+			mtab = _PATH_MOUNTED;
+	}
+
+	/* How many CPUs? */
+	nproc = sysconf(_SC_NPROCESSORS_ONLN);
+	if (nproc < 0)
+		nproc = 1;
+
+	/* Set up a page-aligned buffer for read verification. */
+	page_size = sysconf(_SC_PAGESIZE);
+	if (page_size < 0) {
+		str_errno(&ctx, ctx.mntpoint);
+		goto out;
+	}
+
+	if (debug_tweak_on("XFS_SCRUB_FORCE_REPAIR") && !injected) {
+		ctx.mode = SCRUB_MODE_REPAIR;
+		injected = true;
+	}
+
+	if (xfs_scrub_excessive_errors(&ctx))
+		str_info(&ctx, ctx.mntpoint, _("Too many errors; aborting."));
+
+	if (debug_tweak_on("XFS_SCRUB_FORCE_ERROR"))
+		str_error(&ctx, ctx.mntpoint, _("Injecting error."));
+
+out:
+	total_errors = ctx.errors_found + ctx.runtime_errors;
+	if (ctx.need_repair)
+		repairstr = _("  Unmount and run xfs_repair.");
+	if (total_errors && ctx.warnings_found)
+		fprintf(stderr,
+_("%s: %llu errors and %llu warnings found.%s\n"),
+			ctx.mntpoint, total_errors, ctx.warnings_found,
+			repairstr);
+	else if (total_errors && ctx.warnings_found == 0)
+		fprintf(stderr,
+_("%s: %llu errors found.%s\n"),
+			ctx.mntpoint, total_errors, repairstr);
+	else if (total_errors == 0 && ctx.warnings_found)
+		fprintf(stderr,
+_("%s: %llu warnings found.\n"),
+			ctx.mntpoint, ctx.warnings_found);
+	if (ctx.errors_found)
+		ret |= 1;
+	if (ctx.warnings_found)
+		ret |= 2;
+	if (ctx.runtime_errors)
+		ret |= 4;
+	free(ctx.mntpoint);
+
+	return ret;
 }
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 6c03167..3bc2b63 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -20,16 +20,50 @@
 #ifndef XFS_SCRUB_XFS_SCRUB_H_
 #define XFS_SCRUB_XFS_SCRUB_H_
 
+#define _PATH_PROC_MOUNTS	"/proc/mounts"
+
+extern unsigned int		nr_threads;
+extern unsigned int		bg_mode;
 extern unsigned int		debug;
+extern int			nproc;
+extern bool			display_rusage;
 extern bool			dumpcore;
+extern bool			verbose;
+extern bool			scrub_data;
+extern long			page_size;
+
+enum scrub_mode {
+	SCRUB_MODE_DRY_RUN,
+	SCRUB_MODE_PREEN,
+	SCRUB_MODE_REPAIR,
+};
+#define SCRUB_MODE_DEFAULT			SCRUB_MODE_PREEN
+
+enum error_action {
+	ERRORS_CONTINUE,
+	ERRORS_SHUTDOWN,
+};
 
 struct scrub_ctx {
+	/* Immutable scrub state. */
+
+	/* Strings we need for presentation */
+	char			*mntpoint;
+	char			*blkdev;
+
+	/* What does the user want us to do? */
+	enum scrub_mode		mode;
+
+	/* How does the user want us to react to errors? */
+	enum error_action	error_action;
+
 	/* Mutable scrub state; use lock. */
 	pthread_mutex_t		lock;
 	unsigned long long	max_errors;
 	unsigned long long	runtime_errors;
 	unsigned long long	errors_found;
 	unsigned long long	warnings_found;
+	bool			need_repair;
 };
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */
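
Note for anyone scripting around the program: the tail of main() above
builds the exit status as a bitmask.  The following is a minimal sketch
of that composition, not code from the patch.  The SCRUB_EXIT_* names
and both helper functions are hypothetical; only the bit values 1, 2
and 4 come from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the patch uses the bare values 1, 2 and 4. */
enum {
	SCRUB_EXIT_ERRORS	= 1,	/* ctx.errors_found != 0 */
	SCRUB_EXIT_WARNINGS	= 2,	/* ctx.warnings_found != 0 */
	SCRUB_EXIT_RUNTIME	= 4,	/* ctx.runtime_errors != 0 */
};

/* Build the exit status the same way the tail of main() does. */
static int
scrub_exit_status(
	unsigned long long	errors_found,
	unsigned long long	warnings_found,
	unsigned long long	runtime_errors)
{
	int			ret = 0;

	if (errors_found)
		ret |= SCRUB_EXIT_ERRORS;
	if (warnings_found)
		ret |= SCRUB_EXIT_WARNINGS;
	if (runtime_errors)
		ret |= SCRUB_EXIT_RUNTIME;
	return ret;
}

/* Did scrub report filesystem errors, as opposed to failing to run? */
static bool
scrub_status_has_corruption(
	int			status)
{
	assert(status >= 0);
	return (status & SCRUB_EXIT_ERRORS) != 0;
}
```

A caller that only cares about filesystem damage would test bit 1 and
treat bit 4 as "the scrub itself had trouble running".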



* [PATCH 04/27] xfs_scrub: dispatch the various phases of the scrub program
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (2 preceding siblings ...)
  2017-11-17 21:00 ` [PATCH 03/27] xfs_scrub: set up command line argument parsing Darrick J. Wong
@ 2017-11-17 21:00 ` Darrick J. Wong
  2017-11-17 21:00 ` [PATCH 05/27] xfs_scrub: figure out how many threads we're going to need Darrick J. Wong
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:00 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create the dispatching routines that we'll use to call out to each
separate phase of the program.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
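Illustration only, not part of the patch: a minimal sketch of the
table-driven dispatch idea described above.  All demo_* names are
hypothetical.  Unlike run_scrub_phases() in this patch, which stops at
the first entry without a function pointer, this sketch assumes the
table ends at a NULL description and simply skips entries that have no
function yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct scrub_ctx. */
struct demo_ctx {
	unsigned int	phases_run;	/* phases executed so far */
	unsigned int	fail_at;	/* fail on the Nth executed phase; 0 = never */
};

struct demo_phase {
	const char	*descr;
	bool		(*fn)(struct demo_ctx *);
};

/* A phase body: count the call, report failure when asked to. */
static bool
demo_phase_fn(
	struct demo_ctx		*ctx)
{
	ctx->phases_run++;
	return ctx->phases_run != ctx->fail_at;
}

/*
 * Walk the table until the NULL-descr sentinel.  Entries with no fn are
 * skipped; the first phase that returns false stops the run.  Returns
 * the "moveon" state.
 */
static bool
demo_run_phases(
	struct demo_ctx		*ctx,
	const struct demo_phase	*phases)
{
	const struct demo_phase	*sp;

	assert(ctx != NULL && phases != NULL);
	for (sp = phases; sp->descr; sp++) {
		if (!sp->fn)
			continue;
		if (!sp->fn(ctx))
			return false;
	}
	return true;
}

static const struct demo_phase demo_phases[] = {
	{ "Find filesystem geometry.",	demo_phase_fn },
	{ "Check internal metadata.",	NULL },
	{ "Scan all inodes.",		demo_phase_fn },
	{ NULL,				NULL },
};
```

Keeping the phases in a table means later patches only have to fill in
function pointers; the dispatch loop itself does not change.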
 configure.ac          |    1 
 include/builddefs.in  |    1 
 m4/package_libcdev.m4 |   18 +++
 scrub/Makefile        |    4 +
 scrub/common.c        |   63 +++++++++++
 scrub/common.h        |    4 +
 scrub/xfs_scrub.c     |  275 +++++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 366 insertions(+)


diff --git a/configure.ac b/configure.ac
index cb23fb8..960a40a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -162,6 +162,7 @@ AC_HAVE_FSETXATTR
 AC_HAVE_MREMAP
 AC_NEED_INTERNAL_FSXATTR
 AC_HAVE_GETFSMAP
+AC_HAVE_MALLINFO
 
 if test "$enable_blkid" = yes; then
 AC_HAVE_BLKID_TOPO
diff --git a/include/builddefs.in b/include/builddefs.in
index 2d7b199..4be2efb 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -115,6 +115,7 @@ HAVE_FSETXATTR = @have_fsetxattr@
 HAVE_MREMAP = @have_mremap@
 NEED_INTERNAL_FSXATTR = @need_internal_fsxattr@
 HAVE_GETFSMAP = @have_getfsmap@
+HAVE_MALLINFO = @have_mallinfo@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4
index fdf9d69..91e1959 100644
--- a/m4/package_libcdev.m4
+++ b/m4/package_libcdev.m4
@@ -314,3 +314,21 @@ AC_DEFUN([AC_HAVE_GETFSMAP],
        AC_MSG_RESULT(no))
     AC_SUBST(have_getfsmap)
   ])
+
+#
+# Check if we have a mallinfo libc call
+#
+AC_DEFUN([AC_HAVE_MALLINFO],
+  [ AC_MSG_CHECKING([for mallinfo ])
+    AC_TRY_COMPILE([
+#include <malloc.h>
+    ], [
+         struct mallinfo test;
+
+         test.arena = 0; test.hblkhd = 0; test.uordblks = 0; test.fordblks = 0;
+         test = mallinfo();
+    ], have_mallinfo=yes
+       AC_MSG_RESULT(yes),
+       AC_MSG_RESULT(no))
+    AC_SUBST(have_mallinfo)
+  ])
diff --git a/scrub/Makefile b/scrub/Makefile
index 204b0a1..ac0af94 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -27,6 +27,10 @@ LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
 LTDEPENDENCIES += $(LIBHANDLE) $(LIBFROG)
 LLDFLAGS = -static
 
+ifeq ($(HAVE_MALLINFO),yes)
+LCFLAGS += -DHAVE_MALLINFO
+endif
+
 default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
diff --git a/scrub/common.c b/scrub/common.c
index d739169..649b1c9 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -159,3 +159,66 @@ do_error(char const *msg, ...)
 		abort();
 	exit(1);
 }
+
+double
+timeval_subtract(
+	struct timeval		*tv1,
+	struct timeval		*tv2)
+{
+	return ((tv1->tv_sec - tv2->tv_sec) +
+		((double) (tv1->tv_usec - tv2->tv_usec)) / 1000000);
+}
+
+/* Produce human readable disk space output. */
+double
+auto_space_units(
+	unsigned long long	bytes,
+	char			**units)
+{
+	if (debug > 1)
+		goto no_prefix;
+	if (bytes > (1ULL << 40)) {
+		*units = "TiB";
+		return (double)bytes / (1ULL << 40);
+	} else if (bytes > (1ULL << 30)) {
+		*units = "GiB";
+		return (double)bytes / (1ULL << 30);
+	} else if (bytes > (1ULL << 20)) {
+		*units = "MiB";
+		return (double)bytes / (1ULL << 20);
+	} else if (bytes > (1ULL << 10)) {
+		*units = "KiB";
+		return (double)bytes / (1ULL << 10);
+	}
+
+no_prefix:
+	*units = "B";
+	return bytes;
+}
+
+/* Produce human readable discrete number output. */
+double
+auto_units(
+	unsigned long long	number,
+	char			**units)
+{
+	if (debug > 1)
+		goto no_prefix;
+	if (number > 1000000000000ULL) {
+		*units = "T";
+		return number / 1000000000000.0;
+	} else if (number > 1000000000ULL) {
+		*units = "G";
+		return number / 1000000000.0;
+	} else if (number > 1000000ULL) {
+		*units = "M";
+		return number / 1000000.0;
+	} else if (number > 1000ULL) {
+		*units = "K";
+		return number / 1000.0;
+	}
+
+no_prefix:
+	*units = "";
+	return number;
+}
diff --git a/scrub/common.h b/scrub/common.h
index b601680..70a3b9d 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -56,4 +56,8 @@ debug_tweak_on(
 	return debug && getenv(name) != NULL;
 }
 
+double timeval_subtract(struct timeval *tv1, struct timeval *tv2);
+double auto_space_units(unsigned long long kilobytes, char **units);
+double auto_units(unsigned long long number, char **units);
+
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index efaccad..5466c58 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -21,6 +21,8 @@
 #include <pthread.h>
 #include <stdbool.h>
 #include <stdlib.h>
+#include <sys/time.h>
+#include <sys/resource.h>
 #include "platform_defs.h"
 #include "xfs.h"
 #include "input.h"
@@ -151,6 +153,267 @@ usage(void)
 	exit(16);
 }
 
+#ifndef RUSAGE_BOTH
+# define RUSAGE_BOTH		(-2)
+#endif
+
+/* Get resource usage for ourselves and all children. */
+static int
+scrub_getrusage(
+	struct rusage		*usage)
+{
+	struct rusage		cusage;
+	int			err;
+
+	err = getrusage(RUSAGE_BOTH, usage);
+	if (!err)
+		return err;
+
+	err = getrusage(RUSAGE_SELF, usage);
+	if (err)
+		return err;
+
+	err = getrusage(RUSAGE_CHILDREN, &cusage);
+	if (err)
+		return err;
+
+	usage->ru_minflt += cusage.ru_minflt;
+	usage->ru_majflt += cusage.ru_majflt;
+	usage->ru_nswap += cusage.ru_nswap;
+	usage->ru_inblock += cusage.ru_inblock;
+	usage->ru_oublock += cusage.ru_oublock;
+	usage->ru_msgsnd += cusage.ru_msgsnd;
+	usage->ru_msgrcv += cusage.ru_msgrcv;
+	usage->ru_nsignals += cusage.ru_nsignals;
+	usage->ru_nvcsw += cusage.ru_nvcsw;
+	usage->ru_nivcsw += cusage.ru_nivcsw;
+	return 0;
+}
+
+/*
+ * Scrub Phase Dispatch
+ *
+ * The operations of the scrub program are split up into several
+ * different phases.  Each phase builds upon the metadata checked in the
+ * previous phase, which is to say that we may skip phase (X + 1) if our
+ * scans in phase (X) reveal corruption.  A phase may be skipped
+ * entirely.
+ */
+
+/* Resource usage for each phase. */
+struct phase_rusage {
+	struct rusage		ruse;
+	struct timeval		time;
+	unsigned long long	verified_bytes;
+	void			*brk_start;
+	const char		*descr;
+};
+
+/* Operations for each phase. */
+#define DATASCAN_DUMMY_FN	((void *)1)
+#define REPAIR_DUMMY_FN		((void *)2)
+struct phase_ops {
+	char		*descr;
+	bool		(*fn)(struct scrub_ctx *);
+	bool		must_run;
+};
+
+/* Start tracking resource usage for a phase. */
+static bool
+phase_start(
+	struct phase_rusage	*pi,
+	unsigned int		phase,
+	const char		*descr)
+{
+	int			error;
+
+	memset(pi, 0, sizeof(*pi));
+	error = scrub_getrusage(&pi->ruse);
+	if (error) {
+		perror(_("getrusage"));
+		return false;
+	}
+	pi->brk_start = sbrk(0);
+
+	error = gettimeofday(&pi->time, NULL);
+	if (error) {
+		perror(_("gettimeofday"));
+		return false;
+	}
+
+	pi->descr = descr;
+	if ((verbose || display_rusage) && descr) {
+		fprintf(stdout, _("Phase %u: %s\n"), phase, descr);
+		fflush(stdout);
+	}
+	return true;
+}
+
+/* Report usage stats. */
+static bool
+phase_end(
+	struct phase_rusage	*pi,
+	unsigned int		phase)
+{
+	struct rusage		ruse_now;
+#ifdef HAVE_MALLINFO
+	struct mallinfo		mall_now;
+#endif
+	struct timeval		time_now;
+	char			phasebuf[DESCR_BUFSZ];
+	double			dt;
+	unsigned long long	in, out;
+	unsigned long long	io;
+	double			i, o, t;
+	double			din, dout, dtot;
+	char			*iu, *ou, *tu, *dinu, *doutu, *dtotu;
+	int			error;
+
+	if (!display_rusage)
+		return true;
+
+	error = gettimeofday(&time_now, NULL);
+	if (error) {
+		perror(_("gettimeofday"));
+		return false;
+	}
+	dt = timeval_subtract(&time_now, &pi->time);
+
+	error = scrub_getrusage(&ruse_now);
+	if (error) {
+		perror(_("getrusage"));
+		return false;
+	}
+
+	if (phase)
+		snprintf(phasebuf, DESCR_BUFSZ, _("Phase %u: "), phase);
+	else
+		phasebuf[0] = 0;
+
+#define kbytes(x)	(((unsigned long)(x) + 1023) / 1024)
+#ifdef HAVE_MALLINFO
+
+	mall_now = mallinfo();
+	fprintf(stdout, _("%sMemory used: %luk/%luk (%luk/%luk), "),
+		phasebuf,
+		kbytes(mall_now.arena), kbytes(mall_now.hblkhd),
+		kbytes(mall_now.uordblks), kbytes(mall_now.fordblks));
+#else
+	fprintf(stdout, _("%sMemory used: %luk, "),
+		phasebuf,
+		(unsigned long) kbytes(((char *) sbrk(0)) -
+				       ((char *) pi->brk_start)));
+#endif
+#undef kbytes
+
+	fprintf(stdout, _("time: %5.2f/%5.2f/%5.2fs\n"),
+		timeval_subtract(&time_now, &pi->time),
+		timeval_subtract(&ruse_now.ru_utime, &pi->ruse.ru_utime),
+		timeval_subtract(&ruse_now.ru_stime, &pi->ruse.ru_stime));
+
+	/* I/O usage */
+	in =  ((unsigned long long)ruse_now.ru_inblock -
+			pi->ruse.ru_inblock) << BBSHIFT;
+	out = ((unsigned long long)ruse_now.ru_oublock -
+			pi->ruse.ru_oublock) << BBSHIFT;
+	io = in + out;
+	if (io) {
+		i = auto_space_units(in, &iu);
+		o = auto_space_units(out, &ou);
+		t = auto_space_units(io, &tu);
+		din = auto_space_units(in / dt, &dinu);
+		dout = auto_space_units(out / dt, &doutu);
+		dtot = auto_space_units(io / dt, &dtotu);
+		fprintf(stdout,
+_("%sI/O: %.1f%s in, %.1f%s out, %.1f%s tot\n"),
+			phasebuf, i, iu, o, ou, t, tu);
+		fprintf(stdout,
+_("%sI/O rate: %.1f%s/s in, %.1f%s/s out, %.1f%s/s tot\n"),
+			phasebuf, din, dinu, dout, doutu, dtot, dtotu);
+	}
+	fflush(stdout);
+
+	return true;
+}
+
+/* Run all the phases of the scrubber. */
+static bool
+run_scrub_phases(
+	struct scrub_ctx	*ctx)
+{
+	struct phase_ops phases[] =
+	{
+		{
+			.descr = _("Find filesystem geometry."),
+		},
+		{
+			.descr = _("Check internal metadata."),
+		},
+		{
+			.descr = _("Scan all inodes."),
+		},
+		{
+			.descr = _("Defer filesystem repairs."),
+			.fn = REPAIR_DUMMY_FN,
+		},
+		{
+			.descr = _("Check directory tree."),
+		},
+		{
+			.descr = _("Verify data file integrity."),
+			.fn = DATASCAN_DUMMY_FN,
+		},
+		{
+			.descr = _("Check summary counters."),
+		},
+		{
+			NULL
+		},
+	};
+	struct phase_rusage	pi;
+	struct phase_ops	*sp;
+	bool			moveon = true;
+	unsigned int		debug_phase = 0;
+	unsigned int		phase;
+
+	if (debug && debug_tweak_on("XFS_SCRUB_PHASE"))
+		debug_phase = atoi(getenv("XFS_SCRUB_PHASE"));
+
+	/* Run all phases of the scrub tool. */
+	for (phase = 1, sp = phases; sp->fn; sp++, phase++) {
+		/* Skip certain phases unless they're turned on. */
+		if (sp->fn == REPAIR_DUMMY_FN ||
+		    sp->fn == DATASCAN_DUMMY_FN)
+			continue;
+
+		/* Allow debug users to force a particular phase. */
+		if (debug_phase && phase != debug_phase && !sp->must_run)
+			continue;
+
+		/* Run this phase. */
+		moveon = phase_start(&pi, phase, sp->descr);
+		if (!moveon)
+			break;
+		moveon = sp->fn(ctx);
+		if (!moveon) {
+			str_info(ctx, ctx->mntpoint,
+_("Scrub aborted after phase %u."),
+					phase);
+			break;
+		}
+		moveon = phase_end(&pi, phase);
+		if (!moveon)
+			break;
+
+		/* Too many errors? */
+		moveon = !xfs_scrub_excessive_errors(ctx);
+		if (!moveon)
+			break;
+	}
+
+	return moveon;
+}
+
 int
 main(
 	int			argc,
@@ -160,6 +423,7 @@ main(
 	char			*mtab = NULL;
 	char			*repairstr = "";
 	struct scrub_ctx	ctx = {0};
+	struct phase_rusage	all_pi;
 	unsigned long long	total_errors;
 	bool			moveon = true;
 	static bool		injected;
@@ -272,6 +536,11 @@ _("Only one of the options -n or -y may be specified.\n"));
 			mtab = _PATH_MOUNTED;
 	}
 
+	/* Initialize overall phase stats. */
+	moveon = phase_start(&all_pi, 0, NULL);
+	if (!moveon)
+		goto out;
+
 	/* How many CPUs? */
 	nproc = sysconf(_SC_NPROCESSORS_ONLN);
 	if (nproc < 0)
@@ -289,6 +558,11 @@ _("Only one of the options -n or -y may be specified.\n"));
 		injected = true;
 	}
 
+	/* Scrub a filesystem. */
+	moveon = run_scrub_phases(&ctx);
+	if (!moveon)
+		ret |= 4;
+
 	if (xfs_scrub_excessive_errors(&ctx))
 		str_info(&ctx, ctx.mntpoint, _("Too many errors; aborting."));
 
@@ -318,6 +592,7 @@ _("%s: %llu warnings found.\n"),
 		ret |= 2;
 	if (ctx.runtime_errors)
 		ret |= 4;
+	phase_end(&all_pi, 0);
 	free(ctx.mntpoint);
 
 	return ret;
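
Illustration only, not part of the patch: a table-driven sketch of the
binary-prefix scaling that auto_space_units() performs.  The name
demo_space_units and the lookup table are hypothetical, and the debug
override is left out.  The strict ">" comparison is copied from the
patch, so a value sitting exactly on a boundary stays in the smaller
unit.

```c
#include <assert.h>
#include <stddef.h>

/* Scale a byte count to the largest binary unit it strictly exceeds. */
static double
demo_space_units(
	unsigned long long	bytes,
	const char		**units)
{
	static const struct {
		unsigned int	shift;
		const char	*name;
	} tbl[] = {
		{ 40, "TiB" }, { 30, "GiB" }, { 20, "MiB" }, { 10, "KiB" },
	};
	size_t			i;

	assert(units != NULL);
	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (bytes > (1ULL << tbl[i].shift)) {
			*units = tbl[i].name;
			return (double)bytes / (double)(1ULL << tbl[i].shift);
		}
	}

	/* Too small for any prefix. */
	*units = "B";
	return (double)bytes;
}
```

With this comparison, 2048 bytes is reported in KiB while exactly 1024
bytes is still reported in plain bytes.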



* [PATCH 05/27] xfs_scrub: figure out how many threads we're going to need
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (3 preceding siblings ...)
  2017-11-17 21:00 ` [PATCH 04/27] xfs_scrub: dispatch the various phases of the scrub program Darrick J. Wong
@ 2017-11-17 21:00 ` Darrick J. Wong
  2017-11-17 21:00 ` [PATCH 06/27] xfs_scrub: create an abstraction for a block device Darrick J. Wong
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:00 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create the plumbing to figure out how many threads we're going to want
to do all of our scrubbing.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
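Illustration only, not part of the patch: a minimal sketch of the
thread-count selection described above, written as pure functions so it
can be read in isolation.  The demo_* names and the explicit
override/detected parameters are hypothetical; the patch reads the
global nr_threads and ctx->nr_io_threads instead.

```c
#include <assert.h>

/* A nonzero user override wins; otherwise use the detected count. */
static unsigned int
demo_nproc(
	unsigned int		override,
	unsigned int		detected)
{
	assert(detected > 0);
	return override ? override : detected;
}

/*
 * Thread count for a workqueue.  A single worker is reported as 0,
 * meaning "run the work in the calling thread".
 */
static unsigned int
demo_nproc_workqueue(
	unsigned int		override,
	unsigned int		detected)
{
	unsigned int		x = demo_nproc(override, detected);

	return x == 1 ? 0 : x;
}
```

Mapping one thread to zero avoids spawning a worker thread when there
is nothing to run in parallel.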
 scrub/common.c    |   26 ++++++++++++++++++++++++++
 scrub/common.h    |    2 ++
 scrub/xfs_scrub.h |    3 +++
 3 files changed, 31 insertions(+)


diff --git a/scrub/common.c b/scrub/common.c
index 649b1c9..3df6126 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -222,3 +222,29 @@ auto_units(
 	*units = "";
 	return number;
 }
+
+/* How many threads to kick off? */
+unsigned int
+scrub_nproc(
+	struct scrub_ctx	*ctx)
+{
+	if (nr_threads)
+		return nr_threads;
+	return ctx->nr_io_threads;
+}
+
+/*
+ * How many threads to kick off for a workqueue?  If we only want one
+ * thread, save ourselves the overhead and just run it in the main thread.
+ */
+unsigned int
+scrub_nproc_workqueue(
+	struct scrub_ctx	*ctx)
+{
+	unsigned int		x;
+
+	x = scrub_nproc(ctx);
+	if (x == 1)
+		x = 0;
+	return x;
+}
diff --git a/scrub/common.h b/scrub/common.h
index 70a3b9d..145a05c 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -59,5 +59,7 @@ debug_tweak_on(
 double timeval_subtract(struct timeval *tv1, struct timeval *tv2);
 double auto_space_units(unsigned long long kilobytes, char **units);
 double auto_units(unsigned long long number, char **units);
+unsigned int scrub_nproc(struct scrub_ctx *ctx);
+unsigned int scrub_nproc_workqueue(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 3bc2b63..8e2fa54 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -57,6 +57,9 @@ struct scrub_ctx {
 	/* How does the user want us to react to errors? */
 	enum error_action	error_action;
 
+	/* Number of threads for metadata scrubbing */
+	unsigned int		nr_io_threads;
+
 	/* Mutable scrub state; use lock. */
 	pthread_mutex_t		lock;
 	unsigned long long	max_errors;



* [PATCH 06/27] xfs_scrub: create an abstraction for a block device
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (4 preceding siblings ...)
  2017-11-17 21:00 ` [PATCH 05/27] xfs_scrub: figure out how many threads we're going to need Darrick J. Wong
@ 2017-11-17 21:00 ` Darrick J. Wong
  2017-11-17 21:00 ` [PATCH 07/27] xfs_scrub: find XFS filesystem geometry Darrick J. Wong
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:00 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create an abstraction to handle all of our low level disk operations.
We'll eventually use it to bind to a fs mount point and block device.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
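Illustration only, not part of the patch: the concurrency heuristic in
__disk_heads() below, restated as a pure function.  The name
demo_disk_heads and its parameters are hypothetical; in the patch the
inputs come from fstat() and the BLKROTATIONAL, BLKIOMIN and BLKIOOPT
ioctls, and a failed BLKROTATIONAL query is treated as "rotational".

```c
#include <assert.h>
#include <stdbool.h>

/* Estimate how many concurrent IOs a device can usefully take. */
static unsigned int
demo_disk_heads(
	bool			is_blockdev,
	bool			rotational,
	int			iomin,
	int			ioopt,
	unsigned int		ncpus)
{
	unsigned int		stripes;

	assert(ncpus > 0);

	/* Regular files and non-rotational devices get every CPU. */
	if (!is_blockdev || !rotational)
		return ncpus;

	/* optimal/minimum IO size hints at the number of member disks. */
	if (iomin > 0 && ioopt > 0) {
		stripes = (unsigned int)(ioopt / iomin);
		if (stripes < 1)
			stripes = 1;
		return stripes < ncpus ? stripes : ncpus;
	}

	/* Rotating device with no topology hints: a guess. */
	return 2;
}
```

For example, a rotational array reporting iomin = 64k and ioopt = 256k
on an 8-CPU machine yields 4.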
 scrub/Makefile |    2 +
 scrub/disk.c   |  164 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/disk.h   |   39 +++++++++++++
 3 files changed, 205 insertions(+)
 create mode 100644 scrub/disk.c
 create mode 100644 scrub/disk.h


diff --git a/scrub/Makefile b/scrub/Makefile
index ac0af94..f810790 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -17,10 +17,12 @@ endif	# scrub_prereqs
 
 HFILES = \
 common.h \
+disk.h \
 xfs_scrub.h
 
 CFILES = \
 common.c \
+disk.c \
 xfs_scrub.c
 
 LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
diff --git a/scrub/disk.c b/scrub/disk.c
new file mode 100644
index 0000000..fe91842
--- /dev/null
+++ b/scrub/disk.c
@@ -0,0 +1,164 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/statvfs.h>
+#include <sys/vfs.h>
+#include <linux/fs.h>
+#include "platform_defs.h"
+#include "libfrog.h"
+#include "xfs_scrub.h"
+#include "disk.h"
+
+/*
+ * Disk Abstraction
+ *
+ * These routines help us to discover the geometry of a block device,
+ * estimate the amount of concurrent IOs that we can send to it, and
+ * abstract the process of performing read verification of disk blocks.
+ */
+
+/* Figure out how many disk heads are available. */
+static unsigned int
+__disk_heads(
+	struct disk		*disk)
+{
+	int			iomin;
+	int			ioopt;
+	unsigned short		rot;
+	int			error;
+
+	/* If it's not a block device, throw all the CPUs at it. */
+	if (!S_ISBLK(disk->d_sb.st_mode))
+		return nproc;
+
+	/* Non-rotational device?  Throw all the CPUs. */
+	rot = 1;
+	error = ioctl(disk->d_fd, BLKROTATIONAL, &rot);
+	if (error == 0 && rot == 0)
+		return nproc;
+
+	/*
+	 * Sometimes we can infer the number of devices from the
+	 * min/optimal IO sizes.
+	 */
+	iomin = ioopt = 0;
+	if (ioctl(disk->d_fd, BLKIOMIN, &iomin) == 0 &&
+	    ioctl(disk->d_fd, BLKIOOPT, &ioopt) == 0 &&
+	    iomin > 0 && ioopt > 0) {
+		return min(nproc, max(1, ioopt / iomin));
+	}
+
+	/* Rotating device?  I guess? */
+	return 2;
+}
+
+/* How many disk heads to use; a nonzero nr_threads overrides the probe. */
+unsigned int
+disk_heads(
+	struct disk		*disk)
+{
+	if (nr_threads)
+		return nr_threads;
+	return __disk_heads(disk);
+}
+
+/* Open a disk device and discover its geometry. */
+struct disk *
+disk_open(
+	const char		*pathname)
+{
+	struct disk		*disk;
+	int			lba_sz;
+	int			error;
+
+	disk = calloc(1, sizeof(struct disk));
+	if (!disk)
+		return NULL;
+
+	disk->d_fd = open(pathname, O_RDONLY | O_DIRECT | O_NOATIME);
+	if (disk->d_fd < 0)
+		goto out_free;
+
+	/* Try to get LBA size. */
+	error = ioctl(disk->d_fd, BLKSSZGET, &lba_sz);
+	if (error)
+		lba_sz = 512;
+	disk->d_lbalog = log2_roundup(lba_sz);
+
+	/* Obtain disk's stat info. */
+	error = fstat(disk->d_fd, &disk->d_sb);
+	if (error)
+		goto out_close;
+
+	/* Determine bdev size, block size, and offset. */
+	if (S_ISBLK(disk->d_sb.st_mode)) {
+		error = ioctl(disk->d_fd, BLKGETSIZE64, &disk->d_size);
+		if (error)
+			disk->d_size = 0;
+		error = ioctl(disk->d_fd, BLKBSZGET, &disk->d_blksize);
+		if (error)
+			disk->d_blksize = 0;
+		disk->d_start = 0;
+	} else {
+		disk->d_size = disk->d_sb.st_size;
+		disk->d_blksize = disk->d_sb.st_blksize;
+		disk->d_start = 0;
+	}
+
+	return disk;
+out_close:
+	close(disk->d_fd);
+out_free:
+	free(disk);
+	return NULL;
+}
+
+/* Close a disk device. */
+int
+disk_close(
+	struct disk		*disk)
+{
+	int			error = 0;
+
+	if (disk->d_fd >= 0)
+		error = close(disk->d_fd);
+	disk->d_fd = -1;
+	free(disk);
+	return error;
+}
+
+/* Read-verify an extent of a disk device. */
+ssize_t
+disk_read_verify(
+	struct disk		*disk,
+	void			*buf,
+	uint64_t		start,
+	uint64_t		length)
+{
+	return pread(disk->d_fd, buf, length, start);
+}
diff --git a/scrub/disk.h b/scrub/disk.h
new file mode 100644
index 0000000..4331300
--- /dev/null
+++ b/scrub/disk.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_DISK_H_
+#define XFS_SCRUB_DISK_H_
+
+struct disk {
+	struct stat	d_sb;
+	int		d_fd;
+	int		d_lbalog;
+	unsigned int	d_flags;
+	unsigned int	d_blksize;	/* bytes */
+	uint64_t	d_size;		/* bytes */
+	uint64_t	d_start;	/* bytes */
+};
+
+unsigned int disk_heads(struct disk *disk);
+struct disk *disk_open(const char *pathname);
+int disk_close(struct disk *disk);
+ssize_t disk_read_verify(struct disk *disk, void *buf, uint64_t start,
+		uint64_t length);
+
+#endif /* XFS_SCRUB_DISK_H_ */



* [PATCH 07/27] xfs_scrub: find XFS filesystem geometry
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (5 preceding siblings ...)
  2017-11-17 21:00 ` [PATCH 06/27] xfs_scrub: create an abstraction for a block device Darrick J. Wong
@ 2017-11-17 21:00 ` Darrick J. Wong
  2017-11-17 21:00 ` [PATCH 08/27] xfs_scrub: add inode iteration functions Darrick J. Wong
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:00 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Discover the geometry of the XFS filesystem that we've been told to
scan, and set up some common functions that will be used by the
scrub phases.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
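Illustration only, not part of the patch: how the log2 geometry values
set up in xfs_setup_fs() relate to each other.  The demo_* names are
hypothetical, and demo_highbit32() assumes that highbit32() returns the
index of the highest set bit (so log2 for a power of two), which is not
shown in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit, or -1 if no bits are set. */
static int
demo_highbit32(
	uint32_t		v)
{
	int			bit = -1;

	while (v) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/* log2(inodes per block) = log2(block size) - log2(inode size). */
static int
demo_inopblog(
	uint32_t		blocksize,
	uint32_t		inodesize)
{
	assert(inodesize > 0 && blocksize >= inodesize);
	return demo_highbit32(blocksize) - demo_highbit32(inodesize);
}
```

With 4096-byte blocks and 512-byte inodes this gives 12 - 9 = 3, that
is, 8 inodes per block.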
 scrub/Makefile    |    5 +
 scrub/common.c    |   72 +++++++++++++++++
 scrub/common.h    |   10 ++
 scrub/disk.c      |    3 +
 scrub/phase1.c    |  223 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.c |   34 ++++++++
 scrub/xfs_scrub.h |   29 +++++++
 7 files changed, 375 insertions(+), 1 deletion(-)
 create mode 100644 scrub/phase1.c


diff --git a/scrub/Makefile b/scrub/Makefile
index f810790..2d2a164 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -23,6 +23,7 @@ xfs_scrub.h
 CFILES = \
 common.c \
 disk.c \
+phase1.c \
 xfs_scrub.c
 
 LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
@@ -33,6 +34,10 @@ ifeq ($(HAVE_MALLINFO),yes)
 LCFLAGS += -DHAVE_MALLINFO
 endif
 
+ifeq ($(HAVE_SYNCFS),yes)
+LCFLAGS += -DHAVE_SYNCFS
+endif
+
 default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
diff --git a/scrub/common.c b/scrub/common.c
index 3df6126..1d1b3e3 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -20,8 +20,11 @@
 #include <stdio.h>
 #include <pthread.h>
 #include <stdbool.h>
+#include <sys/statvfs.h>
 #include "platform_defs.h"
 #include "xfs.h"
+#include "xfs_fs.h"
+#include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
 
@@ -248,3 +251,72 @@ scrub_nproc_workqueue(
 		x = 0;
 	return x;
 }
+
+/*
+ * Check if the argument is either the device name or mountpoint of a mounted
+ * filesystem.
+ */
+#define MNTTYPE_XFS	"xfs"
+static bool
+find_mountpoint_check(
+	struct stat		*sb,
+	struct mntent		*t)
+{
+	struct stat		ms;
+
+	if (S_ISDIR(sb->st_mode)) {		/* mount point */
+		if (stat(t->mnt_dir, &ms) < 0)
+			return false;
+		if (sb->st_ino != ms.st_ino)
+			return false;
+		if (sb->st_dev != ms.st_dev)
+			return false;
+		if (strcmp(t->mnt_type, MNTTYPE_XFS) != 0)
+			return false;
+	} else {				/* device */
+		if (stat(t->mnt_fsname, &ms) < 0)
+			return false;
+		if (sb->st_rdev != ms.st_rdev)
+			return false;
+		if (strcmp(t->mnt_type, MNTTYPE_XFS) != 0)
+			return false;
+		/*
+		 * Make sure the mountpoint given by mtab is accessible
+		 * before using it.
+		 */
+		if (stat(t->mnt_dir, &ms) < 0)
+			return false;
+	}
+
+	return true;
+}
+
+/* Check that our alleged mountpoint is in mtab */
+bool
+find_mountpoint(
+	char			*mtab,
+	struct scrub_ctx	*ctx)
+{
+	struct mntent_cursor	cursor;
+	struct mntent		*t = NULL;
+	bool			found = false;
+
+	if (platform_mntent_open(&cursor, mtab) != 0) {
+		fprintf(stderr, "Error: can't get mntent entries.\n");
+		exit(1);
+	}
+
+	while ((t = platform_mntent_next(&cursor)) != NULL) {
+		/*
+		 * Keep jotting down matching mount details; newer mounts are
+		 * towards the end of the file (hopefully).
+		 */
+		if (find_mountpoint_check(&ctx->mnt_sb, t)) {
+			ctx->mntpoint = strdup(t->mnt_dir);
+			ctx->blkdev = strdup(t->mnt_fsname);
+			found = true;
+		}
+	}
+	platform_mntent_close(&cursor);
+	return found;
+}
diff --git a/scrub/common.h b/scrub/common.h
index 145a05c..ae5da76 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -62,4 +62,14 @@ double auto_units(unsigned long long number, char **units);
 unsigned int scrub_nproc(struct scrub_ctx *ctx);
 unsigned int scrub_nproc_workqueue(struct scrub_ctx *ctx);
 
+#ifndef HAVE_SYNCFS
+static inline int syncfs(int fd)
+{
+	sync();
+	return 0;
+}
+#endif
+
+bool find_mountpoint(char *mtab, struct scrub_ctx *ctx);
+
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/disk.c b/scrub/disk.c
index fe91842..96eaa6a 100644
--- a/scrub/disk.c
+++ b/scrub/disk.c
@@ -31,6 +31,9 @@
 #include <linux/fs.h>
 #include "platform_defs.h"
 #include "libfrog.h"
+#include "xfs.h"
+#include "path.h"
+#include "xfs_fs.h"
 #include "xfs_scrub.h"
 #include "disk.h"
 
diff --git a/scrub/phase1.c b/scrub/phase1.c
new file mode 100644
index 0000000..0a18f85
--- /dev/null
+++ b/scrub/phase1.c
@@ -0,0 +1,223 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <mntent.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <sys/statvfs.h>
+#include <sys/vfs.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <errno.h>
+#include <linux/fs.h>
+#include "libfrog.h"
+#include "workqueue.h"
+#include "input.h"
+#include "path.h"
+#include "handle.h"
+#include "bitops.h"
+#include "xfs_arch.h"
+#include "xfs_format.h"
+#include "avl64.h"
+#include "list.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "disk.h"
+
+/* Phase 1: Find filesystem geometry (and clean up after) */
+
+/* Shut down the filesystem. */
+void
+xfs_shutdown_fs(
+	struct scrub_ctx		*ctx)
+{
+	int				flag;
+
+	flag = XFS_FSOP_GOING_FLAGS_LOGFLUSH;
+	str_info(ctx, ctx->mntpoint, _("Shutting down filesystem!"));
+	if (ioctl(ctx->mnt_fd, XFS_IOC_GOINGDOWN, &flag))
+		str_errno(ctx, ctx->mntpoint);
+}
+
+/* Clean up the XFS-specific state data. */
+bool
+xfs_cleanup_fs(
+	struct scrub_ctx	*ctx)
+{
+	if (ctx->fshandle)
+		free_handle(ctx->fshandle, ctx->fshandle_len);
+	if (ctx->rtdev)
+		disk_close(ctx->rtdev);
+	if (ctx->logdev)
+		disk_close(ctx->logdev);
+	if (ctx->datadev)
+		disk_close(ctx->datadev);
+	fshandle_destroy();
+	close(ctx->mnt_fd);
+	fs_table_destroy();
+
+	return true;
+}
+
+/*
+ * Bind to the mountpoint, read the XFS geometry, bind to the block devices.
+ * Anything we've already built will be cleaned up by xfs_cleanup_fs.
+ */
+bool
+xfs_setup_fs(
+	struct scrub_ctx		*ctx)
+{
+	struct fs_path			*fsp;
+	int				error;
+
+	/*
+	 * Open the directory with O_NOATIME.  For mountpoints owned
+	 * by root, this should be sufficient to ensure that we have
+	 * CAP_SYS_ADMIN, which we probably need to do anything fancy
+	 * with the (XFS driver) kernel.
+	 */
+	ctx->mnt_fd = open(ctx->mntpoint, O_RDONLY | O_NOATIME | O_DIRECTORY);
+	if (ctx->mnt_fd < 0) {
+		if (errno == EPERM)
+			str_info(ctx, ctx->mntpoint,
+_("Must be root to run scrub."));
+		else
+			str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	error = fstat(ctx->mnt_fd, &ctx->mnt_sb);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	error = fstatvfs(ctx->mnt_fd, &ctx->mnt_sv);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	error = fstatfs(ctx->mnt_fd, &ctx->mnt_sf);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	ctx->nr_io_threads = nproc;
+	if (verbose) {
+		fprintf(stdout, _("%s: using %d threads to scrub.\n"),
+				ctx->mntpoint, scrub_nproc(ctx));
+		fflush(stdout);
+	}
+
+	if (!platform_test_xfs_fd(ctx->mnt_fd)) {
+		str_error(ctx, ctx->mntpoint,
+_("Does not appear to be an XFS filesystem!"));
+		return false;
+	}
+
+	/*
+	 * Flush everything out to disk before we start checking.
+	 * This seems to reduce the incidence of stale file handle
+	 * errors when we open things by handle.
+	 */
+	error = syncfs(ctx->mnt_fd);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	/* Retrieve XFS geometry. */
+	error = ioctl(ctx->mnt_fd, XFS_IOC_FSGEOMETRY, &ctx->geo);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	ctx->agblklog = log2_roundup(ctx->geo.agblocks);
+	ctx->blocklog = highbit32(ctx->geo.blocksize);
+	ctx->inodelog = highbit32(ctx->geo.inodesize);
+	ctx->inopblog = ctx->blocklog - ctx->inodelog;
+
+	error = path_to_fshandle(ctx->mntpoint, &ctx->fshandle,
+			&ctx->fshandle_len);
+	if (error) {
+		perror(_("getting fshandle"));
+		return false;
+	}
+
+	/* Go find the XFS devices if we have a usable fsmap. */
+	fs_table_initialise(0, NULL, 0, NULL);
+	errno = 0;
+	fsp = fs_table_lookup(ctx->mntpoint, FS_MOUNT_POINT);
+	if (!fsp) {
+		str_error(ctx, ctx->mntpoint,
+_("Unable to find XFS information."));
+		return false;
+	}
+	memcpy(&ctx->fsinfo, fsp, sizeof(struct fs_path));
+
+	/* Did we find the log and rt devices, if they're present? */
+	if (ctx->geo.logstart == 0 && ctx->fsinfo.fs_log == NULL) {
+		str_error(ctx, ctx->mntpoint,
+_("Unable to find log device path."));
+		return false;
+	}
+	if (ctx->geo.rtblocks && ctx->fsinfo.fs_rt == NULL) {
+		str_error(ctx, ctx->mntpoint,
+_("Unable to find realtime device path."));
+		return false;
+	}
+
+	/* Open the raw devices. */
+	ctx->datadev = disk_open(ctx->fsinfo.fs_name);
+	if (!ctx->datadev) {
+		str_errno(ctx, ctx->fsinfo.fs_name);
+		return false;
+	}
+
+	if (ctx->fsinfo.fs_log) {
+		ctx->logdev = disk_open(ctx->fsinfo.fs_log);
+		if (!ctx->logdev) {
+			str_errno(ctx, ctx->fsinfo.fs_log);
+			return false;
+		}
+	}
+	if (ctx->fsinfo.fs_rt) {
+		ctx->rtdev = disk_open(ctx->fsinfo.fs_rt);
+		if (!ctx->rtdev) {
+			str_errno(ctx, ctx->fsinfo.fs_rt);
+			return false;
+		}
+	}
+
+	/*
+	 * Everything's set up, which means any failures recorded after
+	 * this point are most probably corruption errors (as opposed to
+	 * purely setup errors).
+	 */
+	ctx->need_repair = true;
+	return true;
+}
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 5466c58..a003e44 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -23,9 +23,12 @@
 #include <stdlib.h>
 #include <sys/time.h>
 #include <sys/resource.h>
+#include <sys/statvfs.h>
 #include "platform_defs.h"
 #include "xfs.h"
+#include "xfs_fs.h"
 #include "input.h"
+#include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
 
@@ -345,6 +348,8 @@ run_scrub_phases(
 	{
 		{
 			.descr = _("Find filesystem geometry."),
+			.fn = xfs_setup_fs,
+			.must_run = true,
 		},
 		{
 			.descr = _("Check internal metadata."),
@@ -426,6 +431,7 @@ main(
 	struct phase_rusage	all_pi;
 	unsigned long long	total_errors;
 	bool			moveon = true;
+	bool			ismnt;
 	static bool		injected;
 	int			ret = 0;
 
@@ -522,6 +528,15 @@ _("Only one of the options -n or -y may be specified.\n"));
 
 	ctx.mntpoint = strdup(argv[optind]);
 
+	/* Find the mount record for the passed-in argument. */
+	if (stat(argv[optind], &ctx.mnt_sb) < 0) {
+		fprintf(stderr,
+			_("%s: could not stat: %s: %s\n"),
+			progname, argv[optind], strerror(errno));
+		ret |= 8;
+		goto out;
+	}
+
 	/*
 	 * If the user did not specify an explicit mount table, try to use
 	 * /proc/mounts if it is available, else /etc/mtab.  We prefer
@@ -541,6 +556,14 @@ _("Only one of the options -n or -y may be specified.\n"));
 	if (!moveon)
 		goto out;
 
+	ismnt = find_mountpoint(mtab, &ctx);
+	if (!ismnt) {
+		fprintf(stderr, _("%s: Not a mount point or block device.\n"),
+			ctx.mntpoint);
+		ret |= 8;
+		goto out;
+	}
+
 	/* How many CPUs? */
 	nproc = sysconf(_SC_NPROCESSORS_ONLN);
 	if (nproc < 0)
@@ -569,6 +592,11 @@ _("Only one of the options -n or -y may be specified.\n"));
 	if (debug_tweak_on("XFS_SCRUB_FORCE_ERROR"))
 		str_error(&ctx, ctx.mntpoint, _("Injecting error."));
 
+	/* Clean up scan data. */
+	moveon = xfs_cleanup_fs(&ctx);
+	if (!moveon)
+		ret |= 8;
+
 out:
 	total_errors = ctx.errors_found + ctx.runtime_errors;
 	if (ctx.need_repair)
@@ -586,13 +614,17 @@ _("%s: %llu errors found.%s\n"),
 		fprintf(stderr,
 _("%s: %llu warnings found.\n"),
 			ctx.mntpoint, ctx.warnings_found);
-	if (ctx.errors_found)
+	if (ctx.errors_found) {
+		if (ctx.error_action == ERRORS_SHUTDOWN)
+			xfs_shutdown_fs(&ctx);
 		ret |= 1;
+	}
 	if (ctx.warnings_found)
 		ret |= 2;
 	if (ctx.runtime_errors)
 		ret |= 4;
 	phase_end(&all_pi, 0);
+	free(ctx.blkdev);
 	free(ctx.mntpoint);
 
 	return ret;
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 8e2fa54..037452e 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -51,15 +51,38 @@ struct scrub_ctx {
 	char			*mntpoint;
 	char			*blkdev;
 
+	/* Mountpoint info */
+	struct stat		mnt_sb;
+	struct statvfs		mnt_sv;
+	struct statfs		mnt_sf;
+
+	/* Open block devices */
+	struct disk		*datadev;
+	struct disk		*logdev;
+	struct disk		*rtdev;
+
 	/* What does the user want us to do? */
 	enum scrub_mode		mode;
 
 	/* How does the user want us to react to errors? */
 	enum error_action	error_action;
 
+	/* fd to filesystem mount point */
+	int			mnt_fd;
+
 	/* Number of threads for metadata scrubbing */
 	unsigned int		nr_io_threads;
 
+	/* XFS specific geometry */
+	struct xfs_fsop_geom	geo;
+	struct fs_path		fsinfo;
+	unsigned int		agblklog;
+	unsigned int		blocklog;
+	unsigned int		inodelog;
+	unsigned int		inopblog;
+	void			*fshandle;
+	size_t			fshandle_len;
+
 	/* Mutable scrub state; use lock. */
 	pthread_mutex_t		lock;
 	unsigned long long	max_errors;
@@ -67,6 +90,12 @@ struct scrub_ctx {
 	unsigned long long	errors_found;
 	unsigned long long	warnings_found;
 	bool			need_repair;
+	bool			preen_triggers[XFS_SCRUB_TYPE_NR];
 };
 
+/* Phase helper functions */
+void xfs_shutdown_fs(struct scrub_ctx *ctx);
+bool xfs_cleanup_fs(struct scrub_ctx *ctx);
+bool xfs_setup_fs(struct scrub_ctx *ctx);
+
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 08/27] xfs_scrub: add inode iteration functions
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (6 preceding siblings ...)
  2017-11-17 21:00 ` [PATCH 07/27] xfs_scrub: find XFS filesystem geometry Darrick J. Wong
@ 2017-11-17 21:00 ` Darrick J. Wong
  2017-11-17 21:01 ` [PATCH 09/27] xfs_scrub: add space map " Darrick J. Wong
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:00 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

These helpers enable userspace to count or iterate all inodes in a
filesystem.  The counting function uses INUMBERS, while the inode
iterator uses INUMBERS and BULKSTAT to iterate over every inode that
should be in the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
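For readers following the iterator logic, here is a minimal standalone
sketch of two pieces of arithmetic it appears to rely on: turning an AG
number into the first inode number of that AG, and walking an INUMBERS
allocation mask to spot inodes that BULKSTAT did not return.  This is
not code from the patch.  The names ag_first_ino, count_missing and
INODES_PER_CHUNK are hypothetical, and the sketch assumes a 64-inode
chunk and a BULKSTAT result sorted by inode number.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64	/* assumed to mirror XFS_INODES_PER_CHUNK */

/* First inode number that can belong to allocation group @agno. */
static uint64_t
ag_first_ino(uint32_t agno, unsigned int inopblog, unsigned int agblklog)
{
	assert(inopblog + agblklog < 64);
	return (uint64_t)agno << (inopblog + agblklog);
}

/*
 * Count the inodes that the allocation mask says are allocated but that
 * do not appear in @got, a list of inode numbers sorted in ascending order.
 */
static unsigned int
count_missing(uint64_t startino, uint64_t allocmask,
	      const uint64_t *got, unsigned int nr_got)
{
	unsigned int	i, seen = 0, missing = 0;

	for (i = 0; i < INODES_PER_CHUNK; i++) {
		if (!(allocmask & (1ULL << i)))
			continue;	/* slot not allocated */
		if (seen < nr_got && got[seen] == startino + i)
			seen++;		/* BULKSTAT returned this inode */
		else
			missing++;	/* allocated, but not returned */
	}
	return missing;
}
```

In the patch itself a missing inode is retried with BULKSTAT_SINGLE and,
failing that, a zeroed record is substituted; the sketch only counts the
gaps.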
 scrub/Makefile |    2 
 scrub/inodes.c |  284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/inodes.h |   32 ++++++
 3 files changed, 318 insertions(+)
 create mode 100644 scrub/inodes.c
 create mode 100644 scrub/inodes.h


diff --git a/scrub/Makefile b/scrub/Makefile
index 2d2a164..259f4d7 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -18,11 +18,13 @@ endif	# scrub_prereqs
 HFILES = \
 common.h \
 disk.h \
+inodes.h \
 xfs_scrub.h
 
 CFILES = \
 common.c \
 disk.c \
+inodes.c \
 phase1.c \
 xfs_scrub.c
 
diff --git a/scrub/inodes.c b/scrub/inodes.c
new file mode 100644
index 0000000..c880c36
--- /dev/null
+++ b/scrub/inodes.c
@@ -0,0 +1,284 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <pthread.h>
+#include <sys/statvfs.h>
+#include "platform_defs.h"
+#include "xfs.h"
+#include "xfs_arch.h"
+#include "xfs_format.h"
+#include "handle.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "inodes.h"
+
+/*
+ * Iterate a range of inodes.
+ *
+ * This is a little more involved than repeatedly asking BULKSTAT for a
+ * buffer's worth of stat data for some number of inodes.  We want to
+ * scan as many of the inodes that the inobt thinks there are, including
+ * the ones that are broken, but if we ask for n inodes starting at x,
+ * it'll skip the bad ones and fill from beyond the range (x + n).
+ *
+ * Therefore, we ask INUMBERS to return one inobt chunk's worth of inode
+ * bitmap information.  Then we try to BULKSTAT only the inodes that
+ * were present in that chunk, and compare what we got against what
+ * INUMBERS said was there.  If there's a mismatch, we know that we have
+ * an inode that fails the verifiers, so we inject the bulkstat
+ * information to force the scrub code to deal with the broken inodes.
+ *
+ * If the iteration function returns ESTALE, that means that the inode
+ * has been deleted and possibly recreated since the BULKSTAT call.  We
+ * will refresh the stat information and try again up to 30 times before
+ * reporting the staleness as an error.
+ */
+
+/*
+ * Call into the filesystem for inode/bulkstat information and call our
+ * iterator function.  We'll try to fill the bulkstat information in
+ * batches, but we also can detect iget failures.
+ */
+static bool
+xfs_iterate_inodes_range(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	void			*fshandle,
+	uint64_t		first_ino,
+	uint64_t		last_ino,
+	xfs_inode_iter_fn	fn,
+	void			*arg)
+{
+	struct xfs_fsop_bulkreq	igrpreq = {0};
+	struct xfs_fsop_bulkreq	bulkreq = {0};
+	struct xfs_fsop_bulkreq	onereq = {0};
+	struct xfs_handle	handle;
+	struct xfs_inogrp	inogrp;
+	struct xfs_bstat	bstat[XFS_INODES_PER_CHUNK] = {0};
+	char			idescr[DESCR_BUFSZ];
+	char			buf[DESCR_BUFSZ];
+	struct xfs_bstat	*bs;
+	__u64			last_stale = first_ino - 1;
+	__u64			igrp_ino;
+	__u64			oneino;
+	__u64			ino;
+	__s32			bulklen = 0;
+	__s32			onelen = 0;
+	__s32			igrplen = 0;
+	bool			moveon = true;
+	int			i;
+	int			error;
+	int			stale_count = 0;
+
+	onereq.lastip  = &oneino;
+	onereq.icount  = 1;
+	onereq.ocount  = &onelen;
+
+	bulkreq.lastip  = &ino;
+	bulkreq.icount  = XFS_INODES_PER_CHUNK;
+	bulkreq.ubuffer = &bstat;
+	bulkreq.ocount  = &bulklen;
+
+	igrpreq.lastip  = &igrp_ino;
+	igrpreq.icount  = 1;
+	igrpreq.ubuffer = &inogrp;
+	igrpreq.ocount  = &igrplen;
+
+	memcpy(&handle.ha_fsid, fshandle, sizeof(handle.ha_fsid));
+	handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
+			sizeof(handle.ha_fid.fid_len);
+	handle.ha_fid.fid_pad = 0;
+
+	/* Find the inode chunk & alloc mask */
+	igrp_ino = first_ino;
+	error = ioctl(ctx->mnt_fd, XFS_IOC_FSINUMBERS, &igrpreq);
+	while (!error && igrplen) {
+		/* Load the inodes. */
+		ino = inogrp.xi_startino - 1;
+		bulkreq.icount = inogrp.xi_alloccount;
+		error = ioctl(ctx->mnt_fd, XFS_IOC_FSBULKSTAT, &bulkreq);
+		if (error)
+			str_warn(ctx, descr, "%s", strerror_r(errno,
+						buf, DESCR_BUFSZ));
+
+		/* Did we get exactly the inodes we expected? */
+		for (i = 0, bs = bstat; i < XFS_INODES_PER_CHUNK; i++) {
+			if (!(inogrp.xi_allocmask & (1ULL << i)))
+				continue;
+			if (bs->bs_ino == inogrp.xi_startino + i) {
+				bs++;
+				continue;
+			}
+
+			/* Load the one inode. */
+			oneino = inogrp.xi_startino + i;
+			onereq.ubuffer = bs;
+			error = ioctl(ctx->mnt_fd, XFS_IOC_FSBULKSTAT_SINGLE,
+					&onereq);
+			if (error || bs->bs_ino != inogrp.xi_startino + i) {
+				memset(bs, 0, sizeof(struct xfs_bstat));
+				bs->bs_ino = inogrp.xi_startino + i;
+				bs->bs_blksize = ctx->mnt_sv.f_frsize;
+			}
+			bs++;
+		}
+
+		/* Iterate all the inodes. */
+		for (i = 0, bs = bstat; i < inogrp.xi_alloccount; i++, bs++) {
+			if (bs->bs_ino > last_ino)
+				goto out;
+
+			handle.ha_fid.fid_ino = bs->bs_ino;
+			handle.ha_fid.fid_gen = bs->bs_gen;
+			error = fn(ctx, &handle, bs, arg);
+			switch (error) {
+			case 0:
+				break;
+			case ESTALE:
+				if (last_stale == inogrp.xi_startino)
+					stale_count++;
+				else {
+					last_stale = inogrp.xi_startino;
+					stale_count = 0;
+				}
+				if (stale_count < 30) {
+					igrp_ino = inogrp.xi_startino;
+					goto igrp_retry;
+				}
+				snprintf(idescr, DESCR_BUFSZ, "inode %"PRIu64,
+						(uint64_t)bs->bs_ino);
+				str_warn(ctx, idescr, "%s", strerror_r(error,
+						buf, DESCR_BUFSZ));
+				break;
+			case XFS_ITERATE_INODES_ABORT:
+				error = 0;
+				/* fall thru */
+			default:
+				moveon = false;
+				errno = error;
+				goto err;
+			}
+			if (xfs_scrub_excessive_errors(ctx)) {
+				moveon = false;
+				goto out;
+			}
+		}
+
+igrp_retry:
+		error = ioctl(ctx->mnt_fd, XFS_IOC_FSINUMBERS, &igrpreq);
+	}
+
+err:
+	if (error) {
+		str_errno(ctx, descr);
+		moveon = false;
+	}
+out:
+	return moveon;
+}
+
+/* BULKSTAT wrapper routines. */
+struct xfs_scan_inodes {
+	xfs_inode_iter_fn	fn;
+	void			*arg;
+	bool			moveon;
+};
+
+/* Scan all the inodes in an AG. */
+static void
+xfs_scan_ag_inodes(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct xfs_scan_inodes	*si = arg;
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	char			descr[DESCR_BUFSZ];
+	uint64_t		ag_ino;
+	uint64_t		next_ag_ino;
+	bool			moveon;
+
+	snprintf(descr, DESCR_BUFSZ, _("dev %d:%d AG %u inodes"),
+				major(ctx->fsinfo.fs_datadev),
+				minor(ctx->fsinfo.fs_datadev),
+				agno);
+
+	ag_ino = (__u64)agno << (ctx->inopblog + ctx->agblklog);
+	next_ag_ino = (__u64)(agno + 1) << (ctx->inopblog + ctx->agblklog);
+
+	moveon = xfs_iterate_inodes_range(ctx, descr, ctx->fshandle, ag_ino,
+			next_ag_ino - 1, si->fn, si->arg);
+	if (!moveon)
+		si->moveon = false;
+}
+
+/* Scan all the inodes in a filesystem. */
+bool
+xfs_scan_all_inodes(
+	struct scrub_ctx	*ctx,
+	xfs_inode_iter_fn	fn,
+	void			*arg)
+{
+	struct xfs_scan_inodes	si;
+	xfs_agnumber_t		agno;
+	struct workqueue	wq;
+	int			ret;
+
+	si.moveon = true;
+	si.fn = fn;
+	si.arg = arg;
+
+	ret = workqueue_create(&wq, (struct xfs_mount *)ctx,
+			scrub_nproc_workqueue(ctx));
+	if (ret) {
+		str_error(ctx, ctx->mntpoint, _("Could not create workqueue."));
+		return false;
+	}
+
+	for (agno = 0; agno < ctx->geo.agcount; agno++) {
+		ret = workqueue_add(&wq, xfs_scan_ag_inodes, agno, &si);
+		if (ret) {
+			si.moveon = false;
+			str_error(ctx, ctx->mntpoint,
+_("Could not queue AG %u bulkstat work."), agno);
+			break;
+		}
+	}
+
+	workqueue_destroy(&wq);
+
+	return si.moveon;
+}
+
+/*
+ * Open a file by handle, or return a negative error code.
+ */
+int
+xfs_open_handle(
+	struct xfs_handle	*handle)
+{
+	return open_by_fshandle(handle, sizeof(*handle),
+			O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
+}
diff --git a/scrub/inodes.h b/scrub/inodes.h
new file mode 100644
index 0000000..c398677
--- /dev/null
+++ b/scrub/inodes.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_INODES_H_
+#define XFS_SCRUB_INODES_H_
+
+typedef int (*xfs_inode_iter_fn)(struct scrub_ctx *ctx,
+		struct xfs_handle *handle, struct xfs_bstat *bs, void *arg);
+
+#define XFS_ITERATE_INODES_ABORT	(-1)
+bool xfs_scan_all_inodes(struct scrub_ctx *ctx, xfs_inode_iter_fn fn,
+		void *arg);
+
+int xfs_open_handle(struct xfs_handle *handle);
+
+#endif /* XFS_SCRUB_INODES_H_ */



* [PATCH 09/27] xfs_scrub: add space map iteration functions
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (7 preceding siblings ...)
  2017-11-17 21:00 ` [PATCH 08/27] xfs_scrub: add inode iteration functions Darrick J. Wong
@ 2017-11-17 21:01 ` Darrick J. Wong
  2017-11-17 21:01 ` [PATCH 10/27] xfs_scrub: add file " Darrick J. Wong
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:01 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

These helpers enable userspace to iterate all the space map information
in a filesystem.  The iteration function uses GETFSMAP.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
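As an illustration of how a per-AG GETFSMAP query can be bounded, the
sketch below computes the low and high physical byte offsets for one
allocation group, following the same arithmetic as the AG scan helper in
this patch.  It is a standalone approximation, not the patch's code:
struct ag_key_range and ag_byte_range are hypothetical names, and only
the physical-offset part of the two fsmap keys is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fmr_physical fields of a fsmap key pair. */
struct ag_key_range {
	uint64_t	low;	/* first byte of the AG on the data device */
	uint64_t	high;	/* last byte of the AG on the data device */
};

/* Compute the byte range covered by allocation group @agno. */
static struct ag_key_range
ag_byte_range(uint32_t agno, uint32_t agblocks, uint32_t blocksize)
{
	struct ag_key_range	r;
	uint64_t		bperag;

	assert(agblocks > 0 && blocksize > 0);

	/* Widen before multiplying so large AGs do not overflow 32 bits. */
	bperag = (uint64_t)agblocks * blocksize;
	r.low = agno * bperag;
	r.high = (agno + 1ULL) * bperag - 1;
	return r;
}
```

The high key is inclusive, which is why one byte is subtracted; the
remaining high-key fields in the patch are simply set to their maximum
values so that every record inside the range matches.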
 scrub/Makefile   |    2 
 scrub/spacemap.c |  256 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/spacemap.h |   31 +++++++
 3 files changed, 289 insertions(+)
 create mode 100644 scrub/spacemap.c
 create mode 100644 scrub/spacemap.h


diff --git a/scrub/Makefile b/scrub/Makefile
index 259f4d7..5a2ec67 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -19,6 +19,7 @@ HFILES = \
 common.h \
 disk.h \
 inodes.h \
+spacemap.h \
 xfs_scrub.h
 
 CFILES = \
@@ -26,6 +27,7 @@ common.c \
 disk.c \
 inodes.c \
 phase1.c \
+spacemap.c \
 xfs_scrub.c
 
 LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
diff --git a/scrub/spacemap.c b/scrub/spacemap.c
new file mode 100644
index 0000000..362931e
--- /dev/null
+++ b/scrub/spacemap.c
@@ -0,0 +1,256 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <string.h>
+#include <pthread.h>
+#include <sys/statvfs.h>
+#include "workqueue.h"
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "path.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "spacemap.h"
+
+/*
+ * Filesystem space map iterators.
+ *
+ * Logically, we call GETFSMAP to fetch a set of space map records and
+ * call a function to iterate over the records.  However, that's not
+ * what actually happens -- the work is split into separate items, with
+ * each AG, the realtime device, and the log device getting their own
+ * work items.  For an XFS with a realtime device and an external log,
+ * this means that we can have up to ($agcount + 2) threads running at
+ * once.
+ *
+ * This comes into play if we want to have per-workitem memory.  Maybe.
+ * XXX: do we really need all that ?
+ */
+
+#define FSMAP_NR	65536
+
+/* Iterate all the fs block mappings between the two keys. */
+bool
+xfs_iterate_fsmap(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	struct fsmap		*keys,
+	xfs_fsmap_iter_fn	fn,
+	void			*arg)
+{
+	struct fsmap_head	*head;
+	struct fsmap		*p;
+	bool			moveon = true;
+	int			i;
+	int			error;
+
+	head = malloc(fsmap_sizeof(FSMAP_NR));
+	if (!head) {
+		str_errno(ctx, descr);
+		return false;
+	}
+
+	memset(head, 0, sizeof(*head));
+	memcpy(head->fmh_keys, keys, sizeof(struct fsmap) * 2);
+	head->fmh_count = FSMAP_NR;
+
+	while ((error = ioctl(ctx->mnt_fd, FS_IOC_GETFSMAP, head)) == 0) {
+		for (i = 0, p = head->fmh_recs;
+		     i < head->fmh_entries;
+		     i++, p++) {
+			moveon = fn(ctx, descr, p, arg);
+			if (!moveon)
+				goto out;
+			if (xfs_scrub_excessive_errors(ctx)) {
+				moveon = false;
+				goto out;
+			}
+		}
+
+		if (head->fmh_entries == 0)
+			break;
+		p = &head->fmh_recs[head->fmh_entries - 1];
+		if (p->fmr_flags & FMR_OF_LAST)
+			break;
+		fsmap_advance(head);
+	}
+
+	if (error) {
+		str_errno(ctx, descr);
+		moveon = false;
+	}
+out:
+	free(head);
+	return moveon;
+}
+
+/* GETFSMAP wrapper routines. */
+struct xfs_scan_blocks {
+	xfs_fsmap_iter_fn	fn;
+	void			*arg;
+	bool			moveon;
+};
+
+/* Iterate all the reverse mappings of an AG. */
+static void
+xfs_scan_ag_blocks(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	struct xfs_scan_blocks	*sbx = arg;
+	char			descr[DESCR_BUFSZ];
+	struct fsmap		keys[2];
+	off64_t			bperag;
+	bool			moveon;
+
+	bperag = (off64_t)ctx->geo.agblocks *
+		 (off64_t)ctx->geo.blocksize;
+
+	snprintf(descr, DESCR_BUFSZ, _("dev %d:%d AG %u fsmap"),
+				major(ctx->fsinfo.fs_datadev),
+				minor(ctx->fsinfo.fs_datadev),
+				agno);
+
+	memset(keys, 0, sizeof(struct fsmap) * 2);
+	keys->fmr_device = ctx->fsinfo.fs_datadev;
+	keys->fmr_physical = agno * bperag;
+	(keys + 1)->fmr_device = ctx->fsinfo.fs_datadev;
+	(keys + 1)->fmr_physical = ((agno + 1) * bperag) - 1;
+	(keys + 1)->fmr_owner = ULLONG_MAX;
+	(keys + 1)->fmr_offset = ULLONG_MAX;
+	(keys + 1)->fmr_flags = UINT_MAX;
+
+	moveon = xfs_iterate_fsmap(ctx, descr, keys, sbx->fn, sbx->arg);
+	if (!moveon)
+		sbx->moveon = false;
+}
+
+/* Iterate all the reverse mappings of a standalone device. */
+static void
+xfs_scan_dev_blocks(
+	struct scrub_ctx	*ctx,
+	int			idx,
+	dev_t			dev,
+	struct xfs_scan_blocks	*sbx)
+{
+	struct fsmap		keys[2];
+	char			descr[DESCR_BUFSZ];
+	bool			moveon;
+
+	snprintf(descr, DESCR_BUFSZ, _("dev %d:%d fsmap"),
+			major(dev), minor(dev));
+
+	memset(keys, 0, sizeof(struct fsmap) * 2);
+	keys->fmr_device = dev;
+	(keys + 1)->fmr_device = dev;
+	(keys + 1)->fmr_physical = ULLONG_MAX;
+	(keys + 1)->fmr_owner = ULLONG_MAX;
+	(keys + 1)->fmr_offset = ULLONG_MAX;
+	(keys + 1)->fmr_flags = UINT_MAX;
+
+	moveon = xfs_iterate_fsmap(ctx, descr, keys, sbx->fn, sbx->arg);
+	if (!moveon)
+		sbx->moveon = false;
+}
+
+/* Iterate all the reverse mappings of the realtime device. */
+static void
+xfs_scan_rt_blocks(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+
+	xfs_scan_dev_blocks(ctx, agno, ctx->fsinfo.fs_rtdev, arg);
+}
+
+/* Iterate all the reverse mappings of the log device. */
+static void
+xfs_scan_log_blocks(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+
+	xfs_scan_dev_blocks(ctx, agno, ctx->fsinfo.fs_logdev, arg);
+}
+
+/* Scan all the blocks in a filesystem. */
+bool
+xfs_scan_all_spacemaps(
+	struct scrub_ctx	*ctx,
+	xfs_fsmap_iter_fn	fn,
+	void			*arg)
+{
+	struct workqueue	wq;
+	struct xfs_scan_blocks	sbx;
+	xfs_agnumber_t		agno;
+	int			ret;
+
+	sbx.moveon = true;
+	sbx.fn = fn;
+	sbx.arg = arg;
+
+	ret = workqueue_create(&wq, (struct xfs_mount *)ctx,
+			scrub_nproc_workqueue(ctx));
+	if (ret) {
+		str_error(ctx, ctx->mntpoint, _("Could not create workqueue."));
+		return false;
+	}
+	if (ctx->fsinfo.fs_rt) {
+		ret = workqueue_add(&wq, xfs_scan_rt_blocks,
+				ctx->geo.agcount + 1, &sbx);
+		if (ret) {
+			sbx.moveon = false;
+			str_error(ctx, ctx->mntpoint,
+_("Could not queue rtdev fsmap work."));
+			goto out;
+		}
+	}
+	if (ctx->fsinfo.fs_log) {
+		ret = workqueue_add(&wq, xfs_scan_log_blocks,
+				ctx->geo.agcount + 2, &sbx);
+		if (ret) {
+			sbx.moveon = false;
+			str_error(ctx, ctx->mntpoint,
+_("Could not queue logdev fsmap work."));
+			goto out;
+		}
+	}
+	for (agno = 0; agno < ctx->geo.agcount; agno++) {
+		ret = workqueue_add(&wq, xfs_scan_ag_blocks, agno, &sbx);
+		if (ret) {
+			sbx.moveon = false;
+			str_error(ctx, ctx->mntpoint,
+_("Could not queue AG %u fsmap work."), agno);
+			break;
+		}
+	}
+out:
+	workqueue_destroy(&wq);
+
+	return sbx.moveon;
+}
diff --git a/scrub/spacemap.h b/scrub/spacemap.h
new file mode 100644
index 0000000..0d44fde
--- /dev/null
+++ b/scrub/spacemap.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_SPACEMAP_H_
+#define XFS_SCRUB_SPACEMAP_H_
+
+typedef bool (*xfs_fsmap_iter_fn)(struct scrub_ctx *ctx, const char *descr,
+		struct fsmap *fsr, void *arg);
+
+bool xfs_iterate_fsmap(struct scrub_ctx *ctx, const char *descr,
+		struct fsmap *keys, xfs_fsmap_iter_fn fn, void *arg);
+bool xfs_scan_all_spacemaps(struct scrub_ctx *ctx, xfs_fsmap_iter_fn fn,
+		void *arg);
+
+#endif /* XFS_SCRUB_SPACEMAP_H_ */



* [PATCH 10/27] xfs_scrub: add file space map iteration functions
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2017-11-17 21:01 ` [PATCH 09/27] xfs_scrub: add space map " Darrick J. Wong
@ 2017-11-17 21:01 ` Darrick J. Wong
  2017-11-17 21:01 ` [PATCH 11/27] xfs_scrub: filesystem counter collection functions Darrick J. Wong
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:01 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

These helpers enable userspace to iterate all the space map information
for a file.  The iteration function uses GETBMAPX.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
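GETBMAPX works in 512-byte basic blocks while the iterator's callers
deal in bytes, so the loop converts in both directions and then moves
the query window past the last mapping returned.  The sketch below shows
that arithmetic in isolation.  It is a simplified illustration rather
than the patch's code: bytes_to_bb, bb_to_bytes, struct bmap_cursor and
cursor_advance are hypothetical names, and the round-up in bytes_to_bb
is assumed to match the behaviour of the BTOBB macro.

```c
#include <assert.h>
#include <stdint.h>

#define BB_SHIFT	9	/* 512-byte basic blocks */

/* Convert bytes to basic blocks, rounding up (assumed BTOBB behaviour). */
static uint64_t
bytes_to_bb(uint64_t bytes)
{
	return (bytes + (1ULL << BB_SHIFT) - 1) >> BB_SHIFT;
}

/* Convert basic blocks back to bytes. */
static uint64_t
bb_to_bytes(uint64_t bbs)
{
	return bbs << BB_SHIFT;
}

/* Query window, in basic blocks. */
struct bmap_cursor {
	int64_t		offset;	/* where the next query starts */
	int64_t		length;	/* how much of the range is left */
};

/* Move the window to just past the last mapping that was returned. */
static void
cursor_advance(struct bmap_cursor *cur, int64_t last_offset,
	       int64_t last_length)
{
	int64_t		new_off = last_offset + last_length;

	assert(new_off >= cur->offset);
	cur->length -= new_off - cur->offset;
	cur->offset = new_off;
}
```

Shrinking the remaining length by the distance travelled keeps the end
of the window fixed, so repeated queries never run past the range that
was originally requested.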
 scrub/Makefile  |    2 +
 scrub/filemap.c |  158 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/filemap.h |   39 ++++++++++++++
 3 files changed, 199 insertions(+)
 create mode 100644 scrub/filemap.c
 create mode 100644 scrub/filemap.h


diff --git a/scrub/Makefile b/scrub/Makefile
index 5a2ec67..bf0a813 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -18,6 +18,7 @@ endif	# scrub_prereqs
 HFILES = \
 common.h \
 disk.h \
+filemap.h \
 inodes.h \
 spacemap.h \
 xfs_scrub.h
@@ -25,6 +26,7 @@ xfs_scrub.h
 CFILES = \
 common.c \
 disk.c \
+filemap.c \
 inodes.c \
 phase1.c \
 spacemap.c \
diff --git a/scrub/filemap.c b/scrub/filemap.c
new file mode 100644
index 0000000..a56fc2b
--- /dev/null
+++ b/scrub/filemap.c
@@ -0,0 +1,158 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "path.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "filemap.h"
+
+/*
+ * These routines provide a simple interface to query the block
+ * mappings of the fork of a given inode via GETBMAPX and call a
+ * function to iterate each mapping result.
+ */
+
+#define BMAP_NR		2048
+
+/* Iterate all the extent block mappings between the key and fork end. */
+bool
+xfs_iterate_filemaps(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			fd,
+	int			whichfork,
+	struct xfs_bmap		*key,
+	xfs_bmap_iter_fn	fn,
+	void			*arg)
+{
+	struct fsxattr		fsx;
+	struct getbmapx		*map;
+	struct getbmapx		*p;
+	struct xfs_bmap		bmap;
+	char			bmap_descr[DESCR_BUFSZ];
+	bool			moveon = true;
+	xfs_off_t		new_off;
+	int			getxattr_type;
+	int			i;
+	int			error;
+
+	switch (whichfork) {
+	case XFS_ATTR_FORK:
+		snprintf(bmap_descr, DESCR_BUFSZ, _("%s attr"), descr);
+		break;
+	case XFS_COW_FORK:
+		snprintf(bmap_descr, DESCR_BUFSZ, _("%s CoW"), descr);
+		break;
+	case XFS_DATA_FORK:
+		snprintf(bmap_descr, DESCR_BUFSZ, _("%s data"), descr);
+		break;
+	default:
+		abort();
+	}
+
+	map = calloc(BMAP_NR, sizeof(struct getbmapx));
+	if (!map) {
+		str_errno(ctx, bmap_descr);
+		return false;
+	}
+
+	map->bmv_offset = BTOBB(key->bm_offset);
+	map->bmv_block = BTOBB(key->bm_physical);
+	if (key->bm_length == 0)
+		map->bmv_length = ULLONG_MAX;
+	else
+		map->bmv_length = BTOBB(key->bm_length);
+	map->bmv_count = BMAP_NR;
+	map->bmv_iflags = BMV_IF_NO_DMAPI_READ | BMV_IF_PREALLOC |
+			  BMV_IF_NO_HOLES;
+	switch (whichfork) {
+	case XFS_ATTR_FORK:
+		getxattr_type = XFS_IOC_FSGETXATTRA;
+		map->bmv_iflags |= BMV_IF_ATTRFORK;
+		break;
+	case XFS_COW_FORK:
+		map->bmv_iflags |= BMV_IF_COWFORK;
+		getxattr_type = FS_IOC_FSGETXATTR;
+		break;
+	case XFS_DATA_FORK:
+		getxattr_type = FS_IOC_FSGETXATTR;
+		break;
+	default:
+		abort();
+	}
+
+	error = ioctl(fd, getxattr_type, &fsx);
+	if (error < 0) {
+		str_errno(ctx, bmap_descr);
+		moveon = false;
+		goto out;
+	}
+
+	while ((error = ioctl(fd, XFS_IOC_GETBMAPX, map)) == 0) {
+		for (i = 0, p = &map[i + 1]; i < map->bmv_entries; i++, p++) {
+			bmap.bm_offset = BBTOB(p->bmv_offset);
+			bmap.bm_physical = BBTOB(p->bmv_block);
+			bmap.bm_length = BBTOB(p->bmv_length);
+			bmap.bm_flags = p->bmv_oflags;
+			moveon = fn(ctx, bmap_descr, fd, whichfork, &fsx,
+					&bmap, arg);
+			if (!moveon)
+				goto out;
+			if (xfs_scrub_excessive_errors(ctx)) {
+				moveon = false;
+				goto out;
+			}
+		}
+
+		if (map->bmv_entries == 0)
+			break;
+		p = map + map->bmv_entries;
+		if (p->bmv_oflags & BMV_OF_LAST)
+			break;
+
+		new_off = p->bmv_offset + p->bmv_length;
+		map->bmv_length -= new_off - map->bmv_offset;
+		map->bmv_offset = new_off;
+	}
+
+	/*
+	 * Pre-reflink filesystems don't know about CoW forks, so don't
+	 * be too surprised if it fails.
+	 */
+	if (whichfork == XFS_COW_FORK && error && errno == EINVAL)
+		error = 0;
+
+	if (error)
+		str_errno(ctx, bmap_descr);
+out:
+	memcpy(key, map, sizeof(struct getbmapx));
+	free(map);
+	return moveon;
+}
diff --git a/scrub/filemap.h b/scrub/filemap.h
new file mode 100644
index 0000000..17c0be7
--- /dev/null
+++ b/scrub/filemap.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_FILEMAP_H_
+#define XFS_SCRUB_FILEMAP_H_
+
+/* inode fork block mapping */
+struct xfs_bmap {
+	uint64_t	bm_offset;	/* file offset of segment in bytes */
+	uint64_t	bm_physical;	/* physical starting byte  */
+	uint64_t	bm_length;	/* length of segment, bytes */
+	uint32_t	bm_flags;	/* output flags */
+};
+
+typedef bool (*xfs_bmap_iter_fn)(struct scrub_ctx *ctx, const char *descr,
+		int fd, int whichfork, struct fsxattr *fsx,
+		struct xfs_bmap *bmap, void *arg);
+
+bool xfs_iterate_filemaps(struct scrub_ctx *ctx, const char *descr, int fd,
+		int whichfork, struct xfs_bmap *key, xfs_bmap_iter_fn fn,
+		void *arg);
+
+#endif /* XFS_SCRUB_FILEMAP_H_ */



* [PATCH 11/27] xfs_scrub: filesystem counter collection functions
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (9 preceding siblings ...)
  2017-11-17 21:01 ` [PATCH 10/27] xfs_scrub: add file " Darrick J. Wong
@ 2017-11-17 21:01 ` Darrick J. Wong
  2017-11-17 21:01 ` [PATCH 12/27] xfs_scrub: wrap the scrub ioctl Darrick J. Wong
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:01 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Add a couple of helper functions to estimate the inode and block
counters on the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
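Reviewer note (not part of the commit message): xfs_count_ag_inodes()
derives each AG's inode number range by shifting the AG number left by
inopblog + agblklog.  Below is a minimal sketch of that arithmetic.  The
helper names are hypothetical and do not appear in the patch, and the
geometry values in the note after the code are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: lowest inode number that can live in AG @agno. */
static inline uint64_t
ag_first_ino(uint32_t agno, unsigned int inopblog, unsigned int agblklog)
{
	/* The AG number sits above the AG block and inode offset bits. */
	assert(inopblog + agblklog < 64);
	return (uint64_t)agno << (inopblog + agblklog);
}

/* Hypothetical helper: highest inode number that can live in AG @agno. */
static inline uint64_t
ag_last_ino(uint32_t agno, unsigned int inopblog, unsigned int agblklog)
{
	/* Widen before adding one so the last AG does not wrap. */
	return (((uint64_t)agno + 1) << (inopblog + agblklog)) - 1;
}
```

With 8 inodes per block (inopblog = 3) and agblklog = 20, AG 1 would
span inode numbers 8388608 through 16777215.  The low bits of the last
inode number are all set, which is what the second ASSERT in
xfs_count_inodes_range() expects.
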
 scrub/Makefile     |    2 
 scrub/fscounters.c |  212 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/fscounters.h |   29 +++++++
 3 files changed, 243 insertions(+)
 create mode 100644 scrub/fscounters.c
 create mode 100644 scrub/fscounters.h


diff --git a/scrub/Makefile b/scrub/Makefile
index bf0a813..878a5d5 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -19,6 +19,7 @@ HFILES = \
 common.h \
 disk.h \
 filemap.h \
+fscounters.h \
 inodes.h \
 spacemap.h \
 xfs_scrub.h
@@ -27,6 +28,7 @@ CFILES = \
 common.c \
 disk.c \
 filemap.c \
+fscounters.c \
 inodes.c \
 phase1.c \
 spacemap.c \
diff --git a/scrub/fscounters.c b/scrub/fscounters.c
new file mode 100644
index 0000000..33667a8
--- /dev/null
+++ b/scrub/fscounters.c
@@ -0,0 +1,212 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <sys/statvfs.h>
+#include "platform_defs.h"
+#include "xfs.h"
+#include "xfs_arch.h"
+#include "xfs_fs.h"
+#include "xfs_format.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "fscounters.h"
+
+/*
+ * Filesystem counter collection routines.  We can count the number of
+ * inodes in the filesystem, and we can estimate the block counters.
+ */
+
+/* Count the number of inodes in the filesystem. */
+
+/* INUMBERS wrapper routines. */
+struct xfs_count_inodes {
+	bool			moveon;
+	uint64_t		counters[0];
+};
+
+/*
+ * Count the number of inodes.  Use INUMBERS to figure out how many inodes
+ * exist in the filesystem, assuming we've already scrubbed that.
+ */
+static bool
+xfs_count_inodes_range(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	uint64_t		first_ino,
+	uint64_t		last_ino,
+	uint64_t		*count)
+{
+	struct xfs_fsop_bulkreq	igrpreq = {0};
+	struct xfs_inogrp	inogrp;
+	__u64			igrp_ino;
+	uint64_t		nr = 0;
+	__s32			igrplen = 0;
+	int			error;
+
+	ASSERT(!(first_ino & (XFS_INODES_PER_CHUNK - 1)));
+	ASSERT((last_ino & (XFS_INODES_PER_CHUNK - 1)));
+
+	igrpreq.lastip  = &igrp_ino;
+	igrpreq.icount  = 1;
+	igrpreq.ubuffer = &inogrp;
+	igrpreq.ocount  = &igrplen;
+
+	igrp_ino = first_ino;
+	error = ioctl(ctx->mnt_fd, XFS_IOC_FSINUMBERS, &igrpreq);
+	while (!error && igrplen && inogrp.xi_startino < last_ino) {
+		nr += inogrp.xi_alloccount;
+		error = ioctl(ctx->mnt_fd, XFS_IOC_FSINUMBERS, &igrpreq);
+	}
+
+	if (error) {
+		str_errno(ctx, descr);
+		return false;
+	}
+
+	*count = nr;
+	return true;
+}
+
+/* Scan all the inodes in an AG. */
+static void
+xfs_count_ag_inodes(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct xfs_count_inodes	*ci = arg;
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	char			descr[DESCR_BUFSZ];
+	uint64_t		ag_ino;
+	uint64_t		next_ag_ino;
+	bool			moveon;
+
+	snprintf(descr, DESCR_BUFSZ, _("dev %d:%d AG %u inodes"),
+				major(ctx->fsinfo.fs_datadev),
+				minor(ctx->fsinfo.fs_datadev),
+				agno);
+
+	ag_ino = (__u64)agno << (ctx->inopblog + ctx->agblklog);
+	next_ag_ino = (__u64)(agno + 1) << (ctx->inopblog + ctx->agblklog);
+
+	moveon = xfs_count_inodes_range(ctx, descr, ag_ino, next_ag_ino - 1,
+			&ci->counters[agno]);
+	if (!moveon)
+		ci->moveon = false;
+}
+
+/* Count all the inodes in a filesystem. */
+bool
+xfs_count_all_inodes(
+	struct scrub_ctx	*ctx,
+	uint64_t		*count)
+{
+	struct xfs_count_inodes	*ci;
+	xfs_agnumber_t		agno;
+	struct workqueue	wq;
+	bool			moveon;
+	int			ret;
+
+	ci = calloc(1, sizeof(struct xfs_count_inodes) +
+			(ctx->geo.agcount * sizeof(uint64_t)));
+	if (!ci)
+		return false;
+	ci->moveon = true;
+
+	ret = workqueue_create(&wq, (struct xfs_mount *)ctx,
+			scrub_nproc_workqueue(ctx));
+	if (ret) {
+		moveon = false;
+		str_error(ctx, ctx->mntpoint, _("Could not create workqueue."));
+		goto out_free;
+	}
+	for (agno = 0; agno < ctx->geo.agcount; agno++) {
+		ret = workqueue_add(&wq, xfs_count_ag_inodes, agno, ci);
+		if (ret) {
+			ci->moveon = false;
+			str_error(ctx, ctx->mntpoint,
+_("Could not queue AG %u icount work."), agno);
+			break;
+		}
+	}
+	workqueue_destroy(&wq);
+
+	for (agno = 0; agno < ctx->geo.agcount; agno++)
+		*count += ci->counters[agno];
+	moveon = ci->moveon;
+
+out_free:
+	free(ci);
+	return moveon;
+}
+
+/* Estimate the number of blocks and inodes in the filesystem. */
+bool
+xfs_scan_estimate_blocks(
+	struct scrub_ctx		*ctx,
+	unsigned long long		*d_blocks,
+	unsigned long long		*d_bfree,
+	unsigned long long		*r_blocks,
+	unsigned long long		*r_bfree,
+	unsigned long long		*f_files,
+	unsigned long long		*f_free)
+{
+	struct xfs_fsop_counts		fc;
+	struct xfs_fsop_resblks		rb = {0};
+	struct statvfs			sfs;
+	int				error;
+
+	/* Grab the fstatvfs counters, since it has to report accurately. */
+	error = fstatvfs(ctx->mnt_fd, &sfs);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	/* Fetch the filesystem counters. */
+	error = ioctl(ctx->mnt_fd, XFS_IOC_FSCOUNTS, &fc);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	/*
+	 * XFS reserves some blocks to prevent hard ENOSPC, so add those
+	 * blocks back to the free data counts.
+	 */
+	error = ioctl(ctx->mnt_fd, XFS_IOC_GET_RESBLKS, &rb);
+	if (error)
+		str_errno(ctx, ctx->mntpoint);
+	sfs.f_bfree += rb.resblks_avail;
+
+	*d_blocks = sfs.f_blocks + (ctx->geo.logstart ? ctx->geo.logblocks : 0);
+	*d_bfree = sfs.f_bfree;
+	*r_blocks = ctx->geo.rtblocks;
+	*r_bfree = fc.freertx;
+	*f_files = sfs.f_files;
+	*f_free = sfs.f_ffree;
+
+	return true;
+}
diff --git a/scrub/fscounters.h b/scrub/fscounters.h
new file mode 100644
index 0000000..c181d05
--- /dev/null
+++ b/scrub/fscounters.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_FSCOUNTERS_H_
+#define XFS_SCRUB_FSCOUNTERS_H_
+
+bool xfs_scan_estimate_blocks(struct scrub_ctx *ctx,
+		unsigned long long *d_blocks, unsigned long long *d_bfree,
+		unsigned long long *r_blocks, unsigned long long *r_bfree,
+		unsigned long long *f_files, unsigned long long *f_free);
+bool xfs_count_all_inodes(struct scrub_ctx *ctx, uint64_t *count);
+
+#endif /* XFS_SCRUB_FSCOUNTERS_H_ */



* [PATCH 12/27] xfs_scrub: wrap the scrub ioctl
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (10 preceding siblings ...)
  2017-11-17 21:01 ` [PATCH 11/27] xfs_scrub: filesystem counter collection functions Darrick J. Wong
@ 2017-11-17 21:01 ` Darrick J. Wong
  2017-11-17 21:01 ` [PATCH 13/27] xfs_scrub: scan filesystem and AG metadata Darrick J. Wong
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:01 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create some wrappers to call the scrub ioctls.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
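Reviewer note (not part of the commit message): after the ioctl returns,
xfs_check_metadata() turns the output flags plus the run mode into a
check outcome.  The sketch below shows only that decision.  The DEMO_*
names and bit values are stand-ins for this note (the real
XFS_SCRUB_OFLAG_* bits come from xfs_fs.h), and the dry-run mode name is
an assumption; the patch itself only shows SCRUB_MODE_PREEN and
SCRUB_MODE_REPAIR.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag bits; these are NOT the kernel's values. */
#define DEMO_OFLAG_CORRUPT	(1u << 0)
#define DEMO_OFLAG_PREEN	(1u << 1)
#define DEMO_OFLAG_XCORRUPT	(1u << 2)

/* Stand-in modes, ordered so "mode < X" means X was not requested. */
enum demo_mode { DEMO_MODE_DRY_RUN, DEMO_MODE_PREEN, DEMO_MODE_REPAIR };

enum demo_outcome { DEMO_DONE, DEMO_REPAIR };

static inline enum demo_outcome
demo_classify(uint32_t flags, enum demo_mode mode)
{
	/* Corruption or a cross-reference mismatch needs repair mode. */
	if (flags & (DEMO_OFLAG_CORRUPT | DEMO_OFLAG_XCORRUPT))
		return mode < DEMO_MODE_REPAIR ? DEMO_DONE : DEMO_REPAIR;

	/* Optimizations are scheduled in preen mode or anything above it. */
	if (flags & DEMO_OFLAG_PREEN)
		return mode < DEMO_MODE_PREEN ? DEMO_DONE : DEMO_REPAIR;

	/* Nothing to fix. */
	return DEMO_DONE;
}
```

In the patch, the "done" branches also report the problem to the user
before returning; the sketch leaves the reporting out.
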
 scrub/Makefile |    2 
 scrub/common.c |   19 ++
 scrub/common.h |    1 
 scrub/phase1.c |    8 +
 scrub/scrub.c  |  588 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/scrub.h  |   61 ++++++
 6 files changed, 679 insertions(+)
 create mode 100644 scrub/scrub.c
 create mode 100644 scrub/scrub.h
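
Reviewer note: background_sleep() is meant to pause for 100ms per -b
option beyond the first.  A minimal sketch of that unit conversion,
done entirely in nanoseconds because that is what struct timespec
carries, might look like this.  demo_throttle_delay() is a hypothetical
helper for this note, not a function in the patch.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: delay implied by @bg_mode occurrences of -b. */
static inline struct timespec
demo_throttle_delay(unsigned int bg_mode)
{
	struct timespec		tv = {0, 0};
	unsigned long long	ns;

	/* Zero or one -b adds no delay. */
	if (bg_mode < 2)
		return tv;

	/* 100ms per extra -b, expressed in nanoseconds. */
	ns = 100000000ULL * (bg_mode - 1);
	tv.tv_sec = ns / 1000000000ULL;
	tv.tv_nsec = ns % 1000000000ULL;
	return tv;
}
```

For example, twelve -b options give eleven extra steps, i.e. 1100ms,
which splits into tv_sec = 1 and tv_nsec = 100000000.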


diff --git a/scrub/Makefile b/scrub/Makefile
index 878a5d5..b3f5220 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -21,6 +21,7 @@ disk.h \
 filemap.h \
 fscounters.h \
 inodes.h \
+scrub.h \
 spacemap.h \
 xfs_scrub.h
 
@@ -31,6 +32,7 @@ filemap.c \
 fscounters.c \
 inodes.c \
 phase1.c \
+scrub.c \
 spacemap.c \
 xfs_scrub.c
 
diff --git a/scrub/common.c b/scrub/common.c
index 1d1b3e3..18c060d 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -320,3 +320,22 @@ find_mountpoint(
 	platform_mntent_close(&cursor);
 	return found;
 }
+
+/*
+ * Sleep for 100ms * however many -b we got past the initial one.
+ * This is an (albeit clumsy) way to throttle scrub activity.
+ */
+void
+background_sleep(void)
+{
+	unsigned long long	time;
+	struct timespec		tv;
+
+	if (bg_mode < 2)
+		return;
+
+	time = 100000 * (bg_mode - 1);
+	tv.tv_sec = time / 1000000;
+	tv.tv_nsec = (time % 1000000) * 1000;
+	nanosleep(&tv, NULL);
+}
diff --git a/scrub/common.h b/scrub/common.h
index ae5da76..e63f711 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -71,5 +71,6 @@ static inline int syncfs(int fd)
 #endif
 
 bool find_mountpoint(char *mtab, struct scrub_ctx *ctx);
+void background_sleep(void);
 
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/phase1.c b/scrub/phase1.c
index 0a18f85..5003f29 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -46,6 +46,7 @@
 #include "xfs_scrub.h"
 #include "common.h"
 #include "disk.h"
+#include "scrub.h"
 
 /* Phase 1: Find filesystem geometry (and clean up after) */
 
@@ -168,6 +169,13 @@ _("Does not appear to be an XFS filesystem!"));
 		return false;
 	}
 
+	/* Do we have kernel-assisted metadata scrubbing? */
+	if (!xfs_can_scrub_fs_metadata(ctx) || !xfs_can_scrub_inode(ctx) ||
+	    !xfs_can_scrub_bmap(ctx) || !xfs_can_scrub_dir(ctx) ||
+	    !xfs_can_scrub_attr(ctx) || !xfs_can_scrub_symlink(ctx) ||
+	    !xfs_can_scrub_parent(ctx))
+		return false;
+
 	/* Go find the XFS devices if we have a usable fsmap. */
 	fs_table_initialise(0, NULL, 0, NULL);
 	errno = 0;
diff --git a/scrub/scrub.c b/scrub/scrub.c
new file mode 100644
index 0000000..2ff588c
--- /dev/null
+++ b/scrub/scrub.c
@@ -0,0 +1,588 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "path.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "scrub.h"
+#include "xfs_errortag.h"
+
+/* Online scrub and repair wrappers. */
+
+/* Type info and names for the scrub types. */
+enum scrub_type {
+	ST_NONE,	/* disabled */
+	ST_AGHEADER,	/* per-AG header */
+	ST_PERAG,	/* per-AG metadata */
+	ST_FS,		/* per-FS metadata */
+	ST_INODE,	/* per-inode metadata */
+};
+struct scrub_descr {
+	const char	*name;
+	enum scrub_type	type;
+};
+
+/* These must correspond to XFS_SCRUB_TYPE_ */
+static const struct scrub_descr scrubbers[XFS_SCRUB_TYPE_NR] = {
+	[XFS_SCRUB_TYPE_PROBE] =
+		{"metadata",				ST_NONE},
+	[XFS_SCRUB_TYPE_SB] =
+		{"superblock",				ST_AGHEADER},
+	[XFS_SCRUB_TYPE_AGF] =
+		{"free space header",			ST_AGHEADER},
+	[XFS_SCRUB_TYPE_AGFL] =
+		{"free list",				ST_AGHEADER},
+	[XFS_SCRUB_TYPE_AGI] =
+		{"inode header",			ST_AGHEADER},
+	[XFS_SCRUB_TYPE_BNOBT] =
+		{"freesp by block btree",		ST_PERAG},
+	[XFS_SCRUB_TYPE_CNTBT] =
+		{"freesp by length btree",		ST_PERAG},
+	[XFS_SCRUB_TYPE_INOBT] =
+		{"inode btree",				ST_PERAG},
+	[XFS_SCRUB_TYPE_FINOBT] =
+		{"free inode btree",			ST_PERAG},
+	[XFS_SCRUB_TYPE_RMAPBT] =
+		{"reverse mapping btree",		ST_PERAG},
+	[XFS_SCRUB_TYPE_REFCNTBT] =
+		{"reference count btree",		ST_PERAG},
+	[XFS_SCRUB_TYPE_INODE] =
+		{"inode record",			ST_INODE},
+	[XFS_SCRUB_TYPE_BMBTD] =
+		{"data block map",			ST_INODE},
+	[XFS_SCRUB_TYPE_BMBTA] =
+		{"attr block map",			ST_INODE},
+	[XFS_SCRUB_TYPE_BMBTC] =
+		{"CoW block map",			ST_INODE},
+	[XFS_SCRUB_TYPE_DIR] =
+		{"directory entries",			ST_INODE},
+	[XFS_SCRUB_TYPE_XATTR] =
+		{"extended attributes",			ST_INODE},
+	[XFS_SCRUB_TYPE_SYMLINK] =
+		{"symbolic link",			ST_INODE},
+	[XFS_SCRUB_TYPE_PARENT] =
+		{"parent pointer",			ST_INODE},
+	[XFS_SCRUB_TYPE_RTBITMAP] =
+		{"realtime bitmap",			ST_FS},
+	[XFS_SCRUB_TYPE_RTSUM] =
+		{"realtime summary",			ST_FS},
+	[XFS_SCRUB_TYPE_UQUOTA] =
+		{"user quotas",				ST_FS},
+	[XFS_SCRUB_TYPE_GQUOTA] =
+		{"group quotas",			ST_FS},
+	[XFS_SCRUB_TYPE_PQUOTA] =
+		{"project quotas",			ST_FS},
+};
+
+/* Format a scrub description. */
+static void
+format_scrub_descr(
+	char				*buf,
+	size_t				buflen,
+	struct xfs_scrub_metadata	*meta,
+	const struct scrub_descr	*sc)
+{
+	switch (sc->type) {
+	case ST_AGHEADER:
+	case ST_PERAG:
+		snprintf(buf, buflen, _("AG %u %s"), meta->sm_agno,
+				_(sc->name));
+		break;
+	case ST_INODE:
+		snprintf(buf, buflen, _("Inode %"PRIu64" %s"),
+				(uint64_t)meta->sm_ino, _(sc->name));
+		break;
+	case ST_FS:
+		snprintf(buf, buflen, _("%s"), _(sc->name));
+		break;
+	case ST_NONE:
+		assert(0);
+		break;
+	}
+}
+
+/* Predicates for scrub flag state. */
+
+static inline bool is_corrupt(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT;
+}
+
+static inline bool is_unoptimized(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & XFS_SCRUB_OFLAG_PREEN;
+}
+
+static inline bool xref_failed(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & XFS_SCRUB_OFLAG_XFAIL;
+}
+
+static inline bool xref_disagrees(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & XFS_SCRUB_OFLAG_XCORRUPT;
+}
+
+static inline bool is_incomplete(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & XFS_SCRUB_OFLAG_INCOMPLETE;
+}
+
+static inline bool is_suspicious(struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & XFS_SCRUB_OFLAG_WARNING;
+}
+
+/* Should we fix it? */
+static inline bool needs_repair(struct xfs_scrub_metadata *sm)
+{
+	return is_corrupt(sm) || xref_disagrees(sm);
+}
+
+/* Warn about strange circumstances after scrub. */
+static inline void
+xfs_scrub_warn_incomplete_scrub(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	struct xfs_scrub_metadata	*meta)
+{
+	if (is_incomplete(meta))
+		str_info(ctx, descr, _("Check incomplete."));
+
+	if (is_suspicious(meta)) {
+		if (debug)
+			str_info(ctx, descr, _("Possibly suspect metadata."));
+		else
+			str_warn(ctx, descr, _("Possibly suspect metadata."));
+	}
+
+	if (xref_failed(meta))
+		str_info(ctx, descr, _("Cross-referencing failed."));
+}
+
+/* Do a read-only check of some metadata. */
+static enum check_outcome
+xfs_check_metadata(
+	struct scrub_ctx		*ctx,
+	int				fd,
+	struct xfs_scrub_metadata	*meta,
+	bool				is_inode)
+{
+	char				buf[DESCR_BUFSZ];
+	unsigned int			tries = 0;
+	int				code;
+	int				error;
+
+	assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL"));
+	assert(meta->sm_type < XFS_SCRUB_TYPE_NR);
+	format_scrub_descr(buf, DESCR_BUFSZ, meta, &scrubbers[meta->sm_type]);
+
+	dbg_printf("check %s flags %xh\n", buf, meta->sm_flags);
+retry:
+	error = ioctl(fd, XFS_IOC_SCRUB_METADATA, meta);
+	if (debug_tweak_on("XFS_SCRUB_FORCE_REPAIR") && !error)
+		meta->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+	if (error) {
+		code = errno;
+		switch (code) {
+		case ENOENT:
+			/* Metadata not present, just skip it. */
+			return CHECK_DONE;
+		case ESHUTDOWN:
+			/* FS already crashed, give up. */
+			str_error(ctx, buf,
+_("Filesystem is shut down, aborting."));
+			return CHECK_ABORT;
+		case ENOMEM:
+			/* Ran out of memory, just give up. */
+			str_errno(ctx, buf);
+			return CHECK_ABORT;
+		case EDEADLOCK:
+		case EBUSY:
+		case EFSBADCRC:
+		case EFSCORRUPTED:
+			/*
+			 * The first two should never escape the kernel,
+			 * and the other two should be reported via sm_flags.
+			 */
+			str_error(ctx, buf,
+_("Kernel bug!  errno=%d"), code);
+			/* fall through */
+		default:
+			/* Operational error. */
+			str_errno(ctx, buf);
+			return CHECK_DONE;
+		}
+	}
+
+	/*
+	 * If the kernel says the test was incomplete or that there was
+	 * a cross-referencing discrepancy but no obvious corruption,
+	 * we'll try the scan again, just in case the fs was busy.
+	 * Only retry so many times.
+	 */
+	if (tries < 10 && (is_incomplete(meta) ||
+			   (xref_disagrees(meta) && !is_corrupt(meta)))) {
+		tries++;
+		goto retry;
+	}
+
+	/* Complain about incomplete or suspicious metadata. */
+	xfs_scrub_warn_incomplete_scrub(ctx, buf, meta);
+
+	/*
+	 * If we need repairs or there were discrepancies, schedule a
+	 * repair if desired, otherwise complain.
+	 */
+	if (is_corrupt(meta) || xref_disagrees(meta)) {
+		if (ctx->mode < SCRUB_MODE_REPAIR) {
+			str_error(ctx, buf,
+_("Repairs are required."));
+			return CHECK_DONE;
+		}
+
+		return CHECK_REPAIR;
+	}
+
+	/*
+	 * If we could optimize, schedule a repair if desired,
+	 * otherwise complain.
+	 */
+	if (is_unoptimized(meta)) {
+		if (ctx->mode < SCRUB_MODE_PREEN) {
+			if (!is_inode) {
+				/* AG or FS metadata, always warn. */
+				str_info(ctx, buf,
+_("Optimization is possible."));
+			} else if (!ctx->preen_triggers[meta->sm_type]) {
+				/* File metadata, only warn once per type. */
+				pthread_mutex_lock(&ctx->lock);
+				if (!ctx->preen_triggers[meta->sm_type])
+					ctx->preen_triggers[meta->sm_type] = true;
+				pthread_mutex_unlock(&ctx->lock);
+			}
+			return CHECK_DONE;
+		}
+
+		return CHECK_REPAIR;
+	}
+
+	/* Everything is ok. */
+	return CHECK_DONE;
+}
+
+/* Bulk-notify user about things that could be optimized. */
+void
+xfs_scrub_report_preen_triggers(
+	struct scrub_ctx		*ctx)
+{
+	int				i;
+
+	for (i = 0; i < XFS_SCRUB_TYPE_NR; i++) {
+		pthread_mutex_lock(&ctx->lock);
+		if (ctx->preen_triggers[i]) {
+			ctx->preen_triggers[i] = false;
+			pthread_mutex_unlock(&ctx->lock);
+			str_info(ctx, ctx->mntpoint,
+_("Optimizations of %s are possible."), scrubbers[i].name);
+		} else {
+			pthread_mutex_unlock(&ctx->lock);
+		}
+	}
+}
+
+/* Scrub metadata, saving corruption reports for later. */
+static bool
+xfs_scrub_metadata(
+	struct scrub_ctx		*ctx,
+	enum scrub_type			scrub_type,
+	xfs_agnumber_t			agno)
+{
+	struct xfs_scrub_metadata	meta = {0};
+	const struct scrub_descr	*sc;
+	enum check_outcome		fix;
+	int				type;
+
+	sc = scrubbers;
+	for (type = 0; type < XFS_SCRUB_TYPE_NR; type++, sc++) {
+		if (sc->type != scrub_type)
+			continue;
+
+		meta.sm_type = type;
+		meta.sm_flags = 0;
+		meta.sm_agno = agno;
+		background_sleep();
+
+		/* Check the item. */
+		fix = xfs_check_metadata(ctx, ctx->mnt_fd, &meta, false);
+		switch (fix) {
+		case CHECK_ABORT:
+			return false;
+		case CHECK_REPAIR:
+			/* fall through */
+		case CHECK_DONE:
+			continue;
+		case CHECK_RETRY:
+			abort();
+			break;
+		}
+	}
+
+	return true;
+}
+
+/* Scrub each AG's header blocks. */
+bool
+xfs_scrub_ag_headers(
+	struct scrub_ctx		*ctx,
+	xfs_agnumber_t			agno)
+{
+	return xfs_scrub_metadata(ctx, ST_AGHEADER, agno);
+}
+
+/* Scrub each AG's metadata btrees. */
+bool
+xfs_scrub_ag_metadata(
+	struct scrub_ctx		*ctx,
+	xfs_agnumber_t			agno)
+{
+	return xfs_scrub_metadata(ctx, ST_PERAG, agno);
+}
+
+/* Scrub whole-FS metadata btrees. */
+bool
+xfs_scrub_fs_metadata(
+	struct scrub_ctx		*ctx)
+{
+	return xfs_scrub_metadata(ctx, ST_FS, 0);
+}
+
+/* Scrub inode metadata. */
+static bool
+__xfs_scrub_file(
+	struct scrub_ctx		*ctx,
+	uint64_t			ino,
+	uint32_t			gen,
+	int				fd,
+	unsigned int			type)
+{
+	struct xfs_scrub_metadata	meta = {0};
+	enum check_outcome		fix;
+
+	assert(type < XFS_SCRUB_TYPE_NR);
+	assert(scrubbers[type].type == ST_INODE);
+
+	meta.sm_type = type;
+	meta.sm_ino = ino;
+	meta.sm_gen = gen;
+
+	/* Scrub the piece of metadata. */
+	fix = xfs_check_metadata(ctx, fd, &meta, true);
+	if (fix == CHECK_ABORT)
+		return false;
+	if (fix == CHECK_DONE)
+		return true;
+
+	return true;
+}
+
+bool
+xfs_scrub_inode_fields(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_INODE);
+}
+
+bool
+xfs_scrub_data_fork(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_BMBTD);
+}
+
+bool
+xfs_scrub_attr_fork(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_BMBTA);
+}
+
+bool
+xfs_scrub_cow_fork(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_BMBTC);
+}
+
+bool
+xfs_scrub_dir(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_DIR);
+}
+
+bool
+xfs_scrub_attr(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_XATTR);
+}
+
+bool
+xfs_scrub_symlink(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_SYMLINK);
+}
+
+bool
+xfs_scrub_parent(
+	struct scrub_ctx	*ctx,
+	uint64_t		ino,
+	uint32_t		gen,
+	int			fd)
+{
+	return __xfs_scrub_file(ctx, ino, gen, fd, XFS_SCRUB_TYPE_PARENT);
+}
+
+/* Test the availability of a kernel scrub command. */
+static bool
+__xfs_scrub_test(
+	struct scrub_ctx		*ctx,
+	unsigned int			type,
+	bool				repair)
+{
+	struct xfs_scrub_metadata	meta = {0};
+	int				error;
+
+	if (debug_tweak_on("XFS_SCRUB_NO_KERNEL"))
+		return false;
+
+	meta.sm_type = type;
+	if (repair)
+		meta.sm_flags |= XFS_SCRUB_IFLAG_REPAIR;
+	error = ioctl(ctx->mnt_fd, XFS_IOC_SCRUB_METADATA, &meta);
+	if (!error)
+		return true;
+	switch (errno) {
+	case EROFS:
+		str_info(ctx, ctx->mntpoint,
+_("Filesystem is mounted read-only; cannot proceed."));
+		return false;
+	case ENOTRECOVERABLE:
+		str_info(ctx, ctx->mntpoint,
+_("Filesystem is mounted norecovery; cannot proceed."));
+		return false;
+	case EOPNOTSUPP:
+	case ENOTTY:
+		str_info(ctx, ctx->mntpoint,
+_("Kernel %s %s facility is required."),
+				_(scrubbers[type].name),
+				repair ? _("repair") : _("scrub"));
+		return false;
+	case ENOENT:
+		/* Scrubber says not present on this fs; that's fine. */
+		return true;
+	default:
+		str_info(ctx, ctx->mntpoint, "%s", strerror(errno));
+		return true;
+	}
+	return error == 0 || (error && errno != EOPNOTSUPP && errno != ENOTTY);
+}
+
+bool
+xfs_can_scrub_fs_metadata(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_PROBE, false);
+}
+
+bool
+xfs_can_scrub_inode(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_INODE, false);
+}
+
+bool
+xfs_can_scrub_bmap(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_BMBTD, false);
+}
+
+bool
+xfs_can_scrub_dir(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_DIR, false);
+}
+
+bool
+xfs_can_scrub_attr(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_XATTR, false);
+}
+
+bool
+xfs_can_scrub_symlink(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_SYMLINK, false);
+}
+
+bool
+xfs_can_scrub_parent(
+	struct scrub_ctx	*ctx)
+{
+	return __xfs_scrub_test(ctx, XFS_SCRUB_TYPE_PARENT, false);
+}
diff --git a/scrub/scrub.h b/scrub/scrub.h
new file mode 100644
index 0000000..4d687de
--- /dev/null
+++ b/scrub/scrub.h
@@ -0,0 +1,61 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_SCRUB_H_
+#define XFS_SCRUB_SCRUB_H_
+
+/* Online scrub and repair. */
+enum check_outcome {
+	CHECK_DONE,	/* no further processing needed */
+	CHECK_REPAIR,	/* schedule this for repairs */
+	CHECK_ABORT,	/* end program */
+	CHECK_RETRY,	/* repair failed, try again later */
+};
+
+void xfs_scrub_report_preen_triggers(struct scrub_ctx *ctx);
+bool xfs_scrub_ag_headers(struct scrub_ctx *ctx, xfs_agnumber_t agno);
+bool xfs_scrub_ag_metadata(struct scrub_ctx *ctx, xfs_agnumber_t agno);
+bool xfs_scrub_fs_metadata(struct scrub_ctx *ctx);
+
+bool xfs_can_scrub_fs_metadata(struct scrub_ctx *ctx);
+bool xfs_can_scrub_inode(struct scrub_ctx *ctx);
+bool xfs_can_scrub_bmap(struct scrub_ctx *ctx);
+bool xfs_can_scrub_dir(struct scrub_ctx *ctx);
+bool xfs_can_scrub_attr(struct scrub_ctx *ctx);
+bool xfs_can_scrub_symlink(struct scrub_ctx *ctx);
+bool xfs_can_scrub_parent(struct scrub_ctx *ctx);
+
+bool xfs_scrub_inode_fields(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_data_fork(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_attr_fork(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_cow_fork(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_dir(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_attr(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_symlink(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+bool xfs_scrub_parent(struct scrub_ctx *ctx, uint64_t ino, uint32_t gen,
+		int fd);
+
+#endif /* XFS_SCRUB_SCRUB_H_ */



* [PATCH 13/27] xfs_scrub: scan filesystem and AG metadata
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (11 preceding siblings ...)
  2017-11-17 21:01 ` [PATCH 12/27] xfs_scrub: wrap the scrub ioctl Darrick J. Wong
@ 2017-11-17 21:01 ` Darrick J. Wong
  2017-11-17 21:01 ` [PATCH 14/27] xfs_scrub: thread-safe stats counter Darrick J. Wong
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:01 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scrub the filesystem and per-AG metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile    |    1 
 scrub/phase2.c    |  120 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.c |    1 
 scrub/xfs_scrub.h |    1 
 4 files changed, 123 insertions(+)
 create mode 100644 scrub/phase2.c


diff --git a/scrub/Makefile b/scrub/Makefile
index b3f5220..b63fd92 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -32,6 +32,7 @@ filemap.c \
 fscounters.c \
 inodes.c \
 phase1.c \
+phase2.c \
 scrub.c \
 spacemap.c \
 xfs_scrub.c
diff --git a/scrub/phase2.c b/scrub/phase2.c
new file mode 100644
index 0000000..b1f2c6e
--- /dev/null
+++ b/scrub/phase2.c
@@ -0,0 +1,120 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "scrub.h"
+
+/* Phase 2: Check internal metadata. */
+
+/* Scrub each AG's metadata btrees. */
+static void
+xfs_scan_ag_metadata(
+	struct workqueue		*wq,
+	xfs_agnumber_t			agno,
+	void				*arg)
+{
+	struct scrub_ctx		*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	bool				*pmoveon = arg;
+	bool				moveon;
+	char				descr[DESCR_BUFSZ];
+
+	snprintf(descr, DESCR_BUFSZ, _("AG %u"), agno);
+
+	/*
+	 * First we scrub and fix the AG headers, because we need
+	 * them to work well enough to check the AG btrees.
+	 */
+	moveon = xfs_scrub_ag_headers(ctx, agno);
+	if (!moveon)
+		goto err;
+
+	/* Now scrub the AG btrees. */
+	moveon = xfs_scrub_ag_metadata(ctx, agno);
+	if (!moveon)
+		goto err;
+
+	return;
+err:
+	*pmoveon = false;
+}
+
+/* Scrub whole-FS metadata btrees. */
+static void
+xfs_scan_fs_metadata(
+	struct workqueue		*wq,
+	xfs_agnumber_t			agno,
+	void				*arg)
+{
+	struct scrub_ctx		*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	bool				*pmoveon = arg;
+	bool				moveon;
+
+	moveon = xfs_scrub_fs_metadata(ctx);
+	if (!moveon)
+		*pmoveon = false;
+}
+
+/* Scan all filesystem metadata. */
+bool
+xfs_scan_metadata(
+	struct scrub_ctx	*ctx)
+{
+	xfs_agnumber_t		agno;
+	struct workqueue	wq;
+	bool			moveon = true;
+	int			ret;
+
+	ret = workqueue_create(&wq, (struct xfs_mount *)ctx,
+			scrub_nproc_workqueue(ctx));
+	if (ret) {
+		str_error(ctx, ctx->mntpoint, _("Could not create workqueue."));
+		return false;
+	}
+	for (agno = 0; agno < ctx->geo.agcount; agno++) {
+		ret = workqueue_add(&wq, xfs_scan_ag_metadata, agno, &moveon);
+		if (ret) {
+			moveon = false;
+			str_error(ctx, ctx->mntpoint,
+_("Could not queue AG %u scrub work."), agno);
+			goto out;
+		}
+	}
+
+	ret = workqueue_add(&wq, xfs_scan_fs_metadata, 0, &moveon);
+	if (ret) {
+		moveon = false;
+		str_error(ctx, ctx->mntpoint,
+_("Could not queue filesystem scrub work."));
+		goto out;
+	}
+
+out:
+	workqueue_destroy(&wq);
+	return moveon;
+}
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index a003e44..c5e8368 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -353,6 +353,7 @@ run_scrub_phases(
 		},
 		{
 			.descr = _("Check internal metadata."),
+			.fn = xfs_scan_metadata,
 		},
 		{
 			.descr = _("Scan all inodes."),
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 037452e..8ebe097 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -97,5 +97,6 @@ struct scrub_ctx {
 void xfs_shutdown_fs(struct scrub_ctx *ctx);
 bool xfs_cleanup_fs(struct scrub_ctx *ctx);
 bool xfs_setup_fs(struct scrub_ctx *ctx);
+bool xfs_scan_metadata(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 14/27] xfs_scrub: thread-safe stats counter
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (12 preceding siblings ...)
  2017-11-17 21:01 ` [PATCH 13/27] xfs_scrub: scan filesystem and AG metadata Darrick J. Wong
@ 2017-11-17 21:01 ` Darrick J. Wong
  2017-11-17 21:01 ` [PATCH 15/27] xfs_scrub: scan inodes Darrick J. Wong
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:01 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a threaded stats counter that we'll use to track scan progress.
This includes things like how many disk blocks we've scanned,
or later how much progress we've made in each phase.
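
A minimal sketch of the per-thread slot idea, assuming a fixed 64-byte
cacheline and using hypothetical demo_* names; it is not the ptvar or
ptcounter code from this patch, which sizes the slots with sysconf().
Each thread claims one slot on first use and then updates it without
taking the lock again.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_SLOT_BYTES	64	/* assumed cacheline size */
#define DEMO_ADDS	1000	/* increments per worker thread */

/* Pad each slot to the assumed cacheline size to limit false sharing. */
struct demo_slot {
	uint64_t	count;
	char		pad[DEMO_SLOT_BYTES - sizeof(uint64_t)];
};

struct demo_counter {
	pthread_key_t		key;	/* maps a thread to its slot */
	pthread_mutex_t		lock;	/* guards nr_used only */
	size_t			nr_used;
	size_t			nr_slots;
	struct demo_slot	*slots;
};

/* Add to the calling thread's slot; the lock is taken on first use only. */
static void
demo_counter_add(struct demo_counter *c, uint64_t nr)
{
	struct demo_slot	*s = pthread_getspecific(c->key);

	if (!s) {
		pthread_mutex_lock(&c->lock);
		if (c->nr_used < c->nr_slots)
			s = &c->slots[c->nr_used++];
		pthread_mutex_unlock(&c->lock);
		if (!s)
			return;	/* more threads than slots: drop the update */
		pthread_setspecific(c->key, s);
	}
	s->count += nr;
}

/* Sum the claimed slots; exact only after the writers have finished. */
static uint64_t
demo_counter_value(struct demo_counter *c)
{
	uint64_t	sum = 0;
	size_t		i;

	for (i = 0; i < c->nr_used; i++)
		sum += c->slots[i].count;
	return sum;
}

static void *
demo_worker(void *arg)
{
	int	i;

	for (i = 0; i < DEMO_ADDS; i++)
		demo_counter_add(arg, 1);
	return NULL;
}

/* Start nthreads workers, wait for them, and return the counter value. */
static uint64_t
demo_run(size_t nthreads)
{
	struct demo_counter	c = { .nr_slots = nthreads };
	pthread_t		*tids = calloc(nthreads, sizeof(*tids));
	uint64_t		sum = 0;
	size_t			i, started = 0;

	c.slots = calloc(nthreads, sizeof(*c.slots));
	if (!tids || !c.slots)
		goto out;
	if (pthread_mutex_init(&c.lock, NULL))
		goto out;
	if (pthread_key_create(&c.key, NULL))
		goto out_lock;

	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[i], NULL, demo_worker, &c))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	sum = demo_counter_value(&c);

	pthread_key_delete(c.key);
out_lock:
	pthread_mutex_destroy(&c.lock);
out:
	free(c.slots);
	free(tids);
	return sum;
}
```

Reading the sum while the workers are still running would only give an
approximate value, which is the trade-off the counter in this patch
accepts as well.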

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/ptvar.h  |   32 +++++++++++++
 libfrog/Makefile |    1 
 libfrog/ptvar.c  |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/Makefile   |    2 +
 scrub/counter.c  |  104 ++++++++++++++++++++++++++++++++++++++++++
 scrub/counter.h  |   29 ++++++++++++
 6 files changed, 301 insertions(+)
 create mode 100644 include/ptvar.h
 create mode 100644 libfrog/ptvar.c
 create mode 100644 scrub/counter.c
 create mode 100644 scrub/counter.h


diff --git a/include/ptvar.h b/include/ptvar.h
new file mode 100644
index 0000000..52d03a5
--- /dev/null
+++ b/include/ptvar.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef LIBFROG_PTVAR_H_
+#define LIBFROG_PTVAR_H_
+
+struct ptvar;
+
+typedef bool (*ptvar_iter_fn)(struct ptvar *ptv, void *data, void *foreach_arg);
+
+struct ptvar *ptvar_init(size_t nr, size_t size);
+void ptvar_free(struct ptvar *ptv);
+void *ptvar_get(struct ptvar *ptv);
+bool ptvar_foreach(struct ptvar *ptv, ptvar_iter_fn fn, void *foreach_arg);
+
+#endif /* LIBFROG_PTVAR_H_ */
diff --git a/libfrog/Makefile b/libfrog/Makefile
index 4c15605..230b08f 100644
--- a/libfrog/Makefile
+++ b/libfrog/Makefile
@@ -16,6 +16,7 @@ convert.c \
 list_sort.c \
 paths.c \
 projects.c \
+ptvar.c \
 radix-tree.c \
 topology.c \
 util.c \
diff --git a/libfrog/ptvar.c b/libfrog/ptvar.c
new file mode 100644
index 0000000..818667c
--- /dev/null
+++ b/libfrog/ptvar.c
@@ -0,0 +1,133 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <assert.h>
+#include <pthread.h>
+#include <unistd.h>
+#include "platform_defs.h"
+#include "ptvar.h"
+
+/*
+ * Per-thread Variables
+ *
+ * This data structure manages a lockless per-thread variable.  We
+ * implement this by allocating an array of memory regions, and as each
+ * thread tries to acquire its own region, we hand out the array
+ * elements to each thread.  This way, each thread gets its own
+ * cacheline and (after the first access) doesn't have to contend for a
+ * lock for each access.
+ */
+struct ptvar {
+	pthread_key_t	key;
+	pthread_mutex_t	lock;
+	size_t		nr_used;
+	size_t		nr_counters;
+	size_t		data_size;
+	unsigned char	data[0];
+};
+#define PTVAR_SIZE(nr, sz) (sizeof(struct ptvar) + ((nr) * (sz)))
+
+/* Initialize per-thread variable. */
+struct ptvar *
+ptvar_init(
+	size_t		nr,
+	size_t		size)
+{
+	struct ptvar	*ptv;
+	int		ret;
+
+#ifdef _SC_LEVEL1_DCACHE_LINESIZE
+	/* Try to prevent cache pingpong by aligning to cacheline size. */
+	size = max(size, sysconf(_SC_LEVEL1_DCACHE_LINESIZE));
+#endif
+
+	ptv = malloc(PTVAR_SIZE(nr, size));
+	if (!ptv)
+		return NULL;
+	ptv->data_size = size;
+	ptv->nr_counters = nr;
+	ptv->nr_used = 0;
+	memset(ptv->data, 0, nr * size);
+	ret = pthread_mutex_init(&ptv->lock, NULL);
+	if (ret)
+		goto out;
+	ret = pthread_key_create(&ptv->key, NULL);
+	if (ret)
+		goto out_mutex;
+	return ptv;
+
+out_mutex:
+	pthread_mutex_destroy(&ptv->lock);
+out:
+	free(ptv);
+	return NULL;
+}
+
+/* Free per-thread variable. */
+void
+ptvar_free(
+	struct ptvar	*ptv)
+{
+	pthread_key_delete(ptv->key);
+	pthread_mutex_destroy(&ptv->lock);
+	free(ptv);
+}
+
+/* Get a reference to this thread's variable. */
+void *
+ptvar_get(
+	struct ptvar	*ptv)
+{
+	void		*p;
+
+	p = pthread_getspecific(ptv->key);
+	if (!p) {
+		pthread_mutex_lock(&ptv->lock);
+		assert(ptv->nr_used < ptv->nr_counters);
+		p = &ptv->data[(ptv->nr_used++) * ptv->data_size];
+		pthread_setspecific(ptv->key, p);
+		pthread_mutex_unlock(&ptv->lock);
+	}
+	return p;
+}
+
+/* Iterate all of the per-thread variables. */
+bool
+ptvar_foreach(
+	struct ptvar	*ptv,
+	ptvar_iter_fn	fn,
+	void		*foreach_arg)
+{
+	size_t		i;
+	bool		ret = true;
+
+	pthread_mutex_lock(&ptv->lock);
+	for (i = 0; i < ptv->nr_used; i++) {
+		ret = fn(ptv, &ptv->data[i * ptv->data_size], foreach_arg);
+		if (!ret)
+			break;
+	}
+	pthread_mutex_unlock(&ptv->lock);
+
+	return ret;
+}
diff --git a/scrub/Makefile b/scrub/Makefile
index b63fd92..71d80fb 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -17,6 +17,7 @@ endif	# scrub_prereqs
 
 HFILES = \
 common.h \
+counter.h \
 disk.h \
 filemap.h \
 fscounters.h \
@@ -27,6 +28,7 @@ xfs_scrub.h
 
 CFILES = \
 common.c \
+counter.c \
 disk.c \
 filemap.c \
 fscounters.c \
diff --git a/scrub/counter.c b/scrub/counter.c
new file mode 100644
index 0000000..b503f7a
--- /dev/null
+++ b/scrub/counter.c
@@ -0,0 +1,104 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <assert.h>
+#include <pthread.h>
+#include "ptvar.h"
+#include "counter.h"
+
+/*
+ * Per-Thread Counters
+ *
+ * This is a global counter object that uses per-thread counters to
+ * count things without having to contend for a single shared lock.
+ * Provided we know the number of threads that will be accessing the
+ * counter, each thread gets its own thread-specific counter variable.
+ * Changing the value is fast, though retrieving the value is expensive
+ * and approximate.
+ */
+struct ptcounter {
+	struct ptvar	*var;
+};
+
+/* Initialize per-thread counter. */
+struct ptcounter *
+ptcounter_init(
+	size_t			nr)
+{
+	struct ptcounter	*p;
+
+	p = malloc(sizeof(struct ptcounter));
+	if (!p)
+		return NULL;
+	p->var = ptvar_init(nr, sizeof(uint64_t));
+	if (!p->var) {
+		free(p);
+		return NULL;
+	}
+	return p;
+}
+
+/* Free per-thread counter. */
+void
+ptcounter_free(
+	struct ptcounter	*ptc)
+{
+	ptvar_free(ptc->var);
+	free(ptc);
+}
+
+/* Add a quantity to the counter. */
+void
+ptcounter_add(
+	struct ptcounter	*ptc,
+	int64_t			nr)
+{
+	uint64_t		*p;
+
+	p = ptvar_get(ptc->var);
+	*p += nr;
+}
+
+static bool
+ptcounter_val_helper(
+	struct ptvar		*ptv,
+	void			*data,
+	void			*foreach_arg)
+{
+	uint64_t		*sum = foreach_arg;
+	uint64_t		*count = data;
+
+	*sum += *count;
+	return true;
+}
+
+/* Return the approximate value of this counter. */
+uint64_t
+ptcounter_value(
+	struct ptcounter	*ptc)
+{
+	uint64_t		sum = 0;
+
+	ptvar_foreach(ptc->var, ptcounter_val_helper, &sum);
+	return sum;
+}
diff --git a/scrub/counter.h b/scrub/counter.h
new file mode 100644
index 0000000..f6225b2
--- /dev/null
+++ b/scrub/counter.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_COUNTER_H_
+#define XFS_SCRUB_COUNTER_H_
+
+struct ptcounter;
+struct ptcounter *ptcounter_init(size_t nr);
+void ptcounter_free(struct ptcounter *ptc);
+void ptcounter_add(struct ptcounter *ptc, int64_t nr);
+uint64_t ptcounter_value(struct ptcounter *ptc);
+
+#endif /* XFS_SCRUB_COUNTER_H_ */



* [PATCH 15/27] xfs_scrub: scan inodes
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (13 preceding siblings ...)
  2017-11-17 21:01 ` [PATCH 14/27] xfs_scrub: thread-safe stats counter Darrick J. Wong
@ 2017-11-17 21:01 ` Darrick J. Wong
  2017-11-17 21:01 ` [PATCH 16/27] xfs_scrub: check directory connectivity Darrick J. Wong
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:01 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Scan all the inodes in the filesystem for problems.
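
The patch below describes each inode by its AG number and its inode
number within that AG, splitting bs_ino at bit (inopblog + agblklog).
A small sketch of that arithmetic follows; split_ino() and struct
ino_parts are illustrative names that do not exist in xfsprogs, and the
geometry values in the usage note are made up.

```c
#include <stdint.h>

/* Hypothetical holder for the two halves of an inode number. */
struct ino_parts {
	uint32_t	agno;	/* allocation group number */
	uint32_t	agino;	/* inode number within that AG */
};

/*
 * Split an inode number.  The low (inopblog + agblklog) bits hold the
 * AG-relative inode number; everything above them is the AG number.
 */
static struct ino_parts
split_ino(uint64_t ino, unsigned int inopblog, unsigned int agblklog)
{
	unsigned int		shift = inopblog + agblklog;
	struct ino_parts	p;

	p.agno = (uint32_t)(ino >> shift);
	p.agino = (uint32_t)(ino & ((1ULL << shift) - 1));
	return p;
}
```

With inopblog = 4 and agblklog = 16, inode (3 << 20) | 129 splits into
AG 3 and AG inode 129, the same result the division and modulo in the
patch would give.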

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile    |    1 
 scrub/phase3.c    |  154 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.c |    1 
 scrub/xfs_scrub.h |    2 +
 4 files changed, 158 insertions(+)
 create mode 100644 scrub/phase3.c


diff --git a/scrub/Makefile b/scrub/Makefile
index 71d80fb..eca2c56 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -35,6 +35,7 @@ fscounters.c \
 inodes.c \
 phase1.c \
 phase2.c \
+phase3.c \
 scrub.c \
 spacemap.c \
 xfs_scrub.c
diff --git a/scrub/phase3.c b/scrub/phase3.c
new file mode 100644
index 0000000..8c3748e
--- /dev/null
+++ b/scrub/phase3.c
@@ -0,0 +1,154 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "counter.h"
+#include "inodes.h"
+#include "scrub.h"
+
+/* Phase 3: Scan all inodes. */
+
+/*
+ * Run a per-file metadata scanner.  We use the ino/gen interface to
+ * ensure that the inode we're checking matches what the inode scan
+ * told us to look at.
+ */
+static bool
+xfs_scrub_fd(
+	struct scrub_ctx	*ctx,
+	bool			(*fn)(struct scrub_ctx *, uint64_t,
+				      uint32_t, int),
+	struct xfs_bstat	*bs)
+{
+	return fn(ctx, bs->bs_ino, bs->bs_gen, ctx->mnt_fd);
+}
+
+/* Verify the contents, xattrs, and extent maps of an inode. */
+static int
+xfs_scrub_inode(
+	struct scrub_ctx	*ctx,
+	struct xfs_handle	*handle,
+	struct xfs_bstat	*bstat,
+	void			*arg)
+{
+	char			descr[DESCR_BUFSZ];
+	struct ptcounter	*icount = arg;
+	bool			moveon = true;
+	xfs_agnumber_t		agno;
+	xfs_agino_t		agino;
+	int			fd = -1;
+	int			error = 0;
+
+	agno = bstat->bs_ino / (1ULL << (ctx->inopblog + ctx->agblklog));
+	agino = bstat->bs_ino % (1ULL << (ctx->inopblog + ctx->agblklog));
+	snprintf(descr, DESCR_BUFSZ, _("inode %"PRIu64" (%u/%u)"),
+			(uint64_t)bstat->bs_ino, agno, agino);
+	background_sleep();
+
+	/* Try to open the inode to pin it. */
+	if (S_ISREG(bstat->bs_mode)) {
+		fd = xfs_open_handle(handle);
+		if (fd < 0) {
+			error = errno;
+			if (error != ESTALE)
+				str_errno(ctx, descr);
+			goto out;
+		}
+	}
+
+	/* Scrub the inode. */
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_inode_fields, bstat);
+	if (!moveon)
+		goto out;
+
+	/* Scrub all block mappings. */
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_data_fork, bstat);
+	if (!moveon)
+		goto out;
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_attr_fork, bstat);
+	if (!moveon)
+		goto out;
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_cow_fork, bstat);
+	if (!moveon)
+		goto out;
+
+	if (S_ISLNK(bstat->bs_mode)) {
+		/* Check symlink contents. */
+		moveon = xfs_scrub_symlink(ctx, bstat->bs_ino,
+				bstat->bs_gen, ctx->mnt_fd);
+	} else if (S_ISDIR(bstat->bs_mode)) {
+		/* Check the directory entries. */
+		moveon = xfs_scrub_fd(ctx, xfs_scrub_dir, bstat);
+	}
+	if (!moveon)
+		goto out;
+
+	/* Check all the extended attributes. */
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_attr, bstat);
+	if (!moveon)
+		goto out;
+
+	/* Check parent pointers. */
+	moveon = xfs_scrub_fd(ctx, xfs_scrub_parent, bstat);
+	if (!moveon)
+		goto out;
+
+out:
+	ptcounter_add(icount, 1);
+	if (fd >= 0)
+		close(fd);
+	if (error)
+		return error;
+	return moveon ? 0 : XFS_ITERATE_INODES_ABORT;
+}
+
+/* Verify all the inodes in a filesystem. */
+bool
+xfs_scan_inodes(
+	struct scrub_ctx	*ctx)
+{
+	struct ptcounter	*icount;
+	bool			moveon;
+
+	icount = ptcounter_init(scrub_nproc(ctx));
+	if (!icount) {
+		str_error(ctx, ctx->mntpoint, _("Could not create counter."));
+		return false;
+	}
+
+	moveon = xfs_scan_all_inodes(ctx, xfs_scrub_inode, icount);
+	if (!moveon)
+		goto free;
+	xfs_scrub_report_preen_triggers(ctx);
+	ctx->inodes_checked = ptcounter_value(icount);
+
+free:
+	ptcounter_free(icount);
+	return moveon;
+}
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index c5e8368..592fc35 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -357,6 +357,7 @@ run_scrub_phases(
 		},
 		{
 			.descr = _("Scan all inodes."),
+			.fn = xfs_scan_inodes,
 		},
 		{
 			.descr = _("Defer filesystem repairs."),
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 8ebe097..8407a46 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -89,6 +89,7 @@ struct scrub_ctx {
 	unsigned long long	runtime_errors;
 	unsigned long long	errors_found;
 	unsigned long long	warnings_found;
+	unsigned long long	inodes_checked;
 	bool			need_repair;
 	bool			preen_triggers[XFS_SCRUB_TYPE_NR];
 };
@@ -98,5 +99,6 @@ void xfs_shutdown_fs(struct scrub_ctx *ctx);
 bool xfs_cleanup_fs(struct scrub_ctx *ctx);
 bool xfs_setup_fs(struct scrub_ctx *ctx);
 bool xfs_scan_metadata(struct scrub_ctx *ctx);
+bool xfs_scan_inodes(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 16/27] xfs_scrub: check directory connectivity
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (14 preceding siblings ...)
  2017-11-17 21:01 ` [PATCH 15/27] xfs_scrub: scan inodes Darrick J. Wong
@ 2017-11-17 21:01 ` Darrick J. Wong
  2017-11-17 21:01 ` [PATCH 17/27] xfs_scrub: warn about suspicious characters in directory/xattr names Darrick J. Wong
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:01 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Opening directories by file handle will cause the kernel to perform
parent lookups all the way to the root directory.  Take advantage of
this to ensure that directories actually connect to the root.  Some
day we'll have parent pointers and can make this more comprehensive.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile    |    1 +
 scrub/phase5.c    |   94 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.c |    1 +
 scrub/xfs_scrub.h |    1 +
 4 files changed, 97 insertions(+)
 create mode 100644 scrub/phase5.c


diff --git a/scrub/Makefile b/scrub/Makefile
index eca2c56..ca08bd5 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -36,6 +36,7 @@ inodes.c \
 phase1.c \
 phase2.c \
 phase3.c \
+phase5.c \
 scrub.c \
 spacemap.c \
 xfs_scrub.c
diff --git a/scrub/phase5.c b/scrub/phase5.c
new file mode 100644
index 0000000..7cc0496
--- /dev/null
+++ b/scrub/phase5.c
@@ -0,0 +1,94 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "inodes.h"
+#include "scrub.h"
+
+/* Phase 5: Check directory connectivity. */
+
+/*
+ * Verify the connectivity of the directory tree.
+ * We know that the kernel's open-by-handle function will try to reconnect
+ * parents of an opened directory, so we'll accept that as sufficient.
+ */
+static int
+xfs_scrub_connections(
+	struct scrub_ctx	*ctx,
+	struct xfs_handle	*handle,
+	struct xfs_bstat	*bstat,
+	void			*arg)
+{
+	char			descr[DESCR_BUFSZ];
+	bool			moveon = true;
+	xfs_agnumber_t		agno;
+	xfs_agino_t		agino;
+	int			fd = -1;
+	int			error = 0;
+
+	agno = bstat->bs_ino / (1ULL << (ctx->inopblog + ctx->agblklog));
+	agino = bstat->bs_ino % (1ULL << (ctx->inopblog + ctx->agblklog));
+	snprintf(descr, DESCR_BUFSZ, _("inode %"PRIu64" (%u/%u)"),
+			(uint64_t)bstat->bs_ino, agno, agino);
+	background_sleep();
+
+	/* Open the dir, let the kernel try to reconnect it to the root. */
+	if (S_ISDIR(bstat->bs_mode)) {
+		fd = xfs_open_handle(handle);
+		if (fd < 0) {
+			error = errno;
+			if (error != ESTALE)
+				str_errno(ctx, descr);
+			goto out;
+		}
+	}
+
+out:
+	if (fd >= 0)
+		close(fd);
+	if (error)
+		return error;
+	return moveon ? 0 : XFS_ITERATE_INODES_ABORT;
+}
+
+/* Check directory connectivity. */
+bool
+xfs_scan_connections(
+	struct scrub_ctx	*ctx)
+{
+	if (ctx->errors_found) {
+		str_info(ctx, ctx->mntpoint,
+_("Filesystem has errors, skipping connectivity checks."));
+		return true;
+	}
+	if (!xfs_scan_all_inodes(ctx, xfs_scrub_connections, NULL))
+		return false;
+	xfs_scrub_report_preen_triggers(ctx);
+	return true;
+}
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 592fc35..642f541 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -365,6 +365,7 @@ run_scrub_phases(
 		},
 		{
 			.descr = _("Check directory tree."),
+			.fn = xfs_scan_connections,
 		},
 		{
 			.descr = _("Verify data file integrity."),
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 8407a46..373901e 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -100,5 +100,6 @@ bool xfs_cleanup_fs(struct scrub_ctx *ctx);
 bool xfs_setup_fs(struct scrub_ctx *ctx);
 bool xfs_scan_metadata(struct scrub_ctx *ctx);
 bool xfs_scan_inodes(struct scrub_ctx *ctx);
+bool xfs_scan_connections(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 17/27] xfs_scrub: warn about suspicious characters in directory/xattr names
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (15 preceding siblings ...)
  2017-11-17 21:01 ` [PATCH 16/27] xfs_scrub: check directory connectivity Darrick J. Wong
@ 2017-11-17 21:01 ` Darrick J. Wong
  2017-11-17 21:01 ` [PATCH 18/27] xfs_scrub: warn about normalized Unicode name collisions Darrick J. Wong
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:01 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Look for control characters and punctuation that interfere with shell
globbing in directory entry names and extended attribute key names.
Technically these aren't filesystem corruptions because names are
arbitrary sequences of bytes, but they've been known to cause problems
in the Unix environment so warn if we see them.
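
As a rough sketch of the check and of how a flagged name can be made
safe to print, assuming the "C" locale and using hypothetical helper
names (name_has_control_chars() and escape_name() are not the functions
added by this patch):

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True if the name contains an ASCII control byte (1-31 or 127). */
static bool
name_has_control_chars(const char *name)
{
	const unsigned char	*p;

	for (p = (const unsigned char *)name; *p; p++)
		if (*p < 32 || *p == 127)
			return true;
	return false;
}

/*
 * Return a copy of the name with non-printing bytes written as \xNN.
 * Each input byte becomes at most four output bytes, plus one byte
 * for the terminating NUL.  Caller must free the result.
 */
static char *
escape_name(const char *in)
{
	const unsigned char	*p;
	char			*out;
	char			*q;

	out = malloc(strlen(in) * 4 + 1);
	if (!out)
		return NULL;
	for (p = (const unsigned char *)in, q = out; *p; p++) {
		if (isprint(*p))
			*q++ = (char)*p;
		else
			q += sprintf(q, "\\x%02x", (unsigned int)*p);
	}
	*q = '\0';
	return out;
}
```

Escaping before printing matters because the warning itself would
otherwise send the control bytes straight to the user's terminal.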

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure.ac         |    2 +
 debian/control       |    2 -
 include/builddefs.in |    1 
 m4/Makefile          |    1 
 m4/package_attr.m4   |   23 ++++++
 scrub/Makefile       |    6 ++
 scrub/common.c       |   54 ++++++++++++++
 scrub/common.h       |    4 +
 scrub/phase5.c       |  192 ++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.h    |    1 
 10 files changed, 285 insertions(+), 1 deletion(-)
 create mode 100644 m4/package_attr.m4


diff --git a/configure.ac b/configure.ac
index 960a40a..58dc9e8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -163,6 +163,8 @@ AC_HAVE_MREMAP
 AC_NEED_INTERNAL_FSXATTR
 AC_HAVE_GETFSMAP
 AC_HAVE_MALLINFO
+AC_PACKAGE_WANT_ATTRIBUTES_H
+AC_HAVE_LIBATTR
 
 if test "$enable_blkid" = yes; then
 AC_HAVE_BLKID_TOPO
diff --git a/debian/control b/debian/control
index ad81662..1ef0b97 100644
--- a/debian/control
+++ b/debian/control
@@ -3,7 +3,7 @@ Section: admin
 Priority: optional
 Maintainer: XFS Development Team <linux-xfs@vger.kernel.org>
 Uploaders: Nathan Scott <nathans@debian.org>, Anibal Monsalve Salazar <anibal@debian.org>
-Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev
+Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev, libattr1-dev
 Standards-Version: 3.9.1
 Homepage: http://xfs.org/
 
diff --git a/include/builddefs.in b/include/builddefs.in
index 4be2efb..7964599 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -116,6 +116,7 @@ HAVE_MREMAP = @have_mremap@
 NEED_INTERNAL_FSXATTR = @need_internal_fsxattr@
 HAVE_GETFSMAP = @have_getfsmap@
 HAVE_MALLINFO = @have_mallinfo@
+HAVE_LIBATTR = @have_libattr@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/m4/Makefile b/m4/Makefile
index 4706121..100c8f5 100644
--- a/m4/Makefile
+++ b/m4/Makefile
@@ -16,6 +16,7 @@ LSRCFILES = \
 	manual_format.m4 \
 	package_blkid.m4 \
 	package_globals.m4 \
+	package_attr.m4 \
 	package_libcdev.m4 \
 	package_pthread.m4 \
 	package_sanitizer.m4 \
diff --git a/m4/package_attr.m4 b/m4/package_attr.m4
new file mode 100644
index 0000000..4324923
--- /dev/null
+++ b/m4/package_attr.m4
@@ -0,0 +1,23 @@
+AC_DEFUN([AC_PACKAGE_WANT_ATTRIBUTES_H],
+  [
+    AC_CHECK_HEADERS(attr/attributes.h)
+  ])
+
+#
+# Check if we have an ATTR_ROOT flag and libattr structures
+#
+AC_DEFUN([AC_HAVE_LIBATTR],
+  [ AC_MSG_CHECKING([for struct attrlist_cursor])
+    AC_TRY_COMPILE([
+#include <sys/types.h>
+#include <attr/attributes.h>
+       ], [
+struct attrlist_cursor *cur;
+struct attrlist *list;
+struct attrlist_ent *ent;
+int flags = ATTR_ROOT;
+       ], have_libattr=yes
+          AC_MSG_RESULT(yes),
+          AC_MSG_RESULT(no))
+    AC_SUBST(have_libattr)
+  ])
diff --git a/scrub/Makefile b/scrub/Makefile
index ca08bd5..d7c24a1 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -53,8 +53,14 @@ ifeq ($(HAVE_SYNCFS),yes)
 LCFLAGS += -DHAVE_SYNCFS
 endif
 
+ifeq ($(HAVE_LIBATTR),yes)
+LCFLAGS += -DHAVE_LIBATTR
+endif
+
 default: depend $(LTCOMMAND)
 
+phase5.o: $(TOPDIR)/include/builddefs
+
 include $(BUILDRULES)
 
 install: default $(INSTALL_SCRUB)
diff --git a/scrub/common.c b/scrub/common.c
index 18c060d..ceb80bc 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -339,3 +339,57 @@ background_sleep(void)
 	tv.tv_nsec = time % 1000000;
 	nanosleep(&tv, NULL);
 }
+
+/*
+ * Return the input string with non-printing bytes escaped.
+ * Caller must free the buffer.
+ */
+char *
+string_escape(
+	const char		*in)
+{
+	char			*str;
+	const char		*p;
+	char			*q;
+	int			x;
+
+	str = malloc(strlen(in) * 4 + 1);
+	if (!str)
+		return NULL;
+	for (p = in, q = str; *p != '\0'; p++) {
+		if (isprint(*p)) {
+			*q = *p;
+			q++;
+		} else {
+			x = sprintf(q, "\\x%02x", *p);
+			q += x;
+		}
+	}
+	*q = '\0';
+	return str;
+}
+
+/*
+ * Record another naming warning, and decide if it's worth
+ * complaining about.
+ */
+bool
+should_warn_about_name(
+	struct scrub_ctx	*ctx)
+{
+	bool			whine;
+	bool			res;
+
+	pthread_mutex_lock(&ctx->lock);
+	ctx->naming_warnings++;
+	whine = ctx->naming_warnings == TOO_MANY_NAME_WARNINGS;
+	res = ctx->naming_warnings < TOO_MANY_NAME_WARNINGS;
+	pthread_mutex_unlock(&ctx->lock);
+
+	if (whine && !(debug || verbose))
+		str_info(ctx, ctx->mntpoint,
+_("More than %u naming warnings, shutting up."),
+				TOO_MANY_NAME_WARNINGS);
+
+	return debug || verbose || res;
+}
diff --git a/scrub/common.h b/scrub/common.h
index e63f711..3bb2524 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -72,5 +72,9 @@ static inline int syncfs(int fd)
 
 bool find_mountpoint(char *mtab, struct scrub_ctx *ctx);
 void background_sleep(void);
+char *string_escape(const char *in);
+
+#define TOO_MANY_NAME_WARNINGS	10000
+bool should_warn_about_name(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 7cc0496..a248c59 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -20,10 +20,15 @@
 #include <stdio.h>
 #include <stdint.h>
 #include <stdbool.h>
+#include <dirent.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <sys/statvfs.h>
+#ifdef HAVE_LIBATTR
+# include <attr/attributes.h>
+#endif
 #include "xfs.h"
+#include "handle.h"
 #include "path.h"
 #include "workqueue.h"
 #include "xfs_scrub.h"
@@ -34,6 +39,181 @@
 /* Phase 5: Check directory connectivity. */
 
 /*
+ * Warn about problematic bytes in a directory/attribute name.  That means
+ * terminal control characters and escape sequences, since that could be used
+ * to do something naughty to the user's computer and/or break scripts.  XFS
+ * doesn't consider any byte sequence invalid, so don't flag these as errors.
+ */
+static bool
+xfs_scrub_check_name(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*namedescr,
+	const char		*name)
+{
+	const char		*p;
+	bool			bad = false;
+	char			*errname;
+
+	/* Complain about zero length names. */
+	if (*name == '\0' && should_warn_about_name(ctx)) {
+		str_warn(ctx, descr, _("Zero length name found."));
+		return true;
+	}
+
+	/* control characters */
+	for (p = name; *p; p++) {
+		if ((*p >= 1 && *p <= 31) || *p == 127) {
+			bad = true;
+			break;
+		}
+	}
+
+	if (bad && should_warn_about_name(ctx)) {
+		errname = string_escape(name);
+		if (!errname) {
+			str_errno(ctx, descr);
+			return false;
+		}
+		str_info(ctx, descr,
+_("Control character found in %s name \"%s\"."),
+				namedescr, errname);
+		free(errname);
+	}
+
+	return true;
+}
+
+/*
+ * Iterate a directory looking for filenames with problematic
+ * characters.
+ */
+static bool
+xfs_scrub_scan_dirents(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	int			*fd)
+{
+	DIR			*dir;
+	struct dirent		*dentry;
+	bool			moveon = true;
+
+	dir = fdopendir(*fd);
+	if (!dir) {
+		str_errno(ctx, descr);
+		goto out;
+	}
+	*fd = -1; /* closedir will close *fd for us */
+
+	dentry = readdir(dir);
+	while (dentry) {
+		moveon = xfs_scrub_check_name(ctx, descr, _("directory"),
+				dentry->d_name);
+		if (!moveon)
+			break;
+		dentry = readdir(dir);
+	}
+
+	closedir(dir);
+out:
+	return moveon;
+}
+
+#ifdef HAVE_LIBATTR
+/* Routines to scan all of an inode's xattrs for name problems. */
+struct xfs_attr_ns {
+	int			flags;
+	const char		*name;
+};
+
+static const struct xfs_attr_ns attr_ns[] = {
+	{0,			"user"},
+	{ATTR_ROOT,		"system"},
+	{ATTR_SECURE,		"secure"},
+	{0, NULL},
+};
+
+/*
+ * Check all the xattr names in a particular namespace of a file handle
+ * for Unicode normalization problems or collisions.
+ */
+static bool
+xfs_scrub_scan_fhandle_namespace_xattrs(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	struct xfs_handle		*handle,
+	const struct xfs_attr_ns	*attr_ns)
+{
+	struct attrlist_cursor		cur;
+	char				attrbuf[XFS_XATTR_LIST_MAX];
+	char				keybuf[NAME_MAX + 1];
+	struct attrlist			*attrlist = (struct attrlist *)attrbuf;
+	struct attrlist_ent		*ent;
+	bool				moveon = true;
+	int				i;
+	int				error;
+
+	memset(attrbuf, 0, XFS_XATTR_LIST_MAX);
+	memset(&cur, 0, sizeof(cur));
+	memset(keybuf, 0, NAME_MAX + 1);
+	error = attr_list_by_handle(handle, sizeof(*handle), attrbuf,
+			XFS_XATTR_LIST_MAX, attr_ns->flags, &cur);
+	while (!error) {
+		/* Examine the xattrs. */
+		for (i = 0; i < attrlist->al_count; i++) {
+			ent = ATTR_ENTRY(attrlist, i);
+			snprintf(keybuf, NAME_MAX, "%s.%s", attr_ns->name,
+					ent->a_name);
+			moveon = xfs_scrub_check_name(ctx, descr,
+					_("extended attribute"), keybuf);
+			if (!moveon)
+				goto out;
+		}
+
+		if (!attrlist->al_more)
+			break;
+		error = attr_list_by_handle(handle, sizeof(*handle), attrbuf,
+				XFS_XATTR_LIST_MAX, attr_ns->flags, &cur);
+	}
+	if (error && errno != ESTALE)
+		str_errno(ctx, descr);
+out:
+	return moveon;
+}
+
+/*
+ * Check all the xattr names in all the xattr namespaces for problematic
+ * characters.
+ */
+static bool
+xfs_scrub_scan_fhandle_xattrs(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	struct xfs_handle		*handle)
+{
+	const struct xfs_attr_ns	*ns;
+	bool				moveon = true;
+
+	for (ns = attr_ns; ns->name; ns++) {
+		moveon = xfs_scrub_scan_fhandle_namespace_xattrs(ctx, descr,
+				handle, ns);
+		if (!moveon)
+			break;
+	}
+	return moveon;
+}
+#else
+static inline bool
+xfs_scrub_scan_fhandle_xattrs(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	struct xfs_handle	*handle)
+{
+	return true;
+}
+#endif /* HAVE_LIBATTR */
+
+/*
  * Verify the connectivity of the directory tree.
  * We know that the kernel's open-by-handle function will try to reconnect
  * parents of an opened directory, so we'll accept that as sufficient.
@@ -58,6 +238,11 @@ xfs_scrub_connections(
 			(uint64_t)bstat->bs_ino, agno, agino);
 	background_sleep();
 
+        /* Warn about naming problems in xattrs. */
+        moveon = xfs_scrub_scan_fhandle_xattrs(ctx, descr, handle);
+        if (!moveon)
+                goto out;
+
 	/* Open the dir, let the kernel try to reconnect it to the root. */
 	if (S_ISDIR(bstat->bs_mode)) {
 		fd = xfs_open_handle(handle);
@@ -69,6 +254,13 @@ xfs_scrub_connections(
 		}
 	}
 
+        /* Warn about naming problems in the directory entries. */
+        if (fd >= 0 && S_ISDIR(bstat->bs_mode)) {
+                moveon = xfs_scrub_scan_dirents(ctx, descr, &fd);
+                if (!moveon)
+                        goto out;
+        }
+
 out:
 	if (fd >= 0)
 		close(fd);
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 373901e..c7e8e8e 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -90,6 +90,7 @@ struct scrub_ctx {
 	unsigned long long	errors_found;
 	unsigned long long	warnings_found;
 	unsigned long long	inodes_checked;
+	unsigned long long	naming_warnings;
 	bool			need_repair;
 	bool			preen_triggers[XFS_SCRUB_TYPE_NR];
 };



* [PATCH 18/27] xfs_scrub: warn about normalized Unicode name collisions
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (16 preceding siblings ...)
  2017-11-17 21:01 ` [PATCH 17/27] xfs_scrub: warn about suspicious characters in directory/xattr names Darrick J. Wong
@ 2017-11-17 21:01 ` Darrick J. Wong
  2017-11-17 21:02 ` [PATCH 19/27] xfs_scrub: create a bitmap data structure Darrick J. Wong
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:01 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Iterate all directory and xattr names to look for name collisions
amongst Unicode normalized names.  This is generally a sign of buggy
programs or malicious duplicate files.
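
As a rough illustration of the idea (not the code in this patch), the
sketch below records normalized names and flags a collision when two
byte-distinct names normalize to the same string but point at different
inodes.  Everything here is hypothetical: toy_nfkc() is a stand-in for
libunistring's u8_normalize(UNINORM_NFKC, ...) that only knows one
equivalence class, and the fixed-size table replaces the hashed buckets
that unicrash.c uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_SEEN	16

/* One remembered (normalized name, inode) pair. */
struct seen {
	char		name[64];
	uint64_t	ino;
};

/*
 * Toy stand-in for NFKC normalization (an assumption for this sketch).
 * It only knows that "A" + U+030A (41 CC 8A) and U+212B ANGSTROM SIGN
 * (E2 84 AB) both normalize to U+00C5 (C3 85); all other bytes are
 * copied through unchanged.
 */
static void toy_nfkc(const char *in, char *out, size_t outsz)
{
	size_t o = 0;

	assert(outsz >= 3);
	while (*in && o + 2 < outsz) {
		if (!strncmp(in, "A\xcc\x8a", 3) ||
		    !strncmp(in, "\xe2\x84\xab", 3)) {
			out[o++] = '\xc3';
			out[o++] = '\x85';
			in += 3;
		} else {
			out[o++] = *in++;
		}
	}
	out[o] = '\0';
}

/*
 * Record a name.  Returns true if its normalized form was already seen
 * for a different inode.  A full table silently stops recording.
 */
static bool collides(struct seen *tbl, size_t *nr, const char *name,
		     uint64_t ino)
{
	char	norm[64];
	size_t	i;

	toy_nfkc(name, norm, sizeof(norm));
	for (i = 0; i < *nr; i++)
		if (!strcmp(tbl[i].name, norm))
			return tbl[i].ino != ino;
	if (*nr < MAX_SEEN) {
		strcpy(tbl[*nr].name, norm);
		tbl[*nr].ino = ino;
		(*nr)++;
	}
	return false;
}
```

With this sketch, a precomposed U+00C5 name on inode 100 followed by the
decomposed spelling on inode 101 is reported as a collision, while a
third spelling that points back at inode 100 (a hard link, say) is not.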

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure.ac            |    2 
 debian/control          |    2 
 include/builddefs.in    |    2 
 m4/Makefile             |    1 
 m4/package_unistring.m4 |   19 ++
 scrub/Makefile          |   12 +
 scrub/common.c          |   20 ++
 scrub/common.h          |    3 
 scrub/phase5.c          |   37 ++++
 scrub/unicrash.c        |  399 +++++++++++++++++++++++++++++++++++++++++++++++
 scrub/unicrash.h        |   49 ++++++
 scrub/xfs_scrub.c       |    2 
 12 files changed, 538 insertions(+), 10 deletions(-)
 create mode 100644 m4/package_unistring.m4
 create mode 100644 scrub/unicrash.c
 create mode 100644 scrub/unicrash.h


diff --git a/configure.ac b/configure.ac
index 58dc9e8..b96d7a7 100644
--- a/configure.ac
+++ b/configure.ac
@@ -165,6 +165,8 @@ AC_HAVE_GETFSMAP
 AC_HAVE_MALLINFO
 AC_PACKAGE_WANT_ATTRIBUTES_H
 AC_HAVE_LIBATTR
+AC_PACKAGE_WANT_UNINORM_H
+AC_HAVE_U8NORMALIZE
 
 if test "$enable_blkid" = yes; then
 AC_HAVE_BLKID_TOPO
diff --git a/debian/control b/debian/control
index 1ef0b97..25b8594 100644
--- a/debian/control
+++ b/debian/control
@@ -3,7 +3,7 @@ Section: admin
 Priority: optional
 Maintainer: XFS Development Team <linux-xfs@vger.kernel.org>
 Uploaders: Nathan Scott <nathans@debian.org>, Anibal Monsalve Salazar <anibal@debian.org>
-Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev, libattr1-dev
+Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev, libattr1-dev, libunistring-dev
 Standards-Version: 3.9.1
 Homepage: http://xfs.org/
 
diff --git a/include/builddefs.in b/include/builddefs.in
index 7964599..e63e232 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -35,6 +35,7 @@ LIBTERMCAP = @libtermcap@
 LIBEDITLINE = @libeditline@
 LIBREADLINE = @libreadline@
 LIBBLKID = @libblkid@
+LIBUNISTRING = @libunistring@
 LIBXFS = $(TOPDIR)/libxfs/libxfs.la
 LIBFROG = $(TOPDIR)/libfrog/libfrog.la
 LIBXCMD = $(TOPDIR)/libxcmd/libxcmd.la
@@ -117,6 +118,7 @@ NEED_INTERNAL_FSXATTR = @need_internal_fsxattr@
 HAVE_GETFSMAP = @have_getfsmap@
 HAVE_MALLINFO = @have_mallinfo@
 HAVE_LIBATTR = @have_libattr@
+HAVE_U8NORMALIZE = @have_u8normalize@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/m4/Makefile b/m4/Makefile
index 100c8f5..f3a24bc 100644
--- a/m4/Makefile
+++ b/m4/Makefile
@@ -21,6 +21,7 @@ LSRCFILES = \
 	package_pthread.m4 \
 	package_sanitizer.m4 \
 	package_types.m4 \
+	package_unistring.m4 \
 	package_utilies.m4 \
 	package_uuiddev.m4 \
 	multilib.m4 \
diff --git a/m4/package_unistring.m4 b/m4/package_unistring.m4
new file mode 100644
index 0000000..9cbfcb0
--- /dev/null
+++ b/m4/package_unistring.m4
@@ -0,0 +1,19 @@
+AC_DEFUN([AC_PACKAGE_WANT_UNINORM_H],
+  [ AC_CHECK_HEADERS(uninorm.h)
+    if test $ac_cv_header_uninorm_h = no; then
+	AC_CHECK_HEADERS(uninorm.h,, [
+	echo
+	echo 'WARNING: could not find a valid uninorm.h header.'])
+    fi
+  ])
+
+AC_DEFUN([AC_HAVE_U8NORMALIZE],
+  [ AC_CHECK_LIB(unistring, u8_normalize,[
+	libunistring=-lunistring
+	have_u8normalize=yes
+    ],[
+	echo
+	echo 'WARNING: xfs_scrub will not be built with Unicode libraries.'])
+    AC_SUBST(libunistring)
+    AC_SUBST(have_u8normalize)
+  ])
diff --git a/scrub/Makefile b/scrub/Makefile
index d7c24a1..5642dea 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -24,6 +24,7 @@ fscounters.h \
 inodes.h \
 scrub.h \
 spacemap.h \
+unicrash.h \
 xfs_scrub.h
 
 CFILES = \
@@ -41,8 +42,8 @@ scrub.c \
 spacemap.c \
 xfs_scrub.c
 
-LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
-LTDEPENDENCIES += $(LIBHANDLE) $(LIBFROG)
+LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD) $(LIBUNISTRING)
+LTDEPENDENCIES += $(LIBHANDLE) $(LIBFROG) $(LIBUNISTRING)
 LLDFLAGS = -static
 
 ifeq ($(HAVE_MALLINFO),yes)
@@ -57,9 +58,14 @@ ifeq ($(HAVE_LIBATTR),yes)
 LCFLAGS += -DHAVE_LIBATTR
 endif
 
+ifeq ($(HAVE_U8NORMALIZE),yes)
+CFILES += unicrash.c
+LCFLAGS += -DHAVE_U8NORMALIZE
+endif
+
 default: depend $(LTCOMMAND)
 
-phase5.o: $(TOPDIR)/include/builddefs
+phase5.o unicrash.o xfs.o: $(TOPDIR)/include/builddefs
 
 include $(BUILDRULES)
 
diff --git a/scrub/common.c b/scrub/common.c
index ceb80bc..5072493 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -75,6 +75,26 @@ __str_errno(
 	pthread_mutex_unlock(&ctx->lock);
 }
 
+/* Print a warning string and whatever error is stored in errno. */
+void
+__str_errno_warn(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	const char		*file,
+	int			line)
+{
+	char			buf[DESCR_BUFSZ];
+
+	pthread_mutex_lock(&ctx->lock);
+	fprintf(stderr, _("Warning: %s: %s."), descr,
+			strerror_r(errno, buf, DESCR_BUFSZ));
+	if (debug)
+		fprintf(stderr, _(" (%s line %d)"), file, line);
+	fprintf(stderr, "\n");
+	ctx->warnings_found++;
+	pthread_mutex_unlock(&ctx->lock);
+}
+
 /* Print an error string and some error text. */
 void
 __str_error(
diff --git a/scrub/common.h b/scrub/common.h
index 3bb2524..0c451cc 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -41,11 +41,14 @@ void __record_repair(struct scrub_ctx *ctx, const char *descr, const char *file,
 		int line, const char *format, ...);
 void __record_preen(struct scrub_ctx *ctx, const char *descr, const char *file,
 		int line, const char *format, ...);
+void __str_errno_warn(struct scrub_ctx *, const char *descr, const char *file,
+		      int line);
 
 #define str_errno(ctx, str)		__str_errno(ctx, str, __FILE__, __LINE__)
 #define str_error(ctx, str, ...)	__str_error(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
 #define str_warn(ctx, str, ...)		__str_warn(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
 #define str_info(ctx, str, ...)		__str_info(ctx, str, __FILE__, __LINE__, __VA_ARGS__)
+#define str_errno_warn(ctx, str)	__str_errno_warn(ctx, str, __FILE__, __LINE__)
 #define dbg_printf(fmt, ...)		{if (debug > 1) {printf(fmt, __VA_ARGS__);}}
 
 /* Is this debug tweak enabled? */
diff --git a/scrub/phase5.c b/scrub/phase5.c
index a248c59..ed89266 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -35,6 +35,7 @@
 #include "common.h"
 #include "inodes.h"
 #include "scrub.h"
+#include "unicrash.h"
 
 /* Phase 5: Check directory connectivity. */
 
@@ -92,8 +93,10 @@ static bool
 xfs_scrub_scan_dirents(
 	struct scrub_ctx	*ctx,
 	const char		*descr,
-	int			*fd)
+	int			*fd,
+	struct xfs_bstat	*bstat)
 {
+	struct unicrash		*uc = NULL;
 	DIR			*dir;
 	struct dirent		*dentry;
 	bool			moveon = true;
@@ -105,15 +108,24 @@ xfs_scrub_scan_dirents(
 	}
 	*fd = -1; /* closedir will close *fd for us */
 
+	moveon = unicrash_dir_init(&uc, ctx, bstat);
+	if (!moveon)
+		goto out_unicrash;
+
 	dentry = readdir(dir);
 	while (dentry) {
 		moveon = xfs_scrub_check_name(ctx, descr, _("directory"),
 				dentry->d_name);
 		if (!moveon)
 			break;
+		moveon = unicrash_check_dir_name(uc, descr, dentry);
+		if (!moveon)
+			break;
 		dentry = readdir(dir);
 	}
+	unicrash_free(uc);
 
+out_unicrash:
 	closedir(dir);
 out:
 	return moveon;
@@ -142,6 +154,7 @@ xfs_scrub_scan_fhandle_namespace_xattrs(
 	struct scrub_ctx		*ctx,
 	const char			*descr,
 	struct xfs_handle		*handle,
+	struct xfs_bstat		*bstat,
 	const struct xfs_attr_ns	*attr_ns)
 {
 	struct attrlist_cursor		cur;
@@ -149,10 +162,15 @@ xfs_scrub_scan_fhandle_namespace_xattrs(
 	char				keybuf[NAME_MAX + 1];
 	struct attrlist			*attrlist = (struct attrlist *)attrbuf;
 	struct attrlist_ent		*ent;
+	struct unicrash			*uc = NULL;
 	bool				moveon = true;
 	int				i;
 	int				error;
 
+	moveon = unicrash_xattr_init(&uc, ctx, bstat);
+	if (!moveon)
+		return false;
+
 	memset(attrbuf, 0, XFS_XATTR_LIST_MAX);
 	memset(&cur, 0, sizeof(cur));
 	memset(keybuf, 0, NAME_MAX + 1);
@@ -168,6 +186,9 @@ xfs_scrub_scan_fhandle_namespace_xattrs(
 					_("extended attribute"), keybuf);
 			if (!moveon)
 				goto out;
+			moveon = unicrash_check_xattr_name(uc, descr, keybuf);
+			if (!moveon)
+				goto out;
 		}
 
 		if (!attrlist->al_more)
@@ -178,6 +199,7 @@ xfs_scrub_scan_fhandle_namespace_xattrs(
 	if (error && errno != ESTALE)
 		str_errno(ctx, descr);
 out:
+	unicrash_free(uc);
 	return moveon;
 }
 
@@ -189,14 +211,15 @@ static bool
 xfs_scrub_scan_fhandle_xattrs(
 	struct scrub_ctx		*ctx,
 	const char			*descr,
-	struct xfs_handle		*handle)
+	struct xfs_handle		*handle,
+	struct xfs_bstat		*bstat)
 {
 	const struct xfs_attr_ns	*ns;
 	bool				moveon = true;
 
 	for (ns = attr_ns; ns->name; ns++) {
 		moveon = xfs_scrub_scan_fhandle_namespace_xattrs(ctx, descr,
-				handle, ns);
+				handle, bstat, ns);
 		if (!moveon)
 			break;
 	}
@@ -217,6 +240,8 @@ xfs_scrub_scan_fhandle_xattrs(
  * Verify the connectivity of the directory tree.
  * We know that the kernel's open-by-handle function will try to reconnect
  * parents of an opened directory, so we'll accept that as sufficient.
+ *
+ * Check for potential Unicode collisions in names.
  */
 static int
 xfs_scrub_connections(
@@ -226,7 +251,7 @@ xfs_scrub_connections(
 	void			*arg)
 {
 	char			descr[DESCR_BUFSZ];
-	bool			moveon = true;
+	bool			moveon;
 	xfs_agnumber_t		agno;
 	xfs_agino_t		agino;
 	int			fd = -1;
@@ -239,7 +264,7 @@ xfs_scrub_connections(
 	background_sleep();
 
         /* Warn about naming problems in xattrs. */
-        moveon = xfs_scrub_scan_fhandle_xattrs(ctx, descr, handle);
+        moveon = xfs_scrub_scan_fhandle_xattrs(ctx, descr, handle, bstat);
         if (!moveon)
                 goto out;
 
@@ -256,7 +281,7 @@ xfs_scrub_connections(
 
         /* Warn about naming problems in the directory entries. */
         if (fd >= 0 && S_ISDIR(bstat->bs_mode)) {
-                moveon = xfs_scrub_scan_dirents(ctx, descr, &fd);
+                moveon = xfs_scrub_scan_dirents(ctx, descr, &fd, bstat);
                 if (!moveon)
                         goto out;
         }
diff --git a/scrub/unicrash.c b/scrub/unicrash.c
new file mode 100644
index 0000000..35d3b56
--- /dev/null
+++ b/scrub/unicrash.c
@@ -0,0 +1,399 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include <unistr.h>
+#include <uninorm.h>
+#include "xfs.h"
+#include "path.h"
+#include "xfs_scrub.h"
+#include "common.h"
+
+/*
+ * Detect collisions of Unicode-normalized names.
+ *
+ * Record all the name->ino mappings in a directory/xattr, with a twist!
+ * The twist is that we perform unicode normalization on every name we
+ * see, so that we can warn about a directory containing more than one
+ * directory entry that normalizes to the same Unicode string.  These
+ * entries are at best a sign of Unicode mishandling, or some sort of
+ * weird name substitution attack if the entries do not point to the
+ * same inode.  Warn if we see multiple dirents that do not all point to
+ * the same inode.
+ *
+ * For extended attributes we perform the same collision checks on the
+ * attribute names, though any collision is enough to trigger a warning.
+ *
+ * We flag these collisions as warnings and not errors because XFS
+ * treats names as a sequence of arbitrary nonzero bytes.  While a
+ * Unicode collision is not technically a filesystem corruption, we
+ * ought to say something if there's a possibility for misleading a
+ * user.
+ *
+ * To normalize, we use Unicode NFKC.  We use the composing
+ * normalization mode (e.g. "E WITH ACUTE" instead of "E" then "ACUTE")
+ * because that's what W3C (and in general Linux) uses.  This enables us
+ * to detect multiple object names that normalize to the same name and
+ * could be confusing to users.  Furthermore, we use the compatibility
+ * mode to detect names with compatible but different code points to
+ * strengthen those checks.
+ */
+
+struct name_entry {
+	struct name_entry	*next;
+	xfs_ino_t		ino;
+	size_t			uninamelen;
+	uint8_t			uniname[0];
+};
+#define NAME_ENTRY_SZ(nl)	(sizeof(struct name_entry) + 1 + \
+				 (nl * sizeof(uint8_t)))
+
+struct unicrash {
+	struct scrub_ctx	*ctx;
+	bool			compare_ino;
+	size_t			nr_buckets;
+	struct name_entry	*buckets[0];
+};
+#define UNICRASH_SZ(nr)		(sizeof(struct unicrash) + \
+				 (nr * sizeof(struct name_entry *)))
+
+/*
+ * We only care about validating utf8 collisions if the underlying
+ * system configuration says we're using utf8.  If the language
+ * specifier string used to output messages has ".UTF-8" somewhere in
+ * its name, then we conclude utf8 is in use.  Otherwise, no checking is
+ * performed.
+ *
+ * Most modern Linux systems default to utf8, so the only time this
+ * check will return false is if the administrator configured things
+ * this way or if things are so messed up there is no locale data at
+ * all.
+ */
+#define UTF8_STR		".UTF-8"
+#define UTF8_STRLEN		(sizeof(UTF8_STR) - 1)
+static bool
+is_utf8_locale(void)
+{
+	const char		*msg_locale;
+	static int		answer = -1;
+
+	if (answer != -1)
+		return answer;
+
+	msg_locale = setlocale(LC_MESSAGES, NULL);
+	if (msg_locale == NULL)
+		return false;
+
+	if (strstr(msg_locale, UTF8_STR) != NULL)
+		answer = 1;
+	else
+		answer = 0;
+	return answer;
+}
+
+/* Set up unicrash global state. */
+void
+unicrash_setup(void)
+{
+	is_utf8_locale();
+}
+
+/* Initialize the collision detector. */
+static bool
+unicrash_init(
+	struct unicrash		**ucp,
+	struct scrub_ctx	*ctx,
+	bool			compare_ino,
+	size_t			nr_buckets)
+{
+	struct unicrash		*p;
+
+	if (!is_utf8_locale()) {
+		*ucp = NULL;
+		return true;
+	}
+
+	if (nr_buckets > 65536)
+		nr_buckets = 65536;
+	else if (nr_buckets < 16)
+		nr_buckets = 16;
+
+	p = calloc(1, UNICRASH_SZ(nr_buckets));
+	if (!p)
+		return false;
+	p->ctx = ctx;
+	p->nr_buckets = nr_buckets;
+	p->compare_ino = compare_ino;
+	*ucp = p;
+
+	return true;
+}
+
+/* Initialize the collision detector for a directory. */
+bool
+unicrash_dir_init(
+	struct unicrash		**ucp,
+	struct scrub_ctx	*ctx,
+	struct xfs_bstat	*bstat)
+{
+	/*
+	 * Assume 64 bytes per dentry, clamp buckets between 16 and 64k.
+	 * Same general idea as dir_hash_init in xfs_repair.
+	 */
+	return unicrash_init(ucp, ctx, true, bstat->bs_size / 64);
+}
+
+/* Initialize the collision detector for an extended attribute. */
+bool
+unicrash_xattr_init(
+	struct unicrash		**ucp,
+	struct scrub_ctx	*ctx,
+	struct xfs_bstat	*bstat)
+{
+	/* Assume 16 attributes per extent for lack of a better idea. */
+	return unicrash_init(ucp, ctx, false, 16 * (1 + bstat->bs_aextents));
+}
+
+/* Free the crash detector. */
+void
+unicrash_free(
+	struct unicrash		*uc)
+{
+	struct name_entry	*ne;
+	struct name_entry	*x;
+	size_t			i;
+
+	if (!uc)
+		return;
+
+	for (i = 0; i < uc->nr_buckets; i++) {
+		for (ne = uc->buckets[i]; ne != NULL; ne = x) {
+			x = ne->next;
+			free(ne);
+		}
+	}
+	free(uc);
+}
+
+/* Steal the dirhash function from libxfs, avoid linking with libxfs. */
+
+#define rol32(x, y)		(((x) << (y)) | ((x) >> (32 - (y))))
+
+/*
+ * Implement a simple hash on a character string.
+ * Rotate the hash value by 7 bits, then XOR each character in.
+ * This is implemented with some source-level loop unrolling.
+ */
+static xfs_dahash_t
+unicrash_hashname(
+	const uint8_t		*name,
+	size_t			namelen)
+{
+	xfs_dahash_t		hash;
+
+	/*
+	 * Do four characters at a time as long as we can.
+	 */
+	for (hash = 0; namelen >= 4; namelen -= 4, name += 4)
+		hash = (name[0] << 21) ^ (name[1] << 14) ^ (name[2] << 7) ^
+		       (name[3] << 0) ^ rol32(hash, 7 * 4);
+
+	/*
+	 * Now do the rest of the characters.
+	 */
+	switch (namelen) {
+	case 3:
+		return (name[0] << 14) ^ (name[1] << 7) ^ (name[2] << 0) ^
+		       rol32(hash, 7 * 3);
+	case 2:
+		return (name[0] << 7) ^ (name[1] << 0) ^ rol32(hash, 7 * 2);
+	case 1:
+		return (name[0] << 0) ^ rol32(hash, 7 * 1);
+	default: /* case 0: */
+		return hash;
+	}
+}
+
+/*
+ * Normalize a name according to Unicode NFKC normalization rules.
+ * Returns true if the name was already normalized or could not be normalized.
+ */
+static bool
+unicrash_normalize(
+	const char		*in,
+	uint8_t			*out,
+	size_t			outlen)
+{
+	size_t			inlen = strlen(in);
+
+	assert(inlen <= outlen);
+	if (!u8_normalize(UNINORM_NFKC, (const uint8_t *)in, inlen,
+			out, &outlen)) {
+		/* Didn't normalize, just return the same buffer. */
+		memcpy(out, in, inlen + 1);
+		return true;
+	}
+	out[outlen] = 0;
+	return outlen == inlen ? memcmp(in, out, inlen) == 0 : false;
+}
+
+/* Complain about Unicode problems. */
+static void
+unicrash_complain(
+	struct unicrash		*uc,
+	const char		*descr,
+	const char		*what,
+	bool			normal,
+	bool			unique,
+	const char		*name,
+	uint8_t			*uniname)
+{
+	char			*bad1 = NULL;
+	char			*bad2 = NULL;
+
+	bad1 = string_escape(name);
+	bad2 = string_escape((char *)uniname);
+
+	if (!normal && should_warn_about_name(uc->ctx))
+		str_info(uc->ctx, descr,
+_("Unicode name \"%s\" in %s should be normalized as \"%s\"."),
+				bad1, what, bad2);
+	if (!unique)
+		str_warn(uc->ctx, descr,
+_("Duplicate normalized Unicode name \"%s\" found in %s."),
+				bad1, what);
+
+	free(bad1);
+	free(bad2);
+}
+
+/*
+ * Try to add a name -> ino entry to the collision detector.  The name
+ * must be normalized according to Unicode NFKC normalization rules to
+ * detect byte-unique names that map to the same sequence of Unicode
+ * code points.
+ *
+ * This function returns false only if memory allocation fails.
+ * Otherwise it returns true and sets *unique: true if the name is new
+ * (or, for directories, already maps to the same inode), false if the
+ * name was seen before and collides.
+ */
+static bool
+unicrash_add(
+	struct unicrash		*uc,
+	uint8_t			*uniname,
+	xfs_ino_t		ino,
+	bool			*unique)
+{
+	struct name_entry	*ne;
+	struct name_entry	*x;
+	struct name_entry	**nep;
+	size_t			uninamelen = u8_strlen(uniname);
+	size_t			bucket;
+	xfs_dahash_t		hash;
+
+	/* Do we already know about that name? */
+	hash = unicrash_hashname(uniname, uninamelen);
+	bucket = hash % uc->nr_buckets;
+	for (nep = &uc->buckets[bucket], ne = *nep; ne != NULL; ne = x) {
+		if (u8_strcmp(uniname, ne->uniname) == 0) {
+			*unique = uc->compare_ino ? ne->ino == ino : false;
+			return true;
+		}
+		nep = &ne->next;
+		x = ne->next;
+	}
+
+	/* Remember that name. */
+	x = malloc(NAME_ENTRY_SZ(uninamelen));
+	if (!x)
+		return false;
+	x->next = NULL;
+	x->ino = ino;
+	x->uninamelen = uninamelen;
+	memcpy(x->uniname, uniname, uninamelen + 1);
+	*nep = x;
+	*unique = true;
+
+	return true;
+}
+
+/* Check a name for unicode normalization problems or collisions. */
+static bool
+__unicrash_check_name(
+	struct unicrash		*uc,
+	const char		*descr,
+	const char		*namedescr,
+	const char		*name,
+	xfs_ino_t		ino)
+{
+	uint8_t			uniname[(NAME_MAX * 2) + 1];
+	bool			moveon;
+	bool			normal;
+	bool			unique;
+
+	memset(uniname, 0, (NAME_MAX * 2) + 1);
+	normal = unicrash_normalize(name, uniname, NAME_MAX * 2);
+	moveon = unicrash_add(uc, uniname, ino, &unique);
+	if (!moveon)
+		return false;
+
+	if (normal && unique)
+		return true;
+
+	unicrash_complain(uc, descr, namedescr, normal, unique, name,
+			uniname);
+	return true;
+}
+
+/* Check a directory entry for unicode normalization problems or collisions. */
+bool
+unicrash_check_dir_name(
+	struct unicrash		*uc,
+	const char		*descr,
+	struct dirent		*dentry)
+{
+	if (!uc)
+		return true;
+	return __unicrash_check_name(uc, descr, _("directory"),
+			dentry->d_name, dentry->d_ino);
+}
+
+/*
+ * Check an extended attribute name for unicode normalization problems
+ * or collisions.
+ */
+bool
+unicrash_check_xattr_name(
+	struct unicrash		*uc,
+	const char		*descr,
+	const char		*attrname)
+{
+	if (!uc)
+		return true;
+	return __unicrash_check_name(uc, descr, _("extended attribute"),
+			attrname, 0);
+}
diff --git a/scrub/unicrash.h b/scrub/unicrash.h
new file mode 100644
index 0000000..266fdd0
--- /dev/null
+++ b/scrub/unicrash.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_UNICRASH_H_
+#define XFS_SCRUB_UNICRASH_H_
+
+struct unicrash;
+
+/* Unicode name collision detection. */
+#ifdef HAVE_U8NORMALIZE
+
+struct dirent;
+
+void unicrash_setup(void);
+bool unicrash_dir_init(struct unicrash **ucp, struct scrub_ctx *ctx,
+		struct xfs_bstat *bstat);
+bool unicrash_xattr_init(struct unicrash **ucp, struct scrub_ctx *ctx,
+		struct xfs_bstat *bstat);
+void unicrash_free(struct unicrash *uc);
+bool unicrash_check_dir_name(struct unicrash *uc, const char *descr,
+		struct dirent *dirent);
+bool unicrash_check_xattr_name(struct unicrash *uc, const char *descr,
+		const char *attrname);
+#else
+# define unicrash_setup()
+# define unicrash_dir_init(u, c, b)		(true)
+# define unicrash_xattr_init(u, c, b)		(true)
+# define unicrash_free(u)			do {(u) = (u);} while (0)
+# define unicrash_check_dir_name(u, d, n)	(true)
+# define unicrash_check_xattr_name(u, d, n)	(true)
+#endif /* HAVE_U8NORMALIZE */
+
+#endif /* XFS_SCRUB_UNICRASH_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 642f541..f224784 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -31,6 +31,7 @@
 #include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
+#include "unicrash.h"
 
 /*
  * XFS Online Metadata Scrub (and Repair)
@@ -529,6 +530,7 @@ _("Only one of the options -n or -y may be specified.\n"));
 	if (optind != argc - 1)
 		usage();
 
+	unicrash_setup();
 	ctx.mntpoint = strdup(argv[optind]);
 
 	/* Find the mount record for the passed-in argument. */



* [PATCH 19/27] xfs_scrub: create a bitmap data structure
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (17 preceding siblings ...)
  2017-11-17 21:01 ` [PATCH 18/27] xfs_scrub: warn about normalized Unicode name collisions Darrick J. Wong
@ 2017-11-17 21:02 ` Darrick J. Wong
  2017-11-17 21:02 ` [PATCH 20/27] xfs_scrub: create infrastructure to read verify data blocks Darrick J. Wong
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:02 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create an efficient tree-based bitmap data structure.  We will use this
during the data block scan to record the LBAs of IO errors so that we
can report broken files to userspace.
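
To make the "extent records" idea concrete, here is a minimal sketch of
a range-based bitmap.  It is an illustration under stated assumptions,
not the implementation in this patch: the patch keeps its extents in an
AVL tree (avl64), whereas the sketch swaps in a small sorted array, and
the names rb_set(), rb_test() and MAX_EXTENTS are made up for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_EXTENTS	64

/* One run of set bits: [start, start + len). */
struct extent {
	uint64_t	start;
	uint64_t	len;
};

/* Sorted, non-overlapping, non-adjacent extents. */
struct range_bitmap {
	size_t		nr;
	struct extent	ext[MAX_EXTENTS];
};

/* Is bit 'key' set? */
static bool rb_test(const struct range_bitmap *b, uint64_t key)
{
	size_t	i;

	for (i = 0; i < b->nr; i++)
		if (key >= b->ext[i].start &&
		    key - b->ext[i].start < b->ext[i].len)
			return true;
	return false;
}

/*
 * Set bits [start, start + len), merging with any extent that overlaps
 * or touches the new range.  Returns false if the array is full.
 */
static bool rb_set(struct range_bitmap *b, uint64_t start, uint64_t len)
{
	uint64_t	end = start + len;
	size_t		i = 0;
	size_t		j;

	assert(len > 0 && end > start);

	/* Skip extents that end strictly before the new range begins. */
	while (i < b->nr && b->ext[i].start + b->ext[i].len < start)
		i++;

	/* Absorb every extent that overlaps or touches the new range. */
	for (j = i; j < b->nr && b->ext[j].start <= end; j++) {
		if (b->ext[j].start < start)
			start = b->ext[j].start;
		if (b->ext[j].start + b->ext[j].len > end)
			end = b->ext[j].start + b->ext[j].len;
	}

	if (j == i && b->nr == MAX_EXTENTS)
		return false;

	/* Replace extents [i, j) with the single merged extent. */
	memmove(&b->ext[i + 1], &b->ext[j],
		(b->nr - j) * sizeof(b->ext[0]));
	b->ext[i].start = start;
	b->ext[i].len = end - start;
	b->nr = b->nr - (j - i) + 1;
	return true;
}
```

The point of storing runs rather than individual bits is that a sparse
set of bad LBAs over a 64-bit key space costs memory proportional to
the number of runs, not to the size of the device.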

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile |    2 
 scrub/bitmap.c |  410 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/bitmap.h |   38 +++++
 3 files changed, 450 insertions(+)
 create mode 100644 scrub/bitmap.c
 create mode 100644 scrub/bitmap.h


diff --git a/scrub/Makefile b/scrub/Makefile
index 5642dea..4118ab6 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -16,6 +16,7 @@ INSTALL_SCRUB = install-scrub
 endif	# scrub_prereqs
 
 HFILES = \
+bitmap.h \
 common.h \
 counter.h \
 disk.h \
@@ -28,6 +29,7 @@ unicrash.h \
 xfs_scrub.h
 
 CFILES = \
+bitmap.c \
 common.c \
 counter.c \
 disk.c \
diff --git a/scrub/bitmap.c b/scrub/bitmap.c
new file mode 100644
index 0000000..c2d6af1
--- /dev/null
+++ b/scrub/bitmap.c
@@ -0,0 +1,410 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <assert.h>
+#include <inttypes.h>
+#include <pthread.h>
+#include "platform_defs.h"
+#include "avl64.h"
+#include "list.h"
+#include "bitmap.h"
+
+/*
+ * Space Efficient Bitmap
+ *
+ * Implements a space-efficient bitmap.  We use an AVL tree to manage
+ * extent records that tell us which ranges are set; the bitmap key is
+ * an arbitrary uint64_t.  The usual bitmap operations (set, clear,
+ * test, test and set) are supported, plus we can iterate set ranges.
+ */
+
+#define avl_for_each_range_safe(pos, n, l, first, last) \
+	for (pos = (first), n = pos->avl_nextino, l = (last)->avl_nextino; pos != (l); \
+			pos = n, n = pos ? pos->avl_nextino : NULL)
+
+#define avl_for_each_safe(tree, pos, n) \
+	for (pos = (tree)->avl_firstino, n = pos ? pos->avl_nextino : NULL; \
+			pos != NULL; \
+			pos = n, n = pos ? pos->avl_nextino : NULL)
+
+#define avl_for_each(tree, pos) \
+	for (pos = (tree)->avl_firstino; pos != NULL; pos = pos->avl_nextino)
+
+struct bitmap_node {
+	struct avl64node	btn_node;
+	uint64_t		btn_start;
+	uint64_t		btn_length;
+};
+
+static uint64_t
+extent_start(
+	struct avl64node	*node)
+{
+	struct bitmap_node	*btn;
+
+	btn = container_of(node, struct bitmap_node, btn_node);
+	return btn->btn_start;
+}
+
+static uint64_t
+extent_end(
+	struct avl64node	*node)
+{
+	struct bitmap_node	*btn;
+
+	btn = container_of(node, struct bitmap_node, btn_node);
+	return btn->btn_start + btn->btn_length;
+}
+
+static struct avl64ops bitmap_ops = {
+	extent_start,
+	extent_end,
+};
+
+/* Initialize a bitmap. */
+bool
+bitmap_init(
+	struct bitmap		**bmapp)
+{
+	struct bitmap		*bmap;
+
+	bmap = calloc(1, sizeof(struct bitmap));
+	if (!bmap)
+		return false;
+	bmap->bt_tree = malloc(sizeof(struct avl64tree_desc));
+	if (!bmap->bt_tree) {
+		free(bmap);
+		return false;
+	}
+
+	pthread_mutex_init(&bmap->bt_lock, NULL);
+	avl64_init_tree(bmap->bt_tree, &bitmap_ops);
+	*bmapp = bmap;
+
+	return true;
+}
+
+/* Free a bitmap. */
+void
+bitmap_free(
+	struct bitmap		**bmapp)
+{
+	struct bitmap		*bmap;
+	struct avl64node	*node;
+	struct avl64node	*n;
+	struct bitmap_node	*ext;
+
+	bmap = *bmapp;
+	avl_for_each_safe(bmap->bt_tree, node, n) {
+		ext = container_of(node, struct bitmap_node, btn_node);
+		free(ext);
+	}
+	free(bmap->bt_tree);
+	*bmapp = NULL;
+}
+
+/* Create a new bitmap extent node. */
+static struct bitmap_node *
+bitmap_node_init(
+	uint64_t		start,
+	uint64_t		len)
+{
+	struct bitmap_node	*ext;
+
+	ext = malloc(sizeof(struct bitmap_node));
+	if (!ext)
+		return NULL;
+
+	ext->btn_node.avl_nextino = NULL;
+	ext->btn_start = start;
+	ext->btn_length = len;
+
+	return ext;
+}
+
+/* Set a region of bits (locked). */
+static bool
+__bitmap_set(
+	struct bitmap		*bmap,
+	uint64_t		start,
+	uint64_t		length)
+{
+	struct avl64node	*firstn;
+	struct avl64node	*lastn;
+	struct avl64node	*pos;
+	struct avl64node	*n;
+	struct avl64node	*l;
+	struct bitmap_node	*ext;
+	uint64_t		new_start;
+	uint64_t		new_length;
+	struct avl64node	*node;
+	bool			res = true;
+
+	/* Find any existing nodes adjacent or within that range. */
+	avl64_findranges(bmap->bt_tree, start - 1, start + length + 1,
+			&firstn, &lastn);
+
+	/* Nothing, just insert a new extent. */
+	if (firstn == NULL && lastn == NULL) {
+		ext = bitmap_node_init(start, length);
+		if (!ext)
+			return false;
+
+		node = avl64_insert(bmap->bt_tree, &ext->btn_node);
+		if (node == NULL) {
+			free(ext);
+			errno = EEXIST;
+			return false;
+		}
+
+		return true;
+	}
+
+	assert(firstn != NULL && lastn != NULL);
+	new_start = start;
+	new_length = length;
+
+	avl_for_each_range_safe(pos, n, l, firstn, lastn) {
+		ext = container_of(pos, struct bitmap_node, btn_node);
+
+		/* Bail if the new extent is contained within an old one. */
+		if (ext->btn_start <= start &&
+		    ext->btn_start + ext->btn_length >= start + length)
+			return res;
+
+		/* Check for overlapping and adjacent extents. */
+		if (ext->btn_start + ext->btn_length >= start ||
+		    ext->btn_start <= start + length) {
+			if (ext->btn_start < start) {
+				new_start = ext->btn_start;
+				new_length += ext->btn_length;
+			}
+
+			if (ext->btn_start + ext->btn_length >
+			    new_start + new_length)
+				new_length = ext->btn_start + ext->btn_length -
+						new_start;
+
+			avl64_delete(bmap->bt_tree, pos);
+			free(ext);
+		}
+	}
+
+	ext = bitmap_node_init(new_start, new_length);
+	if (!ext)
+		return false;
+
+	node = avl64_insert(bmap->bt_tree, &ext->btn_node);
+	if (node == NULL) {
+		free(ext);
+		errno = EEXIST;
+		return false;
+	}
+
+	return res;
+}
+
+/* Set a region of bits. */
+bool
+bitmap_set(
+	struct bitmap		*bmap,
+	uint64_t		start,
+	uint64_t		length)
+{
+	bool			res;
+
+	pthread_mutex_lock(&bmap->bt_lock);
+	res = __bitmap_set(bmap, start, length);
+	pthread_mutex_unlock(&bmap->bt_lock);
+
+	return res;
+}
+
+#if 0	/* Unused, provided for completeness. */
+/* Clear a region of bits. */
+bool
+bitmap_clear(
+	struct bitmap		*bmap,
+	uint64_t		start,
+	uint64_t		len)
+{
+	struct avl64node	*firstn;
+	struct avl64node	*lastn;
+	struct avl64node	*pos;
+	struct avl64node	*n;
+	struct avl64node	*l;
+	struct bitmap_node	*ext;
+	uint64_t		new_start;
+	uint64_t		new_length;
+	struct avl64node	*node;
+	int			stat;
+
+	pthread_mutex_lock(&bmap->bt_lock);
+	/* Find any existing nodes over that range. */
+	avl64_findranges(bmap->bt_tree, start, start + len, &firstn, &lastn);
+
+	/* Nothing, we're done. */
+	if (firstn == NULL && lastn == NULL) {
+		pthread_mutex_unlock(&bmap->bt_lock);
+		return true;
+	}
+
+	assert(firstn != NULL && lastn != NULL);
+
+	/* Delete or truncate everything in sight. */
+	avl_for_each_range_safe(pos, n, l, firstn, lastn) {
+		ext = container_of(pos, struct bitmap_node, btn_node);
+
+		stat = 0;
+		if (ext->btn_start < start)
+			stat |= 1;
+		if (ext->btn_start + ext->btn_length > start + len)
+			stat |= 2;
+		switch (stat) {
+		case 0:
+			/* Extent totally within range; delete. */
+			avl64_delete(bmap->bt_tree, pos);
+			free(ext);
+			break;
+		case 1:
+			/* Extent is left-adjacent; truncate. */
+			ext->btn_length = start - ext->btn_start;
+			break;
+		case 2:
+			/* Extent is right-adjacent; move it. */
+			ext->btn_length = ext->btn_start + ext->btn_length -
+					(start + len);
+			ext->btn_start = start + len;
+			break;
+		case 3:
+			/* Extent overlaps both ends. */
+			new_start = start + len;
+			new_length = ext->btn_start + ext->btn_length -
+					new_start;
+			ext->btn_length = start - ext->btn_start;
+
+			ext = bitmap_node_init(new_start, new_length);
+			if (!ext)
+				return false;
+
+			node = avl64_insert(bmap->bt_tree, &ext->btn_node);
+			if (node == NULL) {
+				errno = EEXIST;
+				return false;
+			}
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&bmap->bt_lock);
+	return true;
+}
+#endif
+
+#ifdef DEBUG
+/* Iterate the set regions of this bitmap. */
+bool
+bitmap_iterate(
+	struct bitmap		*bmap,
+	bool			(*fn)(uint64_t, uint64_t, void *),
+	void			*arg)
+{
+	struct avl64node	*node;
+	struct bitmap_node	*ext;
+	bool			moveon = true;
+
+	pthread_mutex_lock(&bmap->bt_lock);
+	avl_for_each(bmap->bt_tree, node) {
+		ext = container_of(node, struct bitmap_node, btn_node);
+		moveon = fn(ext->btn_start, ext->btn_length, arg);
+		if (!moveon)
+			break;
+	}
+	pthread_mutex_unlock(&bmap->bt_lock);
+
+	return moveon;
+}
+#endif
+
+/* Do any bitmap extents overlap the given one?  (locked) */
+static bool
+__bitmap_test(
+	struct bitmap		*bmap,
+	uint64_t		start,
+	uint64_t		len)
+{
+	struct avl64node	*firstn;
+	struct avl64node	*lastn;
+
+	/* Find any existing nodes over that range. */
+	avl64_findranges(bmap->bt_tree, start, start + len, &firstn, &lastn);
+
+	return firstn != NULL && lastn != NULL;
+}
+
+/* Is any part of this range set? */
+bool
+bitmap_test(
+	struct bitmap		*bmap,
+	uint64_t		start,
+	uint64_t		len)
+{
+	bool			res;
+
+	pthread_mutex_lock(&bmap->bt_lock);
+	res = __bitmap_test(bmap, start, len);
+	pthread_mutex_unlock(&bmap->bt_lock);
+
+	return res;
+}
+
+/* Are none of the bits set? */
+bool
+bitmap_empty(
+	struct bitmap		*bmap)
+{
+	return bmap->bt_tree->avl_firstino == NULL;
+}
+
+#ifdef DEBUG
+static bool
+bitmap_dump_fn(
+	uint64_t		startblock,
+	uint64_t		blockcount,
+	void			*arg)
+{
+	printf("%"PRIu64":%"PRIu64"\n", startblock, blockcount);
+	return true;
+}
+
+/* Dump bitmap. */
+void
+bitmap_dump(
+	struct bitmap		*bmap)
+{
+	printf("BITMAP DUMP %p\n", bmap);
+	bitmap_iterate(bmap, bitmap_dump_fn, NULL);
+	printf("BITMAP DUMP DONE\n");
+}
+#endif
diff --git a/scrub/bitmap.h b/scrub/bitmap.h
new file mode 100644
index 0000000..9910edf
--- /dev/null
+++ b/scrub/bitmap.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_BITMAP_H_
+#define XFS_SCRUB_BITMAP_H_
+
+struct bitmap {
+	pthread_mutex_t		bt_lock;
+	struct avl64tree_desc	*bt_tree;
+};
+
+bool bitmap_init(struct bitmap **bmap);
+void bitmap_free(struct bitmap **bmap);
+bool bitmap_set(struct bitmap *bmap, uint64_t start, uint64_t length);
+bool bitmap_iterate(struct bitmap *bmap,
+		bool (*fn)(uint64_t, uint64_t, void *), void *arg);
+bool bitmap_test(struct bitmap *bmap, uint64_t start,
+		uint64_t len);
+bool bitmap_empty(struct bitmap *bmap);
+void bitmap_dump(struct bitmap *bmap);
+
+#endif /* XFS_SCRUB_BITMAP_H_ */



* [PATCH 20/27] xfs_scrub: create infrastructure to read verify data blocks
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (18 preceding siblings ...)
  2017-11-17 21:02 ` [PATCH 19/27] xfs_scrub: create a bitmap data structure Darrick J. Wong
@ 2017-11-17 21:02 ` Darrick J. Wong
  2017-11-17 21:02 ` [PATCH 21/27] xfs_scrub: scrub file " Darrick J. Wong
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:02 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Manage the scheduling, issuance, and reporting of data block
verification reads.  This enables us to combine adjacent (or nearly
adjacent) read requests, and to take advantage of high-IOPS devices by
issuing IO from multiple threads.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
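Reviewer aid, not part of the commit: a minimal sketch of the
"combine adjacent (or nearly adjacent) requests" step described above.
It assumes a single device, a 64k locality window, and a caller-supplied
issue callback in place of the workqueue.  The names io_req_merge,
io_req_schedule, and io_log_issue are hypothetical.  This sketch always
stashes the incoming request once the previous one has been issued.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_LOCALITY	65536ULL	/* tolerated hole between requests */

struct io_req {
	uint64_t	start;		/* bytes */
	uint64_t	length;		/* bytes; 0 means nothing stashed */
};

typedef void (*io_issue_fn)(const struct io_req *req, void *arg);

/* Fold [start, start + length) into the stash if the two are close. */
static bool
io_req_merge(struct io_req *stash, uint64_t start, uint64_t length)
{
	uint64_t	req_end = start + length;
	uint64_t	stash_end = stash->start + stash->length;
	uint64_t	new_start;
	uint64_t	new_end;
	bool		after;
	bool		before;

	if (stash->length == 0)
		return false;

	/* New request begins inside, or shortly after, the stash. */
	after = start >= stash->start &&
		start <= stash_end + BATCH_LOCALITY;
	/* Stash begins inside, or shortly after, the new request. */
	before = stash->start >= start &&
		 stash->start <= req_end + BATCH_LOCALITY;
	if (!after && !before)
		return false;

	new_start = start < stash->start ? start : stash->start;
	new_end = req_end > stash_end ? req_end : stash_end;
	stash->start = new_start;
	stash->length = new_end - new_start;
	return true;
}

/* Merge if possible; otherwise issue the stash and keep the new request. */
static void
io_req_schedule(struct io_req *stash, uint64_t start, uint64_t length,
		io_issue_fn issue, void *arg)
{
	if (io_req_merge(stash, start, length))
		return;
	if (stash->length > 0)
		issue(stash, arg);
	stash->start = start;
	stash->length = length;
}

/* Example callback: remember what was issued (illustration only). */
struct io_log {
	struct io_req	issued[8];
	size_t		nr;
};

static void
io_log_issue(const struct io_req *req, void *arg)
{
	struct io_log	*log = arg;

	if (log->nr < 8)
		log->issued[log->nr++] = *req;
}
```

Scheduling 0+4096 and then 8192+4096 should leave one stashed request
of 0+12288, because the 4k hole is inside the window.  A later request
at 1M is too far away, so the stash is issued and replaced.  Reading
across small holes costs a little extra IO but saves a request.
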
 scrub/Makefile      |    2 
 scrub/read_verify.c |  268 +++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/read_verify.h |   50 ++++++++++
 scrub/xfs_scrub.h   |    3 +
 4 files changed, 323 insertions(+)
 create mode 100644 scrub/read_verify.c
 create mode 100644 scrub/read_verify.h


diff --git a/scrub/Makefile b/scrub/Makefile
index 4118ab6..d1d1eb1 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -23,6 +23,7 @@ disk.h \
 filemap.h \
 fscounters.h \
 inodes.h \
+read_verify.h \
 scrub.h \
 spacemap.h \
 unicrash.h \
@@ -40,6 +41,7 @@ phase1.c \
 phase2.c \
 phase3.c \
 phase5.c \
+read_verify.c \
 scrub.c \
 spacemap.c \
 xfs_scrub.c
diff --git a/scrub/read_verify.c b/scrub/read_verify.c
new file mode 100644
index 0000000..b3e79a4
--- /dev/null
+++ b/scrub/read_verify.c
@@ -0,0 +1,268 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <sys/statvfs.h>
+#include "workqueue.h"
+#include "path.h"
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "counter.h"
+#include "disk.h"
+#include "read_verify.h"
+
+/*
+ * Read Verify Pool
+ *
+ * Manages the data block read verification phase.  The caller schedules
+ * verification requests, which are then scheduled to be run by a thread
+ * pool worker.  Adjacent (or nearly adjacent) requests can be combined
+ * to reduce overhead when free space fragmentation is high.  The thread
+ * pool takes care of issuing multiple IOs to the device, if possible.
+ */
+
+/*
+ * Perform all IO in 32M chunks.  This cannot exceed 65536 sectors
+ * because that's the biggest SCSI VERIFY(16) we dare to send.
+ */
+#define RVP_IO_MAX_SIZE		(33554432)
+#define RVP_IO_MAX_SECTORS	(RVP_IO_MAX_SIZE >> BBSHIFT)
+
+/* Tolerate 64k holes in adjacent read verify requests. */
+#define RVP_IO_BATCH_LOCALITY	(65536)
+
+struct read_verify_pool {
+	struct workqueue	wq;		/* thread pool */
+	struct scrub_ctx	*ctx;		/* scrub context */
+	void			*readbuf;	/* read buffer */
+	struct ptcounter	*verified_bytes;
+	read_verify_ioerr_fn_t	ioerr_fn;	/* io error callback */
+	size_t			miniosz;	/* minimum io size, bytes */
+};
+
+/* Create a thread pool to run read verifiers. */
+struct read_verify_pool *
+read_verify_pool_init(
+	struct scrub_ctx		*ctx,
+	size_t				miniosz,
+	read_verify_ioerr_fn_t		ioerr_fn,
+	unsigned int			nproc)
+{
+	struct read_verify_pool		*rvp;
+	bool				ret;
+	int				error;
+
+	rvp = calloc(1, sizeof(struct read_verify_pool));
+	if (!rvp)
+		return NULL;
+
+	error = posix_memalign((void **)&rvp->readbuf, page_size,
+			RVP_IO_MAX_SIZE);
+	if (error || !rvp->readbuf)
+		goto out_free;
+	rvp->verified_bytes = ptcounter_init(nproc);
+	if (!rvp->verified_bytes)
+		goto out_buf;
+	rvp->miniosz = miniosz;
+	rvp->ctx = ctx;
+	rvp->ioerr_fn = ioerr_fn;
+	/* Run in the main thread if we only want one thread. */
+	if (nproc == 1)
+		nproc = 0;
+	ret = workqueue_create(&rvp->wq, (struct xfs_mount *)rvp, nproc);
+	if (ret)
+		goto out_counter;
+	return rvp;
+
+out_counter:
+	ptcounter_free(rvp->verified_bytes);
+out_buf:
+	free(rvp->readbuf);
+out_free:
+	free(rvp);
+	return NULL;
+}
+
+/* Finish up any read verification work. */
+void
+read_verify_pool_flush(
+	struct read_verify_pool		*rvp)
+{
+	workqueue_destroy(&rvp->wq);
+}
+
+/* Finish up any read verification work and tear it down. */
+void
+read_verify_pool_destroy(
+	struct read_verify_pool		*rvp)
+{
+	ptcounter_free(rvp->verified_bytes);
+	free(rvp->readbuf);
+	free(rvp);
+}
+
+/*
+ * Issue a read-verify IO in big batches.
+ */
+static void
+read_verify(
+	struct workqueue		*wq,
+	xfs_agnumber_t			agno,
+	void				*arg)
+{
+	struct read_verify		*rv = arg;
+	struct read_verify_pool		*rvp;
+	unsigned long long		verified = 0;
+	ssize_t				sz;
+	ssize_t				len;
+
+	rvp = (struct read_verify_pool *)wq->wq_ctx;
+	while (rv->io_length > 0) {
+		len = min(rv->io_length, RVP_IO_MAX_SIZE);
+		dbg_printf("diskverify %d %"PRIu64" %zu\n", rv->io_disk->d_fd,
+				rv->io_start, len);
+		sz = disk_read_verify(rv->io_disk, rvp->readbuf,
+				rv->io_start, len);
+		if (sz < 0) {
+			dbg_printf("IOERR %d %"PRIu64" %zu\n",
+					rv->io_disk->d_fd,
+					rv->io_start, len);
+			/* IO error, so try the next logical block. */
+			len = rvp->miniosz;
+			rvp->ioerr_fn(rvp->ctx, rv->io_disk, rv->io_start, len,
+					errno, rv->io_end_arg);
+		}
+
+		verified += len;
+		rv->io_start += len;
+		rv->io_length -= len;
+	}
+
+	free(rv);
+	ptcounter_add(rvp->verified_bytes, verified);
+}
+
+/* Queue a read verify request. */
+static bool
+read_verify_queue(
+	struct read_verify_pool		*rvp,
+	struct read_verify		*rv)
+{
+	struct read_verify		*tmp;
+	bool				ret;
+
+	dbg_printf("verify fd %d start %"PRIu64" len %"PRIu64"\n",
+			rv->io_disk->d_fd, rv->io_start, rv->io_length);
+
+	tmp = malloc(sizeof(struct read_verify));
+	if (!tmp) {
+		rvp->ioerr_fn(rvp->ctx, rv->io_disk, rv->io_start,
+				rv->io_length, errno, rv->io_end_arg);
+		return true;
+	}
+	memcpy(tmp, rv, sizeof(*tmp));
+
+	ret = workqueue_add(&rvp->wq, read_verify, 0, tmp);
+	if (ret) {
+		str_error(rvp->ctx, rvp->ctx->mntpoint,
+_("Could not queue read-verify work."));
+		free(tmp);
+		return false;
+	}
+	rv->io_length = 0;
+	return true;
+}
+
+/*
+ * Issue an IO request.  We'll batch subsequent requests if they're
+ * within 64k of each other.
+ */
+bool
+read_verify_schedule_io(
+	struct read_verify_pool		*rvp,
+	struct read_verify		*rv,
+	struct disk			*disk,
+	uint64_t			start,
+	uint64_t			length,
+	void				*end_arg)
+{
+	uint64_t			req_end;
+	uint64_t			rv_end;
+
+	assert(rvp->readbuf);
+	req_end = start + length;
+	rv_end = rv->io_start + rv->io_length;
+
+	/*
+	 * If we have a stashed IO, we haven't changed fds, the error
+	 * reporting is the same, and the two extents are close,
+	 * we can combine them.
+	 */
+	if (rv->io_length > 0 && disk == rv->io_disk &&
+	    end_arg == rv->io_end_arg &&
+	    ((start >= rv->io_start && start <= rv_end + RVP_IO_BATCH_LOCALITY) ||
+	     (rv->io_start >= start &&
+	      rv->io_start <= req_end + RVP_IO_BATCH_LOCALITY))) {
+		rv->io_start = min(rv->io_start, start);
+		rv->io_length = max(req_end, rv_end) - rv->io_start;
+	} else  {
+		/* Otherwise, issue the stashed IO (if there is one) */
+		if (rv->io_length > 0)
+			return read_verify_queue(rvp, rv);
+
+		/* Stash the new IO. */
+		rv->io_disk = disk;
+		rv->io_start = start;
+		rv->io_length = length;
+		rv->io_end_arg = end_arg;
+	}
+
+	return true;
+}
+
+/* Force any stashed IOs into the verifier. */
+bool
+read_verify_force_io(
+	struct read_verify_pool		*rvp,
+	struct read_verify		*rv)
+{
+	bool				moveon;
+
+	assert(rvp->readbuf);
+	if (rv->io_length == 0)
+		return true;
+
+	moveon = read_verify_queue(rvp, rv);
+	if (moveon)
+		rv->io_length = 0;
+	return moveon;
+}
+
+/* How many bytes has this process verified? */
+uint64_t
+read_verify_bytes(
+	struct read_verify_pool		*rvp)
+{
+	return ptcounter_value(rvp->verified_bytes);
+}
diff --git a/scrub/read_verify.h b/scrub/read_verify.h
new file mode 100644
index 0000000..6b4e11b
--- /dev/null
+++ b/scrub/read_verify.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_READ_VERIFY_H_
+#define XFS_SCRUB_READ_VERIFY_H_
+
+struct scrub_ctx;
+struct read_verify_pool;
+
+/* Function called when an IO error happens. */
+typedef void (*read_verify_ioerr_fn_t)(struct scrub_ctx *ctx,
+		struct disk *disk, uint64_t start, uint64_t length,
+		int error, void *arg);
+
+struct read_verify_pool *read_verify_pool_init(struct scrub_ctx *ctx,
+		size_t miniosz, read_verify_ioerr_fn_t ioerr_fn,
+		unsigned int nproc);
+void read_verify_pool_flush(struct read_verify_pool *rvp);
+void read_verify_pool_destroy(struct read_verify_pool *rvp);
+
+struct read_verify {
+	void			*io_end_arg;
+	struct disk		*io_disk;
+	uint64_t		io_start;	/* bytes */
+	uint64_t		io_length;	/* bytes */
+};
+
+bool read_verify_schedule_io(struct read_verify_pool *rvp,
+		struct read_verify *rv, struct disk *disk, uint64_t start,
+		uint64_t length, void *end_arg);
+bool read_verify_force_io(struct read_verify_pool *rvp, struct read_verify *rv);
+uint64_t read_verify_bytes(struct read_verify_pool *rvp);
+
+#endif /* XFS_SCRUB_READ_VERIFY_H_ */
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index c7e8e8e..2396c6e 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -83,6 +83,9 @@ struct scrub_ctx {
 	void			*fshandle;
 	size_t			fshandle_len;
 
+	/* Data block read verification buffer */
+	void			*readbuf;
+
 	/* Mutable scrub state; use lock. */
 	pthread_mutex_t		lock;
 	unsigned long long	max_errors;



* [PATCH 21/27] xfs_scrub: scrub file data blocks
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (19 preceding siblings ...)
  2017-11-17 21:02 ` [PATCH 20/27] xfs_scrub: create infrastructure to read verify data blocks Darrick J. Wong
@ 2017-11-17 21:02 ` Darrick J. Wong
  2017-11-17 21:02 ` [PATCH 22/27] xfs_scrub: optionally use SCSI READ VERIFY commands to scrub data blocks on disk Darrick J. Wong
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:02 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Read all data blocks from the disk, hoping to catch IO errors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure.ac          |    2 
 include/builddefs.in  |    2 
 m4/package_libcdev.m4 |   28 +++
 scrub/Makefile        |    7 -
 scrub/phase6.c        |  516 +++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/vfs.c           |  221 +++++++++++++++++++++
 scrub/vfs.h           |   31 +++
 scrub/xfs_scrub.c     |    4 
 scrub/xfs_scrub.h     |    2 
 9 files changed, 811 insertions(+), 2 deletions(-)
 create mode 100644 scrub/phase6.c
 create mode 100644 scrub/vfs.c
 create mode 100644 scrub/vfs.h


diff --git a/configure.ac b/configure.ac
index b96d7a7..2a86767 100644
--- a/configure.ac
+++ b/configure.ac
@@ -167,6 +167,8 @@ AC_PACKAGE_WANT_ATTRIBUTES_H
 AC_HAVE_LIBATTR
 AC_PACKAGE_WANT_UNINORM_H
 AC_HAVE_U8NORMALIZE
+AC_HAVE_OPENAT
+AC_HAVE_FSTATAT
 
 if test "$enable_blkid" = yes; then
 AC_HAVE_BLKID_TOPO
diff --git a/include/builddefs.in b/include/builddefs.in
index e63e232..a7034d8 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -119,6 +119,8 @@ HAVE_GETFSMAP = @have_getfsmap@
 HAVE_MALLINFO = @have_mallinfo@
 HAVE_LIBATTR = @have_libattr@
 HAVE_U8NORMALIZE = @have_u8normalize@
+HAVE_OPENAT = @have_openat@
+HAVE_FSTATAT = @have_fstatat@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4
index 91e1959..d111fd1 100644
--- a/m4/package_libcdev.m4
+++ b/m4/package_libcdev.m4
@@ -332,3 +332,31 @@ AC_DEFUN([AC_HAVE_MALLINFO],
        AC_MSG_RESULT(no))
     AC_SUBST(have_mallinfo)
   ])
+
+#
+# Check if we have an openat call
+#
+AC_DEFUN([AC_HAVE_OPENAT],
+  [ AC_CHECK_DECL([openat],
+       have_openat=yes,
+       [],
+       [#include <sys/types.h>
+        #include <sys/stat.h>
+        #include <fcntl.h>]
+       )
+    AC_SUBST(have_openat)
+  ])
+
+#
+# Check if we have an fstatat call
+#
+AC_DEFUN([AC_HAVE_FSTATAT],
+  [ AC_CHECK_DECL([fstatat],
+       have_fstatat=yes,
+       [],
+       [#define _GNU_SOURCE
+       #include <sys/types.h>
+       #include <sys/stat.h>
+       #include <unistd.h>])
+    AC_SUBST(have_fstatat)
+  ])
diff --git a/scrub/Makefile b/scrub/Makefile
index d1d1eb1..ce3aa9d 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -8,9 +8,9 @@ include $(TOPDIR)/include/builddefs
 # On linux we get fsmap from the system or define it ourselves
 # so include this based on platform type.  If this reverts to only
 # the autoconf check w/o local definition, change to testing HAVE_GETFSMAP
-SCRUB_PREREQS=$(PKG_PLATFORM)
+SCRUB_PREREQS=$(PKG_PLATFORM)$(HAVE_OPENAT)$(HAVE_FSTATAT)
 
-ifeq ($(SCRUB_PREREQS),linux)
+ifeq ($(SCRUB_PREREQS),linuxyesyes)
 LTCOMMAND = xfs_scrub
 INSTALL_SCRUB = install-scrub
 endif	# scrub_prereqs
@@ -27,6 +27,7 @@ read_verify.h \
 scrub.h \
 spacemap.h \
 unicrash.h \
+vfs.h \
 xfs_scrub.h
 
 CFILES = \
@@ -41,9 +42,11 @@ phase1.c \
 phase2.c \
 phase3.c \
 phase5.c \
+phase6.c \
 read_verify.c \
 scrub.c \
 spacemap.c \
+vfs.c \
 xfs_scrub.c
 
 LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD) $(LIBUNISTRING)
diff --git a/scrub/phase6.c b/scrub/phase6.c
new file mode 100644
index 0000000..e6d9c69
--- /dev/null
+++ b/scrub/phase6.c
@@ -0,0 +1,516 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <dirent.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "handle.h"
+#include "path.h"
+#include "ptvar.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "bitmap.h"
+#include "disk.h"
+#include "filemap.h"
+#include "inodes.h"
+#include "read_verify.h"
+#include "spacemap.h"
+#include "vfs.h"
+
+/*
+ * Phase 6: Verify data file integrity.
+ *
+ * Identify potential data block extents with GETFSMAP, then feed those
+ * extents to the read-verify pool to get the verify commands batched,
+ * issued, and (if there are problems) reported back to us.  If there
+ * are errors, we'll record the bad regions and (if available) use rmap
+ * to tell us if metadata are now corrupt.  Otherwise, we'll scan the
+ * whole directory tree looking for files that overlap the bad regions
+ * and report the paths of the now corrupt files.
+ */
+
+/* Find the disk for a given device identifier. */
+static struct disk *
+xfs_dev_to_disk(
+	struct scrub_ctx	*ctx,
+	dev_t			dev)
+{
+	if (dev == ctx->fsinfo.fs_datadev)
+		return ctx->datadev;
+	else if (dev == ctx->fsinfo.fs_logdev)
+		return ctx->logdev;
+	else if (dev == ctx->fsinfo.fs_rtdev)
+		return ctx->rtdev;
+	abort();
+}
+
+/* Find the device major/minor for a given disk. */
+static dev_t
+xfs_disk_to_dev(
+	struct scrub_ctx	*ctx,
+	struct disk		*disk)
+{
+	if (disk == ctx->datadev)
+		return ctx->fsinfo.fs_datadev;
+	else if (disk == ctx->logdev)
+		return ctx->fsinfo.fs_logdev;
+	else if (disk == ctx->rtdev)
+		return ctx->fsinfo.fs_rtdev;
+	abort();
+}
+
+struct owner_decode {
+	uint64_t		owner;
+	const char		*descr;
+};
+
+static const struct owner_decode special_owners[] = {
+	{XFS_FMR_OWN_FREE,	"free space"},
+	{XFS_FMR_OWN_UNKNOWN,	"unknown owner"},
+	{XFS_FMR_OWN_FS,	"static FS metadata"},
+	{XFS_FMR_OWN_LOG,	"journalling log"},
+	{XFS_FMR_OWN_AG,	"per-AG metadata"},
+	{XFS_FMR_OWN_INOBT,	"inode btree blocks"},
+	{XFS_FMR_OWN_INODES,	"inodes"},
+	{XFS_FMR_OWN_REFC,	"refcount btree"},
+	{XFS_FMR_OWN_COW,	"CoW staging"},
+	{XFS_FMR_OWN_DEFECTIVE,	"bad blocks"},
+	{0, NULL},
+};
+
+/* Decode a special owner. */
+static const char *
+xfs_decode_special_owner(
+	uint64_t			owner)
+{
+	const struct owner_decode	*od = special_owners;
+
+	while (od->descr) {
+		if (od->owner == owner)
+			return od->descr;
+		od++;
+	}
+
+	return NULL;
+}
+
+/* Routines to translate bad physical extents into file paths and offsets. */
+
+struct xfs_verify_error_info {
+	struct bitmap			*d_bad;		/* bytes */
+	struct bitmap			*r_bad;		/* bytes */
+};
+
+/* Report if this extent overlaps a bad region. */
+static bool
+xfs_report_verify_inode_bmap(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	int				fd,
+	int				whichfork,
+	struct fsxattr			*fsx,
+	struct xfs_bmap			*bmap,
+	void				*arg)
+{
+	struct xfs_verify_error_info	*vei = arg;
+	struct bitmap			*bmp;
+
+	/* Only report errors for real extents. */
+	if (bmap->bm_flags & (BMV_OF_PREALLOC | BMV_OF_DELALLOC))
+		return true;
+
+	if (fsx->fsx_xflags & FS_XFLAG_REALTIME)
+		bmp = vei->r_bad;
+	else
+		bmp = vei->d_bad;
+
+	if (!bitmap_test(bmp, bmap->bm_physical, bmap->bm_length))
+		return true;
+
+	str_error(ctx, descr,
+_("offset %llu failed read verification."), bmap->bm_offset);
+	return true;
+}
+
+/* Iterate the extent mappings of a file to report errors. */
+static bool
+xfs_report_verify_fd(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	int				fd,
+	void				*arg)
+{
+	struct xfs_bmap			key = {0};
+	bool				moveon;
+
+	/* data fork */
+	moveon = xfs_iterate_filemaps(ctx, descr, fd, XFS_DATA_FORK, &key,
+			xfs_report_verify_inode_bmap, arg);
+	if (!moveon)
+		return false;
+
+	/* attr fork */
+	moveon = xfs_iterate_filemaps(ctx, descr, fd, XFS_ATTR_FORK, &key,
+			xfs_report_verify_inode_bmap, arg);
+	if (!moveon)
+		return false;
+	return true;
+}
+
+/* Report read verify errors in unlinked (but still open) files. */
+static int
+xfs_report_verify_inode(
+	struct scrub_ctx		*ctx,
+	struct xfs_handle		*handle,
+	struct xfs_bstat		*bstat,
+	void				*arg)
+{
+	char				descr[DESCR_BUFSZ];
+	char				buf[DESCR_BUFSZ];
+	bool				moveon;
+	int				fd;
+	int				error;
+
+	snprintf(descr, DESCR_BUFSZ, _("inode %"PRIu64" (unlinked)"),
+			(uint64_t)bstat->bs_ino);
+
+	/* Ignore linked files and things we can't open. */
+	if (bstat->bs_nlink != 0)
+		return 0;
+	if (!S_ISREG(bstat->bs_mode) && !S_ISDIR(bstat->bs_mode))
+		return 0;
+
+	/* Try to open the inode. */
+	fd = xfs_open_handle(handle);
+	if (fd < 0) {
+		error = errno;
+		if (error == ESTALE)
+			return error;
+
+		str_warn(ctx, descr, "%s", strerror_r(error, buf, DESCR_BUFSZ));
+		return error;
+	}
+
+	/* Go find the badness. */
+	moveon = xfs_report_verify_fd(ctx, descr, fd, arg);
+	close(fd);
+
+	return moveon ? 0 : XFS_ITERATE_INODES_ABORT;
+}
+
+/* Scan a directory for matches in the read verify error list. */
+static bool
+xfs_report_verify_dir(
+	struct scrub_ctx	*ctx,
+	const char		*path,
+	int			dir_fd,
+	void			*arg)
+{
+	return xfs_report_verify_fd(ctx, path, dir_fd, arg);
+}
+
+/*
+ * Scan the inode associated with a directory entry for matches with
+ * the read verify error list.
+ */
+static bool
+xfs_report_verify_dirent(
+	struct scrub_ctx	*ctx,
+	const char		*path,
+	int			dir_fd,
+	struct dirent		*dirent,
+	struct stat		*sb,
+	void			*arg)
+{
+	bool			moveon;
+	int			fd;
+
+	/* Ignore things we can't open. */
+	if (!S_ISREG(sb->st_mode) && !S_ISDIR(sb->st_mode))
+		return true;
+
+	/* Ignore . and .. */
+	if (!strcmp(".", dirent->d_name) || !strcmp("..", dirent->d_name))
+		return true;
+
+	/*
+	 * Open the file named by this dirent relative to dir_fd so that
+	 * we can walk its extent maps looking for bad blocks.  Files
+	 * that we cannot open are silently skipped.
+	 */
+	fd = openat(dir_fd, dirent->d_name,
+			O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
+	if (fd < 0)
+		return true;
+
+	/* Go find the badness. */
+	moveon = xfs_report_verify_fd(ctx, path, fd, arg);
+	if (moveon)
+		goto out;
+
+out:
+	close(fd);
+
+	return moveon;
+}
+
+/* Given bad extent lists for the data & rtdev, find bad files. */
+static bool
+xfs_report_verify_errors(
+	struct scrub_ctx		*ctx,
+	struct bitmap			*d_bad,
+	struct bitmap			*r_bad)
+{
+	struct xfs_verify_error_info	vei;
+	bool				moveon;
+
+	vei.d_bad = d_bad;
+	vei.r_bad = r_bad;
+
+	/* Scan the directory tree to get file paths. */
+	moveon = scan_fs_tree(ctx, xfs_report_verify_dir,
+			xfs_report_verify_dirent, &vei);
+	if (!moveon)
+		return false;
+
+	/* Scan for unlinked files. */
+	return xfs_scan_all_inodes(ctx, xfs_report_verify_inode, &vei);
+}
+
+/* Verify disk blocks with GETFSMAP */
+
+struct xfs_verify_extent {
+	struct read_verify_pool	*readverify;
+	struct ptvar		*rvstate;
+	struct bitmap		*d_bad;		/* bytes */
+	struct bitmap		*r_bad;		/* bytes */
+};
+
+/* Report an IO error resulting from read-verify based off getfsmap. */
+static bool
+xfs_check_rmap_error_report(
+	struct scrub_ctx	*ctx,
+	const char		*descr,
+	struct fsmap		*map,
+	void			*arg)
+{
+	const char		*type;
+	char			buf[32];
+	uint64_t		err_physical = *(uint64_t *)arg;
+	uint64_t		err_off;
+
+	if (err_physical > map->fmr_physical)
+		err_off = err_physical - map->fmr_physical;
+	else
+		err_off = 0;
+
+	snprintf(buf, 32, _("disk offset %"PRIu64),
+			(uint64_t)BTOBB(map->fmr_physical + err_off));
+
+	if (map->fmr_flags & FMR_OF_SPECIAL_OWNER) {
+		type = xfs_decode_special_owner(map->fmr_owner);
+		str_error(ctx, buf,
+_("%s failed read verification."),
+				type);
+	}
+
+	/*
+	 * XXX: If we had a getparent() call we could report IO errors
+	 * efficiently.  Until then, we'll have to scan the dir tree
+	 * to find the bad file's pathname.
+	 */
+
+	return true;
+}
+
+/*
+ * Remember a read error for later, and see if rmap will tell us about the
+ * owner ahead of time.
+ */
+void
+xfs_check_rmap_ioerr(
+	struct scrub_ctx		*ctx,
+	struct disk			*disk,
+	uint64_t			start,
+	uint64_t			length,
+	int				error,
+	void				*arg)
+{
+	struct fsmap			keys[2];
+	char				descr[DESCR_BUFSZ];
+	struct xfs_verify_extent	*ve = arg;
+	struct bitmap			*tree;
+	dev_t				dev;
+	bool				moveon;
+
+	dev = xfs_disk_to_dev(ctx, disk);
+
+	/*
+	 * If we don't have parent pointers, save the bad extent for
+	 * later rescanning.
+	 */
+	if (dev == ctx->fsinfo.fs_datadev)
+		tree = ve->d_bad;
+	else if (dev == ctx->fsinfo.fs_rtdev)
+		tree = ve->r_bad;
+	else
+		tree = NULL;
+	if (tree) {
+		moveon = bitmap_set(tree, start, length);
+		if (!moveon)
+			str_errno(ctx, ctx->mntpoint);
+	}
+
+	snprintf(descr, DESCR_BUFSZ, _("dev %d:%d ioerr @ %"PRIu64":%"PRIu64" "),
+			major(dev), minor(dev), start, length);
+
+	/* Go figure out which blocks are bad from the fsmap. */
+	memset(keys, 0, sizeof(struct fsmap) * 2);
+	keys->fmr_device = dev;
+	keys->fmr_physical = start;
+	(keys + 1)->fmr_device = dev;
+	(keys + 1)->fmr_physical = start + length - 1;
+	(keys + 1)->fmr_owner = ULLONG_MAX;
+	(keys + 1)->fmr_offset = ULLONG_MAX;
+	(keys + 1)->fmr_flags = UINT_MAX;
+	xfs_iterate_fsmap(ctx, descr, keys, xfs_check_rmap_error_report,
+			&start);
+}
+
+/* Schedule a read-verify of a (data block) extent. */
+static bool
+xfs_check_rmap(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	struct fsmap			*map,
+	void				*arg)
+{
+	struct xfs_verify_extent	*ve = arg;
+	struct disk			*disk;
+
+	dbg_printf("rmap dev %d:%d phys %"PRIu64" owner %"PRId64
+			" offset %"PRIu64" len %"PRIu64" flags 0x%x\n",
+			major(map->fmr_device), minor(map->fmr_device),
+			(uint64_t)map->fmr_physical, (int64_t)map->fmr_owner,
+			(uint64_t)map->fmr_offset, (uint64_t)map->fmr_length,
+			map->fmr_flags);
+
+	/* "Unknown" extents should be verified; they could be data. */
+	if ((map->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
+			map->fmr_owner == XFS_FMR_OWN_UNKNOWN)
+		map->fmr_flags &= ~FMR_OF_SPECIAL_OWNER;
+
+	/*
+	 * We only care about read-verifying data extents that have been
+	 * written to disk.  This means we can skip "special" owners
+	 * (metadata), xattr blocks, unwritten extents, and extent maps.
+	 * These should all get checked elsewhere in the scrubber.
+	 */
+	if (map->fmr_flags & (FMR_OF_PREALLOC | FMR_OF_ATTR_FORK |
+			      FMR_OF_EXTENT_MAP | FMR_OF_SPECIAL_OWNER))
+		goto out;
+
+	/* XXX: Filter out directory data blocks. */
+
+	/* Schedule the read verify command for (eventual) running. */
+	disk = xfs_dev_to_disk(ctx, map->fmr_device);
+
+	read_verify_schedule_io(ve->readverify, ptvar_get(ve->rvstate), disk,
+			map->fmr_physical, map->fmr_length, ve);
+
+out:
+	/* Is this the last extent?  Fire off the read. */
+	if (map->fmr_flags & FMR_OF_LAST)
+		read_verify_force_io(ve->readverify, ptvar_get(ve->rvstate));
+
+	return true;
+}
+
+/*
+ * Read verify all the file data blocks in a filesystem.  Since XFS doesn't
+ * do data checksums, we trust that the underlying storage will pass back
+ * an IO error if it can't retrieve whatever we previously stored there.
+ * If we hit an IO error, we'll record the bad blocks in a bitmap and then
+ * scan the extent maps of the entire fs tree (and the unlinked inodes) to
+ * figure out which files are now broken.
+ */
+bool
+xfs_scan_blocks(
+	struct scrub_ctx		*ctx)
+{
+	struct xfs_verify_extent	ve;
+	bool				moveon;
+
+	ve.rvstate = ptvar_init(scrub_nproc(ctx), sizeof(struct read_verify));
+	if (!ve.rvstate) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	moveon = bitmap_init(&ve.d_bad);
+	if (!moveon) {
+		str_errno(ctx, ctx->mntpoint);
+		goto out_ve;
+	}
+
+	moveon = bitmap_init(&ve.r_bad);
+	if (!moveon) {
+		str_errno(ctx, ctx->mntpoint);
+		goto out_dbad;
+	}
+
+	ve.readverify = read_verify_pool_init(ctx, ctx->geo.blocksize,
+			xfs_check_rmap_ioerr, disk_heads(ctx->datadev));
+	if (!ve.readverify) {
+		moveon = false;
+		str_error(ctx, ctx->mntpoint,
+_("Could not create media verifier."));
+		goto out_rbad;
+	}
+	moveon = xfs_scan_all_spacemaps(ctx, xfs_check_rmap, &ve);
+	if (!moveon)
+		goto out_pool;
+	read_verify_pool_flush(ve.readverify);
+	ctx->bytes_checked += read_verify_bytes(ve.readverify);
+	read_verify_pool_destroy(ve.readverify);
+
+	/* Scan the whole dir tree to see what matches the bad extents. */
+	if (!bitmap_empty(ve.d_bad) || !bitmap_empty(ve.r_bad))
+		moveon = xfs_report_verify_errors(ctx, ve.d_bad, ve.r_bad);
+
+	bitmap_free(&ve.r_bad);
+	bitmap_free(&ve.d_bad);
+	ptvar_free(ve.rvstate);
+	return moveon;
+
+out_pool:
+	read_verify_pool_destroy(ve.readverify);
+out_rbad:
+	bitmap_free(&ve.r_bad);
+out_dbad:
+	bitmap_free(&ve.d_bad);
+out_ve:
+	ptvar_free(ve.rvstate);
+	return moveon;
+}
diff --git a/scrub/vfs.c b/scrub/vfs.c
new file mode 100644
index 0000000..6afeca0
--- /dev/null
+++ b/scrub/vfs.c
@@ -0,0 +1,221 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "handle.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "vfs.h"
+
+/*
+ * Helper functions to assist in traversing a directory tree using regular
+ * VFS calls.
+ */
+
+/* Scan a filesystem tree. */
+struct scan_fs_tree {
+	unsigned int		nr_dirs;
+	pthread_mutex_t		lock;
+	pthread_cond_t		wakeup;
+	struct stat		root_sb;
+	bool			moveon;
+	scan_fs_tree_dir_fn	dir_fn;
+	scan_fs_tree_dirent_fn	dirent_fn;
+	void			*arg;
+};
+
+/* Per-work-item scan context. */
+struct scan_fs_tree_dir {
+	char			*path;
+	struct scan_fs_tree	*sft;
+	bool			rootdir;
+};
+
+/* Scan a directory sub tree. */
+static void
+scan_fs_dir(
+	struct workqueue	*wq,
+	xfs_agnumber_t		agno,
+	void			*arg)
+{
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	struct scan_fs_tree_dir	*sftd = arg;
+	struct scan_fs_tree	*sft = sftd->sft;
+	DIR			*dir;
+	struct dirent		*dirent;
+	char			newpath[PATH_MAX];
+	struct scan_fs_tree_dir	*new_sftd;
+	struct stat		sb;
+	int			dir_fd;
+	int			error;
+
+	/* Open the directory. */
+	dir_fd = open(sftd->path, O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
+	if (dir_fd < 0) {
+		if (errno != ENOENT)
+			str_errno(ctx, sftd->path);
+		goto out;
+	}
+
+	/* Caller-specific directory checks. */
+	if (!sft->dir_fn(ctx, sftd->path, dir_fd, sft->arg)) {
+		sft->moveon = false;
+		goto out;
+	}
+
+	/* Iterate the directory entries. */
+	dir = fdopendir(dir_fd);
+	if (!dir) {
+		str_errno(ctx, sftd->path);
+		goto out;
+	}
+	rewinddir(dir);
+	for (dirent = readdir(dir); dirent != NULL; dirent = readdir(dir)) {
+		snprintf(newpath, PATH_MAX, "%s/%s", sftd->path,
+				dirent->d_name);
+
+		/* Get the stat info for this directory entry. */
+		error = fstatat(dir_fd, dirent->d_name, &sb,
+				AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW);
+		if (error) {
+			str_errno(ctx, newpath);
+			continue;
+		}
+
+		/* Ignore files on other filesystems. */
+		if (sb.st_dev != sft->root_sb.st_dev)
+			continue;
+
+		/* Caller-specific directory entry function. */
+		if (!sft->dirent_fn(ctx, newpath, dir_fd, dirent, &sb,
+				sft->arg)) {
+			sft->moveon = false;
+			break;
+		}
+
+		if (xfs_scrub_excessive_errors(ctx)) {
+			sft->moveon = false;
+			break;
+		}
+
+		/* If directory, call ourselves recursively. */
+		if (S_ISDIR(sb.st_mode) && strcmp(".", dirent->d_name) &&
+		    strcmp("..", dirent->d_name)) {
+			new_sftd = malloc(sizeof(struct scan_fs_tree_dir));
+			if (!new_sftd) {
+				str_errno(ctx, newpath);
+				sft->moveon = false;
+				break;
+			}
+			new_sftd->path = strdup(newpath);
+			new_sftd->sft = sft;
+			new_sftd->rootdir = false;
+			pthread_mutex_lock(&sft->lock);
+			sft->nr_dirs++;
+			pthread_mutex_unlock(&sft->lock);
+			error = workqueue_add(wq, scan_fs_dir, 0, new_sftd);
+			if (error) {
+				str_error(ctx, ctx->mntpoint,
+_("Could not queue subdirectory scan work."));
+				sft->moveon = false;
+				break;
+			}
+		}
+	}
+
+	/* Close dir, go away. */
+	error = closedir(dir);
+	if (error)
+		str_errno(ctx, sftd->path);
+
+out:
+	pthread_mutex_lock(&sft->lock);
+	sft->nr_dirs--;
+	if (sft->nr_dirs == 0)
+		pthread_cond_signal(&sft->wakeup);
+	pthread_mutex_unlock(&sft->lock);
+
+	free(sftd->path);
+	free(sftd);
+}
+
+/* Scan the entire filesystem. */
+bool
+scan_fs_tree(
+	struct scrub_ctx	*ctx,
+	scan_fs_tree_dir_fn	dir_fn,
+	scan_fs_tree_dirent_fn	dirent_fn,
+	void			*arg)
+{
+	struct workqueue	wq;
+	struct scan_fs_tree	sft;
+	struct scan_fs_tree_dir	*sftd;
+	int			ret;
+
+	sft.moveon = true;
+	sft.nr_dirs = 1;
+	sft.root_sb = ctx->mnt_sb;
+	sft.dir_fn = dir_fn;
+	sft.dirent_fn = dirent_fn;
+	sft.arg = arg;
+	pthread_mutex_init(&sft.lock, NULL);
+	pthread_cond_init(&sft.wakeup, NULL);
+
+	sftd = malloc(sizeof(struct scan_fs_tree_dir));
+	if (!sftd) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+	sftd->path = strdup(ctx->mntpoint);
+	sftd->sft = &sft;
+	sftd->rootdir = true;
+
+	ret = workqueue_create(&wq, (struct xfs_mount *)ctx,
+			scrub_nproc_workqueue(ctx));
+	if (ret) {
+		str_error(ctx, ctx->mntpoint, _("Could not create workqueue."));
+		goto out_free;
+	}
+	ret = workqueue_add(&wq, scan_fs_dir, 0, sftd);
+	if (ret) {
+		str_error(ctx, ctx->mntpoint,
+_("Could not queue directory scan work."));
+		goto out_free;
+	}
+
+	pthread_mutex_lock(&sft.lock);
+	while (sft.nr_dirs > 0)
+		pthread_cond_wait(&sft.wakeup, &sft.lock);
+	pthread_mutex_unlock(&sft.lock);
+	workqueue_destroy(&wq);
+
+	return sft.moveon;
+out_free:
+	free(sftd->path);
+	free(sftd);
+	return false;
+}
diff --git a/scrub/vfs.h b/scrub/vfs.h
new file mode 100644
index 0000000..d565039
--- /dev/null
+++ b/scrub/vfs.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_VFS_H_
+#define XFS_SCRUB_VFS_H_
+
+typedef bool (*scan_fs_tree_dir_fn)(struct scrub_ctx *, const char *,
+		int, void *);
+typedef bool (*scan_fs_tree_dirent_fn)(struct scrub_ctx *, const char *,
+		int, struct dirent *, struct stat *, void *);
+
+bool scan_fs_tree(struct scrub_ctx *ctx, scan_fs_tree_dir_fn dir_fn,
+		scan_fs_tree_dirent_fn dirent_fn, void *arg);
+
+#endif /* XFS_SCRUB_VFS_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index f224784..0d40e1f 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -390,6 +390,10 @@ run_scrub_phases(
 
 	/* Run all phases of the scrub tool. */
 	for (phase = 1, sp = phases; sp->fn; sp++, phase++) {
+		/* Turn on certain phases if user said to. */
+		if (sp->fn == DATASCAN_DUMMY_FN && scrub_data)
+			sp->fn = xfs_scan_blocks;
+
 		/* Skip certain phases unless they're turned on. */
 		if (sp->fn == REPAIR_DUMMY_FN ||
 		    sp->fn == DATASCAN_DUMMY_FN)
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 2396c6e..b7102de 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -93,6 +93,7 @@ struct scrub_ctx {
 	unsigned long long	errors_found;
 	unsigned long long	warnings_found;
 	unsigned long long	inodes_checked;
+	unsigned long long	bytes_checked;
 	unsigned long long	naming_warnings;
 	bool			need_repair;
 	bool			preen_triggers[XFS_SCRUB_TYPE_NR];
@@ -105,5 +106,6 @@ bool xfs_setup_fs(struct scrub_ctx *ctx);
 bool xfs_scan_metadata(struct scrub_ctx *ctx);
 bool xfs_scan_inodes(struct scrub_ctx *ctx);
 bool xfs_scan_connections(struct scrub_ctx *ctx);
+bool xfs_scan_blocks(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 22/27] xfs_scrub: optionally use SCSI READ VERIFY commands to scrub data blocks on disk
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (20 preceding siblings ...)
  2017-11-17 21:02 ` [PATCH 21/27] xfs_scrub: scrub file " Darrick J. Wong
@ 2017-11-17 21:02 ` Darrick J. Wong
  2017-11-17 21:02 ` [PATCH 23/27] xfs_scrub: check summary counters Darrick J. Wong
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:02 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

If we sense that we're talking to a raw SCSI disk, use the SCSI READ
VERIFY command to ask the disk to verify its own contents internally.  This can
sharply reduce the runtime of the data block verification phase on
devices whose internal bandwidth exceeds their link bandwidth.
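
As a minimal sketch of the command packing (not the patch's code: the
helper name build_verify16_cdb is hypothetical, and the byte layout is
the one disk_scsi_verify() uses below), a VERIFY(16) CDB carries the
opcode, a big-endian 64-bit LBA, and a big-endian 32-bit block count:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VERIFY16_CMDLEN	16
#define VERIFY16_CMD	0x8F

/* Hypothetical helper: pack a VERIFY(16) CDB for nblocks blocks at lba. */
static void
build_verify16_cdb(
	unsigned char	*cdb,
	uint64_t	lba,
	uint32_t	nblocks)
{
	int		i;

	assert(cdb != NULL);

	/* Byte 1 (PI, DPO, byte check) and bytes 14-15 stay zero. */
	memset(cdb, 0, VERIFY16_CMDLEN);
	cdb[0] = VERIFY16_CMD;

	/* Bytes 2-9: 64-bit LBA, most significant byte first. */
	for (i = 0; i < 8; i++)
		cdb[2 + i] = (lba >> (56 - 8 * i)) & 0xff;

	/* Bytes 10-13: 32-bit block count, most significant byte first. */
	for (i = 0; i < 4; i++)
		cdb[10 + i] = (nblocks >> (24 - 8 * i)) & 0xff;
}
```

The patch submits this CDB through SG_IO with SG_DXFER_NONE, so only the
command and its completion status cross the link; no block data does.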

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 configure.ac          |    2 +
 include/builddefs.in  |    2 +
 m4/package_libcdev.m4 |   30 ++++++++++
 scrub/Makefile        |    8 +++
 scrub/disk.c          |  146 +++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/disk.h          |    1 
 6 files changed, 188 insertions(+), 1 deletion(-)


diff --git a/configure.ac b/configure.ac
index 2a86767..0a2e7f3 100644
--- a/configure.ac
+++ b/configure.ac
@@ -169,6 +169,8 @@ AC_PACKAGE_WANT_UNINORM_H
 AC_HAVE_U8NORMALIZE
 AC_HAVE_OPENAT
 AC_HAVE_FSTATAT
+AC_HAVE_SG_IO
+AC_HAVE_HDIO_GETGEO
 
 if test "$enable_blkid" = yes; then
 AC_HAVE_BLKID_TOPO
diff --git a/include/builddefs.in b/include/builddefs.in
index a7034d8..0e358d0 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -121,6 +121,8 @@ HAVE_LIBATTR = @have_libattr@
 HAVE_U8NORMALIZE = @have_u8normalize@
 HAVE_OPENAT = @have_openat@
 HAVE_FSTATAT = @have_fstatat@
+HAVE_SG_IO = @have_sg_io@
+HAVE_HDIO_GETGEO = @have_hdio_getgeo@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4
index d111fd1..339e8a2 100644
--- a/m4/package_libcdev.m4
+++ b/m4/package_libcdev.m4
@@ -360,3 +360,33 @@ AC_DEFUN([AC_HAVE_FSTATAT],
        #include <unistd.h>])
     AC_SUBST(have_fstatat)
   ])
+
+#
+# Check if we have the SG_IO ioctl
+#
+AC_DEFUN([AC_HAVE_SG_IO],
+  [ AC_MSG_CHECKING([for struct sg_io_hdr ])
+    AC_TRY_COMPILE([#include <scsi/sg.h>],
+    [
+         struct sg_io_hdr hdr;
+         ioctl(0, SG_IO, &hdr);
+    ], have_sg_io=yes
+       AC_MSG_RESULT(yes),
+       AC_MSG_RESULT(no))
+    AC_SUBST(have_sg_io)
+  ])
+
+#
+# Check if we have the HDIO_GETGEO ioctl
+#
+AC_DEFUN([AC_HAVE_HDIO_GETGEO],
+  [ AC_MSG_CHECKING([for struct hd_geometry ])
+    AC_TRY_COMPILE([#include <linux/hdreg.h>],
+    [
+         struct hd_geometry hdr;
+         ioctl(0, HDIO_GETGEO, &hdr);
+    ], have_hdio_getgeo=yes
+       AC_MSG_RESULT(yes),
+       AC_MSG_RESULT(no))
+    AC_SUBST(have_hdio_getgeo)
+  ])
diff --git a/scrub/Makefile b/scrub/Makefile
index ce3aa9d..cb7d9c1 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -70,6 +70,14 @@ CFILES += unicrash.c
 LCFLAGS += -DHAVE_U8NORMALIZE
 endif
 
+ifeq ($(HAVE_SG_IO),yes)
+LCFLAGS += -DHAVE_SG_IO
+endif
+
+ifeq ($(HAVE_HDIO_GETGEO),yes)
+LCFLAGS += -DHAVE_HDIO_GETGEO
+endif
+
 default: depend $(LTCOMMAND)
 
 phase5.o unicrash.o xfs.o: $(TOPDIR)/include/builddefs
diff --git a/scrub/disk.c b/scrub/disk.c
index 96eaa6a..31a99af 100644
--- a/scrub/disk.c
+++ b/scrub/disk.c
@@ -29,12 +29,19 @@
 #include <sys/statvfs.h>
 #include <sys/vfs.h>
 #include <linux/fs.h>
+#ifdef HAVE_SG_IO
+# include <scsi/sg.h>
+#endif
+#ifdef HAVE_HDIO_GETGEO
+# include <linux/hdreg.h>
+#endif
 #include "platform_defs.h"
 #include "libfrog.h"
 #include "xfs.h"
 #include "path.h"
 #include "xfs_fs.h"
 #include "xfs_scrub.h"
+#include "common.h"
 #include "disk.h"
 
 /*
@@ -90,12 +97,119 @@ disk_heads(
 	return __disk_heads(disk);
 }
 
+/*
+ * Execute a SCSI VERIFY(16) to verify disk contents.
+ * For devices that support this command, this can sharply reduce the
+ * runtime of the data block verification phase if the storage device's
+ * internal bandwidth exceeds its link bandwidth.  However, it only
+ * works if we're talking to a raw SCSI device, and only if we trust the
+ * firmware.
+ */
+#ifdef HAVE_SG_IO
+# define SENSE_BUF_LEN		64
+# define VERIFY16_CMDLEN	16
+# define VERIFY16_CMD		0x8F
+
+# ifndef SG_FLAG_Q_AT_TAIL
+#  define SG_FLAG_Q_AT_TAIL	0x10
+# endif
+static int
+disk_scsi_verify(
+	struct disk		*disk,
+	uint64_t		startblock, /* lba */
+	uint64_t		blockcount) /* lba */
+{
+	struct sg_io_hdr	iohdr;
+	unsigned char		cdb[VERIFY16_CMDLEN];
+	unsigned char		sense[SENSE_BUF_LEN];
+	uint64_t		llba;
+	uint64_t		veri_len = blockcount;
+	int			error;
+
+	assert(!debug_tweak_on("XFS_SCRUB_NO_SCSI_VERIFY"));
+
+	llba = startblock + (disk->d_start >> BBSHIFT);
+
+	/* Borrowed from sg_verify */
+	cdb[0] = VERIFY16_CMD;
+	cdb[1] = 0; /* skip PI, DPO, and byte check. */
+	cdb[2] = (llba >> 56) & 0xff;
+	cdb[3] = (llba >> 48) & 0xff;
+	cdb[4] = (llba >> 40) & 0xff;
+	cdb[5] = (llba >> 32) & 0xff;
+	cdb[6] = (llba >> 24) & 0xff;
+	cdb[7] = (llba >> 16) & 0xff;
+	cdb[8] = (llba >> 8) & 0xff;
+	cdb[9] = llba & 0xff;
+	cdb[10] = (veri_len >> 24) & 0xff;
+	cdb[11] = (veri_len >> 16) & 0xff;
+	cdb[12] = (veri_len >> 8) & 0xff;
+	cdb[13] = veri_len & 0xff;
+	cdb[14] = 0;
+	cdb[15] = 0;
+	memset(sense, 0, SENSE_BUF_LEN);
+
+	/* v3 SG_IO */
+	memset(&iohdr, 0, sizeof(iohdr));
+	iohdr.interface_id = 'S';
+	iohdr.dxfer_direction = SG_DXFER_NONE;
+	iohdr.cmdp = cdb;
+	iohdr.cmd_len = VERIFY16_CMDLEN;
+	iohdr.sbp = sense;
+	iohdr.mx_sb_len = SENSE_BUF_LEN;
+	iohdr.flags |= SG_FLAG_Q_AT_TAIL;
+	iohdr.timeout = 30000; /* 30s */
+
+	error = ioctl(disk->d_fd, SG_IO, &iohdr);
+	if (error)
+		return error;
+
+	dbg_printf("VERIFY(16) fd %d lba %"PRIu64" len %"PRIu64" info %x "
+			"status %d masked %d msg %d host %d driver %d "
+			"duration %d resid %d\n",
+			disk->d_fd, startblock, blockcount, iohdr.info,
+			iohdr.status, iohdr.masked_status, iohdr.msg_status,
+			iohdr.host_status, iohdr.driver_status, iohdr.duration,
+			iohdr.resid);
+
+	if (iohdr.info & SG_INFO_CHECK) {
+		dbg_printf("status: msg %x host %x driver %x\n",
+				iohdr.msg_status, iohdr.host_status,
+				iohdr.driver_status);
+		errno = EIO;
+		return -1;
+	}
+
+	return error;
+}
+#else
+# define disk_scsi_verify(...)		(ENOTTY)
+#endif /* HAVE_SG_IO */
+
+/* Test whether the disk accepts SCSI VERIFY commands. */
+static bool
+disk_can_scsi_verify(
+	struct disk		*disk)
+{
+	int			error;
+
+	if (debug_tweak_on("XFS_SCRUB_NO_SCSI_VERIFY"))
+		return false;
+
+	error = disk_scsi_verify(disk, 0, 1);
+	return error == 0;
+}
+
 /* Open a disk device and discover its geometry. */
 struct disk *
 disk_open(
 	const char		*pathname)
 {
+#ifdef HAVE_HDIO_GETGEO
+	struct hd_geometry	bdgeo;
+#endif
 	struct disk		*disk;
+	bool			suspicious_disk = false;
 	int			lba_sz;
 	int			error;
 
@@ -126,13 +240,34 @@ disk_open(
 		error = ioctl(disk->d_fd, BLKBSZGET, &disk->d_blksize);
 		if (error)
 			disk->d_blksize = 0;
-		disk->d_start = 0;
+#ifdef HAVE_HDIO_GETGEO
+		error = ioctl(disk->d_fd, HDIO_GETGEO, &bdgeo);
+		if (!error) {
+			/*
+			 * dm devices will pass through ioctls, which means
+			 * we can't use SCSI VERIFY unless the start is 0.
+			 * Most dm devices don't set geometry (unlike scsi
+			 * and nvme) so use a zeroed out CHS to screen them
+			 * out.
+			 */
+			if (bdgeo.start != 0 &&
+			    (unsigned long long)bdgeo.heads * bdgeo.sectors *
+					bdgeo.cylinders == 0)
+				suspicious_disk = true;
+			disk->d_start = bdgeo.start << BBSHIFT;
+		} else
+#endif
+			disk->d_start = 0;
 	} else {
 		disk->d_size = disk->d_sb.st_size;
 		disk->d_blksize = disk->d_sb.st_blksize;
 		disk->d_start = 0;
 	}
 
+	/* Can we issue SCSI VERIFY? */
+	if (!suspicious_disk && disk_can_scsi_verify(disk))
+		disk->d_flags |= DISK_FLAG_SCSI_VERIFY;
+
 	return disk;
 out_close:
 	close(disk->d_fd);
@@ -155,6 +290,10 @@ disk_close(
 	return error;
 }
 
+#define BTOLBAT(d, bytes)	((uint64_t)(bytes) >> (d)->d_lbalog)
+#define LBASIZE(d)		(1ULL << (d)->d_lbalog)
+#define BTOLBA(d, bytes)	(((uint64_t)(bytes) + LBASIZE(d) - 1) >> (d)->d_lbalog)
+
 /* Read-verify an extent of a disk device. */
 ssize_t
 disk_read_verify(
@@ -163,5 +302,10 @@ disk_read_verify(
 	uint64_t		start,
 	uint64_t		length)
 {
+	/* Convert to logical block size. */
+	if (disk->d_flags & DISK_FLAG_SCSI_VERIFY)
+		return disk_scsi_verify(disk, BTOLBAT(disk, start),
+				BTOLBA(disk, length));
+
 	return pread(disk->d_fd, buf, length, start);
 }
diff --git a/scrub/disk.h b/scrub/disk.h
index 4331300..b1b15c0 100644
--- a/scrub/disk.h
+++ b/scrub/disk.h
@@ -20,6 +20,7 @@
 #ifndef XFS_SCRUB_DISK_H_
 #define XFS_SCRUB_DISK_H_
 
+#define DISK_FLAG_SCSI_VERIFY	0x1
 struct disk {
 	struct stat	d_sb;
 	int		d_fd;



* [PATCH 23/27] xfs_scrub: check summary counters
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (21 preceding siblings ...)
  2017-11-17 21:02 ` [PATCH 22/27] xfs_scrub: optionally use SCSI READ VERIFY commands to scrub data blocks on disk Darrick J. Wong
@ 2017-11-17 21:02 ` Darrick J. Wong
  2017-11-17 21:02 ` [PATCH 24/27] xfs_scrub: fstrim the free areas if there are no errors on the filesystem Darrick J. Wong
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:02 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Make sure the filesystem summary counters are somewhat close to what
we can find by scanning the filesystem.
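
A rough sketch of the tolerance test this implies, assuming the 10%
relative slack mentioned in the phase7.c comments.  close_enough is a
hypothetical name; unlike within_range() in the patch, it compares the
absolute difference directly and ignores multiplication overflow for
brevity:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical: accept value if it differs from desired by less than
 * abs_threshold, or by no more than n/d of desired.
 */
static bool
close_enough(
	unsigned long long	value,
	unsigned long long	desired,
	unsigned long long	abs_threshold,
	unsigned int		n,
	unsigned int		d)
{
	unsigned long long	diff;

	assert(n < d);

	diff = value > desired ? value - desired : desired - value;

	/* Small absolute differences are always tolerated. */
	if (diff < abs_threshold)
		return true;

	/* Otherwise the difference must stay within n/d of desired. */
	return diff * d <= desired * n;
}
```

On a live filesystem the counters move while the scan runs, which is why
a small absolute difference is accepted before the relative check applies.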

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/Makefile    |    1 
 scrub/common.c    |   28 ++++++
 scrub/common.h    |    4 +
 scrub/phase7.c    |  266 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.c |    5 -
 scrub/xfs_scrub.h |    1 
 6 files changed, 302 insertions(+), 3 deletions(-)
 create mode 100644 scrub/phase7.c


diff --git a/scrub/Makefile b/scrub/Makefile
index cb7d9c1..c442c12 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -43,6 +43,7 @@ phase2.c \
 phase3.c \
 phase5.c \
 phase6.c \
+phase7.c \
 read_verify.c \
 scrub.c \
 spacemap.c \
diff --git a/scrub/common.c b/scrub/common.c
index 5072493..2be17b9 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -413,3 +413,31 @@ _("More than %u naming warnings, shutting up."),
 
 	return debug || verbose || res;
 }
+
+/* Decide if a value is within +/- (n/d) of a desired value. */
+bool
+within_range(
+	struct scrub_ctx	*ctx,
+	unsigned long long	value,
+	unsigned long long	desired,
+	unsigned long long	abs_threshold,
+	unsigned int		n,
+	unsigned int		d,
+	const char		*descr)
+{
+	assert(n < d);
+
+	/* Don't complain if difference does not exceed an absolute value. */
+	if (value < desired && desired - value < abs_threshold)
+		return true;
+	if (value > desired && value - desired < abs_threshold)
+		return true;
+
+	/* Complain if the difference exceeds a certain percentage. */
+	if (value < desired * (d - n) / d)
+		return false;
+	if (value > desired * (d + n) / d)
+		return false;
+
+	return true;
+}
diff --git a/scrub/common.h b/scrub/common.h
index 0c451cc..1bb6c44 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -80,4 +80,8 @@ char *string_escape(const char *in);
 #define TOO_MANY_NAME_WARNINGS	10000
 bool should_warn_about_name(struct scrub_ctx *ctx);
 
+bool within_range(struct scrub_ctx *ctx, unsigned long long value,
+		unsigned long long desired, unsigned long long abs_threshold,
+		unsigned int n, unsigned int d, const char *descr);
+
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/phase7.c b/scrub/phase7.c
new file mode 100644
index 0000000..c1422e8
--- /dev/null
+++ b/scrub/phase7.c
@@ -0,0 +1,266 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "path.h"
+#include "ptvar.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "fscounters.h"
+#include "spacemap.h"
+
+/* Phase 7: Check summary counters. */
+
+struct xfs_summary_counts {
+	unsigned long long	dbytes;		/* data dev bytes */
+	unsigned long long	rbytes;		/* rt dev bytes */
+	unsigned long long	next_phys;	/* next phys bytes we see? */
+	unsigned long long	agbytes;	/* freespace bytes */
+};
+
+/* Record block usage. */
+static bool
+xfs_record_block_summary(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	struct fsmap			*fsmap,
+	void				*arg)
+{
+	struct xfs_summary_counts	*counts;
+	unsigned long long		len;
+
+	counts = ptvar_get((struct ptvar *)arg);
+	if (fsmap->fmr_device == ctx->fsinfo.fs_logdev)
+		return true;
+	if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
+	    fsmap->fmr_owner == XFS_FMR_OWN_FREE)
+		return true;
+
+	len = fsmap->fmr_length;
+
+	/* freesp btrees live in free space, need to adjust counters later. */
+	if ((fsmap->fmr_flags & FMR_OF_SPECIAL_OWNER) &&
+	    fsmap->fmr_owner == XFS_FMR_OWN_AG) {
+		counts->agbytes += fsmap->fmr_length;
+	}
+	if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev) {
+		/* Count realtime extents. */
+		counts->rbytes += len;
+	} else {
+		/* Count datadev extents. */
+		if (counts->next_phys >= fsmap->fmr_physical + len)
+			return true;
+		else if (counts->next_phys > fsmap->fmr_physical)
+			len = fsmap->fmr_physical + len - counts->next_phys;
+		counts->dbytes += len;
+		counts->next_phys = fsmap->fmr_physical + fsmap->fmr_length;
+	}
+
+	return true;
+}
+
+/* Add all the summaries in the per-thread counter */
+static bool
+xfs_add_summaries(
+	struct ptvar			*ptv,
+	void				*data,
+	void				*arg)
+{
+	struct xfs_summary_counts	*total = arg;
+	struct xfs_summary_counts	*item = data;
+
+	total->dbytes += item->dbytes;
+	total->rbytes += item->rbytes;
+	total->agbytes += item->agbytes;
+	return true;
+}
+
+/*
+ * Count all inodes and blocks in the filesystem as told by GETFSMAP and
+ * BULKSTAT, and compare that to summary counters.  Since this is a live
+ * filesystem we'll be content if the summary counts are within 10% of
+ * what we observed.
+ */
+bool
+xfs_scan_summary(
+	struct scrub_ctx		*ctx)
+{
+	struct xfs_summary_counts	totalcount = {0};
+	struct ptvar			*ptvar;
+	unsigned long long		used_data;
+	unsigned long long		used_rt;
+	unsigned long long		used_files;
+	unsigned long long		stat_data;
+	unsigned long long		stat_rt;
+	uint64_t			counted_inodes = 0;
+	unsigned long long		absdiff;
+	unsigned long long		d_blocks;
+	unsigned long long		d_bfree;
+	unsigned long long		r_blocks;
+	unsigned long long		r_bfree;
+	unsigned long long		f_files;
+	unsigned long long		f_free;
+	bool				moveon;
+	bool				complain;
+	int				error;
+
+	/* Flush everything out to disk before we start counting. */
+	error = syncfs(ctx->mnt_fd);
+	if (error) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	ptvar = ptvar_init(scrub_nproc(ctx), sizeof(struct xfs_summary_counts));
+	if (!ptvar) {
+		str_errno(ctx, ctx->mntpoint);
+		return false;
+	}
+
+	/* Use fsmap to count blocks. */
+	moveon = xfs_scan_all_spacemaps(ctx, xfs_record_block_summary, ptvar);
+	if (!moveon)
+		goto out_free;
+	moveon = ptvar_foreach(ptvar, xfs_add_summaries, &totalcount);
+	if (!moveon)
+		goto out_free;
+	ptvar_free(ptvar);
+
+	/* Scan the whole fs. */
+	moveon = xfs_count_all_inodes(ctx, &counted_inodes);
+	if (!moveon)
+		goto out;
+
+	moveon = xfs_scan_estimate_blocks(ctx, &d_blocks, &d_bfree, &r_blocks,
+			&r_bfree, &f_files, &f_free);
+	if (!moveon)
+		return moveon;
+
+	/*
+	 * If we counted blocks with fsmap, then dblocks includes
+	 * blocks for the AGFL and the freespace/rmap btrees.  The
+	 * filesystem treats them as "free", but since we scanned
+	 * them, we'll consider them used.
+	 */
+	d_bfree -= totalcount.agbytes >> ctx->blocklog;
+
+	/* Report on what we found. */
+	used_data = (d_blocks - d_bfree) << ctx->blocklog;
+	used_rt = (r_blocks - r_bfree) << ctx->blocklog;
+	used_files = f_files - f_free;
+	stat_data = totalcount.dbytes;
+	stat_rt = totalcount.rbytes;
+
+	/*
+	 * Complain if the counts are off by more than 10% unless
+	 * the inaccuracy is less than 32MB worth of blocks or 100 inodes.
+	 */
+	absdiff = 1ULL << 25;
+	complain = verbose;
+	complain |= !within_range(ctx, stat_data, used_data, absdiff, 1, 10,
+			_("data blocks"));
+	complain |= !within_range(ctx, stat_rt, used_rt, absdiff, 1, 10,
+			_("realtime blocks"));
+	complain |= !within_range(ctx, counted_inodes, used_files, 100, 1, 10,
+			_("inodes"));
+
+	if (complain) {
+		double		d, r, i;
+		char		*du, *ru, *iu;
+
+		if (used_rt || stat_rt) {
+			d = auto_space_units(used_data, &du);
+			r = auto_space_units(used_rt, &ru);
+			i = auto_units(used_files, &iu);
+			fprintf(stdout,
+_("%.1f%s data used;  %.1f%s realtime data used;  %.2f%s inodes used.\n"),
+					d, du, r, ru, i, iu);
+			d = auto_space_units(stat_data, &du);
+			r = auto_space_units(stat_rt, &ru);
+			i = auto_units(counted_inodes, &iu);
+			fprintf(stdout,
+_("%.1f%s data found; %.1f%s realtime data found; %.2f%s inodes found.\n"),
+					d, du, r, ru, i, iu);
+		} else {
+			d = auto_space_units(used_data, &du);
+			i = auto_units(used_files, &iu);
+			fprintf(stdout,
+_("%.1f%s data used;  %.1f%s inodes used.\n"),
+					d, du, i, iu);
+			d = auto_space_units(stat_data, &du);
+			i = auto_units(counted_inodes, &iu);
+			fprintf(stdout,
+_("%.1f%s data found; %.1f%s inodes found.\n"),
+					d, du, i, iu);
+		}
+		fflush(stdout);
+	}
+
+	/*
+	 * Complain if the checked inode counts are off, which
+	 * implies an incomplete check.
+	 */
+	if (verbose ||
+	    !within_range(ctx, counted_inodes, ctx->inodes_checked, 100, 1, 10,
+			_("checked inodes"))) {
+		double		i1, i2;
+		char		*i1u, *i2u;
+
+		i1 = auto_units(counted_inodes, &i1u);
+		i2 = auto_units(ctx->inodes_checked, &i2u);
+		fprintf(stdout,
+_("%.1f%s inodes counted; %.1f%s inodes checked.\n"),
+				i1, i1u, i2, i2u);
+		fflush(stdout);
+	}
+
+	/*
+	 * Complain if the checked block counts are off, which
+	 * implies an incomplete check.
+	 */
+	if (ctx->bytes_checked &&
+	    (verbose ||
+	     !within_range(ctx, used_data + used_rt,
+			ctx->bytes_checked, absdiff, 1, 10,
+			_("verified blocks")))) {
+		double		b1, b2;
+		char		*b1u, *b2u;
+
+		b1 = auto_space_units(used_data + used_rt, &b1u);
+		b2 = auto_space_units(ctx->bytes_checked, &b2u);
+		fprintf(stdout,
+_("%.1f%s data counted; %.1f%s data verified.\n"),
+				b1, b1u, b2, b2u);
+		fflush(stdout);
+	}
+
+	moveon = true;
+
+out:
+	return moveon;
+out_free:
+	ptvar_free(ptvar);
+	return moveon;
+}
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 0d40e1f..786e2b3 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -374,6 +374,8 @@ run_scrub_phases(
 		},
 		{
 			.descr = _("Check summary counters."),
+			.fn = xfs_scan_summary,
+			.must_run = true,
 		},
 		{
 			NULL
@@ -443,9 +445,6 @@ main(
 	static bool		injected;
 	int			ret = 0;
 
-	fprintf(stderr, "XXX: This program is not complete!\n");
-	return 4;
-
 	progname = basename(argv[0]);
 	setlocale(LC_ALL, "");
 	bindtextdomain(PACKAGE, LOCALEDIR);
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index b7102de..d7b8900 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -107,5 +107,6 @@ bool xfs_scan_metadata(struct scrub_ctx *ctx);
 bool xfs_scan_inodes(struct scrub_ctx *ctx);
 bool xfs_scan_connections(struct scrub_ctx *ctx);
 bool xfs_scan_blocks(struct scrub_ctx *ctx);
+bool xfs_scan_summary(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */
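A reading aid for the summary-counter comparison in the patch above.
The within_range() helper is not part of this message, so this is only a
sketch of the rule stated in the comments: complain when two counts
differ by more than 10%, unless the absolute difference is below a floor
(32MB of bytes, or 100 inodes).  The name counts_agree() and its
parameters are hypothetical and are not the real within_range()
signature.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the tolerance rule: two counts "agree" if
 * they differ by less than abs_floor, or if the difference is at most
 * num/den of the expected value.  Assumes den is nonzero and num is
 * small enough that the multiplication cannot overflow.
 */
static bool
counts_agree(
	uint64_t	observed,
	uint64_t	expected,
	uint64_t	abs_floor,
	uint64_t	num,
	uint64_t	den)
{
	uint64_t	diff;

	diff = observed > expected ? observed - expected : expected - observed;

	/* Small absolute differences are always tolerated. */
	if (diff < abs_floor)
		return true;

	/* Otherwise the difference must stay within num/den of expected. */
	return diff <= (expected / den) * num;
}
```

With abs_floor = 1ULL << 25 and num/den = 1/10, this mirrors the "10%
unless less than 32MB" wording used for the block counters.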

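The block summary in the patch above walks fsmap records in physical
order and tries not to count shared space twice.  A minimal sketch of
that idea follows, assuming the records are already sorted by start
offset.  struct extent and count_covered_bytes() are hypothetical names
for illustration and do not appear in xfsprogs.

```c
#include <stdint.h>

/* Hypothetical, simplified record: one physical extent in bytes. */
struct extent {
	uint64_t	start;
	uint64_t	len;
};

/*
 * Count the bytes covered by extents sorted by start offset, counting
 * each physical byte once even when records overlap.
 */
static uint64_t
count_covered_bytes(
	const struct extent	*ext,
	unsigned int		nr)
{
	uint64_t		next_phys = 0;	/* end of space counted so far */
	uint64_t		total = 0;
	unsigned int		i;

	for (i = 0; i < nr; i++) {
		uint64_t	end = ext[i].start + ext[i].len;

		/* Entirely inside space we already counted. */
		if (end <= next_phys)
			continue;

		if (ext[i].start < next_phys)
			total += end - next_phys;	/* only the new tail */
		else
			total += ext[i].len;
		next_phys = end;
	}
	return total;
}
```

The sketch adds only the part of a record that extends past the
previously counted range, so a fully overlapped record contributes
nothing.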


* [PATCH 24/27] xfs_scrub: fstrim the free areas if there are no errors on the filesystem
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (22 preceding siblings ...)
  2017-11-17 21:02 ` [PATCH 23/27] xfs_scrub: check summary counters Darrick J. Wong
@ 2017-11-17 21:02 ` Darrick J. Wong
  2017-11-17 21:02 ` [PATCH 25/27] xfs_scrub: progress indicator Darrick J. Wong
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:02 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

If the filesystem scan comes out clean or fixes all the problems, call
fstrim to discard the free space (useful on SSDs, thin-provisioned
volumes, and similar storage).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
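For readers unfamiliar with FITRIM, here is a minimal standalone sketch
of the ioctl that the fstrim() helper in this patch wraps.  It assumes a
Linux system whose <linux/fs.h> provides FITRIM and struct fstrim_range.
The function name trim_all_free_space() and its return convention are
hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* FITRIM, struct fstrim_range */

/*
 * Ask the filesystem behind fd to discard all of its free space.
 * Returns 0 on success or if discard is unsupported, -1 otherwise
 * (with errno set by ioctl).
 */
static int
trim_all_free_space(
	int			fd)
{
	struct fstrim_range	range = {
		.start	= 0,
		.len	= UINT64_MAX,	/* cover the whole filesystem */
		.minlen	= 0,
	};

	if (ioctl(fd, FITRIM, &range) == 0)
		return 0;

	/* No discard support; harmless for a best-effort preen. */
	if (errno == EOPNOTSUPP || errno == ENOTTY)
		return 0;
	return -1;
}
```

fd would be an open descriptor on the mount point, as ctx->mnt_fd is in
the patch.  On success the kernel writes the number of bytes trimmed
back into range.len.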
 scrub/Makefile    |    1 +
 scrub/phase4.c    |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/vfs.c       |   23 +++++++++++++++++++++++
 scrub/vfs.h       |    2 ++
 scrub/xfs_scrub.c |   26 +++++++++++++++++++++++++-
 scrub/xfs_scrub.h |    1 +
 6 files changed, 104 insertions(+), 1 deletion(-)
 create mode 100644 scrub/phase4.c


diff --git a/scrub/Makefile b/scrub/Makefile
index c442c12..dace31e 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -41,6 +41,7 @@ inodes.c \
 phase1.c \
 phase2.c \
 phase3.c \
+phase4.c \
 phase5.c \
 phase6.c \
 phase7.c \
diff --git a/scrub/phase4.c b/scrub/phase4.c
new file mode 100644
index 0000000..c7874a6
--- /dev/null
+++ b/scrub/phase4.c
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/statvfs.h>
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "list.h"
+#include "path.h"
+#include "workqueue.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "scrub.h"
+#include "vfs.h"
+
+/* Phase 4: Repair filesystem. */
+
+/* Repair phase; for now, just trim free space if no errors were found. */
+bool
+xfs_repair_fs(
+	struct scrub_ctx		*ctx)
+{
+	bool				moveon = true;
+
+	pthread_mutex_lock(&ctx->lock);
+	if (moveon && ctx->errors_found == 0)
+		fstrim(ctx);
+	pthread_mutex_unlock(&ctx->lock);
+
+	return moveon;
+}
diff --git a/scrub/vfs.c b/scrub/vfs.c
index 6afeca0..cc2f483 100644
--- a/scrub/vfs.c
+++ b/scrub/vfs.c
@@ -219,3 +219,26 @@ _("Could not queue directory scan work."));
 	free(sftd);
 	return false;
 }
+
+#ifndef FITRIM
+struct fstrim_range {
+	__u64 start;
+	__u64 len;
+	__u64 minlen;
+};
+#define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
+#endif
+
+/* Call FITRIM to trim all the unused space in a filesystem. */
+void
+fstrim(
+	struct scrub_ctx	*ctx)
+{
+	struct fstrim_range	range = {0};
+	int			error;
+
+	range.len = ULLONG_MAX;
+	error = ioctl(ctx->mnt_fd, FITRIM, &range);
+	if (error && errno != EOPNOTSUPP && errno != ENOTTY)
+		perror(_("fstrim"));
+}
diff --git a/scrub/vfs.h b/scrub/vfs.h
index d565039..99da6b5 100644
--- a/scrub/vfs.h
+++ b/scrub/vfs.h
@@ -28,4 +28,6 @@ typedef bool (*scan_fs_tree_dirent_fn)(struct scrub_ctx *, const char *,
 bool scan_fs_tree(struct scrub_ctx *ctx, scan_fs_tree_dir_fn dir_fn,
 		scan_fs_tree_dirent_fn dirent_fn, void *arg);
 
+void fstrim(struct scrub_ctx *ctx);
+
 #endif /* XFS_SCRUB_VFS_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 786e2b3..81ec7e5 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -340,6 +340,20 @@ _("%sI/O rate: %.1f%s/s in, %.1f%s/s out, %.1f%s/s tot\n"),
 	return true;
 }
 
+/* Run the preening phase if there are no errors. */
+static bool
+preen(
+	struct scrub_ctx	*ctx)
+{
+	if (ctx->errors_found) {
+		str_info(ctx, ctx->mntpoint,
+_("Errors found, please re-run with -y."));
+		return true;
+	}
+
+	return xfs_repair_fs(ctx);
+}
+
 /* Run all the phases of the scrubber. */
 static bool
 run_scrub_phases(
@@ -393,8 +407,18 @@ run_scrub_phases(
 	/* Run all phases of the scrub tool. */
 	for (phase = 1, sp = phases; sp->fn; sp++, phase++) {
 		/* Turn on certain phases if user said to. */
-		if (sp->fn == DATASCAN_DUMMY_FN && scrub_data)
+		if (sp->fn == DATASCAN_DUMMY_FN && scrub_data) {
 			sp->fn = xfs_scan_blocks;
+		} else if (sp->fn == REPAIR_DUMMY_FN) {
+			if (ctx->mode == SCRUB_MODE_PREEN) {
+				sp->descr = _("Preen filesystem.");
+				sp->fn = preen;
+			} else if (ctx->mode == SCRUB_MODE_REPAIR) {
+				sp->descr = _("Repair filesystem.");
+				sp->fn = xfs_repair_fs;
+			}
+			sp->must_run = true;
+		}
 
 		/* Skip certain phases unless they're turned on. */
 		if (sp->fn == REPAIR_DUMMY_FN ||
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index d7b8900..9a09dcd 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -108,5 +108,6 @@ bool xfs_scan_inodes(struct scrub_ctx *ctx);
 bool xfs_scan_connections(struct scrub_ctx *ctx);
 bool xfs_scan_blocks(struct scrub_ctx *ctx);
 bool xfs_scan_summary(struct scrub_ctx *ctx);
+bool xfs_repair_fs(struct scrub_ctx *ctx);
 
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 25/27] xfs_scrub: progress indicator
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (23 preceding siblings ...)
  2017-11-17 21:02 ` [PATCH 24/27] xfs_scrub: fstrim the free areas if there are no errors on the filesystem Darrick J. Wong
@ 2017-11-17 21:02 ` Darrick J. Wong
  2017-11-17 21:02 ` [PATCH 26/27] xfs_scrub: create a script to scrub all xfs filesystems Darrick J. Wong
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:02 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

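Implement a progress indicator.  If the progress fd is a tty, render a
progress bar; otherwise, emit numeric status lines compatible with the
fsck -C format.  The new -C option selects the fd to report to.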

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
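The reporting thread in progress.c wakes every half second by passing an
absolute deadline to pthread_cond_timedwait().  A minimal sketch of
building such a deadline, assuming the addend is under one second.
timespec_add_nsec() is a hypothetical helper and is not part of this
patch.

```c
#include <time.h>

#define NSEC_PER_SEC	1000000000L

/*
 * Advance *ts by nsec nanoseconds (0 <= nsec < NSEC_PER_SEC) and keep
 * the result normalized, i.e. tv_nsec stays below one second.
 */
static void
timespec_add_nsec(
	struct timespec	*ts,
	long		nsec)
{
	ts->tv_nsec += nsec;
	if (ts->tv_nsec >= NSEC_PER_SEC) {
		ts->tv_sec++;
		ts->tv_nsec -= NSEC_PER_SEC;
	}
}
```

A caller would fill ts with clock_gettime(CLOCK_REALTIME, &ts), add
NSEC_PER_SEC / 2, and hand the result to pthread_cond_timedwait().  The
comparison is >= because a tv_nsec of exactly one billion is not a valid
timespec.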
 man/man8/xfs_scrub.8 |   11 ++
 scrub/Makefile       |    2 
 scrub/common.c       |   23 ++++-
 scrub/phase2.c       |   14 +++
 scrub/phase3.c       |   16 ++++
 scrub/phase4.c       |   19 ++++
 scrub/phase5.c       |    2 
 scrub/phase6.c       |   28 ++++++
 scrub/progress.c     |  221 ++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/progress.h     |   33 +++++++
 scrub/read_verify.c  |    2 
 scrub/scrub.c        |   28 ++++++
 scrub/xfs_scrub.c    |   59 +++++++++++++
 scrub/xfs_scrub.h    |   14 +++
 14 files changed, 462 insertions(+), 10 deletions(-)
 create mode 100644 scrub/progress.c
 create mode 100644 scrub/progress.h
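To illustrate the tty progress bar that progress_report() draws, here is
a simplified sketch that renders a fixed-width bar into a caller-supplied
buffer.  It leaves out the twiddle character and the numeric suffix that
the patch prints.  render_bar() and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Render "Phase N: |====    |" into buf.  width is the number of cells
 * between the bars.  Returns the number of '=' cells drawn, or -1 if
 * buf is too small or width is negative.
 */
static int
render_bar(
	char		*buf,
	size_t		size,
	unsigned int	phase,
	uint64_t	cur,
	uint64_t	max,
	int		width)
{
	int		filled;
	int		off;

	if (width < 0)
		return -1;
	if (cur > max)
		cur = max;
	filled = max ? (int)((double)width * cur / max) : 0;

	off = snprintf(buf, size, "Phase %u: |", phase);
	/* Need room for the tag, the bar, the closing '|', and the NUL. */
	if (off < 0 || (size_t)off + (size_t)width + 2 > size)
		return -1;

	memset(buf + off, '=', filled);
	memset(buf + off + filled, ' ', width - filled);
	buf[off + width] = '|';
	buf[off + width + 1] = '\0';
	return filled;
}
```

Writing the result followed by "\r" rather than "\n" lets the next
report overwrite the same terminal line, which is how the patch keeps the
bar in place.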


diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
index 95f4fea..dee9076 100644
--- a/man/man8/xfs_scrub.8
+++ b/man/man8/xfs_scrub.8
@@ -4,7 +4,7 @@ xfs_scrub \- scrub the contents of an XFS filesystem
 .SH SYNOPSIS
 .B xfs_scrub
 [
-.B \-abemnTvVxy
+.B \-abCemnTvVxy
 ]
 .I mount-point
 .br
@@ -47,6 +47,15 @@ time.
 If given more than once, an artificial delay of 100us is added to each
 scrub call to reduce CPU overhead even further.
 .TP
+.BI \-C " fd"
+This option causes xfs_scrub to write progress information to the
+specified file descriptor so that the progress of the filesystem check
+can be monitored.
+If the file descriptor is a tty, a fancy progress bar is rendered.
+Otherwise, a simple numeric status dump compatible with the
+.B fsck -C
+format is output.
+.TP
 .B \-e
 Specifies what happens when errors are detected.
 If
diff --git a/scrub/Makefile b/scrub/Makefile
index dace31e..7d52e7d 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -23,6 +23,7 @@ disk.h \
 filemap.h \
 fscounters.h \
 inodes.h \
+progress.h \
 read_verify.h \
 scrub.h \
 spacemap.h \
@@ -45,6 +46,7 @@ phase4.c \
 phase5.c \
 phase6.c \
 phase7.c \
+progress.c \
 read_verify.c \
 scrub.c \
 spacemap.c \
diff --git a/scrub/common.c b/scrub/common.c
index 2be17b9..54381a5 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -27,6 +27,7 @@
 #include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
+#include "progress.h"
 
 /*
  * Reporting Status to the Console
@@ -55,6 +56,18 @@ xfs_scrub_excessive_errors(
 	return ret;
 }
 
+/* If stderr is a tty, clear to end of line to clean up progress bar. */
+static inline const char *stderr_start(void)
+{
+	return stderr_isatty ? CLEAR_EOL : "";
+}
+
+/* If stdout is a tty, clear to end of line to clean up progress bar. */
+static inline const char *stdout_start(void)
+{
+	return stdout_isatty ? CLEAR_EOL : "";
+}
+
 /* Print an error string and whatever error is stored in errno. */
 void
 __str_errno(
@@ -66,7 +79,7 @@ __str_errno(
 	char			buf[DESCR_BUFSZ];
 
 	pthread_mutex_lock(&ctx->lock);
-	fprintf(stderr, _("Error: %s: %s."), descr,
+	fprintf(stderr, _("%sError: %s: %s."), stderr_start(), descr,
 			strerror_r(errno, buf, DESCR_BUFSZ));
 	if (debug)
 		fprintf(stderr, _(" (%s line %d)"), file, line);
@@ -86,7 +99,7 @@ __str_errno_warn(
 	char			buf[DESCR_BUFSZ];
 
 	pthread_mutex_lock(&ctx->lock);
-	fprintf(stderr, _("Warning: %s: %s."), descr,
+	fprintf(stderr, _("%sWarning: %s: %s."), stderr_start(), descr,
 			strerror_r(errno, buf, DESCR_BUFSZ));
 	if (debug)
 		fprintf(stderr, _(" (%s line %d)"), file, line);
@@ -108,7 +121,7 @@ __str_error(
 	va_list			args;
 
 	pthread_mutex_lock(&ctx->lock);
-	fprintf(stderr, _("Error: %s: "), descr);
+	fprintf(stderr, _("%sError: %s: "), stderr_start(), descr);
 	va_start(args, format);
 	vfprintf(stderr, format, args);
 	va_end(args);
@@ -132,7 +145,7 @@ __str_warn(
 	va_list			args;
 
 	pthread_mutex_lock(&ctx->lock);
-	fprintf(stderr, _("Warning: %s: "), descr);
+	fprintf(stderr, _("%sWarning: %s: "), stderr_start(), descr);
 	va_start(args, format);
 	vfprintf(stderr, format, args);
 	va_end(args);
@@ -156,7 +169,7 @@ __str_info(
 	va_list			args;
 
 	pthread_mutex_lock(&ctx->lock);
-	fprintf(stdout, _("Info: %s: "), descr);
+	fprintf(stdout, _("%sInfo: %s: "), stdout_start(), descr);
 	va_start(args, format);
 	vfprintf(stdout, format, args);
 	va_end(args);
diff --git a/scrub/phase2.c b/scrub/phase2.c
index b1f2c6e..ac4f75b 100644
--- a/scrub/phase2.c
+++ b/scrub/phase2.c
@@ -118,3 +118,17 @@ _("Could not queue filesystem scrub work."));
 	workqueue_destroy(&wq);
 	return moveon;
 }
+
+/* Estimate how much work we're going to do. */
+bool
+xfs_estimate_metadata_work(
+	struct scrub_ctx	*ctx,
+	uint64_t		*items,
+	unsigned int		*nr_threads,
+	int			*rshift)
+{
+	*items = xfs_scrub_estimate_ag_work(ctx);
+	*nr_threads = scrub_nproc(ctx);
+	*rshift = 0;
+	return true;
+}
diff --git a/scrub/phase3.c b/scrub/phase3.c
index 8c3748e..68a34ed 100644
--- a/scrub/phase3.c
+++ b/scrub/phase3.c
@@ -30,6 +30,7 @@
 #include "common.h"
 #include "counter.h"
 #include "inodes.h"
+#include "progress.h"
 #include "scrub.h"
 
 /* Phase 3: Scan all inodes. */
@@ -121,6 +122,7 @@ xfs_scrub_inode(
 
 out:
 	ptcounter_add(icount, 1);
+	progress_add(1);
 	if (fd >= 0)
 		close(fd);
 	if (error)
@@ -152,3 +154,17 @@ xfs_scan_inodes(
 	ptcounter_free(icount);
 	return moveon;
 }
+
+/* Estimate how much work we're going to do. */
+bool
+xfs_estimate_inodes_work(
+	struct scrub_ctx	*ctx,
+	uint64_t		*items,
+	unsigned int		*nr_threads,
+	int			*rshift)
+{
+	*items = ctx->mnt_sv.f_files - ctx->mnt_sv.f_ffree;
+	*nr_threads = scrub_nproc(ctx);
+	*rshift = 0;
+	return true;
+}
diff --git a/scrub/phase4.c b/scrub/phase4.c
index c7874a6..92ac276 100644
--- a/scrub/phase4.c
+++ b/scrub/phase4.c
@@ -31,6 +31,7 @@
 #include "workqueue.h"
 #include "xfs_scrub.h"
 #include "common.h"
+#include "progress.h"
 #include "scrub.h"
 #include "vfs.h"
 
@@ -44,9 +45,25 @@ xfs_repair_fs(
 	bool				moveon = true;
 
 	pthread_mutex_lock(&ctx->lock);
-	if (moveon && ctx->errors_found == 0)
+	if (moveon && ctx->errors_found == 0) {
 		fstrim(ctx);
+		progress_add(1);
+	}
 	pthread_mutex_unlock(&ctx->lock);
 
 	return moveon;
 }
+
+/* Estimate how much work we're going to do. */
+bool
+xfs_estimate_repair_work(
+	struct scrub_ctx	*ctx,
+	uint64_t		*items,
+	unsigned int		*nr_threads,
+	int			*rshift)
+{
+	*items = 1;
+	*nr_threads = 1;
+	*rshift = 0;
+	return true;
+}
diff --git a/scrub/phase5.c b/scrub/phase5.c
index ed89266..efa27cb 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -34,6 +34,7 @@
 #include "xfs_scrub.h"
 #include "common.h"
 #include "inodes.h"
+#include "progress.h"
 #include "scrub.h"
 #include "unicrash.h"
 
@@ -287,6 +288,7 @@ xfs_scrub_connections(
         }
 
 out:
+	progress_add(1);
 	if (fd >= 0)
 		close(fd);
 	if (error)
diff --git a/scrub/phase6.c b/scrub/phase6.c
index e6d9c69..37f2ebe 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -33,6 +33,7 @@
 #include "bitmap.h"
 #include "disk.h"
 #include "filemap.h"
+#include "fscounters.h"
 #include "inodes.h"
 #include "read_verify.h"
 #include "spacemap.h"
@@ -514,3 +515,30 @@ _("Could not create media verifier."));
 	ptvar_free(ve.rvstate);
 	return moveon;
 }
+
+/* Estimate how much work we're going to do. */
+bool
+xfs_estimate_verify_work(
+	struct scrub_ctx	*ctx,
+	uint64_t		*items,
+	unsigned int		*nr_threads,
+	int			*rshift)
+{
+	unsigned long long	d_blocks;
+	unsigned long long	d_bfree;
+	unsigned long long	r_blocks;
+	unsigned long long	r_bfree;
+	unsigned long long	f_files;
+	unsigned long long	f_free;
+	bool			moveon;
+
+	moveon = xfs_scan_estimate_blocks(ctx, &d_blocks, &d_bfree,
+				&r_blocks, &r_bfree, &f_files, &f_free);
+	if (!moveon)
+		return moveon;
+
+	*items = ((d_blocks - d_bfree) + (r_blocks - r_bfree)) << ctx->blocklog;
+	*nr_threads = disk_heads(ctx->datadev);
+	*rshift = 20;
+	return moveon;
+}
diff --git a/scrub/progress.c b/scrub/progress.c
new file mode 100644
index 0000000..26920e0
--- /dev/null
+++ b/scrub/progress.c
@@ -0,0 +1,221 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "libxfs.h"
+#include <stdio.h>
+#include <dirent.h>
+#include <pthread.h>
+#include <sys/statvfs.h>
+#include "../repair/threads.h"
+#include "path.h"
+#include "disk.h"
+#include "read_verify.h"
+#include "xfs_scrub.h"
+#include "common.h"
+#include "counter.h"
+#include "progress.h"
+
+/*
+ * Progress Tracking
+ *
+ * For scrub phases that expect to take a long time, this facility uses
+ * the threaded counter and some phase/state information to report the
+ * progress of a particular phase to stdout.  Each phase that wants
+ * progress information needs to set up the tracker with an estimate of
+ * the work to be done and periodic updates when work items finish.  In
+ * return, the progress tracker will print a pretty progress bar and
+ * twiddle to a tty, or a raw numeric output compatible with fsck -C.
+ */
+struct progress_tracker {
+	FILE			*fp;
+	const char		*tag;
+	struct ptcounter	*ptc;
+	uint64_t		max;
+	unsigned int		phase;
+	int			rshift;
+	int			twiddle;
+	bool			isatty;
+	bool			terminate;
+	pthread_t		thread;
+
+	/* static state */
+	pthread_mutex_t		lock;
+	pthread_cond_t		wakeup;
+};
+
+static struct progress_tracker pt = {
+	.lock			= PTHREAD_MUTEX_INITIALIZER,
+	.wakeup			= PTHREAD_COND_INITIALIZER,
+};
+
+/* Add some progress. */
+void
+progress_add(
+	uint64_t		x)
+{
+	if (pt.fp)
+		ptcounter_add(pt.ptc, x);
+}
+
+static const char twiddles[] = "|/-\\";
+
+static void
+progress_report(
+	uint64_t		sum)
+{
+	char			buf[80];
+	int			tag_len;
+	int			num_len;
+	int			pbar_len;
+	int			plen;
+
+	if (!pt.fp)
+		return;
+
+	if (sum > pt.max)
+		sum = pt.max;
+
+	/* Emulate fsck machine-readable output (phase, cur, max, label) */
+	if (!pt.isatty) {
+		snprintf(buf, sizeof(buf), _("%u %"PRIu64" %"PRIu64" %s"),
+				pt.phase, sum, pt.max, pt.tag);
+		fprintf(pt.fp, "%s\n", buf);
+		fflush(pt.fp);
+		return;
+	}
+
+	/* Interactive twiddle progress bar. */
+	tag_len = snprintf(buf, sizeof(buf), _("Phase %u: |"), pt.phase);
+	num_len = snprintf(buf, sizeof(buf),
+			"%c %"PRIu64"/%"PRIu64" (%.1f%%)",
+			twiddles[pt.twiddle],
+			sum >> pt.rshift,
+			pt.max >> pt.rshift,
+			100.0 * sum / pt.max) + 1;
+	pbar_len = sizeof(buf) - (num_len + tag_len);
+	snprintf(buf, sizeof(buf), _("Phase %u: |"), pt.phase);
+	snprintf(buf + sizeof(buf) - num_len, num_len,
+			"%c %"PRIu64"/%"PRIu64" (%.1f%%)",
+			twiddles[pt.twiddle],
+			sum >> pt.rshift,
+			pt.max >> pt.rshift,
+			100.0 * sum / pt.max);
+	plen = (int)((double)pbar_len * sum / pt.max);
+	memset(buf + tag_len, '=', plen);
+	memset(buf + tag_len + plen, ' ', pbar_len - plen);
+	pt.twiddle = (pt.twiddle + 1) % 4;
+	fprintf(pt.fp, "%c%s\r%c", START_IGNORE, buf, END_IGNORE);
+	fflush(pt.fp);
+}
+
+#define NSEC_PER_SEC	(1000000000)
+static void *
+progress_report_thread(void *arg)
+{
+	struct timespec		abstime;
+	int			ret;
+
+	pthread_mutex_lock(&pt.lock);
+	while (1) {
+		/* Every half second. */
+		ret = clock_gettime(CLOCK_REALTIME, &abstime);
+		if (ret)
+			break;
+		abstime.tv_nsec += NSEC_PER_SEC / 2;
+		if (abstime.tv_nsec >= NSEC_PER_SEC) {
+			abstime.tv_sec++;
+			abstime.tv_nsec -= NSEC_PER_SEC;
+		}
+		pthread_cond_timedwait(&pt.wakeup, &pt.lock, &abstime);
+		if (pt.terminate)
+			break;
+		progress_report(ptcounter_value(pt.ptc));
+	}
+	pthread_mutex_unlock(&pt.lock);
+	return NULL;
+}
+
+/* End a phase of progress reporting. */
+void
+progress_end_phase(void)
+{
+	if (!pt.fp)
+		return;
+
+	pthread_mutex_lock(&pt.lock);
+	pt.terminate = true;
+	pthread_mutex_unlock(&pt.lock);
+	pthread_cond_broadcast(&pt.wakeup);
+	pthread_join(pt.thread, NULL);
+
+	progress_report(pt.max);
+	ptcounter_free(pt.ptc);
+	pt.max = 0;
+	pt.ptc = NULL;
+	if (pt.fp) {
+		fprintf(pt.fp, CLEAR_EOL);
+		fflush(pt.fp);
+	}
+	pt.fp = NULL;
+}
+
+/* Set ourselves up to report progress. */
+bool
+progress_init_phase(
+	struct scrub_ctx	*ctx,
+	FILE			*fp,
+	unsigned int		phase,
+	uint64_t		max,
+	int			rshift,
+	unsigned int		nr_threads)
+{
+	int			ret;
+
+	assert(pt.fp == NULL);
+	if (fp == NULL || max == 0) {
+		pt.fp = NULL;
+		return true;
+	}
+	pt.fp = fp;
+	pt.isatty = isatty(fileno(fp));
+	pt.tag = ctx->mntpoint;
+	pt.max = max;
+	pt.phase = phase;
+	pt.rshift = rshift;
+	pt.twiddle = 0;
+	pt.terminate = false;
+
+	pt.ptc = ptcounter_init(nr_threads);
+	if (!pt.ptc)
+		goto out_max;
+
+	ret = pthread_create(&pt.thread, NULL, progress_report_thread, NULL);
+	if (ret)
+		goto out_ptcounter;
+
+	return true;
+
+out_ptcounter:
+	ptcounter_free(pt.ptc);
+	pt.ptc = NULL;
+out_max:
+	pt.max = 0;
+	pt.fp = NULL;
+	return false;
+}
diff --git a/scrub/progress.h b/scrub/progress.h
new file mode 100644
index 0000000..1fbbf77
--- /dev/null
+++ b/scrub/progress.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_PROGRESS_H_
+#define XFS_SCRUB_PROGRESS_H_
+
+#define CLEAR_EOL	"\033[K"
+#define START_IGNORE	'\001'
+#define END_IGNORE	'\002'
+
+bool progress_init_phase(struct scrub_ctx *ctx, FILE *progress_fp,
+			 unsigned int phase, uint64_t max, int rshift,
+			 unsigned int nr_threads);
+void progress_end_phase(void);
+void progress_add(uint64_t x);
+
+#endif /* XFS_SCRUB_PROGRESS_H_ */
diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index b3e79a4..8fdf69b 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -31,6 +31,7 @@
 #include "counter.h"
 #include "disk.h"
 #include "read_verify.h"
+#include "progress.h"
 
 /*
  * Read Verify Pool
@@ -154,6 +155,7 @@ read_verify(
 					errno, rv->io_end_arg);
 		}
 
+		progress_add(len);
 		verified += len;
 		rv->io_start += len;
 		rv->io_length -= len;
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 2ff588c..efb1b86 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -31,6 +31,7 @@
 #include "path.h"
 #include "xfs_scrub.h"
 #include "common.h"
+#include "progress.h"
 #include "scrub.h"
 #include "xfs_errortag.h"
 
@@ -342,6 +343,7 @@ xfs_scrub_metadata(
 
 		/* Check the item. */
 		fix = xfs_check_metadata(ctx, ctx->mnt_fd, &meta, false);
+		progress_add(1);
 		switch (fix) {
 		case CHECK_ABORT:
 			return false;
@@ -384,6 +386,32 @@ xfs_scrub_fs_metadata(
 	return xfs_scrub_metadata(ctx, ST_FS, 0);
 }
 
+/* How many items do we have to check? */
+unsigned int
+xfs_scrub_estimate_ag_work(
+	struct scrub_ctx		*ctx)
+{
+	const struct scrub_descr	*sc;
+	int				type;
+	unsigned int			estimate = 0;
+
+	sc = scrubbers;
+	for (type = 0; type < XFS_SCRUB_TYPE_NR; type++, sc++) {
+		switch (sc->type) {
+		case ST_AGHEADER:
+		case ST_PERAG:
+			estimate += ctx->geo.agcount;
+			break;
+		case ST_FS:
+			estimate++;
+			break;
+		default:
+			break;
+		}
+	}
+	return estimate;
+}
+
 /* Scrub inode metadata. */
 static bool
 __xfs_scrub_file(
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 81ec7e5..3e8021e 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -32,6 +32,7 @@
 #include "xfs_scrub.h"
 #include "common.h"
 #include "unicrash.h"
+#include "progress.h"
 
 /*
  * XFS Online Metadata Scrub (and Repair)
@@ -139,12 +140,17 @@ bool				scrub_data;
 /* Size of a memory page. */
 long				page_size;
 
+/* If stdout/stderr are ttys, we can use richer terminal control. */
+bool				stderr_isatty;
+bool				stdout_isatty;
+
 static void __attribute__((noreturn))
 usage(void)
 {
 	fprintf(stderr, _("Usage: %s [OPTIONS] mountpoint\n"), progname);
 	fprintf(stderr, _("-a:\tStop after this many errors are found.\n"));
 	fprintf(stderr, _("-b:\tBackground mode.\n"));
+	fprintf(stderr, _("-C:\tPrint progress information to this fd.\n"));
 	fprintf(stderr, _("-e:\tWhat to do if errors are found.\n"));
 	fprintf(stderr, _("-m:\tPath to /etc/mtab.\n"));
 	fprintf(stderr, _("-n:\tDry run.  Do not modify anything.\n"));
@@ -219,6 +225,8 @@ struct phase_rusage {
 struct phase_ops {
 	char		*descr;
 	bool		(*fn)(struct scrub_ctx *);
+	bool		(*estimate_work)(struct scrub_ctx *, uint64_t *,
+					 unsigned int *, int *);
 	bool		must_run;
 };
 
@@ -357,7 +365,8 @@ _("Errors found, please re-run with -y."));
 /* Run all the phases of the scrubber. */
 static bool
 run_scrub_phases(
-	struct scrub_ctx	*ctx)
+	struct scrub_ctx	*ctx,
+	FILE			*progress_fp)
 {
 	struct phase_ops phases[] =
 	{
@@ -369,22 +378,27 @@ run_scrub_phases(
 		{
 			.descr = _("Check internal metadata."),
 			.fn = xfs_scan_metadata,
+			.estimate_work = xfs_estimate_metadata_work,
 		},
 		{
 			.descr = _("Scan all inodes."),
 			.fn = xfs_scan_inodes,
+			.estimate_work = xfs_estimate_inodes_work,
 		},
 		{
 			.descr = _("Defer filesystem repairs."),
 			.fn = REPAIR_DUMMY_FN,
+			.estimate_work = xfs_estimate_repair_work,
 		},
 		{
 			.descr = _("Check directory tree."),
 			.fn = xfs_scan_connections,
+			.estimate_work = xfs_estimate_inodes_work,
 		},
 		{
 			.descr = _("Verify data file integrity."),
 			.fn = DATASCAN_DUMMY_FN,
+			.estimate_work = xfs_estimate_verify_work,
 		},
 		{
 			.descr = _("Check summary counters."),
@@ -397,9 +411,12 @@ run_scrub_phases(
 	};
 	struct phase_rusage	pi;
 	struct phase_ops	*sp;
+	uint64_t		max_work;
 	bool			moveon = true;
 	unsigned int		debug_phase = 0;
 	unsigned int		phase;
+	unsigned int		nr_threads;
+	int			rshift;
 
 	if (debug && debug_tweak_on("XFS_SCRUB_PHASE"))
 		debug_phase = atoi(getenv("XFS_SCRUB_PHASE"));
@@ -433,6 +450,18 @@ run_scrub_phases(
 		moveon = phase_start(&pi, phase, sp->descr);
 		if (!moveon)
 			break;
+		if (sp->estimate_work) {
+			moveon = sp->estimate_work(ctx, &max_work, &nr_threads,
+					&rshift);
+			if (!moveon)
+				break;
+			moveon = progress_init_phase(ctx, progress_fp, phase,
+					max_work, rshift, nr_threads);
+		} else {
+			moveon = progress_init_phase(ctx, NULL, phase, 0, 0, 0);
+		}
+		if (!moveon)
+			break;
 		moveon = sp->fn(ctx);
 		if (!moveon) {
 			str_info(ctx, ctx->mntpoint,
@@ -440,6 +469,7 @@ _("Scrub aborted after phase %d."),
 					phase);
 			break;
 		}
+		progress_end_phase();
 		moveon = phase_end(&pi, phase);
 		if (!moveon)
 			break;
@@ -461,6 +491,7 @@ main(
 	int			c;
 	char			*mtab = NULL;
 	char			*repairstr = "";
+	FILE			*progress_fp = NULL;
 	struct scrub_ctx	ctx = {0};
 	struct phase_rusage	all_pi;
 	unsigned long long	total_errors;
@@ -477,7 +508,7 @@ main(
 	pthread_mutex_init(&ctx.lock, NULL);
 	ctx.mode = SCRUB_MODE_DEFAULT;
 	ctx.error_action = ERRORS_CONTINUE;
-	while ((c = getopt(argc, argv, "a:bde:m:nTvxVy")) != EOF) {
+	while ((c = getopt(argc, argv, "a:bC:de:m:nTvxVy")) != EOF) {
 		switch (c) {
 		case 'a':
 			ctx.max_errors = cvt_u64(optarg, 10);
@@ -490,6 +521,19 @@ main(
 			nr_threads = 1;
 			bg_mode++;
 			break;
+		case 'C':
+			errno = 0;
+			ret = cvt_u32(optarg, 10);
+			if (errno) {
+				perror(optarg);
+				usage();
+			}
+			progress_fp = fdopen(ret, "w");
+			if (!progress_fp) {
+				perror(optarg);
+				usage();
+			}
+			break;
 		case 'd':
 			debug++;
 			dumpcore = true;
@@ -560,6 +604,13 @@ _("Only one of the options -n or -y may be specified.\n"));
 	unicrash_setup();
 	ctx.mntpoint = strdup(argv[optind]);
 
+	stdout_isatty = isatty(STDOUT_FILENO);
+	stderr_isatty = isatty(STDERR_FILENO);
+
+	/* If interactive, start the progress bar. */
+	if (stdout_isatty && !progress_fp)
+		progress_fp = fdopen(1, "w+");
+
 	/* Find the mount record for the passed-in argument. */
 	if (stat(argv[optind], &ctx.mnt_sb) < 0) {
 		fprintf(stderr,
@@ -614,7 +665,7 @@ _("Only one of the options -n or -y may be specified.\n"));
 	}
 
 	/* Scrub a filesystem. */
-	moveon = run_scrub_phases(&ctx);
+	moveon = run_scrub_phases(&ctx, progress_fp);
 	if (!moveon)
 		ret |= 4;
 
@@ -656,6 +707,8 @@ _("%s: %llu warnings found.\n"),
 	if (ctx.runtime_errors)
 		ret |= 4;
 	phase_end(&all_pi, 0);
+	if (progress_fp)
+		fclose(progress_fp);
 	free(ctx.blkdev);
 	free(ctx.mntpoint);
 
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 9a09dcd..da58815 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -31,6 +31,8 @@ extern bool			dumpcore;
 extern bool			verbose;
 extern bool			scrub_data;
 extern long			page_size;
+extern bool			stderr_isatty;
+extern bool			stdout_isatty;
 
 enum scrub_mode {
 	SCRUB_MODE_DRY_RUN,
@@ -110,4 +112,16 @@ bool xfs_scan_blocks(struct scrub_ctx *ctx);
 bool xfs_scan_summary(struct scrub_ctx *ctx);
 bool xfs_repair_fs(struct scrub_ctx *ctx);
 
+/* Progress estimator functions */
+uint64_t xfs_estimate_inodes(struct scrub_ctx *ctx);
+unsigned int xfs_scrub_estimate_ag_work(struct scrub_ctx *ctx);
+bool xfs_estimate_metadata_work(struct scrub_ctx *ctx, uint64_t *items,
+				unsigned int *nr_threads, int *rshift);
+bool xfs_estimate_inodes_work(struct scrub_ctx *ctx, uint64_t *items,
+			      unsigned int *nr_threads, int *rshift);
+bool xfs_estimate_repair_work(struct scrub_ctx *ctx, uint64_t *items,
+			      unsigned int *nr_threads, int *rshift);
+bool xfs_estimate_verify_work(struct scrub_ctx *ctx, uint64_t *items,
+			      unsigned int *nr_threads, int *rshift);
+
 #endif /* XFS_SCRUB_XFS_SCRUB_H_ */



* [PATCH 26/27] xfs_scrub: create a script to scrub all xfs filesystems
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (24 preceding siblings ...)
  2017-11-17 21:02 ` [PATCH 25/27] xfs_scrub: progress indicator Darrick J. Wong
@ 2017-11-17 21:02 ` Darrick J. Wong
  2017-11-17 21:02 ` [PATCH 27/27] xfs_scrub: integrate services with systemd Darrick J. Wong
  2017-11-17 21:52 ` [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Martin Steigerwald
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:02 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create an xfs_scrub_all command to find all XFS filesystems
and run an online scrub against them all.
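
To illustrate the scheduling rule (scrubs run in parallel only when they share no physical device), here is a minimal sketch. It is not the code in this patch: the helper name pick_runnable and the sample mount table are hypothetical, and the real script drives threads and a condition variable around the same check.

```python
def pick_runnable(pending, running_devs):
    """Start every pending mount whose devices are all idle.

    pending: dict mapping mountpoint -> set of device names (mutated)
    running_devs: set of device names currently being scrubbed (mutated)
    Returns the list of mountpoints that may start now.
    """
    started = []
    for mnt in sorted(pending):
        devs = pending[mnt]
        # A mount is runnable only if none of its devices is busy.
        if devs.isdisjoint(running_devs):
            running_devs |= devs
            started.append(mnt)
    # Remove the started mounts after the loop so we never mutate
    # the dict while iterating over it.
    for mnt in started:
        del pending[mnt]
    return started


# Hypothetical mount table: "/" and "/home" share one disk.
pending = {
    "/": {"sda"},
    "/home": {"sda"},
    "/srv": {"sdb"},
}
running = set()
first_wave = pick_runnable(pending, running)
```

In this sketch "/home" stays in pending until the scrub of "/" finishes and "sda" is removed from the running set, at which point the caller would invoke the helper again.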

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 debian/control           |    3 +
 debian/rules             |    1 
 man/man8/xfs_scrub_all.8 |   32 ++++++++++
 scrub/Makefile           |   15 ++++
 scrub/xfs_scrub_all.in   |  154 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 201 insertions(+), 4 deletions(-)
 create mode 100644 man/man8/xfs_scrub_all.8
 create mode 100644 scrub/xfs_scrub_all.in


diff --git a/debian/control b/debian/control
index 25b8594..5a300d5 100644
--- a/debian/control
+++ b/debian/control
@@ -3,12 +3,13 @@ Section: admin
 Priority: optional
 Maintainer: XFS Development Team <linux-xfs@vger.kernel.org>
 Uploaders: Nathan Scott <nathans@debian.org>, Anibal Monsalve Salazar <anibal@debian.org>
-Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev, libattr1-dev, libunistring-dev
+Build-Depends: uuid-dev, dh-autoreconf, debhelper (>= 5), gettext, libtool, libreadline-gplv2-dev | libreadline5-dev, libblkid-dev (>= 2.17), linux-libc-dev, libattr1-dev, libunistring-dev, dh-python
 Standards-Version: 3.9.1
 Homepage: http://xfs.org/
 
 Package: xfsprogs
 Depends: ${shlibs:Depends}, ${misc:Depends}
+Recommends: ${python3:Depends}, util-linux
 Provides: fsck-backend
 Suggests: xfsdump, acl, attr, quota
 Breaks: xfsdump (<< 3.0.0)
diff --git a/debian/rules b/debian/rules
index baefdba..abb794e 100755
--- a/debian/rules
+++ b/debian/rules
@@ -76,6 +76,7 @@ binary-arch: checkroot built
 	$(pkgdi)  $(MAKE) -C debian install-d-i
 	$(pkgme)  $(MAKE) dist
 	rmdir debian/xfslibs-dev/usr/share/doc/xfsprogs
+	dh_python3
 	dh_installdocs
 	dh_installchangelogs
 	dh_strip
diff --git a/man/man8/xfs_scrub_all.8 b/man/man8/xfs_scrub_all.8
new file mode 100644
index 0000000..5e1420b
--- /dev/null
+++ b/man/man8/xfs_scrub_all.8
@@ -0,0 +1,32 @@
+.TH xfs_scrub_all 8
+.SH NAME
+xfs_scrub_all \- scrub all mounted XFS filesystems
+.SH SYNOPSIS
+.B xfs_scrub_all
+.SH DESCRIPTION
+.B xfs_scrub_all
+attempts to read and check all the metadata on all mounted XFS filesystems.
+The online scrub is performed via the
+.B xfs_scrub
+tool, either by running it directly or by using systemd to start it
+in a restricted fashion.
+Mounted filesystems are mapped to physical storage devices so that scrub
+operations can be run in parallel so long as no two scrubbers access
+the same device simultaneously.
+.SH EXIT CODE
+The exit code returned by
+.B xfs_scrub_all
+is the sum of the following conditions:
+.br
+\	0\	\-\ No errors
+.br
+\	4\	\-\ File system errors left uncorrected
+.br
+\	8\	\-\ Operational error
+.br
+\	16\	\-\ Usage or syntax error
+.PP
+These are the same error codes returned by xfs_scrub.
+.br
+.SH SEE ALSO
+.BR xfs_scrub (8).
diff --git a/scrub/Makefile b/scrub/Makefile
index 7d52e7d..efd1c1b 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -13,6 +13,8 @@ SCRUB_PREREQS=$(PKG_PLATFORM)$(HAVE_OPENAT)$(HAVE_FSTATAT)
 ifeq ($(SCRUB_PREREQS),linuxyesyes)
 LTCOMMAND = xfs_scrub
 INSTALL_SCRUB = install-scrub
+XFS_SCRUB_ALL_PROG = xfs_scrub_all
+XFS_SCRUB_ARGS = -b -n
 endif	# scrub_prereqs
 
 HFILES = \
@@ -82,17 +84,24 @@ ifeq ($(HAVE_HDIO_GETGEO),yes)
 LCFLAGS += -DHAVE_HDIO_GETGEO
 endif
 
-default: depend $(LTCOMMAND)
+default: depend $(LTCOMMAND) $(XFS_SCRUB_ALL_PROG)
+
+xfs_scrub_all: xfs_scrub_all.in
+	@echo "    [SED]    $@"
+	$(Q)$(SED) -e "s|@sbindir@|$(PKG_ROOT_SBIN_DIR)|g" \
+		   -e "s|@scrub_args@|$(XFS_SCRUB_ARGS)|g" < $< > $@
+	$(Q)chmod a+x $@
 
 phase5.o unicrash.o xfs.o: $(TOPDIR)/include/builddefs
 
 include $(BUILDRULES)
 
-install: default $(INSTALL_SCRUB)
+install: $(INSTALL_SCRUB)
 
-install-scrub:
+install-scrub: default
 	$(INSTALL) -m 755 -d $(PKG_ROOT_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_ROOT_SBIN_DIR)
+	$(INSTALL) -m 755 $(XFS_SCRUB_ALL_PROG) $(PKG_ROOT_SBIN_DIR)
 
 install-dev:
 
diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
new file mode 100644
index 0000000..ab0981d
--- /dev/null
+++ b/scrub/xfs_scrub_all.in
@@ -0,0 +1,154 @@
+#!/usr/bin/env python3
+
+# Run online scrubbers in parallel, but avoid thrashing.
+#
+# Copyright (C) 2017 Oracle.  All rights reserved.
+#
+# Author: Darrick J. Wong <darrick.wong@oracle.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+
+import subprocess
+import json
+import threading
+import time
+import sys
+
+retcode = 0
+terminate = False
+
+def find_mounts():
+	'''Map mountpoints to physical disks.'''
+
+	fs = {}
+	cmd=['lsblk', '-o', 'KNAME,TYPE,FSTYPE,MOUNTPOINT', '-J']
+	result = subprocess.Popen(cmd, stdout=subprocess.PIPE)
+	result.wait()
+	if result.returncode != 0:
+		return fs
+	sarray = [x.decode('utf-8') for x in result.stdout.readlines()]
+	output = ' '.join(sarray)
+	bdevdata = json.loads(output)
+	# The lsblk output had better be in disks-then-partitions order
+	for bdev in bdevdata['blockdevices']:
+		if bdev['type'] in ('disk', 'loop'):
+			lastdisk = bdev['kname']
+		if bdev['fstype'] == 'xfs':
+			mnt = bdev['mountpoint']
+			if mnt is None:
+				continue
+			if mnt in fs:
+				fs[mnt].add(lastdisk)
+			else:
+				fs[mnt] = set([lastdisk])
+	return fs
+
+def run_killable(cmd, stdout, killfuncs, kill_fn):
+	'''Run a killable program.  Returns program retcode or -1 if we can't start it.'''
+	try:
+		proc = subprocess.Popen(cmd, stdout = stdout)
+		real_kill_fn = lambda: kill_fn(proc)
+		killfuncs.add(real_kill_fn)
+		proc.wait()
+		try:
+			killfuncs.remove(real_kill_fn)
+		except:
+			pass
+		return proc.returncode
+	except:
+		return -1
+
+def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
+	'''Run a scrub process.'''
+	global retcode, terminate
+
+	print("Scrubbing %s..." % mnt)
+	sys.stdout.flush()
+
+	try:
+		if terminate:
+			return
+
+		# Invoke xfs_scrub manually
+		cmd=['@sbindir@/xfs_scrub'] + '@scrub_args@'.split() + [mnt]
+		ret = run_killable(cmd, None, killfuncs, \
+				lambda proc: proc.terminate())
+		if ret >= 0:
+			print("Scrubbing %s done, (err=%d)" % (mnt, ret))
+			sys.stdout.flush()
+			retcode |= ret
+			return
+
+		if terminate:
+			return
+
+		print("Unable to start scrub tool.")
+		sys.stdout.flush()
+	finally:
+		running_devs -= mntdevs
+		cond.acquire()
+		cond.notify()
+		cond.release()
+
+def main():
+	'''Find mounts, schedule scrub runs.'''
+	def thr(mnt, devs):
+		a = (mnt, cond, running_devs, devs, killfuncs)
+		thr = threading.Thread(target = run_scrub, args = a)
+		thr.start()
+	global retcode, terminate
+
+	fs = find_mounts()
+
+	# Schedule scrub jobs...
+	running_devs = set()
+	killfuncs = set()
+	cond = threading.Condition()
+	while len(fs) > 0:
+		if len(running_devs) == 0:
+			mnt, devs = fs.popitem()
+			running_devs.update(devs)
+			thr(mnt, devs)
+		poppers = set()
+		for mnt in fs:
+			devs = fs[mnt]
+			can_run = True
+			for dev in devs:
+				if dev in running_devs:
+					can_run = False
+					break
+			if can_run:
+				running_devs.update(devs)
+				poppers.add(mnt)
+				thr(mnt, devs)
+		for p in poppers:
+			fs.pop(p)
+		cond.acquire()
+		try:
+			cond.wait()
+		except KeyboardInterrupt:
+			terminate = True
+			print("Terminating...")
+			sys.stdout.flush()
+			while len(killfuncs) > 0:
+				fn = killfuncs.pop()
+				fn()
+			fs = []
+		cond.release()
+
+	sys.exit(retcode)
+
+if __name__ == '__main__':
+	main()



* [PATCH 27/27] xfs_scrub: integrate services with systemd
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (25 preceding siblings ...)
  2017-11-17 21:02 ` [PATCH 26/27] xfs_scrub: create a script to scrub all xfs filesystems Darrick J. Wong
@ 2017-11-17 21:02 ` Darrick J. Wong
  2017-11-17 21:52 ` [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Martin Steigerwald
  27 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-17 21:02 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Create a systemd service unit so that we can run the online scrubber
under systemd with (somewhat) appropriate containment.
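
When running as a service (SERVICE_MODE set in the environment), a nonzero exit code is bumped by 150 so that it cannot be confused with the conventional sysvinit service return codes. A minimal sketch of that mapping follows; the function and constant names are hypothetical, and the patch itself does this inline at the end of main().

```python
SERVICE_RC_OFFSET = 150  # offset applied by the patch in service mode


def service_exit_code(ret, is_service):
    """Map a scrub return code to the code reported to the caller.

    Success (0) is never changed; failures are shifted by the offset
    only when running as a service.
    """
    if is_service and ret:
        return ret + SERVICE_RC_OFFSET
    return ret


clean = service_exit_code(0, True)          # success stays 0
uncorrected = service_exit_code(4, True)    # 4 becomes 154 under a service
interactive = service_exit_code(4, False)   # unchanged outside a service
```

Leaving zero untouched matters because systemd treats any nonzero exit as a unit failure, which is what triggers the OnFailure= handler.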

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 .gitignore                       |    4 +++
 configure.ac                     |   15 +++++++++++
 include/builddefs.in             |    3 ++
 scrub/Makefile                   |   32 ++++++++++++++++++++++-
 scrub/xfs_scrub.c                |   25 ++++++++++++++++++
 scrub/xfs_scrub@.service.in      |   18 +++++++++++++
 scrub/xfs_scrub_all.cron.in      |    2 +
 scrub/xfs_scrub_all.in           |   53 ++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub_all.service.in   |    8 ++++++
 scrub/xfs_scrub_all.timer        |   11 ++++++++
 scrub/xfs_scrub_fail             |   26 +++++++++++++++++++
 scrub/xfs_scrub_fail@.service.in |   10 +++++++
 12 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 scrub/xfs_scrub@.service.in
 create mode 100644 scrub/xfs_scrub_all.cron.in
 create mode 100644 scrub/xfs_scrub_all.service.in
 create mode 100644 scrub/xfs_scrub_all.timer
 create mode 100755 scrub/xfs_scrub_fail
 create mode 100644 scrub/xfs_scrub_fail@.service.in


diff --git a/.gitignore b/.gitignore
index a3db640..d887451 100644
--- a/.gitignore
+++ b/.gitignore
@@ -69,6 +69,10 @@ cscope.*
 /rtcp/xfs_rtcp
 /spaceman/xfs_spaceman
 /scrub/xfs_scrub
+/scrub/xfs_scrub@.service
+/scrub/xfs_scrub_all
+/scrub/xfs_scrub_all.service
+/scrub/xfs_scrub_fail@.service
 
 # generated crc files
 /libxfs/crc32selftest
diff --git a/configure.ac b/configure.ac
index 0a2e7f3..2371440 100644
--- a/configure.ac
+++ b/configure.ac
@@ -121,6 +121,21 @@ esac
 AC_SUBST([root_sbindir])
 AC_SUBST([root_libdir])
 
+# Where do systemd services go?
+pkg_systemdsystemunitdir="$(pkg-config --variable=systemdsystemunitdir systemd 2>/dev/null)"
+case "${pkg_systemdsystemunitdir}" in
+"")
+	systemdsystemunitdir=""
+	have_systemd=no
+	;;
+*)
+	systemdsystemunitdir="${pkg_systemdsystemunitdir}"
+	have_systemd=yes
+	;;
+esac
+AC_SUBST([have_systemd])
+AC_SUBST([systemdsystemunitdir])
+
 # Find localized files.  Don't descend into any "dot directories"
 # (like .git or .pc from quilt).  Strangely, the "-print" argument
 # to "find" is required, to avoid including such directories in the
diff --git a/include/builddefs.in b/include/builddefs.in
index 0e358d0..984e424 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -124,6 +124,9 @@ HAVE_FSTATAT = @have_fstatat@
 HAVE_SG_IO = @have_sg_io@
 HAVE_HDIO_GETGEO = @have_hdio_getgeo@
 
+HAVE_SYSTEMD = @have_systemd@
+SYSTEMDSYSTEMUNITDIR = @systemdsystemunitdir@
+
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
 
diff --git a/scrub/Makefile b/scrub/Makefile
index efd1c1b..7855876 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -15,6 +15,16 @@ LTCOMMAND = xfs_scrub
 INSTALL_SCRUB = install-scrub
 XFS_SCRUB_ALL_PROG = xfs_scrub_all
 XFS_SCRUB_ARGS = -b -n
+ifeq ($(HAVE_SYSTEMD),yes)
+INSTALL_SCRUB += install-systemd
+SYSTEMDSERVICES = xfs_scrub@.service xfs_scrub_all.service xfs_scrub_all.timer xfs_scrub_fail@.service
+endif
+CRONSERVICES = xfs_scrub_all.cron
+CROND_DIR = /etc/cron.d
+
+# Disable all the crontabs for now
+CROND_DIR = $(PKG_LIB_DIR)/$(PKG_NAME)
+
 endif	# scrub_prereqs
 
 HFILES = \
@@ -84,7 +94,8 @@ ifeq ($(HAVE_HDIO_GETGEO),yes)
 LCFLAGS += -DHAVE_HDIO_GETGEO
 endif
 
-default: depend $(LTCOMMAND) $(XFS_SCRUB_ALL_PROG)
+default: depend $(LTCOMMAND) $(XFS_SCRUB_ALL_PROG) $(SYSTEMDSERVICES) \
+	$(CRONSERVICES)
 
 xfs_scrub_all: xfs_scrub_all.in
 	@echo "    [SED]    $@"
@@ -98,10 +109,29 @@ include $(BUILDRULES)
 
 install: $(INSTALL_SCRUB)
 
+%.service: %.service.in
+	@echo "    [SED]    $@"
+	$(Q)$(SED) -e "s|@sbindir@|$(PKG_ROOT_SBIN_DIR)|g" \
+		   -e "s|@scrub_args@|$(XFS_SCRUB_ARGS)|g" \
+		   -e "s|@pkg_lib_dir@|$(PKG_LIB_DIR)|g" \
+		   -e "s|@pkg_name@|$(PKG_NAME)|g" < $< > $@
+
+%.cron: %.cron.in
+	@echo "    [SED]    $@"
+	$(Q)$(SED) -e "s|@sbindir@|$(PKG_ROOT_SBIN_DIR)|g" < $< > $@
+
+install-systemd: default
+	$(INSTALL) -m 755 -d $(SYSTEMDSYSTEMUNITDIR)
+	$(INSTALL) -m 644 $(SYSTEMDSERVICES) $(SYSTEMDSYSTEMUNITDIR)
+	$(INSTALL) -m 755 -d $(PKG_LIB_DIR)/$(PKG_NAME)
+	$(INSTALL) -m 755 xfs_scrub_fail $(PKG_LIB_DIR)/$(PKG_NAME)
+
 install-scrub: default
 	$(INSTALL) -m 755 -d $(PKG_ROOT_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_ROOT_SBIN_DIR)
 	$(INSTALL) -m 755 $(XFS_SCRUB_ALL_PROG) $(PKG_ROOT_SBIN_DIR)
+	$(INSTALL) -m 755 -d $(CROND_DIR)
+	$(INSTALL) -m 644 $(CRONSERVICES) $(CROND_DIR)
 
 install-dev:
 
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 3e8021e..c1b8f94 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -144,6 +144,12 @@ long				page_size;
 bool				stderr_isatty;
 bool				stdout_isatty;
 
+/*
+ * If we are running as a service, we need to be careful about what
+ * error codes we return to the calling process.
+ */
+bool				is_service;
+
 static void __attribute__((noreturn))
 usage(void)
 {
@@ -611,6 +617,9 @@ _("Only one of the options -n or -y may be specified.\n"));
 	if (stdout_isatty && !progress_fp)
 		progress_fp = fdopen(1, "w+");
 
+	if (getenv("SERVICE_MODE"))
+		is_service = true;
+
 	/* Find the mount record for the passed-in argument. */
 	if (stat(argv[optind], &ctx.mnt_sb) < 0) {
 		fprintf(stderr,
@@ -712,5 +721,21 @@ _("%s: %llu warnings found.\n"),
 	free(ctx.blkdev);
 	free(ctx.mntpoint);
 
+	/*
+	 * If we're running as a service, bump return code up by 150 to
+	 * avoid conflicting with (sysvinit) service return codes.
+	 */
+	if (is_service) {
+		/*
+		 * journald queries /proc as part of taking in log
+		 * messages; it uses this information to associate the
+		 * message with systemd units, etc.  This races with
+		 * process exit, so delay that a couple of seconds so
+		 * that we capture the summary outputs in the job log.
+		 */
+		sleep(2);
+		if (ret)
+			ret += 150;
+	}
 	return ret;
 }
diff --git a/scrub/xfs_scrub@.service.in b/scrub/xfs_scrub@.service.in
new file mode 100644
index 0000000..6b6992d
--- /dev/null
+++ b/scrub/xfs_scrub@.service.in
@@ -0,0 +1,18 @@
+[Unit]
+Description=Online XFS Metadata Check for %I
+OnFailure=xfs_scrub_fail@%i.service
+
+[Service]
+Type=oneshot
+WorkingDirectory=%I
+PrivateNetwork=true
+ProtectSystem=full
+ProtectHome=read-only
+PrivateTmp=yes
+AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO
+NoNewPrivileges=yes
+User=nobody
+IOSchedulingClass=idle
+CPUSchedulingPolicy=idle
+Environment=SERVICE_MODE=1
+ExecStart=@sbindir@/xfs_scrub @scrub_args@ %I
diff --git a/scrub/xfs_scrub_all.cron.in b/scrub/xfs_scrub_all.cron.in
new file mode 100644
index 0000000..ec82236
--- /dev/null
+++ b/scrub/xfs_scrub_all.cron.in
@@ -0,0 +1,2 @@
+SERVICE_MODE=1
+10 3 * * 0 root test -e /run/systemd/system || @sbindir@/xfs_scrub_all
diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
index ab0981d..9a9abe5 100644
--- a/scrub/xfs_scrub_all.in
+++ b/scrub/xfs_scrub_all.in
@@ -25,10 +25,19 @@ import json
 import threading
 import time
 import sys
+import os
 
 retcode = 0
 terminate = False
 
+def DEVNULL():
+	'''Return /dev/null in subprocess writable format.'''
+	try:
+		from subprocess import DEVNULL
+		return DEVNULL
+	except ImportError:
+		return open(os.devnull, 'wb')
+
 def find_mounts():
 	'''Map mountpoints to physical disks.'''
 
@@ -55,6 +64,13 @@ def find_mounts():
 				fs[mnt] = set([lastdisk])
 	return fs
 
+def kill_systemd(unit, proc):
+	'''Kill systemd unit.'''
+	proc.terminate()
+	cmd=['systemctl', 'stop', unit]
+	x = subprocess.Popen(cmd)
+	x.wait()
+
 def run_killable(cmd, stdout, killfuncs, kill_fn):
 	'''Run a killable program.  Returns program retcode or -1 if we can't start it.'''
 	try:
@@ -81,6 +97,19 @@ def run_scrub(mnt, cond, running_devs, mntdevs, killfuncs):
 		if terminate:
 			return
 
+		# Try it the systemd way
+		cmd=['systemctl', 'start', 'xfs_scrub@%s' % mnt]
+		ret = run_killable(cmd, DEVNULL(), killfuncs, \
+				lambda proc: kill_systemd('xfs_scrub@%s' % mnt, proc))
+		if ret == 0 or ret == 1:
+			print("Scrubbing %s done, (err=%d)" % (mnt, ret))
+			sys.stdout.flush()
+			retcode |= ret
+			return
+
+		if terminate:
+			return
+
 		# Invoke xfs_scrub manually
 		cmd=['@sbindir@/xfs_scrub'] + '@scrub_args@'.split() + [mnt]
 		ret = run_killable(cmd, None, killfuncs, \
@@ -112,6 +141,17 @@ def main():
 
 	fs = find_mounts()
 
+	# Tail the journal if we ourselves aren't a service...
+	journalthread = None
+	if 'SERVICE_MODE' not in os.environ:
+		try:
+			cmd=['journalctl', '--no-pager', '-q', '-S', 'now', \
+					'-f', '-u', 'xfs_scrub@*', '-o', \
+					'cat']
+			journalthread = subprocess.Popen(cmd)
+		except:
+			pass
+
 	# Schedule scrub jobs...
 	running_devs = set()
 	killfuncs = set()
@@ -148,6 +188,19 @@ def main():
 			fs = []
 		cond.release()
 
+	if journalthread is not None:
+		journalthread.terminate()
+
+	# journald queries /proc as part of taking in log
+	# messages; it uses this information to associate the
+	# message with systemd units, etc.  This races with
+	# process exit, so delay that a couple of seconds so
+	# that we capture the summary outputs in the job log.
+	if 'SERVICE_MODE' in os.environ:
+		time.sleep(2)
+		if retcode:
+			retcode += 150
+
 	sys.exit(retcode)
 
 if __name__ == '__main__':
diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
new file mode 100644
index 0000000..683804e
--- /dev/null
+++ b/scrub/xfs_scrub_all.service.in
@@ -0,0 +1,8 @@
+[Unit]
+Description=Online XFS Metadata Check for All Filesystems
+ConditionACPower=true
+
+[Service]
+Type=oneshot
+Environment=SERVICE_MODE=1
+ExecStart=@sbindir@/xfs_scrub_all
diff --git a/scrub/xfs_scrub_all.timer b/scrub/xfs_scrub_all.timer
new file mode 100644
index 0000000..2e4a33b
--- /dev/null
+++ b/scrub/xfs_scrub_all.timer
@@ -0,0 +1,11 @@
+[Unit]
+Description=Periodic XFS Online Metadata Check for All Filesystems
+
+[Timer]
+# Run on Sunday at 3:10am, to avoid running afoul of DST changes
+OnCalendar=Sun *-*-* 03:10:00
+RandomizedDelaySec=60
+Persistent=true
+
+[Install]
+WantedBy=timers.target
diff --git a/scrub/xfs_scrub_fail b/scrub/xfs_scrub_fail
new file mode 100755
index 0000000..36dd50e
--- /dev/null
+++ b/scrub/xfs_scrub_fail
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+# Email logs of failed xfs_scrub unit runs
+
+mailer=/usr/sbin/sendmail
+recipient="$1"
+test -z "${recipient}" && exit 0
+mntpoint="$2"
+test -z "${mntpoint}" && exit 0
+hostname="$(hostname -f 2>/dev/null)"
+test -z "${hostname}" && hostname="${HOSTNAME}"
+if [ ! -x "${mailer}" ]; then
+	echo "${mailer}: Mailer program not found."
+	exit 1
+fi
+
+(cat << ENDL
+To: $1
+From: <xfs_scrub@${hostname}>
+Subject: xfs_scrub failure on ${mntpoint}
+
+So sorry, the automatic xfs_scrub of ${mntpoint} on ${hostname} failed.
+
+A log of what happened follows:
+ENDL
+systemctl status --full --lines 4294967295 "xfs_scrub@${mntpoint}") | "${mailer}" -t -i
diff --git a/scrub/xfs_scrub_fail@.service.in b/scrub/xfs_scrub_fail@.service.in
new file mode 100644
index 0000000..785f881
--- /dev/null
+++ b/scrub/xfs_scrub_fail@.service.in
@@ -0,0 +1,10 @@
+[Unit]
+Description=Online XFS Metadata Check Failure Reporting for %I
+
+[Service]
+Type=oneshot
+Environment=EMAIL_ADDR=root
+ExecStart=@pkg_lib_dir@/@pkg_name@/xfs_scrub_fail "${EMAIL_ADDR}" %I
+User=mail
+Group=mail
+SupplementaryGroups=systemd-journal



* Re: [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support
  2017-11-17 21:00 [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Darrick J. Wong
                   ` (26 preceding siblings ...)
  2017-11-17 21:02 ` [PATCH 27/27] xfs_scrub: integrate services with systemd Darrick J. Wong
@ 2017-11-17 21:52 ` Martin Steigerwald
  2017-11-20 17:30   ` Darrick J. Wong
  27 siblings, 1 reply; 30+ messages in thread
From: Martin Steigerwald @ 2017-11-17 21:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

Hello Darrick.

Darrick J. Wong - 17.11.17, 22:00:
> If you're going to start using this mess, you probably ought to just
> pull from my git tree for xfsprogs[1].  This series relies on the
> libfrog patches sent earlier.  Kernel support will appear in 4.15-rc1.

No extraordinary way to eat your data?

I am a tad bit disappointed.


Okay, jokes aside: Thank you very much for this work on XFS. Sometimes I would 
not be surprised if you at one point would even have a go at implementing 
snapshots for XFS. (Did he actually say the snapshot word. He didn´t, or did 
he?)

Thanks,
-- 
Martin


* Re: [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support
  2017-11-17 21:52 ` [PATCH v10 00/27] xfsprogs-4.15: online scrub/repair support Martin Steigerwald
@ 2017-11-20 17:30   ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2017-11-20 17:30 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: sandeen, linux-xfs

On Fri, Nov 17, 2017 at 10:52:47PM +0100, Martin Steigerwald wrote:
> Hello Darrick.
> 
> Darrick J. Wong - 17.11.17, 22:00:
> > If you're going to start using this mess, you probably ought to just
> > pull from my git tree for xfsprogs[1].  This series relies on the
> > libfrog patches sent earlier.  Kernel support will appear in 4.15-rc1.
> 
> No extraordinary way to eat your data?
> 
> I am a tad bit disappointed.

Please have patience until I get around to posting online *repair*. :)

> Okay, jokes aside: Thank you very much for this work on XFS. Sometimes I would 
> not be surprised if you at one point would even have a go at implementing 
> snapshots for XFS. (Did he actually say the snapshot word. He didn´t, or did 
> he?)

/me wonders if amir will get there first via overlayfs...

--D

> Thanks,
> -- 
> Martin

