From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 5/5] repair: limit auto-striding concurrency appropriately
Date: Thu, 12 Dec 2013 18:22:25 +1100
Message-ID: <1386832945-19763-6-git-send-email-david@fromorbit.com>
In-Reply-To: <1386832945-19763-1-git-send-email-david@fromorbit.com>
From: Dave Chinner <dchinner@redhat.com>

It's possible to have filesystems with hundreds of AGs on systems
with few CPUs and little memory. In this case, we can easily
exhaust memory, fail to create threads and have all sorts of
interesting problems.

xfs/250 can cause this to occur, with failures like:

        - agno = 707
        - agno = 692
fatal error -- cannot create worker threads, error = [11] Resource temporarily unavailable

And this:

        - agno = 484
        - agno = 782
failed to create prefetch thread: Resource temporarily unavailable

Both happen because xfs_repair is trying to create more threads than
a poor little 512MB single CPU ia32 box can handle.

So, limit concurrency to a maximum of numcpus * 8 to prevent this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
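As an illustration of the limit described above, the following is a
minimal standalone sketch of the stride-doubling clamp, not the
xfs_repair code itself. The helper name limit_stride(), the type
struct stride_plan and the ncpus parameter are hypothetical and exist
only for this example; in the patch the CPU count comes from
platform_nproc(). The guard that treats a CPU count below 1 as 1 is
also an assumption of this sketch and is not something the patch does.

```c
/* Hypothetical result type for this sketch; not part of xfsprogs. */
struct stride_plan {
	int ag_stride;		/* AGs handled per worker thread, 0 = no striding */
	int thread_count;	/* number of worker threads to start */
};

/*
 * Hypothetical helper mirroring the clamp in the patch: start from the
 * requested stride and keep doubling it until the resulting thread
 * count is no larger than ncpus * 8.
 */
static struct stride_plan
limit_stride(int agcount, int ag_stride, int ncpus)
{
	struct stride_plan plan = { 0, 1 };	/* fallback: single thread */
	int max_threads;

	if (agcount <= 0 || ag_stride <= 0)
		return plan;

	/* Assumption of this sketch: never let the limit drop to zero. */
	if (ncpus < 1)
		ncpus = 1;
	max_threads = ncpus * 8;

	plan.ag_stride = ag_stride;
	/* Round up so a partial stride at the end still gets a thread. */
	plan.thread_count = (agcount + ag_stride - 1) / ag_stride;

	while (plan.thread_count > max_threads) {
		plan.ag_stride *= 2;
		plan.thread_count = (agcount + plan.ag_stride - 1) /
						plan.ag_stride;
	}
	return plan;
}
```

Under these assumptions, a filesystem with 783 AGs (an illustrative
figure, chosen only because the logs above show agno values up to 782)
on a single CPU starts at a stride of 15, which would mean 53 threads.
The stride doubles to 30, 60 and then 120, at which point 7 threads fit
under the limit of 8.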
include/libxfs.h | 1 +
libxfs/init.h | 1 -
repair/xfs_repair.c | 18 +++++++++++++++++-
3 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/include/libxfs.h b/include/libxfs.h
index 4bf331c..39e3d85 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -144,6 +144,7 @@ extern void libxfs_device_close (dev_t);
extern int libxfs_device_alignment (void);
extern void libxfs_report(FILE *);
extern void platform_findsizes(char *path, int fd, long long *sz, int *bsz);
+extern int platform_nproc(void);
/* check or write log footer: specify device, log size in blocks & uuid */
typedef xfs_caddr_t (libxfs_get_block_t)(xfs_caddr_t, int, void *);
diff --git a/libxfs/init.h b/libxfs/init.h
index f0b8cb6..112febb 100644
--- a/libxfs/init.h
+++ b/libxfs/init.h
@@ -31,7 +31,6 @@ extern char *platform_findrawpath (char *path);
extern char *platform_findblockpath (char *path);
extern int platform_direct_blockdev (void);
extern int platform_align_blockdev (void);
-extern int platform_nproc(void);
extern unsigned long platform_physmem(void); /* in kilobytes */
extern int platform_has_uuid;
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 7beffcb..0d006ae 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -627,13 +627,29 @@ main(int argc, char **argv)
* to target these for an increase in thread count. Hence a stride value
* of 15 is chosen to ensure we get at least 2 AGs being scanned at once
* on such filesystems.
+ *
+ * Limit the maximum thread count based on the available CPU power. If
+ * we use too many threads, we might run out of memory and CPU power
+ * before we run out of IO concurrency.
*/
if (!ag_stride && glob_agcount >= 16 && do_prefetch)
ag_stride = 15;
if (ag_stride) {
+ int max_threads = platform_nproc() * 8;
+
thread_count = (glob_agcount + ag_stride - 1) / ag_stride;
- thread_init();
+ while (thread_count > max_threads) {
+ ag_stride *= 2;
+ thread_count = (glob_agcount + ag_stride - 1) /
+ ag_stride;
+ }
+ if (thread_count > 0)
+ thread_init();
+ else {
+ thread_count = 1;
+ ag_stride = 0;
+ }
}
if (ag_stride && report_interval) {
--
1.8.4.rc3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs