[Cluster-devel] [RFC PATCH 0/3] Speed up journal head lookup

cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed

* [Cluster-devel] [RFC PATCH 0/3] Speed up journal head lookup
@ 2018-05-29  8:05 Abhi Das
  2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 1/3] gfs2: allow map_journal_extents() to take a journal descriptor as argument Abhi Das
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Abhi Das @ 2018-05-29  8:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patchset uses bio(s) to read in the journal in large chunks
to lookup the journal head. This will replace the existing
bisect method of reading the journal block-by-block to find the
head.

These are RHEL7 patches because it was simpler for me to
test. Upstream versions shouldn't be very dissimilar. I'll port
once I get some feedback on this.

I'm seeing some promising results with some preliminary testing.
I have 5 lock_nolock filesystems saved off using 'gfs2_edit savemeta'
after killing the node mid-IO so as to save unrecovered journals.

All times are in milliseconds. This table below shows the overall
time taken for recovery and the jhead column represents the time
taken to find the jhead alone. Finding the jhead is a big chunk of
the overall time taken for recovery.

		bisect	      | 	bio
	overall		jhead |	overall		jhead
------------------------------|-----------------------
fs1	1702		1642  |	394		325
fs2	761		334   |	706		356
fs3	1150		904   | 1356		1089
fs4	953		596   |	691		343
fs5	1262		1233  |	554		524

Abhi Das (3):
  gfs2: allow map_journal_extents() to take a journal descriptor as
    argument
  gfs2: add timing info for various stages of journal recovery
  gfs2: read journal in large chunks to locate the head

 fs/gfs2/incore.h     |   7 ++-
 fs/gfs2/log.c        |   2 +-
 fs/gfs2/log.h        |   2 +
 fs/gfs2/lops.c       | 115 +++++++++++++++++++++++++++++++++++++++---
 fs/gfs2/lops.h       |   1 +
 fs/gfs2/ops_fstype.c |  10 ++--
 fs/gfs2/recovery.c   | 138 +++++++++------------------------------------------
 fs/gfs2/recovery.h   |   1 +
 8 files changed, 149 insertions(+), 127 deletions(-)

-- 
2.4.11

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Cluster-devel] [RFC PATCH 1/3] gfs2: allow map_journal_extents() to take a journal descriptor as argument
  2018-05-29  8:05 [Cluster-devel] [RFC PATCH 0/3] Speed up journal head lookup Abhi Das
@ 2018-05-29  8:05 ` Abhi Das
  2018-05-29  9:57   ` Steven Whitehouse
  2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 2/3] gfs2: add timing info for various stages of journal recovery Abhi Das
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Abhi Das @ 2018-05-29  8:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This function now maps the extents for the journal whose descriptor
is passed in as argument.

Signed-off-by: Abhi Das <adas@redhat.com>
---
 fs/gfs2/log.h        | 1 +
 fs/gfs2/ops_fstype.c | 5 ++---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
index 92dcbe7..19c93df 100644
--- a/fs/gfs2/log.h
+++ b/fs/gfs2/log.h
@@ -75,4 +75,5 @@ extern int gfs2_logd(void *data);
 extern void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd);
 extern void gfs2_write_revokes(struct gfs2_sbd *sdp);
 
+extern int map_journal_extents(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd);
 #endif /* __LOG_DOT_H__ */
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 228f38e..cf3e366 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -524,9 +524,8 @@ out:
  *       but since it's only done at mount time, I'm not worried about the
  *       time it takes.
  */
-static int map_journal_extents(struct gfs2_sbd *sdp)
+int map_journal_extents(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
 {
-	struct gfs2_jdesc *jd = sdp->sd_jdesc;
 	unsigned int lb;
 	u64 db, prev_db; /* logical block, disk block, prev disk block */
 	struct gfs2_inode *ip = GFS2_I(jd->jd_inode);
@@ -772,7 +771,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
 		atomic_set(&sdp->sd_log_thresh2, 4*sdp->sd_jdesc->jd_blocks/5);
 
 		/* Map the extents for this journal's blocks */
-		map_journal_extents(sdp);
+		map_journal_extents(sdp, sdp->sd_jdesc);
 	}
 	trace_gfs2_log_blocks(sdp, atomic_read(&sdp->sd_log_blks_free));
 
-- 
2.4.11



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [RFC PATCH 2/3] gfs2: add timing info for various stages of journal recovery
  2018-05-29  8:05 [Cluster-devel] [RFC PATCH 0/3] Speed up journal head lookup Abhi Das
  2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 1/3] gfs2: allow map_journal_extents() to take a journal descriptor as argument Abhi Das
@ 2018-05-29  8:05 ` Abhi Das
  2018-05-29  9:59   ` Steven Whitehouse
  2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 3/3] gfs2: read journal in large chunks to locate the head Abhi Das
  2018-05-29  9:57 ` [Cluster-devel] [RFC PATCH 0/3] Speed up journal head lookup Steven Whitehouse
  3 siblings, 1 reply; 8+ messages in thread
From: Abhi Das @ 2018-05-29  8:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Tells you how many milliseconds each stage of journal recovery
takes.

Signed-off-by: Abhi Das <adas@redhat.com>
---
 fs/gfs2/ops_fstype.c |  5 +++++
 fs/gfs2/recovery.c   | 20 ++++++++++++++------
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index cf3e366..fd460c1 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -532,7 +532,9 @@ int map_journal_extents(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
 	struct gfs2_journal_extent *jext = NULL;
 	struct buffer_head bh;
 	int rc = 0;
+	ktime_t start, end;
 
+	start = ktime_get();
 	prev_db = 0;
 
 	for (lb = 0; lb < i_size_read(jd->jd_inode) >> sdp->sd_sb.sb_bsize_shift; lb++) {
@@ -564,6 +566,9 @@ int map_journal_extents(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
 		}
 		prev_db = db;
 	}
+	end = ktime_get();
+	fs_info(sdp, "jid=%u: Journal extent mapped in %lldms\n", jd->jd_jid,
+		ktime_ms_delta(end, start));
 	return rc;
 }
 
diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index 56dea44..4b042db 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -14,6 +14,7 @@
 #include <linux/buffer_head.h>
 #include <linux/gfs2_ondisk.h>
 #include <linux/crc32.h>
+#include <linux/ktime.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -455,12 +456,13 @@ void gfs2_recover_func(struct work_struct *work)
 	struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
 	struct gfs2_log_header_host head;
 	struct gfs2_holder j_gh, ji_gh, t_gh;
-	unsigned long t;
+	ktime_t t_start, t_jlck, t_jhd, t_tlck, t_rep;
 	int ro = 0;
 	unsigned int pass;
 	int error;
 	int jlocked = 0;
 
+	t_start = ktime_get();
 	if (sdp->sd_args.ar_spectator ||
 	    (jd->jd_jid != sdp->sd_lockstruct.ls_jid)) {
 		fs_info(sdp, "jid=%u: Trying to acquire journal lock...\n",
@@ -492,6 +494,7 @@ void gfs2_recover_func(struct work_struct *work)
 		fs_info(sdp, "jid=%u, already locked for use\n", jd->jd_jid);
 	}
 
+	t_jlck = ktime_get();
 	fs_info(sdp, "jid=%u: Looking at journal...\n", jd->jd_jid);
 
 	error = gfs2_jdesc_check(jd);
@@ -501,13 +504,12 @@ void gfs2_recover_func(struct work_struct *work)
 	error = gfs2_find_jhead(jd, &head);
 	if (error)
 		goto fail_gunlock_ji;
+	t_jhd = ktime_get();
 
 	if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {
 		fs_info(sdp, "jid=%u: Acquiring the transaction lock...\n",
 			jd->jd_jid);
 
-		t = jiffies;
-
 		/* Acquire a shared hold on the transaction lock */
 
 		error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED,
@@ -541,6 +543,7 @@ void gfs2_recover_func(struct work_struct *work)
 			goto fail_gunlock_tr;
 		}
 
+		t_tlck = ktime_get();
 		fs_info(sdp, "jid=%u: Replaying journal...\n", jd->jd_jid);
 
 		for (pass = 0; pass < 2; pass++) {
@@ -557,9 +560,14 @@ void gfs2_recover_func(struct work_struct *work)
 			goto fail_gunlock_tr;
 
 		gfs2_glock_dq_uninit(&t_gh);
-		t = DIV_ROUND_UP(jiffies - t, HZ);
-		fs_info(sdp, "jid=%u: Journal replayed in %lus\n",
-			jd->jd_jid, t);
+		t_rep = ktime_get();
+		fs_info(sdp, "jid=%u: Journal replayed in %lldms [jlck:%lldms, "
+			"jhead:%lldms, tlck:%lldms, replay:%lldms]\n",
+			jd->jd_jid, ktime_ms_delta(t_rep, t_start),
+			ktime_ms_delta(t_jlck, t_start),
+			ktime_ms_delta(t_jhd, t_jlck),
+			ktime_ms_delta(t_tlck, t_jhd),
+			ktime_ms_delta(t_rep, t_tlck));
 	}
 
 	gfs2_recovery_done(sdp, jd->jd_jid, LM_RD_SUCCESS);
-- 
2.4.11



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [RFC PATCH 3/3] gfs2: read journal in large chunks to locate the head
  2018-05-29  8:05 [Cluster-devel] [RFC PATCH 0/3] Speed up journal head lookup Abhi Das
  2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 1/3] gfs2: allow map_journal_extents() to take a journal descriptor as argument Abhi Das
  2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 2/3] gfs2: add timing info for various stages of journal recovery Abhi Das
@ 2018-05-29  8:05 ` Abhi Das
  2018-05-29 10:15   ` Steven Whitehouse
  2018-05-29  9:57 ` [Cluster-devel] [RFC PATCH 0/3] Speed up journal head lookup Steven Whitehouse
  3 siblings, 1 reply; 8+ messages in thread
From: Abhi Das @ 2018-05-29  8:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Use bio(s) to read in the journal sequentially in large chunks and
locate the head of the journal.
This is faster in most cases when compared to the existing bisect
method which operates one block at a time.

Signed-off-by: Abhi Das <adas@redhat.com>
---
 fs/gfs2/incore.h   |   7 +++-
 fs/gfs2/log.c      |   2 +-
 fs/gfs2/log.h      |   1 +
 fs/gfs2/lops.c     | 115 +++++++++++++++++++++++++++++++++++++++++++++++----
 fs/gfs2/lops.h     |   1 +
 fs/gfs2/recovery.c | 118 ++++-------------------------------------------------
 fs/gfs2/recovery.h |   1 +
 7 files changed, 127 insertions(+), 118 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index f303616..82d50a0 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -494,18 +494,23 @@ struct gfs2_journal_extent {
 	u64 blocks;
 };
 
+enum {
+	JDF_RECOVERY        = 1,
+	JDF_JHEAD           = 2,
+};
+
 struct gfs2_jdesc {
 	struct list_head jd_list;
 	struct list_head extent_list;
 	struct work_struct jd_work;
 	struct inode *jd_inode;
 	unsigned long jd_flags;
-#define JDF_RECOVERY 1
 	unsigned int jd_jid;
 	unsigned int jd_blocks;
 	int jd_recover_error;
 	/* Replay stuff */
 
+	struct gfs2_log_header_host jd_jhead;
 	unsigned int jd_found_blocks;
 	unsigned int jd_found_revokes;
 	unsigned int jd_replayed_blocks;
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 15a3a8c..523cf79 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -469,7 +469,7 @@ static void log_pull_tail(struct gfs2_sbd *sdp, unsigned int new_tail)
 }
 
 
-static void log_flush_wait(struct gfs2_sbd *sdp)
+void log_flush_wait(struct gfs2_sbd *sdp)
 {
 	DEFINE_WAIT(wait);
 
diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
index 19c93df..95b1ea3 100644
--- a/fs/gfs2/log.h
+++ b/fs/gfs2/log.h
@@ -65,6 +65,7 @@ extern unsigned int gfs2_struct2blk(struct gfs2_sbd *sdp, unsigned int nstruct,
 
 extern int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks);
 extern void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl);
+extern void log_flush_wait(struct gfs2_sbd *sdp);
 extern void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans *trans);
 extern void gfs2_remove_from_ail(struct gfs2_bufdata *bd);
 extern void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc);
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 4da6055..48d7e6d 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -16,6 +16,7 @@
 #include <linux/gfs2_ondisk.h>
 #include <linux/bio.h>
 #include <linux/fs.h>
+#include <linux/crc32.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -228,6 +229,60 @@ static void gfs2_end_log_write(struct bio *bio, int error)
 		wake_up(&sdp->sd_log_flush_wait);
 }
 
+static void gfs2_end_log_read(struct bio *bio, int error)
+{
+	struct gfs2_jdesc *jd = bio->bi_private;
+	struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
+	struct page *page;
+	struct bio_vec *bvec;
+	int i, last;
+
+	if (error) {
+		sdp->sd_log_error = error;
+		fs_err(sdp, "Error %d reading from journal, jid=%u\n", error,
+		       jd->jd_jid);
+		wake_up(&sdp->sd_logd_waitq);
+	}
+
+	bio_for_each_segment_all(bvec, bio, i) {
+		struct gfs2_log_header_host uninitialized_var(lh);
+		void *ptr;
+		const u32 nothing = 0;
+		u32 hash;
+
+		page = bvec->bv_page;
+		ptr = page_address(page);
+		hash = crc32_le((u32)~0, ptr, sizeof(struct gfs2_log_header) - sizeof(u32));
+		hash = crc32_le(hash, (unsigned char const *)&nothing, sizeof(nothing));
+		hash ^= (u32)~0;
+		error = gfs2_log_header_in(&lh, ptr);
+		last = page_private(page);
+		mempool_free(page, gfs2_page_pool);
+
+		if (!test_bit(JDF_JHEAD, &jd->jd_flags))
+			continue;
+
+		if (!error && lh.lh_hash == hash) {
+			if (lh.lh_sequence > jd->jd_jhead.lh_sequence)
+				jd->jd_jhead = lh;
+			else
+				goto found;
+		}
+
+		if (last) {
+		found:
+			clear_bit(JDF_JHEAD, &jd->jd_flags);
+			smp_mb__after_clear_bit();
+			wake_up_bit(&jd->jd_flags, JDF_JHEAD);
+		}
+	}
+
+	bio_put(bio);
+
+	if (atomic_dec_and_test(&sdp->sd_log_in_flight))
+		wake_up(&sdp->sd_log_flush_wait);
+}
+
 /**
  * gfs2_log_flush_bio - Submit any pending log bio
  * @sdp: The superblock
@@ -259,7 +314,8 @@ void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int rw)
  * Returns: Newly allocated bio
  */
 
-static struct bio *gfs2_log_alloc_bio(struct gfs2_sbd *sdp, u64 blkno)
+static struct bio *gfs2_log_alloc_bio(struct gfs2_sbd *sdp, u64 blkno,
+				      bio_end_io_t *end_io)
 {
 	struct super_block *sb = sdp->sd_vfs;
 	unsigned nrvecs = bio_get_nr_vecs(sb->s_bdev);
@@ -276,7 +332,7 @@ static struct bio *gfs2_log_alloc_bio(struct gfs2_sbd *sdp, u64 blkno)
 
 	bio->bi_sector = blkno * (sb->s_blocksize >> 9);
 	bio->bi_bdev = sb->s_bdev;
-	bio->bi_end_io = gfs2_end_log_write;
+	bio->bi_end_io = end_io;
 	bio->bi_private = sdp;
 
 	sdp->sd_log_bio = bio;
@@ -297,7 +353,7 @@ static struct bio *gfs2_log_alloc_bio(struct gfs2_sbd *sdp, u64 blkno)
  * Returns: The bio to use for log writes
  */
 
-static struct bio *gfs2_log_get_bio(struct gfs2_sbd *sdp, u64 blkno)
+static struct bio *gfs2_log_get_bio(struct gfs2_sbd *sdp, u64 blkno, int rw)
 {
 	struct bio *bio = sdp->sd_log_bio;
 	u64 nblk;
@@ -307,10 +363,11 @@ static struct bio *gfs2_log_get_bio(struct gfs2_sbd *sdp, u64 blkno)
 		nblk >>= sdp->sd_fsb2bb_shift;
 		if (blkno == nblk)
 			return bio;
-		gfs2_log_flush_bio(sdp, WRITE);
+		gfs2_log_flush_bio(sdp, rw);
 	}
 
-	return gfs2_log_alloc_bio(sdp, blkno);
+	return gfs2_log_alloc_bio(sdp, blkno, rw == WRITE ? gfs2_end_log_write :
+		gfs2_end_log_read);
 }
 
 
@@ -333,11 +390,11 @@ static void gfs2_log_write(struct gfs2_sbd *sdp, struct page *page,
 	struct bio *bio;
 	int ret;
 
-	bio = gfs2_log_get_bio(sdp, blkno);
+	bio = gfs2_log_get_bio(sdp, blkno, WRITE);
 	ret = bio_add_page(bio, page, size, offset);
 	if (ret == 0) {
 		gfs2_log_flush_bio(sdp, WRITE);
-		bio = gfs2_log_alloc_bio(sdp, blkno);
+		bio = gfs2_log_alloc_bio(sdp, blkno, gfs2_end_log_write);
 		ret = bio_add_page(bio, page, size, offset);
 		WARN_ON(ret == 0);
 	}
@@ -375,6 +432,50 @@ void gfs2_log_write_page(struct gfs2_sbd *sdp, struct page *page)
 	gfs2_log_write(sdp, page, sb->s_blocksize, 0);
 }
 
+void gfs2_log_read_extent(struct gfs2_jdesc *jd, u64 dblock,
+			  unsigned int blocks, int last)
+{
+	struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
+	struct page *page;
+	int i, ret;
+	struct bio *bio;
+	struct super_block *sb = sdp->sd_vfs;
+
+	for (i=0; i<blocks; i++) {
+		page = mempool_alloc(gfs2_page_pool, GFP_NOIO);
+		/* flag the last page of the journal we plan to read in */
+		page_private(page) = (last && i == (blocks - 1));
+
+		bio = gfs2_log_get_bio(sdp, dblock + i, READ);
+		ret = bio_add_page(bio, page, sb->s_blocksize, 0);
+		if (ret == 0) {
+			gfs2_log_flush_bio(sdp, READ);
+			log_flush_wait(sdp);
+			bio = gfs2_log_alloc_bio(sdp, dblock + i, gfs2_end_log_read);
+			ret = bio_add_page(bio, page, sb->s_blocksize, 0);
+			WARN_ON(ret == 0);
+		}
+		bio->bi_private = jd;
+	}
+}
+
+void gfs2_log_read(struct gfs2_jdesc *jd)
+{
+	struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
+	int last = 0;
+	struct gfs2_journal_extent *je;
+
+	if (list_empty(&jd->extent_list))
+		map_journal_extents(sdp, jd);
+
+	list_for_each_entry(je, &jd->extent_list, extent_list) {
+		last = list_is_last(&je->extent_list, &jd->extent_list);
+		gfs2_log_read_extent(jd, je->dblock, je->blocks, last);
+		gfs2_log_flush_bio(sdp, READ);
+		log_flush_wait(sdp);
+	}
+}
+
 static struct page *gfs2_get_log_desc(struct gfs2_sbd *sdp, u32 ld_type,
 				      u32 ld_length, u32 ld_data1)
 {
diff --git a/fs/gfs2/lops.h b/fs/gfs2/lops.h
index 06793e3..b8a3886 100644
--- a/fs/gfs2/lops.h
+++ b/fs/gfs2/lops.h
@@ -30,6 +30,7 @@ extern const struct gfs2_log_operations *gfs2_log_ops[];
 extern void gfs2_log_write_page(struct gfs2_sbd *sdp, struct page *page);
 extern void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int rw);
 extern void gfs2_pin(struct gfs2_sbd *sdp, struct buffer_head *bh);
+extern void gfs2_log_read(struct gfs2_jdesc *jd);
 
 static inline unsigned int buf_limit(struct gfs2_sbd *sdp)
 {
diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index 4b042db..4fd2b23 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -118,7 +118,7 @@ void gfs2_revoke_clean(struct gfs2_jdesc *jd)
 	}
 }
 
-static int gfs2_log_header_in(struct gfs2_log_header_host *lh, const void *buf)
+int gfs2_log_header_in(struct gfs2_log_header_host *lh, const void *buf)
 {
 	const struct gfs2_log_header *str = buf;
 
@@ -177,126 +177,26 @@ static int get_log_header(struct gfs2_jdesc *jd, unsigned int blk,
 }
 
 /**
- * find_good_lh - find a good log header
- * @jd: the journal
- * @blk: the segment to start searching from
- * @lh: the log header to fill in
- * @forward: if true search forward in the log, else search backward
- *
- * Call get_log_header() to get a log header for a segment, but if the
- * segment is bad, either scan forward or backward until we find a good one.
- *
- * Returns: errno
- */
-
-static int find_good_lh(struct gfs2_jdesc *jd, unsigned int *blk,
-			struct gfs2_log_header_host *head)
-{
-	unsigned int orig_blk = *blk;
-	int error;
-
-	for (;;) {
-		error = get_log_header(jd, *blk, head);
-		if (error <= 0)
-			return error;
-
-		if (++*blk == jd->jd_blocks)
-			*blk = 0;
-
-		if (*blk == orig_blk) {
-			gfs2_consist_inode(GFS2_I(jd->jd_inode));
-			return -EIO;
-		}
-	}
-}
-
-/**
- * jhead_scan - make sure we've found the head of the log
- * @jd: the journal
- * @head: this is filled in with the log descriptor of the head
- *
- * At this point, seg and lh should be either the head of the log or just
- * before.  Scan forward until we find the head.
- *
- * Returns: errno
- */
-
-static int jhead_scan(struct gfs2_jdesc *jd, struct gfs2_log_header_host *head)
-{
-	unsigned int blk = head->lh_blkno;
-	struct gfs2_log_header_host lh;
-	int error;
-
-	for (;;) {
-		if (++blk == jd->jd_blocks)
-			blk = 0;
-
-		error = get_log_header(jd, blk, &lh);
-		if (error < 0)
-			return error;
-		if (error == 1)
-			continue;
-
-		if (lh.lh_sequence == head->lh_sequence) {
-			gfs2_consist_inode(GFS2_I(jd->jd_inode));
-			return -EIO;
-		}
-		if (lh.lh_sequence < head->lh_sequence)
-			break;
-
-		*head = lh;
-	}
-
-	return 0;
-}
-
-/**
  * gfs2_find_jhead - find the head of a log
  * @jd: the journal
  * @head: the log descriptor for the head of the log is returned here
  *
- * Do a binary search of a journal and find the valid log entry with the
+ * Do a search of a journal and find the valid log entry with the
  * highest sequence number.  (i.e. the log head)
  *
  * Returns: errno
  */
-
 int gfs2_find_jhead(struct gfs2_jdesc *jd, struct gfs2_log_header_host *head)
 {
-	struct gfs2_log_header_host lh_1, lh_m;
-	u32 blk_1, blk_2, blk_m;
-	int error;
+	memset(&jd->jd_jhead, 0, sizeof(struct gfs2_log_header_host));
+	set_bit(JDF_JHEAD, &jd->jd_flags);
+	gfs2_log_read(jd);
 
-	blk_1 = 0;
-	blk_2 = jd->jd_blocks - 1;
+	if (test_bit(JDF_JHEAD, &jd->jd_flags))
+		wait_on_bit(&jd->jd_flags, JDF_JHEAD, TASK_INTERRUPTIBLE);
 
-	for (;;) {
-		blk_m = (blk_1 + blk_2) / 2;
-
-		error = find_good_lh(jd, &blk_1, &lh_1);
-		if (error)
-			return error;
-
-		error = find_good_lh(jd, &blk_m, &lh_m);
-		if (error)
-			return error;
-
-		if (blk_1 == blk_m || blk_m == blk_2)
-			break;
-
-		if (lh_1.lh_sequence <= lh_m.lh_sequence)
-			blk_1 = blk_m;
-		else
-			blk_2 = blk_m;
-	}
-
-	error = jhead_scan(jd, &lh_1);
-	if (error)
-		return error;
-
-	*head = lh_1;
-
-	return error;
+	*head = jd->jd_jhead;
+	return 0;
 }
 
 /**
diff --git a/fs/gfs2/recovery.h b/fs/gfs2/recovery.h
index 11fdfab..cd691ff 100644
--- a/fs/gfs2/recovery.h
+++ b/fs/gfs2/recovery.h
@@ -29,6 +29,7 @@ extern void gfs2_revoke_clean(struct gfs2_jdesc *jd);
 
 extern int gfs2_find_jhead(struct gfs2_jdesc *jd,
 		    struct gfs2_log_header_host *head);
+extern int gfs2_log_header_in(struct gfs2_log_header_host *lh, const void *buf);
 extern int gfs2_recover_journal(struct gfs2_jdesc *gfs2_jd, bool wait);
 extern void gfs2_recover_func(struct work_struct *work);
 
-- 
2.4.11



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [RFC PATCH 0/3] Speed up journal head lookup
  2018-05-29  8:05 [Cluster-devel] [RFC PATCH 0/3] Speed up journal head lookup Abhi Das
                   ` (2 preceding siblings ...)
  2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 3/3] gfs2: read journal in large chunks to locate the head Abhi Das
@ 2018-05-29  9:57 ` Steven Whitehouse
  3 siblings, 0 replies; 8+ messages in thread
From: Steven Whitehouse @ 2018-05-29  9:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 29/05/18 09:05, Abhi Das wrote:
> This patchset uses bio(s) to read in the journal in large chunks
> to lookup the journal head. This will replace the existing
> bisect method of reading the journal block-by-block to find the
> head.
>
> These are RHEL7 patches because it was simpler for me to
> test. Upstream versions shouldn't be very dissimilar. I'll port
> once I get some feedback on this.
>
> I'm seeing some promising results with some preliminary testing.
> I have 5 lock_nolock filesystems saved off using 'gfs2_edit savemeta'
> after killing the node mid-IO so as to save unrecovered journals.
>
> All times are in milliseconds. This table below shows the overall
> time taken for recovery and the jhead column represents the time
> taken to find the jhead alone. Finding the jhead is a big chunk of
> the overall time taken for recovery.
>
> 		bisect	      | 	bio
> 	overall		jhead |	overall		jhead
> ------------------------------|-----------------------
> fs1	1702		1642  |	394		325
> fs2	761		334   |	706		356
> fs3	1150		904   | 1356		1089
> fs4	953		596   |	691		343
> fs5	1262		1233  |	554		524
>
Some impressive numbers, certainly looking very promising,

Steve.

> Abhi Das (3):
>    gfs2: allow map_journal_extents() to take a journal descriptor as
>      argument
>    gfs2: add timing info for various stages of journal recovery
>    gfs2: read journal in large chunks to locate the head
>
>   fs/gfs2/incore.h     |   7 ++-
>   fs/gfs2/log.c        |   2 +-
>   fs/gfs2/log.h        |   2 +
>   fs/gfs2/lops.c       | 115 +++++++++++++++++++++++++++++++++++++++---
>   fs/gfs2/lops.h       |   1 +
>   fs/gfs2/ops_fstype.c |  10 ++--
>   fs/gfs2/recovery.c   | 138 +++++++++------------------------------------------
>   fs/gfs2/recovery.h   |   1 +
>   8 files changed, 149 insertions(+), 127 deletions(-)
>



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Cluster-devel] [RFC PATCH 1/3] gfs2: allow map_journal_extents() to take a journal descriptor as argument
  2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 1/3] gfs2: allow map_journal_extents() to take a journal descriptor as argument Abhi Das
@ 2018-05-29  9:57   ` Steven Whitehouse
  0 siblings, 0 replies; 8+ messages in thread
From: Steven Whitehouse @ 2018-05-29  9:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

Looks good.

Acked-by: Steven Whitehouse <swhiteho@redhat.com>

Steve.


On 29/05/18 09:05, Abhi Das wrote:
> This function now maps the extents for the journal whose descriptor
> is passed in as argument.
>
> Signed-off-by: Abhi Das <adas@redhat.com>
> ---
>   fs/gfs2/log.h        | 1 +
>   fs/gfs2/ops_fstype.c | 5 ++---
>   2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
> index 92dcbe7..19c93df 100644
> --- a/fs/gfs2/log.h
> +++ b/fs/gfs2/log.h
> @@ -75,4 +75,5 @@ extern int gfs2_logd(void *data);
>   extern void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd);
>   extern void gfs2_write_revokes(struct gfs2_sbd *sdp);
>   
> +extern int map_journal_extents(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd);
>   #endif /* __LOG_DOT_H__ */
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index 228f38e..cf3e366 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -524,9 +524,8 @@ out:
>    *       but since it's only done at mount time, I'm not worried about the
>    *       time it takes.
>    */
> -static int map_journal_extents(struct gfs2_sbd *sdp)
> +int map_journal_extents(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
>   {
> -	struct gfs2_jdesc *jd = sdp->sd_jdesc;
>   	unsigned int lb;
>   	u64 db, prev_db; /* logical block, disk block, prev disk block */
>   	struct gfs2_inode *ip = GFS2_I(jd->jd_inode);
> @@ -772,7 +771,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
>   		atomic_set(&sdp->sd_log_thresh2, 4*sdp->sd_jdesc->jd_blocks/5);
>   
>   		/* Map the extents for this journal's blocks */
> -		map_journal_extents(sdp);
> +		map_journal_extents(sdp, sdp->sd_jdesc);
>   	}
>   	trace_gfs2_log_blocks(sdp, atomic_read(&sdp->sd_log_blks_free));
>   



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Cluster-devel] [RFC PATCH 2/3] gfs2: add timing info for various stages of journal recovery
  2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 2/3] gfs2: add timing info for various stages of journal recovery Abhi Das
@ 2018-05-29  9:59   ` Steven Whitehouse
  0 siblings, 0 replies; 8+ messages in thread
From: Steven Whitehouse @ 2018-05-29  9:59 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

Looks good.

Acked-by: Steven Whitehouse <swhiteho@redhat.com>

Steve.


On 29/05/18 09:05, Abhi Das wrote:
> Tells you how many milliseconds each stage of journal recovery
> takes.
>
> Signed-off-by: Abhi Das <adas@redhat.com>
> ---
>   fs/gfs2/ops_fstype.c |  5 +++++
>   fs/gfs2/recovery.c   | 20 ++++++++++++++------
>   2 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index cf3e366..fd460c1 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -532,7 +532,9 @@ int map_journal_extents(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
>   	struct gfs2_journal_extent *jext = NULL;
>   	struct buffer_head bh;
>   	int rc = 0;
> +	ktime_t start, end;
>   
> +	start = ktime_get();
>   	prev_db = 0;
>   
>   	for (lb = 0; lb < i_size_read(jd->jd_inode) >> sdp->sd_sb.sb_bsize_shift; lb++) {
> @@ -564,6 +566,9 @@ int map_journal_extents(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
>   		}
>   		prev_db = db;
>   	}
> +	end = ktime_get();
> +	fs_info(sdp, "jid=%u: Journal extent mapped in %lldms\n", jd->jd_jid,
> +		ktime_ms_delta(end, start));
>   	return rc;
>   }
>   
> diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
> index 56dea44..4b042db 100644
> --- a/fs/gfs2/recovery.c
> +++ b/fs/gfs2/recovery.c
> @@ -14,6 +14,7 @@
>   #include <linux/buffer_head.h>
>   #include <linux/gfs2_ondisk.h>
>   #include <linux/crc32.h>
> +#include <linux/ktime.h>
>   
>   #include "gfs2.h"
>   #include "incore.h"
> @@ -455,12 +456,13 @@ void gfs2_recover_func(struct work_struct *work)
>   	struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
>   	struct gfs2_log_header_host head;
>   	struct gfs2_holder j_gh, ji_gh, t_gh;
> -	unsigned long t;
> +	ktime_t t_start, t_jlck, t_jhd, t_tlck, t_rep;
>   	int ro = 0;
>   	unsigned int pass;
>   	int error;
>   	int jlocked = 0;
>   
> +	t_start = ktime_get();
>   	if (sdp->sd_args.ar_spectator ||
>   	    (jd->jd_jid != sdp->sd_lockstruct.ls_jid)) {
>   		fs_info(sdp, "jid=%u: Trying to acquire journal lock...\n",
> @@ -492,6 +494,7 @@ void gfs2_recover_func(struct work_struct *work)
>   		fs_info(sdp, "jid=%u, already locked for use\n", jd->jd_jid);
>   	}
>   
> +	t_jlck = ktime_get();
>   	fs_info(sdp, "jid=%u: Looking at journal...\n", jd->jd_jid);
>   
>   	error = gfs2_jdesc_check(jd);
> @@ -501,13 +504,12 @@ void gfs2_recover_func(struct work_struct *work)
>   	error = gfs2_find_jhead(jd, &head);
>   	if (error)
>   		goto fail_gunlock_ji;
> +	t_jhd = ktime_get();
>   
>   	if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {
>   		fs_info(sdp, "jid=%u: Acquiring the transaction lock...\n",
>   			jd->jd_jid);
>   
> -		t = jiffies;
> -
>   		/* Acquire a shared hold on the transaction lock */
>   
>   		error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED,
> @@ -541,6 +543,7 @@ void gfs2_recover_func(struct work_struct *work)
>   			goto fail_gunlock_tr;
>   		}
>   
> +		t_tlck = ktime_get();
>   		fs_info(sdp, "jid=%u: Replaying journal...\n", jd->jd_jid);
>   
>   		for (pass = 0; pass < 2; pass++) {
> @@ -557,9 +560,14 @@ void gfs2_recover_func(struct work_struct *work)
>   			goto fail_gunlock_tr;
>   
>   		gfs2_glock_dq_uninit(&t_gh);
> -		t = DIV_ROUND_UP(jiffies - t, HZ);
> -		fs_info(sdp, "jid=%u: Journal replayed in %lus\n",
> -			jd->jd_jid, t);
> +		t_rep = ktime_get();
> +		fs_info(sdp, "jid=%u: Journal replayed in %lldms [jlck:%lldms, "
> +			"jhead:%lldms, tlck:%lldms, replay:%lldms]\n",
> +			jd->jd_jid, ktime_ms_delta(t_rep, t_start),
> +			ktime_ms_delta(t_jlck, t_start),
> +			ktime_ms_delta(t_jhd, t_jlck),
> +			ktime_ms_delta(t_tlck, t_jhd),
> +			ktime_ms_delta(t_rep, t_tlck));
>   	}
>   
>   	gfs2_recovery_done(sdp, jd->jd_jid, LM_RD_SUCCESS);



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Cluster-devel] [RFC PATCH 3/3] gfs2: read journal in large chunks to locate the head
  2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 3/3] gfs2: read journal in large chunks to locate the head Abhi Das
@ 2018-05-29 10:15   ` Steven Whitehouse
  0 siblings, 0 replies; 8+ messages in thread
From: Steven Whitehouse @ 2018-05-29 10:15 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,



On 29/05/18 09:05, Abhi Das wrote:
> Use bio(s) to read in the journal sequentially in large chunks and
> locate the head of the journal.
> This is faster in most cases when compared to the existing bisect
> method which operates one block at a time.
Overall this looks very good. One thing that is not immediately clear is 
whether there is likely to be any issue with sharing some of the 
variables with the "write side", such as sd_log_in_flight. I suspect we 
might need to account for this separately, and also consider what might 
happen if we were asked to do multiple recoveries at the same time. I 
know that userspace will not ask for that currently, but it might be 
useful in the future so we should take that into account. I'd be tempted 
to count the outstanding reads separately for each journal, that will 
also aid debugging.

In the end_io function we should be able to skip the crc check in cases 
where the block in not a log header, and thus save some time there, 
since the crc calculation will be time consuming by comparison with a 
simple comparison on the block type & magic number. In fact you should 
not need to convert the whole header either, but just byteswap the 
relatively few fields that you need to use in that function.

Eventually we want to cache the blocks that we read in, in the page 
cache, dropping those which we know that we'll not want, and allowing 
the usual memory pressure code paths to work, that will save the need to 
reread some of the blocks in most cases. That can be a follow up item 
though, since this is already showing significant performance benefits, 
but it should be possible to do even better in due course,

Steve.

> Signed-off-by: Abhi Das <adas@redhat.com>
> ---
>   fs/gfs2/incore.h   |   7 +++-
>   fs/gfs2/log.c      |   2 +-
>   fs/gfs2/log.h      |   1 +
>   fs/gfs2/lops.c     | 115 +++++++++++++++++++++++++++++++++++++++++++++++----
>   fs/gfs2/lops.h     |   1 +
>   fs/gfs2/recovery.c | 118 ++++-------------------------------------------------
>   fs/gfs2/recovery.h |   1 +
>   7 files changed, 127 insertions(+), 118 deletions(-)
>
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index f303616..82d50a0 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -494,18 +494,23 @@ struct gfs2_journal_extent {
>   	u64 blocks;
>   };
>   
> +enum {
> +	JDF_RECOVERY        = 1,
> +	JDF_JHEAD           = 2,
> +};
> +
>   struct gfs2_jdesc {
>   	struct list_head jd_list;
>   	struct list_head extent_list;
>   	struct work_struct jd_work;
>   	struct inode *jd_inode;
>   	unsigned long jd_flags;
> -#define JDF_RECOVERY 1
>   	unsigned int jd_jid;
>   	unsigned int jd_blocks;
>   	int jd_recover_error;
>   	/* Replay stuff */
>   
> +	struct gfs2_log_header_host jd_jhead;
>   	unsigned int jd_found_blocks;
>   	unsigned int jd_found_revokes;
>   	unsigned int jd_replayed_blocks;
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index 15a3a8c..523cf79 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -469,7 +469,7 @@ static void log_pull_tail(struct gfs2_sbd *sdp, unsigned int new_tail)
>   }
>   
>   
> -static void log_flush_wait(struct gfs2_sbd *sdp)
> +void log_flush_wait(struct gfs2_sbd *sdp)
>   {
>   	DEFINE_WAIT(wait);
>   
> diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
> index 19c93df..95b1ea3 100644
> --- a/fs/gfs2/log.h
> +++ b/fs/gfs2/log.h
> @@ -65,6 +65,7 @@ extern unsigned int gfs2_struct2blk(struct gfs2_sbd *sdp, unsigned int nstruct,
>   
>   extern int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks);
>   extern void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl);
> +extern void log_flush_wait(struct gfs2_sbd *sdp);
>   extern void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans *trans);
>   extern void gfs2_remove_from_ail(struct gfs2_bufdata *bd);
>   extern void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc);
> diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
> index 4da6055..48d7e6d 100644
> --- a/fs/gfs2/lops.c
> +++ b/fs/gfs2/lops.c
> @@ -16,6 +16,7 @@
>   #include <linux/gfs2_ondisk.h>
>   #include <linux/bio.h>
>   #include <linux/fs.h>
> +#include <linux/crc32.h>
>   
>   #include "gfs2.h"
>   #include "incore.h"
> @@ -228,6 +229,60 @@ static void gfs2_end_log_write(struct bio *bio, int error)
>   		wake_up(&sdp->sd_log_flush_wait);
>   }
>   
> +static void gfs2_end_log_read(struct bio *bio, int error)
> +{
> +	struct gfs2_jdesc *jd = bio->bi_private;
> +	struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
> +	struct page *page;
> +	struct bio_vec *bvec;
> +	int i, last;
> +
> +	if (error) {
> +		sdp->sd_log_error = error;
> +		fs_err(sdp, "Error %d reading from journal, jid=%u\n", error,
> +		       jd->jd_jid);
> +		wake_up(&sdp->sd_logd_waitq);
> +	}
> +
> +	bio_for_each_segment_all(bvec, bio, i) {
> +		struct gfs2_log_header_host uninitialized_var(lh);
> +		void *ptr;
> +		const u32 nothing = 0;
> +		u32 hash;
> +
> +		page = bvec->bv_page;
> +		ptr = page_address(page);
> +		hash = crc32_le((u32)~0, ptr, sizeof(struct gfs2_log_header) - sizeof(u32));
> +		hash = crc32_le(hash, (unsigned char const *)&nothing, sizeof(nothing));
> +		hash ^= (u32)~0;
> +		error = gfs2_log_header_in(&lh, ptr);
> +		last = page_private(page);
> +		mempool_free(page, gfs2_page_pool);
> +
> +		if (!test_bit(JDF_JHEAD, &jd->jd_flags))
> +			continue;
> +
> +		if (!error && lh.lh_hash == hash) {
> +			if (lh.lh_sequence > jd->jd_jhead.lh_sequence)
> +				jd->jd_jhead = lh;
> +			else
> +				goto found;
> +		}
> +
> +		if (last) {
> +		found:
> +			clear_bit(JDF_JHEAD, &jd->jd_flags);
> +			smp_mb__after_clear_bit();
> +			wake_up_bit(&jd->jd_flags, JDF_JHEAD);
> +		}
> +	}
> +
> +	bio_put(bio);
> +
> +	if (atomic_dec_and_test(&sdp->sd_log_in_flight))
> +		wake_up(&sdp->sd_log_flush_wait);
> +}
> +
>   /**
>    * gfs2_log_flush_bio - Submit any pending log bio
>    * @sdp: The superblock
> @@ -259,7 +314,8 @@ void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int rw)
>    * Returns: Newly allocated bio
>    */
>   
> -static struct bio *gfs2_log_alloc_bio(struct gfs2_sbd *sdp, u64 blkno)
> +static struct bio *gfs2_log_alloc_bio(struct gfs2_sbd *sdp, u64 blkno,
> +				      bio_end_io_t *end_io)
>   {
>   	struct super_block *sb = sdp->sd_vfs;
>   	unsigned nrvecs = bio_get_nr_vecs(sb->s_bdev);
> @@ -276,7 +332,7 @@ static struct bio *gfs2_log_alloc_bio(struct gfs2_sbd *sdp, u64 blkno)
>   
>   	bio->bi_sector = blkno * (sb->s_blocksize >> 9);
>   	bio->bi_bdev = sb->s_bdev;
> -	bio->bi_end_io = gfs2_end_log_write;
> +	bio->bi_end_io = end_io;
>   	bio->bi_private = sdp;
>   
>   	sdp->sd_log_bio = bio;
> @@ -297,7 +353,7 @@ static struct bio *gfs2_log_alloc_bio(struct gfs2_sbd *sdp, u64 blkno)
>    * Returns: The bio to use for log writes
>    */
>   
> -static struct bio *gfs2_log_get_bio(struct gfs2_sbd *sdp, u64 blkno)
> +static struct bio *gfs2_log_get_bio(struct gfs2_sbd *sdp, u64 blkno, int rw)
>   {
>   	struct bio *bio = sdp->sd_log_bio;
>   	u64 nblk;
> @@ -307,10 +363,11 @@ static struct bio *gfs2_log_get_bio(struct gfs2_sbd *sdp, u64 blkno)
>   		nblk >>= sdp->sd_fsb2bb_shift;
>   		if (blkno == nblk)
>   			return bio;
> -		gfs2_log_flush_bio(sdp, WRITE);
> +		gfs2_log_flush_bio(sdp, rw);
>   	}
>   
> -	return gfs2_log_alloc_bio(sdp, blkno);
> +	return gfs2_log_alloc_bio(sdp, blkno, rw == WRITE ? gfs2_end_log_write :
> +		gfs2_end_log_read);
>   }
>   
>   
> @@ -333,11 +390,11 @@ static void gfs2_log_write(struct gfs2_sbd *sdp, struct page *page,
>   	struct bio *bio;
>   	int ret;
>   
> -	bio = gfs2_log_get_bio(sdp, blkno);
> +	bio = gfs2_log_get_bio(sdp, blkno, WRITE);
>   	ret = bio_add_page(bio, page, size, offset);
>   	if (ret == 0) {
>   		gfs2_log_flush_bio(sdp, WRITE);
> -		bio = gfs2_log_alloc_bio(sdp, blkno);
> +		bio = gfs2_log_alloc_bio(sdp, blkno, gfs2_end_log_write);
>   		ret = bio_add_page(bio, page, size, offset);
>   		WARN_ON(ret == 0);
>   	}
> @@ -375,6 +432,50 @@ void gfs2_log_write_page(struct gfs2_sbd *sdp, struct page *page)
>   	gfs2_log_write(sdp, page, sb->s_blocksize, 0);
>   }
>   
> +void gfs2_log_read_extent(struct gfs2_jdesc *jd, u64 dblock,
> +			  unsigned int blocks, int last)
> +{
> +	struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
> +	struct page *page;
> +	int i, ret;
> +	struct bio *bio;
> +	struct super_block *sb = sdp->sd_vfs;
> +
> +	for (i=0; i<blocks; i++) {
> +		page = mempool_alloc(gfs2_page_pool, GFP_NOIO);
> +		/* flag the last page of the journal we plan to read in */
> +		page_private(page) = (last && i == (blocks - 1));
> +
> +		bio = gfs2_log_get_bio(sdp, dblock + i, READ);
> +		ret = bio_add_page(bio, page, sb->s_blocksize, 0);
> +		if (ret == 0) {
> +			gfs2_log_flush_bio(sdp, READ);
> +			log_flush_wait(sdp);
> +			bio = gfs2_log_alloc_bio(sdp, dblock + i, gfs2_end_log_read);
> +			ret = bio_add_page(bio, page, sb->s_blocksize, 0);
> +			WARN_ON(ret == 0);
> +		}
> +		bio->bi_private = jd;
> +	}
> +}
> +
> +void gfs2_log_read(struct gfs2_jdesc *jd)
> +{
> +	struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
> +	int last = 0;
> +	struct gfs2_journal_extent *je;
> +
> +	if (list_empty(&jd->extent_list))
> +		map_journal_extents(sdp, jd);
> +
> +	list_for_each_entry(je, &jd->extent_list, extent_list) {
> +		last = list_is_last(&je->extent_list, &jd->extent_list);
> +		gfs2_log_read_extent(jd, je->dblock, je->blocks, last);
> +		gfs2_log_flush_bio(sdp, READ);
> +		log_flush_wait(sdp);
> +	}
> +}
> +
>   static struct page *gfs2_get_log_desc(struct gfs2_sbd *sdp, u32 ld_type,
>   				      u32 ld_length, u32 ld_data1)
>   {
> diff --git a/fs/gfs2/lops.h b/fs/gfs2/lops.h
> index 06793e3..b8a3886 100644
> --- a/fs/gfs2/lops.h
> +++ b/fs/gfs2/lops.h
> @@ -30,6 +30,7 @@ extern const struct gfs2_log_operations *gfs2_log_ops[];
>   extern void gfs2_log_write_page(struct gfs2_sbd *sdp, struct page *page);
>   extern void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int rw);
>   extern void gfs2_pin(struct gfs2_sbd *sdp, struct buffer_head *bh);
> +extern void gfs2_log_read(struct gfs2_jdesc *jd);
>   
>   static inline unsigned int buf_limit(struct gfs2_sbd *sdp)
>   {
> diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
> index 4b042db..4fd2b23 100644
> --- a/fs/gfs2/recovery.c
> +++ b/fs/gfs2/recovery.c
> @@ -118,7 +118,7 @@ void gfs2_revoke_clean(struct gfs2_jdesc *jd)
>   	}
>   }
>   
> -static int gfs2_log_header_in(struct gfs2_log_header_host *lh, const void *buf)
> +int gfs2_log_header_in(struct gfs2_log_header_host *lh, const void *buf)
>   {
>   	const struct gfs2_log_header *str = buf;
>   
> @@ -177,126 +177,26 @@ static int get_log_header(struct gfs2_jdesc *jd, unsigned int blk,
>   }
>   
>   /**
> - * find_good_lh - find a good log header
> - * @jd: the journal
> - * @blk: the segment to start searching from
> - * @lh: the log header to fill in
> - * @forward: if true search forward in the log, else search backward
> - *
> - * Call get_log_header() to get a log header for a segment, but if the
> - * segment is bad, either scan forward or backward until we find a good one.
> - *
> - * Returns: errno
> - */
> -
> -static int find_good_lh(struct gfs2_jdesc *jd, unsigned int *blk,
> -			struct gfs2_log_header_host *head)
> -{
> -	unsigned int orig_blk = *blk;
> -	int error;
> -
> -	for (;;) {
> -		error = get_log_header(jd, *blk, head);
> -		if (error <= 0)
> -			return error;
> -
> -		if (++*blk == jd->jd_blocks)
> -			*blk = 0;
> -
> -		if (*blk == orig_blk) {
> -			gfs2_consist_inode(GFS2_I(jd->jd_inode));
> -			return -EIO;
> -		}
> -	}
> -}
> -
> -/**
> - * jhead_scan - make sure we've found the head of the log
> - * @jd: the journal
> - * @head: this is filled in with the log descriptor of the head
> - *
> - * At this point, seg and lh should be either the head of the log or just
> - * before.  Scan forward until we find the head.
> - *
> - * Returns: errno
> - */
> -
> -static int jhead_scan(struct gfs2_jdesc *jd, struct gfs2_log_header_host *head)
> -{
> -	unsigned int blk = head->lh_blkno;
> -	struct gfs2_log_header_host lh;
> -	int error;
> -
> -	for (;;) {
> -		if (++blk == jd->jd_blocks)
> -			blk = 0;
> -
> -		error = get_log_header(jd, blk, &lh);
> -		if (error < 0)
> -			return error;
> -		if (error == 1)
> -			continue;
> -
> -		if (lh.lh_sequence == head->lh_sequence) {
> -			gfs2_consist_inode(GFS2_I(jd->jd_inode));
> -			return -EIO;
> -		}
> -		if (lh.lh_sequence < head->lh_sequence)
> -			break;
> -
> -		*head = lh;
> -	}
> -
> -	return 0;
> -}
> -
> -/**
>    * gfs2_find_jhead - find the head of a log
>    * @jd: the journal
>    * @head: the log descriptor for the head of the log is returned here
>    *
> - * Do a binary search of a journal and find the valid log entry with the
> + * Do a search of a journal and find the valid log entry with the
>    * highest sequence number.  (i.e. the log head)
>    *
>    * Returns: errno
>    */
> -
>   int gfs2_find_jhead(struct gfs2_jdesc *jd, struct gfs2_log_header_host *head)
>   {
> -	struct gfs2_log_header_host lh_1, lh_m;
> -	u32 blk_1, blk_2, blk_m;
> -	int error;
> +	memset(&jd->jd_jhead, 0, sizeof(struct gfs2_log_header_host));
> +	set_bit(JDF_JHEAD, &jd->jd_flags);
> +	gfs2_log_read(jd);
>   
> -	blk_1 = 0;
> -	blk_2 = jd->jd_blocks - 1;
> +	if (test_bit(JDF_JHEAD, &jd->jd_flags))
> +		wait_on_bit(&jd->jd_flags, JDF_JHEAD, TASK_INTERRUPTIBLE);
>   
> -	for (;;) {
> -		blk_m = (blk_1 + blk_2) / 2;
> -
> -		error = find_good_lh(jd, &blk_1, &lh_1);
> -		if (error)
> -			return error;
> -
> -		error = find_good_lh(jd, &blk_m, &lh_m);
> -		if (error)
> -			return error;
> -
> -		if (blk_1 == blk_m || blk_m == blk_2)
> -			break;
> -
> -		if (lh_1.lh_sequence <= lh_m.lh_sequence)
> -			blk_1 = blk_m;
> -		else
> -			blk_2 = blk_m;
> -	}
> -
> -	error = jhead_scan(jd, &lh_1);
> -	if (error)
> -		return error;
> -
> -	*head = lh_1;
> -
> -	return error;
> +	*head = jd->jd_jhead;
> +	return 0;
>   }
>   
>   /**
> diff --git a/fs/gfs2/recovery.h b/fs/gfs2/recovery.h
> index 11fdfab..cd691ff 100644
> --- a/fs/gfs2/recovery.h
> +++ b/fs/gfs2/recovery.h
> @@ -29,6 +29,7 @@ extern void gfs2_revoke_clean(struct gfs2_jdesc *jd);
>   
>   extern int gfs2_find_jhead(struct gfs2_jdesc *jd,
>   		    struct gfs2_log_header_host *head);
> +extern int gfs2_log_header_in(struct gfs2_log_header_host *lh, const void *buf);
>   extern int gfs2_recover_journal(struct gfs2_jdesc *gfs2_jd, bool wait);
>   extern void gfs2_recover_func(struct work_struct *work);
>   



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-05-29 10:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-05-29  8:05 [Cluster-devel] [RFC PATCH 0/3] Speed up journal head lookup Abhi Das
2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 1/3] gfs2: allow map_journal_extents() to take a journal descriptor as argument Abhi Das
2018-05-29  9:57   ` Steven Whitehouse
2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 2/3] gfs2: add timing info for various stages of journal recovery Abhi Das
2018-05-29  9:59   ` Steven Whitehouse
2018-05-29  8:05 ` [Cluster-devel] [RFC PATCH 3/3] gfs2: read journal in large chunks to locate the head Abhi Das
2018-05-29 10:15   ` Steven Whitehouse
2018-05-29  9:57 ` [Cluster-devel] [RFC PATCH 0/3] Speed up journal head lookup Steven Whitehouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).