From: Brian Foster <bfoster@redhat.com>
To: zwu.kernel@gmail.com
Cc: linux-fsdevel@vger.kernel.org,
Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [PATCH v2] xfs: introduce object readahead to log recovery
Date: Tue, 30 Jul 2013 09:10:02 -0400 [thread overview]
Message-ID: <51F7BB2A.2090803@redhat.com> (raw)
In-Reply-To: <1375178347-29037-1-git-send-email-zwu.kernel@gmail.com>
On 07/30/2013 05:59 AM, zwu.kernel@gmail.com wrote:
> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>
> It can take a long time to run log recovery operation because it is
> single threaded and is bound by read latency. We can find that it took
> most of the time to wait for the read IO to occur, so if one object
> readahead is introduced to log recovery, it will obviously reduce the
> log recovery time.
>
> Log recovery time stat:
>
> w/o this patch w/ this patch
>
> real: 0m15.023s 0m7.802s
> user: 0m0.001s 0m0.001s
> sys: 0m0.246s 0m0.107s
>
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> ---
Cool patch. I'm not terribly familiar with the log recovery code so take
my comments with a grain of salt, but a couple things I noticed on a
quick skim...
> fs/xfs/xfs_log_recover.c | 162 +++++++++++++++++++++++++++++++++++++++++++++--
> fs/xfs/xfs_log_recover.h | 2 +
> 2 files changed, 159 insertions(+), 5 deletions(-)
>
...
>
> +STATIC int
> +xlog_recover_items_pass2(
> + struct xlog *log,
> + struct xlog_recover *trans,
> + struct list_head *buffer_list,
> + struct list_head *ra_list)
A nit, but technically this function doesn't have to be involved with
readahead. Perhaps rename ra_list to item_list..?
> +{
> + int error = 0;
> + xlog_recover_item_t *item;
> +
> + list_for_each_entry(item, ra_list, ri_list) {
> + error = xlog_recover_commit_pass2(log, trans,
> + buffer_list, item);
> + if (error)
> + return error;
> + }
> +
> + return error;
> +}
> +
> /*
> * Perform the transaction.
> *
> @@ -3189,9 +3314,11 @@ xlog_recover_commit_trans(
> struct xlog_recover *trans,
> int pass)
> {
> - int error = 0, error2;
> - xlog_recover_item_t *item;
> + int error = 0, error2, ra_qdepth = 0;
> + xlog_recover_item_t *item, *next;
> LIST_HEAD (buffer_list);
> + LIST_HEAD (ra_list);
> + LIST_HEAD (all_ra_list);
>
> hlist_del(&trans->r_list);
>
> @@ -3199,14 +3326,21 @@ xlog_recover_commit_trans(
> if (error)
> return error;
>
> - list_for_each_entry(item, &trans->r_itemq, ri_list) {
> + list_for_each_entry_safe(item, next, &trans->r_itemq, ri_list) {
> switch (pass) {
> case XLOG_RECOVER_PASS1:
> error = xlog_recover_commit_pass1(log, trans, item);
> break;
> case XLOG_RECOVER_PASS2:
> - error = xlog_recover_commit_pass2(log, trans,
> - &buffer_list, item);
> + if (ra_qdepth++ >= XLOG_RECOVER_MAX_QDEPTH) {
The counting mechanism looks strange and easy to break with future
changes. Why not increment ra_qdepth in the else bracket where it is
explicitly tied to the operation it tracks?
> + error = xlog_recover_items_pass2(log, trans,
> + &buffer_list, &ra_list);
> + list_splice_tail_init(&ra_list, &all_ra_list);
> + ra_qdepth = 0;
So now we've queued up a bunch of items we've issued readahead on in
ra_list and we've executed the recovery on the list. What happens to the
current item?
Brian
> + } else {
> + xlog_recover_ra_pass2(log, item);
> + list_move_tail(&item->ri_list, &ra_list);
> + }
> break;
> default:
> ASSERT(0);
> @@ -3216,9 +3350,27 @@ xlog_recover_commit_trans(
> goto out;
> }
>
> + if (!list_empty(&ra_list)) {
> + error = xlog_recover_items_pass2(log, trans,
> + &buffer_list, &ra_list);
> + if (error)
> + goto out;
> +
> + list_splice_tail_init(&ra_list, &all_ra_list);
> + }
> +
> + if (!list_empty(&all_ra_list))
> + list_splice_init(&all_ra_list, &trans->r_itemq);
> +
> xlog_recover_free_trans(trans);
>
> out:
> + if (!list_empty(&ra_list))
> + list_splice_tail_init(&ra_list, &all_ra_list);
> +
> + if (!list_empty(&all_ra_list))
> + list_splice_init(&all_ra_list, &trans->r_itemq);
> +
> error2 = xfs_buf_delwri_submit(&buffer_list);
> return error ? error : error2;
> }
> diff --git a/fs/xfs/xfs_log_recover.h b/fs/xfs/xfs_log_recover.h
> index 1c55ccb..16322f6 100644
> --- a/fs/xfs/xfs_log_recover.h
> +++ b/fs/xfs/xfs_log_recover.h
> @@ -63,4 +63,6 @@ typedef struct xlog_recover {
> #define XLOG_RECOVER_PASS1 1
> #define XLOG_RECOVER_PASS2 2
>
> +#define XLOG_RECOVER_MAX_QDEPTH 100
> +
> #endif /* __XFS_LOG_RECOVER_H__ */
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
WARNING: multiple messages have this Message-ID (diff)
From: Brian Foster <bfoster@redhat.com>
To: zwu.kernel@gmail.com
Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org,
Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] xfs: introduce object readahead to log recovery
Date: Tue, 30 Jul 2013 09:10:02 -0400 [thread overview]
Message-ID: <51F7BB2A.2090803@redhat.com> (raw)
In-Reply-To: <1375178347-29037-1-git-send-email-zwu.kernel@gmail.com>
On 07/30/2013 05:59 AM, zwu.kernel@gmail.com wrote:
> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>
> It can take a long time to run log recovery operation because it is
> single threaded and is bound by read latency. We can find that it took
> most of the time to wait for the read IO to occur, so if one object
> readahead is introduced to log recovery, it will obviously reduce the
> log recovery time.
>
> Log recovery time stat:
>
> w/o this patch w/ this patch
>
> real: 0m15.023s 0m7.802s
> user: 0m0.001s 0m0.001s
> sys: 0m0.246s 0m0.107s
>
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> ---
Cool patch. I'm not terribly familiar with the log recovery code so take
my comments with a grain of salt, but a couple things I noticed on a
quick skim...
> fs/xfs/xfs_log_recover.c | 162 +++++++++++++++++++++++++++++++++++++++++++++--
> fs/xfs/xfs_log_recover.h | 2 +
> 2 files changed, 159 insertions(+), 5 deletions(-)
>
...
>
> +STATIC int
> +xlog_recover_items_pass2(
> + struct xlog *log,
> + struct xlog_recover *trans,
> + struct list_head *buffer_list,
> + struct list_head *ra_list)
A nit, but technically this function doesn't have to be involved with
readahead. Perhaps rename ra_list to item_list..?
> +{
> + int error = 0;
> + xlog_recover_item_t *item;
> +
> + list_for_each_entry(item, ra_list, ri_list) {
> + error = xlog_recover_commit_pass2(log, trans,
> + buffer_list, item);
> + if (error)
> + return error;
> + }
> +
> + return error;
> +}
> +
> /*
> * Perform the transaction.
> *
> @@ -3189,9 +3314,11 @@ xlog_recover_commit_trans(
> struct xlog_recover *trans,
> int pass)
> {
> - int error = 0, error2;
> - xlog_recover_item_t *item;
> + int error = 0, error2, ra_qdepth = 0;
> + xlog_recover_item_t *item, *next;
> LIST_HEAD (buffer_list);
> + LIST_HEAD (ra_list);
> + LIST_HEAD (all_ra_list);
>
> hlist_del(&trans->r_list);
>
> @@ -3199,14 +3326,21 @@ xlog_recover_commit_trans(
> if (error)
> return error;
>
> - list_for_each_entry(item, &trans->r_itemq, ri_list) {
> + list_for_each_entry_safe(item, next, &trans->r_itemq, ri_list) {
> switch (pass) {
> case XLOG_RECOVER_PASS1:
> error = xlog_recover_commit_pass1(log, trans, item);
> break;
> case XLOG_RECOVER_PASS2:
> - error = xlog_recover_commit_pass2(log, trans,
> - &buffer_list, item);
> + if (ra_qdepth++ >= XLOG_RECOVER_MAX_QDEPTH) {
The counting mechanism looks strange and easy to break with future
changes. Why not increment ra_qdepth in the else bracket where it is
explicitly tied to the operation it tracks?
> + error = xlog_recover_items_pass2(log, trans,
> + &buffer_list, &ra_list);
> + list_splice_tail_init(&ra_list, &all_ra_list);
> + ra_qdepth = 0;
So now we've queued up a bunch of items we've issued readahead on in
ra_list and we've executed the recovery on the list. What happens to the
current item?
Brian
> + } else {
> + xlog_recover_ra_pass2(log, item);
> + list_move_tail(&item->ri_list, &ra_list);
> + }
> break;
> default:
> ASSERT(0);
> @@ -3216,9 +3350,27 @@ xlog_recover_commit_trans(
> goto out;
> }
>
> + if (!list_empty(&ra_list)) {
> + error = xlog_recover_items_pass2(log, trans,
> + &buffer_list, &ra_list);
> + if (error)
> + goto out;
> +
> + list_splice_tail_init(&ra_list, &all_ra_list);
> + }
> +
> + if (!list_empty(&all_ra_list))
> + list_splice_init(&all_ra_list, &trans->r_itemq);
> +
> xlog_recover_free_trans(trans);
>
> out:
> + if (!list_empty(&ra_list))
> + list_splice_tail_init(&ra_list, &all_ra_list);
> +
> + if (!list_empty(&all_ra_list))
> + list_splice_init(&all_ra_list, &trans->r_itemq);
> +
> error2 = xfs_buf_delwri_submit(&buffer_list);
> return error ? error : error2;
> }
> diff --git a/fs/xfs/xfs_log_recover.h b/fs/xfs/xfs_log_recover.h
> index 1c55ccb..16322f6 100644
> --- a/fs/xfs/xfs_log_recover.h
> +++ b/fs/xfs/xfs_log_recover.h
> @@ -63,4 +63,6 @@ typedef struct xlog_recover {
> #define XLOG_RECOVER_PASS1 1
> #define XLOG_RECOVER_PASS2 2
>
> +#define XLOG_RECOVER_MAX_QDEPTH 100
> +
> #endif /* __XFS_LOG_RECOVER_H__ */
>
next prev parent reply other threads:[~2013-07-30 13:13 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-30 9:59 [PATCH v2] xfs: introduce object readahead to log recovery zwu.kernel
2013-07-30 9:59 ` zwu.kernel
2013-07-30 13:10 ` Brian Foster [this message]
2013-07-30 13:10 ` Brian Foster
2013-07-30 22:36 ` Zhi Yong Wu
2013-07-30 22:36 ` Zhi Yong Wu
2013-07-30 23:11 ` Dave Chinner
2013-07-30 23:11 ` Dave Chinner
2013-07-31 4:07 ` Zhi Yong Wu
2013-07-31 4:07 ` Zhi Yong Wu
2013-07-31 13:35 ` Ben Myers
2013-07-31 13:35 ` Ben Myers
2013-07-31 14:17 ` Zhi Yong Wu
2013-07-31 14:17 ` Zhi Yong Wu
2013-07-31 23:11 ` Dave Chinner
2013-07-31 23:11 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51F7BB2A.2090803@redhat.com \
--to=bfoster@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=wuzhy@linux.vnet.ibm.com \
--cc=xfs@oss.sgi.com \
--cc=zwu.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.