From: Srinivas Eeda <srinivas.eeda@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix lock migration crash
Date: Wed, 26 Feb 2014 18:10:57 -0800
Message-ID: <530E9EB1.7020409@oracle.com>
In-Reply-To: <1393400857-12294-1-git-send-email-junxiao.bi@oracle.com>

Looks good. Thanks, Junxiao, for fixing this.

Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>

On 02/25/2014 11:47 PM, Junxiao Bi wrote:
> This issue was introduced by commit 800deef3, which replaced list_for_each
> with list_for_each_entry. If the "tmpq" list is empty, the variable "lock"
> points to invalid data and a panic is triggered.
> Sunil advised reverting that change, but the old version was not right either.
> At the end of the outer for loop, that list_for_each_entry also leaves "lock"
> pointing to invalid data; then, in the next iteration, if the "tmpq" list is
> empty, "lock" holds stale invalid data and causes the panic. So revert to
> list_for_each and reset "lock" to NULL to fix this issue.
>
> Another concern is that this seems impossible, because the "tmpq" list should
> not be empty. Let me describe how it can happen.
> old lock resource owner(node 1):                                  migration target(node 2):
> Imagine there is a lockres with an EX lock from node 2 in
> the granted list, and an NR lock from node x with convert_type
> EX in the converting list.
> dlm_empty_lockres() {
>   dlm_pick_migration_target() {
>     pick node 2 as target as its lock is the first one
>     in granted list.
>   }
>   dlm_migrate_lockres() {
>     dlm_mark_lockres_migrating() {
>       res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
>       wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
>       // after the above code, the lockres can no longer be dirtied,
>       // so the dlm_thread list shuffle will not run
>                                                                     downconvert lock from EX to NR
>                                                                     upconvert lock from NR to EX
> <<< migration may be scheduled out here; node 2 then sends a
> <<< down-convert request to convert the type from EX to NR, followed
> <<< by an up-convert request to convert it from NR to EX. At this time
> <<< the lockres granted list is empty and there are two locks in the
> <<< converting list: node x's up-convert lock followed by node 2's
> <<< up-convert lock.
>
>       // this will set the lockres RES_MIGRATING flag, so the
>       // following lock/unlock can not run
>       dlm_lockres_release_ast(dlm, res);
>     }
>
>     dlm_send_one_lockres()
>                                                                   dlm_process_recovery_data()
>                                                                     for (i=0; i<mres->num_locks; i++)
>                                                                       if (ml->node == dlm->node_num)
>                                                                         for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
>                                                                          list_for_each_entry(lock, tmpq, list)
>                                                                          if (lock) break; <<< lock is invalid as grant list is empty.
>                                                                         }
>                                                                         if (lock->ml.node != ml->node)
>                                                                           BUG() >>> crash here
>   }
> I saw the above lock status in a vmcore from one of our internal bugs.
>
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> Cc: Sunil Mushran <sunil.mushran@gmail.com>
> Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
> Cc: <stable@vger.kernel.org>
> ---
>   fs/ocfs2/dlm/dlmrecovery.c |   14 ++++++++------
>   1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index 7035af0..c2dd258 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -1750,13 +1750,13 @@ static int dlm_process_recovery_data(struct dlm_ctxt *dlm,
>   				     struct dlm_migratable_lockres *mres)
>   {
>   	struct dlm_migratable_lock *ml;
> -	struct list_head *queue;
> +	struct list_head *queue, *iter;
>   	struct list_head *tmpq = NULL;
>   	struct dlm_lock *newlock = NULL;
>   	struct dlm_lockstatus *lksb = NULL;
>   	int ret = 0;
>   	int i, j, bad;
> -	struct dlm_lock *lock = NULL;
> +	struct dlm_lock *lock;
>   	u8 from = O2NM_MAX_NODES;
>   	unsigned int added = 0;
>   	__be64 c;
> @@ -1791,14 +1791,16 @@ static int dlm_process_recovery_data(struct dlm_ctxt *dlm,
>   			/* MIGRATION ONLY! */
>   			BUG_ON(!(mres->flags & DLM_MRES_MIGRATION));
>   
> +			lock = NULL;
>   			spin_lock(&res->spinlock);
>   			for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
>   				tmpq = dlm_list_idx_to_ptr(res, j);
> -				list_for_each_entry(lock, tmpq, list) {
> -					if (lock->ml.cookie != ml->cookie)
> -						lock = NULL;
> -					else
> +				list_for_each(iter, tmpq) {
> +					lock = list_entry(iter,
> +						  struct dlm_lock, list);
> +					if (lock->ml.cookie == ml->cookie)
>   						break;
> +					lock = NULL;
>   				}
>   				if (lock)
>   					break;
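
For readers following the thread, the search pattern the patch restores can be sketched in userspace C. This is only an illustration under stated assumptions: `struct demo_lock`, `find_lock_by_cookie`, and `NUM_QUEUES` are hypothetical names, and the list helpers are simplified stand-ins for the kernel's <linux/list.h>, not the kernel code itself. The point it shows is that the cursor starts as NULL and is reset to NULL after every non-matching entry, so an empty queue can never leave it pointing at stale or invalid data.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical, cut-down lock: only the fields the search needs. */
struct demo_lock {
	unsigned long long cookie;
	struct list_head list;
};

#define NUM_QUEUES 3	/* granted, converting, blocked */

/*
 * Return the lock whose cookie matches, or NULL if no queue holds it.
 * "lock" is only ever non-NULL while it points at a real entry.
 */
static struct demo_lock *find_lock_by_cookie(struct list_head *queues,
					     unsigned long long cookie)
{
	struct demo_lock *lock = NULL;
	struct list_head *iter;
	int j;

	for (j = 0; j < NUM_QUEUES; j++) {
		list_for_each(iter, &queues[j]) {
			lock = list_entry(iter, struct demo_lock, list);
			if (lock->cookie == cookie)
				break;
			lock = NULL;	/* not a match: forget it */
		}
		if (lock)
			break;
	}
	return lock;
}
```

With every queue empty, the inner loop body never runs and the function returns NULL, which is the case that previously led to the BUG() described above.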

Thread overview: 5+ messages
2014-02-26  7:47 [Ocfs2-devel] [PATCH] ocfs2: dlm: fix lock migration crash Junxiao Bi
2014-02-26  7:51 ` Wengang
2014-02-27  0:48 ` Andrew Morton
2014-02-27  1:24   ` Junxiao Bi
2014-02-27  2:10 ` Srinivas Eeda [this message]
