From: Srinivas Eeda <srinivas.eeda@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix lock migration crash
Date: Wed, 26 Feb 2014 18:10:57 -0800 [thread overview]
Message-ID: <530E9EB1.7020409@oracle.com> (raw)
In-Reply-To: <1393400857-12294-1-git-send-email-junxiao.bi@oracle.com>
looks good, thanks Junxiao for fixing this.
Reviewed-by: Srinivas Eeda<srinivas.eeda@oracle.com>
On 02/25/2014 11:47 PM, Junxiao Bi wrote:
> This issue was introduced by commit 800deef3 where it replaced list_for_each
> with list_for_each_entry. The variable "lock" will point to invalid data if
> "tmpq" list is empty and a panic will be triggered due to this.
> Sunil advised reverting it back, but the old version was also not right. At
> the end of the outer for loop, that list_for_each_entry will also set "lock"
> to an invalid data, then in the next loop, if the "tmpq" list is empty, "lock"
> will be an stale invalid data and cause the panic. So reverting the list_for_each
> back and reset "lock" to NULL to fix this issue.
>
> Another concern is that this seemes can not happen because the "tmpq" list should
> not be empty. Let me describe how.
> old lock resource owner(node 1): migratation target(node 2):
> image there's lockres with a EX lock from node 2 in
> granted list, a NR lock from node x with convert_type
> EX in converting list.
> dlm_empty_lockres() {
> dlm_pick_migration_target() {
> pick node 2 as target as its lock is the first one
> in granted list.
> }
> dlm_migrate_lockres() {
> dlm_mark_lockres_migrating() {
> res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
> wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
> //after the above code, we can not dirty lockres any more,
> // so dlm_thread shuffle list will not run
> downconvert lock from EX to NR
> upconvert lock from NR to EX
> <<< migration may schedule out here, then
> <<< node 2 send down convert request to convert type from EX to
> <<< NR, then send up convert request to convert type from NR to
> <<< EX,@this time, lockres granted list is empty, and two locks
> <<< in the converting list, node x up convert lock followed by
> <<< node 2 up convert lock.
>
> // will set lockres RES_MIGRATING flag, the following
> // lock/unlock can not run
> dlm_lockres_release_ast(dlm, res);
> }
>
> dlm_send_one_lockres()
> dlm_process_recovery_data()
> for (i=0; i<mres->num_locks; i++)
> if (ml->node == dlm->node_num)
> for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
> list_for_each_entry(lock, tmpq, list)
> if (lock) break; <<< lock is invalid as grant list is empty.
> }
> if (lock->ml.node != ml->node)
> BUG() >>> crash here
> }
> I see the above locks status from a vmcore of our internal bug.
>
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> Cc: Sunil Mushran <sunil.mushran@gmail.com>
> Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
> Cc: <stable@vger.kernel.org>
> ---
> fs/ocfs2/dlm/dlmrecovery.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index 7035af0..c2dd258 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -1750,13 +1750,13 @@ static int dlm_process_recovery_data(struct dlm_ctxt *dlm,
> struct dlm_migratable_lockres *mres)
> {
> struct dlm_migratable_lock *ml;
> - struct list_head *queue;
> + struct list_head *queue, *iter;
> struct list_head *tmpq = NULL;
> struct dlm_lock *newlock = NULL;
> struct dlm_lockstatus *lksb = NULL;
> int ret = 0;
> int i, j, bad;
> - struct dlm_lock *lock = NULL;
> + struct dlm_lock *lock;
> u8 from = O2NM_MAX_NODES;
> unsigned int added = 0;
> __be64 c;
> @@ -1791,14 +1791,16 @@ static int dlm_process_recovery_data(struct dlm_ctxt *dlm,
> /* MIGRATION ONLY! */
> BUG_ON(!(mres->flags & DLM_MRES_MIGRATION));
>
> + lock = NULL;
> spin_lock(&res->spinlock);
> for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
> tmpq = dlm_list_idx_to_ptr(res, j);
> - list_for_each_entry(lock, tmpq, list) {
> - if (lock->ml.cookie != ml->cookie)
> - lock = NULL;
> - else
> + list_for_each(iter, tmpq) {
> + lock = list_entry(iter,
> + struct dlm_lock, list);
> + if (lock->ml.cookie == ml->cookie)
> break;
> + lock = NULL;
> }
> if (lock)
> break;
prev parent reply other threads:[~2014-02-27 2:10 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-26 7:47 [Ocfs2-devel] [PATCH] ocfs2: dlm: fix lock migration crash Junxiao Bi
2014-02-26 7:51 ` Wengang
2014-02-27 0:48 ` Andrew Morton
2014-02-27 1:24 ` Junxiao Bi
2014-02-27 2:10 ` Srinivas Eeda [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=530E9EB1.7020409@oracle.com \
--to=srinivas.eeda@oracle.com \
--cc=ocfs2-devel@oss.oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).