* [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
@ 2010-06-10 16:25 Wengang Wang
2010-06-25 1:55 ` Wengang Wang
2010-07-19 10:09 ` Wengang Wang
0 siblings, 2 replies; 11+ messages in thread
From: Wengang Wang @ 2010-06-10 16:25 UTC (permalink / raw)
To: ocfs2-devel
If dlm recovery takes place on a non-master node while purge work is in progress,
unexpected references can be left on some lockres' on the recovery master.
That is because we migrated those lockres' to the recovery master but did not send
it the corresponding deref requests (they were sent to the dead original master, or
to "UNKNOWN").
Fix:
For a lockres that is already in the process of dropping its reference, do not
migrate it to the recovery master; the purge work unhashes it instead.
For lockres' not yet dropping their reference, delay the purge work until recovery
has finished, so that the deref request can later be sent to the correct master
(the recovery master).
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
fs/ocfs2/dlm/dlmrecovery.c | 17 +++++++++++++++--
fs/ocfs2/dlm/dlmthread.c | 36 ++++++++++++++++++++++--------------
2 files changed, 37 insertions(+), 16 deletions(-)
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index f8b75ce..43530ce 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -1997,6 +1997,8 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
struct list_head *queue;
struct dlm_lock *lock, *next;
+ assert_spin_locked(&dlm->spinlock);
+ assert_spin_locked(&res->spinlock);
res->state |= DLM_LOCK_RES_RECOVERING;
if (!list_empty(&res->recovering)) {
mlog(0,
@@ -2336,9 +2338,20 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
/* the wake_up for this will happen when the
* RECOVERING flag is dropped later */
- res->state &= ~DLM_LOCK_RES_DROPPING_REF;
+ if (res->state & DLM_LOCK_RES_DROPPING_REF) {
+ /*
+ * don't migrate a lockres which is in
+ * progress of dropping ref
+ */
+ mlog(ML_NOTICE, "%.*s ignored for "
+ "migration\n", res->lockname.len,
+ res->lockname.name);
+ res->state &=
+ ~DLM_LOCK_RES_DROPPING_REF;
+ } else
+ dlm_move_lockres_to_recovery_list(dlm,
+ res);
- dlm_move_lockres_to_recovery_list(dlm, res);
} else if (res->owner == dlm->node_num) {
dlm_free_dead_locks(dlm, res, dead_node);
__dlm_lockres_calc_usage(dlm, res);
diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index d4f73ca..0771420 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -92,17 +92,23 @@ int __dlm_lockres_has_locks(struct dlm_lock_resource *res)
* truly ready to be freed. */
int __dlm_lockres_unused(struct dlm_lock_resource *res)
{
- if (!__dlm_lockres_has_locks(res) &&
- (list_empty(&res->dirty) && !(res->state & DLM_LOCK_RES_DIRTY))) {
- /* try not to scan the bitmap unless the first two
- * conditions are already true */
- int bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
- if (bit >= O2NM_MAX_NODES) {
- /* since the bit for dlm->node_num is not
- * set, inflight_locks better be zero */
- BUG_ON(res->inflight_locks != 0);
- return 1;
- }
+ int bit;
+
+ if (__dlm_lockres_has_locks(res))
+ return 0;
+
+ if (!list_empty(&res->dirty) || res->state & DLM_LOCK_RES_DIRTY)
+ return 0;
+
+ if (res->state & DLM_LOCK_RES_RECOVERING)
+ return 0;
+
+ bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
+ if (bit >= O2NM_MAX_NODES) {
+ /* since the bit for dlm->node_num is not
+ * set, inflight_locks better be zero */
+ BUG_ON(res->inflight_locks != 0);
+ return 1;
}
return 0;
}
@@ -158,6 +164,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
int master;
int ret = 0;
+ assert_spin_locked(&dlm->spinlock);
+
spin_lock(&res->spinlock);
if (!__dlm_lockres_unused(res)) {
mlog(0, "%s:%.*s: tried to purge but not unused\n",
@@ -216,13 +224,13 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
"master = %d\n", res->lockname.len, res->lockname.name,
res, master);
list_del_init(&res->purge);
- spin_unlock(&res->spinlock);
+ /* not the last ref */
dlm_lockres_put(res);
dlm->purge_count--;
- } else
- spin_unlock(&res->spinlock);
+ }
__dlm_unhash_lockres(res);
+ spin_unlock(&res->spinlock);
/* lockres is not in the hash now. drop the flag and wake up
* any processes waiting in dlm_get_lock_resource. */
--
1.6.6.1
* [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
2010-06-10 16:25 [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master Wengang Wang
@ 2010-06-25 1:55 ` Wengang Wang
2010-07-05 10:00 ` Wengang Wang
2010-07-19 10:09 ` Wengang Wang
1 sibling, 1 reply; 11+ messages in thread
From: Wengang Wang @ 2010-06-25 1:55 UTC (permalink / raw)
To: ocfs2-devel
Hi,
Any comment on this?
regards,
wengang.
On 10-06-11 00:25, Wengang Wang wrote:
> If the dlm recovery goes on the non-master node where purging work is going on,
> There could be unexpected reference left on some lockres' on recovery master.
> That is because we migrated the lockres' to recovery master but didn't send
> deref requests to it accordingly(was sending to the dead original master or to
> the "UNKNOWN").
>
> Fix:
> For the lockres which is in progress of dropping reference, we don't migrate it
> to recovery master and unhash the lockres in the purge work.
> For those not in progress of the dropping, delay the purge work until recovery
> finished so that it can send deref request to the correct master(recovery
> master) later.
>
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
> fs/ocfs2/dlm/dlmrecovery.c | 17 +++++++++++++++--
> fs/ocfs2/dlm/dlmthread.c | 36 ++++++++++++++++++++++--------------
> 2 files changed, 37 insertions(+), 16 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index f8b75ce..43530ce 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -1997,6 +1997,8 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
> struct list_head *queue;
> struct dlm_lock *lock, *next;
>
> + assert_spin_locked(&dlm->spinlock);
> + assert_spin_locked(&res->spinlock);
> res->state |= DLM_LOCK_RES_RECOVERING;
> if (!list_empty(&res->recovering)) {
> mlog(0,
> @@ -2336,9 +2338,20 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
>
> /* the wake_up for this will happen when the
> * RECOVERING flag is dropped later */
> - res->state &= ~DLM_LOCK_RES_DROPPING_REF;
> + if (res->state & DLM_LOCK_RES_DROPPING_REF) {
> + /*
> + * don't migrate a lockres which is in
> + * progress of dropping ref
> + */
> + mlog(ML_NOTICE, "%.*s ignored for "
> + "migration\n", res->lockname.len,
> + res->lockname.name);
> + res->state &=
> + ~DLM_LOCK_RES_DROPPING_REF;
> + } else
> + dlm_move_lockres_to_recovery_list(dlm,
> + res);
>
> - dlm_move_lockres_to_recovery_list(dlm, res);
> } else if (res->owner == dlm->node_num) {
> dlm_free_dead_locks(dlm, res, dead_node);
> __dlm_lockres_calc_usage(dlm, res);
> diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
> index d4f73ca..0771420 100644
> --- a/fs/ocfs2/dlm/dlmthread.c
> +++ b/fs/ocfs2/dlm/dlmthread.c
> @@ -92,17 +92,23 @@ int __dlm_lockres_has_locks(struct dlm_lock_resource *res)
> * truly ready to be freed. */
> int __dlm_lockres_unused(struct dlm_lock_resource *res)
> {
> - if (!__dlm_lockres_has_locks(res) &&
> - (list_empty(&res->dirty) && !(res->state & DLM_LOCK_RES_DIRTY))) {
> - /* try not to scan the bitmap unless the first two
> - * conditions are already true */
> - int bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
> - if (bit >= O2NM_MAX_NODES) {
> - /* since the bit for dlm->node_num is not
> - * set, inflight_locks better be zero */
> - BUG_ON(res->inflight_locks != 0);
> - return 1;
> - }
> + int bit;
> +
> + if (__dlm_lockres_has_locks(res))
> + return 0;
> +
> + if (!list_empty(&res->dirty) || res->state & DLM_LOCK_RES_DIRTY)
> + return 0;
> +
> + if (res->state & DLM_LOCK_RES_RECOVERING)
> + return 0;
> +
> + bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
> + if (bit >= O2NM_MAX_NODES) {
> + /* since the bit for dlm->node_num is not
> + * set, inflight_locks better be zero */
> + BUG_ON(res->inflight_locks != 0);
> + return 1;
> }
> return 0;
> }
> @@ -158,6 +164,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
> int master;
> int ret = 0;
>
> + assert_spin_locked(&dlm->spinlock);
> +
> spin_lock(&res->spinlock);
> if (!__dlm_lockres_unused(res)) {
> mlog(0, "%s:%.*s: tried to purge but not unused\n",
> @@ -216,13 +224,13 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
> "master = %d\n", res->lockname.len, res->lockname.name,
> res, master);
> list_del_init(&res->purge);
> - spin_unlock(&res->spinlock);
> + /* not the last ref */
> dlm_lockres_put(res);
> dlm->purge_count--;
> - } else
> - spin_unlock(&res->spinlock);
> + }
>
> __dlm_unhash_lockres(res);
> + spin_unlock(&res->spinlock);
>
> /* lockres is not in the hash now. drop the flag and wake up
> * any processes waiting in dlm_get_lock_resource. */
> --
> 1.6.6.1
>
>
* [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
2010-06-25 1:55 ` Wengang Wang
@ 2010-07-05 10:00 ` Wengang Wang
0 siblings, 0 replies; 11+ messages in thread
From: Wengang Wang @ 2010-07-05 10:00 UTC (permalink / raw)
To: ocfs2-devel
Any comment?
On 10-06-25 09:55, Wengang Wang wrote:
> Hi,
>
> Any comment on this?
>
> regards,
> wengang.
> On 10-06-11 00:25, Wengang Wang wrote:
> > If the dlm recovery goes on the non-master node where purging work is going on,
> > There could be unexpected reference left on some lockres' on recovery master.
> > That is because we migrated the lockres' to recovery master but didn't send
> > deref requests to it accordingly(was sending to the dead original master or to
> > the "UNKNOWN").
> >
> > Fix:
> > For the lockres which is in progress of dropping reference, we don't migrate it
> > to recovery master and unhash the lockres in the purge work.
> > For those not in progress of the dropping, delay the purge work until recovery
> > finished so that it can send deref request to the correct master(recovery
> > master) later.
> >
> > Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> > ---
> > fs/ocfs2/dlm/dlmrecovery.c | 17 +++++++++++++++--
> > fs/ocfs2/dlm/dlmthread.c | 36 ++++++++++++++++++++++--------------
> > 2 files changed, 37 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> > index f8b75ce..43530ce 100644
> > --- a/fs/ocfs2/dlm/dlmrecovery.c
> > +++ b/fs/ocfs2/dlm/dlmrecovery.c
> > @@ -1997,6 +1997,8 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
> > struct list_head *queue;
> > struct dlm_lock *lock, *next;
> >
> > + assert_spin_locked(&dlm->spinlock);
> > + assert_spin_locked(&res->spinlock);
> > res->state |= DLM_LOCK_RES_RECOVERING;
> > if (!list_empty(&res->recovering)) {
> > mlog(0,
> > @@ -2336,9 +2338,20 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
> >
> > /* the wake_up for this will happen when the
> > * RECOVERING flag is dropped later */
> > - res->state &= ~DLM_LOCK_RES_DROPPING_REF;
> > + if (res->state & DLM_LOCK_RES_DROPPING_REF) {
> > + /*
> > + * don't migrate a lockres which is in
> > + * progress of dropping ref
> > + */
> > + mlog(ML_NOTICE, "%.*s ignored for "
> > + "migration\n", res->lockname.len,
> > + res->lockname.name);
> > + res->state &=
> > + ~DLM_LOCK_RES_DROPPING_REF;
> > + } else
> > + dlm_move_lockres_to_recovery_list(dlm,
> > + res);
> >
> > - dlm_move_lockres_to_recovery_list(dlm, res);
> > } else if (res->owner == dlm->node_num) {
> > dlm_free_dead_locks(dlm, res, dead_node);
> > __dlm_lockres_calc_usage(dlm, res);
> > diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
> > index d4f73ca..0771420 100644
> > --- a/fs/ocfs2/dlm/dlmthread.c
> > +++ b/fs/ocfs2/dlm/dlmthread.c
> > @@ -92,17 +92,23 @@ int __dlm_lockres_has_locks(struct dlm_lock_resource *res)
> > * truly ready to be freed. */
> > int __dlm_lockres_unused(struct dlm_lock_resource *res)
> > {
> > - if (!__dlm_lockres_has_locks(res) &&
> > - (list_empty(&res->dirty) && !(res->state & DLM_LOCK_RES_DIRTY))) {
> > - /* try not to scan the bitmap unless the first two
> > - * conditions are already true */
> > - int bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
> > - if (bit >= O2NM_MAX_NODES) {
> > - /* since the bit for dlm->node_num is not
> > - * set, inflight_locks better be zero */
> > - BUG_ON(res->inflight_locks != 0);
> > - return 1;
> > - }
> > + int bit;
> > +
> > + if (__dlm_lockres_has_locks(res))
> > + return 0;
> > +
> > + if (!list_empty(&res->dirty) || res->state & DLM_LOCK_RES_DIRTY)
> > + return 0;
> > +
> > + if (res->state & DLM_LOCK_RES_RECOVERING)
> > + return 0;
> > +
> > + bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
> > + if (bit >= O2NM_MAX_NODES) {
> > + /* since the bit for dlm->node_num is not
> > + * set, inflight_locks better be zero */
> > + BUG_ON(res->inflight_locks != 0);
> > + return 1;
> > }
> > return 0;
> > }
> > @@ -158,6 +164,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
> > int master;
> > int ret = 0;
> >
> > + assert_spin_locked(&dlm->spinlock);
> > +
> > spin_lock(&res->spinlock);
> > if (!__dlm_lockres_unused(res)) {
> > mlog(0, "%s:%.*s: tried to purge but not unused\n",
> > @@ -216,13 +224,13 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
> > "master = %d\n", res->lockname.len, res->lockname.name,
> > res, master);
> > list_del_init(&res->purge);
> > - spin_unlock(&res->spinlock);
> > + /* not the last ref */
> > dlm_lockres_put(res);
> > dlm->purge_count--;
> > - } else
> > - spin_unlock(&res->spinlock);
> > + }
> >
> > __dlm_unhash_lockres(res);
> > + spin_unlock(&res->spinlock);
> >
> > /* lockres is not in the hash now. drop the flag and wake up
> > * any processes waiting in dlm_get_lock_resource. */
> > --
> > 1.6.6.1
> >
> >
* [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
2010-06-10 16:25 [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master Wengang Wang
2010-06-25 1:55 ` Wengang Wang
@ 2010-07-19 10:09 ` Wengang Wang
2010-07-19 23:52 ` Sunil Mushran
1 sibling, 1 reply; 11+ messages in thread
From: Wengang Wang @ 2010-07-19 10:09 UTC (permalink / raw)
To: ocfs2-devel
Any comment?
On 10-06-11 00:25, Wengang Wang wrote:
> If the dlm recovery goes on the non-master node where purging work is going on,
> There could be unexpected reference left on some lockres' on recovery master.
> That is because we migrated the lockres' to recovery master but didn't send
> deref requests to it accordingly(was sending to the dead original master or to
> the "UNKNOWN").
>
> Fix:
> For the lockres which is in progress of dropping reference, we don't migrate it
> to recovery master and unhash the lockres in the purge work.
> For those not in progress of the dropping, delay the purge work until recovery
> finished so that it can send deref request to the correct master(recovery
> master) later.
>
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
> fs/ocfs2/dlm/dlmrecovery.c | 17 +++++++++++++++--
> fs/ocfs2/dlm/dlmthread.c | 36 ++++++++++++++++++++++--------------
> 2 files changed, 37 insertions(+), 16 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index f8b75ce..43530ce 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -1997,6 +1997,8 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
> struct list_head *queue;
> struct dlm_lock *lock, *next;
>
> + assert_spin_locked(&dlm->spinlock);
> + assert_spin_locked(&res->spinlock);
> res->state |= DLM_LOCK_RES_RECOVERING;
> if (!list_empty(&res->recovering)) {
> mlog(0,
> @@ -2336,9 +2338,20 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
>
> /* the wake_up for this will happen when the
> * RECOVERING flag is dropped later */
> - res->state &= ~DLM_LOCK_RES_DROPPING_REF;
> + if (res->state & DLM_LOCK_RES_DROPPING_REF) {
> + /*
> + * don't migrate a lockres which is in
> + * progress of dropping ref
> + */
> + mlog(ML_NOTICE, "%.*s ignored for "
> + "migration\n", res->lockname.len,
> + res->lockname.name);
> + res->state &=
> + ~DLM_LOCK_RES_DROPPING_REF;
> + } else
> + dlm_move_lockres_to_recovery_list(dlm,
> + res);
>
> - dlm_move_lockres_to_recovery_list(dlm, res);
> } else if (res->owner == dlm->node_num) {
> dlm_free_dead_locks(dlm, res, dead_node);
> __dlm_lockres_calc_usage(dlm, res);
> diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
> index d4f73ca..0771420 100644
> --- a/fs/ocfs2/dlm/dlmthread.c
> +++ b/fs/ocfs2/dlm/dlmthread.c
> @@ -92,17 +92,23 @@ int __dlm_lockres_has_locks(struct dlm_lock_resource *res)
> * truly ready to be freed. */
> int __dlm_lockres_unused(struct dlm_lock_resource *res)
> {
> - if (!__dlm_lockres_has_locks(res) &&
> - (list_empty(&res->dirty) && !(res->state & DLM_LOCK_RES_DIRTY))) {
> - /* try not to scan the bitmap unless the first two
> - * conditions are already true */
> - int bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
> - if (bit >= O2NM_MAX_NODES) {
> - /* since the bit for dlm->node_num is not
> - * set, inflight_locks better be zero */
> - BUG_ON(res->inflight_locks != 0);
> - return 1;
> - }
> + int bit;
> +
> + if (__dlm_lockres_has_locks(res))
> + return 0;
> +
> + if (!list_empty(&res->dirty) || res->state & DLM_LOCK_RES_DIRTY)
> + return 0;
> +
> + if (res->state & DLM_LOCK_RES_RECOVERING)
> + return 0;
> +
> + bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
> + if (bit >= O2NM_MAX_NODES) {
> + /* since the bit for dlm->node_num is not
> + * set, inflight_locks better be zero */
> + BUG_ON(res->inflight_locks != 0);
> + return 1;
> }
> return 0;
> }
> @@ -158,6 +164,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
> int master;
> int ret = 0;
>
> + assert_spin_locked(&dlm->spinlock);
> +
> spin_lock(&res->spinlock);
> if (!__dlm_lockres_unused(res)) {
> mlog(0, "%s:%.*s: tried to purge but not unused\n",
> @@ -216,13 +224,13 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
> "master = %d\n", res->lockname.len, res->lockname.name,
> res, master);
> list_del_init(&res->purge);
> - spin_unlock(&res->spinlock);
> + /* not the last ref */
> dlm_lockres_put(res);
> dlm->purge_count--;
> - } else
> - spin_unlock(&res->spinlock);
> + }
>
> __dlm_unhash_lockres(res);
> + spin_unlock(&res->spinlock);
>
> /* lockres is not in the hash now. drop the flag and wake up
> * any processes waiting in dlm_get_lock_resource. */
> --
> 1.6.6.1
>
>
* [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
2010-07-19 10:09 ` Wengang Wang
@ 2010-07-19 23:52 ` Sunil Mushran
2010-07-20 2:59 ` Wengang Wang
0 siblings, 1 reply; 11+ messages in thread
From: Sunil Mushran @ 2010-07-19 23:52 UTC (permalink / raw)
To: ocfs2-devel
Srini has a patch that handles a race concerning DLM_LOCK_RES_DROPPING_REF.
You may want to look at this afresh with that patch.
My problem with this patch is that the description is not clear.
<<<<
That is because we migrated the lockres' to recovery master but didn't send
deref requests to it accordingly(was sending to the dead original master
or to
the "UNKNOWN").
>>>>
Do you have the message sequencing that would lead to this situation?
If we migrate the lockres to the reco master, the reco master will send
an assert that will make us change the master.
I think your problem is the one race we have concerning reco/migration.
If so, this fix is not enough.
Sunil
On 07/19/2010 03:09 AM, Wengang Wang wrote:
> Any comment?
>
> On 10-06-11 00:25, Wengang Wang wrote:
>> If the dlm recovery goes on the non-master node where purging work is going on,
>> There could be unexpected reference left on some lockres' on recovery master.
>> That is because we migrated the lockres' to recovery master but didn't send
>> deref requests to it accordingly(was sending to the dead original master or to
>> the "UNKNOWN").
>>
>> Fix:
>> For the lockres which is in progress of dropping reference, we don't migrate it
>> to recovery master and unhash the lockres in the purge work.
>> For those not in progress of the dropping, delay the purge work until recovery
>> finished so that it can send deref request to the correct master(recovery
>> master) later.
>>
>> Signed-off-by: Wengang Wang<wen.gang.wang@oracle.com>
>> ---
>> fs/ocfs2/dlm/dlmrecovery.c | 17 +++++++++++++++--
>> fs/ocfs2/dlm/dlmthread.c | 36 ++++++++++++++++++++++--------------
>> 2 files changed, 37 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
>> index f8b75ce..43530ce 100644
>> --- a/fs/ocfs2/dlm/dlmrecovery.c
>> +++ b/fs/ocfs2/dlm/dlmrecovery.c
>> @@ -1997,6 +1997,8 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
>> struct list_head *queue;
>> struct dlm_lock *lock, *next;
>>
>> + assert_spin_locked(&dlm->spinlock);
>> + assert_spin_locked(&res->spinlock);
>> res->state |= DLM_LOCK_RES_RECOVERING;
>> if (!list_empty(&res->recovering)) {
>> mlog(0,
>> @@ -2336,9 +2338,20 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
>>
>> /* the wake_up for this will happen when the
>> * RECOVERING flag is dropped later */
>> - res->state&= ~DLM_LOCK_RES_DROPPING_REF;
>> + if (res->state& DLM_LOCK_RES_DROPPING_REF) {
>> + /*
>> + * don't migrate a lockres which is in
>> + * progress of dropping ref
>> + */
>> + mlog(ML_NOTICE, "%.*s ignored for "
>> + "migration\n", res->lockname.len,
>> + res->lockname.name);
>> + res->state&=
>> + ~DLM_LOCK_RES_DROPPING_REF;
>> + } else
>> + dlm_move_lockres_to_recovery_list(dlm,
>> + res);
>>
>> - dlm_move_lockres_to_recovery_list(dlm, res);
>> } else if (res->owner == dlm->node_num) {
>> dlm_free_dead_locks(dlm, res, dead_node);
>> __dlm_lockres_calc_usage(dlm, res);
>> diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
>> index d4f73ca..0771420 100644
>> --- a/fs/ocfs2/dlm/dlmthread.c
>> +++ b/fs/ocfs2/dlm/dlmthread.c
>> @@ -92,17 +92,23 @@ int __dlm_lockres_has_locks(struct dlm_lock_resource *res)
>> * truly ready to be freed. */
>> int __dlm_lockres_unused(struct dlm_lock_resource *res)
>> {
>> - if (!__dlm_lockres_has_locks(res)&&
>> - (list_empty(&res->dirty)&& !(res->state& DLM_LOCK_RES_DIRTY))) {
>> - /* try not to scan the bitmap unless the first two
>> - * conditions are already true */
>> - int bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
>> - if (bit>= O2NM_MAX_NODES) {
>> - /* since the bit for dlm->node_num is not
>> - * set, inflight_locks better be zero */
>> - BUG_ON(res->inflight_locks != 0);
>> - return 1;
>> - }
>> + int bit;
>> +
>> + if (__dlm_lockres_has_locks(res))
>> + return 0;
>> +
>> + if (!list_empty(&res->dirty) || res->state& DLM_LOCK_RES_DIRTY)
>> + return 0;
>> +
>> + if (res->state& DLM_LOCK_RES_RECOVERING)
>> + return 0;
>> +
>> + bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
>> + if (bit>= O2NM_MAX_NODES) {
>> + /* since the bit for dlm->node_num is not
>> + * set, inflight_locks better be zero */
>> + BUG_ON(res->inflight_locks != 0);
>> + return 1;
>> }
>> return 0;
>> }
>> @@ -158,6 +164,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
>> int master;
>> int ret = 0;
>>
>> + assert_spin_locked(&dlm->spinlock);
>> +
>> spin_lock(&res->spinlock);
>> if (!__dlm_lockres_unused(res)) {
>> mlog(0, "%s:%.*s: tried to purge but not unused\n",
>> @@ -216,13 +224,13 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
>> "master = %d\n", res->lockname.len, res->lockname.name,
>> res, master);
>> list_del_init(&res->purge);
>> - spin_unlock(&res->spinlock);
>> + /* not the last ref */
>> dlm_lockres_put(res);
>> dlm->purge_count--;
>> - } else
>> - spin_unlock(&res->spinlock);
>> + }
>>
>> __dlm_unhash_lockres(res);
>> + spin_unlock(&res->spinlock);
>>
>> /* lockres is not in the hash now. drop the flag and wake up
>> * any processes waiting in dlm_get_lock_resource. */
>> --
>> 1.6.6.1
>>
>>
* [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
2010-07-19 23:52 ` Sunil Mushran
@ 2010-07-20 2:59 ` Wengang Wang
2010-07-20 22:33 ` Sunil Mushran
0 siblings, 1 reply; 11+ messages in thread
From: Wengang Wang @ 2010-07-20 2:59 UTC (permalink / raw)
To: ocfs2-devel
On 10-07-19 16:52, Sunil Mushran wrote:
> Srini has a patch that handles a race concerning DLM_LOCK_RES_DROPPING_REF.
> You may want to look a this afresh with that patch.
>
> My problem with this patch is that the description is not clear.
>
> <<<<
> That is because we migrated the lockres' to recovery master but didn't send
> deref requests to it accordingly(was sending to the dead original
> master or to
> the "UNKNOWN").
> >>>>
>
> Do you have the message sequencing that would lead to this situation?
> If we migrate the lockres to the reco master, the reco master will send
> an assert that will make us change the master.
So first, the problem is not about changing the owner. It is that
the bit (in the refmap) on behalf of the node in question is not cleared on the new
master (the recovery master), so the new master fails to purge the lockres
due to the stale bit in its refmap.
Second, I have no message log at hand for this situation, but I think it is simple
enough:
1) node A has no interest in lockres A any longer, so it is purging it.
2) the owner of lockres A is node B, so node A sends a deref message
to node B.
3) at this point node B crashes; node C becomes the recovery master and recovers
lockres A (because its master is the dead node B).
4) node A migrates lockres A to node C, which sets a refmap bit for node A there.
5) node A's deref message to node B fails because B crashed; the failure
is ignored and no further action is taken for lockres A.
So node A meant to drop its ref on the master, but in this situation node C
keeps the ref on behalf of node A unexpectedly. Node C then fails to purge
lockres A and hangs on umount.
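As a stand-alone illustration of that sequence (the node indices, the tiny
per-node refmap model and all names below are my own simplification, not
the ocfs2 sources):

#include <stdio.h>

#define MAX_NODES 8

struct lockres_copy {
	unsigned char refmap[MAX_NODES];	/* one slot per node, for clarity */
};

/* roughly the question a check like __dlm_lockres_unused() asks of the refmap */
static int refmap_empty(const struct lockres_copy *res)
{
	int i;

	for (i = 0; i < MAX_NODES; i++)
		if (res->refmap[i])
			return 0;	/* some node still holds a reference */
	return 1;
}

int main(void)
{
	enum { A = 0, B = 1, C = 2 };
	struct lockres_copy on_C = { { 0 } };	/* recovery master's copy */

	/* 1-2) node A purges the lockres and sends a deref to master B   */
	/* 3)   B dies before handling it; C becomes the recovery master  */
	/* 4)   A migrates the lockres to C, so C sets A's refmap bit     */
	on_C.refmap[A] = 1;
	/* 5)   A's deref was addressed to the dead node B and is simply  */
	/*      dropped; nothing ever clears A's bit on node C            */

	printf("node C can purge the lockres: %s\n",
	       refmap_empty(&on_C) ? "yes" : "no (purge blocked, umount hangs)");
	return 0;
}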
>
> I think your problem is the one race we have concerning reco/migration.
> If so, this fix is not enough.
It's a problem of purging + recovery; there is no pure migration for umount here.
So what's your concern?
regards,
wengang.
>
> Sunil
>
> On 07/19/2010 03:09 AM, Wengang Wang wrote:
> >Any comment?
> >
> >On 10-06-11 00:25, Wengang Wang wrote:
> >>If the dlm recovery goes on the non-master node where purging work is going on,
> >>There could be unexpected reference left on some lockres' on recovery master.
> >>That is because we migrated the lockres' to recovery master but didn't send
> >>deref requests to it accordingly(was sending to the dead original master or to
> >>the "UNKNOWN").
> >>
> >>Fix:
> >>For the lockres which is in progress of dropping reference, we don't migrate it
> >>to recovery master and unhash the lockres in the purge work.
> >>For those not in progress of the dropping, delay the purge work until recovery
> >>finished so that it can send deref request to the correct master(recovery
> >>master) later.
> >>
> >>Signed-off-by: Wengang Wang<wen.gang.wang@oracle.com>
> >>---
> >> fs/ocfs2/dlm/dlmrecovery.c | 17 +++++++++++++++--
> >> fs/ocfs2/dlm/dlmthread.c | 36 ++++++++++++++++++++++--------------
> >> 2 files changed, 37 insertions(+), 16 deletions(-)
> >>
> >>diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> >>index f8b75ce..43530ce 100644
> >>--- a/fs/ocfs2/dlm/dlmrecovery.c
> >>+++ b/fs/ocfs2/dlm/dlmrecovery.c
> >>@@ -1997,6 +1997,8 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
> >> struct list_head *queue;
> >> struct dlm_lock *lock, *next;
> >>
> >>+ assert_spin_locked(&dlm->spinlock);
> >>+ assert_spin_locked(&res->spinlock);
> >> res->state |= DLM_LOCK_RES_RECOVERING;
> >> if (!list_empty(&res->recovering)) {
> >> mlog(0,
> >>@@ -2336,9 +2338,20 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
> >>
> >> /* the wake_up for this will happen when the
> >> * RECOVERING flag is dropped later */
> >>- res->state&= ~DLM_LOCK_RES_DROPPING_REF;
> >>+ if (res->state& DLM_LOCK_RES_DROPPING_REF) {
> >>+ /*
> >>+ * don't migrate a lockres which is in
> >>+ * progress of dropping ref
> >>+ */
> >>+ mlog(ML_NOTICE, "%.*s ignored for "
> >>+ "migration\n", res->lockname.len,
> >>+ res->lockname.name);
> >>+ res->state&=
> >>+ ~DLM_LOCK_RES_DROPPING_REF;
> >>+ } else
> >>+ dlm_move_lockres_to_recovery_list(dlm,
> >>+ res);
> >>
> >>- dlm_move_lockres_to_recovery_list(dlm, res);
> >> } else if (res->owner == dlm->node_num) {
> >> dlm_free_dead_locks(dlm, res, dead_node);
> >> __dlm_lockres_calc_usage(dlm, res);
> >>diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
> >>index d4f73ca..0771420 100644
> >>--- a/fs/ocfs2/dlm/dlmthread.c
> >>+++ b/fs/ocfs2/dlm/dlmthread.c
> >>@@ -92,17 +92,23 @@ int __dlm_lockres_has_locks(struct dlm_lock_resource *res)
> >> * truly ready to be freed. */
> >> int __dlm_lockres_unused(struct dlm_lock_resource *res)
> >> {
> >>- if (!__dlm_lockres_has_locks(res)&&
> >>- (list_empty(&res->dirty)&& !(res->state& DLM_LOCK_RES_DIRTY))) {
> >>- /* try not to scan the bitmap unless the first two
> >>- * conditions are already true */
> >>- int bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
> >>- if (bit>= O2NM_MAX_NODES) {
> >>- /* since the bit for dlm->node_num is not
> >>- * set, inflight_locks better be zero */
> >>- BUG_ON(res->inflight_locks != 0);
> >>- return 1;
> >>- }
> >>+ int bit;
> >>+
> >>+ if (__dlm_lockres_has_locks(res))
> >>+ return 0;
> >>+
> >>+ if (!list_empty(&res->dirty) || res->state& DLM_LOCK_RES_DIRTY)
> >>+ return 0;
> >>+
> >>+ if (res->state& DLM_LOCK_RES_RECOVERING)
> >>+ return 0;
> >>+
> >>+ bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
> >>+ if (bit>= O2NM_MAX_NODES) {
> >>+ /* since the bit for dlm->node_num is not
> >>+ * set, inflight_locks better be zero */
> >>+ BUG_ON(res->inflight_locks != 0);
> >>+ return 1;
> >> }
> >> return 0;
> >> }
> >>@@ -158,6 +164,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
> >> int master;
> >> int ret = 0;
> >>
> >>+ assert_spin_locked(&dlm->spinlock);
> >>+
> >> spin_lock(&res->spinlock);
> >> if (!__dlm_lockres_unused(res)) {
> >> mlog(0, "%s:%.*s: tried to purge but not unused\n",
> >>@@ -216,13 +224,13 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
> >> "master = %d\n", res->lockname.len, res->lockname.name,
> >> res, master);
> >> list_del_init(&res->purge);
> >>- spin_unlock(&res->spinlock);
> >>+ /* not the last ref */
> >> dlm_lockres_put(res);
> >> dlm->purge_count--;
> >>- } else
> >>- spin_unlock(&res->spinlock);
> >>+ }
> >>
> >> __dlm_unhash_lockres(res);
> >>+ spin_unlock(&res->spinlock);
> >>
> >> /* lockres is not in the hash now. drop the flag and wake up
> >> * any processes waiting in dlm_get_lock_resource. */
> >>--
> >>1.6.6.1
> >>
> >>
>
* [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
2010-07-20 2:59 ` Wengang Wang
@ 2010-07-20 22:33 ` Sunil Mushran
2010-07-21 12:22 ` Wengang Wang
0 siblings, 1 reply; 11+ messages in thread
From: Sunil Mushran @ 2010-07-20 22:33 UTC (permalink / raw)
To: ocfs2-devel
On 07/19/2010 07:59 PM, Wengang Wang wrote:
>> Do you have the message sequencing that would lead to this situation?
>> If we migrate the lockres to the reco master, the reco master will send
>> an assert that will make us change the master.
>>
> So first, the problem is not about the changing owner. It is that
> the bit(in refmap) on behalf of the node in question is not cleared on the new
> master(recovery master). So that the new master will fail at purging the lockres
> due to the incorrect bit in refmap.
>
> Second, I have no messages at hand for the situation. But I think it is simple
> enough.
>
> 1) node A has no interest on lockres A any longer, so it is purging it.
> 2) the owner of lockres A is node B, so node A is sending de-ref message
> to node B.
> 3) at this time, node B crashed. node C becomes the recovery master. it recovers
> lockres A(because the master is the dead node B).
> 4) node A migrated lockres A to node C with a refbit there.
> 5) node A failed to send de-ref message to node B because it crashed. The failure
> is ignored. no other action is done for lockres A any more.
>
In dlm_do_local_recovery_cleanup(), we explicitly clear the flag
when the owner is the dead_node. So this should not happen.
Your patch changes the logic to exclude such lockres' from the
recovery list. That change, while possibly workable, needs
to be looked into more thoroughly.
In short, there is a disconnect between your description and your patch,
or my understanding.
> So node A means to drop the ref on the master. But in such a situation, node C
> keeps the ref on behalf of node A unexpectedly. Node C finally fails at purging
> lockres A and hang on umount.
>
>
>> I think your problem is the one race we have concerning reco/migration.
>> If so, this fix is not enough.
>>
> It's a problem of purging + recovery. no pure migration for umount here.
> So what's your concern?
>
See above.
* [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
2010-07-20 22:33 ` Sunil Mushran
@ 2010-07-21 12:22 ` Wengang Wang
2010-07-21 18:19 ` Sunil Mushran
0 siblings, 1 reply; 11+ messages in thread
From: Wengang Wang @ 2010-07-21 12:22 UTC (permalink / raw)
To: ocfs2-devel
On 10-07-20 15:33, Sunil Mushran wrote:
> On 07/19/2010 07:59 PM, Wengang Wang wrote:
> >>Do you have the message sequencing that would lead to this situation?
> >>If we migrate the lockres to the reco master, the reco master will send
> >>an assert that will make us change the master.
> >So first, the problem is not about the changing owner. It is that
> >the bit(in refmap) on behalf of the node in question is not cleared on the new
> >master(recovery master). So that the new master will fail at purging the lockres
> >due to the incorrect bit in refmap.
> >
> >Second, I have no messages at hand for the situation. But I think it is simple
> >enough.
> >
> >1) node A has no interest on lockres A any longer, so it is purging it.
> >2) the owner of lockres A is node B, so node A is sending de-ref message
> >to node B.
> >3) at this time, node B crashed. node C becomes the recovery master. it recovers
> >lockres A(because the master is the dead node B).
> >4) node A migrated lockres A to node C with a refbit there.
> >5) node A failed to send de-ref message to node B because it crashed. The failure
> >is ignored. no other action is done for lockres A any more.
>
> In dlm_do_local_recovery_cleanup(), we expicitly clear the flag...
> when the owner is the dead_node. So this should not happen.
It reproduces in my test environment.
Clearing the DLM_LOCK_RES_DROPPING_REF flag doesn't prevent anything;
dlm_do_local_recovery_cleanup() still moves the lockres to the recovery
list:
2337 /* the wake_up for this will happen when the
2338 * RECOVERING flag is dropped later */
2339 res->state &= ~DLM_LOCK_RES_DROPPING_REF;
2340
2341 dlm_move_lockres_to_recovery_list(dlm, res);
and dlm_purge_lockres() still unhashes the lockres:
202 ret = dlm_drop_lockres_ref(dlm, res);
203 if (ret < 0) {
204 mlog_errno(ret);
205 if (!dlm_is_host_down(ret))
206 BUG();
207 }
208 mlog(0, "%s:%.*s: dlm_deref_lockres returned %d\n",
209 dlm->name, res->lockname.len, res->lockname.name, ret);
210 spin_lock(&dlm->spinlock);
211 }
212
213 spin_lock(&res->spinlock);
214 if (!list_empty(&res->purge)) {
215 mlog(0, "removing lockres %.*s:%p from purgelist, "
216 "master = %d\n", res->lockname.len, res->lockname.name,
217 res, master);
218 list_del_init(&res->purge);
219 spin_unlock(&res->spinlock);
220 dlm_lockres_put(res);
221 dlm->purge_count--;
222 } else
223 spin_unlock(&res->spinlock);
224
225 __dlm_unhash_lockres(res);
> Your patch changes the logic to exclude such lockres' from the
> recovery list. And that's a change, while possibly workable, needs
> to be looked into more thoroughly.
>
> In short, there is a disconnect between your description and your patch.
> Or, my understanding.
Normally we would recover the lockres to the recovery master and then re-send the
deref message to it; that is what my previous patches did.
After discussing with Srini, we found that ignoring the failure of the deref to the
original master, and not recovering the lockres to the recovery master, has the same
effect and is simpler.
The patch fixes the bug per my test results.
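To make the two options concrete, here is a continuation of the toy model
above (again a simplification of mine, not the kernel code): option (a)
migrates the lockres and re-sends the deref to the recovery master once
recovery finishes; option (b), which this patch takes, never migrates a
lockres that is dropping its ref, so node A's stale refmap bit never
lands on node C in the first place.

#include <stdio.h>

#define MAX_NODES 8

enum approach { RESEND_DEREF, SKIP_MIGRATION };

/* returns 1 if node C's copy ends up with no stale bit for node A */
static int node_c_can_purge(enum approach how)
{
	enum { A = 0 };
	unsigned char refmap_on_C[MAX_NODES] = { 0 };
	int i;

	if (how == RESEND_DEREF) {
		refmap_on_C[A] = 1;	/* (a) lockres migrated with A's refbit */
		refmap_on_C[A] = 0;	/* (a) deref re-sent to C after recovery */
	}
	/* (b) SKIP_MIGRATION: A never migrates the lockres, so C never
	 *     records a reference for A at all */

	for (i = 0; i < MAX_NODES; i++)
		if (refmap_on_C[i])
			return 0;
	return 1;
}

int main(void)
{
	printf("(a) re-send deref:  purge ok = %d\n", node_c_can_purge(RESEND_DEREF));
	printf("(b) skip migration: purge ok = %d\n", node_c_can_purge(SKIP_MIGRATION));
	return 0;
}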
regards,
wengang.
* [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
2010-07-21 12:22 ` Wengang Wang
@ 2010-07-21 18:19 ` Sunil Mushran
2010-07-22 10:51 ` Wengang Wang
0 siblings, 1 reply; 11+ messages in thread
From: Sunil Mushran @ 2010-07-21 18:19 UTC (permalink / raw)
To: ocfs2-devel
So I discussed this problem with Srini. Yes, your patch is on the right
track. Just that it needs to be rebased atop Srini's patch.
On 07/21/2010 05:22 AM, Wengang Wang wrote:
> On 10-07-20 15:33, Sunil Mushran wrote:
>
>> On 07/19/2010 07:59 PM, Wengang Wang wrote:
>>
>>>> Do you have the message sequencing that would lead to this situation?
>>>> If we migrate the lockres to the reco master, the reco master will send
>>>> an assert that will make us change the master.
>>>>
>>> So first, the problem is not about the changing owner. It is that
>>> the bit(in refmap) on behalf of the node in question is not cleared on the new
>>> master(recovery master). So that the new master will fail at purging the lockres
>>> due to the incorrect bit in refmap.
>>>
>>> Second, I have no messages at hand for the situation. But I think it is simple
>>> enough.
>>>
>>> 1) node A has no interest on lockres A any longer, so it is purging it.
>>> 2) the owner of lockres A is node B, so node A is sending de-ref message
>>> to node B.
>>> 3) at this time, node B crashed. node C becomes the recovery master. it recovers
>>> lockres A(because the master is the dead node B).
>>> 4) node A migrated lockres A to node C with a refbit there.
>>> 5) node A failed to send de-ref message to node B because it crashed. The failure
>>> is ignored. no other action is done for lockres A any more.
>>>
>> In dlm_do_local_recovery_cleanup(), we expicitly clear the flag...
>> when the owner is the dead_node. So this should not happen.
>>
> It reproduces in my test env.
>
> Clearing the flag DLM_LOCK_RES_DROPPING_REF doesn't prevent anything.
>
> dlm_do_local_recovery_cleanup() continue to move the lockres to recovery
> list.
>
> 2337 /* the wake_up for this will happen when the
> 2338 * RECOVERING flag is dropped later */
> 2339 res->state&= ~DLM_LOCK_RES_DROPPING_REF;
> 2340
> 2341 dlm_move_lockres_to_recovery_list(dlm, res);
>
>
> and dlm_purge_lockres() continue to unhash the lockres.
>
> 202 ret = dlm_drop_lockres_ref(dlm, res);
> 203 if (ret< 0) {
> 204 mlog_errno(ret);
> 205 if (!dlm_is_host_down(ret))
> 206 BUG();
> 207 }
> 208 mlog(0, "%s:%.*s: dlm_deref_lockres returned %d\n",
> 209 dlm->name, res->lockname.len, res->lockname.name, ret);
> 210 spin_lock(&dlm->spinlock);
> 211 }
> 212
> 213 spin_lock(&res->spinlock);
> 214 if (!list_empty(&res->purge)) {
> 215 mlog(0, "removing lockres %.*s:%p from purgelist, "
> 216 "master = %d\n", res->lockname.len, res->lockname.name,
> 217 res, master);
> 218 list_del_init(&res->purge);
> 219 spin_unlock(&res->spinlock);
> 220 dlm_lockres_put(res);
> 221 dlm->purge_count--;
> 222 } else
> 223 spin_unlock(&res->spinlock);
> 224
> 225 __dlm_unhash_lockres(res);
>
>
>
>> Your patch changes the logic to exclude such lockres' from the
>> recovery list. And that's a change, while possibly workable, needs
>> to be looked into more thoroughly.
>>
>> In short, there is a disconnect between your description and your patch.
>> Or, my understanding.
>>
> For mormal, we recover the lockres to recovery master, and then re-send the deref
> message to it. That my privious patches do.
>
> After discussing with Srini, we found ignoring the failure of deref to the original
> master and not recovering the lockres to recovery master has the same effect. And
> it's simpler.
>
> The patch fixes the bug per my test result.
>
> regards,
> wengang.
>
* [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
2010-07-21 18:19 ` Sunil Mushran
@ 2010-07-22 10:51 ` Wengang Wang
2010-07-22 16:58 ` Sunil Mushran
0 siblings, 1 reply; 11+ messages in thread
From: Wengang Wang @ 2010-07-22 10:51 UTC (permalink / raw)
To: ocfs2-devel
On 10-07-21 11:19, Sunil Mushran wrote:
> So I discussed this problem with Srini. Yes, your patch is on the right
> track. Just that it needs to be rebased atop Srini's patch.
>
It seems that Srini's patch has not been committed to mainline yet.
regards,
wengang.
* [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
2010-07-22 10:51 ` Wengang Wang
@ 2010-07-22 16:58 ` Sunil Mushran
0 siblings, 0 replies; 11+ messages in thread
From: Sunil Mushran @ 2010-07-22 16:58 UTC (permalink / raw)
To: ocfs2-devel
On 07/22/2010 03:51 AM, Wengang Wang wrote:
> On 10-07-21 11:19, Sunil Mushran wrote:
>
>> So I discussed this problem with Srini. Yes, your patch is on the right
>> track. Just that it needs to be rebased atop Srini's patch.
>>
>>
> It seems that Srini's patch is not committed to mainline yet.
>
No, it is not as yet, but it should be in the queue. Just get the patch from
Srini and rebase your patch atop it.
Thread overview: 11+ messages
2010-06-10 16:25 [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master Wengang Wang
2010-06-25 1:55 ` Wengang Wang
2010-07-05 10:00 ` Wengang Wang
2010-07-19 10:09 ` Wengang Wang
2010-07-19 23:52 ` Sunil Mushran
2010-07-20 2:59 ` Wengang Wang
2010-07-20 22:33 ` Sunil Mushran
2010-07-21 12:22 ` Wengang Wang
2010-07-21 18:19 ` Sunil Mushran
2010-07-22 10:51 ` Wengang Wang
2010-07-22 16:58 ` Sunil Mushran