* [Ocfs2-devel] ocfs2 cannot continue when JBD2 has aborted the journal,
From: Zhangguanghui @ 2015-12-17  5:33 UTC
  To: ocfs2-devel

Hi all,

There is a small race between JBD2 aborting the journal and
jbd2_journal_flush, triggered by an unstable storage link under I/O stress.

Once JBD2 is in the aborted state, jbd2_journal_flush returns -EIO, and
this may cause all cluster nodes to hang. So I think that once JBD2 has
aborted the journal, ocfs2 cannot continue and should trigger ocfs2_abort.

Any ideas about this patch? Thanks.


description:

ocfs2_commit_thread
  ocfs2_commit_cache
    jbd2_journal_flush


--- journal.c 2015-12-17 11:36:39.140542941 +0800
+++ journal.c.diff 2015-12-17 11:39:21.308542922 +0800
@@ -328,6 +328,9 @@
 	if (status < 0) {
 		up_write(&journal->j_trans_barrier);
 		mlog_errno(status);
+		if (is_journal_aborted(journal)) {
+			ocfs2_abort(osb->sb, "Detect aborted journal,while committing cache.");
+		}
 		goto finally;
 	}
________________________________
zhangguanghui
-------------------------------------------------------------------------------------------------------------------------------------
This e-mail and its attachments contain confidential information from H3C, which is
intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
by phone or email immediately and delete it!

* [Ocfs2-devel] ocfs2 cannot continue when JBD2 has aborted the journal,
From: Joseph Qi @ 2015-12-18  1:05 UTC
  To: ocfs2-devel

Hi Guanghui,
Could you please describe the problem you encountered more specifically?
I don't think this change is the right approach.



* [Ocfs2-devel] ocfs2 cannot continue when JBD2 has aborted the journal,
From: Ryan Ding @ 2015-12-18  2:15 UTC
  To: ocfs2-devel

Hi Guanghui,

I think I have encountered a problem just like yours, but it is not a race.

Every time ocfs2_commit_thread receives an error from jbd2_journal_flush
(which may be caused by a disk I/O error), it keeps retrying the journal
commit. But in this case the journal should already be in the aborted
state, so retrying the commit is useless. Even worse, the lock resources
held by this node cannot be released, so the entire cluster hangs.

I have written a patch for this, and my solution is just like yours.
I will send it in a separate email.

Thanks,
Ryan
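
The retry behaviour described in this message can be illustrated with a
small userspace model. This is only a sketch, not kernel code:
`toy_journal`, `toy_flush`, and `toy_commit_loop` are hypothetical
stand-ins for the JBD2 journal, jbd2_journal_flush, and the retry loop in
ocfs2_commit_thread, and the `stop_on_abort` flag models the proposed
is_journal_aborted check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the JBD2 journal state. */
struct toy_journal {
	bool aborted;       /* models the JBD2 "aborted" state */
	int flush_calls;    /* how many times a flush was attempted */
};

/* Models jbd2_journal_flush: an aborted journal always fails with -EIO. */
static int toy_flush(struct toy_journal *j)
{
	j->flush_calls++;
	return j->aborted ? -EIO : 0;
}

/*
 * Models the commit thread's retry loop, bounded by max_tries so that
 * the model terminates. Returns 0 once a flush succeeds, -EIO otherwise.
 * With stop_on_abort set, the loop gives up as soon as it sees that the
 * journal is aborted, which is the effect the proposed patch aims for.
 */
static int toy_commit_loop(struct toy_journal *j, int max_tries,
			   bool stop_on_abort)
{
	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		if (toy_flush(j) == 0)
			return 0;
		if (stop_on_abort && j->aborted)
			return -EIO;   /* retrying cannot succeed */
	}
	return -EIO;
}
```

In this model, an aborted journal with `stop_on_abort` cleared uses up
every attempt it is given, while with the flag set the loop stops after
the first failed flush.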



* [Ocfs2-devel] ocfs2 cannot continue when JBD2 has aborted the journal,
From: Zhangguanghui @ 2015-12-18  4:58 UTC
  To: ocfs2-devel

Hi Joseph,

The following sequence can cause a deadlock.

Node A:
    Super lock EX
    ocfs2_commit_thread
      ocfs2_commit_cache
        jbd2_journal_flush    (journal is aborted, returns -EIO)
    does not wake_up(&osb->dc_event)
    does not downconvert EX->NL

While Node A is in this state, a request from Node B for the EX or PR
lock may cause the nodes to hang. Resetting Node A returns Node B and
Node C to normal.

Thanks a lot
________________________________
zhangguanghui
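
The hang described in this message can be sketched with a toy model of
the dependency between the commit path and the downconvert. The sketch
assumes, as the sequence in the message implies, that the lock holder
only downconverts EX->NL after a successful commit has woken the
downconvert path. `toy_node`, `toy_commit`, `toy_try_downconvert`, and
`toy_remote_can_lock` are hypothetical names, not ocfs2 functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_level { TOY_NL, TOY_EX };   /* simplified lock levels */

/* Hypothetical model of the node that holds the super lock. */
struct toy_node {
	enum toy_level held;     /* level currently held on the lock */
	bool journal_aborted;    /* JBD2 has aborted the journal */
	bool dc_woken;           /* models wake_up(&osb->dc_event) */
};

/*
 * Models the commit step: the downconvert path is woken only when the
 * flush succeeds; an aborted journal yields -EIO and no wake-up.
 */
static int toy_commit(struct toy_node *n)
{
	assert(n != NULL);
	if (n->journal_aborted)
		return -EIO;
	n->dc_woken = true;
	return 0;
}

/*
 * Models the downconvert EX->NL, which runs only after a wake-up.
 * Returns true if the lock was released for other nodes.
 */
static bool toy_try_downconvert(struct toy_node *n)
{
	assert(n != NULL);
	if (!n->dc_woken)
		return false;
	n->held = TOY_NL;
	n->dc_woken = false;
	return true;
}

/*
 * In this simplified model, a PR or EX request from another node is
 * granted only once the holder has dropped to NL.
 */
static bool toy_remote_can_lock(const struct toy_node *holder)
{
	return holder->held == TOY_NL;
}
```

With `journal_aborted` set, `toy_commit` never wakes the downconvert
path, so `toy_remote_can_lock` stays false no matter how often the
commit is retried; this is the state in which the other nodes wait.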


