All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joseph Qi via Ocfs2-devel <ocfs2-devel@oss.oracle.com>
To: "heming.zhao@suse.com" <heming.zhao@suse.com>,
	ocfs2-devel@oss.oracle.com
Cc: vvidic@valentin-vidic.from.hr
Subject: Re: [Ocfs2-devel] [Resend PATCH v3] ocfs2: fix kernel crash after mounting when journal doesn't ready
Date: Mon, 28 Mar 2022 00:04:20 +0800	[thread overview]
Message-ID: <3fb78714-1e48-e72a-c8cf-52ca5723562b@linux.alibaba.com> (raw)
In-Reply-To: <9efaf256-4f66-32ca-96ef-7b2617396c27@suse.com>



On 3/27/22 6:37 PM, heming.zhao@suse.com wrote:
> Hello Joseph,
> 
> Could we speed up this crash fixing? I could trigger this kind of crash at least two ways.
> Do you have another alternative patch?
> 
As I've described in previous mail, I don't want to fix it in the way
you proposed.
And I am afraid we need do a thoroughly check since I've found there
are other issues in ocfs2_fill_super(), e.g.

ocfs2_fill_super
  ocfs2_initialize_super // fails in check max_slots

Then osb->osb_mount_event is not properly initialized, and then will
crash at wake_up(&osb->osb_mount_event).

Thanks,
Joseph

> On 3/20/22 21:51, Joseph Qi wrote:
>> I don't think it deserves a complex fix like this.
>> iput() is a common action and we don't want to re-create a private flow
>> in ocfs2. Also I think it's not safe.
>> I suggest we take the way to make sure ocfs2 journal is there as a
>> prerequisite.
> 
> let's image the journal init before any inode operations. and then there will
> create a gap between "journal init" and "dlm init". for a rule, any inode operation
> should be under dlm lock protecting. if there is a failed before dlm init.
> how to handle locking consistency?
> the correct code logic should do journal init after dlm lock init. IMO the commit
> ccd979bdbce9f ("OCFS2: The Second Oracle Cluster Filesystem") made the locking
> mistake on first code version. But author wrote comment to explain the reason.
> 
>     ```
>     /* FIXME
>      * This should be done in ocfs2_journal_init(), but unknown
>      * ordering issues will cause the filesystem to crash.
>      * If anyone wants to figure out what part of the code
>      * refers to osb->journal before ocfs2_journal_init() is run,
>      * be my guest.
>      */
>     ```
> 
> We don't simply turn back to pick up the problematic code logic. we should
> find a new method.
> 
> - Heming
> 
>>
>> On 3/18/22 11:04 PM, Heming Zhao wrote:
>>> Call trace:
>>>
>>>    ocfs2: Registered cluster interface user
>>>    dlm: no local IP address has been set
>>>    dlm: cannot start dlm lowcomms -107
>>>    (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
>>>    (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
>>>    (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
>>>    BUG: kernel NULL pointer dereference, address: 0000000000000030
>>>    #PF: supervisor read access in kernel mode
>>>    #PF: error_code(0x0000) - not-present page
>>>    PGD 0 P4D 0
>>>    Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>    CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1
>>> openSUSE Tumbleweed b40...e84
>>>    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ...
>>>    RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
>>>    ...
>>>    Call Trace:
>>>     <TASK>
>>>     ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
>>>     evict+0xc0/0x1c0
>>>     ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
>>>     ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
>>>     ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
>>>     mount_bdev+0x182/0x1b0
>>>     ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
>>>     legacy_get_tree+0x24/0x40
>>>     vfs_get_tree+0x22/0xb0
>>>     path_mount+0x465/0xac0
>>>     __x64_sys_mount+0x103/0x140
>>>     do_syscall_64+0x59/0x80
>>>     ? syscall_exit_to_user_mode+0x18/0x40
>>>     ? do_syscall_64+0x69/0x80
>>>     ? syscall_exit_to_user_mode+0x18/0x40
>>>     ? do_syscall_64+0x69/0x80
>>>     ? exc_page_fault+0x68/0x150
>>>     entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>
>>> How to trigger:
>>>
>>>    tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
>>>    tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb \
>>>               --cluster-name=ocfstst
>>>
>>>               == there is no dlm running ==
>>>    tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
>>>               == kernel crash ==
>>>
>>> Analysis:
>>>
>>> After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"),
>>> Journal init later than before, it makes NULL pointer access in free
>>> routine.
>>>
>>> ocfs2_fill_super
>>>   + ocfs2_mount_volume
>>>   |  ocfs2_dlm_init //failed, osb->journal still NULL.
>>>   + ...
>>>   + ocfs2_dismount_volume
>>>      ocfs2_release_system_inodes
>>>        ...
>>>         evict
>>>          ...
>>>           ocfs2_clear_inode
>>>            ocfs2_checkpoint_inode
>>>             ocfs2_ci_fully_checkpointed
>>>              time_after(journal->j_trans_id, ci->ci_last_trans)
>>>               + journal is empty, crash!
>>>
>>> How to fix:
>>>
>>> create a new function _inode_direct_free() to handle journal non-exist
>>> case.
>>>
>>> Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown")
>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>> ---
>>> v3: change patch subject
>>>      change fix method:
>>>      - v1/v2 focused on the points of accessing osb->journal.
>>>      - v3 directly free inodes
>>> v2: revise commit log
>>>      add checking in ocfs2_dismount_volume()
>>> ---
>>>   fs/ocfs2/localalloc.c | 12 ++++++++
>>>   fs/ocfs2/super.c      | 67 ++++++++++++++++++++++++++++++++++++++++---
>>>   2 files changed, 75 insertions(+), 4 deletions(-)
>>> ...
>>

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

  reply	other threads:[~2022-03-27 16:04 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-18 15:04 [Ocfs2-devel] [Resend PATCH v3] ocfs2: fix kernel crash after mounting when journal doesn't ready Heming Zhao via Ocfs2-devel
2022-03-20 13:51 ` Joseph Qi via Ocfs2-devel
2022-03-20 15:17   ` heming.zhao--- via Ocfs2-devel
2022-03-21  8:28     ` heming.zhao--- via Ocfs2-devel
2022-03-27 10:37   ` heming.zhao--- via Ocfs2-devel
2022-03-27 16:04     ` Joseph Qi via Ocfs2-devel [this message]
2022-03-27 16:56       ` heming.zhao--- via Ocfs2-devel
2022-03-28  2:08         ` Joseph Qi via Ocfs2-devel
2022-03-28  2:44           ` heming.zhao--- via Ocfs2-devel
2022-03-28  3:18             ` Joseph Qi via Ocfs2-devel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3fb78714-1e48-e72a-c8cf-52ca5723562b@linux.alibaba.com \
    --to=ocfs2-devel@oss.oracle.com \
    --cc=heming.zhao@suse.com \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=vvidic@valentin-vidic.from.hr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.