ocfs2-devel.oss.oracle.com archive mirror
 help / color / mirror / Atom feed
* [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file
@ 2016-03-29 13:15 Gang He
  2016-03-31  2:56 ` Gang He
  0 siblings, 1 reply; 11+ messages in thread
From: Gang He @ 2016-03-29 13:15 UTC (permalink / raw)
  To: ocfs2-devel

Hello Guys,

I got a bug, which reported a kernel BUG in function ocfs2_truncate_file,
Base on my initial analysis, this bug looks like a race condition problem.
Unfortunately, there was no kernel crash dump caught, just got some kernel log as below,

kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
Oct 21 13:02:19 uii316 [ 1766.831234]
Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 3.0.101-0.47.67-default #1
Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
Oct 21 13:02:19 uii316
Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]
Oct 21 13:02:19 uii316 [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490 [ocfs2]
Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68  EFLAGS: 00010296
Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX: ffff880f39c5e240 RCX: 00000000000039fd
Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI: 0000000000000007 RDI: 0000000000000246
Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08: ffffffff81da0ac0 R09: 0000000000000000
Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11: 00000000ffffffff R12: ffff880f3949bc78
Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14: ffff880f3d481000 R15: 00000000000e43bc
Oct 21 13:02:19 uii316 [ 1766.831347] FS:  00007f11cda9d720(0000) GS:ffff880fefd40000(0000) knlGS:0000000000000000
Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3: 0000000f39d35000 CR4: 00000000000007e0
Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, threadinfo ffff880f39d78000, task ffff880f39c5e240)
Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
Oct 21 13:02:19 uii316 [ 1766.831383]  000000000002433c
Oct 21 13:02:19 uii316 00000000000e43bc
Oct 21 13:02:19 uii316 00000000000eab40
Oct 21 13:02:19 uii316 ffff880f00000001
Oct 21 13:02:19 uii316
Oct 21 13:02:19 uii316 [ 1766.831397]  ffff880f394956e0
Oct 21 13:02:19 uii316 ffff880f8e0d1000
Oct 21 13:02:19 uii316 ffff880f3949b800
Oct 21 13:02:19 uii316 0000000000000000
Oct 21 13:02:19 uii316
Oct 21 13:02:19 uii316 [ 1766.831410]  ffff880f39454980
Oct 21 13:02:19 uii316 0000000000000001
Oct 21 13:02:19 uii316 000000000002433c
Oct 21 13:02:19 uii316 0000000000000008
Oct 21 13:02:19 uii316
Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
Oct 21 13:02:19 uii316 [ 1766.831492]  [<ffffffffa074742e>] ocfs2_setattr+0x26e/0xa90 [ocfs2]
Oct 21 13:02:19 uii316 [ 1766.831522]  [<ffffffff81178faf>] notify_change+0x19f/0x2f0
Oct 21 13:02:19 uii316 [ 1766.831534]  [<ffffffff8115d517>] do_truncate+0x57/0x80
Oct 21 13:02:19 uii316 [ 1766.831544]  [<ffffffff8116c053>] do_last+0x603/0x800
Oct 21 13:02:19 uii316 [ 1766.831551]  [<ffffffff8116ceb9>] path_openat+0xd9/0x420
Oct 21 13:02:19 uii316 [ 1766.831558]  [<ffffffff8116d33c>] do_filp_open+0x4c/0xc0
Oct 21 13:02:19 uii316 [ 1766.831566]  [<ffffffff8115de5f>] do_sys_open+0x17f/0x250
Oct 21 13:02:19 uii316 [ 1766.831575]  [<ffffffff8146e1f2>] system_call_fastpath+0x16/0x1b
Oct 21 13:02:19 uii316 [ 1766.831588]  [<00007f11ccb07080>] 0x7f11ccb0707f
Oct 21 13:02:19 uii316 [ 1766.831592] Code:

 The source code in question is as below,
 444 static int ocfs2_truncate_file(struct inode *inode,
 445                                struct buffer_head *di_bh,
 446                                u64 new_i_size)
 447 {
 448         int status = 0;
 449         struct ocfs2_dinode *fe = NULL;
 450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 451
 452         /* We trust di_bh because it comes from ocfs2_inode_lock(), which
 453          * already validated it */
 454         fe = (struct ocfs2_dinode *) di_bh->b_data;
 455
 456         trace_ocfs2_truncate_file((unsigned long long)OCFS2_I(inode)->ip_blkno,
 457                                   (unsigned long long)le64_to_cpu(fe->i_size),
 458                                   (unsigned long long)new_i_size);
 459
 460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),     <<= here
 461                         "Inode %llu, inode i_size = %lld != di "
 462                         "i_size = %llu, i_flags = 0x%x\n",
 463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
 464                         i_size_read(inode),
 465                         (unsigned long long)le64_to_cpu(fe->i_size),
 466                         le32_to_cpu(fe->i_flags));
 467

If your encountered the similar issue in the past, please let me know, to see if there is any patch/discussion for this bug.

Thanks a lot.
Gang

 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file
  2016-03-29 13:15 [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file Gang He
@ 2016-03-31  2:56 ` Gang He
  2016-03-31  3:14   ` Junxiao Bi
  2016-03-31  4:26   ` Joseph Qi
  0 siblings, 2 replies; 11+ messages in thread
From: Gang He @ 2016-03-31  2:56 UTC (permalink / raw)
  To: ocfs2-devel

Hello Joseph and Junxiao,

Did you encounter this issue in the past? I doubt this is possible a race condition bug (rather than data inconsistency).

Thanks
Gang


>>> 
> Hello Guys,
> 
> I got a bug, which reported a kernel BUG in function ocfs2_truncate_file,
> Base on my initial analysis, this bug looks like a race condition problem.
> Unfortunately, there was no kernel crash dump caught, just got some kernel 
> log as below,
> 
> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
> Oct 21 13:02:19 uii316 [ 1766.831234]
> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
> 3.0.101-0.47.67-default #1
> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
> Oct 21 13:02:19 uii316
> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]
> Oct 21 13:02:19 uii316 [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490 
> [ocfs2]
> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68  EFLAGS: 
> 00010296
> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX: 
> ffff880f39c5e240 RCX: 00000000000039fd
> Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI: 
> 0000000000000007 RDI: 0000000000000246
> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08: 
> ffffffff81da0ac0 R09: 0000000000000000
> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11: 
> 00000000ffffffff R12: ffff880f3949bc78
> Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14: 
> ffff880f3d481000 R15: 00000000000e43bc
> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  00007f11cda9d720(0000) 
> GS:ffff880fefd40000(0000) knlGS:0000000000000000
> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS: 0000 ES: 0000 CR0: 
> 0000000080050033
> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3: 
> 0000000f39d35000 CR4: 00000000000007e0
> Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1: 
> 0000000000000000 DR2: 0000000000000000
> Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6: 
> 00000000ffff0ff0 DR7: 0000000000000400
> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
> threadinfo ffff880f39d78000, task ffff880f39c5e240)
> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
> Oct 21 13:02:19 uii316 [ 1766.831383]  000000000002433c
> Oct 21 13:02:19 uii316 00000000000e43bc
> Oct 21 13:02:19 uii316 00000000000eab40
> Oct 21 13:02:19 uii316 ffff880f00000001
> Oct 21 13:02:19 uii316
> Oct 21 13:02:19 uii316 [ 1766.831397]  ffff880f394956e0
> Oct 21 13:02:19 uii316 ffff880f8e0d1000
> Oct 21 13:02:19 uii316 ffff880f3949b800
> Oct 21 13:02:19 uii316 0000000000000000
> Oct 21 13:02:19 uii316
> Oct 21 13:02:19 uii316 [ 1766.831410]  ffff880f39454980
> Oct 21 13:02:19 uii316 0000000000000001
> Oct 21 13:02:19 uii316 000000000002433c
> Oct 21 13:02:19 uii316 0000000000000008
> Oct 21 13:02:19 uii316
> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
> Oct 21 13:02:19 uii316 [ 1766.831492]  [<ffffffffa074742e>] 
> ocfs2_setattr+0x26e/0xa90 [ocfs2]
> Oct 21 13:02:19 uii316 [ 1766.831522]  [<ffffffff81178faf>] 
> notify_change+0x19f/0x2f0
> Oct 21 13:02:19 uii316 [ 1766.831534]  [<ffffffff8115d517>] 
> do_truncate+0x57/0x80
> Oct 21 13:02:19 uii316 [ 1766.831544]  [<ffffffff8116c053>] 
> do_last+0x603/0x800
> Oct 21 13:02:19 uii316 [ 1766.831551]  [<ffffffff8116ceb9>] 
> path_openat+0xd9/0x420
> Oct 21 13:02:19 uii316 [ 1766.831558]  [<ffffffff8116d33c>] 
> do_filp_open+0x4c/0xc0
> Oct 21 13:02:19 uii316 [ 1766.831566]  [<ffffffff8115de5f>] 
> do_sys_open+0x17f/0x250
> Oct 21 13:02:19 uii316 [ 1766.831575]  [<ffffffff8146e1f2>] 
> system_call_fastpath+0x16/0x1b
> Oct 21 13:02:19 uii316 [ 1766.831588]  [<00007f11ccb07080>] 0x7f11ccb0707f
> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
> 
>  The source code in question is as below,
>  444 static int ocfs2_truncate_file(struct inode *inode,
>  445                                struct buffer_head *di_bh,
>  446                                u64 new_i_size)
>  447 {
>  448         int status = 0;
>  449         struct ocfs2_dinode *fe = NULL;
>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>  451
>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), 
> which
>  453          * already validated it */
>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>  455
>  456         trace_ocfs2_truncate_file((unsigned long 
> long)OCFS2_I(inode)->ip_blkno,
>  457                                   (unsigned long 
> long)le64_to_cpu(fe->i_size),
>  458                                   (unsigned long long)new_i_size);
>  459
>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),    
>  <<= here
>  461                         "Inode %llu, inode i_size = %lld != di "
>  462                         "i_size = %llu, i_flags = 0x%x\n",
>  463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>  464                         i_size_read(inode),
>  465                         (unsigned long long)le64_to_cpu(fe->i_size),
>  466                         le32_to_cpu(fe->i_flags));
>  467
> 
> If your encountered the similar issue in the past, please let me know, to 
> see if there is any patch/discussion for this bug.
> 
> Thanks a lot.
> Gang
> 
>  
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com 
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file
  2016-03-31  2:56 ` Gang He
@ 2016-03-31  3:14   ` Junxiao Bi
  2016-04-21  7:59     ` Gang He
  2016-03-31  4:26   ` Joseph Qi
  1 sibling, 1 reply; 11+ messages in thread
From: Junxiao Bi @ 2016-03-31  3:14 UTC (permalink / raw)
  To: ocfs2-devel

On 03/31/2016 10:56 AM, Gang He wrote:
> Hello Joseph and Junxiao,
> 
> Did you encounter this issue in the past? I doubt this is possible a race condition bug (rather than data inconsistency).
Never saw this. fsck report any corruption?

Thanks,
Junxiao.
> 
> Thanks
> Gang
> 
> 
>>>>
>> Hello Guys,
>>
>> I got a bug, which reported a kernel BUG in function ocfs2_truncate_file,
>> Base on my initial analysis, this bug looks like a race condition problem.
>> Unfortunately, there was no kernel crash dump caught, just got some kernel 
>> log as below,
>>
>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>> Oct 21 13:02:19 uii316 [ 1766.831234]
>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>> 3.0.101-0.47.67-default #1
>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]
>> Oct 21 13:02:19 uii316 [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490 
>> [ocfs2]
>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68  EFLAGS: 
>> 00010296
>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX: 
>> ffff880f39c5e240 RCX: 00000000000039fd
>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI: 
>> 0000000000000007 RDI: 0000000000000246
>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08: 
>> ffffffff81da0ac0 R09: 0000000000000000
>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11: 
>> 00000000ffffffff R12: ffff880f3949bc78
>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14: 
>> ffff880f3d481000 R15: 00000000000e43bc
>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  00007f11cda9d720(0000) 
>> GS:ffff880fefd40000(0000) knlGS:0000000000000000
>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS: 0000 ES: 0000 CR0: 
>> 0000000080050033
>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3: 
>> 0000000f39d35000 CR4: 00000000000007e0
>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1: 
>> 0000000000000000 DR2: 0000000000000000
>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6: 
>> 00000000ffff0ff0 DR7: 0000000000000400
>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>> threadinfo ffff880f39d78000, task ffff880f39c5e240)
>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>> Oct 21 13:02:19 uii316 [ 1766.831383]  000000000002433c
>> Oct 21 13:02:19 uii316 00000000000e43bc
>> Oct 21 13:02:19 uii316 00000000000eab40
>> Oct 21 13:02:19 uii316 ffff880f00000001
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831397]  ffff880f394956e0
>> Oct 21 13:02:19 uii316 ffff880f8e0d1000
>> Oct 21 13:02:19 uii316 ffff880f3949b800
>> Oct 21 13:02:19 uii316 0000000000000000
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831410]  ffff880f39454980
>> Oct 21 13:02:19 uii316 0000000000000001
>> Oct 21 13:02:19 uii316 000000000002433c
>> Oct 21 13:02:19 uii316 0000000000000008
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>> Oct 21 13:02:19 uii316 [ 1766.831492]  [<ffffffffa074742e>] 
>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>> Oct 21 13:02:19 uii316 [ 1766.831522]  [<ffffffff81178faf>] 
>> notify_change+0x19f/0x2f0
>> Oct 21 13:02:19 uii316 [ 1766.831534]  [<ffffffff8115d517>] 
>> do_truncate+0x57/0x80
>> Oct 21 13:02:19 uii316 [ 1766.831544]  [<ffffffff8116c053>] 
>> do_last+0x603/0x800
>> Oct 21 13:02:19 uii316 [ 1766.831551]  [<ffffffff8116ceb9>] 
>> path_openat+0xd9/0x420
>> Oct 21 13:02:19 uii316 [ 1766.831558]  [<ffffffff8116d33c>] 
>> do_filp_open+0x4c/0xc0
>> Oct 21 13:02:19 uii316 [ 1766.831566]  [<ffffffff8115de5f>] 
>> do_sys_open+0x17f/0x250
>> Oct 21 13:02:19 uii316 [ 1766.831575]  [<ffffffff8146e1f2>] 
>> system_call_fastpath+0x16/0x1b
>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<00007f11ccb07080>] 0x7f11ccb0707f
>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>
>>  The source code in question is as below,
>>  444 static int ocfs2_truncate_file(struct inode *inode,
>>  445                                struct buffer_head *di_bh,
>>  446                                u64 new_i_size)
>>  447 {
>>  448         int status = 0;
>>  449         struct ocfs2_dinode *fe = NULL;
>>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>  451
>>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), 
>> which
>>  453          * already validated it */
>>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>  455
>>  456         trace_ocfs2_truncate_file((unsigned long 
>> long)OCFS2_I(inode)->ip_blkno,
>>  457                                   (unsigned long 
>> long)le64_to_cpu(fe->i_size),
>>  458                                   (unsigned long long)new_i_size);
>>  459
>>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),    
>>  <<= here
>>  461                         "Inode %llu, inode i_size = %lld != di "
>>  462                         "i_size = %llu, i_flags = 0x%x\n",
>>  463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>  464                         i_size_read(inode),
>>  465                         (unsigned long long)le64_to_cpu(fe->i_size),
>>  466                         le32_to_cpu(fe->i_flags));
>>  467
>>
>> If your encountered the similar issue in the past, please let me know, to 
>> see if there is any patch/discussion for this bug.
>>
>> Thanks a lot.
>> Gang
>>
>>  
>>
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com 
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file
  2016-03-31  2:56 ` Gang He
  2016-03-31  3:14   ` Junxiao Bi
@ 2016-03-31  4:26   ` Joseph Qi
  2016-03-31  6:23     ` Gang He
  1 sibling, 1 reply; 11+ messages in thread
From: Joseph Qi @ 2016-03-31  4:26 UTC (permalink / raw)
  To: ocfs2-devel

Hi Gang,
I haven't found any related information about this BUG.
Which kernel version are you using? It seems inode size mismatch between
disk and memory, so any further log about these?

Thanks,
Joseph

On 2016/3/31 10:56, Gang He wrote:
> Hello Joseph and Junxiao,
> 
> Did you encounter this issue in the past? I doubt this is possible a race condition bug (rather than data inconsistency).
> 
> Thanks
> Gang
> 
> 
>>>>
>> Hello Guys,
>>
>> I got a bug, which reported a kernel BUG in function ocfs2_truncate_file,
>> Base on my initial analysis, this bug looks like a race condition problem.
>> Unfortunately, there was no kernel crash dump caught, just got some kernel 
>> log as below,
>>
>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>> Oct 21 13:02:19 uii316 [ 1766.831234]
>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>> 3.0.101-0.47.67-default #1
>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]
>> Oct 21 13:02:19 uii316 [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490 
>> [ocfs2]
>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68  EFLAGS: 
>> 00010296
>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX: 
>> ffff880f39c5e240 RCX: 00000000000039fd
>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI: 
>> 0000000000000007 RDI: 0000000000000246
>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08: 
>> ffffffff81da0ac0 R09: 0000000000000000
>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11: 
>> 00000000ffffffff R12: ffff880f3949bc78
>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14: 
>> ffff880f3d481000 R15: 00000000000e43bc
>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  00007f11cda9d720(0000) 
>> GS:ffff880fefd40000(0000) knlGS:0000000000000000
>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS: 0000 ES: 0000 CR0: 
>> 0000000080050033
>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3: 
>> 0000000f39d35000 CR4: 00000000000007e0
>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1: 
>> 0000000000000000 DR2: 0000000000000000
>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6: 
>> 00000000ffff0ff0 DR7: 0000000000000400
>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>> threadinfo ffff880f39d78000, task ffff880f39c5e240)
>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>> Oct 21 13:02:19 uii316 [ 1766.831383]  000000000002433c
>> Oct 21 13:02:19 uii316 00000000000e43bc
>> Oct 21 13:02:19 uii316 00000000000eab40
>> Oct 21 13:02:19 uii316 ffff880f00000001
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831397]  ffff880f394956e0
>> Oct 21 13:02:19 uii316 ffff880f8e0d1000
>> Oct 21 13:02:19 uii316 ffff880f3949b800
>> Oct 21 13:02:19 uii316 0000000000000000
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831410]  ffff880f39454980
>> Oct 21 13:02:19 uii316 0000000000000001
>> Oct 21 13:02:19 uii316 000000000002433c
>> Oct 21 13:02:19 uii316 0000000000000008
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>> Oct 21 13:02:19 uii316 [ 1766.831492]  [<ffffffffa074742e>] 
>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>> Oct 21 13:02:19 uii316 [ 1766.831522]  [<ffffffff81178faf>] 
>> notify_change+0x19f/0x2f0
>> Oct 21 13:02:19 uii316 [ 1766.831534]  [<ffffffff8115d517>] 
>> do_truncate+0x57/0x80
>> Oct 21 13:02:19 uii316 [ 1766.831544]  [<ffffffff8116c053>] 
>> do_last+0x603/0x800
>> Oct 21 13:02:19 uii316 [ 1766.831551]  [<ffffffff8116ceb9>] 
>> path_openat+0xd9/0x420
>> Oct 21 13:02:19 uii316 [ 1766.831558]  [<ffffffff8116d33c>] 
>> do_filp_open+0x4c/0xc0
>> Oct 21 13:02:19 uii316 [ 1766.831566]  [<ffffffff8115de5f>] 
>> do_sys_open+0x17f/0x250
>> Oct 21 13:02:19 uii316 [ 1766.831575]  [<ffffffff8146e1f2>] 
>> system_call_fastpath+0x16/0x1b
>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<00007f11ccb07080>] 0x7f11ccb0707f
>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>
>>  The source code in question is as below,
>>  444 static int ocfs2_truncate_file(struct inode *inode,
>>  445                                struct buffer_head *di_bh,
>>  446                                u64 new_i_size)
>>  447 {
>>  448         int status = 0;
>>  449         struct ocfs2_dinode *fe = NULL;
>>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>  451
>>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), 
>> which
>>  453          * already validated it */
>>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>  455
>>  456         trace_ocfs2_truncate_file((unsigned long 
>> long)OCFS2_I(inode)->ip_blkno,
>>  457                                   (unsigned long 
>> long)le64_to_cpu(fe->i_size),
>>  458                                   (unsigned long long)new_i_size);
>>  459
>>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),    
>>  <<= here
>>  461                         "Inode %llu, inode i_size = %lld != di "
>>  462                         "i_size = %llu, i_flags = 0x%x\n",
>>  463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>  464                         i_size_read(inode),
>>  465                         (unsigned long long)le64_to_cpu(fe->i_size),
>>  466                         le32_to_cpu(fe->i_flags));
>>  467
>>
>> If your encountered the similar issue in the past, please let me know, to 
>> see if there is any patch/discussion for this bug.
>>
>> Thanks a lot.
>> Gang
>>
>>  
>>
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com 
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 
> .
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file
  2016-03-31  4:26   ` Joseph Qi
@ 2016-03-31  6:23     ` Gang He
  0 siblings, 0 replies; 11+ messages in thread
From: Gang He @ 2016-03-31  6:23 UTC (permalink / raw)
  To: ocfs2-devel

Hello Joseph and Junxiao,

Unfortunately, there were not any kernel dump/message/fsck report, just happened twice at the customer environment.
The customer kept the last log as below, the customer used SLES11SP3 for x86_64 (3.0.x-xx).

Thanks
Gang 


>>> 
> Hi Gang,
> I haven't found any related information about this BUG.
> Which kernel version are you using? It seems inode size mismatch between
> disk and memory, so any further log about these?
> 
> Thanks,
> Joseph
> 
> On 2016/3/31 10:56, Gang He wrote:
>> Hello Joseph and Junxiao,
>> 
>> Did you encounter this issue in the past? I doubt this is possible a race 
> condition bug (rather than data inconsistency).
>> 
>> Thanks
>> Gang
>> 
>> 
>>>>>
>>> Hello Guys,
>>>
>>> I got a bug, which reported a kernel BUG in function ocfs2_truncate_file,
>>> Base on my initial analysis, this bug looks like a race condition problem.
>>> Unfortunately, there was no kernel crash dump caught, just got some kernel 
>>> log as below,
>>>
>>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>>> Oct 21 13:02:19 uii316 [ 1766.831234]
>>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>>> 3.0.101-0.47.67-default #1
>>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]
>>> Oct 21 13:02:19 uii316 [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490 
>>> [ocfs2]
>>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68  EFLAGS: 
>>> 00010296
>>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX: 
>>> ffff880f39c5e240 RCX: 00000000000039fd
>>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI: 
>>> 0000000000000007 RDI: 0000000000000246
>>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08: 
>>> ffffffff81da0ac0 R09: 0000000000000000
>>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11: 
>>> 00000000ffffffff R12: ffff880f3949bc78
>>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14: 
>>> ffff880f3d481000 R15: 00000000000e43bc
>>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  00007f11cda9d720(0000) 
>>> GS:ffff880fefd40000(0000) knlGS:0000000000000000
>>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS: 0000 ES: 0000 CR0: 
>>> 0000000080050033
>>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3: 
>>> 0000000f39d35000 CR4: 00000000000007e0
>>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1: 
>>> 0000000000000000 DR2: 0000000000000000
>>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6: 
>>> 00000000ffff0ff0 DR7: 0000000000000400
>>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>>> threadinfo ffff880f39d78000, task ffff880f39c5e240)
>>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>>> Oct 21 13:02:19 uii316 [ 1766.831383]  000000000002433c
>>> Oct 21 13:02:19 uii316 00000000000e43bc
>>> Oct 21 13:02:19 uii316 00000000000eab40
>>> Oct 21 13:02:19 uii316 ffff880f00000001
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831397]  ffff880f394956e0
>>> Oct 21 13:02:19 uii316 ffff880f8e0d1000
>>> Oct 21 13:02:19 uii316 ffff880f3949b800
>>> Oct 21 13:02:19 uii316 0000000000000000
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831410]  ffff880f39454980
>>> Oct 21 13:02:19 uii316 0000000000000001
>>> Oct 21 13:02:19 uii316 000000000002433c
>>> Oct 21 13:02:19 uii316 0000000000000008
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>>> Oct 21 13:02:19 uii316 [ 1766.831492]  [<ffffffffa074742e>] 
>>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>>> Oct 21 13:02:19 uii316 [ 1766.831522]  [<ffffffff81178faf>] 
>>> notify_change+0x19f/0x2f0
>>> Oct 21 13:02:19 uii316 [ 1766.831534]  [<ffffffff8115d517>] 
>>> do_truncate+0x57/0x80
>>> Oct 21 13:02:19 uii316 [ 1766.831544]  [<ffffffff8116c053>] 
>>> do_last+0x603/0x800
>>> Oct 21 13:02:19 uii316 [ 1766.831551]  [<ffffffff8116ceb9>] 
>>> path_openat+0xd9/0x420
>>> Oct 21 13:02:19 uii316 [ 1766.831558]  [<ffffffff8116d33c>] 
>>> do_filp_open+0x4c/0xc0
>>> Oct 21 13:02:19 uii316 [ 1766.831566]  [<ffffffff8115de5f>] 
>>> do_sys_open+0x17f/0x250
>>> Oct 21 13:02:19 uii316 [ 1766.831575]  [<ffffffff8146e1f2>] 
>>> system_call_fastpath+0x16/0x1b
>>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<00007f11ccb07080>] 0x7f11ccb0707f
>>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>>
>>>  The source code in question is as below,
>>>  444 static int ocfs2_truncate_file(struct inode *inode,
>>>  445                                struct buffer_head *di_bh,
>>>  446                                u64 new_i_size)
>>>  447 {
>>>  448         int status = 0;
>>>  449         struct ocfs2_dinode *fe = NULL;
>>>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>  451
>>>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), 
>>> which
>>>  453          * already validated it */
>>>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>>  455
>>>  456         trace_ocfs2_truncate_file((unsigned long 
>>> long)OCFS2_I(inode)->ip_blkno,
>>>  457                                   (unsigned long 
>>> long)le64_to_cpu(fe->i_size),
>>>  458                                   (unsigned long long)new_i_size);
>>>  459
>>>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),    
> 
>>>  <<= here
>>>  461                         "Inode %llu, inode i_size = %lld != di "
>>>  462                         "i_size = %llu, i_flags = 0x%x\n",
>>>  463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>  464                         i_size_read(inode),
>>>  465                         (unsigned long long)le64_to_cpu(fe->i_size),
>>>  466                         le32_to_cpu(fe->i_flags));
>>>  467
>>>
>>> If your encountered the similar issue in the past, please let me know, to 
>>> see if there is any patch/discussion for this bug.
>>>
>>> Thanks a lot.
>>> Gang
>>>
>>>  
>>>
>>>
>>> _______________________________________________
>>> Ocfs2-devel mailing list
>>> Ocfs2-devel at oss.oracle.com 
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel 
>> 
>> .
>> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file
  2016-03-31  3:14   ` Junxiao Bi
@ 2016-04-21  7:59     ` Gang He
  2016-04-21  9:05       ` Joseph Qi
  0 siblings, 1 reply; 11+ messages in thread
From: Gang He @ 2016-04-21  7:59 UTC (permalink / raw)
  To: ocfs2-devel

Hello Guys, 

Sorry for delayed reply, the fsck log was reported from the customer.
OCFS2 volumes to check are:
/dev/dm-11                           83886080 44540260 39345820 54% /sapmnt
/dev/dm-14                           62914560 15374960 47539600 25% /usr/sap
/dev/dm-13                           89128960 19976272 69152688 23% /usr/sap/trans
uii316:~ # fsck.ocfs2 /dev/dm-13
fsck.ocfs2 1.8.2
Checking OCFS2 filesystem in /dev/dm-13:
 Label:        <NONE>
 UUID:         E35AF95C8114494391E0FFDEFD07309B
 Number of blocks: 22282240
 Block size:     4096
 Number of clusters: 22282240
 Cluster size: 4096
 Number of slots: 2
/dev/dm-13 is clean. It will be checked after 20 additional mounts.
uii316:~ # fsck.ocfs2 /dev/dm-14
fsck.ocfs2 1.8.2
Checking OCFS2 filesystem in /dev/dm-14: 
 Label:        <NONE>
 UUID:         43F2D9C8D70B47388527075F3C6C38BB
 Number of blocks: 15728640
 Block size:     4096
 Number of clusters: 15728640
 Cluster size: 4096
 Number of slots: 2
/dev/dm-14 is clean. It will be checked after 20 additional mounts.
uii316:~ # fsck.ocfs2 /dev/dm-11
fsck.ocfs2 1.8.2
Checking OCFS2 filesystem in /dev/dm-11:
 Label:        <NONE>
 UUID:         0F1CBE6017934B5E8AD80149ED332659
 Number of blocks: 20971520
 Block size:     4096
 Number of clusters: 20971520
 Cluster size: 4096
 Number of slots: 2
/dev/dm-11 is clean. It will be checked after 20 additional mounts.

From the log, it looks the file systems were not corrupted, this should be a race-condition issue ?!

Thanks
Gang 


>>> 
> On 03/31/2016 10:56 AM, Gang He wrote:
>> Hello Joseph and Junxiao,
>> 
>> Did you encounter this issue in the past? I doubt this is possible a race 
> condition bug (rather than data inconsistency).
> Never saw this. fsck report any corruption?
> 
> Thanks,
> Junxiao.
>> 
>> Thanks
>> Gang
>> 
>> 
>>>>>
>>> Hello Guys,
>>>
>>> I got a bug, which reported a kernel BUG in function ocfs2_truncate_file,
>>> Base on my initial analysis, this bug looks like a race condition problem.
>>> Unfortunately, there was no kernel crash dump caught, just got some kernel 
>>> log as below,
>>>
>>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>>> Oct 21 13:02:19 uii316 [ 1766.831234]
>>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>>> 3.0.101-0.47.67-default #1
>>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]
>>> Oct 21 13:02:19 uii316 [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490 
>>> [ocfs2]
>>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68  EFLAGS: 
>>> 00010296
>>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX: 
>>> ffff880f39c5e240 RCX: 00000000000039fd
>>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI: 
>>> 0000000000000007 RDI: 0000000000000246
>>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08: 
>>> ffffffff81da0ac0 R09: 0000000000000000
>>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11: 
>>> 00000000ffffffff R12: ffff880f3949bc78
>>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14: 
>>> ffff880f3d481000 R15: 00000000000e43bc
>>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  00007f11cda9d720(0000) 
>>> GS:ffff880fefd40000(0000) knlGS:0000000000000000
>>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS: 0000 ES: 0000 CR0: 
>>> 0000000080050033
>>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3: 
>>> 0000000f39d35000 CR4: 00000000000007e0
>>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1: 
>>> 0000000000000000 DR2: 0000000000000000
>>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6: 
>>> 00000000ffff0ff0 DR7: 0000000000000400
>>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>>> threadinfo ffff880f39d78000, task ffff880f39c5e240)
>>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>>> Oct 21 13:02:19 uii316 [ 1766.831383]  000000000002433c
>>> Oct 21 13:02:19 uii316 00000000000e43bc
>>> Oct 21 13:02:19 uii316 00000000000eab40
>>> Oct 21 13:02:19 uii316 ffff880f00000001
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831397]  ffff880f394956e0
>>> Oct 21 13:02:19 uii316 ffff880f8e0d1000
>>> Oct 21 13:02:19 uii316 ffff880f3949b800
>>> Oct 21 13:02:19 uii316 0000000000000000
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831410]  ffff880f39454980
>>> Oct 21 13:02:19 uii316 0000000000000001
>>> Oct 21 13:02:19 uii316 000000000002433c
>>> Oct 21 13:02:19 uii316 0000000000000008
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>>> Oct 21 13:02:19 uii316 [ 1766.831492]  [<ffffffffa074742e>] 
>>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>>> Oct 21 13:02:19 uii316 [ 1766.831522]  [<ffffffff81178faf>] 
>>> notify_change+0x19f/0x2f0
>>> Oct 21 13:02:19 uii316 [ 1766.831534]  [<ffffffff8115d517>] 
>>> do_truncate+0x57/0x80
>>> Oct 21 13:02:19 uii316 [ 1766.831544]  [<ffffffff8116c053>] 
>>> do_last+0x603/0x800
>>> Oct 21 13:02:19 uii316 [ 1766.831551]  [<ffffffff8116ceb9>] 
>>> path_openat+0xd9/0x420
>>> Oct 21 13:02:19 uii316 [ 1766.831558]  [<ffffffff8116d33c>] 
>>> do_filp_open+0x4c/0xc0
>>> Oct 21 13:02:19 uii316 [ 1766.831566]  [<ffffffff8115de5f>] 
>>> do_sys_open+0x17f/0x250
>>> Oct 21 13:02:19 uii316 [ 1766.831575]  [<ffffffff8146e1f2>] 
>>> system_call_fastpath+0x16/0x1b
>>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<00007f11ccb07080>] 0x7f11ccb0707f
>>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>>
>>>  The source code in question is as below,
>>>  444 static int ocfs2_truncate_file(struct inode *inode,
>>>  445                                struct buffer_head *di_bh,
>>>  446                                u64 new_i_size)
>>>  447 {
>>>  448         int status = 0;
>>>  449         struct ocfs2_dinode *fe = NULL;
>>>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>  451
>>>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), 
>>> which
>>>  453          * already validated it */
>>>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>>  455
>>>  456         trace_ocfs2_truncate_file((unsigned long 
>>> long)OCFS2_I(inode)->ip_blkno,
>>>  457                                   (unsigned long 
>>> long)le64_to_cpu(fe->i_size),
>>>  458                                   (unsigned long long)new_i_size);
>>>  459
>>>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),    
> 
>>>  <<= here
>>>  461                         "Inode %llu, inode i_size = %lld != di "
>>>  462                         "i_size = %llu, i_flags = 0x%x\n",
>>>  463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>  464                         i_size_read(inode),
>>>  465                         (unsigned long long)le64_to_cpu(fe->i_size),
>>>  466                         le32_to_cpu(fe->i_flags));
>>>  467
>>>
>>> If your encountered the similar issue in the past, please let me know, to 
>>> see if there is any patch/discussion for this bug.
>>>
>>> Thanks a lot.
>>> Gang
>>>
>>>  
>>>
>>>
>>> _______________________________________________
>>> Ocfs2-devel mailing list
>>> Ocfs2-devel at oss.oracle.com 
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file
  2016-04-21  7:59     ` Gang He
@ 2016-04-21  9:05       ` Joseph Qi
  2016-04-21  9:50         ` Gang He
  0 siblings, 1 reply; 11+ messages in thread
From: Joseph Qi @ 2016-04-21  9:05 UTC (permalink / raw)
  To: ocfs2-devel

Hi Gang,
May be. If so, it seems that it has relations with locks.
Can we know which data is newer?

Thanks,
Joseph

On 2016/4/21 15:59, Gang He wrote:
> Hello Guys, 
> 
> Sorry for delayed reply, the fsck log was reported from the customer.
> OCFS2 volumes to check are:
> /dev/dm-11                           83886080 44540260 39345820 54% /sapmnt
> /dev/dm-14                           62914560 15374960 47539600 25% /usr/sap
> /dev/dm-13                           89128960 19976272 69152688 23% /usr/sap/trans
> uii316:~ # fsck.ocfs2 /dev/dm-13
> fsck.ocfs2 1.8.2
> Checking OCFS2 filesystem in /dev/dm-13:
>  Label:        <NONE>
>  UUID:         E35AF95C8114494391E0FFDEFD07309B
>  Number of blocks: 22282240
>  Block size:     4096
>  Number of clusters: 22282240
>  Cluster size: 4096
>  Number of slots: 2
> /dev/dm-13 is clean. It will be checked after 20 additional mounts.
> uii316:~ # fsck.ocfs2 /dev/dm-14
> fsck.ocfs2 1.8.2
> Checking OCFS2 filesystem in /dev/dm-14: 
>  Label:        <NONE>
>  UUID:         43F2D9C8D70B47388527075F3C6C38BB
>  Number of blocks: 15728640
>  Block size:     4096
>  Number of clusters: 15728640
>  Cluster size: 4096
>  Number of slots: 2
> /dev/dm-14 is clean. It will be checked after 20 additional mounts.
> uii316:~ # fsck.ocfs2 /dev/dm-11
> fsck.ocfs2 1.8.2
> Checking OCFS2 filesystem in /dev/dm-11:
>  Label:        <NONE>
>  UUID:         0F1CBE6017934B5E8AD80149ED332659
>  Number of blocks: 20971520
>  Block size:     4096
>  Number of clusters: 20971520
>  Cluster size: 4096
>  Number of slots: 2
> /dev/dm-11 is clean. It will be checked after 20 additional mounts.
> 
>>From the log, it looks the file systems were not corrupted, this should be a race-condition issue ?!
> 
> Thanks
> Gang 
> 
> 
>>>>
>> On 03/31/2016 10:56 AM, Gang He wrote:
>>> Hello Joseph and Junxiao,
>>>
>>> Did you encounter this issue in the past? I doubt this is possible a race 
>> condition bug (rather than data inconsistency).
>> Never saw this. fsck report any corruption?
>>
>> Thanks,
>> Junxiao.
>>>
>>> Thanks
>>> Gang
>>>
>>>
>>>>>>
>>>> Hello Guys,
>>>>
>>>> I got a bug, which reported a kernel BUG in function ocfs2_truncate_file,
>>>> Base on my initial analysis, this bug looks like a race condition problem.
>>>> Unfortunately, there was no kernel crash dump caught, just got some kernel 
>>>> log as below,
>>>>
>>>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>>>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>>>> Oct 21 13:02:19 uii316 [ 1766.831234]
>>>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>>>> 3.0.101-0.47.67-default #1
>>>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>>>> Oct 21 13:02:19 uii316
>>>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]
>>>> Oct 21 13:02:19 uii316 [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490 
>>>> [ocfs2]
>>>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68  EFLAGS: 
>>>> 00010296
>>>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX: 
>>>> ffff880f39c5e240 RCX: 00000000000039fd
>>>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI: 
>>>> 0000000000000007 RDI: 0000000000000246
>>>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08: 
>>>> ffffffff81da0ac0 R09: 0000000000000000
>>>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11: 
>>>> 00000000ffffffff R12: ffff880f3949bc78
>>>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14: 
>>>> ffff880f3d481000 R15: 00000000000e43bc
>>>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  00007f11cda9d720(0000) 
>>>> GS:ffff880fefd40000(0000) knlGS:0000000000000000
>>>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS: 0000 ES: 0000 CR0: 
>>>> 0000000080050033
>>>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3: 
>>>> 0000000f39d35000 CR4: 00000000000007e0
>>>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1: 
>>>> 0000000000000000 DR2: 0000000000000000
>>>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6: 
>>>> 00000000ffff0ff0 DR7: 0000000000000400
>>>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>>>> threadinfo ffff880f39d78000, task ffff880f39c5e240)
>>>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>>>> Oct 21 13:02:19 uii316 [ 1766.831383]  000000000002433c
>>>> Oct 21 13:02:19 uii316 00000000000e43bc
>>>> Oct 21 13:02:19 uii316 00000000000eab40
>>>> Oct 21 13:02:19 uii316 ffff880f00000001
>>>> Oct 21 13:02:19 uii316
>>>> Oct 21 13:02:19 uii316 [ 1766.831397]  ffff880f394956e0
>>>> Oct 21 13:02:19 uii316 ffff880f8e0d1000
>>>> Oct 21 13:02:19 uii316 ffff880f3949b800
>>>> Oct 21 13:02:19 uii316 0000000000000000
>>>> Oct 21 13:02:19 uii316
>>>> Oct 21 13:02:19 uii316 [ 1766.831410]  ffff880f39454980
>>>> Oct 21 13:02:19 uii316 0000000000000001
>>>> Oct 21 13:02:19 uii316 000000000002433c
>>>> Oct 21 13:02:19 uii316 0000000000000008
>>>> Oct 21 13:02:19 uii316
>>>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>>>> Oct 21 13:02:19 uii316 [ 1766.831492]  [<ffffffffa074742e>] 
>>>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>>>> Oct 21 13:02:19 uii316 [ 1766.831522]  [<ffffffff81178faf>] 
>>>> notify_change+0x19f/0x2f0
>>>> Oct 21 13:02:19 uii316 [ 1766.831534]  [<ffffffff8115d517>] 
>>>> do_truncate+0x57/0x80
>>>> Oct 21 13:02:19 uii316 [ 1766.831544]  [<ffffffff8116c053>] 
>>>> do_last+0x603/0x800
>>>> Oct 21 13:02:19 uii316 [ 1766.831551]  [<ffffffff8116ceb9>] 
>>>> path_openat+0xd9/0x420
>>>> Oct 21 13:02:19 uii316 [ 1766.831558]  [<ffffffff8116d33c>] 
>>>> do_filp_open+0x4c/0xc0
>>>> Oct 21 13:02:19 uii316 [ 1766.831566]  [<ffffffff8115de5f>] 
>>>> do_sys_open+0x17f/0x250
>>>> Oct 21 13:02:19 uii316 [ 1766.831575]  [<ffffffff8146e1f2>] 
>>>> system_call_fastpath+0x16/0x1b
>>>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<00007f11ccb07080>] 0x7f11ccb0707f
>>>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>>>
>>>>  The source code in question is as below,
>>>>  444 static int ocfs2_truncate_file(struct inode *inode,
>>>>  445                                struct buffer_head *di_bh,
>>>>  446                                u64 new_i_size)
>>>>  447 {
>>>>  448         int status = 0;
>>>>  449         struct ocfs2_dinode *fe = NULL;
>>>>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>  451
>>>>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), 
>>>> which
>>>>  453          * already validated it */
>>>>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>>>  455
>>>>  456         trace_ocfs2_truncate_file((unsigned long 
>>>> long)OCFS2_I(inode)->ip_blkno,
>>>>  457                                   (unsigned long 
>>>> long)le64_to_cpu(fe->i_size),
>>>>  458                                   (unsigned long long)new_i_size);
>>>>  459
>>>>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),    
>>
>>>>  <<= here
>>>>  461                         "Inode %llu, inode i_size = %lld != di "
>>>>  462                         "i_size = %llu, i_flags = 0x%x\n",
>>>>  463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>>  464                         i_size_read(inode),
>>>>  465                         (unsigned long long)le64_to_cpu(fe->i_size),
>>>>  466                         le32_to_cpu(fe->i_flags));
>>>>  467
>>>>
>>>> If your encountered the similar issue in the past, please let me know, to 
>>>> see if there is any patch/discussion for this bug.
>>>>
>>>> Thanks a lot.
>>>> Gang
>>>>
>>>>  
>>>>
>>>>
>>>> _______________________________________________
>>>> Ocfs2-devel mailing list
>>>> Ocfs2-devel at oss.oracle.com 
>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 
> .
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file
  2016-04-21  9:05       ` Joseph Qi
@ 2016-04-21  9:50         ` Gang He
  2016-04-21 12:22           ` Joseph Qi
  0 siblings, 1 reply; 11+ messages in thread
From: Gang He @ 2016-04-21  9:50 UTC (permalink / raw)
  To: ocfs2-devel




>>> 
> Hi Gang,
> May be. If so, it seems that it has relations with locks.
> Can we know which data is newer?
Hi Joseph, I do not get your question. 
About this bug, the customer only have a panic message, no message/kernel dump is provided.


Thanks
Gang

> 
> Thanks,
> Joseph
> 
> On 2016/4/21 15:59, Gang He wrote:
>> Hello Guys, 
>> 
>> Sorry for delayed reply, the fsck log was reported from the customer.
>> OCFS2 volumes to check are:
>> /dev/dm-11                           83886080 44540260 39345820 54% /sapmnt
>> /dev/dm-14                           62914560 15374960 47539600 25% /usr/sap
>> /dev/dm-13                           89128960 19976272 69152688 23% 
> /usr/sap/trans
>> uii316:~ # fsck.ocfs2 /dev/dm-13
>> fsck.ocfs2 1.8.2
>> Checking OCFS2 filesystem in /dev/dm-13:
>>  Label:        <NONE>
>>  UUID:         E35AF95C8114494391E0FFDEFD07309B
>>  Number of blocks: 22282240
>>  Block size:     4096
>>  Number of clusters: 22282240
>>  Cluster size: 4096
>>  Number of slots: 2
>> /dev/dm-13 is clean. It will be checked after 20 additional mounts.
>> uii316:~ # fsck.ocfs2 /dev/dm-14
>> fsck.ocfs2 1.8.2
>> Checking OCFS2 filesystem in /dev/dm-14: 
>>  Label:        <NONE>
>>  UUID:         43F2D9C8D70B47388527075F3C6C38BB
>>  Number of blocks: 15728640
>>  Block size:     4096
>>  Number of clusters: 15728640
>>  Cluster size: 4096
>>  Number of slots: 2
>> /dev/dm-14 is clean. It will be checked after 20 additional mounts.
>> uii316:~ # fsck.ocfs2 /dev/dm-11
>> fsck.ocfs2 1.8.2
>> Checking OCFS2 filesystem in /dev/dm-11:
>>  Label:        <NONE>
>>  UUID:         0F1CBE6017934B5E8AD80149ED332659
>>  Number of blocks: 20971520
>>  Block size:     4096
>>  Number of clusters: 20971520
>>  Cluster size: 4096
>>  Number of slots: 2
>> /dev/dm-11 is clean. It will be checked after 20 additional mounts.
>> 
>>>From the log, it looks the file systems were not corrupted, this should be a 
> race-condition issue ?!
>> 
>> Thanks
>> Gang 
>> 
>> 
>>>>>
>>> On 03/31/2016 10:56 AM, Gang He wrote:
>>>> Hello Joseph and Junxiao,
>>>>
>>>> Did you encounter this issue in the past? I doubt this is possible a race 
>>> condition bug (rather than data inconsistency).
>>> Never saw this. fsck report any corruption?
>>>
>>> Thanks,
>>> Junxiao.
>>>>
>>>> Thanks
>>>> Gang
>>>>
>>>>
>>>>>>>
>>>>> Hello Guys,
>>>>>
>>>>> I got a bug, which reported a kernel BUG in function ocfs2_truncate_file,
>>>>> Base on my initial analysis, this bug looks like a race condition problem.
>>>>> Unfortunately, there was no kernel crash dump caught, just got some kernel 
>>>>> log as below,
>>>>>
>>>>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>>>>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>>>>> Oct 21 13:02:19 uii316 [ 1766.831234]
>>>>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>>>>> 3.0.101-0.47.67-default #1
>>>>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>>>>> Oct 21 13:02:19 uii316
>>>>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]
>>>>> Oct 21 13:02:19 uii316 [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490 
>>>>> [ocfs2]
>>>>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68  EFLAGS: 
>>>>> 00010296
>>>>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX: 
>>>>> ffff880f39c5e240 RCX: 00000000000039fd
>>>>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI: 
>>>>> 0000000000000007 RDI: 0000000000000246
>>>>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08: 
>>>>> ffffffff81da0ac0 R09: 0000000000000000
>>>>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11: 
>>>>> 00000000ffffffff R12: ffff880f3949bc78
>>>>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14: 
>>>>> ffff880f3d481000 R15: 00000000000e43bc
>>>>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  00007f11cda9d720(0000) 
>>>>> GS:ffff880fefd40000(0000) knlGS:0000000000000000
>>>>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS: 0000 ES: 0000 CR0: 
>>>>> 0000000080050033
>>>>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3: 
>>>>> 0000000f39d35000 CR4: 00000000000007e0
>>>>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1: 
>>>>> 0000000000000000 DR2: 0000000000000000
>>>>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6: 
>>>>> 00000000ffff0ff0 DR7: 0000000000000400
>>>>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>>>>> threadinfo ffff880f39d78000, task ffff880f39c5e240)
>>>>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>>>>> Oct 21 13:02:19 uii316 [ 1766.831383]  000000000002433c
>>>>> Oct 21 13:02:19 uii316 00000000000e43bc
>>>>> Oct 21 13:02:19 uii316 00000000000eab40
>>>>> Oct 21 13:02:19 uii316 ffff880f00000001
>>>>> Oct 21 13:02:19 uii316
>>>>> Oct 21 13:02:19 uii316 [ 1766.831397]  ffff880f394956e0
>>>>> Oct 21 13:02:19 uii316 ffff880f8e0d1000
>>>>> Oct 21 13:02:19 uii316 ffff880f3949b800
>>>>> Oct 21 13:02:19 uii316 0000000000000000
>>>>> Oct 21 13:02:19 uii316
>>>>> Oct 21 13:02:19 uii316 [ 1766.831410]  ffff880f39454980
>>>>> Oct 21 13:02:19 uii316 0000000000000001
>>>>> Oct 21 13:02:19 uii316 000000000002433c
>>>>> Oct 21 13:02:19 uii316 0000000000000008
>>>>> Oct 21 13:02:19 uii316
>>>>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>>>>> Oct 21 13:02:19 uii316 [ 1766.831492]  [<ffffffffa074742e>] 
>>>>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>>>>> Oct 21 13:02:19 uii316 [ 1766.831522]  [<ffffffff81178faf>] 
>>>>> notify_change+0x19f/0x2f0
>>>>> Oct 21 13:02:19 uii316 [ 1766.831534]  [<ffffffff8115d517>] 
>>>>> do_truncate+0x57/0x80
>>>>> Oct 21 13:02:19 uii316 [ 1766.831544]  [<ffffffff8116c053>] 
>>>>> do_last+0x603/0x800
>>>>> Oct 21 13:02:19 uii316 [ 1766.831551]  [<ffffffff8116ceb9>] 
>>>>> path_openat+0xd9/0x420
>>>>> Oct 21 13:02:19 uii316 [ 1766.831558]  [<ffffffff8116d33c>] 
>>>>> do_filp_open+0x4c/0xc0
>>>>> Oct 21 13:02:19 uii316 [ 1766.831566]  [<ffffffff8115de5f>] 
>>>>> do_sys_open+0x17f/0x250
>>>>> Oct 21 13:02:19 uii316 [ 1766.831575]  [<ffffffff8146e1f2>] 
>>>>> system_call_fastpath+0x16/0x1b
>>>>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<00007f11ccb07080>] 0x7f11ccb0707f
>>>>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>>>>
>>>>>  The source code in question is as below,
>>>>>  444 static int ocfs2_truncate_file(struct inode *inode,
>>>>>  445                                struct buffer_head *di_bh,
>>>>>  446                                u64 new_i_size)
>>>>>  447 {
>>>>>  448         int status = 0;
>>>>>  449         struct ocfs2_dinode *fe = NULL;
>>>>>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>  451
>>>>>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), 
>>>>> which
>>>>>  453          * already validated it */
>>>>>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>>>>  455
>>>>>  456         trace_ocfs2_truncate_file((unsigned long 
>>>>> long)OCFS2_I(inode)->ip_blkno,
>>>>>  457                                   (unsigned long 
>>>>> long)le64_to_cpu(fe->i_size),
>>>>>  458                                   (unsigned long long)new_i_size);
>>>>>  459
>>>>>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),    
> 
>>>
>>>>>  <<= here
>>>>>  461                         "Inode %llu, inode i_size = %lld != di "
>>>>>  462                         "i_size = %llu, i_flags = 0x%x\n",
>>>>>  463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>>>  464                         i_size_read(inode),
>>>>>  465                         (unsigned long long)le64_to_cpu(fe->i_size),
>>>>>  466                         le32_to_cpu(fe->i_flags));
>>>>>  467
>>>>>
>>>>> If your encountered the similar issue in the past, please let me know, to 
>>>>> see if there is any patch/discussion for this bug.
>>>>>
>>>>> Thanks a lot.
>>>>> Gang
>>>>>
>>>>>  
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-devel mailing list
>>>>> Ocfs2-devel at oss.oracle.com 
>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel 
>> 
>> .
>> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file
  2016-04-21  9:50         ` Gang He
@ 2016-04-21 12:22           ` Joseph Qi
  2016-04-22  3:17             ` Gang He
  0 siblings, 1 reply; 11+ messages in thread
From: Joseph Qi @ 2016-04-21 12:22 UTC (permalink / raw)
  To: ocfs2-devel

Hi Gang,

On 2016/4/21 17:50, Gang He wrote:
> 
> 
> 
>>>>
>> Hi Gang,
>> May be. If so, it seems that it has relations with locks.
>> Can we know which data is newer?
> Hi Joseph, I do not get your question. 
> About this bug, the customer only have a panic message, no message/kernel dump is provided.
> 
Filesystem not corrupted doesn't mean the disk inode is right (the latest).
That means the in-memory inode may be right but happens that flush hasn't
been done successfully.
But as of now, we can't distinguish which the case is.

Thanks,
Joseph

> 
> Thanks
> Gang
> 
>>
>> Thanks,
>> Joseph
>>
>> On 2016/4/21 15:59, Gang He wrote:
>>> Hello Guys, 
>>>
>>> Sorry for delayed reply, the fsck log was reported from the customer.
>>> OCFS2 volumes to check are:
>>> /dev/dm-11                           83886080 44540260 39345820 54% /sapmnt
>>> /dev/dm-14                           62914560 15374960 47539600 25% /usr/sap
>>> /dev/dm-13                           89128960 19976272 69152688 23% 
>> /usr/sap/trans
>>> uii316:~ # fsck.ocfs2 /dev/dm-13
>>> fsck.ocfs2 1.8.2
>>> Checking OCFS2 filesystem in /dev/dm-13:
>>>  Label:        <NONE>
>>>  UUID:         E35AF95C8114494391E0FFDEFD07309B
>>>  Number of blocks: 22282240
>>>  Block size:     4096
>>>  Number of clusters: 22282240
>>>  Cluster size: 4096
>>>  Number of slots: 2
>>> /dev/dm-13 is clean. It will be checked after 20 additional mounts.
>>> uii316:~ # fsck.ocfs2 /dev/dm-14
>>> fsck.ocfs2 1.8.2
>>> Checking OCFS2 filesystem in /dev/dm-14: 
>>>  Label:        <NONE>
>>>  UUID:         43F2D9C8D70B47388527075F3C6C38BB
>>>  Number of blocks: 15728640
>>>  Block size:     4096
>>>  Number of clusters: 15728640
>>>  Cluster size: 4096
>>>  Number of slots: 2
>>> /dev/dm-14 is clean. It will be checked after 20 additional mounts.
>>> uii316:~ # fsck.ocfs2 /dev/dm-11
>>> fsck.ocfs2 1.8.2
>>> Checking OCFS2 filesystem in /dev/dm-11:
>>>  Label:        <NONE>
>>>  UUID:         0F1CBE6017934B5E8AD80149ED332659
>>>  Number of blocks: 20971520
>>>  Block size:     4096
>>>  Number of clusters: 20971520
>>>  Cluster size: 4096
>>>  Number of slots: 2
>>> /dev/dm-11 is clean. It will be checked after 20 additional mounts.
>>>
>>> >From the log, it looks the file systems were not corrupted, this should be a 
>> race-condition issue ?!
>>>
>>> Thanks
>>> Gang 
>>>
>>>
>>>>>>
>>>> On 03/31/2016 10:56 AM, Gang He wrote:
>>>>> Hello Joseph and Junxiao,
>>>>>
>>>>> Did you encounter this issue in the past? I doubt this is possible a race 
>>>> condition bug (rather than data inconsistency).
>>>> Never saw this. fsck report any corruption?
>>>>
>>>> Thanks,
>>>> Junxiao.
>>>>>
>>>>> Thanks
>>>>> Gang
>>>>>
>>>>>
>>>>>>>>
>>>>>> Hello Guys,
>>>>>>
>>>>>> I got a bug, which reported a kernel BUG in function ocfs2_truncate_file,
>>>>>> Base on my initial analysis, this bug looks like a race condition problem.
>>>>>> Unfortunately, there was no kernel crash dump caught, just got some kernel 
>>>>>> log as below,
>>>>>>
>>>>>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831234]
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>>>>>> 3.0.101-0.47.67-default #1
>>>>>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>>>>>> Oct 21 13:02:19 uii316
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]
>>>>>> Oct 21 13:02:19 uii316 [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490 
>>>>>> [ocfs2]
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68  EFLAGS: 
>>>>>> 00010296
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX: 
>>>>>> ffff880f39c5e240 RCX: 00000000000039fd
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI: 
>>>>>> 0000000000000007 RDI: 0000000000000246
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08: 
>>>>>> ffffffff81da0ac0 R09: 0000000000000000
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11: 
>>>>>> 00000000ffffffff R12: ffff880f3949bc78
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14: 
>>>>>> ffff880f3d481000 R15: 00000000000e43bc
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  00007f11cda9d720(0000) 
>>>>>> GS:ffff880fefd40000(0000) knlGS:0000000000000000
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS: 0000 ES: 0000 CR0: 
>>>>>> 0000000080050033
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3: 
>>>>>> 0000000f39d35000 CR4: 00000000000007e0
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1: 
>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6: 
>>>>>> 00000000ffff0ff0 DR7: 0000000000000400
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>>>>>> threadinfo ffff880f39d78000, task ffff880f39c5e240)
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831383]  000000000002433c
>>>>>> Oct 21 13:02:19 uii316 00000000000e43bc
>>>>>> Oct 21 13:02:19 uii316 00000000000eab40
>>>>>> Oct 21 13:02:19 uii316 ffff880f00000001
>>>>>> Oct 21 13:02:19 uii316
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831397]  ffff880f394956e0
>>>>>> Oct 21 13:02:19 uii316 ffff880f8e0d1000
>>>>>> Oct 21 13:02:19 uii316 ffff880f3949b800
>>>>>> Oct 21 13:02:19 uii316 0000000000000000
>>>>>> Oct 21 13:02:19 uii316
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831410]  ffff880f39454980
>>>>>> Oct 21 13:02:19 uii316 0000000000000001
>>>>>> Oct 21 13:02:19 uii316 000000000002433c
>>>>>> Oct 21 13:02:19 uii316 0000000000000008
>>>>>> Oct 21 13:02:19 uii316
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831492]  [<ffffffffa074742e>] 
>>>>>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831522]  [<ffffffff81178faf>] 
>>>>>> notify_change+0x19f/0x2f0
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831534]  [<ffffffff8115d517>] 
>>>>>> do_truncate+0x57/0x80
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831544]  [<ffffffff8116c053>] 
>>>>>> do_last+0x603/0x800
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831551]  [<ffffffff8116ceb9>] 
>>>>>> path_openat+0xd9/0x420
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831558]  [<ffffffff8116d33c>] 
>>>>>> do_filp_open+0x4c/0xc0
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831566]  [<ffffffff8115de5f>] 
>>>>>> do_sys_open+0x17f/0x250
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831575]  [<ffffffff8146e1f2>] 
>>>>>> system_call_fastpath+0x16/0x1b
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<00007f11ccb07080>] 0x7f11ccb0707f
>>>>>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>>>>>
>>>>>>  The source code in question is as below,
>>>>>>  444 static int ocfs2_truncate_file(struct inode *inode,
>>>>>>  445                                struct buffer_head *di_bh,
>>>>>>  446                                u64 new_i_size)
>>>>>>  447 {
>>>>>>  448         int status = 0;
>>>>>>  449         struct ocfs2_dinode *fe = NULL;
>>>>>>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>>  451
>>>>>>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), 
>>>>>> which
>>>>>>  453          * already validated it */
>>>>>>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>>>>>  455
>>>>>>  456         trace_ocfs2_truncate_file((unsigned long 
>>>>>> long)OCFS2_I(inode)->ip_blkno,
>>>>>>  457                                   (unsigned long 
>>>>>> long)le64_to_cpu(fe->i_size),
>>>>>>  458                                   (unsigned long long)new_i_size);
>>>>>>  459
>>>>>>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),    
>>
>>>>
>>>>>>  <<= here
>>>>>>  461                         "Inode %llu, inode i_size = %lld != di "
>>>>>>  462                         "i_size = %llu, i_flags = 0x%x\n",
>>>>>>  463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>>>>  464                         i_size_read(inode),
>>>>>>  465                         (unsigned long long)le64_to_cpu(fe->i_size),
>>>>>>  466                         le32_to_cpu(fe->i_flags));
>>>>>>  467
>>>>>>
>>>>>> If your encountered the similar issue in the past, please let me know, to 
>>>>>> see if there is any patch/discussion for this bug.
>>>>>>
>>>>>> Thanks a lot.
>>>>>> Gang
>>>>>>
>>>>>>  
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Ocfs2-devel mailing list
>>>>>> Ocfs2-devel at oss.oracle.com 
>>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel 
>>>
>>> .
>>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file
  2016-04-21 12:22           ` Joseph Qi
@ 2016-04-22  3:17             ` Gang He
  2016-04-22  3:26               ` Joseph Qi
  0 siblings, 1 reply; 11+ messages in thread
From: Gang He @ 2016-04-22  3:17 UTC (permalink / raw)
  To: ocfs2-devel




>>> 
> Hi Gang,
> 
> On 2016/4/21 17:50, Gang He wrote:
>> 
>> 
>> 
>>>>>
>>> Hi Gang,
>>> May be. If so, it seems that it has relations with locks.
>>> Can we know which data is newer?
>> Hi Joseph, I do not get your question. 
>> About this bug, the customer only have a panic message, no message/kernel 
> dump is provided.
>> 
> Filesystem not corrupted doesn't mean the disk inode is right (the latest).
> That means the in-memory inode may be right but happens that flush hasn't
> been done successfully.
> But as of now, we can't distinguish which the case is.
True, I just brought out this bug to everyone, need more time to analyze the code, to see if we can reproduce this issue manually in the future.

Thanks
Gang

> 
> Thanks,
> Joseph
> 
>> 
>> Thanks
>> Gang
>> 
>>>
>>> Thanks,
>>> Joseph
>>>
>>> On 2016/4/21 15:59, Gang He wrote:
>>>> Hello Guys, 
>>>>
>>>> Sorry for delayed reply, the fsck log was reported from the customer.
>>>> OCFS2 volumes to check are:
>>>> /dev/dm-11                           83886080 44540260 39345820 54% /sapmnt
>>>> /dev/dm-14                           62914560 15374960 47539600 25% /usr/sap
>>>> /dev/dm-13                           89128960 19976272 69152688 23% 
>>> /usr/sap/trans
>>>> uii316:~ # fsck.ocfs2 /dev/dm-13
>>>> fsck.ocfs2 1.8.2
>>>> Checking OCFS2 filesystem in /dev/dm-13:
>>>>  Label:        <NONE>
>>>>  UUID:         E35AF95C8114494391E0FFDEFD07309B
>>>>  Number of blocks: 22282240
>>>>  Block size:     4096
>>>>  Number of clusters: 22282240
>>>>  Cluster size: 4096
>>>>  Number of slots: 2
>>>> /dev/dm-13 is clean. It will be checked after 20 additional mounts.
>>>> uii316:~ # fsck.ocfs2 /dev/dm-14
>>>> fsck.ocfs2 1.8.2
>>>> Checking OCFS2 filesystem in /dev/dm-14: 
>>>>  Label:        <NONE>
>>>>  UUID:         43F2D9C8D70B47388527075F3C6C38BB
>>>>  Number of blocks: 15728640
>>>>  Block size:     4096
>>>>  Number of clusters: 15728640
>>>>  Cluster size: 4096
>>>>  Number of slots: 2
>>>> /dev/dm-14 is clean. It will be checked after 20 additional mounts.
>>>> uii316:~ # fsck.ocfs2 /dev/dm-11
>>>> fsck.ocfs2 1.8.2
>>>> Checking OCFS2 filesystem in /dev/dm-11:
>>>>  Label:        <NONE>
>>>>  UUID:         0F1CBE6017934B5E8AD80149ED332659
>>>>  Number of blocks: 20971520
>>>>  Block size:     4096
>>>>  Number of clusters: 20971520
>>>>  Cluster size: 4096
>>>>  Number of slots: 2
>>>> /dev/dm-11 is clean. It will be checked after 20 additional mounts.
>>>>
>>>> >From the log, it looks the file systems were not corrupted, this should be a 
> 
>>> race-condition issue ?!
>>>>
>>>> Thanks
>>>> Gang 
>>>>
>>>>
>>>>>>>
>>>>> On 03/31/2016 10:56 AM, Gang He wrote:
>>>>>> Hello Joseph and Junxiao,
>>>>>>
>>>>>> Did you encounter this issue in the past? I doubt this is possible a race 
>>>>> condition bug (rather than data inconsistency).
>>>>> Never saw this. fsck report any corruption?
>>>>>
>>>>> Thanks,
>>>>> Junxiao.
>>>>>>
>>>>>> Thanks
>>>>>> Gang
>>>>>>
>>>>>>
>>>>>>>>>
>>>>>>> Hello Guys,
>>>>>>>
>>>>>>> I got a bug, which reported a kernel BUG in function ocfs2_truncate_file,
>>>>>>> Base on my initial analysis, this bug looks like a race condition problem.
>>>>>>> Unfortunately, there was no kernel crash dump caught, just got some kernel 
>>>>>>> log as below,
>>>>>>>
>>>>>>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831234]
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>>>>>>> 3.0.101-0.47.67-default #1
>>>>>>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>>>>>>> Oct 21 13:02:19 uii316
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]
>>>>>>> Oct 21 13:02:19 uii316 [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490 
>>>>>>> [ocfs2]
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68  EFLAGS: 
>>>>>>> 00010296
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX: 
>>>>>>> ffff880f39c5e240 RCX: 00000000000039fd
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI: 
>>>>>>> 0000000000000007 RDI: 0000000000000246
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08: 
>>>>>>> ffffffff81da0ac0 R09: 0000000000000000
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11: 
>>>>>>> 00000000ffffffff R12: ffff880f3949bc78
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14: 
>>>>>>> ffff880f3d481000 R15: 00000000000e43bc
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  00007f11cda9d720(0000) 
>>>>>>> GS:ffff880fefd40000(0000) knlGS:0000000000000000
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS: 0000 ES: 0000 CR0: 
>>>>>>> 0000000080050033
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3: 
>>>>>>> 0000000f39d35000 CR4: 00000000000007e0
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1: 
>>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6: 
>>>>>>> 00000000ffff0ff0 DR7: 0000000000000400
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>>>>>>> threadinfo ffff880f39d78000, task ffff880f39c5e240)
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831383]  000000000002433c
>>>>>>> Oct 21 13:02:19 uii316 00000000000e43bc
>>>>>>> Oct 21 13:02:19 uii316 00000000000eab40
>>>>>>> Oct 21 13:02:19 uii316 ffff880f00000001
>>>>>>> Oct 21 13:02:19 uii316
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831397]  ffff880f394956e0
>>>>>>> Oct 21 13:02:19 uii316 ffff880f8e0d1000
>>>>>>> Oct 21 13:02:19 uii316 ffff880f3949b800
>>>>>>> Oct 21 13:02:19 uii316 0000000000000000
>>>>>>> Oct 21 13:02:19 uii316
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831410]  ffff880f39454980
>>>>>>> Oct 21 13:02:19 uii316 0000000000000001
>>>>>>> Oct 21 13:02:19 uii316 000000000002433c
>>>>>>> Oct 21 13:02:19 uii316 0000000000000008
>>>>>>> Oct 21 13:02:19 uii316
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831492]  [<ffffffffa074742e>] 
>>>>>>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831522]  [<ffffffff81178faf>] 
>>>>>>> notify_change+0x19f/0x2f0
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831534]  [<ffffffff8115d517>] 
>>>>>>> do_truncate+0x57/0x80
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831544]  [<ffffffff8116c053>] 
>>>>>>> do_last+0x603/0x800
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831551]  [<ffffffff8116ceb9>] 
>>>>>>> path_openat+0xd9/0x420
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831558]  [<ffffffff8116d33c>] 
>>>>>>> do_filp_open+0x4c/0xc0
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831566]  [<ffffffff8115de5f>] 
>>>>>>> do_sys_open+0x17f/0x250
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831575]  [<ffffffff8146e1f2>] 
>>>>>>> system_call_fastpath+0x16/0x1b
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<00007f11ccb07080>] 0x7f11ccb0707f
>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>>>>>>
>>>>>>>  The source code in question is as below,
>>>>>>>  444 static int ocfs2_truncate_file(struct inode *inode,
>>>>>>>  445                                struct buffer_head *di_bh,
>>>>>>>  446                                u64 new_i_size)
>>>>>>>  447 {
>>>>>>>  448         int status = 0;
>>>>>>>  449         struct ocfs2_dinode *fe = NULL;
>>>>>>>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>>>  451
>>>>>>>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), 
>>>>>>> which
>>>>>>>  453          * already validated it */
>>>>>>>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>>>>>>  455
>>>>>>>  456         trace_ocfs2_truncate_file((unsigned long 
>>>>>>> long)OCFS2_I(inode)->ip_blkno,
>>>>>>>  457                                   (unsigned long 
>>>>>>> long)le64_to_cpu(fe->i_size),
>>>>>>>  458                                   (unsigned long long)new_i_size);
>>>>>>>  459
>>>>>>>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),    
> 
>>>
>>>>>
>>>>>>>  <<= here
>>>>>>>  461                         "Inode %llu, inode i_size = %lld != di "
>>>>>>>  462                         "i_size = %llu, i_flags = 0x%x\n",
>>>>>>>  463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>>>>>  464                         i_size_read(inode),
>>>>>>>  465                         (unsigned long long)le64_to_cpu(fe->i_size),
>>>>>>>  466                         le32_to_cpu(fe->i_flags));
>>>>>>>  467
>>>>>>>
>>>>>>> If your encountered the similar issue in the past, please let me know, to 
>>>>>>> see if there is any patch/discussion for this bug.
>>>>>>>
>>>>>>> Thanks a lot.
>>>>>>> Gang
>>>>>>>
>>>>>>>  
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Ocfs2-devel mailing list
>>>>>>> Ocfs2-devel at oss.oracle.com 
>>>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel 
>>>>
>>>> .
>>>>
>> 
>> .
>> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file
  2016-04-22  3:17             ` Gang He
@ 2016-04-22  3:26               ` Joseph Qi
  0 siblings, 0 replies; 11+ messages in thread
From: Joseph Qi @ 2016-04-22  3:26 UTC (permalink / raw)
  To: ocfs2-devel

On 2016/4/22 11:17, Gang He wrote:
> 
> 
> 
>>>>
>> Hi Gang,
>>
>> On 2016/4/21 17:50, Gang He wrote:
>>>
>>>
>>>
>>>>>>
>>>> Hi Gang,
>>>> May be. If so, it seems that it has relations with locks.
>>>> Can we know which data is newer?
>>> Hi Joseph, I do not get your question. 
>>> About this bug, the customer only have a panic message, no message/kernel 
>> dump is provided.
>>>
>> Filesystem not corrupted doesn't mean the disk inode is right (the latest).
>> That means the in-memory inode may be right but happens that flush hasn't
>> been done successfully.
>> But as of now, we can't distinguish which the case is.
> True, I just brought out this bug to everyone, need more time to analyze the code, to see if we can reproduce this issue manually in the future.
Fine, we will also set up a race environment and try to reproduce it.
And give feedback to mail list ASAP once we have more info.

Thanks,
Joseph
> 
> Thanks
> Gang
> 
>>
>> Thanks,
>> Joseph
>>
>>>
>>> Thanks
>>> Gang
>>>
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>> On 2016/4/21 15:59, Gang He wrote:
>>>>> Hello Guys, 
>>>>>
>>>>> Sorry for delayed reply, the fsck log was reported from the customer.
>>>>> OCFS2 volumes to check are:
>>>>> /dev/dm-11                           83886080 44540260 39345820 54% /sapmnt
>>>>> /dev/dm-14                           62914560 15374960 47539600 25% /usr/sap
>>>>> /dev/dm-13                           89128960 19976272 69152688 23% 
>>>> /usr/sap/trans
>>>>> uii316:~ # fsck.ocfs2 /dev/dm-13
>>>>> fsck.ocfs2 1.8.2
>>>>> Checking OCFS2 filesystem in /dev/dm-13:
>>>>>  Label:        <NONE>
>>>>>  UUID:         E35AF95C8114494391E0FFDEFD07309B
>>>>>  Number of blocks: 22282240
>>>>>  Block size:     4096
>>>>>  Number of clusters: 22282240
>>>>>  Cluster size: 4096
>>>>>  Number of slots: 2
>>>>> /dev/dm-13 is clean. It will be checked after 20 additional mounts.
>>>>> uii316:~ # fsck.ocfs2 /dev/dm-14
>>>>> fsck.ocfs2 1.8.2
>>>>> Checking OCFS2 filesystem in /dev/dm-14: 
>>>>>  Label:        <NONE>
>>>>>  UUID:         43F2D9C8D70B47388527075F3C6C38BB
>>>>>  Number of blocks: 15728640
>>>>>  Block size:     4096
>>>>>  Number of clusters: 15728640
>>>>>  Cluster size: 4096
>>>>>  Number of slots: 2
>>>>> /dev/dm-14 is clean. It will be checked after 20 additional mounts.
>>>>> uii316:~ # fsck.ocfs2 /dev/dm-11
>>>>> fsck.ocfs2 1.8.2
>>>>> Checking OCFS2 filesystem in /dev/dm-11:
>>>>>  Label:        <NONE>
>>>>>  UUID:         0F1CBE6017934B5E8AD80149ED332659
>>>>>  Number of blocks: 20971520
>>>>>  Block size:     4096
>>>>>  Number of clusters: 20971520
>>>>>  Cluster size: 4096
>>>>>  Number of slots: 2
>>>>> /dev/dm-11 is clean. It will be checked after 20 additional mounts.
>>>>>
>>>>> >From the log, it looks the file systems were not corrupted, this should be a 
>>
>>>> race-condition issue ?!
>>>>>
>>>>> Thanks
>>>>> Gang 
>>>>>
>>>>>
>>>>>>>>
>>>>>> On 03/31/2016 10:56 AM, Gang He wrote:
>>>>>>> Hello Joseph and Junxiao,
>>>>>>>
>>>>>>> Did you encounter this issue in the past? I doubt this is possible a race 
>>>>>> condition bug (rather than data inconsistency).
>>>>>> Never saw this. fsck report any corruption?
>>>>>>
>>>>>> Thanks,
>>>>>> Junxiao.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Gang
>>>>>>>
>>>>>>>
>>>>>>>>>>
>>>>>>>> Hello Guys,
>>>>>>>>
>>>>>>>> I got a bug, which reported a kernel BUG in function ocfs2_truncate_file,
>>>>>>>> Base on my initial analysis, this bug looks like a race condition problem.
>>>>>>>> Unfortunately, there was no kernel crash dump caught, just got some kernel 
>>>>>>>> log as below,
>>>>>>>>
>>>>>>>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831234]
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>>>>>>>> 3.0.101-0.47.67-default #1
>>>>>>>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>>>>>>>> Oct 21 13:02:19 uii316
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]
>>>>>>>> Oct 21 13:02:19 uii316 [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490 
>>>>>>>> [ocfs2]
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68  EFLAGS: 
>>>>>>>> 00010296
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX: 
>>>>>>>> ffff880f39c5e240 RCX: 00000000000039fd
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI: 
>>>>>>>> 0000000000000007 RDI: 0000000000000246
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08: 
>>>>>>>> ffffffff81da0ac0 R09: 0000000000000000
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11: 
>>>>>>>> 00000000ffffffff R12: ffff880f3949bc78
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14: 
>>>>>>>> ffff880f3d481000 R15: 00000000000e43bc
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  00007f11cda9d720(0000) 
>>>>>>>> GS:ffff880fefd40000(0000) knlGS:0000000000000000
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS: 0000 ES: 0000 CR0: 
>>>>>>>> 0000000080050033
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3: 
>>>>>>>> 0000000f39d35000 CR4: 00000000000007e0
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1: 
>>>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6: 
>>>>>>>> 00000000ffff0ff0 DR7: 0000000000000400
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>>>>>>>> threadinfo ffff880f39d78000, task ffff880f39c5e240)
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831383]  000000000002433c
>>>>>>>> Oct 21 13:02:19 uii316 00000000000e43bc
>>>>>>>> Oct 21 13:02:19 uii316 00000000000eab40
>>>>>>>> Oct 21 13:02:19 uii316 ffff880f00000001
>>>>>>>> Oct 21 13:02:19 uii316
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831397]  ffff880f394956e0
>>>>>>>> Oct 21 13:02:19 uii316 ffff880f8e0d1000
>>>>>>>> Oct 21 13:02:19 uii316 ffff880f3949b800
>>>>>>>> Oct 21 13:02:19 uii316 0000000000000000
>>>>>>>> Oct 21 13:02:19 uii316
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831410]  ffff880f39454980
>>>>>>>> Oct 21 13:02:19 uii316 0000000000000001
>>>>>>>> Oct 21 13:02:19 uii316 000000000002433c
>>>>>>>> Oct 21 13:02:19 uii316 0000000000000008
>>>>>>>> Oct 21 13:02:19 uii316
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831492]  [<ffffffffa074742e>] 
>>>>>>>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831522]  [<ffffffff81178faf>] 
>>>>>>>> notify_change+0x19f/0x2f0
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831534]  [<ffffffff8115d517>] 
>>>>>>>> do_truncate+0x57/0x80
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831544]  [<ffffffff8116c053>] 
>>>>>>>> do_last+0x603/0x800
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831551]  [<ffffffff8116ceb9>] 
>>>>>>>> path_openat+0xd9/0x420
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831558]  [<ffffffff8116d33c>] 
>>>>>>>> do_filp_open+0x4c/0xc0
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831566]  [<ffffffff8115de5f>] 
>>>>>>>> do_sys_open+0x17f/0x250
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831575]  [<ffffffff8146e1f2>] 
>>>>>>>> system_call_fastpath+0x16/0x1b
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<00007f11ccb07080>] 0x7f11ccb0707f
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>>>>>>>
>>>>>>>>  The source code in question is as below,
>>>>>>>>  444 static int ocfs2_truncate_file(struct inode *inode,
>>>>>>>>  445                                struct buffer_head *di_bh,
>>>>>>>>  446                                u64 new_i_size)
>>>>>>>>  447 {
>>>>>>>>  448         int status = 0;
>>>>>>>>  449         struct ocfs2_dinode *fe = NULL;
>>>>>>>>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>>>>  451
>>>>>>>>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), 
>>>>>>>> which
>>>>>>>>  453          * already validated it */
>>>>>>>>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>>>>>>>  455
>>>>>>>>  456         trace_ocfs2_truncate_file((unsigned long 
>>>>>>>> long)OCFS2_I(inode)->ip_blkno,
>>>>>>>>  457                                   (unsigned long 
>>>>>>>> long)le64_to_cpu(fe->i_size),
>>>>>>>>  458                                   (unsigned long long)new_i_size);
>>>>>>>>  459
>>>>>>>>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),    
>>
>>>>
>>>>>>
>>>>>>>>  <<= here
>>>>>>>>  461                         "Inode %llu, inode i_size = %lld != di "
>>>>>>>>  462                         "i_size = %llu, i_flags = 0x%x\n",
>>>>>>>>  463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>>>>>>  464                         i_size_read(inode),
>>>>>>>>  465                         (unsigned long long)le64_to_cpu(fe->i_size),
>>>>>>>>  466                         le32_to_cpu(fe->i_flags));
>>>>>>>>  467
>>>>>>>>
>>>>>>>> If your encountered the similar issue in the past, please let me know, to 
>>>>>>>> see if there is any patch/discussion for this bug.
>>>>>>>>
>>>>>>>> Thanks a lot.
>>>>>>>> Gang
>>>>>>>>
>>>>>>>>  
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Ocfs2-devel mailing list
>>>>>>>> Ocfs2-devel at oss.oracle.com 
>>>>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel 
>>>>>
>>>>> .
>>>>>
>>>
>>> .
>>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-04-22  3:26 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-29 13:15 [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file Gang He
2016-03-31  2:56 ` Gang He
2016-03-31  3:14   ` Junxiao Bi
2016-04-21  7:59     ` Gang He
2016-04-21  9:05       ` Joseph Qi
2016-04-21  9:50         ` Gang He
2016-04-21 12:22           ` Joseph Qi
2016-04-22  3:17             ` Gang He
2016-04-22  3:26               ` Joseph Qi
2016-03-31  4:26   ` Joseph Qi
2016-03-31  6:23     ` Gang He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).