* [Ocfs2-devel] discuss about jbd2 assertion in defragment path
@ 2023-02-10 10:04 Heming Zhao via Ocfs2-devel
2023-02-14 2:52 ` Joseph Qi via Ocfs2-devel
0 siblings, 1 reply; 12+ messages in thread
From: Heming Zhao via Ocfs2-devel @ 2023-02-10 10:04 UTC (permalink / raw)
To: joseph.qi, ocfs2-devel; +Cc: jack
Hello Joseph,
I am sorry to wake up a long time ago thread.
All mails of this thread (my patch is [1]):
[1] https://oss.oracle.com/pipermail/ocfs2-devel/2022-May/000101.html
[2] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000105.html
[3] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000109.html
[4] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000217.html
I re-checked ocfs2 defragmentation & jbd2 flow recently, I still think my
patch [1] is right. At least, the fixing code is correct, the patch commit
log needs to polish.
This bug has the same root cause of commit 7f27ec978b0e ("ocfs2: call
ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()").
For this bug, jbd2_journal_restart() is called by ocfs2_split_extent() during
defragmenting, and it's not about "not enough credits" issue you ever said in [2].
I explain my thinking again in this mail.
the crash call flow:
ocfs2_defrag_extent //caller call it in while() loop.
+ handle = ocfs2_start_trans(osb, credits)
+ __ocfs2_move_extent
| + ocfs2_journal_access_di //[a]
| + ocfs2_split_extent //[b]
| | + if //[b.1]
| | | ocfs2_replace_extent_rec/ocfs2_split_and_insert
| | + else
| | ocfs2_try_to_merge_extent
| |
| + ocfs2_journal_dirty //[c]
|
+ ocfs2_commit_trans(osb, handle) //<== complete this handle
In my viewpoint, ocfs2_split_extent() is journal self-service function. I still
belive the two lines ([a] & [c]) in __ocfs2_move_extent() are totally useless.
In ocfs2_split_extent(), the code from the first code line to "if-else" code
area ([b.1]) doesn't need any journal protection, and we also could see there
are only read operations.
If we worry about data corruption after removing [a] & [c], (e.g: my eyes missed
some journal operations from [a] to [b.1]), we could only delete [c]. So the
fixed code seems (only remove line [c]):
ocfs2_defrag_extent
+ handle = ocfs2_start_trans(osb, credits)
+ __ocfs2_move_extent
| + ocfs2_journal_access_di //[a] <-- keep it, but remove pair dirty action
| + ocfs2_split_extent //[b]
| + if //[b.1]
| | ocfs2_replace_extent_rec/ocfs2_split_and_insert
| + else
| ocfs2_try_to_merge_extent
|
+ ocfs2_commit_trans(osb, handle)
Thanks,
Heming
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path 2023-02-10 10:04 [Ocfs2-devel] discuss about jbd2 assertion in defragment path Heming Zhao via Ocfs2-devel @ 2023-02-14 2:52 ` Joseph Qi via Ocfs2-devel 2023-02-14 4:33 ` Heming Zhao via Ocfs2-devel 0 siblings, 1 reply; 12+ messages in thread From: Joseph Qi via Ocfs2-devel @ 2023-02-14 2:52 UTC (permalink / raw) To: Heming Zhao, ocfs2-devel; +Cc: jack Hi, Sorry about the late reply. This thread is indeed a long time ago:( It seems that I said the two ocfs2_journal_access_di() are for different buffer head. Anyway, I have to recall the discussion before and get back to you. Thanks, Joseph On 2/10/23 6:04 PM, Heming Zhao wrote: > Hello Joseph, > > I am sorry to wake up a long time ago thread. > > All mails of this thread (my patch is [1]): > [1] https://oss.oracle.com/pipermail/ocfs2-devel/2022-May/000101.html > [2] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000105.html > [3] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000109.html > [4] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000217.html > > I re-checked ocfs2 defragmentation & jbd2 flow recently, I still think my > patch [1] is right. At least, the fixing code is correct, the patch commit > log needs to polish. > > This bug has the same root cause of commit 7f27ec978b0e ("ocfs2: call > ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()"). > For this bug, jbd2_journal_restart() is called by ocfs2_split_extent() during > defragmenting, and it's not about "not enough credits" issue you ever said in [2]. > > I explain my thinking again in this mail. > > the crash call flow: > > ocfs2_defrag_extent //caller call it in while() loop. > + handle = ocfs2_start_trans(osb, credits) > + __ocfs2_move_extent > | + ocfs2_journal_access_di //[a] > | + ocfs2_split_extent //[b] > | | + if //[b.1] > | | | ocfs2_replace_extent_rec/ocfs2_split_and_insert > | | + else > | | ocfs2_try_to_merge_extent > | | > | + ocfs2_journal_dirty //[c] > | > + ocfs2_commit_trans(osb, handle) //<== complete this handle > > In my viewpoint, ocfs2_split_extent() is journal self-service function. I still > belive the two lines ([a] & [c]) in __ocfs2_move_extent() are totally useless. > In ocfs2_split_extent(), the code from the first code line to "if-else" code > area ([b.1]) doesn't need any journal protection, and we also could see there > are only read operations. > > If we worry about data corruption after removing [a] & [c], (e.g: my eyes missed > some journal operations from [a] to [b.1]), we could only delete [c]. So the > fixed code seems (only remove line [c]): > > ocfs2_defrag_extent > + handle = ocfs2_start_trans(osb, credits) > + __ocfs2_move_extent > | + ocfs2_journal_access_di //[a] <-- keep it, but remove pair dirty action > | + ocfs2_split_extent //[b] > | + if //[b.1] > | | ocfs2_replace_extent_rec/ocfs2_split_and_insert > | + else > | ocfs2_try_to_merge_extent > | > + ocfs2_commit_trans(osb, handle) > > Thanks, > Heming _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path 2023-02-14 2:52 ` Joseph Qi via Ocfs2-devel @ 2023-02-14 4:33 ` Heming Zhao via Ocfs2-devel 2023-02-14 11:08 ` Joseph Qi via Ocfs2-devel 0 siblings, 1 reply; 12+ messages in thread From: Heming Zhao via Ocfs2-devel @ 2023-02-14 4:33 UTC (permalink / raw) To: joseph.qi, ocfs2-devel; +Cc: jack On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: > Hi, Sorry about the late reply. > This thread is indeed a long time ago:( > It seems that I said the two ocfs2_journal_access_di() are for different > buffer head. Anyway, I have to recall the discussion before and get back > to you. > If you belive from [a] to [b.1] doesn't need any journal protection, my patch makes sense from theory. From code logic, if the defrag path (ocfs2_split_extent) doesn't call jbd2_journal_restart(), current code is absolute correct. Increasing credits can't help avoid jbd2 assert crash, but make more easily to trigger crash, because it makes jbd2_journal_extend() more easily to return "status > 0". In my view, the correct fix should delete the access/dirty pair and make sure the ocfs2_split_extent() is journal self-service. Different buffer head issue could be handled by rechecking ocfs2_split_extent(), make sure all journal handle branches are correct. I already checked ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. Thanks, Heming > > On 2/10/23 6:04 PM, Heming Zhao wrote: > > Hello Joseph, > > > > I am sorry to wake up a long time ago thread. > > > > All mails of this thread (my patch is [1]): > > [1] https://oss.oracle.com/pipermail/ocfs2-devel/2022-May/000101.html > > [2] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000105.html > > [3] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000109.html > > [4] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000217.html > > > > I re-checked ocfs2 defragmentation & jbd2 flow recently, I still think my > > patch [1] is right. At least, the fixing code is correct, the patch commit > > log needs to polish. > > > > This bug has the same root cause of commit 7f27ec978b0e ("ocfs2: call > > ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()"). > > For this bug, jbd2_journal_restart() is called by ocfs2_split_extent() during > > defragmenting, and it's not about "not enough credits" issue you ever said in [2]. > > > > I explain my thinking again in this mail. > > > > the crash call flow: > > > > ocfs2_defrag_extent //caller call it in while() loop. > > + handle = ocfs2_start_trans(osb, credits) > > + __ocfs2_move_extent > > | + ocfs2_journal_access_di //[a] > > | + ocfs2_split_extent //[b] > > | | + if //[b.1] > > | | | ocfs2_replace_extent_rec/ocfs2_split_and_insert > > | | + else > > | | ocfs2_try_to_merge_extent > > | | > > | + ocfs2_journal_dirty //[c] > > | > > + ocfs2_commit_trans(osb, handle) //<== complete this handle > > > > In my viewpoint, ocfs2_split_extent() is journal self-service function. I still > > belive the two lines ([a] & [c]) in __ocfs2_move_extent() are totally useless. > > In ocfs2_split_extent(), the code from the first code line to "if-else" code > > area ([b.1]) doesn't need any journal protection, and we also could see there > > are only read operations. > > > > If we worry about data corruption after removing [a] & [c], (e.g: my eyes missed > > some journal operations from [a] to [b.1]), we could only delete [c]. So the > > fixed code seems (only remove line [c]): > > > > ocfs2_defrag_extent > > + handle = ocfs2_start_trans(osb, credits) > > + __ocfs2_move_extent > > | + ocfs2_journal_access_di //[a] <-- keep it, but remove pair dirty action > > | + ocfs2_split_extent //[b] > > | + if //[b.1] > > | | ocfs2_replace_extent_rec/ocfs2_split_and_insert > > | + else > > | ocfs2_try_to_merge_extent > > | > > + ocfs2_commit_trans(osb, handle) > > > > Thanks, > > Heming _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path 2023-02-14 4:33 ` Heming Zhao via Ocfs2-devel @ 2023-02-14 11:08 ` Joseph Qi via Ocfs2-devel 2023-02-14 11:48 ` Heming Zhao via Ocfs2-devel 0 siblings, 1 reply; 12+ messages in thread From: Joseph Qi via Ocfs2-devel @ 2023-02-14 11:08 UTC (permalink / raw) To: Heming Zhao, ocfs2-devel; +Cc: jack Hi, On 2/14/23 12:33 PM, Heming Zhao wrote: > On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: >> Hi, Sorry about the late reply. >> This thread is indeed a long time ago:( >> It seems that I said the two ocfs2_journal_access_di() are for different >> buffer head. Anyway, I have to recall the discussion before and get back >> to you. >> > If you belive from [a] to [b.1] doesn't need any journal protection, my patch > makes sense from theory. > > From code logic, if the defrag path (ocfs2_split_extent) doesn't call > jbd2_journal_restart(), current code is absolute correct. Increasing credits > can't help avoid jbd2 assert crash, but make more easily to trigger crash, > because it makes jbd2_journal_extend() more easily to return "status > 0". In my > view, the correct fix should delete the access/dirty pair and make sure the > ocfs2_split_extent() is journal self-service. > > Different buffer head issue could be handled by rechecking ocfs2_split_extent(), > make sure all journal handle branches are correct. I already checked > ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. ocfs2_split_extent are not just 'read' operations, it may grow the tree, that's why we have to do a transaction here. The whole code flow: ocfs2_ioctl_move_extents ocfs2_move_extents __ocfs2_move_extents_range ocfs2_defrag_extent [1] ocfs2_start_trans [a] __ocfs2_move_extent ocfs2_journal_access_di ocfs2_split_extent ocfs2_journal_dirty ocfs2_commit_trans ocfs2_move_extent [2] ocfs2_start_trans __ocfs2_move_extent ocfs2_journal_access_di ocfs2_split_extent ocfs2_journal_dirty ocfs2_commit_trans ocfs2_start_trans [b] ocfs2_journal_access_di ocfs2_journal_dirty ocfs2_commit_trans In above, [a] and [b] are different transaction start/commit pair, and each allocates different journal credits, even they are operate the same bh, they are still different operations. My another question is, __ocfs2_move_extent will be called by both ocfs2_defrag_extent and ocfs2_move_extent, why this doesn't happen on the other path? Thanks, Joseph _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path 2023-02-14 11:08 ` Joseph Qi via Ocfs2-devel @ 2023-02-14 11:48 ` Heming Zhao via Ocfs2-devel 2023-02-15 2:06 ` Joseph Qi via Ocfs2-devel 0 siblings, 1 reply; 12+ messages in thread From: Heming Zhao via Ocfs2-devel @ 2023-02-14 11:48 UTC (permalink / raw) To: joseph.qi, ocfs2-devel; +Cc: jack On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: > Hi, > > On 2/14/23 12:33 PM, Heming Zhao wrote: > > On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: > >> Hi, Sorry about the late reply. > >> This thread is indeed a long time ago:( > >> It seems that I said the two ocfs2_journal_access_di() are for different > >> buffer head. Anyway, I have to recall the discussion before and get back > >> to you. > >> > > If you belive from [a] to [b.1] doesn't need any journal protection, my patch > > makes sense from theory. > > > > From code logic, if the defrag path (ocfs2_split_extent) doesn't call > > jbd2_journal_restart(), current code is absolute correct. Increasing credits > > can't help avoid jbd2 assert crash, but make more easily to trigger crash, > > because it makes jbd2_journal_extend() more easily to return "status > 0". In my > > view, the correct fix should delete the access/dirty pair and make sure the > > ocfs2_split_extent() is journal self-service. > > > > Different buffer head issue could be handled by rechecking ocfs2_split_extent(), > > make sure all journal handle branches are correct. I already checked > > ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. > > ocfs2_split_extent are not just 'read' operations, it may grow the tree, > that's why we have to do a transaction here. You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() calling ocfs2_journal_access_di() (below [a]) to (before) the handling ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) __ocfs2_move_extent + ocfs2_journal_access_di //[a] <------+ + ocfs2_split_extent //[b] | [a, b.1) area is read operations | + if //[b.1] <------+ | | A | | +---- from this line, this func starts r/w IOs & needs jbd2 operations | | | | ocfs2_replace_extent_rec()/ocfs2_split_and_insert() | + else | ocfs2_try_to_merge_extent () | + ocfs2_journal_dirty //[c] > > The whole code flow: > ocfs2_ioctl_move_extents > ocfs2_move_extents > __ocfs2_move_extents_range > ocfs2_defrag_extent [1] > ocfs2_start_trans [a] > __ocfs2_move_extent > ocfs2_journal_access_di > ocfs2_split_extent > ocfs2_journal_dirty > ocfs2_commit_trans > ocfs2_move_extent [2] > ocfs2_start_trans > __ocfs2_move_extent > ocfs2_journal_access_di > ocfs2_split_extent > ocfs2_journal_dirty > ocfs2_commit_trans > ocfs2_start_trans [b] > ocfs2_journal_access_di > ocfs2_journal_dirty > ocfs2_commit_trans > > In above, [a] and [b] are different transaction start/commit pair, and > each allocates different journal credits, even they are operate the same > bh, they are still different operations. I agree above writing, but it can't change my conclusion: ocfs2_split_extent() is journal self-service. It seems the author meaning: he wanted [a] to protect defragging actions (extents changes), wanted [b] to protect inode ctime changes. the ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the access/dirty pair in __ocfs2_move_extent() is potential crash risk. > > My another question is, __ocfs2_move_extent will be called by both > ocfs2_defrag_extent and ocfs2_move_extent, why this doesn't happen on the > other path? The reason is userspace defragfs.ocfs2 never trigger ocfs2_move_extent(). see do_file_defrag(): struct ocfs2_move_extents me = { .me_start = 0, .me_len = buf->st_size, .me_flags = OCFS2_MOVE_EXT_FL_AUTO_DEFRAG, //<-- It makes do_defrag becomes 1 //in kernel space __ocfs2_move_extents_range() }; Thanks, Heming _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path 2023-02-14 11:48 ` Heming Zhao via Ocfs2-devel @ 2023-02-15 2:06 ` Joseph Qi via Ocfs2-devel 2023-02-15 6:20 ` Heming Zhao via Ocfs2-devel 0 siblings, 1 reply; 12+ messages in thread From: Joseph Qi via Ocfs2-devel @ 2023-02-15 2:06 UTC (permalink / raw) To: Heming Zhao, ocfs2-devel; +Cc: jack On 2/14/23 7:48 PM, Heming Zhao wrote: > On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: >> Hi, >> >> On 2/14/23 12:33 PM, Heming Zhao wrote: >>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: >>>> Hi, Sorry about the late reply. >>>> This thread is indeed a long time ago:( >>>> It seems that I said the two ocfs2_journal_access_di() are for different >>>> buffer head. Anyway, I have to recall the discussion before and get back >>>> to you. >>>> >>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch >>> makes sense from theory. >>> >>> From code logic, if the defrag path (ocfs2_split_extent) doesn't call >>> jbd2_journal_restart(), current code is absolute correct. Increasing credits >>> can't help avoid jbd2 assert crash, but make more easily to trigger crash, >>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my >>> view, the correct fix should delete the access/dirty pair and make sure the >>> ocfs2_split_extent() is journal self-service. >>> >>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(), >>> make sure all journal handle branches are correct. I already checked >>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. >> >> ocfs2_split_extent are not just 'read' operations, it may grow the tree, >> that's why we have to do a transaction here. > > You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() > calling ocfs2_journal_access_di() (below [a]) to (before) the handling > ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read > operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) > > __ocfs2_move_extent > + ocfs2_journal_access_di //[a] <------+ > + ocfs2_split_extent //[b] | [a, b.1) area is read operations > | + if //[b.1] <------+ > | | A > | | +---- from this line, this func starts r/w IOs & needs jbd2 operations > | | > | | ocfs2_replace_extent_rec()/ocfs2_split_and_insert() > | + else > | ocfs2_try_to_merge_extent () > | > + ocfs2_journal_dirty //[c] > >> >> The whole code flow: >> ocfs2_ioctl_move_extents >> ocfs2_move_extents >> __ocfs2_move_extents_range >> ocfs2_defrag_extent [1] >> ocfs2_start_trans [a] >> __ocfs2_move_extent >> ocfs2_journal_access_di >> ocfs2_split_extent >> ocfs2_journal_dirty >> ocfs2_commit_trans >> ocfs2_move_extent [2] >> ocfs2_start_trans >> __ocfs2_move_extent >> ocfs2_journal_access_di >> ocfs2_split_extent >> ocfs2_journal_dirty >> ocfs2_commit_trans >> ocfs2_start_trans [b] >> ocfs2_journal_access_di >> ocfs2_journal_dirty >> ocfs2_commit_trans >> >> In above, [a] and [b] are different transaction start/commit pair, and >> each allocates different journal credits, even they are operate the same >> bh, they are still different operations. > > I agree above writing, but it can't change my conclusion: ocfs2_split_extent() > is journal self-service. Don't understand what does 'journal self-service' mean. > It seems the author meaning: he wanted [a] to protect defragging actions (extents > changes), wanted [b] to protect inode ctime changes. the > ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the > access/dirty pair in __ocfs2_move_extent() is potential crash risk. > If so, what is transaction [a] protect? (no dirty buffer) Thanks, Joseph >> >> My another question is, __ocfs2_move_extent will be called by both >> ocfs2_defrag_extent and ocfs2_move_extent, why this doesn't happen on the >> other path? > > The reason is userspace defragfs.ocfs2 never trigger ocfs2_move_extent(). > see do_file_defrag(): > > struct ocfs2_move_extents me = { > .me_start = 0, > .me_len = buf->st_size, > .me_flags = OCFS2_MOVE_EXT_FL_AUTO_DEFRAG, //<-- It makes do_defrag becomes 1 > //in kernel space __ocfs2_move_extents_range() > }; > > Thanks, > Heming _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path 2023-02-15 2:06 ` Joseph Qi via Ocfs2-devel @ 2023-02-15 6:20 ` Heming Zhao via Ocfs2-devel 2023-02-15 6:42 ` Joseph Qi via Ocfs2-devel 0 siblings, 1 reply; 12+ messages in thread From: Heming Zhao via Ocfs2-devel @ 2023-02-15 6:20 UTC (permalink / raw) To: Joseph Qi, ocfs2-devel; +Cc: jack On 2/15/23 10:06 AM, Joseph Qi wrote: > > > On 2/14/23 7:48 PM, Heming Zhao wrote: >> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: >>> Hi, >>> >>> On 2/14/23 12:33 PM, Heming Zhao wrote: >>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: >>>>> Hi, Sorry about the late reply. >>>>> This thread is indeed a long time ago:( >>>>> It seems that I said the two ocfs2_journal_access_di() are for different >>>>> buffer head. Anyway, I have to recall the discussion before and get back >>>>> to you. >>>>> >>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch >>>> makes sense from theory. >>>> >>>> From code logic, if the defrag path (ocfs2_split_extent) doesn't call >>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits >>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash, >>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my >>>> view, the correct fix should delete the access/dirty pair and make sure the >>>> ocfs2_split_extent() is journal self-service. >>>> >>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(), >>>> make sure all journal handle branches are correct. I already checked >>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. >>> >>> ocfs2_split_extent are not just 'read' operations, it may grow the tree, >>> that's why we have to do a transaction here. >> >> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() >> calling ocfs2_journal_access_di() (below [a]) to (before) the handling >> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read >> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) >> >> __ocfs2_move_extent >> + ocfs2_journal_access_di //[a] <------+ >> + ocfs2_split_extent //[b] | [a, b.1) area is read operations >> | + if //[b.1] <------+ >> | | A >> | | +---- from this line, this func starts r/w IOs & needs jbd2 operations >> | | >> | | ocfs2_replace_extent_rec()/ocfs2_split_and_insert() >> | + else >> | ocfs2_try_to_merge_extent () >> | >> + ocfs2_journal_dirty //[c] >> >>> >>> The whole code flow: >>> ocfs2_ioctl_move_extents >>> ocfs2_move_extents >>> __ocfs2_move_extents_range >>> ocfs2_defrag_extent [1] >>> ocfs2_start_trans [a] >>> __ocfs2_move_extent >>> ocfs2_journal_access_di >>> ocfs2_split_extent >>> ocfs2_journal_dirty >>> ocfs2_commit_trans >>> ocfs2_move_extent [2] >>> ocfs2_start_trans >>> __ocfs2_move_extent >>> ocfs2_journal_access_di >>> ocfs2_split_extent >>> ocfs2_journal_dirty >>> ocfs2_commit_trans >>> ocfs2_start_trans [b] >>> ocfs2_journal_access_di >>> ocfs2_journal_dirty >>> ocfs2_commit_trans >>> >>> In above, [a] and [b] are different transaction start/commit pair, and >>> each allocates different journal credits, even they are operate the same >>> bh, they are still different operations. >> >> I agree above writing, but it can't change my conclusion: ocfs2_split_extent() >> is journal self-service. > > Don't understand what does 'journal self-service' mean. Sorry for my English skill, I invent this word 'journal self-service'. The meaning is: this function could handle journal operations totally by itself. Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2 journal start/stop pair. Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function, we even could add these two func in ocfs2_split_extent(). Then any caller won't take care of any journal operations when calling ocfs2_split_extent(). > >> It seems the author meaning: he wanted [a] to protect defragging actions (extents >> changes), wanted [b] to protect inode ctime changes. the >> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the >> access/dirty pair in __ocfs2_move_extent() is potential crash risk. >> > If so, what is transaction [a] protect? (no dirty buffer) Transaction [a] protects the dirty buffers in ocfs2_split_extent(). ocfs2_split_extent() has already correctly used access/dirty pair to mark/handle dirty buffers. Thanks, Heming _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path 2023-02-15 6:20 ` Heming Zhao via Ocfs2-devel @ 2023-02-15 6:42 ` Joseph Qi via Ocfs2-devel 2023-02-15 7:29 ` Heming Zhao via Ocfs2-devel 2023-02-15 10:02 ` Heming Zhao via Ocfs2-devel 0 siblings, 2 replies; 12+ messages in thread From: Joseph Qi via Ocfs2-devel @ 2023-02-15 6:42 UTC (permalink / raw) To: Heming Zhao, ocfs2-devel; +Cc: jack On 2/15/23 2:20 PM, Heming Zhao wrote: > On 2/15/23 10:06 AM, Joseph Qi wrote: >> >> >> On 2/14/23 7:48 PM, Heming Zhao wrote: >>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: >>>> Hi, >>>> >>>> On 2/14/23 12:33 PM, Heming Zhao wrote: >>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: >>>>>> Hi, Sorry about the late reply. >>>>>> This thread is indeed a long time ago:( >>>>>> It seems that I said the two ocfs2_journal_access_di() are for different >>>>>> buffer head. Anyway, I have to recall the discussion before and get back >>>>>> to you. >>>>>> >>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch >>>>> makes sense from theory. >>>>> >>>>> From code logic, if the defrag path (ocfs2_split_extent) doesn't call >>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits >>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash, >>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my >>>>> view, the correct fix should delete the access/dirty pair and make sure the >>>>> ocfs2_split_extent() is journal self-service. >>>>> >>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(), >>>>> make sure all journal handle branches are correct. I already checked >>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. >>>> >>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree, >>>> that's why we have to do a transaction here. >>> >>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() >>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling >>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read >>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) >>> >>> __ocfs2_move_extent >>> + ocfs2_journal_access_di //[a] <------+ >>> + ocfs2_split_extent //[b] | [a, b.1) area is read operations >>> | + if //[b.1] <------+ >>> | | A >>> | | +---- from this line, this func starts r/w IOs & needs jbd2 operations >>> | | >>> | | ocfs2_replace_extent_rec()/ocfs2_split_and_insert() >>> | + else >>> | ocfs2_try_to_merge_extent () >>> | >>> + ocfs2_journal_dirty //[c] >>> >>>> >>>> The whole code flow: >>>> ocfs2_ioctl_move_extents >>>> ocfs2_move_extents >>>> __ocfs2_move_extents_range >>>> ocfs2_defrag_extent [1] >>>> ocfs2_start_trans [a] >>>> __ocfs2_move_extent >>>> ocfs2_journal_access_di >>>> ocfs2_split_extent >>>> ocfs2_journal_dirty >>>> ocfs2_commit_trans >>>> ocfs2_move_extent [2] >>>> ocfs2_start_trans >>>> __ocfs2_move_extent >>>> ocfs2_journal_access_di >>>> ocfs2_split_extent >>>> ocfs2_journal_dirty >>>> ocfs2_commit_trans >>>> ocfs2_start_trans [b] >>>> ocfs2_journal_access_di >>>> ocfs2_journal_dirty >>>> ocfs2_commit_trans >>>> >>>> In above, [a] and [b] are different transaction start/commit pair, and >>>> each allocates different journal credits, even they are operate the same >>>> bh, they are still different operations. >>> >>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent() >>> is journal self-service. >> >> Don't understand what does 'journal self-service' mean. > > Sorry for my English skill, I invent this word 'journal self-service'. > The meaning is: this function could handle journal operations totally by itself. > Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2 > journal start/stop pair. > > Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function, > we even could add these two func in ocfs2_split_extent(). Then any caller won't > take care of any journal operations when calling ocfs2_split_extent(). > >> >>> It seems the author meaning: he wanted [a] to protect defragging actions (extents >>> changes), wanted [b] to protect inode ctime changes. the >>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the >>> access/dirty pair in __ocfs2_move_extent() is potential crash risk. >>> >> If so, what is transaction [a] protect? (no dirty buffer) > > Transaction [a] protects the dirty buffers in ocfs2_split_extent(). > ocfs2_split_extent() has already correctly used access/dirty pair to > mark/handle dirty buffers. > Okay, I think I've gotten your idea now. You change looks reasonable. Could you please also check if move extents without auto defrag also has the same issue? Thanks, Joseph _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path 2023-02-15 6:42 ` Joseph Qi via Ocfs2-devel @ 2023-02-15 7:29 ` Heming Zhao via Ocfs2-devel 2023-02-15 10:02 ` Heming Zhao via Ocfs2-devel 1 sibling, 0 replies; 12+ messages in thread From: Heming Zhao via Ocfs2-devel @ 2023-02-15 7:29 UTC (permalink / raw) To: Joseph Qi, ocfs2-devel; +Cc: jack On 2/15/23 2:42 PM, Joseph Qi wrote: > > > On 2/15/23 2:20 PM, Heming Zhao wrote: >> On 2/15/23 10:06 AM, Joseph Qi wrote: >>> >>> >>> On 2/14/23 7:48 PM, Heming Zhao wrote: >>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: >>>>> Hi, >>>>> >>>>> On 2/14/23 12:33 PM, Heming Zhao wrote: >>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: >>>>>>> Hi, Sorry about the late reply. >>>>>>> This thread is indeed a long time ago:( >>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different >>>>>>> buffer head. Anyway, I have to recall the discussion before and get back >>>>>>> to you. >>>>>>> >>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch >>>>>> makes sense from theory. >>>>>> >>>>>> From code logic, if the defrag path (ocfs2_split_extent) doesn't call >>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits >>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash, >>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my >>>>>> view, the correct fix should delete the access/dirty pair and make sure the >>>>>> ocfs2_split_extent() is journal self-service. >>>>>> >>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(), >>>>>> make sure all journal handle branches are correct. I already checked >>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. >>>>> >>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree, >>>>> that's why we have to do a transaction here. >>>> >>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() >>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling >>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read >>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) >>>> >>>> __ocfs2_move_extent >>>> + ocfs2_journal_access_di //[a] <------+ >>>> + ocfs2_split_extent //[b] | [a, b.1) area is read operations >>>> | + if //[b.1] <------+ >>>> | | A >>>> | | +---- from this line, this func starts r/w IOs & needs jbd2 operations >>>> | | >>>> | | ocfs2_replace_extent_rec()/ocfs2_split_and_insert() >>>> | + else >>>> | ocfs2_try_to_merge_extent () >>>> | >>>> + ocfs2_journal_dirty //[c] >>>> >>>>> >>>>> The whole code flow: >>>>> ocfs2_ioctl_move_extents >>>>> ocfs2_move_extents >>>>> __ocfs2_move_extents_range >>>>> ocfs2_defrag_extent [1] >>>>> ocfs2_start_trans [a] >>>>> __ocfs2_move_extent >>>>> ocfs2_journal_access_di >>>>> ocfs2_split_extent >>>>> ocfs2_journal_dirty >>>>> ocfs2_commit_trans >>>>> ocfs2_move_extent [2] >>>>> ocfs2_start_trans >>>>> __ocfs2_move_extent >>>>> ocfs2_journal_access_di >>>>> ocfs2_split_extent >>>>> ocfs2_journal_dirty >>>>> ocfs2_commit_trans >>>>> ocfs2_start_trans [b] >>>>> ocfs2_journal_access_di >>>>> ocfs2_journal_dirty >>>>> ocfs2_commit_trans >>>>> >>>>> In above, [a] and [b] are different transaction start/commit pair, and >>>>> each allocates different journal credits, even they are operate the same >>>>> bh, they are still different operations. >>>> >>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent() >>>> is journal self-service. >>> >>> Don't understand what does 'journal self-service' mean. >> >> Sorry for my English skill, I invent this word 'journal self-service'. >> The meaning is: this function could handle journal operations totally by itself. >> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2 >> journal start/stop pair. >> >> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function, >> we even could add these two func in ocfs2_split_extent(). Then any caller won't >> take care of any journal operations when calling ocfs2_split_extent(). >> >>> >>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents >>>> changes), wanted [b] to protect inode ctime changes. the >>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the >>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk. >>>> >>> If so, what is transaction [a] protect? (no dirty buffer) >> >> Transaction [a] protects the dirty buffers in ocfs2_split_extent(). >> ocfs2_split_extent() has already correctly used access/dirty pair to >> mark/handle dirty buffers. >> > Okay, I think I've gotten your idea now. You change looks reasonable. > Could you please also check if move extents without auto defrag also has > the same issue? OK, wait me for checking. Thanks, Heming > > Thanks, > Joseph > _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path 2023-02-15 6:42 ` Joseph Qi via Ocfs2-devel 2023-02-15 7:29 ` Heming Zhao via Ocfs2-devel @ 2023-02-15 10:02 ` Heming Zhao via Ocfs2-devel 2023-02-15 12:04 ` Joseph Qi via Ocfs2-devel 1 sibling, 1 reply; 12+ messages in thread From: Heming Zhao via Ocfs2-devel @ 2023-02-15 10:02 UTC (permalink / raw) To: Joseph Qi, ocfs2-devel; +Cc: jack On 2/15/23 2:42 PM, Joseph Qi wrote: > > > On 2/15/23 2:20 PM, Heming Zhao wrote: >> On 2/15/23 10:06 AM, Joseph Qi wrote: >>> >>> >>> On 2/14/23 7:48 PM, Heming Zhao wrote: >>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: >>>>> Hi, >>>>> >>>>> On 2/14/23 12:33 PM, Heming Zhao wrote: >>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: >>>>>>> Hi, Sorry about the late reply. >>>>>>> This thread is indeed a long time ago:( >>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different >>>>>>> buffer head. Anyway, I have to recall the discussion before and get back >>>>>>> to you. >>>>>>> >>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch >>>>>> makes sense from theory. >>>>>> >>>>>> From code logic, if the defrag path (ocfs2_split_extent) doesn't call >>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits >>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash, >>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my >>>>>> view, the correct fix should delete the access/dirty pair and make sure the >>>>>> ocfs2_split_extent() is journal self-service. >>>>>> >>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(), >>>>>> make sure all journal handle branches are correct. I already checked >>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. >>>>> >>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree, >>>>> that's why we have to do a transaction here. >>>> >>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() >>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling >>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read >>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) >>>> >>>> __ocfs2_move_extent >>>> + ocfs2_journal_access_di //[a] <------+ >>>> + ocfs2_split_extent //[b] | [a, b.1) area is read operations >>>> | + if //[b.1] <------+ >>>> | | A >>>> | | +---- from this line, this func starts r/w IOs & needs jbd2 operations >>>> | | >>>> | | ocfs2_replace_extent_rec()/ocfs2_split_and_insert() >>>> | + else >>>> | ocfs2_try_to_merge_extent () >>>> | >>>> + ocfs2_journal_dirty //[c] >>>> >>>>> >>>>> The whole code flow: >>>>> ocfs2_ioctl_move_extents >>>>> ocfs2_move_extents >>>>> __ocfs2_move_extents_range >>>>> ocfs2_defrag_extent [1] >>>>> ocfs2_start_trans [a] >>>>> __ocfs2_move_extent >>>>> ocfs2_journal_access_di >>>>> ocfs2_split_extent >>>>> ocfs2_journal_dirty >>>>> ocfs2_commit_trans >>>>> ocfs2_move_extent [2] >>>>> ocfs2_start_trans >>>>> __ocfs2_move_extent >>>>> ocfs2_journal_access_di >>>>> ocfs2_split_extent >>>>> ocfs2_journal_dirty >>>>> ocfs2_commit_trans >>>>> ocfs2_start_trans [b] >>>>> ocfs2_journal_access_di >>>>> ocfs2_journal_dirty >>>>> ocfs2_commit_trans >>>>> >>>>> In above, [a] and [b] are different transaction start/commit pair, and >>>>> each allocates different journal credits, even they are operate the same >>>>> bh, they are still different operations. >>>> >>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent() >>>> is journal self-service. >>> >>> Don't understand what does 'journal self-service' mean. >> >> Sorry for my English skill, I invent this word 'journal self-service'. >> The meaning is: this function could handle journal operations totally by itself. >> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2 >> journal start/stop pair. >> >> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function, >> we even could add these two func in ocfs2_split_extent(). Then any caller won't >> take care of any journal operations when calling ocfs2_split_extent(). >> >>> >>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents >>>> changes), wanted [b] to protect inode ctime changes. the >>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the >>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk. >>>> >>> If so, what is transaction [a] protect? (no dirty buffer) >> >> Transaction [a] protects the dirty buffers in ocfs2_split_extent(). >> ocfs2_split_extent() has already correctly used access/dirty pair to >> mark/handle dirty buffers. >> > Okay, I think I've gotten your idea now. You change looks reasonable. > Could you please also check if move extents without auto defrag also has > the same issue? > I have finished code checking. ocfs2_move_extent() path has the same issue. My patch also works fine from this code branch. In theory, ocfs2_move_extent() has less opportunity to trigger crash, because userspace defrag.ocfs2 very likely set less than file size on 'struct ocfs2_move_extents : me_len', if defrag.ocfs2 would support this code branch. If you think my patch is acceptable, I will file v2 patch & plan only to change commit log in v2. Thanks, Heming _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path 2023-02-15 10:02 ` Heming Zhao via Ocfs2-devel @ 2023-02-15 12:04 ` Joseph Qi via Ocfs2-devel 2023-02-16 14:58 ` Heming Zhao via Ocfs2-devel 0 siblings, 1 reply; 12+ messages in thread From: Joseph Qi via Ocfs2-devel @ 2023-02-15 12:04 UTC (permalink / raw) To: Heming Zhao, ocfs2-devel; +Cc: jack On 2/15/23 6:02 PM, Heming Zhao wrote: > On 2/15/23 2:42 PM, Joseph Qi wrote: >> >> >> On 2/15/23 2:20 PM, Heming Zhao wrote: >>> On 2/15/23 10:06 AM, Joseph Qi wrote: >>>> >>>> >>>> On 2/14/23 7:48 PM, Heming Zhao wrote: >>>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: >>>>>> Hi, >>>>>> >>>>>> On 2/14/23 12:33 PM, Heming Zhao wrote: >>>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: >>>>>>>> Hi, Sorry about the late reply. >>>>>>>> This thread is indeed a long time ago:( >>>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different >>>>>>>> buffer head. Anyway, I have to recall the discussion before and get back >>>>>>>> to you. >>>>>>>> >>>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch >>>>>>> makes sense from theory. >>>>>>> >>>>>>> From code logic, if the defrag path (ocfs2_split_extent) doesn't call >>>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits >>>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash, >>>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my >>>>>>> view, the correct fix should delete the access/dirty pair and make sure the >>>>>>> ocfs2_split_extent() is journal self-service. >>>>>>> >>>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(), >>>>>>> make sure all journal handle branches are correct. I already checked >>>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. >>>>>> >>>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree, >>>>>> that's why we have to do a transaction here. >>>>> >>>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() >>>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling >>>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read >>>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) >>>>> >>>>> __ocfs2_move_extent >>>>> + ocfs2_journal_access_di //[a] <------+ >>>>> + ocfs2_split_extent //[b] | [a, b.1) area is read operations >>>>> | + if //[b.1] <------+ >>>>> | | A >>>>> | | +---- from this line, this func starts r/w IOs & needs jbd2 operations >>>>> | | >>>>> | | ocfs2_replace_extent_rec()/ocfs2_split_and_insert() >>>>> | + else >>>>> | ocfs2_try_to_merge_extent () >>>>> | >>>>> + ocfs2_journal_dirty //[c] >>>>> >>>>>> >>>>>> The whole code flow: >>>>>> ocfs2_ioctl_move_extents >>>>>> ocfs2_move_extents >>>>>> __ocfs2_move_extents_range >>>>>> ocfs2_defrag_extent [1] >>>>>> ocfs2_start_trans [a] >>>>>> __ocfs2_move_extent >>>>>> ocfs2_journal_access_di >>>>>> ocfs2_split_extent >>>>>> ocfs2_journal_dirty >>>>>> ocfs2_commit_trans >>>>>> ocfs2_move_extent [2] >>>>>> ocfs2_start_trans >>>>>> __ocfs2_move_extent >>>>>> ocfs2_journal_access_di >>>>>> ocfs2_split_extent >>>>>> ocfs2_journal_dirty >>>>>> ocfs2_commit_trans >>>>>> ocfs2_start_trans [b] >>>>>> ocfs2_journal_access_di >>>>>> ocfs2_journal_dirty >>>>>> ocfs2_commit_trans >>>>>> >>>>>> In above, [a] and [b] are different transaction start/commit pair, and >>>>>> each allocates different journal credits, even they are operate the same >>>>>> bh, they are still different operations. >>>>> >>>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent() >>>>> is journal self-service. >>>> >>>> Don't understand what does 'journal self-service' mean. >>> >>> Sorry for my English skill, I invent this word 'journal self-service'. >>> The meaning is: this function could handle journal operations totally by itself. >>> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2 >>> journal start/stop pair. >>> >>> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function, >>> we even could add these two func in ocfs2_split_extent(). Then any caller won't >>> take care of any journal operations when calling ocfs2_split_extent(). >>> >>>> >>>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents >>>>> changes), wanted [b] to protect inode ctime changes. the >>>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the >>>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk. >>>>> >>>> If so, what is transaction [a] protect? (no dirty buffer) >>> >>> Transaction [a] protects the dirty buffers in ocfs2_split_extent(). >>> ocfs2_split_extent() has already correctly used access/dirty pair to >>> mark/handle dirty buffers. >>> >> Okay, I think I've gotten your idea now. You change looks reasonable. >> Could you please also check if move extents without auto defrag also has >> the same issue? >> > > I have finished code checking. ocfs2_move_extent() path has the same issue. > My patch also works fine from this code branch. > In theory, ocfs2_move_extent() has less opportunity to trigger crash, because > userspace defrag.ocfs2 very likely set less than file size on > 'struct ocfs2_move_extents : me_len', if defrag.ocfs2 would support this > code branch. > If we write a simple program to do move extents ioctl on the same ocfs2 (without auto defrag), can we trigger this crash? > If you think my patch is acceptable, I will file v2 patch & plan only to change > commit log in v2. > Sure, with above confirmed. _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path 2023-02-15 12:04 ` Joseph Qi via Ocfs2-devel @ 2023-02-16 14:58 ` Heming Zhao via Ocfs2-devel 0 siblings, 0 replies; 12+ messages in thread From: Heming Zhao via Ocfs2-devel @ 2023-02-16 14:58 UTC (permalink / raw) To: Joseph Qi, ocfs2-devel; +Cc: jack On Wed, Feb 15, 2023 at 08:04:48PM +0800, Joseph Qi wrote: > > > On 2/15/23 6:02 PM, Heming Zhao wrote: > > On 2/15/23 2:42 PM, Joseph Qi wrote: > >> > >> > >> On 2/15/23 2:20 PM, Heming Zhao wrote: > >>> On 2/15/23 10:06 AM, Joseph Qi wrote: > >>>> > >>>> > >>>> On 2/14/23 7:48 PM, Heming Zhao wrote: > >>>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote: > >>>>>> Hi, > >>>>>> > >>>>>> On 2/14/23 12:33 PM, Heming Zhao wrote: > >>>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote: > >>>>>>>> Hi, Sorry about the late reply. > >>>>>>>> This thread is indeed a long time ago:( > >>>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different > >>>>>>>> buffer head. Anyway, I have to recall the discussion before and get back > >>>>>>>> to you. > >>>>>>>> > >>>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch > >>>>>>> makes sense from theory. > >>>>>>> > >>>>>>> From code logic, if the defrag path (ocfs2_split_extent) doesn't call > >>>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits > >>>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash, > >>>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my > >>>>>>> view, the correct fix should delete the access/dirty pair and make sure the > >>>>>>> ocfs2_split_extent() is journal self-service. > >>>>>>> > >>>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(), > >>>>>>> make sure all journal handle branches are correct. I already checked > >>>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner. > >>>>>> > >>>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree, > >>>>>> that's why we have to do a transaction here. > >>>>> > >>>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent() > >>>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling > >>>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read > >>>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1]) > >>>>> > >>>>> __ocfs2_move_extent > >>>>> + ocfs2_journal_access_di //[a] <------+ > >>>>> + ocfs2_split_extent //[b] | [a, b.1) area is read operations > >>>>> | + if //[b.1] <------+ > >>>>> | | A > >>>>> | | +---- from this line, this func starts r/w IOs & needs jbd2 operations > >>>>> | | > >>>>> | | ocfs2_replace_extent_rec()/ocfs2_split_and_insert() > >>>>> | + else > >>>>> | ocfs2_try_to_merge_extent () > >>>>> | > >>>>> + ocfs2_journal_dirty //[c] > >>>>> > >>>>>> > >>>>>> The whole code flow: > >>>>>> ocfs2_ioctl_move_extents > >>>>>> ocfs2_move_extents > >>>>>> __ocfs2_move_extents_range > >>>>>> ocfs2_defrag_extent [1] > >>>>>> ocfs2_start_trans [a] > >>>>>> __ocfs2_move_extent > >>>>>> ocfs2_journal_access_di > >>>>>> ocfs2_split_extent > >>>>>> ocfs2_journal_dirty > >>>>>> ocfs2_commit_trans > >>>>>> ocfs2_move_extent [2] > >>>>>> ocfs2_start_trans > >>>>>> __ocfs2_move_extent > >>>>>> ocfs2_journal_access_di > >>>>>> ocfs2_split_extent > >>>>>> ocfs2_journal_dirty > >>>>>> ocfs2_commit_trans > >>>>>> ocfs2_start_trans [b] > >>>>>> ocfs2_journal_access_di > >>>>>> ocfs2_journal_dirty > >>>>>> ocfs2_commit_trans > >>>>>> > >>>>>> In above, [a] and [b] are different transaction start/commit pair, and > >>>>>> each allocates different journal credits, even they are operate the same > >>>>>> bh, they are still different operations. > >>>>> > >>>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent() > >>>>> is journal self-service. > >>>> > >>>> Don't understand what does 'journal self-service' mean. > >>> > >>> Sorry for my English skill, I invent this word 'journal self-service'. > >>> The meaning is: this function could handle journal operations totally by itself. > >>> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2 > >>> journal start/stop pair. > >>> > >>> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function, > >>> we even could add these two func in ocfs2_split_extent(). Then any caller won't > >>> take care of any journal operations when calling ocfs2_split_extent(). > >>> > >>>> > >>>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents > >>>>> changes), wanted [b] to protect inode ctime changes. the > >>>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the > >>>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk. > >>>>> > >>>> If so, what is transaction [a] protect? (no dirty buffer) > >>> > >>> Transaction [a] protects the dirty buffers in ocfs2_split_extent(). > >>> ocfs2_split_extent() has already correctly used access/dirty pair to > >>> mark/handle dirty buffers. > >>> > >> Okay, I think I've gotten your idea now. You change looks reasonable. > >> Could you please also check if move extents without auto defrag also has > >> the same issue? > >> > > > > I have finished code checking. ocfs2_move_extent() path has the same issue. > > My patch also works fine from this code branch. > > In theory, ocfs2_move_extent() has less opportunity to trigger crash, because > > userspace defrag.ocfs2 very likely set less than file size on > > 'struct ocfs2_move_extents : me_len', if defrag.ocfs2 would support this > > code branch. > > > > If we write a simple program to do move extents ioctl on the same ocfs2 > (without auto defrag), can we trigger this crash? For this question, first, I modified defrag.ocfs2 to test non-auto defrag flow, and found three bugs. I will describe it at the end of this mail. Second, I failed to trigger the crash with non-auto defrag flow. In theory, the non-auto & auto share the same defrag path from __ocfs2_move_extent(). Before I send mail to start this dicussion, I had tested some days, and failed to trigger the crash. This crash, happened last year, one of SUSE customer triggered it. But for confidence, I can't talk too much about it. Under theory analysis, the crash derives from ocfs2_extend_trans() calling jbd2_journal_restart() and getting return value "status > 0". In jbd2_journal_extend(), there are only two cases (A & B) for return "value > 0". ```<A> /* Don't extend a locked-down transaction! */ if (transaction->t_state != T_RUNNING) { jbd2_debug(3, "denied handle %p %d blocks: " "transaction not running\n", handle, nblocks); goto error_out; } ``` ```<B> if (wanted > journal->j_max_transaction_buffers) { jbd2_debug(3, "denied handle %p %d blocks: " "transaction too large\n", handle, nblocks); atomic_sub(nblocks, &transaction->t_outstanding_credits); goto error_out; } ``` For case <A>, in my view, if the transaction state is not T_RUNNING, which should be committing or committed by jbd2. So I did: - generated depth 3 B+tree - modified defragfs.ocfs2 with/without OCFS2_MOVE_EXT_FL_PART_DEFRAG. - mount options using commit_interval=[1,3] - enable some mlog(0) to output log for slowing down ocfs2 layer defrag speed. For case <B>, I did - generated depth 3 B+tree - modified defragfs.ocfs2 with/without OCFS2_MOVE_EXT_FL_PART_DEFRAG. - mount options using: resv_level=[0,2] commit_interval=[1,3,5,30] - running defragfs, meanwhile, running dd to issue write IOs. Both two cases with lots of times test failed to trigger crash. I only met ocfs2 ininite loop issue: ocfs2 kept busy all the time, the defrag job is running all the time. At last, I describe the three bugs for non-auto defrag flow. The fixing code as below: ``` diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c index 192cad0662d8..19693894aeec 100644 --- a/fs/ocfs2/move_extents.c +++ b/fs/ocfs2/move_extents.c @@ -444,7 +444,7 @@ static int ocfs2_find_victim_alloc_group(struct inode *inode, bg = (struct ocfs2_group_desc *)gd_bh->b_data; if (vict_blkno < (le64_to_cpu(bg->bg_blkno) + - le16_to_cpu(bg->bg_bits))) { + (le16_to_cpu(bg->bg_bits) << bits_per_unit))) { *ret_bh = gd_bh; *vict_bit = (vict_blkno - blkno) >> @@ -559,6 +559,7 @@ static void ocfs2_probe_alloc_group(struct inode *inode, struct buffer_head *bh, last_free_bits++; if (last_free_bits == move_len) { + i -= move_len; *goal_bit = i; *phys_cpos = base_cpos + i; break; @@ -1030,18 +1031,19 @@ int ocfs2_ioctl_move_extents(struct file *filp, void __user *argp) context->range = ⦥ + /* + * ok, the default theshold for the defragmentation + * is 1M, since our maximum clustersize was 1M also. + * any thought? + */ + if (!range.me_threshold) + range.me_threshold = 1024 * 1024; + + if (range.me_threshold > i_size_read(inode)) + range.me_threshold = i_size_read(inode); + if (range.me_flags & OCFS2_MOVE_EXT_FL_AUTO_DEFRAG) { context->auto_defrag = 1; - /* - * ok, the default theshold for the defragmentation - * is 1M, since our maximum clustersize was 1M also. - * any thought? - */ - if (!range.me_threshold) - range.me_threshold = 1024 * 1024; - - if (range.me_threshold > i_size_read(inode)) - range.me_threshold = i_size_read(inode); if (range.me_flags & OCFS2_MOVE_EXT_FL_PART_DEFRAG) context->partial = 1; ``` I will file a new patch for above fix. Thanks, Heming _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-02-16 14:58 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-02-10 10:04 [Ocfs2-devel] discuss about jbd2 assertion in defragment path Heming Zhao via Ocfs2-devel 2023-02-14 2:52 ` Joseph Qi via Ocfs2-devel 2023-02-14 4:33 ` Heming Zhao via Ocfs2-devel 2023-02-14 11:08 ` Joseph Qi via Ocfs2-devel 2023-02-14 11:48 ` Heming Zhao via Ocfs2-devel 2023-02-15 2:06 ` Joseph Qi via Ocfs2-devel 2023-02-15 6:20 ` Heming Zhao via Ocfs2-devel 2023-02-15 6:42 ` Joseph Qi via Ocfs2-devel 2023-02-15 7:29 ` Heming Zhao via Ocfs2-devel 2023-02-15 10:02 ` Heming Zhao via Ocfs2-devel 2023-02-15 12:04 ` Joseph Qi via Ocfs2-devel 2023-02-16 14:58 ` Heming Zhao via Ocfs2-devel
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.