* [PATCH 0/2] Revert then resubmit ocfs2 commit dfe6c5692fb5
@ 2024-12-04 3:32 Heming Zhao
2024-12-04 3:32 ` [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" Heming Zhao
2024-12-04 3:32 ` [PATCH 2/2] ocfs2: fix the la space leak when unmounting an ocfs2 volume Heming Zhao
0 siblings, 2 replies; 11+ messages in thread
From: Heming Zhao @ 2024-12-04 3:32 UTC (permalink / raw)
To: joseph.qi, ocfs2-devel; +Cc: Heming Zhao, linux-kernel, akpm, gregkh
SUSE QA team detected a mistake in my commit dfe6c5692fb5 ("ocfs2: fix
the la space leak when unmounting an ocfs2 volume"). I am very sorry for
my error. (If my eyes are correct) From the mailling list mails, this
patch shouldn't be applied to 4.19 5.4 5.10 5.15 6.1 6.6, and these
branches should perform a revert operation.
Reason for revert:
In commit dfe6c5692fb5, I mistakenly wrote: "This bug has existed since
the initial OCFS2 code.". The statement is wrong. The correct
introduction commit is 30dd3478c3cd. IOW, if the branch doesn't include
30dd3478c3cd, dfe6c5692fb5 should also not be included.
I am not sure whether the revert and resubmit patch operations are
correct or not. Please guide me if the approach is incorrect.
Heming Zhao (2):
ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2
volume"
ocfs2: fix the la space leak when unmounting an ocfs2 volume
--
2.43.0
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 3:32 [PATCH 0/2] Revert then resubmit ocfs2 commit dfe6c5692fb5 Heming Zhao @ 2024-12-04 3:32 ` Heming Zhao 2024-12-04 3:47 ` Joseph Qi 2024-12-04 3:32 ` [PATCH 2/2] ocfs2: fix the la space leak when unmounting an ocfs2 volume Heming Zhao 1 sibling, 1 reply; 11+ messages in thread From: Heming Zhao @ 2024-12-04 3:32 UTC (permalink / raw) To: joseph.qi, ocfs2-devel Cc: Heming Zhao, linux-kernel, akpm, gregkh, Heming Zhao, stable This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when unmounting an ocfs2 volume"). In commit dfe6c5692fb5, the commit log stating "This bug has existed since the initial OCFS2 code." is incorrect. The correct introduction commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). Fixes: dfe6c5692fb5 ("ocfs2: fix the la space leak when unmounting an ocfs2 volume") Signed-off-by: Heming Zhao <heing.zhao@suse.com> Cc: <stable@vger.kernel.org> --- fs/ocfs2/localalloc.c | 19 ------------------- 1 file changed, 19 deletions(-) diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c index 8ac42ea81a17..5df34561c551 100644 --- a/fs/ocfs2/localalloc.c +++ b/fs/ocfs2/localalloc.c @@ -1002,25 +1002,6 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, start = bit_off + 1; } - /* clear the contiguous bits until the end boundary */ - if (count) { - blkno = la_start_blk + - ocfs2_clusters_to_blocks(osb->sb, - start - count); - - trace_ocfs2_sync_local_to_main_free( - count, start - count, - (unsigned long long)la_start_blk, - (unsigned long long)blkno); - - status = ocfs2_release_clusters(handle, - main_bm_inode, - main_bm_bh, blkno, - count); - if (status < 0) - mlog_errno(status); - } - bail: if (status) mlog_errno(status); -- 2.43.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 3:32 ` [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" Heming Zhao @ 2024-12-04 3:47 ` Joseph Qi 2024-12-04 6:46 ` Heming Zhao 0 siblings, 1 reply; 11+ messages in thread From: Joseph Qi @ 2024-12-04 3:47 UTC (permalink / raw) To: Heming Zhao, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, Heming Zhao, stable On 12/4/24 11:32 AM, Heming Zhao wrote: > This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when > unmounting an ocfs2 volume"). > > In commit dfe6c5692fb5, the commit log stating "This bug has existed > since the initial OCFS2 code." is incorrect. The correct introduction > commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). > Could you please elaborate more how it happens? And it seems no difference with the new version. So we may submit a standalone revert patch to those backported stable kernels (< 6.10). Thanks, Joseph > Fixes: dfe6c5692fb5 ("ocfs2: fix the la space leak when unmounting an ocfs2 volume") > Signed-off-by: Heming Zhao <heing.zhao@suse.com> > Cc: <stable@vger.kernel.org> > --- > fs/ocfs2/localalloc.c | 19 ------------------- > 1 file changed, 19 deletions(-) > > diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c > index 8ac42ea81a17..5df34561c551 100644 > --- a/fs/ocfs2/localalloc.c > +++ b/fs/ocfs2/localalloc.c > @@ -1002,25 +1002,6 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, > start = bit_off + 1; > } > > - /* clear the contiguous bits until the end boundary */ > - if (count) { > - blkno = la_start_blk + > - ocfs2_clusters_to_blocks(osb->sb, > - start - count); > - > - trace_ocfs2_sync_local_to_main_free( > - count, start - count, > - (unsigned long long)la_start_blk, > - (unsigned long long)blkno); > - > - status = ocfs2_release_clusters(handle, > - main_bm_inode, > - main_bm_bh, blkno, > - count); > - if (status < 0) > - mlog_errno(status); > - } > - > bail: > if (status) > mlog_errno(status); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 3:47 ` Joseph Qi @ 2024-12-04 6:46 ` Heming Zhao 2024-12-04 9:28 ` Joseph Qi 2024-12-12 8:18 ` Greg KH 0 siblings, 2 replies; 11+ messages in thread From: Heming Zhao @ 2024-12-04 6:46 UTC (permalink / raw) To: Joseph Qi, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, Heming Zhao, stable On 12/4/24 11:47, Joseph Qi wrote: > > > On 12/4/24 11:32 AM, Heming Zhao wrote: >> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when >> unmounting an ocfs2 volume"). >> >> In commit dfe6c5692fb5, the commit log stating "This bug has existed >> since the initial OCFS2 code." is incorrect. The correct introduction >> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). >> > > Could you please elaborate more how it happens? > And it seems no difference with the new version. So we may submit a > standalone revert patch to those backported stable kernels (< 6.10). commit log from patch [2/2] should be revised. change: This bug has existed since the initial OCFS2 code. to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") ---- See below for the details of patch [1/2]. following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e". static int ocfs2_sync_local_to_main() { ... ... 1 while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) 2 != -1) { 3 if ((bit_off < left) && (bit_off == start)) { 4 count++; 5 start++; 6 continue; 7 } 8 if (count) { 9 blkno = la_start_blk + 10 ocfs2_clusters_to_blocks(osb->sb, 11 start - count); 12 13 trace_ocfs2_sync_local_to_main_free(); 14 15 status = ocfs2_release_clusters(handle, 16 main_bm_inode, 17 main_bm_bh, blkno, 18 count); 19 if (status < 0) { 20 mlog_errno(status); 21 goto bail; 22 } 23 } 24 if (bit_off >= left) 25 break; 26 count = 1; 27 start = bit_off + 1; 28 } 29 30 /* clear the contiguous bits until the end boundary */ 31 if (count) { 32 blkno = la_start_blk + 33 ocfs2_clusters_to_blocks(osb->sb, 34 start - count); 35 36 trace_ocfs2_sync_local_to_main_free(); 37 38 status = ocfs2_release_clusters(handle, 39 main_bm_inode, 40 main_bm_bh, blkno, 41 count); 42 if (status < 0) 43 mlog_errno(status); 44 } ... ... } bug flow: 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap. 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3. 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm. 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999. 5. code runs to line 31, this area code releases space to main_bm for the same gd again. kernel will report the following likely error: OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878 thanks, Heming ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 6:46 ` Heming Zhao @ 2024-12-04 9:28 ` Joseph Qi 2024-12-04 11:11 ` Heming Zhao 2024-12-04 11:34 ` Heming Zhao 2024-12-12 8:18 ` Greg KH 1 sibling, 2 replies; 11+ messages in thread From: Joseph Qi @ 2024-12-04 9:28 UTC (permalink / raw) To: Heming Zhao, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, stable On 12/4/24 2:46 PM, Heming Zhao wrote: > On 12/4/24 11:47, Joseph Qi wrote: >> >> >> On 12/4/24 11:32 AM, Heming Zhao wrote: >>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when >>> unmounting an ocfs2 volume"). >>> >>> In commit dfe6c5692fb5, the commit log stating "This bug has existed >>> since the initial OCFS2 code." is incorrect. The correct introduction >>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). >>> >> >> Could you please elaborate more how it happens? >> And it seems no difference with the new version. So we may submit a >> standalone revert patch to those backported stable kernels (< 6.10). > > commit log from patch [2/2] should be revised. > change: This bug has existed since the initial OCFS2 code. > to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") > > ---- > See below for the details of patch [1/2]. > > following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e". > > static int ocfs2_sync_local_to_main() > { > ... ... > 1 while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) > 2 != -1) { > 3 if ((bit_off < left) && (bit_off == start)) { > 4 count++; > 5 start++; > 6 continue; > 7 } > 8 if (count) { > 9 blkno = la_start_blk + > 10 ocfs2_clusters_to_blocks(osb->sb, > 11 start - count); > 12 > 13 trace_ocfs2_sync_local_to_main_free(); > 14 > 15 status = ocfs2_release_clusters(handle, > 16 main_bm_inode, > 17 main_bm_bh, blkno, > 18 count); > 19 if (status < 0) { > 20 mlog_errno(status); > 21 goto bail; > 22 } > 23 } > 24 if (bit_off >= left) > 25 break; > 26 count = 1; > 27 start = bit_off + 1; > 28 } > 29 > 30 /* clear the contiguous bits until the end boundary */ > 31 if (count) { > 32 blkno = la_start_blk + > 33 ocfs2_clusters_to_blocks(osb->sb, > 34 start - count); > 35 > 36 trace_ocfs2_sync_local_to_main_free(); > 37 > 38 status = ocfs2_release_clusters(handle, > 39 main_bm_inode, > 40 main_bm_bh, blkno, > 41 count); > 42 if (status < 0) > 43 mlog_errno(status); > 44 } > ... ... > } > > bug flow: > 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap. > 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3. > 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm. > 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999. > 5. code runs to line 31, this area code releases space to main_bm for the same gd again. > > kernel will report the following likely error: > OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878 > Okay, IIUC, it seems we have to: 1. revert commit dfe6c5692fb5 (so does stable kernel). 2. fix 30dd3478c3cd in following way: diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c index 5df34561c551..f0feadac2ef1 100644 --- a/fs/ocfs2/localalloc.c +++ b/fs/ocfs2/localalloc.c @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, start = count = 0; left = le32_to_cpu(alloc->id1.bitmap1.i_total); - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) < + while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <= left) { - if (bit_off == start) { + if ((bit_off < left) && (bit_off == start)) { count++; start++; continue; @@ -997,7 +997,8 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, goto bail; } } - + if (bit_off >= left) + break; count = 1; start = bit_off + 1; } Thanks, Joseph ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 9:28 ` Joseph Qi @ 2024-12-04 11:11 ` Heming Zhao 2024-12-04 11:34 ` Heming Zhao 1 sibling, 0 replies; 11+ messages in thread From: Heming Zhao @ 2024-12-04 11:11 UTC (permalink / raw) To: Joseph Qi, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, stable On 12/4/24 17:28, Joseph Qi wrote: > > > On 12/4/24 2:46 PM, Heming Zhao wrote: >> On 12/4/24 11:47, Joseph Qi wrote: >>> >>> >>> On 12/4/24 11:32 AM, Heming Zhao wrote: >>>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when >>>> unmounting an ocfs2 volume"). >>>> >>>> In commit dfe6c5692fb5, the commit log stating "This bug has existed >>>> since the initial OCFS2 code." is incorrect. The correct introduction >>>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). >>>> >>> >>> Could you please elaborate more how it happens? >>> And it seems no difference with the new version. So we may submit a >>> standalone revert patch to those backported stable kernels (< 6.10). >> >> commit log from patch [2/2] should be revised. >> change: This bug has existed since the initial OCFS2 code. >> to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") >> >> ---- >> See below for the details of patch [1/2]. >> >> following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e". >> >> static int ocfs2_sync_local_to_main() >> { >> ... ... >> 1 while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) >> 2 != -1) { >> 3 if ((bit_off < left) && (bit_off == start)) { >> 4 count++; >> 5 start++; >> 6 continue; >> 7 } >> 8 if (count) { >> 9 blkno = la_start_blk + >> 10 ocfs2_clusters_to_blocks(osb->sb, >> 11 start - count); >> 12 >> 13 trace_ocfs2_sync_local_to_main_free(); >> 14 >> 15 status = ocfs2_release_clusters(handle, >> 16 main_bm_inode, >> 17 main_bm_bh, blkno, >> 18 count); >> 19 if (status < 0) { >> 20 mlog_errno(status); >> 21 goto bail; >> 22 } >> 23 } >> 24 if (bit_off >= left) >> 25 break; >> 26 count = 1; >> 27 start = bit_off + 1; >> 28 } >> 29 >> 30 /* clear the contiguous bits until the end boundary */ >> 31 if (count) { >> 32 blkno = la_start_blk + >> 33 ocfs2_clusters_to_blocks(osb->sb, >> 34 start - count); >> 35 >> 36 trace_ocfs2_sync_local_to_main_free(); >> 37 >> 38 status = ocfs2_release_clusters(handle, >> 39 main_bm_inode, >> 40 main_bm_bh, blkno, >> 41 count); >> 42 if (status < 0) >> 43 mlog_errno(status); >> 44 } >> ... ... >> } >> >> bug flow: >> 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap. >> 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3. >> 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm. >> 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999. >> 5. code runs to line 31, this area code releases space to main_bm for the same gd again. >> >> kernel will report the following likely error: >> OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878 >> > > Okay, IIUC, it seems we have to: > 1. revert commit dfe6c5692fb5 (so does stable kernel). OK. > 2. fix 30dd3478c3cd in following way: It looks good to me. I will send v2 patch. -Heming > > diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c > index 5df34561c551..f0feadac2ef1 100644 > --- a/fs/ocfs2/localalloc.c > +++ b/fs/ocfs2/localalloc.c > @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, > start = count = 0; > left = le32_to_cpu(alloc->id1.bitmap1.i_total); > > - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) < > + while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <= > left) { > - if (bit_off == start) { > + if ((bit_off < left) && (bit_off == start)) { > count++; > start++; > continue; > @@ -997,7 +997,8 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, > goto bail; > } > } > - > + if (bit_off >= left) > + break; > count = 1; > start = bit_off + 1; > } > > Thanks, > Joseph > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 9:28 ` Joseph Qi 2024-12-04 11:11 ` Heming Zhao @ 2024-12-04 11:34 ` Heming Zhao 2024-12-04 12:09 ` Joseph Qi 1 sibling, 1 reply; 11+ messages in thread From: Heming Zhao @ 2024-12-04 11:34 UTC (permalink / raw) To: Joseph Qi, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, stable On 12/4/24 17:28, Joseph Qi wrote: > > > On 12/4/24 2:46 PM, Heming Zhao wrote: >> On 12/4/24 11:47, Joseph Qi wrote: >>> >>> >>> On 12/4/24 11:32 AM, Heming Zhao wrote: >>>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when >>>> unmounting an ocfs2 volume"). >>>> >>>> In commit dfe6c5692fb5, the commit log stating "This bug has existed >>>> since the initial OCFS2 code." is incorrect. The correct introduction >>>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). >>>> >>> >>> Could you please elaborate more how it happens? >>> And it seems no difference with the new version. So we may submit a >>> standalone revert patch to those backported stable kernels (< 6.10). >> >> commit log from patch [2/2] should be revised. >> change: This bug has existed since the initial OCFS2 code. >> to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") >> >> ---- >> See below for the details of patch [1/2]. >> >> following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e". >> >> static int ocfs2_sync_local_to_main() >> { >> ... ... >> 1 while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) >> 2 != -1) { >> 3 if ((bit_off < left) && (bit_off == start)) { >> 4 count++; >> 5 start++; >> 6 continue; >> 7 } >> 8 if (count) { >> 9 blkno = la_start_blk + >> 10 ocfs2_clusters_to_blocks(osb->sb, >> 11 start - count); >> 12 >> 13 trace_ocfs2_sync_local_to_main_free(); >> 14 >> 15 status = ocfs2_release_clusters(handle, >> 16 main_bm_inode, >> 17 main_bm_bh, blkno, >> 18 count); >> 19 if (status < 0) { >> 20 mlog_errno(status); >> 21 goto bail; >> 22 } >> 23 } >> 24 if (bit_off >= left) >> 25 break; >> 26 count = 1; >> 27 start = bit_off + 1; >> 28 } >> 29 >> 30 /* clear the contiguous bits until the end boundary */ >> 31 if (count) { >> 32 blkno = la_start_blk + >> 33 ocfs2_clusters_to_blocks(osb->sb, >> 34 start - count); >> 35 >> 36 trace_ocfs2_sync_local_to_main_free(); >> 37 >> 38 status = ocfs2_release_clusters(handle, >> 39 main_bm_inode, >> 40 main_bm_bh, blkno, >> 41 count); >> 42 if (status < 0) >> 43 mlog_errno(status); >> 44 } >> ... ... >> } >> >> bug flow: >> 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap. >> 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3. >> 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm. >> 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999. >> 5. code runs to line 31, this area code releases space to main_bm for the same gd again. >> >> kernel will report the following likely error: >> OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878 >> > > Okay, IIUC, it seems we have to: > 1. revert commit dfe6c5692fb5 (so does stable kernel). > 2. fix 30dd3478c3cd in following way: > > diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c > index 5df34561c551..f0feadac2ef1 100644 > --- a/fs/ocfs2/localalloc.c > +++ b/fs/ocfs2/localalloc.c > @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, > start = count = 0; > left = le32_to_cpu(alloc->id1.bitmap1.i_total); > > - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) < > + while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <= > left) { The ocfs2_find_next_zero_bit() always returns a value within the range [0, left], do you like the following code? - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) < - left) { + for(;;) { + bit_off = ocfs2_find_next_zero_bit(bitmap, left, start); -Heming > - if (bit_off == start) { > + if ((bit_off < left) && (bit_off == start)) { > count++; > start++; > continue; > @@ -997,7 +997,8 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, > goto bail; > } > } > - > + if (bit_off >= left) > + break; > count = 1; > start = bit_off + 1; > } > > Thanks, > Joseph > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 11:34 ` Heming Zhao @ 2024-12-04 12:09 ` Joseph Qi 0 siblings, 0 replies; 11+ messages in thread From: Joseph Qi @ 2024-12-04 12:09 UTC (permalink / raw) To: Heming Zhao, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, stable On 12/4/24 7:34 PM, Heming Zhao wrote: > On 12/4/24 17:28, Joseph Qi wrote: >> >> >> On 12/4/24 2:46 PM, Heming Zhao wrote: >>> On 12/4/24 11:47, Joseph Qi wrote: >>>> >>>> >>>> On 12/4/24 11:32 AM, Heming Zhao wrote: >>>>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when >>>>> unmounting an ocfs2 volume"). >>>>> >>>>> In commit dfe6c5692fb5, the commit log stating "This bug has existed >>>>> since the initial OCFS2 code." is incorrect. The correct introduction >>>>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). >>>>> >>>> >>>> Could you please elaborate more how it happens? >>>> And it seems no difference with the new version. So we may submit a >>>> standalone revert patch to those backported stable kernels (< 6.10). >>> >>> commit log from patch [2/2] should be revised. >>> change: This bug has existed since the initial OCFS2 code. >>> to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") >>> >>> ---- >>> See below for the details of patch [1/2]. >>> >>> following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e". >>> >>> static int ocfs2_sync_local_to_main() >>> { >>> ... ... >>> 1 while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) >>> 2 != -1) { >>> 3 if ((bit_off < left) && (bit_off == start)) { >>> 4 count++; >>> 5 start++; >>> 6 continue; >>> 7 } >>> 8 if (count) { >>> 9 blkno = la_start_blk + >>> 10 ocfs2_clusters_to_blocks(osb->sb, >>> 11 start - count); >>> 12 >>> 13 trace_ocfs2_sync_local_to_main_free(); >>> 14 >>> 15 status = ocfs2_release_clusters(handle, >>> 16 main_bm_inode, >>> 17 main_bm_bh, blkno, >>> 18 count); >>> 19 if (status < 0) { >>> 20 mlog_errno(status); >>> 21 goto bail; >>> 22 } >>> 23 } >>> 24 if (bit_off >= left) >>> 25 break; >>> 26 count = 1; >>> 27 start = bit_off + 1; >>> 28 } >>> 29 >>> 30 /* clear the contiguous bits until the end boundary */ >>> 31 if (count) { >>> 32 blkno = la_start_blk + >>> 33 ocfs2_clusters_to_blocks(osb->sb, >>> 34 start - count); >>> 35 >>> 36 trace_ocfs2_sync_local_to_main_free(); >>> 37 >>> 38 status = ocfs2_release_clusters(handle, >>> 39 main_bm_inode, >>> 40 main_bm_bh, blkno, >>> 41 count); >>> 42 if (status < 0) >>> 43 mlog_errno(status); >>> 44 } >>> ... ... >>> } >>> >>> bug flow: >>> 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap. >>> 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3. >>> 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm. >>> 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999. >>> 5. code runs to line 31, this area code releases space to main_bm for the same gd again. >>> >>> kernel will report the following likely error: >>> OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878 >>> >> >> Okay, IIUC, it seems we have to: >> 1. revert commit dfe6c5692fb5 (so does stable kernel). >> 2. fix 30dd3478c3cd in following way: >> >> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c >> index 5df34561c551..f0feadac2ef1 100644 >> --- a/fs/ocfs2/localalloc.c >> +++ b/fs/ocfs2/localalloc.c >> @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, >> start = count = 0; >> left = le32_to_cpu(alloc->id1.bitmap1.i_total); >> - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) < >> + while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <= >> left) { > > The ocfs2_find_next_zero_bit() always returns a value within the range [0, left], You're right. > do you like the following code? > > - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) < > - left) { > + for(;;) { > + bit_off = ocfs2_find_next_zero_bit(bitmap, left, start); > Or simplify to: while (1) { bit_off = ocfs2_find_next_zero_bit(bitmap, left, start); ... } Thanks, Joseph ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 6:46 ` Heming Zhao 2024-12-04 9:28 ` Joseph Qi @ 2024-12-12 8:18 ` Greg KH 1 sibling, 0 replies; 11+ messages in thread From: Greg KH @ 2024-12-12 8:18 UTC (permalink / raw) To: Heming Zhao Cc: Joseph Qi, ocfs2-devel, linux-kernel, akpm, Heming Zhao, stable On Wed, Dec 04, 2024 at 02:46:15PM +0800, Heming Zhao wrote: > On 12/4/24 11:47, Joseph Qi wrote: > > > > > > On 12/4/24 11:32 AM, Heming Zhao wrote: > > > This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when > > > unmounting an ocfs2 volume"). > > > > > > In commit dfe6c5692fb5, the commit log stating "This bug has existed > > > since the initial OCFS2 code." is incorrect. The correct introduction > > > commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). > > > > > > > Could you please elaborate more how it happens? > > And it seems no difference with the new version. So we may submit a > > standalone revert patch to those backported stable kernels (< 6.10). > > commit log from patch [2/2] should be revised. > change: This bug has existed since the initial OCFS2 code. > to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") So can you please send a new version of this series? thanks, greg k-h ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] ocfs2: fix the la space leak when unmounting an ocfs2 volume 2024-12-04 3:32 [PATCH 0/2] Revert then resubmit ocfs2 commit dfe6c5692fb5 Heming Zhao 2024-12-04 3:32 ` [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" Heming Zhao @ 2024-12-04 3:32 ` Heming Zhao 2024-12-04 4:08 ` Heming Zhao 1 sibling, 1 reply; 11+ messages in thread From: Heming Zhao @ 2024-12-04 3:32 UTC (permalink / raw) To: joseph.qi, ocfs2-devel; +Cc: Heming Zhao, linux-kernel, akpm, gregkh This bug has existed since the initial OCFS2 code. The code logic in ocfs2_sync_local_to_main() is wrong, as it ignores the last contiguous free bits, which causes an OCFS2 volume to lose the last free clusters of LA window on each umount command. Signed-off-by: Heming Zhao <heming.zhao@suse.com> Fixes: 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") --- fs/ocfs2/localalloc.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c index 5df34561c551..8ac42ea81a17 100644 --- a/fs/ocfs2/localalloc.c +++ b/fs/ocfs2/localalloc.c @@ -1002,6 +1002,25 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, start = bit_off + 1; } + /* clear the contiguous bits until the end boundary */ + if (count) { + blkno = la_start_blk + + ocfs2_clusters_to_blocks(osb->sb, + start - count); + + trace_ocfs2_sync_local_to_main_free( + count, start - count, + (unsigned long long)la_start_blk, + (unsigned long long)blkno); + + status = ocfs2_release_clusters(handle, + main_bm_inode, + main_bm_bh, blkno, + count); + if (status < 0) + mlog_errno(status); + } + bail: if (status) mlog_errno(status); -- 2.43.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] ocfs2: fix the la space leak when unmounting an ocfs2 volume 2024-12-04 3:32 ` [PATCH 2/2] ocfs2: fix the la space leak when unmounting an ocfs2 volume Heming Zhao @ 2024-12-04 4:08 ` Heming Zhao 0 siblings, 0 replies; 11+ messages in thread From: Heming Zhao @ 2024-12-04 4:08 UTC (permalink / raw) To: joseph.qi, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh On 12/4/24 11:32, Heming Zhao wrote: > This bug has existed since the initial OCFS2 code. The code logic in forgot revise commit, very stupid mistake. > ocfs2_sync_local_to_main() is wrong, as it ignores the last contiguous > free bits, which causes an OCFS2 volume to lose the last free clusters of > LA window on each umount command. > > Signed-off-by: Heming Zhao <heming.zhao@suse.com> > Fixes: 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") > --- > fs/ocfs2/localalloc.c | 19 +++++++++++++++++++ > 1 file changed, 19 insertions(+) > > diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c > index 5df34561c551..8ac42ea81a17 100644 > --- a/fs/ocfs2/localalloc.c > +++ b/fs/ocfs2/localalloc.c > @@ -1002,6 +1002,25 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, > start = bit_off + 1; > } > > + /* clear the contiguous bits until the end boundary */ > + if (count) { > + blkno = la_start_blk + > + ocfs2_clusters_to_blocks(osb->sb, > + start - count); > + > + trace_ocfs2_sync_local_to_main_free( > + count, start - count, > + (unsigned long long)la_start_blk, > + (unsigned long long)blkno); > + > + status = ocfs2_release_clusters(handle, > + main_bm_inode, > + main_bm_bh, blkno, > + count); > + if (status < 0) > + mlog_errno(status); > + } > + > bail: > if (status) > mlog_errno(status); ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-12-12 8:18 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-12-04 3:32 [PATCH 0/2] Revert then resubmit ocfs2 commit dfe6c5692fb5 Heming Zhao 2024-12-04 3:32 ` [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" Heming Zhao 2024-12-04 3:47 ` Joseph Qi 2024-12-04 6:46 ` Heming Zhao 2024-12-04 9:28 ` Joseph Qi 2024-12-04 11:11 ` Heming Zhao 2024-12-04 11:34 ` Heming Zhao 2024-12-04 12:09 ` Joseph Qi 2024-12-12 8:18 ` Greg KH 2024-12-04 3:32 ` [PATCH 2/2] ocfs2: fix the la space leak when unmounting an ocfs2 volume Heming Zhao 2024-12-04 4:08 ` Heming Zhao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox