* [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" [not found] <20241204033243.8273-1-heming.zhao@suse.com> @ 2024-12-04 3:32 ` Heming Zhao 2024-12-04 3:47 ` Joseph Qi 0 siblings, 1 reply; 8+ messages in thread From: Heming Zhao @ 2024-12-04 3:32 UTC (permalink / raw) To: joseph.qi, ocfs2-devel Cc: Heming Zhao, linux-kernel, akpm, gregkh, Heming Zhao, stable This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when unmounting an ocfs2 volume"). In commit dfe6c5692fb5, the commit log stating "This bug has existed since the initial OCFS2 code." is incorrect. The correct introduction commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). Fixes: dfe6c5692fb5 ("ocfs2: fix the la space leak when unmounting an ocfs2 volume") Signed-off-by: Heming Zhao <heing.zhao@suse.com> Cc: <stable@vger.kernel.org> --- fs/ocfs2/localalloc.c | 19 ------------------- 1 file changed, 19 deletions(-) diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c index 8ac42ea81a17..5df34561c551 100644 --- a/fs/ocfs2/localalloc.c +++ b/fs/ocfs2/localalloc.c @@ -1002,25 +1002,6 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, start = bit_off + 1; } - /* clear the contiguous bits until the end boundary */ - if (count) { - blkno = la_start_blk + - ocfs2_clusters_to_blocks(osb->sb, - start - count); - - trace_ocfs2_sync_local_to_main_free( - count, start - count, - (unsigned long long)la_start_blk, - (unsigned long long)blkno); - - status = ocfs2_release_clusters(handle, - main_bm_inode, - main_bm_bh, blkno, - count); - if (status < 0) - mlog_errno(status); - } - bail: if (status) mlog_errno(status); -- 2.43.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 3:32 ` [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" Heming Zhao @ 2024-12-04 3:47 ` Joseph Qi 2024-12-04 6:46 ` Heming Zhao 0 siblings, 1 reply; 8+ messages in thread From: Joseph Qi @ 2024-12-04 3:47 UTC (permalink / raw) To: Heming Zhao, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, Heming Zhao, stable On 12/4/24 11:32 AM, Heming Zhao wrote: > This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when > unmounting an ocfs2 volume"). > > In commit dfe6c5692fb5, the commit log stating "This bug has existed > since the initial OCFS2 code." is incorrect. The correct introduction > commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). > Could you please elaborate more how it happens? And it seems no difference with the new version. So we may submit a standalone revert patch to those backported stable kernels (< 6.10). Thanks, Joseph > Fixes: dfe6c5692fb5 ("ocfs2: fix the la space leak when unmounting an ocfs2 volume") > Signed-off-by: Heming Zhao <heing.zhao@suse.com> > Cc: <stable@vger.kernel.org> > --- > fs/ocfs2/localalloc.c | 19 ------------------- > 1 file changed, 19 deletions(-) > > diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c > index 8ac42ea81a17..5df34561c551 100644 > --- a/fs/ocfs2/localalloc.c > +++ b/fs/ocfs2/localalloc.c > @@ -1002,25 +1002,6 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, > start = bit_off + 1; > } > > - /* clear the contiguous bits until the end boundary */ > - if (count) { > - blkno = la_start_blk + > - ocfs2_clusters_to_blocks(osb->sb, > - start - count); > - > - trace_ocfs2_sync_local_to_main_free( > - count, start - count, > - (unsigned long long)la_start_blk, > - (unsigned long long)blkno); > - > - status = ocfs2_release_clusters(handle, > - main_bm_inode, > - main_bm_bh, blkno, > - count); > - if (status < 0) > - mlog_errno(status); > - } > - > bail: > if (status) > mlog_errno(status); ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 3:47 ` Joseph Qi @ 2024-12-04 6:46 ` Heming Zhao 2024-12-04 9:28 ` Joseph Qi 2024-12-12 8:18 ` Greg KH 0 siblings, 2 replies; 8+ messages in thread From: Heming Zhao @ 2024-12-04 6:46 UTC (permalink / raw) To: Joseph Qi, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, Heming Zhao, stable On 12/4/24 11:47, Joseph Qi wrote: > > > On 12/4/24 11:32 AM, Heming Zhao wrote: >> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when >> unmounting an ocfs2 volume"). >> >> In commit dfe6c5692fb5, the commit log stating "This bug has existed >> since the initial OCFS2 code." is incorrect. The correct introduction >> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). >> > > Could you please elaborate more how it happens? > And it seems no difference with the new version. So we may submit a > standalone revert patch to those backported stable kernels (< 6.10). commit log from patch [2/2] should be revised. change: This bug has existed since the initial OCFS2 code. to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") ---- See below for the details of patch [1/2]. following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e". static int ocfs2_sync_local_to_main() { ... ... 1 while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) 2 != -1) { 3 if ((bit_off < left) && (bit_off == start)) { 4 count++; 5 start++; 6 continue; 7 } 8 if (count) { 9 blkno = la_start_blk + 10 ocfs2_clusters_to_blocks(osb->sb, 11 start - count); 12 13 trace_ocfs2_sync_local_to_main_free(); 14 15 status = ocfs2_release_clusters(handle, 16 main_bm_inode, 17 main_bm_bh, blkno, 18 count); 19 if (status < 0) { 20 mlog_errno(status); 21 goto bail; 22 } 23 } 24 if (bit_off >= left) 25 break; 26 count = 1; 27 start = bit_off + 1; 28 } 29 30 /* clear the contiguous bits until the end boundary */ 31 if (count) { 32 blkno = la_start_blk + 33 ocfs2_clusters_to_blocks(osb->sb, 34 start - count); 35 36 trace_ocfs2_sync_local_to_main_free(); 37 38 status = ocfs2_release_clusters(handle, 39 main_bm_inode, 40 main_bm_bh, blkno, 41 count); 42 if (status < 0) 43 mlog_errno(status); 44 } ... ... } bug flow: 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap. 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3. 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm. 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999. 5. code runs to line 31, this area code releases space to main_bm for the same gd again. kernel will report the following likely error: OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878 thanks, Heming ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 6:46 ` Heming Zhao @ 2024-12-04 9:28 ` Joseph Qi 2024-12-04 11:11 ` Heming Zhao 2024-12-04 11:34 ` Heming Zhao 2024-12-12 8:18 ` Greg KH 1 sibling, 2 replies; 8+ messages in thread From: Joseph Qi @ 2024-12-04 9:28 UTC (permalink / raw) To: Heming Zhao, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, stable On 12/4/24 2:46 PM, Heming Zhao wrote: > On 12/4/24 11:47, Joseph Qi wrote: >> >> >> On 12/4/24 11:32 AM, Heming Zhao wrote: >>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when >>> unmounting an ocfs2 volume"). >>> >>> In commit dfe6c5692fb5, the commit log stating "This bug has existed >>> since the initial OCFS2 code." is incorrect. The correct introduction >>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). >>> >> >> Could you please elaborate more how it happens? >> And it seems no difference with the new version. So we may submit a >> standalone revert patch to those backported stable kernels (< 6.10). > > commit log from patch [2/2] should be revised. > change: This bug has existed since the initial OCFS2 code. > to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") > > ---- > See below for the details of patch [1/2]. > > following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e". > > static int ocfs2_sync_local_to_main() > { > ... ... > 1 while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) > 2 != -1) { > 3 if ((bit_off < left) && (bit_off == start)) { > 4 count++; > 5 start++; > 6 continue; > 7 } > 8 if (count) { > 9 blkno = la_start_blk + > 10 ocfs2_clusters_to_blocks(osb->sb, > 11 start - count); > 12 > 13 trace_ocfs2_sync_local_to_main_free(); > 14 > 15 status = ocfs2_release_clusters(handle, > 16 main_bm_inode, > 17 main_bm_bh, blkno, > 18 count); > 19 if (status < 0) { > 20 mlog_errno(status); > 21 goto bail; > 22 } > 23 } > 24 if (bit_off >= left) > 25 break; > 26 count = 1; > 27 start = bit_off + 1; > 28 } > 29 > 30 /* clear the contiguous bits until the end boundary */ > 31 if (count) { > 32 blkno = la_start_blk + > 33 ocfs2_clusters_to_blocks(osb->sb, > 34 start - count); > 35 > 36 trace_ocfs2_sync_local_to_main_free(); > 37 > 38 status = ocfs2_release_clusters(handle, > 39 main_bm_inode, > 40 main_bm_bh, blkno, > 41 count); > 42 if (status < 0) > 43 mlog_errno(status); > 44 } > ... ... > } > > bug flow: > 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap. > 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3. > 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm. > 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999. > 5. code runs to line 31, this area code releases space to main_bm for the same gd again. > > kernel will report the following likely error: > OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878 > Okay, IIUC, it seems we have to: 1. revert commit dfe6c5692fb5 (so does stable kernel). 2. fix 30dd3478c3cd in following way: diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c index 5df34561c551..f0feadac2ef1 100644 --- a/fs/ocfs2/localalloc.c +++ b/fs/ocfs2/localalloc.c @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, start = count = 0; left = le32_to_cpu(alloc->id1.bitmap1.i_total); - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) < + while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <= left) { - if (bit_off == start) { + if ((bit_off < left) && (bit_off == start)) { count++; start++; continue; @@ -997,7 +997,8 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, goto bail; } } - + if (bit_off >= left) + break; count = 1; start = bit_off + 1; } Thanks, Joseph ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 9:28 ` Joseph Qi @ 2024-12-04 11:11 ` Heming Zhao 2024-12-04 11:34 ` Heming Zhao 1 sibling, 0 replies; 8+ messages in thread From: Heming Zhao @ 2024-12-04 11:11 UTC (permalink / raw) To: Joseph Qi, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, stable On 12/4/24 17:28, Joseph Qi wrote: > > > On 12/4/24 2:46 PM, Heming Zhao wrote: >> On 12/4/24 11:47, Joseph Qi wrote: >>> >>> >>> On 12/4/24 11:32 AM, Heming Zhao wrote: >>>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when >>>> unmounting an ocfs2 volume"). >>>> >>>> In commit dfe6c5692fb5, the commit log stating "This bug has existed >>>> since the initial OCFS2 code." is incorrect. The correct introduction >>>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). >>>> >>> >>> Could you please elaborate more how it happens? >>> And it seems no difference with the new version. So we may submit a >>> standalone revert patch to those backported stable kernels (< 6.10). >> >> commit log from patch [2/2] should be revised. >> change: This bug has existed since the initial OCFS2 code. >> to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") >> >> ---- >> See below for the details of patch [1/2]. >> >> following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e". >> >> static int ocfs2_sync_local_to_main() >> { >> ... ... >> 1 while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) >> 2 != -1) { >> 3 if ((bit_off < left) && (bit_off == start)) { >> 4 count++; >> 5 start++; >> 6 continue; >> 7 } >> 8 if (count) { >> 9 blkno = la_start_blk + >> 10 ocfs2_clusters_to_blocks(osb->sb, >> 11 start - count); >> 12 >> 13 trace_ocfs2_sync_local_to_main_free(); >> 14 >> 15 status = ocfs2_release_clusters(handle, >> 16 main_bm_inode, >> 17 main_bm_bh, blkno, >> 18 count); >> 19 if (status < 0) { >> 20 mlog_errno(status); >> 21 goto bail; >> 22 } >> 23 } >> 24 if (bit_off >= left) >> 25 break; >> 26 count = 1; >> 27 start = bit_off + 1; >> 28 } >> 29 >> 30 /* clear the contiguous bits until the end boundary */ >> 31 if (count) { >> 32 blkno = la_start_blk + >> 33 ocfs2_clusters_to_blocks(osb->sb, >> 34 start - count); >> 35 >> 36 trace_ocfs2_sync_local_to_main_free(); >> 37 >> 38 status = ocfs2_release_clusters(handle, >> 39 main_bm_inode, >> 40 main_bm_bh, blkno, >> 41 count); >> 42 if (status < 0) >> 43 mlog_errno(status); >> 44 } >> ... ... >> } >> >> bug flow: >> 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap. >> 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3. >> 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm. >> 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999. >> 5. code runs to line 31, this area code releases space to main_bm for the same gd again. >> >> kernel will report the following likely error: >> OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878 >> > > Okay, IIUC, it seems we have to: > 1. revert commit dfe6c5692fb5 (so does stable kernel). OK. > 2. fix 30dd3478c3cd in following way: It looks good to me. I will send v2 patch. -Heming > > diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c > index 5df34561c551..f0feadac2ef1 100644 > --- a/fs/ocfs2/localalloc.c > +++ b/fs/ocfs2/localalloc.c > @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, > start = count = 0; > left = le32_to_cpu(alloc->id1.bitmap1.i_total); > > - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) < > + while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <= > left) { > - if (bit_off == start) { > + if ((bit_off < left) && (bit_off == start)) { > count++; > start++; > continue; > @@ -997,7 +997,8 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, > goto bail; > } > } > - > + if (bit_off >= left) > + break; > count = 1; > start = bit_off + 1; > } > > Thanks, > Joseph > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 9:28 ` Joseph Qi 2024-12-04 11:11 ` Heming Zhao @ 2024-12-04 11:34 ` Heming Zhao 2024-12-04 12:09 ` Joseph Qi 1 sibling, 1 reply; 8+ messages in thread From: Heming Zhao @ 2024-12-04 11:34 UTC (permalink / raw) To: Joseph Qi, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, stable On 12/4/24 17:28, Joseph Qi wrote: > > > On 12/4/24 2:46 PM, Heming Zhao wrote: >> On 12/4/24 11:47, Joseph Qi wrote: >>> >>> >>> On 12/4/24 11:32 AM, Heming Zhao wrote: >>>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when >>>> unmounting an ocfs2 volume"). >>>> >>>> In commit dfe6c5692fb5, the commit log stating "This bug has existed >>>> since the initial OCFS2 code." is incorrect. The correct introduction >>>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). >>>> >>> >>> Could you please elaborate more how it happens? >>> And it seems no difference with the new version. So we may submit a >>> standalone revert patch to those backported stable kernels (< 6.10). >> >> commit log from patch [2/2] should be revised. >> change: This bug has existed since the initial OCFS2 code. >> to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") >> >> ---- >> See below for the details of patch [1/2]. >> >> following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e". >> >> static int ocfs2_sync_local_to_main() >> { >> ... ... >> 1 while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) >> 2 != -1) { >> 3 if ((bit_off < left) && (bit_off == start)) { >> 4 count++; >> 5 start++; >> 6 continue; >> 7 } >> 8 if (count) { >> 9 blkno = la_start_blk + >> 10 ocfs2_clusters_to_blocks(osb->sb, >> 11 start - count); >> 12 >> 13 trace_ocfs2_sync_local_to_main_free(); >> 14 >> 15 status = ocfs2_release_clusters(handle, >> 16 main_bm_inode, >> 17 main_bm_bh, blkno, >> 18 count); >> 19 if (status < 0) { >> 20 mlog_errno(status); >> 21 goto bail; >> 22 } >> 23 } >> 24 if (bit_off >= left) >> 25 break; >> 26 count = 1; >> 27 start = bit_off + 1; >> 28 } >> 29 >> 30 /* clear the contiguous bits until the end boundary */ >> 31 if (count) { >> 32 blkno = la_start_blk + >> 33 ocfs2_clusters_to_blocks(osb->sb, >> 34 start - count); >> 35 >> 36 trace_ocfs2_sync_local_to_main_free(); >> 37 >> 38 status = ocfs2_release_clusters(handle, >> 39 main_bm_inode, >> 40 main_bm_bh, blkno, >> 41 count); >> 42 if (status < 0) >> 43 mlog_errno(status); >> 44 } >> ... ... >> } >> >> bug flow: >> 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap. >> 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3. >> 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm. >> 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999. >> 5. code runs to line 31, this area code releases space to main_bm for the same gd again. >> >> kernel will report the following likely error: >> OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878 >> > > Okay, IIUC, it seems we have to: > 1. revert commit dfe6c5692fb5 (so does stable kernel). > 2. fix 30dd3478c3cd in following way: > > diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c > index 5df34561c551..f0feadac2ef1 100644 > --- a/fs/ocfs2/localalloc.c > +++ b/fs/ocfs2/localalloc.c > @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, > start = count = 0; > left = le32_to_cpu(alloc->id1.bitmap1.i_total); > > - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) < > + while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <= > left) { The ocfs2_find_next_zero_bit() always returns a value within the range [0, left], do you like the following code? - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) < - left) { + for(;;) { + bit_off = ocfs2_find_next_zero_bit(bitmap, left, start); -Heming > - if (bit_off == start) { > + if ((bit_off < left) && (bit_off == start)) { > count++; > start++; > continue; > @@ -997,7 +997,8 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, > goto bail; > } > } > - > + if (bit_off >= left) > + break; > count = 1; > start = bit_off + 1; > } > > Thanks, > Joseph > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 11:34 ` Heming Zhao @ 2024-12-04 12:09 ` Joseph Qi 0 siblings, 0 replies; 8+ messages in thread From: Joseph Qi @ 2024-12-04 12:09 UTC (permalink / raw) To: Heming Zhao, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, stable On 12/4/24 7:34 PM, Heming Zhao wrote: > On 12/4/24 17:28, Joseph Qi wrote: >> >> >> On 12/4/24 2:46 PM, Heming Zhao wrote: >>> On 12/4/24 11:47, Joseph Qi wrote: >>>> >>>> >>>> On 12/4/24 11:32 AM, Heming Zhao wrote: >>>>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when >>>>> unmounting an ocfs2 volume"). >>>>> >>>>> In commit dfe6c5692fb5, the commit log stating "This bug has existed >>>>> since the initial OCFS2 code." is incorrect. The correct introduction >>>>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). >>>>> >>>> >>>> Could you please elaborate more how it happens? >>>> And it seems no difference with the new version. So we may submit a >>>> standalone revert patch to those backported stable kernels (< 6.10). >>> >>> commit log from patch [2/2] should be revised. >>> change: This bug has existed since the initial OCFS2 code. >>> to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") >>> >>> ---- >>> See below for the details of patch [1/2]. >>> >>> following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e". >>> >>> static int ocfs2_sync_local_to_main() >>> { >>> ... ... >>> 1 while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) >>> 2 != -1) { >>> 3 if ((bit_off < left) && (bit_off == start)) { >>> 4 count++; >>> 5 start++; >>> 6 continue; >>> 7 } >>> 8 if (count) { >>> 9 blkno = la_start_blk + >>> 10 ocfs2_clusters_to_blocks(osb->sb, >>> 11 start - count); >>> 12 >>> 13 trace_ocfs2_sync_local_to_main_free(); >>> 14 >>> 15 status = ocfs2_release_clusters(handle, >>> 16 main_bm_inode, >>> 17 main_bm_bh, blkno, >>> 18 count); >>> 19 if (status < 0) { >>> 20 mlog_errno(status); >>> 21 goto bail; >>> 22 } >>> 23 } >>> 24 if (bit_off >= left) >>> 25 break; >>> 26 count = 1; >>> 27 start = bit_off + 1; >>> 28 } >>> 29 >>> 30 /* clear the contiguous bits until the end boundary */ >>> 31 if (count) { >>> 32 blkno = la_start_blk + >>> 33 ocfs2_clusters_to_blocks(osb->sb, >>> 34 start - count); >>> 35 >>> 36 trace_ocfs2_sync_local_to_main_free(); >>> 37 >>> 38 status = ocfs2_release_clusters(handle, >>> 39 main_bm_inode, >>> 40 main_bm_bh, blkno, >>> 41 count); >>> 42 if (status < 0) >>> 43 mlog_errno(status); >>> 44 } >>> ... ... >>> } >>> >>> bug flow: >>> 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap. >>> 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3. >>> 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm. >>> 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999. >>> 5. code runs to line 31, this area code releases space to main_bm for the same gd again. >>> >>> kernel will report the following likely error: >>> OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878 >>> >> >> Okay, IIUC, it seems we have to: >> 1. revert commit dfe6c5692fb5 (so does stable kernel). >> 2. fix 30dd3478c3cd in following way: >> >> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c >> index 5df34561c551..f0feadac2ef1 100644 >> --- a/fs/ocfs2/localalloc.c >> +++ b/fs/ocfs2/localalloc.c >> @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb, >> start = count = 0; >> left = le32_to_cpu(alloc->id1.bitmap1.i_total); >> - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) < >> + while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <= >> left) { > > The ocfs2_find_next_zero_bit() always returns a value within the range [0, left], You're right. > do you like the following code? > > - while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) < > - left) { > + for(;;) { > + bit_off = ocfs2_find_next_zero_bit(bitmap, left, start); > Or simplify to: while (1) { bit_off = ocfs2_find_next_zero_bit(bitmap, left, start); ... } Thanks, Joseph ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" 2024-12-04 6:46 ` Heming Zhao 2024-12-04 9:28 ` Joseph Qi @ 2024-12-12 8:18 ` Greg KH 1 sibling, 0 replies; 8+ messages in thread From: Greg KH @ 2024-12-12 8:18 UTC (permalink / raw) To: Heming Zhao Cc: Joseph Qi, ocfs2-devel, linux-kernel, akpm, Heming Zhao, stable On Wed, Dec 04, 2024 at 02:46:15PM +0800, Heming Zhao wrote: > On 12/4/24 11:47, Joseph Qi wrote: > > > > > > On 12/4/24 11:32 AM, Heming Zhao wrote: > > > This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when > > > unmounting an ocfs2 volume"). > > > > > > In commit dfe6c5692fb5, the commit log stating "This bug has existed > > > since the initial OCFS2 code." is incorrect. The correct introduction > > > commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()"). > > > > > > > Could you please elaborate more how it happens? > > And it seems no difference with the new version. So we may submit a > > standalone revert patch to those backported stable kernels (< 6.10). > > commit log from patch [2/2] should be revised. > change: This bug has existed since the initial OCFS2 code. > to : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") So can you please send a new version of this series? thanks, greg k-h ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-12-12 8:18 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20241204033243.8273-1-heming.zhao@suse.com>
2024-12-04 3:32 ` [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" Heming Zhao
2024-12-04 3:47 ` Joseph Qi
2024-12-04 6:46 ` Heming Zhao
2024-12-04 9:28 ` Joseph Qi
2024-12-04 11:11 ` Heming Zhao
2024-12-04 11:34 ` Heming Zhao
2024-12-04 12:09 ` Joseph Qi
2024-12-12 8:18 ` Greg KH
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox