public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Revert then resubmit ocfs2 commit dfe6c5692fb5
@ 2024-12-04  3:32 Heming Zhao
  2024-12-04  3:32 ` [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" Heming Zhao
  2024-12-04  3:32 ` [PATCH 2/2] ocfs2: fix the la space leak when unmounting an ocfs2 volume Heming Zhao
  0 siblings, 2 replies; 11+ messages in thread
From: Heming Zhao @ 2024-12-04  3:32 UTC (permalink / raw)
  To: joseph.qi, ocfs2-devel; +Cc: Heming Zhao, linux-kernel, akpm, gregkh

SUSE QA team detected a mistake in my commit dfe6c5692fb5 ("ocfs2: fix
the la space leak when unmounting an ocfs2 volume"). I am very sorry for
my error. (If my eyes are correct) From the mailling list mails, this
patch shouldn't be applied to 4.19 5.4 5.10 5.15 6.1 6.6, and these
branches should perform a revert operation.

Reason for revert:
In commit dfe6c5692fb5, I mistakenly wrote: "This bug has existed since
the initial OCFS2 code.". The statement is wrong. The correct
introduction commit is 30dd3478c3cd. IOW, if the branch doesn't include
30dd3478c3cd, dfe6c5692fb5 should also not be included.

I am not sure whether the revert and resubmit patch operations are
correct or not. Please guide me if the approach is incorrect.

Heming Zhao (2):
  ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2
    volume"
  ocfs2: fix the la space leak when unmounting an ocfs2 volume


-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume"
  2024-12-04  3:32 [PATCH 0/2] Revert then resubmit ocfs2 commit dfe6c5692fb5 Heming Zhao
@ 2024-12-04  3:32 ` Heming Zhao
  2024-12-04  3:47   ` Joseph Qi
  2024-12-04  3:32 ` [PATCH 2/2] ocfs2: fix the la space leak when unmounting an ocfs2 volume Heming Zhao
  1 sibling, 1 reply; 11+ messages in thread
From: Heming Zhao @ 2024-12-04  3:32 UTC (permalink / raw)
  To: joseph.qi, ocfs2-devel
  Cc: Heming Zhao, linux-kernel, akpm, gregkh, Heming Zhao, stable

This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
unmounting an ocfs2 volume").

In commit dfe6c5692fb5, the commit log stating "This bug has existed
since the initial OCFS2 code." is incorrect. The correct introduction
commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").

Fixes: dfe6c5692fb5 ("ocfs2: fix the la space leak when unmounting an ocfs2 volume")
Signed-off-by: Heming Zhao <heing.zhao@suse.com>
Cc: <stable@vger.kernel.org>
---
 fs/ocfs2/localalloc.c | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 8ac42ea81a17..5df34561c551 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -1002,25 +1002,6 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
 		start = bit_off + 1;
 	}
 
-	/* clear the contiguous bits until the end boundary */
-	if (count) {
-		blkno = la_start_blk +
-			ocfs2_clusters_to_blocks(osb->sb,
-					start - count);
-
-		trace_ocfs2_sync_local_to_main_free(
-				count, start - count,
-				(unsigned long long)la_start_blk,
-				(unsigned long long)blkno);
-
-		status = ocfs2_release_clusters(handle,
-				main_bm_inode,
-				main_bm_bh, blkno,
-				count);
-		if (status < 0)
-			mlog_errno(status);
-	}
-
 bail:
 	if (status)
 		mlog_errno(status);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] ocfs2: fix the la space leak when unmounting an ocfs2 volume
  2024-12-04  3:32 [PATCH 0/2] Revert then resubmit ocfs2 commit dfe6c5692fb5 Heming Zhao
  2024-12-04  3:32 ` [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" Heming Zhao
@ 2024-12-04  3:32 ` Heming Zhao
  2024-12-04  4:08   ` Heming Zhao
  1 sibling, 1 reply; 11+ messages in thread
From: Heming Zhao @ 2024-12-04  3:32 UTC (permalink / raw)
  To: joseph.qi, ocfs2-devel; +Cc: Heming Zhao, linux-kernel, akpm, gregkh

This bug has existed since the initial OCFS2 code.  The code logic in
ocfs2_sync_local_to_main() is wrong, as it ignores the last contiguous
free bits, which causes an OCFS2 volume to lose the last free clusters of
LA window on each umount command.

Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Fixes: 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")
---
 fs/ocfs2/localalloc.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 5df34561c551..8ac42ea81a17 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -1002,6 +1002,25 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
 		start = bit_off + 1;
 	}
 
+	/* clear the contiguous bits until the end boundary */
+	if (count) {
+		blkno = la_start_blk +
+			ocfs2_clusters_to_blocks(osb->sb,
+					start - count);
+
+		trace_ocfs2_sync_local_to_main_free(
+				count, start - count,
+				(unsigned long long)la_start_blk,
+				(unsigned long long)blkno);
+
+		status = ocfs2_release_clusters(handle,
+				main_bm_inode,
+				main_bm_bh, blkno,
+				count);
+		if (status < 0)
+			mlog_errno(status);
+	}
+
 bail:
 	if (status)
 		mlog_errno(status);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume"
  2024-12-04  3:32 ` [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" Heming Zhao
@ 2024-12-04  3:47   ` Joseph Qi
  2024-12-04  6:46     ` Heming Zhao
  0 siblings, 1 reply; 11+ messages in thread
From: Joseph Qi @ 2024-12-04  3:47 UTC (permalink / raw)
  To: Heming Zhao, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, Heming Zhao, stable



On 12/4/24 11:32 AM, Heming Zhao wrote:
> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
> unmounting an ocfs2 volume").
> 
> In commit dfe6c5692fb5, the commit log stating "This bug has existed
> since the initial OCFS2 code." is incorrect. The correct introduction
> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").
> 

Could you please elaborate more how it happens?
And it seems no difference with the new version. So we may submit a
standalone revert patch to those backported stable kernels (< 6.10).

Thanks,
Joseph

> Fixes: dfe6c5692fb5 ("ocfs2: fix the la space leak when unmounting an ocfs2 volume")
> Signed-off-by: Heming Zhao <heing.zhao@suse.com>
> Cc: <stable@vger.kernel.org>
> ---
>  fs/ocfs2/localalloc.c | 19 -------------------
>  1 file changed, 19 deletions(-)
> 
> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
> index 8ac42ea81a17..5df34561c551 100644
> --- a/fs/ocfs2/localalloc.c
> +++ b/fs/ocfs2/localalloc.c
> @@ -1002,25 +1002,6 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
>  		start = bit_off + 1;
>  	}
>  
> -	/* clear the contiguous bits until the end boundary */
> -	if (count) {
> -		blkno = la_start_blk +
> -			ocfs2_clusters_to_blocks(osb->sb,
> -					start - count);
> -
> -		trace_ocfs2_sync_local_to_main_free(
> -				count, start - count,
> -				(unsigned long long)la_start_blk,
> -				(unsigned long long)blkno);
> -
> -		status = ocfs2_release_clusters(handle,
> -				main_bm_inode,
> -				main_bm_bh, blkno,
> -				count);
> -		if (status < 0)
> -			mlog_errno(status);
> -	}
> -
>  bail:
>  	if (status)
>  		mlog_errno(status);


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] ocfs2: fix the la space leak when unmounting an ocfs2 volume
  2024-12-04  3:32 ` [PATCH 2/2] ocfs2: fix the la space leak when unmounting an ocfs2 volume Heming Zhao
@ 2024-12-04  4:08   ` Heming Zhao
  0 siblings, 0 replies; 11+ messages in thread
From: Heming Zhao @ 2024-12-04  4:08 UTC (permalink / raw)
  To: joseph.qi, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh

On 12/4/24 11:32, Heming Zhao wrote:
> This bug has existed since the initial OCFS2 code.  The code logic in
forgot revise commit, very stupid mistake.

> ocfs2_sync_local_to_main() is wrong, as it ignores the last contiguous
> free bits, which causes an OCFS2 volume to lose the last free clusters of
> LA window on each umount command.
> 
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> Fixes: 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")
> ---
>   fs/ocfs2/localalloc.c | 19 +++++++++++++++++++
>   1 file changed, 19 insertions(+)
> 
> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
> index 5df34561c551..8ac42ea81a17 100644
> --- a/fs/ocfs2/localalloc.c
> +++ b/fs/ocfs2/localalloc.c
> @@ -1002,6 +1002,25 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
>   		start = bit_off + 1;
>   	}
>   
> +	/* clear the contiguous bits until the end boundary */
> +	if (count) {
> +		blkno = la_start_blk +
> +			ocfs2_clusters_to_blocks(osb->sb,
> +					start - count);
> +
> +		trace_ocfs2_sync_local_to_main_free(
> +				count, start - count,
> +				(unsigned long long)la_start_blk,
> +				(unsigned long long)blkno);
> +
> +		status = ocfs2_release_clusters(handle,
> +				main_bm_inode,
> +				main_bm_bh, blkno,
> +				count);
> +		if (status < 0)
> +			mlog_errno(status);
> +	}
> +
>   bail:
>   	if (status)
>   		mlog_errno(status);


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume"
  2024-12-04  3:47   ` Joseph Qi
@ 2024-12-04  6:46     ` Heming Zhao
  2024-12-04  9:28       ` Joseph Qi
  2024-12-12  8:18       ` Greg KH
  0 siblings, 2 replies; 11+ messages in thread
From: Heming Zhao @ 2024-12-04  6:46 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, Heming Zhao, stable

On 12/4/24 11:47, Joseph Qi wrote:
> 
> 
> On 12/4/24 11:32 AM, Heming Zhao wrote:
>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
>> unmounting an ocfs2 volume").
>>
>> In commit dfe6c5692fb5, the commit log stating "This bug has existed
>> since the initial OCFS2 code." is incorrect. The correct introduction
>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").
>>
> 
> Could you please elaborate more how it happens?
> And it seems no difference with the new version. So we may submit a
> standalone revert patch to those backported stable kernels (< 6.10).

commit log from patch [2/2] should be revised.
change: This bug has existed since the initial OCFS2 code.
to    : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")

----
See below for the details of patch [1/2].

following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e".

    static int ocfs2_sync_local_to_main()
    {
    	... ...
  1  	while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start))
  2  	       != -1) {
  3  		if ((bit_off < left) && (bit_off == start)) {
  4  			count++;
  5  			start++;
  6  			continue;
  7  		}
  8  		if (count) {
  9  			blkno = la_start_blk +
10   				ocfs2_clusters_to_blocks(osb->sb,
11   							 start - count);
12
13   			trace_ocfs2_sync_local_to_main_free();
14
15   			status = ocfs2_release_clusters(handle,
16   							main_bm_inode,
17   							main_bm_bh, blkno,
18   							count);
19   			if (status < 0) {
20   				mlog_errno(status);
21   				goto bail;
22   			}
23   		}
24   		if (bit_off >= left)
25   			break;
26   		count = 1;
27   		start = bit_off + 1;
28   	}
29
30 	/* clear the contiguous bits until the end boundary */
31 	if (count) {
32 		blkno = la_start_blk +
33 			ocfs2_clusters_to_blocks(osb->sb,
34 					start - count);
35
36 		trace_ocfs2_sync_local_to_main_free();
37
38 		status = ocfs2_release_clusters(handle,
39 				main_bm_inode,
40 				main_bm_bh, blkno,
41 				count);
42 		if (status < 0)
43 			mlog_errno(status);
44  	}
    	... ...
    }

bug flow:
1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap.
2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3.
3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm.
4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999.
5. code runs to line 31, this area code releases space to main_bm for the same gd again.

kernel will report the following likely error:
OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878

thanks,
Heming

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume"
  2024-12-04  6:46     ` Heming Zhao
@ 2024-12-04  9:28       ` Joseph Qi
  2024-12-04 11:11         ` Heming Zhao
  2024-12-04 11:34         ` Heming Zhao
  2024-12-12  8:18       ` Greg KH
  1 sibling, 2 replies; 11+ messages in thread
From: Joseph Qi @ 2024-12-04  9:28 UTC (permalink / raw)
  To: Heming Zhao, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, stable



On 12/4/24 2:46 PM, Heming Zhao wrote:
> On 12/4/24 11:47, Joseph Qi wrote:
>>
>>
>> On 12/4/24 11:32 AM, Heming Zhao wrote:
>>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
>>> unmounting an ocfs2 volume").
>>>
>>> In commit dfe6c5692fb5, the commit log stating "This bug has existed
>>> since the initial OCFS2 code." is incorrect. The correct introduction
>>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").
>>>
>>
>> Could you please elaborate more how it happens?
>> And it seems no difference with the new version. So we may submit a
>> standalone revert patch to those backported stable kernels (< 6.10).
> 
> commit log from patch [2/2] should be revised.
> change: This bug has existed since the initial OCFS2 code.
> to    : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")
> 
> ----
> See below for the details of patch [1/2].
> 
> following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e".
> 
>    static int ocfs2_sync_local_to_main()
>    {
>        ... ...
>  1      while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start))
>  2             != -1) {
>  3          if ((bit_off < left) && (bit_off == start)) {
>  4              count++;
>  5              start++;
>  6              continue;
>  7          }
>  8          if (count) {
>  9              blkno = la_start_blk +
> 10                   ocfs2_clusters_to_blocks(osb->sb,
> 11                                start - count);
> 12
> 13               trace_ocfs2_sync_local_to_main_free();
> 14
> 15               status = ocfs2_release_clusters(handle,
> 16                               main_bm_inode,
> 17                               main_bm_bh, blkno,
> 18                               count);
> 19               if (status < 0) {
> 20                   mlog_errno(status);
> 21                   goto bail;
> 22               }
> 23           }
> 24           if (bit_off >= left)
> 25               break;
> 26           count = 1;
> 27           start = bit_off + 1;
> 28       }
> 29
> 30     /* clear the contiguous bits until the end boundary */
> 31     if (count) {
> 32         blkno = la_start_blk +
> 33             ocfs2_clusters_to_blocks(osb->sb,
> 34                     start - count);
> 35
> 36         trace_ocfs2_sync_local_to_main_free();
> 37
> 38         status = ocfs2_release_clusters(handle,
> 39                 main_bm_inode,
> 40                 main_bm_bh, blkno,
> 41                 count);
> 42         if (status < 0)
> 43             mlog_errno(status);
> 44      }
>        ... ...
>    }
> 
> bug flow:
> 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap.
> 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3.
> 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm.
> 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999.
> 5. code runs to line 31, this area code releases space to main_bm for the same gd again.
> 
> kernel will report the following likely error:
> OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878
> 

Okay, IIUC, it seems we have to:
1. revert commit dfe6c5692fb5 (so does stable kernel).
2. fix 30dd3478c3cd in following way:

diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 5df34561c551..f0feadac2ef1 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
 	start = count = 0;
 	left = le32_to_cpu(alloc->id1.bitmap1.i_total);
 
-	while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <
+	while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <=
 	       left) {
-		if (bit_off == start) {
+		if ((bit_off < left) && (bit_off == start)) {
 			count++;
 			start++;
 			continue;
@@ -997,7 +997,8 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
 				goto bail;
 			}
 		}
-
+		if (bit_off >= left)
+			break;
 		count = 1;
 		start = bit_off + 1;
 	}

Thanks,
Joseph



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume"
  2024-12-04  9:28       ` Joseph Qi
@ 2024-12-04 11:11         ` Heming Zhao
  2024-12-04 11:34         ` Heming Zhao
  1 sibling, 0 replies; 11+ messages in thread
From: Heming Zhao @ 2024-12-04 11:11 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, stable

On 12/4/24 17:28, Joseph Qi wrote:
> 
> 
> On 12/4/24 2:46 PM, Heming Zhao wrote:
>> On 12/4/24 11:47, Joseph Qi wrote:
>>>
>>>
>>> On 12/4/24 11:32 AM, Heming Zhao wrote:
>>>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
>>>> unmounting an ocfs2 volume").
>>>>
>>>> In commit dfe6c5692fb5, the commit log stating "This bug has existed
>>>> since the initial OCFS2 code." is incorrect. The correct introduction
>>>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").
>>>>
>>>
>>> Could you please elaborate more how it happens?
>>> And it seems no difference with the new version. So we may submit a
>>> standalone revert patch to those backported stable kernels (< 6.10).
>>
>> commit log from patch [2/2] should be revised.
>> change: This bug has existed since the initial OCFS2 code.
>> to    : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")
>>
>> ----
>> See below for the details of patch [1/2].
>>
>> following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e".
>>
>>     static int ocfs2_sync_local_to_main()
>>     {
>>         ... ...
>>   1      while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start))
>>   2             != -1) {
>>   3          if ((bit_off < left) && (bit_off == start)) {
>>   4              count++;
>>   5              start++;
>>   6              continue;
>>   7          }
>>   8          if (count) {
>>   9              blkno = la_start_blk +
>> 10                   ocfs2_clusters_to_blocks(osb->sb,
>> 11                                start - count);
>> 12
>> 13               trace_ocfs2_sync_local_to_main_free();
>> 14
>> 15               status = ocfs2_release_clusters(handle,
>> 16                               main_bm_inode,
>> 17                               main_bm_bh, blkno,
>> 18                               count);
>> 19               if (status < 0) {
>> 20                   mlog_errno(status);
>> 21                   goto bail;
>> 22               }
>> 23           }
>> 24           if (bit_off >= left)
>> 25               break;
>> 26           count = 1;
>> 27           start = bit_off + 1;
>> 28       }
>> 29
>> 30     /* clear the contiguous bits until the end boundary */
>> 31     if (count) {
>> 32         blkno = la_start_blk +
>> 33             ocfs2_clusters_to_blocks(osb->sb,
>> 34                     start - count);
>> 35
>> 36         trace_ocfs2_sync_local_to_main_free();
>> 37
>> 38         status = ocfs2_release_clusters(handle,
>> 39                 main_bm_inode,
>> 40                 main_bm_bh, blkno,
>> 41                 count);
>> 42         if (status < 0)
>> 43             mlog_errno(status);
>> 44      }
>>         ... ...
>>     }
>>
>> bug flow:
>> 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap.
>> 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3.
>> 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm.
>> 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999.
>> 5. code runs to line 31, this area code releases space to main_bm for the same gd again.
>>
>> kernel will report the following likely error:
>> OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878
>>
> 
> Okay, IIUC, it seems we have to:
> 1. revert commit dfe6c5692fb5 (so does stable kernel).

OK.

> 2. fix 30dd3478c3cd in following way:

It looks good to me.

I will send v2 patch.

-Heming
> 
> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
> index 5df34561c551..f0feadac2ef1 100644
> --- a/fs/ocfs2/localalloc.c
> +++ b/fs/ocfs2/localalloc.c
> @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
>   	start = count = 0;
>   	left = le32_to_cpu(alloc->id1.bitmap1.i_total);
>   
> -	while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <
> +	while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <=
>   	       left) {
> -		if (bit_off == start) {
> +		if ((bit_off < left) && (bit_off == start)) {
>   			count++;
>   			start++;
>   			continue;
> @@ -997,7 +997,8 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
>   				goto bail;
>   			}
>   		}
> -
> +		if (bit_off >= left)
> +			break;
>   		count = 1;
>   		start = bit_off + 1;
>   	}
> 
> Thanks,
> Joseph
> 
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume"
  2024-12-04  9:28       ` Joseph Qi
  2024-12-04 11:11         ` Heming Zhao
@ 2024-12-04 11:34         ` Heming Zhao
  2024-12-04 12:09           ` Joseph Qi
  1 sibling, 1 reply; 11+ messages in thread
From: Heming Zhao @ 2024-12-04 11:34 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, stable

On 12/4/24 17:28, Joseph Qi wrote:
> 
> 
> On 12/4/24 2:46 PM, Heming Zhao wrote:
>> On 12/4/24 11:47, Joseph Qi wrote:
>>>
>>>
>>> On 12/4/24 11:32 AM, Heming Zhao wrote:
>>>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
>>>> unmounting an ocfs2 volume").
>>>>
>>>> In commit dfe6c5692fb5, the commit log stating "This bug has existed
>>>> since the initial OCFS2 code." is incorrect. The correct introduction
>>>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").
>>>>
>>>
>>> Could you please elaborate more how it happens?
>>> And it seems no difference with the new version. So we may submit a
>>> standalone revert patch to those backported stable kernels (< 6.10).
>>
>> commit log from patch [2/2] should be revised.
>> change: This bug has existed since the initial OCFS2 code.
>> to    : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")
>>
>> ----
>> See below for the details of patch [1/2].
>>
>> following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e".
>>
>>     static int ocfs2_sync_local_to_main()
>>     {
>>         ... ...
>>   1      while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start))
>>   2             != -1) {
>>   3          if ((bit_off < left) && (bit_off == start)) {
>>   4              count++;
>>   5              start++;
>>   6              continue;
>>   7          }
>>   8          if (count) {
>>   9              blkno = la_start_blk +
>> 10                   ocfs2_clusters_to_blocks(osb->sb,
>> 11                                start - count);
>> 12
>> 13               trace_ocfs2_sync_local_to_main_free();
>> 14
>> 15               status = ocfs2_release_clusters(handle,
>> 16                               main_bm_inode,
>> 17                               main_bm_bh, blkno,
>> 18                               count);
>> 19               if (status < 0) {
>> 20                   mlog_errno(status);
>> 21                   goto bail;
>> 22               }
>> 23           }
>> 24           if (bit_off >= left)
>> 25               break;
>> 26           count = 1;
>> 27           start = bit_off + 1;
>> 28       }
>> 29
>> 30     /* clear the contiguous bits until the end boundary */
>> 31     if (count) {
>> 32         blkno = la_start_blk +
>> 33             ocfs2_clusters_to_blocks(osb->sb,
>> 34                     start - count);
>> 35
>> 36         trace_ocfs2_sync_local_to_main_free();
>> 37
>> 38         status = ocfs2_release_clusters(handle,
>> 39                 main_bm_inode,
>> 40                 main_bm_bh, blkno,
>> 41                 count);
>> 42         if (status < 0)
>> 43             mlog_errno(status);
>> 44      }
>>         ... ...
>>     }
>>
>> bug flow:
>> 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap.
>> 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3.
>> 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm.
>> 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999.
>> 5. code runs to line 31, this area code releases space to main_bm for the same gd again.
>>
>> kernel will report the following likely error:
>> OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878
>>
> 
> Okay, IIUC, it seems we have to:
> 1. revert commit dfe6c5692fb5 (so does stable kernel).
> 2. fix 30dd3478c3cd in following way:
> 
> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
> index 5df34561c551..f0feadac2ef1 100644
> --- a/fs/ocfs2/localalloc.c
> +++ b/fs/ocfs2/localalloc.c
> @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
>   	start = count = 0;
>   	left = le32_to_cpu(alloc->id1.bitmap1.i_total);
>   
> -	while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <
> +	while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <=
>   	       left) {

The ocfs2_find_next_zero_bit() always returns a value within the range [0, left],
do you like the following code?

-	while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <
-		left) {
+	for(;;) {
+		bit_off = ocfs2_find_next_zero_bit(bitmap, left, start);


-Heming

> -		if (bit_off == start) {
> +		if ((bit_off < left) && (bit_off == start)) {
>   			count++;
>   			start++;
>   			continue;
> @@ -997,7 +997,8 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
>   				goto bail;
>   			}
>   		}
> -
> +		if (bit_off >= left)
> +			break;
>   		count = 1;
>   		start = bit_off + 1;
>   	}
> 
> Thanks,
> Joseph
> 
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume"
  2024-12-04 11:34         ` Heming Zhao
@ 2024-12-04 12:09           ` Joseph Qi
  0 siblings, 0 replies; 11+ messages in thread
From: Joseph Qi @ 2024-12-04 12:09 UTC (permalink / raw)
  To: Heming Zhao, ocfs2-devel; +Cc: linux-kernel, akpm, gregkh, stable



On 12/4/24 7:34 PM, Heming Zhao wrote:
> On 12/4/24 17:28, Joseph Qi wrote:
>>
>>
>> On 12/4/24 2:46 PM, Heming Zhao wrote:
>>> On 12/4/24 11:47, Joseph Qi wrote:
>>>>
>>>>
>>>> On 12/4/24 11:32 AM, Heming Zhao wrote:
>>>>> This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
>>>>> unmounting an ocfs2 volume").
>>>>>
>>>>> In commit dfe6c5692fb5, the commit log stating "This bug has existed
>>>>> since the initial OCFS2 code." is incorrect. The correct introduction
>>>>> commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").
>>>>>
>>>>
>>>> Could you please elaborate more how it happens?
>>>> And it seems no difference with the new version. So we may submit a
>>>> standalone revert patch to those backported stable kernels (< 6.10).
>>>
>>> commit log from patch [2/2] should be revised.
>>> change: This bug has existed since the initial OCFS2 code.
>>> to    : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")
>>>
>>> ----
>>> See below for the details of patch [1/2].
>>>
>>> following is "the code before commit 30dd3478c3cd7" + "commit dfe6c5692fb525e".
>>>
>>>     static int ocfs2_sync_local_to_main()
>>>     {
>>>         ... ...
>>>   1      while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start))
>>>   2             != -1) {
>>>   3          if ((bit_off < left) && (bit_off == start)) {
>>>   4              count++;
>>>   5              start++;
>>>   6              continue;
>>>   7          }
>>>   8          if (count) {
>>>   9              blkno = la_start_blk +
>>> 10                   ocfs2_clusters_to_blocks(osb->sb,
>>> 11                                start - count);
>>> 12
>>> 13               trace_ocfs2_sync_local_to_main_free();
>>> 14
>>> 15               status = ocfs2_release_clusters(handle,
>>> 16                               main_bm_inode,
>>> 17                               main_bm_bh, blkno,
>>> 18                               count);
>>> 19               if (status < 0) {
>>> 20                   mlog_errno(status);
>>> 21                   goto bail;
>>> 22               }
>>> 23           }
>>> 24           if (bit_off >= left)
>>> 25               break;
>>> 26           count = 1;
>>> 27           start = bit_off + 1;
>>> 28       }
>>> 29
>>> 30     /* clear the contiguous bits until the end boundary */
>>> 31     if (count) {
>>> 32         blkno = la_start_blk +
>>> 33             ocfs2_clusters_to_blocks(osb->sb,
>>> 34                     start - count);
>>> 35
>>> 36         trace_ocfs2_sync_local_to_main_free();
>>> 37
>>> 38         status = ocfs2_release_clusters(handle,
>>> 39                 main_bm_inode,
>>> 40                 main_bm_bh, blkno,
>>> 41                 count);
>>> 42         if (status < 0)
>>> 43             mlog_errno(status);
>>> 44      }
>>>         ... ...
>>>     }
>>>
>>> bug flow:
>>> 1. the left:10000, start:0, bit_off:9000, and there are zeros from 9000 to the end of bitmap.
>>> 2. when 'start' is 9999, code runs to line 3, where bit_off is 10000 (the 'left' value), it doesn't trigger line 3.
>>> 3. code runs to line 8 (where 'count' is 9999), this area releases 9999 bytes of space to main_bm.
>>> 4. code runs to line 24, triggering "bit_off == left" and 'break' the loop. at this time, the 'count' still retains its old value 9999.
>>> 5. code runs to line 31, this area code releases space to main_bm for the same gd again.
>>>
>>> kernel will report the following likely error:
>>> OCFS2: ERROR (device dm-0): ocfs2_block_group_clear_bits: Group descriptor # 349184 has bit count 15872 but claims 19871 are freed. num_bits 7878
>>>
>>
>> Okay, IIUC, it seems we have to:
>> 1. revert commit dfe6c5692fb5 (so does stable kernel).
>> 2. fix 30dd3478c3cd in following way:
>>
>> diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
>> index 5df34561c551..f0feadac2ef1 100644
>> --- a/fs/ocfs2/localalloc.c
>> +++ b/fs/ocfs2/localalloc.c
>> @@ -971,9 +971,9 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
>>       start = count = 0;
>>       left = le32_to_cpu(alloc->id1.bitmap1.i_total);
>>   -    while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <
>> +    while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <=
>>              left) {
> 
> The ocfs2_find_next_zero_bit() always returns a value within the range [0, left],

You're right.

> do you like the following code?
> 
> -    while ((bit_off = ocfs2_find_next_zero_bit(bitmap, left, start)) <
> -        left) {
> +    for(;;) {
> +        bit_off = ocfs2_find_next_zero_bit(bitmap, left, start);
> 
Or simplify to:
while (1) {
	bit_off = ocfs2_find_next_zero_bit(bitmap, left, start);
	...
}

Thanks,
Joseph


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume"
  2024-12-04  6:46     ` Heming Zhao
  2024-12-04  9:28       ` Joseph Qi
@ 2024-12-12  8:18       ` Greg KH
  1 sibling, 0 replies; 11+ messages in thread
From: Greg KH @ 2024-12-12  8:18 UTC (permalink / raw)
  To: Heming Zhao
  Cc: Joseph Qi, ocfs2-devel, linux-kernel, akpm, Heming Zhao, stable

On Wed, Dec 04, 2024 at 02:46:15PM +0800, Heming Zhao wrote:
> On 12/4/24 11:47, Joseph Qi wrote:
> > 
> > 
> > On 12/4/24 11:32 AM, Heming Zhao wrote:
> > > This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
> > > unmounting an ocfs2 volume").
> > > 
> > > In commit dfe6c5692fb5, the commit log stating "This bug has existed
> > > since the initial OCFS2 code." is incorrect. The correct introduction
> > > commit is 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").
> > > 
> > 
> > Could you please elaborate more how it happens?
> > And it seems no difference with the new version. So we may submit a
> > standalone revert patch to those backported stable kernels (< 6.10).
> 
> commit log from patch [2/2] should be revised.
> change: This bug has existed since the initial OCFS2 code.
> to    : This bug was introduced by commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")

So can you please send a new version of this series?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-12-12  8:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-04  3:32 [PATCH 0/2] Revert then resubmit ocfs2 commit dfe6c5692fb5 Heming Zhao
2024-12-04  3:32 ` [PATCH 1/2] ocfs2: Revert "ocfs2: fix the la space leak when unmounting an ocfs2 volume" Heming Zhao
2024-12-04  3:47   ` Joseph Qi
2024-12-04  6:46     ` Heming Zhao
2024-12-04  9:28       ` Joseph Qi
2024-12-04 11:11         ` Heming Zhao
2024-12-04 11:34         ` Heming Zhao
2024-12-04 12:09           ` Joseph Qi
2024-12-12  8:18       ` Greg KH
2024-12-04  3:32 ` [PATCH 2/2] ocfs2: fix the la space leak when unmounting an ocfs2 volume Heming Zhao
2024-12-04  4:08   ` Heming Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox