Kexec Archive on lore.kernel.org
* [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
@ 2025-06-25  2:23 Tao Liu
  2025-07-01  7:38 ` HAGIO KAZUHITO(萩尾 一仁)
  0 siblings, 1 reply; 16+ messages in thread
From: Tao Liu @ 2025-06-25  2:23 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec, sourabhjain; +Cc: Tao Liu

A vmcore corruption issue has been noticed on the powerpc arch [1]. It
can be reproduced with upstream makedumpfile.

When analyzing the corrupted vmcore with crash, the following error
messages are output:

    crash: compressed kdump: uncompress failed: 0
    crash: read error: kernel virtual address: c0001e2d2fe48000  type:
    "hardirq thread_union"
    crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
    crash: compressed kdump: uncompress failed: 0

If the vmcore is generated without the --num-threads option, no such
errors are noticed.

With --num-threads=N enabled, N sub-threads are created. All
sub-threads are producers, responsible for mm page processing, e.g.
compression. The main thread is the consumer, responsible for writing
the compressed data to the file. page_flag_buf->ready is used to
synchronize the main and sub-threads. When a sub-thread finishes
processing a page, it sets the ready flag to FLAG_READY. Meanwhile, the
main thread loops over the ready flags of all threads and breaks the
loop when it finds FLAG_READY.

page_flag_buf->ready is read and written by the main and sub-threads
simultaneously, but it is unprotected and unsafe. I have tested that
both a mutex and atomic read/write fix this issue. This patch takes the
atomic read/write approach for its simplicity.

[1]: https://github.com/makedumpfile/makedumpfile/issues/15

Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
---

v1 -> v2: Add the crash error messages to the commit log

---
 makedumpfile.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/makedumpfile.c b/makedumpfile.c
index 2d3b08b..bac45c2 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -8621,7 +8621,8 @@ kdump_thread_function_cyclic(void *arg) {
 
 		while (buf_ready == FALSE) {
 			pthread_testcancel();
-			if (page_flag_buf->ready == FLAG_READY)
+			if (__atomic_load_n(&page_flag_buf->ready,
+					__ATOMIC_SEQ_CST) == FLAG_READY)
 				continue;
 
 			/* get next dumpable pfn */
@@ -8637,7 +8638,8 @@ kdump_thread_function_cyclic(void *arg) {
 			info->current_pfn = pfn + 1;
 
 			page_flag_buf->pfn = pfn;
-			page_flag_buf->ready = FLAG_FILLING;
+			__atomic_store_n(&page_flag_buf->ready, FLAG_FILLING,
+					__ATOMIC_SEQ_CST);
 			pthread_mutex_unlock(&info->current_pfn_mutex);
 			sem_post(&info->page_flag_buf_sem);
 
@@ -8726,7 +8728,8 @@ kdump_thread_function_cyclic(void *arg) {
 			page_flag_buf->index = index;
 			buf_ready = TRUE;
 next:
-			page_flag_buf->ready = FLAG_READY;
+			__atomic_store_n(&page_flag_buf->ready, FLAG_READY,
+					__ATOMIC_SEQ_CST);
 			page_flag_buf = page_flag_buf->next;
 
 		}
@@ -8855,7 +8858,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
 			 * current_pfn is used for recording the value of pfn when checking the pfn.
 			 */
 			for (i = 0; i < info->num_threads; i++) {
-				if (info->page_flag_buf[i]->ready == FLAG_UNUSED)
+				if (__atomic_load_n(&info->page_flag_buf[i]->ready,
+						__ATOMIC_SEQ_CST) == FLAG_UNUSED)
 					continue;
 				temp_pfn = info->page_flag_buf[i]->pfn;
 
@@ -8863,7 +8867,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
 				 * count how many threads have reached the end.
 				 */
 				if (temp_pfn >= end_pfn) {
-					info->page_flag_buf[i]->ready = FLAG_UNUSED;
+					__atomic_store_n(&info->page_flag_buf[i]->ready,
+						FLAG_UNUSED, __ATOMIC_SEQ_CST);
 					end_count++;
 					continue;
 				}
@@ -8885,7 +8890,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
 			 * If the page_flag_buf is not ready, the pfn recorded may be changed.
 			 * So we should recheck.
 			 */
-			if (info->page_flag_buf[consuming]->ready != FLAG_READY) {
+			if (__atomic_load_n(&info->page_flag_buf[consuming]->ready,
+					__ATOMIC_SEQ_CST) != FLAG_READY) {
 				clock_gettime(CLOCK_MONOTONIC, &new);
 				if (new.tv_sec - last.tv_sec > WAIT_TIME) {
 					ERRMSG("Can't get data of pfn.\n");
@@ -8927,7 +8933,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
 				goto out;
 			page_data_buf[index].used = FALSE;
 		}
-		info->page_flag_buf[consuming]->ready = FLAG_UNUSED;
+		__atomic_store_n(&info->page_flag_buf[consuming]->ready,
+				FLAG_UNUSED, __ATOMIC_SEQ_CST);
 		info->page_flag_buf[consuming] = info->page_flag_buf[consuming]->next;
 	}
 finish:
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-06-25  2:23 [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N) Tao Liu
@ 2025-07-01  7:38 ` HAGIO KAZUHITO(萩尾 一仁)
  2025-07-01  7:59   ` Tao Liu
  0 siblings, 1 reply; 16+ messages in thread
From: HAGIO KAZUHITO(萩尾 一仁) @ 2025-07-01  7:38 UTC (permalink / raw)
  To: Tao Liu
  Cc: YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org, sourabhjain@linux.ibm.com

Hi Tao,

thank you for the patch.

On 2025/06/25 11:23, Tao Liu wrote:
> A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
> reproduced with upstream makedumpfile.
> 
> When analyzing the corrupt vmcore using crash, the following error
> message will output:
> 
>      crash: compressed kdump: uncompress failed: 0
>      crash: read error: kernel virtual address: c0001e2d2fe48000  type:
>      "hardirq thread_union"
>      crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
>      crash: compressed kdump: uncompress failed: 0
> 
> If the vmcore is generated without num-threads option, then no such
> errors are noticed.
> 
> With --num-threads=N enabled, there will be N sub-threads created. All
> sub-threads are producers which responsible for mm page processing, e.g.
> compression. The main thread is the consumer which responsible for
> writing the compressed data into file. page_flag_buf->ready is used to
> sync main and sub-threads. When a sub-thread finishes page processing,
> it will set ready flag to be FLAG_READY. In the meantime, main thread
> looply check all threads of the ready flags, and break the loop when
> find FLAG_READY.

I've tried to reproduce the issue, but I couldn't on x86_64.

Do you have any possible scenario that breaks a vmcore?  I could not
think of one just by looking at the code.

And this is just out of curiosity: is the issue reproduced with
makedumpfile compiled with -O0 too?

Thanks,
Kazu

> 
> page_flag_buf->ready is read/write by main/sub-threads simultaneously,
> but it is unprotected and unsafe. I have tested both mutex and atomic_rw
> can fix this issue. This patch takes atomic_rw for its simplicity.
> 
> [1]: https://github.com/makedumpfile/makedumpfile/issues/15
> 
> Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> Signed-off-by: Tao Liu <ltao@redhat.com>
> ---
> 
> v2 -> v1: Add error message of crash into commit log
> 
> ---
>   makedumpfile.c | 21 ++++++++++++++-------
>   1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 2d3b08b..bac45c2 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -8621,7 +8621,8 @@ kdump_thread_function_cyclic(void *arg) {
>   
>   		while (buf_ready == FALSE) {
>   			pthread_testcancel();
> -			if (page_flag_buf->ready == FLAG_READY)
> +			if (__atomic_load_n(&page_flag_buf->ready,
> +					__ATOMIC_SEQ_CST) == FLAG_READY)
>   				continue;
>   
>   			/* get next dumpable pfn */
> @@ -8637,7 +8638,8 @@ kdump_thread_function_cyclic(void *arg) {
>   			info->current_pfn = pfn + 1;
>   
>   			page_flag_buf->pfn = pfn;
> -			page_flag_buf->ready = FLAG_FILLING;
> +			__atomic_store_n(&page_flag_buf->ready, FLAG_FILLING,
> +					__ATOMIC_SEQ_CST);
>   			pthread_mutex_unlock(&info->current_pfn_mutex);
>   			sem_post(&info->page_flag_buf_sem);
>   
> @@ -8726,7 +8728,8 @@ kdump_thread_function_cyclic(void *arg) {
>   			page_flag_buf->index = index;
>   			buf_ready = TRUE;
>   next:
> -			page_flag_buf->ready = FLAG_READY;
> +			__atomic_store_n(&page_flag_buf->ready, FLAG_READY,
> +					__ATOMIC_SEQ_CST);
>   			page_flag_buf = page_flag_buf->next;
>   
>   		}
> @@ -8855,7 +8858,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
>   			 * current_pfn is used for recording the value of pfn when checking the pfn.
>   			 */
>   			for (i = 0; i < info->num_threads; i++) {
> -				if (info->page_flag_buf[i]->ready == FLAG_UNUSED)
> +				if (__atomic_load_n(&info->page_flag_buf[i]->ready,
> +						__ATOMIC_SEQ_CST) == FLAG_UNUSED)
>   					continue;
>   				temp_pfn = info->page_flag_buf[i]->pfn;
>   
> @@ -8863,7 +8867,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
>   				 * count how many threads have reached the end.
>   				 */
>   				if (temp_pfn >= end_pfn) {
> -					info->page_flag_buf[i]->ready = FLAG_UNUSED;
> +					__atomic_store_n(&info->page_flag_buf[i]->ready,
> +						FLAG_UNUSED, __ATOMIC_SEQ_CST);
>   					end_count++;
>   					continue;
>   				}
> @@ -8885,7 +8890,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
>   			 * If the page_flag_buf is not ready, the pfn recorded may be changed.
>   			 * So we should recheck.
>   			 */
> -			if (info->page_flag_buf[consuming]->ready != FLAG_READY) {
> +			if (__atomic_load_n(&info->page_flag_buf[consuming]->ready,
> +					__ATOMIC_SEQ_CST) != FLAG_READY) {
>   				clock_gettime(CLOCK_MONOTONIC, &new);
>   				if (new.tv_sec - last.tv_sec > WAIT_TIME) {
>   					ERRMSG("Can't get data of pfn.\n");
> @@ -8927,7 +8933,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
>   				goto out;
>   			page_data_buf[index].used = FALSE;
>   		}
> -		info->page_flag_buf[consuming]->ready = FLAG_UNUSED;
> +		__atomic_store_n(&info->page_flag_buf[consuming]->ready,
> +				FLAG_UNUSED, __ATOMIC_SEQ_CST);
>   		info->page_flag_buf[consuming] = info->page_flag_buf[consuming]->next;
>   	}
>   finish:

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-01  7:38 ` HAGIO KAZUHITO(萩尾 一仁)
@ 2025-07-01  7:59   ` Tao Liu
  2025-07-02  0:13     ` HAGIO KAZUHITO(萩尾 一仁)
  2025-07-03 14:31     ` Petr Tesarik
  0 siblings, 2 replies; 16+ messages in thread
From: Tao Liu @ 2025-07-01  7:59 UTC (permalink / raw)
  To: HAGIO KAZUHITO(萩尾 一仁)
  Cc: YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org, sourabhjain@linux.ibm.com

Hi Kazu,

Thanks for your comments!

On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
>
> Hi Tao,
>
> thank you for the patch.
>
> On 2025/06/25 11:23, Tao Liu wrote:
> > A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
> > reproduced with upstream makedumpfile.
> >
> > When analyzing the corrupt vmcore using crash, the following error
> > message will output:
> >
> >      crash: compressed kdump: uncompress failed: 0
> >      crash: read error: kernel virtual address: c0001e2d2fe48000  type:
> >      "hardirq thread_union"
> >      crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> >      crash: compressed kdump: uncompress failed: 0
> >
> > If the vmcore is generated without num-threads option, then no such
> > errors are noticed.
> >
> > With --num-threads=N enabled, there will be N sub-threads created. All
> > sub-threads are producers which responsible for mm page processing, e.g.
> > compression. The main thread is the consumer which responsible for
> > writing the compressed data into file. page_flag_buf->ready is used to
> > sync main and sub-threads. When a sub-thread finishes page processing,
> > it will set ready flag to be FLAG_READY. In the meantime, main thread
> > looply check all threads of the ready flags, and break the loop when
> > find FLAG_READY.
>
> I've tried to reproduce the issue, but I couldn't on x86_64.

Yes, I cannot reproduce it on x86_64 either, but the issue is very
easily reproduced on the ppc64 arch, which is where our QE reported it.
Recently we have enabled --num-threads=N in RHEL by default, with N ==
nr_cpus in the 2nd kernel, so QE noticed the issue.

>
> Do you have any possible scenario that breaks a vmcore?  I could not
> think of it only by looking at the code.

I guess the issue being observed only on ppc might be due to ppc's
memory model, multi-thread scheduling, etc. I'm not an expert on those,
so I cannot give a clear explanation, sorry...

page_flag_buf->ready is an integer that is read and written by the main
and sub threads simultaneously. An assignment such as
page_flag_buf->ready = 1 may be composed of several assembly
instructions. Without atomic read/write (memory) protection, reads and
writes can race within those few instructions, which causes the data
inconsistency. Frankly, the ppc assembly consists of more instructions
than x86_64 for the same C code, which enlarges the possibility of a
data race.
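
For illustration, here is a minimal, self-contained sketch of the
handoff pattern described above (this is not makedumpfile's code; the
names pfn_data and ready are made up). The producer fills the data and
then raises the flag with __atomic_store_n(); the main thread polls the
flag with __atomic_load_n(), as the patch does. Replacing the atomic
calls with plain assignments reintroduces the unprotected access
described above:

    /* flag-handoff.c: build with "gcc -O2 -pthread flag-handoff.c" */
    #include <pthread.h>
    #include <stdio.h>

    #define FLAG_UNUSED 0
    #define FLAG_READY  1

    static unsigned long pfn_data;     /* stands in for page_flag_buf->pfn   */
    static int ready = FLAG_UNUSED;    /* stands in for page_flag_buf->ready */

    static void *producer(void *arg)
    {
            (void)arg;
            pfn_data = 42;             /* "process" one page */
            /* the unsafe form would be: ready = FLAG_READY; */
            __atomic_store_n(&ready, FLAG_READY, __ATOMIC_SEQ_CST);
            return NULL;
    }

    int main(void)
    {
            pthread_t tid;

            pthread_create(&tid, NULL, producer, NULL);

            /* main thread is the consumer: poll until the flag is raised */
            while (__atomic_load_n(&ready, __ATOMIC_SEQ_CST) != FLAG_READY)
                    ;

            printf("consumed pfn %lu\n", pfn_data);
            pthread_join(tid, NULL);
            return 0;
    }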

We can observe the issue without the help of crash: just take the same
core file and generate vmcores with and without the --num-threads
option, then compare their binary output with "cmp vmcore1 vmcore2".
cmp reports that bytes differ between the two vmcores, which is
unexpected.

>
> and this is just out of curiosity, is the issue reproduced with
> makedumpfile compiled with -O0 too?

Sorry, I haven't done the -O0 experiment, I can do it tomorrow and
share my findings...

Thanks,
Tao Liu

>
> Thanks,
> Kazu
>
> >
> > page_flag_buf->ready is read/write by main/sub-threads simultaneously,
> > but it is unprotected and unsafe. I have tested both mutex and atomic_rw
> > can fix this issue. This patch takes atomic_rw for its simplicity.
> >
> > [1]: https://github.com/makedumpfile/makedumpfile/issues/15
> >
> > Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> > Signed-off-by: Tao Liu <ltao@redhat.com>
> > ---
> >
> > v2 -> v1: Add error message of crash into commit log
> >
> > ---
> >   makedumpfile.c | 21 ++++++++++++++-------
> >   1 file changed, 14 insertions(+), 7 deletions(-)
> >
> > diff --git a/makedumpfile.c b/makedumpfile.c
> > index 2d3b08b..bac45c2 100644
> > --- a/makedumpfile.c
> > +++ b/makedumpfile.c
> > @@ -8621,7 +8621,8 @@ kdump_thread_function_cyclic(void *arg) {
> >
> >               while (buf_ready == FALSE) {
> >                       pthread_testcancel();
> > -                     if (page_flag_buf->ready == FLAG_READY)
> > +                     if (__atomic_load_n(&page_flag_buf->ready,
> > +                                     __ATOMIC_SEQ_CST) == FLAG_READY)
> >                               continue;
> >
> >                       /* get next dumpable pfn */
> > @@ -8637,7 +8638,8 @@ kdump_thread_function_cyclic(void *arg) {
> >                       info->current_pfn = pfn + 1;
> >
> >                       page_flag_buf->pfn = pfn;
> > -                     page_flag_buf->ready = FLAG_FILLING;
> > +                     __atomic_store_n(&page_flag_buf->ready, FLAG_FILLING,
> > +                                     __ATOMIC_SEQ_CST);
> >                       pthread_mutex_unlock(&info->current_pfn_mutex);
> >                       sem_post(&info->page_flag_buf_sem);
> >
> > @@ -8726,7 +8728,8 @@ kdump_thread_function_cyclic(void *arg) {
> >                       page_flag_buf->index = index;
> >                       buf_ready = TRUE;
> >   next:
> > -                     page_flag_buf->ready = FLAG_READY;
> > +                     __atomic_store_n(&page_flag_buf->ready, FLAG_READY,
> > +                                     __ATOMIC_SEQ_CST);
> >                       page_flag_buf = page_flag_buf->next;
> >
> >               }
> > @@ -8855,7 +8858,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
> >                        * current_pfn is used for recording the value of pfn when checking the pfn.
> >                        */
> >                       for (i = 0; i < info->num_threads; i++) {
> > -                             if (info->page_flag_buf[i]->ready == FLAG_UNUSED)
> > +                             if (__atomic_load_n(&info->page_flag_buf[i]->ready,
> > +                                             __ATOMIC_SEQ_CST) == FLAG_UNUSED)
> >                                       continue;
> >                               temp_pfn = info->page_flag_buf[i]->pfn;
> >
> > @@ -8863,7 +8867,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
> >                                * count how many threads have reached the end.
> >                                */
> >                               if (temp_pfn >= end_pfn) {
> > -                                     info->page_flag_buf[i]->ready = FLAG_UNUSED;
> > +                                     __atomic_store_n(&info->page_flag_buf[i]->ready,
> > +                                             FLAG_UNUSED, __ATOMIC_SEQ_CST);
> >                                       end_count++;
> >                                       continue;
> >                               }
> > @@ -8885,7 +8890,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
> >                        * If the page_flag_buf is not ready, the pfn recorded may be changed.
> >                        * So we should recheck.
> >                        */
> > -                     if (info->page_flag_buf[consuming]->ready != FLAG_READY) {
> > +                     if (__atomic_load_n(&info->page_flag_buf[consuming]->ready,
> > +                                     __ATOMIC_SEQ_CST) != FLAG_READY) {
> >                               clock_gettime(CLOCK_MONOTONIC, &new);
> >                               if (new.tv_sec - last.tv_sec > WAIT_TIME) {
> >                                       ERRMSG("Can't get data of pfn.\n");
> > @@ -8927,7 +8933,8 @@ write_kdump_pages_parallel_cyclic(struct cache_data *cd_header,
> >                               goto out;
> >                       page_data_buf[index].used = FALSE;
> >               }
> > -             info->page_flag_buf[consuming]->ready = FLAG_UNUSED;
> > +             __atomic_store_n(&info->page_flag_buf[consuming]->ready,
> > +                             FLAG_UNUSED, __ATOMIC_SEQ_CST);
> >               info->page_flag_buf[consuming] = info->page_flag_buf[consuming]->next;
> >       }
> >   finish:



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-01  7:59   ` Tao Liu
@ 2025-07-02  0:13     ` HAGIO KAZUHITO(萩尾 一仁)
  2025-07-02  4:36       ` Tao Liu
  2025-07-03 14:31     ` Petr Tesarik
  1 sibling, 1 reply; 16+ messages in thread
From: HAGIO KAZUHITO(萩尾 一仁) @ 2025-07-02  0:13 UTC (permalink / raw)
  To: Tao Liu
  Cc: YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org, sourabhjain@linux.ibm.com

On 2025/07/01 16:59, Tao Liu wrote:
> Hi Kazu,
> 
> Thanks for your comments!
> 
> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
>>
>> Hi Tao,
>>
>> thank you for the patch.
>>
>> On 2025/06/25 11:23, Tao Liu wrote:
>>> A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
>>> reproduced with upstream makedumpfile.
>>>
>>> When analyzing the corrupt vmcore using crash, the following error
>>> message will output:
>>>
>>>       crash: compressed kdump: uncompress failed: 0
>>>       crash: read error: kernel virtual address: c0001e2d2fe48000  type:
>>>       "hardirq thread_union"
>>>       crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
>>>       crash: compressed kdump: uncompress failed: 0
>>>
>>> If the vmcore is generated without num-threads option, then no such
>>> errors are noticed.
>>>
>>> With --num-threads=N enabled, there will be N sub-threads created. All
>>> sub-threads are producers which responsible for mm page processing, e.g.
>>> compression. The main thread is the consumer which responsible for
>>> writing the compressed data into file. page_flag_buf->ready is used to
>>> sync main and sub-threads. When a sub-thread finishes page processing,
>>> it will set ready flag to be FLAG_READY. In the meantime, main thread
>>> looply check all threads of the ready flags, and break the loop when
>>> find FLAG_READY.
>>
>> I've tried to reproduce the issue, but I couldn't on x86_64.
> 
> Yes, I cannot reproduce it on x86_64 either, but the issue is very
> easily reproduced on ppc64 arch, which is where our QE reported.
> Recently we have enabled --num-threads=N in rhel by default. N ==
> nr_cpus in 2nd kernel, so QE noticed the issue.

I see, thank you for the information.

> 
>>
>> Do you have any possible scenario that breaks a vmcore?  I could not
>> think of it only by looking at the code.
> 
> I guess the issue only been observed on ppc might be due to ppc's
> memory model, multi-thread scheduling algorithm etc. I'm not an expert
> on those. So I cannot give a clear explanation, sorry...

OK, I also can't think of a good way to debug this...

> 
> The page_flag_buf->ready is an integer that r/w by main and sub
> threads simultaneously. And the assignment operation, like
> page_flag_buf->ready = 1, might be composed of several assembly
> instructions. Without atomic r/w (memory) protection, there might be
> racing r/w just within the few instructions, which caused the data
> inconsistency. Frankly the ppc assembly consists of more instructions
> than x86_64 for the same c code, which enlarged the possibility of
> data racing.
> 
> We can observe the issue without the help of crash, just compare the
> binary output of vmcore generated from the same core file, and
> compress it with or without --num-threads option. Then compare it with
> "cmp vmcore1 vmcore2" cmdline, and cmp will output bytes differ for
> the 2 vmcores, and this is unexpected.
> 
>>
>> and this is just out of curiosity, is the issue reproduced with
>> makedumpfile compiled with -O0 too?
> 
> Sorry, I haven't done the -O0 experiment, I can do it tomorrow and
> share my findings...

Thanks. We have to fix this anyway; I want a clue to help think about a
possible scenario...

Thanks,
Kazu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-02  0:13     ` HAGIO KAZUHITO(萩尾 一仁)
@ 2025-07-02  4:36       ` Tao Liu
  2025-07-02  4:52         ` HAGIO KAZUHITO(萩尾 一仁)
  0 siblings, 1 reply; 16+ messages in thread
From: Tao Liu @ 2025-07-02  4:36 UTC (permalink / raw)
  To: HAGIO KAZUHITO(萩尾 一仁)
  Cc: YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org, sourabhjain@linux.ibm.com

Hi Kazu,

On Wed, Jul 2, 2025 at 12:13 PM HAGIO KAZUHITO(萩尾 一仁)
<k-hagio-ab@nec.com> wrote:
>
> On 2025/07/01 16:59, Tao Liu wrote:
> > Hi Kazu,
> >
> > Thanks for your comments!
> >
> > On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
> >>
> >> Hi Tao,
> >>
> >> thank you for the patch.
> >>
> >> On 2025/06/25 11:23, Tao Liu wrote:
> >>> A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
> >>> reproduced with upstream makedumpfile.
> >>>
> >>> When analyzing the corrupt vmcore using crash, the following error
> >>> message will output:
> >>>
> >>>       crash: compressed kdump: uncompress failed: 0
> >>>       crash: read error: kernel virtual address: c0001e2d2fe48000  type:
> >>>       "hardirq thread_union"
> >>>       crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> >>>       crash: compressed kdump: uncompress failed: 0
> >>>
> >>> If the vmcore is generated without num-threads option, then no such
> >>> errors are noticed.
> >>>
> >>> With --num-threads=N enabled, there will be N sub-threads created. All
> >>> sub-threads are producers which responsible for mm page processing, e.g.
> >>> compression. The main thread is the consumer which responsible for
> >>> writing the compressed data into file. page_flag_buf->ready is used to
> >>> sync main and sub-threads. When a sub-thread finishes page processing,
> >>> it will set ready flag to be FLAG_READY. In the meantime, main thread
> >>> looply check all threads of the ready flags, and break the loop when
> >>> find FLAG_READY.
> >>
> >> I've tried to reproduce the issue, but I couldn't on x86_64.
> >
> > Yes, I cannot reproduce it on x86_64 either, but the issue is very
> > easily reproduced on ppc64 arch, which is where our QE reported.
> > Recently we have enabled --num-threads=N in rhel by default. N ==
> > nr_cpus in 2nd kernel, so QE noticed the issue.
>
> I see, thank you for the information.
>
> >
> >>
> >> Do you have any possible scenario that breaks a vmcore?  I could not
> >> think of it only by looking at the code.
> >
> > I guess the issue only been observed on ppc might be due to ppc's
> > memory model, multi-thread scheduling algorithm etc. I'm not an expert
> > on those. So I cannot give a clear explanation, sorry...
>
> ok, I also don't think of how to debug this well..
>
> >
> > The page_flag_buf->ready is an integer that r/w by main and sub
> > threads simultaneously. And the assignment operation, like
> > page_flag_buf->ready = 1, might be composed of several assembly
> > instructions. Without atomic r/w (memory) protection, there might be
> > racing r/w just within the few instructions, which caused the data
> > inconsistency. Frankly the ppc assembly consists of more instructions
> > than x86_64 for the same c code, which enlarged the possibility of
> > data racing.
> >
> > We can observe the issue without the help of crash, just compare the
> > binary output of vmcore generated from the same core file, and
> > compress it with or without --num-threads option. Then compare it with
> > "cmp vmcore1 vmcore2" cmdline, and cmp will output bytes differ for
> > the 2 vmcores, and this is unexpected.
> >
> >>
> >> and this is just out of curiosity, is the issue reproduced with
> >> makedumpfile compiled with -O0 too?
> >
> > Sorry, I haven't done the -O0 experiment, I can do it tomorrow and
> > share my findings...
>
> Thanks, we have to fix this anyway, I want a clue to think about a
> possible scenario..

1) Compiled with -O2 flag:

[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out1
Copying data                                      : [100.0 %] /
   eta: 0s

The dumpfile is saved to /tmp/out1.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
31 -l ~/vmcore /tmp/out2
Copying data                                      : [100.0 %] |
   eta: 0s
Copying data                                      : [100.0 %] \
   eta: 0s

The dumpfile is saved to /tmp/out2.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# cd /tmp
[root@ibm-p10-01-lp45 tmp]# cmp out1 out2
out1 out2 differ: byte 20786414, line 108064

2) Compiled with -O0 flag:

[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out3
Copying data                                      : [100.0 %] /
   eta: 0s

The dumpfile is saved to /tmp/out3.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
31 -l ~/vmcore /tmp/out4
Copying data                                      : [100.0 %] |
   eta: 0s
Copying data                                      : [100.0 %] \
   eta: 0s

The dumpfile is saved to /tmp/out4.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# cd /tmp
[root@ibm-p10-01-lp45 tmp]# cmp out3 out4
out3 out4 differ: byte 23948282, line 151739

It looks to me like -O0/-O2 make no difference for this case. If there
were no problem, the /tmp/outX files generated from the single-threaded
and multi-threaded runs should be exactly the same; however, cmp
reports that there are differences. With the v2 patch applied, there is
no such difference:

[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out5
Copying data                                      : [100.0 %] /
   eta: 0s

The dumpfile is saved to /tmp/out5.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
31 -l ~/vmcore /tmp/out6
Copying data                                      : [100.0 %] |
   eta: 0s
Copying data                                      : [100.0 %] \
   eta: 0s

The dumpfile is saved to /tmp/out6.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# cmp /tmp/out5 /tmp/out6
[root@ibm-p10-01-lp45 makedumpfile]#

Thanks,
Tao Liu
>
> Thanks,
> Kazu



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-02  4:36       ` Tao Liu
@ 2025-07-02  4:52         ` HAGIO KAZUHITO(萩尾 一仁)
  2025-07-02  5:03           ` Tao Liu
  2025-07-02  5:03           ` Sourabh Jain
  0 siblings, 2 replies; 16+ messages in thread
From: HAGIO KAZUHITO(萩尾 一仁) @ 2025-07-02  4:52 UTC (permalink / raw)
  To: Tao Liu
  Cc: YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org, sourabhjain@linux.ibm.com

Hi Tao,

On 2025/07/02 13:36, Tao Liu wrote:
> Hi Kazu,
> 
> On Wed, Jul 2, 2025 at 12:13 PM HAGIO KAZUHITO(萩尾 一仁)
> <k-hagio-ab@nec.com> wrote:
>>
>> On 2025/07/01 16:59, Tao Liu wrote:
>>> Hi Kazu,
>>>
>>> Thanks for your comments!
>>>
>>> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
>>>>
>>>> Hi Tao,
>>>>
>>>> thank you for the patch.
>>>>
>>>> On 2025/06/25 11:23, Tao Liu wrote:
>>>>> A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
>>>>> reproduced with upstream makedumpfile.
>>>>>
>>>>> When analyzing the corrupt vmcore using crash, the following error
>>>>> message will output:
>>>>>
>>>>>        crash: compressed kdump: uncompress failed: 0
>>>>>        crash: read error: kernel virtual address: c0001e2d2fe48000  type:
>>>>>        "hardirq thread_union"
>>>>>        crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
>>>>>        crash: compressed kdump: uncompress failed: 0
>>>>>
>>>>> If the vmcore is generated without num-threads option, then no such
>>>>> errors are noticed.
>>>>>
>>>>> With --num-threads=N enabled, there will be N sub-threads created. All
>>>>> sub-threads are producers which responsible for mm page processing, e.g.
>>>>> compression. The main thread is the consumer which responsible for
>>>>> writing the compressed data into file. page_flag_buf->ready is used to
>>>>> sync main and sub-threads. When a sub-thread finishes page processing,
>>>>> it will set ready flag to be FLAG_READY. In the meantime, main thread
>>>>> looply check all threads of the ready flags, and break the loop when
>>>>> find FLAG_READY.
>>>>
>>>> I've tried to reproduce the issue, but I couldn't on x86_64.
>>>
>>> Yes, I cannot reproduce it on x86_64 either, but the issue is very
>>> easily reproduced on ppc64 arch, which is where our QE reported.
>>> Recently we have enabled --num-threads=N in rhel by default. N ==
>>> nr_cpus in 2nd kernel, so QE noticed the issue.
>>
>> I see, thank you for the information.
>>
>>>
>>>>
>>>> Do you have any possible scenario that breaks a vmcore?  I could not
>>>> think of it only by looking at the code.
>>>
>>> I guess the issue only been observed on ppc might be due to ppc's
>>> memory model, multi-thread scheduling algorithm etc. I'm not an expert
>>> on those. So I cannot give a clear explanation, sorry...
>>
>> ok, I also don't think of how to debug this well..
>>
>>>
>>> The page_flag_buf->ready is an integer that r/w by main and sub
>>> threads simultaneously. And the assignment operation, like
>>> page_flag_buf->ready = 1, might be composed of several assembly
>>> instructions. Without atomic r/w (memory) protection, there might be
>>> racing r/w just within the few instructions, which caused the data
>>> inconsistency. Frankly the ppc assembly consists of more instructions
>>> than x86_64 for the same c code, which enlarged the possibility of
>>> data racing.
>>>
>>> We can observe the issue without the help of crash, just compare the
>>> binary output of vmcore generated from the same core file, and
>>> compress it with or without --num-threads option. Then compare it with
>>> "cmp vmcore1 vmcore2" cmdline, and cmp will output bytes differ for
>>> the 2 vmcores, and this is unexpected.
>>>
>>>>
>>>> and this is just out of curiosity, is the issue reproduced with
>>>> makedumpfile compiled with -O0 too?
>>>
>>> Sorry, I haven't done the -O0 experiment, I can do it tomorrow and
>>> share my findings...
>>
>> Thanks, we have to fix this anyway, I want a clue to think about a
>> possible scenario..
> 
> 1) Compiled with -O2 flag:
> 
> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out1
> Copying data                                      : [100.0 %] /
>     eta: 0s
> 
> The dumpfile is saved to /tmp/out1.
> 
> makedumpfile Completed.
> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
> 31 -l ~/vmcore /tmp/out2
> Copying data                                      : [100.0 %] |
>     eta: 0s
> Copying data                                      : [100.0 %] \
>     eta: 0s
> 
> The dumpfile is saved to /tmp/out2.
> 
> makedumpfile Completed.
> [root@ibm-p10-01-lp45 makedumpfile]# cd /tmp
> [root@ibm-p10-01-lp45 tmp]# cmp out1 out2
> out1 out2 differ: byte 20786414, line 108064
> 
> 2) Compiled with -O0 flag:
> 
> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out3
> Copying data                                      : [100.0 %] /
>     eta: 0s
> 
> The dumpfile is saved to /tmp/out3.
> 
> makedumpfile Completed.
> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
> 31 -l ~/vmcore /tmp/out4
> Copying data                                      : [100.0 %] |
>     eta: 0s
> Copying data                                      : [100.0 %] \
>     eta: 0s
> 
> The dumpfile is saved to /tmp/out4.
> 
> makedumpfile Completed.
> [root@ibm-p10-01-lp45 makedumpfile]# cd /tmp
> [root@ibm-p10-01-lp45 tmp]# cmp out3 out4
> out3 out4 differ: byte 23948282, line 151739
> 
> Looks to me the O0/O2 have no difference for this case. If no problem,
> the /tmp/outX generated from both single/multi thread should be
> exactly the same, however the cmp reports there are differences. With
> the v2 patch applied, there is no such difference:
> 
> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out5
> Copying data                                      : [100.0 %] /
>     eta: 0s
> 
> The dumpfile is saved to /tmp/out5.
> 
> makedumpfile Completed.
> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
> 31 -l ~/vmcore /tmp/out6
> Copying data                                      : [100.0 %] |
>     eta: 0s
> Copying data                                      : [100.0 %] \
>     eta: 0s
> 
> The dumpfile is saved to /tmp/out6.
> 
> makedumpfile Completed.
> [root@ibm-p10-01-lp45 makedumpfile]# cmp /tmp/out5 /tmp/out6
> [root@ibm-p10-01-lp45 makedumpfile]#

Thank you for testing!  Sorry, one more thing:
does --num-threads=1 break the vmcore?

Thanks,
Kazu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-02  4:52         ` HAGIO KAZUHITO(萩尾 一仁)
@ 2025-07-02  5:03           ` Tao Liu
  2025-07-02  6:02             ` HAGIO KAZUHITO(萩尾 一仁)
  2025-07-02  5:03           ` Sourabh Jain
  1 sibling, 1 reply; 16+ messages in thread
From: Tao Liu @ 2025-07-02  5:03 UTC (permalink / raw)
  To: HAGIO KAZUHITO(萩尾 一仁)
  Cc: YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org, sourabhjain@linux.ibm.com

Hi Kazu,

On Wed, Jul 2, 2025 at 4:52 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
>
> Hi Tao,
>
> On 2025/07/02 13:36, Tao Liu wrote:
> > Hi Kazu,
> >
> > On Wed, Jul 2, 2025 at 12:13 PM HAGIO KAZUHITO(萩尾 一仁)
> > <k-hagio-ab@nec.com> wrote:
> >>
> >> On 2025/07/01 16:59, Tao Liu wrote:
> >>> Hi Kazu,
> >>>
> >>> Thanks for your comments!
> >>>
> >>> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
> >>>>
> >>>> Hi Tao,
> >>>>
> >>>> thank you for the patch.
> >>>>
> >>>> On 2025/06/25 11:23, Tao Liu wrote:
> >>>>> A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
> >>>>> reproduced with upstream makedumpfile.
> >>>>>
> >>>>> When analyzing the corrupt vmcore using crash, the following error
> >>>>> message will output:
> >>>>>
> >>>>>        crash: compressed kdump: uncompress failed: 0
> >>>>>        crash: read error: kernel virtual address: c0001e2d2fe48000  type:
> >>>>>        "hardirq thread_union"
> >>>>>        crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> >>>>>        crash: compressed kdump: uncompress failed: 0
> >>>>>
> >>>>> If the vmcore is generated without num-threads option, then no such
> >>>>> errors are noticed.
> >>>>>
> >>>>> With --num-threads=N enabled, there will be N sub-threads created. All
> >>>>> sub-threads are producers which responsible for mm page processing, e.g.
> >>>>> compression. The main thread is the consumer which responsible for
> >>>>> writing the compressed data into file. page_flag_buf->ready is used to
> >>>>> sync main and sub-threads. When a sub-thread finishes page processing,
> >>>>> it will set ready flag to be FLAG_READY. In the meantime, main thread
> >>>>> looply check all threads of the ready flags, and break the loop when
> >>>>> find FLAG_READY.
> >>>>
> >>>> I've tried to reproduce the issue, but I couldn't on x86_64.
> >>>
> >>> Yes, I cannot reproduce it on x86_64 either, but the issue is very
> >>> easily reproduced on ppc64 arch, which is where our QE reported.
> >>> Recently we have enabled --num-threads=N in rhel by default. N ==
> >>> nr_cpus in 2nd kernel, so QE noticed the issue.
> >>
> >> I see, thank you for the information.
> >>
> >>>
> >>>>
> >>>> Do you have any possible scenario that breaks a vmcore?  I could not
> >>>> think of it only by looking at the code.
> >>>
> >>> I guess the issue only been observed on ppc might be due to ppc's
> >>> memory model, multi-thread scheduling algorithm etc. I'm not an expert
> >>> on those. So I cannot give a clear explanation, sorry...
> >>
> >> ok, I also don't think of how to debug this well..
> >>
> >>>
> >>> The page_flag_buf->ready is an integer that r/w by main and sub
> >>> threads simultaneously. And the assignment operation, like
> >>> page_flag_buf->ready = 1, might be composed of several assembly
> >>> instructions. Without atomic r/w (memory) protection, there might be
> >>> racing r/w just within the few instructions, which caused the data
> >>> inconsistency. Frankly the ppc assembly consists of more instructions
> >>> than x86_64 for the same c code, which enlarged the possibility of
> >>> data racing.
> >>>
> >>> We can observe the issue without the help of crash, just compare the
> >>> binary output of vmcore generated from the same core file, and
> >>> compress it with or without --num-threads option. Then compare it with
> >>> "cmp vmcore1 vmcore2" cmdline, and cmp will output bytes differ for
> >>> the 2 vmcores, and this is unexpected.
> >>>
> >>>>
> >>>> and this is just out of curiosity, is the issue reproduced with
> >>>> makedumpfile compiled with -O0 too?
> >>>
> >>> Sorry, I haven't done the -O0 experiment, I can do it tomorrow and
> >>> share my findings...
> >>
> >> Thanks, we have to fix this anyway, I want a clue to think about a
> >> possible scenario..
> >
> > 1) Compiled with -O2 flag:
> >
> > [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out1
> > Copying data                                      : [100.0 %] /
> >     eta: 0s
> >
> > The dumpfile is saved to /tmp/out1.
> >
> > makedumpfile Completed.
> > [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
> > 31 -l ~/vmcore /tmp/out2
> > Copying data                                      : [100.0 %] |
> >     eta: 0s
> > Copying data                                      : [100.0 %] \
> >     eta: 0s
> >
> > The dumpfile is saved to /tmp/out2.
> >
> > makedumpfile Completed.
> > [root@ibm-p10-01-lp45 makedumpfile]# cd /tmp
> > [root@ibm-p10-01-lp45 tmp]# cmp out1 out2
> > out1 out2 differ: byte 20786414, line 108064
> >
> > 2) Compiled with -O0 flag:
> >
> > [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out3
> > Copying data                                      : [100.0 %] /
> >     eta: 0s
> >
> > The dumpfile is saved to /tmp/out3.
> >
> > makedumpfile Completed.
> > [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
> > 31 -l ~/vmcore /tmp/out4
> > Copying data                                      : [100.0 %] |
> >     eta: 0s
> > Copying data                                      : [100.0 %] \
> >     eta: 0s
> >
> > The dumpfile is saved to /tmp/out4.
> >
> > makedumpfile Completed.
> > [root@ibm-p10-01-lp45 makedumpfile]# cd /tmp
> > [root@ibm-p10-01-lp45 tmp]# cmp out3 out4
> > out3 out4 differ: byte 23948282, line 151739
> >
> > Looks to me the O0/O2 have no difference for this case. If no problem,
> > the /tmp/outX generated from both single/multi thread should be
> > exactly the same, however the cmp reports there are differences. With
> > the v2 patch applied, there is no such difference:
> >
> > [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out5
> > Copying data                                      : [100.0 %] /
> >     eta: 0s
> >
> > The dumpfile is saved to /tmp/out5.
> >
> > makedumpfile Completed.
> > [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
> > 31 -l ~/vmcore /tmp/out6
> > Copying data                                      : [100.0 %] |
> >     eta: 0s
> > Copying data                                      : [100.0 %] \
> >     eta: 0s
> >
> > The dumpfile is saved to /tmp/out6.
> >
> > makedumpfile Completed.
> > [root@ibm-p10-01-lp45 makedumpfile]# cmp /tmp/out5 /tmp/out6
> > [root@ibm-p10-01-lp45 makedumpfile]#
>
> thank you for testing!  sorry one more thing,
> does --num-threads=1 break the vmcore?

Yes:

[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out7
Copying data                                      : [100.0 %] /
   eta: 0s

The dumpfile is saved to /tmp/out7.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=1 -d
31 -l ~/vmcore /tmp/out8
Copying data                                      : [100.0 %] -
   eta: 0s
Copying data                                      : [100.0 %] /
   eta: 0s

The dumpfile is saved to /tmp/out8.

makedumpfile Completed.
[root@ibm-p10-01-lp45 makedumpfile]# cmp /tmp/out7 /tmp/out8
/tmp/out7 /tmp/out8 differ: byte 11119019, line 49418

>
> Thanks,
> Kazu



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-02  4:52         ` HAGIO KAZUHITO(萩尾 一仁)
  2025-07-02  5:03           ` Tao Liu
@ 2025-07-02  5:03           ` Sourabh Jain
  1 sibling, 0 replies; 16+ messages in thread
From: Sourabh Jain @ 2025-07-02  5:03 UTC (permalink / raw)
  To: HAGIO KAZUHITO(萩尾 一仁), Tao Liu
  Cc: YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org

Hello Kazu,

On 02/07/25 10:22, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Tao,
>
> On 2025/07/02 13:36, Tao Liu wrote:
>> Hi Kazu,
>>
>> On Wed, Jul 2, 2025 at 12:13 PM HAGIO KAZUHITO(萩尾 一仁)
>> <k-hagio-ab@nec.com> wrote:
>>> On 2025/07/01 16:59, Tao Liu wrote:
>>>> Hi Kazu,
>>>>
>>>> Thanks for your comments!
>>>>
>>>> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
>>>>> Hi Tao,
>>>>>
>>>>> thank you for the patch.
>>>>>
>>>>> On 2025/06/25 11:23, Tao Liu wrote:
>>>>>> A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
>>>>>> reproduced with upstream makedumpfile.
>>>>>>
>>>>>> When analyzing the corrupt vmcore using crash, the following error
>>>>>> message will output:
>>>>>>
>>>>>>         crash: compressed kdump: uncompress failed: 0
>>>>>>         crash: read error: kernel virtual address: c0001e2d2fe48000  type:
>>>>>>         "hardirq thread_union"
>>>>>>         crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
>>>>>>         crash: compressed kdump: uncompress failed: 0
>>>>>>
>>>>>> If the vmcore is generated without num-threads option, then no such
>>>>>> errors are noticed.
>>>>>>
>>>>>> With --num-threads=N enabled, there will be N sub-threads created. All
>>>>>> sub-threads are producers which responsible for mm page processing, e.g.
>>>>>> compression. The main thread is the consumer which responsible for
>>>>>> writing the compressed data into file. page_flag_buf->ready is used to
>>>>>> sync main and sub-threads. When a sub-thread finishes page processing,
>>>>>> it will set ready flag to be FLAG_READY. In the meantime, main thread
>>>>>> looply check all threads of the ready flags, and break the loop when
>>>>>> find FLAG_READY.
>>>>> I've tried to reproduce the issue, but I couldn't on x86_64.
>>>> Yes, I cannot reproduce it on x86_64 either, but the issue is very
>>>> easily reproduced on ppc64 arch, which is where our QE reported.
>>>> Recently we have enabled --num-threads=N in rhel by default. N ==
>>>> nr_cpus in 2nd kernel, so QE noticed the issue.
>>> I see, thank you for the information.
>>>
>>>>> Do you have any possible scenario that breaks a vmcore?  I could not
>>>>> think of it only by looking at the code.
>>>> I guess the issue only been observed on ppc might be due to ppc's
>>>> memory model, multi-thread scheduling algorithm etc. I'm not an expert
>>>> on those. So I cannot give a clear explanation, sorry...
>>> ok, I also don't think of how to debug this well..
>>>
>>>> The page_flag_buf->ready is an integer that r/w by main and sub
>>>> threads simultaneously. And the assignment operation, like
>>>> page_flag_buf->ready = 1, might be composed of several assembly
>>>> instructions. Without atomic r/w (memory) protection, there might be
>>>> racing r/w just within the few instructions, which caused the data
>>>> inconsistency. Frankly the ppc assembly consists of more instructions
>>>> than x86_64 for the same c code, which enlarged the possibility of
>>>> data racing.
>>>>
>>>> We can observe the issue without the help of crash, just compare the
>>>> binary output of vmcore generated from the same core file, and
>>>> compress it with or without --num-threads option. Then compare it with
>>>> "cmp vmcore1 vmcore2" cmdline, and cmp will output bytes differ for
>>>> the 2 vmcores, and this is unexpected.
>>>>
>>>>> and this is just out of curiosity, is the issue reproduced with
>>>>> makedumpfile compiled with -O0 too?
>>>> Sorry, I haven't done the -O0 experiment, I can do it tomorrow and
>>>> share my findings...
>>> Thanks, we have to fix this anyway, I want a clue to think about a
>>> possible scenario..
>> 1) Compiled with -O2 flag:
>>
>> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out1
>> Copying data                                      : [100.0 %] /
>>      eta: 0s
>>
>> The dumpfile is saved to /tmp/out1.
>>
>> makedumpfile Completed.
>> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
>> 31 -l ~/vmcore /tmp/out2
>> Copying data                                      : [100.0 %] |
>>      eta: 0s
>> Copying data                                      : [100.0 %] \
>>      eta: 0s
>>
>> The dumpfile is saved to /tmp/out2.
>>
>> makedumpfile Completed.
>> [root@ibm-p10-01-lp45 makedumpfile]# cd /tmp
>> [root@ibm-p10-01-lp45 tmp]# cmp out1 out2
>> out1 out2 differ: byte 20786414, line 108064
>>
>> 2) Compiled with -O0 flag:
>>
>> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out3
>> Copying data                                      : [100.0 %] /
>>      eta: 0s
>>
>> The dumpfile is saved to /tmp/out3.
>>
>> makedumpfile Completed.
>> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
>> 31 -l ~/vmcore /tmp/out4
>> Copying data                                      : [100.0 %] |
>>      eta: 0s
>> Copying data                                      : [100.0 %] \
>>      eta: 0s
>>
>> The dumpfile is saved to /tmp/out4.
>>
>> makedumpfile Completed.
>> [root@ibm-p10-01-lp45 makedumpfile]# cd /tmp
>> [root@ibm-p10-01-lp45 tmp]# cmp out3 out4
>> out3 out4 differ: byte 23948282, line 151739
>>
>> Looks to me the O0/O2 have no difference for this case. If no problem,
>> the /tmp/outX generated from both single/multi thread should be
>> exactly the same, however the cmp reports there are differences. With
>> the v2 patch applied, there is no such difference:
>>
>> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out5
>> Copying data                                      : [100.0 %] /
>>      eta: 0s
>>
>> The dumpfile is saved to /tmp/out5.
>>
>> makedumpfile Completed.
>> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=2 -d
>> 31 -l ~/vmcore /tmp/out6
>> Copying data                                      : [100.0 %] |
>>      eta: 0s
>> Copying data                                      : [100.0 %] \
>>      eta: 0s
>>
>> The dumpfile is saved to /tmp/out6.
>>
>> makedumpfile Completed.
>> [root@ibm-p10-01-lp45 makedumpfile]# cmp /tmp/out5 /tmp/out6
>> [root@ibm-p10-01-lp45 makedumpfile]#
> thank you for testing!  sorry one more thing,
> does --num-threads=1 break the vmcore?

I was able to reproduce this issue with --num-threads=1. The reason is 
that when --num-threads is specified,
makedumpfile uses one producer and one consumer thread. So even with 
--num-threads=1, multithreading
is still in effect.

Thanks,
Sourabh Jain


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-02  5:03           ` Tao Liu
@ 2025-07-02  6:02             ` HAGIO KAZUHITO(萩尾 一仁)
  0 siblings, 0 replies; 16+ messages in thread
From: HAGIO KAZUHITO(萩尾 一仁) @ 2025-07-02  6:02 UTC (permalink / raw)
  To: Tao Liu, sourabhjain@linux.ibm.com
  Cc: YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org

Hi Tao, Sourabh,

>> thank you for testing!  sorry one more thing,
>> does --num-threads=1 break the vmcore?
> 
> Yes:

Thank you for the testing and information. Certainly the race occurs
between the main and sub-thread; I will check the code again.  If you
can determine how it breaks the vmcore, please let me know.  It would
be better to add the scenario to the commit log.

Thanks,
Kazu

> 
> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile -d 31 -l ~/vmcore /tmp/out7
> Copying data                                      : [100.0 %] /
>     eta: 0s
> 
> The dumpfile is saved to /tmp/out7.
> 
> makedumpfile Completed.
> [root@ibm-p10-01-lp45 makedumpfile]# ./makedumpfile --num-threads=1 -d
> 31 -l ~/vmcore /tmp/out8
> Copying data                                      : [100.0 %] -
>     eta: 0s
> Copying data                                      : [100.0 %] /
>     eta: 0s
> 
> The dumpfile is saved to /tmp/out8.
> 
> makedumpfile Completed.
> [root@ibm-p10-01-lp45 makedumpfile]# cmp /tmp/out7 /tmp/out8
> /tmp/out7 /tmp/out8 differ: byte 11119019, line 49418
> 
>>
>> Thanks,
>> Kazu
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-01  7:59   ` Tao Liu
  2025-07-02  0:13     ` HAGIO KAZUHITO(萩尾 一仁)
@ 2025-07-03 14:31     ` Petr Tesarik
  2025-07-03 22:35       ` Tao Liu
  1 sibling, 1 reply; 16+ messages in thread
From: Petr Tesarik @ 2025-07-03 14:31 UTC (permalink / raw)
  To: Tao Liu
  Cc: HAGIO KAZUHITO(萩尾 一仁),
	YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org, sourabhjain@linux.ibm.com

On Tue, 1 Jul 2025 19:59:53 +1200
Tao Liu <ltao@redhat.com> wrote:

> Hi Kazu,
> 
> Thanks for your comments!
> 
> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
> >
> > Hi Tao,
> >
> > thank you for the patch.
> >
> > On 2025/06/25 11:23, Tao Liu wrote:  
> > > A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
> > > reproduced with upstream makedumpfile.
> > >
> > > When analyzing the corrupt vmcore using crash, the following error
> > > message will output:
> > >
> > >      crash: compressed kdump: uncompress failed: 0
> > >      crash: read error: kernel virtual address: c0001e2d2fe48000  type:
> > >      "hardirq thread_union"
> > >      crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> > >      crash: compressed kdump: uncompress failed: 0
> > >
> > > If the vmcore is generated without num-threads option, then no such
> > > errors are noticed.
> > >
> > > With --num-threads=N enabled, there will be N sub-threads created. All
> > > sub-threads are producers which responsible for mm page processing, e.g.
> > > compression. The main thread is the consumer which responsible for
> > > writing the compressed data into file. page_flag_buf->ready is used to
> > > sync main and sub-threads. When a sub-thread finishes page processing,
> > > it will set ready flag to be FLAG_READY. In the meantime, main thread
> > > looply check all threads of the ready flags, and break the loop when
> > > find FLAG_READY.  
> >
> > I've tried to reproduce the issue, but I couldn't on x86_64.  
> 
> Yes, I cannot reproduce it on x86_64 either, but the issue is very
> easily reproduced on ppc64 arch, which is where our QE reported.

Yes, this is expected. X86 implements a strongly ordered memory model,
so a "store-to-memory" instruction ensures that the new value is
immediately observed by other CPUs.

FWIW the current code is wrong even on X86, because it does nothing to
prevent compiler optimizations. The compiler is then allowed to reorder
instructions so that the write to page_flag_buf->ready happens after
other writes; with a bit of bad scheduling luck, the consumer thread
may see an inconsistent state (e.g. read a stale page_flag_buf->pfn).
Note that thanks to how compilers are designed (today), this issue is
more or less hypothetical. Nevertheless, the use of atomics fixes it,
because they also serve as memory barriers.
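
A hedged sketch of that reordering concern (the struct and field names
below are illustrative, not makedumpfile's exact definitions): with
plain stores the compiler may legally emit the write to ->ready before
the write to ->pfn, so a consumer that observes the ready flag can
still read a stale pfn; the __ATOMIC_SEQ_CST store forbids moving the
preceding pfn store past it and also orders the stores for other CPUs:

    struct page_flag {
            unsigned long pfn;
            int ready;              /* FLAG_UNUSED / FLAG_READY */
    };

    /* plain stores: the compiler (and a weakly ordered CPU) may make
     * the ready store visible before the pfn store */
    void publish_plain(struct page_flag *p, unsigned long pfn)
    {
            p->pfn = pfn;
            p->ready = 1;
    }

    /* seq_cst store: the pfn store cannot be moved past it, and a
     * consumer pairing it with __atomic_load_n() sees them in order */
    void publish_atomic(struct page_flag *p, unsigned long pfn)
    {
            p->pfn = pfn;
            __atomic_store_n(&p->ready, 1, __ATOMIC_SEQ_CST);
    }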

Petr T


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-03 14:31     ` Petr Tesarik
@ 2025-07-03 22:35       ` Tao Liu
  2025-07-04  6:49         ` HAGIO KAZUHITO(萩尾 一仁)
  0 siblings, 1 reply; 16+ messages in thread
From: Tao Liu @ 2025-07-03 22:35 UTC (permalink / raw)
  To: Petr Tesarik
  Cc: HAGIO KAZUHITO(萩尾 一仁),
	YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org, sourabhjain@linux.ibm.com

Hi Petr,

On Fri, Jul 4, 2025 at 2:31 AM Petr Tesarik <ptesarik@suse.com> wrote:
>
> On Tue, 1 Jul 2025 19:59:53 +1200
> Tao Liu <ltao@redhat.com> wrote:
>
> > Hi Kazu,
> >
> > Thanks for your comments!
> >
> > On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
> > >
> > > Hi Tao,
> > >
> > > thank you for the patch.
> > >
> > > On 2025/06/25 11:23, Tao Liu wrote:
> > > > A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
> > > > reproduced with upstream makedumpfile.
> > > >
> > > > When analyzing the corrupt vmcore using crash, the following error
> > > > message will output:
> > > >
> > > >      crash: compressed kdump: uncompress failed: 0
> > > >      crash: read error: kernel virtual address: c0001e2d2fe48000  type:
> > > >      "hardirq thread_union"
> > > >      crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> > > >      crash: compressed kdump: uncompress failed: 0
> > > >
> > > > If the vmcore is generated without num-threads option, then no such
> > > > errors are noticed.
> > > >
> > > > With --num-threads=N enabled, there will be N sub-threads created. All
> > > > sub-threads are producers which responsible for mm page processing, e.g.
> > > > compression. The main thread is the consumer which responsible for
> > > > writing the compressed data into file. page_flag_buf->ready is used to
> > > > sync main and sub-threads. When a sub-thread finishes page processing,
> > > > it will set ready flag to be FLAG_READY. In the meantime, main thread
> > > > looply check all threads of the ready flags, and break the loop when
> > > > find FLAG_READY.
> > >
> > > I've tried to reproduce the issue, but I couldn't on x86_64.
> >
> > Yes, I cannot reproduce it on x86_64 either, but the issue is very
> > easily reproduced on ppc64 arch, which is where our QE reported.
>
> Yes, this is expected. X86 implements a strongly ordered memory model,
> so a "store-to-memory" instruction ensures that the new value is
> immediately observed by other CPUs.
>
> FWIW the current code is wrong even on X86, because it does nothing to
> prevent compiler optimizations. The compiler is then allowed to reorder
> instructions so that the write to page_flag_buf->ready happens after
> other writes; with a bit of bad scheduling luck, the consumer thread
> may see an inconsistent state (e.g. read a stale page_flag_buf->pfn).
> Note that thanks to how compilers are designed (today), this issue is
> more or less hypothetical. Nevertheless, the use of atomics fixes it,
> because they also serve as memory barriers.

Thanks a lot for your detailed explanation, it's very helpful! I
hadn't thought of the possibility of instruction reordering, nor that
the atomic read/write operations also prevent such reordering.

Thanks,
Tao Liu

>
> Petr T
>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-03 22:35       ` Tao Liu
@ 2025-07-04  6:49         ` HAGIO KAZUHITO(萩尾 一仁)
  2025-07-04  7:51           ` Tao Liu
  0 siblings, 1 reply; 16+ messages in thread
From: HAGIO KAZUHITO(萩尾 一仁) @ 2025-07-04  6:49 UTC (permalink / raw)
  To: Tao Liu, Petr Tesarik
  Cc: YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org, sourabhjain@linux.ibm.com

On 2025/07/04 7:35, Tao Liu wrote:
> Hi Petr,
> 
> On Fri, Jul 4, 2025 at 2:31 AM Petr Tesarik <ptesarik@suse.com> wrote:
>>
>> On Tue, 1 Jul 2025 19:59:53 +1200
>> Tao Liu <ltao@redhat.com> wrote:
>>
>>> Hi Kazu,
>>>
>>> Thanks for your comments!
>>>
>>> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
>>>>
>>>> Hi Tao,
>>>>
>>>> thank you for the patch.
>>>>
>>>> On 2025/06/25 11:23, Tao Liu wrote:
>>>>> A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
>>>>> reproduced with upstream makedumpfile.
>>>>>
>>>>> When analyzing the corrupt vmcore using crash, the following error
>>>>> message will output:
>>>>>
>>>>>       crash: compressed kdump: uncompress failed: 0
>>>>>       crash: read error: kernel virtual address: c0001e2d2fe48000  type:
>>>>>       "hardirq thread_union"
>>>>>       crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
>>>>>       crash: compressed kdump: uncompress failed: 0
>>>>>
>>>>> If the vmcore is generated without num-threads option, then no such
>>>>> errors are noticed.
>>>>>
>>>>> With --num-threads=N enabled, there will be N sub-threads created. All
>>>>> sub-threads are producers which responsible for mm page processing, e.g.
>>>>> compression. The main thread is the consumer which responsible for
>>>>> writing the compressed data into file. page_flag_buf->ready is used to
>>>>> sync main and sub-threads. When a sub-thread finishes page processing,
>>>>> it will set ready flag to be FLAG_READY. In the meantime, main thread
>>>>> looply check all threads of the ready flags, and break the loop when
>>>>> find FLAG_READY.
>>>>
>>>> I've tried to reproduce the issue, but I couldn't on x86_64.
>>>
>>> Yes, I cannot reproduce it on x86_64 either, but the issue is very
>>> easily reproduced on ppc64 arch, which is where our QE reported.
>>
>> Yes, this is expected. X86 implements a strongly ordered memory model,
>> so a "store-to-memory" instruction ensures that the new value is
>> immediately observed by other CPUs.
>>
>> FWIW the current code is wrong even on X86, because it does nothing to
>> prevent compiler optimizations. The compiler is then allowed to reorder
>> instructions so that the write to page_flag_buf->ready happens after
>> other writes; with a bit of bad scheduling luck, the consumer thread
>> may see an inconsistent state (e.g. read a stale page_flag_buf->pfn).
>> Note that thanks to how compilers are designed (today), this issue is
>> more or less hypothetical. Nevertheless, the use of atomics fixes it,
>> because they also serve as memory barriers.

Thank you, Petr, for the information.  I was wondering whether atomic 
operations might be necessary for the other members of page_flag_buf as 
well, but it looks like they won't be needed in this case.

That convinced me that the issue is fixed by removing the inconsistency 
around page_flag_buf->ready.  And the patch tested OK, so ack.
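
To spell out my understanding with a simplified sketch (placeholder
names, not the real page_flag_buf layout): as long as every plain write
to the other members happens, in program order, before the seq-cst
store of FLAG_READY, and the main thread only reads them after its
seq-cst load has observed FLAG_READY, that pair already orders the
plain accesses, so no extra atomics are needed for pfn or index:

    enum { FLAG_UNUSED, FLAG_FILLING, FLAG_READY };

    struct buf {                     /* placeholder, not the real layout */
            unsigned long pfn;
            int           index;
            int           ready;
    };

    /* sub-thread: plain writes first, atomic flag store last */
    void publish(struct buf *p, unsigned long pfn, int index)
    {
            p->pfn   = pfn;
            p->index = index;
            __atomic_store_n(&p->ready, FLAG_READY, __ATOMIC_SEQ_CST);
    }

    /* main thread: atomic flag load first, plain reads afterwards */
    int try_consume(struct buf *p, unsigned long *pfn, int *index)
    {
            if (__atomic_load_n(&p->ready, __ATOMIC_SEQ_CST) != FLAG_READY)
                    return 0;
            *pfn   = p->pfn;         /* guaranteed not to be stale */
            *index = p->index;
            return 1;
    }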

Thanks,
Kazu

> 
> Thanks a lot for your detailed explanation, it's very helpful! I
> haven't thought of the possibility of instruction reordering and
> atomic_rw prevents the reorder.
> 
> Thanks,
> Tao Liu
> 
>>
>> Petr T
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-04  6:49         ` HAGIO KAZUHITO(萩尾 一仁)
@ 2025-07-04  7:51           ` Tao Liu
  2025-07-10  5:34             ` Tao Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Tao Liu @ 2025-07-04  7:51 UTC (permalink / raw)
  To: HAGIO KAZUHITO(萩尾 一仁)
  Cc: Petr Tesarik,
	YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org, sourabhjain@linux.ibm.com

On Fri, Jul 4, 2025 at 6:49 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
>
> On 2025/07/04 7:35, Tao Liu wrote:
> > Hi Petr,
> >
> > On Fri, Jul 4, 2025 at 2:31 AM Petr Tesarik <ptesarik@suse.com> wrote:
> >>
> >> On Tue, 1 Jul 2025 19:59:53 +1200
> >> Tao Liu <ltao@redhat.com> wrote:
> >>
> >>> Hi Kazu,
> >>>
> >>> Thanks for your comments!
> >>>
> >>> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
> >>>>
> >>>> Hi Tao,
> >>>>
> >>>> thank you for the patch.
> >>>>
> >>>> On 2025/06/25 11:23, Tao Liu wrote:
> >>>>> A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
> >>>>> reproduced with upstream makedumpfile.
> >>>>>
> >>>>> When analyzing the corrupt vmcore using crash, the following error
> >>>>> message will output:
> >>>>>
> >>>>>       crash: compressed kdump: uncompress failed: 0
> >>>>>       crash: read error: kernel virtual address: c0001e2d2fe48000  type:
> >>>>>       "hardirq thread_union"
> >>>>>       crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> >>>>>       crash: compressed kdump: uncompress failed: 0
> >>>>>
> >>>>> If the vmcore is generated without num-threads option, then no such
> >>>>> errors are noticed.
> >>>>>
> >>>>> With --num-threads=N enabled, there will be N sub-threads created. All
> >>>>> sub-threads are producers which responsible for mm page processing, e.g.
> >>>>> compression. The main thread is the consumer which responsible for
> >>>>> writing the compressed data into file. page_flag_buf->ready is used to
> >>>>> sync main and sub-threads. When a sub-thread finishes page processing,
> >>>>> it will set ready flag to be FLAG_READY. In the meantime, main thread
> >>>>> looply check all threads of the ready flags, and break the loop when
> >>>>> find FLAG_READY.
> >>>>
> >>>> I've tried to reproduce the issue, but I couldn't on x86_64.
> >>>
> >>> Yes, I cannot reproduce it on x86_64 either, but the issue is very
> >>> easily reproduced on ppc64 arch, which is where our QE reported.
> >>
> >> Yes, this is expected. X86 implements a strongly ordered memory model,
> >> so a "store-to-memory" instruction ensures that the new value is
> >> immediately observed by other CPUs.
> >>
> >> FWIW the current code is wrong even on X86, because it does nothing to
> >> prevent compiler optimizations. The compiler is then allowed to reorder
> >> instructions so that the write to page_flag_buf->ready happens after
> >> other writes; with a bit of bad scheduling luck, the consumer thread
> >> may see an inconsistent state (e.g. read a stale page_flag_buf->pfn).
> >> Note that thanks to how compilers are designed (today), this issue is
> >> more or less hypothetical. Nevertheless, the use of atomics fixes it,
> >> because they also serve as memory barriers.
>
> Thank you Petr, for the information.  I was wondering whether atomic
> operations might be necessary for the other members of page_flag_buf,
> but it looks like they won't be necessary in this case.
>
> Then I was convinced that the issue would be fixed by removing the
> inconsistency of page_flag_buf->ready.  And the patch tested ok, so ack.
>

Thank you all for the patch review, testing, and comments; they have
been very helpful!

Thanks,
Tao Liu

> Thanks,
> Kazu
>
> >
> > Thanks a lot for your detailed explanation, it's very helpful! I
> > haven't thought of the possibility of instruction reordering and
> > atomic_rw prevents the reorder.
> >
> > Thanks,
> > Tao Liu
> >
> >>
> >> Petr T
> >>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-04  7:51           ` Tao Liu
@ 2025-07-10  5:34             ` Tao Liu
  2025-07-11 12:08               ` YAMAZAKI MASAMITSU(山崎 真光)
  0 siblings, 1 reply; 16+ messages in thread
From: Tao Liu @ 2025-07-10  5:34 UTC (permalink / raw)
  To: HAGIO KAZUHITO(萩尾 一仁)
  Cc: Petr Tesarik,
	YAMAZAKI MASAMITSU(山崎 真光),
	kexec@lists.infradead.org, sourabhjain@linux.ibm.com

Kindly ping...

Sorry to interrupt, but could you please merge the patch? There are a
few bugs that depend on this patch being backported.

Thanks,
Tao Liu


On Fri, Jul 4, 2025 at 7:51 PM Tao Liu <ltao@redhat.com> wrote:
>
> On Fri, Jul 4, 2025 at 6:49 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
> >
> > On 2025/07/04 7:35, Tao Liu wrote:
> > > Hi Petr,
> > >
> > > On Fri, Jul 4, 2025 at 2:31 AM Petr Tesarik <ptesarik@suse.com> wrote:
> > >>
> > >> On Tue, 1 Jul 2025 19:59:53 +1200
> > >> Tao Liu <ltao@redhat.com> wrote:
> > >>
> > >>> Hi Kazu,
> > >>>
> > >>> Thanks for your comments!
> > >>>
> > >>> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
> > >>>>
> > >>>> Hi Tao,
> > >>>>
> > >>>> thank you for the patch.
> > >>>>
> > >>>> On 2025/06/25 11:23, Tao Liu wrote:
> > >>>>> A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
> > >>>>> reproduced with upstream makedumpfile.
> > >>>>>
> > >>>>> When analyzing the corrupt vmcore using crash, the following error
> > >>>>> message will output:
> > >>>>>
> > >>>>>       crash: compressed kdump: uncompress failed: 0
> > >>>>>       crash: read error: kernel virtual address: c0001e2d2fe48000  type:
> > >>>>>       "hardirq thread_union"
> > >>>>>       crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> > >>>>>       crash: compressed kdump: uncompress failed: 0
> > >>>>>
> > >>>>> If the vmcore is generated without num-threads option, then no such
> > >>>>> errors are noticed.
> > >>>>>
> > >>>>> With --num-threads=N enabled, there will be N sub-threads created. All
> > >>>>> sub-threads are producers which responsible for mm page processing, e.g.
> > >>>>> compression. The main thread is the consumer which responsible for
> > >>>>> writing the compressed data into file. page_flag_buf->ready is used to
> > >>>>> sync main and sub-threads. When a sub-thread finishes page processing,
> > >>>>> it will set ready flag to be FLAG_READY. In the meantime, main thread
> > >>>>> looply check all threads of the ready flags, and break the loop when
> > >>>>> find FLAG_READY.
> > >>>>
> > >>>> I've tried to reproduce the issue, but I couldn't on x86_64.
> > >>>
> > >>> Yes, I cannot reproduce it on x86_64 either, but the issue is very
> > >>> easily reproduced on ppc64 arch, which is where our QE reported.
> > >>
> > >> Yes, this is expected. X86 implements a strongly ordered memory model,
> > >> so a "store-to-memory" instruction ensures that the new value is
> > >> immediately observed by other CPUs.
> > >>
> > >> FWIW the current code is wrong even on X86, because it does nothing to
> > >> prevent compiler optimizations. The compiler is then allowed to reorder
> > >> instructions so that the write to page_flag_buf->ready happens after
> > >> other writes; with a bit of bad scheduling luck, the consumer thread
> > >> may see an inconsistent state (e.g. read a stale page_flag_buf->pfn).
> > >> Note that thanks to how compilers are designed (today), this issue is
> > >> more or less hypothetical. Nevertheless, the use of atomics fixes it,
> > >> because they also serve as memory barriers.
> >
> > Thank you Petr, for the information.  I was wondering whether atomic
> > operations might be necessary for the other members of page_flag_buf,
> > but it looks like they won't be necessary in this case.
> >
> > Then I was convinced that the issue would be fixed by removing the
> > inconsistency of page_flag_buf->ready.  And the patch tested ok, so ack.
> >
>
> Thank you all for the patch review, patch testing and comments, these
> have been so helpful!
>
> Thanks,
> Tao Liu
>
> > Thanks,
> > Kazu
> >
> > >
> > > Thanks a lot for your detailed explanation, it's very helpful! I
> > > haven't thought of the possibility of instruction reordering and
> > > atomic_rw prevents the reorder.
> > >
> > > Thanks,
> > > Tao Liu
> > >
> > >>
> > >> Petr T
> > >>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-10  5:34             ` Tao Liu
@ 2025-07-11 12:08               ` YAMAZAKI MASAMITSU(山崎 真光)
  2025-07-13 23:37                 ` Tao Liu
  0 siblings, 1 reply; 16+ messages in thread
From: YAMAZAKI MASAMITSU(山崎 真光) @ 2025-07-11 12:08 UTC (permalink / raw)
  To: Tao Liu, HAGIO KAZUHITO(萩尾 一仁)
  Cc: Petr Tesarik, kexec@lists.infradead.org,
	sourabhjain@linux.ibm.com

Sorry, I'm so late.

I looked into the fix, and I think it will work safely on other
architectures as well. I think it will also solve the problem
on ppc64. I will accept and merge this patch.

Thank you for reporting this problem and providing this
difficult fix.

Thanks,

Masa

On 2025/07/10 14:34, Tao Liu wrote:
> Kindly ping...
>
> Sorry to interrupt, could you please merge the patch since there are
> few bugs which depend on the backporting of this patch?
>
> Thanks,
> Tao Liu
>
>
> On Fri, Jul 4, 2025 at 7:51 PM Tao Liu <ltao@redhat.com> wrote:
>> On Fri, Jul 4, 2025 at 6:49 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
>>> On 2025/07/04 7:35, Tao Liu wrote:
>>>> Hi Petr,
>>>>
>>>> On Fri, Jul 4, 2025 at 2:31 AM Petr Tesarik <ptesarik@suse.com> wrote:
>>>>> On Tue, 1 Jul 2025 19:59:53 +1200
>>>>> Tao Liu <ltao@redhat.com> wrote:
>>>>>
>>>>>> Hi Kazu,
>>>>>>
>>>>>> Thanks for your comments!
>>>>>>
>>>>>> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
>>>>>>> Hi Tao,
>>>>>>>
>>>>>>> thank you for the patch.
>>>>>>>
>>>>>>> On 2025/06/25 11:23, Tao Liu wrote:
>>>>>>>> A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
>>>>>>>> reproduced with upstream makedumpfile.
>>>>>>>>
>>>>>>>> When analyzing the corrupt vmcore using crash, the following error
>>>>>>>> message will output:
>>>>>>>>
>>>>>>>>        crash: compressed kdump: uncompress failed: 0
>>>>>>>>        crash: read error: kernel virtual address: c0001e2d2fe48000  type:
>>>>>>>>        "hardirq thread_union"
>>>>>>>>        crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
>>>>>>>>        crash: compressed kdump: uncompress failed: 0
>>>>>>>>
>>>>>>>> If the vmcore is generated without num-threads option, then no such
>>>>>>>> errors are noticed.
>>>>>>>>
>>>>>>>> With --num-threads=N enabled, there will be N sub-threads created. All
>>>>>>>> sub-threads are producers which responsible for mm page processing, e.g.
>>>>>>>> compression. The main thread is the consumer which responsible for
>>>>>>>> writing the compressed data into file. page_flag_buf->ready is used to
>>>>>>>> sync main and sub-threads. When a sub-thread finishes page processing,
>>>>>>>> it will set ready flag to be FLAG_READY. In the meantime, main thread
>>>>>>>> looply check all threads of the ready flags, and break the loop when
>>>>>>>> find FLAG_READY.
>>>>>>> I've tried to reproduce the issue, but I couldn't on x86_64.
>>>>>> Yes, I cannot reproduce it on x86_64 either, but the issue is very
>>>>>> easily reproduced on ppc64 arch, which is where our QE reported.
>>>>> Yes, this is expected. X86 implements a strongly ordered memory model,
>>>>> so a "store-to-memory" instruction ensures that the new value is
>>>>> immediately observed by other CPUs.
>>>>>
>>>>> FWIW the current code is wrong even on X86, because it does nothing to
>>>>> prevent compiler optimizations. The compiler is then allowed to reorder
>>>>> instructions so that the write to page_flag_buf->ready happens after
>>>>> other writes; with a bit of bad scheduling luck, the consumer thread
>>>>> may see an inconsistent state (e.g. read a stale page_flag_buf->pfn).
>>>>> Note that thanks to how compilers are designed (today), this issue is
>>>>> more or less hypothetical. Nevertheless, the use of atomics fixes it,
>>>>> because they also serve as memory barriers.
>>> Thank you Petr, for the information.  I was wondering whether atomic
>>> operations might be necessary for the other members of page_flag_buf,
>>> but it looks like they won't be necessary in this case.
>>>
>>> Then I was convinced that the issue would be fixed by removing the
>>> inconsistency of page_flag_buf->ready.  And the patch tested ok, so ack.
>>>
>> Thank you all for the patch review, patch testing and comments, these
>> have been so helpful!
>>
>> Thanks,
>> Tao Liu
>>
>>> Thanks,
>>> Kazu
>>>
>>>> Thanks a lot for your detailed explanation, it's very helpful! I
>>>> haven't thought of the possibility of instruction reordering and
>>>> atomic_rw prevents the reorder.
>>>>
>>>> Thanks,
>>>> Tao Liu
>>>>
>>>>> Petr T
>>>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
  2025-07-11 12:08               ` YAMAZAKI MASAMITSU(山崎 真光)
@ 2025-07-13 23:37                 ` Tao Liu
  0 siblings, 0 replies; 16+ messages in thread
From: Tao Liu @ 2025-07-13 23:37 UTC (permalink / raw)
  To: YAMAZAKI MASAMITSU(山崎 真光)
  Cc: HAGIO KAZUHITO(萩尾 一仁), Petr Tesarik,
	kexec@lists.infradead.org, sourabhjain@linux.ibm.com

Hi YAMAZAKI,

On Sat, Jul 12, 2025 at 12:08 AM YAMAZAKI MASAMITSU(山崎 真光)
<yamazaki-msmt@nec.com> wrote:
>
> Sorry, I'm so late.

No worries :)
>
> I looked into the fix and I think it will work safely on other
> architectures as well. I think it will also solve the problem
> with ppc64. I accept and merge this patch.
>
> Thank you for reporting this problem and providing the very
> difficult fix.

Thanks for your response and merging!

Thanks,
Tao Liu

>
> Thanks,
>
> Masa
>
> On 2025/07/10 14:34, Tao Liu wrote:
> > Kindly ping...
> >
> > Sorry to interrupt, could you please merge the patch since there are
> > few bugs which depend on the backporting of this patch?
> >
> > Thanks,
> > Tao Liu
> >
> >
> > On Fri, Jul 4, 2025 at 7:51 PM Tao Liu <ltao@redhat.com> wrote:
> >> On Fri, Jul 4, 2025 at 6:49 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
> >>> On 2025/07/04 7:35, Tao Liu wrote:
> >>>> Hi Petr,
> >>>>
> >>>> On Fri, Jul 4, 2025 at 2:31 AM Petr Tesarik <ptesarik@suse.com> wrote:
> >>>>> On Tue, 1 Jul 2025 19:59:53 +1200
> >>>>> Tao Liu <ltao@redhat.com> wrote:
> >>>>>
> >>>>>> Hi Kazu,
> >>>>>>
> >>>>>> Thanks for your comments!
> >>>>>>
> >>>>>> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
> >>>>>>> Hi Tao,
> >>>>>>>
> >>>>>>> thank you for the patch.
> >>>>>>>
> >>>>>>> On 2025/06/25 11:23, Tao Liu wrote:
> >>>>>>>> A vmcore corrupt issue has been noticed in powerpc arch [1]. It can be
> >>>>>>>> reproduced with upstream makedumpfile.
> >>>>>>>>
> >>>>>>>> When analyzing the corrupt vmcore using crash, the following error
> >>>>>>>> message will output:
> >>>>>>>>
> >>>>>>>>        crash: compressed kdump: uncompress failed: 0
> >>>>>>>>        crash: read error: kernel virtual address: c0001e2d2fe48000  type:
> >>>>>>>>        "hardirq thread_union"
> >>>>>>>>        crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
> >>>>>>>>        crash: compressed kdump: uncompress failed: 0
> >>>>>>>>
> >>>>>>>> If the vmcore is generated without num-threads option, then no such
> >>>>>>>> errors are noticed.
> >>>>>>>>
> >>>>>>>> With --num-threads=N enabled, there will be N sub-threads created. All
> >>>>>>>> sub-threads are producers which responsible for mm page processing, e.g.
> >>>>>>>> compression. The main thread is the consumer which responsible for
> >>>>>>>> writing the compressed data into file. page_flag_buf->ready is used to
> >>>>>>>> sync main and sub-threads. When a sub-thread finishes page processing,
> >>>>>>>> it will set ready flag to be FLAG_READY. In the meantime, main thread
> >>>>>>>> looply check all threads of the ready flags, and break the loop when
> >>>>>>>> find FLAG_READY.
> >>>>>>> I've tried to reproduce the issue, but I couldn't on x86_64.
> >>>>>> Yes, I cannot reproduce it on x86_64 either, but the issue is very
> >>>>>> easily reproduced on ppc64 arch, which is where our QE reported.
> >>>>> Yes, this is expected. X86 implements a strongly ordered memory model,
> >>>>> so a "store-to-memory" instruction ensures that the new value is
> >>>>> immediately observed by other CPUs.
> >>>>>
> >>>>> FWIW the current code is wrong even on X86, because it does nothing to
> >>>>> prevent compiler optimizations. The compiler is then allowed to reorder
> >>>>> instructions so that the write to page_flag_buf->ready happens after
> >>>>> other writes; with a bit of bad scheduling luck, the consumer thread
> >>>>> may see an inconsistent state (e.g. read a stale page_flag_buf->pfn).
> >>>>> Note that thanks to how compilers are designed (today), this issue is
> >>>>> more or less hypothetical. Nevertheless, the use of atomics fixes it,
> >>>>> because they also serve as memory barriers.
> >>> Thank you Petr, for the information.  I was wondering whether atomic
> >>> operations might be necessary for the other members of page_flag_buf,
> >>> but it looks like they won't be necessary in this case.
> >>>
> >>> Then I was convinced that the issue would be fixed by removing the
> >>> inconsistency of page_flag_buf->ready.  And the patch tested ok, so ack.
> >>>
> >> Thank you all for the patch review, patch testing and comments, these
> >> have been so helpful!
> >>
> >> Thanks,
> >> Tao Liu
> >>
> >>> Thanks,
> >>> Kazu
> >>>
> >>>> Thanks a lot for your detailed explanation, it's very helpful! I
> >>>> haven't thought of the possibility of instruction reordering and
> >>>> atomic_rw prevents the reorder.
> >>>>
> >>>> Thanks,
> >>>> Tao Liu
> >>>>
> >>>>> Petr T
> >>>>>



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-07-13 23:38 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-25  2:23 [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N) Tao Liu
2025-07-01  7:38 ` HAGIO KAZUHITO(萩尾 一仁)
2025-07-01  7:59   ` Tao Liu
2025-07-02  0:13     ` HAGIO KAZUHITO(萩尾 一仁)
2025-07-02  4:36       ` Tao Liu
2025-07-02  4:52         ` HAGIO KAZUHITO(萩尾 一仁)
2025-07-02  5:03           ` Tao Liu
2025-07-02  6:02             ` HAGIO KAZUHITO(萩尾 一仁)
2025-07-02  5:03           ` Sourabh Jain
2025-07-03 14:31     ` Petr Tesarik
2025-07-03 22:35       ` Tao Liu
2025-07-04  6:49         ` HAGIO KAZUHITO(萩尾 一仁)
2025-07-04  7:51           ` Tao Liu
2025-07-10  5:34             ` Tao Liu
2025-07-11 12:08               ` YAMAZAKI MASAMITSU(山崎 真光)
2025-07-13 23:37                 ` Tao Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox