Kexec Archive on lore.kernel.org
From: "YAMAZAKI MASAMITSU(山崎 真光)" <yamazaki-msmt@nec.com>
To: "Tao Liu" <ltao@redhat.com>,
	"HAGIO KAZUHITO(萩尾 一仁)" <k-hagio-ab@nec.com>
Cc: Petr Tesarik <ptesarik@suse.com>,
	"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	"sourabhjain@linux.ibm.com" <sourabhjain@linux.ibm.com>
Subject: Re: [PATCH v2][makedumpfile] Fix a data race in multi-threading mode (--num-threads=N)
Date: Fri, 11 Jul 2025 12:08:36 +0000
Message-ID: <004de18c-263a-405d-9d5a-e83d4c391df7@nec.com>
In-Reply-To: <CAO7dBbVanh9oJT_PaMWZ7+3V5Bx2iz7CRNxyGSxfSsfZ5wFv2g@mail.gmail.com>

Sorry, I'm so late.

I looked into the fix and I think it will work safely on other
architectures as well. I think it will also solve the problem
with ppc64. I will accept and merge this patch.

Thank you for reporting this problem and for providing a fix for
such a difficult issue.

Thanks,

Masa

On 2025/07/10 14:34, Tao Liu wrote:
> Kindly ping...
>
> Sorry to interrupt, but could you please merge the patch? There are a
> few bugs that depend on backporting it.
>
> Thanks,
> Tao Liu
>
>
> On Fri, Jul 4, 2025 at 7:51 PM Tao Liu <ltao@redhat.com> wrote:
>> On Fri, Jul 4, 2025 at 6:49 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
>>> On 2025/07/04 7:35, Tao Liu wrote:
>>>> Hi Petr,
>>>>
>>>> On Fri, Jul 4, 2025 at 2:31 AM Petr Tesarik <ptesarik@suse.com> wrote:
>>>>> On Tue, 1 Jul 2025 19:59:53 +1200
>>>>> Tao Liu <ltao@redhat.com> wrote:
>>>>>
>>>>>> Hi Kazu,
>>>>>>
>>>>>> Thanks for your comments!
>>>>>>
>>>>>> On Tue, Jul 1, 2025 at 7:38 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
>>>>>>> Hi Tao,
>>>>>>>
>>>>>>> thank you for the patch.
>>>>>>>
>>>>>>> On 2025/06/25 11:23, Tao Liu wrote:
>>>>>>>> A vmcore corruption issue has been noticed on the powerpc arch [1]. It
>>>>>>>> can be reproduced with upstream makedumpfile.
>>>>>>>>
>>>>>>>> When analyzing the corrupt vmcore using crash, the following error
>>>>>>>> messages are output:
>>>>>>>>
>>>>>>>>        crash: compressed kdump: uncompress failed: 0
>>>>>>>>        crash: read error: kernel virtual address: c0001e2d2fe48000  type: "hardirq thread_union"
>>>>>>>>        crash: cannot read hardirq_ctx[930] at c0001e2d2fe48000
>>>>>>>>        crash: compressed kdump: uncompress failed: 0
>>>>>>>>
>>>>>>>> If the vmcore is generated without the --num-threads option, no such
>>>>>>>> errors are seen.
>>>>>>>>
>>>>>>>> With --num-threads=N enabled, N sub-threads are created. All
>>>>>>>> sub-threads are producers, which are responsible for mm page processing,
>>>>>>>> e.g. compression. The main thread is the consumer, which is responsible
>>>>>>>> for writing the compressed data to the file. page_flag_buf->ready is
>>>>>>>> used to sync the main thread and the sub-threads. When a sub-thread
>>>>>>>> finishes page processing, it sets the ready flag to FLAG_READY.
>>>>>>>> Meanwhile, the main thread loops over the ready flags of all threads
>>>>>>>> and breaks out of the loop when it finds FLAG_READY.
>>>>>>> I've tried to reproduce the issue, but I couldn't on x86_64.
>>>>>> Yes, I cannot reproduce it on x86_64 either, but the issue is very
>>>>>> easily reproduced on the ppc64 arch, which is where our QE reported it.
>>>>> Yes, this is expected. X86 implements a strongly ordered memory model,
>>>>> so a "store-to-memory" instruction ensures that the new value is
>>>>> immediately observed by other CPUs.
>>>>>
>>>>> FWIW the current code is wrong even on X86, because it does nothing to
>>>>> prevent compiler optimizations. The compiler is then allowed to reorder
>>>>> instructions so that the write to page_flag_buf->ready happens after
>>>>> other writes; with a bit of bad scheduling luck, the consumer thread
>>>>> may see an inconsistent state (e.g. read a stale page_flag_buf->pfn).
>>>>> Note that thanks to how compilers are designed (today), this issue is
>>>>> more or less hypothetical. Nevertheless, the use of atomics fixes it,
>>>>> because they also serve as memory barriers.
>>> Thank you Petr, for the information.  I was wondering whether atomic
>>> operations might be necessary for the other members of page_flag_buf,
>>> but it looks like they won't be necessary in this case.
>>>
>>> Then I was convinced that the issue would be fixed by removing the
>>> inconsistency of page_flag_buf->ready.  And the patch tested ok, so ack.
>>>
>> Thank you all for the patch review, patch testing and comments, these
>> have been so helpful!
>>
>> Thanks,
>> Tao Liu
>>
>>> Thanks,
>>> Kazu
>>>
>>>> Thanks a lot for your detailed explanation, it's very helpful! I
>>>> hadn't thought of the possibility of instruction reordering, or that
>>>> atomic_rw prevents it.
>>>>
>>>> Thanks,
>>>> Tao Liu
>>>>
>>>>> Petr T
>>>>>
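
The producer/consumer handoff discussed in the quoted thread can be
sketched as follows. This is only a minimal illustration, assuming C11
<stdatomic.h> and POSIX threads; it is not the makedumpfile code and
not the patch itself. The names struct flag_buf, struct demo,
FLAG_UNUSED, producer() and run_handoff_demo() are hypothetical; only
the ready flag, the pfn field and FLAG_READY echo names used in the
thread. The producer writes the payload and then publishes it with a
release store; the consumer reads the payload only after an acquire
load has observed FLAG_READY.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names; an illustration, not the makedumpfile source. */
enum { FLAG_UNUSED = 0, FLAG_READY = 1 };

struct flag_buf {
	unsigned long pfn;	/* plain payload written by the producer */
	atomic_int ready;	/* handoff flag shared by both threads */
};

struct demo {
	struct flag_buf buf;
	unsigned long count;	/* number of items to hand over */
};

/* Producer: fill in the payload, then publish it with a release store. */
static void *producer(void *arg)
{
	struct demo *d = arg;

	for (unsigned long i = 1; i <= d->count; i++) {
		/* Wait until the consumer has released the slot. */
		while (atomic_load_explicit(&d->buf.ready,
					    memory_order_acquire) != FLAG_UNUSED)
			sched_yield();
		d->buf.pfn = i;
		/* Release: the pfn write above cannot be ordered after this. */
		atomic_store_explicit(&d->buf.ready, FLAG_READY,
				      memory_order_release);
	}
	return NULL;
}

/* Consumer: returns how many items arrived with an unexpected pfn. */
unsigned long run_handoff_demo(unsigned long count)
{
	struct demo d = { .count = count };
	unsigned long mismatches = 0;
	pthread_t tid;
	int rc;

	atomic_init(&d.buf.ready, FLAG_UNUSED);
	rc = pthread_create(&tid, NULL, producer, &d);
	assert(rc == 0);

	for (unsigned long i = 1; i <= count; i++) {
		/* Acquire: pairs with the producer's release store. */
		while (atomic_load_explicit(&d.buf.ready,
					    memory_order_acquire) != FLAG_READY)
			sched_yield();
		if (d.buf.pfn != i)
			mismatches++;
		/* Hand the slot back to the producer. */
		atomic_store_explicit(&d.buf.ready, FLAG_UNUSED,
				      memory_order_release);
	}
	pthread_join(tid, NULL);
	return mismatches;
}
```

With the flag accessed through atomics, run_handoff_demo() is expected
to return 0 on any architecture. If ready were a plain int, the
compiler or a weakly ordered CPU would be free to reorder the pfn write
and the flag write, and the consumer could then read a stale pfn.
Older toolchains may need -pthread when building.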
