From: Baoquan He <bhe@redhat.com>
To: Eric DeVolder <eric.devolder@oracle.com>,
	Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	kexec@lists.infradead.org, ebiederm@xmission.com,
	dyoung@redhat.com, vgoyal@redhat.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, nramas@linux.microsoft.com,
	thomas.lendacky@amd.com, robh@kernel.org, efault@gmx.de,
	rppt@kernel.org, david@redhat.com, konrad.wilk@oracle.com,
	boris.ostrovsky@oracle.com
Subject: Re: [PATCH v12 3/7] crash: add generic infrastructure for crash hotplug support
Date: Mon, 24 Oct 2022 17:10:01 +0800
Message-ID: <Y1ZWaSeGk53QqZHX@MiWiFi-R3L-srv>
In-Reply-To: <97f2daae-f34a-86bb-6d28-8aa8314321bc@oracle.com>

Hi Eric, Sourabh,

On 10/07/22 at 02:14pm, Eric DeVolder wrote:
> 
> 
> On 10/3/22 12:51, Sourabh Jain wrote:
> > Hello Eric,
> > 
> > On 10/09/22 02:35, Eric DeVolder wrote:
......
> > > +static void handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
> > > +{
> > > +    /* Obtain lock while changing crash information */
> > > +    mutex_lock(&kexec_mutex);
> > > +
> > > +    /* Check kdump is loaded */
> > > +    if (kexec_crash_image) {
> > > +        struct kimage *image = kexec_crash_image;
> > > +
> > > +        if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
> > > +            hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
> > > +            pr_debug("crash hp: hp_action %u, cpu %u\n", hp_action, cpu);
> > > +        else
> > > +            pr_debug("crash hp: hp_action %u\n", hp_action);
> > > +
> > > +        /*
> > > +         * When the struct kimage is allocated, it is wiped to zero, so
> > > +         * the elfcorehdr_index_valid defaults to false. Find the
> > > +         * segment containing the elfcorehdr, if not already found.
> > > +         * This works for both the kexec_load and kexec_file_load paths.
> > > +         */
> > > +        if (!image->elfcorehdr_index_valid) {
> > > +            unsigned char *ptr;
> > > +            unsigned long mem, memsz;
> > > +            unsigned int n;
> > > +
> > > +            for (n = 0; n < image->nr_segments; n++) {
> > > +                mem = image->segment[n].mem;
> > > +                memsz = image->segment[n].memsz;
> > > +                ptr = arch_map_crash_pages(mem, memsz);
> > > +                if (ptr) {
> > > +                    /* The segment containing elfcorehdr */
> > > +                    if (memcmp(ptr, ELFMAG, SELFMAG) == 0) {
> > > +                        image->elfcorehdr_index = (int)n;
> > > +                        image->elfcorehdr_index_valid = true;
> > > +                    }
> > > +                }
> > > +                arch_unmap_crash_pages((void **)&ptr);
> > > +            }
> > > +        }
> > > +
> > > +        if (!image->elfcorehdr_index_valid) {
> > > +            pr_err("crash hp: unable to locate elfcorehdr segment");
> > > +            goto out;
> > > +        }
> > > +
> > > +        /* Needed in order for the segments to be updated */
> > > +        arch_kexec_unprotect_crashkres();
> > > +
> > > +        /* Flag to differentiate between normal load and hotplug */
> > > +        image->hotplug_event = true;
> > > +
> > > +        /* Now invoke arch-specific update handler */
> > > +        arch_crash_handle_hotplug_event(image, hp_action);
> > > +
> > > +        /* No longer handling a hotplug event */
> > > +        image->hotplug_event = false;
> > > +
> > > +        /* Change back to read-only */
> > > +        arch_kexec_protect_crashkres();
> > > +    }
> > > +
> > > +out:
> > > +    /* Release lock now that update complete */
> > > +    mutex_unlock(&kexec_mutex);
> > > +}
> > > +
> > > +static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
> > > +{
> > > +    switch (val) {
> > > +    case MEM_ONLINE:
> > > +        handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY, 0);
> > > +        break;
> > > +
> > > +    case MEM_OFFLINE:
> > > +        handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY, 0);
> > > +        break;
> > > +    }
> > > +    return NOTIFY_OK;
> > 
> > Can we pass the v (memory_notify) argument to the arch_crash_handle_hotplug_event function
> > via handle_hotplug_event?
> > 
> > Because of the way memory hotplug is handled on PowerPC, it is hard to update the elfcorehdr
> > without the memory_notify args.
> > 
> > On PowerPC the memblock data structure is used to prepare the elfcorehdr for kdump. Since the
> > notifier used for the memory hotplug crash handler is invoked before the memblock data structure
> > update happens (as depicted below), the newly prepared elfcorehdr still holds the old memory
> > regions. So if the system crashes with an obsolete elfcorehdr, makedumpfile fails to collect the vmcore.
> > 
> > Sequence of actions done on PowerPC to service a memory hot remove:
> > 
> >   Initiate memory hot remove
> >            |
> >            v
> >   offline pages
> >            |
> >            v
> >   initiate memory notify call chain
> >   for MEM_OFFLINE event.
> >   (same is used for crash update)
> >            |
> >            v
> >   prepare new elfcorehdr for kdump using
> >   memblock data structure
> >            |
> >            v
> >   update memblock data structure
> > 
> > How will passing memory_notify to the arch crash hotplug handler help?
> > 
> > memory_notify holds the start PFN and page count; with that we can get
> > the base address and size of the hot unplugged memory and use them
> > to keep the hot unplugged memory region from being added to the elfcorehdr.
> > 
> > Thanks,
> > Sourabh Jain
> > 
> 
> Sourabh, let's see what Baoquan thinks.
> 
> Baoquan, are you OK with this request? I once had these parameters to the
> crash hotplug handler and since they were unused at the time, you asked
> that I remove them, which I did.

Sorry for missing this mail. I thought the two of you were discussing
something between yourselves, and didn't notice this question to me.

I think there are two ways to solve the issue Sourabh raised:
1) make handle_hotplug_event() take and pass down the memory_notify as
Sourabh said, or the hp_action and mem_start/mem_size as Eric suggested.
I have to admit I haven't carefully checked which one is better.

2) leave the current code as is, since it's aimed at x86 only. Later
Sourabh can modify the code according to his needs on ppc. That way each
code change comes with a satisfying reason for why it is needed.

I personally prefer the 2nd way, though I'd also be happy to see the 1st
one if the code change and commit log are convincing to reviewers.
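
To make option 1 a bit more concrete, here is a rough userspace sketch of
the conversion Sourabh describes: taking the start PFN and page count from
memory_notify and turning them into a base address and size that an arch
handler could use to skip the hot-removed region. Everything prefixed
sketch_/SKETCH_ is hypothetical and only for illustration; the 4 KiB page
size is an assumption (kernel code would use PAGE_SHIFT), and only the two
struct memory_notify fields needed here are mirrored.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. Kernel code would use PAGE_SHIFT instead. */
#define SKETCH_PAGE_SHIFT 12

/* Userspace stand-in for the two struct memory_notify fields needed here. */
struct sketch_memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Hypothetical helper type: a physical range [base, base + size). */
struct sketch_range {
	unsigned long long base;
	unsigned long long size;
};

/* Convert the notifier's PFN and page count into a byte range. */
static struct sketch_range sketch_hp_range(const struct sketch_memory_notify *mn)
{
	struct sketch_range r;

	assert(mn != NULL);
	r.base = (unsigned long long)mn->start_pfn << SKETCH_PAGE_SHIFT;
	r.size = (unsigned long long)mn->nr_pages << SKETCH_PAGE_SHIFT;
	return r;
}

/* Nonzero if a candidate region overlaps the hot-removed range. */
static int sketch_overlaps(struct sketch_range removed,
			   unsigned long long base, unsigned long long size)
{
	return base < removed.base + removed.size &&
	       removed.base < base + size;
}
```

An arch handler that still sees the stale region list could, under these
assumptions, use a check like sketch_overlaps() to leave the removed range
out when it rebuilds the elfcorehdr.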

> 
> To accommodate this, how about this:
> 
> static void handle_hotplug_event(unsigned int hp_action, unsigned int cpu,
>      unsigned long mem_start, unsigned long mem_size)
> 
> For CPU events, I would just pass zeros for mem_start/size. For memory events,
> I would pass KEXEC_CRASH_HP_INVALID_CPU.
> 
> Thanks,
> eric
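
For comparison, the calling convention Eric proposes above (zeros for
mem_start/size on CPU events, KEXEC_CRASH_HP_INVALID_CPU on memory events)
could be modelled roughly as below. This is only a userspace sketch: the
sketch_ names are hypothetical, and the numeric values of the action codes
and of the invalid-CPU placeholder are assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, for illustration only; the patch has the real definitions. */
enum sketch_hp_action {
	SKETCH_HP_REMOVE_CPU,
	SKETCH_HP_ADD_CPU,
	SKETCH_HP_REMOVE_MEMORY,
	SKETCH_HP_ADD_MEMORY,
};
#define SKETCH_HP_INVALID_CPU (~0U)

/* The four values the extended handle_hotplug_event() would receive. */
struct sketch_hp_event {
	unsigned int hp_action;
	unsigned int cpu;
	unsigned long mem_start;
	unsigned long mem_size;
};

/* CPU events carry a real cpu number and zeros for the memory range. */
static struct sketch_hp_event sketch_cpu_event(unsigned int hp_action,
					       unsigned int cpu)
{
	struct sketch_hp_event ev = { hp_action, cpu, 0, 0 };

	return ev;
}

/* Memory events carry the range and the invalid-CPU placeholder. */
static struct sketch_hp_event sketch_mem_event(unsigned int hp_action,
					       unsigned long start,
					       unsigned long size)
{
	struct sketch_hp_event ev = { hp_action, SKETCH_HP_INVALID_CPU, start, size };

	return ev;
}

/* Nonzero if the unused half of the event holds the agreed placeholder. */
static int sketch_event_consistent(const struct sketch_hp_event *ev)
{
	assert(ev != NULL);
	if (ev->hp_action == SKETCH_HP_ADD_CPU ||
	    ev->hp_action == SKETCH_HP_REMOVE_CPU)
		return ev->cpu != SKETCH_HP_INVALID_CPU &&
		       ev->mem_start == 0 && ev->mem_size == 0;
	return ev->cpu == SKETCH_HP_INVALID_CPU;
}
```

The trade-off against passing memory_notify directly is that the generic
code stays free of notifier types, at the cost of placeholder arguments on
every call.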

