* [PATCH 0/2] kexec command fails after cpu hot-removing
From: Takao Indoh @ 2014-06-05  5:10 UTC
  To: horms, kexec

After CPU hot-remove, the kexec command fails with the following message:

"/sys/devices" does not exist. Sysfs does not seem to be mounted. Try
mounting sysfs.

sysfs is in fact mounted. kexec tried to open
/sys/devices/system/cpu/cpu30/crash_notes, which failed because
/sys/devices/system/cpu/cpu30 did not exist.


[Before hot-remove]

# ls /sys/devices/system/cpu/
cpu0    cpu111  cpu18  cpu31  cpu45  cpu59  cpu72  cpu86  cpuidle
cpu1    cpu112  cpu19  cpu32  cpu46  cpu6   cpu73  cpu87  intel_pstate
cpu10   cpu113  cpu2   cpu33  cpu47  cpu60  cpu74  cpu88  kernel_max
cpu100  cpu114  cpu20  cpu34  cpu48  cpu61  cpu75  cpu89  microcode
cpu101  cpu115  cpu21  cpu35  cpu49  cpu62  cpu76  cpu9   modalias
cpu102  cpu116  cpu22  cpu36  cpu5   cpu63  cpu77  cpu90  offline
cpu103  cpu117  cpu23  cpu37  cpu50  cpu64  cpu78  cpu91  online
cpu104  cpu118  cpu24  cpu38  cpu51  cpu65  cpu79  cpu92  possible
cpu105  cpu119  cpu25  cpu39  cpu52  cpu66  cpu8   cpu93  power
cpu106  cpu12   cpu26  cpu4   cpu53  cpu67  cpu80  cpu94  present
cpu107  cpu13   cpu27  cpu40  cpu54  cpu68  cpu81  cpu95  probe
cpu108  cpu14   cpu28  cpu41  cpu55  cpu69  cpu82  cpu96  release
cpu109  cpu15   cpu29  cpu42  cpu56  cpu7   cpu83  cpu97  uevent
cpu11   cpu16   cpu3   cpu43  cpu57  cpu70  cpu84  cpu98
cpu110  cpu17   cpu30  cpu44  cpu58  cpu71  cpu85  cpu99


[After hot-remove]

# ls /sys/devices/system/cpu/
cpu0   cpu16  cpu23  cpu4   cpu65  cpu72  cpu8   cpu87         modalias uevent
cpu1   cpu17  cpu24  cpu5   cpu66  cpu73  cpu80  cpu88         offline
cpu10  cpu18  cpu25  cpu6   cpu67  cpu74  cpu81  cpu89         online
cpu11  cpu19  cpu26  cpu60  cpu68  cpu75  cpu82  cpu9          possible
cpu12  cpu2   cpu27  cpu61  cpu69  cpu76  cpu83  cpuidle       power
cpu13  cpu20  cpu28  cpu62  cpu7   cpu77  cpu84  intel_pstate  present
cpu14  cpu21  cpu29  cpu63  cpu70  cpu78  cpu85  kernel_max    probe
cpu15  cpu22  cpu3   cpu64  cpu71  cpu79  cpu86  microcode     release

As you can see, cpu30 to cpu59 and cpu90 to cpu119 no longer exist; they
were removed from this system by the hot-remove operation.

The kexec command expects the cpuN directory numbers to be contiguous. For
example, if sysconf(_SC_NPROCESSORS_CONF) is 60, kexec assumes that the
directories cpu0, cpu1, cpu2, ..., cpu59 all exist. In this case the
numbering is not contiguous, which is the root cause of the problem.
These patches fix it.
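
To make the failure concrete, here is a minimal C sketch of the contiguity
assumption. It is not the kexec-tools code. The predicate cpu_dir_exists()
is hypothetical and hard-codes the layout from the [After hot-remove]
listing (cpu0 to cpu29 and cpu60 to cpu89) instead of reading sysfs.

```c
/*
 * Hypothetical stand-in for "does /sys/devices/system/cpu/cpu<N> exist?".
 * Models the [After hot-remove] listing: cpu0-cpu29 and cpu60-cpu89.
 */
static int cpu_dir_exists(int cpu)
{
	return (cpu >= 0 && cpu <= 29) || (cpu >= 60 && cpu <= 89);
}

/*
 * Walk ids 0 .. nr_cpus-1 the way a contiguous scan would.
 * Returns the first id that has no directory, or -1 if none is missing.
 */
static int first_missing_cpu(int nr_cpus)
{
	int id;

	for (id = 0; id < nr_cpus; id++) {
		if (!cpu_dir_exists(id))
			return id;	/* hole left by hot-remove */
	}
	return -1;
}
```

With this layout, first_missing_cpu(60) reports id 30, which corresponds to
the cpu30 path that kexec failed to open.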

Takao Indoh (2):
  Enumerate all /sys/devices/system/cpu/cpuN when they are discontiguous
  Fix mistaken check of stat(2) return value

 kexec/crashdump-elf.c | 5 ++++-
 kexec/crashdump.c     | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

-- 
1.9.3



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* [PATCH 1/2] Enumerate all /sys/devices/system/cpu/cpuN when they are discontiguous
From: Takao Indoh @ 2014-06-05  5:10 UTC
  To: horms, kexec

The numbering of /sys/devices/system/cpu/cpuN is not always contiguous,
for example after CPU hot-remove. This patch makes sure that every
/sys/devices/system/cpu/cpuN is handled even when the numbers are
discontiguous.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
---
 kexec/crashdump-elf.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kexec/crashdump-elf.c b/kexec/crashdump-elf.c
index 2baa357..c869347 100644
--- a/kexec/crashdump-elf.c
+++ b/kexec/crashdump-elf.c
@@ -41,6 +41,7 @@ int FUNC(struct kexec_info *info,
 	uint64_t vmcoreinfo_addr, vmcoreinfo_len;
 	int has_vmcoreinfo = 0;
 	int (*get_note_info)(int cpu, uint64_t *addr, uint64_t *len);
+	long int count_cpu;
 
 	if (xen_present())
 		nr_cpus = xen_get_nr_phys_cpus();
@@ -138,11 +139,13 @@ int FUNC(struct kexec_info *info,
 
 	/* PT_NOTE program headers. One per cpu */
 
-	for (i = 0; i < nr_cpus; i++) {
+	count_cpu = nr_cpus;
+	for (i = 0; count_cpu > 0; i++) {
 		if (get_note_info(i, &notes_addr, &notes_len) < 0) {
 			/* This cpu is not present. Skip it. */
 			continue;
 		}
+		count_cpu--;
 
 		phdr = (PHDR *) bufp;
 		bufp += sizeof(PHDR);
-- 
1.9.3
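
The loop change above can be sketched in isolation as follows. This is an
illustration under assumptions, not the patched function: example_present()
is a hypothetical stand-in for get_note_info() succeeding, and the max_id
bound is an addition of this sketch (the loop in the patch puts no upper
bound on i).

```c
/* Hypothetical predicate type: nonzero if cpu<N> is present. */
typedef int (*cpu_present_fn)(int cpu);

/* Models the layout from the cover letter: cpu0-cpu29 and cpu60-cpu89. */
static int example_present(int cpu)
{
	return (cpu >= 0 && cpu <= 29) || (cpu >= 60 && cpu <= 89);
}

/*
 * Collect the ids of the first nr_cpus present CPUs, scanning ids upward
 * and skipping holes. The scan stops once nr_cpus ids were found or id
 * exceeds max_id, so a short count cannot loop forever.
 * Returns the number of ids stored in out[].
 */
static int collect_present_cpus(cpu_present_fn present, int nr_cpus,
				int max_id, int *out)
{
	int found = 0;
	int id;

	for (id = 0; found < nr_cpus && id <= max_id; id++) {
		if (!present(id))
			continue;	/* this cpu is not present, skip it */
		out[found++] = id;
	}
	return found;
}
```

Called with nr_cpus = 60 on the example layout, the scan fills out[] with
ids 0 to 29 followed by 60 to 89, instead of stopping at id 59.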




* [PATCH 2/2] Fix mistaken check of stat(2) return value
From: Takao Indoh @ 2014-06-05  5:10 UTC
  To: horms, kexec

get_crash_notes_per_cpu() should return -1 when stat(2) returns zero
(success). The existing check is inverted and treats success as an error.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
---
 kexec/crashdump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kexec/crashdump.c b/kexec/crashdump.c
index 131e624..15c1105 100644
--- a/kexec/crashdump.c
+++ b/kexec/crashdump.c
@@ -84,7 +84,7 @@ int get_crash_notes_per_cpu(int cpu, uint64_t *addr, uint64_t *len)
 		if (fopen_errno != ENOENT)
 			die("Could not open \"%s\": %s\n", crash_notes,
 			    strerror(fopen_errno));
-		if (!stat("/sys/devices", &cpu_stat)) {
+		if (stat("/sys/devices", &cpu_stat)) {
 			stat_errno = errno;
 			if (stat_errno == ENOENT)
 				die("\"/sys/devices\" does not exist. "
-- 
1.9.3
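
As a reminder of the convention this fix relies on, stat(2) returns 0 on
success and -1 with errno set on failure. A minimal sketch follows;
path_missing_errno() is a hypothetical helper and not part of kexec-tools.

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Returns 0 if path exists, otherwise the errno value set by stat(2).
 * A nonzero return from stat(2) is the error case, so the error branch
 * must be guarded by "stat(...) != 0", not by "!stat(...)".
 */
static int path_missing_errno(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return errno;	/* failure: errno says why */
	return 0;		/* success: path exists */
}
```

A caller can then tell "sysfs is not mounted" (ENOENT for /sys/devices)
apart from "only this cpuN directory is missing" (0 for /sys/devices).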




* Re: [PATCH 0/2] kexec command fails after cpu hot-removing
From: Zhang Yanfei @ 2014-06-05  6:04 UTC
  To: Takao Indoh; +Cc: horms, kexec

I think no one had tested kexec-tools after CPU hot-remove, so the bug
went unnoticed until now.

For both patches:

Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>



-- 
Thanks.
Zhang Yanfei


* Re: [PATCH 0/2] kexec command fails after cpu hot-removing
From: WANG Chao @ 2014-06-05  7:32 UTC
  To: Takao Indoh; +Cc: horms, kexec


These two patches look good to me.

Acked-by: WANG Chao <chaowang@redhat.com>


* Re: [PATCH 0/2] kexec command fails after cpu hot-removing
From: Simon Horman @ 2014-06-05  9:08 UTC
  To: Zhang Yanfei, WANG Chao; +Cc: kexec, Takao Indoh

On Thu, Jun 05, 2014 at 02:04:41PM +0800, Zhang Yanfei wrote:
> I think no one had tested kexec-tools after CPU hot-remove, so the bug
> went unnoticed until now.
> 
> For both patches:
> 
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

On Thu, Jun 05, 2014 at 03:32:15PM +0800, WANG Chao wrote:
> These two patches look good to me.
> 
> Acked-by: WANG Chao <chaowang@redhat.com>

Thanks, applied.
