LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register initialization earlier
From: Alexander Graf @ 2013-01-31 13:21 UTC (permalink / raw)
  To: Mihai Caraman; +Cc: linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1359552584-17861-2-git-send-email-mihai.caraman@freescale.com>


On 30.01.2013, at 14:29, Mihai Caraman wrote:

> VCPU's MMUCFG register initialization should not depend on KVM_CAP_SW_TLB
> ioctl call. Move it earlier into tlb initialization phase.

Quite the contrary. The fact that there is an mfspr() in e500_mmu.c
already tells us that the code is broken. The TLB guest code should only
depend on input from the SW_TLB configuration. It's completely
orthogonal to the host capabilities.


Alex

>
> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
> ---
> arch/powerpc/kvm/e500_mmu.c |    4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
> index 5c44759..bb1b2b0 100644
> --- a/arch/powerpc/kvm/e500_mmu.c
> +++ b/arch/powerpc/kvm/e500_mmu.c
> @@ -692,8 +692,6 @@ int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
> 	vcpu_e500->gtlb_offset[0] = 0;
> 	vcpu_e500->gtlb_offset[1] = params.tlb_sizes[0];
>
> -	vcpu->arch.mmucfg = mfspr(SPRN_MMUCFG) & ~MMUCFG_LPIDSIZE;
> -
> 	vcpu->arch.tlbcfg[0] &= ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> 	if (params.tlb_sizes[0] <= 2048)
> 		vcpu->arch.tlbcfg[0] |= params.tlb_sizes[0];
> @@ -781,6 +779,8 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *vcpu_e500)
> 	if (!vcpu_e500->g2h_tlb1_map)
> 		goto err;
>
> +	vcpu->arch.mmucfg = mfspr(SPRN_MMUCFG) & ~MMUCFG_LPIDSIZE;
> +
> 	/* Init TLB configuration register */
> 	vcpu->arch.tlbcfg[0] = mfspr(SPRN_TLB0CFG) &
> 			     ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> -- 
> 1.7.4.1
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Simon Jeons @ 2013-01-31 10:38 UTC (permalink / raw)
  To: Tang Chen
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <510A3CE6.202@cn.fujitsu.com>

Hi Tang,
On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
> Hi Simon,
> 
> On 01/31/2013 04:48 PM, Simon Jeons wrote:
> > Hi Tang,
> > On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
> >
> > 1. IIUC, there is a button on machine which supports hot-remove memory,
> > then what's the difference between press button and echo to /sys?
> 
> No important difference, I think. Since I don't have the machine you are
> saying, I cannot surely answer you. :)
> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
> is just another entrance. At last, they will run into the same code.
> 
> > 2. Since kernel memory is linear mapping(I mean direct mapping part),
> > why can't put kernel direct mapping memory into one memory device, and
> > other memory into the other devices?
> 
> We cannot do that because in that way, we will lose NUMA performance.
> 
> If you know NUMA, you will understand the following example:
> 
> node0:                    node1:
>     cpu0~cpu15                cpu16~cpu31
>     memory0~memory511         memory512~memory1023
> 
> cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
> If we set direct mapping area in node0, and movable area in node1, then
> the kernel code running on cpu16~cpu31 will have to access 
> memory0~memory511.
> This is a terrible performance down.

So if NUMA is configured, kernel memory will no longer be linearly mapped? For
example:

Node 0  Node 1

0 ~ 10G 11G~14G

Is kernel memory only on Node 0? Can part of kernel memory also be on Node 1?

How big is the kernel direct mapping on x86_64? Is there a maximum limit?
It seems to be only around 896MB on x86_32.

> 
> >As you know x86_64 don't need
> > highmem, IIUC, all kernel memory will linear mapping in this case. Is my
> > idea available? If is correct, x86_32 can't implement in the same way
> > since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
> > hard to focus kernel memory on single memory device.
> 
> Sorry, I'm not quite familiar with x86_32 box.
> 
> > 3. In current implementation, if memory hotplug just need memory
> > subsystem and ACPI codes support? Or also needs firmware take part in?
> > Hope you can explain in details, thanks in advance. :)
> 
> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
> based memory migration mentioned by Liu Jiang.

Is there any material about firmware based memory migration?

> 
> So far, I only know this. :)
> 
> > 4. What's the status of memory hotplug? Apart from can't remove kernel
> > memory, other things are fully implementation?
> 
> I think the main job is done for now. And there are still bugs to fix.
> And this functionality is not stable.
> 
> Thanks. :)

^ permalink raw reply

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Tang Chen @ 2013-01-31  9:44 UTC (permalink / raw)
  To: Simon Jeons
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <1359622123.1391.19.camel@kernel>

Hi Simon,

On 01/31/2013 04:48 PM, Simon Jeons wrote:
> Hi Tang,
> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
>
> 1. IIUC, there is a button on machine which supports hot-remove memory,
> then what's the difference between press button and echo to /sys?

No important difference, I think. Since I don't have the machine you are
describing, I cannot answer for sure. :)
AFAIK, pressing the button triggers the hotplug from hardware; sysfs
is just another entry point. In the end, both run into the same code.

> 2. Since kernel memory is linear mapping(I mean direct mapping part),
> why can't put kernel direct mapping memory into one memory device, and
> other memory into the other devices?

We cannot do that because we would lose NUMA performance.

If you know NUMA, you will understand the following example:

node0:                    node1:
    cpu0~cpu15                cpu16~cpu31
    memory0~memory511         memory512~memory1023

cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
If we put the direct mapping area in node0 and the movable area in node1, then
kernel code running on cpu16~cpu31 will have to access
memory0~memory511.
This is a terrible performance loss.
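
The locality argument in this example can be modelled with a small userspace
sketch. This is not kernel code: the helper names and the even 16-CPU /
512-block split per node are assumptions taken only from the diagram above.

```c
#include <assert.h>
#include <stdbool.h>

/* Layout assumed from the example: 2 nodes, 16 CPUs and 512 blocks each. */
enum { CPUS_PER_NODE = 16, BLOCKS_PER_NODE = 512 };

/* Hypothetical helpers: map a CPU or memory block index to its node. */
static int node_of_cpu(int cpu)     { return cpu / CPUS_PER_NODE; }
static int node_of_block(int block) { return block / BLOCKS_PER_NODE; }

/* An access is local (fast) when CPU and memory block share a node. */
static bool access_is_local(int cpu, int block)
{
	return node_of_cpu(cpu) == node_of_block(block);
}

/*
 * If all kernel (direct mapping) memory lived on kernel_node, count the
 * CPUs that would have to reach it across the interconnect.
 */
static int cpus_with_remote_kernel_memory(int kernel_node, int nr_cpus)
{
	int cpu, remote = 0;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (node_of_cpu(cpu) != kernel_node)
			remote++;
	return remote;
}

/* Usage example for the layout in the diagram. */
static void numa_sketch_check(void)
{
	assert(access_is_local(16, 512));	/* cpu16 -> memory512: same node */
	assert(!access_is_local(16, 0));	/* cpu16 -> memory0: remote */
	assert(cpus_with_remote_kernel_memory(0, 32) == 16);
}
```

With kernel memory confined to node0, half of the CPUs in this layout would
reach every kernel data structure remotely, which is the cost described above.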

>As you know x86_64 don't need
> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
> idea available? If is correct, x86_32 can't implement in the same way
> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
> hard to focus kernel memory on single memory device.

Sorry, I'm not very familiar with x86_32 boxes.

> 3. In current implementation, if memory hotplug just need memory
> subsystem and ACPI codes support? Or also needs firmware take part in?
> Hope you can explain in details, thanks in advance. :)

We need the firmware to take part, such as the SRAT in the ACPI BIOS, or the
firmware-based memory migration mentioned by Liu Jiang.

So far, that is all I know. :)

> 4. What's the status of memory hotplug? Apart from can't remove kernel
> memory, other things are fully implementation?

I think the main work is done for now, but there are still bugs to fix,
and this functionality is not yet stable.

Thanks. :)

^ permalink raw reply

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Simon Jeons @ 2013-01-31  8:48 UTC (permalink / raw)
  To: Tang Chen
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <510A18FA.2010107@cn.fujitsu.com>

Hi Tang,
On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:

1. IIUC, there is a button on machines that support memory hot-remove;
what's the difference between pressing the button and echoing to /sys?
2. Since kernel memory is linearly mapped (I mean the direct mapping part),
why can't we put the kernel direct mapping memory into one memory device and
the other memory into other devices? As you know, x86_64 doesn't need
highmem; IIUC, all kernel memory is linearly mapped in this case. Is my
idea feasible? If so, x86_32 can't be implemented in the same way,
since highmem (kmap/kmap_atomic/vmalloc) can map any address, so it's
hard to confine kernel memory to a single memory device.
3. In the current implementation, does memory hotplug need only memory
subsystem and ACPI code support, or does firmware also need to take part?
I hope you can explain in detail, thanks in advance. :)
4. What's the status of memory hotplug? Apart from not being able to remove
kernel memory, is everything else fully implemented?


> On 01/31/2013 02:19 PM, Simon Jeons wrote:
> > Hi Tang,
> > On Thu, 2013-01-31 at 11:31 +0800, Tang Chen wrote:
> >> Hi Simon,
> >>
> >> Please see below. :)
> >>
> >> On 01/31/2013 09:22 AM, Simon Jeons wrote:
> >>>
> >>> Sorry, I still confuse. :(
> >>> update node_states[N_NORMAL_MEMORY] to node_states[N_MEMORY] or
> >>> node_states[N_NORMAL_MEMOR] present 0...ZONE_MOVABLE?
> >>>
> >>> node_states is what? node_states[N_NORMAL_MEMOR] or
> >>> node_states[N_MEMORY]?
> >>
> >> Are you asking what node_states[] is ?
> >>
> >> node_states[] is an array of nodemask,
> >>
> >>       extern nodemask_t node_states[NR_NODE_STATES];
> >>
> >> For example, node_states[N_NORMAL_MEMOR] represents which nodes have
> >> normal memory.
> >> If N_MEMORY == N_HIGH_MEMORY == N_NORMAL_MEMORY, node_states[N_MEMORY] is
> >> node_states[N_NORMAL_MEMOR]. So it represents which nodes have 0 ...
> >> ZONE_MOVABLE.
> >>
> >
> > Sorry, how can nodes_state[N_NORMAL_MEMORY] represents a node have 0 ...
> > *ZONE_MOVABLE*, the comment of enum nodes_states said that
> > N_NORMAL_MEMORY just means the node has regular memory.
> >
> 
> Hi Simon,
> 
> Let's say it in this way.
> 
> If we don't have CONFIG_HIGHMEM, N_HIGH_MEMORY == N_NORMAL_MEMORY. We 
> don't have a separate
> macro to represent highmem because we don't have highmem.
> This is easy to understand, right ?
> 
> Now, think it just like above:
> If we don't have CONFIG_MOVABLE_NODE, N_MEMORY == N_HIGH_MEMORY == 
> N_NORMAL_MEMORY.
> This means we don't allow a node to have only movable memory, not we 
> don't have movable memory.
> A node could have normal memory and movable memory. So 
> nodes_state[N_NORMAL_MEMORY] represents
> a node have 0 ... *ZONE_MOVABLE*.
> 
> I think the point is: CONFIG_MOVABLE_NODE means we allow a node to have 
> only movable memory.
> So without CONFIG_MOVABLE_NODE, it doesn't mean a node cannot have 
> movable memory. It means
> the node cannot have only movable memory. It can have normal memory and 
> movable memory.
> 
> 1) With CONFIG_MOVABLE_NODE:
>     N_NORMAL_MEMORY: nodes who have normal memory.
>                      normal memory only
>                      normal and highmem
>                      normal and highmem and movablemem
>                      normal and movablemem
>     N_MEMORY: nodes who has memory (any memory)
>                      normal memory only
>                      normal and highmem
>                      normal and highmem and movablemem
>                      normal and movablemem ---------------- We can have 
> movablemem.
>                      highmem only -------------------------
>                      highmem and movablemem ---------------
>                      movablemem only ---------------------- We can have 
> movablemem only.    ***
> 
> 2) With out CONFIG_MOVABLE_NODE:
>     N_MEMORY == N_NORMAL_MEMORY: (Here, I omit N_HIGH_MEMORY)
>                      normal memory only
>                      normal and highmem
>                      normal and highmem and movablemem
>                      normal and movablemem ---------------- We can have 
> movablemem.
>                      No movablemem only ------------------- We cannot 
> have movablemem only. ***
> 
> The semantics is not that clear here. So we can only try to understand 
> it from the code where
> we use N_MEMORY. :)
> 
> That is my understanding of this.
> 
> Thanks. :)
> 
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [GIT PULL 00/21] perf/core improvements and fixes
From: Ingo Molnar @ 2013-01-31  9:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Frederic Weisbecker, Stephane Eranian,
	arnaldo.melo, linuxppc-dev, Paul Mackerras, Jiri Olsa,
	Andrea Arcangeli, Andi Kleen, Hugh Dickins, Mel Gorman,
	Michael Ellerman, Borislav Petkov, Thomas Jarosch, Rik van Riel,
	Corey Ashford, Namhyung Kim, Anton Blanchard, Steven Rostedt,
	Arnaldo Carvalho de Melo, Sukadev Bhattiprolu, Peter Hurley,
	Mike Galbraith, linux-kernel, David Ahern, Andrew Morton
In-Reply-To: <1359557222-17547-1-git-send-email-acme@infradead.org>


* Arnaldo Carvalho de Melo <acme@infradead.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling.
> 
> 	Namhyung, Jiri, the 'group report' patches are at acme/perf/group,
> will send a pull req later if it survives further testing.
> 
> - Arnaldo
> 
> The following changes since commit a2d28d0c198b65fac28ea6212f5f8edc77b29c27:
> 
>   Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2013-01-25 11:34:00 +0100)
> 
> are available in the git repository at:
> 
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux tags/perf-core-for-mingo
> 
> for you to fetch changes up to 5809fde040de2afa477a6c593ce2e8fd2c11d9d3:
> 
>   perf header: Fix double fclose() on do_write(fd, xxx) failure (2013-01-30 10:40:44 -0300)
> 
> ----------------------------------------------------------------
> perf/core improvements and fixes:
> 
> . Fix some leaks in exit paths.
> 
> . Use memdup where applicable
> 
> . Remove some die() calls, allowing callers to handle exit paths
>   gracefully.
> 
> . Correct typo in tools Makefile, fix from Borislav Petkov.
> 
> . Add 'perf bench numa mem' NUMA performance measurement suite, from Ingo Molnar.
> 
> . Handle dynamic array's element size properly, fix from Jiri Olsa.
> 
> . Fix memory leaks on evsel->counts, from Namhyung Kim.
> 
> . Make numa benchmark optional, allowing the build in machines where required
>   numa libraries are not present, fix from Peter Hurley.
> 
> . Add interval printing in 'perf stat', from Stephane Eranian.
> 
> . Fix compile warnings in tests/attr.c, from Sukadev Bhattiprolu.
> 
> . Fix double free, pclose instead of fclose, leaks and double fclose errors
>   found with the cppcheck tool, from Thomas Jarosch.
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Arnaldo Carvalho de Melo (8):
>       perf tools: Stop using 'self' in strlist
>       perf tools: Stop using 'self' in map.[ch]
>       perf tools: Use memdup in map__clone
>       perf kmem: Use memdup()
>       perf header: Stop using die() calls when processing tracing data
>       perf ui browser: Free browser->helpline() on ui_browser__hide()
>       perf tests: Call machine__exit in the vmlinux matches kallsyms test
>       perf tests: Fix leaks on PERF_RECORD_* test
> 
> Borislav Petkov (1):
>       tools: Correct typo in tools Makefile
> 
> Ingo Molnar (1):
>       perf: Add 'perf bench numa mem' NUMA performance measurement suite
> 
> Jiri Olsa (1):
>       tools lib traceevent: Handle dynamic array's element size properly
> 
> Namhyung Kim (1):
>       perf evsel: Fix memory leaks on evsel->counts
> 
> Peter Hurley (1):
>       perf tools: Make numa benchmark optional
> 
> Stephane Eranian (2):
>       perf evsel: Add prev_raw_count field
>       perf stat: Add interval printing
> 
> Sukadev Bhattiprolu (1):
>       perf tools, powerpc: Fix compile warnings in tests/attr.c
> 
> Thomas Jarosch (5):
>       perf tools: Fix possible double free on error
>       perf sort: Use pclose() instead of fclose() on pipe stream
>       perf tools: Fix memory leak on error
>       perf header: Fix memory leak for the "Not caching a kptr_restrict'ed /proc/kallsyms" case
>       perf header: Fix double fclose() on do_write(fd, xxx) failure
> 
>  tools/Makefile                           |    2 +-
>  tools/lib/traceevent/event-parse.c       |   39 +-
>  tools/perf/Documentation/perf-stat.txt   |    4 +
>  tools/perf/Makefile                      |   13 +
>  tools/perf/arch/common.c                 |    1 +
>  tools/perf/bench/bench.h                 |    1 +
>  tools/perf/bench/numa.c                  | 1731 ++++++++++++++++++++++++++++++
>  tools/perf/builtin-bench.c               |   17 +
>  tools/perf/builtin-kmem.c                |    6 +-
>  tools/perf/builtin-stat.c                |  158 ++-
>  tools/perf/config/feature-tests.mak      |   11 +
>  tools/perf/tests/attr.c                  |    5 +
>  tools/perf/tests/open-syscall-all-cpus.c |    1 +
>  tools/perf/tests/perf-record.c           |   12 +-
>  tools/perf/tests/vmlinux-kallsyms.c      |    4 +-
>  tools/perf/ui/browser.c                  |    2 +
>  tools/perf/util/event.c                  |    4 +-
>  tools/perf/util/evsel.c                  |   31 +
>  tools/perf/util/evsel.h                  |    2 +
>  tools/perf/util/header.c                 |   25 +-
>  tools/perf/util/map.c                    |  118 +-
>  tools/perf/util/map.h                    |   24 +-
>  tools/perf/util/sort.c                   |    7 +-
>  tools/perf/util/strlist.c                |   54 +-
>  tools/perf/util/strlist.h                |   42 +-
>  25 files changed, 2154 insertions(+), 160 deletions(-)
>  create mode 100644 tools/perf/bench/numa.c

Pulled, thanks a lot Arnaldo!

	Ingo

^ permalink raw reply

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Simon Jeons @ 2013-01-31  8:17 UTC (permalink / raw)
  To: Tang Chen
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <510A18FA.2010107@cn.fujitsu.com>

Hi Tang,
On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
> On 01/31/2013 02:19 PM, Simon Jeons wrote:
> > Hi Tang,
> > On Thu, 2013-01-31 at 11:31 +0800, Tang Chen wrote:
> >> Hi Simon,
> >>
> >> Please see below. :)
> >>
> >> On 01/31/2013 09:22 AM, Simon Jeons wrote:
> >>>
> >>> Sorry, I still confuse. :(
> >>> update node_states[N_NORMAL_MEMORY] to node_states[N_MEMORY] or
> >>> node_states[N_NORMAL_MEMOR] present 0...ZONE_MOVABLE?
> >>>
> >>> node_states is what? node_states[N_NORMAL_MEMOR] or
> >>> node_states[N_MEMORY]?
> >>
> >> Are you asking what node_states[] is ?
> >>
> >> node_states[] is an array of nodemask,
> >>
> >>       extern nodemask_t node_states[NR_NODE_STATES];
> >>
> >> For example, node_states[N_NORMAL_MEMOR] represents which nodes have
> >> normal memory.
> >> If N_MEMORY == N_HIGH_MEMORY == N_NORMAL_MEMORY, node_states[N_MEMORY] is
> >> node_states[N_NORMAL_MEMOR]. So it represents which nodes have 0 ...
> >> ZONE_MOVABLE.
> >>
> >
> > Sorry, how can nodes_state[N_NORMAL_MEMORY] represents a node have 0 ...
> > *ZONE_MOVABLE*, the comment of enum nodes_states said that
> > N_NORMAL_MEMORY just means the node has regular memory.
> >
> 
> Hi Simon,
> 
> Let's say it in this way.
> 
> If we don't have CONFIG_HIGHMEM, N_HIGH_MEMORY == N_NORMAL_MEMORY. We 
> don't have a separate
> macro to represent highmem because we don't have highmem.
> This is easy to understand, right ?
> 
> Now, think it just like above:
> If we don't have CONFIG_MOVABLE_NODE, N_MEMORY == N_HIGH_MEMORY == 
> N_NORMAL_MEMORY.
> This means we don't allow a node to have only movable memory, not we 
> don't have movable memory.
> A node could have normal memory and movable memory. So 
> nodes_state[N_NORMAL_MEMORY] represents
> a node have 0 ... *ZONE_MOVABLE*.
> 
> I think the point is: CONFIG_MOVABLE_NODE means we allow a node to have 
> only movable memory.
> So without CONFIG_MOVABLE_NODE, it doesn't mean a node cannot have 
> movable memory. It means
> the node cannot have only movable memory. It can have normal memory and 
> movable memory.
> 
> 1) With CONFIG_MOVABLE_NODE:
>     N_NORMAL_MEMORY: nodes who have normal memory.
>                      normal memory only
>                      normal and highmem
>                      normal and highmem and movablemem
>                      normal and movablemem
>     N_MEMORY: nodes who has memory (any memory)
>                      normal memory only
>                      normal and highmem
>                      normal and highmem and movablemem
>                      normal and movablemem ---------------- We can have 
> movablemem.
>                      highmem only -------------------------
>                      highmem and movablemem ---------------
>                      movablemem only ---------------------- We can have 
> movablemem only.    ***
> 
> 2) With out CONFIG_MOVABLE_NODE:
>     N_MEMORY == N_NORMAL_MEMORY: (Here, I omit N_HIGH_MEMORY)
>                      normal memory only
>                      normal and highmem
>                      normal and highmem and movablemem
>                      normal and movablemem ---------------- We can have 
> movablemem.
>                      No movablemem only ------------------- We cannot 
> have movablemem only. ***
> 
> The semantics is not that clear here. So we can only try to understand 
> it from the code where
> we use N_MEMORY. :)
> 
> That is my understanding of this.

Thanks for the clarification, it is very clear now. :)

> 
> Thanks. :)

^ permalink raw reply

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Tang Chen @ 2013-01-31  7:10 UTC (permalink / raw)
  To: Simon Jeons
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <1359613162.1587.0.camel@kernel>

On 01/31/2013 02:19 PM, Simon Jeons wrote:
> Hi Tang,
> On Thu, 2013-01-31 at 11:31 +0800, Tang Chen wrote:
>> Hi Simon,
>>
>> Please see below. :)
>>
>> On 01/31/2013 09:22 AM, Simon Jeons wrote:
>>>
>>> Sorry, I still confuse. :(
>>> update node_states[N_NORMAL_MEMORY] to node_states[N_MEMORY] or
>>> node_states[N_NORMAL_MEMOR] present 0...ZONE_MOVABLE?
>>>
>>> node_states is what? node_states[N_NORMAL_MEMOR] or
>>> node_states[N_MEMORY]?
>>
>> Are you asking what node_states[] is ?
>>
>> node_states[] is an array of nodemask,
>>
>>       extern nodemask_t node_states[NR_NODE_STATES];
>>
>> For example, node_states[N_NORMAL_MEMOR] represents which nodes have
>> normal memory.
>> If N_MEMORY == N_HIGH_MEMORY == N_NORMAL_MEMORY, node_states[N_MEMORY] is
>> node_states[N_NORMAL_MEMOR]. So it represents which nodes have 0 ...
>> ZONE_MOVABLE.
>>
>
> Sorry, how can nodes_state[N_NORMAL_MEMORY] represents a node have 0 ...
> *ZONE_MOVABLE*, the comment of enum nodes_states said that
> N_NORMAL_MEMORY just means the node has regular memory.
>

Hi Simon,

Let me put it this way.

If we don't have CONFIG_HIGHMEM, N_HIGH_MEMORY == N_NORMAL_MEMORY. We
don't have a separate macro to represent highmem because we don't have
highmem. This is easy to understand, right?

Now, think of it the same way:
If we don't have CONFIG_MOVABLE_NODE, N_MEMORY == N_HIGH_MEMORY ==
N_NORMAL_MEMORY.
This means we don't allow a node to have only movable memory; it does not
mean we don't have movable memory.
A node could have normal memory and movable memory. So
node_states[N_NORMAL_MEMORY] represents the nodes that have memory in
zones 0 ... *ZONE_MOVABLE*.

I think the point is: CONFIG_MOVABLE_NODE means we allow a node to have
only movable memory.
So the absence of CONFIG_MOVABLE_NODE doesn't mean a node cannot have
movable memory. It means the node cannot have only movable memory. It can
have normal memory and movable memory.

1) With CONFIG_MOVABLE_NODE:
    N_NORMAL_MEMORY: nodes that have normal memory.
                     normal memory only
                     normal and highmem
                     normal and highmem and movablemem
                     normal and movablemem
    N_MEMORY: nodes that have memory (any memory)
                     normal memory only
                     normal and highmem
                     normal and highmem and movablemem
                     normal and movablemem ---------------- We can have movablemem.
                     highmem only -------------------------
                     highmem and movablemem ---------------
                     movablemem only ---------------------- We can have movablemem only.    ***

2) Without CONFIG_MOVABLE_NODE:
    N_MEMORY == N_NORMAL_MEMORY: (Here, I omit N_HIGH_MEMORY)
                     normal memory only
                     normal and highmem
                     normal and highmem and movablemem
                     normal and movablemem ---------------- We can have movablemem.
                     No movablemem only ------------------- We cannot have movablemem only. ***

The semantics are not that clear here, so we can only try to understand
them from the code where we use N_MEMORY. :)

That is my understanding of this.
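
The two cases can also be written down as a small userspace model. This is
only a sketch of the reading given in this mail, not the kernel's nodemask
implementation: struct node_layout, node_in_state() and the movable_node flag
(standing in for CONFIG_MOVABLE_NODE) are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Names mirror the thread; this enum is a model, not the kernel header. */
enum node_state { N_NORMAL_MEMORY, N_HIGH_MEMORY, N_MEMORY };

/* Which kinds of memory a node has (hypothetical description of a node). */
struct node_layout {
	bool normal;	/* pages in a normal (lowmem) zone */
	bool high;	/* highmem pages */
	bool movable;	/* ZONE_MOVABLE pages */
};

/*
 * Would a node with this layout be set in node_states[state]?
 * movable_node models CONFIG_MOVABLE_NODE: when it is off, N_MEMORY is
 * only an alias of N_HIGH_MEMORY, so movable memory alone does not count.
 */
static bool node_in_state(struct node_layout n, enum node_state state,
			  bool movable_node)
{
	switch (state) {
	case N_NORMAL_MEMORY:
		return n.normal;
	case N_HIGH_MEMORY:
		return n.normal || n.high;
	case N_MEMORY:
		if (movable_node)
			return n.normal || n.high || n.movable;
		return n.normal || n.high;
	}
	return false;
}

/* Usage example: the two rows marked *** in the tables. */
static void node_states_sketch_check(void)
{
	struct node_layout movable_only = { false, false, true };

	assert(node_in_state(movable_only, N_MEMORY, true));
	assert(!node_in_state(movable_only, N_MEMORY, false));
}
```

A node with both normal and movable memory is reported in N_NORMAL_MEMORY and
N_MEMORY under either setting; only the movable-only node changes.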

Thanks. :)

^ permalink raw reply

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Simon Jeons @ 2013-01-31  6:19 UTC (permalink / raw)
  To: Tang Chen
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <5109E59F.5080104@cn.fujitsu.com>

Hi Tang,
On Thu, 2013-01-31 at 11:31 +0800, Tang Chen wrote:
> Hi Simon,
> 
> Please see below. :)
> 
> On 01/31/2013 09:22 AM, Simon Jeons wrote:
> >
> > Sorry, I still confuse. :(
> > update node_states[N_NORMAL_MEMORY] to node_states[N_MEMORY] or
> > node_states[N_NORMAL_MEMOR] present 0...ZONE_MOVABLE?
> >
> > node_states is what? node_states[N_NORMAL_MEMOR] or
> > node_states[N_MEMORY]?
> 
> Are you asking what node_states[] is ?
> 
> node_states[] is an array of nodemask,
> 
>      extern nodemask_t node_states[NR_NODE_STATES];
> 
> For example, node_states[N_NORMAL_MEMOR] represents which nodes have 
> normal memory.
> If N_MEMORY == N_HIGH_MEMORY == N_NORMAL_MEMORY, node_states[N_MEMORY] is
> node_states[N_NORMAL_MEMOR]. So it represents which nodes have 0 ... 
> ZONE_MOVABLE.
> 

Sorry, how can node_states[N_NORMAL_MEMORY] represent nodes that have 0 ...
*ZONE_MOVABLE*? The comment on enum node_states says that
N_NORMAL_MEMORY just means the node has regular memory.

> 
> > Why check !z1->wait_table in function move_pfn_range_left and function
> > __add_zone? I think zone->wait_table is initialized in
> > free_area_init_core, which will be called during system initialization
> > and hotadd_new_pgdat path.
> 
> I think,
> 
> free_area_init_core(), in the for loop,
>   |--> size = zone_spanned_pages_in_node();
>   |--> if (!size)
>                continue;  ----------------  If zone is empty, we jump 
> out the for loop.
>   |--> init_currently_empty_zone()
> 
> So, if the zone is empty, wait_table is not initialized.
> 
> In move_pfn_range_left(z1, z2), we move pages from z2 to z1. But z1 
> could be empty.
> So we need to check it and initialize z1->wait_table because we are 
> moving pages into it.

thanks.

> 
> 
> > There is a zone populated check in function online_pages. But zone is
> > populated in free_area_init_core which will be called during system
> > initialization and hotadd_new_pgdat path. Why still need this check?
> >
> 
> Because we could also rebuild zone list when we offline pages.
> 
> __offline_pages()
>   |--> zone->present_pages -= offlined_pages;
>   |--> if (!populated_zone(zone)) {
>                build_all_zonelists(NULL, NULL);
>        }
> 
> If the zone is empty, and other zones on the same node is not empty, the 
> node
> won't be offlined, and next time we online pages of this zone, the pgdat 
> won't
> be initialized again, and we need to check populated_zone(zone) when 
> onlining
> pages.

thanks.

> 
> Thanks. :)

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Greg KH @ 2013-01-31  5:24 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-s390, jiang.liu, wency, linux-mm, yinghai, linux-kernel,
	Rafael J. Wysocki, linux-acpi, isimatu.yasuaki, srivatsa.bhat,
	guohanjun, bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <1359594912.15120.85.camel@misato.fc.hp.com>

On Wed, Jan 30, 2013 at 06:15:12PM -0700, Toshi Kani wrote:
> > Please make it a "real" pointer, and not a void *, those shouldn't be
> > used at all if possible.
> 
> How about changing the "void *handle" to acpi_dev_node below?   
> 
>    struct acpi_dev_node    acpi_node;
> 
> Basically, it has the same challenge as struct device, which uses
> acpi_dev_node as well.  We can add other FW node when needed (just like
> device also has *of_node).

That sounds good to me.
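
As a rough illustration of the change agreed on here, the sketch below swaps
an untyped pointer member for a named wrapper type. It is a userspace model
under assumptions: struct acpi_dev_node is a stand-in assumed to wrap a single
handle, and struct hp_request_sketch with its helpers is hypothetical and is
not the structure from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel type; assumed here to wrap one firmware handle. */
struct acpi_dev_node {
	void *handle;
};

/* Hypothetical hotplug request; only the acpi_node member is from the thread. */
struct hp_request_sketch {
	int event;
	struct acpi_dev_node acpi_node;	/* previously: void *handle */
	/* another firmware node type could be added next to it when needed */
};

/* Record which ACPI object the request refers to. */
static void hp_request_set_acpi(struct hp_request_sketch *req, void *acpi_handle)
{
	req->acpi_node.handle = acpi_handle;
}

/* Non-zero if the request carries an ACPI handle. */
static int hp_request_has_acpi(const struct hp_request_sketch *req)
{
	return req->acpi_node.handle != NULL;
}

/* Usage example. */
static void hp_request_sketch_check(void)
{
	struct hp_request_sketch req = { 0 };
	int fake_acpi_object = 0;

	assert(!hp_request_has_acpi(&req));
	hp_request_set_acpi(&req, &fake_acpi_object);
	assert(hp_request_has_acpi(&req));
}
```

The member's type now states which firmware interface the handle belongs to,
so a second firmware node can sit beside it without overloading one pointer.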

^ permalink raw reply

* [PATCH][UPSTEAM] powerpc/mpic: add irq_set_wake support
From: Wang Dongsheng @ 2013-01-31  3:10 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Wang Dongsheng

Add irq_set_wake support by setting IRQF_NO_SUSPEND in desc->action->flags,
so that the wakeup interrupt is not disabled in suspend_device_irqs().

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
---
 arch/powerpc/sysdev/mpic.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index 9c6e535..2ed0220 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -920,6 +920,18 @@ int mpic_set_irq_type(struct irq_data *d, unsigned int flow_type)
 	return IRQ_SET_MASK_OK_NOCOPY;
 }

+static int mpic_irq_set_wake(struct irq_data *d, unsigned int on)
+{
+	struct irq_desc *desc = container_of(d, struct irq_desc, irq_data);
+
+	if (on)
+		desc->action->flags |= IRQF_NO_SUSPEND;
+	else
+		desc->action->flags &= ~IRQF_NO_SUSPEND;
+
+	return 0;
+}
+
 void mpic_set_vector(unsigned int virq, unsigned int vector)
 {
 	struct mpic *mpic = mpic_from_irq(virq);
@@ -957,6 +969,7 @@ static struct irq_chip mpic_irq_chip = {
 	.irq_unmask	= mpic_unmask_irq,
 	.irq_eoi	= mpic_end_irq,
 	.irq_set_type	= mpic_set_irq_type,
+	.irq_set_wake	= mpic_irq_set_wake,
 };

 #ifdef CONFIG_SMP
@@ -971,6 +984,7 @@ static struct irq_chip mpic_tm_chip = {
 	.irq_mask	= mpic_mask_tm,
 	.irq_unmask	= mpic_unmask_tm,
 	.irq_eoi	= mpic_end_irq,
+	.irq_set_wake	= mpic_irq_set_wake,
 };

 #ifdef CONFIG_MPIC_U3_HT_IRQS
@@ -981,6 +995,7 @@ static struct irq_chip mpic_irq_ht_chip = {
 	.irq_unmask	= mpic_unmask_ht_irq,
 	.irq_eoi	= mpic_end_ht_irq,
 	.irq_set_type	= mpic_set_irq_type,
+	.irq_set_wake	= mpic_irq_set_wake,
 };
 #endif /* CONFIG_MPIC_U3_HT_IRQS */

--
1.7.5.1

^ permalink raw reply related

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Tang Chen @ 2013-01-31  3:31 UTC (permalink / raw)
  To: Simon Jeons
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <1359595344.1557.13.camel@kernel>

Hi Simon,

Please see below. :)

On 01/31/2013 09:22 AM, Simon Jeons wrote:
>
> Sorry, I am still confused. :(
> Do you mean node_states[N_NORMAL_MEMORY] is updated to
> node_states[N_MEMORY], or that node_states[N_NORMAL_MEMORY] represents
> 0...ZONE_MOVABLE?
>
> Which node_states entry is this: node_states[N_NORMAL_MEMORY] or
> node_states[N_MEMORY]?

Are you asking what node_states[] is?

node_states[] is an array of nodemasks:

     extern nodemask_t node_states[NR_NODE_STATES];

For example, node_states[N_NORMAL_MEMORY] represents which nodes have
normal memory.
If N_MEMORY == N_HIGH_MEMORY == N_NORMAL_MEMORY, node_states[N_MEMORY] is
node_states[N_NORMAL_MEMORY]. So it represents which nodes have zones
0 ... ZONE_MOVABLE.
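
To make the aliasing concrete, here is a rough userspace sketch (a toy
model, not the kernel code). It assumes a build with neither CONFIG_HIGHMEM
nor CONFIG_MOVABLE_NODE, models nodemask_t as a 64-bit mask, and uses
simplified versions of node_set_state()/node_state():

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: at most 64 nodes, one bit per node. */
typedef unsigned long long nodemask_t;

/*
 * Assumed build: no CONFIG_HIGHMEM and no CONFIG_MOVABLE_NODE, so the
 * three "has memory" states share one array slot.
 */
enum node_states {
	N_POSSIBLE,
	N_ONLINE,
	N_NORMAL_MEMORY,
	N_HIGH_MEMORY = N_NORMAL_MEMORY,
	N_MEMORY = N_HIGH_MEMORY,
	N_CPU,
	NR_NODE_STATES
};

static nodemask_t node_states[NR_NODE_STATES];

/* Mark node nid as being in the given state. */
static void node_set_state(int nid, enum node_states state)
{
	assert(nid >= 0 && nid < 64);
	node_states[state] |= 1ULL << nid;
}

/* Test whether node nid is in the given state. */
static bool node_state(int nid, enum node_states state)
{
	assert(nid >= 0 && nid < 64);
	return (node_states[state] >> nid) & 1ULL;
}
```

With this layout, setting N_MEMORY for a node is the same store as setting
N_NORMAL_MEMORY for it, which is why only one entry needs updating.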

> Why check !z1->wait_table in function move_pfn_range_left and function
> __add_zone? I think zone->wait_table is initialized in
> free_area_init_core, which will be called during system initialization
> and hotadd_new_pgdat path.

I think,

free_area_init_core(), in the for loop,
  |--> size = zone_spanned_pages_in_node();
  |--> if (!size)
               continue;  ----------------  If the zone is empty, we skip
                                            init_currently_empty_zone().
  |--> init_currently_empty_zone()

So, if the zone is empty, wait_table is not initialized.

In move_pfn_range_left(z1, z2), we move pages from z2 to z1. But z1 could
be empty, so we need to check it and initialize z1->wait_table because we
are moving pages into it.
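
A minimal userspace sketch of that reasoning, under stated assumptions:
struct zone is cut down to two fields, wait_table is a plain pointer, and
init_zone(), init_all_zones() and move_pages_left() are hypothetical
stand-ins for init_currently_empty_zone(), the loop in
free_area_init_core() and move_pfn_range_left().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct zone; only the fields the argument needs. */
struct zone {
	unsigned long spanned_pages;
	int *wait_table;	/* NULL until the zone is first initialized */
};

/* Hypothetical stand-in for init_currently_empty_zone(). */
static int init_zone(struct zone *z)
{
	z->wait_table = calloc(1, sizeof(*z->wait_table));
	return z->wait_table ? 0 : -1;
}

/* Boot-time pass: empty zones are skipped, so wait_table stays NULL. */
static void init_all_zones(struct zone *zones, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (!zones[i].spanned_pages)
			continue;
		init_zone(&zones[i]);
	}
}

/* Move nr_pages from z2 into z1, initializing z1 first if it was skipped. */
static int move_pages_left(struct zone *z1, struct zone *z2,
			   unsigned long nr_pages)
{
	assert(nr_pages <= z2->spanned_pages);
	if (!z1->wait_table && init_zone(z1))
		return -1;
	z1->spanned_pages += nr_pages;
	z2->spanned_pages -= nr_pages;
	return 0;
}
```

Without the !z1->wait_table check, the first pages moved into a zone that
was empty at boot would land in a zone that was never set up.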

> There is a zone populated check in function online_pages. But zone is
> populated in free_area_init_core which will be called during system
> initialization and hotadd_new_pgdat path. Why still need this check?
>

Because we could also rebuild zone list when we offline pages.

__offline_pages()
  |--> zone->present_pages -= offlined_pages;
  |--> if (!populated_zone(zone)) {
               build_all_zonelists(NULL, NULL);
       }

If the zone is empty, and other zones on the same node are not empty, the
node won't be offlined. The next time we online pages of this zone, the
pgdat won't be initialized again, so we need to check populated_zone(zone)
when onlining pages.
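
Roughly, as a userspace toy (not the kernel code): the in_zonelist flag,
rebuild_zonelist(), toy_offline_pages() and toy_online_pages() below are
made-up names that stand in for the real zonelist and for
build_all_zonelists(), __offline_pages() and online_pages().

```c
#include <assert.h>
#include <stdbool.h>

struct zone {
	unsigned long present_pages;
	bool in_zonelist;	/* toy flag: zone is visible to the allocator */
};

static bool populated_zone(const struct zone *z)
{
	return z->present_pages != 0;
}

/* Hypothetical stand-in for build_all_zonelists(): list populated zones. */
static void rebuild_zonelist(struct zone *z)
{
	z->in_zonelist = populated_zone(z);
}

static void toy_offline_pages(struct zone *z, unsigned long nr)
{
	assert(nr <= z->present_pages);
	z->present_pages -= nr;
	if (!populated_zone(z))
		rebuild_zonelist(z);	/* zone drops out of the zonelist */
}

static void toy_online_pages(struct zone *z, unsigned long nr)
{
	/* Must be sampled before present_pages changes. */
	bool need_rebuild = !populated_zone(z);

	z->present_pages += nr;
	if (need_rebuild)
		rebuild_zonelist(z);	/* zone comes back into the zonelist */
}
```

So the populated check in the online path covers a zone that was emptied
by an earlier offline, not only a zone that was empty at boot.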

Thanks. :)

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-01-31  2:57 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, jiang.liu, wency, linux-mm, yinghai, linux-kernel,
	rjw, linux-acpi, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <20130130045830.GH30002@kroah.com>

On Tue, 2013-01-29 at 23:58 -0500, Greg KH wrote:
> On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
> > +/*
> > + * Hot-plug device information
> > + */
> 
> Again, stop it with the "generic" hotplug term here, and everywhere
> else.  You are doing a very _specific_ type of hotplug devices, so spell
> it out.  We've worked hard to hotplug _everything_ in Linux, you are
> going to confuse a lot of people with this type of terms.

Agreed.  I will clarify in all places.

> > +union shp_dev_info {
> > +	struct shp_cpu {
> > +		u32		cpu_id;
> > +	} cpu;
> 
> What is this?  Why not point to the system device for the cpu?

This info is used to on-line a new CPU and create its system/cpu device.
In other words, a system/cpu device is created as a result of CPU
hotplug.

> > +	struct shp_memory {
> > +		int		node;
> > +		u64		start_addr;
> > +		u64		length;
> > +	} mem;
> 
> Same here, why not point to the system device?

Same as above.

> > +	struct shp_hostbridge {
> > +	} hb;
> > +
> > +	struct shp_node {
> > +	} node;
> 
> What happened here with these?  Empty structures?  Huh?

They are place holders for now.  PCI bridge hot-plug and node hot-plug
are still very much work in progress, so I have not integrated them into
this framework yet.

> > +};
> > +
> > +struct shp_device {
> > +	struct list_head	list;
> > +	struct device		*device;
> 
> No, make it a "real" device, embed the device into it.

This device pointer is used to send KOBJ_ONLINE/OFFLINE event during CPU
online/offline operation in order to maintain the current behavior.  CPU
online/offline operation only changes the state of CPU, so its
system/cpu device continues to be present before and after an operation.
(Whereas, CPU hot-add/delete operation creates or removes a system/cpu
device.)  So, this "*device" needs to be a pointer to reference an
existing device that is to be on-lined/off-lined.

> But, again, I'm going to ask why you aren't using the existing cpu /
> memory / bridge / node devices that we have in the kernel.  Please use
> them, or give me a _really_ good reason why they will not work.

We cannot use the existing system devices or ACPI devices here.  During
hot-plug, the ACPI handler sets this shp_device info, so that cpu and memory
handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
device information in a platform-neutral way.  During hot-add, we first
create an ACPI device node (i.e. device under /sys/bus/acpi/devices),
but platform-neutral modules cannot use them as they are ACPI-specific.
Also, its system device (i.e. device under /sys/devices/system) is not
created until the hot-add operation completes.

> > +	enum shp_class		class;
> > +	union shp_dev_info	info;
> > +};
> > +
> > +/*
> > + * Hot-plug request
> > + */
> > +struct shp_request {
> > +	/* common info */
> > +	enum shp_operation	operation;	/* operation */
> > +
> > +	/* hot-plug event info: only valid for hot-plug operations */
> > +	void			*handle;	/* FW handle */
> > +	u32			event;		/* FW event */
> 
> What is this?

The shp_request describes a hotplug or online/offline operation that is
requested.  In the case of a hot-plug request, the "*handle" describes the
target device (which is an ACPI device object) and the "event" describes
the type of request, such as hot-add or hot-delete.

Thanks,
-Toshi

^ permalink raw reply

* Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
From: Mike @ 2013-01-31  2:10 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: tglx, linux-kernel
In-Reply-To: <1358235536-32741-1-git-send-email-qiudayu@linux.vnet.ibm.com>

Hi all

Any comments about my patchset?

Thanks 

Mike
On Tue, 2013-01-15 at 15:38 +0800, Mike Qiu wrote:
> Currently, the multiple MSI feature is not enabled on pSeries.
> These patches try to enable it.
> 
> These patches have been tested by using ipr driver, and the driver patch
> has been made by Wen Xiong <wenxiong@linux.vnet.ibm.com>:
> 
> [PATCH 0/7] Add support for new IBM SAS controllers
> 
> Test platform: One partition of pSeries with one cpu core(4 SMTs) and 
>                RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) in POWER7
> OS version: SUSE Linux Enterprise Server 11 SP2  (ppc64) with 3.8-rc3 kernel 
> 
> IRQs 21 and 22 are assigned to the ipr device, which supports 2 MSIs.
> 
> The test results are shown by 'cat /proc/interrupts':
>           CPU0       CPU1       CPU2       CPU3       
> 16:     240458     261601     226310     200425      XICS Level     IPI
> 17:          0          0          0          0      XICS Level     RAS_EPOW
> 18:         10          0          3          2      XICS Level     hvc_console
> 19:     122182      28481      28527      28864      XICS Level     ibmvscsi
> 20:        506    7388226        108        118      XICS Level     eth0
> 21:          6          5          5          5      XICS Level     host1-0
> 22:        817        814        816        813      XICS Level     host1-1
> LOC:     398077     316725     231882     203049   Local timer interrupts
> SPU:       1659        919        961        903   Spurious interrupts
> CNT:          0          0          0          0   Performance monitoring interrupts
> MCE:          0          0          0          0   Machine check exceptions
> 
> Mike Qiu (3):
>   irq: Set multiple MSI descriptor data for multiple IRQs
>   irq: Add hw continuous IRQs map to virtual continuous IRQs support
>   powerpc/pci: Enable pSeries multiple MSI feature
> 
>  arch/powerpc/kernel/msi.c            |    4 --
>  arch/powerpc/platforms/pseries/msi.c |   62 ++++++++++++++++++++++++++++++++-
>  include/linux/irq.h                  |    4 ++
>  include/linux/irqdomain.h            |    3 ++
>  kernel/irq/chip.c                    |   40 ++++++++++++++++-----
>  kernel/irq/irqdomain.c               |   61 +++++++++++++++++++++++++++++++++
>  6 files changed, 158 insertions(+), 16 deletions(-)
> 

^ permalink raw reply

* Re: [RFC PATCH v2 03/12] drivers/base: Add system device hotplug framework
From: Toshi Kani @ 2013-01-31  1:48 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, jiang.liu, wency, linux-mm, yinghai, linux-kernel,
	rjw, linux-acpi, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <20130130045437.GG30002@kroah.com>

On Tue, 2013-01-29 at 23:54 -0500, Greg KH wrote:
> On Thu, Jan 10, 2013 at 04:40:21PM -0700, Toshi Kani wrote:
> > Added sys_hotplug.c, which is the system device hotplug framework code.
> > 
> > shp_register_handler() allows modules to register their hotplug handlers
> > to the framework.  shp_submit_req() provides the interface to submit
> > a hotplug or online/offline request of system devices.  The request is
> > then put into hp_workqueue.  shp_start_req() calls all registered handlers
> > in ascending order for each phase.  If any handler failed in validate or
> > execute phase, shp_start_req() initiates its rollback procedure.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  drivers/base/Makefile      |    1 
> >  drivers/base/sys_hotplug.c |  313 ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 314 insertions(+)
> >  create mode 100644 drivers/base/sys_hotplug.c
> > 
> > diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> > index 5aa2d70..2e9b2f1 100644
> > --- a/drivers/base/Makefile
> > +++ b/drivers/base/Makefile
> > @@ -21,6 +21,7 @@ endif
> >  obj-$(CONFIG_SYS_HYPERVISOR) += hypervisor.o
> >  obj-$(CONFIG_REGMAP)	+= regmap/
> >  obj-$(CONFIG_SOC_BUS) += soc.o
> > +obj-y			+= sys_hotplug.o
> 
> No option to select this for systems that don't need it?  If not, then
> put it up higher with all of the other code for the core.

It used to have CONFIG_HOTPLUG, but I removed it as you suggested.  Yes,
I will put it up higher.  

Thanks,
-Toshi

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-01-31  1:46 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, jiang.liu, wency, linux-mm, yinghai, linux-kernel,
	rjw, linux-acpi, isimatu.yasuaki, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <20130130045330.GF30002@kroah.com>

On Tue, 2013-01-29 at 23:53 -0500, Greg KH wrote:
> On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
> > Added include/linux/sys_hotplug.h, which defines the system device
> > hotplug framework interfaces used by the framework itself and
> > handlers.
> > 
> > The order values define the calling sequence of handlers.  For add
> > execute, the ordering is ACPI->MEM->CPU.  Memory is onlined before
> > CPU so that threads on new CPUs can start using their local memory.
> > The ordering of the delete execute is symmetric to the add execute.
> > 
> > struct shp_request defines a hot-plug request information.  The
> > device resource information is managed with a list so that a single
> > request may target to multiple devices.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  include/linux/sys_hotplug.h |  181 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 181 insertions(+)
> >  create mode 100644 include/linux/sys_hotplug.h
> > 
> > diff --git a/include/linux/sys_hotplug.h b/include/linux/sys_hotplug.h
> > new file mode 100644
> > index 0000000..86674dd
> > --- /dev/null
> > +++ b/include/linux/sys_hotplug.h
> > @@ -0,0 +1,181 @@
> > +/*
> > + * sys_hotplug.h - System device hot-plug framework
> > + *
> > + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> > + *	Toshi Kani <toshi.kani@hp.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#ifndef _LINUX_SYS_HOTPLUG_H
> > +#define _LINUX_SYS_HOTPLUG_H
> > +
> > +#include <linux/list.h>
> > +#include <linux/device.h>
> > +
> > +/*
> > + * System device hot-plug operation proceeds in the following order.
> > + *   Validate phase -> Execute phase -> Commit phase
> > + *
> > + * The order values below define the calling sequence of platform
> > + * neutral handlers for each phase in ascending order.  The order
> > + * values of firmware-specific handlers are defined in sys_hotplug.h
> > + * under firmware specific directories.
> > + */
> > +
> > +/* All order values must be smaller than this value */
> > +#define SHP_ORDER_MAX				0xffffff
> > +
> > +/* Add Validate order values */
> > +
> > +/* Add Execute order values */
> > +#define SHP_MEM_ADD_EXECUTE_ORDER		100
> > +#define SHP_CPU_ADD_EXECUTE_ORDER		110
> > +
> > +/* Add Commit order values */
> > +
> > +/* Delete Validate order values */
> > +#define SHP_CPU_DEL_VALIDATE_ORDER		100
> > +#define SHP_MEM_DEL_VALIDATE_ORDER		110
> > +
> > +/* Delete Execute order values */
> > +#define SHP_CPU_DEL_EXECUTE_ORDER		10
> > +#define SHP_MEM_DEL_EXECUTE_ORDER		20
> > +
> > +/* Delete Commit order values */
> > +
> 
> Empty value?

Yes, in this version, all the delete commit order values are defined in
<acpi/sys_hotplug.h>.

> Anyway, as I said before, don't use "values", just call things directly
> in the order you need to.
> 
> This isn't like other operating systems, we don't need to be so
> "flexible", we can modify the core code as much as we want and need to
> if future things come along :)

Understood.  As described in the previous email, I will define them with
enum and avoid using values.

Thanks,
-Toshi

^ permalink raw reply

* Re: [RFC PATCH v2 02/12] ACPI: Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-01-31  1:38 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, jiang.liu, wency, linux-mm, yinghai, linux-kernel,
	Rafael J. Wysocki, linux-acpi, isimatu.yasuaki, srivatsa.bhat,
	guohanjun, bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <20130130045153.GE30002@kroah.com>

On Tue, 2013-01-29 at 23:51 -0500, Greg KH wrote:
> On Mon, Jan 14, 2013 at 12:21:30PM -0700, Toshi Kani wrote:
> > On Mon, 2013-01-14 at 20:07 +0100, Rafael J. Wysocki wrote:
> > > On Monday, January 14, 2013 11:42:09 AM Toshi Kani wrote:
> > > > On Mon, 2013-01-14 at 19:47 +0100, Rafael J. Wysocki wrote:
> > > > > On Monday, January 14, 2013 08:53:53 AM Toshi Kani wrote:
> > > > > > On Fri, 2013-01-11 at 22:25 +0100, Rafael J. Wysocki wrote:
> > > > > > > On Thursday, January 10, 2013 04:40:20 PM Toshi Kani wrote:
> > > > > > > > Added include/acpi/sys_hotplug.h, which is ACPI-specific system
> > > > > > > > device hotplug header and defines the order values of ACPI-specific
> > > > > > > > handlers.
> > > > > > > > 
> > > > > > > > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > > > > > > > ---
> > > > > > > >  include/acpi/sys_hotplug.h |   48 ++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > >  1 file changed, 48 insertions(+)
> > > > > > > >  create mode 100644 include/acpi/sys_hotplug.h
> > > > > > > > 
> > > > > > > > diff --git a/include/acpi/sys_hotplug.h b/include/acpi/sys_hotplug.h
> > > > > > > > new file mode 100644
> > > > > > > > index 0000000..ad80f61
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/include/acpi/sys_hotplug.h
> > > > > > > > @@ -0,0 +1,48 @@
> > > > > > > > +/*
> > > > > > > > + * sys_hotplug.h - ACPI System device hot-plug framework
> > > > > > > > + *
> > > > > > > > + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> > > > > > > > + *	Toshi Kani <toshi.kani@hp.com>
> > > > > > > > + *
> > > > > > > > + * This program is free software; you can redistribute it and/or modify
> > > > > > > > + * it under the terms of the GNU General Public License version 2 as
> > > > > > > > + * published by the Free Software Foundation.
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +#ifndef _ACPI_SYS_HOTPLUG_H
> > > > > > > > +#define _ACPI_SYS_HOTPLUG_H
> > > > > > > > +
> > > > > > > > +#include <linux/list.h>
> > > > > > > > +#include <linux/device.h>
> > > > > > > > +#include <linux/sys_hotplug.h>
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * System device hot-plug operation proceeds in the following order.
> > > > > > > > + *   Validate phase -> Execute phase -> Commit phase
> > > > > > > > + *
> > > > > > > > + * The order values below define the calling sequence of ACPI-specific
> > > > > > > > + * handlers for each phase in ascending order.  The order value of
> > > > > > > > + * platform-neutral handlers are defined in <linux/sys_hotplug.h>.
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +/* Add Validate order values */
> > > > > > > > +#define SHP_ACPI_BUS_ADD_VALIDATE_ORDER		0	/* must be first */
> > > > > > > > +
> > > > > > > > +/* Add Execute order values */
> > > > > > > > +#define SHP_ACPI_BUS_ADD_EXECUTE_ORDER		10
> > > > > > > > +#define SHP_ACPI_RES_ADD_EXECUTE_ORDER		20
> > > > > > > > +
> > > > > > > > +/* Add Commit order values */
> > > > > > > > +#define SHP_ACPI_BUS_ADD_COMMIT_ORDER		10
> > > > > > > > +
> > > > > > > > +/* Delete Validate order values */
> > > > > > > > +#define SHP_ACPI_BUS_DEL_VALIDATE_ORDER		0	/* must be first */
> > > > > > > > +#define SHP_ACPI_RES_DEL_VALIDATE_ORDER		10
> > > > > > > > +
> > > > > > > > +/* Delete Execute order values */
> > > > > > > > +#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER		100
> > > > > > > > +
> > > > > > > > +/* Delete Commit order values */
> > > > > > > > +#define SHP_ACPI_BUS_DEL_COMMIT_ORDER		100
> > > > > > > > +
> > > > > > > > +#endif	/* _ACPI_SYS_HOTPLUG_H */
> > > > > > > > --
> > > > > > > 
> > > > > > > Why did you use the particular values above?
> > > > > > 
> > > > > > The ordering values above are used to define the relative order among
> > > > > > handlers.  For instance, the 100 for SHP_ACPI_BUS_DEL_EXECUTE_ORDER can
> > > > > > potentially be 21 since it is still larger than 20 for
> > > > > > SHP_MEM_DEL_EXECUTE_ORDER defined in linux/sys_hotplug.h.  I picked 100
> > > > > > so that more platform-neutral handlers can be added in between 20 and
> > > > > > 100 in future.
> > > > > 
> > > > > I thought so, but I don't think it's a good idea to add gaps like this.
> > > > 
> > > > OK, I will use an equal gap of 10 for all values.  So, the 100 in the
> > > > above example will be changed to 30.  
> > > 
> > > I wonder why you want to have those gaps at all.
> > 
> > Oh, I see.  I think some gap is helpful since it allows a new handler to
> > come between without recompiling other modules.  For instance, OEM
> > vendors may want to add their own handlers with loadable modules after
> > the kernel is distributed.
> 
> No, we don't support such a model, sorry, just make it a sequence of
> numbers and go from there.  If a vendor wants to modify the kernel to
> add new values, they can rebuild the core code as well.
> 
> I really don't like the whole idea of values in the first place, can't
> we just do things in the correct order in the code, and not be driven by
> random magic values?

OK, I will define all the values with enum, which is something like
below.  I think it is more manageable in this way as we do not have to
define magic values.

enum shp_add_order {
    /* Validate Phase */
    SHP_FW_BUS_ADD_VALIDATE_ORDER,

    /* Execute Phase */
    SHP_FW_BUS_ADD_EXECUTE_ORDER,
    SHP_FW_RES_ADD_EXECUTE_ORDER,
    SHP_MEM_ADD_EXECUTE_ORDER,
    SHP_CPU_ADD_EXECUTE_ORDER,

    /* Commit Phase */
    SHP_ADD_COMMIT_BASE_ORDER,
    SHP_FW_BUS_ADD_COMMIT_ORDER,
};
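
As a rough userspace sketch of how such an enum could drive the calling
sequence: the enum is the one above, everything else is hypothetical
(struct shp_handler, shp_trace_handler() and shp_run_handlers() are names
made up for this example, and sorting with qsort() is an assumption, not
how the framework has to do it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Enum from above; later enumerators are called later. */
enum shp_add_order {
	/* Validate Phase */
	SHP_FW_BUS_ADD_VALIDATE_ORDER,

	/* Execute Phase */
	SHP_FW_BUS_ADD_EXECUTE_ORDER,
	SHP_FW_RES_ADD_EXECUTE_ORDER,
	SHP_MEM_ADD_EXECUTE_ORDER,
	SHP_CPU_ADD_EXECUTE_ORDER,

	/* Commit Phase */
	SHP_ADD_COMMIT_BASE_ORDER,
	SHP_FW_BUS_ADD_COMMIT_ORDER,
};

struct shp_handler;
typedef int (*shp_func_t)(const struct shp_handler *self, void *data);

/* Hypothetical handler record: an order slot plus a callback. */
struct shp_handler {
	enum shp_add_order order;
	shp_func_t func;
};

/* Example payload: records the order in which handlers were called. */
struct shp_trace {
	int seen[8];
	size_t len;
};

static int shp_trace_handler(const struct shp_handler *self, void *data)
{
	struct shp_trace *t = data;

	assert(t->len < sizeof(t->seen) / sizeof(t->seen[0]));
	t->seen[t->len++] = self->order;
	return 0;
}

static int shp_cmp_order(const void *a, const void *b)
{
	const struct shp_handler *x = a, *y = b;

	return (x->order > y->order) - (x->order < y->order);
}

/* Call handlers in ascending enum order; stop at the first failure. */
static int shp_run_handlers(struct shp_handler *h, size_t nr, void *data)
{
	assert(h != NULL);
	qsort(h, nr, sizeof(*h), shp_cmp_order);
	for (size_t i = 0; i < nr; i++) {
		int ret = h[i].func(&h[i], data);

		if (ret)
			return ret;
	}
	return 0;
}
```

Registering CPU, FW bus and MEM execute handlers in any order would then
still run them as FW bus, MEM, CPU, with no numeric values to maintain.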

Thanks,
-Toshi

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-01-31  1:15 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390, jiang.liu, wency, linux-mm, yinghai, linux-kernel,
	Rafael J. Wysocki, linux-acpi, isimatu.yasuaki, srivatsa.bhat,
	guohanjun, bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <20130130044859.GD30002@kroah.com>

On Tue, 2013-01-29 at 23:48 -0500, Greg KH wrote:
> On Mon, Jan 14, 2013 at 12:02:04PM -0700, Toshi Kani wrote:
> > On Mon, 2013-01-14 at 19:48 +0100, Rafael J. Wysocki wrote:
> > > On Monday, January 14, 2013 08:33:48 AM Toshi Kani wrote:
> > > > On Fri, 2013-01-11 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > > > On Thursday, January 10, 2013 04:40:19 PM Toshi Kani wrote:
> > > > > > Added include/linux/sys_hotplug.h, which defines the system device
> > > > > > hotplug framework interfaces used by the framework itself and
> > > > > > handlers.
> > > > > > 
> > > > > > The order values define the calling sequence of handlers.  For add
> > > > > > execute, the ordering is ACPI->MEM->CPU.  Memory is onlined before
> > > > > > CPU so that threads on new CPUs can start using their local memory.
> > > > > > The ordering of the delete execute is symmetric to the add execute.
> > > > > > 
> > > > > > struct shp_request defines a hot-plug request information.  The
> > > > > > device resource information is managed with a list so that a single
> > > > > > request may target to multiple devices.
> > > > > > 
> > > >  :
> > > > > > +
> > > > > > +struct shp_device {
> > > > > > +	struct list_head	list;
> > > > > > +	struct device		*device;
> > > > > > +	enum shp_class		class;
> > > > > > +	union shp_dev_info	info;
> > > > > > +};
> > > > > > +
> > > > > > +/*
> > > > > > + * Hot-plug request
> > > > > > + */
> > > > > > +struct shp_request {
> > > > > > +	/* common info */
> > > > > > +	enum shp_operation	operation;	/* operation */
> > > > > > +
> > > > > > +	/* hot-plug event info: only valid for hot-plug operations */
> > > > > > +	void			*handle;	/* FW handle */
> > > > > 
> > > > > What's the role of handle here?
> > > > 
> > > > On ACPI-based platforms, the handle keeps a notified ACPI handle when a
> > > > hot-plug request is made.  ACPI bus handlers, acpi_add_execute() /
> > > > acpi_del_execute(), then scans / trims ACPI devices from the handle.
> > > 
> > > OK, so this is ACPI-specific and should be described as such.
> > 
> > Other FW interface I know is parisc, which has mod_index (module index)
> > to identify a unique object, just like what ACPI handle does.  The
> > handle can keep the mod_index as an opaque value as well.  But as you
> > said, I do not know if the handle works for all other FWs.  So, I will
> > add descriptions, such that the hot-plug event info is modeled after
> > ACPI and may need to be revisited when supporting other FW.
> 
> Please make it a "real" pointer, and not a void *, those shouldn't be
> used at all if possible.

How about changing the "void *handle" to acpi_dev_node below?   

   struct acpi_dev_node    acpi_node;

Basically, it has the same challenge as struct device, which uses
acpi_dev_node as well.  We can add other FW node when needed (just like
device also has *of_node).

Thanks,
-Toshi

^ permalink raw reply

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Simon Jeons @ 2013-01-31  1:22 UTC (permalink / raw)
  To: Tang Chen
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <5108F2B3.3090506@cn.fujitsu.com>

Hi Tang,
On Wed, 2013-01-30 at 18:15 +0800, Tang Chen wrote:
> Hi Simon,
> 
> Please see below. :)
> 
> On 01/29/2013 08:52 PM, Simon Jeons wrote:
> > Hi Tang,
> >
> > On Wed, 2013-01-09 at 17:32 +0800, Tang Chen wrote:
> >> Here is the physical memory hot-remove patch-set based on 3.8rc-2.
> >
> > I have some questions for you. They are not related to this patchset,
> > but they are about memory hotplug.
> >
> > 1. In function node_states_check_changes_online:
> >
> > comments:
> > * If we don't have HIGHMEM nor movable node,
> > * node_states[N_NORMAL_MEMORY] contains nodes which have zones of
> > * 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
> >
> > How should I understand this? Why, when we have neither HIGHMEM nor a
> > movable node, does node_states[N_NORMAL_MEMORY] contain 0...ZONE_MOVABLE?
> > IIUC, N_NORMAL_MEMORY only means the node has regular memory.
> >
> 
> First of all, I think we need to understand why we need N_MEMORY.
> 
> In order to support a movable node, which has only ZONE_MOVABLE (the last
> zone), we introduce N_MEMORY to represent that the node has normal,
> highmem or movable memory.
>
> Here, "we have a movable node" means you configured CONFIG_MOVABLE_NODE.
> Not having this config option doesn't mean we don't have movable pages
> (NO); it means we don't have a node which has only movable pages (only
> ZONE_MOVABLE). (YES)
>
> Here, if we don't have CONFIG_MOVABLE_NODE (we don't have a movable node),
> we don't need a separate node_states[] element to represent a particular
> node because we won't have a node which has only ZONE_MOVABLE.
>
> So,
> 1) if we have neither highmem nor a movable node, N_MEMORY == N_HIGH_MEMORY
>    == N_NORMAL_MEMORY, which means N_NORMAL_MEMORY acts as N_MEMORY. If we
>    online pages as movable, we need to update node_states[N_NORMAL_MEMORY].

Sorry, I am still confused. :(
Do you mean node_states[N_NORMAL_MEMORY] is updated to node_states[N_MEMORY],
or that node_states[N_NORMAL_MEMORY] represents 0...ZONE_MOVABLE?

> 
> Please refer to the definition of enum zone_type. If we don't have
> CONFIG_HIGHMEM, we won't have ZONE_HIGHMEM, but ZONE_NORMAL and
> ZONE_MOVABLE will always be there. So we can have movable pages, and
> zone_last should be ZONE_MOVABLE.

Which node_states entry is this: node_states[N_NORMAL_MEMORY] or
node_states[N_MEMORY]?

> 
> Again, because we won't have a node that has only ZONE_MOVABLE, we just
> need to update node_states[N_NORMAL_MEMORY].
> 
> > * If we don't have movable node, node_states[N_NORMAL_MEMORY]
> > * contains nodes which have zones of 0...ZONE_MOVABLE,
> > * set zone_last to ZONE_MOVABLE.
> >
> > How to understand?
> 
> 2) this code is in #ifdef CONFIG_HIGHMEM, which means we have highmem, so
>    if we don't have a movable node, N_MEMORY == N_HIGH_MEMORY, and
>    N_HIGH_MEMORY acts as N_MEMORY. If we online pages as movable, we need
>    to update node_states[N_NORMAL_MEMORY].
> 
> >
> > 2. In function move_pfn_range_left, why end<= z2->zone_start_pfn is not
> > correct? The comments said that must include/overlap, why?
> >
> 
> This one is easy, if I understand you correctly.
> move_pfn_range_left() is used to move the leftmost part
> [start_pfn, end_pfn) of z2 to z1.
> So if end_pfn <= z2->zone_start_pfn, it means [start_pfn, end_pfn) is not
> part of z2.
> Then it fails.

Yup, very clear now. :)
Why check !z1->wait_table in function move_pfn_range_left and function
__add_zone? I think zone->wait_table is initialized in
free_area_init_core, which will be called during system initialization
and hotadd_new_pgdat path.

> 
> > 3. In function online_pages, in the normal case (w/o online_kernel,
> > online_movable), why not check whether the new zone overlaps with adjacent
> > zones?
> >
> 
> Can a zone overlap with the others ? I don't think so.
> 
> One pfn could only be in one zone,
>     zone = page_zone(pfn_to_page(pfn));

thanks. :)

There is a zone populated check in function online_pages. But zone is
populated in free_area_init_core which will be called during system
initialization and hotadd_new_pgdat path. Why still need this check?

> 
> it could overlap with others, I think. :)
> 
> But maybe I misunderstand you. :)
> 
> > 4. Could you summarize the implementation differences between hot-add and
> > logic-add, and between hot-remove and logic-remove?
> 
> Sorry, I don't quite understand what you mean by logic-add/remove.
> Would you please explain more?
>
> If you meant the sysfs interfaces, I think they are just another set of
> entry points to memory hotplug.

Please ignore this silly question. :(

> 
> Thanks.  :)
> 
> >
> >
> >>
> >> This patch-set aims to implement physical memory hot-removing.
> >>
> >> The patches can free/remove the following things:
> >>
> >>    - /sys/firmware/memmap/X/{end, start, type} : [PATCH 4/15]
> >>    - memmap of sparse-vmemmap                  : [PATCH 6,7,8,10/15]
> >>    - page table of removed memory              : [RFC PATCH 7,8,10/15]
> >>    - node and related sysfs files              : [RFC PATCH 13-15/15]
> >>
> >>
> >> Existing problem:
> >> If CONFIG_MEMCG is selected, we will allocate memory to store page cgroup
> >> when we online pages.
> >>
> >> For example: there is a memory device on node 1. The address range
> >> is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
> >> and memory11 under the directory /sys/devices/system/memory/.
> >>
> >> If CONFIG_MEMCG is selected, when we online memory8, the memory stored page
> >> cgroup is not provided by this memory device. But when we online memory9, the
> >> memory stored page cgroup may be provided by memory8. So we can't offline
> >> memory8 now. We should offline the memory in the reversed order.
> >>
> >> When the memory device is hotremoved, we will auto offline memory provided
> >> by this memory device. But we don't know which memory is onlined first, so
> >> offlining memory may fail.
> >>
> >> In patch1, we provide a solution which is not good enough:
> >> Iterate twice to offline the memory.
> >> 1st iterate: offline every non primary memory block.
> >> 2nd iterate: offline primary (i.e. first added) memory block.
> >>
> >> And a new idea from Wen Congyang<wency@cn.fujitsu.com>  is:
> >> allocate the memory from the memory block they are describing.
> >>
> >> But we are not sure if it is OK to do so because there is no existing API
> >> to do so, and we need to move page_cgroup memory allocation from MEM_GOING_ONLINE
> >> to MEM_ONLINE. Also, it may interfere with hugepages.
> >>
> >>
> >>
> >> How to test this patchset?
> >> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
> >>     ACPI_HOTPLUG_MEMORY must be selected.
> >> 2. load the module acpi_memhotplug
> >> 3. hotplug the memory device(it depends on your hardware)
> >>     You will see the memory device under the directory /sys/bus/acpi/devices/.
> >>     Its name is PNP0C80:XX.
> >> 4. online/offline pages provided by this memory device
> >>     You can write online/offline to /sys/devices/system/memory/memoryX/state to
> >>     online/offline pages provided by this memory device
> >> 5. hotremove the memory device
> >>     You can hotremove the memory device by the hardware, or writing 1 to
> >>     /sys/bus/acpi/devices/PNP0C80:XX/eject.
> >
> > Is there a similar knode to hot-add the memory device?
> >
> >>
> >>
> >> Note: if the memory provided by the memory device is used by the kernel, it
> >> can't be offlined. It is not a bug.
> >>
> >>
> >> Changelogs from v5 to v6:
> >>   Patch3: Add some more comments to explain memory hot-remove.
> >>   Patch4: Remove bootmem member in struct firmware_map_entry.
> >>   Patch6: Repeatedly register bootmem pages when using hugepage.
> >>   Patch8: Repeatedly free bootmem pages when using hugepage.
> >>   Patch14: Don't free pgdat when offlining a node, just reset it to 0.
> >>   Patch15: New patch, pgdat is not freed in patch14, so don't allocate a new
> >>            one when onlining a node.
> >>
> >> Changelogs from v4 to v5:
> >>   Patch7: new patch, move pgdat_resize_lock into sparse_remove_one_section() to
> >>           avoid disabling irqs, because we need to flush the TLB when freeing
> >>           pagetables.
> >>   Patch8: new patch, pick up some common APIs that are used to free direct mapping
> >>           and vmemmap pagetables.
> >>   Patch9: free direct mapping pagetables on x86_64 arch.
> >>   Patch10: free vmemmap pagetables.
> >>   Patch11: since freeing memmap with vmemmap has been implemented, the config
> >>            macro CONFIG_SPARSEMEM_VMEMMAP when defining __remove_section() is
> >>            no longer needed.
> >>   Patch13: no need to modify acpi_memory_disable_device() since it was removed,
> >>            and add nid parameter when calling remove_memory().
> >>
> >> Changelogs from v3 to v4:
> >>   Patch7: remove unused code.
> >>   Patch8: fix nr_pages that is passed to free_map_bootmem()
> >>
> >> Changelogs from v2 to v3:
> >>   Patch9: call sync_global_pgds() if pgd is changed
> >>   Patch10: fix a problem in the patch
> >>
> >> Changelogs from v1 to v2:
> >>   Patch1: new patch, offline memory twice. 1st iterate: offline every non primary
> >>           memory block. 2nd iterate: offline primary (i.e. first added) memory
> >>           block.
> >>
> >>   Patch3: new patch, no logical change, just remove redundant code.
> >>
> >>   Patch9: merge the patch from wujianguo into this patch. flush tlb on all cpu
> >>           after the pagetable is changed.
> >>
> >>   Patch12: new patch, free node_data when a node is offlined.
> >>
> >>
> >> Tang Chen (6):
> >>    memory-hotplug: move pgdat_resize_lock into
> >>      sparse_remove_one_section()
> >>    memory-hotplug: remove page table of x86_64 architecture
> >>    memory-hotplug: remove memmap of sparse-vmemmap
> >>    memory-hotplug: Integrated __remove_section() of
> >>      CONFIG_SPARSEMEM_VMEMMAP.
> >>    memory-hotplug: remove sysfs file of node
> >>    memory-hotplug: Do not allocate pdgat if it was not freed when
> >>      offline.
> >>
> >> Wen Congyang (5):
> >>    memory-hotplug: try to offline the memory twice to avoid dependence
> >>    memory-hotplug: remove redundant codes
> >>    memory-hotplug: introduce new function arch_remove_memory() for
> >>      removing page table depends on architecture
> >>    memory-hotplug: Common APIs to support page tables hot-remove
> >>    memory-hotplug: free node_data when a node is offlined
> >>
> >> Yasuaki Ishimatsu (4):
> >>    memory-hotplug: check whether all memory blocks are offlined or not
> >>      when removing memory
> >>    memory-hotplug: remove /sys/firmware/memmap/X sysfs
> >>    memory-hotplug: implement register_page_bootmem_info_section of
> >>      sparse-vmemmap
> >>    memory-hotplug: memory_hotplug: clear zone when removing the memory
> >>
> >>   arch/arm64/mm/mmu.c                  |    3 +
> >>   arch/ia64/mm/discontig.c             |   10 +
> >>   arch/ia64/mm/init.c                  |   18 ++
> >>   arch/powerpc/mm/init_64.c            |   10 +
> >>   arch/powerpc/mm/mem.c                |   12 +
> >>   arch/s390/mm/init.c                  |   12 +
> >>   arch/s390/mm/vmem.c                  |   10 +
> >>   arch/sh/mm/init.c                    |   17 ++
> >>   arch/sparc/mm/init_64.c              |   10 +
> >>   arch/tile/mm/init.c                  |    8 +
> >>   arch/x86/include/asm/pgtable_types.h |    1 +
> >>   arch/x86/mm/init_32.c                |   12 +
> >>   arch/x86/mm/init_64.c                |  390 +++++++++++++++++++++++++++++
> >>   arch/x86/mm/pageattr.c               |   47 ++--
> >>   drivers/acpi/acpi_memhotplug.c       |    8 +-
> >>   drivers/base/memory.c                |    6 +
> >>   drivers/firmware/memmap.c            |   96 +++++++-
> >>   include/linux/bootmem.h              |    1 +
> >>   include/linux/firmware-map.h         |    6 +
> >>   include/linux/memory_hotplug.h       |   15 +-
> >>   include/linux/mm.h                   |    4 +-
> >>   mm/memory_hotplug.c                  |  459 +++++++++++++++++++++++++++++++---
> >>   mm/sparse.c                          |    8 +-
> >>   23 files changed, 1094 insertions(+), 69 deletions(-)
> >>
> >
> >
> >
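
Step 4 of the quoted test procedure (writing online/offline to
/sys/devices/system/memory/memoryX/state) can be sketched from userspace
as follows. This is only a minimal illustration, not part of the patchset:
set_memory_block_state() is a hypothetical helper name, and the caller is
assumed to pass the full path of the state file and to run with sufficient
privileges.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or libc API): write "online" or
 * "offline" to a memory block's state file, for example
 * /sys/devices/system/memory/memory8/state.
 * Returns 0 on success, -1 on any failure.
 */
int set_memory_block_state(const char *state_path, const char *state)
{
	FILE *f;
	int ok;

	/* Only the two values named in the cover letter are accepted here. */
	if (strcmp(state, "online") != 0 && strcmp(state, "offline") != 0)
		return -1;

	f = fopen(state_path, "w");
	if (!f)
		return -1;

	ok = fputs(state, f) >= 0;

	/*
	 * stdio buffers the write, so the data reaches the file at fclose().
	 * A write that is rejected (for instance because the block cannot be
	 * offlined) therefore shows up as an fclose() error.
	 */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A failing call when offlining a block is expected if the kernel itself uses
that memory; as the cover letter notes, that is not a bug.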

^ permalink raw reply

* [GIT PULL 00/21] perf/core improvements and fixes
From: Arnaldo Carvalho de Melo @ 2013-01-30 14:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Frederic Weisbecker, Stephane Eranian,
	arnaldo.melo, linuxppc-dev, Paul Mackerras, Thomas Jarosch,
	Jiri Olsa, Arnaldo Carvalho de Melo, Andi Kleen, Hugh Dickins,
	Mel Gorman, Michael Ellerman, Borislav Petkov, Andrea Arcangeli,
	Rik van Riel, Corey Ashford, Namhyung Kim, Anton Blanchard,
	Steven Rostedt, Arnaldo Carvalho de Melo, Sukadev Bhattiprolu,
	Peter Hurley, Mike Galbraith, linux-kernel, David Ahern,
	Andrew Morton

Hi Ingo,

	Please consider pulling.

	Namhyung, Jiri, the 'group report' patches are at acme/perf/group,
will send a pull req later if it survives further testing.

- Arnaldo

The following changes since commit a2d28d0c198b65fac28ea6212f5f8edc77b29c27:

  Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2013-01-25 11:34:00 +0100)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux tags/perf-core-for-mingo

for you to fetch changes up to 5809fde040de2afa477a6c593ce2e8fd2c11d9d3:

  perf header: Fix double fclose() on do_write(fd, xxx) failure (2013-01-30 10:40:44 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

. Fix some leaks in exit paths.

. Use memdup where applicable

. Remove some die() calls, allowing callers to handle exit paths
  gracefully.

. Correct typo in tools Makefile, fix from Borislav Petkov.

. Add 'perf bench numa mem' NUMA performance measurement suite, from Ingo Molnar.

. Handle dynamic array's element size properly, fix from Jiri Olsa.

. Fix memory leaks on evsel->counts, from Namhyung Kim.

. Make numa benchmark optional, allowing the build in machines where required
  numa libraries are not present, fix from Peter Hurley.

. Add interval printing in 'perf stat', from Stephane Eranian.

. Fix compile warnings in tests/attr.c, from Sukadev Bhattiprolu.

. Fix double free, pclose instead of fclose, leaks and double fclose errors
  found with the cppcheck tool, from Thomas Jarosch.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Arnaldo Carvalho de Melo (8):
      perf tools: Stop using 'self' in strlist
      perf tools: Stop using 'self' in map.[ch]
      perf tools: Use memdup in map__clone
      perf kmem: Use memdup()
      perf header: Stop using die() calls when processing tracing data
      perf ui browser: Free browser->helpline() on ui_browser__hide()
      perf tests: Call machine__exit in the vmlinux matches kallsyms test
      perf tests: Fix leaks on PERF_RECORD_* test

Borislav Petkov (1):
      tools: Correct typo in tools Makefile

Ingo Molnar (1):
      perf: Add 'perf bench numa mem' NUMA performance measurement suite

Jiri Olsa (1):
      tools lib traceevent: Handle dynamic array's element size properly

Namhyung Kim (1):
      perf evsel: Fix memory leaks on evsel->counts

Peter Hurley (1):
      perf tools: Make numa benchmark optional

Stephane Eranian (2):
      perf evsel: Add prev_raw_count field
      perf stat: Add interval printing

Sukadev Bhattiprolu (1):
      perf tools, powerpc: Fix compile warnings in tests/attr.c

Thomas Jarosch (5):
      perf tools: Fix possible double free on error
      perf sort: Use pclose() instead of fclose() on pipe stream
      perf tools: Fix memory leak on error
      perf header: Fix memory leak for the "Not caching a kptr_restrict'ed /proc/kallsyms" case
      perf header: Fix double fclose() on do_write(fd, xxx) failure

 tools/Makefile                           |    2 +-
 tools/lib/traceevent/event-parse.c       |   39 +-
 tools/perf/Documentation/perf-stat.txt   |    4 +
 tools/perf/Makefile                      |   13 +
 tools/perf/arch/common.c                 |    1 +
 tools/perf/bench/bench.h                 |    1 +
 tools/perf/bench/numa.c                  | 1731 ++++++++++++++++++++++++++++++
 tools/perf/builtin-bench.c               |   17 +
 tools/perf/builtin-kmem.c                |    6 +-
 tools/perf/builtin-stat.c                |  158 ++-
 tools/perf/config/feature-tests.mak      |   11 +
 tools/perf/tests/attr.c                  |    5 +
 tools/perf/tests/open-syscall-all-cpus.c |    1 +
 tools/perf/tests/perf-record.c           |   12 +-
 tools/perf/tests/vmlinux-kallsyms.c      |    4 +-
 tools/perf/ui/browser.c                  |    2 +
 tools/perf/util/event.c                  |    4 +-
 tools/perf/util/evsel.c                  |   31 +
 tools/perf/util/evsel.h                  |    2 +
 tools/perf/util/header.c                 |   25 +-
 tools/perf/util/map.c                    |  118 +-
 tools/perf/util/map.h                    |   24 +-
 tools/perf/util/sort.c                   |    7 +-
 tools/perf/util/strlist.c                |   54 +-
 tools/perf/util/strlist.h                |   42 +-
 25 files changed, 2154 insertions(+), 160 deletions(-)
 create mode 100644 tools/perf/bench/numa.c
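
The "Use memdup where applicable" item in the summary above refers to
replacing an allocate-then-copy pair with a single duplicating call. A
minimal sketch of that idea follows; memdup_sketch(), struct demo_map and
demo_map_clone() are illustrative names assumed here and are not taken from
the perf sources.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate len bytes from src into freshly allocated memory.
 * Returns NULL if the allocation fails; the caller frees the result.
 */
void *memdup_sketch(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Hypothetical structure standing in for an object that gets cloned. */
struct demo_map {
	unsigned long start;
	unsigned long end;
	int type;
};

/* Clone by value: one call instead of malloc() followed by memcpy(). */
struct demo_map *demo_map_clone(const struct demo_map *map)
{
	return memdup_sketch(map, sizeof(*map));
}
```

Note that this is a shallow copy: any pointers inside the structure would
still refer to the original's data.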

^ permalink raw reply

* [PATCH 16/21] perf tools, powerpc: Fix compile warnings in tests/attr.c
From: Arnaldo Carvalho de Melo @ 2013-01-30 14:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anton Blanchard, linux-kernel, Arnaldo Carvalho de Melo,
	linuxppc-dev, Michael Ellerman, Sukadev Bhattiprolu, Jiri Olsa
In-Reply-To: <1359557222-17547-1-git-send-email-acme@infradead.org>

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

We print several '__u64' quantities using '%llu'. On powerpc, we by
default include '<asm-generic/int-l64.h>', which results in __u64 being an
unsigned long. This causes compile warnings, which are treated as errors
due to '-Werror'.

By defining __SANE_USERSPACE_TYPES__ we include <asm-generic/int-ll64.h>
and define __u64 as unsigned long long.

Changelog[v2]:
	[Michael Ellerman] Use __SANE_USERSPACE_TYPES__ and avoid PRIu64
	format specifier - which as Jiri Olsa pointed out, breaks on x86-64.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Michael Ellerman <ellerman@au1.ibm.com>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/20130124054439.GA31588@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/tests/attr.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c
index f61dd3f..bdcceb8 100644
--- a/tools/perf/tests/attr.c
+++ b/tools/perf/tests/attr.c
@@ -19,6 +19,11 @@
  * permissions. All the event text files are stored there.
  */
 
+/*
+ * Powerpc needs __SANE_USERSPACE_TYPES__ before <linux/types.h> to select
+ * 'int-ll64.h' and avoid compile warnings when printing __u64 with %llu.
+ */
+#define __SANE_USERSPACE_TYPES__
 #include <stdlib.h>
 #include <stdio.h>
 #include <inttypes.h>
-- 
1.8.1.1.361.gec3ae6e
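
For readers unfamiliar with the warning: the format '%llu' expects an
unsigned long long, so a 64-bit type defined as unsigned long triggers a
format mismatch. The patch above fixes this by selecting int-ll64.h. The
sketch below shows a different, generic technique (an explicit cast) purely
to illustrate the mismatch; format_u64() is a hypothetical helper and
uint64_t stands in for __u64.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value with "%llu". The cast makes the argument type
 * match the format on every ABI, whether uint64_t is unsigned long or
 * unsigned long long. Returns the number of characters snprintf() reports.
 */
int format_u64(char *buf, size_t len, uint64_t val)
{
	return snprintf(buf, len, "%llu", (unsigned long long)val);
}
```

The cast costs nothing at run time on 64-bit targets, but it has to be
repeated at every call site, which is one reason a single define at the top
of the file can be the less intrusive fix.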

^ permalink raw reply related

* powerpc/usb: machine check when writing to portsc in ehci-fsl
From: Guy Yribarren @ 2013-01-30 14:27 UTC (permalink / raw)
  To: linuxppc-dev

Currently using kernel 3.7.1 with mpc8315e, I am observing instability
during the usb initialisation. The usb controller is set as a host
controller with internal PHY defined in UTMI mode. This issue appears
sometimes after a warm reset. The kernel hangs and the log buffer contents
show that a machine check exception is detected while execution is inside
the ehci_fsl_setup_phy function.

The latest messages available from the console are:
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
uhci_hcd: USB Universal Host Controller Interface driver
/immr@e0000000/usb@23000: Invalid 'dr_mode' property, fallback to host mode
fsl-ehci fsl-ehci.0: Freescale On-Chip EHCI Host Controller
fsl-ehci fsl-ehci.0: new USB bus registered, assigned bus number 1

By adding multiple printk calls to ehci-fsl, I found that the issue happens
when executing the write to the portsc register near the end of the
function (ehci_writel(ehci, portsc, &ehci->regs->port_status[port_offset]);)

This issue appears either with or without external usb devices connected.

I didn't find any restrictions on accessing this register, or any
explanation, in the freescale doc or the usb2 & ehci specifications.

This issue has also been seen with older kernel releases such as 2.6.39.

After applying a warm reset, the processor then boots correctly and the usb
interface is fully functional and stable under intensive activity.

The processor is based on the latest 1.2 mask revision.

Hope that someone has some suggestions about this strange behaviour.

Thanks for your help,
Regards

Guy Yribarren

ACTIS Computer - 42 Route de Satigny - CH-1217 Meyrin - Switzerland
Tel +41 (22) 706 1830 - Fax +41 (22) 794 4391
guy.yribarren <at> actis-computer.com

^ permalink raw reply

* [PATCH 5/5] KVM: PPC: e500mc: Enable e6500 cores
From: Mihai Caraman @ 2013-01-30 13:29 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Mihai Caraman, linuxppc-dev, kvm
In-Reply-To: <1359552584-17861-1-git-send-email-mihai.caraman@freescale.com>

Extend processor compatibility names to e6500 cores.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
---
 arch/powerpc/kvm/e500mc.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 1f89d26..6c87299 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -172,6 +172,8 @@ int kvmppc_core_check_processor_compat(void)
 		r = 0;
 	else if (strcmp(cur_cpu_spec->cpu_name, "e5500") == 0)
 		r = 0;
+	else if (strcmp(cur_cpu_spec->cpu_name, "e6500") == 0)
+		r = 0;
 	else
 		r = -ENOTSUPP;
 
-- 
1.7.4.1
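
The compatibility check above grows by one strcmp() branch per supported
core. As a sketch of an alternative shape (not what the patch does), the
same decision can be written as a table lookup. Everything below is
illustrative: the function and table names are hypothetical, "e500mc" is
assumed to be the first accepted name since that branch is not visible in
the hunk, and SKETCH_ENOTSUPP stands in for the kernel-internal ENOTSUPP,
which userspace headers do not define.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: stand-in for the kernel-internal ENOTSUPP error code. */
#define SKETCH_ENOTSUPP 524

/* Core names accepted by this sketch; "e500mc" is assumed, see above. */
static const char *const sketch_supported_cpus[] = {
	"e500mc",
	"e5500",
	"e6500",
};

/* Return 0 if cpu_name is in the table, -SKETCH_ENOTSUPP otherwise. */
int sketch_check_processor_compat(const char *cpu_name)
{
	size_t n = sizeof(sketch_supported_cpus) /
		   sizeof(sketch_supported_cpus[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(cpu_name, sketch_supported_cpus[i]) == 0)
			return 0;

	return -SKETCH_ENOTSUPP;
}
```

With a table, enabling another core is a one-line data change, at the cost
of a little indirection for a list that is currently only three entries long.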

^ permalink raw reply related

* [PATCH 4/5] KVM: PPC: e500: Emulate EPTCFG register
From: Mihai Caraman @ 2013-01-30 13:29 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Mihai Caraman, linuxppc-dev, kvm
In-Reply-To: <1359552584-17861-1-git-send-email-mihai.caraman@freescale.com>

EPTCFG register defined by E.PT is accessed unconditionally by Linux guests
in the presence of MAV 2.0. Emulate EPTCFG register now.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
---
 arch/powerpc/include/asm/kvm_host.h |    1 +
 arch/powerpc/kvm/e500.h             |    6 ++++++
 arch/powerpc/kvm/e500_emulate.c     |    9 +++++++++
 arch/powerpc/kvm/e500_mmu.c         |    5 +++++
 4 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 88fcfe6..f480b20 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -503,6 +503,7 @@ struct kvm_vcpu_arch {
 	u32 tlbcfg[4];
 	u32 tlbps[4];
 	u32 mmucfg;
+	u32 eptcfg;
 	u32 epr;
 	struct kvmppc_booke_debug_reg dbg_reg;
 #endif
diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index b9f76d8..983eb95 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -308,4 +308,10 @@ static inline unsigned int has_mmu_v2(const struct kvm_vcpu *vcpu)
 	return ((vcpu->arch.mmucfg & MMUCFG_MAVN) == MMUCFG_MAVN_V2);
 }
 
+static inline unsigned int supports_page_tables(const struct kvm_vcpu *vcpu)
+{
+	return ((vcpu->arch.tlbcfg[0] & TLBnCFG_IND)
+		|| (vcpu->arch.tlbcfg[1] & TLBnCFG_IND));
+}
+
 #endif /* KVM_E500_H */
diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
index 5515dc5..493e231 100644
--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -339,6 +339,15 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val)
 			return EMULATE_FAIL;
 		*spr_val = vcpu->arch.tlbps[1];
 		break;
+	case SPRN_EPTCFG:
+		if (!has_mmu_v2(vcpu))
+			return EMULATE_FAIL;
+		/*
+		 * Legacy Linux guests access EPTCFG register even if the E.PT
+		 * category is disabled in the VM. Give them a chance to live.
+		 */
+		*spr_val = vcpu->arch.eptcfg;
+		break;
 	default:
 		emulated = kvmppc_booke_emulate_mfspr(vcpu, sprn, spr_val);
 	}
diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
index 9a1f7b7..199c11e 100644
--- a/arch/powerpc/kvm/e500_mmu.c
+++ b/arch/powerpc/kvm/e500_mmu.c
@@ -799,6 +799,11 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *vcpu_e500)
 	if (has_mmu_v2(vcpu)) {
 		vcpu->arch.tlbps[0] = mfspr(SPRN_TLB0PS);
 		vcpu->arch.tlbps[1] = mfspr(SPRN_TLB1PS);
+
+		if (supports_page_tables(vcpu))
+			vcpu->arch.eptcfg = mfspr(SPRN_EPTCFG);
+		else
+			vcpu->arch.eptcfg = 0;
 	}
 
 	kvmppc_recalc_tlb1map_range(vcpu_e500);
-- 
1.7.4.1
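
The initialization logic in the last hunk can be exercised outside the
kernel with a small model. The sketch below mirrors the patch's rule (the
guest sees the host EPTCFG only if one of its TLBnCFG registers advertises
indirect entries, otherwise zero), but the struct, the function names and
the value chosen for the IND bit are assumptions made for illustration;
the host register read is replaced by a plain parameter.

```c
#include <stdint.h>

/* Assumed bit position for the "indirect entries supported" flag. */
#define SKETCH_TLBnCFG_IND 0x00020000u

/* Minimal stand-in for the fields of struct kvm_vcpu_arch used here. */
struct sketch_vcpu {
	uint32_t tlbcfg[2];
	uint32_t eptcfg;
};

/* Nonzero if either guest TLB advertises indirect entries. */
int sketch_supports_page_tables(const struct sketch_vcpu *vcpu)
{
	return (vcpu->tlbcfg[0] & SKETCH_TLBnCFG_IND) ||
	       (vcpu->tlbcfg[1] & SKETCH_TLBnCFG_IND);
}

/*
 * host_eptcfg replaces mfspr(SPRN_EPTCFG), which cannot be executed in
 * a userspace sketch.
 */
void sketch_init_eptcfg(struct sketch_vcpu *vcpu, uint32_t host_eptcfg)
{
	if (sketch_supports_page_tables(vcpu))
		vcpu->eptcfg = host_eptcfg;
	else
		vcpu->eptcfg = 0;
}
```

Since patch 3/5 clears the IND flag from both guest TLBnCFG values, a guest
configured by this series would always take the zero branch.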

^ permalink raw reply related

* [PATCH 3/5] KVM: PPC: e500: Remove E.PT category from VCPUs
From: Mihai Caraman @ 2013-01-30 13:29 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Mihai Caraman, linuxppc-dev, kvm
In-Reply-To: <1359552584-17861-1-git-send-email-mihai.caraman@freescale.com>

Embedded.Page Table (E.PT) category in VMs requires indirect tlb entries
emulation which is not supported yet. Configure TLBnCFG to remove E.PT
category from VCPUs.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
---
 arch/powerpc/kvm/e500_mmu.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
index 129299a..9a1f7b7 100644
--- a/arch/powerpc/kvm/e500_mmu.c
+++ b/arch/powerpc/kvm/e500_mmu.c
@@ -692,12 +692,14 @@ int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
 	vcpu_e500->gtlb_offset[0] = 0;
 	vcpu_e500->gtlb_offset[1] = params.tlb_sizes[0];
 
-	vcpu->arch.tlbcfg[0] &= ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
+	vcpu->arch.tlbcfg[0] &=
+			      ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC | TLBnCFG_IND);
 	if (params.tlb_sizes[0] <= 2048)
 		vcpu->arch.tlbcfg[0] |= params.tlb_sizes[0];
 	vcpu->arch.tlbcfg[0] |= params.tlb_ways[0] << TLBnCFG_ASSOC_SHIFT;
 
-	vcpu->arch.tlbcfg[1] &= ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
+	vcpu->arch.tlbcfg[1] &=
+			      ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC | TLBnCFG_IND);
 	vcpu->arch.tlbcfg[1] |= params.tlb_sizes[1];
 	vcpu->arch.tlbcfg[1] |= params.tlb_ways[1] << TLBnCFG_ASSOC_SHIFT;
 
@@ -783,13 +785,13 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *vcpu_e500)
 
 	/* Init TLB configuration register */
 	vcpu->arch.tlbcfg[0] = mfspr(SPRN_TLB0CFG) &
-			     ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
+			     ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC | TLBnCFG_IND);
 	vcpu->arch.tlbcfg[0] |= vcpu_e500->gtlb_params[0].entries;
 	vcpu->arch.tlbcfg[0] |=
 		vcpu_e500->gtlb_params[0].ways << TLBnCFG_ASSOC_SHIFT;
 
 	vcpu->arch.tlbcfg[1] = mfspr(SPRN_TLB1CFG) &
-			     ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
+			     ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC | TLBnCFG_IND);
 	vcpu->arch.tlbcfg[1] |= vcpu_e500->gtlb_params[1].entries;
 	vcpu->arch.tlbcfg[1] |=
 		vcpu_e500->gtlb_params[1].ways << TLBnCFG_ASSOC_SHIFT;
-- 
1.7.4.1
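
The pattern repeated in each hunk above is: take the host TLBnCFG value,
clear the fields the guest must not inherit (entry count, associativity
and, with this patch, the IND flag), then fill in the guest's own geometry.
A standalone sketch of that computation follows. The mask values, the
SKETCH_ prefix and the function name are assumptions for illustration; the
extra masking of the inputs is an addition in this sketch and is not in the
patch.

```c
#include <stdint.h>

/* Assumed field layout of a TLBnCFG register, for illustration only. */
#define SKETCH_TLBnCFG_N_ENTRY     0x00000fffu
#define SKETCH_TLBnCFG_IND         0x00020000u
#define SKETCH_TLBnCFG_ASSOC       0xff000000u
#define SKETCH_TLBnCFG_ASSOC_SHIFT 24

/*
 * Derive the guest-visible TLBnCFG from the host value: keep the host's
 * other feature bits, drop size, associativity and IND, then insert the
 * guest TLB's entry count and number of ways.
 */
uint32_t sketch_guest_tlbcfg(uint32_t host_cfg, uint32_t entries,
			     uint32_t ways)
{
	uint32_t cfg = host_cfg & ~(SKETCH_TLBnCFG_N_ENTRY |
				    SKETCH_TLBnCFG_ASSOC |
				    SKETCH_TLBnCFG_IND);

	/* Defensive masking: not present in the patch itself. */
	cfg |= entries & SKETCH_TLBnCFG_N_ENTRY;
	cfg |= (ways << SKETCH_TLBnCFG_ASSOC_SHIFT) & SKETCH_TLBnCFG_ASSOC;

	return cfg;
}
```

Clearing IND in one place like this would also avoid repeating the
three-flag mask in four separate statements.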

^ permalink raw reply related

* [PATCH 0/5] KVM: PPC: e500: Enable FSL e6500 core
From: Mihai Caraman @ 2013-01-30 13:29 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Mihai Caraman, linuxppc-dev, kvm

Enable the Freescale e6500 core by adding the missing MAV 2.0 support. LRAT
and Page Table are not addressed by this patch series.

Mihai Caraman (5):
  KVM: PPC: e500: Move VCPU's MMUCFG register initialization earlier
  KVM: PPC: e500: Emulate TLBnPS registers
  KVM: PPC: e500: Remove E.PT category from VCPUs
  KVM: PPC: e500: Emulate EPTCFG register
  KVM: PPC: e500mc: Enable e6500 cores

 arch/powerpc/include/asm/kvm_host.h |    2 ++
 arch/powerpc/kvm/e500.h             |   11 +++++++++++
 arch/powerpc/kvm/e500_emulate.c     |   19 +++++++++++++++++++
 arch/powerpc/kvm/e500_mmu.c         |   24 ++++++++++++++++++------
 arch/powerpc/kvm/e500mc.c           |    2 ++
 5 files changed, 52 insertions(+), 6 deletions(-)

-- 
1.7.4.1

^ permalink raw reply
