LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Rafael J. Wysocki @ 2013-01-31 20:54 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-s390, jiang.liu, wency, linux-acpi, Greg KH, linux-kernel,
	linux-mm, isimatu.yasuaki, yinghai, srivatsa.bhat, guohanjun,
	bhelgaas, akpm, linuxppc-dev, lenb
In-Reply-To: <1359601065.15120.156.camel@misato.fc.hp.com>

On Wednesday, January 30, 2013 07:57:45 PM Toshi Kani wrote:
> On Tue, 2013-01-29 at 23:58 -0500, Greg KH wrote:
> > On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
> > > +/*
> > > + * Hot-plug device information
> > > + */
> > 
> > Again, stop it with the "generic" hotplug term here, and everywhere
> > else.  You are doing a very _specific_ type of hotplug devices, so spell
> > it out.  We've worked hard to hotplug _everything_ in Linux, you are
> > going to confuse a lot of people with this type of terms.
> 
> Agreed.  I will clarify in all places.
> 
> > > +union shp_dev_info {
> > > +	struct shp_cpu {
> > > +		u32		cpu_id;
> > > +	} cpu;
> > 
> > What is this?  Why not point to the system device for the cpu?
> 
> This info is used to on-line a new CPU and create its system/cpu device.
> In other word, a system/cpu device is created as a result of CPU
> hotplug.
> 
> > > +	struct shp_memory {
> > > +		int		node;
> > > +		u64		start_addr;
> > > +		u64		length;
> > > +	} mem;
> > 
> > Same here, why not point to the system device?
> 
> Same as above.
> 
> > > +	struct shp_hostbridge {
> > > +	} hb;
> > > +
> > > +	struct shp_node {
> > > +	} node;
> > 
> > What happened here with these?  Empty structures?  Huh?
> 
> They are place holders for now.  PCI bridge hot-plug and node hot-plug
> are still very much work in progress, so I have not integrated them into
> this framework yet.
> 
> > > +};
> > > +
> > > +struct shp_device {
> > > +	struct list_head	list;
> > > +	struct device		*device;
> > 
> > No, make it a "real" device, embed the device into it.
> 
> This device pointer is used to send KOBJ_ONLINE/OFFLINE event during CPU
> online/offline operation in order to maintain the current behavior.  CPU
> online/offline operation only changes the state of CPU, so its
> system/cpu device continues to be present before and after an operation.
> (Whereas, CPU hot-add/delete operation creates or removes a system/cpu
> device.)  So, this "*device" needs to be a pointer to reference an
> existing device that is to be on-lined/off-lined.
> 
> > But, again, I'm going to ask why you aren't using the existing cpu /
> > memory / bridge / node devices that we have in the kernel.  Please use
> > them, or give me a _really_ good reason why they will not work.
> 
> We cannot use the existing system devices or ACPI devices here.  During
> hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> device information in a platform-neutral way.  During hot-add, we first
> creates an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> but platform-neutral modules cannot use them as they are ACPI-specific.

But suppose we're smart and have ACPI scan handlers that will create
"physical" device nodes for those devices during the ACPI namespace scan.
Then, the platform-neutral nodes will be able to bind to those "physical"
nodes.  Moreover, it should be possible to get a hierarchy of device objects
this way that will reflect all of the dependencies we need to take into
account during hot-add and hot-remove operations.  That may not be what we
have today, but I don't see any *fundamental* obstacles preventing us from
using this approach.

This is already done for PCI host bridges and platform devices and I don't
see why we can't do that for the other types of devices too.

The only missing piece I see is a way to handle the "eject" problem, i.e.
when we try do eject a device at the top of a subtree and need to tear down
the entire subtree below it, but if that's going to lead to a system crash,
for example, we want to cancel the eject.  It seems to me that we'll need some
help from the driver core here.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* [PATCH 22/25] perf: Make EVENT_ATTR global
From: Arnaldo Carvalho de Melo @ 2013-01-31 17:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Peter Zijlstra, Robert Richter, Anton Blanchard,
	linux-kernel, Stephane Eranian, Arnaldo Carvalho de Melo,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, Sukadev Bhattiprolu,
	Jiri Olsa
In-Reply-To: <1359653128-10433-1-git-send-email-acme@infradead.org>

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

Rename EVENT_ATTR() to PMU_EVENT_ATTR() and make it global so it is
available to all architectures.

Further to allow architectures flexibility, have PMU_EVENT_ATTR() pass
in the variable name as a parameter.

Changelog[v2]
	- [Jiri Olsa] No need to define PMU_EVENT_PTR()

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/20130123062422.GC13720@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 arch/x86/kernel/cpu/perf_event.c | 13 +++----------
 include/linux/perf_event.h       | 11 +++++++++++
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 6774c17..c0df5ed2 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1310,11 +1310,6 @@ static struct attribute_group x86_pmu_format_group = {
 	.attrs = NULL,
 };
 
-struct perf_pmu_events_attr {
-	struct device_attribute attr;
-	u64 id;
-};
-
 /*
  * Remove all undefined events (x86_pmu.event_map(id) == 0)
  * out of events_attr attributes.
@@ -1348,11 +1343,9 @@ static ssize_t events_sysfs_show(struct device *dev, struct device_attribute *at
 #define EVENT_VAR(_id)  event_attr_##_id
 #define EVENT_PTR(_id) &event_attr_##_id.attr.attr
 
-#define EVENT_ATTR(_name, _id)					\
-static struct perf_pmu_events_attr EVENT_VAR(_id) = {		\
-	.attr = __ATTR(_name, 0444, events_sysfs_show, NULL),	\
-	.id   =  PERF_COUNT_HW_##_id,				\
-};
+#define EVENT_ATTR(_name, _id)						\
+	PMU_EVENT_ATTR(_name, EVENT_VAR(_id), PERF_COUNT_HW_##_id,	\
+			events_sysfs_show)
 
 EVENT_ATTR(cpu-cycles,			CPU_CYCLES		);
 EVENT_ATTR(instructions,		INSTRUCTIONS		);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6bfb2faa..42adf01 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -817,6 +817,17 @@ do {									\
 } while (0)
 
 
+struct perf_pmu_events_attr {
+	struct device_attribute attr;
+	u64 id;
+};
+
+#define PMU_EVENT_ATTR(_name, _var, _id, _show)				\
+static struct perf_pmu_events_attr _var = {				\
+	.attr = __ATTR(_name, 0444, _show, NULL),			\
+	.id   =  _id,							\
+};
+
 #define PMU_FORMAT_ATTR(_name, _format)					\
 static ssize_t								\
 _name##_show(struct device *dev,					\
-- 
1.8.1.1.361.gec3ae6e

^ permalink raw reply related

* [PATCH 21/25] perf/Power7: Use macros to identify perf events
From: Arnaldo Carvalho de Melo @ 2013-01-31 17:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Peter Zijlstra, Robert Richter, Anton Blanchard,
	linux-kernel, Stephane Eranian, Arnaldo Carvalho de Melo,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, Sukadev Bhattiprolu,
	Jiri Olsa
In-Reply-To: <1359653128-10433-1-git-send-email-acme@infradead.org>

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

Define and use macros to identify perf events codes This would make it
easier and more readable when these event codes need to be used in more
than one place.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/20130123062353.GB13720@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 arch/powerpc/perf/power7-pmu.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 2ee01e3..eebb36d 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -51,6 +51,18 @@
 #define MMCR1_PMCSEL_MSK	0xff
 
 /*
+ * Power7 event codes.
+ */
+#define	PME_PM_CYC			0x1e
+#define	PME_PM_GCT_NOSLOT_CYC		0x100f8
+#define	PME_PM_CMPLU_STALL		0x4000a
+#define	PME_PM_INST_CMPL		0x2
+#define	PME_PM_LD_REF_L1		0xc880
+#define	PME_PM_LD_MISS_L1		0x400f0
+#define	PME_PM_BRU_FIN			0x10068
+#define	PME_PM_BRU_MPRED		0x400f6
+
+/*
  * Layout of constraint bits:
  * 6666555555555544444444443333333333222222222211111111110000000000
  * 3210987654321098765432109876543210987654321098765432109876543210
@@ -307,14 +319,14 @@ static void power7_disable_pmc(unsigned int pmc, unsigned long mmcr[])
 }
 
 static int power7_generic_events[] = {
-	[PERF_COUNT_HW_CPU_CYCLES] = 0x1e,
-	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x100f8, /* GCT_NOSLOT_CYC */
-	[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x4000a,  /* CMPLU_STALL */
-	[PERF_COUNT_HW_INSTRUCTIONS] = 2,
-	[PERF_COUNT_HW_CACHE_REFERENCES] = 0xc880,	/* LD_REF_L1_LSU*/
-	[PERF_COUNT_HW_CACHE_MISSES] = 0x400f0,		/* LD_MISS_L1	*/
-	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x10068,	/* BRU_FIN	*/
-	[PERF_COUNT_HW_BRANCH_MISSES] = 0x400f6,	/* BR_MPRED	*/
+	[PERF_COUNT_HW_CPU_CYCLES] =			PME_PM_CYC,
+	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =	PME_PM_GCT_NOSLOT_CYC,
+	[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =	PME_PM_CMPLU_STALL,
+	[PERF_COUNT_HW_INSTRUCTIONS] =			PME_PM_INST_CMPL,
+	[PERF_COUNT_HW_CACHE_REFERENCES] =		PME_PM_LD_REF_L1,
+	[PERF_COUNT_HW_CACHE_MISSES] =			PME_PM_LD_MISS_L1,
+	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =		PME_PM_BRU_FIN,
+	[PERF_COUNT_HW_BRANCH_MISSES] =			PME_PM_BRU_MPRED,
 };
 
 #define C(x)	PERF_COUNT_HW_CACHE_##x
-- 
1.8.1.1.361.gec3ae6e

^ permalink raw reply related

* [PATCH 23/25] perf/POWER7: Make generic event translations available in sysfs
From: Arnaldo Carvalho de Melo @ 2013-01-31 17:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Peter Zijlstra, Robert Richter, Anton Blanchard,
	linux-kernel, Stephane Eranian, Arnaldo Carvalho de Melo,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, Sukadev Bhattiprolu,
	Jiri Olsa
In-Reply-To: <1359653128-10433-1-git-send-email-acme@infradead.org>

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

Make the generic perf events in POWER7 available via sysfs.

	$ ls /sys/bus/event_source/devices/cpu/events
	branch-instructions
	branch-misses
	cache-misses
	cache-references
	cpu-cycles
	instructions
	stalled-cycles-backend
	stalled-cycles-frontend

	$ cat /sys/bus/event_source/devices/cpu/events/cache-misses
	event=0x400f0

This patch is based on commits that implement this functionality on x86.
Eg:
	commit a47473939db20e3961b200eb00acf5fcf084d755
	Author: Jiri Olsa <jolsa@redhat.com>
	Date:   Wed Oct 10 14:53:11 2012 +0200

	    perf/x86: Make hardware event translations available in sysfs

Changelog:[v2]
	[Jiri Osla] Drop EVENT_ID() macro since it is only used once.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/20130123062454.GD13720@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 Documentation/ABI/stable/sysfs-devices-cpu-events |  0
 arch/powerpc/include/asm/perf_event_server.h      | 23 +++++++++++++++
 arch/powerpc/perf/core-book3s.c                   | 12 ++++++++
 arch/powerpc/perf/power7-pmu.c                    | 34 +++++++++++++++++++++++
 4 files changed, 69 insertions(+)
 create mode 100644 Documentation/ABI/stable/sysfs-devices-cpu-events

diff --git a/Documentation/ABI/stable/sysfs-devices-cpu-events b/Documentation/ABI/stable/sysfs-devices-cpu-events
new file mode 100644
index 0000000..e69de29
diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 9710be3..b9b6c55 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -11,6 +11,7 @@
 
 #include <linux/types.h>
 #include <asm/hw_irq.h>
+#include <linux/device.h>
 
 #define MAX_HWEVENTS		8
 #define MAX_EVENT_ALTERNATIVES	8
@@ -35,6 +36,7 @@ struct power_pmu {
 	void		(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
 	int		(*limited_pmc_event)(u64 event_id);
 	u32		flags;
+	const struct attribute_group	**attr_groups;
 	int		n_generic;
 	int		*generic_events;
 	int		(*cache_events)[PERF_COUNT_HW_CACHE_MAX]
@@ -109,3 +111,24 @@ extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
  * If an event_id is not subject to the constraint expressed by a particular
  * field, then it will have 0 in both the mask and value for that field.
  */
+
+extern ssize_t power_events_sysfs_show(struct device *dev,
+				struct device_attribute *attr, char *page);
+
+/*
+ * EVENT_VAR() is same as PMU_EVENT_VAR with a suffix.
+ *
+ * Having a suffix allows us to have aliases in sysfs - eg: the generic
+ * event 'cpu-cycles' can have two entries in sysfs: 'cpu-cycles' and
+ * 'PM_CYC' where the latter is the name by which the event is known in
+ * POWER CPU specification.
+ */
+#define	EVENT_VAR(_id, _suffix)		event_attr_##_id##_suffix
+#define	EVENT_PTR(_id, _suffix)		&EVENT_VAR(_id, _suffix)
+
+#define	EVENT_ATTR(_name, _id, _suffix)					\
+	PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_PM_##_id,	\
+			power_events_sysfs_show)
+
+#define	GENERIC_EVENT_ATTR(_name, _id)	EVENT_ATTR(_name, _id, _g)
+#define	GENERIC_EVENT_PTR(_id)		EVENT_PTR(_id, _g)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index aa2465e..fa476d5 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1305,6 +1305,16 @@ static int power_pmu_event_idx(struct perf_event *event)
 	return event->hw.idx;
 }
 
+ssize_t power_events_sysfs_show(struct device *dev,
+				struct device_attribute *attr, char *page)
+{
+	struct perf_pmu_events_attr *pmu_attr;
+
+	pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
+
+	return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
+}
+
 struct pmu power_pmu = {
 	.pmu_enable	= power_pmu_enable,
 	.pmu_disable	= power_pmu_disable,
@@ -1537,6 +1547,8 @@ int __cpuinit register_power_pmu(struct power_pmu *pmu)
 	pr_info("%s performance monitor hardware support registered\n",
 		pmu->name);
 
+	power_pmu.attr_groups = ppmu->attr_groups;
+
 #ifdef MSR_HV
 	/*
 	 * Use FCHV to ignore kernel events if MSR.HV is set.
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index eebb36d..269bf24 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -374,6 +374,39 @@ static int power7_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 	},
 };
 
+
+GENERIC_EVENT_ATTR(cpu-cycles,			CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-frontend,	GCT_NOSLOT_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-backend,	CMPLU_STALL);
+GENERIC_EVENT_ATTR(instructions,		INST_CMPL);
+GENERIC_EVENT_ATTR(cache-references,		LD_REF_L1);
+GENERIC_EVENT_ATTR(cache-misses,		LD_MISS_L1);
+GENERIC_EVENT_ATTR(branch-instructions,		BRU_FIN);
+GENERIC_EVENT_ATTR(branch-misses,		BRU_MPRED);
+
+static struct attribute *power7_events_attr[] = {
+	GENERIC_EVENT_PTR(CYC),
+	GENERIC_EVENT_PTR(GCT_NOSLOT_CYC),
+	GENERIC_EVENT_PTR(CMPLU_STALL),
+	GENERIC_EVENT_PTR(INST_CMPL),
+	GENERIC_EVENT_PTR(LD_REF_L1),
+	GENERIC_EVENT_PTR(LD_MISS_L1),
+	GENERIC_EVENT_PTR(BRU_FIN),
+	GENERIC_EVENT_PTR(BRU_MPRED),
+	NULL
+};
+
+
+static struct attribute_group power7_pmu_events_group = {
+	.name = "events",
+	.attrs = power7_events_attr,
+};
+
+static const struct attribute_group *power7_pmu_attr_groups[] = {
+	&power7_pmu_events_group,
+	NULL,
+};
+
 static struct power_pmu power7_pmu = {
 	.name			= "POWER7",
 	.n_counter		= 6,
@@ -385,6 +418,7 @@ static struct power_pmu power7_pmu = {
 	.get_alternatives	= power7_get_alternatives,
 	.disable_pmc		= power7_disable_pmc,
 	.flags			= PPMU_ALT_SIPR,
+	.attr_groups		= power7_pmu_attr_groups,
 	.n_generic		= ARRAY_SIZE(power7_generic_events),
 	.generic_events		= power7_generic_events,
 	.cache_events		= &power7_cache_events,
-- 
1.8.1.1.361.gec3ae6e

^ permalink raw reply related

* [GIT PULL 00/25] perf/core improvements and fixes
From: Arnaldo Carvalho de Melo @ 2013-01-31 17:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo,
	Robert Richter, Andi Kleen, Peter Zijlstra, Frederic Weisbecker,
	Namhyung Kim, Anton Blanchard, linux-kernel, Stephane Eranian,
	Pekka Enberg, linuxppc-dev, Paul Mackerras, Mike Galbraith, acme,
	David Ahern, Namhyung Kim, Sukadev Bhattiprolu, Jiri Olsa

Hi Ingo,

	Please consider pulling,

- Arnaldo

The following changes since commit 152fefa921535665f95840c08062844ab2f5593e:

  Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2013-01-31 10:20:14 +0100)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux tags/perf-core-for-mingo

for you to fetch changes up to 2ac3634a7e1c8eedc961030c87c5c36ebd5bbf8e:

  perf: Document the ABI of perf sysfs entries (2013-01-31 13:07:51 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

. Make some POWER7 events available in sysfs, equivalent to
  what was done on x86, from Sukadev Bhattiprolu.

. Add event group view, from Namyung Kim:

  To use it, 'perf record' should group events when recording. And then perf
  report parses the saved group relation from file header and prints them
  together if --group option is provided.  You can use 'perf evlist' command to
  see event group information:

    $ perf record -e '{ref-cycles,cycles}' noploop 1
    [ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ]

    $ perf evlist --group
    {ref-cycles,cycles}

  With this example, default perf report will show you each event
  separately like this:

    $ perf report
    ...
    # group: {ref-cycles,cycles}
    # ========
    # Samples: 3K of event 'ref-cycles'
    # Event count (approx.): 3153797218
    #
    # Overhead  Command      Shared Object                      Symbol
    # ........  .......  .................  ..........................
        99.84%  noploop  noploop            [.] main
         0.07%  noploop  ld-2.15.so         [.] strcmp
         0.03%  noploop  [kernel.kallsyms]  [k] timerqueue_del
         0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
         0.02%  noploop  [kernel.kallsyms]  [k] account_user_time
         0.01%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
         0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe

    # Samples: 3K of event 'cycles'
    # Event count (approx.): 3722310525
    #
    # Overhead  Command      Shared Object                     Symbol
    # ........  .......  .................  .........................
        99.76%  noploop  noploop            [.] main
         0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
         0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
         0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
         0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
         0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time
         0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe

  In this case the event group information will be shown in the end of
  header area.  So you can use --group option to enable event group view.

    $ perf report --group
    ...
    # group: {ref-cycles,cycles}
    # ========
    # Samples: 7K of event 'anon group { ref-cycles, cycles }'
    # Event count (approx.): 6876107743
    #
    #         Overhead  Command      Shared Object                      Symbol
    # ................  .......  .................  ..........................
        99.84%  99.76%  noploop  noploop            [.] main
         0.07%   0.00%  noploop  ld-2.15.so         [.] strcmp
         0.03%   0.00%  noploop  [kernel.kallsyms]  [k] timerqueue_del
         0.03%   0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
         0.02%   0.00%  noploop  [kernel.kallsyms]  [k] account_user_time
         0.01%   0.00%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
         0.00%   0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
         0.00%   0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
         0.00%   0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
         0.00%   0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
         0.00%   0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time

  As you can see the Overhead column now contains both of ref-cycles and
  cycles and header line shows group information also - 'anon group {
  ref-cycles, cycles }'.  The output is sorted by period of group leader
  first.

  If perf.data file doesn't contain group information, this --group
  option does nothing.  So if you want enable event group view by
  default you can set it in ~/.perfconfig file:

    $ cat ~/.perfconfig
    [report]
    group = true

  It can be overridden with command line if you want:

    $ perf report --no-group

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Arnaldo Carvalho de Melo (2):
      perf top: Stop using exit()
      perf top: Delete maps on exit

Namhyung Kim (18):
      perf tools: Keep group information
      perf tests: Add group test conditions
      perf header: Add HEADER_GROUP_DESC feature
      perf report: Make another loop for linking group hists
      perf hists: Resort hist entries using group members for output
      perf ui/hist: Consolidate hpp helpers
      perf hists browser: Convert hpp helpers to a function
      perf gtk/browser: Convert hpp helpers to a function
      perf ui/hist: Add support for event group view
      perf hists browser: Move coloring logic to hpp functions
      perf hists browser: Add suppport for event group view
      perf gtk/browser: Add support for event group view
      perf gtk/browser: Trim column header string when event group enabled
      perf report: Bypass non-leader events when event group is enabled
      perf report: Show group description when event group is enabled
      perf report: Add --group option
      perf report: Add report.group config option
      perf evlist: Add --group option

Sukadev Bhattiprolu (5):
      perf/Power7: Use macros to identify perf events
      perf: Make EVENT_ATTR global
      perf/POWER7: Make generic event translations available in sysfs
      perf/POWER7: Make some POWER7 events available in sysfs
      perf: Document the ABI of perf sysfs entries

 .../testing/sysfs-bus-event_source-devices-events  |  62 +++++
 arch/powerpc/include/asm/perf_event_server.h       |  26 ++
 arch/powerpc/perf/core-book3s.c                    |  12 +
 arch/powerpc/perf/power7-pmu.c                     |  80 +++++-
 arch/x86/kernel/cpu/perf_event.c                   |  13 +-
 include/linux/perf_event.h                         |  11 +
 tools/perf/Documentation/perf-evlist.txt           |   4 +
 tools/perf/Documentation/perf-report.txt           |   3 +
 tools/perf/builtin-evlist.c                        |   7 +
 tools/perf/builtin-record.c                        |   3 +
 tools/perf/builtin-report.c                        |  47 +++-
 tools/perf/builtin-top.c                           |  62 +++--
 tools/perf/tests/parse-events.c                    |  28 ++
 tools/perf/ui/browsers/hists.c                     | 217 ++++++++++++---
 tools/perf/ui/gtk/hists.c                          | 130 +++++++--
 tools/perf/ui/hist.c                               | 306 ++++++++++-----------
 tools/perf/ui/stdio/hist.c                         |   2 +
 tools/perf/util/evlist.c                           |   7 +-
 tools/perf/util/evlist.h                           |   1 +
 tools/perf/util/evsel.c                            |  49 +++-
 tools/perf/util/evsel.h                            |  16 ++
 tools/perf/util/header.c                           | 164 +++++++++++
 tools/perf/util/header.h                           |   2 +
 tools/perf/util/hist.c                             |  59 +++-
 tools/perf/util/parse-events.c                     |   1 +
 tools/perf/util/parse-events.h                     |   1 +
 tools/perf/util/parse-events.y                     |  10 +
 tools/perf/util/symbol.h                           |   3 +-
 28 files changed, 1059 insertions(+), 267 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-events

^ permalink raw reply

* [PATCH 25/25] perf: Document the ABI of perf sysfs entries
From: Arnaldo Carvalho de Melo @ 2013-01-31 17:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Peter Zijlstra, Robert Richter, Anton Blanchard,
	linux-kernel, Stephane Eranian, Arnaldo Carvalho de Melo,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, Sukadev Bhattiprolu,
	Jiri Olsa
In-Reply-To: <1359653128-10433-1-git-send-email-acme@infradead.org>

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

This patchset addes two new sets of files to sysfs for POWER architecture.

	- perf event config format in /sys/devices/cpu/format/event
	- generic and POWER-specific perf events in /sys/devices/cpu/events/

The format of the first file is already documented in:

	sysfs-bus-event_source-devices-format

Document the format of the second set of files '/sys/devices/cpu/events/*'
which would also become part of the ABI.

Changelog[v4]:
	[Jiri Olsa]: Mention that multiple event= like terms can be specified
	in the 'events' file.
	[Jiri Olsa]: Remove the documentation for the 'config format' file
	as it is already documented in 'Documentation/ABI/testing/'.
	[Jiri Olsa]: Move ABI documentation from 'stable/' to 'testing/'

Changelog[v3]:
	[Greg KH] Include ABI documentation.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/20130123062645.GG13720@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 Documentation/ABI/stable/sysfs-devices-cpu-events  |  0
 .../testing/sysfs-bus-event_source-devices-events  | 62 ++++++++++++++++++++++
 2 files changed, 62 insertions(+)
 delete mode 100644 Documentation/ABI/stable/sysfs-devices-cpu-events
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-events

diff --git a/Documentation/ABI/stable/sysfs-devices-cpu-events b/Documentation/ABI/stable/sysfs-devices-cpu-events
deleted file mode 100644
index e69de29..0000000
diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
new file mode 100644
index 0000000..0adeb52
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
@@ -0,0 +1,62 @@
+What:		/sys/devices/cpu/events/
+		/sys/devices/cpu/events/branch-misses
+		/sys/devices/cpu/events/cache-references
+		/sys/devices/cpu/events/cache-misses
+		/sys/devices/cpu/events/stalled-cycles-frontend
+		/sys/devices/cpu/events/branch-instructions
+		/sys/devices/cpu/events/stalled-cycles-backend
+		/sys/devices/cpu/events/instructions
+		/sys/devices/cpu/events/cpu-cycles
+
+Date:		2013/01/08
+
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+
+Description:	Generic performance monitoring events
+
+		A collection of performance monitoring events that may be
+		supported by many/most CPUs. These events can be monitored
+		using the 'perf(1)' tool.
+
+		The contents of each file would look like:
+
+			event=0xNNNN
+
+		where 'N' is a hex digit and the number '0xNNNN' shows the
+		"raw code" for the perf event identified by the file's
+		"basename".
+
+
+What: 		/sys/devices/cpu/events/PM_LD_MISS_L1
+		/sys/devices/cpu/events/PM_LD_REF_L1
+		/sys/devices/cpu/events/PM_CYC
+		/sys/devices/cpu/events/PM_BRU_FIN
+		/sys/devices/cpu/events/PM_GCT_NOSLOT_CYC
+		/sys/devices/cpu/events/PM_BRU_MPRED
+		/sys/devices/cpu/events/PM_INST_CMPL
+		/sys/devices/cpu/events/PM_CMPLU_STALL
+
+Date:		2013/01/08
+
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+		Linux Powerpc mailing list <linuxppc-dev@ozlabs.org>
+
+Description:	POWER-systems specific performance monitoring events
+
+		A collection of performance monitoring events that may be
+		supported by the POWER CPU. These events can be monitored
+		using the 'perf(1)' tool.
+
+		These events may not be supported by other CPUs.
+
+		The contents of each file would look like:
+
+			event=0xNNNN
+
+		where 'N' is a hex digit and the number '0xNNNN' shows the
+		"raw code" for the perf event identified by the file's
+		"basename".
+
+		Further, multiple terms like 'event=0xNNNN' can be specified
+		and separated with comma. All available terms are defined in
+		the /sys/bus/event_source/devices/<dev>/format file.
-- 
1.8.1.1.361.gec3ae6e

^ permalink raw reply related

* [PATCH 24/25] perf/POWER7: Make some POWER7 events available in sysfs
From: Arnaldo Carvalho de Melo @ 2013-01-31 17:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Peter Zijlstra, Robert Richter, Anton Blanchard,
	linux-kernel, Stephane Eranian, Arnaldo Carvalho de Melo,
	linuxppc-dev, Ingo Molnar, Paul Mackerras, Sukadev Bhattiprolu,
	Jiri Olsa
In-Reply-To: <1359653128-10433-1-git-send-email-acme@infradead.org>

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

Make some POWER7-specific perf events available in sysfs.

	$ /bin/ls -1 /sys/bus/event_source/devices/cpu/events/
	branch-instructions
	branch-misses
	cache-misses
	cache-references
	cpu-cycles
	instructions
	PM_BRU_FIN
	PM_BRU_MPRED
	PM_CMPLU_STALL
	PM_CYC
	PM_GCT_NOSLOT_CYC
	PM_INST_CMPL
	PM_LD_MISS_L1
	PM_LD_REF_L1
	stalled-cycles-backend
	stalled-cycles-frontend

where the 'PM_*' events are POWER specific and the others are the
generic events.

This will enable users to specify these events with their symbolic
names rather than with their raw code.

	perf stat -e 'cpu/PM_CYC' ...

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/20130123062528.GE13720@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 arch/powerpc/include/asm/perf_event_server.h |  3 +++
 arch/powerpc/perf/power7-pmu.c               | 18 ++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index b9b6c55..b29fcc6 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -132,3 +132,6 @@ extern ssize_t power_events_sysfs_show(struct device *dev,
 
 #define	GENERIC_EVENT_ATTR(_name, _id)	EVENT_ATTR(_name, _id, _g)
 #define	GENERIC_EVENT_PTR(_id)		EVENT_PTR(_id, _g)
+
+#define	POWER_EVENT_ATTR(_name, _id)	EVENT_ATTR(PM_##_name, _id, _p)
+#define	POWER_EVENT_PTR(_id)		EVENT_PTR(_id, _p)
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 269bf24..b554879 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -384,6 +384,15 @@ GENERIC_EVENT_ATTR(cache-misses,		LD_MISS_L1);
 GENERIC_EVENT_ATTR(branch-instructions,		BRU_FIN);
 GENERIC_EVENT_ATTR(branch-misses,		BRU_MPRED);
 
+POWER_EVENT_ATTR(CYC,				CYC);
+POWER_EVENT_ATTR(GCT_NOSLOT_CYC,		GCT_NOSLOT_CYC);
+POWER_EVENT_ATTR(CMPLU_STALL,			CMPLU_STALL);
+POWER_EVENT_ATTR(INST_CMPL,			INST_CMPL);
+POWER_EVENT_ATTR(LD_REF_L1,			LD_REF_L1);
+POWER_EVENT_ATTR(LD_MISS_L1,			LD_MISS_L1);
+POWER_EVENT_ATTR(BRU_FIN,			BRU_FIN)
+POWER_EVENT_ATTR(BRU_MPRED,			BRU_MPRED);
+
 static struct attribute *power7_events_attr[] = {
 	GENERIC_EVENT_PTR(CYC),
 	GENERIC_EVENT_PTR(GCT_NOSLOT_CYC),
@@ -393,6 +402,15 @@ static struct attribute *power7_events_attr[] = {
 	GENERIC_EVENT_PTR(LD_MISS_L1),
 	GENERIC_EVENT_PTR(BRU_FIN),
 	GENERIC_EVENT_PTR(BRU_MPRED),
+
+	POWER_EVENT_PTR(CYC),
+	POWER_EVENT_PTR(GCT_NOSLOT_CYC),
+	POWER_EVENT_PTR(CMPLU_STALL),
+	POWER_EVENT_PTR(INST_CMPL),
+	POWER_EVENT_PTR(LD_REF_L1),
+	POWER_EVENT_PTR(LD_MISS_L1),
+	POWER_EVENT_PTR(BRU_FIN),
+	POWER_EVENT_PTR(BRU_MPRED),
 	NULL
 };
 
-- 
1.8.1.1.361.gec3ae6e

^ permalink raw reply related

* Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register initialization earlier
From: Scott Wood @ 2013-01-31 16:48 UTC (permalink / raw)
  To: Caraman Mihai Claudiu-B02008
  Cc: linuxppc-dev@lists.ozlabs.org, Alexander Graf,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
In-Reply-To: <300B73AA675FCE4A93EB4FC1D42459FF2F7861@039-SN2MPN1-012.039d.mgd.msft.net>

On 01/31/2013 09:26:20 AM, Caraman Mihai Claudiu-B02008 wrote:
> > -----Original Message-----
> > From: Alexander Graf [mailto:agraf@suse.de]
> > Sent: Thursday, January 31, 2013 4:58 PM
> > To: Caraman Mihai Claudiu-B02008
> > Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> > dev@lists.ozlabs.org
> > Subject: Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register
> > initialization earlier
> >
> >
> > On 31.01.2013, at 15:56, Caraman Mihai Claudiu-B02008 wrote:
> >
> > >> -----Original Message-----
> > >> From: Alexander Graf [mailto:agraf@suse.de]
> > >> Sent: Thursday, January 31, 2013 3:21 PM
> > >> To: Caraman Mihai Claudiu-B02008
> > >> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> > >> dev@lists.ozlabs.org
> > >> Subject: Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG =20
> register
> > >> initialization earlier
> > >>
> > >>
> > >> On 30.01.2013, at 14:29, Mihai Caraman wrote:
> > >>
> > >>> VCPU's MMUCFG register initialization should not depend on
> > >> KVM_CAP_SW_TLB
> > >>> ioctl call. Move it earlier into tlb initalization phase.
> > >>
> > >> Quite the contrary. The fact that there is an mfspr() in =20
> e500_mmu.c
> > >> already tells us that the code is broken. The TLB guest code =20
> should
> > only
> > >> depend on input from the SW_TLB configuration. It's completely
> > orthogonal
> > >> to the host capabilities.
> > >
> > > Then we have the same issue for TLBnCFG registers which need to be
> > configured
> > > via SW_TLB ioctl. What is the purpose of guest tlb initalization =20
> in
> > e500_mmu.c
> > > if we rely on SW_TLB?
> >
> > It's to provide a fallback to user space that doesn't implement =20
> SW_TLB
> > configuration yet.
>=20
> Do we have such a case now or is it just hypothetical? For the =20
> fallback we
> need to initialize the MMUCFG register which I intended to say in the =20
> commit
> message.

I don't think we need to support a fallback for e6500, since there's =20
nothing to be backwards compatible with.

As for use case, I don't see us ever supporting the guest being a =20
different CPU than the host.  Page sizes probably aren't a problem, but =20
there are other barriers.

The main reasons that TLBnCFG are settable through SW_TLB are:
1. The guest TLB can be enlarged as a performance hack (like in Topaz, =20
though QEMU doesn't currently do this),
2. The legacy default in KVM is based on the e500v1 TLB0 size, which is =20
half of what e500v2/e500mc have, and
3. QEMU needs to know the exact geometry of the TLB so that it can =20
interpret the shared data properly.

#3 seems like a compelling reason here, to avoid silent weirdness if =20
there's a slight mismatch between what QEMU thinks it's modelling and =20
what we're actually running on.

-Scott=

^ permalink raw reply

* RE: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register initialization earlier
From: Caraman Mihai Claudiu-B02008 @ 2013-01-31 15:26 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org
In-Reply-To: <6E880F7F-4EA9-44CF-8C8A-E1963988ACCD@suse.de>

> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Thursday, January 31, 2013 4:58 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> dev@lists.ozlabs.org
> Subject: Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register
> initialization earlier
>=20
>=20
> On 31.01.2013, at 15:56, Caraman Mihai Claudiu-B02008 wrote:
>=20
> >> -----Original Message-----
> >> From: Alexander Graf [mailto:agraf@suse.de]
> >> Sent: Thursday, January 31, 2013 3:21 PM
> >> To: Caraman Mihai Claudiu-B02008
> >> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> >> dev@lists.ozlabs.org
> >> Subject: Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register
> >> initialization earlier
> >>
> >>
> >> On 30.01.2013, at 14:29, Mihai Caraman wrote:
> >>
> >>> VCPU's MMUCFG register initialization should not depend on
> >> KVM_CAP_SW_TLB
> >>> ioctl call. Move it earlier into tlb initalization phase.
> >>
> >> Quite the contrary. The fact that there is an mfspr() in e500_mmu.c
> >> already tells us that the code is broken. The TLB guest code should
> only
> >> depend on input from the SW_TLB configuration. It's completely
> orthogonal
> >> to the host capabilities.
> >
> > Then we have the same issue for TLBnCFG registers which need to be
> configured
> > via SW_TLB ioctl. What is the purpose of guest tlb initalization in
> e500_mmu.c
> > if we rely on SW_TLB?
>=20
> It's to provide a fallback to user space that doesn't implement SW_TLB
> configuration yet.

Do we have such a case now or is it just hypothetical? For the fallback we
need to initialize the MMUCFG register which I intended to say in the commi=
t
message.

>=20
>=20
> Alex
>=20

^ permalink raw reply

* RE: [PATCH 4/5] KVM: PPC: e500: Emulate EPTCFG register
From: Caraman Mihai Claudiu-B02008 @ 2013-01-31 14:58 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org
In-Reply-To: <911274D3-9188-45A5-81B0-643F5DB21653@suse.de>

> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Thursday, January 31, 2013 3:31 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> dev@lists.ozlabs.org
> Subject: Re: [PATCH 4/5] KVM: PPC: e500: Emulate EPTCFG register
>=20
>=20
> On 30.01.2013, at 14:29, Mihai Caraman wrote:
>=20
> > EPTCFG register defined by E.PT is accessed unconditionally by Linux
> guests
> > in the presence of MAV 2.0. Emulate EPTCFG register now.
> >
> > Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
> > ---
> > arch/powerpc/include/asm/kvm_host.h |    1 +
> > arch/powerpc/kvm/e500.h             |    6 ++++++
> > arch/powerpc/kvm/e500_emulate.c     |    9 +++++++++
> > arch/powerpc/kvm/e500_mmu.c         |    5 +++++
> > 4 files changed, 21 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/kvm_host.h
> b/arch/powerpc/include/asm/kvm_host.h
> > index 88fcfe6..f480b20 100644
> > --- a/arch/powerpc/include/asm/kvm_host.h
> > +++ b/arch/powerpc/include/asm/kvm_host.h
> > @@ -503,6 +503,7 @@ struct kvm_vcpu_arch {
> > 	u32 tlbcfg[4];
> > 	u32 tlbps[4];
> > 	u32 mmucfg;
> > +	u32 eptcfg;
>=20
> This too needs to be settable through SW_TLB.
>=20
> > 	u32 epr;
> > 	struct kvmppc_booke_debug_reg dbg_reg;
> > #endif
> > diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> > index b9f76d8..983eb95 100644
> > --- a/arch/powerpc/kvm/e500.h
> > +++ b/arch/powerpc/kvm/e500.h
> > @@ -308,4 +308,10 @@ static inline unsigned int has_mmu_v2(const struct
> kvm_vcpu *vcpu)
> > 	return ((vcpu->arch.mmucfg & MMUCFG_MAVN) =3D=3D MMUCFG_MAVN_V2);
> > }
> >
> > +static inline unsigned int supports_page_tables(const struct kvm_vcpu
> *vcpu)
>=20
> bool again. Can we generalize this a bit more? How about a small
> framework that allows us to differentiate across e.XX features?=20

I thought you will ask for it :)

-Mike

^ permalink raw reply

* Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register initialization earlier
From: Alexander Graf @ 2013-01-31 14:58 UTC (permalink / raw)
  To: Caraman Mihai Claudiu-B02008
  Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org
In-Reply-To: <300B73AA675FCE4A93EB4FC1D42459FF2F7733@039-SN2MPN1-012.039d.mgd.msft.net>


On 31.01.2013, at 15:56, Caraman Mihai Claudiu-B02008 wrote:

>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@suse.de]
>> Sent: Thursday, January 31, 2013 3:21 PM
>> To: Caraman Mihai Claudiu-B02008
>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
>> dev@lists.ozlabs.org
>> Subject: Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register
>> initialization earlier
>>=20
>>=20
>> On 30.01.2013, at 14:29, Mihai Caraman wrote:
>>=20
>>> VCPU's MMUCFG register initialization should not depend on
>> KVM_CAP_SW_TLB
>>> ioctl call. Move it earlier into tlb initalization phase.
>>=20
>> Quite the contrary. The fact that there is an mfspr() in e500_mmu.c
>> already tells us that the code is broken. The TLB guest code should =
only
>> depend on input from the SW_TLB configuration. It's completely =
orthogonal
>> to the host capabilities.
>=20
> Then we have the same issue for TLBnCFG registers which need to be =
configured
> via SW_TLB ioctl. What is the purpose of guest tlb initalization in =
e500_mmu.c
> if we rely on SW_TLB?

It's to provide a fallback to user space that doesn't implement SW_TLB =
configuration yet.


Alex

^ permalink raw reply

* RE: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register initialization earlier
From: Caraman Mihai Claudiu-B02008 @ 2013-01-31 14:56 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org
In-Reply-To: <FE5499BE-EC13-47EA-806C-345F7135B677@suse.de>

> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Thursday, January 31, 2013 3:21 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> dev@lists.ozlabs.org
> Subject: Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register
> initialization earlier
>=20
>=20
> On 30.01.2013, at 14:29, Mihai Caraman wrote:
>=20
> > VCPU's MMUCFG register initialization should not depend on
> KVM_CAP_SW_TLB
> > ioctl call. Move it earlier into tlb initalization phase.
>=20
> Quite the contrary. The fact that there is an mfspr() in e500_mmu.c
> already tells us that the code is broken. The TLB guest code should only
> depend on input from the SW_TLB configuration. It's completely orthogonal
> to the host capabilities.

Then we have the same issue for TLBnCFG registers which need to be configur=
ed
via SW_TLB ioctl. What is the purpose of guest tlb initalization in e500_mm=
u.c
if we rely on SW_TLB?

-Mike

^ permalink raw reply

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
From: Toshi Kani @ 2013-01-31 14:42 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-s390@vger.kernel.org, jiang.liu@huawei.com,
	wency@cn.fujitsu.com, linux-mm@kvack.org, yinghai@kernel.org,
	linux-kernel@vger.kernel.org, Rafael J. Wysocki,
	linux-acpi@vger.kernel.org, isimatu.yasuaki@jp.fujitsu.com,
	srivatsa.bhat@linux.vnet.ibm.com, guohanjun@huawei.com,
	bhelgaas@google.com, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org, lenb@kernel.org
In-Reply-To: <20130131052448.GC3228@kroah.com>

On Thu, 2013-01-31 at 05:24 +0000, Greg KH wrote:
> On Wed, Jan 30, 2013 at 06:15:12PM -0700, Toshi Kani wrote:
> > > Please make it a "real" pointer, and not a void *, those shouldn't be
> > > used at all if possible.
> > 
> > How about changing the "void *handle" to acpi_dev_node below?   
> > 
> >    struct acpi_dev_node    acpi_node;
> > 
> > Basically, it has the same challenge as struct device, which uses
> > acpi_dev_node as well.  We can add other FW node when needed (just like
> > device also has *of_node).
> 
> That sounds good to me.

Great!  Thanks Greg,
-Toshi

^ permalink raw reply

* Re: [PATCH 2/5] KVM: PPC: e500: Emulate TLBnPS registers
From: Alexander Graf @ 2013-01-31 13:32 UTC (permalink / raw)
  To: Mihai Caraman; +Cc: linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <F796A11A-01C0-4B4E-B4FE-8E88E40DB2CF@suse.de>


On 31.01.2013, at 14:24, Alexander Graf wrote:

>=20
> On 30.01.2013, at 14:29, Mihai Caraman wrote:
>=20
>> Emulate TLBnPS registers which are available in MMU Architecture =
Version
>> (MAV) 2.0.
>>=20
>> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
>> ---
>> arch/powerpc/include/asm/kvm_host.h |    1 +
>> arch/powerpc/kvm/e500.h             |    5 +++++
>> arch/powerpc/kvm/e500_emulate.c     |   10 ++++++++++
>> arch/powerpc/kvm/e500_mmu.c         |    5 +++++
>> 4 files changed, 21 insertions(+), 0 deletions(-)
>>=20
>> diff --git a/arch/powerpc/include/asm/kvm_host.h =
b/arch/powerpc/include/asm/kvm_host.h
>> index 8a72d59..88fcfe6 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -501,6 +501,7 @@ struct kvm_vcpu_arch {
>> 	spinlock_t wdt_lock;
>> 	struct timer_list wdt_timer;
>> 	u32 tlbcfg[4];
>> +	u32 tlbps[4];
>> 	u32 mmucfg;
>> 	u32 epr;
>> 	struct kvmppc_booke_debug_reg dbg_reg;
>> diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
>> index 41cefd4..b9f76d8 100644
>> --- a/arch/powerpc/kvm/e500.h
>> +++ b/arch/powerpc/kvm/e500.h
>> @@ -303,4 +303,9 @@ static inline unsigned int get_tlbmiss_tid(struct =
kvm_vcpu *vcpu)
>> #define get_tlb_sts(gtlbe)              (MAS1_TS)
>> #endif /* !BOOKE_HV */
>>=20
>> +static inline unsigned int has_mmu_v2(const struct kvm_vcpu *vcpu)
>=20
> bool. Also rename it to "is_..." then.

In light of the comment I did in a later patch, this too could be =
convert to feature flags.


Alex

^ permalink raw reply

* Re: [PATCH 5/5] KVM: PPC: e500mc: Enable e6500 cores
From: Alexander Graf @ 2013-01-31 13:31 UTC (permalink / raw)
  To: Mihai Caraman; +Cc: linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1359552584-17861-6-git-send-email-mihai.caraman@freescale.com>


On 30.01.2013, at 14:29, Mihai Caraman wrote:

> Extend processor compatibility names to e6500 cores.
> 
> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>

Looks good to me.

Reviewed-by: Alexander Graf <agraf@suse.de>


Alex

> ---
> arch/powerpc/kvm/e500mc.c |    2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
> index 1f89d26..6c87299 100644
> --- a/arch/powerpc/kvm/e500mc.c
> +++ b/arch/powerpc/kvm/e500mc.c
> @@ -172,6 +172,8 @@ int kvmppc_core_check_processor_compat(void)
> 		r = 0;
> 	else if (strcmp(cur_cpu_spec->cpu_name, "e5500") == 0)
> 		r = 0;
> +	else if (strcmp(cur_cpu_spec->cpu_name, "e6500") == 0)
> +		r = 0;
> 	else
> 		r = -ENOTSUPP;
> 
> -- 
> 1.7.4.1
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 4/5] KVM: PPC: e500: Emulate EPTCFG register
From: Alexander Graf @ 2013-01-31 13:31 UTC (permalink / raw)
  To: Mihai Caraman; +Cc: linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1359552584-17861-5-git-send-email-mihai.caraman@freescale.com>


On 30.01.2013, at 14:29, Mihai Caraman wrote:

> EPTCFG register defined by E.PT is accessed unconditionally by Linux =
guests
> in the presence of MAV 2.0. Emulate EPTCFG register now.
>=20
> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
> ---
> arch/powerpc/include/asm/kvm_host.h |    1 +
> arch/powerpc/kvm/e500.h             |    6 ++++++
> arch/powerpc/kvm/e500_emulate.c     |    9 +++++++++
> arch/powerpc/kvm/e500_mmu.c         |    5 +++++
> 4 files changed, 21 insertions(+), 0 deletions(-)
>=20
> diff --git a/arch/powerpc/include/asm/kvm_host.h =
b/arch/powerpc/include/asm/kvm_host.h
> index 88fcfe6..f480b20 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -503,6 +503,7 @@ struct kvm_vcpu_arch {
> 	u32 tlbcfg[4];
> 	u32 tlbps[4];
> 	u32 mmucfg;
> +	u32 eptcfg;

This too needs to be settable through SW_TLB.

> 	u32 epr;
> 	struct kvmppc_booke_debug_reg dbg_reg;
> #endif
> diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> index b9f76d8..983eb95 100644
> --- a/arch/powerpc/kvm/e500.h
> +++ b/arch/powerpc/kvm/e500.h
> @@ -308,4 +308,10 @@ static inline unsigned int has_mmu_v2(const =
struct kvm_vcpu *vcpu)
> 	return ((vcpu->arch.mmucfg & MMUCFG_MAVN) =3D=3D =
MMUCFG_MAVN_V2);
> }
>=20
> +static inline unsigned int supports_page_tables(const struct kvm_vcpu =
*vcpu)

bool again. Can we generalize this a bit more? How about a small =
framework that allows us to differentiate across e.XX features?

if (has_feature(vcpu, FEATURE_E_PT))
   ...


> +{
> +	return ((vcpu->arch.tlbcfg[0] & TLBnCFG_IND)
> +		|| (vcpu->arch.tlbcfg[1] & TLBnCFG_IND));
> +}
> +
> #endif /* KVM_E500_H */
> diff --git a/arch/powerpc/kvm/e500_emulate.c =
b/arch/powerpc/kvm/e500_emulate.c
> index 5515dc5..493e231 100644
> --- a/arch/powerpc/kvm/e500_emulate.c
> +++ b/arch/powerpc/kvm/e500_emulate.c
> @@ -339,6 +339,15 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu =
*vcpu, int sprn, ulong *spr_val)
> 			return EMULATE_FAIL;
> 		*spr_val =3D vcpu->arch.tlbps[1];
> 		break;
> +	case SPRN_EPTCFG:
> +		if (!has_mmu_v2(vcpu))
> +			return EMULATE_FAIL;
> +		/*
> +		 * Legacy Linux guests access EPTCFG register even if =
the E.PT
> +		 * category is disabled in the VM. Give them a chance to =
live.
> +		 */
> +		*spr_val =3D vcpu->arch.eptcfg;
> +		break;
> 	default:
> 		emulated =3D kvmppc_booke_emulate_mfspr(vcpu, sprn, =
spr_val);
> 	}
> diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
> index 9a1f7b7..199c11e 100644
> --- a/arch/powerpc/kvm/e500_mmu.c
> +++ b/arch/powerpc/kvm/e500_mmu.c
> @@ -799,6 +799,11 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 =
*vcpu_e500)
> 	if (has_mmu_v2(vcpu)) {
> 		vcpu->arch.tlbps[0] =3D mfspr(SPRN_TLB0PS);
> 		vcpu->arch.tlbps[1] =3D mfspr(SPRN_TLB1PS);
> +
> +		if (supports_page_tables(vcpu))
> +			vcpu->arch.eptcfg =3D mfspr(SPRN_EPTCFG);

Please don't introduce new mfspr()s here :). Just have user space set =
it.


Alex

> +		else
> +			vcpu->arch.eptcfg =3D 0;
> 	}
>=20
> 	kvmppc_recalc_tlb1map_range(vcpu_e500);
> --=20
> 1.7.4.1
>=20
>=20
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 3/5] KVM: PPC: e500: Remove E.PT category from VCPUs
From: Alexander Graf @ 2013-01-31 13:27 UTC (permalink / raw)
  To: Mihai Caraman; +Cc: linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1359552584-17861-4-git-send-email-mihai.caraman@freescale.com>


On 30.01.2013, at 14:29, Mihai Caraman wrote:

> Embedded.Page Table (E.PT) category in VMs requires indirect tlb =
entries
> emulation which is not supported yet. Configure TLBnCFG to remove E.PT
> category from VCPUs.
>=20
> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>

Please do this in a separate function that you call from these =
locations. That way the code is self-documenting on what it actually =
does.

Also add a comment to this one function that removes E.PT related bits =
from TLBCFG that our _guest_ mmu emulation currently doesn't handle =
E.PT.


Alex

> ---
> arch/powerpc/kvm/e500_mmu.c |   10 ++++++----
> 1 files changed, 6 insertions(+), 4 deletions(-)
>=20
> diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
> index 129299a..9a1f7b7 100644
> --- a/arch/powerpc/kvm/e500_mmu.c
> +++ b/arch/powerpc/kvm/e500_mmu.c
> @@ -692,12 +692,14 @@ int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu =
*vcpu,
> 	vcpu_e500->gtlb_offset[0] =3D 0;
> 	vcpu_e500->gtlb_offset[1] =3D params.tlb_sizes[0];
>=20
> -	vcpu->arch.tlbcfg[0] &=3D ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> +	vcpu->arch.tlbcfg[0] &=3D
> +			      ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC | =
TLBnCFG_IND);
> 	if (params.tlb_sizes[0] <=3D 2048)
> 		vcpu->arch.tlbcfg[0] |=3D params.tlb_sizes[0];
> 	vcpu->arch.tlbcfg[0] |=3D params.tlb_ways[0] << =
TLBnCFG_ASSOC_SHIFT;
>=20
> -	vcpu->arch.tlbcfg[1] &=3D ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> +	vcpu->arch.tlbcfg[1] &=3D
> +			      ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC | =
TLBnCFG_IND);
> 	vcpu->arch.tlbcfg[1] |=3D params.tlb_sizes[1];
> 	vcpu->arch.tlbcfg[1] |=3D params.tlb_ways[1] << =
TLBnCFG_ASSOC_SHIFT;
>=20
> @@ -783,13 +785,13 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 =
*vcpu_e500)
>=20
> 	/* Init TLB configuration register */
> 	vcpu->arch.tlbcfg[0] =3D mfspr(SPRN_TLB0CFG) &
> -			     ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> +			     ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC | =
TLBnCFG_IND);
> 	vcpu->arch.tlbcfg[0] |=3D vcpu_e500->gtlb_params[0].entries;
> 	vcpu->arch.tlbcfg[0] |=3D
> 		vcpu_e500->gtlb_params[0].ways << TLBnCFG_ASSOC_SHIFT;
>=20
> 	vcpu->arch.tlbcfg[1] =3D mfspr(SPRN_TLB1CFG) &
> -			     ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> +			     ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC | =
TLBnCFG_IND);
> 	vcpu->arch.tlbcfg[1] |=3D vcpu_e500->gtlb_params[1].entries;
> 	vcpu->arch.tlbcfg[1] |=3D
> 		vcpu_e500->gtlb_params[1].ways << TLBnCFG_ASSOC_SHIFT;
> --=20
> 1.7.4.1
>=20
>=20
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/5] KVM: PPC: e500: Emulate TLBnPS registers
From: Alexander Graf @ 2013-01-31 13:24 UTC (permalink / raw)
  To: Mihai Caraman; +Cc: linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1359552584-17861-3-git-send-email-mihai.caraman@freescale.com>


On 30.01.2013, at 14:29, Mihai Caraman wrote:

> Emulate TLBnPS registers which are available in MMU Architecture =
Version
> (MAV) 2.0.
>=20
> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
> ---
> arch/powerpc/include/asm/kvm_host.h |    1 +
> arch/powerpc/kvm/e500.h             |    5 +++++
> arch/powerpc/kvm/e500_emulate.c     |   10 ++++++++++
> arch/powerpc/kvm/e500_mmu.c         |    5 +++++
> 4 files changed, 21 insertions(+), 0 deletions(-)
>=20
> diff --git a/arch/powerpc/include/asm/kvm_host.h =
b/arch/powerpc/include/asm/kvm_host.h
> index 8a72d59..88fcfe6 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -501,6 +501,7 @@ struct kvm_vcpu_arch {
> 	spinlock_t wdt_lock;
> 	struct timer_list wdt_timer;
> 	u32 tlbcfg[4];
> +	u32 tlbps[4];
> 	u32 mmucfg;
> 	u32 epr;
> 	struct kvmppc_booke_debug_reg dbg_reg;
> diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> index 41cefd4..b9f76d8 100644
> --- a/arch/powerpc/kvm/e500.h
> +++ b/arch/powerpc/kvm/e500.h
> @@ -303,4 +303,9 @@ static inline unsigned int get_tlbmiss_tid(struct =
kvm_vcpu *vcpu)
> #define get_tlb_sts(gtlbe)              (MAS1_TS)
> #endif /* !BOOKE_HV */
>=20
> +static inline unsigned int has_mmu_v2(const struct kvm_vcpu *vcpu)

bool. Also rename it to "is_..." then.

> +{
> +	return ((vcpu->arch.mmucfg & MMUCFG_MAVN) =3D=3D =
MMUCFG_MAVN_V2);
> +}
> +
> #endif /* KVM_E500_H */
> diff --git a/arch/powerpc/kvm/e500_emulate.c =
b/arch/powerpc/kvm/e500_emulate.c
> index e78f353..5515dc5 100644
> --- a/arch/powerpc/kvm/e500_emulate.c
> +++ b/arch/powerpc/kvm/e500_emulate.c
> @@ -329,6 +329,16 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu =
*vcpu, int sprn, ulong *spr_val)
> 		*spr_val =3D vcpu->arch.ivor[BOOKE_IRQPRIO_DBELL_CRIT];
> 		break;
> #endif
> +	case SPRN_TLB0PS:
> +		if (!has_mmu_v2(vcpu))
> +			return EMULATE_FAIL;
> +		*spr_val =3D vcpu->arch.tlbps[0];
> +		break;
> +	case SPRN_TLB1PS:
> +		if (!has_mmu_v2(vcpu))
> +			return EMULATE_FAIL;
> +		*spr_val =3D vcpu->arch.tlbps[1];
> +		break;
> 	default:
> 		emulated =3D kvmppc_booke_emulate_mfspr(vcpu, sprn, =
spr_val);
> 	}
> diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
> index bb1b2b0..129299a 100644
> --- a/arch/powerpc/kvm/e500_mmu.c
> +++ b/arch/powerpc/kvm/e500_mmu.c
> @@ -794,6 +794,11 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 =
*vcpu_e500)
> 	vcpu->arch.tlbcfg[1] |=3D
> 		vcpu_e500->gtlb_params[1].ways << TLBnCFG_ASSOC_SHIFT;
>=20
> +	if (has_mmu_v2(vcpu)) {
> +		vcpu->arch.tlbps[0] =3D mfspr(SPRN_TLB0PS);
> +		vcpu->arch.tlbps[1] =3D mfspr(SPRN_TLB1PS);

So I suppose that means that user space doesn't tell us the possible TLB =
entry sizes through the SW_TLB config? Then we should add them there.

To not break untested code paths, we can still compare if the values =
user space asks for are identical to what physical hardware does. But =
eventually we shouldn't care.


Alex

^ permalink raw reply

* Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register initialization earlier
From: Alexander Graf @ 2013-01-31 13:21 UTC (permalink / raw)
  To: Mihai Caraman; +Cc: linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1359552584-17861-2-git-send-email-mihai.caraman@freescale.com>


On 30.01.2013, at 14:29, Mihai Caraman wrote:

> VCPU's MMUCFG register initialization should not depend on =
KVM_CAP_SW_TLB
> ioctl call. Move it earlier into tlb initalization phase.

Quite the contrary. The fact that there is an mfspr() in e500_mmu.c =
already tells us that the code is broken. The TLB guest code should only =
depend on input from the SW_TLB configuration. It's completely =
orthogonal to the host capabilities.


Alex

>=20
> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
> ---
> arch/powerpc/kvm/e500_mmu.c |    4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>=20
> diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
> index 5c44759..bb1b2b0 100644
> --- a/arch/powerpc/kvm/e500_mmu.c
> +++ b/arch/powerpc/kvm/e500_mmu.c
> @@ -692,8 +692,6 @@ int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu =
*vcpu,
> 	vcpu_e500->gtlb_offset[0] =3D 0;
> 	vcpu_e500->gtlb_offset[1] =3D params.tlb_sizes[0];
>=20
> -	vcpu->arch.mmucfg =3D mfspr(SPRN_MMUCFG) & ~MMUCFG_LPIDSIZE;
> -
> 	vcpu->arch.tlbcfg[0] &=3D ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> 	if (params.tlb_sizes[0] <=3D 2048)
> 		vcpu->arch.tlbcfg[0] |=3D params.tlb_sizes[0];
> @@ -781,6 +779,8 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 =
*vcpu_e500)
> 	if (!vcpu_e500->g2h_tlb1_map)
> 		goto err;
>=20
> +	vcpu->arch.mmucfg =3D mfspr(SPRN_MMUCFG) & ~MMUCFG_LPIDSIZE;
> +
> 	/* Init TLB configuration register */
> 	vcpu->arch.tlbcfg[0] =3D mfspr(SPRN_TLB0CFG) &
> 			     ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> --=20
> 1.7.4.1
>=20
>=20
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Simon Jeons @ 2013-01-31 10:38 UTC (permalink / raw)
  To: Tang Chen
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <510A3CE6.202@cn.fujitsu.com>

Hi Tang,
On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
> Hi Simon,
> 
> On 01/31/2013 04:48 PM, Simon Jeons wrote:
> > Hi Tang,
> > On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
> >
> > 1. IIUC, there is a button on machine which supports hot-remove memory,
> > then what's the difference between press button and echo to /sys?
> 
> No important difference, I think. Since I don't have the machine you are
> saying, I cannot surely answer you. :)
> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
> is just another entrance. At last, they will run into the same code.
> 
> > 2. Since kernel memory is linear mapping(I mean direct mapping part),
> > why can't put kernel direct mapping memory into one memory device, and
> > other memory into the other devices?
> 
> We cannot do that because in that way, we will lose NUMA performance.
> 
> If you know NUMA, you will understand the following example:
> 
> node0:                    node1:
>     cpu0~cpu15                cpu16~cpu31
>     memory0~memory511         memory512~memory1023
> 
> cpu16~cpu31 access memory16~memory1023 much faster than memory0~memory511.
> If we set direct mapping area in node0, and movable area in node1, then
> the kernel code running on cpu16~cpu31 will have to access 
> memory0~memory511.
> This is a terrible performance down.

So if config NUMA, kernel memory will not be linear mapping anymore? For
example, 

Node 0  Node 1 

0 ~ 10G 11G~14G

kernel memory only at Node 0? Can part of kernel memory also at Node 1?

How big is kernel direct mapping memory in x86_64? Is there max limit?
It seems that only around 896MB on x86_32. 

> 
> >As you know x86_64 don't need
> > highmem, IIUC, all kernel memory will linear mapping in this case. Is my
> > idea available? If is correct, x86_32 can't implement in the same way
> > since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
> > hard to focus kernel memory on single memory device.
> 
> Sorry, I'm not quite familiar with x86_32 box.
> 
> > 3. In current implementation, if memory hotplug just need memory
> > subsystem and ACPI codes support? Or also needs firmware take part in?
> > Hope you can explain in details, thanks in advance. :)
> 
> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
> based memory migration mentioned by Liu Jiang.

Is there any material about firmware based memory migration?

> 
> So far, I only know this. :)
> 
> > 4. What's the status of memory hotplug? Apart from can't remove kernel
> > memory, other things are fully implementation?
> 
> I think the main job is done for now. And there are still bugs to fix.
> And this functionality is not stable.
> 
> Thanks. :)

^ permalink raw reply

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Tang Chen @ 2013-01-31  9:44 UTC (permalink / raw)
  To: Simon Jeons
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <1359622123.1391.19.camel@kernel>

Hi Simon,

On 01/31/2013 04:48 PM, Simon Jeons wrote:
> Hi Tang,
> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
>
> 1. IIUC, there is a button on machine which supports hot-remove memory,
> then what's the difference between press button and echo to /sys?

No important difference, I think. Since I don't have the machine you are
saying, I cannot surely answer you. :)
AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
is just another entrance. At last, they will run into the same code.

> 2. Since kernel memory is linear mapping(I mean direct mapping part),
> why can't put kernel direct mapping memory into one memory device, and
> other memory into the other devices?

We cannot do that because in that way, we will lose NUMA performance.

If you know NUMA, you will understand the following example:

node0:                    node1:
    cpu0~cpu15                cpu16~cpu31
    memory0~memory511         memory512~memory1023

cpu16~cpu31 access memory16~memory1023 much faster than memory0~memory511.
If we set direct mapping area in node0, and movable area in node1, then
the kernel code running on cpu16~cpu31 will have to access 
memory0~memory511.
This is a terrible performance down.

>As you know x86_64 don't need
> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
> idea available? If is correct, x86_32 can't implement in the same way
> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
> hard to focus kernel memory on single memory device.

Sorry, I'm not quite familiar with x86_32 box.

> 3. In current implementation, if memory hotplug just need memory
> subsystem and ACPI codes support? Or also needs firmware take part in?
> Hope you can explain in details, thanks in advance. :)

We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
based memory migration mentioned by Liu Jiang.

So far, I only know this. :)

> 4. What's the status of memory hotplug? Apart from can't remove kernel
> memory, other things are fully implementation?

I think the main job is done for now. And there are still bugs to fix.
And this functionality is not stable.

Thanks. :)

^ permalink raw reply

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Simon Jeons @ 2013-01-31  8:48 UTC (permalink / raw)
  To: Tang Chen
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <510A18FA.2010107@cn.fujitsu.com>

Hi Tang,
On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:

1. IIUC, there is a button on machine which supports hot-remove memory,
then what's the difference between press button and echo to /sys?
2. Since kernel memory is linear mapping(I mean direct mapping part),
why can't put kernel direct mapping memory into one memory device, and
other memory into the other devices? As you know x86_64 don't need
highmem, IIUC, all kernel memory will linear mapping in this case. Is my
idea available? If is correct, x86_32 can't implement in the same way
since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
hard to focus kernel memory on single memory device.
3. In current implementation, if memory hotplug just need memory
subsystem and ACPI codes support? Or also needs firmware take part in?
Hope you can explain in details, thanks in advance. :)
4. What's the status of memory hotplug? Apart from can't remove kernel
memory, other things are fully implementation?  


> On 01/31/2013 02:19 PM, Simon Jeons wrote:
> > Hi Tang,
> > On Thu, 2013-01-31 at 11:31 +0800, Tang Chen wrote:
> >> Hi Simon,
> >>
> >> Please see below. :)
> >>
> >> On 01/31/2013 09:22 AM, Simon Jeons wrote:
> >>>
> >>> Sorry, I still confuse. :(
> >>> update node_states[N_NORMAL_MEMORY] to node_states[N_MEMORY] or
> >>> node_states[N_NORMAL_MEMOR] present 0...ZONE_MOVABLE?
> >>>
> >>> node_states is what? node_states[N_NORMAL_MEMOR] or
> >>> node_states[N_MEMORY]?
> >>
> >> Are you asking what node_states[] is ?
> >>
> >> node_states[] is an array of nodemask,
> >>
> >>       extern nodemask_t node_states[NR_NODE_STATES];
> >>
> >> For example, node_states[N_NORMAL_MEMOR] represents which nodes have
> >> normal memory.
> >> If N_MEMORY == N_HIGH_MEMORY == N_NORMAL_MEMORY, node_states[N_MEMORY] is
> >> node_states[N_NORMAL_MEMOR]. So it represents which nodes have 0 ...
> >> ZONE_MOVABLE.
> >>
> >
> > Sorry, how can nodes_state[N_NORMAL_MEMORY] represents a node have 0 ...
> > *ZONE_MOVABLE*, the comment of enum nodes_states said that
> > N_NORMAL_MEMORY just means the node has regular memory.
> >
> 
> Hi Simon,
> 
> Let's say it in this way.
> 
> If we don't have CONFIG_HIGHMEM, N_HIGH_MEMORY == N_NORMAL_MEMORY. We 
> don't have a separate
> macro to represent highmem because we don't have highmem.
> This is easy to understand, right ?
> 
> Now, think it just like above:
> If we don't have CONFIG_MOVABLE_NODE, N_MEMORY == N_HIGH_MEMORY == 
> N_NORMAL_MEMORY.
> This means we don't allow a node to have only movable memory, not we 
> don't have movable memory.
> A node could have normal memory and movable memory. So 
> nodes_state[N_NORMAL_MEMORY] represents
> a node have 0 ... *ZONE_MOVABLE*.
> 
> I think the point is: CONFIG_MOVABLE_NODE means we allow a node to have 
> only movable memory.
> So without CONFIG_MOVABLE_NODE, it doesn't mean a node cannot have 
> movable memory. It means
> the node cannot have only movable memory. It can have normal memory and 
> movable memory.
> 
> 1) With CONFIG_MOVABLE_NODE:
>     N_NORMAL_MEMORY: nodes who have normal memory.
>                      normal memory only
>                      normal and highmem
>                      normal and highmem and movablemem
>                      normal and movablemem
>     N_MEMORY: nodes who has memory (any memory)
>                      normal memory only
>                      normal and highmem
>                      normal and highmem and movablemem
>                      normal and movablemem ---------------- We can have 
> movablemem.
>                      highmem only -------------------------
>                      highmem and movablemem ---------------
>                      movablemem only ---------------------- We can have 
> movablemem only.    ***
> 
> 2) With out CONFIG_MOVABLE_NODE:
>     N_MEMORY == N_NORMAL_MEMORY: (Here, I omit N_HIGH_MEMORY)
>                      normal memory only
>                      normal and highmem
>                      normal and highmem and movablemem
>                      normal and movablemem ---------------- We can have 
> movablemem.
>                      No movablemem only ------------------- We cannot 
> have movablemem only. ***
> 
> The semantics is not that clear here. So we can only try to understand 
> it from the code where
> we use N_MEMORY. :)
> 
> That is my understanding of this.
> 
> Thanks. :)
> 
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [GIT PULL 00/21] perf/core improvements and fixes
From: Ingo Molnar @ 2013-01-31  9:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Frederic Weisbecker, Stephane Eranian,
	arnaldo.melo, linuxppc-dev, Paul Mackerras, Jiri Olsa,
	Andrea Arcangeli, Andi Kleen, Hugh Dickins, Mel Gorman,
	Michael Ellerman, Borislav Petkov, Thomas Jarosch, Rik van Riel,
	Corey Ashford, Namhyung Kim, Anton Blanchard, Steven Rostedt,
	Arnaldo Carvalho de Melo, Sukadev Bhattiprolu, Peter Hurley,
	Mike Galbraith, linux-kernel, David Ahern, Andrew Morton
In-Reply-To: <1359557222-17547-1-git-send-email-acme@infradead.org>


* Arnaldo Carvalho de Melo <acme@infradead.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling.
> 
> 	Namhyung, Jiri, the 'group report' patches are at acme/perf/group,
> will send a pull req later if it survives further testing.
> 
> - Arnaldo
> 
> The following changes since commit a2d28d0c198b65fac28ea6212f5f8edc77b29c27:
> 
>   Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2013-01-25 11:34:00 +0100)
> 
> are available in the git repository at:
> 
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux tags/perf-core-for-mingo
> 
> for you to fetch changes up to 5809fde040de2afa477a6c593ce2e8fd2c11d9d3:
> 
>   perf header: Fix double fclose() on do_write(fd, xxx) failure (2013-01-30 10:40:44 -0300)
> 
> ----------------------------------------------------------------
> perf/core improvements and fixes:
> 
> . Fix some leaks in exit paths.
> 
> . Use memdup where applicable
> 
> . Remove some die() calls, allowing callers to handle exit paths
>   gracefully.
> 
> . Correct typo in tools Makefile, fix from Borislav Petkov.
> 
> . Add 'perf bench numa mem' NUMA performance measurement suite, from Ingo Molnar.
> 
> . Handle dynamic array's element size properly, fix from Jiri Olsa.
> 
> . Fix memory leaks on evsel->counts, from Namhyung Kim.
> 
> . Make numa benchmark optional, allowing the build in machines where required
>   numa libraries are not present, fix from Peter Hurley.
> 
> . Add interval printing in 'perf stat', from Stephane Eranian.
> 
> . Fix compile warnings in tests/attr.c, from Sukadev Bhattiprolu.
> 
> . Fix double free, pclose instead of fclose, leaks and double fclose errors
>   found with the cppcheck tool, from Thomas Jarosch.
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Arnaldo Carvalho de Melo (8):
>       perf tools: Stop using 'self' in strlist
>       perf tools: Stop using 'self' in map.[ch]
>       perf tools: Use memdup in map__clone
>       perf kmem: Use memdup()
>       perf header: Stop using die() calls when processing tracing data
>       perf ui browser: Free browser->helpline() on ui_browser__hide()
>       perf tests: Call machine__exit in the vmlinux matches kallsyms test
>       perf tests: Fix leaks on PERF_RECORD_* test
> 
> Borislav Petkov (1):
>       tools: Correct typo in tools Makefile
> 
> Ingo Molnar (1):
>       perf: Add 'perf bench numa mem' NUMA performance measurement suite
> 
> Jiri Olsa (1):
>       tools lib traceevent: Handle dynamic array's element size properly
> 
> Namhyung Kim (1):
>       perf evsel: Fix memory leaks on evsel->counts
> 
> Peter Hurley (1):
>       perf tools: Make numa benchmark optional
> 
> Stephane Eranian (2):
>       perf evsel: Add prev_raw_count field
>       perf stat: Add interval printing
> 
> Sukadev Bhattiprolu (1):
>       perf tools, powerpc: Fix compile warnings in tests/attr.c
> 
> Thomas Jarosch (5):
>       perf tools: Fix possible double free on error
>       perf sort: Use pclose() instead of fclose() on pipe stream
>       perf tools: Fix memory leak on error
>       perf header: Fix memory leak for the "Not caching a kptr_restrict'ed /proc/kallsyms" case
>       perf header: Fix double fclose() on do_write(fd, xxx) failure
> 
>  tools/Makefile                           |    2 +-
>  tools/lib/traceevent/event-parse.c       |   39 +-
>  tools/perf/Documentation/perf-stat.txt   |    4 +
>  tools/perf/Makefile                      |   13 +
>  tools/perf/arch/common.c                 |    1 +
>  tools/perf/bench/bench.h                 |    1 +
>  tools/perf/bench/numa.c                  | 1731 ++++++++++++++++++++++++++++++
>  tools/perf/builtin-bench.c               |   17 +
>  tools/perf/builtin-kmem.c                |    6 +-
>  tools/perf/builtin-stat.c                |  158 ++-
>  tools/perf/config/feature-tests.mak      |   11 +
>  tools/perf/tests/attr.c                  |    5 +
>  tools/perf/tests/open-syscall-all-cpus.c |    1 +
>  tools/perf/tests/perf-record.c           |   12 +-
>  tools/perf/tests/vmlinux-kallsyms.c      |    4 +-
>  tools/perf/ui/browser.c                  |    2 +
>  tools/perf/util/event.c                  |    4 +-
>  tools/perf/util/evsel.c                  |   31 +
>  tools/perf/util/evsel.h                  |    2 +
>  tools/perf/util/header.c                 |   25 +-
>  tools/perf/util/map.c                    |  118 +-
>  tools/perf/util/map.h                    |   24 +-
>  tools/perf/util/sort.c                   |    7 +-
>  tools/perf/util/strlist.c                |   54 +-
>  tools/perf/util/strlist.h                |   42 +-
>  25 files changed, 2154 insertions(+), 160 deletions(-)
>  create mode 100644 tools/perf/bench/numa.c

Pulled, thanks a lot Arnaldo!

	Ingo

^ permalink raw reply

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Simon Jeons @ 2013-01-31  8:17 UTC (permalink / raw)
  To: Tang Chen
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <510A18FA.2010107@cn.fujitsu.com>

Hi Tang,
On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
> On 01/31/2013 02:19 PM, Simon Jeons wrote:
> > Hi Tang,
> > On Thu, 2013-01-31 at 11:31 +0800, Tang Chen wrote:
> >> Hi Simon,
> >>
> >> Please see below. :)
> >>
> >> On 01/31/2013 09:22 AM, Simon Jeons wrote:
> >>>
> >>> Sorry, I still confuse. :(
> >>> update node_states[N_NORMAL_MEMORY] to node_states[N_MEMORY] or
> >>> node_states[N_NORMAL_MEMOR] present 0...ZONE_MOVABLE?
> >>>
> >>> node_states is what? node_states[N_NORMAL_MEMOR] or
> >>> node_states[N_MEMORY]?
> >>
> >> Are you asking what node_states[] is ?
> >>
> >> node_states[] is an array of nodemask,
> >>
> >>       extern nodemask_t node_states[NR_NODE_STATES];
> >>
> >> For example, node_states[N_NORMAL_MEMOR] represents which nodes have
> >> normal memory.
> >> If N_MEMORY == N_HIGH_MEMORY == N_NORMAL_MEMORY, node_states[N_MEMORY] is
> >> node_states[N_NORMAL_MEMOR]. So it represents which nodes have 0 ...
> >> ZONE_MOVABLE.
> >>
> >
> > Sorry, how can nodes_state[N_NORMAL_MEMORY] represents a node have 0 ...
> > *ZONE_MOVABLE*, the comment of enum nodes_states said that
> > N_NORMAL_MEMORY just means the node has regular memory.
> >
> 
> Hi Simon,
> 
> Let's say it in this way.
> 
> If we don't have CONFIG_HIGHMEM, N_HIGH_MEMORY == N_NORMAL_MEMORY. We 
> don't have a separate
> macro to represent highmem because we don't have highmem.
> This is easy to understand, right ?
> 
> Now, think it just like above:
> If we don't have CONFIG_MOVABLE_NODE, N_MEMORY == N_HIGH_MEMORY == 
> N_NORMAL_MEMORY.
> This means we don't allow a node to have only movable memory, not we 
> don't have movable memory.
> A node could have normal memory and movable memory. So 
> nodes_state[N_NORMAL_MEMORY] represents
> a node have 0 ... *ZONE_MOVABLE*.
> 
> I think the point is: CONFIG_MOVABLE_NODE means we allow a node to have 
> only movable memory.
> So without CONFIG_MOVABLE_NODE, it doesn't mean a node cannot have 
> movable memory. It means
> the node cannot have only movable memory. It can have normal memory and 
> movable memory.
> 
> 1) With CONFIG_MOVABLE_NODE:
>     N_NORMAL_MEMORY: nodes who have normal memory.
>                      normal memory only
>                      normal and highmem
>                      normal and highmem and movablemem
>                      normal and movablemem
>     N_MEMORY: nodes who has memory (any memory)
>                      normal memory only
>                      normal and highmem
>                      normal and highmem and movablemem
>                      normal and movablemem ---------------- We can have 
> movablemem.
>                      highmem only -------------------------
>                      highmem and movablemem ---------------
>                      movablemem only ---------------------- We can have 
> movablemem only.    ***
> 
> 2) With out CONFIG_MOVABLE_NODE:
>     N_MEMORY == N_NORMAL_MEMORY: (Here, I omit N_HIGH_MEMORY)
>                      normal memory only
>                      normal and highmem
>                      normal and highmem and movablemem
>                      normal and movablemem ---------------- We can have 
> movablemem.
>                      No movablemem only ------------------- We cannot 
> have movablemem only. ***
> 
> The semantics is not that clear here. So we can only try to understand 
> it from the code where
> we use N_MEMORY. :)
> 
> That is my understanding of this.

Thanks for your clarify, very clear now. :)

> 
> Thanks. :)
> 
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
From: Tang Chen @ 2013-01-31  7:10 UTC (permalink / raw)
  To: Simon Jeons
  Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
	linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
	kosaki.motohiro, rientjes, len.brown, wency, cmetcalf, glommer,
	wujianguo, yinghai, laijs, linux-kernel, minchan.kim, akpm,
	linuxppc-dev
In-Reply-To: <1359613162.1587.0.camel@kernel>

On 01/31/2013 02:19 PM, Simon Jeons wrote:
> Hi Tang,
> On Thu, 2013-01-31 at 11:31 +0800, Tang Chen wrote:
>> Hi Simon,
>>
>> Please see below. :)
>>
>> On 01/31/2013 09:22 AM, Simon Jeons wrote:
>>>
>>> Sorry, I still confuse. :(
>>> update node_states[N_NORMAL_MEMORY] to node_states[N_MEMORY] or
>>> node_states[N_NORMAL_MEMOR] present 0...ZONE_MOVABLE?
>>>
>>> node_states is what? node_states[N_NORMAL_MEMOR] or
>>> node_states[N_MEMORY]?
>>
>> Are you asking what node_states[] is ?
>>
>> node_states[] is an array of nodemask,
>>
>>       extern nodemask_t node_states[NR_NODE_STATES];
>>
>> For example, node_states[N_NORMAL_MEMOR] represents which nodes have
>> normal memory.
>> If N_MEMORY == N_HIGH_MEMORY == N_NORMAL_MEMORY, node_states[N_MEMORY] is
>> node_states[N_NORMAL_MEMOR]. So it represents which nodes have 0 ...
>> ZONE_MOVABLE.
>>
>
> Sorry, how can nodes_state[N_NORMAL_MEMORY] represents a node have 0 ...
> *ZONE_MOVABLE*, the comment of enum nodes_states said that
> N_NORMAL_MEMORY just means the node has regular memory.
>

Hi Simon,

Let's say it in this way.

If we don't have CONFIG_HIGHMEM, N_HIGH_MEMORY == N_NORMAL_MEMORY. We 
don't have a separate
macro to represent highmem because we don't have highmem.
This is easy to understand, right ?

Now, think it just like above:
If we don't have CONFIG_MOVABLE_NODE, N_MEMORY == N_HIGH_MEMORY == 
N_NORMAL_MEMORY.
This means we don't allow a node to have only movable memory, not we 
don't have movable memory.
A node could have normal memory and movable memory. So 
nodes_state[N_NORMAL_MEMORY] represents
a node have 0 ... *ZONE_MOVABLE*.

I think the point is: CONFIG_MOVABLE_NODE means we allow a node to have 
only movable memory.
So without CONFIG_MOVABLE_NODE, it doesn't mean a node cannot have 
movable memory. It means
the node cannot have only movable memory. It can have normal memory and 
movable memory.

1) With CONFIG_MOVABLE_NODE:
    N_NORMAL_MEMORY: nodes who have normal memory.
                     normal memory only
                     normal and highmem
                     normal and highmem and movablemem
                     normal and movablemem
    N_MEMORY: nodes who has memory (any memory)
                     normal memory only
                     normal and highmem
                     normal and highmem and movablemem
                     normal and movablemem ---------------- We can have 
movablemem.
                     highmem only -------------------------
                     highmem and movablemem ---------------
                     movablemem only ---------------------- We can have 
movablemem only.    ***

2) With out CONFIG_MOVABLE_NODE:
    N_MEMORY == N_NORMAL_MEMORY: (Here, I omit N_HIGH_MEMORY)
                     normal memory only
                     normal and highmem
                     normal and highmem and movablemem
                     normal and movablemem ---------------- We can have 
movablemem.
                     No movablemem only ------------------- We cannot 
have movablemem only. ***

The semantics is not that clear here. So we can only try to understand 
it from the code where
we use N_MEMORY. :)

That is my understanding of this.

Thanks. :)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox