* [V6 04/11] perf, documentation: Description for conditional branch filter
From: Anshuman Khandual @ 2014-05-05 9:09 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel
Cc: mikey, ak, eranian, michael, acme, sukadev, mingo
In-Reply-To: <1399280953-31442-1-git-send-email-khandual@linux.vnet.ibm.com>
Adding documentation support for conditional branch filter.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
---
tools/perf/Documentation/perf-record.txt | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index c71b0f3..d460049 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -184,9 +184,10 @@ following filters are defined:
- in_tx: only when the target is in a hardware transaction
- no_tx: only when the target is not in a hardware transaction
- abort_tx: only when the target is a hardware transaction abort
+ - cond: conditional branches
+
-The option requires at least one branch type among any, any_call, any_ret, ind_call.
+The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
The privilege levels may be omitted, in which case, the privilege levels of the associated
event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
levels are subject to permissions. When sampling on multiple events, branch stack sampling
--
1.7.11.7
^ permalink raw reply related
* [V6 11/11] powerpc, perf: Enable privilege mode SW branch filters
From: Anshuman Khandual @ 2014-05-05 9:09 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel
Cc: mikey, ak, eranian, michael, acme, sukadev, mingo
In-Reply-To: <1399280953-31442-1-git-send-email-khandual@linux.vnet.ibm.com>
This patch enables privilege mode SW branch filters. Also modifies
POWER8 PMU branch filter configuration so that the privilege mode
branch filter implemented as part of base PMU event configuration
is reflected in bhrb filter mask. As a result, the SW will skip and
not try to process the privilege mode branch filters itself.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
arch/powerpc/perf/core-book3s.c | 53 +++++++++++++++++++++++++++++++----------
arch/powerpc/perf/power8-pmu.c | 13 ++++++++--
2 files changed, 52 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index a94cc43..297cddb 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -26,6 +26,9 @@
#define BHRB_PREDICTION 0x0000000000000001
#define BHRB_EA 0xFFFFFFFFFFFFFFFCUL
+#define POWER_ADDR_USER 0
+#define POWER_ADDR_KERNEL 1
+
struct cpu_hw_events {
int n_events;
int n_percpu;
@@ -450,10 +453,10 @@ static bool check_instruction(unsigned int *addr, u64 sw_filter)
* Access the instruction contained in the address and check
* whether it complies with the applicable SW branch filters.
*/
-static bool keep_branch(u64 from, u64 sw_filter)
+static bool keep_branch(u64 from, u64 to, u64 sw_filter)
{
unsigned int instr;
- bool ret;
+ bool to_plm, ret, flag;
/*
* The "from" branch for every branch record has to go
@@ -463,6 +466,37 @@ static bool keep_branch(u64 from, u64 sw_filter)
if (sw_filter == 0)
return true;
+ to_plm = is_kernel_addr(to) ? POWER_ADDR_KERNEL : POWER_ADDR_USER;
+
+ /*
+ * Applying privilege mode SW branch filters first on the
+ * 'to' address makes an AND semantic with the SW generic
+ * branch filters (OR with each other) being applied on the
+ * from address there after.
+ */
+
+ /* Ignore PERF_SAMPLE_BRANCH_HV */
+ sw_filter &= ~PERF_SAMPLE_BRANCH_HV;
+
+ /* Privilege mode branch filters for "TO" address */
+ if (sw_filter & PERF_SAMPLE_BRANCH_PLM_ALL) {
+ flag = false;
+
+ if (sw_filter & PERF_SAMPLE_BRANCH_USER) {
+ if(to_plm == POWER_ADDR_USER)
+ flag = true;
+ }
+
+ if (sw_filter & PERF_SAMPLE_BRANCH_KERNEL) {
+ if(to_plm == POWER_ADDR_KERNEL)
+ flag = true;
+ }
+
+ if (!flag)
+ return false;
+ }
+
+ /* Generic branch filters for "FROM" address */
if (is_kernel_addr(from)) {
return check_instruction((unsigned int *) from, sw_filter);
} else {
@@ -501,15 +535,6 @@ static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter)
if (!(branch_sample_type & x))
continue;
/*
- * Privilege filter requests have been already
- * taken care during the base PMU configuration.
- */
- if ((x == PERF_SAMPLE_BRANCH_USER)
- || (x == PERF_SAMPLE_BRANCH_KERNEL)
- || (x == PERF_SAMPLE_BRANCH_HV))
- continue;
-
- /*
* Requested filter not available either
* in PMU or in SW.
*/
@@ -520,7 +545,10 @@ static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter)
}
/* SW implemented branch filters */
-static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_ANY_CALL,
+static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_USER,
+ PERF_SAMPLE_BRANCH_KERNEL,
+ PERF_SAMPLE_BRANCH_HV,
+ PERF_SAMPLE_BRANCH_ANY_CALL,
PERF_SAMPLE_BRANCH_COND,
PERF_SAMPLE_BRANCH_ANY_RETURN,
PERF_SAMPLE_BRANCH_IND_CALL };
@@ -624,6 +652,7 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
/* Apply SW branch filters and drop the entry if required */
if (!keep_branch(cpuhw->bhrb_entries[u_index].from,
+ cpuhw->bhrb_entries[u_index].to,
cpuhw->bhrb_sw_filter))
u_index--;
u_index++;
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 4743bde..b6e21da 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -649,9 +649,19 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
* filter configuration. BHRB is always recorded along with a
* regular PMU event. As the privilege state filter is handled
* in the basic PMC configuration of the accompanying regular
- * PMU event, we ignore any separate BHRB specific request.
+ * PMU event, we ignore any separate BHRB specific request. But
+ * this needs to be communicated with the branch filter mask.
*/
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_USER)
+ *bhrb_filter |= PERF_SAMPLE_BRANCH_USER;
+
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_KERNEL)
+ *bhrb_filter |= PERF_SAMPLE_BRANCH_KERNEL;
+
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_HV)
+ *bhrb_filter |= PERF_SAMPLE_BRANCH_HV;
+
/* Ignore user, kernel, hv bits */
branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
@@ -679,7 +689,6 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
if (branch_sample_type) {
/* Multiple branch filters will be processed in SW */
pmu_bhrb_filter = 0;
- *bhrb_filter = 0;
return pmu_bhrb_filter;
} else {
/* Individual branch filter will be processed in PMU */
--
1.7.11.7
^ permalink raw reply related
* [V6 10/11] power8, perf: Adapt BHRB PMU configuration to work with SW filters
From: Anshuman Khandual @ 2014-05-05 9:09 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel
Cc: mikey, ak, eranian, michael, acme, sukadev, mingo
In-Reply-To: <1399280953-31442-1-git-send-email-khandual@linux.vnet.ibm.com>
Powerpc kernel now supports SW based branch filters for book3s systems with some
specifc requirements while dealing with HW supported branch filters in order to
achieve overall OR semantics prevailing in perf branch stack sampling framework.
This patch adapts the BHRB branch filter configuration to meet those protocols.
POWER8 PMU can only handle one HW based branch filter request at any point of time.
For all other combinations PMU will pass it on to the SW.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
arch/powerpc/perf/power8-pmu.c | 50 ++++++++++++++++++++++++++++++++++++------
1 file changed, 43 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 699b1dd..4743bde 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -635,6 +635,16 @@ static int power8_generic_events[] = {
static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
{
+ u64 x, pmu_bhrb_filter;
+ pmu_bhrb_filter = 0;
+ *bhrb_filter = 0;
+
+ /* No branch filter requested */
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
+ *bhrb_filter = PERF_SAMPLE_BRANCH_ANY;
+ return pmu_bhrb_filter;
+ }
+
/* BHRB and regular PMU events share the same privilege state
* filter configuration. BHRB is always recorded along with a
* regular PMU event. As the privilege state filter is handled
@@ -645,16 +655,42 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
/* Ignore user, kernel, hv bits */
branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
- /* No branch filter requested */
- if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
- return 0;
+ /*
+ * P8 does not support oring of PMU HW branch filters. Hence
+ * if multiple branch filters are requested which includes filters
+ * supported in PMU, still go ahead and clear the PMU based HW branch
+ * filter component as in this case all the filters will be processed
+ * in SW.
+ */
- if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
- return POWER8_MMCRA_IFM1;
+ for_each_branch_sample_type(x) {
+ /* Ignore privilege branch filters */
+ if ((x == PERF_SAMPLE_BRANCH_USER)
+ || (x == PERF_SAMPLE_BRANCH_KERNEL)
+ || (x == PERF_SAMPLE_BRANCH_HV))
+ continue;
+
+ if (!(branch_sample_type & x))
+ continue;
+
+ /* Supported individual PMU branch filters */
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
+ branch_sample_type &= ~PERF_SAMPLE_BRANCH_ANY_CALL;
+ if (branch_sample_type) {
+ /* Multiple branch filters will be processed in SW */
+ pmu_bhrb_filter = 0;
+ *bhrb_filter = 0;
+ return pmu_bhrb_filter;
+ } else {
+ /* Individual branch filter will be processed in PMU */
+ pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
+ *bhrb_filter |= PERF_SAMPLE_BRANCH_ANY_CALL;
+ return pmu_bhrb_filter;
+ }
+ }
}
- /* Every thing else is unsupported */
- return -1;
+ return pmu_bhrb_filter;
}
static void power8_config_bhrb(u64 pmu_bhrb_filter)
--
1.7.11.7
^ permalink raw reply related
* [V6 01/11] perf: Add PERF_SAMPLE_BRANCH_COND
From: Anshuman Khandual @ 2014-05-05 9:09 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel
Cc: mikey, ak, eranian, michael, acme, sukadev, mingo
In-Reply-To: <1399280953-31442-1-git-send-email-khandual@linux.vnet.ibm.com>
This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which
will extend the existing perf ABI. Various architectures can provide
this functionality with either with HW filtering support (if present)
or with SW filtering of captured branch instructions.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
---
include/uapi/linux/perf_event.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1c..696f69b4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -163,8 +163,9 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */
PERF_SAMPLE_BRANCH_IN_TX = 1U << 8, /* in transaction */
PERF_SAMPLE_BRANCH_NO_TX = 1U << 9, /* not in transaction */
+ PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */
- PERF_SAMPLE_BRANCH_MAX = 1U << 10, /* non-ABI */
+ PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */
};
#define PERF_SAMPLE_BRANCH_PLM_ALL \
--
1.7.11.7
^ permalink raw reply related
* [V6 05/11] powerpc, perf: Re-arrange BHRB processing
From: Anshuman Khandual @ 2014-05-05 9:09 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel
Cc: mikey, ak, eranian, michael, acme, sukadev, mingo
In-Reply-To: <1399280953-31442-1-git-send-email-khandual@linux.vnet.ibm.com>
This patch cleans up some existing indentation problem and
re-organizes the BHRB processing code with an helper function
named `update_branch_entry` making it more readable. This patch
does not change any functionality.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
arch/powerpc/perf/core-book3s.c | 102 ++++++++++++++++++++--------------------
1 file changed, 52 insertions(+), 50 deletions(-)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 4520c93..66bea54 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -402,11 +402,21 @@ static __u64 power_pmu_bhrb_to(u64 addr)
return target - (unsigned long)&instr + addr;
}
+/* Update individual branch entry */
+void update_branch_entry(struct cpu_hw_events *cpuhw, int u_index, u64 from, u64 to, int pred)
+{
+ cpuhw->bhrb_entries[u_index].from = from;
+ cpuhw->bhrb_entries[u_index].to = to;
+ cpuhw->bhrb_entries[u_index].mispred = pred;
+ cpuhw->bhrb_entries[u_index].predicted = ~pred;
+ return;
+}
+
/* Processing BHRB entries */
void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
{
u64 val;
- u64 addr;
+ u64 addr, tmp;
int r_index, u_index, pred;
r_index = 0;
@@ -417,62 +427,54 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
if (!val)
/* Terminal marker: End of valid BHRB entries */
break;
- else {
- addr = val & BHRB_EA;
- pred = val & BHRB_PREDICTION;
- if (!addr)
- /* invalid entry */
- continue;
+ addr = val & BHRB_EA;
+ pred = val & BHRB_PREDICTION;
- /* Branches are read most recent first (ie. mfbhrb 0 is
- * the most recent branch).
- * There are two types of valid entries:
- * 1) a target entry which is the to address of a
- * computed goto like a blr,bctr,btar. The next
- * entry read from the bhrb will be branch
- * corresponding to this target (ie. the actual
- * blr/bctr/btar instruction).
- * 2) a from address which is an actual branch. If a
- * target entry proceeds this, then this is the
- * matching branch for that target. If this is not
- * following a target entry, then this is a branch
- * where the target is given as an immediate field
- * in the instruction (ie. an i or b form branch).
- * In this case we need to read the instruction from
- * memory to determine the target/to address.
+ if (!addr)
+ /* invalid entry */
+ continue;
+
+ /* Branches are read most recent first (ie. mfbhrb 0 is
+ * the most recent branch).
+ * There are two types of valid entries:
+ * 1) a target entry which is the to address of a
+ * computed goto like a blr,bctr,btar. The next
+ * entry read from the bhrb will be branch
+ * corresponding to this target (ie. the actual
+ * blr/bctr/btar instruction).
+ * 2) a from address which is an actual branch. If a
+ * target entry proceeds this, then this is the
+ * matching branch for that target. If this is not
+ * following a target entry, then this is a branch
+ * where the target is given as an immediate field
+ * in the instruction (ie. an i or b form branch).
+ * In this case we need to read the instruction from
+ * memory to determine the target/to address.
+ */
+ if (val & BHRB_TARGET) {
+ /* Target branches use two entries
+ * (ie. computed gotos/XL form)
*/
+ tmp = addr;
+ /* Get from address in next entry */
+ val = read_bhrb(r_index++);
+ addr = val & BHRB_EA;
if (val & BHRB_TARGET) {
- /* Target branches use two entries
- * (ie. computed gotos/XL form)
- */
- cpuhw->bhrb_entries[u_index].to = addr;
- cpuhw->bhrb_entries[u_index].mispred = pred;
- cpuhw->bhrb_entries[u_index].predicted = ~pred;
-
- /* Get from address in next entry */
- val = read_bhrb(r_index++);
- addr = val & BHRB_EA;
- if (val & BHRB_TARGET) {
- /* Shouldn't have two targets in a
- row.. Reset index and try again */
- r_index--;
- addr = 0;
- }
- cpuhw->bhrb_entries[u_index].from = addr;
- } else {
- /* Branches to immediate field
- (ie I or B form) */
- cpuhw->bhrb_entries[u_index].from = addr;
- cpuhw->bhrb_entries[u_index].to =
- power_pmu_bhrb_to(addr);
- cpuhw->bhrb_entries[u_index].mispred = pred;
- cpuhw->bhrb_entries[u_index].predicted = ~pred;
+ /* Shouldn't have two targets in a
+ row.. Reset index and try again */
+ r_index--;
+ addr = 0;
}
- u_index++;
-
+ update_branch_entry(cpuhw, u_index, addr, tmp, pred);
+ } else {
+ /* Branches to immediate field
+ (ie I or B form) */
+ tmp = power_pmu_bhrb_to(addr);
+ update_branch_entry(cpuhw, u_index, addr, tmp, pred);
}
+ u_index++;
}
cpuhw->bhrb_stack.nr = u_index;
return;
--
1.7.11.7
^ permalink raw reply related
* [V6 06/11] powerpc, perf: Re-arrange PMU based branch filter processing in POWER8
From: Anshuman Khandual @ 2014-05-05 9:09 UTC (permalink / raw)
To: linuxppc-dev, linux-kernel
Cc: mikey, ak, eranian, michael, acme, sukadev, mingo
In-Reply-To: <1399280953-31442-1-git-send-email-khandual@linux.vnet.ibm.com>
This patch does some code re-arrangements to make it clear that
it ignores any separate privilege level branch filter request
and does not support any combinations of HW PMU branch filters.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
arch/powerpc/perf/power8-pmu.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index fe2763b..13f47f5 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -635,8 +635,6 @@ static int power8_generic_events[] = {
static u64 power8_bhrb_filter_map(u64 branch_sample_type)
{
- u64 pmu_bhrb_filter = 0;
-
/* BHRB and regular PMU events share the same privilege state
* filter configuration. BHRB is always recorded along with a
* regular PMU event. As the privilege state filter is handled
@@ -644,20 +642,15 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
* PMU event, we ignore any separate BHRB specific request.
*/
- /* No branch filter requested */
- if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY)
- return pmu_bhrb_filter;
-
- /* Invalid branch filter options - HW does not support */
- if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
- return -1;
+ /* Ignore user, kernel, hv bits */
+ branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
- if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
- return -1;
+ /* No branch filter requested */
+ if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
+ return 0;
- if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
- pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
- return pmu_bhrb_filter;
+ if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
+ return POWER8_MMCRA_IFM1;
}
/* Every thing else is unsupported */
--
1.7.11.7
^ permalink raw reply related
* Re: Boot problems with a PA6T board
From: Christian Zigotzky @ 2014-05-05 9:41 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev
In-Reply-To: <1399268891.4600.3.camel@concordia>
Hi Michael,
Thanks a lot for your answer. They reasoned that "starting cpu hw idx
0... failed" is reported because that core of the CPU is already up and
running.
I have built a git kernel from 2014-04-02.
-> git clone
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-git
-> git show 3e75c6de1ac33fe3500f44573d9212dc82c99f59
-> git checkout -f 3e75c6de1ac33fe3500f44573d9212dc82c99f59; git clean -fdx
This kernel booted and showed a Kernel Panic with the following error
message:
Oops: Machine check, sig: 7 [#1]
Rgds,
Christian
On 05.05.2014 07:48, Michael Ellerman wrote:
> On Sun, 2014-05-04 at 18:02 +0200, Christian Zigotzky wrote:
>> Hi All,
>>
>> The RC 1, 2, and 3 of the kernel 3.15 don't boot on my PA6T board with a
>> Radeon HD 6870 graphics card.
>>
>> Screenshot:
>> http://forum.hyperion-entertainment.biz/download/file.php?id=1060&mode=view
>>
>> The kernel 3.14 starts without any problems. Has anyone a tip for me,
>> please?
> The line that says "starting cpu hw idx 0... failed" looks a little worrying.
> Do you see that on 3.14 as well?
>
> Otherwise bisection is probably your best bet.
>
> cheers
>
>
>
^ permalink raw reply
* Re: [PATCH V5] KVM: PPC: BOOK3S: PR: Enable Little Endian PR guest
From: Alexander Graf @ 2014-05-05 11:17 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <1399259384-6623-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On 05/05/2014 05:09 AM, Aneesh Kumar K.V wrote:
> This patch make sure we inherit the LE bit correctly in different case
> so that we can run Little Endian distro in PR mode
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Thanks, applied to kvm-ppc-queue.
Alex
^ permalink raw reply
* Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
From: Alexander Graf @ 2014-05-05 11:19 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <1399224075-18041-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
> Although it's optional IBM POWER cpus always had DAR value set on
> alignment interrupt. So don't try to compute these values.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
> Changes from V3:
> * Use make_dsisr instead of checking feature flag to decide whether to use
> saved dsisr or not
>
> arch/powerpc/include/asm/disassemble.h | 34 +++++++++++++++++++++++++++
> arch/powerpc/kernel/align.c | 34 +--------------------------
> arch/powerpc/kvm/book3s_emulate.c | 43 ++++------------------------------
> 3 files changed, 40 insertions(+), 71 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h
> index 856f8deb557a..6330a61b875a 100644
> --- a/arch/powerpc/include/asm/disassemble.h
> +++ b/arch/powerpc/include/asm/disassemble.h
> @@ -81,4 +81,38 @@ static inline unsigned int get_oc(u32 inst)
> {
> return (inst >> 11) & 0x7fff;
> }
> +
> +#define IS_XFORM(inst) (get_op(inst) == 31)
> +#define IS_DSFORM(inst) (get_op(inst) >= 56)
> +
> +/*
> + * Create a DSISR value from the instruction
> + */
> +static inline unsigned make_dsisr(unsigned instr)
> +{
> + unsigned dsisr;
> +
> +
> + /* bits 6:15 --> 22:31 */
> + dsisr = (instr & 0x03ff0000) >> 16;
> +
> + if (IS_XFORM(instr)) {
> + /* bits 29:30 --> 15:16 */
> + dsisr |= (instr & 0x00000006) << 14;
> + /* bit 25 --> 17 */
> + dsisr |= (instr & 0x00000040) << 8;
> + /* bits 21:24 --> 18:21 */
> + dsisr |= (instr & 0x00000780) << 3;
> + } else {
> + /* bit 5 --> 17 */
> + dsisr |= (instr & 0x04000000) >> 12;
> + /* bits 1: 4 --> 18:21 */
> + dsisr |= (instr & 0x78000000) >> 17;
> + /* bits 30:31 --> 12:13 */
> + if (IS_DSFORM(instr))
> + dsisr |= (instr & 0x00000003) << 18;
> + }
> +
> + return dsisr;
> +}
> #endif /* __ASM_PPC_DISASSEMBLE_H__ */
> diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
> index 94908af308d8..34f55524d456 100644
> --- a/arch/powerpc/kernel/align.c
> +++ b/arch/powerpc/kernel/align.c
> @@ -25,14 +25,13 @@
> #include <asm/cputable.h>
> #include <asm/emulated_ops.h>
> #include <asm/switch_to.h>
> +#include <asm/disassemble.h>
>
> struct aligninfo {
> unsigned char len;
> unsigned char flags;
> };
>
> -#define IS_XFORM(inst) (((inst) >> 26) == 31)
> -#define IS_DSFORM(inst) (((inst) >> 26) >= 56)
>
> #define INVALID { 0, 0 }
>
> @@ -192,37 +191,6 @@ static struct aligninfo aligninfo[128] = {
> };
>
> /*
> - * Create a DSISR value from the instruction
> - */
> -static inline unsigned make_dsisr(unsigned instr)
> -{
> - unsigned dsisr;
> -
> -
> - /* bits 6:15 --> 22:31 */
> - dsisr = (instr & 0x03ff0000) >> 16;
> -
> - if (IS_XFORM(instr)) {
> - /* bits 29:30 --> 15:16 */
> - dsisr |= (instr & 0x00000006) << 14;
> - /* bit 25 --> 17 */
> - dsisr |= (instr & 0x00000040) << 8;
> - /* bits 21:24 --> 18:21 */
> - dsisr |= (instr & 0x00000780) << 3;
> - } else {
> - /* bit 5 --> 17 */
> - dsisr |= (instr & 0x04000000) >> 12;
> - /* bits 1: 4 --> 18:21 */
> - dsisr |= (instr & 0x78000000) >> 17;
> - /* bits 30:31 --> 12:13 */
> - if (IS_DSFORM(instr))
> - dsisr |= (instr & 0x00000003) << 18;
> - }
> -
> - return dsisr;
> -}
> -
> -/*
> * The dcbz (data cache block zero) instruction
> * gives an alignment fault if used on non-cacheable
> * memory. We handle the fault mainly for the
> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
> index 99d40f8977e8..04c38f049dfd 100644
> --- a/arch/powerpc/kvm/book3s_emulate.c
> +++ b/arch/powerpc/kvm/book3s_emulate.c
> @@ -569,48 +569,14 @@ unprivileged:
>
> u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst)
> {
> - u32 dsisr = 0;
> -
> - /*
> - * This is what the spec says about DSISR bits (not mentioned = 0):
> - *
> - * 12:13 [DS] Set to bits 30:31
> - * 15:16 [X] Set to bits 29:30
> - * 17 [X] Set to bit 25
> - * [D/DS] Set to bit 5
> - * 18:21 [X] Set to bits 21:24
> - * [D/DS] Set to bits 1:4
> - * 22:26 Set to bits 6:10 (RT/RS/FRT/FRS)
> - * 27:31 Set to bits 11:15 (RA)
> - */
> -
> - switch (get_op(inst)) {
> - /* D-form */
> - case OP_LFS:
> - case OP_LFD:
> - case OP_STFD:
> - case OP_STFS:
> - dsisr |= (inst >> 12) & 0x4000; /* bit 17 */
> - dsisr |= (inst >> 17) & 0x3c00; /* bits 18:21 */
> - break;
> - /* X-form */
> - case 31:
> - dsisr |= (inst << 14) & 0x18000; /* bits 15:16 */
> - dsisr |= (inst << 8) & 0x04000; /* bit 17 */
> - dsisr |= (inst << 3) & 0x03c00; /* bits 18:21 */
> - break;
> - default:
> - printk(KERN_INFO "KVM: Unaligned instruction 0x%x\n", inst);
> - break;
> - }
> -
> - dsisr |= (inst >> 16) & 0x03ff; /* bits 22:31 */
> -
> - return dsisr;
> + return make_dsisr(inst);
> }
>
> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
> {
> +#ifdef CONFIG_PPC_BOOK3S_64
> + return vcpu->arch.fault_dar;
How about PA6T and G5s?
Alex
^ permalink raw reply
* Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
From: Alexander Graf @ 2014-05-05 11:26 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <1399224322-22028-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
> We reserve 5% of total ram for CMA allocation and not using that can
> result in us running out of numa node memory with specific
> configuration. One caveat is we may not have node local hpt with pinned
> vcpu configuration. But currently libvirt also pins the vcpu to cpuset
> after creating hash page table.
I don't understand the problem. Can you please elaborate?
Alex
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 23 ++++++-----------------
> 1 file changed, 6 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index fb25ebc0af0c..f32896ffd784 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -52,7 +52,7 @@ static void kvmppc_rmap_reset(struct kvm *kvm);
>
> long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
> {
> - unsigned long hpt;
> + unsigned long hpt = 0;
> struct revmap_entry *rev;
> struct page *page = NULL;
> long order = KVM_DEFAULT_HPT_ORDER;
> @@ -64,22 +64,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
> }
>
> kvm->arch.hpt_cma_alloc = 0;
> - /*
> - * try first to allocate it from the kernel page allocator.
> - * We keep the CMA reserved for failed allocation.
> - */
> - hpt = __get_free_pages(GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT |
> - __GFP_NOWARN, order - PAGE_SHIFT);
> -
> - /* Next try to allocate from the preallocated pool */
> - if (!hpt) {
> - VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
> - page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
> - if (page) {
> - hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
> - kvm->arch.hpt_cma_alloc = 1;
> - } else
> - --order;
> + VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
> + page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
> + if (page) {
> + hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
> + kvm->arch.hpt_cma_alloc = 1;
> }
>
> /* Lastly try successively smaller sizes from the page allocator */
^ permalink raw reply
* Re: [PATCH] KVM: PPC: BOOK3S: PR: Fix WARN_ON with debug options on
From: Alexander Graf @ 2014-05-05 11:29 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <1399224368-22122-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On 05/04/2014 07:26 PM, Aneesh Kumar K.V wrote:
> With debug option "sleep inside atomic section checking" enabled we get
> the below WARN_ON during a PR KVM boot. This is because upstream now
> have PREEMPT_COUNT enabled even if we have preempt disabled. Fix the
> warning by adding preempt_disable/enable around floating point and altivec
> enable.
>
> WARNING: at arch/powerpc/kernel/process.c:156
> Modules linked in: kvm_pr kvm
> CPU: 1 PID: 3990 Comm: qemu-system-ppc Tainted: G W 3.15.0-rc1+ #4
> task: c0000000eb85b3a0 ti: c0000000ec59c000 task.ti: c0000000ec59c000
> NIP: c000000000015c84 LR: d000000003334644 CTR: c000000000015c00
> REGS: c0000000ec59f140 TRAP: 0700 Tainted: G W (3.15.0-rc1+)
> MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 42000024 XER: 20000000
> CFAR: c000000000015c24 SOFTE: 1
> GPR00: d000000003334644 c0000000ec59f3c0 c000000000e2fa40 c0000000e2f80000
> GPR04: 0000000000000800 0000000000002000 0000000000000001 8000000000000000
> GPR08: 0000000000000001 0000000000000001 0000000000002000 c000000000015c00
> GPR12: d00000000333da18 c00000000fb80900 0000000000000000 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 00003fffce4e0fa1
> GPR20: 0000000000000010 0000000000000001 0000000000000002 00000000100b9a38
> GPR24: 0000000000000002 0000000000000000 0000000000000000 0000000000000013
> GPR28: 0000000000000000 c0000000eb85b3a0 0000000000002000 c0000000e2f80000
> NIP [c000000000015c84] .enable_kernel_fp+0x84/0x90
> LR [d000000003334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr]
> Call Trace:
> [c0000000ec59f3c0] [0000000000000010] 0x10 (unreliable)
> [c0000000ec59f430] [d000000003334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr]
> [c0000000ec59f4c0] [d00000000324b380] .kvmppc_set_msr+0x30/0x50 [kvm]
> [c0000000ec59f530] [d000000003337cac] .kvmppc_core_emulate_op_pr+0x16c/0x5e0 [kvm_pr]
> [c0000000ec59f5f0] [d00000000324a944] .kvmppc_emulate_instruction+0x284/0xa80 [kvm]
> [c0000000ec59f6c0] [d000000003336888] .kvmppc_handle_exit_pr+0x488/0xb70 [kvm_pr]
> [c0000000ec59f790] [d000000003338d34] kvm_start_lightweight+0xcc/0xdc [kvm_pr]
> [c0000000ec59f960] [d000000003336288] .kvmppc_vcpu_run_pr+0xc8/0x190 [kvm_pr]
> [c0000000ec59f9f0] [d00000000324c880] .kvmppc_vcpu_run+0x30/0x50 [kvm]
> [c0000000ec59fa60] [d000000003249e74] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [kvm]
> [c0000000ec59faf0] [d000000003244948] .kvm_vcpu_ioctl+0x478/0x760 [kvm]
> [c0000000ec59fcb0] [c000000000224e34] .do_vfs_ioctl+0x4d4/0x790
> [c0000000ec59fd90] [c000000000225148] .SyS_ioctl+0x58/0xb0
> [c0000000ec59fe30] [c00000000000a1e4] syscall_exit+0x0/0x98
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Thanks, applied to kvm-ppc-queue.
Alex
^ permalink raw reply
* Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
From: Alexander Graf @ 2014-05-05 11:38 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <1399224616-25142-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote:
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
No patch description, no proper explanations anywhere why you're doing
what. All of that in a pretty sensitive piece of code. There's no way
this patch can go upstream in its current form.
Alex
^ permalink raw reply
* Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
From: Alexander Graf @ 2014-05-05 11:56 UTC (permalink / raw)
To: Gavin Shan; +Cc: kvm, aik, kvm-ppc, alex.williamson, qiudayu, linuxppc-dev
In-Reply-To: <1399253291-3975-1-git-send-email-gwshan@linux.vnet.ibm.com>
On 05/05/2014 03:27 AM, Gavin Shan wrote:
> The series of patches intends to support EEH for PCI devices, which have been
> passed through to PowerKVM based guest via VFIO. The implementation is
> straightforward based on the issues or problems we have to resolve to support
> EEH for PowerKVM based guest.
>
> - Emulation for EEH RTAS requests. Thanksfully, we already have infrastructure
> to emulate XICS. Without introducing new mechanism, we just extend that
> existing infrastructure to support EEH RTAS emulation. EEH RTAS requests
> initiated from guest are posted to host where the requests get handled or
> delivered to underly firmware for further handling. For that, the host kerenl
> has to maintain the PCI address (host domain/bus/slot/function to guest's
> PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address mapping
> will be built when initializing VFIO device in QEMU and destroied when the
> VFIO device in QEMU is going to offline, or VM is destroy.
Do you also expose all those interfaces to user space? VFIO is as much
about user space device drivers as it is about device assignment.
I would like to first see an implementation that doesn't touch KVM
emulation code at all but instead routes everything through QEMU. As a
second step we can then accelerate performance critical paths inside of KVM.
That way we ensure that user space device drivers have all the power
over a device they need to drive it.
Alex
^ permalink raw reply
* Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall
From: Tudor Laurentiu @ 2014-05-05 12:17 UTC (permalink / raw)
To: Alexander Graf
Cc: Scott Wood, Laurentiu.Tudor, linuxppc-dev@lists.ozlabs.org,
Stuart Yoder
In-Reply-To: <53615872.3010006@suse.de>
On 04/30/2014 11:09 PM, Alexander Graf wrote:
>
> On 30.04.14 22:03, Stuart Yoder wrote:
>>
>>> -----Original Message-----
>>> From: Alexander Graf [mailto:agraf@suse.de]
>>> Sent: Wednesday, April 30, 2014 2:56 PM
>>> To: Yoder Stuart-B08248; benh@kernel.crashing.org; Wood Scott-B07421
>>> Cc: linuxppc-dev@lists.ozlabs.org
>>> Subject: Re: [PATCH] powerpc: move epapr paravirt init of power_save to
>>> an initcall
>>>
>>>
>>> On 30.04.14 21:54, Stuart Yoder wrote:
>>>> From: Stuart Yoder <stuart.yoder@freescale.com>
>>>>
>>>> some restructuring of epapr paravirt init resulted in
>>>> ppc_md.power_save being set, and then overwritten to
>>>> NULL during machine_init. This patch splits the
>>>> initialization of ppc_md.power_save out into a postcore
>>>> init call.
>>>>
>>>> Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
>>>> ---
>>>> arch/powerpc/kernel/epapr_paravirt.c | 25
>>>> ++++++++++++++++++++-----
>>>> 1 file changed, 20 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/kernel/epapr_paravirt.c
>>> b/arch/powerpc/kernel/epapr_paravirt.c
>>>> index 6300c13..c49b69c 100644
>>>> --- a/arch/powerpc/kernel/epapr_paravirt.c
>>>> +++ b/arch/powerpc/kernel/epapr_paravirt.c
>>>> @@ -52,11 +52,6 @@ static int __init early_init_dt_scan_epapr(unsigned
>>> long node,
>>>> #endif
>>>> }
>>>>
>>>> -#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
>>>> - if (of_get_flat_dt_prop(node, "has-idle", NULL))
>>>> - ppc_md.power_save = epapr_ev_idle;
>>>> -#endif
>>>> -
>>>> epapr_paravirt_enabled = true;
>>>>
>>>> return 1;
>>>> @@ -69,3 +64,23 @@ int __init epapr_paravirt_early_init(void)
>>>> return 0;
>>>> }
>>>>
>>>> +static int __init epapr_idle_init_dt_scan(unsigned long node,
>>>> + const char *uname,
>>>> + int depth, void *data)
>>>> +{
>>>> +#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
>>>> + if (of_get_flat_dt_prop(node, "has-idle", NULL))
>>>> + ppc_md.power_save = epapr_ev_idle;
>>>> +#endif
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int __init epapr_idle_init(void)
>>>> +{
>>>> + if (epapr_paravirt_enabled)
>>>> + of_scan_flat_dt(epapr_idle_init_dt_scan, NULL);
>>> Doesn't this scan all nodes? We only want to match on
>>> /hypervisor/has-idle, no?
>> I cut/pasted from the approach the existing code in that file
>> took, but yes you're right we just need the one property.
>> Let me respin that to look at the hypervisor node only.
>
> Yeah, the same commit that introduced the breakage on has-idle also
> removed the explicit check for /hypervisor.
>
> Laurentiu, was this change on purpose?
>
Alex,
IIRC, at that time i had to switch from the normal "of" functions to a
completely different api that's available in early init stage. This
early "of" api is pretty limited (e.g. doesn't have a way to address a
specific node) and i had to use that function that scans the whole tree.
---
Best Regards, Laurentiu
^ permalink raw reply
* Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall
From: Alexander Graf @ 2014-05-05 12:21 UTC (permalink / raw)
To: Tudor Laurentiu
Cc: Scott Wood, Laurentiu.Tudor, linuxppc-dev@lists.ozlabs.org,
Stuart Yoder
In-Reply-To: <5367815F.5010509@freescale.com>
On 05/05/2014 02:17 PM, Tudor Laurentiu wrote:
> On 04/30/2014 11:09 PM, Alexander Graf wrote:
>>
>> On 30.04.14 22:03, Stuart Yoder wrote:
>>>
>>>> -----Original Message-----
>>>> From: Alexander Graf [mailto:agraf@suse.de]
>>>> Sent: Wednesday, April 30, 2014 2:56 PM
>>>> To: Yoder Stuart-B08248; benh@kernel.crashing.org; Wood Scott-B07421
>>>> Cc: linuxppc-dev@lists.ozlabs.org
>>>> Subject: Re: [PATCH] powerpc: move epapr paravirt init of
>>>> power_save to
>>>> an initcall
>>>>
>>>>
>>>> On 30.04.14 21:54, Stuart Yoder wrote:
>>>>> From: Stuart Yoder <stuart.yoder@freescale.com>
>>>>>
>>>>> some restructuring of epapr paravirt init resulted in
>>>>> ppc_md.power_save being set, and then overwritten to
>>>>> NULL during machine_init. This patch splits the
>>>>> initialization of ppc_md.power_save out into a postcore
>>>>> init call.
>>>>>
>>>>> Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
>>>>> ---
>>>>> arch/powerpc/kernel/epapr_paravirt.c | 25
>>>>> ++++++++++++++++++++-----
>>>>> 1 file changed, 20 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/kernel/epapr_paravirt.c
>>>> b/arch/powerpc/kernel/epapr_paravirt.c
>>>>> index 6300c13..c49b69c 100644
>>>>> --- a/arch/powerpc/kernel/epapr_paravirt.c
>>>>> +++ b/arch/powerpc/kernel/epapr_paravirt.c
>>>>> @@ -52,11 +52,6 @@ static int __init
>>>>> early_init_dt_scan_epapr(unsigned
>>>> long node,
>>>>> #endif
>>>>> }
>>>>>
>>>>> -#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
>>>>> - if (of_get_flat_dt_prop(node, "has-idle", NULL))
>>>>> - ppc_md.power_save = epapr_ev_idle;
>>>>> -#endif
>>>>> -
>>>>> epapr_paravirt_enabled = true;
>>>>>
>>>>> return 1;
>>>>> @@ -69,3 +64,23 @@ int __init epapr_paravirt_early_init(void)
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +static int __init epapr_idle_init_dt_scan(unsigned long node,
>>>>> + const char *uname,
>>>>> + int depth, void *data)
>>>>> +{
>>>>> +#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
>>>>> + if (of_get_flat_dt_prop(node, "has-idle", NULL))
>>>>> + ppc_md.power_save = epapr_ev_idle;
>>>>> +#endif
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static int __init epapr_idle_init(void)
>>>>> +{
>>>>> + if (epapr_paravirt_enabled)
>>>>> + of_scan_flat_dt(epapr_idle_init_dt_scan, NULL);
>>>> Doesn't this scan all nodes? We only want to match on
>>>> /hypervisor/has-idle, no?
>>> I cut/pasted from the approach the existing code in that file
>>> took, but yes you're right we just need the one property.
>>> Let me respin that to look at the hypervisor node only.
>>
>> Yeah, the same commit that introduced the breakage on has-idle also
>> removed the explicit check for /hypervisor.
>>
>> Laurentiu, was this change on purpose?
>>
>
> Alex,
>
> IIRC, at that time i had to switch from the normal "of" functions to a
> completely different api that's available in early init stage. This
> early "of" api is pretty limited (e.g. doesn't have a way to address a
> specific node) and i had to use that function that scans the whole tree.
Ok, so it is an accident. Could you please post a patch that checks that
the node we're looking at is called "hypervisor"? The simple API should
give you enough information for that at least. Maybe you could even
check that the parent node is the root node.
Alex
^ permalink raw reply
* Re: [PATCH] powerpc: move epapr paravirt init of power_save to an initcall
From: Tudor Laurentiu @ 2014-05-05 12:35 UTC (permalink / raw)
To: Alexander Graf
Cc: Scott Wood, Laurentiu.Tudor, linuxppc-dev@lists.ozlabs.org,
Stuart Yoder
In-Reply-To: <53678236.7060206@suse.de>
On 05/05/2014 03:21 PM, Alexander Graf wrote:
> On 05/05/2014 02:17 PM, Tudor Laurentiu wrote:
>> On 04/30/2014 11:09 PM, Alexander Graf wrote:
>>>
>>> On 30.04.14 22:03, Stuart Yoder wrote:
>>>>
>>>>> -----Original Message-----
>>>>> From: Alexander Graf [mailto:agraf@suse.de]
>>>>> Sent: Wednesday, April 30, 2014 2:56 PM
>>>>> To: Yoder Stuart-B08248; benh@kernel.crashing.org; Wood Scott-B07421
>>>>> Cc: linuxppc-dev@lists.ozlabs.org
>>>>> Subject: Re: [PATCH] powerpc: move epapr paravirt init of
>>>>> power_save to
>>>>> an initcall
>>>>>
>>>>>
>>>>> On 30.04.14 21:54, Stuart Yoder wrote:
>>>>>> From: Stuart Yoder <stuart.yoder@freescale.com>
>>>>>>
>>>>>> some restructuring of epapr paravirt init resulted in
>>>>>> ppc_md.power_save being set, and then overwritten to
>>>>>> NULL during machine_init. This patch splits the
>>>>>> initialization of ppc_md.power_save out into a postcore
>>>>>> init call.
>>>>>>
>>>>>> Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
>>>>>> ---
>>>>>> arch/powerpc/kernel/epapr_paravirt.c | 25
>>>>>> ++++++++++++++++++++-----
>>>>>> 1 file changed, 20 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/powerpc/kernel/epapr_paravirt.c
>>>>> b/arch/powerpc/kernel/epapr_paravirt.c
>>>>>> index 6300c13..c49b69c 100644
>>>>>> --- a/arch/powerpc/kernel/epapr_paravirt.c
>>>>>> +++ b/arch/powerpc/kernel/epapr_paravirt.c
>>>>>> @@ -52,11 +52,6 @@ static int __init
>>>>>> early_init_dt_scan_epapr(unsigned
>>>>> long node,
>>>>>> #endif
>>>>>> }
>>>>>>
>>>>>> -#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
>>>>>> - if (of_get_flat_dt_prop(node, "has-idle", NULL))
>>>>>> - ppc_md.power_save = epapr_ev_idle;
>>>>>> -#endif
>>>>>> -
>>>>>> epapr_paravirt_enabled = true;
>>>>>>
>>>>>> return 1;
>>>>>> @@ -69,3 +64,23 @@ int __init epapr_paravirt_early_init(void)
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> +static int __init epapr_idle_init_dt_scan(unsigned long node,
>>>>>> + const char *uname,
>>>>>> + int depth, void *data)
>>>>>> +{
>>>>>> +#if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
>>>>>> + if (of_get_flat_dt_prop(node, "has-idle", NULL))
>>>>>> + ppc_md.power_save = epapr_ev_idle;
>>>>>> +#endif
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int __init epapr_idle_init(void)
>>>>>> +{
>>>>>> + if (epapr_paravirt_enabled)
>>>>>> + of_scan_flat_dt(epapr_idle_init_dt_scan, NULL);
>>>>> Doesn't this scan all nodes? We only want to match on
>>>>> /hypervisor/has-idle, no?
>>>> I cut/pasted from the approach the existing code in that file
>>>> took, but yes you're right we just need the one property.
>>>> Let me respin that to look at the hypervisor node only.
>>>
>>> Yeah, the same commit that introduced the breakage on has-idle also
>>> removed the explicit check for /hypervisor.
>>>
>>> Laurentiu, was this change on purpose?
>>>
>>
>> Alex,
>>
>> IIRC, at that time i had to switch from the normal "of" functions to a
>> completely different api that's available in early init stage. This
>> early "of" api is pretty limited (e.g. doesn't have a way to address a
>> specific node) and i had to use that function that scans the whole tree.
>
> Ok, so it is an accident. Could you please post a patch that checks that
> the node we're looking at is called "hypervisor"? The simple API should
> give you enough information for that at least. Maybe you could even
> check that the parent node is the root node.
>
Just had a quick look and it looks that that early fdt api was improved
with a function that allows specifying a starting path for the scan
(of_scan_flat_dt_by_path() added in commit
57d74bcf3072b65bde5aa540cedc976a75c48e5c). So i think we can simply use
that instead.
---
Best Regards, Laurentiu
^ permalink raw reply
* Re: [PATCH] powerpc: memcpy optimization for 64bit LE
From: Philippe Bergheaud @ 2014-05-05 12:56 UTC (permalink / raw)
To: Anton Blanchard; +Cc: paulus, linuxppc-dev
In-Reply-To: <20140430091054.4de84c9b@kryten>
Anton Blanchard wrote:
> Unaligned stores take alignment exceptions on POWER7 running in little-endian.
> This is a dumb little-endian base memcpy that prevents unaligned stores.
> Once booted the feature fixup code switches over to the VMX copy loops
> (which are already endian safe).
>
> The question is what we do before that switch over. The base 64bit
> memcpy takes alignment exceptions on POWER7 so we can't use it as is.
> Fixing the causes of alignment exception would slow it down, because
> we'd need to ensure all loads and stores are aligned either through
> rotate tricks or bytewise loads and stores. Either would be bad for
> all other 64bit platforms.
>
> [ I simplified the loop a bit - Anton ]
Got it.
The 3 instructions that you have removed were modifying r5 for no reason,
as the last instruction was always resetting r5 to its initial value.
Philippe
^ permalink raw reply
* Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
From: Alex Williamson @ 2014-05-05 14:00 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, aik, Gavin Shan, kvm-ppc, qiudayu, linuxppc-dev
In-Reply-To: <53677C6C.8060508@suse.de>
On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote:
> On 05/05/2014 03:27 AM, Gavin Shan wrote:
> > The series of patches intends to support EEH for PCI devices, which have been
> > passed through to PowerKVM based guest via VFIO. The implementation is
> > straightforward based on the issues or problems we have to resolve to support
> > EEH for PowerKVM based guest.
> >
> > - Emulation for EEH RTAS requests. Thanksfully, we already have infrastructure
> > to emulate XICS. Without introducing new mechanism, we just extend that
> > existing infrastructure to support EEH RTAS emulation. EEH RTAS requests
> > initiated from guest are posted to host where the requests get handled or
> > delivered to underly firmware for further handling. For that, the host kerenl
> > has to maintain the PCI address (host domain/bus/slot/function to guest's
> > PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address mapping
> > will be built when initializing VFIO device in QEMU and destroied when the
> > VFIO device in QEMU is going to offline, or VM is destroy.
>
> Do you also expose all those interfaces to user space? VFIO is as much
> about user space device drivers as it is about device assignment.
>
> I would like to first see an implementation that doesn't touch KVM
> emulation code at all but instead routes everything through QEMU. As a
> second step we can then accelerate performance critical paths inside of KVM.
>
> That way we ensure that user space device drivers have all the power
> over a device they need to drive it.
+1
^ permalink raw reply
* Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
From: Aneesh Kumar K.V @ 2014-05-05 14:26 UTC (permalink / raw)
To: Alexander Graf; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <536773C2.1070502@suse.de>
Alexander Graf <agraf@suse.de> writes:
> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>> Although it's optional IBM POWER cpus always had DAR value set on
>> alignment interrupt. So don't try to compute these values.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>> Changes from V3:
>> * Use make_dsisr instead of checking feature flag to decide whether to use
>> saved dsisr or not
>>
....
>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>> {
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> + return vcpu->arch.fault_dar;
>
> How about PA6T and G5s?
>
>
Paul mentioned that BOOK3S always had DAR value set on alignment
interrupt. And the patch is to enable/collect correct DAR value when
running with Little Endian PR guest. Now to limit the impact and to
enable Little Endian PR guest, I ended up doing the conditional code
only for book3s 64 for which we know for sure that we set DAR value.
-aneesh
^ permalink raw reply
* Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
From: Aneesh Kumar K.V @ 2014-05-05 14:35 UTC (permalink / raw)
To: Alexander Graf; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <53677558.50900@suse.de>
Alexander Graf <agraf@suse.de> writes:
> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
>> We reserve 5% of total ram for CMA allocation and not using that can
>> result in us running out of numa node memory with specific
>> configuration. One caveat is we may not have node local hpt with pinned
>> vcpu configuration. But currently libvirt also pins the vcpu to cpuset
>> after creating hash page table.
>
> I don't understand the problem. Can you please elaborate?
>
>
Lets take a system with 100GB RAM. We reserve around 5GB for htab
allocation. Now if we use rest of available memory for hugetlbfs
(because we want all the guest to be backed by huge pages), we would
end up in a situation where we have a few GB of free RAM and 5GB of CMA
reserve area. Now if we allow hash page table allocation to consume the
free space, we would end up hitting page allocation failure for other
non movable kernel allocation even though we still have 5GB CMA reserve
space free.
-aneesh
^ permalink raw reply
* Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
From: Alexander Graf @ 2014-05-05 14:43 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: kvm, linuxppc-dev, kvm-ppc, paulus, olofj
In-Reply-To: <87tx949u9d.fsf@linux.vnet.ibm.com>
On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
> Alexander Graf <agraf@suse.de> writes:
>
>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>> Although it's optional IBM POWER cpus always had DAR value set on
>>> alignment interrupt. So don't try to compute these values.
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>> ---
>>> Changes from V3:
>>> * Use make_dsisr instead of checking feature flag to decide whether to use
>>> saved dsisr or not
>>>
> ....
>
>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>> {
>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>> + return vcpu->arch.fault_dar;
>> How about PA6T and G5s?
>>
>>
> Paul mentioned that BOOK3S always had DAR value set on alignment
> interrupt. And the patch is to enable/collect correct DAR value when
> running with Little Endian PR guest. Now to limit the impact and to
> enable Little Endian PR guest, I ended up doing the conditional code
> only for book3s 64 for which we know for sure that we set DAR value.
Yes, and I'm asking whether we know that this statement holds true for
PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is
at least developed by IBM, I'd assume its semantics here are similar to
POWER4, but for PA6T I wouldn't be so sure.
Alex
^ permalink raw reply
* Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
From: Aneesh Kumar K.V @ 2014-05-05 14:47 UTC (permalink / raw)
To: Alexander Graf; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <53677834.5010808@suse.de>
Alexander Graf <agraf@suse.de> writes:
> On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote:
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> No patch description, no proper explanations anywhere why you're doing
> what. All of that in a pretty sensitive piece of code. There's no way
> this patch can go upstream in its current form.
>
Sorry about being vague. Will add a better commit message. The goal is
to export MPSS support to guest if the host support the same. MPSS
support is exported via penc encoding in "ibm,segment-page-sizes". The
actual format can be found at htab_dt_scan_page_sizes. When the guest
memory is backed by hugetlbfs we expose the penc encoding the host
support to guest via kvmppc_add_seg_page_size.
Now the challenge to THP support is to make sure that our henter,
hremove etc decode base page size and actual page size correctly
from the hash table entry values. Most of the changes is to do that.
Rest of the stuff is already handled by kvm.
NOTE: It is much easier to read the code after applying the patch rather
than reading the diff. I have added comments around each steps in the
code.
-aneesh
^ permalink raw reply
* Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
From: Aneesh Kumar K.V @ 2014-05-05 14:50 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, linuxppc-dev, kvm-ppc, paulus, olofj
In-Reply-To: <5367A39D.9080709@suse.de>
Alexander Graf <agraf@suse.de> writes:
> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>> Alexander Graf <agraf@suse.de> writes:
>>
>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>> Although it's optional IBM POWER cpus always had DAR value set on
>>>> alignment interrupt. So don't try to compute these values.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>>> ---
>>>> Changes from V3:
>>>> * Use make_dsisr instead of checking feature flag to decide whether to use
>>>> saved dsisr or not
>>>>
>> ....
>>
>>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>> {
>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>> + return vcpu->arch.fault_dar;
>>> How about PA6T and G5s?
>>>
>>>
>> Paul mentioned that BOOK3S always had DAR value set on alignment
>> interrupt. And the patch is to enable/collect correct DAR value when
>> running with Little Endian PR guest. Now to limit the impact and to
>> enable Little Endian PR guest, I ended up doing the conditional code
>> only for book3s 64 for which we know for sure that we set DAR value.
>
> Yes, and I'm asking whether we know that this statement holds true for
> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is
> at least developed by IBM, I'd assume its semantics here are similar to
> POWER4, but for PA6T I wouldn't be so sure.
I will have to defer to Paul on that question. But that should not
prevent this patch from going upstream right ?
-aneesh
^ permalink raw reply
* Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
From: Olof Johansson @ 2014-05-05 14:54 UTC (permalink / raw)
To: Alexander Graf
Cc: kvm, kvm-ppc, Paul Mackerras, Aneesh Kumar K.V, linuxppc-dev
In-Reply-To: <5367A39D.9080709@suse.de>
[-- Attachment #1: Type: text/plain, Size: 1866 bytes --]
2014-05-05 7:43 GMT-07:00 Alexander Graf <agraf@suse.de>:
> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>
>> Alexander Graf <agraf@suse.de> writes:
>>
>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>
>>>> Although it's optional IBM POWER cpus always had DAR value set on
>>>> alignment interrupt. So don't try to compute these values.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>>> ---
>>>> Changes from V3:
>>>> * Use make_dsisr instead of checking feature flag to decide whether to
>>>> use
>>>> saved dsisr or not
>>>>
>>>> ....
>>
>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
>>>> {
>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>> + return vcpu->arch.fault_dar;
>>>>
>>> How about PA6T and G5s?
>>>
>>>
>>> Paul mentioned that BOOK3S always had DAR value set on alignment
>> interrupt. And the patch is to enable/collect correct DAR value when
>> running with Little Endian PR guest. Now to limit the impact and to
>> enable Little Endian PR guest, I ended up doing the conditional code
>> only for book3s 64 for which we know for sure that we set DAR value.
>>
>
> Yes, and I'm asking whether we know that this statement holds true for
> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at
> least developed by IBM, I'd assume its semantics here are similar to
> POWER4, but for PA6T I wouldn't be so sure.
>
>
Thanks for looking out for us, obviously IBM doesn't (based on the reply a
minute ago).
In the end, since there's been no work to enable KVM on PA6T, I'm not too
worried. I guess it's one more thing to sort out (and check for) whenever
someone does that.
I definitely don't have cycles to deal with that myself at this time. I can
help find hardware for someone who wants to, but even then I'm guessing the
interest is pretty limited.
-Olof
[-- Attachment #2: Type: text/html, Size: 3115 bytes --]
^ permalink raw reply
* Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
From: Olof Johansson @ 2014-05-05 14:57 UTC (permalink / raw)
To: Alexander Graf
Cc: kvm, kvm-ppc, Paul Mackerras, Aneesh Kumar K.V, linuxppc-dev
In-Reply-To: <5367A39D.9080709@suse.de>
[Now without HTML email -- it's what you get for cc:ing me at work
instead of my upstream email :)]
2014-05-05 7:43 GMT-07:00 Alexander Graf <agraf@suse.de>:
>
> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote:
>>
>> Alexander Graf <agraf@suse.de> writes:
>>
>>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:
>>>>
>>>> Although it's optional IBM POWER cpus always had DAR value set on
>>>> alignment interrupt. So don't try to compute these values.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>>> ---
>>>> Changes from V3:
>>>> * Use make_dsisr instead of checking feature flag to decide whether to=
use
>>>> saved dsisr or not
>>>>
>> ....
>>
>>>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst=
)
>>>> {
>>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>> + return vcpu->arch.fault_dar;
>>>
>>> How about PA6T and G5s?
>>>
>>>
>> Paul mentioned that BOOK3S always had DAR value set on alignment
>> interrupt. And the patch is to enable/collect correct DAR value when
>> running with Little Endian PR guest. Now to limit the impact and to
>> enable Little Endian PR guest, I ended up doing the conditional code
>> only for book3s 64 for which we know for sure that we set DAR value.
>
>
> Yes, and I'm asking whether we know that this statement holds true for PA=
6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at lea=
st developed by IBM, I'd assume its semantics here are similar to POWER4, b=
ut for PA6T I wouldn't be so sure.
>
Thanks for looking out for us, obviously IBM doesn't (based on the
reply a minute ago).
In the end, since there's been no work to enable KVM on PA6T, I'm not
too worried. I guess it's one more thing to sort out (and check for)
whenever someone does that.
I definitely don't have cycles to deal with that myself at this time.
I can help find hardware for someone who wants to, but even then I'm
guessing the interest is pretty limited.
-Olof
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox