LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH V2 0/6] perf: New conditional branch filter
From: Anshuman Khandual @ 2013-09-10  3:52 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: sukadev, linux-kernel, eranian, acme, linuxppc-dev,
	Michael Neuling
In-Reply-To: <1378778772.25578.1.camel@concordia>

On 09/10/2013 07:36 AM, Michael Ellerman wrote:
> On Fri, 2013-08-30 at 09:54 +0530, Anshuman Khandual wrote:
>> 	This patchset is the re-spin of the original branch stack sampling
>> patchset which introduced new PERF_SAMPLE_BRANCH_COND filter. This patchset
>> also enables SW based branch filtering support for PPC64 platforms which have
>> branch stack sampling support. With this new enablement, the branch filter support
>> for PPC64 platforms have been extended to include all these combinations discussed
>> below with a sample test application program.
> 
> ...
> 
>> Mixed filters
>> -------------
>> (6) perf record -e branch-misses:u -j any_call,any_ret ./cprog
>> Error:
>> The perf.data file has no samples!
>>
>> NOTE: As expected. The HW filters all the branches which are calls and SW tries to find return
>> branches in that given set. Both the filters are mutually exclussive, so obviously no samples
>> found in the end profile.
> 
> The semantics of multiple filters is not clear to me. It could be an OR,
> or an AND. You have implemented AND, does that match existing behaviour
> on x86 for example?

I believe it does match. X86 code drops the branch records (originally captured
in the LBR) while applying the SW filters.

Regards
Anshuman

^ permalink raw reply

* Re: [PATCH v4 12/19] cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes
From: Shawn Guo @ 2013-09-10  2:44 UTC (permalink / raw)
  To: Sudeep KarkadaNagesha
  Cc: Jonas Bonn, devicetree@vger.kernel.org, Michal Simek,
	linux-pm@vger.kernel.org, Viresh Kumar,
	linux-kernel@vger.kernel.org, rob.herring@calxeda.com,
	Rafael J. Wysocki, Greg Kroah-Hartman, grant.likely@linaro.org,
	linuxppc-dev@lists.ozlabs.org, Guennadi Liakhovetski,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <522DE822.3030907@arm.com>

On Mon, Sep 09, 2013 at 04:24:18PM +0100, Sudeep KarkadaNagesha wrote:
> Hi Shawn,
> 
> Ok. But I am bit suspicious about devm_clk_get(cpu_dev, NULL).
> I don't understand completely as how the clock are registered(whether
> with dev_id or with connection_id).

As the connection_id of devm_clk_get() call here is NULL, the clock
lookup should be registered with a proper dev_id in clk_register_clkdev()
call.  And that's what you have seen with imx and shmobile code.


> A quick grep revealed that i.mx and shmobile is using conection id while
> registering.

They are using dev_id.

> If the clock is registered with connection id and retrieved
> with cpu_dev(now dev_id is cpu0 and not cpufreq-cpu0), IIUC that would
> break. If we pass pdev->dev for clk_get, it should be fine but again
> IIUC it breaks highbank which gets all the information from DT.

If the clock lookup is from DT, we should be just fine, since it will
work as long as the DT node with 'clocks' property (/cpus/cpu@0 in this
case) is attached to the struct device pointer of devm_clk_get() call.

> So only solution I can think of is to continue to have the code
> assigning (&pdev->dev)->of_node with cpu device node which is not clean
> and arguable as incorrect since there is no DT node for cpufreq-cpu0.
> I don't have a strong opinion though.
> 
> Let me know how would you like to fix this.

So we only need to change all clkdev registration to use "cpu0" as
dev_id intstead of "cpufreq-cpu0.0", something like below.

And for imx, it should work even without the changes, because we have
device tree lookup ready there, and those clk_register_clkdev() calls
can just be removed now.  But I prefer to include the change and leave
the cleanup to another patch for keeping the change log clear.

Shawn

---8<----------

diff --git a/arch/arm/mach-imx/clk-imx27.c b/arch/arm/mach-imx/clk-imx27.c
index c3cfa41..c6b40f3 100644
--- a/arch/arm/mach-imx/clk-imx27.c
+++ b/arch/arm/mach-imx/clk-imx27.c
@@ -285,7 +285,7 @@ int __init mx27_clocks_init(unsigned long fref)
 	clk_register_clkdev(clk[ata_ahb_gate], "ata", NULL);
 	clk_register_clkdev(clk[rtc_ipg_gate], NULL, "imx21-rtc");
 	clk_register_clkdev(clk[scc_ipg_gate], "scc", NULL);
-	clk_register_clkdev(clk[cpu_div], NULL, "cpufreq-cpu0.0");
+	clk_register_clkdev(clk[cpu_div], NULL, "cpu0");
 	clk_register_clkdev(clk[emi_ahb_gate], "emi_ahb" , NULL);
 
 	mxc_timer_init(MX27_IO_ADDRESS(MX27_GPT1_BASE_ADDR), MX27_INT_GPT1);
diff --git a/arch/arm/mach-imx/clk-imx51-imx53.c b/arch/arm/mach-imx/clk-imx51-imx53.c
index 1a56a33..de1964c 100644
--- a/arch/arm/mach-imx/clk-imx51-imx53.c
+++ b/arch/arm/mach-imx/clk-imx51-imx53.c
@@ -328,7 +328,7 @@ static void __init mx5_clocks_common_init(unsigned long rate_ckil,
 	clk_register_clkdev(clk[ssi2_ipg_gate], NULL, "imx-ssi.1");
 	clk_register_clkdev(clk[ssi3_ipg_gate], NULL, "imx-ssi.2");
 	clk_register_clkdev(clk[sdma_gate], NULL, "imx35-sdma");
-	clk_register_clkdev(clk[cpu_podf], NULL, "cpufreq-cpu0.0");
+	clk_register_clkdev(clk[cpu_podf], NULL, "cpu0");
 	clk_register_clkdev(clk[iim_gate], "iim", NULL);
 	clk_register_clkdev(clk[dummy], NULL, "imx2-wdt.0");
 	clk_register_clkdev(clk[dummy], NULL, "imx2-wdt.1");
diff --git a/arch/arm/mach-shmobile/clock-r8a73a4.c b/arch/arm/mach-shmobile/clock-r8a73a4.c
index 8ea5ef6..5bd2e85 100644
--- a/arch/arm/mach-shmobile/clock-r8a73a4.c
+++ b/arch/arm/mach-shmobile/clock-r8a73a4.c
@@ -555,7 +555,7 @@ static struct clk_lookup lookups[] = {
 	CLKDEV_CON_ID("pll2h",			&pll2h_clk),
 
 	/* CPU clock */
-	CLKDEV_DEV_ID("cpufreq-cpu0",		&z_clk),
+	CLKDEV_DEV_ID("cpu0",			&z_clk),
 
 	/* DIV6 */
 	CLKDEV_CON_ID("zb",			&div6_clks[DIV6_ZB]),
diff --git a/arch/arm/mach-shmobile/clock-sh73a0.c b/arch/arm/mach-shmobile/clock-sh73a0.c
index 1942eae..c92c023 100644
--- a/arch/arm/mach-shmobile/clock-sh73a0.c
+++ b/arch/arm/mach-shmobile/clock-sh73a0.c
@@ -616,7 +616,7 @@ static struct clk_lookup lookups[] = {
 	CLKDEV_DEV_ID("smp_twd", &twd_clk), /* smp_twd */
 
 	/* DIV4 clocks */
-	CLKDEV_DEV_ID("cpufreq-cpu0", &div4_clks[DIV4_Z]),
+	CLKDEV_DEV_ID("cpu0", &div4_clks[DIV4_Z]),
 
 	/* DIV6 clocks */
 	CLKDEV_CON_ID("vck1_clk", &div6_clks[DIV6_VCK1]),

^ permalink raw reply related

* Re: [PATCH V2 0/6] perf: New conditional branch filter
From: Michael Ellerman @ 2013-09-10  2:06 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: sukadev, linux-kernel, eranian, acme, linuxppc-dev,
	Michael Neuling
In-Reply-To: <1377836690-32710-1-git-send-email-khandual@linux.vnet.ibm.com>

On Fri, 2013-08-30 at 09:54 +0530, Anshuman Khandual wrote:
> 	This patchset is the re-spin of the original branch stack sampling
> patchset which introduced new PERF_SAMPLE_BRANCH_COND filter. This patchset
> also enables SW based branch filtering support for PPC64 platforms which have
> branch stack sampling support. With this new enablement, the branch filter support
> for PPC64 platforms have been extended to include all these combinations discussed
> below with a sample test application program.

...

> Mixed filters
> -------------
> (6) perf record -e branch-misses:u -j any_call,any_ret ./cprog
> Error:
> The perf.data file has no samples!
> 
> NOTE: As expected. The HW filters all the branches which are calls and SW tries to find return
> branches in that given set. Both the filters are mutually exclussive, so obviously no samples
> found in the end profile.

The semantics of multiple filters is not clear to me. It could be an OR,
or an AND. You have implemented AND, does that match existing behaviour
on x86 for example?

cheers

^ permalink raw reply

* [PATCH] powerpc: Export cpu_to_chip_id() to fix build error
From: Guenter Roeck @ 2013-09-10  1:37 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-kernel, Vasant Hegde, Paul Mackerras, Shivaprasad G Bhat,
	Guenter Roeck

powerpc allmodconfig build fails with:

ERROR: ".cpu_to_chip_id" [drivers/block/mtip32xx/mtip32xx.ko] undefined!

The problem was introduced with commit 15863ff3b (powerpc: Make chip-id
information available to userspace).

Export the missing symbol.

Cc: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Cc: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 arch/powerpc/kernel/smp.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 442d8e2..8e59abc 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -611,6 +611,7 @@ int cpu_to_chip_id(int cpu)
 	of_node_put(np);
 	return of_get_ibm_chip_id(np);
 }
+EXPORT_SYMBOL(cpu_to_chip_id);
 
 /* Helper routines for cpu to core mapping */
 int cpu_core_index_of_thread(int cpu)
-- 
1.7.9.7

^ permalink raw reply related

* Re: powerpc allmodconfig build broken due to commit 15863ff3b (powerpc: Make chip-id information available to userspace)
From: Guenter Roeck @ 2013-09-10  1:18 UTC (permalink / raw)
  To: Asai Thambi S P
  Cc: linux-kernel@vger.kernel.org, Vasant Hegde, Shivaprasad G Bhat,
	Paul Mackerras, linuxppc-dev
In-Reply-To: <522E5FD5.40603@micron.com>

On 09/09/2013 04:55 PM, Asai Thambi S P wrote:
> On 09/08/2013 5:28 PM, Guenter Roeck wrote:
>> Hi all,
>>
>> powerpc allmodconfig build on the latest upstream kernel results in:
>>
>> ERROR: ".cpu_to_chip_id" [drivers/block/mtip32xx/mtip32xx.ko] undefined!
>>
>> This is due to commit 15863ff3b (powerpc: Make chip-id information available to userspace).
>> Not surprising, as cpu_to_chip_id() is not exported.
>>
> Apart from the above error, I have a concern on the patch, purely based on the commit message.
> (to be honest, I am not familiar with the ppc architecture)
>
> Commit message of 15863ff3b has the following text.
>
> ******************
> So far "/sys/devices/system/cpu/cpuX/topology/physical_package_id"
> was always default (-1) on ppc64 architecture.
>
> Now, some systems have an ibm,chip-id property in the cpu nodes in
> the device tree. On these systems, we now use this information to
> display physical_package_id
> ******************
>
> Shouldn't the new definition of "topology_physical_package_id" apply only to those systems supporting ibm,chip-id property?
>
Looking into the code, I think that is what it does. For other platforms
(ie if there is no ibm,chip-id property) it still returns -1.

Question for the fix is what path to take to fix the problem.
Exporting cpu_to_chip_id() might be the easiest solution. Other
platforms export the respective data, so it should not be a problem.

I might submit a patch and see where it goes.

Guenter

>
>> Reverting this commit fixes the problem. Any good idea how to fix it for real ?
>>
>> Guenter
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>
>
>

^ permalink raw reply

* Re: [PATCH] powerpc: OE=1 Form Instructions Not Decoded Correctly
From: Stephen Rothwell @ 2013-09-10  0:25 UTC (permalink / raw)
  To: Tom Musta; +Cc: Paul Mackerras, linuxppc-dev, Tom Musta
In-Reply-To: <OF5204CF2E.B2F7FFF8-ON86257BE1.0044FF5E-86257BE1.0046077F@us.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 752 bytes --]

Hi Tom,

On Mon, 9 Sep 2013 07:44:54 -0500 Tom Musta <tmusta@us.ibm.com> wrote:
>
> > Looks like the patch has been mangled by the mailer ... it's in HTML to
> > begin with :-)
> 
> Frustrating.  I did use a plain text option on the mail client and did a
> test send to a few accounts before posting to the mailing list.  The
> patch was verified by checkpatch.pl before it got mangled.
> 
> I will find a better setup and resubmit.

Also please find a mailer that does *not* respond to the envelope
recipient (linuxppc-dev-bounces+tmusta=us.ibm.com@lists.ozlabs.org in
this case) as that just tends to irritate list owners.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
(slightly irritated list owner :-))

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: powerpc allmodconfig build broken due to commit 15863ff3b (powerpc: Make chip-id information available to userspace)
From: Asai Thambi S P @ 2013-09-09 23:55 UTC (permalink / raw)
  To: Vasant Hegde, Shivaprasad G Bhat
  Cc: Paul Mackerras, linuxppc-dev, linux-kernel@vger.kernel.org,
	Guenter Roeck
In-Reply-To: <522D163E.80106@roeck-us.net>

On 09/08/2013 5:28 PM, Guenter Roeck wrote:
> Hi all,
>
> powerpc allmodconfig build on the latest upstream kernel results in:
>
> ERROR: ".cpu_to_chip_id" [drivers/block/mtip32xx/mtip32xx.ko] undefined!
>
> This is due to commit 15863ff3b (powerpc: Make chip-id information 
> available to userspace).
> Not surprising, as cpu_to_chip_id() is not exported.
>
Apart from the above error, I have a concern on the patch, purely based on the commit message.
(to be honest, I am not familiar with the ppc architecture)

Commit message of 15863ff3b has the following text.

******************
So far "/sys/devices/system/cpu/cpuX/topology/physical_package_id"
was always default (-1) on ppc64 architecture.

Now, some systems have an ibm,chip-id property in the cpu nodes in
the device tree. On these systems, we now use this information to
display physical_package_id
******************

Shouldn't the new definition of "topology_physical_package_id" apply only to those systems supporting ibm,chip-id property?


> Reverting this commit fixes the problem. Any good idea how to fix it 
> for real ?
>
> Guenter
> -- 
> To unsubscribe from this list: send the line "unsubscribe 
> linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: [PATCH] powerpc: OE=1 Form Instructions Not Decoded Correctly
From: Tom Musta @ 2013-09-09 20:20 UTC (permalink / raw)
  To: Tom Musta; +Cc: linuxppc-dev, Linuxppc-dev, Paul Mackerras, Tom Musta
In-Reply-To: <OF5204CF2E.B2F7FFF8-ON86257BE1.0044FF5E-86257BE1.0046077F@us.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 271 bytes --]

> > Isn't that code occasionally used with uprobes too nowadays  ?
>
> Yes. I believe so.

I'm going to back-pedal a little.  I reread code and can connect
single step code to kprobes but not necessarily to uprobes.  So
I am not sure that this code is used with uprobes.

[-- Attachment #2: Type: text/html, Size: 487 bytes --]

^ permalink raw reply

* Re: [PATCH v4 12/19] cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes
From: Sudeep KarkadaNagesha @ 2013-09-09 15:24 UTC (permalink / raw)
  To: Shawn Guo
  Cc: Jonas Bonn, devicetree@vger.kernel.org, Michal Simek,
	linux-pm@vger.kernel.org, Sudeep KarkadaNagesha, Viresh Kumar,
	linux-kernel@vger.kernel.org, rob.herring@calxeda.com,
	Rafael J. Wysocki, Greg Kroah-Hartman, grant.likely@linaro.org,
	linuxppc-dev@lists.ozlabs.org, Guennadi Liakhovetski,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20130909143253.GD4624@S2101-09.ap.freescale.net>

On 09/09/13 15:32, Shawn Guo wrote:
> Hi Sudeep,
>=20
> On Mon, Sep 09, 2013 at 10:24:39AM +0100, Sudeep KarkadaNagesha wrote:
>> Hi Shawn,
>>
>> Can you please clarify ? The fix would be as below but I would like to
>> know if setting cpu_dev to get_cpu_device(0) instead of &pdev->dev has
>> any impact on other parts of code using cpu_dev ?
>=20
> I'm sorry.  I should have given it a test on hardware before ACKing the
> changes.
>=20
> The fix below should not have other impact except the prefix of dev_err
> [info, dbg] message output ('cpufreq-cpu0:' to 'cpu cpu0:'), which
> shouldn't be a problem.
>
Hi Shawn,

Ok. But I am bit suspicious about devm_clk_get(cpu_dev, NULL).
I don't understand completely as how the clock are registered(whether
with dev_id or with connection_id).

A quick grep revealed that i.mx and shmobile is using conection id while
registering. If the clock is registered with connection id and retrieved
with cpu_dev(now dev_id is cpu0 and not cpufreq-cpu0), IIUC that would
break. If we pass pdev->dev for clk_get, it should be fine but again
IIUC it breaks highbank which gets all the information from DT.

So only solution I can think of is to continue to have the code
assigning (&pdev->dev)->of_node with cpu device node which is not clean
and arguable as incorrect since there is no DT node for cpufreq-cpu0.
I don't have a strong opinion though.

Let me know how would you like to fix this.

>>
>> diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cp=
u0.c
>> index cbfffa9..871c336 100644
>> --- a/drivers/cpufreq/cpufreq-cpu0.c
>> +++ b/drivers/cpufreq/cpufreq-cpu0.c
>> @@ -177,7 +177,7 @@ static int cpu0_cpufreq_probe(struct platform_device
>> *pdev)
>>         struct device_node *np;
>>         int ret;
>>
>> -       cpu_dev =3D &pdev->dev;
>> +       cpu_dev =3D get_cpu_device(0);
>>
>>         np =3D of_node_get(cpu_dev->of_node);
>>         if (!np) {
>>
>=20
> The imx6q-cpufreq driver needs a similar fixing.  Please include the
> following changes into your fixing patches.  Thanks.
>=20
Ok no problem I can post the fix based on response for the above question.

Regard,
Sudeep

^ permalink raw reply

* big latency while under HV
From: Ivan Krivonos @ 2013-09-09 15:19 UTC (permalink / raw)
  To: linuxppc-dev

Hi,

i`m working on the embedded hypervisor targeting QorIQ platforms (p3041/p4080).
I have working prototype starting custom RTOS on just single core in the guest
space. What I see is the big latency (up to 3 times more) in RTOS running atop
of HV comparing to RTOS running bare-metal. I`m using lmbench utility.
It shows

nteger mul: 3.48 nanoseconds
integer div: 30.44 nanoseconds
integer mod: 13.92 nanoseconds
int64 bit: 1.75 nanoseconds
int64 add: 1.42 nanoseconds
int64 mul: 6.95 nanoseconds
HV:hvpriv_count 60000
int64 div: 447.56 nanoseconds
int64 mod: 385.42 nanoseconds
float add: 7.12 nanoseconds
float mul: 6.95 nanoseconds
float div: 33.05 nanoseconds
double add: 7.11 nanoseconds
double mul: 8.70 nanoseconds
double div: 57.36 nanoseconds
float bogomflops: 46.98 nanoseconds
double bogomflops: 73.09 nanoseconds

The bare-metal results are 3x better. Does anybody have any ideas on what
may be the source of such latency ? I forward all the exceptions to
the guest w/o
affecting HV. Only hvpriv is processed, it takes not more than 2 bus cycles.
Sorry for poor english

^ permalink raw reply

* Re: [PATCH v4 12/19] cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes
From: Shawn Guo @ 2013-09-09 14:32 UTC (permalink / raw)
  To: Sudeep KarkadaNagesha
  Cc: Jonas Bonn, devicetree@vger.kernel.org, Michal Simek,
	linux-pm@vger.kernel.org, Viresh Kumar,
	linux-kernel@vger.kernel.org, rob.herring@calxeda.com,
	Rafael J. Wysocki, Greg Kroah-Hartman, grant.likely@linaro.org,
	linuxppc-dev@lists.ozlabs.org, Guennadi Liakhovetski,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <522D93D7.4010307@arm.com>

Hi Sudeep,

On Mon, Sep 09, 2013 at 10:24:39AM +0100, Sudeep KarkadaNagesha wrote:
> Hi Shawn,
> 
> Can you please clarify ? The fix would be as below but I would like to
> know if setting cpu_dev to get_cpu_device(0) instead of &pdev->dev has
> any impact on other parts of code using cpu_dev ?

I'm sorry.  I should have given it a test on hardware before ACKing the
changes.

The fix below should not have other impact except the prefix of dev_err
[info, dbg] message output ('cpufreq-cpu0:' to 'cpu cpu0:'), which
shouldn't be a problem.

> 
> diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c
> index cbfffa9..871c336 100644
> --- a/drivers/cpufreq/cpufreq-cpu0.c
> +++ b/drivers/cpufreq/cpufreq-cpu0.c
> @@ -177,7 +177,7 @@ static int cpu0_cpufreq_probe(struct platform_device
> *pdev)
>         struct device_node *np;
>         int ret;
> 
> -       cpu_dev = &pdev->dev;
> +       cpu_dev = get_cpu_device(0);
> 
>         np = of_node_get(cpu_dev->of_node);
>         if (!np) {
> 

The imx6q-cpufreq driver needs a similar fixing.  Please include the
following changes into your fixing patches.  Thanks.

Shawn

---8<---------

diff --git a/arch/arm/mach-imx/mach-imx6q.c b/arch/arm/mach-imx/mach-imx6q.c
index 85a1b51..69fd4b6 100644
--- a/arch/arm/mach-imx/mach-imx6q.c
+++ b/arch/arm/mach-imx/mach-imx6q.c
@@ -233,9 +233,10 @@ put_node:
 	of_node_put(np);
 }
 
-static void __init imx6q_opp_init(struct device *cpu_dev)
+static void __init imx6q_opp_init(void)
 {
 	struct device_node *np;
+	struct device *cpu_dev = get_cpu_device(0);
 
 	np = of_node_get(cpu_dev->of_node);
 	if (!np) {
@@ -268,7 +269,7 @@ static void __init imx6q_init_late(void)
 		imx6q_cpuidle_init();
 
 	if (IS_ENABLED(CONFIG_ARM_IMX6Q_CPUFREQ)) {
-		imx6q_opp_init(&imx6q_cpufreq_pdev.dev);
+		imx6q_opp_init();
 		platform_device_register(&imx6q_cpufreq_pdev);
 	}
 }
diff --git a/drivers/cpufreq/imx6q-cpufreq.c b/drivers/cpufreq/imx6q-cpufreq.c
index 3e39654..d7ebd91 100644
--- a/drivers/cpufreq/imx6q-cpufreq.c
+++ b/drivers/cpufreq/imx6q-cpufreq.c
@@ -7,6 +7,7 @@
  */
 
 #include <linux/clk.h>
+#include <linux/cpu.h>
 #include <linux/cpufreq.h>
 #include <linux/delay.h>
 #include <linux/err.h>
@@ -202,7 +203,7 @@ static int imx6q_cpufreq_probe(struct platform_device *pdev)
 	unsigned long min_volt, max_volt;
 	int num, ret;
 
-	cpu_dev = &pdev->dev;
+	cpu_dev = get_cpu_device(0);
 
 	np = of_node_get(cpu_dev->of_node);
 	if (!np) {

^ permalink raw reply related

* Re: [PATCH] powerpc: OE=1 Form Instructions Not Decoded Correctly
From: Tom Musta @ 2013-09-09 12:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paul Mackerras, linuxppc-dev, Linuxppc-dev, Tom Musta
In-Reply-To: <1378721499.11525.5.camel@pasglop>

[-- Attachment #1: Type: text/plain, Size: 553 bytes --]



> Looks like the patch has been mangled by the mailer ... it's in HTML to
> begin with :-)

Frustrating.  I did use a plain text option on the mail client and did a
test send to a few accounts before posting to the mailing list.  The
patch was verified by checkpatch.pl before it got mangled.

I will find a better setup and resubmit.

> Isn't that code occasionally used with uprobes too nowadays  ?

Yes. I believe so.

Tom Musta (tmusta@us.ibm.com)
Senior Software Engineer
Blue Gene Kernel Development
IBM Rochester
(507) 253-4119   (T/L 553-4119)

[-- Attachment #2: Type: text/html, Size: 947 bytes --]

^ permalink raw reply

* Re: [PATCH] powerpc: OE=1 Form Instructions Not Decoded Correctly
From: Benjamin Herrenschmidt @ 2013-09-09 10:11 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Tom Musta, linuxppc-dev
In-Reply-To: <20130908223559.GA495@iris.ozlabs.ibm.com>

On Mon, 2013-09-09 at 08:35 +1000, Paul Mackerras wrote:
> 
> Also, you need to indent your code correctly according to
> Documentation/CodingStyle.

Looks like the patch has been mangled by the mailer ... it's in HTML to
begin with :-)

Tom, you need to find a mailer setup that works for sending patches,
basically that sends them in plain text without any mangling of
tabs and spaces and without word wrapping.

I personally use "evolution" under Linux which has a "preformatted"
style mode for that but I'm sure there are plenty of other options.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] powerpc: OE=1 Form Instructions Not Decoded Correctly
From: Benjamin Herrenschmidt @ 2013-09-09 10:09 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Tom Musta, linuxppc-dev
In-Reply-To: <20130908223559.GA495@iris.ozlabs.ibm.com>

On Mon, 2013-09-09 at 08:35 +1000, Paul Mackerras wrote:
> On Fri, Sep 06, 2013 at 10:13:00AM -0500, Tom Musta wrote:
> > To: linuxppc-dev@lists.ozlabs.org
> > Subject: [PATCH] powerpc: OE=1 Form Instructions Not Decoded Correctly
> > From: Tom Musta <tommusta@gmail.com>
> > 
> > PowerISA uses instruction bit 21 to indicate that the overflow (OV) bit
> > of the XER is to be set, as well as its corresponding sticky bit (SO).
> > This patch addresses two defects in the implementation of the PowerISA
> > single step code for this category of instructions:  (a) the OE=1 case
> > is not correctly accounted for in the case statement for the extended
> > opcode handling.  (b) the implementation is not setting XER[OV] and
> > XER[SO].
> 
> Are you seeing any actual problems arising from the OE=1 instructions
> not being emulated?  This code was designed primarily for emulating
> instructions in the kernel, which is written in C, and the C compiler
> doesn't emit OE=1 instructions -- or at least it didn't in the past.
> So, does the impetus for this change come because the C compiler is
> now emitting these instructions, or because this code is being used on
> non-kernel instructions, or just for completeness?  Your patch
> description needs to include answers to these kinds of questions.

Isn't that code occasionally used with uprobes too nowadays  ?

Cheers,
Ben.

> Also, you need to indent your code correctly according to
> Documentation/CodingStyle.
> 
> Paul.
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* [PATCH V3] powerpc: Add I2C bus multiplexer node for B4 and T4240QDS
From: Jia Hongtao @ 2013-09-09 10:03 UTC (permalink / raw)
  To: linuxppc-dev, galak, B07421; +Cc: hongtao.jia, Wei.Yang

In both B4 and T4240QDS platform PCA9547 I2C bus multiplexer is used.
The sub-nodes are also reorganized according to right I2C topology.

Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
---
V3 change log:
* Change "channel" to "i2c" based on i2c-mux binding.
* Update vendor from philips to nxp.

V2 change log:
* Reorganized the sub-nodes under I2C multiplexer to represent right topology.

 arch/powerpc/boot/dts/b4qds.dtsi   | 51 +++++++++++++++++-----------
 arch/powerpc/boot/dts/t4240qds.dts | 69 ++++++++++++++++++++++----------------
 2 files changed, 73 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/boot/dts/b4qds.dtsi b/arch/powerpc/boot/dts/b4qds.dtsi
index e6d2f8f..8b47edc 100644
--- a/arch/powerpc/boot/dts/b4qds.dtsi
+++ b/arch/powerpc/boot/dts/b4qds.dtsi
@@ -120,25 +120,38 @@
 		};
 
 		i2c@118000 {
-			eeprom@50 {
-				compatible = "at24,24c64";
-				reg = <0x50>;
-			};
-			eeprom@51 {
-				compatible = "at24,24c256";
-				reg = <0x51>;
-			};
-			eeprom@53 {
-				compatible = "at24,24c256";
-				reg = <0x53>;
-			};
-			eeprom@57 {
-				compatible = "at24,24c256";
-				reg = <0x57>;
-			};
-			rtc@68 {
-				compatible = "dallas,ds3232";
-				reg = <0x68>;
+			mux@77 {
+				compatible = "nxp,pca9547";
+				reg = <0x77>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+
+				i2c@0 {
+					#address-cells = <1>;
+					#size-cells = <0>;
+					reg = <0>;
+
+					eeprom@50 {
+						compatible = "at24,24c64";
+						reg = <0x50>;
+					};
+					eeprom@51 {
+						compatible = "at24,24c256";
+						reg = <0x51>;
+					};
+					eeprom@53 {
+						compatible = "at24,24c256";
+						reg = <0x53>;
+					};
+					eeprom@57 {
+						compatible = "at24,24c256";
+						reg = <0x57>;
+					};
+					rtc@68 {
+						compatible = "dallas,ds3232";
+						reg = <0x68>;
+					};
+				};
 			};
 		};
 
diff --git a/arch/powerpc/boot/dts/t4240qds.dts b/arch/powerpc/boot/dts/t4240qds.dts
index 0555976..a35508f 100644
--- a/arch/powerpc/boot/dts/t4240qds.dts
+++ b/arch/powerpc/boot/dts/t4240qds.dts
@@ -118,34 +118,47 @@
 		};
 
 		i2c@118000 {
-			eeprom@51 {
-				compatible = "at24,24c256";
-				reg = <0x51>;
-			};
-			eeprom@52 {
-				compatible = "at24,24c256";
-				reg = <0x52>;
-			};
-			eeprom@53 {
-				compatible = "at24,24c256";
-				reg = <0x53>;
-			};
-			eeprom@54 {
-				compatible = "at24,24c256";
-				reg = <0x54>;
-			};
-			eeprom@55 {
-				compatible = "at24,24c256";
-				reg = <0x55>;
-			};
-			eeprom@56 {
-				compatible = "at24,24c256";
-				reg = <0x56>;
-			};
-			rtc@68 {
-				compatible = "dallas,ds3232";
-				reg = <0x68>;
-				interrupts = <0x1 0x1 0 0>;
+			mux@77 {
+				compatible = "nxp,pca9547";
+				reg = <0x77>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+
+				i2c@0 {
+					#address-cells = <1>;
+					#size-cells = <0>;
+					reg = <0>;
+
+					eeprom@51 {
+						compatible = "at24,24c256";
+						reg = <0x51>;
+					};
+					eeprom@52 {
+						compatible = "at24,24c256";
+						reg = <0x52>;
+					};
+					eeprom@53 {
+						compatible = "at24,24c256";
+						reg = <0x53>;
+					};
+					eeprom@54 {
+						compatible = "at24,24c256";
+						reg = <0x54>;
+					};
+					eeprom@55 {
+						compatible = "at24,24c256";
+						reg = <0x55>;
+					};
+					eeprom@56 {
+						compatible = "at24,24c256";
+						reg = <0x56>;
+					};
+					rtc@68 {
+						compatible = "dallas,ds3232";
+						reg = <0x68>;
+						interrupts = <0x1 0x1 0 0>;
+					};
+				};
 			};
 		};
 	};
-- 
1.8.0

^ permalink raw reply related

* Re: Please pull 'next' branch of 5xxx tree
From: Anatolij Gustschin @ 2013-09-09  9:38 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1378363848.4321.129.camel@pasglop>

On Thu, 05 Sep 2013 16:50:48 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
...
> Thanks. BTW. Next time, any chance you can base this off the same point
> in Linus tree where my next branch is based  ? Or base of my next
> branch :-)

Okay, I will base of your next branch.

Thanks,

Anatolij

^ permalink raw reply

* Re: [RFC PATCH v3 04/12] Validate r1 value before going to host kernel in virtual mode.
From: Benjamin Herrenschmidt @ 2013-09-09  9:26 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Mahesh J Salgaonkar, Jeremy Kerr, Anton Blanchard, linuxppc-dev
In-Reply-To: <20130909052930.GD6248@drongo>

On Mon, 2013-09-09 at 15:29 +1000, Paul Mackerras wrote:
> Are we guaranteed that Sapphire will keep the stack pointer positive
> at all times?  (More a question for Ben H than you.)

Good point, we *might* eventually set the top bit..

Ben.

^ permalink raw reply

* Re: [PATCH v4 12/19] cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes
From: Sudeep KarkadaNagesha @ 2013-09-09  9:24 UTC (permalink / raw)
  To: Guennadi Liakhovetski, Shawn Guo
  Cc: Jonas Bonn, devicetree@vger.kernel.org, Michal Simek,
	linux-pm@vger.kernel.org, Sudeep KarkadaNagesha, Viresh Kumar,
	linux-kernel@vger.kernel.org, rob.herring@calxeda.com,
	Rafael J. Wysocki, Greg Kroah-Hartman, grant.likely@linaro.org,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <Pine.LNX.4.64.1309061539310.11068@axis700.grange>

On 06/09/13 14:44, Guennadi Liakhovetski wrote:
> Hi
>=20
> On Tue, 20 Aug 2013, Sudeep KarkadaNagesha wrote:
>=20
>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
>>
>> Now that the cpu device registration initialises the of_node(if availabl=
e)
>> appropriately for all the cpus, parsing here is redundant.
>>
>> This patch removes all DT parsing and uses cpu->of_node instead.
>>
>> Acked-by: Shawn Guo <shawn.guo@linaro.org>
>> Acked-by: Rob Herring <rob.herring@calxeda.com>
>> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
>> Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
>> ---
>>  drivers/cpufreq/cpufreq-cpu0.c | 23 ++++-------------------
>>  1 file changed, 4 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cp=
u0.=3D
>> c
>> index ad1fde2..5b05c26 100644
>> --- a/drivers/cpufreq/cpufreq-cpu0.c
>> +++ b/drivers/cpufreq/cpufreq-cpu0.c
>> @@ -174,29 +174,17 @@ static struct cpufreq_driver cpu0_cpufreq_driver =
=3D3D =3D
>> {
>> =3D20
>>  static int cpu0_cpufreq_probe(struct platform_device *pdev)
>>  {
>> -=3D09struct device_node *np, *parent;
>> +=3D09struct device_node *np;
>>  =3D09int ret;
>> =3D20
>> -=3D09parent =3D3D of_find_node_by_path("/cpus");
>> -=3D09if (!parent) {
>> -=3D09=3D09pr_err("failed to find OF /cpus\n");
>> -=3D09=3D09return -ENOENT;
>> -=3D09}
>> -
>> -=3D09for_each_child_of_node(parent, np) {
>> -=3D09=3D09if (of_get_property(np, "operating-points", NULL))
>> -=3D09=3D09=3D09break;
>> -=3D09}
>> +=3D09cpu_dev =3D3D &pdev->dev;
>> =3D20
>> +=3D09np =3D3D of_node_get(cpu_dev->of_node);
>=20
> Has this actually been tested? This seems to break cpufreq-cpu0. The=20
> reason is, that this probe function is called not for the DT CPU node, bu=
t=20
> for a special virtual cpufreq-cpu0 platform device, typically created by=
=20
> platforms, using
>=20
> =09platform_device_register_simple("cpufreq-cpu0", -1, NULL, 0);
>=20
> which then of course doesn't have on .of_node associated with it.
>=20

Hi Guennadi,

Based on my understanding of the original code:
        cpu_dev =3D &pdev->dev;
=09...
=09ret =3D of_init_opp_table(cpu_dev);

of_init_opp_table needs cpu_dev to be get_cpu_device(0). My
understanding was that platform using cpufreq-cpu0 sets &pdev->dev to
get_cpu_device(0). But looks like that's not the case.

Hi Shawn,

Can you please clarify ? The fix would be as below but I would like to
know if setting cpu_dev to get_cpu_device(0) instead of &pdev->dev has
any impact on other parts of code using cpu_dev ?

diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.=
c
index cbfffa9..871c336 100644
--- a/drivers/cpufreq/cpufreq-cpu0.c
+++ b/drivers/cpufreq/cpufreq-cpu0.c
@@ -177,7 +177,7 @@ static int cpu0_cpufreq_probe(struct platform_device
*pdev)
        struct device_node *np;
        int ret;

-       cpu_dev =3D &pdev->dev;
+       cpu_dev =3D get_cpu_device(0);

        np =3D of_node_get(cpu_dev->of_node);
        if (!np) {


Regards,
Sudeep

^ permalink raw reply related

* Re: [RFC PATCH v3 08/12] powerpc/book3s: Flush SLB/TLBs if we get SLB/TLB machine check errors on power8.
From: Paul Mackerras @ 2013-09-09  6:01 UTC (permalink / raw)
  To: Mahesh J Salgaonkar; +Cc: linuxppc-dev, Jeremy Kerr, Anton Blanchard
In-Reply-To: <20130826193220.2855.33875.stgit@mars>

On Tue, Aug 27, 2013 at 01:02:20AM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> This patch handles the memory errors on power8. If we get a machine check
> exception due to SLB or TLB errors, then flush SLBs/TLBs and reload SLBs to
> recover.
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Acked-by: Paul Mackerras <paulus@samba.org>

^ permalink raw reply

* Re: [RFC PATCH v3 07/12] powerpc/book3s: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7.
From: Paul Mackerras @ 2013-09-09  6:00 UTC (permalink / raw)
  To: Mahesh J Salgaonkar; +Cc: linuxppc-dev, Jeremy Kerr, Anton Blanchard
In-Reply-To: <20130826193212.2855.38138.stgit@mars>

On Tue, Aug 27, 2013 at 01:02:12AM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> If we get a machine check exception due to SLB or TLB errors, then flush
> SLBs/TLBs and reload SLBs to recover. We do this in real mode before turning
> on MMU. Otherwise we would run into nested machine checks.
> 
> If we get a machine check when we are in guest, then just flush the
> SLBs and continue. This patch handles errors for power7. The next
> patch will handle errors for power8

Some comments...

> +/* PPC bit number conversion */
> +#define PPC_BIT(bit)		(0x8000000000000000UL >> (bit))
> +#define PPC_BITLSHIFT(be)	(63 - (be))
> +#define PPC_BITMASK(bs, be)	((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

This will break if anyone is so incautious as to use this in a 32-bit
kernel.  It would be kinder to do:

#define PPC_BITLSHIFT(be)	(BITS_PER_LONG - 1 - (be))
#define PPC_BIT(bit)		(1ul << PPC_BITLSHIFT(bit))

> +/* SRR1 bits for machine check (On Power7 and Power8) */
> +#define P7_SRR1_MC_IFETCH(srr1)	((srr1) & PPC_BITMASK(43, 45)) /* P8 too */
> +
> +#define P7_SRR1_MC_IFETCH_UE		(0x1 << PPC_BITLSHIFT(45)) /* P8 too */
> +#define P7_SRR1_MC_IFETCH_SLB_PARITY	(0x2 << PPC_BITLSHIFT(45)) /* P8 too */
> +#define P7_SRR1_MC_IFETCH_SLB_MULTIHIT	(0x3 << PPC_BITLSHIFT(45)) /* P8 too */
> +#define P7_SRR1_MC_IFETCH_SLB_BOTH	(0x4 << PPC_BITLSHIFT(45)) /* P8 too */
> +#define P7_SRR1_MC_IFETCH_TLB_MULTIHIT	(0x5 << PPC_BITLSHIFT(45)) /* P8 too */
> +#define P7_SRR1_MC_IFETCH_UE_TLB_RELOAD	(0x6 << PPC_BITLSHIFT(45)) /* P8 too */
> +#define P7_SRR1_MC_IFETCH_UE_IFU_INTERNAL	(0x7 << PPC_BITLSHIFT(45))
> +
> +/* SRR1 bits for machine check (On Power8) */
> +#define P8_SRR1_MC_IFETCH_ERAT_MULTIHIT	(0x4 << PPC_BITLSHIFT(45))

How do we tell the difference between that and P7_SRR1_MC_IFETCH_SLB_BOTH?

> +/* flush SLBs and reload */
> +static void flush_and_reload_slb(void)
> +{
> +	struct slb_shadow *slb;
> +	unsigned long i, n;
> +
> +	if (!mmu_has_feature(MMU_FTR_SLB))
> +		return;

This seems unnecessary when we can only call this on POWER7.

Paul.

^ permalink raw reply

* Re: [RFC PATCH v3 06/12] powerpc/book3s: Add flush_tlb operation in cpu_spec.
From: Paul Mackerras @ 2013-09-09  5:36 UTC (permalink / raw)
  To: Mahesh J Salgaonkar; +Cc: linuxppc-dev, Jeremy Kerr, Anton Blanchard
In-Reply-To: <20130826193204.2855.6506.stgit@mars>

On Tue, Aug 27, 2013 at 01:02:04AM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> This patch introduces flush_tlb operation in cpu_spec structure. This will
> help us to invoke appropriate CPU-side flush tlb routine. This patch
> adds the foundation to invoke CPU specific flush routine for respective
> architectures. Currently this patch introduce flush_tlb for p7 and p8.
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Acked-by: Paul Mackerras <paulus@samba.org>

^ permalink raw reply

* Re: [RFC PATCH v3 05/12] powerpc/book3s: Introduce a early machine check hook in cpu_spec.
From: Paul Mackerras @ 2013-09-09  5:33 UTC (permalink / raw)
  To: Mahesh J Salgaonkar; +Cc: linuxppc-dev, Jeremy Kerr, Anton Blanchard
In-Reply-To: <20130826193156.2855.85412.stgit@mars>

On Tue, Aug 27, 2013 at 01:01:56AM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> This patch adds the early machine check function pointer in cputable for
> CPU specific early machine check handling. The early machine handle routine
> will be called in real mode to handle SLB and TLB errors. This patch just
> sets up a mechanism invoke CPU specific handler. The subsequent patches
> will populate the function pointer.
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Your patch description should talk about how this new hook is
different from the existing machine_check hook and why you need a new
one.

Apart from that:

Acked-by: Paul Mackerras <paulus@samba.org>

^ permalink raw reply

* Re: [RFC PATCH v3 04/12] Validate r1 value before going to host kernel in virtual mode.
From: Paul Mackerras @ 2013-09-09  5:29 UTC (permalink / raw)
  To: Mahesh J Salgaonkar; +Cc: linuxppc-dev, Jeremy Kerr, Anton Blanchard
In-Reply-To: <20130826193148.2855.95627.stgit@mars>

On Tue, Aug 27, 2013 at 01:01:48AM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> We can get machine checks from any context. We need to make sure that
> we handle all of them correctly. Once we decode MCE reason and generate
> MCE event, we continue in host kernel in virtual mode so that we can
> log/display it later. But before going to virtual mode we need to make
> sure that r1 points to host kernel stack. But machine check can occur
> in any context and r1 may not always point to host kernel stack. In cases
> where we can not trust r1 value, we should queue up the MCE event and return
> from interrupt. This patch implements the additional checks that helps to
> decide whether to deleiver machine check event to host kernel right away
> or queue it up and return.

Some comments below...

> +	/*
> +	 * We are now going to host kernel in V mode. We need to make sure
> +	 * that r1 points to host kernel stack.
> +	 *
> +	 * If we are coming from userspace then we can continue in host kernel
> +	 * in V mode.
> +	 * But if we are coming from kernel and r1 does not point to kernel
> +	 * stack then we can not continue, instead we return from here.
> +	 */
> +
> +	ld	r12,_MSR(r1)
> +	andi.	r11,r12,MSR_PR		/* See if coming from user. */
> +	bne	3f			/* continue if we are. */
> +
> +#ifdef CONFIG_KVM_BOOK3S_64_HV
> +	/*
> +	 * We are coming from kernel context. Check if we are coming from
> +	 * guest. if yes, then we can continue. We will fall through
> +	 * do_kvm_200->kvmppc_interrupt which will setup r1 correctly.
> +	 */

It seems fragile to have to check various conditions to know whether
r1 is actually a kernel stack pointer, but I guess it's the best we
can do at present.

> +	lbz	r11,HSTATE_IN_GUEST(r13)
> +	cmpwi	r11,0			/* Check if coming from guest */
> +	bne	3f			/* continue if we are. */
> +
> +	/*
> +	 * So, we did not come from guest. That leaves three possibilities:
> +	 * a. We come from secondary thread which just came out of nap and
> +	 *    about to call kvm_start_guest.
> +	 * b. We come from secondary thread which is about to go to nap
> +	 *    state (see kvm_no_guest()).
> +	 * c. We come from opal context and r1 may be pointing to opal
> +	 *    kernel stack.
> +	 */
> +
> +	lbz	r11,HSTATE_HWTHREAD_STATE(r13)
> +	cmpwi	r11,KVM_HWTHREAD_IN_NAP	/* Was it nap-ing? or about to */
> +	beq	0f		/* Queue up event and return from interrupt */

Two comments here: first, we change the hwthread_state to
KVM_HWTHREAD_IN_KERNEL before loading up r1 -- this is in
system_reset_pSeries in exceptions-64s.S.  So this test isn't really
safe.  It would be possible to add ld r1, PACAR1(r13) before setting
the hwthread_state, and I think that would fix it.

Secondly, if the CPU is napping when the machine check comes along,
it doesn't jump to the machine check vector.  It restarts the CPU at
the system reset vector, with a particular wakeup code in SRR1, which
we currently don't handle.  So you need to add code to do that.

> +	 * So far we checked all possible situations where we can not
> +	 * trust r1. Now we can trust r1.
> +	 *	r1 < 0		r1 points to host kernel stack
> +	 *	r1 > 0		r1 points to opal stack

Are we guaranteed that Sapphire will keep the stack pointer positive
at all times?  (More a question for Ben H than you.)

Paul.

^ permalink raw reply

* Re: [RFC PATCH v3 03/12] powerpc/book3s: handle machine check in Linux host.
From: Paul Mackerras @ 2013-09-09  4:52 UTC (permalink / raw)
  To: Mahesh J Salgaonkar; +Cc: linuxppc-dev, Jeremy Kerr, Anton Blanchard
In-Reply-To: <20130826193140.2855.96452.stgit@mars>

On Tue, Aug 27, 2013 at 01:01:40AM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Move machine check entry point into Linux. So far we were dependent on
> firmware to decode MCE error details and handover the high level info to OS.
> 
> This patch introduces early machine check routine that saves the MCE
> information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate
> stack frame on emergency stack and set the r1 accordingly. This allows us to be
> prepared to take another exception without loosing context. One thing to note
> here that, if we get another machine check while ME bit is off then we risk a
> checkstop. Hence we restrict ourselves to save only MCE information and turn
> the ME bit on. We use paca->in_mce flag to differentiate between first entry
> and nested machine check entry which helps proper use of emergency stack. We
> increment paca->in_mce every time we enter in early machine check handler and
> decrement it while leaving. When we enter machine check early handler first
> time (paca->in_mce == 0), we are sure nobody is using MC emergency stack and
> allocate a stack frame at the start of the emergency stack. During subsequent
> entry (paca->in_mce > 0), we know that r1 points inside emergency stack and we
> allocate separate stack frame accordingly. This prevents us from clobbering MCE
> information during nested machine checks.
> 
> The early machine check handler changes are placed under CPU_FTR_HVMODE
> section. This makes sure that the early machine check handler will get executed
> only in hypervisor kernel.

I have a few comments on this patch...

>  	.align	7
>  	/* moved from 0x200 */
> +machine_check_pSeries_early:
> +BEGIN_FTR_SECTION
> +	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> +	/*
> +	 * Register contents:
> +	 * R12		= interrupt vector

Actually r12 doesn't contain anything except the value it had when the
machine check occurred.

> +	 * R13		= PACA
> +	 * R9		= CR
> +	 * R11 & R12 is saved on PACA_EXMC

All of r9 to r12 are saved in the PACA_EXMC save area.

> +	 *
> +	 * Switch to mc_emergency stack and handle re-entrancy (though we
> +	 * currently don't test for overflow). Save MCE registers srr1,
> +	 * srr0, dar and dsisr and then set ME=1
> +	 *
> +	 * We use paca->in_mce to check whether this is the first entry or
> +	 * nested machine check. We increment paca->in_mce to track nested
> +	 * machine checks.
> +	 *
> +	 * If this is the first entry then set stack pointer to
> +	 * paca->mc_emergency_sp, otherwise r1 is already pointing to
> +	 * stack frame on mc_emergency stack.
> +	 *
> +	 * NOTE: We are here with MSR_ME=0 (off), which means we risk a
> +	 * checkstop if we get another machine check exception before we do
> +	 * rfid with MSR_ME=1.
> +	 */
> +	mr	r11,r1			/* Save r1 */
> +	lhz	r10,PACA_IN_MCE(r13)
> +	cmpwi	r10,0			/* Are we in nested machine check */
> +	bne	0f			/* Yes, we are. */
> +	/* First machine check entry */
> +	ld	r1,PACAMCEMERGSP(r13)	/* Use MC emergency stack */
> +0:	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame */
> +	addi	r10,r10,1		/* increment paca->in_mce */
> +	sth	r10,PACA_IN_MCE(r13)
> +	std	r11,GPR1(r1)		/* Save r1 on the stack. */
> +	std	r11,0(r1)		/* make stack chain pointer */
> +	mfspr	r11,SPRN_SRR0		/* Save SRR0 */
> +	std	r11,_NIP(r1)
> +	mfspr	r11,SPRN_SRR1		/* Save SRR1 */
> +	std	r11,_MSR(r1)
> +	mfspr	r11,SPRN_DAR		/* Save DAR */
> +	std	r11,_DAR(r1)
> +	mfspr	r11,SPRN_DSISR		/* Save DSISR */
> +	std	r11,_DSISR(r1)
> +	mfmsr	r11			/* get MSR value */
> +	ori	r11,r11,MSR_ME		/* turn on ME bit */

At this point you should turn on MSR_RI as well, so that if we get
another machine check after the rfid we don't consider it
unrecoverable.  Also, since we could in principle get another machine
check soon after the rfid, and that would use the EX_MC save area,
we need to copy everything out of the EX_MC save area onto the
stack before turning on ME.

> +	ld	r12,PACAKBASE(r13)	/* get high part of &label */
> +	LOAD_HANDLER(r12, machine_check_handle_early)
> +	mtspr	SPRN_SRR0,r12
> +	mtspr	SPRN_SRR1,r11
> +	rfid
> +	b	.	/* prevent speculative execution */
> +END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
> +
>  machine_check_pSeries:
>  	.globl machine_check_fwnmi
>  machine_check_fwnmi:
> @@ -681,6 +740,56 @@ machine_check_common:
>  	bl	.machine_check_exception
>  	b	.ret_from_except
>  
> +#define MACHINE_CHECK_HANDLER_WINDUP			\
> +	/* Move original SRR0 and SRR1 into the respective regs */	\
> +	ld	r9,_MSR(r1);				\
> +	mtspr	SPRN_SRR1,r9;				\
> +	ld	r3,_NIP(r1);				\
> +	mtspr	SPRN_SRR0,r3;				\
> +	REST_NVGPRS(r1);				\
> +	ld	r9,_CTR(r1);				\
> +	mtctr	r9;					\
> +	ld	r9,_XER(r1);				\
> +	mtxer	r9;					\
> +BEGIN_FTR_SECTION_NESTED(66);				\
> +	ld	r9,ORIG_GPR3(r1);			\
> +	mtspr	SPRN_CFAR,r9;				\
> +END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66);	\
> +	ld	r9,_LINK(r1);				\
> +	mtlr	r9;					\
> +	REST_GPR(0, r1);				\
> +	REST_8GPRS(2, r1);				\
> +	REST_GPR(10, r1);				\
> +	ld	r11,_CCR(r1);				\
> +	mtcr	r11;					\
> +	/* Decrement paca->in_mce. */			\
> +	lhz	r12,PACA_IN_MCE(r13);			\
> +	subi	r12,r12,1;				\
> +	sth	r12,PACA_IN_MCE(r13);			\
> +	REST_GPR(11, r1);				\
> +	REST_2GPRS(12, r1);				\
> +	/* restore original r1. */			\
> +	ld	r1,GPR1(r1)

This is basically fast_exception_return without the rfid, and with the
decrementing of paca->in_mce.  You forgot to clear MSR_RI before
setting SRR0 and SRR1.  Also, there's no point restoring CFAR if the
next thing you are going to do is either an rfid or a branch, both of
which will set CFAR.  Further, you don't need the REST_NVGPRS unless
you are expecting that your machine check handler will want to modify
some of the GPR values in regs->gpr[14 ... 31], which I don't believe
is the case.

Paul.

^ permalink raw reply

* Re: [RFC PATCH v3 02/12] powerpc/book3s: Introduce exclusive emergency stack for machine check exception.
From: Paul Mackerras @ 2013-09-09  4:30 UTC (permalink / raw)
  To: Mahesh J Salgaonkar; +Cc: linuxppc-dev, Jeremy Kerr, Anton Blanchard
In-Reply-To: <20130826193132.2855.85275.stgit@mars>

On Tue, Aug 27, 2013 at 01:01:32AM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> This patch introduces exclusive emergency stack for machine check exception.
> We use emergency stack to handle machine check exception so that we can save
> MCE information (srr1, srr0, dar and dsisr) before turning on ME bit and be
> ready for re-entrancy. This helps us to prevent clobbering of MCE information
> in case of nested machine checks.
> 
> The reason for using emergency stack over normal kernel stack is that the
> machine check might occur in the middle of setting up a stack frame which may
> result into improper use of kernel stack.
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Acked-by: Paul Mackerras <paulus@samba.org>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox