LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/4][v2] powerpc/fsl-booke: Add initial silicon device tree files for B4860 and B4420
From: Kumar Gala @ 2013-04-10 15:17 UTC (permalink / raw)
  To: Shaveta Leekha
  Cc: Zhao Chenhui, Minghuan Lian, Vakul Garg, Tang Yuantian,
	Andy Fleming, Ramneek Mehresh, Varun Sethi, linuxppc-dev
In-Reply-To: <1365143632-19362-1-git-send-email-shaveta@freescale.com>


On Apr 5, 2013, at 1:33 AM, Shaveta Leekha wrote:

> B4860 and B4420 are similar that share some commonalities
>=20
> * common features have been added in b4si-pre.dtsi and b4si-post.dtsi
> * differences are added in respective silicon files of B4860 and B4420
>=20
> There are several things missing from the device trees of B4860 and =
B4420:
>=20
> * DPAA related nodes (Qman, Bman, Fman, Rman)
> * DSP related nodes/information
> * serdes, sfp(security fuse processor), thermal,
> gpio, maple, cpri, quad timers nodes
>=20
> Signed-off-by: Shaveta Leekha <shaveta@freescale.com>
> Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
> Signed-off-by: Li Yang <leoli@freescale.com>
> Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
> Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
> Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
> Signed-off-by: Ramneek Mehresh <ramneek.mehresh@freescale.com>
> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
> Signed-off-by: Andy Fleming <afleming@freescale.com>
> Signed-off-by: Vakul Garg <vakul@freescale.com>
> ---
> v2:=20
>   - incorporated review comments on commits message
>   - change unit address of cpu nodes to match the reg property
>=20
> arch/powerpc/boot/dts/fsl/b4420si-post.dtsi |   94 ++++++++++
> arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi  |   49 +++++
> arch/powerpc/boot/dts/fsl/b4860si-post.dtsi |  138 ++++++++++++++
> arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi  |   59 ++++++
> arch/powerpc/boot/dts/fsl/b4si-post.dtsi    |  262 =
+++++++++++++++++++++++++++
> arch/powerpc/boot/dts/fsl/b4si-pre.dtsi     |   65 +++++++
> 6 files changed, 667 insertions(+), 0 deletions(-)
> create mode 100644 arch/powerpc/boot/dts/fsl/b4420si-post.dtsi
> create mode 100644 arch/powerpc/boot/dts/fsl/b4420si-pre.dtsi
> create mode 100644 arch/powerpc/boot/dts/fsl/b4860si-post.dtsi
> create mode 100644 arch/powerpc/boot/dts/fsl/b4860si-pre.dtsi
> create mode 100644 arch/powerpc/boot/dts/fsl/b4si-post.dtsi
> create mode 100644 arch/powerpc/boot/dts/fsl/b4si-pre.dtsi

* merged b4si-pre.dtsi into b4860-pre.dtsi & b4420-pre.dtsi
* Make some fixes to GUTs related nodes

applied to next

- k

^ permalink raw reply

* Re: [RFC][PATCH 1/2] powerpc/fsl-pci: Keep PCI SoC controller registers in
From: Kumar Gala @ 2013-04-10 15:17 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1363201636-7318-1-git-send-email-galak@kernel.crashing.org>


On Mar 13, 2013, at 2:07 PM, Kumar Gala wrote:

> Move to keeping the SoC registers that control and config the PCI
> controllers on FSL SoCs in the pci_controller struct.  This allows us =
to
> not need to ioremap() the registers in multiple different places that
> use them.
>=20
> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
> ---
> arch/powerpc/include/asm/pci-bridge.h |    5 ++-
> arch/powerpc/sysdev/fsl_pci.c         |   69 =
++++++++++++++-------------------
> 2 files changed, 34 insertions(+), 40 deletions(-)

applied to next

- k=

^ permalink raw reply

* Re: [PATCH] powerpc: remove "config MPC10X_OPENPIC"
From: Kumar Gala @ 2013-04-10 15:16 UTC (permalink / raw)
  To: Paul Bolle; +Cc: Paul Mackerras, linuxppc-dev, linux-kernel
In-Reply-To: <1365449711.1830.130.camel@x61.thuisdomein>


On Apr 8, 2013, at 2:35 PM, Paul Bolle wrote:

> The last users of Kconfig symbol MPC10X_OPENPIC were removed in v2.6.27.
> Its Kconfig entry can be removed now.
> 
> Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
> ---
> Untested.
> 
> arch/powerpc/platforms/embedded6xx/Kconfig | 5 -----
> 1 file changed, 5 deletions(-)

applied to next

- k

^ permalink raw reply

* Re: [PATCH] of: remove the unnecessary of_node_put for of_parse_phandle_with_args()
From: Stephen Rothwell @ 2013-04-10 11:09 UTC (permalink / raw)
  To: Tang Yuantian-B29983
  Cc: devicetree-discuss@lists.ozlabs.org,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	rob.herring@calxeda.com
In-Reply-To: <D07C73A334FF604B95B3CBD2A545D07B0B128AE0@039-SN2MPN1-013.039d.mgd.msft.net>

[-- Attachment #1: Type: text/plain, Size: 334 bytes --]

Hi,

On Wed, 10 Apr 2013 09:06:11 +0000 Tang Yuantian-B29983 <B29983@freescale.com> wrote:
>
> Yes, I already sent out the v2 patch.
> Please see: http://patchwork.ozlabs.org/patch/235288/

Thanks.  I should read all my email before replying to it :-)

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH] powerpc/pci: fix PCI-e devices rescan issue on powerpc platform
From: Chen Yuanquan-B41889 @ 2013-04-10  9:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Yuanquan Chen, linux-pci, bhelgaas, linuxppc-dev, Hiroo Matsumoto
In-Reply-To: <515BAB44.9070001@freescale.com>

On 04/03/2013 12:08 PM, Chen Yuanquan-B41889 wrote:
> On 04/02/2013 11:10 PM, Benjamin Herrenschmidt wrote:
>> On Tue, 2013-04-02 at 19:26 +0800, Yuanquan Chen wrote:
>>> So we move the DMA & IRQ initialization code from 
>>> pcibios_setup_devices() and
>>> construct a new function pcibios_enable_device. We call this 
>>> function in
>>> pcibios_enable_device, which will be called by PCI-e rescan code. At 
>>> the
>>> meanwhile, we avoid the the impact on cardbus. I also validate this 
>>> patch with
>>> silicon's PCIe-sata which encounters the IRQ issue.
>> My worry is that this delays the setup of the IRQ and DMA to very 
>> late in
>> the process, possibly after the quirks have been run, which can be
>> problematic. We have platform hooks that might try to "fixup" specific
>> IRQ issues on some platforms (especially macs) which I worry might fail
>> if delayed that way (I may be wrong, I don't have a specific case in 
>> mind,
>> but I would feel better if we kept setting up these things earlier).
>>
>> Cheers,
>> Ben.
>>
>
> Hi Ben,
>
> I have checked all the quirk functions which are declared in kernel 
> arch/powerpc
> with command :
> grep DECLARE_PCI_FIXUP_ `find arch/powerpc/ *.[hc]`
>
> All the quirk function are defined as DECLARE_PCI_FIXUP_EARLY , 
> DECLARE_PCI_FIXUP_HEADER
> and DECLARE_PCI_FIXUP_FINAL, except quirk_uli5229() in 
> arch/powerpc/platforms/fsl_uli1575.c, which is
> defined both as DECLARE_PCI_FIXUP_HEADER and DECLARE_PCI_FIXUP_RESUME. 
> So the quirk_uli5229()
> will also be called with PCI pm module. The quirk functions defined as 
> xxx_FINAL, HEADER and EARLY,
> will be called in the path:
>
> pci_scan_child_bus()->pci_scan_slot()->pci_scan_single_device()->pci_scan_device()->pci_setup_device() 
>
> ->pci_device_add()
>
> the pci_scan_slot() is called earlier than pcibios_fixup_bus() even 
> for the first scan of PCI-e bus, so the quirk
> functions on powerpc platform is called before the DMA & IRQ fixup. So 
> in reality, the delay of DMA & IRQ fixup
> won't affect anything.
>
> Regards,
> Yuanquan
>

Hi Ben,

How do you think about this? Do you have any comment?

Thanks,
Yuanquan

>>
>>
>>
>

^ permalink raw reply

* RE: [PATCH] of: remove the unnecessary of_node_put for of_parse_phandle_with_args()
From: Tang Yuantian-B29983 @ 2013-04-10  9:06 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: devicetree-discuss@lists.ozlabs.org,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	rob.herring@calxeda.com
In-Reply-To: <20130410190327.a70d4c3798509d664aef2f13@canb.auug.org.au>

DQo+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+IEZyb206IFN0ZXBoZW4gUm90aHdlbGwg
W21haWx0bzpzZnJAY2FuYi5hdXVnLm9yZy5hdV0NCj4gU2VudDogMjAxM8TqNNTCMTDI1SAxNzow
Mw0KPiBUbzogVGFuZyBZdWFudGlhbi1CMjk5ODMNCj4gQ2M6IGdyYW50Lmxpa2VseUBzZWNyZXRs
YWIuY2E7IGRldmljZXRyZWUtZGlzY3Vzc0BsaXN0cy5vemxhYnMub3JnOw0KPiBsaW51eHBwYy1k
ZXZAbGlzdHMub3psYWJzLm9yZzsgbGludXgta2VybmVsQHZnZXIua2VybmVsLm9yZzsNCj4gcm9i
LmhlcnJpbmdAY2FseGVkYS5jb20NCj4gU3ViamVjdDogUmU6IFtQQVRDSF0gb2Y6IHJlbW92ZSB0
aGUgdW5uZWNlc3Nhcnkgb2Zfbm9kZV9wdXQgZm9yDQo+IG9mX3BhcnNlX3BoYW5kbGVfd2l0aF9h
cmdzKCkNCj4gDQo+IEhpLA0KPiANCj4gT24gVHVlLCA5IEFwciAyMDEzIDE0OjU2OjA5ICswODAw
IDxZdWFudGlhbi5UYW5nQGZyZWVzY2FsZS5jb20+IHdyb3RlOg0KPiA+DQo+ID4gRnJvbTogVGFu
ZyBZdWFudGlhbiA8eXVhbnRpYW4udGFuZ0BmcmVlc2NhbGUuY29tPg0KPiA+DQo+ID4gQXMgdGhl
IGZ1bmN0aW9uIGl0c2VsZiBzYXlzIGl0IGlzIGNhbGxlcidzIHJlc3BvbnNpYmlsaXR5IHRvIGNh
bGwgdGhlDQo+ID4gb2Zfbm9kZV9wdXQoKS4gIFNvLCByZW1vdmUgaXQgb24gc3VjY2VzcyB0byBr
ZWVwIHRoZSByZWZlcmVuY2UgY291bnQNCj4gPiBjb3JyZWN0Lg0KPiA+DQo+ID4gU2lnbmVkLW9m
Zi1ieTogVGFuZyBZdWFudGlhbiA8WXVhbnRpYW4uVGFuZ0BmcmVlc2NhbGUuY29tPg0KPiA+IC0t
LQ0KPiA+ICBkcml2ZXJzL29mL2Jhc2UuYyB8IDMgLS0tDQo+ID4gIDEgZmlsZSBjaGFuZ2VkLCAz
IGRlbGV0aW9ucygtKQ0KPiA+DQo+ID4gZGlmZiAtLWdpdCBhL2RyaXZlcnMvb2YvYmFzZS5jIGIv
ZHJpdmVycy9vZi9iYXNlLmMgaW5kZXgNCj4gPiAzMjFkM2VmLi5lOGI0YzI4IDEwMDY0NA0KPiA+
IC0tLSBhL2RyaXZlcnMvb2YvYmFzZS5jDQo+ID4gKysrIGIvZHJpdmVycy9vZi9iYXNlLmMNCj4g
PiBAQCAtMTE2OCw5ICsxMTY4LDYgQEAgc3RhdGljIGludCBfX29mX3BhcnNlX3BoYW5kbGVfd2l0
aF9hcmdzKGNvbnN0DQo+IHN0cnVjdCBkZXZpY2Vfbm9kZSAqbnAsDQo+ID4gIAkJCQkJb3V0X2Fy
Z3MtPmFyZ3NbaV0gPSBiZTMyX3RvX2NwdXAobGlzdCsrKTsNCj4gPiAgCQkJfQ0KPiA+DQo+ID4g
LQkJCS8qIEZvdW5kIGl0ISByZXR1cm4gc3VjY2VzcyAqLw0KPiA+IC0JCQlpZiAobm9kZSkNCj4g
PiAtCQkJCW9mX25vZGVfcHV0KG5vZGUpOw0KPiANCj4gQWN0dWFsbHksIGlmIG91dF9hcmdzIGlz
IE5VTEwsIHlvdSBzaG91bGQgZG8gdGhlIG9mX25vZGVfcHV0KG5vZGUpLCBzbw0KPiBwcm9iYWJs
eSB0aGUgY29ycmVjdCBmaXggaXMgdG8gYWRkIGFuICJlbHNlIiBiZWZvcmUgdGhlIGFib3ZlICJp
ZiIgKGFuZA0KPiBtb3ZlIHRoZSBjb21tZW50KS4NCj4gDQoNClllcywgSSBhbHJlYWR5IHNlbnQg
b3V0IHRoZSB2MiBwYXRjaC4NClBsZWFzZSBzZWU6IGh0dHA6Ly9wYXRjaHdvcmsub3psYWJzLm9y
Zy9wYXRjaC8yMzUyODgvDQoNClJlZ2FyZHMsDQpZdWFudGlhbg0KDQo+IC0tDQo+IENoZWVycywN
Cj4gU3RlcGhlbiBSb3Rod2VsbCAgICAgICAgICAgICAgICAgICAgc2ZyQGNhbmIuYXV1Zy5vcmcu
YXUNCj4gaHR0cDovL3d3dy5jYW5iLmF1dWcub3JnLmF1L35zZnIvDQo=

^ permalink raw reply

* Re: [PATCH] of: remove the unnecessary of_node_put for of_parse_phandle_with_args()
From: Stephen Rothwell @ 2013-04-10  9:03 UTC (permalink / raw)
  To: Yuantian.Tang; +Cc: devicetree-discuss, linuxppc-dev, linux-kernel, rob.herring
In-Reply-To: <1365490569-32455-1-git-send-email-Yuantian.Tang@freescale.com>

[-- Attachment #1: Type: text/plain, Size: 1097 bytes --]

Hi,

On Tue, 9 Apr 2013 14:56:09 +0800 <Yuantian.Tang@freescale.com> wrote:
>
> From: Tang Yuantian <yuantian.tang@freescale.com>
> 
> As the function itself says it is caller's responsibility to call the
> of_node_put().  So, remove it on success to keep the reference count
> correct.
> 
> Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
> ---
>  drivers/of/base.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 321d3ef..e8b4c28 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -1168,9 +1168,6 @@ static int __of_parse_phandle_with_args(const struct device_node *np,
>  					out_args->args[i] = be32_to_cpup(list++);
>  			}
>  
> -			/* Found it! return success */
> -			if (node)
> -				of_node_put(node);

Actually, if out_args is NULL, you should do the of_node_put(node), so
probably the correct fix is to add an "else" before the above "if" (and
move the comment).

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* [RESEND PATCH 0/4] Add support for vmap_area_list.
From: Atsushi Kumagai @ 2013-04-10  6:05 UTC (permalink / raw)
  To: kexec; +Cc: linuxppc-dev

# CCing linuxppc-dev to get advice on ppc for makedumpfile.
# makedumpfile is a userspace utility to create a smaller dump file,
# you can download it from the following URL.
# http://sourceforge.net/projects/makedumpfile/

-----
Hello,

This patchset is to work with "remove vm_struct list management":

  https://lkml.org/lkml/2013/3/13/40

In more detail, this patchset try to use vmap_area_list to get the
start address of vmalloc region.

I tested this on i386/x86_64, but I can't do it on other 
architectures. So, please let me know if you find any problems.

---

Atsushi Kumagai (4):
      Introduce new symbols to look at the first vmap_area.
      Use vmap_area_list to get vmalloc_start for i386.
      Use vmap_area_list to get vmalloc_start for ppc32.
      Use vmap_area_list to get vmalloc_start for ppc64.

 arch/ppc.c     | 46 ++++++++++++++++++++++++++++++++--------------
 arch/ppc64.c   | 46 ++++++++++++++++++++++++++++++++--------------
 arch/x86.c     | 46 ++++++++++++++++++++++++++++++++--------------
 makedumpfile.c |  9 +++++++++
 makedumpfile.h |  5 +++++
 5 files changed, 110 insertions(+), 42 deletions(-)


Thanks
Atsushi Kumagai

^ permalink raw reply

* Re: [PATCH -V5 06/25] powerpc: Reduce PTE table memory wastage
From: Aneesh Kumar K.V @ 2013-04-10  8:52 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: paulus, linuxppc-dev, linux-mm
In-Reply-To: <87obdmom5o.fsf@linux.vnet.ibm.com>

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> Michael Ellerman <michael@ellerman.id.au> writes:
>
>> On Thu, Apr 04, 2013 at 11:27:44AM +0530, Aneesh Kumar K.V wrote:
>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>> 
>>> We allocate one page for the last level of linux page table. With THP and
>>> large page size of 16MB, that would mean we are wasting large part
>>> of that page. To map 16MB area, we only need a PTE space of 2K with 64K
>>> page size. This patch reduce the space wastage by sharing the page
>>> allocated for the last level of linux page table with multiple pmd
>>> entries. We call these smaller chunks PTE page fragments and allocated
>>> page, PTE page.
>>
>> This is not compiling for me:
>>
>> arch/powerpc/mm/mmu_context_hash64.c:118:3: error: implicit declaration of function 'reset_page_mapcount'
>>
>
> can you share the .config ? I have the git tree at 
>
> git://github.com/kvaneesh/linux.git ppc64-thp-7

22b751c3d0376e86a377e3a0aa2ddbbe9d2eefc1 . Will rebase to latest linus tree.

-aneesh

^ permalink raw reply

* Re: [PATCH] powerpc: add a missing label in resume_kernel
From: tiejun.chen @ 2013-04-10  8:43 UTC (permalink / raw)
  To: Kevin Hao; +Cc: linuxppc
In-Reply-To: <1365582684-11136-1-git-send-email-haokexin@gmail.com>

On 04/10/2013 04:31 PM, Kevin Hao wrote:
> A label 0 was missed in the patch a9c4e541 (powerpc/kprobe: Complete
> kprobe and migrate exception frame). This will cause the kernel
> branch to an undetermined address if there really has a conflict when
> updating the thread flags.
>
> Signed-off-by: Kevin Hao <haokexin@gmail.com>

Acked-By: Tiejun Chen <tiejun.chen@windriver.com>


> Cc: stable@vger.kernel.org
> ---
>   arch/powerpc/kernel/entry_64.S | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 256c5bf..ab079ed 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -657,7 +657,7 @@ resume_kernel:
>   	/* Clear _TIF_EMULATE_STACK_STORE flag */
>   	lis	r11,_TIF_EMULATE_STACK_STORE@h
>   	addi	r5,r9,TI_FLAGS
> -	ldarx	r4,0,r5
> +0:	ldarx	r4,0,r5
>   	andc	r4,r4,r11
>   	stdcx.	r4,0,r5
>   	bne-	0b
>

^ permalink raw reply

* [PATCH 4/4] powerpc/perf: Add support for SIER
From: Michael Ellerman @ 2013-04-10  8:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Paul Mackerras
In-Reply-To: <1365582765-6939-1-git-send-email-michael@ellerman.id.au>

From: Michael Ellerman <michaele@au1.ibm.com>

On power8 we have a new SIER (Sampled Instruction Event Register), which
captures information about instructions when we have random sampling
enabled.

Add support for loading the SIER into pt_regs, overloading regs->dar.
Also set the new NO_SIPR flag in regs->result if we don't have SIPR.

Update regs_sihv/sipr() to look for SIPR/SIHV in SIER.

Signed-off-by: Michael Ellerman <michaele@au1.ibm.com>
---
 arch/powerpc/include/asm/perf_event_server.h |    1 +
 arch/powerpc/perf/core-book3s.c              |   19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index e287aef..a1a1ad8 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -53,6 +53,7 @@ struct power_pmu {
 #define PPMU_NO_CONT_SAMPLING	0x00000008 /* no continuous sampling */
 #define PPMU_SIAR_VALID		0x00000010 /* Processor has SIAR Valid bit */
 #define PPMU_HAS_SSLOT		0x00000020 /* Has sampled slot in MMCRA */
+#define PPMU_HAS_SIER		0x00000040 /* Has SIER */
 
 /*
  * Values for flags to get_alternatives()
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 4255b12..a4bbd4d 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -116,6 +116,9 @@ static bool regs_sihv(struct pt_regs *regs)
 {
 	unsigned long sihv = MMCRA_SIHV;
 
+	if (ppmu->flags & PPMU_HAS_SIER)
+		return !!(regs->dar & SIER_SIHV);
+
 	if (ppmu->flags & PPMU_ALT_SIPR)
 		sihv = POWER6_MMCRA_SIHV;
 
@@ -126,6 +129,9 @@ static bool regs_sipr(struct pt_regs *regs)
 {
 	unsigned long sipr = MMCRA_SIPR;
 
+	if (ppmu->flags & PPMU_HAS_SIER)
+		return !!(regs->dar & SIER_SIPR);
+
 	if (ppmu->flags & PPMU_ALT_SIPR)
 		sipr = POWER6_MMCRA_SIPR;
 
@@ -184,6 +190,7 @@ static inline u32 perf_get_misc_flags(struct pt_regs *regs)
 /*
  * Overload regs->dsisr to store MMCRA so we only need to read it once
  * on each interrupt.
+ * Overload regs->dar to store SIER if we have it.
  * Overload regs->result to specify whether we should use the MSR (result
  * is zero) or the SIAR (result is non zero).
  */
@@ -200,6 +207,18 @@ static inline void perf_read_regs(struct pt_regs *regs)
 		regs->result |= 2;
 
 	/*
+	 * On power8 if we're in random sampling mode, the SIER is updated.
+	 * If we're in continuous sampling mode, we don't have SIPR.
+	 */
+	if (ppmu->flags & PPMU_HAS_SIER) {
+		if (marked)
+			regs->dar = mfspr(SPRN_SIER);
+		else
+			regs->result |= 2;
+	}
+
+
+	/*
 	 * If this isn't a PMU exception (eg a software event) the SIAR is
 	 * not valid. Use pt_regs.
 	 *
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 3/4] powerpc/perf: Add regs_no_sipr()
From: Michael Ellerman @ 2013-04-10  8:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Paul Mackerras
In-Reply-To: <1365582765-6939-1-git-send-email-michael@ellerman.id.au>

From: Michael Ellerman <michaele@au1.ibm.com>

On power8 the presence or absence of SIPR depends on settings at runtime,
so convert to using a dynamic flag for NO_SIPR. Existing backends that
set NO_SIPR unconditionally set the dynamic flag obviously.

Signed-off-by: Michael Ellerman <michaele@au1.ibm.com>
---
 arch/powerpc/perf/core-book3s.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 770f359..4255b12 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -137,6 +137,11 @@ static bool regs_use_siar(struct pt_regs *regs)
 	return !!(regs->result & 1);
 }
 
+static bool regs_no_sipr(struct pt_regs *regs)
+{
+	return !!(regs->result & 2);
+}
+
 static inline u32 perf_flags_from_msr(struct pt_regs *regs)
 {
 	if (regs->msr & MSR_PR)
@@ -159,7 +164,7 @@ static inline u32 perf_get_misc_flags(struct pt_regs *regs)
 	 * SIAR which should give slightly more reliable
 	 * results
 	 */
-	if (ppmu->flags & PPMU_NO_SIPR) {
+	if (regs_no_sipr(regs)) {
 		unsigned long siar = mfspr(SPRN_SIAR);
 		if (siar >= PAGE_OFFSET)
 			return PERF_RECORD_MISC_KERNEL;
@@ -189,6 +194,10 @@ static inline void perf_read_regs(struct pt_regs *regs)
 	int use_siar;
 
 	regs->dsisr = mmcra;
+	regs->result = 0;
+
+	if (ppmu->flags & PPMU_NO_SIPR)
+		regs->result |= 2;
 
 	/*
 	 * If this isn't a PMU exception (eg a software event) the SIAR is
@@ -213,12 +222,12 @@ static inline void perf_read_regs(struct pt_regs *regs)
 		use_siar = 1;
 	else if ((ppmu->flags & PPMU_NO_CONT_SAMPLING))
 		use_siar = 0;
-	else if (!(ppmu->flags & PPMU_NO_SIPR) && regs_sipr(regs))
+	else if (!regs_no_sipr(regs) && regs_sipr(regs))
 		use_siar = 0;
 	else
 		use_siar = 1;
 
-	regs->result = use_siar;
+	regs->result |= use_siar;
 }
 
 /*
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 2/4] powerpc/perf: Add an accessor for regs->result
From: Michael Ellerman @ 2013-04-10  8:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Paul Mackerras
In-Reply-To: <1365582765-6939-1-git-send-email-michael@ellerman.id.au>

From: Michael Ellerman <michaele@au1.ibm.com>

Add an accessor for regs->result so we can use it to store more flags in
future.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/perf/core-book3s.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index cb1618d..770f359 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -132,6 +132,11 @@ static bool regs_sipr(struct pt_regs *regs)
 	return !!(regs->dsisr & sipr);
 }
 
+static bool regs_use_siar(struct pt_regs *regs)
+{
+	return !!(regs->result & 1);
+}
+
 static inline u32 perf_flags_from_msr(struct pt_regs *regs)
 {
 	if (regs->msr & MSR_PR)
@@ -143,7 +148,7 @@ static inline u32 perf_flags_from_msr(struct pt_regs *regs)
 
 static inline u32 perf_get_misc_flags(struct pt_regs *regs)
 {
-	unsigned long use_siar = regs->result;
+	bool use_siar = regs_use_siar(regs);
 
 	if (!use_siar)
 		return perf_flags_from_msr(regs);
@@ -1413,7 +1418,7 @@ unsigned long perf_misc_flags(struct pt_regs *regs)
  */
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
-	unsigned long use_siar = regs->result;
+	bool use_siar = regs_use_siar(regs);
 
 	if (use_siar && siar_valid(regs))
 		return mfspr(SPRN_SIAR) + perf_ip_adjust(regs);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 1/4] powerpc/perf: Convert mmcra_sipr/sihv() to regs_sipr/sihv()
From: Michael Ellerman @ 2013-04-10  8:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Paul Mackerras

From: Michael Ellerman <michaele@au1.ibm.com>

On power8 the SIPR and SIHV are not in MMCRA, so convert the routines
to take regs and change the names accordingly.

Signed-off-by: Michael Ellerman <michaele@au1.ibm.com>
---
 arch/powerpc/perf/core-book3s.c |   20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index fcfafa0..cb1618d 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -112,24 +112,24 @@ static inline void perf_get_data_addr(struct pt_regs *regs, u64 *addrp)
 		*addrp = mfspr(SPRN_SDAR);
 }
 
-static bool mmcra_sihv(unsigned long mmcra)
+static bool regs_sihv(struct pt_regs *regs)
 {
 	unsigned long sihv = MMCRA_SIHV;
 
 	if (ppmu->flags & PPMU_ALT_SIPR)
 		sihv = POWER6_MMCRA_SIHV;
 
-	return !!(mmcra & sihv);
+	return !!(regs->dsisr & sihv);
 }
 
-static bool mmcra_sipr(unsigned long mmcra)
+static bool regs_sipr(struct pt_regs *regs)
 {
 	unsigned long sipr = MMCRA_SIPR;
 
 	if (ppmu->flags & PPMU_ALT_SIPR)
 		sipr = POWER6_MMCRA_SIPR;
 
-	return !!(mmcra & sipr);
+	return !!(regs->dsisr & sipr);
 }
 
 static inline u32 perf_flags_from_msr(struct pt_regs *regs)
@@ -143,7 +143,6 @@ static inline u32 perf_flags_from_msr(struct pt_regs *regs)
 
 static inline u32 perf_get_misc_flags(struct pt_regs *regs)
 {
-	unsigned long mmcra = regs->dsisr;
 	unsigned long use_siar = regs->result;
 
 	if (!use_siar)
@@ -163,10 +162,12 @@ static inline u32 perf_get_misc_flags(struct pt_regs *regs)
 	}
 
 	/* PR has priority over HV, so order below is important */
-	if (mmcra_sipr(mmcra))
+	if (regs_sipr(regs))
 		return PERF_RECORD_MISC_USER;
-	if (mmcra_sihv(mmcra) && (freeze_events_kernel != MMCR0_FCHV))
+
+	if (regs_sihv(regs) && (freeze_events_kernel != MMCR0_FCHV))
 		return PERF_RECORD_MISC_HYPERVISOR;
+
 	return PERF_RECORD_MISC_KERNEL;
 }
 
@@ -182,6 +183,8 @@ static inline void perf_read_regs(struct pt_regs *regs)
 	int marked = mmcra & MMCRA_SAMPLE_ENABLE;
 	int use_siar;
 
+	regs->dsisr = mmcra;
+
 	/*
 	 * If this isn't a PMU exception (eg a software event) the SIAR is
 	 * not valid. Use pt_regs.
@@ -205,12 +208,11 @@ static inline void perf_read_regs(struct pt_regs *regs)
 		use_siar = 1;
 	else if ((ppmu->flags & PPMU_NO_CONT_SAMPLING))
 		use_siar = 0;
-	else if (!(ppmu->flags & PPMU_NO_SIPR) && mmcra_sipr(mmcra))
+	else if (!(ppmu->flags & PPMU_NO_SIPR) && regs_sipr(regs))
 		use_siar = 0;
 	else
 		use_siar = 1;
 
-	regs->dsisr = mmcra;
 	regs->result = use_siar;
 }
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH] powerpc: add a missing label in resume_kernel
From: Kevin Hao @ 2013-04-10  8:31 UTC (permalink / raw)
  To: Tiejun Chen, Benjamin Herrenschmidt; +Cc: linuxppc

A label 0 was missed in the patch a9c4e541 (powerpc/kprobe: Complete
kprobe and migrate exception frame). This will cause the kernel
branch to an undetermined address if there really has a conflict when
updating the thread flags.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
---
 arch/powerpc/kernel/entry_64.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 256c5bf..ab079ed 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -657,7 +657,7 @@ resume_kernel:
 	/* Clear _TIF_EMULATE_STACK_STORE flag */
 	lis	r11,_TIF_EMULATE_STACK_STORE@h
 	addi	r5,r9,TI_FLAGS
-	ldarx	r4,0,r5
+0:	ldarx	r4,0,r5
 	andc	r4,r4,r11
 	stdcx.	r4,0,r5
 	bne-	0b
-- 
1.8.1.4

^ permalink raw reply related

* Re: [PATCH v2 2/11] Add PRRN Event Handler
From: Michael Ellerman @ 2013-04-10  8:30 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev
In-Reply-To: <51509CF0.10200@linux.vnet.ibm.com>

On Mon, Mar 25, 2013 at 01:52:32PM -0500, Nathan Fontenot wrote:
> From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
> 
> A PRRN event is signaled via the RTAS event-scan mechanism, which
> returns a Hot Plug Event message "fixed part" indicating "Platform
> Resource Reassignment". In response to the Hot Plug Event message,
> we must call ibm,update-nodes to determine which resources were
> reassigned and then ibm,update-properties to obtain the new affinity
> information about those resources.
..

> Index: powerpc/arch/powerpc/kernel/rtasd.c
> ===================================================================
> --- powerpc.orig/arch/powerpc/kernel/rtasd.c	2013-03-20 08:24:14.000000000 -0500
> +++ powerpc/arch/powerpc/kernel/rtasd.c	2013-03-20 08:52:08.000000000 -0500
> @@ -87,6 +87,8 @@
>  			return "Resource Deallocation Event";
>  		case RTAS_TYPE_DUMP:
>  			return "Dump Notification Event";
> +		case RTAS_TYPE_PRRN:
> +			return "Platform Resource Reassignment Event";
>  	}
>  
>  	return rtas_type[0];
> @@ -265,7 +267,38 @@
>  		spin_unlock_irqrestore(&rtasd_log_lock, s);
>  		return;
>  	}
> +}
> +
> +static s32 update_scope;
> +
> +static void prrn_work_fn(struct work_struct *work)
> +{
> +	/*
> +	 * For PRRN, we must pass the negative of the scope value in
> +	 * the RTAS event.
> +	 */
> +	pseries_devicetree_update(-update_scope);
> +}
> +static DECLARE_WORK(prrn_work, prrn_work_fn);

This breaks the 32-bit build (ppc6xx_defconfig):

arch/powerpc/kernel/rtasd.c:280: undefined reference to `pseries_devicetree_update'

cheers

^ permalink raw reply

* Re: [PATCH -V5 08/25] powerpc: Decode the pte-lp-encoding bits correctly.
From: Aneesh Kumar K.V @ 2013-04-10  8:11 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, linuxppc-dev, linux-mm
In-Reply-To: <20130410071915.GI8165@truffula.fritz.box>

David Gibson <dwg@au1.ibm.com> writes:

> On Thu, Apr 04, 2013 at 11:27:46AM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> We look at both the segment base page size and actual page size and store
>> the pte-lp-encodings in an array per base page size.
>> 
>> We also update all relevant functions to take actual page size argument
>> so that we can use the correct PTE LP encoding in HPTE. This should also
>> get the basic Multiple Page Size per Segment (MPSS) support. This is needed
>> to enable THP on ppc64.
>> 

....

>> +static inline int hpte_actual_psize(struct hash_pte *hptep, int psize)
>> +{
>> +	int i, shift;
>> +	unsigned int mask;
>> +	/* Look at the 8 bit LP value */
>> +	unsigned int lp = (hptep->r >> LP_SHIFT) & ((1 << LP_BITS) - 1);
>> +
>> +	if (!(hptep->v & HPTE_V_VALID))
>> +		return -1;
>
> Folding the validity check into the size check seems confusing to me.

We do end up with invalid hpte with which we call
hpte_actual_psize. So that check is needed. I can either move to caller,
but then i will have to replicate it in all the call sites.


>
>> +	/* First check if it is large page */
>> +	if (!(hptep->v & HPTE_V_LARGE))
>> +		return MMU_PAGE_4K;
>> +
>> +	/* start from 1 ignoring MMU_PAGE_4K */
>> +	for (i = 1; i < MMU_PAGE_COUNT; i++) {
>> +		/* valid entries have a shift value */
>> +		if (!mmu_psize_defs[i].shift)
>> +			continue;
>
> Isn't this check redundant with the one below?

Yes. I guess we can safely assume that if penc is valid then we do
support that specific large page.

I will drop this and keep the penc check. That is more correct check

>
>> +		/* invalid penc */
>> +		if (mmu_psize_defs[psize].penc[i] == -1)
>> +			continue;
>> +		/*
>> +		 * encoding bits per actual page size
>> +		 *        PTE LP     actual page size
>> +		 *    rrrr rrrz		>=8KB
>> +		 *    rrrr rrzz		>=16KB
>> +		 *    rrrr rzzz		>=32KB
>> +		 *    rrrr zzzz		>=64KB
>> +		 * .......
>> +		 */
>> +		shift = mmu_psize_defs[i].shift - LP_SHIFT;
>> +		if (shift > LP_BITS)
>> +			shift = LP_BITS;
>> +		mask = (1 << shift) - 1;
>> +		if ((lp & mask) == mmu_psize_defs[psize].penc[i])
>> +			return i;
>> +	}
>
> Shouldn't we have a BUG() or something here.  If we get here we've
> somehow created a PTE with LP bits we can't interpret, yes?
>

I don't know. Is BUG() the right thing to do ? 


>> +	return -1;
>> +}
>> +
>>  static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
>>  				 unsigned long vpn, int psize, int ssize,
>>  				 int local)
>> @@ -251,6 +294,7 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
>>  	struct hash_pte *hptep = htab_address + slot;
>>  	unsigned long hpte_v, want_v;
>>  	int ret = 0;
>> +	int actual_psize;
>>  
>>  	want_v = hpte_encode_avpn(vpn, psize, ssize);
>>  
>> @@ -260,9 +304,13 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
>>  	native_lock_hpte(hptep);
>>  
>>  	hpte_v = hptep->v;
>> -
>> +	actual_psize = hpte_actual_psize(hptep, psize);
>> +	if (actual_psize < 0) {
>> +		native_unlock_hpte(hptep);
>> +		return -1;
>> +	}
>
> Wouldn't it make more sense to only do the psize lookup once you've
> found a matching hpte?

But we need to do psize lookup even if V_COMPARE fail, because we want
to do tlbie in both the case.

>
>>  	/* Even if we miss, we need to invalidate the TLB */
>> -	if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID)) {
>> +	if (!HPTE_V_COMPARE(hpte_v, want_v)) {
>>  		DBG_LOW(" -> miss\n");
>>  		ret = -1;
>>  	} else {
>> @@ -274,7 +322,7 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
>>  	native_unlock_hpte(hptep);
>>  
>>  	/* Ensure it is out of the tlb too. */
>> -	tlbie(vpn, psize, ssize, local);
>> +	tlbie(vpn, psize, actual_psize, ssize, local);
>>  
>>  	return ret;
>>  }
>> @@ -315,6 +363,7 @@ static long native_hpte_find(unsigned long vpn, int psize, int ssize)
>>  static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
>>  				       int psize, int ssize)
>>  {
>> +	int actual_psize;
>>  	unsigned long vpn;
>>  	unsigned long vsid;
>>  	long slot;
>> @@ -327,13 +376,16 @@ static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
>>  	if (slot == -1)
>>  		panic("could not find page to bolt\n");
>>  	hptep = htab_address + slot;
>> +	actual_psize = hpte_actual_psize(hptep, psize);
>> +	if (actual_psize < 0)
>> +		return;
>>  
>>  	/* Update the HPTE */
>>  	hptep->r = (hptep->r & ~(HPTE_R_PP | HPTE_R_N)) |
>>  		(newpp & (HPTE_R_PP | HPTE_R_N));
>>  
>>  	/* Ensure it is out of the tlb too. */
>> -	tlbie(vpn, psize, ssize, 0);
>> +	tlbie(vpn, psize, actual_psize, ssize, 0);
>>  }
>>  
>>  static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
>> @@ -343,6 +395,7 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
>>  	unsigned long hpte_v;
>>  	unsigned long want_v;
>>  	unsigned long flags;
>> +	int actual_psize;
>>  
>>  	local_irq_save(flags);
>>  
>> @@ -352,35 +405,38 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
>>  	native_lock_hpte(hptep);
>>  	hpte_v = hptep->v;
>>  
>> +	actual_psize = hpte_actual_psize(hptep, psize);
>> +	if (actual_psize < 0) {
>> +		native_unlock_hpte(hptep);
>> +		local_irq_restore(flags);
>> +		return;
>> +	}
>>  	/* Even if we miss, we need to invalidate the TLB */
>> -	if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID))
>> +	if (!HPTE_V_COMPARE(hpte_v, want_v))
>>  		native_unlock_hpte(hptep);
>>  	else
>>  		/* Invalidate the hpte. NOTE: this also unlocks it */
>>  		hptep->v = 0;
>>  
>>  	/* Invalidate the TLB */
>> -	tlbie(vpn, psize, ssize, local);
>> +	tlbie(vpn, psize, actual_psize, ssize, local);
>>  
>>  	local_irq_restore(flags);
>>  }
>>  
>> -#define LP_SHIFT	12
>> -#define LP_BITS		8
>> -#define LP_MASK(i)	((0xFF >> (i)) << LP_SHIFT)
>> -
>>  static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
>> -			int *psize, int *ssize, unsigned long *vpn)
>> +			int *psize, int *apsize, int *ssize, unsigned long *vpn)
>>  {
>>  	unsigned long avpn, pteg, vpi;
>>  	unsigned long hpte_r = hpte->r;
>>  	unsigned long hpte_v = hpte->v;
>>  	unsigned long vsid, seg_off;
>> -	int i, size, shift, penc;
>> +	int i, size, a_size, shift, penc;
>>  
>> -	if (!(hpte_v & HPTE_V_LARGE))
>> -		size = MMU_PAGE_4K;
>> -	else {
>> +	if (!(hpte_v & HPTE_V_LARGE)) {
>> +		size   = MMU_PAGE_4K;
>> +		a_size = MMU_PAGE_4K;
>> +	} else {
>>  		for (i = 0; i < LP_BITS; i++) {
>>  			if ((hpte_r & LP_MASK(i+1)) == LP_MASK(i+1))
>>  				break;
>> @@ -388,19 +444,26 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
>>  		penc = LP_MASK(i+1) >> LP_SHIFT;
>>  		for (size = 0; size < MMU_PAGE_COUNT; size++) {
>
>>  
>> -			/* 4K pages are not represented by LP */
>> -			if (size == MMU_PAGE_4K)
>> -				continue;
>> -
>>  			/* valid entries have a shift value */
>>  			if (!mmu_psize_defs[size].shift)
>>  				continue;
>> +			for (a_size = 0; a_size < MMU_PAGE_COUNT; a_size++) {
>
> Can't you resize hpte_actual_psize() here instead of recoding the
> lookup?

I thought about that, but re-coding avoided some repeated check. But
then, if I follow your review comments of avoiding hpte valid check etc, may
be I can reuse the hpte_actual_psize. Will try this. 


>
>> -			if (penc == mmu_psize_defs[size].penc)
>> -				break;
>> +				/* 4K pages are not represented by LP */
>> +				if (a_size == MMU_PAGE_4K)
>> +					continue;
>> +
>> +				/* valid entries have a shift value */
>> +				if (!mmu_psize_defs[a_size].shift)
>> +					continue;
>> +
>> +				if (penc == mmu_psize_defs[size].penc[a_size])
>> +					goto out;
>> +			}
>>  		}
>>  	}
>>  
>> +out:

-aneesh

^ permalink raw reply

* Re: [PATCH -V5 06/25] powerpc: Reduce PTE table memory wastage
From: Aneesh Kumar K.V @ 2013-04-10  7:54 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: paulus, linuxppc-dev, linux-mm
In-Reply-To: <20130410071453.GB24786@concordia>

Michael Ellerman <michael@ellerman.id.au> writes:

> On Thu, Apr 04, 2013 at 11:27:44AM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> We allocate one page for the last level of linux page table. With THP and
>> large page size of 16MB, that would mean we are wasting large part
>> of that page. To map 16MB area, we only need a PTE space of 2K with 64K
>> page size. This patch reduce the space wastage by sharing the page
>> allocated for the last level of linux page table with multiple pmd
>> entries. We call these smaller chunks PTE page fragments and allocated
>> page, PTE page.
>
> This is not compiling for me:
>
> arch/powerpc/mm/mmu_context_hash64.c:118:3: error: implicit declaration of function 'reset_page_mapcount'
>

can you share the .config ? I have the git tree at 

git://github.com/kvaneesh/linux.git ppc64-thp-7

-aneesh

^ permalink raw reply

* Re: [PATCH -V5 06/25] powerpc: Reduce PTE table memory wastage
From: Aneesh Kumar K.V @ 2013-04-10  7:53 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, linuxppc-dev, linux-mm
In-Reply-To: <20130410070403.GH8165@truffula.fritz.box>

David Gibson <dwg@au1.ibm.com> writes:

> On Wed, Apr 10, 2013 at 11:59:29AM +0530, Aneesh Kumar K.V wrote:
>> David Gibson <dwg@au1.ibm.com> writes:
>> > On Thu, Apr 04, 2013 at 11:27:44AM +0530, Aneesh Kumar K.V wrote:
> [snip]
>> >> @@ -97,13 +100,45 @@ void __destroy_context(int context_id)
>> >>  }
>> >>  EXPORT_SYMBOL_GPL(__destroy_context);
>> >>  
>> >> +#ifdef CONFIG_PPC_64K_PAGES
>> >> +static void destroy_pagetable_page(struct mm_struct *mm)
>> >> +{
>> >> +	int count;
>> >> +	struct page *page;
>> >> +
>> >> +	page = mm->context.pgtable_page;
>> >> +	if (!page)
>> >> +		return;
>> >> +
>> >> +	/* drop all the pending references */
>> >> +	count = atomic_read(&page->_mapcount) + 1;
>> >> +	/* We allow PTE_FRAG_NR(16) fragments from a PTE page */
>> >> +	count = atomic_sub_return(16 - count, &page->_count);
>> >
>> > You should really move PTE_FRAG_NR to a header so you can actually use
>> > it here rather than hard coding 16.
>> >
>> > It took me a fair while to convince myself that there is no race here
>> > with something altering mapcount and count between the atomic_read()
>> > and the atomic_sub_return().  It could do with a comment to explain
>> > why that is safe.
>> >
>> > Re-using the mapcount field for your index also seems odd, and it took
>> > me a while to convince myself that that's safe too.  Wouldn't it be
>> > simpler to store a pointer to the next sub-page in the mm_context
>> > instead? You can get from that to the struct page easily enough with a
>> > shift and pfn_to_page().
>> 
>> I found using _mapcount simpler in this case. I was looking at it not
>> as an index, but rather how may fragments are mapped/used already.
>
> Except that it's actually (#fragments - 1).  Using subpage pointer
> makes the fragments calculation (very slightly) harder, but the
> calculation of the table address easier.  More importantly it avoids
> adding effectively an extra variable - which is then shoehorned into a
> structure not really designed to hold it.

Even with subpage pointer we would need mm->context.pgtable_page or
something similar. We don't add any other extra variable right ?. Let me
try what you are suggesting here and see if that make it simpler.


>> Using
>> subpage pointer in mm->context.xyz means, we have to calculate the
>> number of fragments used/mapped via the pointer. We need the fragment
>> count so that we can drop page reference count correctly here.
>> 
>> 
>> >
>> >> +	if (!count) {
>> >> +		pgtable_page_dtor(page);
>> >> +		reset_page_mapcount(page);
>> >> +		free_hot_cold_page(page, 0);
>> >
>> > It would be nice to use put_page() somehow instead of duplicating its
>> > logic, though I realise the sparc code you've based this on does the
>> > same thing.
>> 
>> That is not exactly put_page. We can avoid lots of check in this
>> specific case.
>
> [snip]
>> >> +static pte_t *__alloc_for_cache(struct mm_struct *mm, int kernel)
>> >> +{
>> >> +	pte_t *ret = NULL;
>> >> +	struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK |
>> >> +				       __GFP_REPEAT | __GFP_ZERO);
>> >> +	if (!page)
>> >> +		return NULL;
>> >> +
>> >> +	spin_lock(&mm->page_table_lock);
>> >> +	/*
>> >> +	 * If we find pgtable_page set, we return
>> >> +	 * the allocated page with single fragement
>> >> +	 * count.
>> >> +	 */
>> >> +	if (likely(!mm->context.pgtable_page)) {
>> >> +		atomic_set(&page->_count, PTE_FRAG_NR);
>> >> +		atomic_set(&page->_mapcount, 0);
>> >> +		mm->context.pgtable_page = page;
>> >> +	}
>> >
>> > .. and in the unlikely case where there *is* a pgtable_page already
>> > set, what then?  Seems like you should BUG_ON, or at least return NULL
>> > - as it is you will return the first sub-page of that page again,
>> > which is very likely in use.
>> 
>> 
>> As explained in the comment above, we return with the allocated page
>> with fragment count set to 1. So we end up having only one fragment. The
>> other option I had was to to free the allocated page and do a
>> get_from_cache under the page_table_lock. But since we already allocated
>> the page, why not use that ?. It also keep the code similar to
>> sparc.
>
> My point is that I can't see any circumstance under which we should
> ever hit this case.  Which means if we do something is badly messed up
> and we should BUG() (or at least WARN()).

A multi threaded test would easily hit that. stream is the test I used.

-aneesh

^ permalink raw reply

* [RESEND PATCH 4/4] Use vmap_area_list to get vmalloc_start for ppc64.
From: Atsushi Kumagai @ 2013-04-10  7:20 UTC (permalink / raw)
  To: kexec; +Cc: linuxppc-dev
In-Reply-To: <20130410150524.804cd23b99a697f71146be67@mxc.nes.nec.co.jp>

From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Date: Fri, 15 Mar 2013 19:34:30 +0900
Subject: [PATCH 4/4] Use vmap_area_list to get vmalloc_start for ppc64.

Try to get vmalloc_start value from vmap_area_list first for
newer ppc64 kernels.

Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
---
 arch/ppc64.c | 44 +++++++++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/arch/ppc64.c b/arch/ppc64.c
index afbaf55..c229ede 100644
--- a/arch/ppc64.c
+++ b/arch/ppc64.c
@@ -66,22 +66,40 @@ get_machdep_info_ppc64(void)
 	DEBUG_MSG("kernel_start : %lx\n", info->kernel_start);
 
 	/*
-	 * For the compatibility, makedumpfile should run without the symbol
-	 * vmlist and the offset of vm_struct.addr if they are not necessary.
+	 * Get vmalloc_start value from either vmap_area_list or vmlist.
 	 */
-	if ((SYMBOL(vmlist) == NOT_FOUND_SYMBOL)
-	    || (OFFSET(vm_struct.addr) == NOT_FOUND_STRUCTURE)) {
+	if ((SYMBOL(vmap_area_list) != NOT_FOUND_SYMBOL)
+	    && (OFFSET(vmap_area.va_start) != NOT_FOUND_STRUCTURE)
+	    && (OFFSET(vmap_area.list) != NOT_FOUND_STRUCTURE)) {
+		if (!readmem(VADDR, SYMBOL(vmap_area_list) + OFFSET(list_head.next),
+			     &vmap_area_list, sizeof(vmap_area_list))) {
+			ERRMSG("Can't get vmap_area_list.\n");
+			return FALSE;
+		}
+		if (!readmem(VADDR, vmap_area_list - OFFSET(vmap_area.list) +
+			     OFFSET(vmap_area.va_start), &vmalloc_start,
+			     sizeof(vmalloc_start))) {
+			ERRMSG("Can't get vmalloc_start.\n");
+			return FALSE;
+		}
+	} else if ((SYMBOL(vmlist) != NOT_FOUND_SYMBOL)
+		   && (OFFSET(vm_struct.addr) != NOT_FOUND_STRUCTURE)) {
+		if (!readmem(VADDR, SYMBOL(vmlist), &vmlist, sizeof(vmlist))) {
+			ERRMSG("Can't get vmlist.\n");
+			return FALSE;
+		}
+		if (!readmem(VADDR, vmlist + OFFSET(vm_struct.addr), &vmalloc_start,
+			     sizeof(vmalloc_start))) {
+			ERRMSG("Can't get vmalloc_start.\n");
+			return FALSE;
+		}
+	} else {
+		/*
+		 * For the compatibility, makedumpfile should run without the symbol
+		 * vmlist and the offset of vm_struct.addr if they are not necessary.
+		 */
 		return TRUE;
 	}
-	if (!readmem(VADDR, SYMBOL(vmlist), &vmlist, sizeof(vmlist))) {
-		ERRMSG("Can't get vmlist.\n");
-		return FALSE;
-	}
-	if (!readmem(VADDR, vmlist + OFFSET(vm_struct.addr), &vmalloc_start,
-	    sizeof(vmalloc_start))) {
-		ERRMSG("Can't get vmalloc_start.\n");
-		return FALSE;
-	}
 	info->vmalloc_start = vmalloc_start;
 	DEBUG_MSG("vmalloc_start: %lx\n", vmalloc_start);
 
-- 
1.8.0.2

^ permalink raw reply related

* [PATCH 8/8] Read common partition via pstore
From: Aruna Balakrishnaiah @ 2013-04-10  7:24 UTC (permalink / raw)
  To: linuxppc-dev, paulus, linux-kernel, benh; +Cc: jkenisto, mahesh, anton
In-Reply-To: <20130410071835.20150.56489.stgit@aruna-ThinkPad-T420>

This patch exploits pstore infrastructure to read the details
from NVRAM's common partition.

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
---
 arch/powerpc/platforms/pseries/nvram.c |   17 ++++++++++++++++-
 fs/pstore/inode.c                      |    3 +++
 include/linux/pstore.h                 |    1 +
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
index b65a670..542dc7e 100644
--- a/arch/powerpc/platforms/pseries/nvram.c
+++ b/arch/powerpc/platforms/pseries/nvram.c
@@ -84,6 +84,12 @@ static struct nvram_os_partition of_config_partition = {
 	.index = -1,
 	.os_partition = false
 };
+
+static struct nvram_os_partition common_partition = {
+	.name = "common",
+	.index = -1,
+	.os_partition = false
+};
 #endif
 
 struct oops_log_info {
@@ -157,6 +163,7 @@ static enum pstore_type_id nvram_type_ids[] = {
 	PSTORE_TYPE_DMESG,
 	PSTORE_TYPE_RTAS,
 	PSTORE_TYPE_OF,
+	PSTORE_TYPE_COMMON,
 	-1
 };
 static int read_type;
@@ -770,7 +777,7 @@ static int nvram_pstore_write(enum pstore_type_id type,
 }
 
 /*
- * Reads the oops/panic report, rtas-log and of-config partition.
+ * Reads the oops/panic report, rtas-log, of-config and common partition.
  * Returns the length of the data we read from each partition.
  * Returns 0 if we've been called before.
  */
@@ -806,6 +813,14 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
 		time->tv_sec = 0;
 		time->tv_nsec = 0;
 		break;
+	case PSTORE_TYPE_COMMON:
+		sig = NVRAM_SIG_SYS;
+		part = &common_partition;
+		*type = PSTORE_TYPE_COMMON;
+		*id = PSTORE_TYPE_COMMON;
+		time->tv_sec = 0;
+		time->tv_nsec = 0;
+		break;
 	default:
 		return 0;
 	}
diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index c3d1846..11cae64 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -330,6 +330,9 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count,
 	case PSTORE_TYPE_OF:
 		sprintf(name, "of-%s-%lld", psname, id);
 		break;
+	case PSTORE_TYPE_COMMON:
+		sprintf(name, "common-%s-%lld", psname, id);
+		break;
 	case PSTORE_TYPE_UNKNOWN:
 		sprintf(name, "unknown-%s-%lld", psname, id);
 		break;
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index a23d7d2..08224c2 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -38,6 +38,7 @@ enum pstore_type_id {
 	/* PPC64 partition types */
 	PSTORE_TYPE_RTAS	= 10,
 	PSTORE_TYPE_OF		= 11,
+	PSTORE_TYPE_COMMON	= 12,
 	PSTORE_TYPE_UNKNOWN	= 255
 };
 

^ permalink raw reply related

* [PATCH 7/8] Read of-config partition via pstore
From: Aruna Balakrishnaiah @ 2013-04-10  7:24 UTC (permalink / raw)
  To: linuxppc-dev, paulus, linux-kernel, benh; +Cc: jkenisto, mahesh, anton
In-Reply-To: <20130410071835.20150.56489.stgit@aruna-ThinkPad-T420>

This patch exploits pstore infrastructure to read the details
from NVRAM's of-config partition.

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
---
 arch/powerpc/platforms/pseries/nvram.c |   58 ++++++++++++++++++++++++++------
 fs/pstore/inode.c                      |    3 ++
 include/linux/pstore.h                 |    1 +
 3 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
index 6a3a7cd..b65a670 100644
--- a/arch/powerpc/platforms/pseries/nvram.c
+++ b/arch/powerpc/platforms/pseries/nvram.c
@@ -78,6 +78,14 @@ static const char *pseries_nvram_os_partitions[] = {
 	NULL
 };
 
+#ifdef CONFIG_PSTORE
+static struct nvram_os_partition of_config_partition = {
+	.name = "of-config",
+	.index = -1,
+	.os_partition = false
+};
+#endif
+
 struct oops_log_info {
 	u16 version;
 	u16 report_length;
@@ -148,6 +156,7 @@ static size_t oops_data_sz;
 static enum pstore_type_id nvram_type_ids[] = {
 	PSTORE_TYPE_DMESG,
 	PSTORE_TYPE_RTAS,
+	PSTORE_TYPE_OF,
 	-1
 };
 static int read_type;
@@ -350,11 +359,15 @@ int nvram_read_partition(struct nvram_os_partition *part, char *buff,
 
 	tmp_index = part->index;
 
-	rc = ppc_md.nvram_read((char *)&info, sizeof(struct err_log_info), &tmp_index);
-	if (rc <= 0) {
-		printk(KERN_ERR "nvram_read_partition: "
-				"Failed nvram_read (%d)\n", rc);
-		return rc;
+	if (part->os_partition) {
+		rc = ppc_md.nvram_read((char *)&info,
+					sizeof(struct err_log_info),
+					&tmp_index);
+		if (rc <= 0) {
+			printk(KERN_ERR "nvram_read_partition: "
+					"Failed nvram_read (%d)\n", rc);
+			return rc;
+		}
 	}
 
 	rc = ppc_md.nvram_read(buff, length, &tmp_index);
@@ -364,8 +377,10 @@ int nvram_read_partition(struct nvram_os_partition *part, char *buff,
 		return rc;
 	}
 
-	*error_log_cnt = info.seq_num;
-	*err_type = info.error_type;
+	if (part->os_partition) {
+		*error_log_cnt = info.seq_num;
+		*err_type = info.error_type;
+	}
 
 	return 0;
 }
@@ -755,7 +770,7 @@ static int nvram_pstore_write(enum pstore_type_id type,
 }
 
 /*
- * Reads the oops/panic report and ibm,rtas-log partition.
+ * Reads the oops/panic report, rtas-log and of-config partition.
  * Returns the length of the data we read from each partition.
  * Returns 0 if we've been called before.
  */
@@ -764,9 +779,11 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
 				struct pstore_info *psi)
 {
 	struct oops_log_info *oops_hdr;
-	unsigned int err_type, id_no;
+	unsigned int err_type, id_no, size = 0;
 	struct nvram_os_partition *part = NULL;
 	char *buff = NULL;
+	int sig = 0;
+	loff_t p;
 
 	read_type++;
 
@@ -781,10 +798,29 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
 		time->tv_sec = last_rtas_event;
 		time->tv_nsec = 0;
 		break;
+	case PSTORE_TYPE_OF:
+		sig = NVRAM_SIG_OF;
+		part = &of_config_partition;
+		*type = PSTORE_TYPE_OF;
+		*id = PSTORE_TYPE_OF;
+		time->tv_sec = 0;
+		time->tv_nsec = 0;
+		break;
 	default:
 		return 0;
 	}
 
+	if (!part->os_partition) {
+		p = nvram_find_partition(part->name, sig, &size);
+		if (p <= 0) {
+			pr_err("nvram: Failed to find partition %s, "
+				"err %d\n", part->name, (int)p);
+			return 0;
+		}
+		part->index = p;
+		part->size = size;
+	}
+
 	buff = kmalloc(part->size, GFP_KERNEL);
 
 	if (!buff)
@@ -796,7 +832,9 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
 	}
 
 	*count = 0;
-	*id = id_no;
+
+	if (part->os_partition)
+		*id = id_no;
 
 	if (nvram_type_ids[read_type] == PSTORE_TYPE_DMESG) {
 		oops_hdr = (struct oops_log_info *)buff;
diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 59b1454..c3d1846 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -327,6 +327,9 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count,
 	case PSTORE_TYPE_RTAS:
 		sprintf(name, "rtas-%s-%lld", psname, id);
 		break;
+	case PSTORE_TYPE_OF:
+		sprintf(name, "of-%s-%lld", psname, id);
+		break;
 	case PSTORE_TYPE_UNKNOWN:
 		sprintf(name, "unknown-%s-%lld", psname, id);
 		break;
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index 4eb94c9..a23d7d2 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -37,6 +37,7 @@ enum pstore_type_id {
 	PSTORE_TYPE_FTRACE	= 3,
 	/* PPC64 partition types */
 	PSTORE_TYPE_RTAS	= 10,
+	PSTORE_TYPE_OF		= 11,
 	PSTORE_TYPE_UNKNOWN	= 255
 };
 

^ permalink raw reply related

* [PATCH 6/8] Distinguish between a os-partition and non-os partition
From: Aruna Balakrishnaiah @ 2013-04-10  7:23 UTC (permalink / raw)
  To: linuxppc-dev, paulus, linux-kernel, benh; +Cc: jkenisto, mahesh, anton
In-Reply-To: <20130410071835.20150.56489.stgit@aruna-ThinkPad-T420>

Introduce os_partition member in nvram_os_partition structure
to identify if the partition is an os partition or not. This
will be useful to handle non-os partitions of-config and
common in subsequent patches.

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
---
 arch/powerpc/platforms/pseries/nvram.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
index d420b1d..6a3a7cd 100644
--- a/arch/powerpc/platforms/pseries/nvram.c
+++ b/arch/powerpc/platforms/pseries/nvram.c
@@ -53,20 +53,23 @@ struct nvram_os_partition {
 	int min_size;	/* minimum acceptable size (0 means req_size) */
 	long size;	/* size of data portion (excluding err_log_info) */
 	long index;	/* offset of data portion of partition */
+	bool os_partition; /* partition initialized by OS, not FW */
 };
 
 static struct nvram_os_partition rtas_log_partition = {
 	.name = "ibm,rtas-log",
 	.req_size = 2079,
 	.min_size = 1055,
-	.index = -1
+	.index = -1,
+	.os_partition = true
 };
 
 static struct nvram_os_partition oops_log_partition = {
 	.name = "lnx,oops-log",
 	.req_size = 4000,
 	.min_size = 2000,
-	.index = -1
+	.index = -1,
+	.os_partition = true
 };
 
 static const char *pseries_nvram_os_partitions[] = {

^ permalink raw reply related

* [PATCH 5/8] Read rtas partition via pstore
From: Aruna Balakrishnaiah @ 2013-04-10  7:23 UTC (permalink / raw)
  To: linuxppc-dev, paulus, linux-kernel, benh; +Cc: jkenisto, mahesh, anton
In-Reply-To: <20130410071835.20150.56489.stgit@aruna-ThinkPad-T420>

This patch exploits pstore infrastructure to read the details
from NVRAM's rtas partition.

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
---
 arch/powerpc/platforms/pseries/nvram.c |   33 +++++++++++++++++++++++++-------
 fs/pstore/inode.c                      |    3 +++
 include/linux/pstore.h                 |    2 ++
 3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
index 82d32a2..d420b1d 100644
--- a/arch/powerpc/platforms/pseries/nvram.c
+++ b/arch/powerpc/platforms/pseries/nvram.c
@@ -144,9 +144,11 @@ static size_t oops_data_sz;
 #ifdef CONFIG_PSTORE
 static enum pstore_type_id nvram_type_ids[] = {
 	PSTORE_TYPE_DMESG,
+	PSTORE_TYPE_RTAS,
 	-1
 };
 static int read_type;
+static unsigned long last_rtas_event;
 #endif
 /* Compression parameters */
 #define COMPR_LEVEL 6
@@ -315,8 +317,13 @@ int nvram_write_error_log(char * buff, int length,
 {
 	int rc = nvram_write_os_partition(&rtas_log_partition, buff, length,
 						err_type, error_log_cnt);
-	if (!rc)
+	if (!rc) {
 		last_unread_rtas_event = get_seconds();
+#ifdef CONFIG_PSTORE
+		last_rtas_event = get_seconds();
+#endif
+	}
+
 	return rc;
 }
 
@@ -745,7 +752,7 @@ static int nvram_pstore_write(enum pstore_type_id type,
 }
 
 /*
- * Reads the oops/panic report.
+ * Reads the oops/panic report and ibm,rtas-log partition.
  * Returns the length of the data we read from each partition.
  * Returns 0 if we've been called before.
  */
@@ -765,6 +772,12 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
 		part = &oops_log_partition;
 		*type = PSTORE_TYPE_DMESG;
 		break;
+	case PSTORE_TYPE_RTAS:
+		part = &rtas_log_partition;
+		*type = PSTORE_TYPE_RTAS;
+		time->tv_sec = last_rtas_event;
+		time->tv_nsec = 0;
+		break;
 	default:
 		return 0;
 	}
@@ -781,11 +794,17 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
 
 	*count = 0;
 	*id = id_no;
-	oops_hdr = (struct oops_log_info *)buff;
-	*buf = buff + sizeof(*oops_hdr);
-	time->tv_sec = oops_hdr->timestamp;
-	time->tv_nsec = 0;
-	return oops_hdr->report_length;
+
+	if (nvram_type_ids[read_type] == PSTORE_TYPE_DMESG) {
+		oops_hdr = (struct oops_log_info *)buff;
+		*buf = buff + sizeof(*oops_hdr);
+		time->tv_sec = oops_hdr->timestamp;
+		time->tv_nsec = 0;
+		return oops_hdr->report_length;
+	}
+
+	*buf = buff;
+	return part->size;
 }
 #else
 static int nvram_pstore_open(struct pstore_info *psi)
diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index e4bcb2c..59b1454 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -324,6 +324,9 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count,
 	case PSTORE_TYPE_MCE:
 		sprintf(name, "mce-%s-%lld", psname, id);
 		break;
+	case PSTORE_TYPE_RTAS:
+		sprintf(name, "rtas-%s-%lld", psname, id);
+		break;
 	case PSTORE_TYPE_UNKNOWN:
 		sprintf(name, "unknown-%s-%lld", psname, id);
 		break;
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index 75d0176..4eb94c9 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -35,6 +35,8 @@ enum pstore_type_id {
 	PSTORE_TYPE_MCE		= 1,
 	PSTORE_TYPE_CONSOLE	= 2,
 	PSTORE_TYPE_FTRACE	= 3,
+	/* PPC64 partition types */
+	PSTORE_TYPE_RTAS	= 10,
 	PSTORE_TYPE_UNKNOWN	= 255
 };
 

^ permalink raw reply related

* [PATCH 4/8] Read/Write oops nvram partition via pstore
From: Aruna Balakrishnaiah @ 2013-04-10  7:23 UTC (permalink / raw)
  To: linuxppc-dev, paulus, linux-kernel, benh; +Cc: jkenisto, mahesh, anton
In-Reply-To: <20130410071835.20150.56489.stgit@aruna-ThinkPad-T420>

This patch exploits pstore infrastructure in power systems.
IBM's system p machines provide persistent storage for LPARs
through NVRAM. NVRAM's lnx,oops-log partition is used to log
oops messages. In case pstore registration fails it will
fall back to kmsg_dump mechanism.

This patch will read/write the oops messages from/to this
partition via pstore.

Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/nvram.c |  145 ++++++++++++++++++++++++++++++++
 1 file changed, 145 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
index 6701b71..82d32a2 100644
--- a/arch/powerpc/platforms/pseries/nvram.c
+++ b/arch/powerpc/platforms/pseries/nvram.c
@@ -18,6 +18,7 @@
 #include <linux/spinlock.h>
 #include <linux/slab.h>
 #include <linux/kmsg_dump.h>
+#include <linux/pstore.h>
 #include <linux/ctype.h>
 #include <linux/zlib.h>
 #include <asm/uaccess.h>
@@ -87,6 +88,25 @@ static struct kmsg_dumper nvram_kmsg_dumper = {
 	.dump = oops_to_nvram
 };
 
+static int nvram_pstore_open(struct pstore_info *psi);
+
+static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
+				int *count, struct timespec *time, char **buf,
+				struct pstore_info *psi);
+
+static int nvram_pstore_write(enum pstore_type_id type,
+				enum kmsg_dump_reason reason, u64 *id,
+				unsigned int part, int count, size_t size,
+				struct pstore_info *psi);
+
+static struct pstore_info nvram_pstore_info = {
+	.owner = THIS_MODULE,
+	.name = "nvram",
+	.open = nvram_pstore_open,
+	.read = nvram_pstore_read,
+	.write = nvram_pstore_write,
+};
+
 /* See clobbering_unread_rtas_event() */
 #define NVRAM_RTAS_READ_TIMEOUT 5		/* seconds */
 static unsigned long last_unread_rtas_event;	/* timestamp */
@@ -121,6 +141,13 @@ static char *big_oops_buf, *oops_buf;
 static char *oops_data;
 static size_t oops_data_sz;
 
+#ifdef CONFIG_PSTORE
+static enum pstore_type_id nvram_type_ids[] = {
+	PSTORE_TYPE_DMESG,
+	-1
+};
+static int read_type;
+#endif
 /* Compression parameters */
 #define COMPR_LEVEL 6
 #define WINDOW_BITS 12
@@ -455,6 +482,23 @@ static void __init nvram_init_oops_partition(int rtas_partition_exists)
 	oops_data = oops_buf + sizeof(struct oops_log_info);
 	oops_data_sz = oops_log_partition.size - sizeof(struct oops_log_info);
 
+	nvram_pstore_info.buf = oops_data;
+	nvram_pstore_info.bufsize = oops_data_sz;
+
+	rc = pstore_register(&nvram_pstore_info);
+
+	if (rc != 0) {
+		pr_err("nvram: pstore_register() failed, defaults to "
+				"kmsg_dump; returned %d\n", rc);
+		goto kmsg_dump;
+	} else {
+		/*TODO: Support compression when pstore is configured */
+		pr_info("nvram: Compression of oops text supported only when "
+				"pstore is not configured");
+		return;
+	}
+
+kmsg_dump:
 	/*
 	 * Figure compression (preceded by elimination of each line's <n>
 	 * severity prefix) will reduce the oops/panic report to at most
@@ -663,3 +707,104 @@ static void oops_to_nvram(struct kmsg_dumper *dumper,
 
 	spin_unlock_irqrestore(&lock, flags);
 }
+
+#ifdef CONFIG_PSTORE
+static int nvram_pstore_open(struct pstore_info *psi)
+{
+	read_type = -1;
+	return 0;
+}
+
+/*
+ * Called by pstore_dump() when an oops or panic report is logged to the printk
+ * buffer. @size bytes have been written to oops_buf, starting after the
+ * oops_log_info header.
+ */
+static int nvram_pstore_write(enum pstore_type_id type,
+				enum kmsg_dump_reason reason,
+				u64 *id, unsigned int part, int count,
+				size_t size, struct pstore_info *psi)
+{
+	struct oops_log_info *oops_hdr = (struct oops_log_info *) oops_buf;
+
+	/* part 1 has the recent messages from printk buffer */
+	if (part > 1 || clobbering_unread_rtas_event())
+		return -1;
+
+	BUG_ON(type != PSTORE_TYPE_DMESG);
+	BUG_ON(sizeof(*oops_hdr) + size > oops_log_partition.size);
+	oops_hdr->version = OOPS_HDR_VERSION;
+	oops_hdr->report_length = (u16) size;
+	oops_hdr->timestamp = get_seconds();
+	(void) nvram_write_os_partition(&oops_log_partition, oops_buf,
+		(int) (sizeof(*oops_hdr) + size), ERR_TYPE_KERNEL_PANIC,
+		count);
+	*id = part;
+
+	return 0;
+}
+
+/*
+ * Reads the oops/panic report.
+ * Returns the length of the data we read from each partition.
+ * Returns 0 if we've been called before.
+ */
+static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
+				int *count, struct timespec *time, char **buf,
+				struct pstore_info *psi)
+{
+	struct oops_log_info *oops_hdr;
+	unsigned int err_type, id_no;
+	struct nvram_os_partition *part = NULL;
+	char *buff = NULL;
+
+	read_type++;
+
+	switch (nvram_type_ids[read_type]) {
+	case PSTORE_TYPE_DMESG:
+		part = &oops_log_partition;
+		*type = PSTORE_TYPE_DMESG;
+		break;
+	default:
+		return 0;
+	}
+
+	buff = kmalloc(part->size, GFP_KERNEL);
+
+	if (!buff)
+		return -ENOMEM;
+
+	if (nvram_read_partition(part, buff, part->size, &err_type, &id_no)) {
+		kfree(buff);
+		return 0;
+	}
+
+	*count = 0;
+	*id = id_no;
+	oops_hdr = (struct oops_log_info *)buff;
+	*buf = buff + sizeof(*oops_hdr);
+	time->tv_sec = oops_hdr->timestamp;
+	time->tv_nsec = 0;
+	return oops_hdr->report_length;
+}
+#else
+static int nvram_pstore_open(struct pstore_info *psi)
+{
+	return 0;
+}
+
+static int nvram_pstore_write(enum pstore_type_id type,
+				enum kmsg_dump_reason reason, u64 *id,
+				unsigned int part, int count, size_t size,
+				struct pstore_info *psi)
+{
+	return 0;
+}
+
+static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
+				int *count, struct timespec *time, char **buf,
+				struct pstore_info *psi)
+{
+	return 0;
+}
+#endif

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox