LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH 3/3] edac/85xx: Enable the EDAC PCI err driver by device_initcall
From: Kumar Gala @ 2012-09-27 22:33 UTC (permalink / raw)
  To: Scott Wood
  Cc: Lan Chunhe-B25806, Wood Scott-B07421, Gala Kumar-B11780,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1348782713.18375.22@snotra>


On Sep 27, 2012, at 4:51 PM, Scott Wood wrote:

> On 09/27/2012 04:45:08 PM, Gala Kumar-B11780 wrote:
>> On Sep 27, 2012, at 11:09 AM, Scott Wood wrote:
>>> On 09/27/2012 02:02:03 PM, Chunhe Lan wrote:
>>>> Original process of call:
>>>> 	The mpc85xx_pci_err_probe function completes to been registered
>>>> 	and enabled of EDAC PCI err driver at the latter time stage of
>>>> 	kernel boot in the mpc85xx_edac.c.
>>>> Current process of call:
>>>> 	The mpc85xx_pci_err_probe function completes to been registered
>>>> 	and enabled of EDAC PCI err driver at the first	time stage of
>>>> 	kernel boot in the fsl_pci.c.
>>>> So in this case the following error messages appear in the boot log:
>>>>   PCI: Probing PCI hardware
>>>>   pci 0000:00:00.0: ignoring class b20 (doesn't match header type 01)
>>>>   PCIE error(s) detected
>>>>   PCIE ERR_DR register: 0x00020000
>>>>   PCIE ERR_CAP_STAT register: 0x80000001
>>>>   PCIE ERR_CAP_R0 register: 0x00000800
>>>>   PCIE ERR_CAP_R1 register: 0x00000000
>>>>   PCIE ERR_CAP_R2 register: 0x00000000
>>>>   PCIE ERR_CAP_R3 register: 0x00000000
>>>> Because the EDAC PCI err driver is registered and enabled earlier than
>>>> original point of call. But at this point of time, PCI hardware is not
>>>> probed and initialized, and it is in unknowable state.
>>>> So, move enable function into mpc85xx_pci_err_en which is called at the
>>>> middle time stage of kernel boot and after PCI hardware is probed and
>>>> initialized by device_initcall in the fsl_pci.c.
>>>> Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
>>>> ---
>>>> arch/powerpc/sysdev/fsl_pci.c |   12 ++++++++++
>>>> arch/powerpc/sysdev/fsl_pci.h |    5 ++++
>>>> drivers/edac/mpc85xx_edac.c   |   47 ++++++++++++++++++++++++++++------------
>>>> 3 files changed, 50 insertions(+), 14 deletions(-)
>>>> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
>>>> index 3d6f4d8..a591965 100644
>>>> --- a/arch/powerpc/sysdev/fsl_pci.c
>>>> +++ b/arch/powerpc/sysdev/fsl_pci.c
>>>> @@ -904,4 +904,16 @@ static int __init fsl_pci_init(void)
>>>> 	return platform_driver_register(&fsl_pci_driver);
>>>> }
>>>> arch_initcall(fsl_pci_init);
>>>> +
>>>> +static int __init fsl_pci_err_en(void)
>>>> +{
>>>> +	struct device_node *np;
>>>> +
>>>> +	for_each_node_by_type(np, "pci")
>>>> +		if (of_match_node(pci_ids, np))
>>>> +			mpc85xx_pci_err_en(np);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +device_initcall(fsl_pci_err_en);
>>>
>>> Why can't you call this from the normal PCIe controller init, instead of searching for the node independently?
>> Don't we have this now with mpc85xx_pci_err_probe() ??
>
> What do you mean by "this"?

I'm saying don't we replace fsl_pci_err_en() with mpc85xx_pci_err_probe()...

I need to look at this more, but not clear why mpc85xx_pci_err_en() can just be part of mpc85xx_pci_err_probe()

- k
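
[Editorial sketch] The quoted changelog depends on initcall ordering: arch_initcall() functions run before device_initcall() functions. The user-space model below only illustrates that ordering. The level numbers follow the kernel's include/linux/init.h, but the struct, the sort helper, and the idea that bus probing sits at subsys_initcall level are assumptions made for illustration, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Initcall levels, numbered as in the kernel's include/linux/init.h. */
enum initcall_level {
	LEVEL_ARCH   = 3,	/* arch_initcall()   */
	LEVEL_SUBSYS = 4,	/* subsys_initcall() */
	LEVEL_DEVICE = 6,	/* device_initcall() */
};

/* Hypothetical record for one registered initcall. */
struct initcall {
	enum initcall_level level;
	const char *name;
};

/*
 * Stable insertion sort by level: lower levels run first, and calls at
 * the same level keep their registration order.
 */
static void sort_initcalls(struct initcall *calls, size_t n)
{
	assert(calls != NULL || n == 0);

	for (size_t i = 1; i < n; i++) {
		struct initcall key = calls[i];
		size_t j = i;

		while (j > 0 && calls[j - 1].level > key.level) {
			calls[j] = calls[j - 1];
			j--;
		}
		calls[j] = key;
	}
}
```

In this model, an error-enable hook registered at device level always sorts after a controller init registered at arch level, which is the effect the patch is after.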

^ permalink raw reply

* R: Re: PCI device not working
From: Davide Viti @ 2012-09-28 14:48 UTC (permalink / raw)
  To: galak; +Cc: linuxppc-dev

Hi Kumar,

>
>It was, can you figure out in u-boot what exact config read on the bus would return the correct thing.
>
>The fact that when we probe the device at 0001:03 we should get back something like cfg_data=0xabba1b65
>
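
[Editorial sketch] The expected value cfg_data=0xabba1b65 is the first dword of PCI configuration space, which holds the vendor ID in bits 15:0 and the device ID in bits 31:16. The helper below is a hypothetical illustration of that split; its names are not taken from u-boot or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two ID fields of config dword 0. */
struct pci_ids {
	uint16_t vendor;
	uint16_t device;
};

/* Split config dword 0: vendor ID in bits 15:0, device ID in bits 31:16. */
static struct pci_ids decode_id_dword(uint32_t dword)
{
	struct pci_ids ids;

	/* All-ones conventionally means that no device answered the read. */
	assert(dword != 0xffffffffu);

	ids.vendor = (uint16_t)(dword & 0xffffu);
	ids.device = (uint16_t)(dword >> 16);
	return ids;
}
```

Decoding 0xabba1b65 this way gives vendor 0x1b65 and device 0xabba.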

Here are some details about what is going on inside u-boot; verbosity increases from [1] to [3]:

 [1] PCI printouts when the board comes up
 [2] output of the "pci [0-3] long" u-boot command
 [3] same as [1] but with a debug print inside indirect_read_config_##size() [drivers/pci/pci_indirect.c]

If you are curious about our u-boot board settings, please refer to:
http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg62007.html

Thanks a lot,
Davide



*************
*    [1]    *
*************
    PCIE1 used as Root Complex (base addr ffe09000)
               Scanning PCI bus 01
        01  00  1b65  abba  0280  00
        cfg_addr:ffe09000  cfg_data:ffe09004  indirect_type:0
    PCIE1 on bus 00 - 01

    PCIE2 used as Root Complex (base addr ffe0a000)
               Scanning PCI bus 03
        03  00  1b65  abba  0280  00
        cfg_addr:ffe0a000  cfg_data:ffe0a004  indirect_type:0
    PCIE2 on bus 02 - 03


*************
*    [2]    *
*************

=> pci 0 long
Scanning PCI devices on bus 0

Found PCI device 00.00.00:
  vendor ID =                   0x1957
  device ID =                   0x0100
  command register =            0x0006
  status register =             0x0010
  revision ID =                 0x11
  class code =                  0x0b (Processor)
  sub class code =              0x20
  programming interface =       0x00
  cache line =                  0x08
  latency time =                0x00
  header type =                 0x01
  BIST =                        0x00
  base address 0 =              0xfff00000
  base address 1 =              0x00000000
  primary bus number =          0x00
  secondary bus number =        0x01
  subordinate bus number =      0x01
  secondary latency timer =     0x00
  IO base =                     0x00
  IO limit =                    0x00
  secondary status =            0x0000
  memory base =                 0xa000
  memory limit =                0xa000
  prefetch memory base =        0x1001
  prefetch memory limit =       0x0001
  prefetch memory base upper =  0x00000000
  prefetch memory limit upper = 0x00000000
  IO base upper 16 bits =       0x0000
  IO limit upper 16 bits =      0x0000
  expansion ROM base address =  0x00000000
  interrupt line =              0x00
  interrupt pin =               0x00
  bridge control =              0x0000

=> pci 1 long
Scanning PCI devices on bus 1

Found PCI device 01.00.00:
  vendor ID =                   0x1b65
  device ID =                   0xabba
  command register =            0x0006
  status register =             0x0010
  revision ID =                 0x01
  class code =                  0x02 (Network controller)
  sub class code =              0x80
  programming interface =       0x00
  cache line =                  0x08
  latency time =                0x00
  header type =                 0x00
  BIST =                        0x00
  base address 0 =              0xa0000000
  base address 1 =              0xa0010000
  base address 2 =              0x00000000
  base address 3 =              0x00000000
  base address 4 =              0x00000000
  base address 5 =              0x00000000
  cardBus CIS pointer =         0x00000000
  sub system vendor ID =        0x0000
  sub system ID =               0x0000
  expansion ROM base address =  0x00000000
  interrupt line =              0x00
  interrupt pin =               0x01
  min Grant =                   0x00
  max Latency =                 0x00
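
[Editorial sketch] The bridge on bus 0 reports memory base 0xa000 and memory limit 0xa000, while the device behind it reports BARs at 0xa0000000 and 0xa0010000. In a type 1 (bridge) header, bits 15:4 of the Memory Base and Memory Limit registers hold address bits 31:20, so the two can be checked against each other. The helper names below are hypothetical and exist only to show the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a bridge's non-prefetchable memory window. */
struct mem_window {
	uint32_t start;	/* first address forwarded downstream */
	uint32_t end;	/* last address forwarded downstream (inclusive) */
};

/*
 * Decode the 16-bit Memory Base/Limit registers of a type 1 header.
 * Bits 15:4 of each register hold address bits 31:20; the low 20 bits
 * are implied as zeros for the base and ones for the limit.
 */
static struct mem_window decode_mem_window(uint16_t base_reg, uint16_t limit_reg)
{
	struct mem_window w;

	w.start = (uint32_t)(base_reg & 0xfff0u) << 16;
	w.end = ((uint32_t)(limit_reg & 0xfff0u) << 16) | 0x000fffffu;
	return w;
}

/* Return 1 if a BAR address lies inside the bridge window, else 0. */
static int window_contains(struct mem_window w, uint32_t bar_addr)
{
	assert(w.start <= w.end);
	return bar_addr >= w.start && bar_addr <= w.end;
}
```

With base and limit both 0xa000 the window decodes to 0xa0000000 through 0xa00fffff, and both BARs listed for device 01.00.00 fall inside it.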

=> pci 2 long
Scanning PCI devices on bus 2

Found PCI device 02.00.00:
  vendor ID =                   0x1957
  device ID =                   0x0100
  command register =            0x0006
  status register =             0x0010
  revision ID =                 0x11
  class code =                  0x0b (Processor)
  sub class code =              0x20
  programming interface =       0x00
  cache line =                  0x08
  latency time =                0x00
  header type =                 0x01
  BIST =                        0x00
  base address 0 =              0xfff00000
  base address 1 =              0x00000000
  primary bus number =          0x00
  secondary bus number =        0x01
  subordinate bus number =      0x01
  secondary latency timer =     0x00
  IO base =                     0x00
  IO limit =                    0x00
  secondary status =            0x0000
  memory base =                 0xb000
  memory limit =                0xb000
  prefetch memory base =        0x1001
  prefetch memory limit =       0x0001
  prefetch memory base upper =  0x00000000
  prefetch memory limit upper = 0x00000000
  IO base upper 16 bits =       0x0000
  IO limit upper 16 bits =      0x0000
  expansion ROM base address =  0x00000000
  interrupt line =              0x00
  interrupt pin =               0x00
  bridge control =              0x0000

=> pci 3 long
Scanning PCI devices on bus 3

Found PCI device 03.00.00:
  vendor ID =                   0x1b65
  device ID =                   0xabba
  command register =            0x0006
  status register =             0x0010
  revision ID =                 0x01
  class code =                  0x02 (Network controller)
  sub class code =              0x80
  programming interface =       0x00
  cache line =                  0x08
  latency time =                0x00
  header type =                 0x00
  BIST =                        0x00
  base address 0 =              0xb0000000
  base address 1 =              0xb0010000
  base address 2 =              0x00000000
  base address 3 =              0x00000000
  base address 4 =              0x00000000
  base address 5 =              0x00000000
  cardBus CIS pointer =         0x00000000
  sub system vendor ID =        0x0000
  sub system ID =               0x0000
  expansion ROM base address =  0x00000000
  interrupt line =              0x00
  interrupt pin =               0x01
  min Grant =                   0x00
  max Latency =                 0x00


*************
*    [3]    *
*************

    PCIE1 used as Root Complex (base addr ffe09000)
b=0 d=0 f=0 (fbusno=0 itype=0 cfg_adr=ffe09000 cfg_data=ffe09004) ofs=10 mask=0
...
               Scanning PCI bus 01
b=1 d=0 f=0 (fbusno=0 itype=0 cfg_adr=ffe09000 cfg_data=ffe09004) ofs=e mask=3
...
b=1 d=0 f=0 (fbusno=0 itype=0 cfg_adr=ffe09000 cfg_data=ffe09004) ofs=3c mask=3
        01  00  1b65  abba  0280  00
b=1 d=1 f=0 (fbusno=0 itype=0 cfg_adr=ffe09000 cfg_data=ffe09004) ofs=e mask=3
b=1 d=1 f=0 (fbusno=0 itype=0 cfg_adr=ffe09000 cfg_data=ffe09004) ofs=0 mask=2
...
b=0 d=0 f=0 (fbusno=0 itype=0 cfg_adr=ffe09000 cfg_data=ffe09004) ofs=9 mask=3
    PCIE1 on bus 00 - 01

    PCIE2 used as Root Complex (base addr ffe0a000)
b=0 d=0 f=0 (fbusno=2 itype=0 cfg_adr=ffe0a000 cfg_data=ffe0a004) ofs=10 mask=0
b=0 d=0 f=0 (fbusno=2 itype=0 cfg_adr=ffe0a000 cfg_data=ffe0a004) ofs=10 mask=0
...

b=0 d=0 f=0 (fbusno=2 itype=0 cfg_adr=ffe0a000 cfg_data=ffe0a004) ofs=9 mask=3

               Scanning PCI bus 03
b=1 d=0 f=0 (fbusno=2 itype=0 cfg_adr=ffe0a000 cfg_data=ffe0a004) ofs=e mask=3
b=1 d=0 f=0 (fbusno=2 itype=0 cfg_adr=ffe0a000 cfg_data=ffe0a004) ofs=0 mask=2
...
b=1 d=0 f=0 (fbusno=2 itype=0 cfg_adr=ffe0a000 cfg_data=ffe0a004) ofs=3c mask=3
        03  00  1b65  abba  0280  00
        cfg_addr:ffe0a000  cfg_data:ffe0a004  indirect_type:0

b=1 d=1 f=0 (fbusno=2 itype=0 cfg_adr=ffe0a000 cfg_data=ffe0a004) ofs=e mask=3

...
b=0 d=0 f=0 (fbusno=2 itype=0 cfg_adr=ffe0a000 cfg_data=ffe0a004) ofs=9 mask=3
    PCIE2 on bus 02 - 03
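
[Editorial sketch] Each trace line above corresponds to one indirect config access: an address word is written to cfg_adr and the result is read from cfg_data. The sketch below assumes the common indirect scheme (enable bit 31, bus in bits 23:16, device in bits 15:11, function in bits 10:8, dword-aligned offset in bits 7:2) and assumes that "mask" selects the byte lane within the data register. Both assumptions are inferred from the trace and from typical indirect PCI code; the function names are hypothetical and the thread does not confirm the exact u-boot implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build the word written to the config address register, assuming the
 * layout: bit 31 enable, bus 23:16, device 15:11, function 10:8,
 * dword-aligned register offset 7:2.
 */
static uint32_t indirect_cfg_addr(unsigned bus, unsigned dev, unsigned fn,
				  unsigned ofs)
{
	assert(bus < 256 && dev < 32 && fn < 8 && ofs < 256);

	return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)dev << 11) |
	       ((uint32_t)fn << 8) | (ofs & 0xfcu);
}

/*
 * Byte lane inside the 32-bit data register. Assumed convention:
 * mask 3 for byte reads, 2 for 16-bit reads, 0 for 32-bit reads.
 */
static unsigned cfg_data_lane(unsigned ofs, unsigned mask)
{
	assert(mask == 0 || mask == 2 || mask == 3);
	return ofs & mask;
}
```

Under these assumptions, the line "b=1 d=0 f=0 ... ofs=3c mask=3" would write 0x8001003c to the address register and read one byte at lane 0 of the data register.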




^ permalink raw reply

* Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC
From: J. Bruce Fields @ 2012-09-28 15:10 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-nfs, Jan Kara, Linus Torvalds, LKML List, skinsbursky,
	bfields, linuxppc-dev
In-Reply-To: <DC545CD9-8745-47DD-B13B-3385C0EB5B27@suse.de>

On Fri, Sep 28, 2012 at 04:19:55AM +0200, Alexander Graf wrote:
> 
> On 28.09.2012, at 04:04, Linus Torvalds wrote:
> 
> > On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf <agraf@suse.de> wrote:
> >> 
> >> Below are OOPS excerpts from different rc's I tried. All of them crashed - all the way up to current Linus' master branch. I haven't cross-checked, but I don't remember any such behavior from pre-3.6 releases.
> > 
> > Since you seem to be able to reproduce it easily (and apparently
> > reliably), any chance you could just bisect it?
> > 
> > Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple
> > 
> >   git bisect start
> >   git bisect good v3.5
> >   git bisect bad v3.6-rc1
> > 
> > will get you started on your adventure..
> 
> Heh, will give it a try :). The thing really does look quite bisectable.
> 
> 
> It might take a few hours though - the machine isn't exactly fast by today's standards and it's getting late here. But I'll keep you updated.

I doubt it's anything special about that workload, but just for kicks I
tried a "git clone -ls" (cloning my linux tree to another directory on
the same nfs filesystem), with server on 3.6.0-rc7, and didn't see
anything interesting (just an xfs lockdep warning that looks like this
one jlayton already reported:
http://oss.sgi.com/archives/xfs/2012-09/msg00088.html
)

Any (even partial) bisection results would certainly be useful, thanks.

--b.

^ permalink raw reply

* Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC
From: Alexander Graf @ 2012-09-28 15:34 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-nfs, Jan Kara, Linus Torvalds, LKML List, skinsbursky,
	bfields, linuxppc-dev
In-Reply-To: <20120928151043.GA19102@fieldses.org>


On 28.09.2012, at 17:10, J. Bruce Fields wrote:

> On Fri, Sep 28, 2012 at 04:19:55AM +0200, Alexander Graf wrote:
>>
>> On 28.09.2012, at 04:04, Linus Torvalds wrote:
>>
>>> On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf <agraf@suse.de> wrote:
>>>>
>>>> Below are OOPS excerpts from different rc's I tried. All of them crashed - all the way up to current Linus' master branch. I haven't cross-checked, but I don't remember any such behavior from pre-3.6 releases.
>>>
>>> Since you seem to be able to reproduce it easily (and apparently
>>> reliably), any chance you could just bisect it?
>>>
>>> Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple
>>>
>>>  git bisect start
>>>  git bisect good v3.5
>>>  git bisect bad v3.6-rc1
>>>
>>> will get you started on your adventure..
>>
>> Heh, will give it a try :). The thing really does look quite bisectable.
>>
>>
>> It might take a few hours though - the machine isn't exactly fast by today's standards and it's getting late here. But I'll keep you updated.
>
> I doubt it's anything special about that workload, but just for kicks I
> tried a "git clone -ls" (cloning my linux tree to another directory on
> the same nfs filesystem), with server on 3.6.0-rc7, and didn't see
> anything interesting (just an xfs lockdep warning that looks like this
> one jlayton already reported:
> http://oss.sgi.com/archives/xfs/2012-09/msg00088.html
> )
>
> Any (even partial) bisection results would certainly be useful, thanks.

Yeah, still trying. Running the same workload in a PPC VM didn't show any badness. Then I tried again to bisect on the machine it broke on, and that commit failed even more badly on me than the previous ones, destroying my local git tree.

Trying to narrow down now in a slightly more contained environment :).


Alex

^ permalink raw reply

* Re: [PATCH 3/3] edac/85xx: Enable the EDAC PCI err driver by device_initcall
From: Scott Wood @ 2012-09-28 17:35 UTC (permalink / raw)
  To: Kumar Gala
  Cc: Lan Chunhe-B25806, Wood Scott-B07421, Gala Kumar-B11780,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <8C412487-0A6A-4C2E-9DE5-05B141201653@kernel.crashing.org>

On 09/27/2012 05:33:26 PM, Kumar Gala wrote:
>
> On Sep 27, 2012, at 4:51 PM, Scott Wood wrote:
>
> > On 09/27/2012 04:45:08 PM, Gala Kumar-B11780 wrote:
> >> On Sep 27, 2012, at 11:09 AM, Scott Wood wrote:
> >>> On 09/27/2012 02:02:03 PM, Chunhe Lan wrote:
> >>>> Original process of call:
> >>>> 	The mpc85xx_pci_err_probe function completes to been registered
> >>>> 	and enabled of EDAC PCI err driver at the latter time stage of
> >>>> 	kernel boot in the mpc85xx_edac.c.
> >>>> Current process of call:
> >>>> 	The mpc85xx_pci_err_probe function completes to been registered
> >>>> 	and enabled of EDAC PCI err driver at the first	time stage of
> >>>> 	kernel boot in the fsl_pci.c.
> >>>> So in this case the following error messages appear in the boot log:
> >>>>   PCI: Probing PCI hardware
> >>>>   pci 0000:00:00.0: ignoring class b20 (doesn't match header type 01)
> >>>>   PCIE error(s) detected
> >>>>   PCIE ERR_DR register: 0x00020000
> >>>>   PCIE ERR_CAP_STAT register: 0x80000001
> >>>>   PCIE ERR_CAP_R0 register: 0x00000800
> >>>>   PCIE ERR_CAP_R1 register: 0x00000000
> >>>>   PCIE ERR_CAP_R2 register: 0x00000000
> >>>>   PCIE ERR_CAP_R3 register: 0x00000000
> >>>> Because the EDAC PCI err driver is registered and enabled earlier than
> >>>> original point of call. But at this point of time, PCI hardware is not
> >>>> probed and initialized, and it is in unknowable state.
> >>>> So, move enable function into mpc85xx_pci_err_en which is called at the
> >>>> middle time stage of kernel boot and after PCI hardware is probed and
> >>>> initialized by device_initcall in the fsl_pci.c.
> >>>> Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
> >>>> ---
> >>>> arch/powerpc/sysdev/fsl_pci.c |   12 ++++++++++
> >>>> arch/powerpc/sysdev/fsl_pci.h |    5 ++++
> >>>> drivers/edac/mpc85xx_edac.c   |   47 ++++++++++++++++++++++++++++------------
> >>>> 3 files changed, 50 insertions(+), 14 deletions(-)
> >>>> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> >>>> index 3d6f4d8..a591965 100644
> >>>> --- a/arch/powerpc/sysdev/fsl_pci.c
> >>>> +++ b/arch/powerpc/sysdev/fsl_pci.c
> >>>> @@ -904,4 +904,16 @@ static int __init fsl_pci_init(void)
> >>>> 	return platform_driver_register(&fsl_pci_driver);
> >>>> }
> >>>> arch_initcall(fsl_pci_init);
> >>>> +
> >>>> +static int __init fsl_pci_err_en(void)
> >>>> +{
> >>>> +	struct device_node *np;
> >>>> +
> >>>> +	for_each_node_by_type(np, "pci")
> >>>> +		if (of_match_node(pci_ids, np))
> >>>> +			mpc85xx_pci_err_en(np);
> >>>> +
> >>>> +	return 0;
> >>>> +}
> >>>> +device_initcall(fsl_pci_err_en);
> >>>
> >>> Why can't you call this from the normal PCIe controller init, instead of searching for the node independently?
> >> Don't we have this now with mpc85xx_pci_err_probe() ??
> >
> > What do you mean by "this"?
>
> I'm saying don't we replace fsl_pci_err_en() with mpc85xx_pci_err_probe()...
>
> I need to look at this more, but not clear why mpc85xx_pci_err_en() can just be part of mpc85xx_pci_err_probe()

OK, I was confused -- I thought the point was to make it happen earlier, not later.  The changelog is not clear at all.

Don't we want to be able to capture errors that happen during PCI driver initialization, though?

-Scott

^ permalink raw reply

* Re: [PATCH 2/6] powerpc: Add enable_ppr kernel parameter to enable PPR save/restore
From: Ryan Arnold @ 2012-09-28 22:11 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Neuling, Adhemerval Zanella, sjmunroe, paulus, anton,
	linuxppc-dev, Haren Myneni
In-Reply-To: <1347342911.2603.39.camel@pasglop>

On Tue, 2012-09-11 at 15:55 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2012-09-10 at 22:42 -0700, Haren Myneni wrote:
> > 
> > Thanks Michael. Yes, we noticed 6% overhead with null syscall test.
> > Hence added cmdline option as suggested. I will add this comment in
> > the
> > changelog.
> > 
> > Regarding the option name, I thought about various ones such as
> > retain_process_ppr, retain_smt_priority, save_ppr and etc. Finally
> > added
> > 'enable_ppr' since it enables CPU_FTR (CPU_FTR_HAS_PPR) which allows
> > to
> > save/restore PPR value. Sure, I will change this option.
> 
> No, that isn't a problem with the name. It's a problem with the polarity
> of the option.
> 
> If you need a command line argument to enable the option, then nobody
> will enable it, it's pointless.

In GLIBC (ppc.h) we'll be providing a user space API to change the
thread priority in user state.  We're also interested in using this in
some of the locking constructs if performance tests indicate it's
beneficial.

I have concerns with being able to enable/disable this option at boot
time.  Usually, in GLIBC we'll just do a kernel version check and enable
certain facilities if we're building against a particular kernel that
supports them.

In this case, with a configurable option, GLIBC is going to need the
kernel to export a hwcap bit that tells us whether we need to do the
save/restore ourselves.  Having to check the hwcap, and do the
save/restore in user space will, of course, increase the overhead on our
side.

If no hwcap bit is provided and this is disabled at kernel boot time, no
check is done and the user process assumes it's running under a certain
priority when it is, in fact, not.  I don't care for this option.  We'll
be hitting code paths that are ineffective and unnecessary.
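
[Editorial sketch] The three cases described above (a hwcap bit that is set, a hwcap bit that is clear, and no hwcap bit at all) can be written as a small decision function. The bit name and value below are purely hypothetical; no such hwcap bit is defined anywhere in this thread, and this is not GLIBC code.

```c
#include <assert.h>

/* Hypothetical hwcap bit: no such bit is defined in the thread. */
#define HYPOTHETICAL_HWCAP_PPR_SAVED (1UL << 0)

enum ppr_policy {
	PPR_TRUST_KERNEL,	/* kernel saves/restores PPR for the thread */
	PPR_SAVE_IN_USERSPACE,	/* library must save/restore PPR itself */
	PPR_CANNOT_TELL,	/* no hwcap bit exists, so nothing to check */
};

/*
 * Decide what a user-space library could do, given whether a hwcap bit
 * for PPR save/restore exists at all and the hwcap word it observed.
 */
static enum ppr_policy choose_ppr_policy(int hwcap_bit_defined,
					 unsigned long hwcap)
{
	assert(hwcap_bit_defined == 0 || hwcap_bit_defined == 1);

	if (!hwcap_bit_defined)
		return PPR_CANNOT_TELL;
	if (hwcap & HYPOTHETICAL_HWCAP_PPR_SAVED)
		return PPR_TRUST_KERNEL;
	return PPR_SAVE_IN_USERSPACE;
}
```

The PPR_CANNOT_TELL case is the one objected to above: the process would assume a priority that the kernel may not be preserving.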

Ryan S. Arnold
Linux Technology Center

^ permalink raw reply

* Re: [RFC v9 PATCH 01/21] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()
From: KOSAKI Motohiro @ 2012-09-28 22:15 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: linux-s390, linux-ia64, wency, linux-acpi, linux-sh, len.brown,
	x86, Ni zhan Chen, linux-kernel, cmetcalf, linux-mm, paulus,
	minchan.kim, rientjes, sparclinux, cl, linuxppc-dev, akpm, liuj97
In-Reply-To: <50651E68.3040208@jp.fujitsu.com>

On Thu, Sep 27, 2012 at 11:50 PM, Yasuaki Ishimatsu
<isimatu.yasuaki@jp.fujitsu.com> wrote:
> Hi Chen,
>
>
> 2012/09/28 11:22, Ni zhan Chen wrote:
>>
>> On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
>>>
>>> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>>
>>> remove_memory() only try to offline pages. It is called in two cases:
>>> 1. hot remove a memory device
>>> 2. echo offline >/sys/devices/system/memory/memoryXX/state
>>>
>>> In the 1st case, we should also change memory block's state, and notify
>>> the userspace that the memory block's state is changed after offlining
>>> pages.
>>>
>>> So rename remove_memory() to offline_memory()/offline_pages(). And in
>>> the 1st case, offline_memory() will be used. The function
>>> offline_memory()
>>> is not implemented. In the 2nd case, offline_pages() will be used.
>>
>>
>> But this time there is not a function associated with add_memory.
>
>
> To associate with add_memory() later, we renamed it.

Then, you introduced bisect breakage. It is definitely unacceptable.

NAK.

^ permalink raw reply

* Re: [RFC v9 PATCH 13/21] memory-hotplug: check page type in get_page_bootmem
From: Ni zhan Chen @ 2012-09-29  2:15 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: linux-s390, linux-ia64, Wen Congyang, len.brown, linux-acpi,
	linux-sh, x86, linux-kernel, cmetcalf, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, sparclinux, cl,
	linuxppc-dev, akpm, liuj97
In-Reply-To: <1346837155-534-14-git-send-email-wency@cn.fujitsu.com>

On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>
> The function get_page_bootmem() may be called more than one time to the same
> page. There is no need to set page's type, private if the function is not
> the first time called to the page.
>
> Note: the patch is just optimization and does not fix any problem.

Hi Yasuaki,

This patch looks reasonable to me. I have another question related to
get_page_bootmem(). It comes from the changelog of another Fujitsu
developer's patch [commit 04753278769f3], which says:

  1) When the memmap of removing section is allocated on other
      section by bootmem, it should/can be free.
  2) When the memmap of removing section is allocated on the
      same section, it shouldn't be freed. Because the section has to be
      logical memory offlined already and all pages must be isolated against
      page allocator. If it is freed, page allocator may use it which will
      be removed physically soon.

However, I don't see how his patch guarantees 2). That is, his patch does
not seem to guarantee that the memmap of the section being removed, which
is allocated on other section by bootmem, is not freed. I hope you can
explain this in detail. Thanks in advance. :-)

>
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> ---
>   mm/memory_hotplug.c |   15 +++++++++++----
>   1 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index d736df3..26a5012 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
>   static void get_page_bootmem(unsigned long info,  struct page *page,
>   			     unsigned long type)
>   {
> -	page->lru.next = (struct list_head *) type;
> -	SetPagePrivate(page);
> -	set_page_private(page, info);
> -	atomic_inc(&page->_count);
> +	unsigned long page_type;
> +
> +	page_type = (unsigned long)page->lru.next;
> +	if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
> +	    page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
> +		page->lru.next = (struct list_head *)type;
> +		SetPagePrivate(page);
> +		set_page_private(page, info);
> +		atomic_inc(&page->_count);
> +	} else
> +		atomic_inc(&page->_count);
>   }
>   
>   /* reference to __meminit __free_pages_bootmem is valid
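
[Editorial sketch] The behaviour of the patched get_page_bootmem() can be modelled in user space with a toy page structure: the type and private data are recorded only on the first call for a page, while the reference count is taken on every call. The struct, the function name, and the numeric type range are assumptions for illustration (the range is meant to mirror the enum in include/linux/memory_hotplug.h), not kernel code.

```c
#include <assert.h>

/* Assumed values, meant to mirror include/linux/memory_hotplug.h. */
enum {
	MIN_BOOTMEM_TYPE = 12,
	MAX_BOOTMEM_TYPE = 14,
};

/* Toy stand-in for struct page: only the fields the patch touches. */
struct toy_page {
	unsigned long type;	/* kernel stores this in page->lru.next */
	unsigned long info;	/* kernel stores this via set_page_private() */
	int has_private;	/* models SetPagePrivate() */
	int count;		/* models the atomic page->_count */
};

static void toy_get_page_bootmem(unsigned long info, struct toy_page *page,
				 unsigned long type)
{
	assert(type >= MIN_BOOTMEM_TYPE && type <= MAX_BOOTMEM_TYPE);

	/* First call for this page: stored type is not yet a bootmem type. */
	if (page->type < MIN_BOOTMEM_TYPE || page->type > MAX_BOOTMEM_TYPE) {
		page->type = type;
		page->has_private = 1;
		page->info = info;
	}

	/* Every call, first or repeated, takes one reference. */
	page->count++;
}
```

Written this way, the increment is shared by both paths, which is equivalent to the if/else in the quoted diff where each branch calls atomic_inc().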

^ permalink raw reply

* Re: [PATCH 3/3] edac/85xx: Enable the EDAC PCI err driver by device_initcall
From: Chunhe Lan @ 2012-09-29 14:42 UTC (permalink / raw)
  To: Scott Wood
  Cc: Wood Scott-B07421, Gala Kumar-B11780,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1348853717.5580.5@snotra>

On 09/28/2012 01:35 PM, Scott Wood wrote:
> On 09/27/2012 05:33:26 PM, Kumar Gala wrote:
>>
>> On Sep 27, 2012, at 4:51 PM, Scott Wood wrote:
>>
>> > On 09/27/2012 04:45:08 PM, Gala Kumar-B11780 wrote:
>> >> On Sep 27, 2012, at 11:09 AM, Scott Wood wrote:
>> >>> On 09/27/2012 02:02:03 PM, Chunhe Lan wrote:
>> >>>> Original process of call:
>> >>>>     The mpc85xx_pci_err_probe function completes to been registered
>> >>>>     and enabled of EDAC PCI err driver at the latter time stage of
>> >>>>     kernel boot in the mpc85xx_edac.c.
>> >>>> Current process of call:
>> >>>>     The mpc85xx_pci_err_probe function completes to been registered
>> >>>>     and enabled of EDAC PCI err driver at the first time stage of
>> >>>>     kernel boot in the fsl_pci.c.
>> >>>> So in this case the following error messages appear in the boot log:
>> >>>>   PCI: Probing PCI hardware
>> >>>>   pci 0000:00:00.0: ignoring class b20 (doesn't match header type 01)
>> >>>>   PCIE error(s) detected
>> >>>>   PCIE ERR_DR register: 0x00020000
>> >>>>   PCIE ERR_CAP_STAT register: 0x80000001
>> >>>>   PCIE ERR_CAP_R0 register: 0x00000800
>> >>>>   PCIE ERR_CAP_R1 register: 0x00000000
>> >>>>   PCIE ERR_CAP_R2 register: 0x00000000
>> >>>>   PCIE ERR_CAP_R3 register: 0x00000000
>> >>>> Because the EDAC PCI err driver is registered and enabled earlier than
>> >>>> original point of call. But at this point of time, PCI hardware is not
>> >>>> probed and initialized, and it is in unknowable state.
>> >>>> So, move enable function into mpc85xx_pci_err_en which is called at the
>> >>>> middle time stage of kernel boot and after PCI hardware is probed and
>> >>>> initialized by device_initcall in the fsl_pci.c.
>> >>>> Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
>> >>>> ---
>> >>>> arch/powerpc/sysdev/fsl_pci.c |   12 ++++++++++
>> >>>> arch/powerpc/sysdev/fsl_pci.h |    5 ++++
>> >>>> drivers/edac/mpc85xx_edac.c   |   47 ++++++++++++++++++++++++++++------------
>> >>>> 3 files changed, 50 insertions(+), 14 deletions(-)
>> >>>> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
>> >>>> index 3d6f4d8..a591965 100644
>> >>>> --- a/arch/powerpc/sysdev/fsl_pci.c
>> >>>> +++ b/arch/powerpc/sysdev/fsl_pci.c
>> >>>> @@ -904,4 +904,16 @@ static int __init fsl_pci_init(void)
>> >>>>     return platform_driver_register(&fsl_pci_driver);
>> >>>> }
>> >>>> arch_initcall(fsl_pci_init);
>> >>>> +
>> >>>> +static int __init fsl_pci_err_en(void)
>> >>>> +{
>> >>>> +    struct device_node *np;
>> >>>> +
>> >>>> +    for_each_node_by_type(np, "pci")
>> >>>> +        if (of_match_node(pci_ids, np))
>> >>>> +            mpc85xx_pci_err_en(np);
>> >>>> +
>> >>>> +    return 0;
>> >>>> +}
>> >>>> +device_initcall(fsl_pci_err_en);
>> >>>
>> >>> Why can't you call this from the normal PCIe controller init, instead of searching for the node independently?
>> >> Don't we have this now with mpc85xx_pci_err_probe() ??
>> >
>> > What do you mean by "this"?
>>
>> I'm saying don't we replace fsl_pci_err_en() with mpc85xx_pci_err_probe()...
>>
>> I need to look at this more, but not clear why mpc85xx_pci_err_en() can just be part of mpc85xx_pci_err_probe()
>
> OK, I was confused -- I thought the point was to make it happen 
> earlier, not later.  The changelog is not clear at all.
>
> Don't we want to be able to capture errors that happen during PCI 
> driver initialization, though?
     Yes.
     When the PCI controller probes a slot that has no device in it, invalid address errors occur.
     The edac driver then prints many error messages. This is normal behavior, but it is ugly.
     So, move the enabling of the edac driver to later, and only detect the errors of the PCI operations that follow.

    Thanks,
    Chunhe
>
> -Scott

^ permalink raw reply

* Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
From: Ni zhan Chen @ 2012-09-29  3:45 UTC (permalink / raw)
  To: wency
  Cc: linux-s390, linux-ia64, len.brown, linux-acpi, linux-sh, x86,
	linux-kernel, cmetcalf, linux-mm, isimatu.yasuaki, paulus,
	minchan.kim, kosaki.motohiro, rientjes, sparclinux, cl,
	linuxppc-dev, akpm, liuj97
In-Reply-To: <1346837155-534-1-git-send-email-wency@cn.fujitsu.com>

On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
>
> This patch series aims to support physical memory hot-remove.
>
> The patches can free/remove the following things:
>
>    - acpi_memory_info                          : [RFC PATCH 4/19]
>    - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>    - iomem_resource                            : [RFC PATCH 9/19]
>    - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>    - page table of removed memory              : [RFC PATCH 12/19]
>    - node and related sysfs files              : [RFC PATCH 18-19/19]
>
> If you find any functionality missing for physical memory hot-remove, please
> let me know.

Since the patchset is so big, could you add more detail to the patchset
changelog describing how it works? That would make it easier to review.

>
> How to test this patchset?
> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
>     ACPI_HOTPLUG_MEMORY must be selected.
> 2. load the module acpi_memhotplug
> 3. hotplug the memory device(it depends on your hardware)
>     You will see the memory device under the directory /sys/bus/acpi/devices/.
>     Its name is PNP0C80:XX.
> 4. online/offline pages provided by this memory device
>     You can write online/offline to /sys/devices/system/memory/memoryX/state to
>     online/offline pages provided by this memory device
> 5. hotremove the memory device
>     You can hotremove the memory device by the hardware, or writing 1 to
>     /sys/bus/acpi/devices/PNP0C80:XX/eject.
>
> Note: if the memory provided by the memory device is used by the kernel, it
> can't be offlined. It is not a bug.
>
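The online/offline step in the test instructions above can be sketched in
userspace C. This is a minimal sketch and not part of the patchset:
memory_state_path() and write_state() are hypothetical helper names, and
only the path /sys/devices/system/memory/memoryX/state is taken from the
instructions. Writing to the file needs root and a kernel built with the
options listed in step 1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/devices/system/memory/memory<N>/state" for memory block N.
 * Returns 0 on success, -1 if the buffer is too small. */
int memory_state_path(char *buf, size_t len, unsigned int block)
{
	assert(buf != NULL);
	int n = snprintf(buf, len,
			 "/sys/devices/system/memory/memory%u/state", block);
	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Write "online" or "offline" to a state file. Returns 0 on success. */
int write_state(const char *path, const char *state)
{
	assert(path != NULL && state != NULL);
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fputs(state, f) >= 0;
	/* The write reaches the kernel when the stream is flushed, so a
	 * refused request (e.g. pages still in use) can surface here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would build the path for one block and then call
write_state(path, "offline"), treating a -1 return as "this block could not
be offlined right now".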
> Known problems:
> 1. memory can't be offlined when CONFIG_MEMCG is selected.
>     For example: there is a memory device on node 1. The address range
>     is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
>     and memory11 under the directory /sys/devices/system/memory/.
>     If CONFIG_MEMCG is selected, we allocate memory to store the page cgroup
>     when we online pages. When we online memory8, the memory that stores its
>     page cgroup is not provided by this memory device. But when we online
>     memory9, the memory that stores its page cgroup may be provided by memory8,
>     so memory8 can no longer be offlined first. The memory should be offlined
>     in the reverse of the order in which it was onlined.
>     When the memory device is hotremoved, we automatically offline the memory
>     provided by this memory device. But we don't know which memory was onlined
>     first, so offlining may fail. In that case, offline the memory by hand
>     before hotremoving the memory device.
> 2. hotremoving a memory device may cause a kernel panic
>     This bug will be fixed by Liu Jiang's patch:
>     https://lkml.org/lkml/2012/7/3/1
>
> change log of v9:
>   [RFC PATCH v9 8/21]
>     * add a lock to protect the list map_entries
>     * add an indicator to firmware_map_entry to remember whether the memory
>       is allocated from bootmem
>   [RFC PATCH v9 10/21]
>     * change the macro to inline function
>   [RFC PATCH v9 19/21]
>     * don't offline the node if the cpu on the node is onlined
>   [RFC PATCH v9 21/21]
>     * create new patch: auto offline page_cgroup when onlining memory block
>       failed
>
> change log of v8:
>   [RFC PATCH v8 17/20]
>     * Fix problems when one node's range include the other nodes
>   [RFC PATCH v8 18/20]
>     * fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
>       is not defined.
>   [RFC PATCH v8 19/20]
>     * don't offline node when some memory sections are not removed
>   [RFC PATCH v8 20/20]
>     * create new patch: clear hwpoisoned flag when onlining pages
>
> change log of v7:
>   [RFC PATCH v7 4/19]
>     * do not continue if acpi_memory_device_remove_memory() fails.
>   [RFC PATCH v7 15/19]
>     * handle usemap in register_page_bootmem_info_section() too.
>
> change log of v6:
>   [RFC PATCH v6 12/19]
>     * fix build error on architectures other than x86
>
>   [RFC PATCH v6 15-16/19]
>     * fix build error on architectures other than x86
>
> change log of v5:
>   * merge the patchset to clear page table and the patchset to hot remove
>     memory(from ishimatsu) to one big patchset.
>
>   [RFC PATCH v5 1/19]
>     * rename remove_memory() to offline_memory()/offline_pages()
>
>   [RFC PATCH v5 2/19]
>     * new patch: implement offline_memory(). This function offlines pages,
>       updates the memory block's state, and notifies userspace that the
>       memory block's state has changed.
>
>   [RFC PATCH v5 4/19]
>     * offline and remove memory in acpi_memory_disable_device() too.
>
>   [RFC PATCH v5 17/19]
>     * new patch: add a new function __remove_zone() to revert the things done
>       in the function __add_zone().
>
>   [RFC PATCH v5 18/19]
>     * flush work before resetting node device.
>
> change log of v4:
>   * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
>     from the patch series, since the patch is a bugfix. It is being discussed
>     in another thread. But the patch is still needed to test the series,
>     so I added it as [PATCH 0/13].
>
>   [RFC PATCH v4 2/13]
>     * check memory is online or not at remove_memory()
>     * add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
>       getting node id
>   
>   [RFC PATCH v4 3/13]
>     * create new patch : check memory is online or not at online_pages()
>
>   [RFC PATCH v4 4/13]
>     * add __ref section to remove_memory()
>     * call firmware_map_remove_entry() before remove_sysfs_fw_map_entry()
>
>   [RFC PATCH v4 11/13]
>     * rewrite register_page_bootmem_memmap() for removing page used as PT/PMD
>
> change log of v3:
>   * rebase to 3.5.0-rc6
>
>   [RFC PATCH v2 2/13]
>     * remove extra kobject_put()
>
>     * Wen commented on this patch:
>       "acpi_memory_device_remove() should ignore a return value of
>       remove_memory() since caller does not care the return value".
>       But I did not change it, since I think the caller should care about the
>       return value. I am trying to fix it as follows:
>
>       https://lkml.org/lkml/2012/7/5/624
>
>   [RFC PATCH v2 4/13]
>     * remove a firmware_memmap_entry allocated by kzalloc()
>
> change log of v2:
>   [RFC PATCH v2 2/13]
>     * check whether memory block is offline or not before calling offline_memory()
>     * check whether section is valid or not in is_memblk_offline()
>     * call kobject_put() for each memory_block in is_memblk_offline()
>
>   [RFC PATCH v2 3/13]
>     * unify the end argument of firmware_map_add_early/hotplug
>
>   [RFC PATCH v2 4/13]
>     * add release_firmware_map_entry() for freeing firmware_map_entry
>
>   [RFC PATCH v2 6/13]
>    * add release_memory_block() for freeing memory_block
>
>   [RFC PATCH v2 11/13]
>    * fix wrong arguments of free_pages()
>
>
> Wen Congyang (8):
>    memory-hotplug: implement offline_memory()
>    memory-hotplug: store the node id in acpi_memory_device
>    memory-hotplug: export the function acpi_bus_remove()
>    memory-hotplug: call acpi_bus_remove() to remove memory device
>    memory-hotplug: introduce new function arch_remove_memory()
>    memory-hotplug: remove sysfs file of node
>    memory-hotplug: clear hwpoisoned flag when onlining pages
>    memory-hotplug: auto offline page_cgroup when onlining memory block
>      failed
>
> Yasuaki Ishimatsu (13):
>    memory-hotplug: rename remove_memory() to
>      offline_memory()/offline_pages()
>    memory-hotplug: offline and remove memory when removing the memory
>      device
>    memory-hotplug: check whether memory is present or not
>    memory-hotplug: remove /sys/firmware/memmap/X sysfs
>    memory-hotplug: does not release memory region in PAGES_PER_SECTION
>      chunks
>    memory-hotplug: add memory_block_release
>    memory-hotplug: remove_memory calls __remove_pages
>    memory-hotplug: check page type in get_page_bootmem
>    memory-hotplug: move register_page_bootmem_info_node and
>      put_page_bootmem for sparse-vmemmap
>    memory-hotplug: implement register_page_bootmem_info_section of
>      sparse-vmemmap
>    memory-hotplug: free memmap of sparse-vmemmap
>    memory_hotplug: clear zone when the memory is removed
>    memory-hotplug: add node_device_release
>
>   arch/ia64/mm/discontig.c                        |   14 +
>   arch/ia64/mm/init.c                             |   16 +
>   arch/powerpc/mm/init_64.c                       |   14 +
>   arch/powerpc/mm/mem.c                           |   14 +
>   arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
>   arch/s390/mm/init.c                             |   12 +
>   arch/s390/mm/vmem.c                             |   14 +
>   arch/sh/mm/init.c                               |   15 +
>   arch/sparc/mm/init_64.c                         |   14 +
>   arch/tile/mm/init.c                             |    8 +
>   arch/x86/include/asm/pgtable_types.h            |    1 +
>   arch/x86/mm/init_32.c                           |   10 +
>   arch/x86/mm/init_64.c                           |  331 ++++++++++++++++++
>   arch/x86/mm/pageattr.c                          |   47 ++--
>   drivers/acpi/acpi_memhotplug.c                  |   54 +++-
>   drivers/acpi/scan.c                             |    3 +-
>   drivers/base/memory.c                           |   88 ++++-
>   drivers/base/node.c                             |   11 +
>   drivers/firmware/memmap.c                       |   98 +++++-
>   include/acpi/acpi_bus.h                         |    1 +
>   include/linux/firmware-map.h                    |    6 +
>   include/linux/memory.h                          |    5 +
>   include/linux/memory_hotplug.h                  |   25 +-
>   include/linux/mm.h                              |    5 +-
>   include/linux/mmzone.h                          |   19 +
>   mm/memory_hotplug.c                             |  424 +++++++++++++++++++++--
>   mm/page_cgroup.c                                |    3 +
>   mm/sparse.c                                     |    5 +-
>   28 files changed, 1181 insertions(+), 92 deletions(-)
>

^ permalink raw reply

* Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
From: Ni zhan Chen @ 2012-09-29  8:19 UTC (permalink / raw)
  To: wency
  Cc: linux-s390, linux-ia64, len.brown, linux-acpi, linux-sh, x86,
	linux-kernel, cmetcalf, linux-mm, isimatu.yasuaki, paulus,
	minchan.kim, kosaki.motohiro, rientjes, sparclinux, cl,
	linuxppc-dev, akpm, liuj97
In-Reply-To: <1346837155-534-1-git-send-email-wency@cn.fujitsu.com>

On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
>
> This patch series aims to support physical memory hot-remove.
>
> The patches can free/remove the following things:
>
>    - acpi_memory_info                          : [RFC PATCH 4/19]
>    - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>    - iomem_resource                            : [RFC PATCH 9/19]
>    - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>    - page table of removed memory              : [RFC PATCH 12/19]
>    - node and related sysfs files              : [RFC PATCH 18-19/19]
>
> If you find lack of function for physical memory hot-remove, please let me
> know.
>
> How to test this patchset?
> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
>     ACPI_HOTPLUG_MEMORY must be selected.
> 2. load the module acpi_memhotplug

Hi Yasuaki,

Where is the acpi_memhotplug module?

^ permalink raw reply

* [PATCH] powerpc/mpc85xx: Change spin table to cached memory
From: York Sun @ 2012-09-29 23:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: scottwood, kumar.gala, timur

ePAPR v1.1 requires the spin table to be in cached memory, so map it with
ioremap_prot() and _PAGE_COHERENT to enable caching and coherence.
We also flush the cache after writing to the spin table to stay compatible
with boot loaders that use a cache-inhibited spin table. ePAPR recommends
flushing before and after accessing the spin table.

Signed-off-by: York Sun <yorksun@freescale.com>
Acked-by: Timur Tabi <timur@freescale.com>
---
This patch applies to git://git.kernel.org/pub/scm/linux/kernel/git/galak/powerpc.git next branch.
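
The flush, write, flush ordering that this patch applies around each spin
table update can be modelled in plain userspace C. This is only a model under
stated assumptions: the struct layout is assumed (only addr_h, addr_l and pir
appear in the diff), model_flush() is a hypothetical stand-in for
flush_dcache_range() that just counts calls, and nothing here touches real
caches or hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout; only addr_h, addr_l and pir are named in the patch. */
struct spin_table_model {
	uint32_t addr_h;
	uint32_t addr_l;
	uint32_t r3_h;
	uint32_t r3_l;
	uint32_t reserved;
	uint32_t pir;
};

/* Number of flushes issued so far; lets a test check the ordering. */
unsigned int model_flush_count;

/* Hypothetical stand-in for flush_dcache_range(): counts calls only. */
void model_flush(const void *start, size_t len)
{
	assert(start != NULL && len > 0);
	model_flush_count++;
}

/* Flush, write the release fields, then flush again. */
void model_release_core(struct spin_table_model *t, uint32_t hw_cpu,
			uint32_t entry)
{
	model_flush(t, sizeof(*t));	/* drop stale cached lines first */
	t->pir = hw_cpu;
	t->addr_l = entry;
	model_flush(t, sizeof(*t));	/* push the new values out to memory */
}
```

The point of the second flush is that a boot loader polling a cache-inhibited
mapping reads memory directly, so the writes must not stay in the cache.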

 arch/powerpc/platforms/85xx/smp.c |   49 +++++++++++++++++++++++++++----------
 1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index 6fcfa12..148c2f2 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -128,6 +128,19 @@ static void __cpuinit smp_85xx_mach_cpu_die(void)
 }
 #endif
 
+static inline void flush_spin_table(void *spin_table)
+{
+	flush_dcache_range((ulong)spin_table,
+		(ulong)spin_table + sizeof(struct epapr_spin_table));
+}
+
+static inline u32 read_spin_table_addr_l(void *spin_table)
+{
+	flush_dcache_range((ulong)spin_table,
+		(ulong)spin_table + sizeof(struct epapr_spin_table));
+	return in_be32(&((struct epapr_spin_table *)spin_table)->addr_l);
+}
+
 static int __cpuinit smp_85xx_kick_cpu(int nr)
 {
 	unsigned long flags;
@@ -161,8 +174,8 @@ static int __cpuinit smp_85xx_kick_cpu(int nr)
 
 	/* Map the spin table */
 	if (ioremappable)
-		spin_table = ioremap(*cpu_rel_addr,
-				sizeof(struct epapr_spin_table));
+		spin_table = ioremap_prot(*cpu_rel_addr,
+			sizeof(struct epapr_spin_table), _PAGE_COHERENT);
 	else
 		spin_table = phys_to_virt(*cpu_rel_addr);
 
@@ -173,7 +186,16 @@ static int __cpuinit smp_85xx_kick_cpu(int nr)
 	generic_set_cpu_up(nr);
 
 	if (system_state == SYSTEM_RUNNING) {
+		/*
+		 * To stay compatible with old boot programs that use a
+		 * cache-inhibited spin table, we need to flush the cache
+		 * before accessing the spin table to invalidate any stale
+		 * data. We also need to flush the cache after writing to
+		 * the spin table to push data out.
+		 */
+		flush_spin_table(spin_table);
 		out_be32(&spin_table->addr_l, 0);
+		flush_spin_table(spin_table);
 
 		/*
 		 * We don't set the BPTR register here since it already points
@@ -181,9 +203,14 @@ static int __cpuinit smp_85xx_kick_cpu(int nr)
 		 */
 		mpic_reset_core(hw_cpu);
 
-		/* wait until core is ready... */
-		if (!spin_event_timeout(in_be32(&spin_table->addr_l) == 1,
-						10000, 100)) {
+		/*
+		 * wait until core is ready...
+		 * We need to invalidate the stale data, in case the boot
+		 * loader uses a cache-inhibited spin table.
+		 */
+		if (!spin_event_timeout(
+				read_spin_table_addr_l(spin_table) == 1,
+				10000, 100)) {
 			pr_err("%s: timeout waiting for core %d to reset\n",
 							__func__, hw_cpu);
 			ret = -ENOENT;
@@ -194,12 +221,10 @@ static int __cpuinit smp_85xx_kick_cpu(int nr)
 		__secondary_hold_acknowledge = -1;
 	}
 #endif
+	flush_spin_table(spin_table);
 	out_be32(&spin_table->pir, hw_cpu);
 	out_be32(&spin_table->addr_l, __pa(__early_start));
-
-	if (!ioremappable)
-		flush_dcache_range((ulong)spin_table,
-			(ulong)spin_table + sizeof(struct epapr_spin_table));
+	flush_spin_table(spin_table);
 
 	/* Wait a bit for the CPU to ack. */
 	if (!spin_event_timeout(__secondary_hold_acknowledge == hw_cpu,
@@ -213,13 +238,11 @@ out:
 #else
 	smp_generic_kick_cpu(nr);
 
+	flush_spin_table(spin_table);
 	out_be32(&spin_table->pir, hw_cpu);
 	out_be64((u64 *)(&spin_table->addr_h),
 	  __pa((u64)*((unsigned long long *)generic_secondary_smp_init)));
-
-	if (!ioremappable)
-		flush_dcache_range((ulong)spin_table,
-			(ulong)spin_table + sizeof(struct epapr_spin_table));
+	flush_spin_table(spin_table);
 #endif
 
 	local_irq_restore(flags);
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH] PPC: Do not make the entire heap executable
From: Jason Gunthorpe @ 2012-09-30 23:26 UTC (permalink / raw)
  To: linux-kernel, Benjamin Herrenschmidt; +Cc: linuxppc-dev, Alexander Viro

On PPC the ELF PLT sections look like this:

  [17] .sbss             NOBITS          0002aff8 01aff8 000014 00  WA  0   0  4
  [18] .plt              NOBITS          0002b00c 01aff8 000084 00 WAX  0   0  4
  [19] .bss              NOBITS          0002b090 01aff8 0000a4 00  WA  0   0  4

Which results in an ELF load header:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000

This is all correct: the load region containing the PLT is marked as
executable. Note that the PLT starts at 0002b00c but the file mapping ends at
0002aff8, so the PLT falls in the zero-filled region described by the load
header, after a page boundary.

Unfortunately the generic ELF loader ignores the X bit in the load headers
when it creates the 0 filled non-file backed mappings. It assumes all of these
mappings are RW BSS sections, which is not the case for PPC.

Teach the ELF loader to check the X bit in the relevant load header and
create 0 filled anonymous mappings that are executable if the load header
requests that.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 arch/powerpc/include/asm/page.h    |   10 +--------
 arch/powerpc/include/asm/page_32.h |    2 -
 arch/powerpc/include/asm/page_64.h |    4 ---
 fs/binfmt_elf.c                    |   37 ++++++++++++++++++++++++++++-------
 4 files changed, 30 insertions(+), 23 deletions(-)

Please consider this a proposal to solve this issue.

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index f072e97..61e46fc 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -215,15 +215,7 @@ extern long long virt_phys_offset;
 #define __pa(x) ((unsigned long)(x) - PAGE_OFFSET + MEMORY_START)
 #endif
 
-/*
- * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI,
- * and needs to be executable.  This means the whole heap ends
- * up being executable.
- */
-#define VM_DATA_DEFAULT_FLAGS32	(VM_READ | VM_WRITE | VM_EXEC | \
-				 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
-
-#define VM_DATA_DEFAULT_FLAGS64	(VM_READ | VM_WRITE | \
+#define VM_DATA_DEFAULT_FLAGS	(VM_READ | VM_WRITE | \
 				 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
 
 #ifdef __powerpc64__
diff --git a/arch/powerpc/include/asm/page_32.h b/arch/powerpc/include/asm/page_32.h
index 68d73b2..aaae5a6 100644
--- a/arch/powerpc/include/asm/page_32.h
+++ b/arch/powerpc/include/asm/page_32.h
@@ -7,8 +7,6 @@
 #endif
 #endif
 
-#define VM_DATA_DEFAULT_FLAGS	VM_DATA_DEFAULT_FLAGS32
-
 #ifdef CONFIG_NOT_COHERENT_CACHE
 #define ARCH_DMA_MINALIGN	L1_CACHE_BYTES
 #endif
diff --git a/arch/powerpc/include/asm/page_64.h b/arch/powerpc/include/asm/page_64.h
index fed85e6..615d88b 100644
--- a/arch/powerpc/include/asm/page_64.h
+++ b/arch/powerpc/include/asm/page_64.h
@@ -136,10 +136,6 @@ do {						\
 
 #endif /* !CONFIG_HUGETLB_PAGE */
 
-#define VM_DATA_DEFAULT_FLAGS \
-	(is_32bit_task() ? \
-	 VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64)
-
 /*
  * This is the default if a program doesn't have a PT_GNU_STACK
  * program header entry. The PPC64 ELF ABI has a non executable stack
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 1b52956..e5a432b 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -76,13 +76,20 @@ static struct linux_binfmt elf_format = {
 
 #define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
 
-static int set_brk(unsigned long start, unsigned long end)
+static int set_brk(unsigned long start, unsigned long end, int prot)
 {
 	start = ELF_PAGEALIGN(start);
 	end = ELF_PAGEALIGN(end);
 	if (end > start) {
 		unsigned long addr;
-		addr = vm_brk(start, end - start);
+		/* Map the non-file portion of the last load header. If the
+		   header is requesting these pages to be executable then
+		   we have to honour that, otherwise assume they are bss. */
+		if (prot & PROT_EXEC)
+			addr = vm_mmap(0, start, end - start, prot,
+				MAP_PRIVATE | MAP_FIXED, 0);
+		else
+			addr = vm_brk(start, end - start);
 		if (BAD_ADDR(addr))
 			return addr;
 	}
@@ -381,6 +388,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 	unsigned long load_addr = 0;
 	int load_addr_set = 0;
 	unsigned long last_bss = 0, elf_bss = 0;
+	int bss_prot = 0;
 	unsigned long error = ~0UL;
 	unsigned long total_size;
 	int retval, i, size;
@@ -489,8 +497,10 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 			 * elf_bss and last_bss is the bss section.
 			 */
 			k = load_addr + eppnt->p_memsz + eppnt->p_vaddr;
-			if (k > last_bss)
+			if (k > last_bss) {
 				last_bss = k;
+				bss_prot = elf_prot;
+			}
 		}
 	}
 
@@ -509,8 +519,15 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 		/* What we have mapped so far */
 		elf_bss = ELF_PAGESTART(elf_bss + ELF_MIN_ALIGN - 1);
 
-		/* Map the last of the bss segment */
-		error = vm_brk(elf_bss, last_bss - elf_bss);
+		/* Map the non-file portion of the last load header. If the
+		   header is requesting these pages to be executable then
+		   we have to honour that, otherwise assume they are bss. */
+		if (bss_prot & PROT_EXEC)
+			error = vm_mmap(0, elf_bss, last_bss - elf_bss,
+					bss_prot, MAP_PRIVATE | MAP_FIXED, 0);
+		else
+			error = vm_brk(elf_bss, last_bss - elf_bss);
+
 		if (BAD_ADDR(error))
 			goto out_close;
 	}
@@ -560,6 +577,7 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 	unsigned long error;
 	struct elf_phdr *elf_ppnt, *elf_phdata;
 	unsigned long elf_bss, elf_brk;
+	int bss_prot = 0;
 	int retval, i;
 	unsigned int size;
 	unsigned long elf_entry;
@@ -750,7 +768,8 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 			   before this one. Map anonymous pages, if needed,
 			   and clear the area.  */
 			retval = set_brk(elf_bss + load_bias,
-					 elf_brk + load_bias);
+					 elf_brk + load_bias,
+					 bss_prot);
 			if (retval) {
 				send_sig(SIGKILL, current, 0);
 				goto out_free_dentry;
@@ -852,8 +871,10 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 		if (end_data < k)
 			end_data = k;
 		k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;
-		if (k > elf_brk)
+		if (k > elf_brk) {
+			bss_prot = elf_prot;
 			elf_brk = k;
+		}
 	}
 
 	loc->elf_ex.e_entry += load_bias;
@@ -869,7 +890,7 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 	 * mapping in the interpreter, to make sure it doesn't wind
 	 * up getting placed where the bss needs to go.
 	 */
-	retval = set_brk(elf_bss, elf_brk);
+	retval = set_brk(elf_bss, elf_brk, bss_prot);
 	if (retval) {
 		send_sig(SIGKILL, current, 0);
 		goto out_free_dentry;
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH] PPC: Correct the tophys/tovirt macros
From: Jason Gunthorpe @ 2012-09-30 23:28 UTC (permalink / raw)
  To: linuxppc-dev

asm/page.h discusses the calculation for v2p and p2v. It should be:
 va = pa + KERNELBASE - PHYSICAL_START
which is the same as:
 va = pa + LOAD_OFFSET

tophys/tovirt were using PAGE_OFFSET, which, as page.h says, is almost
always the same thing, but not in every configuration.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 arch/powerpc/include/asm/ppc_asm.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index ea2a86e..44edc3a 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -461,14 +461,14 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
 #define fromreal(rd)	tovirt(rd,rd)
 
 #define tophys(rd,rs)				\
-0:	addis	rd,rs,-PAGE_OFFSET@h;		\
+0:	addis	rd,rs,-LOAD_OFFSET@h;		\
 	.section ".vtop_fixup","aw";		\
 	.align  1;				\
 	.long   0b;				\
 	.previous
 
 #define tovirt(rd,rs)				\
-0:	addis	rd,rs,PAGE_OFFSET@h;		\
+0:	addis	rd,rs,LOAD_OFFSET@h;		\
 	.section ".ptov_fixup","aw";		\
 	.align  1;				\
 	.long   0b;				\
-- 
1.7.4.1
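
The address arithmetic in the commit message above can be checked with a
small user-space model. This is only a sketch under stated assumptions:
the EX_* values are hypothetical example addresses (a real kernel takes
them from its configuration), and addis32(), high16(), model_tovirt() and
model_tophys() are illustrative names, not kernel interfaces. The model
relies on addis adding its 16-bit immediate shifted left by 16, and on @h
selecting the upper 16 bits of the constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example values; real ones come from the kernel config. */
#define EX_KERNELBASE     0xC0000000u
#define EX_PHYSICAL_START 0x01000000u
#define EX_PAGE_OFFSET    0xC0000000u
#define EX_LOAD_OFFSET    (EX_KERNELBASE - EX_PHYSICAL_START)

/* Upper 16 bits of a 32-bit value, as the @h operator selects. */
uint16_t high16(uint32_t x)
{
	return (uint16_t)(x >> 16);
}

/* Model of "addis rd,rs,imm" in 32-bit mode: rd = rs + (imm << 16), wrapping. */
uint32_t addis32(uint32_t rs, uint16_t imm)
{
	return rs + ((uint32_t)imm << 16);
}

/* tovirt with the patch applied: va = pa + LOAD_OFFSET. */
uint32_t model_tovirt(uint32_t pa)
{
	return addis32(pa, high16(EX_LOAD_OFFSET));
}

/* tophys with the patch applied: pa = va - LOAD_OFFSET. */
uint32_t model_tophys(uint32_t va)
{
	return addis32(va, high16(0u - EX_LOAD_OFFSET));
}

/* tovirt before the patch: va = pa + PAGE_OFFSET. */
uint32_t model_tovirt_old(uint32_t pa)
{
	return addis32(pa, high16(EX_PAGE_OFFSET));
}

/* Self-check: the start of physical memory maps to KERNELBASE and back. */
void model_check(void)
{
	assert(model_tovirt(EX_PHYSICAL_START) == EX_KERNELBASE);
	assert(model_tophys(EX_KERNELBASE) == EX_PHYSICAL_START);
}
```

With these example values the two offsets differ, so the old and new
macros disagree. When PHYSICAL_START is 0 and KERNELBASE equals
PAGE_OFFSET, both give the same result, which matches the "almost always
the same thing" remark.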

^ permalink raw reply related

* [PATCH] PPC: Enable the Watchdog vector for 405
From: Jason Gunthorpe @ 2012-09-30 23:27 UTC (permalink / raw)
  To: linuxppc-dev

Move the body of the PIT exception out of line to make room.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 arch/powerpc/kernel/head_40x.S |   26 +++++++++++++-------------
 arch/powerpc/kernel/traps.c    |    2 +-
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 4989661..7edd7b1 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -431,29 +431,19 @@ label:
 
 /* 0x1000 - Programmable Interval Timer (PIT) Exception */
 	START_EXCEPTION(0x1000, Decrementer)
-	NORMAL_EXCEPTION_PROLOG
-	lis	r0,TSR_PIS@h
-	mtspr	SPRN_TSR,r0		/* Clear the PIT exception */
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	EXC_XFER_LITE(0x1000, timer_interrupt)
+	b pit_longer
 
-#if 0
 /* NOTE:
- * FIT and WDT handlers are not implemented yet.
+ * FIT handler is not implemented yet.
  */
 
 /* 0x1010 - Fixed Interval Timer (FIT) Exception
 */
-	STND_EXCEPTION(0x1010,	FITException,		unknown_exception)
+//	STND_EXCEPTION(0x1010,	FITException,		unknown_exception)
 
 /* 0x1020 - Watchdog Timer (WDT) Exception
 */
-#ifdef CONFIG_BOOKE_WDT
 	CRITICAL_EXCEPTION(0x1020, WDTException, WatchdogException)
-#else
-	CRITICAL_EXCEPTION(0x1020, WDTException, unknown_exception)
-#endif
-#endif
 
 /* 0x1100 - Data TLB Miss Exception
  * As the name implies, translation is not in the MMU, so search the
@@ -738,6 +728,16 @@ label:
 		(MSR_KERNEL & ~(MSR_ME|MSR_DE|MSR_CE)), \
 		NOCOPY, crit_transfer_to_handler, ret_from_crit_exc)
 
+	/* Programmable Interval Timer (PIT) Exception. The PIT runs into
+	   the space reserved for other exceptions, so we branch down
+	   to here. */
+pit_longer:
+	NORMAL_EXCEPTION_PROLOG
+	lis	r0,TSR_PIS@h
+	mtspr	SPRN_TSR,r0		/* Clear the PIT exception */
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	EXC_XFER_LITE(0x1000, timer_interrupt)
+
 /*
  * The other Data TLB exceptions bail out to this point
  * if they can't resolve the lightweight TLB fault.
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index ae0843f..0701ec1 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1514,7 +1514,7 @@ void unrecoverable_exception(struct pt_regs *regs)
 	die("Unrecoverable exception", regs, SIGABRT);
 }
 
-#ifdef CONFIG_BOOKE_WDT
+#if defined(CONFIG_BOOKE_WDT) | defined(CONFIG_40x)
 /*
  * Default handler for a Watchdog exception,
  * spins until a reboot occurs
-- 
1.7.4.1

^ permalink raw reply related

* Re: [RFC v9 PATCH 13/21] memory-hotplug: check page type in get_page_bootmem
From: Yasuaki Ishimatsu @ 2012-10-01  3:03 UTC (permalink / raw)
  To: Ni zhan Chen
  Cc: linux-s390, linux-ia64, Wen Congyang, len.brown, linux-acpi,
	linux-sh, x86, linux-kernel, cmetcalf, linux-mm, paulus,
	minchan.kim, kosaki.motohiro, rientjes, sparclinux, cl,
	linuxppc-dev, akpm, liuj97
In-Reply-To: <506659D7.9080904@gmail.com>

Hi Chen,

2012/09/29 11:15, Ni zhan Chen wrote:
> On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
>> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> The function get_page_bootmem() may be called more than one time to the same
>> page. There is no need to set page's type, private if the function is not
>> the first time called to the page.
>>
>> Note: the patch is just optimization and does not fix any problem.
>
> Hi Yasuaki,
>
> this patch is reasonable to me. I have another question associated to get_page_bootmem(), the question is from another fujitsu guy's patch changelog [commit : 04753278769f3], the changelog said  that:
>
>   1) When the memmap of removing section is allocated on other
>       section by bootmem, it should/can be free.
>   2) When the memmap of removing section is allocated on the
>       same section, it shouldn't be freed. Because the section has to be
>       logical memory offlined already and all pages must be isolated against
>       page allocator. If it is freed, page allocator may use it which will
>       be removed physically soon.
>
> but I don't see how his patch guarantees 2); that is, his patch doesn't seem to guarantee that the memmap of the removing section, which is allocated on other section by bootmem, is not freed. I hope to get your explanation in detail, thanks in advance. :-)

In my understanding, the patch does not guarantee it.
Please see [commit : 0c0a4a517a31e]. free_map_bootmem() in the commit
guarantees it.

Thanks,
Yasuaki Ishimatsu

>
>>
>> CC: David Rientjes <rientjes@google.com>
>> CC: Jiang Liu <liuj97@gmail.com>
>> CC: Len Brown <len.brown@intel.com>
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org>
>> CC: Christoph Lameter <cl@linux.com>
>> Cc: Minchan Kim <minchan.kim@gmail.com>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> CC: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> ---
>>   mm/memory_hotplug.c |   15 +++++++++++----
>>   1 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index d736df3..26a5012 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
>>   static void get_page_bootmem(unsigned long info,  struct page *page,
>>                    unsigned long type)
>>   {
>> -    page->lru.next = (struct list_head *) type;
>> -    SetPagePrivate(page);
>> -    set_page_private(page, info);
>> -    atomic_inc(&page->_count);
>> +    unsigned long page_type;
>> +
>> +    page_type = (unsigned long)page->lru.next;
>> +    if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
>> +        page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
>> +        page->lru.next = (struct list_head *)type;
>> +        SetPagePrivate(page);
>> +        set_page_private(page, info);
>> +        atomic_inc(&page->_count);
>> +    } else
>> +        atomic_inc(&page->_count);
>>   }
>>   /* reference to __meminit __free_pages_bootmem is valid
>
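
The behaviour of the quoted hunk can be modelled outside the kernel. The
following is a minimal sketch, assuming a simplified stand-in for struct
page: struct ex_page, the EX_* constants and ex_get_page_bootmem() are
hypothetical names chosen for illustration, and the numeric type range is
an assumption rather than a value taken from the patch. It shows that only
the first call records the type and private info, while every call takes a
reference.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's bootmem type range. */
#define EX_MIN_BOOTMEM_TYPE 12UL
#define EX_SECTION_INFO     12UL
#define EX_NODE_INFO        13UL
#define EX_MAX_BOOTMEM_TYPE 13UL

/* Simplified model of the struct page fields the patch touches. */
struct ex_page {
	unsigned long type;   /* stands in for page->lru.next */
	unsigned long info;   /* stands in for page_private(page) */
	int has_private;      /* stands in for the PagePrivate flag */
	int count;            /* stands in for page->_count */
};

void ex_get_page_bootmem(unsigned long info, struct ex_page *page,
			 unsigned long type)
{
	assert(page != 0);

	/* Record type and info only if no bootmem type is set yet. */
	if (page->type < EX_MIN_BOOTMEM_TYPE ||
	    page->type > EX_MAX_BOOTMEM_TYPE) {
		page->type = type;
		page->has_private = 1;
		page->info = info;
	}
	/* Both branches of the patch take a reference. */
	page->count++;
}
```

Because the increment is common to both branches, the else arm in the
patch could also be folded into a single unconditional atomic_inc() after
the if block.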

^ permalink raw reply

* Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
From: Yasuaki Ishimatsu @ 2012-10-01  4:44 UTC (permalink / raw)
  To: Ni zhan Chen
  Cc: linux-s390, linux-ia64, wency, linux-acpi, linux-sh, len.brown,
	x86, linux-kernel, cmetcalf, linux-mm, paulus, minchan.kim,
	kosaki.motohiro, rientjes, sparclinux, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <5066AF1A.4090509@gmail.com>

Hi Chen,

2012/09/29 17:19, Ni zhan Chen wrote:
> On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> This patch series aims to support physical memory hot-remove.
>>
>> The patches can free/remove the following things:
>>
>>    - acpi_memory_info                          : [RFC PATCH 4/19]
>>    - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>>    - iomem_resource                            : [RFC PATCH 9/19]
>>    - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>>    - page table of removed memory              : [RFC PATCH 12/19]
>>    - node and related sysfs files              : [RFC PATCH 18-19/19]
>>
>> If you find lack of function for physical memory hot-remove, please let me
>> know.
>>
>> How to test this patchset?
>> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
>>     ACPI_HOTPLUG_MEMORY must be selected.
>> 2. load the module acpi_memhotplug
>
> Hi Yasuaki,
>
> where is the acpi_memhotplug module?

If you build acpi_memhotplug as a module, it is created under the
/lib/modules/<kernel-version>/kernel/drivers/acpi/ directory. It depends
on the config option ACPI_HOTPLUG_MEMORY. If the option is set to [*], the
code is built in, so you don't need to load the module.

Thanks,
Yasuaki Ishimatsu

>
>> 3. hotplug the memory device(it depends on your hardware)
>>     You will see the memory device under the directory /sys/bus/acpi/devices/.
>>     Its name is PNP0C80:XX.
>> 4. online/offline pages provided by this memory device
>>     You can write online/offline to /sys/devices/system/memory/memoryX/state to
>>     online/offline pages provided by this memory device
>> 5. hotremove the memory device
>>     You can hotremove the memory device by the hardware, or writing 1 to
>>     /sys/bus/acpi/devices/PNP0C80:XX/eject.
>>
>> Note: if the memory provided by the memory device is used by the kernel, it
>> can't be offlined. It is not a bug.
>>
>> Known problems:
>> 1. memory can't be offlined when CONFIG_MEMCG is selected.
>>     For example: there is a memory device on node 1. The address range
>>     is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
>>     and memory11 under the directory /sys/devices/system/memory/.
>>     If CONFIG_MEMCG is selected, we will allocate memory to store page cgroup
>>     when we online pages. When we online memory8, the memory stored page cgroup
>>     is not provided by this memory device. But when we online memory9, the memory
>>     stored page cgroup may be provided by memory8. So we can't offline memory8
>>     now. We should offline the memory in the reversed order.
>>     When the memory device is hotremoved, we will auto offline memory provided
>>     by this memory device. But we don't know which memory is onlined first, so
>>     offlining memory may fail. In such case, you should offline the memory by
>>     hand before hotremoving the memory device.
>> 2. hotremoving memory device may cause a kernel panic
>>     This bug will be fixed by Liu Jiang's patch:
>>     https://lkml.org/lkml/2012/7/3/1
>>
>> change log of v9:
>>   [RFC PATCH v9 8/21]
>>     * add a lock to protect the list map_entries
>>     * add an indicator to firmware_map_entry to remember whether the memory
>>       is allocated from bootmem
>>   [RFC PATCH v9 10/21]
>>     * change the macro to inline function
>>   [RFC PATCH v9 19/21]
>>     * don't offline the node if the cpu on the node is onlined
>>   [RFC PATCH v9 21/21]
>>     * create new patch: auto offline page_cgroup when onlining memory block
>>       failed
>>
>> change log of v8:
>>   [RFC PATCH v8 17/20]
>>     * Fix problems when one node's range include the other nodes
>>   [RFC PATCH v8 18/20]
>>     * fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
>>       is not defined.
>>   [RFC PATCH v8 19/20]
>>     * don't offline node when some memory sections are not removed
>>   [RFC PATCH v8 20/20]
>>     * create new patch: clear hwpoisoned flag when onlining pages
>>
>> change log of v7:
>>   [RFC PATCH v7 4/19]
>>     * do not continue if acpi_memory_device_remove_memory() fails.
>>   [RFC PATCH v7 15/19]
>>     * handle usemap in register_page_bootmem_info_section() too.
>>
>> change log of v6:
>>   [RFC PATCH v6 12/19]
>>     * fix building error on architectures other than x86
>>
>>   [RFC PATCH v6 15-16/19]
>>     * fix building error on architectures other than x86
>>
>> change log of v5:
>>   * merge the patchset to clear page table and the patchset to hot remove
>>     memory(from ishimatsu) to one big patchset.
>>
>>   [RFC PATCH v5 1/19]
>>     * rename remove_memory() to offline_memory()/offline_pages()
>>
>>   [RFC PATCH v5 2/19]
>>     * new patch: implement offline_memory(). This function offlines pages,
>>       update memory block's state, and notify the userspace that the memory
>>       block's state is changed.
>>
>>   [RFC PATCH v5 4/19]
>>     * offline and remove memory in acpi_memory_disable_device() too.
>>
>>   [RFC PATCH v5 17/19]
>>     * new patch: add a new function __remove_zone() to revert the things done
>>       in the function __add_zone().
>>
>>   [RFC PATCH v5 18/19]
>>     * flush work before resetting node device.
>>
>> change log of v4:
>>   * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
>>     from the patch series, since the patch is a bugfix. It is being discussed
>>     on another thread. But for testing the patch series, the patch is needed.
>>     So I added the patch as [PATCH 0/13].
>>
>>   [RFC PATCH v4 2/13]
>>     * check memory is online or not at remove_memory()
>>     * add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
>>       getting node id
>>   [RFC PATCH v4 3/13]
>>     * create new patch : check memory is online or not at online_pages()
>>
>>   [RFC PATCH v4 4/13]
>>     * add __ref section to remove_memory()
>>     * call firmware_map_remove_entry() before remove_sysfs_fw_map_entry()
>>
>>   [RFC PATCH v4 11/13]
>>     * rewrite register_page_bootmem_memmap() for removing page used as PT/PMD
>>
>> change log of v3:
>>   * rebase to 3.5.0-rc6
>>
>>   [RFC PATCH v2 2/13]
>>     * remove extra kobject_put()
>>
>>     * The patch was commented by Wen. Wen's comment is
>>       "acpi_memory_device_remove() should ignore a return value of
>>       remove_memory() since caller does not care the return value".
>>       But I did not change it since I think caller should care the
>>       return value. And I am trying to fix it as follow:
>>
>>       https://lkml.org/lkml/2012/7/5/624
>>
>>   [RFC PATCH v2 4/13]
>>     * remove a firmware_memmap_entry allocated by kzmalloc()
>>
>> change log of v2:
>>   [RFC PATCH v2 2/13]
>>     * check whether memory block is offline or not before calling offline_memory()
>>     * check whether section is valid or not in is_memblk_offline()
>>     * call kobject_put() for each memory_block in is_memblk_offline()
>>
>>   [RFC PATCH v2 3/13]
>>     * unify the end argument of firmware_map_add_early/hotplug
>>
>>   [RFC PATCH v2 4/13]
>>     * add release_firmware_map_entry() for freeing firmware_map_entry
>>
>>   [RFC PATCH v2 6/13]
>>    * add release_memory_block() for freeing memory_block
>>
>>   [RFC PATCH v2 11/13]
>>    * fix wrong arguments of free_pages()
>>
>>
>> Wen Congyang (8):
>>    memory-hotplug: implement offline_memory()
>>    memory-hotplug: store the node id in acpi_memory_device
>>    memory-hotplug: export the function acpi_bus_remove()
>>    memory-hotplug: call acpi_bus_remove() to remove memory device
>>    memory-hotplug: introduce new function arch_remove_memory()
>>    memory-hotplug: remove sysfs file of node
>>    memory-hotplug: clear hwpoisoned flag when onlining pages
>>    memory-hotplug: auto offline page_cgroup when onlining memory block
>>      failed
>>
>> Yasuaki Ishimatsu (13):
>>    memory-hotplug: rename remove_memory() to
>>      offline_memory()/offline_pages()
>>    memory-hotplug: offline and remove memory when removing the memory
>>      device
>>    memory-hotplug: check whether memory is present or not
>>    memory-hotplug: remove /sys/firmware/memmap/X sysfs
>>    memory-hotplug: does not release memory region in PAGES_PER_SECTION
>>      chunks
>>    memory-hotplug: add memory_block_release
>>    memory-hotplug: remove_memory calls __remove_pages
>>    memory-hotplug: check page type in get_page_bootmem
>>    memory-hotplug: move register_page_bootmem_info_node and
>>      put_page_bootmem for sparse-vmemmap
>>    memory-hotplug: implement register_page_bootmem_info_section of
>>      sparse-vmemmap
>>    memory-hotplug: free memmap of sparse-vmemmap
>>    memory_hotplug: clear zone when the memory is removed
>>    memory-hotplug: add node_device_release
>>
>>   arch/ia64/mm/discontig.c                        |   14 +
>>   arch/ia64/mm/init.c                             |   16 +
>>   arch/powerpc/mm/init_64.c                       |   14 +
>>   arch/powerpc/mm/mem.c                           |   14 +
>>   arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
>>   arch/s390/mm/init.c                             |   12 +
>>   arch/s390/mm/vmem.c                             |   14 +
>>   arch/sh/mm/init.c                               |   15 +
>>   arch/sparc/mm/init_64.c                         |   14 +
>>   arch/tile/mm/init.c                             |    8 +
>>   arch/x86/include/asm/pgtable_types.h            |    1 +
>>   arch/x86/mm/init_32.c                           |   10 +
>>   arch/x86/mm/init_64.c                           |  331 ++++++++++++++++++
>>   arch/x86/mm/pageattr.c                          |   47 ++--
>>   drivers/acpi/acpi_memhotplug.c                  |   54 +++-
>>   drivers/acpi/scan.c                             |    3 +-
>>   drivers/base/memory.c                           |   88 ++++-
>>   drivers/base/node.c                             |   11 +
>>   drivers/firmware/memmap.c                       |   98 +++++-
>>   include/acpi/acpi_bus.h                         |    1 +
>>   include/linux/firmware-map.h                    |    6 +
>>   include/linux/memory.h                          |    5 +
>>   include/linux/memory_hotplug.h                  |   25 +-
>>   include/linux/mm.h                              |    5 +-
>>   include/linux/mmzone.h                          |   19 +
>>   mm/memory_hotplug.c                             |  424 +++++++++++++++++++++--
>>   mm/page_cgroup.c                                |    3 +
>>   mm/sparse.c                                     |    5 +-
>>   28 files changed, 1181 insertions(+), 92 deletions(-)
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
>
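
Steps 4 and 5 of the test procedure quoted above come down to writing
short strings into sysfs attribute files. A minimal sketch in C, assuming
the sysfs layout named in the cover letter; the ex_* function names and
the sysfs_root parameter are hypothetical and exist only so the path
building can be exercised without touching a real /sys.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<root>/devices/system/memory/memory<block>/state" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
int ex_memory_state_path(char *buf, size_t len, const char *sysfs_root,
			 unsigned int block)
{
	int n;

	assert(buf != NULL && sysfs_root != NULL);
	n = snprintf(buf, len, "%s/devices/system/memory/memory%u/state",
		     sysfs_root, block);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a string to an attribute file. Returns 0 on success, -1 on error. */
int ex_write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (f == NULL)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Write "online" or "offline" to one memory block's state file. */
int ex_set_memory_block_state(const char *sysfs_root, unsigned int block,
			      const char *state)
{
	char path[256];

	if (ex_memory_state_path(path, sizeof(path), sysfs_root, block) != 0)
		return -1;
	return ex_write_attr(path, state);
}
```

On a test machine, ex_set_memory_block_state("/sys", 8, "offline") would
correspond to step 4 for memory8, and ex_write_attr() with the path
/sys/bus/acpi/devices/PNP0C80:XX/eject and the value "1" to step 5, where
XX stays whatever instance number the hardware reports. Given known
problem 1, blocks would be offlined in the reverse of the order in which
they were onlined.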

^ permalink raw reply

* Re: [RFC v9 PATCH 03/21] memory-hotplug: store the node id in acpi_memory_device
From: Yasuaki Ishimatsu @ 2012-10-01  7:38 UTC (permalink / raw)
  To: Ni zhan Chen
  Cc: linux-s390, linux-ia64, wency, linux-acpi, linux-sh, len.brown,
	x86, linux-kernel, cmetcalf, linux-mm, paulus, minchan.kim,
	kosaki.motohiro, rientjes, sparclinux, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <506517C1.7050909@gmail.com>

Hi Chen,

2012/09/28 12:21, Ni zhan Chen wrote:
> On 09/05/2012 05:25 PM, wency@cn.fujitsu.com wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>>
>> The memory device has only one node id. Store the node id when
>> enable the memory device, and we can reuse it when removing the
>> memory device.
>
> one question:
> if use numa emulation, memory device will associated to one node or ...?

A memory device has only one node, even if you use NUMA emulation.

Thanks,
Yasuaki Ishimatsu

>
>>
>> CC: David Rientjes <rientjes@google.com>
>> CC: Jiang Liu <liuj97@gmail.com>
>> CC: Len Brown <len.brown@intel.com>
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org>
>> CC: Christoph Lameter <cl@linux.com>
>> Cc: Minchan Kim <minchan.kim@gmail.com>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> CC: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> ---
>>   drivers/acpi/acpi_memhotplug.c |    4 ++++
>>   1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>> index 2a7beac..7873832 100644
>> --- a/drivers/acpi/acpi_memhotplug.c
>> +++ b/drivers/acpi/acpi_memhotplug.c
>> @@ -83,6 +83,7 @@ struct acpi_memory_info {
>>   struct acpi_memory_device {
>>       struct acpi_device * device;
>>       unsigned int state;    /* State of the memory device */
>> +    int nid;
>>       struct list_head res_list;
>>   };
>> @@ -256,6 +257,9 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>>           info->enabled = 1;
>>           num_enabled++;
>>       }
>> +
>> +    mem_device->nid = node;
>> +
>>       if (!num_enabled) {
>>           printk(KERN_ERR PREFIX "add_memory failed\n");
>>           mem_device->state = MEMORY_INVALID_STATE;
>

^ permalink raw reply

* Re: [PATCH] PPC: Enable the Watchdog vector for 405
From: Josh Boyer @ 2012-10-01 12:16 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linuxppc-dev
In-Reply-To: <20120930232723.GF30637@obsidianresearch.com>

On Sun, Sep 30, 2012 at 7:27 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> Move the body of the PIT exception out of line to make room.

What boards did you test this on?  What driver are you using for the
watchdog?

> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> ---
>  arch/powerpc/kernel/head_40x.S |   26 +++++++++++++-------------
>  arch/powerpc/kernel/traps.c    |    2 +-
>  2 files changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
> index 4989661..7edd7b1 100644
> --- a/arch/powerpc/kernel/head_40x.S
> +++ b/arch/powerpc/kernel/head_40x.S
> @@ -431,29 +431,19 @@ label:
>
>  /* 0x1000 - Programmable Interval Timer (PIT) Exception */
>         START_EXCEPTION(0x1000, Decrementer)
> -       NORMAL_EXCEPTION_PROLOG
> -       lis     r0,TSR_PIS@h
> -       mtspr   SPRN_TSR,r0             /* Clear the PIT exception */
> -       addi    r3,r1,STACK_FRAME_OVERHEAD
> -       EXC_XFER_LITE(0x1000, timer_interrupt)
> +       b pit_longer
>
> -#if 0
>  /* NOTE:
> - * FIT and WDT handlers are not implemented yet.
> + * FIT handler is not implemented yet.
>   */
>
>  /* 0x1010 - Fixed Interval Timer (FIT) Exception
>  */
> -       STND_EXCEPTION(0x1010,  FITException,           unknown_exception)
> +//     STND_EXCEPTION(0x1010,  FITException,           unknown_exception)

Please just move the #endif for the #if 0 up instead of putting a C++
style comment here.

>  /* 0x1020 - Watchdog Timer (WDT) Exception
>  */
> -#ifdef CONFIG_BOOKE_WDT
>         CRITICAL_EXCEPTION(0x1020, WDTException, WatchdogException)
> -#else
> -       CRITICAL_EXCEPTION(0x1020, WDTException, unknown_exception)
> -#endif
> -#endif

Please leave this wrapped in CONFIG_BOOKE_WDT.  I don't agree with
unconditionally enabling this for every 405 chip out there.

>  /* 0x1100 - Data TLB Miss Exception
>   * As the name implies, translation is not in the MMU, so search the
> @@ -738,6 +728,16 @@ label:
>                 (MSR_KERNEL & ~(MSR_ME|MSR_DE|MSR_CE)), \
>                 NOCOPY, crit_transfer_to_handler, ret_from_crit_exc)
>
> +       /* Programmable Interval Timer (PIT) Exception. The PIT runs into
> +          the space reserved for other exceptions, so we branch down
> +          to here. */
> +pit_longer:
> +       NORMAL_EXCEPTION_PROLOG
> +       lis     r0,TSR_PIS@h
> +       mtspr   SPRN_TSR,r0             /* Clear the PIT exception */
> +       addi    r3,r1,STACK_FRAME_OVERHEAD
> +       EXC_XFER_LITE(0x1000, timer_interrupt)
> +
>  /*
>   * The other Data TLB exceptions bail out to this point
>   * if they can't resolve the lightweight TLB fault.
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index ae0843f..0701ec1 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1514,7 +1514,7 @@ void unrecoverable_exception(struct pt_regs *regs)
>         die("Unrecoverable exception", regs, SIGABRT);
>  }
>
> -#ifdef CONFIG_BOOKE_WDT
> +#if defined(CONFIG_BOOKE_WDT) | defined(CONFIG_40x)

Pretty sure you meant || here?  Though if you just enable the existing
config option, I don't think you'd need to edit this file at all.

josh
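
On the | versus || remark above: inside #if, each defined() evaluates to
1 or 0, so the bitwise form selects the same branch as the logical one,
although || is the conventional spelling. A small sketch that shows this;
the EX_CONFIG_* macros are hypothetical stand-ins, not the real Kconfig
symbols.

```c
#include <assert.h>

#define EX_CONFIG_40x 1
/* EX_CONFIG_BOOKE_WDT is deliberately left undefined. */

#if defined(EX_CONFIG_BOOKE_WDT) | defined(EX_CONFIG_40x)
#define EX_BITWISE_OR_TAKEN 1
#else
#define EX_BITWISE_OR_TAKEN 0
#endif

#if defined(EX_CONFIG_BOOKE_WDT) || defined(EX_CONFIG_40x)
#define EX_LOGICAL_OR_TAKEN 1
#else
#define EX_LOGICAL_OR_TAKEN 0
#endif

/* Report which branch each spelling selected. */
int ex_bitwise_or_taken(void)
{
	return EX_BITWISE_OR_TAKEN;
}

int ex_logical_or_taken(void)
{
	return EX_LOGICAL_OR_TAKEN;
}

/* Self-check: both spellings take the same branch here. */
void ex_or_check(void)
{
	assert(ex_bitwise_or_taken() == ex_logical_or_taken());
}
```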

^ permalink raw reply

* Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC
From: Alexander Graf @ 2012-10-01 14:03 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-nfs, Jan Kara, Linus Torvalds, LKML List, anton,
	skinsbursky, bfields, linuxppc-dev
In-Reply-To: <20120928151043.GA19102@fieldses.org>


On 28.09.2012, at 17:10, J. Bruce Fields wrote:

> On Fri, Sep 28, 2012 at 04:19:55AM +0200, Alexander Graf wrote:
>>
>> On 28.09.2012, at 04:04, Linus Torvalds wrote:
>>
>>> On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf <agraf@suse.de> wrote:
>>>>
>>>> Below are OOPS excerpts from different rc's I tried. All of them crashed - all the way up to current Linus' master branch. I haven't cross-checked, but I don't remember any such behavior from pre-3.6 releases.
>>>
>>> Since you seem to be able to reproduce it easily (and apparently
>>> reliably), any chance you could just bisect it?
>>>
>>> Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple
>>>
>>>  git bisect start
>>>  git bisect good v3.5
>>>  git bisect bad v3.6-rc1
>>>
>>> will get you started on your adventure..
>>
>> Heh, will give it a try :). The thing really does look quite bisectable.
>>
>>
>> It might take a few hours though - the machine isn't exactly fast by today's standards and it's getting late here. But I'll keep you updated.
>
> I doubt it's anything special about that workload, but just for kicks I
> tried a "git clone -ls" (cloning my linux tree to another directory on
> the same nfs filesystem), with server on 3.6.0-rc7, and didn't see
> anything interesting (just an xfs lockdep warning that looks like this
> one jlayton already reported:
> http://oss.sgi.com/archives/xfs/2012-09/msg00088.html
> )
>
> Any (even partial) bisection results would certainly be useful, thanks.

Phew. Here we go :). It looks to be more of a PPC specific problem than it appeared as at first:


b4c3a8729ae57b4f84d661e16a192f828eca1d03 is first bad commit
commit b4c3a8729ae57b4f84d661e16a192f828eca1d03
Author: Anton Blanchard <anton@samba.org>
Date:   Thu Jun 7 18:14:48 2012 +0000

    powerpc/iommu: Implement IOMMU pools to improve multiqueue adapter performance

    At the moment all queues in a multiqueue adapter will serialise
    against the IOMMU table lock. This is proving to be a big issue,
    especially with 10Gbit ethernet.

    This patch creates 4 pools and tries to spread the load across
    them. If the table is under 1GB in size we revert back to the
    original behaviour of 1 pool and 1 largealloc pool.

    We create a hash to map CPUs to pools. Since we prefer interrupts to
    be affinitised to primary CPUs, without some form of hashing we are
    very likely to end up using the same pool. As an example, POWER7
    has 4 way SMT and with 4 pools all primary threads will map to the
    same pool.

    The largealloc pool is reduced from 1/2 to 1/4 of the space to
    partially offset the overhead of breaking the table up into pools.

    Some performance numbers were obtained with a Chelsio T3 adapter on
    two POWER7 boxes, running a 100 session TCP round robin test.

    Performance improved 69% with this patch applied.

    Signed-off-by: Anton Blanchard <anton@samba.org>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

:040000 040000 039ae3cbdcfded9c6b13e58a3fc67609f1b587b0 6755a8c4a690cc80dcf834d1127f21db925476d6 M	arch


Alex
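
The commit message above says that, without hashing, the primary threads
of a 4-way SMT machine would all land in the same one of 4 pools. A small
sketch of why: primary threads are CPUs 0, 4, 8, 12, ..., and a plain
modulo by 4 maps each of them to pool 0. The multiplicative hash below is
only an illustration. The multiplier, the EX_* constants and the function
names are assumptions and are not claimed to be what the commit uses.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NR_POOLS  4u
#define EX_POOL_BITS 2u           /* log2(EX_NR_POOLS) */
#define EX_HASH_MULT 0x9e370001u  /* assumed odd multiplier */

/* Naive mapping: every multiple of 4 lands in pool 0. */
unsigned int ex_pool_by_modulo(unsigned int cpu)
{
	return cpu % EX_NR_POOLS;
}

/* Multiplicative hash: multiply (wrapping at 32 bits), keep the top bits. */
unsigned int ex_pool_by_hash(unsigned int cpu)
{
	uint32_t h = (uint32_t)cpu * EX_HASH_MULT;

	return (unsigned int)(h >> (32u - EX_POOL_BITS));
}

/* Self-check: the first four primary threads collide under modulo. */
void ex_pool_check(void)
{
	unsigned int cpu;

	for (cpu = 0; cpu < 16; cpu += 4)
		assert(ex_pool_by_modulo(cpu) == 0);
}
```

With this particular multiplier, CPUs 0, 4, 8 and 12 map to pools 0, 1, 3
and 1, so the primary threads no longer all contend for a single pool.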

^ permalink raw reply

* Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC
From: J. Bruce Fields @ 2012-10-01 15:21 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-nfs, Jan Kara, Linus Torvalds, LKML List, anton,
	skinsbursky, bfields, linuxppc-dev
In-Reply-To: <2A52FC96-148C-4F7A-9950-E152E0C6698D@suse.de>

On Mon, Oct 01, 2012 at 04:03:06PM +0200, Alexander Graf wrote:
> 
> On 28.09.2012, at 17:10, J. Bruce Fields wrote:
> 
> > On Fri, Sep 28, 2012 at 04:19:55AM +0200, Alexander Graf wrote:
> >> 
> >> On 28.09.2012, at 04:04, Linus Torvalds wrote:
> >> 
> >>> On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf <agraf@suse.de> wrote:
> >>>> 
> >>>> Below are OOPS excerpts from different rc's I tried. All of them crashed - all the way up to current Linus' master branch. I haven't cross-checked, but I don't remember any such behavior from pre-3.6 releases.
> >>> 
> >>> Since you seem to be able to reproduce it easily (and apparently
> >>> reliably), any chance you could just bisect it?
> >>> 
> >>> Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple
> >>> 
> >>>  git bisect start
> >>>  git bisect good v3.5
> >>>  git bisect bad v3.6-rc1
> >>> 
> >>> will get you started on your adventure..
> >> 
> >> Heh, will give it a try :). The thing really does look quite bisectable.
> >> 
> >> 
> >> It might take a few hours though - the machine isn't exactly fast by today's standards and it's getting late here. But I'll keep you updated.
> > 
> > I doubt it's anything special about that workload, but just for kicks I
> > tried a "git clone -ls" (cloning my linux tree to another directory on
> > the same nfs filesystem), with server on 3.6.0-rc7, and didn't see
> > anything interesting (just an xfs lockdep warning that looks like this
> > one jlayton already reported:
> > http://oss.sgi.com/archives/xfs/2012-09/msg00088.html
> > )
> > 
> > Any (even partial) bisection results would certainly be useful, thanks.
> 
> Phew. Here we go :). It looks to be more of a PPC specific problem than it appeared as at first:

Yep, thanks--I'll assume this is somebody else's problem until somebody
tells me otherwise!

--b.

> 
> 
> b4c3a8729ae57b4f84d661e16a192f828eca1d03 is first bad commit
> commit b4c3a8729ae57b4f84d661e16a192f828eca1d03
> Author: Anton Blanchard <anton@samba.org>
> Date:   Thu Jun 7 18:14:48 2012 +0000
> 
>     powerpc/iommu: Implement IOMMU pools to improve multiqueue adapter performance
>     
>     At the moment all queues in a multiqueue adapter will serialise
>     against the IOMMU table lock. This is proving to be a big issue,
>     especially with 10Gbit ethernet.
>     
>     This patch creates 4 pools and tries to spread the load across
>     them. If the table is under 1GB in size we revert back to the
>     original behaviour of 1 pool and 1 largealloc pool.
>     
>     We create a hash to map CPUs to pools. Since we prefer interrupts to
>     be affinitised to primary CPUs, without some form of hashing we are
>     very likely to end up using the same pool. As an example, POWER7
>     has 4 way SMT and with 4 pools all primary threads will map to the
>     same pool.
>     
>     The largealloc pool is reduced from 1/2 to 1/4 of the space to
>     partially offset the overhead of breaking the table up into pools.
>     
>     Some performance numbers were obtained with a Chelsio T3 adapter on
>     two POWER7 boxes, running a 100 session TCP round robin test.
>     
>     Performance improved 69% with this patch applied.
>     
>     Signed-off-by: Anton Blanchard <anton@samba.org>
>     Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> :040000 040000 039ae3cbdcfded9c6b13e58a3fc67609f1b587b0 6755a8c4a690cc80dcf834d1127f21db925476d6 M	arch
> 
> 
> Alex
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] PPC: Enable the Watchdog vector for 405
From: Jason Gunthorpe @ 2012-10-01 16:25 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linuxppc-dev
In-Reply-To: <CA+5PVA767HNZUoFNj=UubLCdh91EtWiuK2JaY2qarMn=oCCB+A@mail.gmail.com>

On Mon, Oct 01, 2012 at 08:16:29AM -0400, Josh Boyer wrote:
> On Sun, Sep 30, 2012 at 7:27 PM, Jason Gunthorpe
> <jgunthorpe@obsidianresearch.com> wrote:
> > Move the body of the PIT exception out of line to make room.
> 
> What boards did you test this on?  What driver are you using for the
> watchdog?

Tested on a 405F6 core (Xilinx's variant), the board is custom, and
the control for the watchdog SPRs was bundled into a watchdog driver
for the board's watchdog controller.

> >  /* 0x1010 - Fixed Interval Timer (FIT) Exception
> >  */
> > -       STND_EXCEPTION(0x1010,  FITException,           unknown_exception)
> > +//     STND_EXCEPTION(0x1010,  FITException,           unknown_exception)
> 
> Please just move the #endif for the #if 0 up instead of putting a C++
> style comment here.

Sure
 
> >  /* 0x1020 - Watchdog Timer (WDT) Exception
> >  */
> > -#ifdef CONFIG_BOOKE_WDT
> >         CRITICAL_EXCEPTION(0x1020, WDTException, WatchdogException)
> > -#else
> > -       CRITICAL_EXCEPTION(0x1020, WDTException, unknown_exception)
> > -#endif
> > -#endif
> 
> Please leave this wrapped in CONFIG_BOOKE_WDT.  I don't agree with
> unconditionally enabling this for every 405 chip out there.

What are you concerned with? If some core variant does not put a
watchdog there, then you still get a panic from the default watchdog
exception handler..

> > -#ifdef CONFIG_BOOKE_WDT
> > +#if defined(CONFIG_BOOKE_WDT) | defined(CONFIG_40x)
> 
> Pretty sure you meant || here?  Though if you just enable the existing
> config option, I don't think you'd need to edit this file at all.

Yes, I didn't want to use BOOKE_WDT because I have not tested that
driver, nor do I want that driver included in my kernel.. I think the
watchdog driver in use should be orthogonal to having the exception
wired in?

Jason
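
On the `|` versus `||` point above: inside a preprocessor `#if`, each
`defined(...)` evaluates to 0 or 1, so both spellings select the same
branch, and `||` is simply the conventional form. A minimal sketch,
assuming stand-in definitions for the Kconfig symbols (a real build takes
them from the generated configuration header) and hypothetical WDT_*
helper macros that are not part of the patch:

```c
#include <assert.h>

/* Stand-in for Kconfig output; assumed here, not taken from a real .config. */
#define CONFIG_40x 1
/* CONFIG_BOOKE_WDT is deliberately left undefined. */

/* Spelling used in the patch under review: bitwise OR of two 0/1 values. */
#if defined(CONFIG_BOOKE_WDT) | defined(CONFIG_40x)
#define WDT_BITWISE 1
#else
#define WDT_BITWISE 0
#endif

/* Conventional spelling: logical OR. */
#if defined(CONFIG_BOOKE_WDT) || defined(CONFIG_40x)
#define WDT_LOGICAL 1
#else
#define WDT_LOGICAL 0
#endif

/* Both forms must agree, because defined() only ever yields 0 or 1. */
static_assert(WDT_BITWISE == WDT_LOGICAL,
	      "bitwise and logical OR of defined() results must agree");

/* Hypothetical helper: reports whether the guarded code would be built. */
int wdt_vector_wired(void)
{
	return WDT_LOGICAL;
}
```

With only CONFIG_40x defined, both conditions are true, so the guarded
code is built under either spelling.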

^ permalink raw reply

* Re: [PATCH] PPC: Enable the Watchdog vector for 405
From: Josh Boyer @ 2012-10-01 17:32 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linuxppc-dev
In-Reply-To: <20121001162547.GD31620@obsidianresearch.com>

On Mon, Oct 1, 2012 at 12:25 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Mon, Oct 01, 2012 at 08:16:29AM -0400, Josh Boyer wrote:
>> On Sun, Sep 30, 2012 at 7:27 PM, Jason Gunthorpe
>> <jgunthorpe@obsidianresearch.com> wrote:
>> > Move the body of the PIT exception out of line to make room.
>>
>> What boards did you test this on?  What driver are you using for the
>> watchdog?
>
> Tested on a 405F6 core (Xilinx's variant), the board is custom, and
> the control for the watchdog SPRs was bundled into a watchdog driver
> for the board's watchdog controller.
>
>> >  /* 0x1020 - Watchdog Timer (WDT) Exception
>> >  */
>> > -#ifdef CONFIG_BOOKE_WDT
>> >         CRITICAL_EXCEPTION(0x1020, WDTException, WatchdogException)
>> > -#else
>> > -       CRITICAL_EXCEPTION(0x1020, WDTException, unknown_exception)
>> > -#endif
>> > -#endif
>>
>> Please leave this wrapped in CONFIG_BOOKE_WDT.  I don't agree with
>> unconditionally enabling this for every 405 chip out there.
>
> What are you concerned with? If some core variant does not put a
> watchdog there, then you still get a panic from the default watchdog
> exception handler..

I'm concerned with the fact that you've moved PIT and now enabled
something that's been disabled for years.  There's no need to do it like
that.

>> > -#ifdef CONFIG_BOOKE_WDT
>> > +#if defined(CONFIG_BOOKE_WDT) | defined(CONFIG_40x)
>>
>> Pretty sure you meant || here?  Though if you just enable the existing
>> config option, I don't think you'd need to edit this file at all.
>
> Yes, I didn't want to use BOOKE_WDT because I have not tested that
> driver, nor do I want that driver included in my kernel.. I think the
> watchdog driver in use should be orthogonal to having the exception
> wired in?

And it certainly can be.  Just make the driver a module and don't
install it or load it.  The #ifdef will still evaluate to true.

josh

^ permalink raw reply

* Re: [PATCH] PPC: Enable the Watchdog vector for 405
From: Jason Gunthorpe @ 2012-10-01 17:48 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linuxppc-dev
In-Reply-To: <CA+5PVA5qW-toX1_UA9wATy=q69_yK5RXd3cXKSy6-LohExJ3+Q@mail.gmail.com>

On Mon, Oct 01, 2012 at 01:32:47PM -0400, Josh Boyer wrote:
> On Mon, Oct 1, 2012 at 12:25 PM, Jason Gunthorpe

> >> Please leave this wrapped in CONFIG_BOOKE_WDT.  I don't agree with
> >> unconditionally enabling this for every 405 chip out there.
> >
> > What are you concerned with? If some core variant does not put a
> > watchdog there, then you still get a panic from the default watchdog
> > exception handler..
> 
> I'm concerned with the fact that you've moved PIT and now enabled
> something that's been disabled for years.  There's no need to do it like
> that.

Well, just moving the ifdef still keeps the PIT change, and either the
vector is never called and it is harmless to add the new entry point,
or CPUs have been randomly calling into DTLBMiss for years, which
seems worth discovering.

FWIW, this patch has been carried in our tree since about 2.6.14,
mind you we only use two 405 variants.

> > Yes, I didn't want to use BOOKE_WDT because I have not tested that
> > driver, nor do I want that driver included in my kernel.. I think the
> > watchdog driver in use should be orthogonal to having the exception
> > wired in?
> 
> And it certainly can be.  Just make the driver a module and don't
> install it or load it.  The #ifdef will still evaluate to true.

Well, we use non-modular kernels, but I can certainly patch the driver
out.

If I resend using BOOKE_WDT will you take it?

Thanks,
Jason

^ permalink raw reply

* Re: [PATCH 3/3] edac/85xx: Enable the EDAC PCI err driver by device_initcall
From: Scott Wood @ 2012-10-01 19:11 UTC (permalink / raw)
  To: Chunhe Lan
  Cc: Wood Scott-B07421, Gala Kumar-B11780,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <506708BE.1090905@freescale.com>

On 09/29/2012 09:42:06 AM, Chunhe Lan wrote:
> On 09/28/2012 01:35 PM, Scott Wood wrote:
>> On 09/27/2012 05:33:26 PM, Kumar Gala wrote:
>>>
>>> On Sep 27, 2012, at 4:51 PM, Scott Wood wrote:
>>>
>>> > On 09/27/2012 04:45:08 PM, Gala Kumar-B11780 wrote:
>>> >> On Sep 27, 2012, at 11:09 AM, Scott Wood wrote:
>>> >>> On 09/27/2012 02:02:03 PM, Chunhe Lan wrote:
>>> >>>> Original process of call:
>>> >>>>     The mpc85xx_pci_err_probe function completes to been registered
>>> >>>>     and enabled of EDAC PCI err driver at the latter time stage of
>>> >>>>     kernel boot in the mpc85xx_edac.c.
>>> >>>> Current process of call:
>>> >>>>     The mpc85xx_pci_err_probe function completes to been registered
>>> >>>>     and enabled of EDAC PCI err driver at the first time stage of
>>> >>>>     kernel boot in the fsl_pci.c.
>>> >>>> So in this case the following error messages appear in the boot log:
>>> >>>>   PCI: Probing PCI hardware
>>> >>>>   pci 0000:00:00.0: ignoring class b20 (doesn't match header type 01)
>>> >>>>   PCIE error(s) detected
>>> >>>>   PCIE ERR_DR register: 0x00020000
>>> >>>>   PCIE ERR_CAP_STAT register: 0x80000001
>>> >>>>   PCIE ERR_CAP_R0 register: 0x00000800
>>> >>>>   PCIE ERR_CAP_R1 register: 0x00000000
>>> >>>>   PCIE ERR_CAP_R2 register: 0x00000000
>>> >>>>   PCIE ERR_CAP_R3 register: 0x00000000
>>> >>>> Because the EDAC PCI err driver is registered and enabled earlier than
>>> >>>> original point of call. But at this point of time, PCI hardware is not
>>> >>>> probed and initialized, and it is in unknowable state.
>>> >>>> So, move enable function into mpc85xx_pci_err_en which is called at the
>>> >>>> middle time stage of kernel boot and after PCI hardware is probed and
>>> >>>> initialized by device_initcall in the fsl_pci.c.
>>> >>>> Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
>>> >>>> ---
>>> >>>> arch/powerpc/sysdev/fsl_pci.c |   12 ++++++++++
>>> >>>> arch/powerpc/sysdev/fsl_pci.h |    5 ++++
>>> >>>> drivers/edac/mpc85xx_edac.c   |   47 ++++++++++++++++++++++++++++------------
>>> >>>> 3 files changed, 50 insertions(+), 14 deletions(-)
>>> >>>> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
>>> >>>> index 3d6f4d8..a591965 100644
>>> >>>> --- a/arch/powerpc/sysdev/fsl_pci.c
>>> >>>> +++ b/arch/powerpc/sysdev/fsl_pci.c
>>> >>>> @@ -904,4 +904,16 @@ static int __init fsl_pci_init(void)
>>> >>>>     return platform_driver_register(&fsl_pci_driver);
>>> >>>> }
>>> >>>> arch_initcall(fsl_pci_init);
>>> >>>> +
>>> >>>> +static int __init fsl_pci_err_en(void)
>>> >>>> +{
>>> >>>> +    struct device_node *np;
>>> >>>> +
>>> >>>> +    for_each_node_by_type(np, "pci")
>>> >>>> +        if (of_match_node(pci_ids, np))
>>> >>>> +            mpc85xx_pci_err_en(np);
>>> >>>> +
>>> >>>> +    return 0;
>>> >>>> +}
>>> >>>> +device_initcall(fsl_pci_err_en);
>>> >>>
>>> >>> Why can't you call this from the normal PCIe controller init,
>>> >>> instead of searching for the node independently?
>>> >> Don't we have this now with mpc85xx_pci_err_probe() ??
>>> >
>>> > What do you mean by "this"?
>>>
>>> I'm saying don't we replace fsl_pci_err_en() with
>>> mpc85xx_pci_err_probe()...
>>>
>>> I need to look at this more, but not clear why mpc85xx_pci_err_en()
>>> can just be part of mpc85xx_pci_err_probe()
>>
>> OK, I was confused -- I thought the point was to make it happen
>> earlier, not later.  The changelog is not clear at all.
>>
>> Don't we want to be able to capture errors that happen during PCI
>> driver initialization, though?
>     Yes.
>     When the PCI controller probes a slot that has no device in it,
> invalid address errors occur.
>     The EDAC driver then prints many error messages. This is expected
> behaviour, but it is ugly.
>     So, enable the EDAC driver later, and only detect errors from the
> subsequent PCI operations.

Is there any way to identify whether the error is the result of such a
probe?  If nothing else, you could identify whether a probe is taking
place -- better than not having any error detection during driver init.

-Scott
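
One way to picture the suggestion above (tracking whether a probe is
taking place instead of delaying error detection) is the user-space
sketch below. It is only an illustration under stated assumptions: every
name in it (bus_scan_in_progress, pci_err_event, scan_bus and the two
counters) is hypothetical and does not come from mpc85xx_edac.c or
fsl_pci.c, and the only value borrowed from the thread is the ERR_DR
reading 0x00020000 shown in the boot log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state; none of these names exist in the real drivers. */
static bool bus_scan_in_progress;
static unsigned int scan_errors;	/* errors seen while scanning slots */
static unsigned int runtime_errors;	/* errors seen after the scan */

/* Stand-in for the error handler: stays quiet during a bus scan. */
static void pci_err_event(unsigned int err_dr)
{
	if (bus_scan_in_progress) {
		/* Likely an empty slot: count it, but do not log it. */
		scan_errors++;
		return;
	}
	runtime_errors++;
	printf("PCIE error(s) detected, ERR_DR register: 0x%08x\n", err_dr);
}

/* Stand-in for enumeration: each empty slot raises one error. */
static void scan_bus(unsigned int empty_slots)
{
	assert(!bus_scan_in_progress);	/* nested scans are not modelled */

	bus_scan_in_progress = true;
	while (empty_slots--)
		pci_err_event(0x00020000);
	bus_scan_in_progress = false;
}
```

Calling scan_bus(3) followed by pci_err_event(0x00020000) leaves
scan_errors at 3 and runtime_errors at 1, with only the last error
logged. Errors raised during driver init are still counted rather than
lost, which is the trade-off the reply argues for.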

^ permalink raw reply
