LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
From: Benjamin Herrenschmidt @ 2010-10-19 10:16 UTC (permalink / raw)
  To: pacman; +Cc: Mel Gorman, linux-mm, Andrew Morton, linuxppc-dev, linux-kernel
In-Reply-To: <20101018213348.10281.qmail@kosh.dhis.org>


> > >From there, you might be able to close onto the culprit a bit more, for
> > example, try using the DABR register to set data access breakpoints
> > shortly before the corruption spot. AFAIK, On those old 32-bit CPUs, you
> > can set whether you want it to break on a real or a virtual address.
> 
> I thought of that, but as far as I can tell, this CPU doesn't have DABR.
> /proc/cpuinfo
> processor	: 0
> cpu		: 7447/7457
> clock		: 999.999990MHz
> revision	: 1.1 (pvr 8002 0101)
> bogomips	: 66.66
> timebase	: 33333333
> platform	: CHRP
> model		: Pegasos2
> machine		: CHRP Pegasos2
> Memory		: 512 MB

AFAIK, the 7447 is just a derivative of the 7450 design which -does-
have a DABR ... Unless it's broken :-)

> My next thought was: right after the correct value appears in memory, unmap
> the page from the kernel and let it Oops when it tries to write there. Then I
> found out that the kernel is using BATs instead of page tables for its own
> view of memory. Booting with "nobats" completely changes the memory usage
> pattern (probably because it's allocating a lot of pages to hold PTEs that it
> didn't need before)

Right. And that hides the problem I suppose ?

> > You can also sprinkle tests for the page content through the code if
> > that doesn't work to try to "close in" on the culprit (for example if
> > it's a case of stray DMA, like a network driver bug or such).
> 
> No network drivers are loaded when this happens.

Ok.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH RFC] PPC: /dev/mem: All RAM is cacheable, not just the kernel's.
From: Li Yang @ 2010-10-19  9:37 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20101018223256.GA30946@udp111988uds.am.freescale.net>

On Tue, Oct 19, 2010 at 6:32 AM, Scott Wood <scottwood@freescale.com> wrote=
:
> If mem=3D is used on the kernel command line to create reserved regions
> for userspace to map using /dev/mem, let it be mapped cacheable as long
> as it is within the memory region described in the device tree.
>
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
> This isn't just a performance issue, but it could also be a correctness
> issue, if the reserved portion of RAM is mapped cacheable by e.g. a KVM
> guest. =C2=A0This patch does not address cases where such regions could s=
how up
> as something other than a standard memory node -- such as shared regions
> in an AMP configuration. =C2=A0Ideally there would be some means for a pl=
atform
> to register cacheable regions, without having to completely replace the
> entire phys_mem_access_prot function.

Agreed.  Such requirement might be special case in server/desktop
world, but much more common in embedded market when migrating to
multi-core in which shared memory is commonly used for inter-core
communication.  It will be best to have a common way to achieve this.

Not only cache-ability but also coherent property might need to be tunable.

- Leo

^ permalink raw reply

* unsubscribe
From: Roberto Mantovani @ 2010-10-19  8:51 UTC (permalink / raw)
  To: Linuxppc-dev


-- 
Roberto Mantovani <rmantovani@automazionelogistica.it>
A&L srl - Automazione e Logistica www.automazionelogistica.it
Via Lidice, 20
40139 Bologna
Cap Sociale € 10.000 I.V.
C.F., P.IVA e registro imprese di Bologna N.02296311208

^ permalink raw reply

* RE: eSDHC controller driver on MPC8308rdb
From: Maria Johansen @ 2010-10-19  8:49 UTC (permalink / raw)
  To: Bo.Liu; +Cc: linuxppc-dev
In-Reply-To: <4CBD4EFB.6070603@windriver.com>

Pi0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+RnJvbTogVG9ueWxpdSBbbWFpbHRvOkJvLkxp
dUB3aW5kcml2ZXIuY29tXSANCj48c25pcD4NCj4+IFRoYW5rcywgdGhlIG1pc3NpbmcgY2xvY2st
ZnJlcXVlbmN5IHdhcyBwYXJ0IG9mIG15IHByb2JsZW0uIFRoZSByZXN0IHdhcyBzb2x2ZWQgYnkg
YWRkaW5nIHNvbWUgbW9yZSBxdWlya3MgdG8gdGhlIGVzZGhjLWRyaXZlciAoU0RIQ0lfUVVJUktf
Rk9SQ0VfMV9CSVRfREFUQSBhbmQgPlNESENJX1FVSVJLX1JFU0VUX0FGVEVSX1JFUVVFU1QpLiBX
aHkgaXQgaXMgbmVjZXNzYXJ5IHRvIGZvcmNlIDEtYml0IGRhdGEgdHJhbnNmZXJzIEkgYW0gYSBi
aXQgdW5zdXJlIGFib3V0LCBzaW5jZSB0aGUgbXBjODMwOCByZWZlcmVuY2UgbWFudWFsIGNsZWFy
bHkgc3RhdGVzIHRoYXQgNC1iaXQgPnRyYW5zZmVycyBhcmUgc3VwcG9ydGVkLg0KPj4gICANCj5B
Y3R1YWxseSwgSSBkaWQgdXNlIDQtYml0IG1vZGUuIFlvdSBjYW4gdHJ5IGFub3RoZXIgMiBxdWlj
a3M6DQo+DQo+KyAgIGlmIChvZl9nZXRfcHJvcGVydHkobnAsICJzZGhjaSxhdXRvLWNtZDEyIiwg
TlVMTCkpDQo+KyAgICAgICBob3N0LT5xdWlya3MgfD0gU0RIQ0lfUVVJUktfTVVMVElCTE9DS19S
RUFEX0FDTUQxMjsNCj4rDQo+KyAgIGlmIChvZl9nZXRfcHJvcGVydHkobnAsICJzZGhjaSxicm9r
ZW4tdGltZW91dCIsIE5VTEwpKQ0KPisgICAgICAgaG9zdC0+cXVpcmtzIHw9IFNESENJX1FVSVJL
X0JST0tFTl9USU1FT1VUX1ZBTDsNCj4rDQo+DQoNClNESENJX1FVSVJLX01VTFRJQkxPQ0tfUkVB
RF9BQ01EMTIgd29ya2VkICh3aXRoIG15IG90aGVyIHF1aXJrcyBkaXNhYmxlZCksIHRoYW5rcyEN
Cg0KLS0NCk1hcmlhDQo=

^ permalink raw reply

* [QUESTION] MPC8343 'internal only' DMA support
From: KRONSTORFER Horst @ 2010-10-19  8:15 UTC (permalink / raw)
  To: linuxppc-dev

hi!

i assume the mpc8343 dma controllers ability to do internally controlled
operations (csb/csb)
is _not_ affected by deactivating externally controlled operations via
pinmultiplexing in sicrl.

am I correct?=20

thanks
horst kronstorfer

^ permalink raw reply

* RE: eSDHC controller driver on MPC8308rdb
From: Zang Roy-R61911 @ 2010-10-19  8:09 UTC (permalink / raw)
  To: Bo.Liu, Maria Johansen; +Cc: linuxppc-dev
In-Reply-To: <4CBD4EFB.6070603@windriver.com>



> -----Original Message-----
> From: =
linuxppc-dev-bounces+tie-fei.zang=3Dfreescale.com@lists.ozlabs.org
>
[mailto:linuxppc-dev-bounces+tie-fei.zang=3Dfreescale.com@lists.ozlabs.or=
g
] On
> Behalf Of Tonyliu
> Sent: Tuesday, October 19, 2010 15:56 PM
> To: Maria Johansen
> Cc: linuxppc-dev@lists.ozlabs.org
> Subject: Re: eSDHC controller driver on MPC8308rdb
>=20
> Maria Johansen wrote:
> >
> >> -----Original Message-----
> >> From: Tonyliu [mailto:Bo.Liu@windriver.com]
> >> <snip>
> >>
> >>> Could this be a problem related the eSDHC controller (or the
driver),
> >>> or is it the memory card? (a 4GB SanDisk Extreme SDHC card, which
> unfortunately is the only card I have available at the moment.) I will
keep
> digging into drivers/mmc/host/sdhci.c in search of a solution, and any
tips to
> how I should proceed would be greatly appreciated!
> >>>
> >>>
> >> Hi,
> >> Try the patch in attatchment. By default, on some e300 platforms
such as
> mpc8308_rdb, the entry "clock-frequency" of section sdchi in DTB is
not fixed
> by u-boot, so has to >caculate the input clock for sdhci controller
explicitly
> in driver.
> >>
> >> Tony
> >>
> >
> > Thanks, the missing clock-frequency was part of my problem. The rest
was
> solved by adding some more quirks to the esdhc-driver
> (SDHCI_QUIRK_FORCE_1_BIT_DATA and SDHCI_QUIRK_RESET_AFTER_REQUEST).
Why it is
> necessary to force 1-bit data transfers I am a bit unsure about, since
the
> mpc8308 reference manual clearly states that 4-bit transfers are
supported.
> >
> Actually, I did use 4-bit mode. You can try another 2 quicks:
>=20
> +   if (of_get_property(np, "sdhci,auto-cmd12", NULL))
> +       host->quirks |=3D SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12;
> +
> +   if (of_get_property(np, "sdhci,broken-timeout", NULL))
> +       host->quirks |=3D SDHCI_QUIRK_BROKEN_TIMEOUT_VAL;
> +
>=20
> =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D
> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> index 9d4fdfa..b7a1ce4 100644
> --- a/drivers/mmc/host/sdhci.c
> +++ b/drivers/mmc/host/sdhci.c
> @@ -807,8 +807,12 @@ static void sdhci_set_transfer_mode(struct
> sdhci_host *host,
>     WARN_ON(!host->data);
>=20
>     mode =3D SDHCI_TRNS_BLK_CNT_EN;
> -   if (data->blocks > 1)
> +   if (data->blocks > 1) {
> +       if (host->quirks & SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12)
> +           mode |=3D SDHCI_TRNS_MULTI | SDHCI_TRNS_ACMD12;
> +       else
>         mode |=3D SDHCI_TRNS_MULTI;
> +   }
>     if (data->flags & MMC_DATA_READ)
>         mode |=3D SDHCI_TRNS_READ;
>     if (host->flags & SDHCI_REQ_USE_DMA)
> @@ -1099,6 +1103,13 @@ static void sdhci_request(struct mmc_host *mmc,
> struct mmc_request *mrq)
>     sdhci_activate_led(host);
>  #endif
>=20
> +   if (host->quirks & SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12) {
> +       if (mrq->stop) {
> +           mrq->data->stop =3D NULL;
> +           mrq->stop =3D NULL;
> +       }
> +   }
> +
>     host->mrq =3D mrq;
>=20
>=20
> I don't have patch for upstream, you can add them manually.


https://patchwork.kernel.org/patch/176752/


Roy

^ permalink raw reply

* Re: eSDHC controller driver on MPC8308rdb
From: Tonyliu @ 2010-10-19  7:55 UTC (permalink / raw)
  To: Maria Johansen; +Cc: linuxppc-dev
In-Reply-To: <DF61B4AD1C20A54EADF5C05E1F1110DE08D44C@Exchange.parkairsystems.net>

Maria Johansen wrote:
>   
>> -----Original Message-----
>> From: Tonyliu [mailto:Bo.Liu@windriver.com] 
>> <snip>
>>     
>>> Could this be a problem related the eSDHC controller (or the driver), 
>>> or is it the memory card? (a 4GB SanDisk Extreme SDHC card, which unfortunately is the only card I have available at the moment.) I will keep digging into drivers/mmc/host/sdhci.c in search of a solution, and any tips to how I should proceed would be greatly appreciated!
>>>   
>>>       
>> Hi,
>> Try the patch in attatchment. By default, on some e300 platforms such as mpc8308_rdb, the entry "clock-frequency" of section sdchi in DTB is not fixed by u-boot, so has to >caculate the input clock for sdhci controller explicitly in driver.
>>
>> Tony
>>     
>
> Thanks, the missing clock-frequency was part of my problem. The rest was solved by adding some more quirks to the esdhc-driver (SDHCI_QUIRK_FORCE_1_BIT_DATA and SDHCI_QUIRK_RESET_AFTER_REQUEST). Why it is necessary to force 1-bit data transfers I am a bit unsure about, since the mpc8308 reference manual clearly states that 4-bit transfers are supported.
>   
Actually, I did use 4-bit mode. You can try another 2 quicks:

+   if (of_get_property(np, "sdhci,auto-cmd12", NULL))
+       host->quirks |= SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12;
+
+   if (of_get_property(np, "sdhci,broken-timeout", NULL))
+       host->quirks |= SDHCI_QUIRK_BROKEN_TIMEOUT_VAL;
+

===================================================
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 9d4fdfa..b7a1ce4 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -807,8 +807,12 @@ static void sdhci_set_transfer_mode(struct 
sdhci_host *host,
    WARN_ON(!host->data);

    mode = SDHCI_TRNS_BLK_CNT_EN;
-   if (data->blocks > 1)
+   if (data->blocks > 1) {
+       if (host->quirks & SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12)
+           mode |= SDHCI_TRNS_MULTI | SDHCI_TRNS_ACMD12;
+       else
        mode |= SDHCI_TRNS_MULTI;
+   }
    if (data->flags & MMC_DATA_READ)
        mode |= SDHCI_TRNS_READ;
    if (host->flags & SDHCI_REQ_USE_DMA)
@@ -1099,6 +1103,13 @@ static void sdhci_request(struct mmc_host *mmc, 
struct mmc_request *mrq)
    sdhci_activate_led(host);
 #endif

+   if (host->quirks & SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12) {
+       if (mrq->stop) {
+           mrq->data->stop = NULL;
+           mrq->stop = NULL;
+       }
+   }
+
    host->mrq = mrq;


I don't have patch for upstream, you can add them manually.

Tony
> Thanks for your help :)
>
> --
> Maria
>
>
>
>
>   

^ permalink raw reply related

* RE: eSDHC controller driver on MPC8308rdb
From: Maria Johansen @ 2010-10-19  7:26 UTC (permalink / raw)
  To: Bo.Liu; +Cc: linuxppc-dev
In-Reply-To: <4CBC04E8.5030706@windriver.com>

DQoNCj4tLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPkZyb206IFRvbnlsaXUgW21haWx0bzpC
by5MaXVAd2luZHJpdmVyLmNvbV0gDQo+PHNuaXA+DQo+PiBDb3VsZCB0aGlzIGJlIGEgcHJvYmxl
bSByZWxhdGVkIHRoZSBlU0RIQyBjb250cm9sbGVyIChvciB0aGUgZHJpdmVyKSwgDQo+PiBvciBp
cyBpdCB0aGUgbWVtb3J5IGNhcmQ/IChhIDRHQiBTYW5EaXNrIEV4dHJlbWUgU0RIQyBjYXJkLCB3
aGljaCB1bmZvcnR1bmF0ZWx5IGlzIHRoZSBvbmx5IGNhcmQgSSBoYXZlIGF2YWlsYWJsZSBhdCB0
aGUgbW9tZW50LikgSSB3aWxsIGtlZXAgZGlnZ2luZyBpbnRvIGRyaXZlcnMvbW1jL2hvc3Qvc2Ro
Y2kuYyBpbiBzZWFyY2ggb2YgYSBzb2x1dGlvbiwgYW5kIGFueSB0aXBzIHRvIGhvdyBJIHNob3Vs
ZCBwcm9jZWVkIHdvdWxkIGJlIGdyZWF0bHkgYXBwcmVjaWF0ZWQhDQo+PiAgIA0KPkhpLA0KPlRy
eSB0aGUgcGF0Y2ggaW4gYXR0YXRjaG1lbnQuIEJ5IGRlZmF1bHQsIG9uIHNvbWUgZTMwMCBwbGF0
Zm9ybXMgc3VjaCBhcyBtcGM4MzA4X3JkYiwgdGhlIGVudHJ5ICJjbG9jay1mcmVxdWVuY3kiIG9m
IHNlY3Rpb24gc2RjaGkgaW4gRFRCIGlzIG5vdCBmaXhlZCBieSB1LWJvb3QsIHNvIGhhcyB0byA+
Y2FjdWxhdGUgdGhlIGlucHV0IGNsb2NrIGZvciBzZGhjaSBjb250cm9sbGVyIGV4cGxpY2l0bHkg
aW4gZHJpdmVyLg0KPg0KPlRvbnkNCg0KVGhhbmtzLCB0aGUgbWlzc2luZyBjbG9jay1mcmVxdWVu
Y3kgd2FzIHBhcnQgb2YgbXkgcHJvYmxlbS4gVGhlIHJlc3Qgd2FzIHNvbHZlZCBieSBhZGRpbmcg
c29tZSBtb3JlIHF1aXJrcyB0byB0aGUgZXNkaGMtZHJpdmVyIChTREhDSV9RVUlSS19GT1JDRV8x
X0JJVF9EQVRBIGFuZCBTREhDSV9RVUlSS19SRVNFVF9BRlRFUl9SRVFVRVNUKS4gV2h5IGl0IGlz
IG5lY2Vzc2FyeSB0byBmb3JjZSAxLWJpdCBkYXRhIHRyYW5zZmVycyBJIGFtIGEgYml0IHVuc3Vy
ZSBhYm91dCwgc2luY2UgdGhlIG1wYzgzMDggcmVmZXJlbmNlIG1hbnVhbCBjbGVhcmx5IHN0YXRl
cyB0aGF0IDQtYml0IHRyYW5zZmVycyBhcmUgc3VwcG9ydGVkLg0KDQpUaGFua3MgZm9yIHlvdXIg
aGVscCA6KQ0KDQotLQ0KTWFyaWENCg0KDQoNCg0K

^ permalink raw reply

* RE: [PATCH 1/2] P4080/eLBC: Make Freescale elbc interrupt common to elbc devices
From: Zang Roy-R61911 @ 2010-10-19  6:37 UTC (permalink / raw)
  To: Wood Scott-B07421, tiejun.chen
  Cc: dedekind1, Lan Chunhe-B25806, linuxppc-dev, linux-mtd, akpm,
	dwmw2, Gala Kumar-B11780
In-Reply-To: <20101018110631.22d6f081@udp111988uds.am.freescale.net>


> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Tuesday, October 19, 2010 0:07 AM
> To: tiejun.chen
> Cc: Zang Roy-R61911; linux-mtd@lists.infradead.org; Wood Scott-B07421;
> dedekind1@gmail.com; Lan Chunhe-B25806; linuxppc-dev@ozlabs.org;
akpm@linux-
> foundation.org; dwmw2@infradead.org; Gala Kumar-B11780
> Subject: Re: [PATCH 1/2] P4080/eLBC: Make Freescale elbc interrupt
common to
> elbc devices
>=20
> On Mon, 18 Oct 2010 16:55:49 +0800
> "tiejun.chen" <tiejun.chen@windriver.com> wrote:
>=20
> > Looks you always iounmap(fsl_lbc_ctrl_dev->regs) on position 'err'
but here
> > of_iomap() is already failed you should skip iounmap()
fsl_lbc_ctrl_dev-
> >regs
> > again. So you should improve that as the following on 'err', or
layout 'err'
> in
> > gain.
> > ------
> > 	if(fsl_lbc_ctrl_dev->regs)
> > 		iounmap(fsl_lbc_ctrl_dev->regs);
> >
>=20
> It looks like iounmap(NULL) is a no-op, just like kfree(NULL).
Thanks for the reminder. I will keep original post, if there is no more
comment.
Roy

^ permalink raw reply

* Re: linux-next: manual merge of the tip tree with the powerpc tree
From: Stephen Rothwell @ 2010-10-19  4:56 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra
  Cc: linux-next, Paul Mackerras, linux-kernel, linuxppc-dev
In-Reply-To: <20101019154849.126a736a.sfr@canb.auug.org.au>

Hi again,

On Tue, 19 Oct 2010 15:48:49 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Today's linux-next merge of the tip tree got a conflict in
> arch/powerpc/kernel/time.c between commit
> cf9efce0ce3136fa076f53e53154e98455229514 ("powerpc: Account time using
> timebase rather than PURR") from the powerpc tree and commit
> e360adbe29241a0194e10e20595360dd7b98a2b3 ("irq_work: Add generic hardirq
> context callbacks") from the tip tree.
> 
> Just context changes.  I fixed it up and can carry the fix as necessary.

I forgot the merge diff, sorry:

diff --cc arch/powerpc/kernel/time.c
index 644f918,54888eb..0000000
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@@ -578,9 -585,11 +578,9 @@@ void timer_interrupt(struct pt_regs * r
  	old_regs = set_irq_regs(regs);
  	irq_enter();
  
- 	if (test_perf_event_pending()) {
- 		clear_perf_event_pending();
- 		perf_event_do_pending();
 -	calculate_steal_time();
 -
+ 	if (test_irq_work_pending()) {
+ 		clear_irq_work_pending();
+ 		irq_work_run();
  	}
  
  #ifdef CONFIG_PPC_ISERIES

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

^ permalink raw reply

* linux-next: manual merge of the tip tree with the powerpc tree
From: Stephen Rothwell @ 2010-10-19  4:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra
  Cc: linux-next, Paul Mackerras, linux-kernel, linuxppc-dev

Hi all,

Today's linux-next merge of the tip tree got a conflict in
arch/powerpc/kernel/time.c between commit
cf9efce0ce3136fa076f53e53154e98455229514 ("powerpc: Account time using
timebase rather than PURR") from the powerpc tree and commit
e360adbe29241a0194e10e20595360dd7b98a2b3 ("irq_work: Add generic hardirq
context callbacks") from the tip tree.

Just context changes.  I fixed it up and can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

^ permalink raw reply

* Re: CONFIG_FEC is not good for mpc8xx ethernet?
From: tiejun.chen @ 2010-10-19  2:03 UTC (permalink / raw)
  To: Scott Wood; +Cc: ppcdev
In-Reply-To: <20101018114214.2e72a616@udp111988uds.am.freescale.net>

Scott Wood wrote:
> On Mon, 18 Oct 2010 16:40:42 +0800
> "tiejun.chen" <tiejun.chen@windriver.com> wrote:
> 
>> Shawn Jin wrote:
>>> Hi,
>>>
>>> My target is a mpc875 based board and has FEC ethernet. The phy is
>>> AM79C874. I have the following configuration for the network support.
>>>
>>> CONFIG_PHYLIB=y
>>> CONFIG_NET_ETHERNET=y
>>> CONFIG_MII=y
>>> CONFIG_FS_ENET=y
>>> CONFIG_FS_ENET_HAS_FEC=y
>>> CONFIG_FS_ENET_MDIO_FEC=y
>>>
>>> However I found that the phy support (AM79C874) is actually in
>>> drivers/net/fec.c which is compiled only when CONFIG_FEC=y. However
>> The phy driver should not be embedded into the NIC driver in theory.
> 
> Right, those are handled by drivers/net/phy/.
> 
>> I think you should include the phy driver, mdio-bitbang.c, which should be
>> support AMD79C874.
> 
> On MPC8xx you want drivers/net/fs_enet/mii-fec.c.  This is just the
> MDIO driver; it doesn't handle any particular PHY.  I don't know if
> there is a driver specifically for AM79C874, though the generic PHY
> support may be good enough.

Maybe.

I can found one related patch for supporting PHY AM79C874 on 2.6.15,
------
http://lists.ozlabs.org/pipermail/linuxppc-embedded/2005-November/021043.html

But I don't see that on the latest kernel, and also I don't know the history
completely for that. Maybe its already merged into one generic PHY driver but
I'm not sure.

Tiejun

> 
> -Scott
> 
> 

^ permalink raw reply

* Re: [PATCH 1/2] P4080/eLBC: Make Freescale elbc interrupt common to elbc devices
From: tiejun.chen @ 2010-10-19  1:35 UTC (permalink / raw)
  To: Scott Wood
  Cc: B07421, dedekind1, B25806, linuxppc-dev, linux-mtd, akpm, dwmw2,
	B11780
In-Reply-To: <20101018110631.22d6f081@udp111988uds.am.freescale.net>

Scott Wood wrote:
> On Mon, 18 Oct 2010 16:55:49 +0800
> "tiejun.chen" <tiejun.chen@windriver.com> wrote:
> 
>> Looks you always iounmap(fsl_lbc_ctrl_dev->regs) on position 'err' but here
>> of_iomap() is already failed you should skip iounmap() fsl_lbc_ctrl_dev->regs
>> again. So you should improve that as the following on 'err', or layout 'err' in
>> gain.
>> ------
>> 	if(fsl_lbc_ctrl_dev->regs)
>> 		iounmap(fsl_lbc_ctrl_dev->regs);
>>
> 
> It looks like iounmap(NULL) is a no-op, just like kfree(NULL).
> 

Absolutely, I know what you mean :)

But I think we should take care of every line as one normal progress/rule. That
will make us understand easily, and maybe we also would benefit a little good
performance from those codes.

Tiejun

> -Scott
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply

* Re: BUG: dead loop in PowerPC hcall tracepoint (Was: [LTP] [PATCH v2] Add ftrace-stress-test to LTP)
From: Li Zefan @ 2010-10-19  0:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: ltp-list, Peter Zijlstra, linuxppc-dev, LKML, Paul Mackerras,
	Anton Blanchard, Ingo Molnar, subrata
In-Reply-To: <1287411903.16971.275.camel@gandalf.stny.rr.com>

Steven Rostedt wrote:
> On Mon, 2010-10-18 at 11:19 +0800, Li Zefan wrote:
> 
>> This is a dead loop:
>>
>> trace_hcall_entry() -> trace_clock_global() -> trace_hcall_entry() ..
>>
>> And this is a PPC specific bug. Hope some ppc guys will fix it?
>> Or we kill trace_clock_global() if no one actually uses it..
> 
> trace_clock_global() is used by many. I use it (and recommend using it)
> on boxes where the TSC is horribly out of sync, and the trace needs
> synchronization between CPUs.
> 
> The trace_hcall_entry and exit has wrappers already. Just add recursion
> protection there.
> 

Right, I thought of this. But as I have no machine to test, I'll leave
this to others.

> Perhaps something like this:
> 
> (Not compiled nor ran)
> 
> +static DEFINE_PER_CPU(hcall_trace_disable);
> +
>  void hcall_tracepoint_regfunc(void)
>  {
>  	hcall_tracepoint_refcount++;
>  }
> 
>  void hcall_tracepoint_unregfunc(void)
>  {
>  	hcall_tracepoint_refcount--;
>  }
> 
> +int __trace_disable_check(void)
> +{
> +	if (!hcall_tracepoint_refcount)
> +		return 1;
> +
> +	if (get_cpu_var(hcall_trace_disable)) {
> +		put_cpu_var(hcall_trace_disable);
> +		return 1;
> +	}
> +
> +	__get_cpu_var(hcall_trace_disable)++;
> +
> +	return 0;
> +}
> +
> +void __trace_disable_put(void)
> +{
> +	__get_cpu_var(hcall_trace_disable)--;
> +	put_cpu_var(hcall_trace_disable);
> +}
> +
>  void __trace_hcall_entry(unsigned long opcode, unsigned long *args)
>  {
> +	int trace_disable;
> +
> +	if (__trace_disable_check())
> +		return;
> +
>  	trace_hcall_entry(opcode, args);
> +	__trace_disable_put();
>  }
> 
>  void __trace_hcall_exit(long opcode, unsigned long retval,
>  			unsigned long *retbuf)
>  {
> +	if (__trace_disable_check())
> +		return;
> +
>  	trace_hcall_exit(opcode, retval, retbuf);
> +	__trace_disable_put();
>  }
> 
> -- Steve
> 
> 
> 
> 

^ permalink raw reply

* Re: [PATCH 2/2] ppc: lazy flush_tlb_mm for nohash architectures
From: Benjamin Herrenschmidt @ 2010-10-18 23:34 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: linuxppc-dev list
In-Reply-To: <1287439021.3700.22.camel@shaggy-w500>

On Mon, 2010-10-18 at 16:57 -0500, Dave Kleikamp wrote:
> 
> I didn't like the implementation of a per-core stale map.  The existing
> implementation flushes the core's tlb, but only clears a specific entry
> from the stale map.  It's dealing with the stale contexts one at a time,
> where the new function is accumulating many stale contexts, with the
> intent to do a single tlb flush per core.

Right, because I wrote it with A2 in mind which has a TLB invalidate by
PID instruction :-) So I don't flush the whole TLB there... but then
this instruction can take hundreds (or more) of cycles so it might not
necessarily be that great anyways...

> Since I originally intended the new function only to be enabled on the
> 47x, I left the context-stealing code as untouched as possible thinking
> it wouldn't be exercised in 47x-land.  This was probably narrow-minded,
> and I should look at either 1) aligning the context-stealing code closer
> to the new lazy flush code, or 2) dropping this code on the floor and
> picking back up the new design that we worked on last year.

In any case, we can probably merge you current stuff upstream (with
appropriate bug fixes if necessary) for now and move from there.

> > At the very least, the "old style" stale map code and "new style" stale
> > TLB code should be more in sync, you may end up flushing the TLB
> > twice...
> 
> yeah.  if we enable this for 440, it is more likely to be an issue than
> on 476.

Right.

> > >  void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
> > >  {
> > >     unsigned int i, id, cpu = smp_processor_id();
> > > @@ -197,6 +260,8 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
> > >     /* No lockless fast path .. yet */
> > >     raw_spin_lock(&context_lock);
> > >  
> > > +   flush_recycled_contexts(cpu);
> > > +
> > 
> > Ok so here's the nasty one I think. You need to make sure that whenever
> > you pick something off the context_available_map, you've done the above
> > first within the same context_lock section right ? At least before you
> > actually -use- said context.
> 
> right.
> 
> > So far so good ... but steal_context can drop the lock iirc. So you may
> > need to re-flush there. Not sure that can happen in practice but better
> > safe than sorry. I would have preferred seeing that flush near the end
> > of the function to avoid such problem.
> 
> I can fix this.  For 476, I don't think that even if steal_context()
> could be called, it wouldn't drop the lock.  But then again, if we
> enable this for other architectures, it may be a possibility.

Yeah, it's a minor issue but I'd rather get the code right to avoid
surprises later.

 .../...

> > Now...
> > 
> > > +/*
> > > + * This is called from flush_tlb_mm().  Mark the current context as stale
> > > + * and grab an available one.  The tlb will be flushed when no more
> > > + * contexts are available
> > > + */
> > > +void lazy_flush_context(struct mm_struct *mm)
> > > +{
> > > +   unsigned int id;
> > > +   unsigned long flags;
> > > +   unsigned long *map;
> > > +
> > > +   raw_spin_lock_irqsave(&context_lock, flags);
> > > +
> > > +   id = mm->context.id;
> > > +   if (unlikely(id == MMU_NO_CONTEXT))
> > > +           goto no_context;
> > 
> > First thing is ... you reproduce quite a bit of logic from
> > switch_mmu_context() here. Shouldn't it be abstracted in a separate
> > function ?
> 
> I'm sure there's something I can do there.
> 
> > The other thing here is that another CPU might have done a
> > recycle_stale_contexts() before you get here. IE. Your TLB may be stale.
> > Shouln't you do a flush here ? Since you are picking up a new PID from
> > the context_available_map, it can potentially be stale if your tlb needs
> > flushing due to another CPU having just done a recycle.
> 
> It looks like I missed that.  It does seem that there should be a flush
> in here.

Ok, so It wasn't just shit in my eyes :-)

> I think I'm going to dust off the newer implementation based on your and
> Paul's design.  I can probably get that in good working order without
> too much more work, and it's something we need to look at eventually
> anyway.  If I find anything that really gets in my way, I might fix up
> this patch in the mean time.

As you like. As I said earlier, I'm happy to merge a fixed version of
this one first if you think it's going to take a while to get the other
one right. However, I believe this is too late for the next merge window
anyways so that gives you some time ahead to play and make a decision.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH RFC] PPC: /dev/mem: All RAM is cacheable, not just the kernel's.
From: Kumar Gala @ 2010-10-18 22:41 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20101018223256.GA30946@udp111988uds.am.freescale.net>


On Oct 18, 2010, at 5:32 PM, Scott Wood wrote:

> If mem=3D is used on the kernel command line to create reserved =
regions
> for userspace to map using /dev/mem, let it be mapped cacheable as =
long
> as it is within the memory region described in the device tree.
>=20
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---

Just a nit, but subject should be powerpc:.. not PPC:

> This isn't just a performance issue, but it could also be a =
correctness
> issue, if the reserved portion of RAM is mapped cacheable by e.g. a =
KVM
> guest.  This patch does not address cases where such regions could =
show up
> as something other than a standard memory node -- such as shared =
regions
> in an AMP configuration.  Ideally there would be some means for a =
platform
> to register cacheable regions, without having to completely replace =
the
> entire phys_mem_access_prot function.
>=20
> This patch assumes that there is no region between memstart and memend =
that
> must be non-cacheable.  This is already the case with the "for now"
> implementation of page_is_ram on 32-bit, but will this be a problem on
> 64-bit?
>=20
> arch/powerpc/kernel/pci-common.c |    5 ++++-
> arch/powerpc/kernel/prom.c       |    3 +++
> arch/powerpc/mm/mem.c            |    3 ++-
> arch/powerpc/mm/mmu_decl.h       |    1 +
> 4 files changed, 10 insertions(+), 2 deletions(-)

For some time I've been meaning for us to look at the address range =
tracking that x86 does so one can make sure we aren't mapping regions =
with different WIMG settings.  However, never enough time for this.  So =
I think this is reasonable for now.  Hopefully Ben will comment on =
64-bit side of the world.

- k=

^ permalink raw reply

* [PATCH RFC] PPC: /dev/mem: All RAM is cacheable, not just the kernel's.
From: Scott Wood @ 2010-10-18 22:32 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev

If mem= is used on the kernel command line to create reserved regions
for userspace to map using /dev/mem, let it be mapped cacheable as long
as it is within the memory region described in the device tree.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
This isn't just a performance issue, but it could also be a correctness
issue, if the reserved portion of RAM is mapped cacheable by e.g. a KVM
guest.  This patch does not address cases where such regions could show up
as something other than a standard memory node -- such as shared regions
in an AMP configuration.  Ideally there would be some means for a platform
to register cacheable regions, without having to completely replace the
entire phys_mem_access_prot function.

This patch assumes that there is no region between memstart and memend that
must be non-cacheable.  This is already the case with the "for now"
implementation of page_is_ram on 32-bit, but will this be a problem on
64-bit?

 arch/powerpc/kernel/pci-common.c |    5 ++++-
 arch/powerpc/kernel/prom.c       |    3 +++
 arch/powerpc/mm/mem.c            |    3 ++-
 arch/powerpc/mm/mmu_decl.h       |    1 +
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 10a44e6..4298a56 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -39,6 +39,8 @@
 #include <asm/firmware.h>
 #include <asm/eeh.h>
 
+#include <mm/mmu_decl.h>
+
 static DEFINE_SPINLOCK(hose_spinlock);
 LIST_HEAD(hose_list);
 
@@ -398,7 +400,8 @@ pgprot_t pci_phys_mem_access_prot(struct file *file,
 	resource_size_t offset = ((resource_size_t)pfn) << PAGE_SHIFT;
 	int i;
 
-	if (page_is_ram(pfn))
+	if (pfn >= (memstart_addr >> PAGE_SHIFT) &&
+	    pfn <= (memend_addr >> PAGE_SHIFT))
 		return prot;
 
 	prot = pgprot_noncached(prot);
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index fed9bf6..f8f5966 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -62,6 +62,8 @@
 #define DBG(fmt...)
 #endif
 
+phys_addr_t memend_addr;
+
 #ifdef CONFIG_PPC64
 int __initdata iommu_is_off;
 int __initdata iommu_force_on;
@@ -504,6 +506,7 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
 	memblock_add(base, size);
 
 	memstart_addr = min((u64)memstart_addr, base);
+	memend_addr = max((u64)memend_addr, base + size - 1);
 }
 
 u64 __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 1a84a8d..f947a39 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -104,7 +104,8 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 	if (ppc_md.phys_mem_access_prot)
 		return ppc_md.phys_mem_access_prot(file, pfn, size, vma_prot);
 
-	if (!page_is_ram(pfn))
+	if (pfn < (memstart_addr >> PAGE_SHIFT) ||
+	    pfn > (memend_addr >> PAGE_SHIFT))
 		vma_prot = pgprot_noncached(vma_prot);
 
 	return vma_prot;
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index dd0a258..05f1ac6 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -119,6 +119,7 @@ extern phys_addr_t __initial_memory_limit_addr;
 extern phys_addr_t total_memory;
 extern phys_addr_t total_lowmem;
 extern phys_addr_t memstart_addr;
+extern phys_addr_t memend_addr;
 extern phys_addr_t lowmem_end_addr;
 
 #ifdef CONFIG_WII
-- 
1.7.0.4

^ permalink raw reply related

* Re: [PATCH 2/2] ppc: lazy flush_tlb_mm for nohash architectures
From: Dave Kleikamp @ 2010-10-18 21:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list
In-Reply-To: <1287017539.2205.48.camel@pasglop>

On Thu, 2010-10-14 at 11:52 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2010-09-24 at 13:01 -0500, Dave Kleikamp wrote:
> > On PPC_MMU_NOHASH processors that support a large number of contexts,
> > implement a lazy flush_tlb_mm() that switches to a free context, marking
> > the old one stale.  The tlb is only flushed when no free contexts are
> > available.
> > 
> > The lazy tlb flushing is controlled by the global variable tlb_lazy_flush
> > which is set during init, dependent upon MMU_FTR_TYPE_47x.
> 
> Unless I'm mistaken, there are some issues with that patch... sorry for
> the late review, I've been away for a couple of weeks.
> 
> > +int tlb_lazy_flush;
> > +static int tlb_needs_flush[NR_CPUS];
> > +static unsigned long *context_available_map;
> > +static unsigned int nr_stale_contexts;
> 
> Now I understand what you're doing here, but wouldn't it have been
> possible to re-use the existing stale map concept or do you reckon it
> would have been too messy ?

I didn't like the implementation of a per-core stale map.  The existing
implementation flushes the core's tlb, but only clears a specific entry
from the stale map.  It's dealing with the stale contexts one at a time,
where the new function is accumulating many stale contexts, with the
intent to do a single tlb flush per core.

Since I originally intended the new function only to be enabled on the
47x, I left the context-stealing code as untouched as possible thinking
it wouldn't be exercised in 47x-land.  This was probably narrow-minded,
and I should look at either 1) aligning the context-stealing code closer
to the new lazy flush code, or 2) dropping this code on the floor and
picking back up the new design that we worked on last year.


> At the very least, the "old style" stale map code and "new style" stale
> TLB code should be more in sync, you may end up flushing the TLB
> twice...

yeah.  if we enable this for 440, it is more likely to be an issue than
on 476.

> > @@ -189,6 +220,38 @@ static void context_check_map(void)
> >  static void context_check_map(void) { }
> >  #endif
> >  
> > +/*
> > + * On architectures that support a large number of contexts, the tlb
> > + * can be flushed lazily by picking a new context and making the stale
> > + * context unusable until a lazy tlb flush has been issued.
> > + *
> > + * context_available_map keeps track of both active and stale contexts,
> > + * while context_map continues to track only active contexts.  When the
> > + * lazy tlb flush is triggered, context_map is copied to
> > + * context_available_map, making the once-stale contexts available again
> > + */
> > +static void recycle_stale_contexts(void)
> > +{
> > +	if (nr_free_contexts == 0 && nr_stale_contexts > 0) {
> 
> Do an early return and avoid the indentation instead ?

Yeah, that makes sense.

> > +		unsigned int cpu = smp_processor_id();
> > +		unsigned int i;
> > +
> > +		pr_hard("[%d] recycling stale contexts\n", cpu);
> > +		/* Time to flush the TLB's */
> > +		memcpy(context_available_map, context_map, CTX_MAP_SIZE);
> > +		nr_free_contexts = nr_stale_contexts;
> > +		nr_stale_contexts = 0;
> > +		for_each_online_cpu(i) {
> > +			if ((i < cpu_first_thread_in_core(cpu)) ||
> > +			    (i > cpu_last_thread_in_core(cpu)))
> > +				tlb_needs_flush[i] = 1;
> > +			else
> > +				tlb_needs_flush[i] = 0;	/* This core */
> > +		}
> > +		_tlbil_all();
> > +	}
> > +}
> > +
> >  void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
> >  {
> >  	unsigned int i, id, cpu = smp_processor_id();
> > @@ -197,6 +260,8 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
> >  	/* No lockless fast path .. yet */
> >  	raw_spin_lock(&context_lock);
> >  
> > +	flush_recycled_contexts(cpu);
> > +
> 
> Ok so here's the nasty one I think. You need to make sure that whenever
> you pick something off the context_available_map, you've done the above
> first within the same context_lock section right ? At least before you
> actually -use- said context.

right.

> So far so good ... but steal_context can drop the lock iirc. So you may
> need to re-flush there. Not sure that can happen in practice but better
> safe than sorry. I would have preferred seeing that flush near the end
> of the function to avoid such problem.

I can fix this.  For 476, I don't think that even if steal_context()
could be called, it wouldn't drop the lock.  But then again, if we
enable this for other architectures, it may be a possibility.

> Then, you can end up in cases where you flush the TLB, but your context
> is marked stale due to stealing, and flush again. That's one of the
> reason I wonder if we can consolidate a bit the two orthogonal
> "staleness" concepts we have now.
> 
> Granted, stealing on 47x is unlikely, but I have reasons to think that
> this lazy flushing will benefit 440 too.
> 
> >  	pr_hard("[%d] activating context for mm @%p, active=%d, id=%d",
> >  		cpu, next, next->context.active, next->context.id);
> >  
> > @@ -227,7 +292,12 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
> >  	id = next_context;
> >  	if (id > last_context)
> >  		id = first_context;
> > -	map = context_map;
> > +
> > +	if (tlb_lazy_flush) {
> > +		recycle_stale_contexts();
> > +		map = context_available_map;
> > +	} else
> > +		map = context_map;
> >  
> >  	/* No more free contexts, let's try to steal one */
> >  	if (nr_free_contexts == 0) {
> > @@ -250,6 +320,13 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
> >  		if (id > last_context)
> >  			id = first_context;
> >  	}
> > +	if (tlb_lazy_flush)
> > +		/*
> > +		 * In the while loop above, we set the bit in
> > +		 * context_available_map, it also needs to be set in
> > +		 * context_map
> > +		 */
> > +		__set_bit(id, context_map);
> >   stolen:
> >  	next_context = id + 1;
> >  	context_mm[id] = next;
> > @@ -267,7 +344,7 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
> >  			    id, cpu_first_thread_in_core(cpu),
> >  			    cpu_last_thread_in_core(cpu));
> >  
> > -		local_flush_tlb_mm(next);
> > +		__local_flush_tlb_mm(next);
> >  
> >  		/* XXX This clear should ultimately be part of local_flush_tlb_mm */
> >  		for (i = cpu_first_thread_in_core(cpu);
> > @@ -317,11 +394,61 @@ void destroy_context(struct mm_struct *mm)
> >  		mm->context.active = 0;
> >  #endif
> >  		context_mm[id] = NULL;
> > -		nr_free_contexts++;
> > +
> > +		if (tlb_lazy_flush)
> > +			nr_stale_contexts++;
> > +		else
> > +			nr_free_contexts++;
> >  	}
> >  	raw_spin_unlock_irqrestore(&context_lock, flags);
> >  }
> 
> Now...
> 
> > +/*
> > + * This is called from flush_tlb_mm().  Mark the current context as stale
> > + * and grab an available one.  The tlb will be flushed when no more
> > + * contexts are available
> > + */
> > +void lazy_flush_context(struct mm_struct *mm)
> > +{
> > +	unsigned int id;
> > +	unsigned long flags;
> > +	unsigned long *map;
> > +
> > +	raw_spin_lock_irqsave(&context_lock, flags);
> > +
> > +	id = mm->context.id;
> > +	if (unlikely(id == MMU_NO_CONTEXT))
> > +		goto no_context;
> 
> First thing is ... you reproduce quite a bit of logic from
> switch_mmu_context() here. Shouldn't it be abstracted in a separate
> function ?

I'm sure there's something I can do there.

> The other thing here is that another CPU might have done a
> recycle_stale_contexts() before you get here. IE. Your TLB may be stale.
> Shouln't you do a flush here ? Since you are picking up a new PID from
> the context_available_map, it can potentially be stale if your tlb needs
> flushing due to another CPU having just done a recycle.

It looks like I missed that.  It does seem that there should be a flush
in here.

> > +	/*
> > +	 * Make the existing context stale.  It remains in
> > +	 * context_available_map as long as nr_free_contexts remains non-zero
> > +	 */
> > +	 __clear_bit(id, context_map);
> > +	 context_mm[id] = NULL;
> > +	 nr_stale_contexts++;
> > +
> > +	recycle_stale_contexts();
> > +	BUG_ON(nr_free_contexts == 0);
> > +
> > +	nr_free_contexts--;
> > +	id = last_context;
> > +	map = context_available_map;
> > +	while (__test_and_set_bit(id, map)) {
> > +		id = find_next_zero_bit(map, last_context+1, id);
> > +		if (id > last_context)
> > +			id = first_context;
> > +	}
> > +	set_bit(id, context_map);
> > +	next_context = id + 1;
> > +	context_mm[id] = mm;
> > +	mm->context.id = id;
> > +	if (current->active_mm == mm)
> > +		set_context(id, mm->pgd);
> > +no_context:
> > +	raw_spin_unlock_irqrestore(&context_lock, flags);
> > +}
> > +
> >  #ifdef CONFIG_SMP
> >  
> >  static int __cpuinit mmu_context_cpu_notify(struct notifier_block *self,
> > @@ -407,6 +534,7 @@ void __init mmu_context_init(void)
> >  	} else if (mmu_has_feature(MMU_FTR_TYPE_47x)) {
> >  		first_context = 1;
> >  		last_context = 65535;
> > +		tlb_lazy_flush = 1;
> >  	} else {
> >  		first_context = 1;
> >  		last_context = 255;
> 
> Somebody should measure on 440, might actually improve perfs. Something
> like a kernel compile sounds like a good test here.

I think I'm going to dust off the newer implementation based on your and
Paul's design.  I can probably get that in good working order without
too much more work, and it's something we need to look at eventually
anyway.  If I find anything that really gets in my way, I might fix up
this patch in the mean time.

Thanks,
Shaggy
-- 
Dave Kleikamp
IBM Linux Technology Center

^ permalink raw reply

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
From: Thomas Gleixner @ 2010-10-18 21:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Helmut Grohne, Mel Gorman, LKML, linux-mm, pacman, linuxppc-dev
In-Reply-To: <20101018123750.ef7d6d48.akpm@linux-foundation.org>

On Mon, 18 Oct 2010, Andrew Morton wrote:

> On Mon, 18 Oct 2010 12:33:31 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > A bit but I still don't know why it would cause corruption. Maybe this is still
> > a caching issue but the difference in timing between list_add and list_add_tail
> > is enough to hide the bug. It's also possible there are some registers
> > ioremapped after the memmap array and reading them is causing some
> > problem.
> > 
> > Andrew, what is the right thing to do here? We could flail around looking
> > for explanations as to why the bug causes a user buffer corruption but never
> > get an answer or do we go with this patch, preferably before 2.6.36 releases?
> 
> Well, you've spotted a bug so I'd say we fix it asap.
> 
> It's a bit of a shame that we lose the only known way of reproducing a
> different bug, but presumably that will come back and bite someone else
> one day, and we'll fix it then :(

I might be completely one off as usual, but this thing reminds me of a
bug I stared at yesterday night:

    http://permalink.gmane.org/gmane.linux.kernel/1049605

Reporter Cc'ed

Thanks,

	tglx

^ permalink raw reply

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
From: pacman @ 2010-10-18 21:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Mel Gorman, linux-mm, Andrew Morton, linuxppc-dev, linux-kernel
In-Reply-To: <1287436205.2341.14.camel@pasglop>

Benjamin Herrenschmidt writes:
> 
> You can do something fun... like a timer interrupt that peeks at those
> physical addresses from the linear mapping for example, and try to find
> out "when" they get set to the wrong value (you should observe the load
> from disk, then the corruption, unless they end up being loaded
> incorrectly (ie. dma coherency problem ?) ...

I'm headed toward something like that. Maybe not a timer, maybe a "check it
every time the kernel is entered". But first I have to work out exactly when
the disk load completes so I know when to start checking.

> 
> >From there, you might be able to close onto the culprit a bit more, for
> example, try using the DABR register to set data access breakpoints
> shortly before the corruption spot. AFAIK, On those old 32-bit CPUs, you
> can set whether you want it to break on a real or a virtual address.

I thought of that, but as far as I can tell, this CPU doesn't have DABR.
/proc/cpuinfo
processor	: 0
cpu		: 7447/7457
clock		: 999.999990MHz
revision	: 1.1 (pvr 8002 0101)
bogomips	: 66.66
timebase	: 33333333
platform	: CHRP
model		: Pegasos2
machine		: CHRP Pegasos2
Memory		: 512 MB

My next thought was: right after the correct value appears in memory, unmap
the page from the kernel and let it Oops when it tries to write there. Then I
found out that the kernel is using BATs instead of page tables for its own
view of memory. Booting with "nobats" completely changes the memory usage
pattern (probably because it's allocating a lot of pages to hold PTEs that it
didn't need before)

> 
> You can also sprinkle tests for the page content through the code if
> that doesn't work to try to "close in" on the culprit (for example if
> it's a case of stray DMA, like a network driver bug or such).

No network drivers are loaded when this happens.

-- 
Alan Curry

^ permalink raw reply

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
From: Benjamin Herrenschmidt @ 2010-10-18 21:10 UTC (permalink / raw)
  To: pacman; +Cc: Mel Gorman, linux-mm, Andrew Morton, linuxppc-dev, linux-kernel
In-Reply-To: <20101018191034.21165.qmail@kosh.dhis.org>

On Mon, 2010-10-18 at 14:10 -0500, pacman@kosh.dhis.org wrote:

> I've been flailing around quite a bit. Here's my latest result:
> 
> Since I can view the corruption with md5sum /sbin/e2fsck, I know it's in a
> clean cached page. So I made an extra copy of /sbin/e2fsck, which won't be
> loaded into memory during boot. So now after the corruption happens, I can
>   cmp -l /sbin/e2fsck good-e2fsck
> for a quick look at the changed bytes. Much easier than provoking a segfault
> under gdb.
> 
> Then I got really creative and wrote a cmp replacement which mmaps the files
> and reports the physical addresses from /proc/self/pagemap of the pages that
> don't match. And the consistent result is that physical pages 64604 and 64609
> (both in the range of the order=9 64512) have wrong contents. And the
> corruption is always a single word 128 bytes after the start of the page.
> Physical addresses 0x0fc5c080 and 0x0fc61080 are hit every time.

 .../...

You can do something fun... like a timer interrupt that peeks at those
physical addresses from the linear mapping for example, and try to find
out "when" they get set to the wrong value (you should observe the load
from disk, then the corruption, unless they end up being loaded
incorrectly (ie. dma coherency problem ?) ...

>From there, you might be able to close onto the culprit a bit more, for
example, try using the DABR register to set data access breakpoints
shortly before the corruption spot. AFAIK, On those old 32-bit CPUs, you
can set whether you want it to break on a real or a virtual address.

You can also sprinkle tests for the page content through the code if
that doesn't work to try to "close in" on the culprit (for example if
it's a case of stray DMA, like a network driver bug or such).

Cheers,
Ben.


> The values of the corrupted words, observed in 5 consecutive boots, were:
>   at 0fc5c080   at 0fc61080
>   -----------   -----------
>   c3540000      92510000
>   565c0000      23590000
>   c85b0000      97580000
>   d15f0000      9e5c0000
>   d95b0000      a8580000
> 
> The low 16 bits are all 0 and the upper 16 bits seem randomly distributed.
> But look at the differences:
> 
>   c3540000 - 92510000 = 31030000
>   565c0000 - 23590000 = 33030000
>   c85b0000 - 97580000 = 31030000
>   d15f0000 - 9e5c0000 = 33030000
>   d95b0000 - a8580000 = 31030000
> 
> This means something... but I don't know what.
> 
> In a completely different method of investigation, I went back a few stable
> kernels, got 2.6.33.7 and applied 6dda9d55 to it, thinking that if 6dda9d55
> only reveals a pre-existing bug, I could bisect it using 6dda9d55 as a
> bug-revealing assistant. The bug appeared when running 2.6.33.7 with 6dda9d55
> applied. That was discouraging.
> 
> >This patch fixes the problem by ensuring we are not reading a possibly
> >invalid location of memory. It's not clear why the read causes
> >corruption but one way or the other it is a buggy read.
> 
> At least that part of the explanation is wrong. Where's the buggy read?
> The action taken by the 6dda9d55 version of __free_one_page looks perfectly
> legitimate to me. Page numbers:
> 
> [129024       ] [130048       ]   order=10
> [129024 129536] [130048 130560]   order=9
> 
> 130048 is being freed. 130560 is not free. 129024 (the higher_buddy) is
> already free at order=10. So 130048 is being pushed to the tail of the free
> list, on the speculation that 130560 might soon be free and then the whole
> thing will form an order=11 free page, the only problem being that order=11
> is too high so that later merge will never happen. It's not useful, and maybe
> not conceptually valid to say that 129024 is the buddy of 130048, but it is
> an existing page, and the only way it wouldn't be is if the total memory size
> was not a multiple of 1<<(MAX_ORDER-1) pages
> 
> -- 
> Alan Curry
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
From: Benjamin Herrenschmidt @ 2010-10-18 21:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, linux-mm, pacman, linuxppc-dev, linux-kernel
In-Reply-To: <20101018123750.ef7d6d48.akpm@linux-foundation.org>

On Mon, 2010-10-18 at 12:37 -0700, Andrew Morton wrote:
> Well, you've spotted a bug so I'd say we fix it asap.
> 
> It's a bit of a shame that we lose the only known way of reproducing a
> different bug, but presumably that will come back and bite someone
> else
> one day, and we'll fix it then :(

Well, I can always revert that and run some experiments here, provided I
can reproduce the problem at all ...

Cheers,
Ben.

^ permalink raw reply

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
From: Benjamin Herrenschmidt @ 2010-10-18 20:59 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Yinghai Lu, linux-kernel, linux-mm, pacman, KOSAKI Motohiro,
	Andrew Morton, linuxppc-dev, Christoph Lameter
In-Reply-To: <20101013144044.GS30667@csn.ul.ie>

On Wed, 2010-10-13 at 15:40 +0100, Mel Gorman wrote:
> 
> This is somewhat contrived but I can see how it might happen even on one
> CPU particularly if the L1 cache is virtual and is loose about checking
> physical tags.
> 
> > How sensitive/vulnerable is PPC32 to such things?
> > 
> 
> I can not tell you specifically but if the above scenario is in any way
> plausible, I believe it would depend on what sort of L1 cache the CPU
> has. Maybe this particular version has a virtual cache with no physical
> tagging and is depending on the OS not to make virtual aliasing mistakes.

Nah, ppc doesn't have problems with cache aliases, it all looks
physically tagged to the programmer (tho there's subtleties but none
that explains the reported behaviour).

Looks like real memory corruption to me.

Cheers,
Ben.

^ permalink raw reply

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
From: Andrew Morton @ 2010-10-18 19:37 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, pacman, linuxppc-dev, linux-kernel
In-Reply-To: <20101018113331.GB30667@csn.ul.ie>

On Mon, 18 Oct 2010 12:33:31 +0100
Mel Gorman <mel@csn.ul.ie> wrote:

> A bit but I still don't know why it would cause corruption. Maybe this is still
> a caching issue but the difference in timing between list_add and list_add_tail
> is enough to hide the bug. It's also possible there are some registers
> ioremapped after the memmap array and reading them is causing some
> problem.
> 
> Andrew, what is the right thing to do here? We could flail around looking
> for explanations as to why the bug causes a user buffer corruption but never
> get an answer or do we go with this patch, preferably before 2.6.36 releases?

Well, you've spotted a bug so I'd say we fix it asap.

It's a bit of a shame that we lose the only known way of reproducing a
different bug, but presumably that will come back and bite someone else
one day, and we'll fix it then :(

^ permalink raw reply

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
From: pacman @ 2010-10-18 19:10 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, Andrew Morton, linuxppc-dev, linux-kernel
In-Reply-To: <20101018113331.GB30667@csn.ul.ie>

Mel Gorman writes:
> 
> A bit but I still don't know why it would cause corruption. Maybe this is still
> a caching issue but the difference in timing between list_add and list_add_tail
> is enough to hide the bug. It's also possible there are some registers
> ioremapped after the memmap array and reading them is causing some
> problem.

I've been doing a lot more tests and I'm sure that 6dda9d55 is not really
responsible. It just happens to provoke the bug in my particular setup.
Whatever it is, it's very sensitive to small changes.

At the end of free_all_bootmem, the free list for order 9 has 4 entries.
Which one is at the head of the list depends on whether 6dda9d55 is applied
or not. If page number 130048 is at the head of the list, it gets used fairly
soon, and everything's fine. The alternative is that page number 64512 is at
the head of the list, so it gets used fairly soon, and corruption occurs.

> 
> Andrew, what is the right thing to do here? We could flail around looking
> for explanations as to why the bug causes a user buffer corruption but never
> get an answer or do we go with this patch, preferably before 2.6.36 releases?

I've been flailing around quite a bit. Here's my latest result:

Since I can view the corruption with md5sum /sbin/e2fsck, I know it's in a
clean cached page. So I made an extra copy of /sbin/e2fsck, which won't be
loaded into memory during boot. So now after the corruption happens, I can
  cmp -l /sbin/e2fsck good-e2fsck
for a quick look at the changed bytes. Much easier than provoking a segfault
under gdb.

Then I got really creative and wrote a cmp replacement which mmaps the files
and reports the physical addresses from /proc/self/pagemap of the pages that
don't match. And the consistent result is that physical pages 64604 and 64609
(both in the range of the order=9 64512) have wrong contents. And the
corruption is always a single word 128 bytes after the start of the page.
Physical addresses 0x0fc5c080 and 0x0fc61080 are hit every time.

The values of the corrupted words, observed in 5 consecutive boots, were:
  at 0fc5c080   at 0fc61080
  -----------   -----------
  c3540000      92510000
  565c0000      23590000
  c85b0000      97580000
  d15f0000      9e5c0000
  d95b0000      a8580000

The low 16 bits are all 0 and the upper 16 bits seem randomly distributed.
But look at the differences:

  c3540000 - 92510000 = 31030000
  565c0000 - 23590000 = 33030000
  c85b0000 - 97580000 = 31030000
  d15f0000 - 9e5c0000 = 33030000
  d95b0000 - a8580000 = 31030000

This means something... but I don't know what.

In a completely different method of investigation, I went back a few stable
kernels, got 2.6.33.7 and applied 6dda9d55 to it, thinking that if 6dda9d55
only reveals a pre-existing bug, I could bisect it using 6dda9d55 as a
bug-revealing assistant. The bug appeared when running 2.6.33.7 with 6dda9d55
applied. That was discouraging.

>This patch fixes the problem by ensuring we are not reading a possibly
>invalid location of memory. It's not clear why the read causes
>corruption but one way or the other it is a buggy read.

At least that part of the explanation is wrong. Where's the buggy read?
The action taken by the 6dda9d55 version of __free_one_page looks perfectly
legitimate to me. Page numbers:

[129024       ] [130048       ]   order=10
[129024 129536] [130048 130560]   order=9

130048 is being freed. 130560 is not free. 129024 (the higher_buddy) is
already free at order=10. So 130048 is being pushed to the tail of the free
list, on the speculation that 130560 might soon be free and then the whole
thing will form an order=11 free page, the only problem being that order=11
is too high so that later merge will never happen. It's not useful, and maybe
not conceptually valid to say that 129024 is the buddy of 130048, but it is
an existing page, and the only way it wouldn't be is if the total memory size
was not a multiple of 1<<(MAX_ORDER-1) pages

-- 
Alan Curry

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox