linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: pacman@kosh.dhis.org
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,
Date: Mon, 29 Nov 2010 16:44:52 +1100	[thread overview]
Message-ID: <1291009492.32570.294.camel@pasglop> (raw)
In-Reply-To: <20101105064332.1678.qmail@kosh.dhis.org>


> I have a mostly-finished patch to do the above. I'll include it below, but
> first a few words about why it's only mostly finished.
> 
> The other Pegasos workarounds are in fixup_device_tree_chrp, and I don't see
> anything like an "if(machine_is_pegasos)" around them. What keeps them from
> being erroneously run on other CHRP-type machines? I made this patch mainly
> by copying pieces of other functions from prom_init.c, but couldn't find the
> "test for Pegasos before running a Pegasos workaround" piece.

Probably bcs the condition they test for really only happens on
pegasos ? :-)

I agree it's a bit gross tho.

The "ranges" property fixup is pretty harmless in any case. The other
fixup might be worth moving to a separate pegasos-only function in which
you would test for a pegasos properly and add your own stuff.

> Another issue is, since the firmware doesn't give me a "compatible" property
> with the details of the controller, I just have to assume that it's
> little-endian. I'm not sure if that's clean, since the real ohci driver
> supports both endiannesses, with at least 3 different Kconfig options(!) to
> choose between them.

If it's PCI it's LE or somebody needs to be shot :-)

> Then there's the volatile which I guess is supposed to be replaced by
> something else, but I don't know what the something else is. I believe this
> usage is extremely close to what volatile was meant for.

Yeah, it's fine, just add something like that on the next line:

 asm volatile("eieio" : : : "memory");

> Finally, when I updated to a more recent upstream kernel to test the patch, I
> found that an intervening commit (3df7169e73fc1d71a39cffeacc969f6840cdf52b,
> OHCI: work around for nVidia shutdown problem) has had a major effect,
> on the appearance of my bug.
> 
> Before that change, the window in which the bug could strike was from the end
> of prom_init (when the kernel believes that devices are quiescent) to the
> initialization of the ohci-hcd driver (which actually quietens the device, or
> at least directs its scribbling to a properly allocated page). After the
> change, the window ends at some point early in the PCI bus setup. That's a
> window so small that with a new kernel, I can't provoke a symptom even if I
> try.

Right but it's very fishy, ie, it may still be DMA'ing and god knows
where ... you may or may not get lucky. I'd rather you do a proper
fixup :-)

Cheers,
Ben.

> Mostly-finished patch:
> 
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index 941ff4d..a14f21b 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -2237,6 +2237,81 @@ static void __init fixup_device_tree_chrp(void)
>  		}
>  	}
>  }
> +
> +/*
> + * Pegasos firmware doesn't quiesce OHCI controllers, so do it manually
> + */
> +static void __init pegasos_quiesce(void)
> +{
> +	phandle node, parent_node;
> +	ihandle parent_ih;
> +	int rc;
> +	char type[16], *path;
> +	u32 prop[5], map_size;
> +	prom_arg_t ohci_virt;
> +
> +	for (node = 0; prom_next_node(&node); ) {
> +		memset(type, 0, sizeof(type));
> +		prom_getprop(node, "device_type", type, sizeof(type));
> +		if (strcmp(type, RELOC("usb")) != 0)
> +			continue;
> +
> +		/* Parent should be a PCI bus (so class-code makes sense).
> +		   class-code should be 0x0C0310 */
> +		parent_node = call_prom("parent", 1, 1, node);
> +		if (!parent_node)
> +			continue;
> +		rc = prom_getprop(node, "class-code", prop, sizeof(u32));
> +		if (rc != sizeof(u32) || prop[0] != 0x0c0310)
> +			continue;
> +
> +		rc = prom_getprop(node, "assigned-addresses",
> +				  prop, 5*sizeof(u32));
> +		if (rc != 5*sizeof(u32))
> +			continue;
> +
> +		/* Open the parent and call map-in */
> +
> +		/* It seems OF doesn't null-terminate the path :-( */
> +		path = RELOC(prom_scratch);
> +		memset(path, 0, PROM_SCRATCH_SIZE);
> +
> +		if (call_prom("package-to-path", 3, 1, parent_node,
> +			      path, PROM_SCRATCH_SIZE-1) == PROM_ERROR)
> +			continue;
> +		parent_ih = call_prom("open", 1, 1, path);
> +
> +		/* Get the OHCI node's pathname, for printing later */
> +		memset(path, 0, PROM_SCRATCH_SIZE);
> +		call_prom("package-to-path", 3, 1, node,
> +			  path, PROM_SCRATCH_SIZE-1);
> +
> +		map_size = prop[4];
> +		if (call_prom_ret("call-method", 6, 2, &ohci_virt,
> +				  ADDR("map-in"), parent_ih,
> +				  map_size, prop[0], prop[1], prop[2]) == 0) {
> +			prom_printf("resetting OHCI device %s...", path);
> +
> +			/* Set HostControllerReset (==1) in HcCommandStatus,
> +			 * located at offset 8 in the register area. The <<24
> +			 * is because the CPU is big-endian and the device is
> +			 * little-endian. */
> +			*(volatile u32 *)(ohci_virt + 8) |= (1<<24);
> +
> +			/* controller should acknowledge by zeroing the bit
> +			 * within 10us. waiting 1ms should be plenty. */
> +			call_prom("interpret", 1, 1, "1 ms");
> +			if (*(volatile u32 *)(ohci_virt + 8) & (1<<24))
> +				prom_printf("failed\n");
> +			else
> +				prom_printf("done\n");
> +
> +			call_prom("call-method", 4, 1, ADDR("map-out"),
> +				  parent_ih, map_size, ohci_virt);
> +		}
> +		call_prom("close", 1, 0, parent_ih);
> +	}
> +}
>  #else
>  #define fixup_device_tree_chrp()
>  #endif
> @@ -2642,6 +2717,7 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4,
>  	 * devices etc...
>  	 */
>  	prom_printf("Calling quiesce...\n");
> +	pegasos_quiesce();
>  	call_prom("quiesce", 0, 0);
>  
>  	/*
> 

  reply	other threads:[~2010-11-29  5:45 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20101009095718.1775.qmail@kosh.dhis.org>
     [not found] ` <20101011143022.GD30667@csn.ul.ie>
2010-10-11 21:00   ` PROBLEM: memory corrupting bug, bisected to 6dda9d55 Andrew Morton
2010-10-13 14:40     ` Mel Gorman
2010-10-13 17:52       ` pacman
2010-10-18 11:33         ` Mel Gorman
2010-10-18 19:10           ` pacman
2010-10-18 21:10             ` Benjamin Herrenschmidt
2010-10-18 21:33               ` pacman
2010-10-19 10:16                 ` Benjamin Herrenschmidt
2010-10-19 18:10                   ` pacman
2010-10-19 20:47                     ` Segher Boessenkool
2010-10-19 21:02                       ` Benjamin Herrenschmidt
2010-10-20  3:23                         ` pacman
2010-10-20 10:32                           ` Benjamin Herrenschmidt
2010-10-20 18:33                             ` pacman
2010-10-20 20:56                               ` Benjamin Herrenschmidt
2010-10-22  9:15                                 ` pacman
2010-10-27  8:57                                 ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55) pacman
2010-10-27 10:13                                   ` Olaf Hering
2010-10-27 21:04                                     ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, pacman
2010-10-27 22:05                                       ` Segher Boessenkool
2010-10-27 22:58                                         ` pacman
2010-10-27 23:33                                           ` Segher Boessenkool
2010-10-28  1:11                                             ` pacman
2010-10-28 19:50                                               ` Segher Boessenkool
2010-10-28 21:07                                                 ` pacman
2010-10-29  0:16                                                   ` Segher Boessenkool
2010-11-05  6:43                                                     ` pacman
2010-11-29  5:44                                                       ` Benjamin Herrenschmidt [this message]
2010-10-27 13:27                                   ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55) Benjamin Herrenschmidt
2010-10-19 20:58                     ` PROBLEM: memory corrupting bug, bisected to 6dda9d55 Benjamin Herrenschmidt
2010-10-18 19:37           ` Andrew Morton
2010-10-18 21:02             ` Benjamin Herrenschmidt
2010-10-18 21:55             ` Thomas Gleixner
2010-10-19 16:24               ` Helmut Grohne
2010-10-19 16:42                 ` Thomas Gleixner
2010-10-18 20:59       ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1291009492.32570.294.camel@pasglop \
    --to=benh@kernel.crashing.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=pacman@kosh.dhis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).