LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] synchronize_irq needs a barrier
From: Benjamin Herrenschmidt @ 2007-10-18 23:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Herbert Xu, Linux Kernel Mailing List, linuxppc-dev,
	Thomas Gleixner, akpm, Ingo Molnar
In-Reply-To: <alpine.LFD.0.999.0710181627091.26902@woody.linux-foundation.org>

> So how about something like this (untested! not necessarily very well 
> thought through either!)
> 
> Basic notion: the only thing that serializes the IRQ_INPROGRESS flag is 
> the descriptor lock. And we don't have to (or even want to!) hold it while 
> waiting for the thing, but we want to *have*held*it* in between whatever 
> we're synchronizing with.
> 
> The internal irq handler functions already held the lock when they did 
> whatever they need to serialize - and they are possibly performance 
> critical too - so they use the "internal" function that doesn't get the 
> lock unnecessarily again.

That may do the trick as the read can't cross the spin_lock (it can
cross spin_unlock but not lock). Advantage over adding a barrier to
handle_IRQ_event() is that it keeps the overhead to the slow path
(synchronize_irq).

Note that I didn't actually experience a problem here. I just came upon
that by accident while thinking about a similar issue I have with
napi_synchronize().

Looks good to me on a first glance (unfortunately a bit ugly but heh).

Ben.

^ permalink raw reply

* [PATCH] [POWERPC] ucc_geth: Eliminate compile warnings
From: Emil Medve @ 2007-10-18 22:15 UTC (permalink / raw)
  To: jgarzik, leoli, netdev, linuxppc-dev

drivers/net/ucc_geth.c: In function 'ucc_geth_startup':
drivers/net/ucc_geth.c:2614: warning: assignment makes integer from pointer without a cast
drivers/net/ucc_geth.c:2651: warning: assignment makes integer from pointer without a cast

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
---

This patch is against Jeff's tree: d85714d81cc0408daddb68c10f7fd69eafe7c213 

netdev-2.6> scripts/checkpatch.pl 0001-POWERPC-ucc_geth-Eliminate-compile-warnings.patch 
Your patch has no obvious style problems and is ready for submission.

 drivers/net/ucc_geth.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index bec413b..d427da8 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -2611,7 +2611,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 			if (UCC_GETH_TX_BD_RING_ALIGNMENT > 4)
 				align = UCC_GETH_TX_BD_RING_ALIGNMENT;
 			ugeth->tx_bd_ring_offset[j] =
-				kmalloc((u32) (length + align), GFP_KERNEL);
+				(u32)kmalloc(length + align, GFP_KERNEL);
 
 			if (ugeth->tx_bd_ring_offset[j] != 0)
 				ugeth->p_tx_bd_ring[j] =
@@ -2648,7 +2648,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 			if (UCC_GETH_RX_BD_RING_ALIGNMENT > 4)
 				align = UCC_GETH_RX_BD_RING_ALIGNMENT;
 			ugeth->rx_bd_ring_offset[j] =
-				kmalloc((u32) (length + align), GFP_KERNEL);
+				(u32)kmalloc(length + align, GFP_KERNEL);
 			if (ugeth->rx_bd_ring_offset[j] != 0)
 				ugeth->p_rx_bd_ring[j] =
 					(void*)((ugeth->rx_bd_ring_offset[j] +
-- 
1.5.3.GIT

^ permalink raw reply related

* Re: [PATCH v2 3/4] Implement clockevents driver for powerpc
From: Paul Mackerras @ 2007-10-19  0:14 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: linuxppc-dev, Thomas Gleixner, Realtime Kernel
In-Reply-To: <47177229.2030200@ru.mvista.com>

Sergei Shtylyov writes:

> > What problem do you see arising from this?
> 
>     Timers firing too early.

Only if the minimum interrupt latency is less than 1 decrementer
tick.  That seems pretty unlikely to me unless you have a very slow
timebase frequency.

In fact what we should program the decrementer to is:

	timeout - (is_booke? 0: 1) - min_interrupt_latency

I was assuming that min_interrupt_latency (measured in timebase ticks)
would be at least 1, but apparently some systems can have a timebase
frequency as low as 1kHz, so we'll have to have an ifdef or something.

Paul.

^ permalink raw reply

* Re: [PATCH v3 2/2] [POWERPC] MPC8568E-MDS: add support for flash
From: Anton Vorontsov @ 2007-10-18 22:29 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev
In-Reply-To: <C2A61B74-FDB4-4571-A2F4-F0B7317FD7C7@kernel.crashing.org>

On Thu, Oct 18, 2007 at 02:58:25PM -0500, Kumar Gala wrote:
[...]
> > +
> > +		flash@0,0 {
> > +			#address-cells = <1>;
> > +			#size-cells = <1>;
> > +			compatible = "Spansion,S29GL256N11TFIV2O", "cfi-flash";
> > +			reg = <0 0 2000000>;
> > +			bank-width = <2>;
> > +			device-width = <1>;
> > +
> 
> Are you basing the partition map on something or making it up?   
> Clearly hrcw & u-boot are at fixed offsets, wondering about kernel &  
> rootfs?

I'm making it up. From the brief look at the u-boot* git sources,
there is no `flashboot` yet, thus I'm free to make this up...

I've partitioned this flash based on these thoughts:

1. HRCW - whole sector, to not wear out it, plus it's impossible to
   create just 64 bytes partition;
2. Kernel - 2MB, should be enough for bootup kernels?
3. Rootfs - the rest up to...
4. U-Boot at the end.

I'm not saying that it's best map ever, I'm open to suggestions. ;-)

* Unfortunately I didn't look at the stock Freescale u-boot, maybe
  there was flashboot, most probably.. I'll find the CD to look this
  up.

Much thanks for noticing this,

-- 
Anton Vorontsov
email: cbou@mail.ru
backup email: ya-cbou@yandex.ru
irc://irc.freenode.net/bd2

^ permalink raw reply

* Re: [PATCH] [POWERPC] ucc_geth: Eliminate compile warnings
From: David Miller @ 2007-10-19  0:30 UTC (permalink / raw)
  To: Emilian.Medve; +Cc: netdev, leoli, jgarzik, linuxppc-dev
In-Reply-To: <1192745713-20829-1-git-send-email-Emilian.Medve@Freescale.com>

From: Emil Medve <Emilian.Medve@freescale.com>
Date: Thu, 18 Oct 2007 17:15:13 -0500

> drivers/net/ucc_geth.c: In function 'ucc_geth_startup':
> drivers/net/ucc_geth.c:2614: warning: assignment makes integer from pointer without a cast
> drivers/net/ucc_geth.c:2651: warning: assignment makes integer from pointer without a cast
> 
> Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>

It only kills the warning on 32-bit systems, the cast is wrong
either way.

>  			ugeth->tx_bd_ring_offset[j] =
> -				kmalloc((u32) (length + align), GFP_KERNEL);
> +				(u32)kmalloc(length + align, GFP_KERNEL);
>  
>  			if (ugeth->tx_bd_ring_offset[j] != 0)
>  				ugeth->p_tx_bd_ring[j] =

Pointers can be up to "unsigned long" in size, therefore that
is the minimal amount of storage you need to store them into
if they are needed in integer form for some reason.

Any cast from pointer to integer like this is a huge red flag.

^ permalink raw reply

* tiny login: useradd command
From: pjmaiya @ 2007-10-18 10:24 UTC (permalink / raw)
  To: linuxppc-embedded

[-- Attachment #1: Type: text/plain, Size: 800 bytes --]

Hi,
 I am using montavista linux. Using TCT tool i have added package for user creation. I am having following problem
  a.. If I use tiny login package, I will be getting useradd binary but number of parameter are few like
 Usage: adduser [OPTIONS]... <USER>
Options:

  -h <directory>    specify home directory
  -s <shell>        specify shell
  -g <gecos>        specify GECOS stringa.. If I don't use tiny login package, I will be selecting useradd package from admin menu. But I am unable to execute this command since it gives follwing error
      /usr/sbin/groupadd missing..

  a.. Actually I want useradd command similar to linux where more argument are taken, especially I wanted 'user' to be part of more than one group.
can anyone help me out..

thanx in advance,
pjmaiya

[-- Attachment #2: Type: text/html, Size: 1773 bytes --]

^ permalink raw reply

* Re: [PATCH v6 4/9] add platform support for MPC837x MDS board
From: Stephen Rothwell @ 2007-10-19  0:43 UTC (permalink / raw)
  To: Li Yang; +Cc: linuxppc-dev, paulus
In-Reply-To: <1192719847-25045-4-git-send-email-leoli@freescale.com>

[-- Attachment #1: Type: text/plain, Size: 832 bytes --]

On Thu, 18 Oct 2007 23:04:02 +0800 Li Yang <leoli@freescale.com> wrote:
>
> +++ b/arch/powerpc/platforms/83xx/mpc837x_mds.c
> +static void __init mpc837x_mds_setup_arch(void)
> +{
> +#ifdef CONFIG_PCI
> +	struct device_node *np;
> +#endif
> +
> +	if (ppc_md.progress)
> +		ppc_md.progress("mpc837x_mds_setup_arch()", 0);
> +
> +#ifdef CONFIG_PCI
> +	for (np = NULL; (np = of_find_node_by_type(np, "pci")) != NULL;)

	for_each_node_by_type(np, "pci")

> +static int __init mpc837x_mds_probe(void)
> +{
> +        unsigned long root = of_get_flat_dt_root();
> +
> +        return of_flat_dt_is_compatible(root, "fsl,mpc837xmds");

To call these two routines, you should include <asm/prom.h> directly.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: [PATCH v5 9/9] add MPC837x MDS board default device tree
From: David Gibson @ 2007-10-19  0:56 UTC (permalink / raw)
  To: Li Yang; +Cc: linuxppc-dev, paulus
In-Reply-To: <1192702580-6353-1-git-send-email-leoli@freescale.com>

On Thu, Oct 18, 2007 at 06:16:20PM +0800, Li Yang wrote:
> Signed-off-by: Li Yang <leoli@freescale.com>
> ---
> diff --git a/arch/powerpc/boot/dts/mpc8377_mds.dts b/arch/powerpc/boot/dts/mpc8377_mds.dts
> new file mode 100644
> index 0000000..8530de6
> --- /dev/null
> +++ b/arch/powerpc/boot/dts/mpc8377_mds.dts
> @@ -0,0 +1,282 @@
[snip]
> +		crypto@30000 {
> +			model = "SEC3";
> +			compatible = "talitos";

That compatible doesn't look specific enough.  It should at least have
a vendor portion.  In general it's best to have all the information
you need to pick a driver and options in compatible, rather than
splitting that info into model.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply

* Re: Merge dtc
From: David Gibson @ 2007-10-19  1:30 UTC (permalink / raw)
  To: Milton Miller; +Cc: ppcdev, Paul Mackerras, Sam Ravnborg
In-Reply-To: <9a3f3582990ad94feb76770e82c50bdd@bga.com>

On Thu, Oct 18, 2007 at 12:49:54PM -0500, Milton Miller wrote:
> On Tue Oct 16 15:02:17 EST 2007, David Gibson wrote:
> 
> > This very large patch incorporates a copy of dtc into the kernel
> > source, in arch/powerpc/boot/dtc-src.  This means that dtc is no
> > longer an external dependency to build kernels with configurations
> > which need a dtb file.
> >
> > Signed-off-by: David Gibson <david at gibson.dropbear.id.au>
> >
> > Too big for the list, full patch at
> > http://ozlabs.org/~dgibson/home/merge-dtc.patch+
> 
> So split it up.   The obvious one is "here is the unique content, then 
> copy these files from dtc git" would have been better.

*facepalm*

The point of the small patches thing is not to glom together logically
separate patches.  It's *not* to split patches up simply for the sake
of making them smaller.

You've suggested the closest thing to a logical split here, but even
then - the dtc files are dead without the Makefile changes, and the
Makefile changes won't build without the dtc files (so the patches
would have to go in reversed order to what you suggest).

Not to mention that the Makefile patch will be tiny and the raw import
will still be way too big for the list.

> I finally went and looked at the url.  The Kbuild integration is wrong. 

Wrong?  Not how you'd prefer them perhaps...

>   It should build dtc in dtc-src and run the binary from there, and the 
> rules should be in a Kconfig as a normal host-target in that directory. 

Why?  This approach has the advantage that the subdirectory contains
*only* verbatim imported files, which could well make updates simpler.

>    (I don't have a problem with that Kbuild including Makefie.dtc).
> 
> things like the dependancy on scripts_basic in the top level
> Makefile.

I'm not even sure what you mean by this.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply

* Re: Merge dtc
From: David Gibson @ 2007-10-19  1:45 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: ppcdev, Paul Mackerras, Milton Miller
In-Reply-To: <20071018195926.GA22887@uranus.ravnborg.org>

On Thu, Oct 18, 2007 at 09:59:26PM +0200, Sam Ravnborg wrote:
> On Thu, Oct 18, 2007 at 12:49:54PM -0500, Milton Miller wrote:
> > On Tue Oct 16 15:02:17 EST 2007, David Gibson wrote:
> > 
> > >This very large patch incorporates a copy of dtc into the kernel
> > >source, in arch/powerpc/boot/dtc-src.  This means that dtc is no
> > >longer an external dependency to build kernels with configurations
> > >which need a dtb file.
> > >
> > >Signed-off-by: David Gibson <david at gibson.dropbear.id.au>
> > >
> > >Too big for the list, full patch at
> > >http://ozlabs.org/~dgibson/home/merge-dtc.patch+
> > 
> > So split it up.   The obvious one is "here is the unique content, then 
> > copy these files from dtc git" would have been better.
> One obvious split is a patch solely containing the _shipped files.
> And next patch the rest.

Um.. why?  No way is that a logical split.

> As Milton already pointed out you should build dtc in the
> dtc directory (why the -src prefix??).

The -src suffix is only there because I'm not building in the
directory - we can't have both a dtc binary and a dtc directory in
arch/powerpc/boot.  

Ok, so how do I build in the subdirectory?  I was going to do that,
but couldn't for the life of me figure out how.

> And the dtc specific Makefile looks like something from
> the late 80'. Please drop all these ALLUPPERCASE variables
> and accept a little bit of redundancy.

Hrm... I'm pretty dubious about this.  Practically every Makefile in
the universe, *except* Kbuild uses uppercase for most variables.
Makefile.dtc is imported verbatim from the standalone dtc package, and
is supposed to have the minimal information about what needs to be
built to import into Makefiles that actually know how to build things.

> Then mere humans may be able to read the Makefile.

Says a maintainer of Kbuild, about my tiny and not-very-complex
Makefile fragment... um, ok...

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply

* Re: Freescale Interrupt enabling
From: David Gibson @ 2007-10-19  1:46 UTC (permalink / raw)
  To: Alan Bennett; +Cc: linuxppc-dev
In-Reply-To: <bfa0697f0710180741i5063ad94ga13086bf3c43a511@mail.gmail.com>

On Thu, Oct 18, 2007 at 08:41:35AM -0600, Alan Bennett wrote:
> We were developing with Linux 2.6.10 and a Planetcore boot loader,
> however, recent work has us up and running with 2.6.23+ and U-boot
> 1.2.0.  However, we are now running into a few challenges regarding
> the differences.
> 
> Our driver writer's code isn't functioning, but it was with 2.6.10 and
> planet core.  The best I can tell is that the default interrupt
> controller configuration isn't where it was in the planetcore/2.6.10
> version.
> 
> for example, let's look at enabling timer1 / interrupt number 12
> 
> simple description.
> 	       timer1 {
> 	 	       name = "timer1";
>  		       compatible = "fsl,mpc8248_timer";
> 		       interrupts = <c 8>;
> 		       interrupt-parent = <&PIC>;

Device tree nodes should really describe an actual device, not just
some random floating interrupt.  You need to work out what device this
interrupt actually belongs to, and describe that.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply

* Re: [PATCH v2 3/4] Implement clockevents driver for powerpc
From: Paul Mackerras @ 2007-10-19  1:53 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: linuxppc-dev, Thomas Gleixner, Realtime Kernel
In-Reply-To: <471777BD.8090800@ru.mvista.com>

Sergei Shtylyov writes:

>     And now you have incomplete read_persistent_clock() implementation for 

I don't see anything incomplete about it.  If you do, feel free to
post a patch.

> example, god knows why it was preferred to mine -- well, it also implemented 

Your most recent post of your patch to implement read_persistent_clock
was in May -- five months ago -- and you said this about it: "This
patch hasn't received a good testing though".

You don't have to be a god to figure out why I preferred a patch that
had been tested, where the author was responding to comments and
posting updated versions of his patch in the period leading up to the
merge window, over that.

>     Well, that's up to you.  I take it you wouldn't accept a patch 
> implementing auto-reload mode?

I already told you.  Show me numbers (real measurements showing that
it's better) and I'll consider it.

>     There are. I'll have to send patches (it's not that I have time for this) 
> but this is surely the fastest way to get things fixed (if I don't get ignored 
> that is).

All of us only get stuff in by spending the time to develop patches
and posting them for comment.  Stop whinging.

>     I just wanted the reasons clarified and got what I wanted -- as I thought, 
> the decision behind preferring patches was somewhat biased, nobody really 
> cared about code quality or just wasn't familiar with hrtimers enough to judge 
> on the code quality...

You really know how to persuade people to cooperate, don't you...  :P

Paul.

^ permalink raw reply

* [PATCH v2] [2/2] powerpc: Switch to generic WARN_ON()/BUG_ON()
From: Olof Johansson @ 2007-10-19  2:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: grundler, kyle, linuxppc-dev, lethal, akpm
In-Reply-To: <20071011171413.GC10877@lixom.net>

[POWERPC] Switch to generic WARN_ON()/BUG_ON()

Not using the ppc-specific WARN_ON/BUG_ON constructs actually saves about
4K text on a ppc64_defconfig. The main reason seems to be that prepping
the arguments to the conditional trap instructions is more work than
just doing a compare and branch.

Signed-off-by: Olof Johansson <olof@lixom.net>

Index: k.org/include/asm-powerpc/bug.h
===================================================================
--- k.org.orig/include/asm-powerpc/bug.h
+++ k.org/include/asm-powerpc/bug.h
@@ -54,12 +54,6 @@
 	".previous\n"
 #endif
 
-/*
- * BUG_ON() and WARN_ON() do their best to cooperate with compile-time
- * optimisations. However depending on the complexity of the condition
- * some compiler versions may not produce optimal results.
- */
-
 #define BUG() do {						\
 	__asm__ __volatile__(					\
 		"1:	twi 31,0,0\n"				\
@@ -69,20 +63,6 @@
 	for(;;) ;						\
 } while (0)
 
-#define BUG_ON(x) do {						\
-	if (__builtin_constant_p(x)) {				\
-		if (x)						\
-			BUG();					\
-	} else {						\
-		__asm__ __volatile__(				\
-		"1:	"PPC_TLNEI"	%4,0\n"			\
-		_EMIT_BUG_ENTRY					\
-		: : "i" (__FILE__), "i" (__LINE__), "i" (0),	\
-		  "i" (sizeof(struct bug_entry)),		\
-		  "r" ((__force long)(x)));			\
-	}							\
-} while (0)
-
 #define __WARN() do {						\
 	__asm__ __volatile__(					\
 		"1:	twi 31,0,0\n"				\
@@ -92,23 +72,6 @@
 		  "i" (sizeof(struct bug_entry)));		\
 } while (0)
 
-#define WARN_ON(x) ({						\
-	int __ret_warn_on = !!(x);				\
-	if (__builtin_constant_p(__ret_warn_on)) {		\
-		if (__ret_warn_on)				\
-			__WARN();				\
-	} else {						\
-		__asm__ __volatile__(				\
-		"1:	"PPC_TLNEI"	%4,0\n"			\
-		_EMIT_BUG_ENTRY					\
-		: : "i" (__FILE__), "i" (__LINE__),		\
-		  "i" (BUGFLAG_WARNING),			\
-		  "i" (sizeof(struct bug_entry)),		\
-		  "r" (__ret_warn_on));				\
-	}							\
-	unlikely(__ret_warn_on);				\
-})
-
 #endif /* __ASSEMBLY __ */
 #endif /* CONFIG_BUG */
 

^ permalink raw reply

* Re: [PATCH] synchronize_irq needs a barrier
From: Herbert Xu @ 2007-10-19  2:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, linuxppc-dev, Thomas Gleixner, akpm,
	Ingo Molnar
In-Reply-To: <alpine.LFD.0.999.0710181627091.26902@woody.linux-foundation.org>

On Thu, Oct 18, 2007 at 04:39:59PM -0700, Linus Torvalds wrote:
>
> Basic notion: the only thing that serializes the IRQ_INPROGRESS flag is 
> the descriptor lock. And we don't have to (or even want to!) hold it while 
> waiting for the thing, but we want to *have*held*it* in between whatever 
> we're synchronizing with.

Thinking about this again and checking code I have come to the
conclusion that

1) There is a race (in some drivers) even with the full mb.
2) Linus's patch fixes it.
3) It's difficult to fix it elegantly in the driver.

In other words I think this patch is great :)

First of all let's agree on some basic assumptions:

* A pair of spin lock/unlock subsumes the effect of a full mb.
* A spin lock in general only equates to (SS/SL/LL).
* A spin unlock in general only equates to (SS/LS).

In particular, a load after a spin unlock may pass before the
spin unlock.

Here is the race (with tg3 as the only example that I know of).
The driver attempts to quiesce interrupts such that after the
call to synchronize_irq it is intended that no further IRQ
handler calls for that device will do any work besides acking
the IRQ.

Here is how it does it:

CPU0				CPU1
				spin lock
					load irq_sync
irq_sync = 1
smp_mb
synchronize_irq()
	while (IRQ_INPROGRESS)
		wait
	return
				set IRQ_INPROGRESS
				spin unlock
				tg3_msi
					ack IRQ
					if (irq_sync)
						return
					do work

The problem here is that load of irq_sync in the handler has
passed above the setting of IRQ_INPROGRESS.

Linus's patch fixes it because this becomes:

CPU0				CPU1
				spin lock
					load irq_sync
irq_sync = 1
smp_mb
synchronize_irq
				set IRQ_INPROGRESS
				spin unlock
	spin lock
	spin unlock
				tg3_msi
					ack IRQ
					if (irq_sync)
						return
					do work
	while (IRQ_INPROGRESS)
		wait
				spin lock
				clear IRQ_INPROGRESS
				spin unlock
	return

Even though we still do the work on the right we will now notice
the INPROGRESS flag on the left and wait.

It's hard to fix this in the drivers because they'd either have
to access the desc lock or add a full mb to the fast path on the
right.

Once this goes in we can also remove the smp_mb from tg3.c.  BTW,
a lot of drivers (including the fusion example Ben quoted) call
synchronize_irq before free_irq.  This is unnecessary because
the latter already calls it anyway.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] synchronize_irq needs a barrier
From: Linus Torvalds @ 2007-10-19  2:55 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Linux Kernel Mailing List, linuxppc-dev, Thomas Gleixner, akpm,
	Ingo Molnar
In-Reply-To: <20071019023219.GB8453@gondor.apana.org.au>

On Fri, 19 Oct 2007, Herbert Xu wrote:
>
> In other words I think this patch is great :)

Hey, I appreciate it, but I do think that the "spinlock only to 
immediately unlock it again" is pretty horrid.

I'm convinced that there should be some way to do this without actually 
taking the lock. I *think* it should work with something like

	for (;;) {
		smp_rmb();
		if (!spin_is_locked(&desc->lock)) {
			smp_rmb();
			if (!(desc->status & IRQ_INPROGRESS)
				break;
		}
		cpu_relax();
	}

instead. Which basically requires that we see the descriptor lock being 
not held, before we see the IRQ_INPROGRESS bit being clear. Put another 
way: it loops until it sees the lock having been released, and the 
IRQ_INPROGRESS bit being clear after that.

The above requires no serializing instructions on x86, which is a good 
goal (now that smp_rmb() is just a compiler barrier). And it doesn't 
actually have to bounce any cachelines.

And it doesn't have that ugly "get lock only to release it", which just 
makes me go "Eww!".

But it's a bit subtler. It basically depends on the fact that 
spin_unlock() obviously has to make sure that there is a release barrier 
in the unlock, so any writes done (to the IRQ_INPROGRESS bit) within the 
locked region *must* be visible before the spinlock itself has been 
released.

So somebody should:
 - use another pair of eyes and brains to back me up on this.
 - write up some coherent changelog entry, using the emails that have 
   passed back and forth.
 - actually turn the above into a tested patch with a comment.

And I'm pushing for that "somebody" being somebody else than me ;)

			Linus

^ permalink raw reply

* Re: [PATCH] synchronize_irq needs a barrier
From: Nick Piggin @ 2007-10-19  2:52 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Linux Kernel Mailing List, linuxppc-dev, Thomas Gleixner, akpm,
	Linus Torvalds, Ingo Molnar
In-Reply-To: <20071019023219.GB8453@gondor.apana.org.au>

On Friday 19 October 2007 12:32, Herbert Xu wrote:

> First of all let's agree on some basic assumptions:
>
> * A pair of spin lock/unlock subsumes the effect of a full mb.

Not unless you mean a pair of spin lock/unlock as in
2 spin lock/unlock pairs (4 operations).

*X = 10;
spin_lock(&lock);
/* *Y speculatively loaded here */
/* store to *X leaves CPU store queue here */
spin_unlock(&lock);
y = *Y;

> * A spin lock in general only equates to (SS/SL/LL).
> * A spin unlock in general only equates to (SS/LS).

I don't use the sparc barriers, so they don't come naturally to
me ;)

I think both loads and stores can pass into the critical section
by having the spin_lock pass earlier ops, or by having spin_unlock
be passed by later ones.


> In particular, a load after a spin unlock may pass before the
> spin unlock.
>
> Here is the race (with tg3 as the only example that I know of).
> The driver attempts to quiesce interrupts such that after the
> call to synchronize_irq it is intended that no further IRQ
> handler calls for that device will do any work besides acking
> the IRQ.
>
> Here is how it does it:
>
> CPU0				CPU1
> 				spin lock
> 					load irq_sync
> irq_sync = 1
> smp_mb
> synchronize_irq()
> 	while (IRQ_INPROGRESS)
> 		wait
> 	return
> 				set IRQ_INPROGRESS
> 				spin unlock
> 				tg3_msi
> 					ack IRQ
> 					if (irq_sync)
> 						return
> 					do work
>
> The problem here is that load of irq_sync in the handler has
> passed above the setting of IRQ_INPROGRESS.
>
> Linus's patch fixes it because this becomes:
>
> CPU0				CPU1
> 				spin lock
> 					load irq_sync
> irq_sync = 1
> smp_mb
> synchronize_irq
> 				set IRQ_INPROGRESS
> 				spin unlock
> 	spin lock
> 	spin unlock
> 				tg3_msi
> 					ack IRQ
> 					if (irq_sync)
> 						return
> 					do work
> 	while (IRQ_INPROGRESS)
> 		wait
> 				spin lock
> 				clear IRQ_INPROGRESS
> 				spin unlock
> 	return
>
> Even though we still do the work on the right we will now notice
> the INPROGRESS flag on the left and wait.
>
> It's hard to fix this in the drivers because they'd either have
> to access the desc lock or add a full mb to the fast path on the
> right.
>
> Once this goes in we can also remove the smp_mb from tg3.c.  BTW,
> a lot of drivers (including the fusion example Ben quoted) call
> synchronize_irq before free_irq.  This is unnecessary because
> the latter already calls it anyway.
>
> Cheers,

^ permalink raw reply

* Re: [PATCH] [POWERPC] powerpc: Add -mno-spe for ARCH=powerpc builds
From: Kumar Gala @ 2007-10-19  3:20 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev
In-Reply-To: <18199.59909.763111.810973@cargo.ozlabs.ibm.com>


On Oct 18, 2007, at 6:19 PM, Paul Mackerras wrote:

> Kumar Gala writes:
>
>> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
>
> Your commit message doesn't give any reason why you are doing this, or
> any explanation of what goes wrong without it.  In fact, the commit
> message is completely empty. :)  Please resubmit with a decent commit
> message.

I will, just as an FYI I based this on your commit for -mno-altivec  
(which has no rationale for the commit) :)

- k

^ permalink raw reply

* Re: [PATCH] synchronize_irq needs a barrier
From: Linus Torvalds @ 2007-10-19  3:26 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Linux Kernel Mailing List, linuxppc-dev, Thomas Gleixner, akpm,
	Ingo Molnar
In-Reply-To: <alpine.LFD.0.999.0710181941020.26902@woody.linux-foundation.org>

On Thu, 18 Oct 2007, Linus Torvalds wrote:
>
> I *think* it should work with something like
> 
> 	for (;;) {
> 		smp_rmb();
> 		if (!spin_is_locked(&desc->lock)) {
> 			smp_rmb();
> 			if (!(desc->status & IRQ_INPROGRESS)
> 				break;
> 		}
> 		cpu_relax();
> 	}

I'm starting to doubt this. 

One of the issues is that we still need the smp_mb() in front of the loop 
(because we want to serialize the loop with any writes in the caller).

The other issue is that I don't think it's enough that we saw the 
descriptor lock unlocked, and then the IRQ_INPROGRESS bit clear. It might 
have been unlocked *while* the IRQ was in progress, but the interrupt 
handler is now in its last throes, and re-takes the spinlock and clears 
the IRQ_INPROGRESS thing. But we're not actually happy until we've seen 
the IRQ_INPROGRESS bit clear and the spinlock has been released *again*.

So those two tests should actually be the other way around: we want to see 
the IRQ_INPROGRESS bit clear first.

It's all just too damn subtle and clever. Something like this should not 
need to be that subtle. 

Maybe the rigth thing to do is to not rely on *any* ordering what-so-ever, 
and just make the rule be: "if you look at the IRQ_INPROGRESS bit, you'd 
better hold the descriptor spinlock", and not have any subtle ordering 
issues at all.

But that makes us have a loop with getting/releasing the lock all the 
time, and then we get back to horrid issues with cacheline bouncing and 
unfairness of cache accesses across cores (ie look at the issues we had 
with the runqueue starvation in wait_task_inactive()).

Those were fixed by starting out with the non-locked and totally unsafe 
versions, but then having one last "check with lock held, and repeat only 
if that says things went south". 

See commit fa490cfd15d7ce0900097cc4e60cfd7a76381138 and ponder. Maybe we 
should take the same approach here, and do something like

	repeat:
		/* Optimistic, no-locking loop */
		while (desc->status & IRQ_INPROGRESS)
			cpu_relax();

		/* Ok, that indicated we're done: double-check carefully */
		spin_lock_irqsave(&desc->lock, flags);
		status = desc->status;
		spin_unlock_irqrestore(&desc->lock, flags);

		/* Oops, that failed? */
		if (status & IRQ_INPROGRESS)
			goto repeat;

Hmm?

			Linus

^ permalink raw reply

* Re: [PATCH] synchronize_irq needs a barrier
From: Herbert Xu @ 2007-10-19  3:28 UTC (permalink / raw)
  To: Nick Piggin
  Cc: herbert, linux-kernel, linuxppc-dev, tglx, akpm, torvalds, mingo
In-Reply-To: <200710191252.18272.nickpiggin@yahoo.com.au>

Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>> First of all let's agree on some basic assumptions:
>>
>> * A pair of spin lock/unlock subsumes the effect of a full mb.
> 
> Not unless you mean a pair of spin lock/unlock as in
> 2 spin lock/unlock pairs (4 operations).
> 
> *X = 10;
> spin_lock(&lock);
> /* *Y speculatively loaded here */
> /* store to *X leaves CPU store queue here */
> spin_unlock(&lock);
> y = *Y;

Good point.

Although in this case we're still safe because in the worst
cases:

CPU0				CPU1
irq_sync = 1
synchronize_irq
	spin lock
	load IRQ_INPROGRESS
irq_sync sync is visible
	spin unlock
				spin lock
					load irq_sync
	while (IRQ_INPROGRESS)
		wait
	return
				set IRQ_INPROGRESS
				spin unlock
				tg3_msi
					ack IRQ
					if (irq_sync)
						return
				spin lock
				clear IRQ_INPROGRESS
				spin unlock

------------------------------------------------------------

CPU0				CPU1
				spin lock
					load irq_sync
irq_sync = 1
synchronize_irq
				set IRQ_INPROGRESS
				spin unlock
	spin lock
	load IRQ_INPROGRESS
irq_sync sync is visible
	spin unlock
	while (IRQ_INPROGRESS)
		wait
				tg3_msi
					ack IRQ
					if (irq_sync)
						return
					do work
				spin lock
				clear IRQ_INPROGRESS
				spin unlock
	return

So because we're using the same lock on both sides, it does
do the right thing even without the memory barrier.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* powerpc 440 monitor program
From: Bai Shuwei @ 2007-10-19  3:32 UTC (permalink / raw)
  To: linuxppc-embedded, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1688 bytes --]

all,
    hi, when I boot my system, there are no output on the screen.
I choice the framebuffer and my monitor card is ATI PAGE 128 PCI, qt/e GUI.
When i compile the kernel, the boot logo and aty 128fb has be compiled into
kernel patched with videoboot, and x86emu. The video parameters set:
     video=aty128fb:640x480-16@7.
    The kenel can recognize the monitor card, and the frame buffer device
has run.
The output showed below:

    videoboot: Booting PCI video card bus 0, function 0, device 6
   aty128fb: Found Intel x86 BIOS ROM Image
   aty128fb: Rage128 BIOS located
   aty128fb: Rage128 RK PCI [chip rev 0x2] 32M 64-bit SDR SGRAM (2:1)
   fb0: ATY Rage128 frame buffer device on Rage128 RK PCI

My problem is there is no output on the monitor,and cannot see the penguin
boot logo.
I also run a program to test the framebuffer, which can read data from the
fb. After running the qt/e program, which can run scussessfully, there is no
output.!
   The test program reading data are showed below;


id = ATY Rage128
depth =  16
smemlen = 33554432
line_length = 0
FB_TYPE_ = 0
type_aux = 0
FB_VISUAL_ = 3
xpanstep = 8
ypanstep = 1
ywrapstep = 0
mmio_start = bbffc0000d
mmio_len = 8192
accel = 32
xres = 640
yres = 480
xres_virtul = 640
yres_virtul = 480
xoffset = 0
yoffset = 0
bits_per_pixel = 16

         My linux kernel is DENX 2.6.19.

        What can I do to make my system output into monitor.?

best regards!

buroc
-- 

Add: Tianshui South Road 222, Lanzhou, P.R.China
Tel: +86-931-8912025
Zip Code: 730000
URL: oss.lzu.edu.cn
Email: baishuwei@gmail.com, buroc@126.com

Tel: +86-931-8912025
Zip Code: 730000
URL: oss.lzu.edu.cn
Email: baishuwei@gmail.com, buroc@126.com

[-- Attachment #2: Type: text/html, Size: 2511 bytes --]

^ permalink raw reply

* Re: powerpc 440 monitor program
From: David Gibson @ 2007-10-19  3:49 UTC (permalink / raw)
  To: Bai Shuwei; +Cc: linuxppc-dev, linuxppc-embedded
In-Reply-To: <f3566d60710182032h69e2d2bejafd96f18dae95829@mail.gmail.com>

On Fri, Oct 19, 2007 at 11:32:33AM +0800, Bai Shuwei wrote:
> all,
>     hi, when I boot my system, there are no output on the screen.
> I choice the framebuffer and my monitor card is ATI PAGE 128 PCI, qt/e GUI.
> When i compile the kernel, the boot logo and aty 128fb has be compiled into
> kernel patched with videoboot, and x86emu. The video parameters set:
>      video=aty128fb:640x480-16@7.
>     The kenel can recognize the monitor card, and the frame buffer device
> has run.
> The output showed below:
> 
>     videoboot: Booting PCI video card bus 0, function 0, device 6
>    aty128fb: Found Intel x86 BIOS ROM Image
>    aty128fb: Rage128 BIOS located
>    aty128fb: Rage128 RK PCI [chip rev 0x2] 32M 64-bit SDR SGRAM (2:1)
>    fb0: ATY Rage128 frame buffer device on Rage128 RK PCI
> 
> My problem is there is no output on the monitor,and cannot see the penguin
> boot logo.

Uh.. most 440 systems use a serial console.  It ought in theory to be
possible to use a framebuffer instead, but it wouldn't surprise me if
rather a lot of debugging was necessary.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply

* RE: [PATCH v6 4/9] add platform support for MPC837x MDS board
From: Li Yang-r58472 @ 2007-10-19  3:56 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linuxppc-dev, paulus
In-Reply-To: <20071019104318.edfd2ad1.sfr@canb.auug.org.au>

> -----Original Message-----
> From: Stephen Rothwell [mailto:sfr@canb.auug.org.au]=20
> Sent: Friday, October 19, 2007 8:43 AM
> To: Li Yang-r58472
> Cc: galak@kernel.crashing.org; paulus@samba.org;=20
> linuxppc-dev@ozlabs.org
> Subject: Re: [PATCH v6 4/9] add platform support for MPC837x MDS board

> > +static int __init mpc837x_mds_probe(void) {
> > +        unsigned long root =3D of_get_flat_dt_root();
> > +
> > +        return of_flat_dt_is_compatible(root, "fsl,mpc837xmds");
>=20
> To call these two routines, you should include <asm/prom.h> directly.

<asm/prom.h> is included in <linux/of.h> now.  Do you think it should be
included explicitly?

- Leo

^ permalink raw reply

* Re: [PATCH] synchronize_irq needs a barrier
From: Benjamin Herrenschmidt @ 2007-10-19  4:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Herbert Xu, Linux Kernel Mailing List, linuxppc-dev,
	Thomas Gleixner, akpm, Ingo Molnar
In-Reply-To: <alpine.LFD.0.999.0710182007170.26902@woody.linux-foundation.org>

> 
> 	repeat:
> 		/* Optimistic, no-locking loop */
> 		while (desc->status & IRQ_INPROGRESS)
> 			cpu_relax();
> 
> 		/* Ok, that indicated we're done: double-check carefully */
> 		spin_lock_irqsave(&desc->lock, flags);
> 		status = desc->status;
> 		spin_unlock_irqrestore(&desc->lock, flags);
> 
> 		/* Oops, that failed? */
> 		if (status & IRQ_INPROGRESS)
> 			goto repeat;
> 
> Hmm?

Paulus and I convinced ourselves that this would work. If we call our
variable that gets set before synchronize_irq and read in the IRQ
handler "foo", we get to:

	- writing foo can travel down at most to just before the unlock in the
code above

	- reading foo can travel up out of the IRQ handler at most just after
the lock in the code that sets IRQ_INPROGRESS.

The whole lock/set IRQ_INPROGRESS/unlock path can then only happen
before the locked section above, in which case we see and wait nicely
and all is good, or after, in which case the store to foo will be
visible to the IRQ handler as it will be ordered with the unlock in the
code above.

Pfiew !

So Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Thanks !

Ben.

^ permalink raw reply

* Re: [PATCH] synchronize_irq needs a barrier
From: Herbert Xu @ 2007-10-19  4:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, linuxppc-dev, Thomas Gleixner, akpm,
	Ingo Molnar
In-Reply-To: <alpine.LFD.0.999.0710182007170.26902@woody.linux-foundation.org>

On Thu, Oct 18, 2007 at 08:26:45PM -0700, Linus Torvalds wrote:
> 
> 
> On Thu, 18 Oct 2007, Linus Torvalds wrote:
> >
> > I *think* it should work with something like
> > 
> > 	for (;;) {
> > 		smp_rmb();
> > 		if (!spin_is_locked(&desc->lock)) {
> > 			smp_rmb();
> > 			if (!(desc->status & IRQ_INPROGRESS)
> > 				break;
> > 		}
> > 		cpu_relax();
> > 	}
> 
> I'm starting to doubt this. 
> 
> One of the issues is that we still need the smp_mb() in front of the loop 
> (because we want to serialize the loop with any writes in the caller).

Right.  I think if we accept the definition of a spin lock/unlock
that Nick outlined earlier, then we can see that on the IRQ path
there simply isn't a memory barrier between the changing of the
status field and the execution of the action:

	spin_lock
	set IRQ_INPROGRESS
	spin_unlock

	action

	spin_lock
	clear IRQ_INPROGRESS
	spin_unlock

What may happen is that action can either float upwards to give

	spin_lock
	action
	set IRQ_INPROGRESS
	spin_unlock

	spin_lock
	clear IRQ_INPROGRESS
	spin_unlock

or it can float downwards to give

	spin_lock
	set IRQ_INPROGRESS
	spin_unlock

	spin_lock
	clear IRQ_INPROGRESS
	action
	spin_unlock

As a result we can add as many barriers as we want on the slow
(synchronize_irq) path and it just won't do any good (not unless
we add some barriers on the IRQ path == fast path).

What we do have on the right though is the fact in some ways
action behaves as if it's inside the spin lock even though it's
not.  In particular, action cannot float up past the first spin
lock nor can it float down past the last spin unlock.

> See commit fa490cfd15d7ce0900097cc4e60cfd7a76381138 and ponder. Maybe we 
> should take the same approach here, and do something like
> 
> 	repeat:
> 		/* Optimistic, no-locking loop */
> 		while (desc->status & IRQ_INPROGRESS)
> 			cpu_relax();
> 
> 		/* Ok, that indicated we're done: double-check carefully */
> 		spin_lock_irqsave(&desc->lock, flags);
> 		status = desc->status;
> 		spin_unlock_irqrestore(&desc->lock, flags);
> 
> 		/* Oops, that failed? */
> 		if (status & IRQ_INPROGRESS)
> 			goto repeat;

That's why I think this patch is in fact the only one that
solves all the races in this thread.  The case that it solves
which the lock/unlock patch does not is the one where action
flows downwards past the clearing of IRQ_INPROGRESS.  I missed
this case earlier.

In fact this bug exists elsewhere too.  For example, the network
stack does this in net/sched/sch_generic.c:

        /* Wait for outstanding qdisc_run calls. */
	while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
		yield();

This has the same problem as the current synchronize_irq code.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] synchronize_irq needs a barrier
From: Benjamin Herrenschmidt @ 2007-10-19  4:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Herbert Xu, Linux Kernel Mailing List, linuxppc-dev,
	Thomas Gleixner, akpm, Ingo Molnar
In-Reply-To: <1192767073.7367.84.camel@pasglop>

> The whole lock/set IRQ_INPROGRESS/unlock path can then only happen
> before the locked section above, in which case we see and wait nicely
> and all is good, or after, in which case the store to foo will be
> visible to the IRQ handler as it will be ordered with the unlock in the
> code above.

Note that napi_synchronize needs a slightly different treatement.

Here, the situation boils down to:

one CPU does:

	foo = 1;
	while(test_bit(bar))
		barrier();

and the other:

	if (!test_and_set_bit(bar)) {
		read & use foo
		smp_mb__before_clear_bit();
		clear_bit(bar);
	}

The good thing here is that read & use foo is part of the critical
section (I hate hand-made locks ...) defined by bar which makes
things somewhat easier than the synchronize_irq() case.

I think a simple smp_mb(); here after foo = 1; is enough, which means
basically just having an smp_mp(); inside napi_synchronize(), before
the test_bit(). Or do I miss something ?

Cheers,
Ben.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox