LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] powerpc/pseries: Add support for IO Event interrupt drivers
From: Sonny Rao @ 2010-05-19 18:34 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: markn, linuxppc-dev, miltonm
In-Reply-To: <1274271058.13433.30.camel@concordia>

On Wed, May 19, 2010 at 10:10:58PM +1000, Michael Ellerman wrote:
<snip>
> > > The "checks the scope" requires an RTAS call, which takes a global lock
> > > (and you add another) - these aren't going to be used for anything
> > > performance critical I hope?
> > 
> > Nope it shouldn't be performance critical, but it does raise the point
> > that the current RTAS implementation in Linux *always* uses the global
> > lock.  There is a set of calls which are not required to be serialized
> > against each other.  This would be a totally different patchset to fix that
> > problem.  The "check-exception" call is one that doesn't require the global
> > RTAS lock.  I might work on that someday :-)
> 
> Aha, that's kind of what I was wondering. I take it PAPR documents which
> calls need to be serialised and which don't?

Yeah, here's my workin list of what calls can avoid the global lock:

List of "re-entrant to the number of processors in the system" RTAS Calls
--
ibm,get-xive
ibm,set-xive
ibm,int-off
ibm,int-on


OS machine check and soft-reset handlerse must be able to call rtas
(I'm saying these are therefore re-entrant because we could deadlock
if we took a machine check or reset with the global lock held)
--
nvram-fetch
nvram-store
check-exception (includes our io-events)
display-character
system-reboot
set-power-level(0,0)
power-off
ibm,set-eeh-option
ibm,set-slot-reset
ibm,read-slot-reset-state2

additional serialiaztion group by itself
--
stop-self
start-cpu
set-power-level


<snip>
> > Also, if we're going to go ahead and use rtas_call() which locks
> > it's own buffer which meets the requirements, why do we even need
> > a separate buffer?  Really, we should make this call, then copy
> > the content of the buffer before handing it over to the drivers.
> 
> But another CPU could rtas_call() and blow away our buffer after we've
> dropped the RTAS lock but before we've used the content of the buffer.

Yeah, maybe I'm getting ahead of myself here -- to me this just highlights
the whole locking problem with the API as written, since the locking is
all done inside that call. The API needs to be extended such that
we have the option of doing our own locking of RTAS calls.


> > > > +	if (rtas_elog->type != RTAS_TYPE_IO_EVENT) {
> > > > +		pr_warning("IO Events: We got called with an event type of %d"
> > > > +			   " rather than %d!\n", rtas_elog->type,
> > > > +			   RTAS_TYPE_IO_EVENT);
> > > > +		WARN_ON(1);
> > > > +		goto out;
> > > > +	}
> > 
> > Should we try to process this instead of just warning?  
> > The type we get might be one of the the ones we recognize in
> > ras.c; so this is an argument for combining ras.c with this code
> > or at least report this in the same manner we report any other
> > RTAS error log.
> 
> We could, but that would be a massive firmware bug - not that it
> wouldn't happen, but it would be Not Our Problem TM.

Yeah, this is paranoia (*cough* Milton's suggestion)

> > > We /could/ copy the ioei_sec and drop the buf lock, which would allow
> > > another interrupt to come in and start doing the RTAS call (on another
> > > cpu, and iff there are actually multiple interrupts). But we probably
> > > don't care.
> > 
> > I think we *have* to copy it because we don't want our lock held when we
> > call random handlers.
> 
> They're not really random, and as long as they don't call the
> register/unregister routines it should be /OK/. But copying is probably
> good so we don't hold the lock for too long.

Yeah, this is probably ok since it's all happening in interrupt
context anyway the handlers have to be running in an atomic context
anyway.

-- 
Sonny Rao, LTC OzLabs, BML team

^ permalink raw reply

* [PATCH] powerpc: make the padding for the device tree a configurable option
From: Timur Tabi @ 2010-05-19 19:53 UTC (permalink / raw)
  To: benh, linuxppc-dev, devicetree-discuss

Add the DTS_PADDING Kconfig option, which allows users and board defconfig
files to specify a padding value when compiling device trees.

When a device tree source (DTS) is compiled into a binary (DTB), a hard-coded
padding of 1024 bytes is used (i.e. dtc command-line parameter "-p 1024").
Although this has worked so far, it may not be sufficient in the future for
some boards.  Newer versions of U-boot perform more and more fixup on the
device tree, and eventually it will run out.

So to accommodate future boards where more padding is needed, we make the
option for the -p parameter configurable.

Signed-off-by: Timur Tabi <timur@freescale.com>
---
 arch/powerpc/boot/Makefile     |    4 ++--
 arch/powerpc/platforms/Kconfig |   13 +++++++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index bb2465b..750fa6b 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -35,7 +35,7 @@ endif
 
 BOOTCFLAGS	+= -I$(obj) -I$(srctree)/$(obj)
 
-DTS_FLAGS	?= -p 1024
+DTS_FLAGS	?= -p ${CONFIG_DTS_PADDING}
 
 $(obj)/4xx.o: BOOTCFLAGS += -mcpu=405
 $(obj)/ebony.o: BOOTCFLAGS += -mcpu=405
@@ -331,7 +331,7 @@ $(obj)/treeImage.%: vmlinux $(obj)/%.dtb $(wrapperbits)
 # Rule to build device tree blobs
 DTC = $(objtree)/scripts/dtc/dtc
 
-$(obj)/%.dtb: $(dtstree)/%.dts
+$(obj)/%.dtb: $(dtstree)/%.dts $(srctree)/.config
 	$(DTC) -O dtb -o $(obj)/$*.dtb -b 0 $(DTS_FLAGS) $(dtstree)/$*.dts
 
 # If there isn't a platform selected then just strip the vmlinux.
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index d1663db..2b0a9c3 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -41,6 +41,19 @@ config PPC_OF_BOOT_TRAMPOLINE
 
 	  In case of doubt, say Y
 
+config DTS_PADDING
+	int "Padding for the device tree binary"
+	default 1024
+	help
+	  Specify the padding value to be used when compiling a DTS (device
+	  tree source) file into a DTB (device tree binary).  The padding is
+	  used to ensure enough space for any additional nodes and properties
+	  that the boot loader adds during fix-up.  If your boot loader
+	  complains about lack of space during fix-up, try increasing this
+	  value.
+
+	  If unsure, leave this value at the default.
+
 config UDBG_RTAS_CONSOLE
 	bool "RTAS based debug console"
 	depends on PPC_RTAS
-- 
1.6.5

^ permalink raw reply related

* Re: [PATCH 1/4] powerpc/fsl_soc.c: prepare for addition of mpc5121 USB code
From: Grant Likely @ 2010-05-19 20:47 UTC (permalink / raw)
  To: Anatolij Gustschin
  Cc: Greg Kroah-Hartman, Wolfgang Denk, Detlev Zundel, linux-usb,
	linuxppc-dev, David Brownell
In-Reply-To: <20100506211800.335baf63@wker>

On Thu, May 6, 2010 at 1:18 PM, Anatolij Gustschin <agust@denx.de> wrote:
> Hi Grant,
>
> On Tue, 27 Apr 2010 10:51:21 -0600
> Grant Likely <grant.likely@secretlab.ca> wrote:
>
>> On Tue, Apr 27, 2010 at 10:11 AM, Anatolij Gustschin <agust@denx.de> wro=
te:
>> > Factor out common code for registering a FSL EHCI platform
>> > device into new fsl_usb2_register_device() function. This
>> > is done to avoid code duplication while adding code for
>> > instantiating of MPC5121 dual role USB platform devices.
>> > Then, the subsequent patch can use
>> > for_each_compatible_node(np, NULL, "fsl,mpc5121-usb2-dr") {
>> > =A0 =A0 =A0 =A0...
>> > =A0 =A0 =A0 =A0fsl_usb2_register_device();
>> > }
>> >
>> > Signed-off-by: Anatolij Gustschin <agust@denx.de>
>> > Cc: Kumar Gala <galak@kernel.crashing.org>
>> > Cc: Grant Likely <grant.likely@secretlab.ca>
>> > ---
>> > =A0arch/powerpc/sysdev/fsl_soc.c | =A0231 +++++++++++++++++++---------=
------------
>>
>> Hi Anatolij,
>>
>> Thanks for this work. =A0However, I've got concerns.
>>
>> Forgive me for ragging on code that you didn't write, but this
>> fsl_soc.c code for registering the USB device really doesn't belong
>> here anymore. =A0It should be part of the drivers/usb/host/ehci-fsl.c
>> and the driver should do of-style binding (Which should be a lot
>> easier if I manage to get the merge of platform bus and of_platform
>> bus into 2.6.35).
>
> Now I moved the USB devices registration code to
> drivers/usb/host/ehci-fsl.c and added of-style binding there. It
> works for one platform driver based on your test-devicetree branch.
> It seems I can't bind more than one driver to the device described
> in the tree. But I need to bind at least 2 drivers, ehci-hcd and
> fsl-usb2-udc. For USB OTG support I need additional one to be bound
> to the same USB dual role device (also tree different drivers
> simultaneously).
> I also can't register UDC device in the ehci-fsl.c since registering
> of the UDC device should also be possible independent of the EHCI-HDC
> driver (for USB device support we do not need host controller driver).
> Is it possible to bind more than one driver to the same device
> simultaneously? If so, how can I implement this?

No, the linux driver model can bind exactly one driver to a struct
device.  However, that doesn't mean you can't have multiple struct
devices (in whatever form they come) to tell the Linux kernel about
all the details of a single hardware device.

>> This patch series makes the fsl_soc.c code even more complicated, and
>> scatters what is essentially driver code over even more places in the
>> arch/powerpc tree. =A0I'm really not keen on it being merged in this
>> form.
>
> Now I see one reason why it has been originally implemented
> this way.

Should be a solvable problem.  The fsl_soc.c stuff was originally
written simply because the infrastructure wasn't there for doing it
any other way.  We're long past that point now.  I don't have time to
dig into the details now (merge window and all), but ping me in a few
weeks and I'll take another look to see if I can help you.

g.

^ permalink raw reply

* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
From: Thomas Gleixner @ 2010-05-19 21:08 UTC (permalink / raw)
  To: Darren Hart
  Cc: Jan-Bernd Themann, dvhltc, linux-kernel, Will Schmidt, Brian King,
	niv, Doug Maxey, linuxppc-dev
In-Reply-To: <alpine.LFD.2.00.1005191620350.3368@localhost.localdomain>

On Wed, 19 May 2010, Thomas Gleixner wrote:
> > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > edge triggered as edge triggered. Probably some complexity of the entire power
> > stack that I am ignorant of.
> > 
> > > Apart from the issue of loosing interrupts there is also the fact that
> > > masking on the XICS requires an RTAS call which takes a global lock.
> 
> Right, I'd love to avoid that but with real level interrupts we'd run
> into an interrupt storm. Though another solution would be to issue the
> EOI after the threaded handler finished, that'd work as well, but
> needs testing.

Thought more about that. The case at hand (ehea) is nasty:

The driver does _NOT_ disable the rx interrupt in the card in the rx
interrupt handler - for whatever reason.

 So even in mainline you get repeated rx interrupts when packets
 arrive while napi is processing the poll, which is suboptimal at
 least. In fact it is counterproductive as the whole purpose of NAPI
 is to _NOT_ get interrupts for consecutive incoming packets while the
 poll is active.

Most of the other network drivers do:

      rx_irq()
	disable rx interrupts on card
	napi_schedule()

Now when the napi poll is done (no more packets available) then the
driver reenables the rx interrupt on the card.

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH 1/3] video: add support for getting video mode from device tree
From: Grant Likely @ 2010-05-19 21:19 UTC (permalink / raw)
  To: Anatolij Gustschin
  Cc: linux-fbdev, wd, dzu, devicetree-discuss, linuxppc-dev,
	Mitch Bradley, yorksun
In-Reply-To: <20100428154303.45d1ce87@wker>

On Wed, Apr 28, 2010 at 7:43 AM, Anatolij Gustschin <agust@denx.de> wrote:
> On Mon, 01 Mar 2010 14:45:20 +1100
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
>> On Sat, 2010-02-27 at 22:44 -1000, Mitch Bradley wrote:
>>
>> > As it turns out, I'm doing exactly that - exporting verbatim EDID
>> > data
>> > as the value of the "edid" property - for the display node on the Via
>> > version of the OLPC machine. =A0The kernel driver uses it instead of
>> > trying to obtain the EDID data from the monitor, because the builtin
>> > OLPC display cannot supply EDID data through the usual hardware
>> > interfaces.
>>
>> This is actually a common practice (though EDID is most often in
>> uppercase) on Apple hardware too. It has issues though in the sense that
>> it doesn't carry proper connector information and falls over in many
>> multi-head cases.
>>
>> I think passing the EDID data, when available, is thus the right thing
>> to do indeed, however that doesn't solve two problems:
>>
>> =A0- Where to put that property ? This is a complicated problem and we
>> might argue on a binding for weeks because video cards typically support
>> multiple outputs, etc. etc... I think the best for now is to stick as
>> closely as possible to the existing OF fb binding, and have "display"
>> nodes for each output, which can eventually contain an EDID. We might
>> also want to add a string that indicate the connector type. Specific
>> drivers might want to define additional properties here where it makes
>> sense such as binding of those outputs to CRTCs or such, up to you.
>
> Putting EDID to display node would be really sufficient for LCDs in
> our case. Other systems might define this additional connector type
> property.
>
>>
>> =A0- We -also- want a way to specify the "default" mode as set by the
>> firmware, at least on some devices. The EDID gives capabilities, and
>> often for LCDs also the "preferred" mode which is almost always the
>> "default" mode ... but could be different. In order to avoid "flicking",
>> the driver might wants to know what is the currently programmed mode.
>> For that, having split out properties makes sense, though I would like
>> to either prefix them all with "mode," or stick them in a sub-node of
>> the display@.
>
> I would propose defining following properties in the case the
> programmed mode is different from "default" mode:
>
> display@...{
> =A0 =A0 =A0 =A0compatible =3D "<vendor>,<model>"
> =A0 =A0 =A0 =A0EDID =3D [edid-data];
>
> =A0 =A0 =A0 =A0current-mode {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0pixel_clock =3D <value>;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0horizontal_active =3D <value>;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0horizontal_blank =3D <value>;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0vertical_active =3D <value>;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0vertical_blank =3D <value>;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0horizontal_active =3D <value>;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0hsync_offset =3D <value>;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0hsync_pulse_width =3D <value>;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0vsync_offset =3D <value>;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0vsync_pulse_width =3D <value>;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0hsync_positive;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0vsync_positive;
> =A0 =A0 =A0 =A0}
> };
>
> The firmware can set the "default" mode using the EDID's preferred
> Detailed Timing Descriptor data. If on some devices it sets another
> mode than the preferred mode, then the firmware can insert a
> "current-mode" sub-node with currently programmed mode. The driver
> can check for this sub-node and use it's data and if it isn't present,
> it can use the preferred timing data from EDID. The names of the
> properties here are actually what Detailed Timing Descriptor in EDID
> specifies. What do you think?

If you really want to do that, then I think it is okay.  I really
don't know enough about the problem space to say whether or not that
particular description is a good binding or not, but regardless it
should be a video-controller-specific binding.  The name of the node
should probably be prefixed with "<manufacturer>," and it should be
documented along with the display controller's device tree binding.
If another controller wants/needs a different binding format (for the
current mode), then that is fine too (unless you can make a really
good argument that this current-mode binding is perfect and no other
layout should ever be required).  :-)

Cheers,
g.

>
> Thanks,
> Anatolij
>



--=20
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

^ permalink raw reply

* Re: [PATCH] powerpc: make the padding for the device tree a configurable option
From: Benjamin Herrenschmidt @ 2010-05-19 21:20 UTC (permalink / raw)
  To: Timur Tabi; +Cc: linuxppc-dev, devicetree-discuss
In-Reply-To: <1274298809-12772-1-git-send-email-timur@freescale.com>

On Wed, 2010-05-19 at 14:53 -0500, Timur Tabi wrote:
> Add the DTS_PADDING Kconfig option, which allows users and board defconfig
> files to specify a padding value when compiling device trees.
> 
> When a device tree source (DTS) is compiled into a binary (DTB), a hard-coded
> padding of 1024 bytes is used (i.e. dtc command-line parameter "-p 1024").
> Although this has worked so far, it may not be sufficient in the future for
> some boards.  Newer versions of U-boot perform more and more fixup on the
> device tree, and eventually it will run out.
> 
> So to accommodate future boards where more padding is needed, we make the
> option for the -p parameter configurable.

Can't u-boot just allocate more space ?

Ben.

> Signed-off-by: Timur Tabi <timur@freescale.com>
> ---
>  arch/powerpc/boot/Makefile     |    4 ++--
>  arch/powerpc/platforms/Kconfig |   13 +++++++++++++
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index bb2465b..750fa6b 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile
> @@ -35,7 +35,7 @@ endif
>  
>  BOOTCFLAGS	+= -I$(obj) -I$(srctree)/$(obj)
>  
> -DTS_FLAGS	?= -p 1024
> +DTS_FLAGS	?= -p ${CONFIG_DTS_PADDING}
>  
>  $(obj)/4xx.o: BOOTCFLAGS += -mcpu=405
>  $(obj)/ebony.o: BOOTCFLAGS += -mcpu=405
> @@ -331,7 +331,7 @@ $(obj)/treeImage.%: vmlinux $(obj)/%.dtb $(wrapperbits)
>  # Rule to build device tree blobs
>  DTC = $(objtree)/scripts/dtc/dtc
>  
> -$(obj)/%.dtb: $(dtstree)/%.dts
> +$(obj)/%.dtb: $(dtstree)/%.dts $(srctree)/.config
>  	$(DTC) -O dtb -o $(obj)/$*.dtb -b 0 $(DTS_FLAGS) $(dtstree)/$*.dts
>  
>  # If there isn't a platform selected then just strip the vmlinux.
> diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
> index d1663db..2b0a9c3 100644
> --- a/arch/powerpc/platforms/Kconfig
> +++ b/arch/powerpc/platforms/Kconfig
> @@ -41,6 +41,19 @@ config PPC_OF_BOOT_TRAMPOLINE
>  
>  	  In case of doubt, say Y
>  
> +config DTS_PADDING
> +	int "Padding for the device tree binary"
> +	default 1024
> +	help
> +	  Specify the padding value to be used when compiling a DTS (device
> +	  tree source) file into a DTB (device tree binary).  The padding is
> +	  used to ensure enough space for any additional nodes and properties
> +	  that the boot loader adds during fix-up.  If your boot loader
> +	  complains about lack of space during fix-up, try increasing this
> +	  value.
> +
> +	  If unsure, leave this value at the default.
> +
>  config UDBG_RTAS_CONSOLE
>  	bool "RTAS based debug console"
>  	depends on PPC_RTAS

^ permalink raw reply

* Re: [PATCH 2/2] powerpc/mpc5121: add initial support for PDM360NG board
From: Grant Likely @ 2010-05-19 21:27 UTC (permalink / raw)
  To: Scott Wood
  Cc: Detlev Zundel, Markus Fischer, devicetree-discuss, Michael Weiss,
	linuxppc-dev, Anatolij Gustschin, Wolfgang Grandegger
In-Reply-To: <4BDEFB13.5060407@freescale.com>

On Mon, May 3, 2010 at 10:34 AM, Scott Wood <scottwood@freescale.com> wrote=
:
> Grant Likely wrote:
>>>
>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 // IPIC
>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 // interrupts cell =3D <intr #, sense>
>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 // sense values match linux IORESOURCE_IR=
Q_* defines:
>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 // sense =3D=3D 8: Level, low assertion
>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 // sense =3D=3D 2: Edge, high-to-low chan=
ge
>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 //
>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 ipic: interrupt-controller@c00 {
>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 compatible =3D "fsl,mpc51=
21-ipic", "fsl,ipic";
>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 interrupt-controller;
>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 #address-cells =3D <0>;
>>
>> Don't need #address-cells here
>
> #address-cells is required by ePAPR for interrupt controllers if an
> interrupt-map is used.

Why?

/me is too lazy to dig out ePAPR and look.

g.

--=20
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

^ permalink raw reply

* Re: [PATCH] powerpc: make the padding for the device tree a configurable option
From: Timur Tabi @ 2010-05-19 21:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, devicetree-discuss
In-Reply-To: <1274304013.1931.6.camel@pasglop>

Benjamin Herrenschmidt wrote:

>> So to accommodate future boards where more padding is needed, we make the
>> option for the -p parameter configurable.
> 
> Can't u-boot just allocate more space ?

Yes and no.  U-Boot has functions to increase the size of an fdt, but these
functions can't be sure that the fdt will grow beyond its allocated space.
So if U-Boot calls fdt_setprop() or fdt_add_subnode(), and there isn't
enough space in the fdt, those functions will return with an error.

The problem with growing the fdt is that the function which does this
(fdt_open_into) cannot guarantee that the fdt will grow too large and
overwrite the end of whatever allocated memory it's in.

I had a long argument with Wolfgang on this (see "libfdt: make
fdt_increase_size() available to everyone"), and he says he'll reject any
patch that can't guarantee that fdt_open_into() won't grow too large.  He'll
also reject any patch that uses a macro constant to reserve this space, even
if I use that constant to ensure that fdt_open_into() won't do anything bad.

So in other words, U-Boot could allocate more space, but Wolfgang won't let it.

-- 
Timur Tabi
Linux kernel developer at Freescale

^ permalink raw reply

* Re: [PATCH 2/2] powerpc/mpc5121: add initial support for PDM360NG board
From: Scott Wood @ 2010-05-19 21:37 UTC (permalink / raw)
  To: Grant Likely
  Cc: Detlev Zundel, Markus Fischer, devicetree-discuss, Michael Weiss,
	linuxppc-dev, Anatolij Gustschin, Wolfgang Grandegger
In-Reply-To: <AANLkTilBveOR1aJTgunOq0aqnsGtf1vWHfn8A6uXyoPU@mail.gmail.com>

On 05/19/2010 04:27 PM, Grant Likely wrote:
> On Mon, May 3, 2010 at 10:34 AM, Scott Wood<scottwood@freescale.com>  wrote:
>> Grant Likely wrote:
>>>>
>>>> +               // IPIC
>>>> +               // interrupts cell =<intr #, sense>
>>>> +               // sense values match linux IORESOURCE_IRQ_* defines:
>>>> +               // sense == 8: Level, low assertion
>>>> +               // sense == 2: Edge, high-to-low change
>>>> +               //
>>>> +               ipic: interrupt-controller@c00 {
>>>> +                       compatible = "fsl,mpc5121-ipic", "fsl,ipic";
>>>> +                       interrupt-controller;
>>>> +                       #address-cells =<0>;
>>>
>>> Don't need #address-cells here
>>
>> #address-cells is required by ePAPR for interrupt controllers if an
>> interrupt-map is used.
>
> Why?
>
> /me is too lazy to dig out ePAPR and look.

Address cells are part of the interrupt identification.  Typically with 
interrupt maps this is only used on the child end (e.g. to select a 
particular PCI slot), but if the parent interrupt controller's address 
cells are non-zero it will be expected in the parent interrupt specifier 
as well.

I believe the only part of this that is new with ePAPR is that it asks 
that the interrupt controller address cells be explicitly specified, as 
it's a bit icky for it to default to 2 in some contexts and 0 in others.

-Scott

^ permalink raw reply

* Re: [PATCH 2/2] powerpc/mpc5121: add initial support for PDM360NG board
From: Grant Likely @ 2010-05-19 21:47 UTC (permalink / raw)
  To: Scott Wood
  Cc: Detlev Zundel, Markus Fischer, devicetree-discuss, Michael Weiss,
	linuxppc-dev, Anatolij Gustschin, Wolfgang Grandegger
In-Reply-To: <4BF45A1F.70100@freescale.com>

On Wed, May 19, 2010 at 3:37 PM, Scott Wood <scottwood@freescale.com> wrote=
:
> On 05/19/2010 04:27 PM, Grant Likely wrote:
>>
>> On Mon, May 3, 2010 at 10:34 AM, Scott Wood<scottwood@freescale.com>
>> =A0wrote:
>>>
>>> Grant Likely wrote:
>>>>>
>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 // IPIC
>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 // interrupts cell =3D<intr #, sense>
>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 // sense values match linux IORESOURCE_=
IRQ_* defines:
>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 // sense =3D=3D 8: Level, low assertion
>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 // sense =3D=3D 2: Edge, high-to-low ch=
ange
>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 //
>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 ipic: interrupt-controller@c00 {
>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 compatible =3D "fsl,mpc=
5121-ipic", "fsl,ipic";
>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 interrupt-controller;
>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 #address-cells =3D<0>;
>>>>
>>>> Don't need #address-cells here
>>>
>>> #address-cells is required by ePAPR for interrupt controllers if an
>>> interrupt-map is used.
>>
>> Why?
>>
>> /me is too lazy to dig out ePAPR and look.
>
> Address cells are part of the interrupt identification. =A0Typically with
> interrupt maps this is only used on the child end (e.g. to select a
> particular PCI slot), but if the parent interrupt controller's address ce=
lls
> are non-zero it will be expected in the parent interrupt specifier as wel=
l.
>
> I believe the only part of this that is new with ePAPR is that it asks th=
at
> the interrupt controller address cells be explicitly specified, as it's a
> bit icky for it to default to 2 in some contexts and 0 in others.

Hmmm.  I've not seen that before.  On the 5200 the value of
#address-cells for interrupt controllers has apparently defaulted to
<0> so I've never encountered or thought about it.  I'm not even sure
what #address-cells !=3D 0 would mean in the context of interrupt
mapping, or where it would be relevant.

g.

^ permalink raw reply

* Re: [PATCH] powerpc: make the padding for the device tree a configurable option
From: Benjamin Herrenschmidt @ 2010-05-19 22:44 UTC (permalink / raw)
  To: Timur Tabi; +Cc: linuxppc-dev, devicetree-discuss
In-Reply-To: <4BF4593E.8030101@freescale.com>

On Wed, 2010-05-19 at 16:33 -0500, Timur Tabi wrote:
> Benjamin Herrenschmidt wrote:
> 
> >> So to accommodate future boards where more padding is needed, we make the
> >> option for the -p parameter configurable.
> > 
> > Can't u-boot just allocate more space ?
> 
> Yes and no.  U-Boot has functions to increase the size of an fdt, but these
> functions can't be sure that the fdt will grow beyond its allocated space.
> So if U-Boot calls fdt_setprop() or fdt_add_subnode(), and there isn't
> enough space in the fdt, those functions will return with an error.

 .../...

It's still not kernel business to have to deal with u-boot memory
allocation constraints. The padding in the kernel built is intended to
make space for DT changes done by the zImage wrapper.

Maybe we could add to libfdt a way to provide a realloc() callback to it
when it hits the max size, and uboot can then move things around (or
fail).

Ben.

^ permalink raw reply

* Re: [PATCH] powerpc: make the padding for the device tree a configurable option
From: Timur Tabi @ 2010-05-20  0:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, devicetree-discuss
In-Reply-To: <1274309050.1931.21.camel@pasglop>

On Wed, May 19, 2010 at 5:44 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:

> It's still not kernel business to have to deal with u-boot memory
> allocation constraints.

I agree, but it still makes sense to me to allow the padding to be configurable.

> The padding in the kernel built is intended to
> make space for DT changes done by the zImage wrapper.

Well, okay.  I think it would be nice if we expanded that to handle
general usage.

> Maybe we could add to libfdt a way to provide a realloc() callback to it
> when it hits the max size, and uboot can then move things around (or
> fail).

The problem is that the code which allocates a block for the fdt is
completely distinct from the code that manipulates the fdt.  We'd need
to put in either some kind of funky callback mechanism, or insist that
every fdt exist in a block of memory allocated by some specific method
(e.g. lmb).

I'm stuck between a rock and a hard place, it seems.  No one is
willing to compromise on any of my ideas.  It's hard to convince our
BSP developers that they should be pushing more code upstream when I
get so much resistance for a such a mundane change.

-- 
Timur Tabi
Linux kernel developer at Freescale

^ permalink raw reply

* Re: [PATCH] powerpc: make the padding for the device tree a configurable option
From: Benjamin Herrenschmidt @ 2010-05-20  0:23 UTC (permalink / raw)
  To: Timur Tabi; +Cc: linuxppc-dev, devicetree-discuss
In-Reply-To: <AANLkTilefxXcMnWqKSultu88r4D9W98adHLHxvUwi113@mail.gmail.com>

On Wed, 2010-05-19 at 19:03 -0500, Timur Tabi wrote:
> The problem is that the code which allocates a block for the fdt is
> completely distinct from the code that manipulates the fdt.  We'd need
> to put in either some kind of funky callback mechanism, or insist that
> every fdt exist in a block of memory allocated by some specific method
> (e.g. lmb).
> 
> I'm stuck between a rock and a hard place, it seems.  No one is
> willing to compromise on any of my ideas.  It's hard to convince our
> BSP developers that they should be pushing more code upstream when I
> get so much resistance for a such a mundane change. 

I don't see why we couldn't add a callback to libfdt for allocation /
reallocation.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] powerpc: make the padding for the device tree a configurable option
From: M. Warner Losh @ 2010-05-20  0:36 UTC (permalink / raw)
  To: timur; +Cc: devicetree-discuss, linuxppc-dev
In-Reply-To: <AANLkTilefxXcMnWqKSultu88r4D9W98adHLHxvUwi113@mail.gmail.com>

In message: <AANLkTilefxXcMnWqKSultu88r4D9W98adHLHxvUwi113@mail.gmail.com>
            Timur Tabi <timur@freescale.com> writes:
: I'm stuck between a rock and a hard place, it seems.  No one is
: willing to compromise on any of my ideas.  It's hard to convince our
: BSP developers that they should be pushing more code upstream when I
: get so much resistance for a such a mundane change.

http://www.bikeshed.org/ seems appropriate here.

Warner

^ permalink raw reply

* Re: [PATCH] powerpc: make the padding for the device tree a configurable option
From: David Gibson @ 2010-05-20  1:18 UTC (permalink / raw)
  To: Timur Tabi; +Cc: devicetree-discuss, linuxppc-dev
In-Reply-To: <AANLkTilefxXcMnWqKSultu88r4D9W98adHLHxvUwi113@mail.gmail.com>

On Wed, May 19, 2010 at 07:03:17PM -0500, Timur Tabi wrote:
> On Wed, May 19, 2010 at 5:44 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> 
> > It's still not kernel business to have to deal with u-boot memory
> > allocation constraints.
> 
> I agree, but it still makes sense to me to allow the padding to be configurable.
> 
> > The padding in the kernel built is intended to
> > make space for DT changes done by the zImage wrapper.
> 
> Well, okay.  I think it would be nice if we expanded that to handle
> general usage.
> 
> > Maybe we could add to libfdt a way to provide a realloc() callback to it
> > when it hits the max size, and uboot can then move things around (or
> > fail).
> 
> The problem is that the code which allocates a block for the fdt is
> completely distinct from the code that manipulates the fdt.  We'd need
> to put in either some kind of funky callback mechanism, or insist that
> every fdt exist in a block of memory allocated by some specific method
> (e.g. lmb).
> 
> I'm stuck between a rock and a hard place, it seems.  No one is
> willing to compromise on any of my ideas.  It's hard to convince our
> BSP developers that they should be pushing more code upstream when I
> get so much resistance for a such a mundane change.

Couldn't you use the configurable padding, but put the stuff to do it
into u-boot.  i.e. repad the dtb at u-boot build time, rather than
u-boot runtime.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply

* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
From: Michael Ellerman @ 2010-05-20  1:28 UTC (permalink / raw)
  To: Darren Hart
  Cc: Jan-Bernd Themann, dvhltc, linux-kernel, Milton Miller,
	Will Schmidt, Brian King, niv, Thomas Gleixner, Doug Maxey,
	linuxppc-dev
In-Reply-To: <4BF3F2DB.7030701@us.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 2094 bytes --]

On Wed, 2010-05-19 at 07:16 -0700, Darren Hart wrote:
> On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> >> On 05/18/2010 02:52 PM, Brian King wrote:
> >>> Is IRQF_NODELAY something specific to the RT kernel? I don't see it in mainline...
> >>
> >> Yes, it basically says "don't make this handler threaded".
> >
> > That is a good fix for EHEA, but the threaded handling is still broken
> > for anything else that is edge triggered isn't it?
> 
> No, I don't believe so. Edge triggered interrupts that are reported as 
> edge triggered interrupts will use the edge handler (which was the 
> approach Sebastien took to make this work back in 2008). Since XICS 
> presents all interrupts as Level Triggered, they use the fasteoi path.

But that's the point, no interrupts on XICS are reported as edge, even
if they are actually edge somewhere deep in the hardware. I don't think
we have any reliable way to determine what is what.

> > The result of the discussion about two years ago on this was that we
> > needed a custom flow handler for XICS on RT.
> 
> I'm still not clear on why the ultimate solution wasn't to have XICS 
> report edge triggered as edge triggered. Probably some complexity of the 
> entire power stack that I am ignorant of.

I'm not really sure either, but I think it's a case of a leaky
abstraction on the part of the hypervisor. Edge interrupts behave as
level as long as you handle the irq before EOI, but if you mask they
don't. But Milton's the expert on that.

> > Apart from the issue of loosing interrupts there is also the fact that
> > masking on the XICS requires an RTAS call which takes a global lock.
> 
> Right, one of may reasons why we felt this was the right fix. The other 
> is that there is no real additional overhead in running this as 
> non-threaded since the receive handler is so short (just napi_schedule()).

True. It's not a fix in general though. I'm worried that we're going to
see the exact same bug for MSI(-X) interrupts.

cheers



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
From: Michael Ellerman @ 2010-05-20  1:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Darren Hart, Jan-Bernd Themann, dvhltc, linux-kernel,
	Milton Miller, Will Schmidt, Brian King, niv, Doug Maxey,
	linuxppc-dev
In-Reply-To: <alpine.LFD.2.00.1005191620350.3368@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 2182 bytes --]

On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
> On Wed, 19 May 2010, Darren Hart wrote:
> 
> > On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> > > > On 05/18/2010 02:52 PM, Brian King wrote:
> > > > > Is IRQF_NODELAY something specific to the RT kernel? I don't see it in
> > > > > mainline...
> > > > 
> > > > Yes, it basically says "don't make this handler threaded".
> > > 
> > > That is a good fix for EHEA, but the threaded handling is still broken
> > > for anything else that is edge triggered isn't it?
> > 
> > No, I don't believe so. Edge triggered interrupts that are reported as edge
> > triggered interrupts will use the edge handler (which was the approach
> > Sebastien took to make this work back in 2008). Since XICS presents all
> > interrupts as Level Triggered, they use the fasteoi path.
> 
> I wonder whether the XICS interrupts which are edge type can be
> identified from the irq_desc->flags. Then we could avoid the masking
> for those in the fasteoi_handler in general.

I'm not sure they can be. I know on other similar HW we can detect LSI
vs MSI, but that's without the HV in the equation.

> > > The result of the discussion about two years ago on this was that we
> > > needed a custom flow handler for XICS on RT.
> > 
> > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > edge triggered as edge triggered. Probably some complexity of the entire power
> > stack that I am ignorant of.
> > 
> > > Apart from the issue of loosing interrupts there is also the fact that
> > > masking on the XICS requires an RTAS call which takes a global lock.
> 
> Right, I'd love to avoid that but with real level interrupts we'd run
> into an interrupt storm. Though another solution would be to issue the
> EOI after the threaded handler finished, that'd work as well, but
> needs testing.

Yeah I think that was the idea for the custom flow handler. We'd reset
the processor priority so we can take other interrupts (which the EOI
usually does for you), then do the actual EOI after the handler
finished.

cheers

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
From: Michael Ellerman @ 2010-05-20  1:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Darren Hart, Jan-Bernd Themann, dvhltc, linux-kernel,
	Will Schmidt, Brian King, niv, Doug Maxey, linuxppc-dev
In-Reply-To: <alpine.LFD.2.00.1005192244430.3368@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 1051 bytes --]

On Wed, 2010-05-19 at 23:08 +0200, Thomas Gleixner wrote:
> On Wed, 19 May 2010, Thomas Gleixner wrote:
> > > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > > edge triggered as edge triggered. Probably some complexity of the entire power
> > > stack that I am ignorant of.
> > > 
> > > > Apart from the issue of loosing interrupts there is also the fact that
> > > > masking on the XICS requires an RTAS call which takes a global lock.
> > 
> > Right, I'd love to avoid that but with real level interrupts we'd run
> > into an interrupt storm. Though another solution would be to issue the
> > EOI after the threaded handler finished, that'd work as well, but
> > needs testing.
> 
> Thought more about that. The case at hand (ehea) is nasty:
> 
> The driver does _NOT_ disable the rx interrupt in the card in the rx
> interrupt handler - for whatever reason.

Yeah I saw that, but I don't know why it's written that way. Perhaps
Jan-Bernd or Doug will chime in and enlighten us? :)

cheers



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH] powerpc: make the padding for the device tree a configurable option
From: Timur Tabi @ 2010-05-20  1:46 UTC (permalink / raw)
  To: Timur Tabi, Benjamin Herrenschmidt, linuxppc-dev,
	devicetree-discuss
In-Reply-To: <20100520011820.GT25892@yookeroo>

On Wed, May 19, 2010 at 8:18 PM, David Gibson
<david@gibson.dropbear.id.au> wrote:

> Couldn't you use the configurable padding, but put the stuff to do it
> into u-boot. =A0i.e. repad the dtb at u-boot build time, rather than
> u-boot runtime.

That's what I was trying to do.   Take a look at this thread:

http://lists.denx.de/pipermail/u-boot/2010-May/071520.html

--=20
Timur Tabi
Linux kernel developer at Freescale

^ permalink raw reply

* Re: [PATCH] powerpc: Fix string library functions
From: Paul Mackerras @ 2010-05-20  3:03 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <m2bpcdjcue.fsf@igel.home>

On Tue, May 18, 2010 at 08:15:21PM +0200, Andreas Schwab wrote:

> The powerpc strncmp implementation does not correctly handle a zero
> length, despite the claim in 0119536cd314ef95553604208c25bc35581f7f0a
> (Add hand-coded assembly strcmp).
> 
> Additionally, all the length arguments are size_t, not int, so use
> PPC_LCMPI and eq instead of cmpwi and le throughout.
> 
> Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>

Looks fine to me.

Acked-by: Paul Mackerras <paulus@samba.org>

^ permalink raw reply

* Re: [Patch 2/2] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
From: K.Prasad @ 2010-05-20  4:06 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Michael Neuling, Benjamin Herrenschmidt, shaggy,
	Frederic Weisbecker, David Gibson, linuxppc-dev@ozlabs.org,
	Alan Stern, Roland McGrath
In-Reply-To: <20100517123241.GA5379@brick.ozlabs.ibm.com>

On Mon, May 17, 2010 at 10:32:41PM +1000, Paul Mackerras wrote:
> On Fri, May 14, 2010 at 12:25:31PM +0530, K.Prasad wrote:
> 
> > Okay. I will re-use single_step_exception() after modifications; it
> > appearsto have no in-kernel users for it.
> 
> It's called from exceptions-64s.S, head_32.S and head_8xx.S in
> arch/powerpc/kernel.
> 
> > > Suppose the address at which the data breakpoint has been unmapped,
> > > and the process has a handler for the SIGSEGV signal.  When we try to
> > > single-step the load or store, we will get a DSI (0x300) interrupt,
> > > call into do_page_fault, and end up sending the process a SIGSEGV.
> > > That will invoke the signal handler, which can then do anything it
> > > likes.  It can do a blocking system call, it can longjmp() back into
> > > its main event, or it can return from the signal handler.  Only in the
> > > last case will it retry the load or store, and then only if the signal
> > > handler hasn't modified the NIP value in the signal frame.  That's
> > > what I mean by "doesn't return to the instruction".
> > > 
> > 
> > At the outset, this seemed to be a scary thing to happen; but turns out
> > to be harmful only to the extent of generating a false hw-breakpoint
> > exception in certain cases. A case-by-case basis analysis reveals thus:
> > 
> > Consider an instruction stream i1, i2, i3, ... iN, where i1 has
> > finished execution and i2 is about to be executed but has generated a
> > DSI interrupt with the above-mentioned conditions i.e. DSISR indicates a
> > DABR match + Page-Table entry not found. Now according to do_hash_page
> > in exception-64s.S (as pasted below), do_page_fault() and do_dabr() are
> > invoked one after the other.
> > 
> > _STATIC(do_hash_page)
> > 	std	r3,_DAR(r1)
> > 	std	r4,_DSISR(r1)
> > 
> > 	andis.	r0,r4,0xa410		/* weird error? */
> > 	bne-	handle_page_fault	/* if not, try to insert a HPTE */
> > 	andis.  r0,r4,DSISR_DABRMATCH@h
> > 	bne-    handle_dabr_fault
> 
> Note that bne is not a procedure call; we'll actually get two DSIs in
> this scenario.  But I don't think that matters.  Also note that the
> branch to handle_page_fault here is not for the HPTE-not-found case;
> it's for the unusual errors.  So we'll handle the HPTE insertion after
> handling the DABR match.
> 

Okay.

> > Thus, when control returns to user-space to instruction 'i2', the
> > hw_breakpoint_handler() has completed execution, and a SIGSEGV is pending
> > to be delivered and single-stepping enabled MSR_SE is set. Upon return to
> > user-space, the handler for SIGSEGV is executed and it may perform one of
> > the following (as you stated previously):
> > (a) Make a blocking syscall, eventually yielding the CPU to a new thread
> > (b) Jump to a different instruction in user-space, say iN, and not complete
> > the execution of instruction i2 at all.
> > (c) Return to instruction i2 and complete the execution.
> > 
> > In case of (a), the context-switches should not affect the ability to
> > single-step the instruction when the thread is eventually scheduled to
> > run. The thread, when scheduled onto the CPU will complete signal
> > handling, return to execute instruction i2, cause single-step exception,
> > restore breakpoints and run smoothly thereafter.
> 
> Right.  However, the thread is running the signal handler without the
> DABR being set, which is unfortunate.
> 

In order to keep the breakpoint active during signal handling, a
PowerPC specific signal handling code, say do_signal_pending() in
arch/powerpc/kernel/signal.c, must be tapped to check for/restore
the breakpoint for the process (if it exists).

Then, single_step_dabr_instruction() must be suitably modified to not
clear the current->thread.last_hit_ubp pointer if breakpoint has been
taken in a nested condition i.e. a breakpoint exception in signal-handler
which was preceded by a previous breakpoint exception in normal user-space
stack. A flag to denote such a condition would be required in
'struct thread_struct'.

I'm afraid if this is more complexity than we want to handle in the
first patchset. I agree that this will create a 'blind-spot' of code
which cannot not be monitored using breakpoints and may limit debugging
efforts (specially for memory corruption); however suspecting that signal
handlers (especially those that don't return to original instruction)
would be rare, I feel that this could be a 'feature' that can be brought
later-in. What do you think?

> > In case of (b), the new instruction iN is single-stepped, the breakpoint
> > values are restored and the hw-breakpoint exception callback is invoked
> > after iN is executed. The user of this breakpoint i.e. the caller of
> > register_user_hw_breakpoint() who had placed a breakpoint on addressed
> > accessed by instruction i2 will be confused to find that an unrelated
> > instruction (which may not be a load/store) has caused the breakpoint.
> 
> That's the case if the signal handler modifies the NIP value in the
> register set saved on the stack and returns.  If the signal handler
> instead simply jumps to instruction iN (e.g. with longjmp or
> siglongjmp), we'll never get the single-step callback.
> 

True. As described above, this will lead to unreliable restoration of
breakpoints. I'm not sure if this is a common occurrence in user-space,
but if rare, I think we should be able to tolerate this behaviour for now.

> > If so desired, we may adopt the 'trigger-before-execute' semantics for
> > user-space breakpoints wherein the hw-breakpoint callback (through
> > perf_bp_event()) is invoked in hw_breakpoint_handler() itself. This
> > would indicate to the user that the impending instruction causes a DABR
> > 'hit' but it may or may not be executed due to the role of
> > signal-handler or due to self-modifying code (as mentioned below).
> > 
> > Kindly let me know what you think about it.
> > 
> > (c) is the normal execution path we desire. The instruction i2 will be
> > safely single-stepped and breakpoints are restored.
> > 
> > > The instruction could be changed underneath us if the program is
> > > multi-threaded and another thread writes another instruction to the
> > > instruction word where the load or store is.  Or it could use mmap()
> > > to map some other page at the address of the load or store.  Either
> > > way we could end up with a different instruction there.
> > > 
> > 
> > If the instruction that originally caused the DABR exception is changed,
> > the new instruction in its place would still single-step to restore
> > breakpoint values. However the user of breakpoint interface will be
> > confused to find that the callback is invoked for an irrelevant
> > instruction.
> > 
> > It could be circumvented, to an extent, through the use of
> > trigger-before-execute semantics (as described before).
> 
> I don't think we want to do trigger-before-execute.  Ideally what we
> want to ensure at all times is that either DABR is set (enabled) or
> MSR.SE is set, but not both.  To ensure that we'd have to modify the
> signal delivery code and possibly other places.
> 

The case where both DABR and MSR_SE are set is when a page-fault
resulting in context-switch occurs before single-stepping, right? I have
responded to that below. The problem with signal-handling, as I
understand, is unreliable restoration of/lost breakpoints and I wish to
deal them in future patchsets. Let me know if my understanding is incorrect.

> > > If we do get a context switch, e.g. as a result of a page fault, and
> > > then switch back to the task, it looks to me like we will end up with
> > > MSR_SE and DABR both set.  I don't suppose that will actually cause
> > > any real problem besides double-counting the hit.
> > > 
> > 
> > Page fault exception will be handled before hw_breakpoint_handler(),
> > hence MSR_SE would not have been set if a context-switch happened in
> > pange-fault handling itself. I don't see a case where both MSR_SE and
> > DABR will be set together.
> 
> Imagine this scenario: we get the DABR match, set MSR_SE and return to
> the task.  In the meantime another higher-priority task has become
> runnable and our need_resched flag is set, so we call schedule() on
> the way back out to usermode.  The other task runs and then blocks and
> our task gets scheduled again.  As part of the context switch,
> arch_install_hw_breakpoint() will get called and will set DABR.  So
> we'll return to usermode with both DABR and MSE_SE set.
> 

I didn't foresee such a possibility. I think this can be handled by
putting a check in arch_install_hw_breakpoint() as shown below:

int arch_install_hw_breakpoint(struct perf_event *bp)
{
	struct arch_hw_breakpoint *info = counter_arch_bp(bp);
	struct perf_event **slot = &__get_cpu_var(bp_per_reg);

	*slot = bp;
	if (!current->thread.last_hit_ubp)
		set_dabr(info->address | info->type | DABR_TRANSLATION);
	return 0;
}

This should prevent DABR and MSR_SE from being enabled at the same time.

> > Thanks for the comments. Let me know if the analysis above is incorrect
> > or if I've failed to recognise any important issue that you pointed out.
> > I will send out a patch with changes to emulate_single_step() in the
> > next version of the patchset, if I don't hear any further comments.
> 
> We haven't discussed at all the case where the breakpoint is a per-cpu
> breakpoint or where it's a per-task breakpoint but the DABR match
> occurs within the kernel -- which can happen, even for a user address,
> in __get_user, __put_user, __copy_tofrom_user, etc.  If the access
> there is to a bad address, we'll invoke the exception case in
> bad_page_fault(), which looks to be another place where we need to
> recognize that single-stepping won't succeed and reinstall the DABR
> setting.  Do we count that as an event or not? - I'm not sure.
>

I'll respond to this part in the subsequent mail.

Thanks,
K.Prasad

^ permalink raw reply

* Re: [PATCH] powerpc: make the padding for the device tree a configurable option
From: David Gibson @ 2010-05-20  6:17 UTC (permalink / raw)
  To: Timur Tabi; +Cc: devicetree-discuss, linuxppc-dev
In-Reply-To: <AANLkTily6AYGJbvTx-4MJm7HKeRWOuB9jpPzgVmys1sm@mail.gmail.com>

On Wed, May 19, 2010 at 08:46:19PM -0500, Timur Tabi wrote:
> On Wed, May 19, 2010 at 8:18 PM, David Gibson
> <david@gibson.dropbear.id.au> wrote:
> 
> > Couldn't you use the configurable padding, but put the stuff to do it
> > into u-boot.  i.e. repad the dtb at u-boot build time, rather than
> > u-boot runtime.
> 
> That's what I was trying to do.   Take a look at this thread:
> 
> http://lists.denx.de/pipermail/u-boot/2010-May/071520.html

Um.. what?  As far as I can tell that thread is about runtime
expanding the space given to the dtb.  I'm talking about altering the
padding on the file before giving it to u-boot in the first place.


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply

* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
From: Jan-Bernd Themann @ 2010-05-20  7:37 UTC (permalink / raw)
  To: michael
  Cc: Darren Hart, dvhltc, linux-kernel, Will Schmidt, Brian King, niv,
	Thomas Gleixner, Doug Maxey, linuxppc-dev
In-Reply-To: <1274319248.22892.40.camel@concordia>

Hi,


Michael Ellerman <michael@ellerman.id.au> wrote on 20.05.2010 03:34:08:


> Subject:
>
> Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
>
> On Wed, 2010-05-19 at 23:08 +0200, Thomas Gleixner wrote:
> > On Wed, 19 May 2010, Thomas Gleixner wrote:
> > > > I'm still not clear on why the ultimate solution wasn't to
> have XICS report
> > > > edge triggered as edge triggered. Probably some complexity of
> the entire power
> > > > stack that I am ignorant of.
> > > >
> > > > > Apart from the issue of loosing interrupts there is also thefact
that
> > > > > masking on the XICS requires an RTAS call which takes a global
lock.
> > >
> > > Right, I'd love to avoid that but with real level interrupts we'd run
> > > into an interrupt storm. Though another solution would be to issue
the
> > > EOI after the threaded handler finished, that'd work as well, but
> > > needs testing.
> >
> > Thought more about that. The case at hand (ehea) is nasty:
> >
> > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > interrupt handler - for whatever reason.
>
> Yeah I saw that, but I don't know why it's written that way. Perhaps
> Jan-Bernd or Doug will chime in and enlighten us? :)

>From our perspective there is no need to disable interrupts for the RX side
as
the chip does not fire further interrupts until we tell the chip to do so
for a
particular queue. We have multiple receive queues with an own interrupt
each
so that the interrupts can arrive on multiple CPUs in parallel.
Interrupts are enabled again when we leave the NAPI Poll function for the
corresponding
receive queue.

Michael, does this answer your question?

Regards,
Jan-Bernd

^ permalink raw reply

* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
From: Thomas Gleixner @ 2010-05-20  8:14 UTC (permalink / raw)
  To: Jan-Bernd Themann
  Cc: Darren Hart, dvhltc, linux-kernel, Will Schmidt, Brian King, niv,
	Doug Maxey, linuxppc-dev
In-Reply-To: <OF17F8F35C.B09E4797-ONC1257729.0028292D-C1257729.0029E9AC@de.ibm.com>

On Thu, 20 May 2010, Jan-Bernd Themann wrote:
> > > Thought more about that. The case at hand (ehea) is nasty:
> > >
> > > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > > interrupt handler - for whatever reason.
> >
> > Yeah I saw that, but I don't know why it's written that way. Perhaps
> > Jan-Bernd or Doug will chime in and enlighten us? :)
> 
> From our perspective there is no need to disable interrupts for the
> RX side as the chip does not fire further interrupts until we tell
> the chip to do so for a particular queue. We have multiple receive

The traces tell a different story though:

    ehea_recv_irq_handler()
      napi_reschedule()
    eoi()
    ehea_poll()
      ...
      ehea_recv_irq_handler()    <---------------- ???
        napi_reschedule()
      ...
      napi_complete()

Can't tell whether you can see the same behaviour in mainline, but I
don't see a reason why not.

> queues with an own interrupt each so that the interrupts can arrive
> on multiple CPUs in parallel.  Interrupts are enabled again when we
> leave the NAPI Poll function for the corresponding receive queue.

I can't see a piece of code which does that, but that's probably just
lack of detailed hardware knowledge on my side.

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
From: Thomas Gleixner @ 2010-05-20  8:21 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Darren Hart, Jan-Bernd Themann, dvhltc, linux-kernel,
	Milton Miller, Will Schmidt, Brian King, niv, Doug Maxey,
	linuxppc-dev
In-Reply-To: <1274319125.22892.38.camel@concordia>

On Thu, 20 May 2010, Michael Ellerman wrote:
> On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
> > On Wed, 19 May 2010, Darren Hart wrote:
> > 
> > > On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > > > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> 
> > > > The result of the discussion about two years ago on this was that we
> > > > needed a custom flow handler for XICS on RT.
> > > 
> > > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > > edge triggered as edge triggered. Probably some complexity of the entire power
> > > stack that I am ignorant of.
> > > 
> > > > Apart from the issue of loosing interrupts there is also the fact that
> > > > masking on the XICS requires an RTAS call which takes a global lock.
> > 
> > Right, I'd love to avoid that but with real level interrupts we'd run
> > into an interrupt storm. Though another solution would be to issue the
> > EOI after the threaded handler finished, that'd work as well, but
> > needs testing.
> 
> Yeah I think that was the idea for the custom flow handler. We'd reset
> the processor priority so we can take other interrupts (which the EOI
> usually does for you), then do the actual EOI after the handler
> finished.

That only works when the card does not issue new interrupts until the
EOI happens. If the EOI is only relevant for the interrupt controller,
then you are going to lose any edge which comes in before the EOI as
well.

Thanks,

	tglx

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox