LinuxPPC-Dev Archive on lore.kernel.org

* Re: [PATCH 00/15] ps3: Support more than the OtherOS lpar
From: Andre Heider @ 2011-08-04 16:31 UTC
  To: Geoff Levand; +Cc: cbe-oss-dev, Hector Martin, linuxppc-dev
In-Reply-To: <4E39CA6C.3000202@infradead.org>

Hi Geoff,

On Thu, Aug 4, 2011 at 12:23 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Andre,
>
> On 08/01/2011 01:02 PM, Andre Heider wrote:
>> This series addresses various issues and extends support when running
>> in lpars like GameOS. Included are some patches from Hector Martin, which
>> I found useful.
>
> Much of this is just general fixups and improvements to the existing PS3
> support.  I think you should separate those changes out and work to get
> them included, then consider others.  If I give some comment, then
> I consider that part worth pursuing at the present time.

Sounds like a good approach to me.

> I have limited time to review the patches, so it will take me a while to
> get through them.

No problem at all, it will probably take some time to resolve all the
details anyway.

>> Patches are based on 2.6.39 since master doesn't boot with smp on my
>> console.  I wasn't able to pinpoint the cause so far (not that I tried
>> too hard).
>
> I'm looking into this problem, but it will take some time.

Just for the record: I didn't mean to push ;)

Thanks,
Andre

* Re: [PATCH 03/15] [PS3] Add region 1 memory early
From: Geoff Levand @ 2011-08-04 15:57 UTC
  To: Hector Martin; +Cc: cbe-oss-dev, Andre Heider, linuxppc-dev
In-Reply-To: <4E39E302.9060702@marcansoft.com>

Hi Hector,

On 08/03/2011 05:08 PM, Hector Martin wrote:
> On 08/04/2011 12:32 AM, Geoff Levand wrote:
>> We need an explanation of this change.

Sorry for such a terse request.  What I meant was that
this is a significant change to how high mem is managed,
so the patch needs a comment explaining the change.

> I actually have a hard time understanding the reason for the existing
> behavior of hot-adding memory halfway through the boot process. Maybe
> you can shed some light on this?

LV1 was intended to be a generic hypervisor for the Cell
processor.  It was imagined that it could be used on machines
which could be running many lpars.  Around the same time I
was doing the high mem support the hot plug memory support was
being developed.  I thought at some point there would be
hot-unplug, which could be used to move memory between lpars.

At the present time this change makes sense, since it is simpler
and more flexible.

-Geoff

* Re: [PATCH 03/15] [PS3] Add region 1 memory early
From: Hector Martin @ 2011-08-04 11:13 UTC
  To: Geert Uytterhoeven; +Cc: Geoff Levand, cbe-oss-dev, Andre Heider, linuxppc-dev
In-Reply-To: <CAMuHMdWuefw8WM+320vr--uWTjiFo6hJrinD4HuzNFSZcM8sjQ@mail.gmail.com>

On 08/04/2011 09:05 AM, Geert Uytterhoeven wrote:
> The reason for that is to make sure the allocations will succeed.
> Chances are very slim you can allocate a contiguous 9 MiB buffer at any
> arbitrary time.

Fair enough, but then they don't need to happen as early as they do now;
any time during kernel startup should work (as long as they aren't freed
with the drivers if they're unloaded). How about switching to
__get_free_pages and doing the allocation inside an arch_initcall or
similar?

-- 
Hector Martin (hector@marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

* Re: kvm PCI assignment & VFIO ramblings
From: Joerg Roedel @ 2011-08-04 10:41 UTC
  To: Alex Williamson
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras, David Gibson,
	Avi Kivity, Anthony Liguori, linux-pci@vger.kernel.org,
	linuxppc-dev
In-Reply-To: <1312230476.2653.395.camel@bling.home>

On Mon, Aug 01, 2011 at 02:27:36PM -0600, Alex Williamson wrote:
> It's not clear to me how we could skip it.  With VT-d, we'd have to
> implement an emulated interrupt remapper and hope that the guest picks
> unused indexes in the host interrupt remapping table before it could do
> anything useful with direct access to the MSI-X table.  Maybe AMD IOMMU
> makes this easier?

AMD IOMMU provides remapping tables per-device, and not a global one.
But that does not make direct guest access to the MSI-X table safe. The
table contains the interrupt type and the vector, which the IOMMU uses
as an index into the remapping table. So when the guest writes into its
MSI-X table, the remapping table in the host needs to be updated too.
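
To make the last point concrete, here is a minimal Python sketch of the
bookkeeping, not real IOMMU or KVM code. The class DeviceIrqRemap, its
methods, and the allocate_host_vector callback are all hypothetical. The
sketch only assumes that a guest write to an MSI-X entry is trapped by
the host, which then keeps a per-device remapping table, indexed by the
guest-chosen vector, in sync.

```python
class DeviceIrqRemap:
    """Toy model of a per-device interrupt remapping table (hypothetical)."""

    def __init__(self):
        self.msix_table = {}   # MSI-X entry number -> guest-programmed vector
        self.remap_table = {}  # guest vector (table index) -> host vector

    def guest_write_msix(self, entry, guest_vector, allocate_host_vector):
        """Handle a trapped guest write to one MSI-X table entry."""
        old = self.msix_table.get(entry)
        if old is not None and old != guest_vector:
            # Drop the stale mapping unless another entry still uses it.
            self.msix_table.pop(entry)
            if old not in self.msix_table.values():
                self.remap_table.pop(old, None)
        self.msix_table[entry] = guest_vector
        if guest_vector not in self.remap_table:
            # The host, not the guest, decides what the index resolves to.
            self.remap_table[guest_vector] = allocate_host_vector()
        return self.remap_table[guest_vector]
```

If the guest could write the real table directly, nothing would run the
update step, and the index it programmed would point at whatever the
host table happened to contain.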

	Joerg

* Re: kvm PCI assignment & VFIO ramblings
From: Joerg Roedel @ 2011-08-04 10:27 UTC
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, David Gibson, Alex Williamson,
	Anthony Liguori, linuxppc-dev
In-Reply-To: <1311983933.8793.42.camel@pasglop>

Hi Ben,

thanks for your detailed introduction to the requirements for POWER. It's
good to know that the granularity problem is not x86-only.

On Sat, Jul 30, 2011 at 09:58:53AM +1000, Benjamin Herrenschmidt wrote:
> In IBM POWER land, we call this a "partitionable endpoint" (the term
> "endpoint" here is historic, such a PE can be made of several PCIe
> "endpoints"). I think "partitionable" is a pretty good name tho to
> represent the constraints, so I'll call this a "partitionable group"
> from now on.

On x86 this is mostly an issue of the IOMMU and which set of devices use
the same request-id. I used to call that an alias-group because the
devices have a request-id alias to the pci-bridge.

> - The -minimum- granularity of pass-through is not always a single
> device and not always under SW control

Correct.
 
> - Having a magic heuristic in libvirt to figure out those constraints is
> WRONG. This reeks of XFree 4 PCI layer trying to duplicate the kernel
> knowledge of PCI resource management and getting it wrong in many many
> cases, something that took years to fix essentially by ripping it all
> out. This is kernel knowledge and thus we need the kernel to expose in a
> way or another what those constraints are, what those "partitionable
> groups" are.

I agree. Managing the ownership of a group should be done in the kernel.
Doing this in userspace is just too dangerous.

The problem to be solved here is how to present these PEs inside the
kernel and to userspace. I thought a bit about making this visible
through the iommu-api for in-kernel users. That is probably the most
logical place.

For userspace I would like to propose a new device attribute in sysfs.
This attribute contains the group number. All devices with the same
group number belong to the same PE. Libvirt needs to scan the whole
device tree to build the groups but that is probably not a big deal.
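
As a rough illustration of that scan, here is a Python sketch. The
attribute name "pe_group" and the flat directory layout are assumptions
made for the example only; the proposal does not fix a name or a
location in sysfs.

```python
import os
from collections import defaultdict


def read_group_numbers(devices_dir, attr="pe_group"):
    """Return {device name: group number} for devices exposing `attr`.

    `attr` is a hypothetical attribute name used only for this sketch.
    """
    groups = {}
    for dev in sorted(os.listdir(devices_dir)):
        path = os.path.join(devices_dir, dev, attr)
        try:
            with open(path) as f:
                groups[dev] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # no readable group attribute for this device
    return groups


def build_groups(group_numbers):
    """Invert {device: group} into {group: [devices]}."""
    by_group = defaultdict(list)
    for dev, group in sorted(group_numbers.items()):
        by_group[group].append(dev)
    return dict(by_group)
```

A management tool would call read_group_numbers() once per bus
directory and then treat each list returned by build_groups() as the
unit of assignment.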


	Joerg

> 
> - That does -not- mean that we cannot specify for each individual device
> within such a group where we want to put it in qemu (what devfn etc...).
> As long as there is a clear understanding that the "ownership" of the
> device goes with the group, this is somewhat orthogonal to how they are
> represented in qemu. (Not completely... if the iommu is exposed to the
> guest ,via paravirt for example, some of these constraints must be
> exposed but I'll talk about that more later).
> 
> The interface currently proposed for VFIO (and associated uiommu)
> doesn't handle that problem at all. Instead, it is entirely centered
> around a specific "feature" of the VTd iommu's for creating arbitrary
> domains with arbitrary devices (tho those devices -do- have the same
> constraints exposed above, don't try to put 2 legacy PCI devices behind
> the same bridge into 2 different domains !), but the API totally ignores
> the problem, leaves it to libvirt "magic foo" and focuses on something
> that is both quite secondary in the grand scheme of things, and quite
> x86 VTd specific in the implementation and API definition.
> 
> Now, I'm not saying these programmable iommu domains aren't a nice
> feature and that we shouldn't exploit them when available, but as it is,
> it is too much a central part of the API.
> 
> I'll talk a little bit more about recent POWER iommu's here to
> illustrate where I'm coming from with my idea of groups:
> 
> On p7ioc (the IO chip used on recent P7 machines), there -is- a concept
> of domain and a per-RID filtering. However it differs from VTd in a few
> ways:
> 
> The "domains" (aka PEs) encompass more than just an iommu filtering
> scheme. The MMIO space and PIO space are also segmented, and those
> segments assigned to domains. Interrupts (well, MSI ports at least) are
> assigned to domains. Inbound PCIe error messages are targeted to
> domains, etc...
> 
> Basically, the PEs provide a very strong isolation feature which
> includes errors, and has the ability to immediately "isolate" a PE on
> the first occurrence of an error. For example, if an inbound PCIe error
> is signaled by a device on a PE or such a device does a DMA to a
> non-authorized address, the whole PE gets into error state. All
> subsequent stores (both DMA and MMIO) are swallowed and reads return all
> 1's, interrupts are blocked. This is designed to prevent any propagation
> of bad data, which is a very important feature in large high reliability
> systems.
> 
> Software then has the ability to selectively turn back on MMIO and/or
> DMA, perform diagnostics, reset devices etc...
> 
> Because the domains encompass more than just DMA, but also segment the
> MMIO space, it is not practical at all to dynamically reconfigure them
> at runtime to "move" devices into domains. The firmware or early kernel
> code (it depends) will assign devices BARs using an algorithm that keeps
> them within PE segment boundaries, etc....
> 
> Additionally (and this is indeed a "restriction" compared to VTd, though
> I expect our future IO chips to lift it to some extent), PEs don't get
> separate DMA address spaces. There is one 64-bit DMA address space per
> PCI host bridge, and it is 'segmented' with each segment being assigned
> to a PE. Due to the way PE assignment works in hardware, it is not
> practical to make several devices share a segment unless they are on the
> same bus. Also the resulting limit in the amount of 32-bit DMA space a
> device can access means that it's impractical to put too many devices in
> a PE anyways. (This is clearly designed for paravirt iommu, I'll talk
> more about that later).
> 
> The above essentially extends the granularity requirement (or rather is
> another factor defining what the granularity of partitionable entities
> is). You can think of it as "pre-existing" domains.
> 
> I believe the way to solve that is to introduce a kernel interface to
> expose those "partitionable entities" to userspace. In addition, it
> occurs to me that the ability to manipulate VTd domains essentially
> boils down to manipulating those groups (creating larger ones with
> individual components).
> 
> I like the idea of defining / playing with those groups statically
> (using a command line tool or sysfs, possibly having a config file
> defining them in a persistent way) rather than having their lifetime
> tied to a uiommu file descriptor.
> 
> It also makes it a LOT easier to have a channel to manipulate
> platform/arch specific attributes of those domains if any.
> 
> So we could define an API or representation in sysfs that exposes what
> the partitionable entities are, and we may add to it an API to
> manipulate them. But we don't have to and I'm happy to keep the
> additional SW grouping you can do on VTd as a separate "add-on" API
> (tho I don't like at all the way it works with uiommu). However, qemu
> needs to know what the grouping is regardless of the domains, and it's
> not nice if it has to manipulate two different concepts here so
> eventually those "partitionable entities" from a qemu standpoint must
> look like domains.
> 
> My main point is that I don't want the "knowledge" here to be in libvirt
> or qemu. In fact, I want to be able to do something as simple as passing
> a reference to a PE to qemu (sysfs path ?) and have it just pickup all
> the devices in there and expose them to the guest.
> 
> This can be done in a way that isn't PCI specific as well (the
> definition of the groups and what is grouped would obviously be
> somewhat bus specific and handled by platform code in the kernel).
> 
> Maybe something like /sys/devgroups ? This probably warrants involving
> more kernel people into the discussion.
> 
> * IOMMU
> 
> Now more on iommu. I've described I think in enough details how ours
> work, there are others, I don't know what freescale or ARM are doing,
> sparc doesn't quite work like VTd either, etc...
> 
> The main problem isn't that much the mechanics of the iommu but really
> how it's exposed (or not) to guests.
> 
> VFIO here is basically designed for one and only one thing: expose the
> entire guest physical address space to the device more/less 1:1.
> 
> This means:
> 
>   - It only works with iommu's that provide complete DMA address spaces
> to devices. Won't work with a single 'segmented' address space like we
> have on POWER.
> 
>   - It requires the guest to be pinned. Pass-through -> no more swap
> 
>   - The guest cannot make use of the iommu to deal with 32-bit DMA
> devices, thus a guest with more than a few G of RAM (I don't know the
> exact limit on x86, depends on your IO hole I suppose), and you end up
> back to swiotlb & bounce buffering.
> 
>   - It doesn't work for POWER server anyways because of our need to
> provide a paravirt iommu interface to the guest since that's how pHyp
> works today and how existing OSes expect to operate.
> 
> Now some of this can be fixed with tweaks, and we've started doing it
> (we have a working pass-through using VFIO, forgot to mention that, it's
> just that we don't like what we had to do to get there).
> 
> Basically, what we do today is:
> 
> - We add an ioctl to VFIO to expose to qemu the segment information. IE.
> What is the DMA address and size of the DMA "window" usable for a given
> device. This is a tweak, that should really be handled at the "domain"
> level.
> 
> That current hack won't work well if two devices share an iommu. Note
> that we have an additional constraint here due to our paravirt
> interfaces (specified in PAPR) which is that PE domains must have a
> common parent. Basically, pHyp makes them look like a PCIe host bridge
> per domain in the guest. I think that's a pretty good idea and qemu
> might want to do the same.
> 
> - We hack out the currently unconditional mapping of the entire guest
> space in the iommu. Something will have to be done to "decide" whether
> to do that or not ... qemu argument -> ioctl ?
> 
> - We hook up the paravirt call to insert/remove a translation from the
> iommu to the VFIO map/unmap ioctl's.
> 
> This limps along but it's not great. Some of the problems are:
> 
> - I've already mentioned, the domain problem again :-) 
> 
> - Performance sucks of course, the vfio map ioctl wasn't meant for that
> and has quite a bit of overhead. However we'll want to do the paravirt
> call directly in the kernel eventually ...
> 
>   - ... which isn't trivial to get back to our underlying arch specific
> iommu object from there. We'll probably need a set of arch specific
> "sideband" ioctl's to "register" our paravirt iommu "bus numbers" and
> link them to the real thing kernel-side.
> 
> - PAPR (the specification of our paravirt interface and the expectation
> of current OSes) wants iommu pages to be 4k by default, regardless of
> the kernel host page size, which makes things a bit tricky since our
> enterprise host kernels have a 64k base page size. Additionally, we have
> new PAPR interfaces that we want to exploit, to allow the guest to
> create secondary iommu segments (in 64-bit space), which can be used
> (under guest control) to do things like map the entire guest (here it
> is :-) or use larger iommu page sizes (if permitted by the host kernel,
> in our case we could allow 64k iommu page size with a 64k host kernel).
> 
> The above means we need arch specific APIs. So arch specific vfio
> ioctl's, either that or kvm ones going to vfio or something ... the
> current structure of vfio/kvm interaction doesn't make it easy.
> 
> * IO space
> 
> On most (if not all) non-x86 archs, each PCI host bridge provides a
> completely separate PCI address space. Qemu doesn't deal with that very
> well. For MMIO it can be handled since those PCI address spaces are
> "remapped" holes in the main CPU address space so devices can be
> registered by using BAR + offset of that window in qemu MMIO mapping.
> 
> For PIO things get nasty. We have totally separate PIO spaces and qemu
> doesn't seem to like that. We can try to play the offset trick as well,
> we haven't tried yet, but basically that's another one to fix. Not a
> huge deal I suppose but heh ...
> 
> Also our next generation chipset may drop support for PIO completely.
> 
> On the other hand, because PIO is just a special range of MMIO for us,
> we can do normal pass-through on it and don't need any of the emulation
> done by qemu.
> 
>   * MMIO constraints
> 
> The QEMU side VFIO code hard wires various constraints that are entirely
> based on various requirements you decided you have on x86 but don't
> necessarily apply to us :-)
> 
> Due to our paravirt nature, we don't need to masquerade the MSI-X table
> for example. At all. If the guest configures crap into it, too bad, it
> can only shoot itself in the foot since the host bridge enforces
> validation anyways as I explained earlier. Because it's all paravirt, we
> don't need to "translate" the interrupt vectors & addresses, the guest
> will call hypercalls to configure things anyways.
> 
> We don't need to prevent MMIO pass-through for small BARs at all. This
> should be some kind of capability or flag passed by the arch. Our
> segmentation of the MMIO domain means that we can give entire segments
> to the guest and let it access anything in there (those segments are a
> multiple of the page size always). Worst case it will access outside of
> a device BAR within a segment and will cause the PE to go into error
> state, shooting itself in the foot, there is no risk of side effect
> outside of the guest boundaries.
> 
> In fact, we don't even need to emulate BAR sizing etc... in theory. Our
> paravirt guests expect the BARs to have been already allocated for them
> by the firmware and will pick up the addresses from the device-tree :-)
> 
> Today we use a "hack", putting all 0's in there and triggering the linux
> code path to reassign unassigned resources (which will use BAR
> emulation) but that's not what we are -supposed- to do. Not a big deal
> and having the emulation there won't -hurt- us, it's just that we don't
> really need any of it.
> 
> We have a small issue with ROMs. Our current KVM only works with huge
> pages for guest memory but that is being fixed. So the way qemu maps the
> ROM copy into the guest address space doesn't work. It might be handy
> anyways to have a way for qemu to use MMIO emulation for ROM access as a
> fallback. I'll look into it.
> 
>   * EEH
> 
> This is the name of those fancy error handling & isolation features I
> mentioned earlier. To some extent it's a superset of AER, but we don't
> generally expose AER to guests (or even the host), it's swallowed by
> firmware into something else that provides a superset (well mostly) of
> the AER information, and allow us to do those additional things like
> isolating/de-isolating, reset control etc...
> 
> Here too, we'll need arch specific APIs through VFIO. Not necessarily a
> huge deal, I mention it for completeness.
> 
>    * Misc
> 
> There's lots of small bits and pieces... in no special order:
> 
>  - netlink ? WTF ! Seriously, we don't need a hybrid API with a bit of
> netlink and a bit of ioctl's ... it's not like there's something
> fundamentally better for netlink vs. ioctl... it really depends what
> you are doing, and in this case I fail to see what netlink brings you
> other than bloat and more stupid userspace library deps.
> 
>  - I don't like too much the fact that VFIO provides yet another
> different API to do what we already have at least 2 kernel APIs for, ie,
> BAR mapping and config space access. At least it should be better at
> using the backend infrastructure of the 2 others (sysfs & procfs). I
> understand it wants to filter in some case (config space) and -maybe-
> yet another API is the right way to go but allow me to have my doubts.
> 
> One thing I thought about but you don't seem to like it ... was to use
> the need to represent the partitionable entity as groups in sysfs that I
> talked about earlier. Those could have per-device subdirs with the usual
> config & resource files, same semantic as the ones in the real device,
> but when accessed via the group they get filtering. It might or might not
> be practical in the end, tbd, but it would allow apps using a slightly
> modified libpci for example to exploit some of this.
> 
>  - The qemu vfio code hooks directly into ioapic ... of course that
> won't fly with anything !x86
> 
>  - The various "objects" dealt with here, -especially- interrupts and
> iommu, need a better in-kernel API so that fast in-kernel emulation can
> take over from qemu based emulation. The way we need to do some of this
> on POWER differs from x86. We can elaborate later, it's not necessarily
> a killer either but essentially we'll take the bulk of interrupt
> handling away from VFIO to the point where it won't see any of it at
> all.
> 
>   - Non-PCI devices. That's a hot topic for embedded. I think the vast
> majority here is platform devices. There's quite a bit of vfio that
> isn't intrinsically PCI specific. We could have an in-kernel platform
> driver like we have an in-kernel PCI driver to attach to. The mapping of
> resources to userspace is rather generic, as goes for interrupts. I
> don't know whether that idea can be pushed much further, I don't have
> the bandwidth to look into it much at this point, but maybe it would be
> possible to refactor vfio a bit to better separate what is PCI specific
> from what is not. The idea would be to move the PCI specific bits to
> inside the "placeholder" PCI driver, and same goes for platform bits.
> "generic" ioctl's go to VFIO core, anything it doesn't handle, it
> passes them to the driver which allows the PCI one to handle things
> differently than the platform one, maybe an amba one while at it,
> etc.... just a thought, I haven't gone into the details at all.
> 
> I think that's all I had on my plate today, it's a long enough email
> anyway :-) Anthony suggested we put that on a wiki, I'm a bit
> wiki-disabled myself so he proposed to pickup my email and do that. We
> should probably discuss the various items in here separately as
> different threads to avoid too much confusion.
> 
> One other thing we should do on our side is publish somewhere our
> current hacks to give you an idea of where we are going and what we had
> to do (code speaks more than words). We'll try to do that asap, possibly
> next week.
> 
> Note that I'll be on/off the next few weeks, travelling and doing
> bringup. So expect latency in my replies.
> 
> Cheers,
> Ben.
> 

-- 
Nothing great was ever achieved without enthusiasm.

* Re: kvm PCI assignment & VFIO ramblings
From: Joerg Roedel @ 2011-08-04 10:35 UTC
  To: Alex Williamson
  Cc: chrisw, Alexey Kardashevskiy, kvm, Paul Mackerras, qemu-devel,
	David Gibson, aafabbri, iommu, Anthony Liguori,
	linux-pci@vger.kernel.org, linuxppc-dev, benve
In-Reply-To: <1312050011.2265.185.camel@x201.home>

On Sat, Jul 30, 2011 at 12:20:08PM -0600, Alex Williamson wrote:
> On Sat, 2011-07-30 at 09:58 +1000, Benjamin Herrenschmidt wrote:
> > - The -minimum- granularity of pass-through is not always a single
> > device and not always under SW control
> 
> But IMHO, we need to preserve the granularity of exposing a device to a
> guest as a single device.  That might mean some devices are held hostage
> by an agent on the host.

That's true. There is a difference between unassigning a group from the
host and making single devices in that PE visible to the guest. But we
need to make sure that no device in a PE is used by the host while at
least one device is assigned to a guest.

Unlike the other proposals to handle this in libvirt, I think this
belongs in the kernel. Doing this in userspace may break the entire
system if done wrong.

For example, if one device from a PE is assigned to a guest while
another one is not unbound from its host driver, the driver may get very
confused when DMA just stops working. This may crash the entire system
or lead to silent data corruption in the guest. The behavior is
basically undefined then. The kernel must not allow that.
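
The rule can be written down as a small state check. The following
Python sketch is only a model of the invariant, not kernel code. The
class PartitionableGroup, the exception GroupBusy, and all method names
are hypothetical.

```python
class GroupBusy(Exception):
    """Raised when a request would mix host and guest use of one group."""


class PartitionableGroup:
    def __init__(self, devices):
        # device name -> name of the bound host driver, or None if unbound
        self.host_driver = {dev: None for dev in devices}
        self.guest_devices = set()  # devices currently exposed to a guest

    def bind_host_driver(self, dev, driver):
        if self.guest_devices:
            raise GroupBusy("group has devices assigned to a guest")
        self.host_driver[dev] = driver

    def unbind_host_driver(self, dev):
        self.host_driver[dev] = None

    def assign_to_guest(self, dev):
        if dev not in self.host_driver:
            raise KeyError(dev)
        busy = sorted(d for d, drv in self.host_driver.items() if drv)
        if busy:
            # Any bound host driver in the group blocks the assignment.
            raise GroupBusy("host drivers still bound: " + ", ".join(busy))
        self.guest_devices.add(dev)

    def release_from_guest(self, dev):
        self.guest_devices.discard(dev)
```

Note that the check looks at every device in the group, not only the one
being assigned. That is what keeps the decision out of userspace policy.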

	Joerg

* Re: [PATCH 03/15] [PS3] Add region 1 memory early
From: Geert Uytterhoeven @ 2011-08-04  7:05 UTC
  To: Hector Martin; +Cc: Geoff Levand, cbe-oss-dev, Andre Heider, linuxppc-dev
In-Reply-To: <4E39E302.9060702@marcansoft.com>

On Thu, Aug 4, 2011 at 02:08, Hector Martin <hector@marcansoft.com> wrote:
> tight. Can we get rid of the ps3flash and ps3fb preallocations to save
> bootmem and just allocate them during device init like the other drivers
> do? What is the reason for preallocating these?

The reason for that is to make sure the allocations will succeed.
Chances are very slim you can allocate a contiguous 9 MiB buffer at any
arbitrary time.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Re: [PATCH 02/15] [PS3] Get lv1 high memory region from devtree
From: Hector Martin @ 2011-08-04  1:19 UTC
  To: Geoff Levand; +Cc: cbe-oss-dev, Andre Heider, linuxppc-dev
In-Reply-To: <4E39CBFC.2010501@infradead.org>

On 08/04/2011 12:30 AM, Geoff Levand wrote:
> With this mechanism how is the address of the initrd passed to the
> new kernel, in the DT?

Using the /chosen linux,initrd-{start,end} properties. The bootloader
knows about the Linux trick of sticking together bootmem and highmem and
precalculates the linux "physical" address. Yeah, that's a hack, it
should probably be done in the kernel so the bootloader doesn't have to
know or care about how Linux decides to lay out its physical address
space. Do you have any suggestion as to how we would do this sanely?
Right now early_init_dt_setup_initrd_arch in arch/powerpc/kernel/prom.c
is generic and doesn't know anything about platform specifics.
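
For readers unfamiliar with the trick, this is roughly the arithmetic
the bootloader has to do today. It is a Python sketch under stated
assumptions, not the actual bootloader or kernel code. It assumes Linux
places region 1 directly after the real-mode region in its "physical"
address space and that linux,initrd-end is exclusive. The function
names and the example addresses are hypothetical.

```python
def lpar_to_linux_phys(lpar_addr, rm_size, r1_base, r1_size):
    """Map an lpar address to the Linux 'physical' address (assumed layout)."""
    if 0 <= lpar_addr < rm_size:
        return lpar_addr  # real-mode region maps 1:1
    if r1_base <= lpar_addr < r1_base + r1_size:
        # Region 1 is glued on directly after the real-mode region.
        return rm_size + (lpar_addr - r1_base)
    raise ValueError("address is outside both memory regions")


def initrd_properties(initrd_lpar, initrd_len, rm_size, r1_base, r1_size):
    """Compute the /chosen properties for an initrd loaded at initrd_lpar."""
    if initrd_len <= 0:
        raise ValueError("initrd length must be positive")
    start = lpar_to_linux_phys(initrd_lpar, rm_size, r1_base, r1_size)
    last = lpar_to_linux_phys(initrd_lpar + initrd_len - 1,
                              rm_size, r1_base, r1_size)
    return {"linux,initrd-start": start, "linux,initrd-end": last + 1}
```

Doing this in the kernel instead would mean passing the lpar address and
letting platform code apply the same offset.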

> How would a kexec based bootloader work?  If its kernel were to allocate
> high mem and the bootloader program uses the high mem, how could it tell
> that kernel not to destroy the region on shutdown?

The current code contemplates the case where a non-kexec based
bootloader is the first stage and allocates highmem (and knows how to
tell the kernel about it), possibly followed by kexec stages that just
keep that allocation. To support a kexec bootloader as the first
bootloader using this mechanism would indeed require extra support to
tell that kernel to retain its allocation, preferably something that can
be decided from userland. Of course the current kexec bootloader
behavior where highmem isn't handed over to the child kernel will still
work.

> If arch/powerpc/boot/ps3.c allocated the mem and added a DT entry
> then other OSes that don't know about the Linux device tree won't
> be able to use that allocated memory.  Other OSes could do a
> test to see if the allocation was already done.  Another option
> that might work is to write info into the LV1 repository then
> have boot code look there for allocated high mem.

If you're booting another OS that isn't Linux then it also has no use
for a Linux-specific ramdisk (linux,initrd-start) and thus no use for
preallocated highmem and should be booted as such (maybe make the
userland tools tell the kernel to release highmem if there's no initrd
defined).

Using the lv1 repo is an option, but does it make sense? It's even less
standard than a FDT and we'd have to put both the region1 location and
the initrd location in there (there's no point to maintaining highmem if
you aren't going to use it).

FWIW, the lv1 repo writing hypercalls are unused and undocumented.

>> +	if (!map.r1.size) {
>> +		DBG("%s:%d: no region 1, not adding memory\n",
>> +		    __func__, __LINE__);
>> +		return 0;
>> +	}
> 
> Did you find this to be hit?  Also, in the general case,
> there could be more than one high mem region, but I don't
> know of any current systems that do.

Probably only during debugging, but it doesn't sound like a bad idea
anyway (e.g. bootloader allocated highmem but didn't tell the kernel so
the kernel couldn't allocate it).

As for multiple regions, well, currently it only supports one and that
is hardcoded in the phys->lpar translation, so I see no point in
worrying about that now.

ACK on the other code comments.

-- 
Hector Martin (hector@marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

* Re: kvm PCI assignment & VFIO ramblings
From: David Gibson @ 2011-08-04  0:39 UTC
  To: Alex Williamson
  Cc: aafabbri, Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, chrisw, iommu,
	Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <1312343090.2653.564.camel@bling.home>

On Tue, Aug 02, 2011 at 09:44:49PM -0600, Alex Williamson wrote:
> On Wed, 2011-08-03 at 12:04 +1000, David Gibson wrote:
> > On Tue, Aug 02, 2011 at 12:35:19PM -0600, Alex Williamson wrote:
> > > On Tue, 2011-08-02 at 12:14 -0600, Alex Williamson wrote:
> > > > On Tue, 2011-08-02 at 18:28 +1000, David Gibson wrote:
> > > > > On Sat, Jul 30, 2011 at 12:20:08PM -0600, Alex Williamson wrote:
> > > > > > On Sat, 2011-07-30 at 09:58 +1000, Benjamin Herrenschmidt wrote:
> > > > > [snip]
> > > > > > On x86, the USB controllers don't typically live behind a PCIe-to-PCI
> > > > > > bridge, so don't suffer the source identifier problem, but they do often
> > > > > > share an interrupt.  But even then, we can count on most modern devices
> > > > > > supporting PCI2.3, and thus the DisINTx feature, which allows us to
> > > > > > share interrupts.  In any case, yes, it's more rare but we need to know
> > > > > > how to handle devices behind PCI bridges.  However I disagree that we
> > > > > > need to assign all the devices behind such a bridge to the guest.
> > > > > > There's a difference between removing the device from the host and
> > > > > > exposing the device to the guest.
> > > > > 
> > > > > I think you're arguing only over details of what words to use for
> > > > > what, rather than anything of substance here.  The point is that an
> > > > > entire partitionable group must be assigned to "host" (in which case
> > > > > kernel drivers may bind to it) or to a particular guest partition (or
> > > > > at least to a single UID on the host).  Which of the assigned devices
> > > > > the partition actually uses is another matter of course, as is at
> > > > > exactly which level they become "de-exposed" if you don't want to use
> > > > > > all of them.
> > > > 
> > > > Well first we need to define what a partitionable group is, whether it's
> > > > based on hardware requirements or user policy.  And while I agree that
> > > > we need unique ownership of a partition, I disagree that qemu is
> > > > necessarily the owner of the entire partition vs individual devices.
> > > 
> > > Sorry, I didn't intend to have such circular logic.  "... I disagree
> > > that qemu is necessarily the owner of the entire partition vs granted
> > > access to devices within the partition".  Thanks,
> > 
> > I still don't understand the distinction you're making.  We're saying
> > the group is "owned" by a given user or guest in the sense that no-one
> > else may use anything in the group (including host drivers).  At that
> > point none, some or all of the devices in the group may actually be
> > used by the guest.
> > 
> > You seem to be making a distinction between "owned by" and "assigned
> > to" and "used by" and I really don't see what it is.
> 
> How does a qemu instance that uses none of the devices in a group still
> own that group?

?? In the same way that you still own a file you don't have open..?

>  Aren't we at that point free to move the group to a
> different qemu instance or return ownership to the host?

Of course.  But until you actually do that, the group is still
notionally owned by the guest.

>  Who does that?

The admin.  Possibly by poking sysfs, or possibly by frobbing some
character device, or maybe something else.  Naturally libvirt or
whatever could also do this.

> In my mental model, there's an intermediary that "owns" the group and
> just as kernel drivers bind to devices when the host owns the group,
> qemu is a userspace device driver that binds to sets of devices when the
> intermediary owns it.  Obviously I'm thinking libvirt, but it doesn't
> have to be.  Thanks,

Well sure, but I really don't see how such an intermediary fits into
the kernel's model of ownership.

So, first, take a step back and look at what sort of entities can
"own" a group (or device or whatever).  I notice that when I've said
"owned by the guest" you seem to have read this as "owned by qemu"
which is not necessarily the same thing.

What I had in mind is that each group is either owned by "host", in
which case host kernel drivers can bind to it, or it's in "guest mode"
in which case it has a user, group and mode and can be bound by user
drivers (and therefore guests) with the right permission.  From the
kernel's perspective there is therefore no distinction between "owned
by qemu" and "owned by libvirt".
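
A minimal sketch of the ownership model described above, for illustration
only.  Every name here (dev_group, host_driver_may_bind,
user_driver_may_bind) is hypothetical and not an existing kernel
interface, and the permission check is simplified to a single uid/gid
pair with no supplementary groups:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical model: a group is owned by the host or is in "guest mode". */
enum group_owner { GROUP_OWNER_HOST, GROUP_OWNER_GUEST };

struct dev_group {
	enum group_owner owner;
	/* The fields below only matter in guest mode. */
	uid_t uid;
	gid_t gid;
	mode_t mode;	/* file-style permission bits, e.g. 0600 */
};

/* Host kernel drivers may bind only while the host owns the group. */
bool host_driver_may_bind(const struct dev_group *g)
{
	return g->owner == GROUP_OWNER_HOST;
}

/*
 * A user driver may bind only in guest mode and only with read+write
 * permission.  Nothing here identifies the caller as qemu or libvirt;
 * only the uid/gid and the mode bits are consulted.
 */
bool user_driver_may_bind(const struct dev_group *g, uid_t uid, gid_t gid)
{
	const mode_t need = 06;	/* read + write */

	if (g->owner != GROUP_OWNER_GUEST)
		return false;
	if (uid == g->uid)
		return ((g->mode >> 6) & need) == need;
	if (gid == g->gid)
		return ((g->mode >> 3) & need) == need;
	return (g->mode & need) == need;
}
```

The point the sketch tries to capture is that the check never asks which
program is binding, so an intermediary such as libvirt would appear only
as whoever holds the matching uid/gid.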


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply

* Re: [PATCH 03/15] [PS3] Add region 1 memory early
From: Hector Martin @ 2011-08-04  0:08 UTC (permalink / raw)
  To: Geoff Levand; +Cc: cbe-oss-dev, Andre Heider, linuxppc-dev
In-Reply-To: <4E39CC79.3050208@infradead.org>

On 08/04/2011 12:32 AM, Geoff Levand wrote:
> We need an explanation of this change.

I actually have a hard time understanding the reason for the existing
behavior of hot-adding memory halfway through the boot process. Maybe
you can shed some light on this?

The reason for the change is that under the default GameOS LPAR, real
mode memory is 16MB which is already tight for a kernel (under certain
conditions) and runs out quickly as memory is allocated during kernel
startup. Having region1 available sooner fixes this.

Though, reviewing the code, I think I found a bug (that should already
have a chance of happening as things stand now, though this patch might
make it more likely): if storage bounce buffers or the ps3fb xdr happen
to straddle the boundary between the regions, bad things will happen
since they're not actually contiguous in LPAR space. This won't happen
right now for ps3flash or ps3fb since those are allocated early out of
bootmem, but it can currently happen for the other buffers (ps3disk,
ps3vram, etc.) AFAICT.
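
The straddling condition above can be sketched as a simple range check.
This is an illustration only: buf_straddles is a hypothetical helper, not
something in the PS3 code, and the boundary is assumed to be the end of
real mode memory (16MB in the GameOS case mentioned above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the buffer [start, start + len)
 * begins below `boundary` and ends above it, i.e. it would span two
 * regions that are not contiguous in LPAR space.
 */
bool buf_straddles(uint64_t start, uint64_t len, uint64_t boundary)
{
	if (len == 0)
		return false;
	/* Written as a subtraction so start + len cannot overflow. */
	return start < boundary && boundary - start < len;
}
```

A buffer that ends exactly at the boundary, or starts exactly on it, does
not straddle under this definition.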

Maybe we should introduce a reserved or nonexistent page gap at the
beginning of region1 to ensure that nothing will ever allocate
contiguous memory across the boundary. That will probably prevent
bootmem from grabbing region1 due to the gap, so early on memory will be
tight. Can we get rid of the ps3flash and ps3fb preallocations to save
bootmem and just allocate them during device init like the other drivers
do? What is the reason for preallocating these?

-- 
Hector Martin (hector@marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

^ permalink raw reply

* Re: [PATCH 07/15] ps3flash: Refuse to work in lpars other than OtherOS
From: Geoff Levand @ 2011-08-03 22:34 UTC (permalink / raw)
  To: Andre Heider; +Cc: cbe-oss-dev, Hector Martin, linuxppc-dev
In-Reply-To: <1312228986-32307-8-git-send-email-a.heider@gmail.com>

On 08/01/2011 01:02 PM, Andre Heider wrote:
> The driver implements a character and misc device, meant for the
> axed OtherOS to exchange various settings with GameOS.
> Since Firmware 3.21 there is no GameOS support anymore to write these
> settings, so limit the driver to the OtherOS environment.

This is really a test if running on the PS3 OtherOS, so this
comment should state that.

> 
> Signed-off-by: Andre Heider <a.heider@gmail.com>
> ---
>  arch/powerpc/platforms/ps3/Kconfig |    1 +
>  drivers/char/ps3flash.c            |    7 +++++++
>  2 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/ps3/Kconfig b/arch/powerpc/platforms/ps3/Kconfig
> index 84df5c8..5eb956a 100644
> --- a/arch/powerpc/platforms/ps3/Kconfig
> +++ b/arch/powerpc/platforms/ps3/Kconfig
> @@ -121,6 +121,7 @@ config PS3_FLASH
>  
>  	  This support is required to access the PS3 FLASH ROM, which
>  	  contains the boot loader and some boot options.
> +	  This driver only supports the deprecated OtherOS LPAR.

This will be confusing for OtherOS users, so should be removed.

>  	  In general, all users will say Y or M.


This could be changed to: 'In general, all PS3 OtherOS users will say Y or M.'

>  
>  	  As this driver needs a fixed buffer of 256 KiB of memory, it can
> diff --git a/drivers/char/ps3flash.c b/drivers/char/ps3flash.c
> index 69c734a..b1e8659 100644
> --- a/drivers/char/ps3flash.c
> +++ b/drivers/char/ps3flash.c
> @@ -25,6 +25,7 @@
>  
>  #include <asm/lv1call.h>
>  #include <asm/ps3stor.h>
> +#include <asm/firmware.h>
>  
>  
>  #define DEVICE_NAME		"ps3flash"
> @@ -455,6 +456,12 @@ static struct ps3_system_bus_driver ps3flash = {
>  
>  static int __init ps3flash_init(void)
>  {
> +	if (!firmware_has_feature(FW_FEATURE_PS3_LV1))
> +		return -ENODEV;

Is this needed?  Won't this driver only be loaded on PS3 hardware?

> +
> +	if (ps3_get_ss_laid() != PS3_SS_LAID_OTHEROS)
> +		return -ENODEV;
> +
>  	return ps3_system_bus_driver_register(&ps3flash);
>  }
>  

-Geoff

^ permalink raw reply

* Re: [PATCH 03/15] [PS3] Add region 1 memory early
From: Geoff Levand @ 2011-08-03 22:32 UTC (permalink / raw)
  To: Andre Heider; +Cc: cbe-oss-dev, Hector Martin, linuxppc-dev
In-Reply-To: <1312228986-32307-4-git-send-email-a.heider@gmail.com>

On 08/01/2011 01:02 PM, Andre Heider wrote:
> From: Hector Martin <hector@marcansoft.com>

We need an explanation of this change.

> Signed-off-by: Hector Martin <hector@marcansoft.com>
> [a.heider: Various cleanups to make checkpatch.pl happy]
> Signed-off-by: Andre Heider <a.heider@gmail.com>
> ---
>  arch/powerpc/platforms/ps3/mm.c |   62 +++++++--------------------------------
>  1 files changed, 11 insertions(+), 51 deletions(-)

^ permalink raw reply

* Re: [PATCH 01/15] [PS3] Add udbg driver using the PS3 gelic Ethernet device
From: Geoff Levand @ 2011-08-03 22:32 UTC (permalink / raw)
  To: Andre Heider; +Cc: cbe-oss-dev, Hector Martin, linuxppc-dev
In-Reply-To: <1312228986-32307-2-git-send-email-a.heider@gmail.com>

On 08/01/2011 01:02 PM, Andre Heider wrote:
> --- /dev/null
> +++ b/arch/powerpc/platforms/ps3/gelic_udbg.c
> @@ -0,0 +1,272 @@
> +/*
> + * arch/powerpc/platforms/ps3/gelic_udbg.c

Don't put file names in files.  When the file gets moved, then this will
no longer be correct.

> + *
> + * udbg debug output routine via GELIC UDP broadcasts
> + * Copyright (C) 2010 Hector Martin <hector@marcansoft.com>
> + * Copyright (C) 2011 Andre Heider <a.heider@gmail.com>

Some of this seems to be taken from the gelic driver, so shouldn't
the copyright info from there be included here?

> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + */
> +
> +#include <asm/io.h>
> +#include <asm/udbg.h>
> +#include <asm/lv1call.h>
> +
> +#define GELIC_BUS_ID 1
> +#define GELIC_DEVICE_ID 0
> +#define GELIC_DEBUG_PORT 18194
> +#define GELIC_MAX_MESSAGE_SIZE 1000
> +
> +#define GELIC_LV1_GET_MAC_ADDRESS 1
> +#define GELIC_LV1_GET_VLAN_ID 4
> +#define GELIC_LV1_VLAN_TX_ETHERNET_0 2
> +
> +#define GELIC_DESCR_DMA_STAT_MASK 0xf0000000
> +#define GELIC_DESCR_DMA_CARDOWNED 0xa0000000
> +
> +#define GELIC_DESCR_TX_DMA_IKE 0x00080000
> +#define GELIC_DESCR_TX_DMA_NO_CHKSUM 0x00000000
> +#define GELIC_DESCR_TX_DMA_FRAME_TAIL 0x00040000
> +
> +#define GELIC_DESCR_DMA_CMD_NO_CHKSUM (GELIC_DESCR_DMA_CARDOWNED | \
> +				       GELIC_DESCR_TX_DMA_IKE | \
> +				       GELIC_DESCR_TX_DMA_NO_CHKSUM)
> +
> +static u64 bus_addr;
> +
> +struct gelic_descr {
> +	/* as defined by the hardware */

These are BE from the hardware, so should be __beXX types.

> +	u32 buf_addr;
> +	u32 buf_size;
> +	u32 next_descr_addr;
> +	u32 dmac_cmd_status;
> +	u32 result_size;
> +	u32 valid_size;	/* all zeroes for tx */
> +	u32 data_status;
> +	u32 data_error;	/* all zeroes for tx */
> +} __attribute__((aligned(32)));
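
To illustrate the point about big-endian types: in the kernel, __be32 is
a sparse-checked annotation on u32.  The userspace sketch below is an
assumption-laden stand-in that gets a similar effect by wrapping the
bytes in a struct, so that a descriptor field cannot be mixed with a CPU
integer by accident.  The names be32_t, cpu_to_be32_sketch and
be32_to_cpu_sketch are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __be32: four bytes, most significant first. */
typedef struct { uint8_t b[4]; } be32_t;

/* Convert a CPU-order value to big-endian storage. */
be32_t cpu_to_be32_sketch(uint32_t v)
{
	be32_t r = { { (uint8_t)(v >> 24), (uint8_t)(v >> 16),
		       (uint8_t)(v >> 8), (uint8_t)v } };
	return r;
}

/* Convert big-endian storage back to a CPU-order value. */
uint32_t be32_to_cpu_sketch(be32_t v)
{
	return ((uint32_t)v.b[0] << 24) | ((uint32_t)v.b[1] << 16) |
	       ((uint32_t)v.b[2] << 8) | (uint32_t)v.b[3];
}
```

On a big-endian CPU such conversions do no real work, so the gain is in
the type checking rather than in behaviour.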

...

> +static void gelic_debug_init(void)
> +{

...

> +	result = lv1_net_control(GELIC_BUS_ID, GELIC_DEVICE_ID,
> +				 GELIC_LV1_GET_VLAN_ID,
> +				 GELIC_LV1_VLAN_TX_ETHERNET_0, 0, 0,
> +				 &vlan_id, &v2);
> +	if (result == 0) {

This should be 'if (!result)'

-Geoff

^ permalink raw reply

* Re: [PATCH 05/15] ps3: Detect the current lpar environment
From: Geoff Levand @ 2011-08-03 22:31 UTC (permalink / raw)
  To: Andre Heider; +Cc: cbe-oss-dev, Hector Martin, linuxppc-dev
In-Reply-To: <1312228986-32307-6-git-send-email-a.heider@gmail.com>

On 08/01/2011 01:02 PM, Andre Heider wrote:
> ---
>  arch/powerpc/include/asm/ps3.h          |    7 +++++++
>  arch/powerpc/platforms/ps3/platform.h   |    4 ++++
>  arch/powerpc/platforms/ps3/repository.c |   19 +++++++++++++++++++
>  arch/powerpc/platforms/ps3/setup.c      |   22 ++++++++++++++++++++++
>  4 files changed, 52 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/ps3.h b/arch/powerpc/include/asm/ps3.h
> index 7f065e1..136354a 100644
> --- a/arch/powerpc/include/asm/ps3.h
> +++ b/arch/powerpc/include/asm/ps3.h
> @@ -39,6 +39,13 @@ union ps3_firmware_version {
>  void ps3_get_firmware_version(union ps3_firmware_version *v);
>  int ps3_compare_firmware_version(u16 major, u16 minor, u16 rev);
>  
> +enum ps3_ss_laid {
> +	PS3_SS_LAID_GAMEOS = 0x1070000002000001UL,
> +	PS3_SS_LAID_OTHEROS = 0x1080000004000001UL,

Only PS3_SS_LAID_OTHEROS is used for anything outside ps3_setup_arch(),
so I think it makes sense to split this into two patches with one adding
just PS3_SS_LAID_OTHEROS and ps3_get_ss_laid() with a comment that
it adds the ps3_get_ss_laid routine.

> +};
> +
> +enum ps3_ss_laid ps3_get_ss_laid(void);
> +
>  /* 'Other OS' area */
>  
>  enum ps3_param_av_multi_out {
> diff --git a/arch/powerpc/platforms/ps3/platform.h b/arch/powerpc/platforms/ps3/platform.h
> index 9a196a8..1ba15b8 100644
> --- a/arch/powerpc/platforms/ps3/platform.h
> +++ b/arch/powerpc/platforms/ps3/platform.h
> @@ -232,4 +232,8 @@ int ps3_repository_read_spu_resource_id(unsigned int res_index,
>  int ps3_repository_read_vuart_av_port(unsigned int *port);
>  int ps3_repository_read_vuart_sysmgr_port(unsigned int *port);
>  
> +/* repository ss info */
> +
> +int ps3_repository_read_ss_laid(enum ps3_ss_laid *laid);
> +
>  #endif
> diff --git a/arch/powerpc/platforms/ps3/repository.c b/arch/powerpc/platforms/ps3/repository.c
> index 5e304c2..6fa3e96 100644
> --- a/arch/powerpc/platforms/ps3/repository.c
> +++ b/arch/powerpc/platforms/ps3/repository.c
> @@ -1002,6 +1002,25 @@ int ps3_repository_read_lpm_privileges(unsigned int be_index, u64 *lpar,
>  			    lpar, rights);
>  }
>  
> +/**
> + * ps3_repository_read_ss_laid - Read the lpar auth id
> + */
> +
> +int ps3_repository_read_ss_laid(enum ps3_ss_laid *laid)
> +{
> +	int result;
> +	u64 id, v1;
> +
> +	lv1_get_logical_partition_id(&id);
> +	result = read_node(PS3_LPAR_ID_PME,
> +			   make_first_field("ss", 0),
> +			   make_field("laid", 0),
> +			   id, 0,
> +			   &v1, NULL);
> +	*laid = v1;
> +	return result;
> +}
> +
>  #if defined(DEBUG)
>  
>  int ps3_repository_dump_resource_info(const struct ps3_repository_device *repo)
> diff --git a/arch/powerpc/platforms/ps3/setup.c b/arch/powerpc/platforms/ps3/setup.c
> index 149bea2..f430279 100644
> --- a/arch/powerpc/platforms/ps3/setup.c
> +++ b/arch/powerpc/platforms/ps3/setup.c
> @@ -47,6 +47,7 @@ DEFINE_MUTEX(ps3_gpu_mutex);
>  EXPORT_SYMBOL_GPL(ps3_gpu_mutex);
>  
>  static union ps3_firmware_version ps3_firmware_version;
> +static enum ps3_ss_laid ps3_ss_laid;
>  
>  void ps3_get_firmware_version(union ps3_firmware_version *v)
>  {
> @@ -68,6 +69,12 @@ int ps3_compare_firmware_version(u16 major, u16 minor, u16 rev)
>  }
>  EXPORT_SYMBOL_GPL(ps3_compare_firmware_version);
>  
> +enum ps3_ss_laid ps3_get_ss_laid(void)
> +{
> +	return ps3_ss_laid;
> +}
> +EXPORT_SYMBOL_GPL(ps3_get_ss_laid);
> +
>  static void ps3_power_save(void)
>  {
>  	/*
> @@ -192,6 +199,7 @@ static int ps3_set_dabr(unsigned long dabr)
>  
>  static void __init ps3_setup_arch(void)
>  {
> +	const char *laid_str;
>  
>  	DBG(" -> %s:%d\n", __func__, __LINE__);
>  
> @@ -200,6 +208,20 @@ static void __init ps3_setup_arch(void)
>  	       ps3_firmware_version.major, ps3_firmware_version.minor,
>  	       ps3_firmware_version.rev);
>  
> +	ps3_repository_read_ss_laid(&ps3_ss_laid);
> +	switch (ps3_ss_laid) {
> +	case PS3_SS_LAID_GAMEOS:
> +		laid_str = "GameOS";
> +		break;
> +	case PS3_SS_LAID_OTHEROS:
> +		laid_str = "OtherOS";
> +		break;
> +	default:
> +		laid_str = "unknown";
> +		break;
> +	}
> +	printk(KERN_INFO "Running in %s lpar\n", laid_str);
> +
>  	ps3_spu_set_platform();
>  
>  #ifdef CONFIG_SMP

^ permalink raw reply

* Re: [PATCH 02/15] [PS3] Get lv1 high memory region from devtree
From: Geoff Levand @ 2011-08-03 22:30 UTC (permalink / raw)
  To: Andre Heider; +Cc: cbe-oss-dev, Hector Martin, linuxppc-dev
In-Reply-To: <1312228986-32307-3-git-send-email-a.heider@gmail.com>

On 08/01/2011 01:02 PM, Andre Heider wrote:
> 
> This lets the bootloader preallocate the high lv1 region and pass its
> location to the kernel through the devtree. Thus, it can be used to hold
> the initrd. If the property doesn't exist, the kernel retains the old
> behavior and attempts to allocate the region itself.

With this mechanism how is the address of the initrd passed to the
new kernel, in the DT?

How would a kexec based bootloader work?  If its kernel were to allocate
high mem and the bootloader program uses the high mem, how could it tell
that kernel not to destroy the region on shutdown?

If arch/powerpc/boot/ps3.c allocated the mem and added a DT entry
then other OSes that don't know about the Linux device tree won't
be able to use that allocated memory.  Other OSes could do a
test to see if the allocation was already done.  Another option
that might work is to write info into the LV1 repository then
have boot code look there for allocated high mem.

> Signed-off-by: Hector Martin <hector@marcansoft.com>
> [a.heider: Various cleanups to make checkpatch.pl happy]
> Signed-off-by: Andre Heider <a.heider@gmail.com>
> ---
>  arch/powerpc/platforms/ps3/mm.c |   61 +++++++++++++++++++++++++++++++++++++-
>  1 files changed, 59 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/ps3/mm.c b/arch/powerpc/platforms/ps3/mm.c
> index c204588..30bb096 100644
> --- a/arch/powerpc/platforms/ps3/mm.c
> +++ b/arch/powerpc/platforms/ps3/mm.c
> @@ -110,6 +110,7 @@ struct map {
>  	u64 htab_size;
>  	struct mem_region rm;
>  	struct mem_region r1;
> +	int destroy_r1;

In the general case we could have multiple high mem
regions, and each could need to be destroyed, so I
think struct mem_region should have a destroy flag.
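
A rough sketch of the per-region flag suggested above, as a userspace
model.  The base, size and offset fields mirror the ones used in the
quoted patch, but their types, the region_destroy stand-in and
regions_shutdown are assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_region {
	uint64_t base;
	uint64_t size;
	unsigned long offset;
	int destroy;	/* assumed flag: set if this kernel created the region */
};

/* Stand-in for ps3_mm_region_destroy(): here it only clears the fields. */
static void region_destroy(struct mem_region *r)
{
	r->base = 0;
	r->size = 0;
	r->offset = 0;
}

/*
 * Destroy only the regions this kernel allocated itself and leave
 * regions handed over by a bootloader alone.  Returns how many regions
 * were destroyed.
 */
size_t regions_shutdown(struct mem_region *regions, size_t n)
{
	size_t i, destroyed = 0;

	for (i = 0; i < n; i++) {
		if (!regions[i].destroy)
			continue;
		region_destroy(&regions[i]);
		destroyed++;
	}
	return destroyed;
}
```

With the flag stored in each region, shutdown needs no per-region
special case such as map.destroy_r1.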

>  };
>  
>  #define debug_dump_map(x) _debug_dump_map(x, __func__, __LINE__)
> @@ -287,6 +288,49 @@ static void ps3_mm_region_destroy(struct mem_region *r)
>  	}
>  }
>  
> +static int ps3_mm_scan_memory(unsigned long node, const char *uname,
> +			      int depth, void *data)
> +{

Something like 'ps3_mm_dt_scan_highmem()' is more descriptive.

> +	struct mem_region *r = data;
> +	void *p;
> +	u64 prop[2];
> +	unsigned long l;
> +	char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> +
> +	if (type == NULL)
> +		return 0;
> +	if (strcmp(type, "memory") != 0)

Should this be 'if (strcmp(type, "memory"))'?

> +		return 0;
> +
> +	p = of_get_flat_dt_prop(node, "sony,lv1-highmem", &l);
> +	if (p == NULL)
> +		return 0;
> +
> +	BUG_ON(l != sizeof(prop));
> +	memcpy(prop, p, sizeof(prop));
> +
> +	r->base = prop[0];
> +	r->size = prop[1];
> +	r->offset = r->base - map.rm.size;
> +
> +	return -1;
> +}
> +
> +static int ps3_mm_get_devtree_highmem(struct mem_region *r)
> +{
> +	r->size = r->base = r->offset = 0;
> +	of_scan_flat_dt(ps3_mm_scan_memory, r);
> +
> +	if (r->base && r->size) {
> +		DBG("%s:%d got high region from devtree: %llxh %llxh\n",
> +		__func__, __LINE__, r->base, r->size);
> +		return 0;
> +	} else {
> +		DBG("%s:%d no high region in devtree...\n", __func__, __LINE__);
> +		return -1;
> +	}
> +}
> +
>  /**
>   * ps3_mm_add_memory - hot add memory
>   */
> @@ -303,6 +347,12 @@ static int __init ps3_mm_add_memory(void)
>  
>  	BUG_ON(!mem_init_done);
>  
> +	if (!map.r1.size) {
> +		DBG("%s:%d: no region 1, not adding memory\n",
> +		    __func__, __LINE__);
> +		return 0;
> +	}

Did you find this to be hit?  Also, in the general case,
there could be more than one high mem region, but I don't
know of any current systems that do.

> +
>  	start_addr = map.rm.size;
>  	start_pfn = start_addr >> PAGE_SHIFT;
>  	nr_pages = (map.r1.size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> @@ -1219,7 +1269,13 @@ void __init ps3_mm_init(void)
>  
>  
>  	/* arrange to do this in ps3_mm_add_memory */
> -	ps3_mm_region_create(&map.r1, map.total - map.rm.size);
> +
> +	if (ps3_mm_get_devtree_highmem(&map.r1) == 0) {
> +		map.destroy_r1 = 0;
> +	} else {

This should be

	if (!ps3_mm_get_devtree_highmem(&map.r1))
		map.destroy_r1 = 0;
	else {

> +		ps3_mm_region_create(&map.r1, map.total - map.rm.size);
> +		map.destroy_r1 = 1;
> +	}
>  
>  	/* correct map.total for the real total amount of memory we use */
>  	map.total = map.rm.size + map.r1.size;
> @@ -1233,5 +1289,6 @@ void __init ps3_mm_init(void)
>  
>  void ps3_mm_shutdown(void)
>  {
> -	ps3_mm_region_destroy(&map.r1);
> +	if (map.destroy_r1)
> +		ps3_mm_region_destroy(&map.r1);
>  }

-Geoff

^ permalink raw reply

* Re: [PATCH 00/15] ps3: Support more than the OtherOS lpar
From: Geoff Levand @ 2011-08-03 22:23 UTC (permalink / raw)
  To: Andre Heider; +Cc: cbe-oss-dev, Hector Martin, linuxppc-dev
In-Reply-To: <1312228986-32307-1-git-send-email-a.heider@gmail.com>

Hi Andre,

On 08/01/2011 01:02 PM, Andre Heider wrote:
> This series addresses various issues and extends support when running
> in lpars like GameOS. Included are some patches from Hector Martin, which
> I found useful.

Much of this is just general fixups and improvements to the existing PS3
support.  I think you should separate those changes out and work to get
them included, then consider others.  If I give some comment, then
I consider that part worth pursuing at the present time.

I have limited time to review the patches, so it will take me a while to
get through them.

> Patches are based on 2.6.39 since master doesn't boot with smp on my
> console.  I wasn't able to pinpoint the cause so far (not that I tried
> too hard).

I'm looking into this problem, but it will take some time.

-Geoff

^ permalink raw reply

* [PATCH] powerpc/kvm: fix build errors with older toolchains
From: Nishanth Aravamudan @ 2011-08-03 18:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: kvm, Marcelo Tosatti, Alexander Graf, kvm-ppc, linux-kernel,
	Paul Mackerras, Avi Kivity, linuxppc-dev

On a box with gcc 4.3.2, I see errors like:

arch/powerpc/kvm/book3s_hv_rmhandlers.S:1254: Error: Unrecognized opcode: stxvd2x
arch/powerpc/kvm/book3s_hv_rmhandlers.S:1316: Error: Unrecognized opcode: lxvd2x

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 6dd3358..de29501 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1251,7 +1251,7 @@ BEGIN_FTR_SECTION
 	reg = 0
 	.rept	32
 	li	r6,reg*16+VCPU_VSRS
-	stxvd2x	reg,r6,r3
+	STXVD2X(reg,r6,r3)
 	reg = reg + 1
 	.endr
 FTR_SECTION_ELSE
@@ -1313,7 +1313,7 @@ BEGIN_FTR_SECTION
 	reg = 0
 	.rept	32
 	li	r7,reg*16+VCPU_VSRS
-	lxvd2x	reg,r7,r4
+	LXVD2X(reg,r7,r4)
 	reg = reg + 1
 	.endr
 FTR_SECTION_ELSE
-- 
1.7.4.1

^ permalink raw reply related

* Re: [RFC PATCH V1 3/7] cpuidle: stop using pm_idle
From: Len Brown @ 2011-08-03 17:45 UTC (permalink / raw)
  To: Trinabh Gupta; +Cc: linuxppc-dev, linux-pm, linux-kernel
In-Reply-To: <20110607162947.6848.79430.stgit@tringupt.in.ibm.com>

On Tue, 7 Jun 2011, Trinabh Gupta wrote:

> From: Len Brown <len.brown@intel.com>
> 
> pm_idle does not scale as an idle handler registration mechanism.
> Don't use it for cpuidle.  Instead, call cpuidle directly, and
> allow architectures to use pm_idle as an arch-specific default
> if they need it.  ie.
> 
> cpu_idle()
> 	...
> 	if(cpuidle_call_idle())

Looks like you forgot to correct my typo that you pointed out earlier,
s/cpuidle_call_idle/cpuidle_idle_call/

both in the comment here and for arm and sh below.
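
For reference, the intended call pattern with the corrected name can be
modelled in userspace as below.  This is only a sketch: the functions
are stand-ins that reuse the kernel names, and cpuidle_ready,
fallback_calls and idle_once are hypothetical helpers for the model:

```c
#include <assert.h>
#include <errno.h>

int cpuidle_ready;	/* model: nonzero once a cpuidle device is usable */
int fallback_calls;	/* model: counts how often pm_idle() ran */

/* Stand-in: returns 0 on success, non-zero on failure, as in the patch. */
int cpuidle_idle_call(void)
{
	return cpuidle_ready ? 0 : -ENODEV;
}

/* Stand-in for the arch-specific default idle routine. */
void pm_idle(void)
{
	fallback_calls++;
}

/* One pass of the idle loop body: fall back only when cpuidle fails. */
void idle_once(void)
{
	if (cpuidle_idle_call())
		pm_idle();
}
```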

Thanks for including the From: above, that is correct form.
But note in the future that when you modify somebody else's patch,
you should append a note about what you changed,
and also add your signed-off-by, so we can
track the changes.

thanks,
-Len

> 		pm_idle();
> 
> cc: x86@kernel.org
> cc: Kevin Hilman <khilman@deeprootsystems.com>
> cc: Paul Mundt <lethal@linux-sh.org>
> Signed-off-by: Len Brown <len.brown@intel.com>
> 
> ---
> 
>  arch/arm/kernel/process.c    |    4 +++-
>  arch/sh/kernel/idle.c        |    6 ++++--
>  arch/x86/kernel/process_32.c |    4 +++-
>  arch/x86/kernel/process_64.c |    4 +++-
>  drivers/cpuidle/cpuidle.c    |   39 ++++++++++++++++++---------------------
>  include/linux/cpuidle.h      |    2 ++
>  6 files changed, 33 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> index 5e1e541..d7ee0d4 100644
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -30,6 +30,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/random.h>
>  #include <linux/hw_breakpoint.h>
> +#include <linux/cpuidle.h>
>  
>  #include <asm/cacheflush.h>
>  #include <asm/leds.h>
> @@ -196,7 +197,8 @@ void cpu_idle(void)
>  				cpu_relax();
>  			} else {
>  				stop_critical_timings();
> -				pm_idle();
> +				if (cpuidle_call_idle())
> +					pm_idle();
>  				start_critical_timings();
>  				/*
>  				 * This will eventually be removed - pm_idle
> diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
> index 425d604..9c7099e 100644
> --- a/arch/sh/kernel/idle.c
> +++ b/arch/sh/kernel/idle.c
> @@ -16,12 +16,13 @@
>  #include <linux/thread_info.h>
>  #include <linux/irqflags.h>
>  #include <linux/smp.h>
> +#include <linux/cpuidle.h>
>  #include <asm/pgalloc.h>
>  #include <asm/system.h>
>  #include <asm/atomic.h>
>  #include <asm/smp.h>
>  
> -void (*pm_idle)(void) = NULL;
> +static void (*pm_idle)(void);
>  
>  static int hlt_counter;
>  
> @@ -100,7 +101,8 @@ void cpu_idle(void)
>  			local_irq_disable();
>  			/* Don't trace irqs off for idle */
>  			stop_critical_timings();
> -			pm_idle();
> +			if (cpuidle_call_idle())
> +				pm_idle();
>  			/*
>  			 * Sanity check to ensure that pm_idle() returns
>  			 * with IRQs enabled
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 8d12878..61fadbe 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -38,6 +38,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/io.h>
>  #include <linux/kdebug.h>
> +#include <linux/cpuidle.h>
>  
>  #include <asm/pgtable.h>
>  #include <asm/system.h>
> @@ -109,7 +110,8 @@ void cpu_idle(void)
>  			local_irq_disable();
>  			/* Don't trace irqs off for idle */
>  			stop_critical_timings();
> -			pm_idle();
> +			if (cpuidle_idle_call())
> +				pm_idle();
>  			start_critical_timings();
>  		}
>  		tick_nohz_restart_sched_tick();
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 6c9dd92..62c219a 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -37,6 +37,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/io.h>
>  #include <linux/ftrace.h>
> +#include <linux/cpuidle.h>
>  
>  #include <asm/pgtable.h>
>  #include <asm/system.h>
> @@ -136,7 +137,8 @@ void cpu_idle(void)
>  			enter_idle();
>  			/* Don't trace irqs off for idle */
>  			stop_critical_timings();
> -			pm_idle();
> +			if (cpuidle_idle_call())
> +				pm_idle();
>  			start_critical_timings();
>  
>  			/* In many cases the interrupt that ended idle
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 8d7303b..304e378 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -25,10 +25,10 @@ DEFINE_PER_CPU(struct cpuidle_device *, cpuidle_devices);
>  
>  DEFINE_MUTEX(cpuidle_lock);
>  LIST_HEAD(cpuidle_detected_devices);
> -static void (*pm_idle_old)(void);
>  
>  static int enabled_devices;
>  static int off __read_mostly;
> +static int initialized __read_mostly;
>  
>  int cpuidle_disabled(void)
>  {
> @@ -56,27 +56,24 @@ static int __cpuidle_register_device(struct cpuidle_device *dev);
>   * cpuidle_idle_call - the main idle loop
>   *
>   * NOTE: no locks or semaphores should be used here
> + * return non-zero on failure
>   */
> -static void cpuidle_idle_call(void)
> +int cpuidle_idle_call(void)
>  {
>  	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
>  	struct cpuidle_driver *drv = cpuidle_get_driver();
>  	struct cpuidle_state *target_state;
>  	int next_state, entered_state;
>  
> -	/* check if the device is ready */
> -	if (!dev || !dev->enabled) {
> -		if (pm_idle_old)
> -			pm_idle_old();
> -		else
> -#if defined(CONFIG_ARCH_HAS_DEFAULT_IDLE)
> -			default_idle();
> -#else
> -			local_irq_enable();
> -#endif
> -		return;
> -	}
> +	if (off)
> +		return -ENODEV;
> +
> +	if (!initialized)
> +		return -ENODEV;
>  
> +	/* check if the device is ready */
> +	if (!dev || !dev->enabled)
> +		return -EBUSY;
>  #if 0
>  	/* shows regressions, re-enable for 2.6.29 */
>  	/*
> @@ -90,7 +87,7 @@ static void cpuidle_idle_call(void)
>  	next_state = cpuidle_curr_governor->select(drv, dev);
>  	if (need_resched()) {
>  		local_irq_enable();
> -		return;
> +		return 0;
>  	}
>  
>  	target_state = &drv->states[next_state];
> @@ -116,6 +113,8 @@ static void cpuidle_idle_call(void)
>  	/* give the governor an opportunity to reflect on the outcome */
>  	if (cpuidle_curr_governor->reflect)
>  		cpuidle_curr_governor->reflect(dev, entered_state);
> +
> +	return 0;
>  }
>  
>  /**
> @@ -123,10 +122,10 @@ static void cpuidle_idle_call(void)
>   */
>  void cpuidle_install_idle_handler(void)
>  {
> -	if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
> +	if (enabled_devices) {
>  		/* Make sure all changes finished before we switch to new idle */
>  		smp_wmb();
> -		pm_idle = cpuidle_idle_call;
> +		initialized = 1;
>  	}
>  }
>  
> @@ -135,8 +134,8 @@ void cpuidle_install_idle_handler(void)
>   */
>  void cpuidle_uninstall_idle_handler(void)
>  {
> -	if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
> -		pm_idle = pm_idle_old;
> +	if (enabled_devices) {
> +		initialized = 0;
>  		cpuidle_kick_cpus();
>  	}
>  }
> @@ -410,8 +409,6 @@ static int __init cpuidle_init(void)
>  	if (cpuidle_disabled())
>  		return -ENODEV;
>  
> -	pm_idle_old = pm_idle;
> -
>  	ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
>  	if (ret)
>  		return ret;
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index 2786787..c904188 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -128,6 +128,7 @@ struct cpuidle_driver {
>  
>  #ifdef CONFIG_CPU_IDLE
>  extern void disable_cpuidle(void);
> +extern int cpuidle_idle_call(void);
>  
>  extern int cpuidle_register_driver(struct cpuidle_driver *drv);
>  struct cpuidle_driver *cpuidle_get_driver(void);
> @@ -142,6 +143,7 @@ extern void cpuidle_disable_device(struct cpuidle_device *dev);
>  
>  #else
>  static inline void disable_cpuidle(void) { }
> +static inline int cpuidle_idle_call(void) { return -ENODEV; }
>  
>  static inline int cpuidle_register_driver(struct cpuidle_driver *drv)
>  {return -ENODEV; }
> 

^ permalink raw reply

* RE: [PATCH 4/4] edac/85xx: PCI/PCIE error interrupt edac support.
From: Xie Shaohui-B21989 @ 2011-08-03  9:58 UTC (permalink / raw)
  To: Xie Shaohui-B21989, linuxppc-dev@lists.ozlabs.org, Kumar Gala
  Cc: mm-commits@vger.kernel.org, avorontsov@mvista.com,
	Jiang Kai-B18973, akpm@linux-foundation.org, davem@davemloft.net
In-Reply-To: <1311244404-4463-1-git-send-email-Shaohui.Xie@freescale.com>

Hi all,

Any concerns about this patch?


Best Regards,
Shaohui Xie


>-----Original Message-----
>From: Xie Shaohui-B21989
>Sent: Tuesday, July 26, 2011 2:52 PM
>To: linuxppc-dev@lists.ozlabs.org; Kumar Gala
>Cc: mm-commits@vger.kernel.org; avorontsov@mvista.com; davem@davemloft.net;
>grant.likely@secretlab.ca; akpm@linux-foundation.org; Jiang Kai-B18973
>Subject: RE: [PATCH 4/4] edac/85xx: PCI/PCIE error interrupt edac support.
>
>I've verified this patch can apply for galak/powerpc.git 'next' branch
>with no change.
>
>
>Best Regards,
>Shaohui Xie
>
>
>>-----Original Message-----
>>From: Xie Shaohui-B21989
>>Sent: Thursday, July 21, 2011 6:33 PM
>>To: linuxppc-dev@lists.ozlabs.org
>>Cc: Gala Kumar-B11780; mm-commits@vger.kernel.org; avorontsov@mvista.com;
>>davem@davemloft.net; grant.likely@secretlab.ca; akpm@linux-foundation.org;
>>Jiang Kai-B18973; Kumar Gala; Xie Shaohui-B21989
>>Subject: [PATCH 4/4] edac/85xx: PCI/PCIE error interrupt edac support.
>>
>>From: Kai.Jiang <Kai.Jiang@freescale.com>
>>
>>Add pcie error interrupt edac support for mpc85xx and p4080.
>>mpc85xx uses the legacy interrupt report mechanism - the error interrupts
>>are reported directly to mpic. While, p4080 attaches most of error
>>interrupts to interrupt 0. And report error interrupt to mpic via
>>interrupt 0. This patch can handle both of them.
>>
>>
>>Due to the error management register offset and definition difference
>>between pci and pcie, use ccsr_pci structure to merge pci and pcie edac
>>code into one.
>>
>>Signed-off-by: Kai.Jiang <Kai.Jiang@freescale.com>
>>Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
>>Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
>>---
>> drivers/edac/mpc85xx_edac.c |  239 ++++++++++++++++++++++++++++++++----------
>> drivers/edac/mpc85xx_edac.h |   17 +--
>> 2 files changed, 188 insertions(+), 68 deletions(-)
>>
>>diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
>>index b048a5f..dde156f 100644
>>--- a/drivers/edac/mpc85xx_edac.c
>>+++ b/drivers/edac/mpc85xx_edac.c
>>@@ -1,5 +1,6 @@
>> /*
>>  * Freescale MPC85xx Memory Controller kenel module
>>+ * Copyright (c) 2011 Freescale Semiconductor, Inc.
>>  *
>>  * Author: Dave Jiang <djiang@mvista.com>
>>  *
>>@@ -21,6 +22,8 @@
>>
>> #include <linux/of_platform.h>
>> #include <linux/of_device.h>
>>+#include <include/asm/pci.h>
>>+#include <sysdev/fsl_pci.h>
>> #include "edac_module.h"
>> #include "edac_core.h"
>> #include "mpc85xx_edac.h"
>>@@ -34,14 +37,6 @@ static int edac_mc_idx;
>> static u32 orig_ddr_err_disable;
>> static u32 orig_ddr_err_sbe;
>>
>>-/*
>>- * PCI Err defines
>>- */
>>-#ifdef CONFIG_PCI
>>-static u32 orig_pci_err_cap_dr;
>>-static u32 orig_pci_err_en;
>>-#endif
>>-
>> static u32 orig_l2_err_disable;
>> #ifdef CONFIG_FSL_SOC_BOOKE
>> static u32 orig_hid1[2];
>>@@ -151,37 +146,52 @@ static void mpc85xx_pci_check(struct edac_pci_ctl_info *pci)
>> {
>> 	struct mpc85xx_pci_pdata *pdata = pci->pvt_info;
>> 	u32 err_detect;
>>+	struct ccsr_pci *reg = pdata->pci_reg;
>>+
>>+	err_detect = in_be32(&pdata->pci_reg->pex_err_dr);
>>+
>>+	if (pdata->pcie_flag) {
>>+		printk(KERN_ERR "PCIE error(s) detected\n");
>>+		printk(KERN_ERR "PCIE ERR_DR register: 0x%08x\n", err_detect);
>>+		printk(KERN_ERR "PCIE ERR_CAP_STAT register: 0x%08x\n",
>>+			in_be32(&reg->pex_err_cap_stat));
>>+		printk(KERN_ERR "PCIE ERR_CAP_R0 register: 0x%08x\n",
>>+			in_be32(&reg->pex_err_cap_r0));
>>+		printk(KERN_ERR "PCIE ERR_CAP_R1 register: 0x%08x\n",
>>+			in_be32(&reg->pex_err_cap_r1));
>>+		printk(KERN_ERR "PCIE ERR_CAP_R2 register: 0x%08x\n",
>>+			in_be32(&reg->pex_err_cap_r2));
>>+		printk(KERN_ERR "PCIE ERR_CAP_R3 register: 0x%08x\n",
>>+			in_be32(&reg->pex_err_cap_r3));
>>+	} else {
>>+		/* master aborts can happen during PCI config cycles */
>>+		if (!(err_detect & ~(PCI_EDE_MULTI_ERR | PCI_EDE_MST_ABRT))) {
>>+			out_be32(&reg->pex_err_dr, err_detect);
>>+			return;
>>+		}
>>
>>-	err_detect = in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_DR);
>>-
>>-	/* master aborts can happen during PCI config cycles */
>>-	if (!(err_detect & ~(PCI_EDE_MULTI_ERR | PCI_EDE_MST_ABRT))) {
>>-		out_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_DR, err_detect);
>>-		return;
>>+		printk(KERN_ERR "PCI error(s) detected\n");
>>+		printk(KERN_ERR "PCI/X ERR_DR register: 0x%08x\n", err_detect);
>>+		printk(KERN_ERR "PCI/X ERR_ATTRIB register: 0x%08x\n",
>>+		       in_be32(&reg->pex_err_attrib));
>>+		printk(KERN_ERR "PCI/X ERR_ADDR register: 0x%08x\n",
>>+		       in_be32(&reg->pex_err_disr));
>>+		printk(KERN_ERR "PCI/X ERR_EXT_ADDR register: 0x%08x\n",
>>+		       in_be32(&reg->pex_err_ext_addr));
>>+		printk(KERN_ERR "PCI/X ERR_DL register: 0x%08x\n",
>>+		       in_be32(&reg->pex_err_dl));
>>+		printk(KERN_ERR "PCI/X ERR_DH register: 0x%08x\n",
>>+		       in_be32(&reg->pex_err_dh));
>>+
>>+		if (err_detect & PCI_EDE_PERR_MASK)
>>+			edac_pci_handle_pe(pci, pci->ctl_name);
>>+
>>+		if ((err_detect & ~PCI_EDE_MULTI_ERR) & ~PCI_EDE_PERR_MASK)
>>+			edac_pci_handle_npe(pci, pci->ctl_name);
>> 	}
>>
>>-	printk(KERN_ERR "PCI error(s) detected\n");
>>-	printk(KERN_ERR "PCI/X ERR_DR register: %#08x\n", err_detect);
>>-
>>-	printk(KERN_ERR "PCI/X ERR_ATTRIB register: %#08x\n",
>>-	       in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_ATTRIB));
>>-	printk(KERN_ERR "PCI/X ERR_ADDR register: %#08x\n",
>>-	       in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_ADDR));
>>-	printk(KERN_ERR "PCI/X ERR_EXT_ADDR register: %#08x\n",
>>-	       in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_EXT_ADDR));
>>-	printk(KERN_ERR "PCI/X ERR_DL register: %#08x\n",
>>-	       in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_DL));
>>-	printk(KERN_ERR "PCI/X ERR_DH register: %#08x\n",
>>-	       in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_DH));
>>-
>> 	/* clear error bits */
>>-	out_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_DR, err_detect);
>>-
>>-	if (err_detect & PCI_EDE_PERR_MASK)
>>-		edac_pci_handle_pe(pci, pci->ctl_name);
>>-
>>-	if ((err_detect & ~PCI_EDE_MULTI_ERR) & ~PCI_EDE_PERR_MASK)
>>-		edac_pci_handle_npe(pci, pci->ctl_name);
>>+	out_be32(&reg->pex_err_dr, err_detect);
>> }
>>
>> static irqreturn_t mpc85xx_pci_isr(int irq, void *dev_id)
>>@@ -190,7 +200,7 @@ static irqreturn_t mpc85xx_pci_isr(int irq, void *dev_id)
>> 	struct mpc85xx_pci_pdata *pdata = pci->pvt_info;
>> 	u32 err_detect;
>>
>>-	err_detect = in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_DR);
>>+	err_detect = in_be32(&pdata->pci_reg->pex_err_dr);
>>
>> 	if (!err_detect)
>> 		return IRQ_NONE;
>>@@ -200,11 +210,99 @@ static irqreturn_t mpc85xx_pci_isr(int irq, void *dev_id)
>> 	return IRQ_HANDLED;
>> }
>>
>>+#define MPC85XX_MPIC_EIMR0	0x3910
>>+/*
>>+ * This function is for the error interrupt ORed mechanism.
>>+ * This mechanism attaches most functions' error interrupts to interrupt 0
>>+ * and reports error interrupts to the mpic via interrupt 0.
>>+ * EIMR0 - Error Interrupt Mask Register 0.
>>+ *
>>+ * This function checks via the device tree whether the device supports the
>>+ * error interrupt ORed mechanism. If supported, unmask the pcie error
>>+ * interrupt bit in EIMR0.
>>+ */
>>+static int mpc85xx_err_int_en(struct device *op) {
>>+	u32 *int_cell = NULL;
>>+	struct device_node *np = NULL;
>>+	void __iomem *mpic_base = NULL;
>>+	u32 reg_tmp = 0;
>>+	u32 int_len = 0;
>>+	struct resource r;
>>+	int res = 0;
>>+
>>+	if (!op->of_node)
>>+		return -EINVAL;
>>+	/*
>>+	 * Unmask pcie error interrupt bit in EIMR0
>>+	 * extend interrupt specifier has 4 cells. For the 3rd cell:
>>+	 * 0 -- normal interrupt; 1 -- error interrupt.
>>+	 */
>>+	int_cell = (u32 *)of_get_property(op->of_node, "interrupts", &int_len);
>>+	if ((int_len/sizeof(u32)) == 4) {
>>+		/* soc has error interrupt integration handling mechanism */
>>+		if (*(int_cell + 2) == 1) {
>>+			np = of_find_node_by_type(NULL, "open-pic");
>>+
>>+			if (of_address_to_resource(np, 0, &r)) {
>>+				printk(KERN_ERR
>>+				"%s:Failed to map mpic regs\n", __func__);
>>+				of_node_put(np);
>>+				res = -ENOMEM;
>>+				goto err;
>>+			}
>>+
>>+			if (!request_mem_region(r.start,
>>+						r.end - r.start + 1, "mpic")) {
>>+				printk(KERN_ERR
>>+				"%s:Error while requesting mem region\n",
>>+					 __func__);
>>+				res = -EBUSY;
>>+				goto err;
>>+			}
>>+
>>+			mpic_base = ioremap(r.start, r.end - r.start + 1);
>>+			if (!mpic_base) {
>>+				printk(KERN_ERR
>>+				"%s:Unable to map mpic regs\n", __func__);
>>+				res = -ENOMEM;
>>+				goto err_ioremap;
>>+			}
>>+
>>+			reg_tmp = in_be32(mpic_base + MPC85XX_MPIC_EIMR0);
>>+			out_be32(mpic_base + MPC85XX_MPIC_EIMR0,
>>+				reg_tmp & ~(1 << (31 - *(int_cell + 3))));
>>+			iounmap(mpic_base);
>>+			release_mem_region(r.start, r.end - r.start + 1);
>>+			of_node_put(np);
>>+		}
>>+	}
>>+
>>+	return 0;
>>+err_ioremap:
>>+	release_mem_region(r.start, r.end - r.start + 1);
>>+err:
>>+
>>+	return res;
>>+}
>>+
>>+static int mpc85xx_pcie_find_capability(struct device_node *np) {
>>+	struct pci_controller *hose;
>>+	if (!np)
>>+		return -EINVAL;
>>+
>>+	hose = pci_find_hose_for_OF_device(np);
>>+	return early_find_capability(hose, hose->bus->number,
>>+				     0, PCI_CAP_ID_EXP);
>>+}
>>+
>> static int __devinit mpc85xx_pci_err_probe(struct platform_device *op)
>> {
>> 	struct edac_pci_ctl_info *pci;
>> 	struct mpc85xx_pci_pdata *pdata;
>> 	struct resource r;
>>+	struct ccsr_pci *reg = NULL;
>> 	int res = 0;
>>
>> 	if (!devres_open_group(&op->dev, mpc85xx_pci_err_probe, GFP_KERNEL))
>>@@ -217,6 +315,10 @@ static int __devinit mpc85xx_pci_err_probe(struct platform_device *op)
>> 	pdata = pci->pvt_info;
>> 	pdata->name = "mpc85xx_pci_err";
>> 	pdata->irq = NO_IRQ;
>>+
>>+	if (mpc85xx_pcie_find_capability(op->dev.of_node) > 0)
>>+		pdata->pcie_flag = 1;
>>+
>> 	dev_set_drvdata(&op->dev, pci);
>> 	pci->dev = &op->dev;
>> 	pci->mod_name = EDAC_MOD_STR;
>>@@ -235,37 +337,40 @@ static int __devinit mpc85xx_pci_err_probe(struct platform_device *op)
>> 		goto err;
>> 	}
>>
>>-	/* we only need the error registers */
>>-	r.start += 0xe00;
>>-
>> 	if (!devm_request_mem_region(&op->dev, r.start, resource_size(&r),
>> 					pdata->name)) {
>>-		printk(KERN_ERR "%s: Error while requesting mem region\n",
>>-		       __func__);
>>+		printk(KERN_ERR
>>+		"%s:Error while requesting mem region\n", __func__);
>> 		res = -EBUSY;
>> 		goto err;
>> 	}
>>
>>-	pdata->pci_vbase = devm_ioremap(&op->dev, r.start, resource_size(&r));
>>-	if (!pdata->pci_vbase) {
>>+	pdata->pci_reg = devm_ioremap(&op->dev, r.start, resource_size(&r));
>>+	if (!pdata->pci_reg) {
>> 		printk(KERN_ERR "%s: Unable to setup PCI err regs\n", __func__);
>> 		res = -ENOMEM;
>> 		goto err;
>> 	}
>>
>>-	orig_pci_err_cap_dr =
>>-	    in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_CAP_DR);
>>-
>>-	/* PCI master abort is expected during config cycles */
>>-	out_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_CAP_DR, 0x40);
>>+	if (mpc85xx_err_int_en(&op->dev) < 0)
>>+		goto err;
>>
>>-	orig_pci_err_en = in_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_EN);
>>+	reg = pdata->pci_reg;
>>+	/* disable pci/pcie error detect */
>>+	if (pdata->pcie_flag) {
>>+		pdata->orig_pci_err_dr =  in_be32(&reg->pex_err_disr);
>>+		out_be32(&reg->pex_err_disr, ~0);
>>+	} else {
>>+		pdata->orig_pci_err_dr =  in_be32(&reg->pex_err_cap_dr);
>>+		out_be32(&reg->pex_err_cap_dr, ~0);
>>+	}
>>
>>-	/* disable master abort reporting */
>>-	out_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_EN, ~0x40);
>>+	/* disable all pcie error interrupt */
>>+	pdata->orig_pci_err_en = in_be32(&reg->pex_err_en);
>>+	out_be32(&reg->pex_err_en, 0);
>>
>>-	/* clear error bits */
>>-	out_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_DR, ~0);
>>+	/* clear all error bits */
>>+	out_be32(&reg->pex_err_dr, ~0);
>>
>> 	if (edac_pci_add_device(pci, pdata->edac_idx) > 0) {
>> 		debugf3("%s(): failed edac_pci_add_device()\n", __func__);
>>@@ -275,7 +380,7 @@ static int __devinit mpc85xx_pci_err_probe(struct platform_device *op)
>> 	if (edac_op_state == EDAC_OPSTATE_INT) {
>> 		pdata->irq = irq_of_parse_and_map(op->dev.of_node, 0);
>> 		res = devm_request_irq(&op->dev, pdata->irq,
>>-				       mpc85xx_pci_isr, IRQF_DISABLED,
>>+				       mpc85xx_pci_isr, IRQF_SHARED,
>> 				       "[EDAC] PCI err", pci);
>> 		if (res < 0) {
>> 			printk(KERN_ERR
>>@@ -290,6 +395,17 @@ static int __devinit mpc85xx_pci_err_probe(struct platform_device *op)
>> 		       pdata->irq);
>> 	}
>>
>>+	if (pdata->pcie_flag) {
>>+		/* enable all pcie error interrupt & error detect */
>>+		out_be32(&reg->pex_err_en, ~0);
>>+		out_be32(&reg->pex_err_disr, 0);
>>+	} else {
>>+		/* PCI master abort is expected during config cycles */
>>+		out_be32(&reg->pex_err_cap_dr, PCI_ERR_CAP_DR_DIS_MST);
>>+		/* disable master abort reporting */
>>+		out_be32(&reg->pex_err_en, PCI_ERR_EN_DIS_MST);
>>+	}
>>+
>> 	devres_remove_group(&op->dev, mpc85xx_pci_err_probe);
>> 	debugf3("%s(): success\n", __func__);
>> 	printk(KERN_INFO EDAC_MOD_STR " PCI err registered\n");
>>@@ -311,10 +427,13 @@ static int mpc85xx_pci_err_remove(struct platform_device *op)
>>
>> 	debugf0("%s()\n", __func__);
>>
>>-	out_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_CAP_DR,
>>-		 orig_pci_err_cap_dr);
>>+	if (pdata->pcie_flag)
>>+		out_be32(&pdata->pci_reg->pex_err_disr, pdata->orig_pci_err_dr);
>>+	else
>>+		out_be32(&pdata->pci_reg->pex_err_cap_dr,
>>+					pdata->orig_pci_err_dr);
>>
>>-	out_be32(pdata->pci_vbase + MPC85XX_PCI_ERR_EN, orig_pci_err_en);
>>+	out_be32(&pdata->pci_reg->pex_err_en, pdata->orig_pci_err_en);
>>
>> 	edac_pci_del_device(pci->dev);
>>
>>@@ -333,6 +452,12 @@ static struct of_device_id mpc85xx_pci_err_of_match[] = {
>> 	{
>> 	 .compatible = "fsl,mpc8540-pci",
>> 	},
>>+	{
>>+	 .compatible = "fsl,mpc8548-pcie",
>>+	},
>>+	{
>>+	 .compatible = "fsl,p4080-pcie",
>>+	},
>> 	{},
>> };
>> MODULE_DEVICE_TABLE(of, mpc85xx_pci_err_of_match);
>>diff --git a/drivers/edac/mpc85xx_edac.h b/drivers/edac/mpc85xx_edac.h
>>index 932016f..d0e7b11 100644
>>--- a/drivers/edac/mpc85xx_edac.h
>>+++ b/drivers/edac/mpc85xx_edac.h
>>@@ -131,16 +131,8 @@
>> #define PCI_EDE_PERR_MASK	(PCI_EDE_TGT_PERR | PCI_EDE_MST_PERR | \
>> 				PCI_EDE_ADDR_PERR)
>>
>>-#define MPC85XX_PCI_ERR_DR		0x0000
>>-#define MPC85XX_PCI_ERR_CAP_DR		0x0004
>>-#define MPC85XX_PCI_ERR_EN		0x0008
>>-#define MPC85XX_PCI_ERR_ATTRIB		0x000c
>>-#define MPC85XX_PCI_ERR_ADDR		0x0010
>>-#define MPC85XX_PCI_ERR_EXT_ADDR	0x0014
>>-#define MPC85XX_PCI_ERR_DL		0x0018
>>-#define MPC85XX_PCI_ERR_DH		0x001c
>>-#define MPC85XX_PCI_GAS_TIMR		0x0020
>>-#define MPC85XX_PCI_PCIX_TIMR		0x0024
>>+#define PCI_ERR_CAP_DR_DIS_MST         0x40
>>+#define PCI_ERR_EN_DIS_MST             (~0x40)
>>
>> struct mpc85xx_mc_pdata {
>> 	char *name;
>>@@ -159,8 +151,11 @@ struct mpc85xx_l2_pdata {
>> struct mpc85xx_pci_pdata {
>> 	char *name;
>> 	int edac_idx;
>>-	void __iomem *pci_vbase;
>> 	int irq;
>>+	struct ccsr_pci *pci_reg;
>>+	u8 pcie_flag;
>>+	u32 orig_pci_err_dr;
>>+	u32 orig_pci_err_en;
>> };
>>
>> #endif
>>--
>>1.6.4
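
A note for readers following the EIMR0 handling in the quoted patch: the
expression "reg_tmp & ~(1 << (31 - *(int_cell + 3)))" clears a single mask
bit using big-endian bit numbering, where error interrupt 0 is the most
significant bit. The following is only a standalone sketch of that
arithmetic. The helper name eimr0_unmask() is hypothetical and does not
appear in the patch, and no MMIO access is modelled. It uses an unsigned
constant so that shifting by 31 stays well defined.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: return the EIMR0 value
 * with the mask bit for error interrupt 'err_int' cleared (unmasked).
 * Bits are numbered big-endian style, so error interrupt 0 is the MSB.
 * Assumes err_int is in the range 0..31.
 */
uint32_t eimr0_unmask(uint32_t eimr0, unsigned int err_int)
{
	return eimr0 & ~(UINT32_C(1) << (31 - err_int));
}
```

In the driver the result would be written back with out_be32(); here the
function only computes the new register value.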

^ permalink raw reply

* RE: [PATCH 3/4] powerpc/85xx: Merge PCI/PCI Express error management registers
From: Xie Shaohui-B21989 @ 2011-08-03  9:57 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org; +Cc: Jiang Kai-B18973, Xie Shaohui-B21989
In-Reply-To: <1311244195-4418-1-git-send-email-Shaohui.Xie@freescale.com>

Hi all,

Any concerns about this patch?


Best Regards,
Shaohui Xie


>-----Original Message-----
>From: Xie Shaohui-B21989
>Sent: Tuesday, July 26, 2011 2:50 PM
>To: Kumar Gala
>Cc: Jiang Kai-B18973; linuxppc-dev@lists.ozlabs.org
>Subject: RE: [PATCH 3/4] powerpc/85xx: Merge PCI/PCI Express error
>management registers
>
>
>
>>-----Original Message-----
>>From: Xie Shaohui-B21989
>>Sent: Thursday, July 21, 2011 6:30 PM
>>To: linuxppc-dev@lists.ozlabs.org
>>Cc: Gala Kumar-B11780; Jiang Kai-B18973; Kumar Gala; Xie Shaohui-B21989
>>Subject: [PATCH 3/4] powerpc/85xx: Merge PCI/PCI Express error
>>management registers
>>
>>From: Kai.Jiang <Kai.Jiang@freescale.com>
>>
>>Some PCI and PCIe error management registers differ in offset and
>>definition, while the other PCI/PCIe error management registers are
>>nearly the same.
>>
>>Signed-off-by: Kai.Jiang <Kai.Jiang@freescale.com>
>>Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
>>Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
>>---
>> arch/powerpc/sysdev/fsl_pci.h |   31 +++++++++++++++++++++++++------
>> 1 files changed, 25 insertions(+), 6 deletions(-)
>>
>>diff --git a/arch/powerpc/sysdev/fsl_pci.h b/arch/powerpc/sysdev/fsl_pci.h
>>index a39ed5c..60a76e9 100644
>>--- a/arch/powerpc/sysdev/fsl_pci.h
>>+++ b/arch/powerpc/sysdev/fsl_pci.h
>>@@ -74,13 +74,32 @@ struct ccsr_pci {
>>  */
>> 	struct pci_inbound_window_regs piw[4];
>>
>>+/* Merge PCI/PCI Express error management registers */
>> 	__be32	pex_err_dr;		/* 0x.e00 - PCI/PCIE error detect register */
>>-	u8	res21[4];
>>-	__be32	pex_err_en;		/* 0x.e08 - PCI/PCIE error interrupt enable register */
>>-	u8	res22[4];
>>-	__be32	pex_err_disr;		/* 0x.e10 - PCI/PCIE error disable register */
>>-	u8	res23[12];
>>-	__be32	pex_err_cap_stat;	/* 0x.e20 - PCI/PCIE error capture status register */
>>+	__be32	pex_err_cap_dr;		/* 0x.e04 */
>>+					/* - PCI error capture disabled register */
>>+					/* - PCIE has no this register */
>>+	__be32	pex_err_en;		/* 0x.e08 */
>>+					/* - PCI/PCIE error interrupt enable register*/
>>+	__be32	pex_err_attrib;		/* 0x.e0c */
>>+					/* - PCI error attributes capture register */
>>+					/* - PCIE has no this register */
>>+	__be32	pex_err_disr;		/* 0x.e10 */
>>+					/* - PCI error address capture register */
>>+					/* - PCIE error disable register */
>>+	__be32	pex_err_ext_addr;	/* 0x.e14 */
>>+					/* - PCI error extended addr capture register*/
>>+					/* - PCIE has no this register */
>>+	__be32	pex_err_dl;		/* 0x.e18 */
>>+					/* - PCI error data low capture register */
>>+					/* - PCIE has no this register */
>>+	__be32	pex_err_dh;		/* 0x.e1c */
>>+					/* - PCI error data high capture register */
>>+					/* - PCIE has no this register */
>>+	__be32	pex_err_cap_stat;	/* 0x.e20 */
>>+					/* - PCI gasket timer register */
>>+					/* - PCIE error capture status register */
>>+
>> 	u8	res24[4];
>> 	__be32	pex_err_cap_r0;		/* 0x.e28 - PCIE error capture register 0 */
>> 	__be32	pex_err_cap_r1;		/* 0x.e2c - PCIE error capture register 0 */
>>--
>>1.6.4
>[Xie Shaohui] I've verified this patch can apply for galak/powerpc.git
>'next' branch with no change.
>
>
>Best Regards,
>Shaohui Xie
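
For reference, the merged layout in the quoted patch replaces the reserved
gaps with named registers, so the block from 0x.e00 to 0x.e20 becomes one
contiguous run of 32-bit fields. The sketch below models only that block,
with offsets relative to 0xe00, and checks the layout at compile time. The
struct name err_mgmt_regs is an assumption made for illustration, and
uint32_t stands in for the kernel's __be32.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative model of the merged error-management block only.
 * Offsets in the comments are relative to 0xe00 in struct ccsr_pci.
 */
struct err_mgmt_regs {
	uint32_t pex_err_dr;        /* 0x00 - PCI/PCIE error detect */
	uint32_t pex_err_cap_dr;    /* 0x04 - PCI only */
	uint32_t pex_err_en;        /* 0x08 - PCI/PCIE interrupt enable */
	uint32_t pex_err_attrib;    /* 0x0c - PCI only */
	uint32_t pex_err_disr;      /* 0x10 - PCI addr capture / PCIE disable */
	uint32_t pex_err_ext_addr;  /* 0x14 - PCI only */
	uint32_t pex_err_dl;        /* 0x18 - PCI only */
	uint32_t pex_err_dh;        /* 0x1c - PCI only */
	uint32_t pex_err_cap_stat;  /* 0x20 - PCI gasket timer / PCIE cap stat */
	uint8_t  res24[4];          /* 0x24 - reserved */
	uint32_t pex_err_cap_r0;    /* 0x28 - PCIE error capture register 0 */
};

/* C11 compile-time checks that the named fields land where expected. */
_Static_assert(offsetof(struct err_mgmt_regs, pex_err_en) == 0x08,
	       "pex_err_en must stay at 0x.e08");
_Static_assert(offsetof(struct err_mgmt_regs, pex_err_cap_stat) == 0x20,
	       "pex_err_cap_stat must stay at 0x.e20");
```

The registers that existed before the merge (pex_err_en, pex_err_disr,
pex_err_cap_stat) keep their offsets, which is what lets the PCI and PCIe
code paths share one structure.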

^ permalink raw reply

* [PATCH] mtd-utils: fix corrupt cleanmarker with flash_erase -j command
From: b35362 @ 2011-08-03  5:50 UTC (permalink / raw)
  To: dwmw2, linuxppc-dev; +Cc: Liu Shuo, linuxppc-dev, linux-mtd

From: Liu Shuo <b35362@freescale.com>

flash_erase -j should fill the discrete freeoob areas with the bytes of
the JFFS2 cleanmarker required by jffs2_check_nand_cleanmarker(), not
just the first freeoob area.

Signed-off-by: Liu Shuo <b35362@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
---
 flash_erase.c |   41 +++++++++++++++++++++++++++++++++++------
 1 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/flash_erase.c b/flash_erase.c
index fe2eaca..e6747fc 100644
--- a/flash_erase.c
+++ b/flash_erase.c
@@ -98,6 +98,7 @@ int main(int argc, char *argv[])
 	int isNAND;
 	int error = 0;
 	uint64_t offset = 0;
+	void *oob_data = NULL;
 
 	/*
 	 * Process user arguments
@@ -197,15 +198,40 @@ int main(int argc, char *argv[])
 			if (ioctl(fd, MEMGETOOBSEL, &oobinfo) != 0)
 				return sys_errmsg("%s: unable to get NAND oobinfo", mtd_device);
 
+			cleanmarker.totlen = cpu_to_je32(8);
 			/* Check for autoplacement */
 			if (oobinfo.useecc == MTD_NANDECC_AUTOPLACE) {
+				struct nand_ecclayout_user ecclayout;
 				/* Get the position of the free bytes */
-				if (!oobinfo.oobfree[0][1])
+				if (ioctl(fd, ECCGETLAYOUT, &ecclayout) != 0)
+					return sys_errmsg("%s: unable to get NAND ecclayout", mtd_device);
+
+				if (!ecclayout.oobavail)
 					return errmsg(" Eeep. Autoplacement selected and no empty space in oob");
 				clmpos = oobinfo.oobfree[0][0];
-				clmlen = oobinfo.oobfree[0][1];
-				if (clmlen > 8)
-					clmlen = 8;
+				clmlen = MIN(ecclayout.oobavail, 8);
+
+				if (oobinfo.oobfree[0][1] < 8 && ecclayout.oobavail >= 8) {
+					int i, left, n, last = 0;
+					void *cm;
+
+					oob_data = malloc(mtd.oob_size);
+					if (!oob_data)
+						return -ENOMEM;
+
+					memset(oob_data, 0xff, mtd.oob_size);
+					cm = &cleanmarker;
+					for (i = 0, left = clmlen; left ; i++) {
+						n = MIN(left, oobinfo.oobfree[i][1]);
+						memcpy(oob_data + oobinfo.oobfree[i][0],
+								cm, n);
+						left -= n;
+						cm   += n;
+						last = oobinfo.oobfree[i][0] + n;
+					}
+
+					clmlen = last - clmpos;
+				}
 			} else {
 				/* Legacy mode */
 				switch (mtd.oob_size) {
@@ -223,7 +249,6 @@ int main(int argc, char *argv[])
 						break;
 				}
 			}
-			cleanmarker.totlen = cpu_to_je32(8);
 		}
 		cleanmarker.hdr_crc = cpu_to_je32(mtd_crc32(0, &cleanmarker, sizeof(cleanmarker) - 4));
 	}
@@ -272,7 +297,8 @@ int main(int argc, char *argv[])
 
 		/* write cleanmarker */
 		if (isNAND) {
-			if (mtd_write_oob(mtd_desc, &mtd, fd, offset + clmpos, clmlen, &cleanmarker) != 0) {
+			void *data = oob_data ? oob_data + clmpos : &cleanmarker;
+			if (mtd_write_oob(mtd_desc, &mtd, fd, offset + clmpos, clmlen, data) != 0) {
 				sys_errmsg("%s: MTD writeoob failure", mtd_device);
 				continue;
 			}
@@ -291,5 +317,8 @@ int main(int argc, char *argv[])
 	show_progress(&mtd, offset, eb, eb_start, eb_cnt);
 	bareverbose(!quiet, "\n");
 
+	if (oob_data)
+		free(oob_data);
+
 	return 0;
 }
-- 
1.7.1
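
The copy loop in the patch above can be read in isolation as "spread the
cleanmarker bytes across the free OOB areas in order". Below is a minimal
standalone sketch of that idea, not the code from the patch. The function
name scatter_cleanmarker() and its parameters are hypothetical, and it
adds bounds checks that the patch does not have.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy 'len' bytes of 'cm' into the free OOB areas
 * described by 'oobfree' (pairs of {offset, length}), in order.
 * Returns the offset just past the last byte written, or -1 if the
 * free areas cannot hold 'len' bytes or fall outside the buffer.
 */
long scatter_cleanmarker(uint8_t *oob, size_t oob_size,
			 const uint32_t (*oobfree)[2], size_t nfree,
			 const uint8_t *cm, size_t len)
{
	size_t left = len;
	long last = 0;

	memset(oob, 0xff, oob_size);	/* untouched OOB bytes stay erased */

	for (size_t i = 0; left > 0; i++) {
		if (i >= nfree)
			return -1;	/* ran out of free areas */

		size_t pos = oobfree[i][0];
		size_t n = oobfree[i][1] < left ? oobfree[i][1] : left;

		if (pos > oob_size || n > oob_size - pos)
			return -1;	/* area does not fit in the buffer */

		memcpy(oob + pos, cm, n);
		cm += n;
		left -= n;
		last = (long)(pos + n);
	}
	return last;
}
```

With free areas {2, 3} and {8, 5} and an 8-byte cleanmarker, the first 3
bytes land at offsets 2..4 and the remaining 5 at offsets 8..12, so the
function returns 13. In the patch, the equivalent of this return value
minus clmpos becomes the new clmlen.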

^ permalink raw reply related

* Re: Fwd: MPC7410 Linux Kernel
From: tiejun.chen @ 2011-08-03  6:42 UTC (permalink / raw)
  To: Vineeth; +Cc: linuxppc-dev
In-Reply-To: <CAFbQSaDhEpSqX4GQHjSTsmpwv3a36RE7WrXbn7ni=NVcTtVyiw@mail.gmail.com>

Vineeth wrote:
> Thanks for the reply.
> 
> We were referring kuroboxHG.dts which uses Sandpoint architecture; which is
> almost same as ours.
> 
> 1. one doubt in kuroboxHG is the ranges property in SOC node says EUMB is at
> 0xFC00_0000; and as per the datasheet of mpc107, the open pic address will
	
On powerpc it looks like u-boot should initialize the MPC107, and EUMB_ADDR
should be configured there as well; the kernel doesn't reconfigure the EUMB
again. This is different from the original ppc implementation.

So I think you should check what u-boot does. Note that not all targets use
0xfc000000 as EUMB_ADDR. At least for linkstation, 0x80000000 is set as
EUMB_ADDR. So maybe this value, 0xFC000000, is a typo, since the kernel
doesn't use this node property, as I said previously.

> be EUMB_BASE + 0x40000; but in kurobox its given as 0x80040000;
> 
> 2. We know that our UART is mapped at address 0xDB00_0100; which is
> connected in a PCI-LOCAL bridge whose base is at 0xDB00_0000
> How can i represent these things in dts ? Can the RANGES property of PCI
> node can mention this ?

How does the u-boot on your target set EUMB_ADDR? You can then adapt
kuroboxHG.dts to use that base address.

Tiejun

> 
> On Tue, Aug 2, 2011 at 1:15 PM, tiejun.chen <tiejun.chen@windriver.com> wrote:
> 
>> Vineeth wrote:
>>> Hi,
>>>
>>> We are trying to port  linux 2.6.38 on MPC7410 based board (This is a
>>> preparatory design by our customer)
>>>
>>> System architecture is as follows,
>>>
>>> MPC7410 <=> MPC107 <=> PCI_to_LOCAL(plx9052) <=> UART
>> MPCXXX should be compatible with TSIXXX. So you can refer to mpc7448_hpc2.
>>
>>> Previously we were using ppc architecture and we had some issues with
>>> page_init() functions; which may be because of our configuration.As we
>> didnt
>>> get much support on ppc architecture we moved to powerpc.
>>>
>>> Now we moved to powerpc architecture. We have some doubts on writing the
>> dts
>>> file. Please find the dts file attached.
>>>
>>> when we checked the legacy_serial.c file, we found that
>>> legacy_serial_parents not expecting a pci-local or a pci bridge as
>> parent.
>>> is our understanding correct ? should we introduce a new pci parent in
>> that
>>> structure ?
>> So you can understand this after refer to the file,
>> arch/powerpc/boot/dts/mpc7448hpc2.dts.
>>
>> Tiejun
>>
>>>  We are confused about writing the ranges property of PCI node.we were
>>> referring booting_without_of doc but didnt get much info. Is there any
>> file
>>> which gives better idea about the ranges property ?
>>>
>>> Thanks
>>>  Vineeth
>>
> 

^ permalink raw reply

* RE: [PATCH v2] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Li Yang-R58472 @ 2011-08-03  6:15 UTC (permalink / raw)
  To: Liu Shuo-B35362, dwmw2@infradead.org, dedekind1@gmail.com
  Cc: linuxppc-dev@ozlabs.org, linux-mtd@lists.infradead.org
In-Reply-To: <1310446122-18050-1-git-send-email-b35362@freescale.com>

>-----Original Message-----
>From: Liu Shuo-B35362
>Sent: Tuesday, July 12, 2011 12:49 PM
>To: dwmw2@infradead.org
>Cc: linux-mtd@lists.infradead.org; linuxppc-dev@ozlabs.org; Liu Shuo-B35362;
>Li Yang-R58472
>Subject: [PATCH v2] mtd/nand : workaround for Freescale FCM to support
>large-page Nand chip
>
>From: Liu Shuo <b35362@freescale.com>
>
>The Freescale FCM controller has a 2K size limitation on its buffer RAM. To
>support NAND flash chips whose page size is larger than 2K bytes, we
>divide a page into multiple 2K pages for the MTD layer driver, forcing the
>page size to 2K bytes. In the fsl_elbc driver, we convert the MTD layer's
>page address to a real page address in the flash chip plus a column index.
>We can issue any column address with the UA instruction of the elbc
>controller.
>
>Signed-off-by: Liu Shuo <b35362@freescale.com>
>Signed-off-by: Li Yang <leoli@freescale.com>
>---

Hi David and Artem,

We have fixed the multi-line comment style problem.  Could you help to pick
the patch?

- Leo
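
The address conversion described in the quoted commit message (a 2K "MTD
page" mapped onto a real chip page plus a column index) can be sketched as
plain arithmetic. This is only an illustration under assumptions: the names
fcm_addr and fcm_map_page() are hypothetical, the chip page size is assumed
to be a multiple of 2048, and OOB placement, which the real driver has to
handle, is ignored entirely.

```c
#include <stdint.h>

#define FCM_BUF_SIZE 2048u	/* FCM buffer RAM limit from the commit message */

/* Hypothetical result type: real chip page plus byte column within it. */
struct fcm_addr {
	uint32_t real_page;
	uint32_t column;
};

/*
 * Map a 2K page index as seen by the MTD layer onto the real chip page
 * and the column of the 2K slice inside that page.
 * Assumes chip_page_size is a non-zero multiple of FCM_BUF_SIZE.
 */
struct fcm_addr fcm_map_page(uint32_t mtd_page, uint32_t chip_page_size)
{
	uint32_t slices = chip_page_size / FCM_BUF_SIZE;
	struct fcm_addr a;

	a.real_page = mtd_page / slices;
	a.column = (mtd_page % slices) * FCM_BUF_SIZE;
	return a;
}
```

For a 4K-page chip, MTD page 5 maps to real page 2 at column 2048; for a
chip with 2K pages the mapping is the identity with column 0.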

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Alex Williamson @ 2011-08-03  3:44 UTC (permalink / raw)
  To: David Gibson
  Cc: chrisw, Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, aafabbri, iommu,
	Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <20110803020422.GF29719@yookeroo.fritz.box>

On Wed, 2011-08-03 at 12:04 +1000, David Gibson wrote:
> On Tue, Aug 02, 2011 at 12:35:19PM -0600, Alex Williamson wrote:
> > On Tue, 2011-08-02 at 12:14 -0600, Alex Williamson wrote:
> > > On Tue, 2011-08-02 at 18:28 +1000, David Gibson wrote:
> > > > On Sat, Jul 30, 2011 at 12:20:08PM -0600, Alex Williamson wrote:
> > > > > On Sat, 2011-07-30 at 09:58 +1000, Benjamin Herrenschmidt wrote:
> > > > [snip]
> > > > > On x86, the USB controllers don't typically live behind a PCIe-to-PCI
> > > > > bridge, so don't suffer the source identifier problem, but they do often
> > > > > share an interrupt.  But even then, we can count on most modern devices
> > > > > supporting PCI2.3, and thus the DisINTx feature, which allows us to
> > > > > share interrupts.  In any case, yes, it's more rare but we need to know
> > > > > how to handle devices behind PCI bridges.  However I disagree that we
> > > > > need to assign all the devices behind such a bridge to the guest.
> > > > > There's a difference between removing the device from the host and
> > > > > exposing the device to the guest.
> > > > 
> > > > I think you're arguing only over details of what words to use for
> > > > what, rather than anything of substance here.  The point is that an
> > > > entire partitionable group must be assigned to "host" (in which case
> > > > kernel drivers may bind to it) or to a particular guest partition (or
> > > > at least to a single UID on the host).  Which of the assigned devices
> > > > the partition actually uses is another matter of course, as is at
> > > > exactly which level they become "de-exposed" if you don't want to use
> > > > all of them.
> > > 
> > > Well first we need to define what a partitionable group is, whether it's
> > > based on hardware requirements or user policy.  And while I agree that
> > > we need unique ownership of a partition, I disagree that qemu is
> > > necessarily the owner of the entire partition vs individual devices.
> > 
> > Sorry, I didn't intend to have such circular logic.  "... I disagree
> > that qemu is necessarily the owner of the entire partition vs granted
> > access to devices within the partition".  Thanks,
> 
> I still don't understand the distinction you're making.  We're saying
> the group is "owned" by a given user or guest in the sense that no-one
> else may use anything in the group (including host drivers).  At that
> point none, some or all of the devices in the group may actually be
> used by the guest.
> 
> You seem to be making a distinction between "owned by" and "assigned
> to" and "used by" and I really don't see what it is.

How does a qemu instance that uses none of the devices in a group still
own that group?  Aren't we at that point free to move the group to a
different qemu instance or return ownership to the host?  Who does that?
In my mental model, there's an intermediary that "owns" the group and
just as kernel drivers bind to devices when the host owns the group,
qemu is a userspace device driver that binds to sets of devices when the
intermediary owns it.  Obviously I'm thinking libvirt, but it doesn't
have to be.  Thanks,

Alex

^ permalink raw reply

* Re: [PATCH 3/3] KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code
From: Paul Mackerras @ 2011-08-03  3:31 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm-ppc
In-Reply-To: <4E380DEC.8030803@suse.de>

On Tue, Aug 02, 2011 at 04:47:08PM +0200, Alexander Graf wrote:

> >  int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
> >  {
> >-	if (irq->irq == KVM_INTERRUPT_UNSET)
> >+	if (irq->irq == KVM_INTERRUPT_UNSET) {
> >  		kvmppc_core_dequeue_external(vcpu, irq);
> >-	else
> >-		kvmppc_core_queue_external(vcpu, irq);
> >+		return 0;
> >+	}
> 
> Not sure I understand this part. Mind to explain?

It's a micro-optimization - we don't really need to wake up or
interrupt the vcpu thread when we're clearing the interrupt.
Unless of course I'm missing something... :)
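
To make the control flow of that micro-optimization concrete, here is a
small mock in plain C. It is a sketch only: struct mock_vcpu,
MOCK_IRQ_UNSET and mock_vcpu_interrupt() are invented stand-ins, not KVM
types or APIs, and the "wakeup" is just a counter.

```c
#include <stdbool.h>

#define MOCK_IRQ_UNSET (-2)	/* placeholder standing in for KVM_INTERRUPT_UNSET */

/* Invented stand-in for the vcpu state relevant to this discussion. */
struct mock_vcpu {
	bool irq_pending;	/* external interrupt queued */
	int wakeups;		/* how many times we woke or kicked the vcpu */
};

int mock_vcpu_interrupt(struct mock_vcpu *vcpu, int irq)
{
	if (irq == MOCK_IRQ_UNSET) {
		vcpu->irq_pending = false;	/* dequeue only */
		return 0;			/* early return: no wakeup needed */
	}

	vcpu->irq_pending = true;	/* queue the interrupt */
	vcpu->wakeups++;		/* wake or reschedule the vcpu thread */
	return 0;
}
```

Setting an interrupt bumps the wakeup counter, while clearing it returns
early and leaves the counter untouched, which is the behaviour the early
return in the patch is after.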

> 
> Alex
> 
> >+
> >+	kvmppc_core_queue_external(vcpu, irq);
> >
> >-	if (waitqueue_active(&vcpu->wq)) {
> >-		wake_up_interruptible(&vcpu->wq);
> >+	if (waitqueue_active(vcpu->arch.wqp)) {
> >+		wake_up_interruptible(vcpu->arch.wqp);
> >  		vcpu->stat.halt_wakeup++;
> >  	} else if (vcpu->cpu != -1) {
> >  		smp_send_reschedule(vcpu->cpu);

Paul.

^ permalink raw reply
