Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 03/11] drivers: soc: hisi: Add support for Hisilicon Djtag driver
From: Arnd Bergmann @ 2016-11-09 21:40 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <5822A5F6.9040806@gmail.com>

On Wednesday, November 9, 2016 9:58:38 AM CET Anurup M wrote:
> 
> > I also see that the compatible strings have the version included in
> > them, and you can probably drop them by requiring them only in the
> > fallback:
> >
> >       compatible = "hisilicon,hip05-cpu-djtag", "hisilicon,djtag-v1";
> >       compatible = "hisilicon,hip05-io-djtag", "hisilicon,djtag-v1";
> >       compatible = "hisilicon,hip06-cpu-djtag", "hisilicon,djtag-v1";
> >       compatible = "hisilicon,hip06-io-djtag", "hisilicon,djtag-v2";
> >       compatible = "hisilicon,hip07-cpu-djtag", "hisilicon,djtag-v2";
> >       compatible = "hisilicon,hip07-io-djtag", "hisilicon,djtag-v2";
> >
> > We want to have the first entry be as specific as possible, but
> > the last (second) entry is the one that can be used by the driver
> > for matching. When a future hip08/hip09/... chip uses an existing
> > interface, you then don't have to update the driver.
> Thanks. I had a similar thought on this. So as I have the version string 
> in the
> second entry "-v(1/2)".
> I can use it in driver for matching. So i think  I will change it as below.
> Please correct me if my understanding is wrong.
> 
>   static const struct of_device_id djtag_of_match[] = {
> -       /* for hip05(D02) cpu die */
> -       { .compatible = "hisilicon,hip05-cpu-djtag-v1",
> +       /* for hisi djtag-v1 cpu die */
> +       { .compatible = "hisilicon,hisi-cpu-djtag-v1",
>                  .data = djtag_readwrite_v1 },
> -       /* for hip05(D02) io die */
> -       { .compatible = "hisilicon,hip05-io-djtag-v1",
> +       /* for hisi djtag-v1 io die */
> +       { .compatible = "hisilicon,hisi-io-djtag-v1",

>From the code it looks like "hisilicon,hisi-io-djtag-v1" and
"hisilicon,hisi-cpu-djtag-v1" have the same register-level interface,
so we just need one compatible string for them to match the driver.

	Arnd

^ permalink raw reply

* [PATCH V5 2/3] ARM64 LPC: Add missing range exception for special ISA
From: Arnd Bergmann @ 2016-11-09 21:38 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161109135453.2e5402bd@lxorguk.ukuu.org.uk>

On Wednesday, November 9, 2016 1:54:53 PM CET One Thousand Gnomes wrote:
> > I think it is a relatively safe assumption that there is only one
> > ISA bridge. A lot of old drivers hardcode PIO or memory addresses
> 
> It's not a safe assumption for x86 at least. There are a few systems with
> multiple ISA busses particularly older laptops with a docking station.

But do they have multiple ISA domains? There is no real harm in supporting
it, the (small) downsides I can think of are:

- a few extra cycles for the lookup, from possibly walking a linked list
  to find the correct set of helpers and MMIO addresses
- making it too general could invite more people to design hardware
  around the infrastructure when we really want them to stop adding
  stuff like this.

	Arnd

^ permalink raw reply

* [PATCH v2 0/8] Support TPS65217 PMIC interrupt in DT
From: Tony Lindgren @ 2016-11-09 21:38 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161028123702.21849-1-woogyom.kim@gmail.com>

* Milo Kim <woogyom.kim@gmail.com> [161028 05:38]:
> TPS65217 interrupt events include push button pressed/released, USB and AC 
> voltage status change. AM335x bone based boards (like BB, BBB, BBG) have 
> common PMIC interrupt pin (named NMI) of AM335x core.
> 
> This patchset support interrupts in device tree file.

Applying into omap-for-v4.10/dt except the last patch I'll apply into
omap-for-v4.10/fixes-not-urgent.

Regards,

Tony

^ permalink raw reply

* [PATCH V5 3/3] ARM64 LPC: LPC driver implementation on Hip06
From: Arnd Bergmann @ 2016-11-09 21:34 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <EE11001F9E5DDD47B7634E2F8A612F2E1F8F1A7A@lhreml507-mbx>

On Wednesday, November 9, 2016 12:10:43 PM CET Gabriele Paoloni wrote:
> > On Tuesday, November 8, 2016 11:47:09 AM CET zhichang.yuan wrote:
> > > +       /*
> > > +        * The first PCIBIOS_MIN_IO is reserved specifically for
> > indirectIO.
> > > +        * It will separate indirectIO range from pci host bridge to
> > > +        * avoid the possible PIO conflict.
> > > +        * Set the indirectIO range directly here.
> > > +        */
> > > +       lpcdev->io_ops.start = 0;
> > > +       lpcdev->io_ops.end = PCIBIOS_MIN_IO - 1;
> > > +       lpcdev->io_ops.devpara = lpcdev;
> > > +       lpcdev->io_ops.pfin = hisilpc_comm_in;
> > > +       lpcdev->io_ops.pfout = hisilpc_comm_out;
> > > +       lpcdev->io_ops.pfins = hisilpc_comm_ins;
> > > +       lpcdev->io_ops.pfouts = hisilpc_comm_outs;
> > 
> > I have to look at patch 2 in more detail again, after missing a few
> > review
> > rounds. I'm still a bit skeptical about hardcoding a logical I/O port
> > range here, and would hope that we can just go through the same
> > assignment of logical port ranges that we have for PCI buses,
> > decoupling
> > the bus addresses from the linux-internal ones.
> 
> The point here is that we want to avoid any conflict/overlap between
> the LPC I/O space and the PCI I/O space. With the assignment above
> we make sure that LPC never interfere with PCI I/O space.

But we already abstract the PCI I/O space using dynamic registration.
There is no need to hardcode the logical address for ISA, though
I think we can hardcode the bus address to start at zero here.

	Arnd

^ permalink raw reply

* [PATCH V5 1/3] ARM64 LPC: Indirect ISA port IO introduced
From: Arnd Bergmann @ 2016-11-09 21:33 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <6dee9821-5145-2113-e391-6317e4533c06@huawei.com>

On Wednesday, November 9, 2016 11:29:46 AM CET John Garry wrote:
> On 08/11/2016 22:35, Arnd Bergmann wrote:
> > On Tuesday, November 8, 2016 4:49:49 PM CET Will Deacon wrote:
> >> On Tue, Nov 08, 2016 at 04:33:44PM +0000, John Garry wrote:
> >>> On 08/11/2016 16:12, Will Deacon wrote:
> >>>> On Tue, Nov 08, 2016 at 11:47:07AM +0800, zhichang.yuan wrote:
> >
> >>>> Is there no way to make this slightly more generic, so that it can be
> >>>> re-used elsewhere? For example, if struct extio_ops was common, then
> >>>> you could have the singleton (which maybe should be an interval tree?),
> >>>> type definition, setter function and the BUILD_EXTIO invocations
> >>>> somewhere generic, rather than squirelled away in the arch backend.
> >>>>
> >>> The concern would be that some architecture which uses generic higher-level
> >>> ISA accessor ops, but have IO space, could be affected.
> >>
> >> You're already adding a Kconfig symbol for this stuff, so you can keep
> >> that if you don't want it on other architectures. I'm just arguing that
> >> plumbing drivers directly into arch code via arm64_set_extops is not
> >> something I'm particularly fond of, especially when it looks like it
> >> could be avoided with a small amount of effort.
> >
> > Agreed, I initially suggested putting this into arch/arm64/, but there isn't
> > really a reason why it couldn't just live in lib/ with the header file
> > bits moved to include/asm-generic/io.h which we already use.
> >
> 
> Right, Zhichang will check the logistics of this. The generic io.h is 
> quite clean, so as long as you don't mind new build switches of this 
> nature being added, it should be ok; and we'll plan on moving extio.h 
> into include/asm-generic as well.

I think all we need is an #ifdef CONFIG_something around the existing
defintion, with the alternative being "extern" declarations, after that
all the interesting logic can sit in a file in lib/.

	Arnd

^ permalink raw reply

* [alsa-devel] [PATCH 2/9] ALSA: ac97: add an ac97 bus
From: Robert Jarzmik @ 2016-11-09 21:27 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <5ae15e01-eaaf-4996-27ec-1179041c0268@metafoo.de>

Lars-Peter Clausen <lars@metafoo.de> writes:

> On 11/08/2016 10:18 PM, Robert Jarzmik wrote:
>>> I'd make the controller itself a struct dev, rather than just having the
>>> pointer to the parent. This is more idiomatic and matches what other
>>> subsystems do. It has several advantages, you get proper refcounting of your
>>> controller structure, the controller gets its own sysfs directory where the
>>> CODECs appear as children, you don't need the manual sysfs attribute
>>> creation and removal in ac97_controler_{un,}register().
>> 
>> If you mean having "struct device dev" instead of "struct device *dev", it has
>> also a drawback : all the legacy platforms have already a probing method, be
>> that platform data, device-tree or something else.
>> 
>> I'm a bit converned about the conversion toll. Maybe I've not understood your
>> point fully, so please feel free to explain, with an actual example even better.
>
> This would be a struct device that is not bound to a driver, but just acts
> as a shell for the controller and places it inside the device hierarchy. You
> get reference counting and other management functions as well as a
> consistent naming scheme. E.g. you can call the devices ac97c%d (or
> something similar) and then call the CODEC ac97c%d.%d.
Let me think about it.
If I get you right, the device model would be, from a parenthood point of view:
             pxa2xx_ac97 (platform_device)
                  ^
                  |
             ac97_controller.dev (device, son of pxa2xx_ac97)
                 / \
                /   \
               /     \
              /       \
             /         \
         wm97xx-core  codec2     (sons ac97_controller.dev)

I have already the device hierarchy AFAIK, created in ac97_codec_add(). I'm
still failing to see the difference, as the refcounting is already used. The
only additional thing I see is the introduction of an intermediate
ac97_controller.dev which is refcounted.

In current model, pxa2xx_ac97 refcount is incremented for each codec device. In
the one you propose, pxa2xx_ac97 will be 1 refcounted (maybe 2 actually), and
ac97_controller.dev will be refcounted for each codec. This ac97_controller.dev
will be a spiritual twin of spi_master/i2c_adapter.

As I said, I must think about it and find which value is brought by this
additionnal layer.

As for the name change, I must check if this breaks the sound machine files, and
their dai_link structures (ex: mioa701_wm9713.c, structure mioa701_dai). In a
former state of this patchset I had changed the device names, ie. wm9713-codec
became pxa2xx-ac97.0, and the my machine file became broken.

> This is how most frameworks implementing some kind of control bus are
> structured in the Linux kernel. E.g. take a look at I2C or SPI.
I will.

>>>> +	ret = sysfs_create_link(&codec->dev.kobj, &ac97_ctrl->dev->kobj,
>>>> +				"ac97_controller");
>>>
>>> Since the CODEC is a child of the controller this should not be necessary as
>>> this just points one directory up. It's like `ln -s .. parent`
>> This creates :
>> /sys/bus/ac97/devices/pxa2xx-ac97\:0/ac97_controller
>> 
>> Of course as you pointed out, it's a 'ln -s .. parent' like link, but it seems
>> to me very unfriendly to have :
>>  - /sys/bus/ac97/devices/pxa2xx-ac97\:0/../ac97_operations
>>  - while /sys/bus/ac97/devices/ac97_operations doesn't exist (obviously)
>> 
>> Mmm, I don't know for this one, my mind is not set, it's a bit hard to tell if
>> it's only an unecessary link bringing "comfort", or something usefull.
>
> It is additional ABI and we do not have this for any other bus either (e.g.
> you don't see a backlink for a I2C or SPI device to the parent). In my
> opinion for the sake of keeping things consistent and simple we should not
> add this.
Fair enough.

>>>> +int snd_ac97_codec_driver_register(struct ac97_codec_driver *drv)
>>>> +{
>>>> +	int ret;
>>>> +
>>>> +	drv->driver.bus = &ac97_bus_type;
>>>> +
>>>> +	ret = driver_register(&drv->driver);
>>>> +	if (!ret)
>>>> +		ac97_rescan_all_controllers();
>>>
>>> Rescanning the bus when a new codec driver is registered should not be
>>> neccessary. The bus is scanned once when the controller is registered, this
>>> creates the device. The device driver core will take care of binding the
>>> device to the driver, if the driver is registered after thed evice.
>> That's because you suppose the initial scanning found all the ac97 codecs.
>> But that's an incomplete vision as a codec might be powered off at that time and
>> not seen by the first scanning, while the new scanning will discover it.
>
> But why would the device become suddenly visible when the driver is
> registered. This seems to be an as arbitrary point in time as any other.
Because in the meantime, a regulator or something else provided power to the
AC97 IP, or a clock, and it can answer to requests after that.

And another point if that the clock of controller might be provided by the AC97
codec, and the scan will only succeed once the codec is actually providing this
clock, which it might do only once the module_init() is done.

> Also consider that when you build a driver as a module, the module will
> typically only be auto-loaded after the device has been detected, based on
> the device ID. On the other hand, if the driver is built-in driver
> registration will happen either before or shortly after the controller
> registration.
> If there is the expectation that the AC97 CODEC might randomly appear on the
> bus, the core should periodically scan the bus.
Power wise a periodical scan doesn't look good, at all.

More globally, I don't see if there is an actual issue we're trying to address
here, ie. that the rescan is a bug, or if it's more an "Occam's razor"
discussion ?

>>> In my opinion this should return a handle to a ac97 controller which can
>>> then be passed to snd_ac97_controller_unregister(). This is in my opinion
>>> the better approach rather than looking up the controller by parent device.
>> This is another "legacy drivers" issue.
>> 
>> The legacy driver (the one probed by platform_data or devicetree) doesn't
>> necessarily have a private structure to store this ac97_controller pointer.
>
> I might be missing something, but I'm not convinced by this. Can you point
> me to such a legacy driver where you think this would not work?
The first one that popped out:
 - hac_soc_platform_probe()

Cheers.

-- 
Robert

^ permalink raw reply

* [PATCH v2 0/7] soc: renesas: Identify SoC and register with the SoC bus
From: Arnd Bergmann @ 2016-11-09 21:12 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <CAMuHMdVrdjqWruY=dBjWP=jZuZkc=aTBTMijOyTDfZFcaD_jsA@mail.gmail.com>

On Wednesday, November 9, 2016 6:19:06 PM CET Geert Uytterhoeven wrote:
> On Wed, Nov 9, 2016 at 5:56 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Wednesday, November 9, 2016 2:34:33 PM CET Geert Uytterhoeven wrote:
> >> > And Samsung.
> >> > Shall I create the immutable branch now?
> >>
> >> Arnd: are you happy with the new patches and changes?
> >
> > I still had some comments for patch 7, but that shouldn't stop
> > you from creating a branch for the first three so everyone can
> > build on top of that.
> 
> Thanks!
> 
> What about patch [4/7]?
> Haven't you received it? Your address was in the To-line for all 7 patches.

Ok, I see it now, looks good. That should be included as well then.

	Arnd

^ permalink raw reply

* [PATCH] staging: vc04_services: rework ioctl code path
From: Arnd Bergmann @ 2016-11-09 21:07 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161109203727.28333-1-mzoran@crowfest.net>

On Wednesday, November 9, 2016 12:37:27 PM CET Michael Zoran wrote:

>  static long
> -vchiq_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> +vchiq_ioctl_shutdown(VCHIQ_INSTANCE_T instance,
> +		     unsigned int cmd,
> +		     unsigned long arg) {

This does not use cmd or arg, so you can just drop both parameters.
In cases where the argument is actually used, better make it take
a pointer than an unsigned long argument, to save a conversion.

> +	vchiq_log_trace(vchiq_arm_log_level,
> +			"vchiq_ioctl(compat) - instance %pK, cmd %s, arg %lx",
> +			instance,
> +			((_IOC_TYPE(cmd) == VCHIQ_IOC_MAGIC) &&
> +			(ioctl_nr <= VCHIQ_IOC_MAX)) ?
> +			ioctl_names[ioctl_nr] : "<invalid>", arg);

I'd suggest dropping the log_trace here.

> +	if ((ioctl_nr > VCHIQ_IOC_MAX) ||
> +	    (vchiq_ioctl_compat_table[ioctl_nr].ioctl != cmd)) {
> +		ret = -ENOTTY;
> +	} else {
> +		ret = vchiq_ioctl_compat_table[ioctl_nr].func(instance, cmd, arg);
>  	}

It's rather unusual to have a table like this, most drivers have a
simple switch/case statement. If you do a swapper like this, better
make it do something more and let it also pass the data as a kernel
pointer that you copy in and out of the kernel according to the
direction and size bits in the command number.

> @@ -104,6 +109,12 @@ typedef struct vchiq_service_base_struct {
>  	void *userdata;
>  } VCHIQ_SERVICE_BASE_T;
>  
> +struct vchiq_service_base32 {
> +	int fourcc;
> +	u32 callback;
> +	u32 userdata;
> +};

Maybe better use compat_uptr_t along with compat_ptr() for passing
32-bit pointers. This will however require moving the struct definitions
into an #ifdef.

	Arnd

^ permalink raw reply

* [PATCH] phy: rockchip-inno-usb2: correct 480MHz output clock stable time
From: Doug Anderson @ 2016-11-09 20:54 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1478523658-9400-1-git-send-email-wulf@rock-chips.com>

Hi,

On Mon, Nov 7, 2016 at 5:00 AM, William Wu <wulf@rock-chips.com> wrote:
> We found that the system crashed due to 480MHz output clock of
> USB2 PHY was unstable after clock had been enabled by gpu module.
>
> Theoretically, 1 millisecond is a critical value for 480MHz
> output clock stable time, so we try to change the delay time
> to 1.2 millisecond to avoid this issue.
>
> Signed-off-by: William Wu <wulf@rock-chips.com>
> ---
>  drivers/phy/phy-rockchip-inno-usb2.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/phy/phy-rockchip-inno-usb2.c b/drivers/phy/phy-rockchip-inno-usb2.c
> index ecfd7d1..8f2d2b6 100644
> --- a/drivers/phy/phy-rockchip-inno-usb2.c
> +++ b/drivers/phy/phy-rockchip-inno-usb2.c
> @@ -267,7 +267,7 @@ static int rockchip_usb2phy_clk480m_enable(struct clk_hw *hw)
>                         return ret;
>
>                 /* waitting for the clk become stable */
> -               mdelay(1);
> +               udelay(1200);

Several people who have seen this patch have expressed concern that a
1.2 ms delay is pretty long for something that's supposed to be
"atomic" like a clk_enable().  Consider that someone might call
clk_enable() while interrupts are disabled and that a 1.2 ms interrupt
latency is not so great.

It seems like this clock should be moved to be enabled in "prepare"
and the "enable" should be a no-op.  This is a functionality change,
but I don't think there are any real users for this clock at the
moment so it should be fine.

(of course, the 1 ms latency that existed before this patch was still
pretty bad, but ...)

-Doug

^ permalink raw reply

* PM regression with LED changes in next-20161109
From: Jacek Anaszewski @ 2016-11-09 20:45 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161109192301.GS26979@atomide.com>

Hi Tony,

On 11/09/2016 08:23 PM, Tony Lindgren wrote:
> Hi,
>
> Looks like commit 883d32ce3385 ("leds: core: Add support for poll()ing
> the sysfs brightness attr for changes.") breaks runtime PM for me.
>
> On my omap dm3730 based test system, idle power consumption is over 70
> times higher now with this patch! It goes from about 6mW for the core
> system to over 440mW during idle meaning there's some busy timer now
> active.
>
> Reverting this patch fixes the issue. Any ideas?

Thanks for the report. This is probably caused by sysfs_notify_dirent().
I'm afraid that we can't keep this feature in the current shape.
Hans, I'm dropping the patch. We probably will have to delegate this
call to a workqueue task. Think about use cases when the LED is blinked
with high frequency e.g. from ledtrig-disk.c.

Also, IMHO the notifications should be enabled only if explicitly
selected in the kernel config.

-- 
Best regards,
Jacek Anaszewski

^ permalink raw reply

* [PATCH] arm64: acpi: arch_apei_get_mem_attributes() should use memblock
From: Ard Biesheuvel @ 2016-11-09 20:39 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <5823677D.3050803@arm.com>

On 9 November 2016 at 18:14, James Morse <james.morse@arm.com> wrote:
> Hi Ard,
>
> On 09/11/16 12:12, Ard Biesheuvel wrote:
>> On 8 November 2016 at 10:27, James Morse <james.morse@arm.com> wrote:
[...]
>>> @@ -241,22 +238,13 @@ void __init acpi_boot_table_init(void)
>>>  pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
>>>  {
>>>         /*
>>> -        * According to "Table 8 Map: EFI memory types to AArch64 memory
>>> -        * types" of UEFI 2.5 section 2.3.6.1, each EFI memory type is
>>> -        * mapped to a corresponding MAIR attribute encoding.
>>> -        * The EFI memory attribute advises all possible capabilities
>>> -        * of a memory region. We use the most efficient capability.
>>> +        * The EFI memmap isn't mapped, instead read the version exported
>>> +        * into memblock. EFI's reserve_regions() call adds memory with the
>>> +        * WB attribute to memblock via early_init_dt_add_memory_arch().
>>>          */
>>> +       if (!memblock_is_memory(addr))
>>> +               return __pgprot(PROT_DEVICE_nGnRnE);
>>>
>>> -       u64 attr;
>>> -
>>> -       attr = efi_mem_attributes(addr);
>>> -       if (attr & EFI_MEMORY_WB)
>>> -               return PAGE_KERNEL;
>>> -       if (attr & EFI_MEMORY_WT)
>>> -               return __pgprot(PROT_NORMAL_WT);
>>> -       if (attr & EFI_MEMORY_WC)
>>> -               return __pgprot(PROT_NORMAL_NC);
>>> -       return __pgprot(PROT_DEVICE_nGnRnE);
>>> +       return PAGE_KERNEL;
>>
>> As you know, we recently applied [0] to ensure that memory regions
>> that are marked by UEFI was write-through cacheable (WT) or
>> write-combining (WC) are not misidentified by the kernel as having
>> strongly ordered semantics.
>
> Ah, I'd forgotten all about that...
>
>
>> This means that in this case, you may be returning PAGE_KERNEL for
>> regions that are lacking the EFI_MEMORY_WB attribute, which kind of
>> defeats the purpose of this function (AIUI), to align the kernel's
>> view of the region's attributes with the view of the firmware. I would
>> expect that, for use cases like this one, the burden is on the
>> firmware to ensure that only a single EFI_MEMORY_xx attribute is set
>> if any kind of coherency between UEFI and the kernel is expected. If
>> that single attribute is WT or WC, PAGE_KERNEL is most certainly
>> wrong.
>
> I've been trying to test some of the APEI code using some hacks on top of
> kvmtool. (actually, quite a lot of hacks).
>
> When I trigger an event, I see efi_mem_attributes() always return 0 because the
> EFI memory map isn't mapped. This turns out to be because I have 'efi=noruntime'
> on the command line. I stopped digging when I found this previously-dead code,
> but should have gone further.
>

Indeed. Ordinarily, the UEFI memory map should always be mapped at
runtime, and so this code should always be able to rely on it.
However, it does beg the question whether efi=noruntime should
propagate across the ACPI subsystem as well.

> If this is an expected interaction, we can ignore this as a bug in my test
> setup. (and I have to work out how to get efi runtime services going...)
>
> If not, I can try always mapping the EFI memory map in
> arm_enable_runtime_services() if we booted via ACPI, as in this case runtime
> services isn't the only user of the memory map.
>

efi=noruntime is intended to avoid UEFI runtime services calls into
known buggy firmware, and so it would be perfectly reasonable, also
given the above, to always map the UEFI memory map, even if
efi=noruntime is passed. We simply didn't bother at the time when
nothing was relying on the UEFI memory map other than those UEFI
runtime services.

> My intention here was to just mirror acpi_os_ioremap(), which will call
> ioremap_cache() for WB/WC/WT regions, which (also) ends up using PROT_NORMAL. We
> may get away with this, but you're right, we are less likely to here.
>

Yeah, that is another wart we have to do something about sooner rather
than later ...

^ permalink raw reply

* [PATCH] staging: vc04_services: rework ioctl code path
From: Michael Zoran @ 2016-11-09 20:37 UTC (permalink / raw)
  To: linux-arm-kernel

VCHIQ/vc04_services has a userland device interface
that includes ioctls. The current ioctl implementation
is a single monolithic function over 1,000+ lines
that handles 17 different ioctls through a complex
maze of switch and if statements.

The change reimplements that code path by breaking
up the code into smaller, easier to maintain functions
and uses a dispatch table to invoke the correct function.

Testing:

1. vchiq_test -f 10 and vchiq_test -p 1 were run from a native
64-bit OS(debian sid).

2. vchiq_test -f 10 and vchiq_test -p 1 where run from a 32-bit
chroot install from the same OS.

Both test cases pass.

Signed-off-by: Michael Zoran <mzoran@crowfest.net>
---
 .../vc04_services/interface/vchiq_arm/vchiq_arm.c  | 1914 +++++++++++++-------
 .../vc04_services/interface/vchiq_arm/vchiq_if.h   |   19 +
 .../interface/vchiq_arm/vchiq_ioctl.h              |   69 +
 3 files changed, 1389 insertions(+), 613 deletions(-)

diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
index 8fcd940..e52a06b 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
@@ -503,709 +503,1394 @@ vchiq_ioc_queue_message(VCHIQ_SERVICE_HANDLE_T handle,
 				   &context, total_size);
 }
 
-/****************************************************************************
-*
-*   vchiq_ioctl
-*
-***************************************************************************/
+static long vchiq_map_status(VCHIQ_STATUS_T status)
+{
+	if (status == VCHIQ_ERROR)
+		return -EIO;
+
+	if (status == VCHIQ_RETRY)
+		return -EINTR;
+
+	return 0;
+}
+
 static long
-vchiq_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+vchiq_ioctl_shutdown(VCHIQ_INSTANCE_T instance,
+		     unsigned int cmd,
+		     unsigned long arg) {
+	VCHIQ_SERVICE_T *service = NULL;
+	int i;
+
+	if (!instance->connected)
+		return 0;
+
+	/* Remove all services */
+	i = 0;
+	while ((service = next_service_by_instance(instance->state,
+						   instance, &i)) != NULL) {
+		VCHIQ_STATUS_T status;
+
+		status = vchiq_remove_service(service->handle);
+		unlock_service(service);
+
+		if (status != VCHIQ_SUCCESS)
+			return vchiq_map_status(status);
+	}
+
+	/* Wake the completion thread and ask it to exit */
+	instance->closing = 1;
+	up(&instance->insert_event);
+
+	return 0;
+}
+
+static long
+vchiq_ioctl_connect(VCHIQ_INSTANCE_T instance,
+		    unsigned int cmd,
+		    unsigned long arg)
 {
-	VCHIQ_INSTANCE_T instance = file->private_data;
 	VCHIQ_STATUS_T status = VCHIQ_SUCCESS;
+	int rc;
+
+	if (instance->connected)
+		return -EINVAL;
+
+	rc = mutex_lock_interruptible(&instance->state->mutex);
+	if (rc) {
+		vchiq_log_error(vchiq_arm_log_level,
+			"vchiq: connect: could not lock mutex for state %d: %d",
+			instance->state->id, rc);
+		return -EINTR;
+	}
+
+	status = vchiq_connect_internal(instance->state, instance);
+	mutex_unlock(&instance->state->mutex);
+
+	if (status != VCHIQ_SUCCESS) {
+		vchiq_log_error(vchiq_arm_log_level,
+				"vchiq: could not connect: %d", status);
+		return vchiq_map_status(status);
+	}
+
+	instance->connected = 1;
+	return 0;
+}
+
+static long
+vchiq_ioctl_create_service_internal(VCHIQ_INSTANCE_T instance,
+				    VCHIQ_CREATE_SERVICE_T *args)
+{
+	VCHIQ_SERVICE_T *service = NULL;
+	USER_SERVICE_T *user_service = NULL;
+	void *userdata;
+	VCHIQ_STATUS_T status = VCHIQ_SUCCESS;
+	int srvstate;
+
+	user_service = kmalloc(sizeof(USER_SERVICE_T), GFP_KERNEL);
+	if (!user_service)
+		return -ENOMEM;
+
+	if (args->is_open) {
+		if (!instance->connected) {
+			kfree(user_service);
+			return -ENOTCONN;
+		}
+		srvstate = VCHIQ_SRVSTATE_OPENING;
+	} else {
+		srvstate =
+			instance->connected ?
+			VCHIQ_SRVSTATE_LISTENING :
+			VCHIQ_SRVSTATE_HIDDEN;
+	}
+
+	userdata = args->params.userdata;
+	args->params.callback = service_callback;
+	args->params.userdata = user_service;
+	service = vchiq_add_service_internal(
+		instance->state,
+		&args->params, srvstate,
+		instance, user_service_free);
+
+	if (!service) {
+		kfree(user_service);
+		return -EEXIST;
+	}
+
+	user_service->service = service;
+	user_service->userdata = userdata;
+	user_service->instance = instance;
+	user_service->is_vchi = (args->is_vchi != 0);
+	user_service->dequeue_pending = 0;
+	user_service->close_pending = 0;
+	user_service->message_available_pos = instance->completion_remove - 1;
+	user_service->msg_insert = 0;
+	user_service->msg_remove = 0;
+	sema_init(&user_service->insert_event, 0);
+	sema_init(&user_service->remove_event, 0);
+	sema_init(&user_service->close_event, 0);
+
+	if (args->is_open) {
+		status = vchiq_open_service_internal
+			(service, instance->pid);
+		if (status != VCHIQ_SUCCESS) {
+			vchiq_remove_service(service->handle);
+			return vchiq_map_status(status);
+		}
+	}
+
+	args->handle = service->handle;
+	return 0;
+}
+
+static long
+vchiq_ioctl_create_service(VCHIQ_INSTANCE_T instance,
+			   unsigned int cmd,
+			  unsigned long arg)
+{
+	VCHIQ_CREATE_SERVICE_T args;
+	long ret;
+
+	if (copy_from_user
+		(&args, (const void __user *)arg,
+		sizeof(args)) != 0)
+		return -EFAULT;
+
+	args.params.callback = NULL;
+
+	ret = vchiq_ioctl_create_service_internal(instance, &args);
+
+	if (ret < 0)
+		return ret;
+
+	if (copy_to_user((void __user *)
+		&(((VCHIQ_CREATE_SERVICE_T __user *)
+		arg)->handle),
+		(const void *)&args.handle,
+		sizeof(args.handle)) != 0) {
+		vchiq_remove_service(args.handle);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static long
+vchiq_ioctl_close_service(VCHIQ_INSTANCE_T instance,
+			  unsigned int cmd,
+			  unsigned long arg)
+{
+	VCHIQ_SERVICE_HANDLE_T handle = (VCHIQ_SERVICE_HANDLE_T) arg;
 	VCHIQ_SERVICE_T *service = NULL;
+	VCHIQ_STATUS_T status = VCHIQ_SUCCESS;
+	USER_SERVICE_T *user_service;
+
+	service = find_service_for_instance(instance, handle);
+
+	if (!service)
+		return -EINVAL;
+
+	user_service =	(USER_SERVICE_T *)service->base.userdata;
+
+	/*
+	 * close_pending is false on first entry, and when the
+	 * wait in vchiq_close_service has been interrupted.
+	 */
+	if (!user_service->close_pending) {
+		status = vchiq_close_service(service->handle);
+		if (status != VCHIQ_SUCCESS) {
+			unlock_service(service);
+			return vchiq_map_status(status);
+		}
+	}
+
+	/*
+	 * close_pending is true once the underlying service
+	 * has been closed until the client library calls the
+	 * CLOSE_DELIVERED ioctl, signalling close_event.
+	 */
+	if (user_service->close_pending &&
+	    down_interruptible(&user_service->close_event)) {
+			unlock_service(service);
+			return vchiq_map_status(VCHIQ_RETRY);
+	}
+
+	unlock_service(service);
+	return 0;
+}
+
+static long
+vchiq_ioctl_remove_service(VCHIQ_INSTANCE_T instance,
+			   unsigned int cmd,
+			   unsigned long arg) {
+	VCHIQ_SERVICE_HANDLE_T handle = (VCHIQ_SERVICE_HANDLE_T)arg;
+	VCHIQ_SERVICE_T *service = NULL;
+	VCHIQ_STATUS_T status = VCHIQ_SUCCESS;
+	USER_SERVICE_T *user_service;
+
+	service = find_service_for_instance(instance, handle);
+
+	if (!service)
+		return -EINVAL;
+
+	user_service =	(USER_SERVICE_T *)service->base.userdata;
+
+	/*
+	 * close_pending is false on first entry, and when the
+	 * wait in vchiq_close_service has been interrupted.
+	 */
+	if (!user_service->close_pending) {
+		status = vchiq_remove_service(service->handle);
+		if (status != VCHIQ_SUCCESS) {
+			unlock_service(service);
+			return vchiq_map_status(status);
+		}
+	}
+
+	/*
+	 * close_pending is true once the underlying service
+	 * has been closed until the client library calls the
+	 * CLOSE_DELIVERED ioctl, signaling close_event.
+	 */
+	if (user_service->close_pending &&
+	    down_interruptible(&user_service->close_event)) {
+			unlock_service(service);
+			return vchiq_map_status(VCHIQ_RETRY);
+	}
+
+	unlock_service(service);
+	return 0;
+}
+
+static long
+vchiq_ioctl_use_service(VCHIQ_INSTANCE_T instance,
+			unsigned int cmd,
+			unsigned long arg) {
+	VCHIQ_SERVICE_HANDLE_T handle = (VCHIQ_SERVICE_HANDLE_T)arg;
+	VCHIQ_SERVICE_T *service = NULL;
+	VCHIQ_STATUS_T status = VCHIQ_SUCCESS;
+
+	service = find_service_for_instance(instance, handle);
+
+	if (!service)
+		return -EINVAL;
+
+	status = vchiq_use_service_internal(service);
+
+	if (status != VCHIQ_SUCCESS) {
+		vchiq_log_error(vchiq_susp_log_level,
+			"%s: cmd VCHIQ_IOC_USE_SERVICE returned error %d for service %c%c%c%c:%03d",
+			__func__,
+			status,
+			VCHIQ_FOURCC_AS_4CHARS(
+			service->base.fourcc),
+			service->client_id);
+		unlock_service(service);
+		return -EINVAL;
+	}
+
+	unlock_service(service);
+	return 0;
+}
+
+static long
+vchiq_ioctl_release_service(VCHIQ_INSTANCE_T instance,
+			    unsigned int cmd,
+			    unsigned long arg) {
+	VCHIQ_SERVICE_HANDLE_T handle = (VCHIQ_SERVICE_HANDLE_T)arg;
+	VCHIQ_SERVICE_T *service = NULL;
+	VCHIQ_STATUS_T status = VCHIQ_SUCCESS;
+
+	service = find_service_for_instance(instance, handle);
+
+	if (!service)
+		return -EINVAL;
+
+	status = vchiq_release_service_internal(service);
+
+	if (status != VCHIQ_SUCCESS) {
+		vchiq_log_error(vchiq_susp_log_level,
+			"%s: cmd VCHIQ_IOC_RELEASE_SERVICE returned error %d for service %c%c%c%c:%03d",
+			__func__,
+			status,
+			VCHIQ_FOURCC_AS_4CHARS(
+			service->base.fourcc),
+			service->client_id);
+		unlock_service(service);
+		return -EINVAL;
+	}
+
+	unlock_service(service);
+	return 0;
+}
+
+static long
+vchiq_ioctl_queue_message(VCHIQ_INSTANCE_T instance,
+			  unsigned int cmd,
+			  unsigned long arg) {
+	VCHIQ_QUEUE_MESSAGE_T args;
+	VCHIQ_SERVICE_T *service = NULL;
+	VCHIQ_STATUS_T status = VCHIQ_SUCCESS;
+	VCHIQ_ELEMENT_T elements[MAX_ELEMENTS];
+
+	if (copy_from_user(&args,
+			   (const void __user *)arg,
+			   sizeof(args)) != 0)
+		return -EFAULT;
+
+	service = find_service_for_instance(instance, args.handle);
+
+	if (!service) {
+		unlock_service(service);
+		return -EINVAL;
+	}
+
+	if (copy_from_user(elements, args.elements,
+			   args.count * sizeof(VCHIQ_ELEMENT_T)) != 0) {
+		unlock_service(service);
+		return -EINVAL;
+	}
+
+	status = vchiq_ioc_queue_message(args.handle,
+					 elements, args.count);
+
+	unlock_service(service);
+	return vchiq_map_status(status);
+}
+
+static long
+vchiq_ioctl_bulk_transfer_internal(VCHIQ_INSTANCE_T instance,
+				   unsigned int handle,
+				   void __user *data,
+				   unsigned int size,
+				   void *userdata,
+				   VCHIQ_BULK_MODE_T mode,
+				   VCHIQ_BULK_DIR_T dir,
+				   VCHIQ_BULK_MODE_T *ret_mode_waiting)
+{
+	VCHIQ_SERVICE_T *service = NULL;
+	VCHIQ_STATUS_T status = VCHIQ_SUCCESS;
+	struct bulk_waiter_node *waiter = NULL;
+
+	*ret_mode_waiting = mode;
+
+	service = find_service_for_instance(instance, handle);
+	if (!service)
+		return -EINVAL;
+
+	if (mode == VCHIQ_BULK_MODE_BLOCKING) {
+		waiter = kzalloc(sizeof(struct bulk_waiter_node), GFP_KERNEL);
+		if (!waiter) {
+			unlock_service(service);
+			return -ENOMEM;
+		}
+		userdata = &waiter->bulk_waiter;
+	} else if (mode == VCHIQ_BULK_MODE_WAITING) {
+		struct list_head *pos;
+
+		mutex_lock(&instance->bulk_waiter_list_mutex);
+
+		list_for_each(pos, &instance->bulk_waiter_list)
+		{
+			if (list_entry(pos, struct bulk_waiter_node,
+				       list)->pid == current->pid) {
+				waiter = list_entry(pos,
+						    struct bulk_waiter_node,
+					list);
+				list_del(pos);
+				break;
+			}
+		}
+		mutex_unlock(&instance->bulk_waiter_list_mutex);
+		if (!waiter) {
+			vchiq_log_error(vchiq_arm_log_level,
+					"no bulk_waiter found for pid %d",
+				current->pid);
+			unlock_service(service);
+			return -ESRCH;
+		}
+		vchiq_log_info(vchiq_arm_log_level,
+			       "found bulk_waiter %pK for pid %d", waiter,
+			current->pid);
+		userdata = &waiter->bulk_waiter;
+	}
+
+	status = vchiq_bulk_transfer(handle,
+				     VCHI_MEM_HANDLE_INVALID,
+				     data,
+				     size,
+				     userdata,
+				     mode,
+				     dir);
+
+	if (!waiter) {
+		unlock_service(service);
+		return vchiq_map_status(status);
+	}
+
+	if ((status != VCHIQ_RETRY) || fatal_signal_pending(current) ||
+	    !waiter->bulk_waiter.bulk) {
+		if (waiter->bulk_waiter.bulk) {
+			/*
+			 * Cancel the signal when the transfer
+			 * completes.
+			 */
+			spin_lock(&bulk_waiter_spinlock);
+			waiter->bulk_waiter.bulk->userdata = NULL;
+			spin_unlock(&bulk_waiter_spinlock);
+		}
+		kfree(waiter);
+	} else {
+		waiter->pid = current->pid;
+		mutex_lock(&instance->bulk_waiter_list_mutex);
+		list_add(&waiter->list, &instance->bulk_waiter_list);
+		mutex_unlock(&instance->bulk_waiter_list_mutex);
+		vchiq_log_info(vchiq_arm_log_level,
+			       "saved bulk_waiter %pK for pid %d",
+			waiter, current->pid);
+
+		*ret_mode_waiting = VCHIQ_BULK_MODE_WAITING;
+	}
+
+	unlock_service(service);
+	return vchiq_map_status(status);
+}
+
+static long
+vchiq_ioctl_bulk_transfer(VCHIQ_INSTANCE_T instance,
+			  unsigned int cmd,
+			  unsigned long arg)
+{
+	VCHIQ_QUEUE_BULK_TRANSFER_T args;
+	VCHIQ_BULK_MODE_T mode_waiting;
+	long ret;
+	VCHIQ_BULK_DIR_T dir =
+		(cmd == VCHIQ_IOC_QUEUE_BULK_TRANSMIT) ?
+		VCHIQ_BULK_TRANSMIT : VCHIQ_BULK_RECEIVE;
+
+	if (copy_from_user(&args,
+			   (const void __user *)arg,
+			   sizeof(args)) != 0)
+		return -EFAULT;
+
+	ret = vchiq_ioctl_bulk_transfer_internal(instance,
+						 args.handle,
+						 args.data,
+						 args.size,
+						 args.userdata,
+						 args.mode,
+						 dir,
+						 &mode_waiting);
+
+	if (mode_waiting != args.mode) {
+		if (copy_to_user((void __user *)
+				 &(((VCHIQ_QUEUE_BULK_TRANSFER_T __user *)
+				 arg)->mode),
+				(const void *)&mode_waiting,
+				sizeof(mode_waiting)) != 0)
+					return -EFAULT;
+	}
+
+	return ret;
+}
+
+static long
+vchiq_ioctl_await_completion(VCHIQ_INSTANCE_T instance,
+			     unsigned int cmd,
+			     unsigned long arg)
+{
+	VCHIQ_AWAIT_COMPLETION_T args;
 	long ret = 0;
-	int i, rc;
-	DEBUG_INITIALISE(g_state.local)
+	int msgbufcount;
 
-	vchiq_log_trace(vchiq_arm_log_level,
-		"vchiq_ioctl - instance %pK, cmd %s, arg %lx",
-		instance,
-		((_IOC_TYPE(cmd) == VCHIQ_IOC_MAGIC) &&
-		(_IOC_NR(cmd) <= VCHIQ_IOC_MAX)) ?
-		ioctl_names[_IOC_NR(cmd)] : "<invalid>", arg);
+	DEBUG_INITIALISE(g_state.local);
 
-	switch (cmd) {
-	case VCHIQ_IOC_SHUTDOWN:
-		if (!instance->connected)
+	DEBUG_TRACE(AWAIT_COMPLETION_LINE);
+
+	if (!instance->connected)
+		return -ENOTCONN;
+
+	if (copy_from_user(&args, (const void __user *)arg,
+			   sizeof(args)) != 0)
+		return -EFAULT;
+
+	mutex_lock(&instance->completion_mutex);
+
+	DEBUG_TRACE(AWAIT_COMPLETION_LINE);
+	while ((instance->completion_remove == instance->completion_insert)
+		&& !instance->closing) {
+		int rc;
+
+		DEBUG_TRACE(AWAIT_COMPLETION_LINE);
+		mutex_unlock(&instance->completion_mutex);
+		rc = down_interruptible(&instance->insert_event);
+		mutex_lock(&instance->completion_mutex);
+		if (rc != 0) {
+			DEBUG_TRACE(AWAIT_COMPLETION_LINE);
+			vchiq_log_info(vchiq_arm_log_level,
+				       "AWAIT_COMPLETION interrupted");
+			mutex_unlock(&instance->completion_mutex);
+			DEBUG_TRACE(AWAIT_COMPLETION_LINE);
+			return -EINTR;
+		}
+	}
+	DEBUG_TRACE(AWAIT_COMPLETION_LINE);
+
+	/*
+	 * A read memory barrier is needed to stop prefetch of a stale
+	 * completion record.
+	 */
+	rmb();
+
+	msgbufcount = args.msgbufcount;
+	for (ret = 0; ret < args.count; ret++) {
+		VCHIQ_COMPLETION_DATA_T *completion;
+		VCHIQ_SERVICE_T *service;
+		USER_SERVICE_T *user_service;
+		VCHIQ_HEADER_T *header;
+
+		if (instance->completion_remove == instance->completion_insert)
 			break;
+		completion = &instance->completions[
+			instance->completion_remove &
+			(MAX_COMPLETIONS - 1)];
+
+		service = completion->service_userdata;
+		user_service = service->base.userdata;
+		completion->service_userdata = user_service->userdata;
+
+		header = completion->header;
+		if (header) {
+			void __user *msgbuf;
+			int msglen;
+
+			msglen = header->size + sizeof(VCHIQ_HEADER_T);
+			/* This must be a VCHIQ-style service */
+			if (args.msgbufsize < msglen) {
+				vchiq_log_error(
+					vchiq_arm_log_level,
+					"header %pK: msgbufsize %x < msglen %x",
+					header, args.msgbufsize,
+					msglen);
+				WARN(1, "invalid message size\n");
+				if (ret == 0)
+					ret = -EMSGSIZE;
+				break;
+			}
+			if (msgbufcount <= 0)
+				/*
+				 * Stall here for lack of a
+				 * buffer for the message.
+				 */
+				break;
+			/* Get the pointer from user space */
+			msgbufcount--;
+
+			if (copy_from_user(&msgbuf,
+					   (const void __user *)
+					   &args.msgbufs[msgbufcount],
+					   sizeof(msgbuf)) != 0) {
+				if (ret == 0)
+					ret = -EFAULT;
+				break;
+			}
 
-		/* Remove all services */
-		i = 0;
-		while ((service = next_service_by_instance(instance->state,
-			instance, &i)) != NULL) {
-			status = vchiq_remove_service(service->handle);
-			unlock_service(service);
-			if (status != VCHIQ_SUCCESS)
+			/* Copy the message to user space */
+			if (copy_to_user(msgbuf, header,
+					 msglen) != 0) {
+				if (ret == 0)
+					ret = -EFAULT;
 				break;
+			}
+
+			/*
+			 * Now it has been copied, the message
+			 * can be released.
+			 */
+			vchiq_release_message(service->handle,
+					      header);
+
+			/*
+			 * The completion must point to the
+			 * msgbuf.
+			 */
+			completion->header = msgbuf;
 		}
-		service = NULL;
 
-		if (status == VCHIQ_SUCCESS) {
-			/* Wake the completion thread and ask it to exit */
-			instance->closing = 1;
-			up(&instance->insert_event);
+		if ((completion->reason ==
+			VCHIQ_SERVICE_CLOSED) &&
+			!instance->use_close_delivered)
+			unlock_service(service);
+
+		if (copy_to_user((void __user *)(
+				 (size_t)args.buf +
+				 ret * sizeof(VCHIQ_COMPLETION_DATA_T)),
+				 completion,
+				 sizeof(VCHIQ_COMPLETION_DATA_T)) != 0) {
+			if (ret == 0)
+				ret = -EFAULT;
+			break;
 		}
 
-		break;
+		instance->completion_remove++;
+	}
+
+	if (msgbufcount != args.msgbufcount) {
+		if (copy_to_user((void __user *)
+				 &((VCHIQ_AWAIT_COMPLETION_T *)arg)->
+				 msgbufcount,
+				 &msgbufcount,
+				 sizeof(msgbufcount)) != 0)
+			ret = -EFAULT;
+	}
+
+	if (ret != 0)
+		up(&instance->remove_event);
+	mutex_unlock(&instance->completion_mutex);
+	DEBUG_TRACE(AWAIT_COMPLETION_LINE);
+
+	return ret;
+}
+
+static long
+vchiq_ioctl_dequeue_message_internal(VCHIQ_INSTANCE_T instance,
+				     unsigned int handle,
+				    bool blocking,
+				    unsigned int bufsize,
+				    void __user *buf)
+{
+	VCHIQ_SERVICE_T *service = NULL;
+	USER_SERVICE_T *user_service;
+	VCHIQ_HEADER_T *header;
+	long ret;
+
+	DEBUG_INITIALISE(g_state.local);
+
+	service = find_service_for_instance(instance, handle);
+	if (!service)
+		return -EINVAL;
+
+	user_service = (USER_SERVICE_T *)service->base.userdata;
+	if (user_service->is_vchi == 0) {
+		unlock_service(service);
+		return -EINVAL;
+	}
+
+	spin_lock(&msg_queue_spinlock);
+	if (user_service->msg_remove == user_service->msg_insert) {
+		if (!blocking) {
+			spin_unlock(&msg_queue_spinlock);
+			DEBUG_TRACE(DEQUEUE_MESSAGE_LINE);
+			unlock_service(service);
+			return -EWOULDBLOCK;
+		}
+		user_service->dequeue_pending = 1;
+		do {
+			spin_unlock(&msg_queue_spinlock);
+			DEBUG_TRACE(DEQUEUE_MESSAGE_LINE);
+			if (down_interruptible(
+				&user_service->insert_event) != 0) {
+				vchiq_log_info(vchiq_arm_log_level,
+					       "DEQUEUE_MESSAGE interrupted");
+				unlock_service(service);
+				return -EINTR;
+			}
+			spin_lock(&msg_queue_spinlock);
+		} while (user_service->msg_remove == user_service->msg_insert);
+	}
+
+	BUG_ON((int)(user_service->msg_insert - user_service->msg_remove) < 0);
+
+	header = user_service->msg_queue[user_service->msg_remove &
+		(MSG_QUEUE_SIZE - 1)];
+	user_service->msg_remove++;
+	spin_unlock(&msg_queue_spinlock);
+
+	up(&user_service->remove_event);
+	if (!header) {
+		unlock_service(service);
+		return -ENOTCONN;
+	}
+
+	if (header->size > bufsize) {
+		vchiq_log_error(vchiq_arm_log_level,
+				"header %pK: bufsize %x < size %x",
+			header, bufsize, header->size);
+		WARN(1, "invalid size\n");
+		unlock_service(service);
+		return -EMSGSIZE;
+	}
+
+	if (!buf) {
+		unlock_service(service);
+		return -EMSGSIZE;
+	}
+
+	if (copy_to_user(buf,
+			 header->data,
+			 header->size) != 0) {
+		unlock_service(service);
+		return -EFAULT;
+	}
+
+	ret = header->size;
+	vchiq_release_message(service->handle, header);
+
+	unlock_service(service);
+	DEBUG_TRACE(DEQUEUE_MESSAGE_LINE);
+
+	return ret;
+}
+
+static long
+vchiq_ioctl_dequeue_message(VCHIQ_INSTANCE_T instance,
+			    unsigned int cmd,
+			    unsigned long arg)
+{
+	VCHIQ_DEQUEUE_MESSAGE_T args;
+
+	if (copy_from_user(&args, (const void __user *)arg,
+			   sizeof(args)) != 0)
+		return -EFAULT;
+
+	return vchiq_ioctl_dequeue_message_internal(instance,
+						    args.handle,
+						    args.blocking,
+						    args.bufsize,
+						    args.buf);
+}
+
+static long
+vchiq_ioctl_get_client_handle(VCHIQ_INSTANCE_T instance,
+			      unsigned int cmd,
+			      unsigned long arg)
+{
+	VCHIQ_SERVICE_HANDLE_T handle = (VCHIQ_SERVICE_HANDLE_T)arg;
+
+	return vchiq_get_client_id(handle);
+}
+
+static long
+vchiq_ioctl_get_config(VCHIQ_INSTANCE_T instance,
+		       unsigned int cmd,
+		       unsigned long arg)
+{
+	VCHIQ_GET_CONFIG_T args;
+	VCHIQ_CONFIG_T config;
+	VCHIQ_STATUS_T status = VCHIQ_SUCCESS;
+
+	if (copy_from_user(&args, (const void __user *)arg,
+			   sizeof(args)) != 0)
+		return -EFAULT;
+
+	if (args.config_size > sizeof(config))
+		return -EFAULT;
+
+	status = vchiq_get_config(instance, args.config_size, &config);
+
+	if (status != VCHIQ_SUCCESS)
+		return vchiq_map_status(status);
+
+	if (copy_to_user((void __user *)args.pconfig,
+			 &config, args.config_size) != 0)
+		return -EFAULT;
+
+	return 0;
+}
+
+static long
+vchiq_ioctl_set_service_option(VCHIQ_INSTANCE_T instance,
+			       unsigned int cmd,
+			       unsigned long arg)
+{
+	VCHIQ_SERVICE_T *service = NULL;
+	VCHIQ_STATUS_T status = VCHIQ_SUCCESS;
+	VCHIQ_SET_SERVICE_OPTION_T args;
+
+	if (copy_from_user(
+		&args, (const void __user *)arg,
+		sizeof(args)) != 0)
+		return -EFAULT;
+
+	service = find_service_for_instance(instance, args.handle);
+	if (!service)
+		return -EINVAL;
+
+	status = vchiq_set_service_option(
+		args.handle, args.option, args.value);
+
+	unlock_service(service);
+	return vchiq_map_status(status);
+}
+
+static long
+vchiq_ioctl_dump_phys_mem(VCHIQ_INSTANCE_T instance,
+			  unsigned int cmd,
+	unsigned long arg)
+{
+	VCHIQ_DUMP_MEM_T args;
+
+	if (copy_from_user
+		(&args, (const void __user *)arg,
+		sizeof(args)) != 0)
+		return -EFAULT;
+
+	dump_phys_mem(args.virt_addr, args.num_bytes);
+
+	return 0;
+}
+
+static long
+vchiq_ioctl_lib_version(VCHIQ_INSTANCE_T instance,
+			unsigned int cmd,
+			unsigned long arg)
+{
+	unsigned int lib_version = (unsigned int)arg;
+
+	if (lib_version < VCHIQ_VERSION_MIN)
+		return -EINVAL;
+
+	if (lib_version >= VCHIQ_VERSION_CLOSE_DELIVERED)
+		instance->use_close_delivered = 1;
+
+	return 0;
+}
+
+static long
+vchiq_ioctl_close_delivered(VCHIQ_INSTANCE_T instance,
+			    unsigned int cmd,
+			    unsigned long arg)
+{
+	VCHIQ_SERVICE_HANDLE_T handle = (VCHIQ_SERVICE_HANDLE_T)arg;
+	VCHIQ_SERVICE_T *service = NULL;
+
+	service = find_closed_service_for_instance(instance, handle);
+
+	if (!service)
+		return -EINVAL;
+
+	close_delivered((USER_SERVICE_T *)service->base.userdata);
+
+	unlock_service(service);
+	return 0;
+}
+
+typedef long (*vchiq_ioctl_func_t)(VCHIQ_INSTANCE_T, unsigned int, unsigned long);
+
+struct vchiq_ioctl_entry {
+	vchiq_ioctl_func_t func;
+	unsigned int ioctl;
+};
+
+#define VCHIQ_MK_IOCTL(__ioctl, __func)	\
+	[_IOC_NR(__ioctl)] = {.func = __func, .ioctl = __ioctl}
+
+static const struct vchiq_ioctl_entry vchiq_ioctl_table[] = {
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_CONNECT, vchiq_ioctl_connect),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_SHUTDOWN, vchiq_ioctl_shutdown),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_CREATE_SERVICE, vchiq_ioctl_create_service),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_CLOSE_SERVICE, vchiq_ioctl_close_service),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_REMOVE_SERVICE, vchiq_ioctl_remove_service),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_USE_SERVICE, vchiq_ioctl_use_service),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_RELEASE_SERVICE, vchiq_ioctl_release_service),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_QUEUE_MESSAGE, vchiq_ioctl_queue_message),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_QUEUE_BULK_TRANSMIT, vchiq_ioctl_bulk_transfer),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_QUEUE_BULK_RECEIVE, vchiq_ioctl_bulk_transfer),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_AWAIT_COMPLETION, vchiq_ioctl_await_completion),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_DEQUEUE_MESSAGE, vchiq_ioctl_dequeue_message),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_GET_CLIENT_ID, vchiq_ioctl_get_client_handle),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_GET_CONFIG, vchiq_ioctl_get_config),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_SET_SERVICE_OPTION, vchiq_ioctl_set_service_option),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_DUMP_PHYS_MEM, vchiq_ioctl_dump_phys_mem),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_LIB_VERSION, vchiq_ioctl_lib_version),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_CLOSE_DELIVERED, vchiq_ioctl_close_delivered)
+};
+
+/**************************************************************************
+ *
+ * vchiq_ioctl
+ *
+ **************************************************************************/
+static long
+vchiq_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	VCHIQ_INSTANCE_T instance = file->private_data;
+	long ret = 0;
+	unsigned int ioctl_nr;
+
+	ioctl_nr = _IOC_NR(cmd);
+
+	vchiq_log_trace(vchiq_arm_log_level,
+			"vchiq_ioctl - instance %pK, cmd %s, arg %lx",
+		instance,
+		((_IOC_TYPE(cmd) == VCHIQ_IOC_MAGIC) &&
+		(ioctl_nr <= VCHIQ_IOC_MAX)) ?
+		ioctl_names[ioctl_nr] : "<invalid>", arg);
 
-	case VCHIQ_IOC_CONNECT:
-		if (instance->connected) {
-			ret = -EINVAL;
-			break;
-		}
-		rc = mutex_lock_interruptible(&instance->state->mutex);
-		if (rc != 0) {
-			vchiq_log_error(vchiq_arm_log_level,
-				"vchiq: connect: could not lock mutex for "
-				"state %d: %d",
-				instance->state->id, rc);
-			ret = -EINTR;
-			break;
-		}
-		status = vchiq_connect_internal(instance->state, instance);
-		mutex_unlock(&instance->state->mutex);
+	if ((ioctl_nr > VCHIQ_IOC_MAX) ||
+	    (vchiq_ioctl_table[ioctl_nr].ioctl != cmd)) {
+		ret = -ENOTTY;
+	} else {
+		ret = vchiq_ioctl_table[ioctl_nr].func(instance, cmd, arg);
+	}
 
-		if (status == VCHIQ_SUCCESS)
-			instance->connected = 1;
-		else
-			vchiq_log_error(vchiq_arm_log_level,
-				"vchiq: could not connect: %d", status);
-		break;
+	if ((ret < 0) && (ret != -EINTR) && (ret != -EWOULDBLOCK))
+		vchiq_log_info(vchiq_arm_log_level,
+			       "  ioctl instance %lx, cmd %s, %ld",
+			(unsigned long)instance,
+			(ioctl_nr <= VCHIQ_IOC_MAX) ?
+				ioctl_names[ioctl_nr] :
+				"<invalid>",
+			ret);
+	else
+		vchiq_log_trace(vchiq_arm_log_level,
+				"  ioctl instance %lx, cmd %s -> %ld",
+			(unsigned long)instance,
+			(ioctl_nr <= VCHIQ_IOC_MAX) ?
+				ioctl_names[ioctl_nr] :
+				"<invalid>",
+			ret);
 
-	case VCHIQ_IOC_CREATE_SERVICE: {
-		VCHIQ_CREATE_SERVICE_T args;
-		USER_SERVICE_T *user_service = NULL;
-		void *userdata;
-		int srvstate;
+	return ret;
+}
 
-		if (copy_from_user
-			 (&args, (const void __user *)arg,
-			  sizeof(args)) != 0) {
-			ret = -EFAULT;
-			break;
-		}
+#if defined(CONFIG_COMPAT)
 
-		user_service = kmalloc(sizeof(USER_SERVICE_T), GFP_KERNEL);
-		if (!user_service) {
-			ret = -ENOMEM;
-			break;
-		}
+static long
+vchiq_ioctl_compat_create_service(VCHIQ_INSTANCE_T instance,
+				  unsigned int cmd,
+				  unsigned long arg)
+{
+	struct vchiq_create_service32 args32;
+	VCHIQ_CREATE_SERVICE_T args;
+	long ret;
 
-		if (args.is_open) {
-			if (!instance->connected) {
-				ret = -ENOTCONN;
-				kfree(user_service);
-				break;
-			}
-			srvstate = VCHIQ_SRVSTATE_OPENING;
-		} else {
-			srvstate =
-				 instance->connected ?
-				 VCHIQ_SRVSTATE_LISTENING :
-				 VCHIQ_SRVSTATE_HIDDEN;
-		}
+	if (copy_from_user
+		(&args32, (const void __user *)arg,
+		sizeof(args32)) != 0)
+		return -EFAULT;
 
-		userdata = args.params.userdata;
-		args.params.callback = service_callback;
-		args.params.userdata = user_service;
-		service = vchiq_add_service_internal(
-				instance->state,
-				&args.params, srvstate,
-				instance, user_service_free);
-
-		if (service != NULL) {
-			user_service->service = service;
-			user_service->userdata = userdata;
-			user_service->instance = instance;
-			user_service->is_vchi = (args.is_vchi != 0);
-			user_service->dequeue_pending = 0;
-			user_service->close_pending = 0;
-			user_service->message_available_pos =
-				instance->completion_remove - 1;
-			user_service->msg_insert = 0;
-			user_service->msg_remove = 0;
-			sema_init(&user_service->insert_event, 0);
-			sema_init(&user_service->remove_event, 0);
-			sema_init(&user_service->close_event, 0);
-
-			if (args.is_open) {
-				status = vchiq_open_service_internal
-					(service, instance->pid);
-				if (status != VCHIQ_SUCCESS) {
-					vchiq_remove_service(service->handle);
-					service = NULL;
-					ret = (status == VCHIQ_RETRY) ?
-						-EINTR : -EIO;
-					break;
-				}
-			}
+	args.params.fourcc   = args32.params.fourcc;
+	args.params.callback = NULL;
+	args.params.userdata = (void *)(unsigned long)args32.params.userdata;
+	args.params.version = args32.params.version;
+	args.params.version_min = args32.params.version_min;
+	args.is_open = args32.is_open;
+	args.is_vchi = args32.is_vchi;
+	args.handle  = args32.handle;
+
+	ret = vchiq_ioctl_create_service_internal(instance, &args);
+
+	if (ret < 0)
+		return ret;
+
+	if (copy_to_user((void __user *)
+		&(((struct vchiq_create_service32 __user *)
+		arg)->handle),
+		(const void *)&args.handle,
+		sizeof(args.handle)) != 0) {
+		vchiq_remove_service(args.handle);
+		return -EFAULT;
+	}
 
-			if (copy_to_user((void __user *)
-				&(((VCHIQ_CREATE_SERVICE_T __user *)
-					arg)->handle),
-				(const void *)&service->handle,
-				sizeof(service->handle)) != 0) {
-				ret = -EFAULT;
-				vchiq_remove_service(service->handle);
-			}
+	return 0;
+}
 
-			service = NULL;
-		} else {
-			ret = -EEXIST;
-			kfree(user_service);
-		}
-	} break;
+static long
+vchiq_ioctl_compat_queue_message(VCHIQ_INSTANCE_T instance,
+				 unsigned int cmd,
+				 unsigned long arg) {
+	unsigned int i;
+	struct vchiq_queue_message32 args32;
+	VCHIQ_SERVICE_T *service = NULL;
+	VCHIQ_STATUS_T status = VCHIQ_SUCCESS;
+	VCHIQ_ELEMENT_T elements[MAX_ELEMENTS];
+	struct vchiq_element32 elements32[MAX_ELEMENTS];
 
-	case VCHIQ_IOC_CLOSE_SERVICE: {
-		VCHIQ_SERVICE_HANDLE_T handle = (VCHIQ_SERVICE_HANDLE_T)arg;
-
-		service = find_service_for_instance(instance, handle);
-		if (service != NULL) {
-			USER_SERVICE_T *user_service =
-				(USER_SERVICE_T *)service->base.userdata;
-			/* close_pending is false on first entry, and when the
-                           wait in vchiq_close_service has been interrupted. */
-			if (!user_service->close_pending) {
-				status = vchiq_close_service(service->handle);
-				if (status != VCHIQ_SUCCESS)
-					break;
-			}
+	if (copy_from_user(&args32,
+			   (const void __user *)arg,
+			   sizeof(args32)) != 0)
+		return -EFAULT;
 
-			/* close_pending is true once the underlying service
-			   has been closed until the client library calls the
-			   CLOSE_DELIVERED ioctl, signalling close_event. */
-			if (user_service->close_pending &&
-				down_interruptible(&user_service->close_event))
-				status = VCHIQ_RETRY;
-		}
-		else
-			ret = -EINVAL;
-	} break;
+	service = find_service_for_instance(instance, args32.handle);
 
-	case VCHIQ_IOC_REMOVE_SERVICE: {
-		VCHIQ_SERVICE_HANDLE_T handle = (VCHIQ_SERVICE_HANDLE_T)arg;
-
-		service = find_service_for_instance(instance, handle);
-		if (service != NULL) {
-			USER_SERVICE_T *user_service =
-				(USER_SERVICE_T *)service->base.userdata;
-			/* close_pending is false on first entry, and when the
-                           wait in vchiq_close_service has been interrupted. */
-			if (!user_service->close_pending) {
-				status = vchiq_remove_service(service->handle);
-				if (status != VCHIQ_SUCCESS)
-					break;
-			}
+	if (!service) {
+		unlock_service(service);
+		return -EINVAL;
+	}
 
-			/* close_pending is true once the underlying service
-			   has been closed until the client library calls the
-			   CLOSE_DELIVERED ioctl, signalling close_event. */
-			if (user_service->close_pending &&
-				down_interruptible(&user_service->close_event))
-				status = VCHIQ_RETRY;
-		}
-		else
-			ret = -EINVAL;
-	} break;
+	if (copy_from_user(elements32,
+			   (void __user *)(unsigned long)args32.elements,
+			   args32.count * sizeof(struct vchiq_element32)) != 0) {
+		unlock_service(service);
+		return -EINVAL;
+	}
 
-	case VCHIQ_IOC_USE_SERVICE:
-	case VCHIQ_IOC_RELEASE_SERVICE:	{
-		VCHIQ_SERVICE_HANDLE_T handle = (VCHIQ_SERVICE_HANDLE_T)arg;
+	for (i = 0; i < args32.count; i++) {
+		elements[i].data =
+			(const void *)(unsigned long)elements32[i].data;
+		elements[i].size = elements32[i].size;
+	}
 
-		service = find_service_for_instance(instance, handle);
-		if (service != NULL) {
-			status = (cmd == VCHIQ_IOC_USE_SERVICE)	?
-				vchiq_use_service_internal(service) :
-				vchiq_release_service_internal(service);
-			if (status != VCHIQ_SUCCESS) {
-				vchiq_log_error(vchiq_susp_log_level,
-					"%s: cmd %s returned error %d for "
-					"service %c%c%c%c:%03d",
-					__func__,
-					(cmd == VCHIQ_IOC_USE_SERVICE) ?
-						"VCHIQ_IOC_USE_SERVICE" :
-						"VCHIQ_IOC_RELEASE_SERVICE",
-					status,
-					VCHIQ_FOURCC_AS_4CHARS(
-						service->base.fourcc),
-					service->client_id);
-				ret = -EINVAL;
-			}
-		} else
-			ret = -EINVAL;
-	} break;
+	status = vchiq_ioc_queue_message(args32.handle,
+					 elements, args32.count);
 
-	case VCHIQ_IOC_QUEUE_MESSAGE: {
-		VCHIQ_QUEUE_MESSAGE_T args;
-		if (copy_from_user
-			 (&args, (const void __user *)arg,
-			  sizeof(args)) != 0) {
-			ret = -EFAULT;
-			break;
-		}
+	unlock_service(service);
+	return vchiq_map_status(status);
+}
 
-		service = find_service_for_instance(instance, args.handle);
+static long
+vchiq_ioctl_compat_bulk_transfer(VCHIQ_INSTANCE_T instance,
+				 unsigned int cmd,
+				 unsigned long arg)
+{
+	struct vchiq_queue_bulk_transfer32 args32;
+	VCHIQ_BULK_MODE_T mode_waiting;
+	long ret;
+	VCHIQ_BULK_DIR_T dir =
+		(cmd == VCHIQ_IOC_QUEUE_BULK_TRANSMIT32) ?
+		VCHIQ_BULK_TRANSMIT : VCHIQ_BULK_RECEIVE;
+
+	if (copy_from_user(&args32,
+			   (const void __user *)arg,
+			   sizeof(args32)) != 0)
+		return -EFAULT;
 
-		if ((service != NULL) && (args.count <= MAX_ELEMENTS)) {
-			/* Copy elements into kernel space */
-			VCHIQ_ELEMENT_T elements[MAX_ELEMENTS];
-			if (copy_from_user(elements, args.elements,
-				args.count * sizeof(VCHIQ_ELEMENT_T)) == 0)
-				status = vchiq_ioc_queue_message
-					(args.handle,
-					elements, args.count);
-			else
-				ret = -EFAULT;
-		} else {
-			ret = -EINVAL;
-		}
-	} break;
+	ret = vchiq_ioctl_bulk_transfer_internal(instance,
+						 args32.handle,
+						 (void __user *)(unsigned long)
+							args32.data,
+						 args32.size,
+						 (void *)(unsigned long)
+							args32.userdata,
+						 args32.mode,
+						 dir,
+						 &mode_waiting);
+
+	if (mode_waiting != args32.mode) {
+		if (copy_to_user((void __user *)
+				 &(((struct vchiq_queue_bulk_transfer32 __user *)
+				 arg)->mode),
+				(const void *)&mode_waiting,
+				sizeof(mode_waiting)) != 0)
+					return -EFAULT;
+	}
 
-	case VCHIQ_IOC_QUEUE_BULK_TRANSMIT:
-	case VCHIQ_IOC_QUEUE_BULK_RECEIVE: {
-		VCHIQ_QUEUE_BULK_TRANSFER_T args;
-		struct bulk_waiter_node *waiter = NULL;
-		VCHIQ_BULK_DIR_T dir =
-			(cmd == VCHIQ_IOC_QUEUE_BULK_TRANSMIT) ?
-			VCHIQ_BULK_TRANSMIT : VCHIQ_BULK_RECEIVE;
-
-		if (copy_from_user
-			(&args, (const void __user *)arg,
-			sizeof(args)) != 0) {
-			ret = -EFAULT;
-			break;
-		}
+	return ret;
+}
 
-		service = find_service_for_instance(instance, args.handle);
-		if (!service) {
-			ret = -EINVAL;
-			break;
-		}
+static long
+vchiq_ioctl_compat_await_completion(VCHIQ_INSTANCE_T instance,
+				    unsigned int cmd,
+				    unsigned long arg)
+{
+	struct vchiq_await_completion32 args32;
+	VCHIQ_AWAIT_COMPLETION_T args;
+	long ret = 0;
+	int msgbufcount;
 
-		if (args.mode == VCHIQ_BULK_MODE_BLOCKING) {
-			waiter = kzalloc(sizeof(struct bulk_waiter_node),
-				GFP_KERNEL);
-			if (!waiter) {
-				ret = -ENOMEM;
-				break;
-			}
-			args.userdata = &waiter->bulk_waiter;
-		} else if (args.mode == VCHIQ_BULK_MODE_WAITING) {
-			struct list_head *pos;
-			mutex_lock(&instance->bulk_waiter_list_mutex);
-			list_for_each(pos, &instance->bulk_waiter_list) {
-				if (list_entry(pos, struct bulk_waiter_node,
-					list)->pid == current->pid) {
-					waiter = list_entry(pos,
-						struct bulk_waiter_node,
-						list);
-					list_del(pos);
-					break;
-				}
+	DEBUG_INITIALISE(g_state.local);
 
-			}
-			mutex_unlock(&instance->bulk_waiter_list_mutex);
-			if (!waiter) {
-				vchiq_log_error(vchiq_arm_log_level,
-					"no bulk_waiter found for pid %d",
-					current->pid);
-				ret = -ESRCH;
-				break;
-			}
-			vchiq_log_info(vchiq_arm_log_level,
-				"found bulk_waiter %pK for pid %d", waiter,
-				current->pid);
-			args.userdata = &waiter->bulk_waiter;
-		}
-		status = vchiq_bulk_transfer
-			(args.handle,
-			 VCHI_MEM_HANDLE_INVALID,
-			 args.data, args.size,
-			 args.userdata, args.mode,
-			 dir);
-		if (!waiter)
-			break;
-		if ((status != VCHIQ_RETRY) || fatal_signal_pending(current) ||
-			!waiter->bulk_waiter.bulk) {
-			if (waiter->bulk_waiter.bulk) {
-				/* Cancel the signal when the transfer
-				** completes. */
-				spin_lock(&bulk_waiter_spinlock);
-				waiter->bulk_waiter.bulk->userdata = NULL;
-				spin_unlock(&bulk_waiter_spinlock);
-			}
-			kfree(waiter);
-		} else {
-			const VCHIQ_BULK_MODE_T mode_waiting =
-				VCHIQ_BULK_MODE_WAITING;
-			waiter->pid = current->pid;
-			mutex_lock(&instance->bulk_waiter_list_mutex);
-			list_add(&waiter->list, &instance->bulk_waiter_list);
-			mutex_unlock(&instance->bulk_waiter_list_mutex);
-			vchiq_log_info(vchiq_arm_log_level,
-				"saved bulk_waiter %pK for pid %d",
-				waiter, current->pid);
+	DEBUG_TRACE(AWAIT_COMPLETION_LINE);
 
-			if (copy_to_user((void __user *)
-				&(((VCHIQ_QUEUE_BULK_TRANSFER_T __user *)
-					arg)->mode),
-				(const void *)&mode_waiting,
-				sizeof(mode_waiting)) != 0)
-				ret = -EFAULT;
-		}
-	} break;
+	if (!instance->connected)
+		return -ENOTCONN;
 
-	case VCHIQ_IOC_AWAIT_COMPLETION: {
-		VCHIQ_AWAIT_COMPLETION_T args;
+	if (copy_from_user(&args32, (const void __user *)arg,
+			   sizeof(args32)) != 0)
+		return -EFAULT;
 
-		DEBUG_TRACE(AWAIT_COMPLETION_LINE);
-		if (!instance->connected) {
-			ret = -ENOTCONN;
-			break;
-		}
+	args.count = args32.count;
+	args.buf = (VCHIQ_COMPLETION_DATA_T __user *)(unsigned long)args32.buf;
+	args.msgbufsize = args32.msgbufsize;
+	args.msgbufcount = args32.msgbufcount;
+	args.msgbufs = (void **)(unsigned long)args32.msgbufs;
 
-		if (copy_from_user(&args, (const void __user *)arg,
-			sizeof(args)) != 0) {
-			ret = -EFAULT;
-			break;
-		}
+	mutex_lock(&instance->completion_mutex);
 
-		mutex_lock(&instance->completion_mutex);
+	DEBUG_TRACE(AWAIT_COMPLETION_LINE);
+	while ((instance->completion_remove ==
+		instance->completion_insert)
+		&& !instance->closing) {
+		int rc;
 
 		DEBUG_TRACE(AWAIT_COMPLETION_LINE);
-		while ((instance->completion_remove ==
-			instance->completion_insert)
-			&& !instance->closing) {
-			int rc;
+		mutex_unlock(&instance->completion_mutex);
+		rc = down_interruptible(&instance->insert_event);
+		mutex_lock(&instance->completion_mutex);
+		if (rc != 0) {
 			DEBUG_TRACE(AWAIT_COMPLETION_LINE);
+			vchiq_log_info(vchiq_arm_log_level,
+				       "AWAIT_COMPLETION interrupted");
 			mutex_unlock(&instance->completion_mutex);
-			rc = down_interruptible(&instance->insert_event);
-			mutex_lock(&instance->completion_mutex);
-			if (rc != 0) {
-				DEBUG_TRACE(AWAIT_COMPLETION_LINE);
-				vchiq_log_info(vchiq_arm_log_level,
-					"AWAIT_COMPLETION interrupted");
-				ret = -EINTR;
-				break;
-			}
+			DEBUG_TRACE(AWAIT_COMPLETION_LINE);
+			return -EINTR;
 		}
-		DEBUG_TRACE(AWAIT_COMPLETION_LINE);
-
-		/* A read memory barrier is needed to stop prefetch of a stale
-		** completion record
-		*/
-		rmb();
-
-		if (ret == 0) {
-			int msgbufcount = args.msgbufcount;
-			for (ret = 0; ret < args.count; ret++) {
-				VCHIQ_COMPLETION_DATA_T *completion;
-				VCHIQ_SERVICE_T *service;
-				USER_SERVICE_T *user_service;
-				VCHIQ_HEADER_T *header;
-				if (instance->completion_remove ==
-					instance->completion_insert)
-					break;
-				completion = &instance->completions[
-					instance->completion_remove &
-					(MAX_COMPLETIONS - 1)];
-
-				service = completion->service_userdata;
-				user_service = service->base.userdata;
-				completion->service_userdata =
-					user_service->userdata;
-
-				header = completion->header;
-				if (header) {
-					void __user *msgbuf;
-					int msglen;
-
-					msglen = header->size +
-						sizeof(VCHIQ_HEADER_T);
-					/* This must be a VCHIQ-style service */
-					if (args.msgbufsize < msglen) {
-						vchiq_log_error(
-							vchiq_arm_log_level,
-							"header %pK: msgbufsize %x < msglen %x",
-							header, args.msgbufsize,
-							msglen);
-						WARN(1, "invalid message "
-							"size\n");
-						if (ret == 0)
-							ret = -EMSGSIZE;
-						break;
-					}
-					if (msgbufcount <= 0)
-						/* Stall here for lack of a
-						** buffer for the message. */
-						break;
-					/* Get the pointer from user space */
-					msgbufcount--;
-					if (copy_from_user(&msgbuf,
-						(const void __user *)
-						&args.msgbufs[msgbufcount],
-						sizeof(msgbuf)) != 0) {
-						if (ret == 0)
-							ret = -EFAULT;
-						break;
-					}
-
-					/* Copy the message to user space */
-					if (copy_to_user(msgbuf, header,
-						msglen) != 0) {
-						if (ret == 0)
-							ret = -EFAULT;
-						break;
-					}
-
-					/* Now it has been copied, the message
-					** can be released. */
-					vchiq_release_message(service->handle,
-						header);
+	}
+	DEBUG_TRACE(AWAIT_COMPLETION_LINE);
 
-					/* The completion must point to the
-					** msgbuf. */
-					completion->header = msgbuf;
-				}
+	/*
+	 * A read memory barrier is needed to stop prefetch of a stale
+	 * completion record.
+	 */
+	rmb();
 
-				if ((completion->reason ==
-					VCHIQ_SERVICE_CLOSED) &&
-					!instance->use_close_delivered)
-					unlock_service(service);
-
-				if (copy_to_user((void __user *)(
-					(size_t)args.buf +
-					ret * sizeof(VCHIQ_COMPLETION_DATA_T)),
-					completion,
-					sizeof(VCHIQ_COMPLETION_DATA_T)) != 0) {
-						if (ret == 0)
-							ret = -EFAULT;
-					break;
-				}
+	msgbufcount = args.msgbufcount;
+	for (ret = 0; ret < args.count; ret++) {
+		VCHIQ_COMPLETION_DATA_T *completion;
+		struct vchiq_completion_data32 completion32;
+		VCHIQ_SERVICE_T *service;
+		USER_SERVICE_T *user_service;
+		VCHIQ_HEADER_T *header;
 
-				instance->completion_remove++;
+		if (instance->completion_remove == instance->completion_insert)
+			break;
+		completion = &instance->completions[
+			instance->completion_remove &
+			(MAX_COMPLETIONS - 1)];
+
+		service = completion->service_userdata;
+		user_service = service->base.userdata;
+		completion->service_userdata = user_service->userdata;
+
+		header = completion->header;
+		if (header) {
+			void __user *msgbuf;
+			u32 msgbuf32;
+			int msglen;
+
+			msglen = header->size + sizeof(VCHIQ_HEADER_T);
+			/* This must be a VCHIQ-style service */
+			if (args.msgbufsize < msglen) {
+				vchiq_log_error(
+					vchiq_arm_log_level,
+					"header %pK: msgbufsize %x < msglen %x",
+					header, args.msgbufsize,
+					msglen);
+				WARN(1, "invalid message size\n");
+				if (ret == 0)
+					ret = -EMSGSIZE;
+				break;
 			}
-
-			if (msgbufcount != args.msgbufcount) {
-				if (copy_to_user((void __user *)
-					&((VCHIQ_AWAIT_COMPLETION_T *)arg)->
-						msgbufcount,
-					&msgbufcount,
-					sizeof(msgbufcount)) != 0) {
+			if (msgbufcount <= 0)
+				/* Stall here for lack of a
+				 ** buffer for the message. */
+				break;
+			/* Get the pointer from user space */
+			msgbufcount--;
+
+			if (copy_from_user(&msgbuf32,
+					   (const void __user *)
+						(void *)(args.msgbufs) +
+						(sizeof(u32) * msgbufcount),
+					    sizeof(msgbuf32)) != 0) {
+				if (ret == 0)
 					ret = -EFAULT;
-				}
+				break;
 			}
-		}
 
-		if (ret != 0)
-			up(&instance->remove_event);
-		mutex_unlock(&instance->completion_mutex);
-		DEBUG_TRACE(AWAIT_COMPLETION_LINE);
-	} break;
+			msgbuf = (void *)(unsigned long)msgbuf32;
 
-	case VCHIQ_IOC_DEQUEUE_MESSAGE: {
-		VCHIQ_DEQUEUE_MESSAGE_T args;
-		USER_SERVICE_T *user_service;
-		VCHIQ_HEADER_T *header;
+			/* Copy the message to user space */
+			if (copy_to_user(msgbuf, header,
+					 msglen) != 0) {
+				if (ret == 0)
+					ret = -EFAULT;
+				break;
+			}
 
-		DEBUG_TRACE(DEQUEUE_MESSAGE_LINE);
-		if (copy_from_user
-			 (&args, (const void __user *)arg,
-			  sizeof(args)) != 0) {
-			ret = -EFAULT;
-			break;
-		}
-		service = find_service_for_instance(instance, args.handle);
-		if (!service) {
-			ret = -EINVAL;
-			break;
+			/*
+			 * Now it has been copied, the message
+			 * can be released.
+			 */
+			vchiq_release_message(service->handle,
+					      header);
+
+			/*
+			 * The completion must point to the
+			 * msgbuf.
+			 */
+			completion->header = msgbuf;
 		}
-		user_service = (USER_SERVICE_T *)service->base.userdata;
-		if (user_service->is_vchi == 0) {
-			ret = -EINVAL;
+
+		if ((completion->reason ==
+			VCHIQ_SERVICE_CLOSED) &&
+			!instance->use_close_delivered)
+			unlock_service(service);
+
+		completion32.reason = completion->reason;
+		completion32.header = (u32)(unsigned long)completion->header;
+		completion32.service_userdata =
+			(u32)(unsigned long)completion->service_userdata;
+		completion32.bulk_userdata =
+			(u32)(unsigned long)completion->bulk_userdata;
+
+		if (copy_to_user((void __user *)(
+			(void *)args.buf +
+			ret * sizeof(struct vchiq_completion_data32)),
+			&completion32,
+			sizeof(struct vchiq_completion_data32)) != 0) {
+			if (ret == 0)
+				ret = -EFAULT;
 			break;
 		}
 
-		spin_lock(&msg_queue_spinlock);
-		if (user_service->msg_remove == user_service->msg_insert) {
-			if (!args.blocking) {
-				spin_unlock(&msg_queue_spinlock);
-				DEBUG_TRACE(DEQUEUE_MESSAGE_LINE);
-				ret = -EWOULDBLOCK;
-				break;
-			}
-			user_service->dequeue_pending = 1;
-			do {
-				spin_unlock(&msg_queue_spinlock);
-				DEBUG_TRACE(DEQUEUE_MESSAGE_LINE);
-				if (down_interruptible(
-					&user_service->insert_event) != 0) {
-					vchiq_log_info(vchiq_arm_log_level,
-						"DEQUEUE_MESSAGE interrupted");
-					ret = -EINTR;
-					break;
-				}
-				spin_lock(&msg_queue_spinlock);
-			} while (user_service->msg_remove ==
-				user_service->msg_insert);
+		instance->completion_remove++;
+	}
 
-			if (ret)
-				break;
-		}
+	if (msgbufcount != args.msgbufcount) {
+		if (copy_to_user((void __user *)
+				 &((struct vchiq_await_completion32 *)arg)->
+				 msgbufcount,
+				 &msgbufcount,
+				 sizeof(msgbufcount)) != 0)
+			ret = -EFAULT;
+	}
 
-		BUG_ON((int)(user_service->msg_insert -
-			user_service->msg_remove) < 0);
+	if (ret != 0)
+		up(&instance->remove_event);
+	mutex_unlock(&instance->completion_mutex);
+	DEBUG_TRACE(AWAIT_COMPLETION_LINE);
 
-		header = user_service->msg_queue[user_service->msg_remove &
-			(MSG_QUEUE_SIZE - 1)];
-		user_service->msg_remove++;
-		spin_unlock(&msg_queue_spinlock);
+	return ret;
+}
 
-		up(&user_service->remove_event);
-		if (header == NULL)
-			ret = -ENOTCONN;
-		else if (header->size <= args.bufsize) {
-			/* Copy to user space if msgbuf is not NULL */
-			if ((args.buf == NULL) ||
-				(copy_to_user((void __user *)args.buf,
-				header->data,
-				header->size) == 0)) {
-				ret = header->size;
-				vchiq_release_message(
-					service->handle,
-					header);
-			} else
-				ret = -EFAULT;
-		} else {
-			vchiq_log_error(vchiq_arm_log_level,
-				"header %pK: bufsize %x < size %x",
-				header, args.bufsize, header->size);
-			WARN(1, "invalid size\n");
-			ret = -EMSGSIZE;
-		}
-		DEBUG_TRACE(DEQUEUE_MESSAGE_LINE);
-	} break;
+static long
+vchiq_ioctl_compat_dequeue_message(VCHIQ_INSTANCE_T instance,
+				   unsigned int cmd,
+				   unsigned long arg)
+{
+	struct vchiq_dequeue_message32 args32;
 
-	case VCHIQ_IOC_GET_CLIENT_ID: {
-		VCHIQ_SERVICE_HANDLE_T handle = (VCHIQ_SERVICE_HANDLE_T)arg;
+	if (copy_from_user(&args32, (const void __user *)arg,
+			   sizeof(args32)) != 0)
+		return -EFAULT;
 
-		ret = vchiq_get_client_id(handle);
-	} break;
+	return vchiq_ioctl_dequeue_message_internal(instance,
+						    args32.handle,
+						    args32.blocking,
+						    args32.bufsize,
+						    (void __user *)
+						    (unsigned long)args32.buf);
+}
 
-	case VCHIQ_IOC_GET_CONFIG: {
-		VCHIQ_GET_CONFIG_T args;
-		VCHIQ_CONFIG_T config;
+static long
+vchiq_ioctl_compat_get_config(VCHIQ_INSTANCE_T instance,
+			      unsigned int cmd,
+			      unsigned long arg)
+{
+	struct vchiq_get_config32 args32;
+	VCHIQ_CONFIG_T config;
+	VCHIQ_STATUS_T status = VCHIQ_SUCCESS;
 
-		if (copy_from_user(&args, (const void __user *)arg,
-			sizeof(args)) != 0) {
-			ret = -EFAULT;
-			break;
-		}
-		if (args.config_size > sizeof(config)) {
-			ret = -EINVAL;
-			break;
-		}
-		status = vchiq_get_config(instance, args.config_size, &config);
-		if (status == VCHIQ_SUCCESS) {
-			if (copy_to_user((void __user *)args.pconfig,
-				    &config, args.config_size) != 0) {
-				ret = -EFAULT;
-				break;
-			}
-		}
-	} break;
+	if (copy_from_user(&args32, (const void __user *)arg,
+			   sizeof(args32)) != 0)
+		return -EFAULT;
 
-	case VCHIQ_IOC_SET_SERVICE_OPTION: {
-		VCHIQ_SET_SERVICE_OPTION_T args;
+	if (args32.config_size > sizeof(config))
+		return -EFAULT;
 
-		if (copy_from_user(
-			&args, (const void __user *)arg,
-			sizeof(args)) != 0) {
-			ret = -EFAULT;
-			break;
-		}
+	status = vchiq_get_config(instance, args32.config_size, &config);
 
-		service = find_service_for_instance(instance, args.handle);
-		if (!service) {
-			ret = -EINVAL;
-			break;
-		}
+	if (status != VCHIQ_SUCCESS)
+		return vchiq_map_status(status);
 
-		status = vchiq_set_service_option(
-				args.handle, args.option, args.value);
-	} break;
+	if (copy_to_user((void __user *)(unsigned long)args32.pconfig,
+			 &config, args32.config_size) != 0)
+		return -EFAULT;
 
-	case VCHIQ_IOC_DUMP_PHYS_MEM: {
-		VCHIQ_DUMP_MEM_T  args;
+	return 0;
+}
 
-		if (copy_from_user
-			 (&args, (const void __user *)arg,
-			  sizeof(args)) != 0) {
-			ret = -EFAULT;
-			break;
-		}
-		dump_phys_mem(args.virt_addr, args.num_bytes);
-	} break;
+static long
+vchiq_ioctl_compat_dump_phys_mem(VCHIQ_INSTANCE_T instance,
+				 unsigned int cmd,
+				 unsigned long arg)
+{
+	struct vchiq_dump_mem32 args32;
 
-	case VCHIQ_IOC_LIB_VERSION: {
-		unsigned int lib_version = (unsigned int)arg;
+	if (copy_from_user
+		(&args32, (const void __user *)arg,
+		sizeof(args32)) != 0)
+		return -EFAULT;
 
-		if (lib_version < VCHIQ_VERSION_MIN)
-			ret = -EINVAL;
-		else if (lib_version >= VCHIQ_VERSION_CLOSE_DELIVERED)
-			instance->use_close_delivered = 1;
-	} break;
+	dump_phys_mem((void *)(unsigned long)args32.virt_addr,
+		      args32.num_bytes);
 
-	case VCHIQ_IOC_CLOSE_DELIVERED: {
-		VCHIQ_SERVICE_HANDLE_T handle = (VCHIQ_SERVICE_HANDLE_T)arg;
+	return 0;
+}
 
-		service = find_closed_service_for_instance(instance, handle);
-		if (service != NULL) {
-			USER_SERVICE_T *user_service =
-				(USER_SERVICE_T *)service->base.userdata;
-			close_delivered(user_service);
-		}
-		else
-			ret = -EINVAL;
-	} break;
+static const struct vchiq_ioctl_entry vchiq_ioctl_compat_table[] = {
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_CONNECT, vchiq_ioctl_connect),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_SHUTDOWN, vchiq_ioctl_shutdown),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_CREATE_SERVICE32, vchiq_ioctl_compat_create_service),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_CLOSE_SERVICE, vchiq_ioctl_close_service),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_REMOVE_SERVICE, vchiq_ioctl_remove_service),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_USE_SERVICE, vchiq_ioctl_use_service),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_RELEASE_SERVICE, vchiq_ioctl_release_service),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_QUEUE_MESSAGE32, vchiq_ioctl_compat_queue_message),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_QUEUE_BULK_TRANSMIT32, vchiq_ioctl_compat_bulk_transfer),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_QUEUE_BULK_RECEIVE32, vchiq_ioctl_compat_bulk_transfer),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_AWAIT_COMPLETION32, vchiq_ioctl_compat_await_completion),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_DEQUEUE_MESSAGE32, vchiq_ioctl_compat_dequeue_message),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_GET_CLIENT_ID, vchiq_ioctl_get_client_handle),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_GET_CONFIG32, vchiq_ioctl_compat_get_config),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_SET_SERVICE_OPTION, vchiq_ioctl_set_service_option),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_DUMP_PHYS_MEM32, vchiq_ioctl_compat_dump_phys_mem),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_LIB_VERSION, vchiq_ioctl_lib_version),
+	VCHIQ_MK_IOCTL(VCHIQ_IOC_CLOSE_DELIVERED, vchiq_ioctl_close_delivered)
+};
 
-	default:
-		ret = -ENOTTY;
-		break;
-	}
 
-	if (service)
-		unlock_service(service);
+/****************************************************************************
+*
+*   vchiq_ioctl
+*
+***************************************************************************/
+static long
+vchiq_ioctl_compat(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	VCHIQ_INSTANCE_T instance = file->private_data;
+	long ret = 0;
+	unsigned int ioctl_nr;
 
-	if (ret == 0) {
-		if (status == VCHIQ_ERROR)
-			ret = -EIO;
-		else if (status == VCHIQ_RETRY)
-			ret = -EINTR;
+	ioctl_nr = _IOC_NR(cmd);
+
+	vchiq_log_trace(vchiq_arm_log_level,
+			"vchiq_ioctl(compat) - instance %pK, cmd %s, arg %lx",
+			instance,
+			((_IOC_TYPE(cmd) == VCHIQ_IOC_MAGIC) &&
+			(ioctl_nr <= VCHIQ_IOC_MAX)) ?
+			ioctl_names[ioctl_nr] : "<invalid>", arg);
+
+	if ((ioctl_nr > VCHIQ_IOC_MAX) ||
+	    (vchiq_ioctl_compat_table[ioctl_nr].ioctl != cmd)) {
+		ret = -ENOTTY;
+	} else {
+		ret = vchiq_ioctl_compat_table[ioctl_nr].func(instance, cmd, arg);
 	}
 
-	if ((status == VCHIQ_SUCCESS) && (ret < 0) && (ret != -EINTR) &&
-		(ret != -EWOULDBLOCK))
+	if ((ret < 0) && (ret != -EINTR) && (ret != -EWOULDBLOCK))
 		vchiq_log_info(vchiq_arm_log_level,
-			"  ioctl instance %lx, cmd %s -> status %d, %ld",
-			(unsigned long)instance,
-			(_IOC_NR(cmd) <= VCHIQ_IOC_MAX) ?
-				ioctl_names[_IOC_NR(cmd)] :
-				"<invalid>",
-			status, ret);
+			       "  ioctl(compat) instance %lx, cmd %s, %ld",
+			       (unsigned long)instance,
+			       (ioctl_nr <= VCHIQ_IOC_MAX) ?
+			       ioctl_names[ioctl_nr] :
+			       "<invalid>",
+			       ret);
 	else
 		vchiq_log_trace(vchiq_arm_log_level,
-			"  ioctl instance %lx, cmd %s -> status %d, %ld",
-			(unsigned long)instance,
-			(_IOC_NR(cmd) <= VCHIQ_IOC_MAX) ?
-				ioctl_names[_IOC_NR(cmd)] :
+				"  ioctl(compat) instance %lx, cmd %s -> %ld",
+				(unsigned long)instance,
+				(ioctl_nr <= VCHIQ_IOC_MAX) ?
+				ioctl_names[ioctl_nr] :
 				"<invalid>",
-			status, ret);
+				ret);
 
 	return ret;
 }
 
+#endif
+
 /****************************************************************************
 *
 *   vchiq_open
@@ -1660,6 +2345,9 @@ static const struct file_operations
 vchiq_fops = {
 	.owner = THIS_MODULE,
 	.unlocked_ioctl = vchiq_ioctl,
+#if defined(CONFIG_COMPAT)
+	.compat_ioctl = vchiq_ioctl_compat,
+#endif
 	.open = vchiq_open,
 	.release = vchiq_release,
 	.read = vchiq_read
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_if.h b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_if.h
index 377e8e4..5bb9d12 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_if.h
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_if.h
@@ -93,6 +93,11 @@ typedef struct {
 	unsigned int size;
 } VCHIQ_ELEMENT_T;
 
+struct vchiq_element32 {
+	u32 data;
+	unsigned int size;
+};
+
 typedef unsigned int VCHIQ_SERVICE_HANDLE_T;
 
 typedef VCHIQ_STATUS_T (*VCHIQ_CALLBACK_T)(VCHIQ_REASON_T, VCHIQ_HEADER_T *,
@@ -104,6 +109,12 @@ typedef struct vchiq_service_base_struct {
 	void *userdata;
 } VCHIQ_SERVICE_BASE_T;
 
+struct vchiq_service_base32 {
+	int fourcc;
+	u32 callback;
+	u32 userdata;
+};
+
 typedef struct vchiq_service_params_struct {
 	int fourcc;
 	VCHIQ_CALLBACK_T callback;
@@ -112,6 +123,14 @@ typedef struct vchiq_service_params_struct {
 	short version_min;   /* Update for incompatible changes */
 } VCHIQ_SERVICE_PARAMS_T;
 
+struct vchiq_service_params32 {
+	int fourcc;
+	u32 callback;
+	u32 userdata;
+	short version;       /* Increment for non-trivial changes */
+	short version_min;   /* Update for incompatible changes */
+};
+
 typedef struct vchiq_config_struct {
 	unsigned int max_msg_size;
 	unsigned int bulk_threshold; /* The message size above which it
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_ioctl.h b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_ioctl.h
index 6137ae9..412ab0f 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_ioctl.h
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_ioctl.h
@@ -47,12 +47,25 @@ typedef struct {
 	unsigned int handle;       /* OUT */
 } VCHIQ_CREATE_SERVICE_T;
 
+struct vchiq_create_service32 {
+	struct vchiq_service_params32 params;
+	int is_open;
+	int is_vchi;
+	unsigned int handle;       /* OUT */
+};
+
 typedef struct {
 	unsigned int handle;
 	unsigned int count;
 	const VCHIQ_ELEMENT_T *elements;
 } VCHIQ_QUEUE_MESSAGE_T;
 
+struct vchiq_queue_message32 {
+	unsigned int handle;
+	unsigned int count;
+	u32 elements;
+};
+
 typedef struct {
 	unsigned int handle;
 	void *data;
@@ -61,6 +74,14 @@ typedef struct {
 	VCHIQ_BULK_MODE_T mode;
 } VCHIQ_QUEUE_BULK_TRANSFER_T;
 
+struct vchiq_queue_bulk_transfer32 {
+	unsigned int handle;
+	u32 data;
+	unsigned int size;
+	u32 userdata;
+	VCHIQ_BULK_MODE_T mode;
+};
+
 typedef struct {
 	VCHIQ_REASON_T reason;
 	VCHIQ_HEADER_T *header;
@@ -68,6 +89,13 @@ typedef struct {
 	void *bulk_userdata;
 } VCHIQ_COMPLETION_DATA_T;
 
+struct vchiq_completion_data32 {
+	VCHIQ_REASON_T reason;
+	u32 header;
+	u32 service_userdata;
+	u32 bulk_userdata;
+};
+
 typedef struct {
 	unsigned int count;
 	VCHIQ_COMPLETION_DATA_T *buf;
@@ -76,6 +104,14 @@ typedef struct {
 	void **msgbufs;
 } VCHIQ_AWAIT_COMPLETION_T;
 
+struct vchiq_await_completion32 {
+	unsigned int count;
+	u32 buf;
+	unsigned int msgbufsize;
+	unsigned int msgbufcount; /* IN/OUT */
+	u32 msgbufs;
+};
+
 typedef struct {
 	unsigned int handle;
 	int blocking;
@@ -83,11 +119,23 @@ typedef struct {
 	void *buf;
 } VCHIQ_DEQUEUE_MESSAGE_T;
 
+struct vchiq_dequeue_message32 {
+	unsigned int handle;
+	int blocking;
+	unsigned int bufsize;
+	u32 buf;
+};
+
 typedef struct {
 	unsigned int config_size;
 	VCHIQ_CONFIG_T *pconfig;
 } VCHIQ_GET_CONFIG_T;
 
+struct vchiq_get_config32 {
+	unsigned int config_size;
+	u32 pconfig;
+};
+
 typedef struct {
 	unsigned int handle;
 	VCHIQ_SERVICE_OPTION_T option;
@@ -99,24 +147,43 @@ typedef struct {
 	size_t    num_bytes;
 } VCHIQ_DUMP_MEM_T;
 
+struct vchiq_dump_mem32 {
+	u32 virt_addr;
+	u32 num_bytes;
+};
+
 #define VCHIQ_IOC_CONNECT              _IO(VCHIQ_IOC_MAGIC,   0)
 #define VCHIQ_IOC_SHUTDOWN             _IO(VCHIQ_IOC_MAGIC,   1)
 #define VCHIQ_IOC_CREATE_SERVICE \
 	_IOWR(VCHIQ_IOC_MAGIC, 2, VCHIQ_CREATE_SERVICE_T)
+#define VCHIQ_IOC_CREATE_SERVICE32 \
+	_IOWR(VCHIQ_IOC_MAGIC, 2, struct vchiq_create_service32)
 #define VCHIQ_IOC_REMOVE_SERVICE       _IO(VCHIQ_IOC_MAGIC,   3)
 #define VCHIQ_IOC_QUEUE_MESSAGE \
 	_IOW(VCHIQ_IOC_MAGIC,  4, VCHIQ_QUEUE_MESSAGE_T)
+#define VCHIQ_IOC_QUEUE_MESSAGE32 \
+	_IOW(VCHIQ_IOC_MAGIC,  4, struct vchiq_queue_message32)
 #define VCHIQ_IOC_QUEUE_BULK_TRANSMIT \
 	_IOWR(VCHIQ_IOC_MAGIC, 5, VCHIQ_QUEUE_BULK_TRANSFER_T)
+#define VCHIQ_IOC_QUEUE_BULK_TRANSMIT32 \
+	_IOWR(VCHIQ_IOC_MAGIC, 5, struct vchiq_queue_bulk_transfer32)
 #define VCHIQ_IOC_QUEUE_BULK_RECEIVE \
 	_IOWR(VCHIQ_IOC_MAGIC, 6, VCHIQ_QUEUE_BULK_TRANSFER_T)
+#define VCHIQ_IOC_QUEUE_BULK_RECEIVE32 \
+	_IOWR(VCHIQ_IOC_MAGIC, 6, struct vchiq_queue_bulk_transfer32)
 #define VCHIQ_IOC_AWAIT_COMPLETION \
 	_IOWR(VCHIQ_IOC_MAGIC, 7, VCHIQ_AWAIT_COMPLETION_T)
+#define VCHIQ_IOC_AWAIT_COMPLETION32 \
+	_IOWR(VCHIQ_IOC_MAGIC, 7, struct vchiq_await_completion32)
 #define VCHIQ_IOC_DEQUEUE_MESSAGE \
 	_IOWR(VCHIQ_IOC_MAGIC, 8, VCHIQ_DEQUEUE_MESSAGE_T)
+#define VCHIQ_IOC_DEQUEUE_MESSAGE32 \
+	_IOWR(VCHIQ_IOC_MAGIC, 8, struct vchiq_dequeue_message32)
 #define VCHIQ_IOC_GET_CLIENT_ID        _IO(VCHIQ_IOC_MAGIC,   9)
 #define VCHIQ_IOC_GET_CONFIG \
 	_IOWR(VCHIQ_IOC_MAGIC, 10, VCHIQ_GET_CONFIG_T)
+#define VCHIQ_IOC_GET_CONFIG32 \
+	_IOWR(VCHIQ_IOC_MAGIC, 10, struct vchiq_get_config32)
 #define VCHIQ_IOC_CLOSE_SERVICE        _IO(VCHIQ_IOC_MAGIC,   11)
 #define VCHIQ_IOC_USE_SERVICE          _IO(VCHIQ_IOC_MAGIC,   12)
 #define VCHIQ_IOC_RELEASE_SERVICE      _IO(VCHIQ_IOC_MAGIC,   13)
@@ -124,6 +191,8 @@ typedef struct {
 	_IOW(VCHIQ_IOC_MAGIC,  14, VCHIQ_SET_SERVICE_OPTION_T)
 #define VCHIQ_IOC_DUMP_PHYS_MEM \
 	_IOW(VCHIQ_IOC_MAGIC,  15, VCHIQ_DUMP_MEM_T)
+#define VCHIQ_IOC_DUMP_PHYS_MEM32 \
+	_IOW(VCHIQ_IOC_MAGIC,  15, struct vchiq_dump_mem32)
 #define VCHIQ_IOC_LIB_VERSION          _IO(VCHIQ_IOC_MAGIC,   16)
 #define VCHIQ_IOC_CLOSE_DELIVERED      _IO(VCHIQ_IOC_MAGIC,   17)
 #define VCHIQ_IOC_MAX                  17
-- 
2.10.2

^ permalink raw reply related

* Summary of LPC guest MSI discussion in Santa Fe
From: Will Deacon @ 2016-11-09 20:31 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161109192303.GD15676@cbox>

On Wed, Nov 09, 2016 at 08:23:03PM +0100, Christoffer Dall wrote:
> On Wed, Nov 09, 2016 at 01:59:07PM -0500, Don Dutile wrote:
> > On 11/09/2016 12:03 PM, Will Deacon wrote:
> > >On Tue, Nov 08, 2016 at 09:52:33PM -0500, Don Dutile wrote:
> > >>On 11/08/2016 06:35 PM, Alex Williamson wrote:
> > >>>Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then
> > >>>it's potentially a DMA target and we'll get bogus data on DMA read from
> > >>>the device, and lose data and potentially trigger spurious interrupts on
> > >>>DMA write from the device.  Thanks,
> > >>>
> > >>That's b/c the MSI doorbells are not positioned *above* the SMMU, i.e.,
> > >>they address match before the SMMU checks are done.  if
> > >>all DMA addrs had to go through SMMU first, then the DMA access could
> > >>be ignored/rejected.
> > >
> > >That's actually not true :( The SMMU can't generally distinguish between MSI
> > >writes and DMA writes, so it would just see a write transaction to the
> > >doorbell address, regardless of how it was generated by the endpoint.
> > >
> > So, we have real systems where MSI doorbells are placed at the same IOVA
> > that could have memory for a guest
> 
> I don't think this is a property of a hardware system.  THe problem is
> userspace not knowing where in the IOVA space the kernel is going to
> place the doorbell, so you can end up (basically by chance) that some
> IPA range of guest memory overlaps with the IOVA space for the doorbell.

I think the case that Don has in mind is where the host is using the SMMU
for DMA mapping. In that case, yes, the IOVAs assigned by things like
dma_map_single mustn't collide with any fixed MSI mapping. We currently take
care to avoid PCI windows, but nobody has added the code for the fixed MSI
mappings yet (I think we should put the onus on the people with the broken
systems for that). Depending on how the firmware describes the fixed MSI
address, either the irqchip driver can take care of it in compose_msi_msg,
or we could do something in the iommu_dma_map_msi_msg path to ensure that
the fixed region is preallocated in the msi_page_list.

I'm less fussed about this issue because there's not a user ABI involved,
so it can all be fixed later.

> >, but not at the same IOVA as memory on real hw ?
> 
> On real hardware without an IOMMU the system designer would have to
> separate the IOVA and RAM in the physical address space.  With an IOMMU,
> the SMMU driver just makes sure to allocate separate regions in the IOVA
> space.
> 
> The challenge, as I understand it, happens with the VM, because the VM
> doesn't allocate the IOVA for the MSI doorbell itself, but the host
> kernel does this, independently from the attributes (e.g. memory map) of
> the VM.
> 
> Because the IOVA is a single resource, but with two independent entities
> allocating chunks of it (the host kernel for the MSI doorbell IOVA, and
> the VFIO user for other DMA operations), you have to provide some
> coordination between those to entities to avoid conflicts.  In the case
> of KVM, the two entities are the host kernel and the VFIO user (QEMU/the
> VM), and the host kernel informs the VFIO user to never attempt to use
> the doorbell IOVA already reserved by the host kernel for DMA.
> 
> One way to do that is to ensure that the IPA space of the VFIO user
> corresponding to the doorbell IOVA is simply not valid, ie. the reserved
> regions that avoid for example QEMU to allocate RAM there.
> 
> (I suppose it's technically possible to get around this issue by letting
> QEMU place RAM wherever it wants but tell the guest to never use a
> particular subset of its RAM for DMA, because that would conflict with
> the doorbell IOVA or be seen as p2p transactions.  But I think we all
> probably agree that it's a disgusting idea.)

Disgusting, yes, but Ben's idea of hotplugging on the host controller with
firmware tables describing the reserved regions is something that we could
do in the distant future. In the meantime, I don't think that VFIO should
explicitly reject overlapping mappings if userspace asks for them.

Will

^ permalink raw reply

* [PATCHv2] PCI: QDF2432 32 bit config space accessors
From: Ard Biesheuvel @ 2016-11-09 20:29 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161109200635.GM14322@bhelgaas-glaptop.roam.corp.google.com>

Hi Bjorn,

On 9 November 2016 at 20:06, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Wed, Nov 09, 2016 at 02:25:56PM -0500, Christopher Covington wrote:
>> Hi Bjorn,
>>
[...]
>>
>> We're working to add the PNP0C02 resource to future firmware, but it's
>> not in the current firmware. Are dmesg and /proc/iomem from the
>> current firmware interesting or should we wait for the update to file?
>
> Note that the ECAM space is not the only thing that should be
> described via these PNP0C02 devices.  *All* non-enumerable resources
> should be described by the _CRS method of some ACPI device.  Here's a
> sample from my laptop:
>
>   PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000)
>   system 00:01: [io  0x1800-0x189f] could not be reserved
>   system 00:01: [io  0x0800-0x087f] has been reserved
>   system 00:01: [io  0x0880-0x08ff] has been reserved
>   system 00:01: [io  0x0900-0x097f] has been reserved
>   system 00:01: [io  0x0980-0x09ff] has been reserved
>   system 00:01: [io  0x0a00-0x0a7f] has been reserved
>   system 00:01: [io  0x0a80-0x0aff] has been reserved
>   system 00:01: [io  0x0b00-0x0b7f] has been reserved
>   system 00:01: [io  0x0b80-0x0bff] has been reserved
>   system 00:01: [io  0x15e0-0x15ef] has been reserved
>   system 00:01: [io  0x1600-0x167f] has been reserved
>   system 00:01: [io  0x1640-0x165f] has been reserved
>   system 00:01: [mem 0xf8000000-0xfbffffff] could not be reserved
>   system 00:01: [mem 0xfed10000-0xfed13fff] has been reserved
>   system 00:01: [mem 0xfed18000-0xfed18fff] has been reserved
>   system 00:01: [mem 0xfed19000-0xfed19fff] has been reserved
>   system 00:01: [mem 0xfeb00000-0xfebfffff] has been reserved
>   system 00:01: [mem 0xfed20000-0xfed3ffff] has been reserved
>   system 00:01: [mem 0xfed90000-0xfed93fff] has been reserved
>   system 00:01: [mem 0xf7fe0000-0xf7ffffff] has been reserved
>   system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
>
> Do you have firmware in the field that may not get updated?  If so,
> I'd like to see the whole solution for that firmware, including the
> MCFG quirk (which tells the PCI core where the ECAM region is) and
> whatever PNP0C02 quirk you figure out to actually reserve the region.
>
> I proposed a PNP0C02 quirk to Duc along these lines of the below.  I
> don't actually know if it's feasible, but it didn't look as bad as I
> expected, so I'd kind of like somebody to try it out.  I think you
> would have to call this via a DMI hook (do you have DMI on arm64?),
> maybe from pnp_init() or similar.
>

We do have SMBIOS/DMI on arm64, but we have been successful so far not
to rely on it for quirks, and we'd very much like to keep it that way.

Since this ACPI _CRS method has nothing to do with SMBIOS/DMI, surely
there is a better way to wire up the reservation code to the
information exposed by ACPI?

-- 
Ard.

>   struct pnp_protocol pnpquirk_protocol = {
>     .name = "Plug and Play Quirks",
>   };
>
>   void quirk()
>   {
>     struct pnp_dev *dev;
>     struct resource res;
>
>     ret = pnp_register_protocol(&pnpquirk_protocol);
>     if (ret)
>       return;
>
>     dev = pnp_alloc_dev(&pnpquirk_protocol, 0, "PNP0C02");
>     if (!dev)
>       return;
>
>     res.start = XX;          /* ECAM start */
>     res.end = YY;            /* ECAM end */
>     res.flags = IORESOURCE_MEM;
>     pnp_add_resource(dev, &res);
>
>     dev->active = 1;
>     pnp_add_device(dev);
>
>     dev_info(&dev->dev, "fabricated device to reserve ECAM space %pR\n", &res);
>   }
>
> Bjorn

^ permalink raw reply

* Summary of LPC guest MSI discussion in Santa Fe
From: Robin Murphy @ 2016-11-09 20:11 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <582371FB.2040808@redhat.com>

On 09/11/16 18:59, Don Dutile wrote:
> On 11/09/2016 12:03 PM, Will Deacon wrote:
>> On Tue, Nov 08, 2016 at 09:52:33PM -0500, Don Dutile wrote:
>>> On 11/08/2016 06:35 PM, Alex Williamson wrote:
>>>> On Tue, 8 Nov 2016 21:29:22 +0100
>>>> Christoffer Dall <christoffer.dall@linaro.org> wrote:
>>>>> Is my understanding correct, that you need to tell userspace about the
>>>>> location of the doorbell (in the IOVA space) in case (2), because even
>>>>> though the configuration of the device is handled by the (host) kernel
>>>>> through trapping of the BARs, we have to avoid the VFIO user
>>>>> programming
>>>>> the device to create other DMA transactions to this particular
>>>>> address,
>>>>> since that will obviously conflict and either not produce the desired
>>>>> DMA transactions or result in unintended weird interrupts?
>>
>> Yes, that's the crux of the issue.
>>
>>>> Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then
>>>> it's potentially a DMA target and we'll get bogus data on DMA read from
>>>> the device, and lose data and potentially trigger spurious
>>>> interrupts on
>>>> DMA write from the device.  Thanks,
>>>>
>>> That's b/c the MSI doorbells are not positioned *above* the SMMU, i.e.,
>>> they address match before the SMMU checks are done.  if
>>> all DMA addrs had to go through SMMU first, then the DMA access could
>>> be ignored/rejected.
>>
>> That's actually not true :( The SMMU can't generally distinguish
>> between MSI
>> writes and DMA writes, so it would just see a write transaction to the
>> doorbell address, regardless of how it was generated by the endpoint.
>>
>> Will
>>
> So, we have real systems where MSI doorbells are placed at the same IOVA
> that could have memory for a guest, but not at the same IOVA as memory
> on real hw ?

MSI doorbells integral to PCIe root complexes (and thus untranslatable)
typically have a programmable address, so could be anywhere. In the more
general category of "special hardware addresses", QEMU's default ARM
guest memory map puts RAM starting at 0x40000000; on the ARM Juno
platform, that happens to be where PCI config space starts; as Juno's
PCIe doesn't support ACS, peer-to-peer or anything clever, if you assign
the PCI bus to a guest (all of it, given the lack of ACS), the root
complex just sees the guest's attempts to DMA to "memory" as the device
attempting to access config space and aborts them.

> How are memory holes passed to SMMU so it doesn't have this issue for
> bare-metal
> (assign an IOVA that overlaps an MSI doorbell address)?

When we *are* in full control of the IOVA space, we just carve out what
we can find as best we can - see iova_reserve_pci_windows() in
dma-iommu.c, which isn't really all that different to what x86 does
(e.g. init_reserved_iova_ranges() in amd-iommu.c). Note that we don't
actually have any way currently to discover upstream MSI doorbells
(ponder dw_pcie_msi_init() in pcie-designware.c for an example of the
problem) - the specific MSI support we have in DMA ops at the moment
only covers GICv2m or GICv3 ITS downstream of translation, but
fortunately that's the typical relevant use-case on current platforms.

Robin.

^ permalink raw reply

* [PATCH 10/13] ARM: dts: exynos: replace to "max-frequecy" instead of "clock-freq-min-max"
From: Krzysztof Kozlowski @ 2016-11-09 20:10 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <72612112-3b79-8fd3-8be4-a8f60ab3b68a@samsung.com>

On Mon, Nov 07, 2016 at 09:38:15AM +0900, Jaehoon Chung wrote:
> On 11/05/2016 12:04 AM, Krzysztof Kozlowski wrote:
> > On Fri, Nov 04, 2016 at 12:19:49PM +0100, Heiko Stuebner wrote:
> >> Hi Jaehoon,
> >>
> >> Am Freitag, 4. November 2016, 19:21:30 CET schrieb Jaehoon Chung:
> >>> On 11/04/2016 03:41 AM, Krzysztof Kozlowski wrote:
> >>>> On Thu, Nov 03, 2016 at 03:21:32PM +0900, Jaehoon Chung wrote:
> >>>>> In drivers/mmc/core/host.c, there is "max-frequency" property.
> >>>>> It should be same behavior. So Use the "max-frequency" instead of
> >>>>> "clock-freq-min-max".
> >>>>>
> >>>>> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
> >>>>> ---
> >>>>>
> >>>>>  arch/arm/boot/dts/exynos3250-artik5-eval.dts | 2 +-
> >>>>>  arch/arm/boot/dts/exynos3250-artik5.dtsi     | 2 +-
> >>>>>  arch/arm/boot/dts/exynos3250-monk.dts        | 2 +-
> >>>>>  arch/arm/boot/dts/exynos3250-rinato.dts      | 2 +-
> >>>>>  4 files changed, 4 insertions(+), 4 deletions(-)
> >>>>
> >>>> This looks totally independent to rest of patches so it can be applied
> >>>> separately without any functional impact (except lack of minimum
> >>>> frequency). Is that correct?
> >>>
> >>> You're right. I will split the patches. And will resend.
> >>> Thanks!
> >>
> >> I think what Krzysztof was asking was just if he can simply pick up this patch 
> >> alone, as it does not require any of the previous changes.
> >>
> >> Same is true for the Rockchip patches I guess, so we could just take them 
> >> individually into samsung/rockchip dts branches.
> > 
> > Yes, I wanted to get exactly this information. I couldn't find it in
> > cover letter.
> 
> In drivers/mmc/core/host.c, there already is "max-frequency" property.
> It's same functionality with "clock-freq-min-max". 
> Minimum clock value can be fixed to 100K. because MMC core will check clock value from 400K to 100K.
> But max-frequency can be difference.
> If we can use "max-frequency" property, we don't need to use "clock-freq-min-max" property anymore.
> I will resend the deprecated property instead of removing "clock-freq-min-max".
> 
> If you want to pick this, it's possible to pick. Then i will resend the patches without dt patches.

Thanks, applied.

Best regards,
Krzysztof

^ permalink raw reply

* [PATCHv2] PCI: QDF2432 32 bit config space accessors
From: Bjorn Helgaas @ 2016-11-09 20:06 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <aa37ca67-9909-e8f2-1a3e-b724cdb46c7b@codeaurora.org>

On Wed, Nov 09, 2016 at 02:25:56PM -0500, Christopher Covington wrote:
> Hi Bjorn,
> 
> On 11/02/2016 12:08 PM, Bjorn Helgaas wrote:
> > On Tue, Nov 01, 2016 at 07:06:31AM -0600, cov at codeaurora.org wrote:
> >> Hi Bjorn,
> >>
> >> On 2016-10-31 15:48, Bjorn Helgaas wrote:
> >>> On Wed, Sep 21, 2016 at 06:38:05PM -0400, Christopher Covington wrote:
> >>>> The Qualcomm Technologies QDF2432 SoC does not support accesses
> >>>> smaller
> >>>> than 32 bits to the PCI configuration space. Register the appropriate
> >>>> quirk.
> >>>>
> >>>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> >>>
> >>> Hi Christopher,
> >>>
> >>> Can you rebase this against v4.9-rc1?  It no longer applies to my tree.
> >>
> >> I apologize for not being clearer. This patch depends on:
> >>
> >> PCI/ACPI: Extend pci_mcfg_lookup() responsibilities
> >> PCI/ACPI: Check platform-specific ECAM quirks
> >>
> >> These patches from Tomasz Nowicki were previously in your pci/ecam-v6
> >> branch, but that seems to have come and gone. How would you like to
> >> proceed?
> > 
> > Oh yes, that's right, I forgot that connection.  I'm afraid I kind of
> > dropped the ball on that thread, so I went back and read through it
> > again.
> > 
> > I *think* the current state is:
> > 
> >   - I'm OK with the first two patches that add the quirk
> >     infrastructure.
> > 
> >   - My issue with the last three patches that add ThunderX quirks is
> >     that there's no generic description of the ECAM address space.
> > 
> > So if I understand correctly, your Qualcomm patch depends only on the
> > first two patches.
> > 
> > Then the question is how the Qualcomm ECAM address space is described.
> > Your quirk overrides the default pci_generic_ecam_ops with the
> > &pci_32b_ops, but it doesn't touch the address space part, so I assume
> > the bus ranges and corresponding address space in your MCFG is
> > correct.  So far, so good.
> > 
> > Is there also an ACPI device that contains that space in _CRS?  I
> > think we concluded that the standard solution is to describe this with
> > a PNP0C02 device.
> > 
> > Would you mind opening a bugzilla at bugzilla.kernel.org and attaching
> > the dmesg log, /proc/iomem, and maybe a DSDT dump?  I'd like to have
> > something to point at to say "if you need an MCFG quirk, you need the
> > MCFG bit and *also* these other related ACPI device bits, and here's
> > how it should be done."
> 
> We're working to add the PNP0C02 resource to future firmware, but it's
> not in the current firmware. Are dmesg and /proc/iomem from the
> current firmware interesting or should we wait for the update to file?

Note that the ECAM space is not the only thing that should be
described via these PNP0C02 devices.  *All* non-enumerable resources
should be described by the _CRS method of some ACPI device.  Here's a
sample from my laptop:

  PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000)
  system 00:01: [io  0x1800-0x189f] could not be reserved
  system 00:01: [io  0x0800-0x087f] has been reserved
  system 00:01: [io  0x0880-0x08ff] has been reserved
  system 00:01: [io  0x0900-0x097f] has been reserved
  system 00:01: [io  0x0980-0x09ff] has been reserved
  system 00:01: [io  0x0a00-0x0a7f] has been reserved
  system 00:01: [io  0x0a80-0x0aff] has been reserved
  system 00:01: [io  0x0b00-0x0b7f] has been reserved
  system 00:01: [io  0x0b80-0x0bff] has been reserved
  system 00:01: [io  0x15e0-0x15ef] has been reserved
  system 00:01: [io  0x1600-0x167f] has been reserved
  system 00:01: [io  0x1640-0x165f] has been reserved
  system 00:01: [mem 0xf8000000-0xfbffffff] could not be reserved
  system 00:01: [mem 0xfed10000-0xfed13fff] has been reserved
  system 00:01: [mem 0xfed18000-0xfed18fff] has been reserved
  system 00:01: [mem 0xfed19000-0xfed19fff] has been reserved
  system 00:01: [mem 0xfeb00000-0xfebfffff] has been reserved
  system 00:01: [mem 0xfed20000-0xfed3ffff] has been reserved
  system 00:01: [mem 0xfed90000-0xfed93fff] has been reserved
  system 00:01: [mem 0xf7fe0000-0xf7ffffff] has been reserved
  system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)

Do you have firmware in the field that may not get updated?  If so,
I'd like to see the whole solution for that firmware, including the
MCFG quirk (which tells the PCI core where the ECAM region is) and
whatever PNP0C02 quirk you figure out to actually reserve the region.

I proposed a PNP0C02 quirk to Duc along these lines of the below.  I
don't actually know if it's feasible, but it didn't look as bad as I
expected, so I'd kind of like somebody to try it out.  I think you
would have to call this via a DMI hook (do you have DMI on arm64?),
maybe from pnp_init() or similar.

  struct pnp_protocol pnpquirk_protocol = {
    .name = "Plug and Play Quirks",
  };

  void quirk()
  {
    struct pnp_dev *dev;
    struct resource res;

    ret = pnp_register_protocol(&pnpquirk_protocol);
    if (ret)
      return;

    dev = pnp_alloc_dev(&pnpquirk_protocol, 0, "PNP0C02");
    if (!dev)
      return;

    res.start = XX;          /* ECAM start */
    res.end = YY;            /* ECAM end */
    res.flags = IORESOURCE_MEM;
    pnp_add_resource(dev, &res);

    dev->active = 1;
    pnp_add_device(dev);

    dev_info(&dev->dev, "fabricated device to reserve ECAM space %pR\n", &res);
  }

Bjorn

^ permalink raw reply

* [PATCH v3 2/2] ARM: EXYNOS: Remove unused soc_is_exynos{4,5}
From: Krzysztof Kozlowski @ 2016-11-09 20:04 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1478693755-11953-3-git-send-email-pankaj.dubey@samsung.com>

On Wed, Nov 09, 2016 at 05:45:55PM +0530, Pankaj Dubey wrote:
> As no more user of soc_is_exynos{4,5} we can safely remove them.
> 
> Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>
> Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
> ---
>  arch/arm/mach-exynos/common.h | 5 -----
>  1 file changed, 5 deletions(-)
>

Thanks, applied.

Best regards,
Krzysztof

^ permalink raw reply

* [PATCH v3 1/2] ARM: EXYNOS: Remove static mapping of SCU SFR
From: Krzysztof Kozlowski @ 2016-11-09 20:03 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1478693755-11953-2-git-send-email-pankaj.dubey@samsung.com>

On Wed, Nov 09, 2016 at 05:45:54PM +0530, Pankaj Dubey wrote:
> Lets remove static mapping of SCU SFR mainly used in CORTEX-A9 SoC based
> boards. Instead use mapping from device tree node of SCU.
> 
> NOTE: This patch has dependency on DT file of any such CORTEX-A9 SoC
> based boards, in the absence of SCU device node in DTS file, only single
> CPU will boot. So if you are using OUT-OF-TREE DTS file of CORTEX-A9 based
> Exynos SoC make sure to add SCU device node to DTS file for SMP boot.
> 
> Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>
> Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
> ---
>  arch/arm/mach-exynos/common.h                |  1 +
>  arch/arm/mach-exynos/exynos.c                | 22 ------------------
>  arch/arm/mach-exynos/include/mach/map.h      |  2 --
>  arch/arm/mach-exynos/platsmp.c               | 34 +++++++++++++++++++++-------
>  arch/arm/mach-exynos/pm.c                    |  4 +---
>  arch/arm/mach-exynos/suspend.c               |  4 +---
>  arch/arm/plat-samsung/include/plat/map-s5p.h |  4 ----
>  7 files changed, 29 insertions(+), 42 deletions(-)
>

Applied on a next/soc branch after merging next/dt to preserve
bisectability.

Best regards,
Krzysztof

^ permalink raw reply

* Summary of LPC guest MSI discussion in Santa Fe
From: Alex Williamson @ 2016-11-09 20:01 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161109192303.GD15676@cbox>

On Wed, 9 Nov 2016 20:23:03 +0100
Christoffer Dall <christoffer.dall@linaro.org> wrote:

> On Wed, Nov 09, 2016 at 01:59:07PM -0500, Don Dutile wrote:
> > On 11/09/2016 12:03 PM, Will Deacon wrote:  
> > >On Tue, Nov 08, 2016 at 09:52:33PM -0500, Don Dutile wrote:  
> > >>On 11/08/2016 06:35 PM, Alex Williamson wrote:  
> > >>>On Tue, 8 Nov 2016 21:29:22 +0100
> > >>>Christoffer Dall <christoffer.dall@linaro.org> wrote:  
> > >>>>Is my understanding correct, that you need to tell userspace about the
> > >>>>location of the doorbell (in the IOVA space) in case (2), because even
> > >>>>though the configuration of the device is handled by the (host) kernel
> > >>>>through trapping of the BARs, we have to avoid the VFIO user programming
> > >>>>the device to create other DMA transactions to this particular address,
> > >>>>since that will obviously conflict and either not produce the desired
> > >>>>DMA transactions or result in unintended weird interrupts?  
> > >
> > >Yes, that's the crux of the issue.
> > >  
> > >>>Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then
> > >>>it's potentially a DMA target and we'll get bogus data on DMA read from
> > >>>the device, and lose data and potentially trigger spurious interrupts on
> > >>>DMA write from the device.  Thanks,
> > >>>  
> > >>That's b/c the MSI doorbells are not positioned *above* the SMMU, i.e.,
> > >>they address match before the SMMU checks are done.  if
> > >>all DMA addrs had to go through SMMU first, then the DMA access could
> > >>be ignored/rejected.  
> > >
> > >That's actually not true :( The SMMU can't generally distinguish between MSI
> > >writes and DMA writes, so it would just see a write transaction to the
> > >doorbell address, regardless of how it was generated by the endpoint.
> > >
> > >Will
> > >  
> > So, we have real systems where MSI doorbells are placed at the same IOVA
> > that could have memory for a guest  
> 
> I don't think this is a property of a hardware system.  THe problem is
> userspace not knowing where in the IOVA space the kernel is going to
> place the doorbell, so you can end up (basically by chance) that some
> IPA range of guest memory overlaps with the IOVA space for the doorbell.
> 
> 
> >, but not at the same IOVA as memory on real hw ?  
> 
> On real hardware without an IOMMU the system designer would have to
> separate the IOVA and RAM in the physical address space.  With an IOMMU,
> the SMMU driver just makes sure to allocate separate regions in the IOVA
> space.
> 
> The challenge, as I understand it, happens with the VM, because the VM
> doesn't allocate the IOVA for the MSI doorbell itself, but the host
> kernel does this, independently from the attributes (e.g. memory map) of
> the VM.
> 
> Because the IOVA is a single resource, but with two independent entities
> allocating chunks of it (the host kernel for the MSI doorbell IOVA, and
> the VFIO user for other DMA operations), you have to provide some
> coordination between those to entities to avoid conflicts.  In the case
> of KVM, the two entities are the host kernel and the VFIO user (QEMU/the
> VM), and the host kernel informs the VFIO user to never attempt to use
> the doorbell IOVA already reserved by the host kernel for DMA.
> 
> One way to do that is to ensure that the IPA space of the VFIO user
> corresponding to the doorbell IOVA is simply not valid, ie. the reserved
> regions that avoid for example QEMU to allocate RAM there.
> 
> (I suppose it's technically possible to get around this issue by letting
> QEMU place RAM wherever it wants but tell the guest to never use a
> particular subset of its RAM for DMA, because that would conflict with
> the doorbell IOVA or be seen as p2p transactions.  But I think we all
> probably agree that it's a disgusting idea.)

Well, it's not like QEMU or libvirt stumbling through sysfs to figure
out where holes could be in order to instantiate a VM with matching
holes, just in case someone might decide to hot-add a device into the
VM, at some point, and hopefully they don't migrate the VM to another
host with a different layout first, is all that much less disgusting or
foolproof. It's just that in order to dynamically remove a page as a
possible DMA target we require a paravirt channel, such as a balloon
driver that's able to pluck a specific page.  In some ways it's
actually less disgusting, but it puts some prerequisites on
enlightening the guest OS.  Thanks,

Alex

^ permalink raw reply

* [PATCH 2/2] KVM: ARM64: Fix the issues when guest PMCCFILTR is configured
From: Wei Huang @ 2016-11-09 19:58 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1478721480-24852-1-git-send-email-wei@redhat.com>

KVM calls kvm_pmu_set_counter_event_type() when PMCCFILTR is configured.
But this function can't deals with PMCCFILTR correctly because the evtCount
bit of PMCCFILTR, which is reserved 0, conflits with the SW_INCR event
type of other PMXEVTYPER<n> registers. To fix it, when eventsel == 0, this
function shouldn't return immediately; instead it needs to check further
if select_idx is ARMV8_PMU_CYCLE_IDX.

Another issue is that KVM shouldn't copy the eventsel bits of PMCCFILTER
blindly to attr.config. Instead it ought to convert the request to the
"cpu cycle" event type (i.e. 0x11).

Signed-off-by: Wei Huang <wei@redhat.com>
---
 virt/kvm/arm/pmu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 6e9c40e..69ccce3 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -305,7 +305,7 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
 			continue;
 		type = vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i)
 		       & ARMV8_PMU_EVTYPE_EVENT;
-		if ((type == ARMV8_PMU_EVTYPE_EVENT_SW_INCR)
+		if ((type == ARMV8_PMUV3_PERFCTR_SW_INCR)
 		    && (enable & BIT(i))) {
 			reg = vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) + 1;
 			reg = lower_32_bits(reg);
@@ -379,7 +379,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
 	eventsel = data & ARMV8_PMU_EVTYPE_EVENT;
 
 	/* Software increment event does't need to be backed by a perf event */
-	if (eventsel == ARMV8_PMU_EVTYPE_EVENT_SW_INCR)
+	if (eventsel == ARMV8_PMUV3_PERFCTR_SW_INCR &&
+	    select_idx != ARMV8_PMU_CYCLE_IDX)
 		return;
 
 	memset(&attr, 0, sizeof(struct perf_event_attr));
@@ -391,7 +392,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
 	attr.exclude_kernel = data & ARMV8_PMU_EXCLUDE_EL1 ? 1 : 0;
 	attr.exclude_hv = 1; /* Don't count EL2 events */
 	attr.exclude_host = 1; /* Don't count host events */
-	attr.config = eventsel;
+	attr.config = (select_idx == ARMV8_PMU_CYCLE_IDX) ?
+		ARMV8_PMUV3_PERFCTR_CPU_CYCLES : eventsel;
 
 	counter = kvm_pmu_get_counter_value(vcpu, select_idx);
 	/* The initial sample period (overflow count) of an event. */
-- 
2.7.4

^ permalink raw reply related

* [PATCH 1/2] arm64: perf: Move ARMv8 PMU perf event definitions to asm/perf_event.h
From: Wei Huang @ 2016-11-09 19:57 UTC (permalink / raw)
  To: linux-arm-kernel

This patch moves ARMv8-related perf event definitions from perf_event.c
to asm/perf_event.h; so KVM code can use them directly. This also help
remove a duplicated definition of SW_INCR in perf_event.h.

Signed-off-by: Wei Huang <wei@redhat.com>
---
 arch/arm64/include/asm/perf_event.h | 161 +++++++++++++++++++++++++++++++++++-
 arch/arm64/kernel/perf_event.c      | 161 ------------------------------------
 2 files changed, 160 insertions(+), 162 deletions(-)

diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index 2065f46..6c7b18b 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -46,7 +46,166 @@
 #define	ARMV8_PMU_EVTYPE_MASK	0xc800ffff	/* Mask for writable bits */
 #define	ARMV8_PMU_EVTYPE_EVENT	0xffff		/* Mask for EVENT bits */
 
-#define ARMV8_PMU_EVTYPE_EVENT_SW_INCR	0	/* Software increment event */
+/*
+ * ARMv8 PMUv3 Performance Events handling code.
+ * Common event types.
+ */
+
+/* Required events. */
+#define ARMV8_PMUV3_PERFCTR_SW_INCR				0x00
+#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL			0x03
+#define ARMV8_PMUV3_PERFCTR_L1D_CACHE				0x04
+#define ARMV8_PMUV3_PERFCTR_BR_MIS_PRED				0x10
+#define ARMV8_PMUV3_PERFCTR_CPU_CYCLES				0x11
+#define ARMV8_PMUV3_PERFCTR_BR_PRED				0x12
+
+/* At least one of the following is required. */
+#define ARMV8_PMUV3_PERFCTR_INST_RETIRED			0x08
+#define ARMV8_PMUV3_PERFCTR_INST_SPEC				0x1B
+
+/* Common architectural events. */
+#define ARMV8_PMUV3_PERFCTR_LD_RETIRED				0x06
+#define ARMV8_PMUV3_PERFCTR_ST_RETIRED				0x07
+#define ARMV8_PMUV3_PERFCTR_EXC_TAKEN				0x09
+#define ARMV8_PMUV3_PERFCTR_EXC_RETURN				0x0A
+#define ARMV8_PMUV3_PERFCTR_CID_WRITE_RETIRED			0x0B
+#define ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED			0x0C
+#define ARMV8_PMUV3_PERFCTR_BR_IMMED_RETIRED			0x0D
+#define ARMV8_PMUV3_PERFCTR_BR_RETURN_RETIRED			0x0E
+#define ARMV8_PMUV3_PERFCTR_UNALIGNED_LDST_RETIRED		0x0F
+#define ARMV8_PMUV3_PERFCTR_TTBR_WRITE_RETIRED			0x1C
+#define ARMV8_PMUV3_PERFCTR_CHAIN				0x1E
+#define ARMV8_PMUV3_PERFCTR_BR_RETIRED				0x21
+
+/* Common microarchitectural events. */
+#define ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL			0x01
+#define ARMV8_PMUV3_PERFCTR_L1I_TLB_REFILL			0x02
+#define ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL			0x05
+#define ARMV8_PMUV3_PERFCTR_MEM_ACCESS				0x13
+#define ARMV8_PMUV3_PERFCTR_L1I_CACHE				0x14
+#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_WB			0x15
+#define ARMV8_PMUV3_PERFCTR_L2D_CACHE				0x16
+#define ARMV8_PMUV3_PERFCTR_L2D_CACHE_REFILL			0x17
+#define ARMV8_PMUV3_PERFCTR_L2D_CACHE_WB			0x18
+#define ARMV8_PMUV3_PERFCTR_BUS_ACCESS				0x19
+#define ARMV8_PMUV3_PERFCTR_MEMORY_ERROR			0x1A
+#define ARMV8_PMUV3_PERFCTR_BUS_CYCLES				0x1D
+#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_ALLOCATE			0x1F
+#define ARMV8_PMUV3_PERFCTR_L2D_CACHE_ALLOCATE			0x20
+#define ARMV8_PMUV3_PERFCTR_BR_MIS_PRED_RETIRED			0x22
+#define ARMV8_PMUV3_PERFCTR_STALL_FRONTEND			0x23
+#define ARMV8_PMUV3_PERFCTR_STALL_BACKEND			0x24
+#define ARMV8_PMUV3_PERFCTR_L1D_TLB				0x25
+#define ARMV8_PMUV3_PERFCTR_L1I_TLB				0x26
+#define ARMV8_PMUV3_PERFCTR_L2I_CACHE				0x27
+#define ARMV8_PMUV3_PERFCTR_L2I_CACHE_REFILL			0x28
+#define ARMV8_PMUV3_PERFCTR_L3D_CACHE_ALLOCATE			0x29
+#define ARMV8_PMUV3_PERFCTR_L3D_CACHE_REFILL			0x2A
+#define ARMV8_PMUV3_PERFCTR_L3D_CACHE				0x2B
+#define ARMV8_PMUV3_PERFCTR_L3D_CACHE_WB			0x2C
+#define ARMV8_PMUV3_PERFCTR_L2D_TLB_REFILL			0x2D
+#define ARMV8_PMUV3_PERFCTR_L2I_TLB_REFILL			0x2E
+#define ARMV8_PMUV3_PERFCTR_L2D_TLB				0x2F
+#define ARMV8_PMUV3_PERFCTR_L2I_TLB				0x30
+
+/* ARMv8 recommended implementation defined event types */
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_RD			0x40
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_WR			0x41
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_REFILL_RD		0x42
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_REFILL_WR		0x43
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_REFILL_INNER		0x44
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_REFILL_OUTER		0x45
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_WB_VICTIM		0x46
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_WB_CLEAN			0x47
+#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_INVAL			0x48
+
+#define ARMV8_IMPDEF_PERFCTR_L1D_TLB_REFILL_RD			0x4C
+#define ARMV8_IMPDEF_PERFCTR_L1D_TLB_REFILL_WR			0x4D
+#define ARMV8_IMPDEF_PERFCTR_L1D_TLB_RD				0x4E
+#define ARMV8_IMPDEF_PERFCTR_L1D_TLB_WR				0x4F
+#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_RD			0x50
+#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_WR			0x51
+#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_REFILL_RD		0x52
+#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_REFILL_WR		0x53
+
+#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_WB_VICTIM		0x56
+#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_WB_CLEAN			0x57
+#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_INVAL			0x58
+
+#define ARMV8_IMPDEF_PERFCTR_L2D_TLB_REFILL_RD			0x5C
+#define ARMV8_IMPDEF_PERFCTR_L2D_TLB_REFILL_WR			0x5D
+#define ARMV8_IMPDEF_PERFCTR_L2D_TLB_RD				0x5E
+#define ARMV8_IMPDEF_PERFCTR_L2D_TLB_WR				0x5F
+
+#define ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_RD			0x60
+#define ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_WR			0x61
+#define ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_SHARED			0x62
+#define ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_NOT_SHARED		0x63
+#define ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_NORMAL			0x64
+#define ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_PERIPH			0x65
+
+#define ARMV8_IMPDEF_PERFCTR_MEM_ACCESS_RD			0x66
+#define ARMV8_IMPDEF_PERFCTR_MEM_ACCESS_WR			0x67
+#define ARMV8_IMPDEF_PERFCTR_UNALIGNED_LD_SPEC			0x68
+#define ARMV8_IMPDEF_PERFCTR_UNALIGNED_ST_SPEC			0x69
+#define ARMV8_IMPDEF_PERFCTR_UNALIGNED_LDST_SPEC		0x6A
+
+#define ARMV8_IMPDEF_PERFCTR_LDREX_SPEC				0x6C
+#define ARMV8_IMPDEF_PERFCTR_STREX_PASS_SPEC			0x6D
+#define ARMV8_IMPDEF_PERFCTR_STREX_FAIL_SPEC			0x6E
+#define ARMV8_IMPDEF_PERFCTR_STREX_SPEC				0x6F
+#define ARMV8_IMPDEF_PERFCTR_LD_SPEC				0x70
+#define ARMV8_IMPDEF_PERFCTR_ST_SPEC				0x71
+#define ARMV8_IMPDEF_PERFCTR_LDST_SPEC				0x72
+#define ARMV8_IMPDEF_PERFCTR_DP_SPEC				0x73
+#define ARMV8_IMPDEF_PERFCTR_ASE_SPEC				0x74
+#define ARMV8_IMPDEF_PERFCTR_VFP_SPEC				0x75
+#define ARMV8_IMPDEF_PERFCTR_PC_WRITE_SPEC			0x76
+#define ARMV8_IMPDEF_PERFCTR_CRYPTO_SPEC			0x77
+#define ARMV8_IMPDEF_PERFCTR_BR_IMMED_SPEC			0x78
+#define ARMV8_IMPDEF_PERFCTR_BR_RETURN_SPEC			0x79
+#define ARMV8_IMPDEF_PERFCTR_BR_INDIRECT_SPEC			0x7A
+
+#define ARMV8_IMPDEF_PERFCTR_ISB_SPEC				0x7C
+#define ARMV8_IMPDEF_PERFCTR_DSB_SPEC				0x7D
+#define ARMV8_IMPDEF_PERFCTR_DMB_SPEC				0x7E
+
+#define ARMV8_IMPDEF_PERFCTR_EXC_UNDEF				0x81
+#define ARMV8_IMPDEF_PERFCTR_EXC_SVC				0x82
+#define ARMV8_IMPDEF_PERFCTR_EXC_PABORT				0x83
+#define ARMV8_IMPDEF_PERFCTR_EXC_DABORT				0x84
+
+#define ARMV8_IMPDEF_PERFCTR_EXC_IRQ				0x86
+#define ARMV8_IMPDEF_PERFCTR_EXC_FIQ				0x87
+#define ARMV8_IMPDEF_PERFCTR_EXC_SMC				0x88
+
+#define ARMV8_IMPDEF_PERFCTR_EXC_HVC				0x8A
+#define ARMV8_IMPDEF_PERFCTR_EXC_TRAP_PABORT			0x8B
+#define ARMV8_IMPDEF_PERFCTR_EXC_TRAP_DABORT			0x8C
+#define ARMV8_IMPDEF_PERFCTR_EXC_TRAP_OTHER			0x8D
+#define ARMV8_IMPDEF_PERFCTR_EXC_TRAP_IRQ			0x8E
+#define ARMV8_IMPDEF_PERFCTR_EXC_TRAP_FIQ			0x8F
+#define ARMV8_IMPDEF_PERFCTR_RC_LD_SPEC				0x90
+#define ARMV8_IMPDEF_PERFCTR_RC_ST_SPEC				0x91
+
+#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_RD			0xA0
+#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_WR			0xA1
+#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_REFILL_RD		0xA2
+#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_REFILL_WR		0xA3
+
+#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_WB_VICTIM		0xA6
+#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_WB_CLEAN			0xA7
+#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_INVAL			0xA8
+
+/* ARMv8 Cortex-A53 specific event types. */
+#define ARMV8_A53_PERFCTR_PREF_LINEFILL				0xC2
+
+/* ARMv8 Cavium ThunderX specific event types. */
+#define ARMV8_THUNDER_PERFCTR_L1D_CACHE_MISS_ST			0xE9
+#define ARMV8_THUNDER_PERFCTR_L1D_CACHE_PREF_ACCESS		0xEA
+#define ARMV8_THUNDER_PERFCTR_L1D_CACHE_PREF_MISS		0xEB
+#define ARMV8_THUNDER_PERFCTR_L1I_CACHE_PREF_ACCESS		0xEC
+#define ARMV8_THUNDER_PERFCTR_L1I_CACHE_PREF_MISS		0xED
 
 /*
  * Event filters for PMUv3
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index a9310a6..108ba40 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -29,167 +29,6 @@
 #include <linux/perf/arm_pmu.h>
 #include <linux/platform_device.h>
 
-/*
- * ARMv8 PMUv3 Performance Events handling code.
- * Common event types.
- */
-
-/* Required events. */
-#define ARMV8_PMUV3_PERFCTR_SW_INCR				0x00
-#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL			0x03
-#define ARMV8_PMUV3_PERFCTR_L1D_CACHE				0x04
-#define ARMV8_PMUV3_PERFCTR_BR_MIS_PRED				0x10
-#define ARMV8_PMUV3_PERFCTR_CPU_CYCLES				0x11
-#define ARMV8_PMUV3_PERFCTR_BR_PRED				0x12
-
-/* At least one of the following is required. */
-#define ARMV8_PMUV3_PERFCTR_INST_RETIRED			0x08
-#define ARMV8_PMUV3_PERFCTR_INST_SPEC				0x1B
-
-/* Common architectural events. */
-#define ARMV8_PMUV3_PERFCTR_LD_RETIRED				0x06
-#define ARMV8_PMUV3_PERFCTR_ST_RETIRED				0x07
-#define ARMV8_PMUV3_PERFCTR_EXC_TAKEN				0x09
-#define ARMV8_PMUV3_PERFCTR_EXC_RETURN				0x0A
-#define ARMV8_PMUV3_PERFCTR_CID_WRITE_RETIRED			0x0B
-#define ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED			0x0C
-#define ARMV8_PMUV3_PERFCTR_BR_IMMED_RETIRED			0x0D
-#define ARMV8_PMUV3_PERFCTR_BR_RETURN_RETIRED			0x0E
-#define ARMV8_PMUV3_PERFCTR_UNALIGNED_LDST_RETIRED		0x0F
-#define ARMV8_PMUV3_PERFCTR_TTBR_WRITE_RETIRED			0x1C
-#define ARMV8_PMUV3_PERFCTR_CHAIN				0x1E
-#define ARMV8_PMUV3_PERFCTR_BR_RETIRED				0x21
-
-/* Common microarchitectural events. */
-#define ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL			0x01
-#define ARMV8_PMUV3_PERFCTR_L1I_TLB_REFILL			0x02
-#define ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL			0x05
-#define ARMV8_PMUV3_PERFCTR_MEM_ACCESS				0x13
-#define ARMV8_PMUV3_PERFCTR_L1I_CACHE				0x14
-#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_WB			0x15
-#define ARMV8_PMUV3_PERFCTR_L2D_CACHE				0x16
-#define ARMV8_PMUV3_PERFCTR_L2D_CACHE_REFILL			0x17
-#define ARMV8_PMUV3_PERFCTR_L2D_CACHE_WB			0x18
-#define ARMV8_PMUV3_PERFCTR_BUS_ACCESS				0x19
-#define ARMV8_PMUV3_PERFCTR_MEMORY_ERROR			0x1A
-#define ARMV8_PMUV3_PERFCTR_BUS_CYCLES				0x1D
-#define ARMV8_PMUV3_PERFCTR_L1D_CACHE_ALLOCATE			0x1F
-#define ARMV8_PMUV3_PERFCTR_L2D_CACHE_ALLOCATE			0x20
-#define ARMV8_PMUV3_PERFCTR_BR_MIS_PRED_RETIRED			0x22
-#define ARMV8_PMUV3_PERFCTR_STALL_FRONTEND			0x23
-#define ARMV8_PMUV3_PERFCTR_STALL_BACKEND			0x24
-#define ARMV8_PMUV3_PERFCTR_L1D_TLB				0x25
-#define ARMV8_PMUV3_PERFCTR_L1I_TLB				0x26
-#define ARMV8_PMUV3_PERFCTR_L2I_CACHE				0x27
-#define ARMV8_PMUV3_PERFCTR_L2I_CACHE_REFILL			0x28
-#define ARMV8_PMUV3_PERFCTR_L3D_CACHE_ALLOCATE			0x29
-#define ARMV8_PMUV3_PERFCTR_L3D_CACHE_REFILL			0x2A
-#define ARMV8_PMUV3_PERFCTR_L3D_CACHE				0x2B
-#define ARMV8_PMUV3_PERFCTR_L3D_CACHE_WB			0x2C
-#define ARMV8_PMUV3_PERFCTR_L2D_TLB_REFILL			0x2D
-#define ARMV8_PMUV3_PERFCTR_L2I_TLB_REFILL			0x2E
-#define ARMV8_PMUV3_PERFCTR_L2D_TLB				0x2F
-#define ARMV8_PMUV3_PERFCTR_L2I_TLB				0x30
-
-/* ARMv8 recommended implementation defined event types */
-#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_RD			0x40
-#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_WR			0x41
-#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_REFILL_RD		0x42
-#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_REFILL_WR		0x43
-#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_REFILL_INNER		0x44
-#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_REFILL_OUTER		0x45
-#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_WB_VICTIM		0x46
-#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_WB_CLEAN			0x47
-#define ARMV8_IMPDEF_PERFCTR_L1D_CACHE_INVAL			0x48
-
-#define ARMV8_IMPDEF_PERFCTR_L1D_TLB_REFILL_RD			0x4C
-#define ARMV8_IMPDEF_PERFCTR_L1D_TLB_REFILL_WR			0x4D
-#define ARMV8_IMPDEF_PERFCTR_L1D_TLB_RD				0x4E
-#define ARMV8_IMPDEF_PERFCTR_L1D_TLB_WR				0x4F
-#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_RD			0x50
-#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_WR			0x51
-#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_REFILL_RD		0x52
-#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_REFILL_WR		0x53
-
-#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_WB_VICTIM		0x56
-#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_WB_CLEAN			0x57
-#define ARMV8_IMPDEF_PERFCTR_L2D_CACHE_INVAL			0x58
-
-#define ARMV8_IMPDEF_PERFCTR_L2D_TLB_REFILL_RD			0x5C
-#define ARMV8_IMPDEF_PERFCTR_L2D_TLB_REFILL_WR			0x5D
-#define ARMV8_IMPDEF_PERFCTR_L2D_TLB_RD				0x5E
-#define ARMV8_IMPDEF_PERFCTR_L2D_TLB_WR				0x5F
-
-#define ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_RD			0x60
-#define ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_WR			0x61
-#define ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_SHARED			0x62
-#define ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_NOT_SHARED		0x63
-#define ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_NORMAL			0x64
-#define ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_PERIPH			0x65
-
-#define ARMV8_IMPDEF_PERFCTR_MEM_ACCESS_RD			0x66
-#define ARMV8_IMPDEF_PERFCTR_MEM_ACCESS_WR			0x67
-#define ARMV8_IMPDEF_PERFCTR_UNALIGNED_LD_SPEC			0x68
-#define ARMV8_IMPDEF_PERFCTR_UNALIGNED_ST_SPEC			0x69
-#define ARMV8_IMPDEF_PERFCTR_UNALIGNED_LDST_SPEC		0x6A
-
-#define ARMV8_IMPDEF_PERFCTR_LDREX_SPEC				0x6C
-#define ARMV8_IMPDEF_PERFCTR_STREX_PASS_SPEC			0x6D
-#define ARMV8_IMPDEF_PERFCTR_STREX_FAIL_SPEC			0x6E
-#define ARMV8_IMPDEF_PERFCTR_STREX_SPEC				0x6F
-#define ARMV8_IMPDEF_PERFCTR_LD_SPEC				0x70
-#define ARMV8_IMPDEF_PERFCTR_ST_SPEC				0x71
-#define ARMV8_IMPDEF_PERFCTR_LDST_SPEC				0x72
-#define ARMV8_IMPDEF_PERFCTR_DP_SPEC				0x73
-#define ARMV8_IMPDEF_PERFCTR_ASE_SPEC				0x74
-#define ARMV8_IMPDEF_PERFCTR_VFP_SPEC				0x75
-#define ARMV8_IMPDEF_PERFCTR_PC_WRITE_SPEC			0x76
-#define ARMV8_IMPDEF_PERFCTR_CRYPTO_SPEC			0x77
-#define ARMV8_IMPDEF_PERFCTR_BR_IMMED_SPEC			0x78
-#define ARMV8_IMPDEF_PERFCTR_BR_RETURN_SPEC			0x79
-#define ARMV8_IMPDEF_PERFCTR_BR_INDIRECT_SPEC			0x7A
-
-#define ARMV8_IMPDEF_PERFCTR_ISB_SPEC				0x7C
-#define ARMV8_IMPDEF_PERFCTR_DSB_SPEC				0x7D
-#define ARMV8_IMPDEF_PERFCTR_DMB_SPEC				0x7E
-
-#define ARMV8_IMPDEF_PERFCTR_EXC_UNDEF				0x81
-#define ARMV8_IMPDEF_PERFCTR_EXC_SVC				0x82
-#define ARMV8_IMPDEF_PERFCTR_EXC_PABORT				0x83
-#define ARMV8_IMPDEF_PERFCTR_EXC_DABORT				0x84
-
-#define ARMV8_IMPDEF_PERFCTR_EXC_IRQ				0x86
-#define ARMV8_IMPDEF_PERFCTR_EXC_FIQ				0x87
-#define ARMV8_IMPDEF_PERFCTR_EXC_SMC				0x88
-
-#define ARMV8_IMPDEF_PERFCTR_EXC_HVC				0x8A
-#define ARMV8_IMPDEF_PERFCTR_EXC_TRAP_PABORT			0x8B
-#define ARMV8_IMPDEF_PERFCTR_EXC_TRAP_DABORT			0x8C
-#define ARMV8_IMPDEF_PERFCTR_EXC_TRAP_OTHER			0x8D
-#define ARMV8_IMPDEF_PERFCTR_EXC_TRAP_IRQ			0x8E
-#define ARMV8_IMPDEF_PERFCTR_EXC_TRAP_FIQ			0x8F
-#define ARMV8_IMPDEF_PERFCTR_RC_LD_SPEC				0x90
-#define ARMV8_IMPDEF_PERFCTR_RC_ST_SPEC				0x91
-
-#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_RD			0xA0
-#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_WR			0xA1
-#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_REFILL_RD		0xA2
-#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_REFILL_WR		0xA3
-
-#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_WB_VICTIM		0xA6
-#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_WB_CLEAN			0xA7
-#define ARMV8_IMPDEF_PERFCTR_L3D_CACHE_INVAL			0xA8
-
-/* ARMv8 Cortex-A53 specific event types. */
-#define ARMV8_A53_PERFCTR_PREF_LINEFILL				0xC2
-
-/* ARMv8 Cavium ThunderX specific event types. */
-#define ARMV8_THUNDER_PERFCTR_L1D_CACHE_MISS_ST			0xE9
-#define ARMV8_THUNDER_PERFCTR_L1D_CACHE_PREF_ACCESS		0xEA
-#define ARMV8_THUNDER_PERFCTR_L1D_CACHE_PREF_MISS		0xEB
-#define ARMV8_THUNDER_PERFCTR_L1I_CACHE_PREF_ACCESS		0xEC
-#define ARMV8_THUNDER_PERFCTR_L1I_CACHE_PREF_MISS		0xED
-
 /* PMUv3 HW events mapping. */
 
 /*
-- 
2.7.4

^ permalink raw reply related

* [PATCH] arm64: mm: Fix memmap to be initialized for the entire section
From: Robert Richter @ 2016-11-09 19:51 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161107210514.GP20591@arm.com>

Will,

On 07.11.16 21:05:14, Will Deacon wrote:
> Just to reiterate here, but your patch as it stands will break other parts
> of the kernel. For example, acpi_os_ioremap relies on being able to ioremap
> these regions afaict.
> 
> I think any solution involving pfn_valid is just going to move the crash
> around.

Let me describe the crash more detailed.

Following range is marked nomap (full efi map below):

[    0.000000] efi:   0x010ffffb9000-0x010ffffccfff [Runtime Code       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000] efi:   0x010ffffcd000-0x010fffffefff [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]*

The mem belongs to this nodes:

[    0.000000] NUMA: Adding memblock [0x1400000 - 0xfffffffff] on node 0
[    0.000000] NUMA: Adding memblock [0x10000400000 - 0x10fffffffff] on node 1

The following mem ranges are created:

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000001400000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x0000010fffffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000001400000-0x00000000fffdffff]
[    0.000000]   node   0: [mem 0x00000000fffe0000-0x00000000ffffffff]
[    0.000000]   node   0: [mem 0x0000000100000000-0x0000000fffffffff]
[    0.000000]   node   1: [mem 0x0000010000400000-0x0000010ff9e7ffff]
[    0.000000]   node   1: [mem 0x0000010ff9e80000-0x0000010ff9f1ffff]
[    0.000000]   node   1: [mem 0x0000010ff9f20000-0x0000010ffaeaffff]
[    0.000000]   node   1: [mem 0x0000010ffaeb0000-0x0000010ffaffffff]
[    0.000000]   node   1: [mem 0x0000010ffb000000-0x0000010ffffaffff]
[    0.000000]   node   1: [mem 0x0000010ffffb0000-0x0000010fffffffff]

The last range 0x0000010ffffb0000-0x0000010fffffffff is correctly
marked as a nomap area.

Paging is then initalized in free_area_init_core() (mm/page_alloc.c)
which calls memmap_init_zone(). This initializes all pages (struct
page) of the zones for each node except for pfns from nomap memory. It
uses early_pfn_valid() which is bound to pfn_valid().

In 4.4 *all* pages of a zone were initialized. Now page from nomap
ranges are skipped, and e.g. pfn 0x10ffffb has an uninitalized struct
page.  Note that nomap mem ranges are still part of the memblock list
and also part of the zone ranges within start_pfn and end_pfn, but
those don't have a valid struct page now. IMO this is a bug.

Later on all mappable memory is reserved and all pages of it are freed
by adding them to the free-pages list except for the reserved
memblocks.

Now the initrd is loaded. This reserves free memory by calling
move_freepages_block(). A block has the size of pageblock_nr_pages
which is in my configuration 0x2000. The kernel choses the block

 0x0000010fe0000000-0x0000010fffffffff

that also contains the nomap region around 10ffb000000.

In move_freepages() the pages of the zone are freed. The code accesses
pfn #10fffff but the struct page is uninitialized and thus node is 0.
This is different to #10fe000 and the zone which is node 1. This
causes the BUG_ON() to trigger:

[    4.173521] Unpacking initramfs...
[    8.420710] ------------[ cut here ]------------
[    8.425344] kernel BUG at mm/page_alloc.c:1844!
[    8.429870] Internal error: Oops - BUG: 0 [#1] SMP

I believe this is not the only case where the mm code relies on a
valid struct page for all pfns of a zone, thus modifying mm code to
make this case work is not an option. (E.g. this could be done by
moving the early_pfn_valid() check at the beginning of the loop in
memmap_init_zone().)

So for a fix we need to change early_pfn_valid() which is mapped to
pfn_valid() for SPARSEMEM. Using a different function than pfn_valid()
for this does not look reasonable. The only option is to revert
pfn_valid() to it's original bahaviour.

My fix now uses again memblock_is_memory() for pfn_valid() instead of
memblock_is_map_memory(). So I needed to change some users of
pfn_valid() to use memblock_is_map_memory() there necessary. This is
only required for arch specific code, all other uses of pfn_valid()
are save to use memblock_is_memory().

Thus, I don't see where my patch breaks code. Even acpi_os_ioremap()
keeps the same behaviour as before since it still uses memblock_is_
memory(). Could you more describe your concerns why do you think this
patch breaks the kernel and moves the problem somewhere else? I
believe it fixes the problem at all.

Thanks,

-Robert



[    0.000000] efi: Processing EFI memory map:
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x1ffe800000 reserved size = 0x10537
[    0.000000]  memory.cnt  = 0x2
[    0.000000]  memory[0x0]     [0x00000001400000-0x00000fffffffff], 0xffec00000 bytes flags: 0x0
[    0.000000]  memory[0x1]     [0x00010000400000-0x00010fffffffff], 0xfffc00000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x1
[    0.000000]  reserved[0x0]   [0x00000021200000-0x00000021210536], 0x10537 bytes flags: 0x0
[    0.000000] efi:   0x000001400000-0x00000147ffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00000001400000-0x0000000147ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x000001480000-0x0000024affff [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00000001480000-0x000000024affff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x0000024b0000-0x0000211fffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x000000024b0000-0x000000211fffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x000021200000-0x00002121ffff [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00000021200000-0x0000002121ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x000021220000-0x0000fffebfff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00000021220000-0x000000fffeffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x0000fffec000-0x0000ffff5fff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000] memblock_add: [0x000000fffe0000-0x000000ffffffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x0000ffff6000-0x0000ffff6fff [ACPI Memory NVS    |   |  |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000] memblock_add: [0x000000ffff0000-0x000000ffffffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x0000ffff7000-0x0000ffffffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000] memblock_add: [0x000000ffff0000-0x000000ffffffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x000100000000-0x000ff7ffffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00000100000000-0x00000ff7ffffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x000ff8000000-0x000ff801ffff [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00000ff8000000-0x00000ff801ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x000ff8020000-0x000fffa9efff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00000ff8020000-0x00000fffa9ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x000fffa9f000-0x000fffffffff [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00000fffa90000-0x00000fffffffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010000400000-0x010f812b3fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010000400000-0x00010f812bffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010f812b4000-0x010f812b6fff [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010f812b0000-0x00010f812bffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010f812b7000-0x010f822e6fff [Loader Code        |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010f812b0000-0x00010f822effff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010f822e7000-0x010f822f6fff [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010f822e0000-0x00010f822fffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010f822f7000-0x010f82342fff [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010f822f0000-0x00010f8234ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010f82343000-0x010f91e73fff [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010f82340000-0x00010f91e7ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010f91e74000-0x010f91e74fff [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010f91e70000-0x00010f91e7ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010f91e75000-0x010f92c98fff [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010f91e70000-0x00010f92c9ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010f92c99000-0x010f93880fff [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010f92c90000-0x00010f9388ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010f93881000-0x010ff7880fff [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010f93880000-0x00010ff788ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ff7881000-0x010ff7886fff [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010ff7880000-0x00010ff788ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ff7887000-0x010ff78a3fff [Loader Code        |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010ff7880000-0x00010ff78affff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ff78a4000-0x010ff9e8dfff [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010ff78a0000-0x00010ff9e8ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ff9e8e000-0x010ff9f16fff [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000] memblock_add: [0x00010ff9e80000-0x00010ff9f1ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ff9f17000-0x010ffaeb5fff [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010ff9f10000-0x00010ffaebffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ffaeb6000-0x010ffafc8fff [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000] memblock_add: [0x00010ffaeb0000-0x00010ffafcffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ffafc9000-0x010ffafccfff [Runtime Code       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000] memblock_add: [0x00010ffafc0000-0x00010ffafcffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ffafcd000-0x010ffaff4fff [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000] memblock_add: [0x00010ffafc0000-0x00010ffaffffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ffaff5000-0x010ffb008fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010ffaff0000-0x00010ffb00ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ffb009000-0x010fffe28fff [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010ffb000000-0x00010fffe2ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010fffe29000-0x010fffe3ffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010fffe20000-0x00010fffe3ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010fffe40000-0x010fffe53fff [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010fffe40000-0x00010fffe5ffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010fffe54000-0x010ffffb8fff [Boot Code          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010fffe50000-0x00010ffffbffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ffffb9000-0x010ffffccfff [Runtime Code       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000] memblock_add: [0x00010ffffb0000-0x00010ffffcffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ffffcd000-0x010fffffefff [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000] memblock_add: [0x00010ffffc0000-0x00010fffffffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x010ffffff000-0x010fffffffff [Boot Data          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] memblock_add: [0x00010fffff0000-0x00010fffffffff] flags 0x0 early_init_dt_add_memory_arch+0x54/0x5c
[    0.000000] efi:   0x804000001000-0x804000001fff [Memory Mapped I/O  |RUN|  |  |  |  |  |  |   |  |  |  |UC]
[    0.000000] efi:   0x87e0d0001000-0x87e0d0001fff [Memory Mapped I/O  |RUN|  |  |  |  |  |  |   |  |  |  |UC]
[    0.000000] memblock_reserve: [0x00010f812b0000-0x00010f812bffff] flags 0x0 efi_init+0x208/0x2c8
[    0.000000] memblock_add: [0x00010f82340000-0x00010f91e7ffff] flags 0x0 arm64_memblock_init+0x16c/0x248
[    0.000000] memblock_reserve: [0x00010f82340000-0x00010f91e7ffff] flags 0x0 arm64_memblock_init+0x178/0x248
[    0.000000] memblock_reserve: [0x00000001480000-0x000000024affff] flags 0x0 arm64_memblock_init+0x1b0/0x248
[    0.000000] memblock_reserve: [0x00010f82343000-0x00010f91e73613] flags 0x0 arm64_memblock_init+0x1cc/0x248
[    0.000000] memblock_reserve: [0x000000c0000000-0x000000dfffffff] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] cma: Reserved 512 MiB at 0x00000000c0000000
[    0.000000] memblock_reserve: [0x00010ffffa0000-0x00010ffffaffff] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] memblock_reserve: [0x00010ffff90000-0x00010ffff9ffff] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] memblock_reserve: [0x00010ffff80000-0x00010ffff8ffff] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] memblock_reserve: [0x00010ffff70000-0x00010ffff7ffff] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] memblock_reserve: [0x00010ffff60000-0x00010ffff6ffff] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] memblock_reserve: [0x00010ffff50000-0x00010ffff5ffff] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] memblock_reserve: [0x00010ffff40000-0x00010ffff4ffff] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] memblock_reserve: [0x00010ffff30000-0x00010ffff3ffff] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000]    memblock_free: [0x00010ffffa0000-0x00010ffffaffff] paging_init+0x5c0/0x610
[    0.000000]    memblock_free: [0x00000002490000-0x000000024affff] paging_init+0x5f4/0x610
[    0.000000] memblock_reserve: [0x00010fffefed80-0x00010ffff2fffb] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] memblock_reserve: [0x00010ffffaffd0-0x00010ffffafffe] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] memblock_reserve: [0x00010ffffaffa0-0x00010ffffaffce] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] memblock_reserve: [0x00010ffffa0000-0x00010ffffa000f] flags 0x0 numa_init+0x88/0x3f0
[    0.000000] NUMA: Adding memblock [0x1400000 - 0xfffffffff] on node 0
[    0.000000] NUMA: Adding memblock [0x10000400000 - 0x10fffffffff] on node 1
[    0.000000] NUMA: parsing numa-distance-map-v1
[    0.000000] NUMA: Initmem setup node 0 [mem 0x01400000-0xfffffffff]
[    0.000000] memblock_reserve: [0x00000fffffe500-0x00000fffffffff] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] NUMA: NODE_DATA [mem 0xfffffe500-0xfffffffff]
[    0.000000] NUMA: Initmem setup node 1 [mem 0x10000400000-0x10fffffffff]
[    0.000000] memblock_reserve: [0x00010ffffae480-0x00010ffffaff7f] flags 0x0 memblock_alloc_range_nid+0x30/0x58
[    0.000000] NUMA: NODE_DATA [mem 0x10ffffae480-0x10ffffaff7f]
...
[    8.420710] ------------[ cut here ]------------
[    8.425344] kernel BUG at mm/page_alloc.c:1844!
[    8.429870] Internal error: Oops - BUG: 0 [#1] SMP
[    8.434654] Modules linked in:
[    8.437712] CPU: 72 PID: 1 Comm: swapper/0 Tainted: G        W       4.8.1.4.vanilla10-00007-g9eb3a76b8b88 #111
[    8.447788] Hardware name: www.cavium.com ThunderX CRB-2S/ThunderX CRB-2S, BIOS 0.3 Sep 13 2016
[    8.456474] task: ffff800fee626800 task.stack: ffff800fec02c000
[    8.462404] PC is at move_freepages+0x198/0x1b0
[    8.466924] LR is at move_freepages_block+0xa8/0xb8
[    8.471791] pc : [<ffff0000081e5dd0>] lr : [<ffff0000081e5e90>] pstate: 000000c5
[    8.479174] sp : ffff800fec02f4d0
[    8.482478] x29: ffff800fec02f4d0 x28: 0000000000000001 
[    8.487786] x27: 0000000000000000 x26: 0000000000000000 
[    8.493093] x25: 000000000000000a x24: ffff7fe043fd0020 
[    8.498401] x23: ffff810ffffaec00 x22: 0000000000000000 
[    8.503708] x21: ffff7fe043ffffc0 x20: 0000000000000000 
[    8.509015] x19: ffff7fe043f80000 x18: ffff8100102f2548 
[    8.514322] x17: 0000000000000000 x16: 0000000100000000 
[    8.519630] x15: 0000000000000007 x14: 0000010200000000 
[    8.524938] x13: 0004a41400000000 x12: 00032c920000001a 
[    8.530245] x11: 0000010200000000 x10: 0004a40800000000 
[    8.535553] x9 : 0000000000000040 x8 : 0000000000000000 
[    8.540860] x7 : 0000000000000000 x6 : ffff810ffffaf120 
[    8.546168] x5 : 0000000000000001 x4 : ffff000008d70c48 
[    8.551475] x3 : ffff810ffffae480 x2 : 0000000000000000 
[    8.556782] x1 : 0000000000000001 x0 : ffff810ffffaec00 
[    8.562089] 
[    8.563571] Process swapper/0 (pid: 1, stack limit = 0xffff800fec02c020)
[    8.570261] Stack: (0xffff800fec02f4d0 to 0xffff800fec030000)
...
[    9.279652] Call trace:
[    9.282091] Exception stack(0xffff800fec02f300 to 0xffff800fec02f430)
[    9.288520] f300: ffff7fe043f80000 0001000000000000 ffff800fec02f4d0 ffff0000081e5dd0
[    9.296339] f320: 0000000000000001 ffff800fee626800 0000000000000000 00000000026012d0
[    9.304157] f340: 0000000100000000 ffffffffffffffff 0000000000000130 0000000000000040
[    9.311976] f360: 0000001200000000 ffff000008cf8198 0000000100000000 0000000000000001
[    9.319794] f380: 00000000026012d0 ffff810ff9a1e8e8 0000000000000000 0000000000000040
[    9.327613] f3a0: ffff810ffffaec00 0000000000000001 0000000000000000 ffff810ffffae480
[    9.335431] f3c0: ffff000008d70c48 0000000000000001 ffff810ffffaf120 0000000000000000
[    9.343250] f3e0: 0000000000000000 0000000000000040 0004a40800000000 0000010200000000
[    9.351068] f400: 00032c920000001a 0004a41400000000 0000010200000000 0000000000000007
[    9.358885] f420: 0000000100000000 0000000000000000
[    9.363756] [<ffff0000081e5dd0>] move_freepages+0x198/0x1b0
[    9.369318] [<ffff0000081e5e90>] move_freepages_block+0xa8/0xb8
[    9.375227] [<ffff0000081e657c>] __rmqueue+0x604/0x650
[    9.380355] [<ffff0000081e7898>] get_page_from_freelist+0x3f0/0xb88
[    9.386612] [<ffff0000081e860c>] __alloc_pages_nodemask+0x12c/0xce8
[    9.392876] [<ffff00000823dc1c>] alloc_page_interleave+0x64/0xc0
[    9.398873] [<ffff00000823e2f8>] alloc_pages_current+0x108/0x168
[    9.404870] [<ffff0000081de1cc>] __page_cache_alloc+0x104/0x140
[    9.410779] [<ffff0000081de31c>] pagecache_get_page+0x114/0x300
[    9.416688] [<ffff0000081de550>] grab_cache_page_write_begin+0x48/0x68
[    9.423207] [<ffff00000828e5f8>] simple_write_begin+0x40/0x150
[    9.429029] [<ffff0000081ddfe0>] generic_perform_write+0xb8/0x1a0
[    9.435112] [<ffff0000081df8c8>] __generic_file_write_iter+0x170/0x1b0
[    9.441628] [<ffff0000081df9d4>] generic_file_write_iter+0xcc/0x1c8
[    9.447893] [<ffff000008262eb4>] __vfs_write+0xcc/0x140
[    9.453108] [<ffff000008263b84>] vfs_write+0xa4/0x1c0
[    9.458150] [<ffff000008264bc4>] SyS_write+0x54/0xb0
[    9.463108] [<ffff000008be1fe4>] xwrite+0x34/0x7c
[    9.467801] [<ffff000008be20c8>] do_copy+0x9c/0xf4
[    9.472583] [<ffff000008be1da4>] write_buffer+0x34/0x50
[    9.477796] [<ffff000008be1e08>] flush_buffer+0x48/0xb4
[    9.483016] [<ffff000008c0fe08>] __gunzip+0x288/0x334
[    9.488057] [<ffff000008c0fecc>] gunzip+0x18/0x20
[    9.492750] [<ffff000008be26bc>] unpack_to_rootfs+0x168/0x284
[    9.498486] [<ffff000008be2848>] populate_rootfs+0x70/0x138
[    9.504051] [<ffff000008082ff4>] do_one_initcall+0x44/0x138
[    9.509613] [<ffff000008be0d04>] kernel_init_freeable+0x1ac/0x24c
[    9.515699] [<ffff000008847f20>] kernel_init+0x20/0xf8
[    9.520826] [<ffff000008082b80>] ret_from_fork+0x10/0x50
[    9.526129] Code: 910a0021 9400b47d d4210000 d503201f (d4210000) 
[    9.532233] ---[ end trace c3040dccdcf12d3a ]---
[    9.536889] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

^ permalink raw reply

* [PATCHv2] PCI: QDF2432 32 bit config space accessors
From: Christopher Covington @ 2016-11-09 19:25 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20161102160820.GA6568@bhelgaas-glaptop.roam.corp.google.com>

Hi Bjorn,

On 11/02/2016 12:08 PM, Bjorn Helgaas wrote:
> On Tue, Nov 01, 2016 at 07:06:31AM -0600, cov at codeaurora.org wrote:
>> Hi Bjorn,
>>
>> On 2016-10-31 15:48, Bjorn Helgaas wrote:
>>> On Wed, Sep 21, 2016 at 06:38:05PM -0400, Christopher Covington wrote:
>>>> The Qualcomm Technologies QDF2432 SoC does not support accesses
>>>> smaller
>>>> than 32 bits to the PCI configuration space. Register the appropriate
>>>> quirk.
>>>>
>>>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>>>
>>> Hi Christopher,
>>>
>>> Can you rebase this against v4.9-rc1?  It no longer applies to my tree.
>>
>> I apologize for not being clearer. This patch depends on:
>>
>> PCI/ACPI: Extend pci_mcfg_lookup() responsibilities
>> PCI/ACPI: Check platform-specific ECAM quirks
>>
>> These patches from Tomasz Nowicki were previously in your pci/ecam-v6
>> branch, but that seems to have come and gone. How would you like to
>> proceed?
> 
> Oh yes, that's right, I forgot that connection.  I'm afraid I kind of
> dropped the ball on that thread, so I went back and read through it
> again.
> 
> I *think* the current state is:
> 
>   - I'm OK with the first two patches that add the quirk
>     infrastructure.
> 
>   - My issue with the last three patches that add ThunderX quirks is
>     that there's no generic description of the ECAM address space.
> 
> So if I understand correctly, your Qualcomm patch depends only on the
> first two patches.
> 
> Then the question is how the Qualcomm ECAM address space is described.
> Your quirk overrides the default pci_generic_ecam_ops with the
> &pci_32b_ops, but it doesn't touch the address space part, so I assume
> the bus ranges and corresponding address space in your MCFG is
> correct.  So far, so good.
> 
> Is there also an ACPI device that contains that space in _CRS?  I
> think we concluded that the standard solution is to describe this with
> a PNP0C02 device.
> 
> Would you mind opening a bugzilla at bugzilla.kernel.org and attaching
> the dmesg log, /proc/iomem, and maybe a DSDT dump?  I'd like to have
> something to point at to say "if you need an MCFG quirk, you need the
> MCFG bit and *also* these other related ACPI device bits, and here's
> how it should be done."

We're working to add the PNP0C02 resource to future firmware, but it's
not in the current firmware. Are dmesg and /proc/iomem from the
current firmware interesting or should we wait for the update to file?

Thanks,
Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* Summary of LPC guest MSI discussion in Santa Fe
From: Christoffer Dall @ 2016-11-09 19:23 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <582371FB.2040808@redhat.com>

On Wed, Nov 09, 2016 at 01:59:07PM -0500, Don Dutile wrote:
> On 11/09/2016 12:03 PM, Will Deacon wrote:
> >On Tue, Nov 08, 2016 at 09:52:33PM -0500, Don Dutile wrote:
> >>On 11/08/2016 06:35 PM, Alex Williamson wrote:
> >>>On Tue, 8 Nov 2016 21:29:22 +0100
> >>>Christoffer Dall <christoffer.dall@linaro.org> wrote:
> >>>>Is my understanding correct, that you need to tell userspace about the
> >>>>location of the doorbell (in the IOVA space) in case (2), because even
> >>>>though the configuration of the device is handled by the (host) kernel
> >>>>through trapping of the BARs, we have to avoid the VFIO user programming
> >>>>the device to create other DMA transactions to this particular address,
> >>>>since that will obviously conflict and either not produce the desired
> >>>>DMA transactions or result in unintended weird interrupts?
> >
> >Yes, that's the crux of the issue.
> >
> >>>Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then
> >>>it's potentially a DMA target and we'll get bogus data on DMA read from
> >>>the device, and lose data and potentially trigger spurious interrupts on
> >>>DMA write from the device.  Thanks,
> >>>
> >>That's b/c the MSI doorbells are not positioned *above* the SMMU, i.e.,
> >>they address match before the SMMU checks are done.  if
> >>all DMA addrs had to go through SMMU first, then the DMA access could
> >>be ignored/rejected.
> >
> >That's actually not true :( The SMMU can't generally distinguish between MSI
> >writes and DMA writes, so it would just see a write transaction to the
> >doorbell address, regardless of how it was generated by the endpoint.
> >
> >Will
> >
> So, we have real systems where MSI doorbells are placed at the same IOVA
> that could have memory for a guest

I don't think this is a property of a hardware system.  THe problem is
userspace not knowing where in the IOVA space the kernel is going to
place the doorbell, so you can end up (basically by chance) that some
IPA range of guest memory overlaps with the IOVA space for the doorbell.


>, but not at the same IOVA as memory on real hw ?

On real hardware without an IOMMU the system designer would have to
separate the IOVA and RAM in the physical address space.  With an IOMMU,
the SMMU driver just makes sure to allocate separate regions in the IOVA
space.

The challenge, as I understand it, happens with the VM, because the VM
doesn't allocate the IOVA for the MSI doorbell itself, but the host
kernel does this, independently from the attributes (e.g. memory map) of
the VM.

Because the IOVA is a single resource, but with two independent entities
allocating chunks of it (the host kernel for the MSI doorbell IOVA, and
the VFIO user for other DMA operations), you have to provide some
coordination between those to entities to avoid conflicts.  In the case
of KVM, the two entities are the host kernel and the VFIO user (QEMU/the
VM), and the host kernel informs the VFIO user to never attempt to use
the doorbell IOVA already reserved by the host kernel for DMA.

One way to do that is to ensure that the IPA space of the VFIO user
corresponding to the doorbell IOVA is simply not valid, ie. the reserved
regions that avoid for example QEMU to allocate RAM there.

(I suppose it's technically possible to get around this issue by letting
QEMU place RAM wherever it wants but tell the guest to never use a
particular subset of its RAM for DMA, because that would conflict with
the doorbell IOVA or be seen as p2p transactions.  But I think we all
probably agree that it's a disgusting idea.)

> How are memory holes passed to SMMU so it doesn't have this issue for bare-metal
> (assign an IOVA that overlaps an MSI doorbell address)?
> 

As I understand it, the SMMU driver manages the whole IOVA space when
VFIO is *not* involved, so it simply allocates non-overlapping regions.

The problem occurs when you have two independent entities essentially
attempting to mange the same resource (and the problem is exacerbated by
the VM potentially allocating slots in the IOVA space which may have
other limitations it doesn't know about, for example the p2p regions,
because the VM doesn't know anything about the topology of the
underlying physical system).

Christoffer

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox