* Summary of LPC guest MSI discussion in Santa Fe
From: Alex Williamson @ 2016-11-11 15:50 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161111111944.GO2078@8bytes.org>
On Fri, 11 Nov 2016 12:19:44 +0100
Joerg Roedel <joro@8bytes.org> wrote:
> On Thu, Nov 10, 2016 at 10:46:01AM -0700, Alex Williamson wrote:
> > In the case of x86, we know that DMA mappings overlapping the MSI
> > doorbells won't be translated correctly, it's not a valid mapping for
> > that range, and therefore the iommu driver backing the IOMMU API
> > should describe that reserved range and reject mappings to it.
>
> The drivers actually allow mappings to the MSI region via the IOMMU-API,
> and I think it should stay this way also for other reserved ranges.
> Address space management is done by the IOMMU-API user already (and has
> to be done there nowadays), be it a DMA-API implementation which just
> reserves these regions in its address space allocator or be it VFIO with
> QEMU, which don't map RAM there anyway. So there is no point of checking
> this again in the IOMMU drivers and we can keep that out of the
> mapping/unmapping fast-path.
It's really just happenstance that we don't map RAM over the x86 MSI
range though. That property really can't be guaranteed once we mix
architectures, such as running an aarch64 VM on an x86 host via TCG.
AIUI, the MSI range is actually handled differently than other DMA
ranges, so an iommu_map() overlapping a range that the iommu cannot map
should fail just like an attempt to map beyond the address width of the
iommu.
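To make the proposal concrete, here is a minimal userspace sketch of the kind of check I mean. It is not taken from any iommu driver: struct reserved_range, overlaps_reserved() and check_mapping() are hypothetical names, and the real code would sit behind iommu_map() and return proper errnos.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a range the IOMMU cannot translate. */
struct reserved_range {
	uint64_t start;	/* first reserved IOVA */
	uint64_t end;	/* last reserved IOVA, inclusive */
};

/* True if [iova, iova + size) intersects the reserved range. */
static bool overlaps_reserved(const struct reserved_range *r,
			      uint64_t iova, uint64_t size)
{
	assert(size > 0);	/* caller has also ruled out wrap-around */
	uint64_t last = iova + size - 1;

	return iova <= r->end && last >= r->start;
}

/*
 * Hypothetical map-time check: a mapping beyond the address width and a
 * mapping over the reserved range are rejected the same way (-1 here).
 */
static int check_mapping(const struct reserved_range *r,
			 unsigned int addr_bits,
			 uint64_t iova, uint64_t size)
{
	uint64_t limit = (addr_bits >= 64) ? UINT64_MAX
					    : ((UINT64_C(1) << addr_bits) - 1);

	if (size == 0 || iova > limit || size - 1 > limit - iova)
		return -1;	/* beyond the address width */
	if (overlaps_reserved(r, iova, size))
		return -1;	/* overlaps the untranslatable range */
	return 0;
}
```

With the x86 MSI window described as 0xfee00000-0xfeefffff, a mapping that merely touches the first page of the window fails exactly like one above the address width, while mappings on either side succeed.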
> > For PCI devices userspace can examine the topology of the iommu group
> > and exclude MMIO ranges of peer devices based on the BARs, which are
> > exposed in various places, pci-sysfs as well as /proc/iomem. For
> > non-PCI or MSI controllers... ???
>
> Right, the hardware resources can be examined. But maybe this can be
> extended to also cover RMRR ranges? Then we would be able to assign
> devices with RMRR mappings to guests.
RMRRs are special in a different way: the VT-d spec requires that the
OS honor RMRRs, but the user has no responsibility (and currently no
visibility) to make that same arrangement. In order to potentially
protect the physical host platform, the iommu drivers should prevent a
user from remapping RMRRs. Maybe there needs to be a different
interface used by untrusted users vs in-kernel drivers, but I think the
kernel really needs to be defensive in the case of user mappings, which
is where the IOMMU API is rooted. Thanks,
Alex
* [RFC PATCH 6/6] [media] davinci: vpif_capture: get subdevs from DT
From: Javier Martinez Canillas @ 2016-11-11 15:50 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <dad6e38b-093b-bd36-3e0d-a0c10bddea58@xs4all.nl>
Hello,
On Fri, Nov 11, 2016 at 12:36 PM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> On 10/26/2016 01:55 AM, Kevin Hilman wrote:
>> First pass at getting subdevs from DT ports and endpoints.
>>
>> The _get_pdata() function was larely inspired by (i.e. stolen from)
>> am437x-vpfe.c
>>
>> Questions:
>> - Legacy board file passes subdev input & output routes via pdata
>> (e.g. tvp514x svideo or composite selection.) How is this supposed
>> to be done via DT?
>
> We have plans to model connectors as well in the device tree, but no
> implementation exists yet. I think Laurent has some code in progress for this,
> but I may be mistaken.
>
I posted an RFC series [0] some time ago that proposed a DT binding
for input connectors [1] using OF graphs.
I never re-spun the series because Laurent had some comments on the DT
bindings and I was waiting for a response to my latest email [2].
So if you could comment on this and see whether the DT bindings fit your
needs, that would be very useful.
> Anyway, hard-coding it like you do now is for now the only way.
>
> Hans
>
>>
[0]: https://www.mail-archive.com/linux-media@vger.kernel.org/msg96393.html
[1]: http://www.spinics.net/lists/linux-media/msg99421.html
[2]: http://www.spinics.net/lists/linux-media/msg99987.html
Best regards,
Javier
* [RFC v2 8/8] iommu/arm-smmu: implement add_reserved_regions callback
From: Auger Eric @ 2016-11-11 15:47 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161111114223.GP2078@8bytes.org>
Hi Joerg,
On 11/11/2016 12:42, Joerg Roedel wrote:
> On Thu, Nov 10, 2016 at 07:00:52PM +0100, Auger Eric wrote:
>> GICv2m and GICV3 ITS use dma-mapping iommu_dma_map_msi_msg to allocate
>> an MSI IOVA on-demand.
>
> Yes, and it the right thing to do there because as a DMA-API
> implementation the dma-iommu code cares about the address space
> allocation.
>
> As I understand it this is different in your case, as someone else is
> defining the address space layout. So why do you need to allocate it
> yourself?
Effectively, in the passthrough use case, userspace defines the address
space layout and maps guest RAM PA=IOVA to PAs (using
VFIO_IOMMU_MAP_DMA). But this address space does not include the MSI
IOVAs. Userspace does not care about the MSI IOMMU mapping. So I think the
MSI IOVA region must be allocated by either the VFIO driver or the IOMMU
driver. Who else could initialize the IOVA allocator domain?
It's true that we have a mix of unmanaged addresses and "managed"
addresses, which is not neat. But how else could this be managed?
Thanks
Eric
>
>
> Joerg
>
* [PATCH] Replacement for Arm initrd memblock reserve and free inconsistency.
From: william.helsby at stfc.ac.uk @ 2016-11-11 15:46 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161110174645.GB1041@n2100.armlinux.org.uk>
>From 45272bc4d3f27f2c316f5b607441d1337a5501ec Mon Sep 17 00:00:00 2001
From: William Helsby <william.helsby@stfc.ac.uk>
Date: Fri, 11 Nov 2016 15:04:04 +0000
Subject: [PATCH] arm: zynq::Fixing inconsistent reserve and free for initrd
image
My first attempt at fixing this was very similar to your proposed patch,
though because I did not understand the need to switch Outlook to plain
text, it never got posted.
In the current case, with the initrd image starting page aligned, with space
above it, the patch worked.
However, on reflection it struck me that the placement of the ramdisk image
is done by the bootloader, so may not always be like this.
This patch tries to round down the start and up the end, but checks that this
does not cause an overlap. If it does overlap, it tries aligning just the end
or start and if neither is possible falls back to not expanding the area
reserved.
The code then remembers the start and end of the reserved area, including
any successful expansions to page boundaries.
When ./init/initramfs.c calls free_initrd_mem(), if the start or end matches,
the respective extended value is used so that complete pages are released.
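For reference, the fallback order described above can be sketched in plain userspace C. This is only an illustration, not the patch itself: pick_reservation(), region_ok_fn and demo_region_ok() are hypothetical names, the predicate stands in for the memblock_is_region_memory() / memblock_is_region_reserved() pair, and the memory layout in demo_region_ok() is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u
#define SK_PAGE_MASK (~(uint64_t)(SK_PAGE_SIZE - 1))

/* Hypothetical predicate: region is RAM and overlaps nothing reserved. */
typedef bool (*region_ok_fn)(uint64_t start, uint64_t size);

struct reservation {
	uint64_t start;
	uint64_t size;
};

static uint64_t page_align_up(uint64_t addr)
{
	return (addr + SK_PAGE_SIZE - 1) & SK_PAGE_MASK;
}

/* Try the widest rounding first, then fall back step by step. */
static struct reservation pick_reservation(uint64_t start, uint64_t size,
					   region_ok_fn ok)
{
	assert(size > 0);
	uint64_t end = start + size;
	uint64_t lo = start & SK_PAGE_MASK;	/* start rounded down */
	uint64_t hi = page_align_up(end);	/* end rounded up */
	const struct reservation candidates[] = {
		{ lo, hi - lo },	/* round start down and end up */
		{ start, hi - start },	/* round only the end up */
		{ lo, end - lo },	/* round only the start down */
	};

	for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
		if (ok(candidates[i].start, candidates[i].size))
			return candidates[i];

	/* Nothing fits: reserve exactly the image, unrounded. */
	return (struct reservation){ start, size };
}

/*
 * Made-up layout for illustration: RAM at 0x10000000-0x1fffffff with
 * something else occupying 0x10002800-0x10002fff.
 */
static bool demo_region_ok(uint64_t start, uint64_t size)
{
	const uint64_t mem_start = 0x10000000, mem_end = 0x20000000;
	const uint64_t busy_start = 0x10002800, busy_end = 0x10003000;
	uint64_t end = start + size;
	bool in_memory = start >= mem_start && end <= mem_end;
	bool hits_busy = start < busy_end && end > busy_start;

	return in_memory && !hits_busy;
}
```

With that layout, an image at 0x10000400 of size 0x2000 cannot have its end rounded up (the tail page is shared), so only the start is rounded down and the reservation becomes 0x10000000 with size 0x2400.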
Note that according to comments in ./init/initramfs.c, the crashkernel
region can overlap the initrd region, though I don't know anything about
this.
The POISON_FREE_INITMEM value is used to cause free_reserved_area to
poison only the pages which are released.
Despite this code being careful not to overlap the kernel, there may still be
an issue with it extending the RAMDISK image reserved area to overlap
the device tree or something else I am unaware of.
Perhaps the call to early_init_fdt_reserve_self() should be moved
to before the initrd code?
For the sake of freeing at most 2 partial 4k pages, I am not sure whether
the complexity and technical risk of this solution are justified - unless
these pages are going to be in the middle of a DMA contiguous area.
Note that the arm64 code reverted to not extending the released area,
but enabled clearing of the released areas, in
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/arch/arm64/mm/init.c?id=6b00f7efb5303418c231994c91fb8239f5ada260
The arm64 function now reads:
void free_initrd_mem(unsigned long start, unsigned long end)
{
	if (!keep_initrd)
		free_reserved_area((void *)start, (void *)end, 0, "initrd");
}
Signed-off-by: William Helsby <william.helsby@stfc.ac.uk>
---
arch/arm/mm/init.c | 38 +++++++++++++++++++++++++++++++++-----
1 file changed, 33 insertions(+), 5 deletions(-)
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 370581a..00a489d 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -49,6 +49,8 @@ unsigned long __init __clear_cr(unsigned long mask)
static phys_addr_t phys_initrd_start __initdata = 0;
static unsigned long phys_initrd_size __initdata = 0;
+static unsigned long initrd_reservation_start __initdata = 0;
+static unsigned long initrd_reservation_end __initdata = 0;
static int __init early_initrd(char *p)
{
@@ -255,11 +257,38 @@ void __init arm_memblock_init(const struct machine_desc *mdesc)
phys_initrd_start = phys_initrd_size = 0;
}
if (phys_initrd_size) {
- memblock_reserve(phys_initrd_start, phys_initrd_size);
+ /* try to round the initrd start and end down and up to page boundaries,
+ so when freed, whole pages can be released.
+ However the initrd image may be adjacent to something else, so check that this rounding is OK
+ */
+ /* First try rounding the start down and end up */
+ phys_addr_t phys_initrd_reservation_start = phys_initrd_start & PAGE_MASK;
+ unsigned long size_to_reserve = PAGE_ALIGN(phys_initrd_start+phys_initrd_size) - phys_initrd_reservation_start;
+ if (!memblock_is_region_memory(phys_initrd_reservation_start, size_to_reserve) ||
+ memblock_is_region_reserved(phys_initrd_reservation_start, size_to_reserve)) {
+ /* This either does not fit or overlaps something else, so try just rounding end up */
+ phys_initrd_reservation_start = phys_initrd_start;
+ size_to_reserve = PAGE_ALIGN(phys_initrd_start+phys_initrd_size) - phys_initrd_reservation_start;
+ if (!memblock_is_region_memory(phys_initrd_reservation_start, size_to_reserve) ||
+ memblock_is_region_reserved(phys_initrd_reservation_start, size_to_reserve)) {
+ /* This either does not fit or overlaps something else, so try just rounding start down */
+ phys_initrd_reservation_start = phys_initrd_start & PAGE_MASK;
+ size_to_reserve = (phys_initrd_start+phys_initrd_size) - phys_initrd_reservation_start;
+ if (!memblock_is_region_memory(phys_initrd_reservation_start, size_to_reserve) ||
+ memblock_is_region_reserved(phys_initrd_reservation_start, size_to_reserve)) {
+ /* This either does not fit or overlaps something else, so do not round at all */
+ phys_initrd_reservation_start = phys_initrd_start;
+ size_to_reserve = (phys_initrd_start+phys_initrd_size) - phys_initrd_reservation_start;
+ }
+ }
+ }
+ memblock_reserve(phys_initrd_reservation_start, size_to_reserve);
/* Now convert initrd to virtual addresses */
initrd_start = __phys_to_virt(phys_initrd_start);
initrd_end = initrd_start + phys_initrd_size;
+ initrd_reservation_start = __phys_to_virt(phys_initrd_reservation_start);
+ initrd_reservation_end = initrd_reservation_start + size_to_reserve;
}
#endif
@@ -771,12 +800,11 @@ void free_initrd_mem(unsigned long start, unsigned long end)
{
if (!keep_initrd) {
if (start == initrd_start)
- start = round_down(start, PAGE_SIZE);
+ start = initrd_reservation_start;
if (end == initrd_end)
- end = round_up(end, PAGE_SIZE);
+ end = initrd_reservation_end;
- poison_init_mem((void *)start, PAGE_ALIGN(end) - start);
- free_reserved_area((void *)start, (void *)end, -1, "initrd");
+ free_reserved_area((void *)start, (void *)end, POISON_FREE_INITMEM, "initrd");
}
}
--
1.8.3.1
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux at armlinux.org.uk]
> Sent: 10 November 2016 17:47
>To: Helsby, William (STFC,DL,TECH)
> Cc: linux-arm-kernel at lists.infradead.org
> Subject: Re: [PATCH] Replacement for Arm initrd memblock reserve and free inconsistency.
>On Wed, Nov 09, 2016 at 04:35:37PM +0000, william.helsby at stfc.ac.uk wrote:
>> A boot time system crash was noticed with a segmentation fault just after the initrd image had been used to initialise the ramdisk.
>> This occurred when the U-Boot loaded the ramdisk image from a FAT partition, but not when loaded by TFTPBOOT. This is not understood?
>> However the problem was caused by free_initrd_mem freeing and
>> "poisoning" memory that had been allocted to init/main.c to store the saved_command_line.
>> This patch reverses "ARM: 8167/1: extend the reserved memory for initrd to be page aligned"
>> because it is safer to leave a partial head or tail page reserved (wasted) than to free a page which is partially still in use.
>> If this is not acceptable (particularly if wanting large contiguous physical areas for DMA) then a better solution is required.
>> This would extend the region reserved to page boundaries, if possible without overlapping other regions.
>> My previous attempt to fix this coded this scheme, to grow the area reserved.
>> However, this again is not safe if in growing the area it then overlaps a region that is in use.
>> Note this patch is against the 4.6 kernel, but as far as I can tell applies equally to 4.8.
>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index
> 370581a..ff3e9c3 100644
>> --- a/arch/arm/mm/init.c
>> +++ b/arch/arm/mm/init.c
>> @@ -770,12 +770,6 @@ static int keep_initrd; void
>> free_initrd_mem(unsigned long start, unsigned long end) {
>>         if (!keep_initrd) {
>> -               if (start == initrd_start)
>> -                       start = round_down(start, PAGE_SIZE);
>> -               if (end == initrd_end)
>> -                       end = round_up(end, PAGE_SIZE);
>> -
>> -               poison_init_mem((void *)start, PAGE_ALIGN(end) - start);
> We're definitely not getting rid of the poisoning of the pages - the poisoning there is to detect accesses to this memory which should not be made.
> The point of rounding up and down is to ensure that the partly-used pages (which would have been previously reserved) are freed.
> Probably a better fix is to round the start down/end up of the initrd when reserving the memory region:
> arch/arm/mm/init.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index 370581aeb871..ee8509e4329d 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -255,7 +255,11 @@ void __init arm_memblock_init(const struct machine_desc *mdesc)
> phys_initrd_start = phys_initrd_size = 0;
> }
> if (phys_initrd_size) {
> - memblock_reserve(phys_initrd_start, phys_initrd_size);
> + phys_addr_t start, end;
> +
> + start = round_down(phys_initrd_start, PAGE_SIZE);
> + end = round_up(phys_initrd_start + phys_initrd_size, PAGE_SIZE);
> + memblock_reserve(start, end - start);
>
> /* Now convert initrd to virtual addresses */
> initrd_start = __phys_to_virt(phys_initrd_start);
>
> and this should ensure that memblock_alloc() doesn't try to allocate memory overlapping the pages containing the initrd.
> Intentionally using pages overlapping the initrd is a recipe for problems...
> --
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
* [RFC PATCH 0/6] media: davinci: VPIF: add DT support
From: Hans Verkuil @ 2016-11-11 15:36 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161025235536.7342-1-khilman@baylibre.com>
Hi Kevin,
On 10/26/2016 01:55 AM, Kevin Hilman wrote:
> This series attempts to add DT support to the davinci VPIF capture
> driver.
>
> I'm not sure I've completely grasped the proper use of the ports and
> endpoints stuff, so this RFC is primarily to get input on whether I'm
> on the right track.
>
> The last patch is the one where all my questions are, the rest are
> just prep work to ge there.
>
> Tested on da850-lcdk and was able to do basic frame capture from the
> composite input.
>
> Series applies on v4.9-rc1
>
> Kevin Hilman (6):
> [media] davinci: add support for DT init
> ARM: davinci: da8xx: VPIF: enable DT init
> ARM: dts: davinci: da850: add VPIF
> ARM: dts: davinci: da850-lcdk: enable VPIF capture
> [media] davinci: vpif_capture: don't lock over s_stream
> [media] davinci: vpif_capture: get subdevs from DT
Looks good, but wouldn't it be better to do the dts changes last when all the
supporting code is in?
Regards,
Hans
>
> arch/arm/boot/dts/da850-lcdk.dts | 30 ++++++
> arch/arm/boot/dts/da850.dtsi | 28 +++++
> arch/arm/mach-davinci/da8xx-dt.c | 17 +++
> drivers/media/platform/davinci/vpif.c | 9 ++
> drivers/media/platform/davinci/vpif_capture.c | 150 +++++++++++++++++++++++++-
> include/media/davinci/vpif_types.h | 9 +-
> 6 files changed, 236 insertions(+), 7 deletions(-)
>
* [RFC PATCH 6/6] [media] davinci: vpif_capture: get subdevs from DT
From: Hans Verkuil @ 2016-11-11 15:36 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161025235536.7342-7-khilman@baylibre.com>
On 10/26/2016 01:55 AM, Kevin Hilman wrote:
> First pass at getting subdevs from DT ports and endpoints.
>
> The _get_pdata() function was larely inspired by (i.e. stolen from)
> am437x-vpfe.c
>
> Questions:
> - Legacy board file passes subdev input & output routes via pdata
> (e.g. tvp514x svideo or composite selection.) How is this supposed
> to be done via DT?
We have plans to model connectors as well in the device tree, but no
implementation exists yet. I think Laurent has some code in progress for this,
but I may be mistaken.
Anyway, hard-coding it like you do now is for now the only way.
Hans
>
> Not-Yet-Signed-off-by: Kevin Hilman <khilman@baylibre.com>
> ---
> drivers/media/platform/davinci/vpif_capture.c | 132 +++++++++++++++++++++++++-
> include/media/davinci/vpif_types.h | 9 +-
> 2 files changed, 134 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/media/platform/davinci/vpif_capture.c b/drivers/media/platform/davinci/vpif_capture.c
> index becc3e63b472..df2af5cda37a 100644
> --- a/drivers/media/platform/davinci/vpif_capture.c
> +++ b/drivers/media/platform/davinci/vpif_capture.c
> @@ -26,6 +26,8 @@
> #include <linux/slab.h>
>
> #include <media/v4l2-ioctl.h>
> +#include <media/v4l2-of.h>
> +#include <media/i2c/tvp514x.h> /* FIXME: how to pass the INPUT_* OUTPUT* fields? */
>
> #include "vpif.h"
> #include "vpif_capture.h"
> @@ -651,6 +653,10 @@ static int vpif_input_to_subdev(
>
> vpif_dbg(2, debug, "vpif_input_to_subdev\n");
>
> + if (!chan_cfg)
> + return -1;
> + if (input_index >= chan_cfg->input_count)
> + return -1;
> subdev_name = chan_cfg->inputs[input_index].subdev_name;
> if (subdev_name == NULL)
> return -1;
> @@ -658,7 +664,7 @@ static int vpif_input_to_subdev(
> /* loop through the sub device list to get the sub device info */
> for (i = 0; i < vpif_cfg->subdev_count; i++) {
> subdev_info = &vpif_cfg->subdev_info[i];
> - if (!strcmp(subdev_info->name, subdev_name))
> + if (subdev_info && !strcmp(subdev_info->name, subdev_name))
> return i;
> }
> return -1;
> @@ -1328,13 +1334,25 @@ static int vpif_async_bound(struct v4l2_async_notifier *notifier,
> {
> int i;
>
> + for (i = 0; i < vpif_obj.config->asd_sizes[0]; i++) {
> + const struct device_node *node = vpif_obj.config->asd[i]->match.of.node;
> +
> + if (node == subdev->of_node) {
> + vpif_obj.sd[i] = subdev;
> + vpif_obj.config->chan_config->inputs[i].subdev_name = subdev->of_node->full_name;
> + vpif_dbg(2, debug, "%s: setting input %d subdev_name = %s\n", __func__,
> + i, subdev->of_node->full_name);
> + return 0;
> + }
> + }
> +
> for (i = 0; i < vpif_obj.config->subdev_count; i++)
> if (!strcmp(vpif_obj.config->subdev_info[i].name,
> subdev->name)) {
> vpif_obj.sd[i] = subdev;
> return 0;
> }
> -
> +
> return -EINVAL;
> }
>
> @@ -1423,6 +1441,113 @@ static int vpif_async_complete(struct v4l2_async_notifier *notifier)
> return vpif_probe_complete();
> }
>
> +static struct vpif_capture_config *
> +vpif_capture_get_pdata(struct platform_device *pdev)
> +{
> + struct device_node *endpoint = NULL;
> + struct v4l2_of_endpoint bus_cfg;
> + struct vpif_capture_config *pdata;
> + struct vpif_subdev_info *sdinfo;
> + struct vpif_capture_chan_config *chan;
> + unsigned int i;
> +
> + dev_dbg(&pdev->dev, "vpif_get_pdata\n");
> +
> + if (!IS_ENABLED(CONFIG_OF) || !pdev->dev.of_node)
> + return pdev->dev.platform_data;
> +
> + pdata = devm_kzalloc(&pdev->dev, sizeof(*pdata), GFP_KERNEL);
> + if (!pdata)
> + return NULL;
> + pdata->subdev_info = devm_kzalloc(&pdev->dev,
> + sizeof(*pdata->subdev_info) *
> + VPIF_CAPTURE_MAX_CHANNELS, GFP_KERNEL);
> + if (!pdata->subdev_info)
> + return NULL;
> + dev_dbg(&pdev->dev, "%s\n", __func__);
> +
> + for (i = 0; ; i++) {
> + struct device_node *rem;
> + unsigned int flags;
> + int err;
> +
> + endpoint = of_graph_get_next_endpoint(pdev->dev.of_node,
> + endpoint);
> + if (!endpoint)
> + break;
> +
> + dev_dbg(&pdev->dev, "found endpoint %s, %s\n",
> + endpoint->name, endpoint->full_name);
> +
> + sdinfo = &pdata->subdev_info[i];
> + chan = &pdata->chan_config[i];
> + chan->inputs = devm_kzalloc(&pdev->dev,
> + sizeof(*chan->inputs) *
> + VPIF_DISPLAY_MAX_CHANNELS,
> + GFP_KERNEL);
> +
> + /* sdinfo->name = devm_kzalloc(&pdev->dev, 16, GFP_KERNEL); */
> + /* snprintf(sdinfo->name, 16, "VPIF input %d", i); */
> + chan->input_count++;
> + chan->inputs[i].input.type = V4L2_INPUT_TYPE_CAMERA;
> + chan->inputs[i].input.std = V4L2_STD_ALL;
> + chan->inputs[i].input.capabilities = V4L2_IN_CAP_STD;
> +
> + /* FIXME: need a new property? ch0:composite ch1: s-video */
> + if (i == 0)
> + chan->inputs[i].input_route = INPUT_CVBS_VI2B;
> + else
> + chan->inputs[i].input_route = INPUT_SVIDEO_VI2C_VI1C;
> + chan->inputs[i].output_route = OUTPUT_10BIT_422_EMBEDDED_SYNC;
> +
> + err = v4l2_of_parse_endpoint(endpoint, &bus_cfg);
> + if (err) {
> + dev_err(&pdev->dev, "Could not parse the endpoint\n");
> + goto done;
> + }
> + dev_dbg(&pdev->dev, "Endpoint %s, bus_width = %d\n",
> + endpoint->full_name, bus_cfg.bus.parallel.bus_width);
> + flags = bus_cfg.bus.parallel.flags;
> +
> + if (flags & V4L2_MBUS_HSYNC_ACTIVE_HIGH)
> + chan->vpif_if.hd_pol = 1;
> +
> + if (flags & V4L2_MBUS_VSYNC_ACTIVE_HIGH)
> + chan->vpif_if.vd_pol = 1;
> +
> + chan->vpif_if.if_type = VPIF_IF_BT656;
> + rem = of_graph_get_remote_port_parent(endpoint);
> + if (!rem) {
> + dev_dbg(&pdev->dev, "Remote device at %s not found\n",
> + endpoint->full_name);
> + goto done;
> + }
> +
> + dev_dbg(&pdev->dev, "Remote device %s, %s found\n", rem->name, rem->full_name);
> + sdinfo->name = rem->full_name;
> +
> + pdata->asd[i] = devm_kzalloc(&pdev->dev,
> + sizeof(struct v4l2_async_subdev),
> + GFP_KERNEL);
> + if (!pdata->asd[i]) {
> + of_node_put(rem);
> + pdata = NULL;
> + goto done;
> + }
> +
> + pdata->asd[i]->match_type = V4L2_ASYNC_MATCH_OF;
> + pdata->asd[i]->match.of.node = rem;
> + of_node_put(rem);
> + }
> +
> +done:
> + pdata->asd_sizes[0] = i;
> + pdata->subdev_count = i;
> + pdata->card_name = "DA850/OMAP-L138 Video Capture";
> +
> + return pdata;
> +}
> +
> /**
> * vpif_probe : This function probes the vpif capture driver
> * @pdev: platform device pointer
> @@ -1439,6 +1564,7 @@ static __init int vpif_probe(struct platform_device *pdev)
> int res_idx = 0;
> int i, err;
>
> + pdev->dev.platform_data = vpif_capture_get_pdata(pdev);;
> if (!pdev->dev.platform_data) {
> dev_warn(&pdev->dev, "Missing platform data. Giving up.\n");
> return -EINVAL;
> @@ -1481,7 +1607,7 @@ static __init int vpif_probe(struct platform_device *pdev)
> goto vpif_unregister;
> }
>
> - if (!vpif_obj.config->asd_sizes) {
> + if (!vpif_obj.config->asd_sizes[0]) {
> i2c_adap = i2c_get_adapter(1);
> for (i = 0; i < subdev_count; i++) {
> subdevdata = &vpif_obj.config->subdev_info[i];
> diff --git a/include/media/davinci/vpif_types.h b/include/media/davinci/vpif_types.h
> index 3cb1704a0650..4ee3b41975db 100644
> --- a/include/media/davinci/vpif_types.h
> +++ b/include/media/davinci/vpif_types.h
> @@ -65,14 +65,14 @@ struct vpif_display_config {
>
> struct vpif_input {
> struct v4l2_input input;
> - const char *subdev_name;
> + char *subdev_name;
> u32 input_route;
> u32 output_route;
> };
>
> struct vpif_capture_chan_config {
> struct vpif_interface vpif_if;
> - const struct vpif_input *inputs;
> + struct vpif_input *inputs;
> int input_count;
> };
>
> @@ -83,7 +83,8 @@ struct vpif_capture_config {
> struct vpif_subdev_info *subdev_info;
> int subdev_count;
> const char *card_name;
> - struct v4l2_async_subdev **asd; /* Flat array, arranged in groups */
> - int *asd_sizes; /* 0-terminated array of asd group sizes */
> +
> + struct v4l2_async_subdev *asd[VPIF_CAPTURE_MAX_CHANNELS];
> + int asd_sizes[VPIF_CAPTURE_MAX_CHANNELS];
> };
> #endif /* _VPIF_TYPES_H */
>
* [PATCH v14 4/9] acpi/arm64: Add GTDT table parse driver
From: Mark Rutland @ 2016-11-11 15:32 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <5825CBB5.8090104@linaro.org>
On Fri, Nov 11, 2016 at 09:46:29PM +0800, Hanjun Guo wrote:
> On 10/21/2016 12:37 AM, Mark Rutland wrote:
> >On Thu, Sep 29, 2016 at 02:17:12AM +0800, fu.wei at linaro.org wrote:
> >>+static int __init map_gt_gsi(u32 interrupt, u32 flags)
> >>+{
> >>+ int trigger, polarity;
> >>+
> >>+ if (!interrupt)
> >>+ return 0;
> >
> >Urgh.
> >
> >Only the secure interrupt (which we do not need) is optional in this
> >manner, and (hilariously), zero appears to also be a valid GSIV, per
> >figure 5-24 in the ACPI 6.1 spec.
> >
> >So, I think that:
> >
> >(a) we should not bother parsing the secure interrupt
> >(b) we should drop the check above
> >(c) we should report the spec issue to the ASWG
>
> Sorry, I willing to do that, but I need to figure out the issue here.
> What kind of issue in detail? do you mean that zero should not be valid
> for arch timer interrupts?
As above, zero is a valid GSIV, and is valid for the non-secure timer
interrupts. The check is wrong for non-secure interrupts.
We can ignore the secure timer interrupt since it's irrelevant to us,
and remove the check.
Regardless, the spec is inconsistent w.r.t. the secure interrupt being
zero if not present, since zero is a valid GSIV. That should be reported
to the ASWG.
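To spell out why the zero check cannot stay, here is a small sketch. It is not the GTDT driver code: enum gt_irq, old_is_present() and should_map() are hypothetical names used only to contrast the two policies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the timer interrupts the table describes. */
enum gt_irq { GT_SECURE, GT_NONSECURE, GT_VIRT, GT_HYP };

/*
 * Policy in the patch under review: a GSIV of zero means "not present".
 * That wrongly drops a non-secure timer wired to GSIV 0.
 */
static bool old_is_present(uint32_t gsiv)
{
	return gsiv != 0;
}

/*
 * Policy suggested above: never parse the secure interrupt, and map the
 * others whatever their GSIV is, including zero.
 */
static bool should_map(enum gt_irq which, uint32_t gsiv)
{
	assert(which <= GT_HYP);
	(void)gsiv;	/* every value, including 0, is a valid GSIV */
	return which != GT_SECURE;
}
```

The second policy needs no sentinel at all, which is why the ambiguity in the spec only matters for the secure interrupt that we skip anyway.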
Thanks,
Mark.
* [PATCH RFC 00/12] tda998x updates
From: Russell King - ARM Linux @ 2016-11-11 15:27 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <964260d7-af7f-fefb-2adb-0e10ff750883@ti.com>
On Fri, Nov 11, 2016 at 05:10:09PM +0200, Jyri Sarha wrote:
> On 11/08/16 14:24, Russell King - ARM Linux wrote:
> > As no one responded to the previous round, I'm not spending soo much
> > time writing up a description of these changes again. It's also been
> > quite a long time, so I've forgotten all the details of the changes,
> > so I'll do my best.
> >
> > Changes from the previous series include:
> > - reorder the initial three patches
> > - change the (now third patch)... I think to increase the size of the
> > locked region.
> > - fix edid parsing for infoframe generation - as was pointed out for
> > dw-hdmi, parsing the EDID in get_modes() is incorrect, as that method
> > will not be called when an override-edid is in effect. We need to
> > parse the override-edid. Moreover, infoframe generation should not
> > be keyed to whether the monitor is HDMI or not, CEA-861B allows non-
> > HDMI to send infoframes.
> > - only send audio if audio and infoframes are supported.
> >
> > Otherwise, these are very much like the previous posting of the series,
> > except rebased upon the mali/hdlcd/tda998x change to remove the
> > drm_connector_register() call.
> >
> > https://www.spinics.net/lists/dri-devel/msg121495.html
> >
> > It'd be nice to have other tda998x users ack and test these patches,
> > I've tried to test on Juno, but the Juno situation seems to be a huge
> > fail. (HBI0282B completely fails with latest firmware - (a) FPGA image
> > incompatibilities io_b118 causes all FPGA AMBA devices to vanish, (b)
> > seems no way to get SCPI support on it - adding the BL0 executable
> > start address in the SCC registers seems to be incompatible with the
> > devchip, causing the PLLs to fail. In discussion with Sudeep over
> > these issues, but no idea where things are with it at the moment, other
> > than Sudeep needs to investigate. All Linaro firmware releases are
> > broken on HBI0282B.)
> >
> > drivers/gpu/drm/i2c/tda998x_drv.c | 826 ++++++++++++++++++++------------------
> > 1 file changed, 429 insertions(+), 397 deletions(-)
> >
>
> Reviewed-by: Jyri Sarha <jsarha@ti.com>
>
> For the whole series. I am also happy to test these patches if I can
> fetch them from some git repo.
git://git.armlinux.org.uk/~rmk/linux-arm.git drm-tda998x-devel
The commit IDs are unstable, because I'll have to recommit them to add
your r-by and any other tags you later give me. :)
Thanks.
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
* [PATCH v7 04/16] drivers: iommu: make of_iommu_set/get_ops() DT agnostic
From: Joerg Roedel @ 2016-11-11 15:22 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161109141948.19244-5-lorenzo.pieralisi@arm.com>
On Wed, Nov 09, 2016 at 02:19:36PM +0000, Lorenzo Pieralisi wrote:
> +struct iommu_fwentry {
> + struct list_head list;
> + struct fwnode_handle *fwnode;
> + const struct iommu_ops *ops;
> +};
Is there a reason the iommu_ops need to be stored there? Currently it
seems that the ops are only needed to get the of_xlate fn-ptr later. And
at the place where it is called, the iommu-ops should also be available
through pdev->dev.bus->iommu_ops.
Joerg
* [PATCH v5 2/3] vcodec: mediatek: Add Mediatek JPEG Decoder Driver
From: Hans Verkuil @ 2016-11-11 15:10 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1478586880-3923-3-git-send-email-rick.chang@mediatek.com>
A quick review:
On 11/08/2016 07:34 AM, Rick Chang wrote:
> Add v4l2 driver for Mediatek JPEG Decoder
>
> Signed-off-by: Rick Chang <rick.chang@mediatek.com>
> Signed-off-by: Minghsiu Tsai <minghsiu.tsai@mediatek.com>
> ---
> drivers/media/platform/Kconfig | 15 +
> drivers/media/platform/Makefile | 2 +
> drivers/media/platform/mtk-jpeg/Makefile | 2 +
> drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c | 1275 ++++++++++++++++++++++
> drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h | 141 +++
> drivers/media/platform/mtk-jpeg/mtk_jpeg_hw.c | 417 +++++++
> drivers/media/platform/mtk-jpeg/mtk_jpeg_hw.h | 91 ++
> drivers/media/platform/mtk-jpeg/mtk_jpeg_parse.c | 160 +++
> drivers/media/platform/mtk-jpeg/mtk_jpeg_parse.h | 25 +
> drivers/media/platform/mtk-jpeg/mtk_jpeg_reg.h | 58 +
> 10 files changed, 2186 insertions(+)
> create mode 100644 drivers/media/platform/mtk-jpeg/Makefile
> create mode 100644 drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c
> create mode 100644 drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h
> create mode 100644 drivers/media/platform/mtk-jpeg/mtk_jpeg_hw.c
> create mode 100644 drivers/media/platform/mtk-jpeg/mtk_jpeg_hw.h
> create mode 100644 drivers/media/platform/mtk-jpeg/mtk_jpeg_parse.c
> create mode 100644 drivers/media/platform/mtk-jpeg/mtk_jpeg_parse.h
> create mode 100644 drivers/media/platform/mtk-jpeg/mtk_jpeg_reg.h
>
> diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
> index 754edbf1..96c9887 100644
> --- a/drivers/media/platform/Kconfig
> +++ b/drivers/media/platform/Kconfig
> @@ -162,6 +162,21 @@ config VIDEO_CODA
> Coda is a range of video codec IPs that supports
> H.264, MPEG-4, and other video formats.
>
> +config VIDEO_MEDIATEK_JPEG
> + tristate "Mediatek JPEG Codec driver"
> + depends on MTK_IOMMU_V1 || COMPILE_TEST
> + depends on VIDEO_DEV && VIDEO_V4L2
> + depends on ARCH_MEDIATEK || COMPILE_TEST
> + depends on HAS_DMA
> + select VIDEOBUF2_DMA_CONTIG
> + select V4L2_MEM2MEM_DEV
> + ---help---
> + Mediatek jpeg codec driver provides HW capability to decode
> + JPEG format
> +
> + To compile this driver as a module, choose M here: the
> + module will be called mtk-jpeg
> +
> config VIDEO_MEDIATEK_VPU
> tristate "Mediatek Video Processor Unit"
> depends on VIDEO_DEV && VIDEO_V4L2 && HAS_DMA
> diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
> index f842933..cf701e3 100644
> --- a/drivers/media/platform/Makefile
> +++ b/drivers/media/platform/Makefile
> @@ -68,3 +68,5 @@ obj-$(CONFIG_VIDEO_MEDIATEK_VPU) += mtk-vpu/
> obj-$(CONFIG_VIDEO_MEDIATEK_VCODEC) += mtk-vcodec/
>
> obj-$(CONFIG_VIDEO_MEDIATEK_MDP) += mtk-mdp/
> +
> +obj-$(CONFIG_VIDEO_MEDIATEK_JPEG) += mtk-jpeg/
> diff --git a/drivers/media/platform/mtk-jpeg/Makefile b/drivers/media/platform/mtk-jpeg/Makefile
> new file mode 100644
> index 0000000..b2e6069
> --- /dev/null
> +++ b/drivers/media/platform/mtk-jpeg/Makefile
> @@ -0,0 +1,2 @@
> +mtk_jpeg-objs := mtk_jpeg_core.o mtk_jpeg_hw.o mtk_jpeg_parse.o
> +obj-$(CONFIG_VIDEO_MEDIATEK_JPEG) += mtk_jpeg.o
> diff --git a/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c
> new file mode 100644
> index 0000000..33ddf79
> --- /dev/null
> +++ b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c
> @@ -0,0 +1,1275 @@
> +/*
> + * Copyright (c) 2016 MediaTek Inc.
> + * Author: Ming Hsiu Tsai <minghsiu.tsai@mediatek.com>
> + * Rick Chang <rick.chang@mediatek.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/clk.h>
> +#include <linux/err.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/spinlock.h>
> +#include <media/v4l2-event.h>
> +#include <media/v4l2-mem2mem.h>
> +#include <media/v4l2-ioctl.h>
> +#include <media/videobuf2-core.h>
> +#include <media/videobuf2-dma-contig.h>
> +#include <soc/mediatek/smi.h>
> +#include <asm/dma-iommu.h>
> +
> +#include "mtk_jpeg_hw.h"
> +#include "mtk_jpeg_core.h"
> +#include "mtk_jpeg_parse.h"
> +
> +static struct mtk_jpeg_fmt mtk_jpeg_formats[] = {
> + {
> + .name = "JPEG JFIF",
> + .fourcc = V4L2_PIX_FMT_JPEG,
> + .colplanes = 1,
> + .flags = MTK_JPEG_FMT_FLAG_DEC_OUTPUT,
> + },
> + {
> + .name = "YUV 4:2:0 non-contiguous 3-planar, Y/Cb/Cr",
> + .fourcc = V4L2_PIX_FMT_YUV420M,
> + .h_sample = {4, 2, 2},
> + .v_sample = {4, 2, 2},
> + .colplanes = 3,
> + .h_align = 5,
> + .v_align = 4,
> + .flags = MTK_JPEG_FMT_FLAG_DEC_CAPTURE,
> + },
> + {
> + .name = "YUV 4:2:2 non-contiguous 3-planar, Y/Cb/Cr",
> + .fourcc = V4L2_PIX_FMT_YUV422M,
> + .h_sample = {4, 2, 2},
> + .v_sample = {4, 4, 4},
> + .colplanes = 3,
> + .h_align = 5,
> + .v_align = 3,
> + .flags = MTK_JPEG_FMT_FLAG_DEC_CAPTURE,
You probably don't need the name, since the format description is filled
in by the v4l2 core (v4l2-ioctl.c).
> + },
> +};
> +
> +#define MTK_JPEG_NUM_FORMATS ARRAY_SIZE(mtk_jpeg_formats)
> +
> +enum {
> + MTK_JPEG_BUF_FLAGS_INIT = 0,
> + MTK_JPEG_BUF_FLAGS_LAST_FRAME = 1,
> +};
> +
> +struct mtk_jpeg_src_buf {
> + struct vb2_v4l2_buffer b;
> + struct list_head list;
> + int flags;
> + struct mtk_jpeg_dec_param dec_param;
> +};
> +
> +static int debug;
> +module_param(debug, int, 0644);
> +
> +static inline struct mtk_jpeg_ctx *mtk_jpeg_fh_to_ctx(struct v4l2_fh *fh)
> +{
> + return container_of(fh, struct mtk_jpeg_ctx, fh);
> +}
> +
> +static inline struct mtk_jpeg_src_buf *mtk_jpeg_vb2_to_srcbuf(
> + struct vb2_buffer *vb)
> +{
> + return container_of(to_vb2_v4l2_buffer(vb), struct mtk_jpeg_src_buf, b);
> +}
> +
> +static int mtk_jpeg_querycap(struct file *file, void *priv,
> + struct v4l2_capability *cap)
> +{
> + struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> +
> + strlcpy(cap->driver, MTK_JPEG_NAME " decoder", sizeof(cap->driver));
> + strlcpy(cap->card, MTK_JPEG_NAME " decoder", sizeof(cap->card));
> + snprintf(cap->bus_info, sizeof(cap->bus_info), "platform:%s",
> + dev_name(jpeg->dev));
> +
> + return 0;
> +}
> +
> +static int mtk_jpeg_enum_fmt(struct mtk_jpeg_fmt *mtk_jpeg_formats, int n,
> + struct v4l2_fmtdesc *f, u32 type)
> +{
> + int i, num = 0;
> +
> + for (i = 0; i < n; ++i) {
> + if (mtk_jpeg_formats[i].flags & type) {
> + if (num == f->index)
> + break;
> + ++num;
> + }
> + }
> +
> + if (i >= n)
> + return -EINVAL;
> +
> + f->pixelformat = mtk_jpeg_formats[i].fourcc;
> +
> + return 0;
> +}
> +
> +static int mtk_jpeg_enum_fmt_vid_cap(struct file *file, void *priv,
> + struct v4l2_fmtdesc *f)
> +{
> + return mtk_jpeg_enum_fmt(mtk_jpeg_formats, MTK_JPEG_NUM_FORMATS, f,
> + MTK_JPEG_FMT_FLAG_DEC_CAPTURE);
> +}
> +
> +static int mtk_jpeg_enum_fmt_vid_out(struct file *file, void *priv,
> + struct v4l2_fmtdesc *f)
> +{
> + return mtk_jpeg_enum_fmt(mtk_jpeg_formats, MTK_JPEG_NUM_FORMATS, f,
> + MTK_JPEG_FMT_FLAG_DEC_OUTPUT);
> +}
> +
> +static struct mtk_jpeg_q_data *mtk_jpeg_get_q_data(struct mtk_jpeg_ctx *ctx,
> + enum v4l2_buf_type type)
> +{
> + if (V4L2_TYPE_IS_OUTPUT(type))
> + return &ctx->out_q;
> + else
No need for 'else'.
> + return &ctx->cap_q;
> +}
> +
> +static struct mtk_jpeg_fmt *mtk_jpeg_find_format(struct mtk_jpeg_ctx *ctx,
> + u32 pixelformat,
> + unsigned int fmt_type)
> +{
> + unsigned int k, fmt_flag;
> +
> + fmt_flag = (fmt_type == MTK_JPEG_FMT_TYPE_OUTPUT) ?
> + MTK_JPEG_FMT_FLAG_DEC_OUTPUT :
> + MTK_JPEG_FMT_FLAG_DEC_CAPTURE;
> +
> + for (k = 0; k < MTK_JPEG_NUM_FORMATS; k++) {
> + struct mtk_jpeg_fmt *fmt = &mtk_jpeg_formats[k];
> +
> + if (fmt->fourcc == pixelformat && fmt->flags & fmt_flag)
> + return fmt;
> + }
> +
> + return NULL;
> +}
> +
> +static void mtk_jpeg_bound_align_image(u32 *w, unsigned int wmin,
> + unsigned int wmax, unsigned int walign,
> + u32 *h, unsigned int hmin,
> + unsigned int hmax, unsigned int halign)
> +{
> + int width, height, w_step, h_step;
> +
> + width = *w;
> + height = *h;
> + w_step = 1 << walign;
> + h_step = 1 << halign;
> +
> + v4l_bound_align_image(w, wmin, wmax, walign, h, hmin, hmax, halign, 0);
> + if (*w < width && (*w + w_step) <= wmax)
> + *w += w_step;
> + if (*h < height && (*h + h_step) <= hmax)
> + *h += h_step;
> +}
> +
> +static void mtk_jpeg_adjust_fmt_mplane(struct mtk_jpeg_ctx *ctx,
> + struct v4l2_format *f)
> +{
> + struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
> + struct mtk_jpeg_q_data *q_data;
> + int i;
> +
> + q_data = mtk_jpeg_get_q_data(ctx, f->type);
> +
> + pix_mp->width = q_data->w;
> + pix_mp->height = q_data->h;
> + pix_mp->pixelformat = q_data->fmt->fourcc;
> + pix_mp->num_planes = q_data->fmt->colplanes;
> +
> + for (i = 0; i < pix_mp->num_planes; i++) {
> + pix_mp->plane_fmt[i].bytesperline = q_data->bytesperline[i];
> + pix_mp->plane_fmt[i].sizeimage = q_data->sizeimage[i];
> + }
> +}
> +
> +static int mtk_jpeg_try_fmt_mplane(struct v4l2_format *f,
> + struct mtk_jpeg_fmt *fmt,
> + struct mtk_jpeg_ctx *ctx, int q_type)
> +{
> + struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
> + struct mtk_jpeg_dev *jpeg = ctx->jpeg;
> + int i;
> +
> + memset(pix_mp->reserved, 0, sizeof(pix_mp->reserved));
> + pix_mp->field = V4L2_FIELD_NONE;
> +
> + if (ctx->state != MTK_JPEG_INIT) {
> + mtk_jpeg_adjust_fmt_mplane(ctx, f);
> + goto end;
> + }
> +
> + pix_mp->num_planes = fmt->colplanes;
> + pix_mp->pixelformat = fmt->fourcc;
> +
> + if (q_type == MTK_JPEG_FMT_TYPE_OUTPUT) {
> + struct v4l2_plane_pix_format *pfmt = &pix_mp->plane_fmt[0];
> +
> + mtk_jpeg_bound_align_image(&pix_mp->width, MTK_JPEG_MIN_WIDTH,
> + MTK_JPEG_MAX_WIDTH, 0,
> + &pix_mp->height, MTK_JPEG_MIN_HEIGHT,
> + MTK_JPEG_MAX_HEIGHT, 0);
> +
> + memset(pfmt->reserved, 0, sizeof(pfmt->reserved));
> + pfmt->bytesperline = 0;
> + /* Source size must be aligned to 128 */
> + pfmt->sizeimage = mtk_jpeg_align(pfmt->sizeimage, 128);
> + if (pfmt->sizeimage == 0)
> + pfmt->sizeimage = MTK_JPEG_DEFAULT_SIZEIMAGE;
> + goto end;
> + }
> +
> + /* type is MTK_JPEG_FMT_TYPE_CAPTURE */
> + mtk_jpeg_bound_align_image(&pix_mp->width, MTK_JPEG_MIN_WIDTH,
> + MTK_JPEG_MAX_WIDTH, fmt->h_align,
> + &pix_mp->height, MTK_JPEG_MIN_HEIGHT,
> + MTK_JPEG_MAX_HEIGHT, fmt->v_align);
> +
> + for (i = 0; i < fmt->colplanes; i++) {
> + struct v4l2_plane_pix_format *pfmt = &pix_mp->plane_fmt[i];
> + u32 stride = pix_mp->width * fmt->h_sample[i] / 4;
> + u32 h = pix_mp->height * fmt->v_sample[i] / 4;
> +
> + memset(pfmt->reserved, 0, sizeof(pfmt->reserved));
> + pfmt->bytesperline = stride;
> + pfmt->sizeimage = stride * h;
> + }
> +end:
> + v4l2_dbg(2, debug, &jpeg->v4l2_dev, "wxh:%ux%u\n",
> + pix_mp->width, pix_mp->height);
> + for (i = 0; i < pix_mp->num_planes; i++) {
> + v4l2_dbg(2, debug, &jpeg->v4l2_dev,
> + "plane[%d] bpl=%u, size=%u\n",
> + i,
> + pix_mp->plane_fmt[i].bytesperline,
> + pix_mp->plane_fmt[i].sizeimage);
> + }
> + return 0;
> +}
> +
> +static int mtk_jpeg_g_fmt_vid_mplane(struct file *file, void *priv,
> + struct v4l2_format *f)
> +{
> + struct vb2_queue *vq;
> + struct mtk_jpeg_q_data *q_data = NULL;
> + struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
> + struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(priv);
> + struct mtk_jpeg_dev *jpeg = ctx->jpeg;
> + int i;
> +
> + vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> + if (!vq)
> + return -EINVAL;
> +
> + q_data = mtk_jpeg_get_q_data(ctx, f->type);
> +
> + memset(pix_mp->reserved, 0, sizeof(pix_mp->reserved));
> + pix_mp->width = q_data->w;
> + pix_mp->height = q_data->h;
> + pix_mp->field = V4L2_FIELD_NONE;
> + pix_mp->pixelformat = q_data->fmt->fourcc;
> + pix_mp->num_planes = q_data->fmt->colplanes;
> + pix_mp->colorspace = ctx->colorspace;
> + pix_mp->ycbcr_enc = ctx->ycbcr_enc;
> + pix_mp->xfer_func = ctx->xfer_func;
> + pix_mp->quantization = ctx->quantization;
> +
> + v4l2_dbg(1, debug, &jpeg->v4l2_dev, "(%d) g_fmt:%s wxh:%ux%u\n",
> + f->type, q_data->fmt->name, pix_mp->width, pix_mp->height);
> +
> + for (i = 0; i < pix_mp->num_planes; i++) {
> + struct v4l2_plane_pix_format *pfmt = &pix_mp->plane_fmt[i];
> +
> + pfmt->bytesperline = q_data->bytesperline[i];
> + pfmt->sizeimage = q_data->sizeimage[i];
> + memset(pfmt->reserved, 0, sizeof(pfmt->reserved));
> +
> + v4l2_dbg(1, debug, &jpeg->v4l2_dev,
> + "plane[%d] bpl=%u, size=%u\n",
> + i,
> + pfmt->bytesperline,
> + pfmt->sizeimage);
> + }
> + return 0;
> +}
> +
> +static int mtk_jpeg_try_fmt_vid_cap_mplane(struct file *file, void *priv,
> + struct v4l2_format *f)
> +{
> + struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(priv);
> + struct mtk_jpeg_fmt *fmt;
> +
> + fmt = mtk_jpeg_find_format(ctx, f->fmt.pix_mp.pixelformat,
> + MTK_JPEG_FMT_TYPE_CAPTURE);
> + if (!fmt)
> + fmt = ctx->cap_q.fmt;
> +
> + v4l2_dbg(2, debug, &ctx->jpeg->v4l2_dev, "(%d) try_fmt:%s\n",
> + f->type, fmt->name);
> +
> + return mtk_jpeg_try_fmt_mplane(f, fmt, ctx, MTK_JPEG_FMT_TYPE_CAPTURE);
> +}
> +
> +static int mtk_jpeg_try_fmt_vid_out_mplane(struct file *file, void *priv,
> + struct v4l2_format *f)
> +{
> + struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(priv);
> + struct mtk_jpeg_fmt *fmt;
> +
> + fmt = mtk_jpeg_find_format(ctx, f->fmt.pix_mp.pixelformat,
> + MTK_JPEG_FMT_TYPE_OUTPUT);
> + if (!fmt)
> + fmt = ctx->out_q.fmt;
> +
> + v4l2_dbg(2, debug, &ctx->jpeg->v4l2_dev, "(%d) try_fmt:%s\n",
> + f->type, fmt->name);
> +
> + return mtk_jpeg_try_fmt_mplane(f, fmt, ctx, MTK_JPEG_FMT_TYPE_OUTPUT);
> +}
> +
> +static int mtk_jpeg_s_fmt_mplane(struct mtk_jpeg_ctx *ctx,
> + struct v4l2_format *f)
> +{
> + struct vb2_queue *vq;
> + struct mtk_jpeg_q_data *q_data = NULL;
> + struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
> + struct mtk_jpeg_dev *jpeg = ctx->jpeg;
> + unsigned int f_type;
> + int i;
> +
> + vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> + if (!vq)
> + return -EINVAL;
> +
> + q_data = mtk_jpeg_get_q_data(ctx, f->type);
> +
> + if (vb2_is_busy(vq)) {
> + v4l2_err(&jpeg->v4l2_dev, "queue busy\n");
> + return -EBUSY;
> + }
> +
> + f_type = V4L2_TYPE_IS_OUTPUT(f->type) ?
> + MTK_JPEG_FMT_TYPE_OUTPUT : MTK_JPEG_FMT_TYPE_CAPTURE;
> +
> + q_data->fmt = mtk_jpeg_find_format(ctx, pix_mp->pixelformat, f_type);
> + q_data->w = pix_mp->width;
> + q_data->h = pix_mp->height;
> + ctx->colorspace = pix_mp->colorspace;
> + ctx->ycbcr_enc = pix_mp->ycbcr_enc;
> + ctx->xfer_func = pix_mp->xfer_func;
> + ctx->quantization = pix_mp->quantization;
> +
> + v4l2_dbg(1, debug, &jpeg->v4l2_dev, "(%d) s_fmt:%s wxh:%ux%u\n",
> + f->type, q_data->fmt->name, q_data->w, q_data->h);
> +
> + for (i = 0; i < q_data->fmt->colplanes; i++) {
> + q_data->bytesperline[i] = pix_mp->plane_fmt[i].bytesperline;
> + q_data->sizeimage[i] = pix_mp->plane_fmt[i].sizeimage;
> +
> + v4l2_dbg(1, debug, &jpeg->v4l2_dev,
> + "plane[%d] bpl=%u, size=%u\n",
> + i, q_data->bytesperline[i], q_data->sizeimage[i]);
> + }
> +
> + return 0;
> +}
> +
> +static int mtk_jpeg_s_fmt_vid_out_mplane(struct file *file, void *priv,
> + struct v4l2_format *f)
> +{
> + int ret;
> +
> + ret = mtk_jpeg_try_fmt_vid_out_mplane(file, priv, f);
> + if (ret)
> + return ret;
> +
> + return mtk_jpeg_s_fmt_mplane(mtk_jpeg_fh_to_ctx(priv), f);
> +}
> +
> +static int mtk_jpeg_s_fmt_vid_cap_mplane(struct file *file, void *priv,
> + struct v4l2_format *f)
> +{
> + int ret;
> +
> + ret = mtk_jpeg_try_fmt_vid_cap_mplane(file, priv, f);
> + if (ret)
> + return ret;
> +
> + return mtk_jpeg_s_fmt_mplane(mtk_jpeg_fh_to_ctx(priv), f);
> +}
> +
> +static void mtk_jpeg_queue_src_chg_event(struct mtk_jpeg_ctx *ctx)
> +{
> + static const struct v4l2_event ev_src_ch = {
> + .type = V4L2_EVENT_SOURCE_CHANGE,
> + .u.src_change.changes =
> + V4L2_EVENT_SRC_CH_RESOLUTION,
> + };
> +
> + v4l2_event_queue_fh(&ctx->fh, &ev_src_ch);
> +}
> +
> +static int mtk_jpeg_subscribe_event(struct v4l2_fh *fh,
> + const struct v4l2_event_subscription *sub)
> +{
> + switch (sub->type) {
> + case V4L2_EVENT_SOURCE_CHANGE:
> + return v4l2_src_change_event_subscribe(fh, sub);
> + default:
> + return -EINVAL;
> + }
> +}
> +
> +static int mtk_jpeg_g_selection(struct file *file, void *priv,
> + struct v4l2_selection *s)
> +{
> + struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(priv);
> +
> + if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
> + return -EINVAL;
> +
> + switch (s->target) {
> + case V4L2_SEL_TGT_COMPOSE:
> + case V4L2_SEL_TGT_COMPOSE_DEFAULT:
> + s->r.width = ctx->out_q.w;
> + s->r.height = ctx->out_q.h;
> + s->r.left = 0;
> + s->r.top = 0;
> + break;
> + case V4L2_SEL_TGT_COMPOSE_BOUNDS:
> + case V4L2_SEL_TGT_COMPOSE_PADDED:
> + s->r.width = ctx->cap_q.w;
> + s->r.height = ctx->cap_q.h;
> + s->r.left = 0;
> + s->r.top = 0;
> + break;
> + default:
> + return -EINVAL;
> + }
> + return 0;
> +}
> +
> +static int mtk_jpeg_s_selection(struct file *file, void *priv,
> + struct v4l2_selection *s)
> +{
> + struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(priv);
> +
> + if (s->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
> + return -EINVAL;
> +
> + switch (s->target) {
> + case V4L2_SEL_TGT_COMPOSE:
> + s->r.left = 0;
> + s->r.top = 0;
> + s->r.width = ctx->out_q.w;
> + s->r.height = ctx->out_q.h;
> + break;
> + default:
> + return -EINVAL;
> + }
> + return 0;
> +}
> +
> +static int mtk_jpeg_qbuf(struct file *file, void *priv, struct v4l2_buffer *buf)
> +{
> + struct v4l2_fh *fh = file->private_data;
> + struct vb2_queue *vq;
> + struct vb2_buffer *vb;
> + struct mtk_jpeg_src_buf *jpeg_src_buf;
> +
> + if (buf->type != V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE)
> + goto end;
> +
> + vq = v4l2_m2m_get_vq(fh->m2m_ctx, buf->type);
> + vb = vq->bufs[buf->index];
> + jpeg_src_buf = mtk_jpeg_vb2_to_srcbuf(vb);
> + jpeg_src_buf->flags = (buf->m.planes[0].bytesused == 0) ?
> + MTK_JPEG_BUF_FLAGS_LAST_FRAME : MTK_JPEG_BUF_FLAGS_INIT;
> +end:
> + return v4l2_m2m_qbuf(file, fh->m2m_ctx, buf);
> +}
> +
> +static const struct v4l2_ioctl_ops mtk_jpeg_ioctl_ops = {
> + .vidioc_querycap = mtk_jpeg_querycap,
> + .vidioc_enum_fmt_vid_cap_mplane = mtk_jpeg_enum_fmt_vid_cap,
> + .vidioc_enum_fmt_vid_out_mplane = mtk_jpeg_enum_fmt_vid_out,
> + .vidioc_try_fmt_vid_cap_mplane = mtk_jpeg_try_fmt_vid_cap_mplane,
> + .vidioc_try_fmt_vid_out_mplane = mtk_jpeg_try_fmt_vid_out_mplane,
> + .vidioc_g_fmt_vid_cap_mplane = mtk_jpeg_g_fmt_vid_mplane,
> + .vidioc_g_fmt_vid_out_mplane = mtk_jpeg_g_fmt_vid_mplane,
> + .vidioc_s_fmt_vid_cap_mplane = mtk_jpeg_s_fmt_vid_cap_mplane,
> + .vidioc_s_fmt_vid_out_mplane = mtk_jpeg_s_fmt_vid_out_mplane,
> + .vidioc_qbuf = mtk_jpeg_qbuf,
> + .vidioc_subscribe_event = mtk_jpeg_subscribe_event,
> + .vidioc_g_selection = mtk_jpeg_g_selection,
> + .vidioc_s_selection = mtk_jpeg_s_selection,
> +
> + .vidioc_create_bufs = v4l2_m2m_ioctl_create_bufs,
> + .vidioc_prepare_buf = v4l2_m2m_ioctl_prepare_buf,
> + .vidioc_reqbufs = v4l2_m2m_ioctl_reqbufs,
> + .vidioc_querybuf = v4l2_m2m_ioctl_querybuf,
> + .vidioc_dqbuf = v4l2_m2m_ioctl_dqbuf,
> + .vidioc_expbuf = v4l2_m2m_ioctl_expbuf,
> + .vidioc_streamon = v4l2_m2m_ioctl_streamon,
> + .vidioc_streamoff = v4l2_m2m_ioctl_streamoff,
> +
> + .vidioc_unsubscribe_event = v4l2_event_unsubscribe,
> +};
> +
> +static int mtk_jpeg_queue_setup(struct vb2_queue *q,
> + unsigned int *num_buffers,
> + unsigned int *num_planes,
> + unsigned int sizes[],
> + struct device *alloc_ctxs[])
> +{
> + struct mtk_jpeg_ctx *ctx = vb2_get_drv_priv(q);
> + struct mtk_jpeg_q_data *q_data = NULL;
> + struct mtk_jpeg_dev *jpeg = ctx->jpeg;
> + int i;
> +
> + v4l2_dbg(1, debug, &jpeg->v4l2_dev, "(%d) buf_req count=%u\n",
> + q->type, *num_buffers);
> +
> + q_data = mtk_jpeg_get_q_data(ctx, q->type);
> + if (!q_data)
> + return -EINVAL;
> +
> + *num_planes = q_data->fmt->colplanes;
> + for (i = 0; i < q_data->fmt->colplanes; i++) {
> + sizes[i] = q_data->sizeimage[i];
> + v4l2_dbg(1, debug, &jpeg->v4l2_dev, "sizeimage[%d]=%u\n",
> + i, sizes[i]);
> + }
> +
> + return 0;
> +}
> +
> +static int mtk_jpeg_buf_prepare(struct vb2_buffer *vb)
> +{
> + struct mtk_jpeg_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> + struct mtk_jpeg_q_data *q_data = NULL;
> + int i;
> +
> + q_data = mtk_jpeg_get_q_data(ctx, vb->vb2_queue->type);
> + if (!q_data)
> + return -EINVAL;
> +
> + for (i = 0; i < q_data->fmt->colplanes; i++)
> + vb2_set_plane_payload(vb, i, q_data->sizeimage[i]);
> +
> + return 0;
> +}
> +
> +static bool mtk_jpeg_check_resolution_change(struct mtk_jpeg_ctx *ctx,
> + struct mtk_jpeg_dec_param *param)
> +{
> + struct mtk_jpeg_dev *jpeg = ctx->jpeg;
> + struct mtk_jpeg_q_data *q_data;
> +
> + q_data = &ctx->out_q;
> + if (q_data->w != param->pic_w || q_data->h != param->pic_h) {
> + v4l2_dbg(1, debug, &jpeg->v4l2_dev, "Picture size change\n");
> + return true;
> + }
> +
> + q_data = &ctx->cap_q;
> + if (q_data->fmt != mtk_jpeg_find_format(ctx, param->dst_fourcc,
> + MTK_JPEG_FMT_TYPE_CAPTURE)) {
> + v4l2_dbg(1, debug, &jpeg->v4l2_dev, "format change\n");
> + return true;
> + }
> + return false;
> +}
> +
> +static void mtk_jpeg_set_queue_data(struct mtk_jpeg_ctx *ctx,
> + struct mtk_jpeg_dec_param *param)
> +{
> + struct mtk_jpeg_dev *jpeg = ctx->jpeg;
> + struct mtk_jpeg_q_data *q_data;
> + int i;
> +
> + q_data = &ctx->out_q;
> + q_data->w = param->pic_w;
> + q_data->h = param->pic_h;
> +
> + q_data = &ctx->cap_q;
> + q_data->w = param->dec_w;
> + q_data->h = param->dec_h;
> + q_data->fmt = mtk_jpeg_find_format(ctx,
> + param->dst_fourcc,
> + MTK_JPEG_FMT_TYPE_CAPTURE);
> +
> + for (i = 0; i < q_data->fmt->colplanes; i++) {
> + q_data->bytesperline[i] = param->mem_stride[i];
> + q_data->sizeimage[i] = param->comp_size[i];
> + }
> +
> + v4l2_dbg(1, debug, &jpeg->v4l2_dev,
> + "set_parse cap:%s pic(%u, %u), buf(%u, %u)\n",
> + q_data->fmt->name, param->pic_w, param->pic_h,
> + param->dec_w, param->dec_h);
> +}
> +
> +static void mtk_jpeg_buf_queue(struct vb2_buffer *vb)
> +{
> + struct mtk_jpeg_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> + struct mtk_jpeg_dec_param *param;
> + struct mtk_jpeg_dev *jpeg = ctx->jpeg;
> + struct mtk_jpeg_src_buf *jpeg_src_buf;
> + bool header_valid;
> +
> + v4l2_dbg(2, debug, &jpeg->v4l2_dev, "(%d) buf_q id=%d, vb=%p",
> + vb->vb2_queue->type, vb->index, vb);
> +
> + if (vb->vb2_queue->type != V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE)
> + goto end;
> +
> + jpeg_src_buf = mtk_jpeg_vb2_to_srcbuf(vb);
> + param = &jpeg_src_buf->dec_param;
> + memset(param, 0, sizeof(*param));
> +
> + if (jpeg_src_buf->flags & MTK_JPEG_BUF_FLAGS_LAST_FRAME) {
> + v4l2_dbg(1, debug, &jpeg->v4l2_dev, "Got eos");
> + goto end;
> + }
> + header_valid = mtk_jpeg_parse(param, (u8 *)vb2_plane_vaddr(vb, 0),
> + vb2_get_plane_payload(vb, 0));
> + if (!header_valid) {
> + v4l2_err(&jpeg->v4l2_dev, "Header invalid.\n");
> + vb2_buffer_done(vb, VB2_BUF_STATE_ERROR);
> + return;
> + }
> +
> + if (ctx->state == MTK_JPEG_INIT) {
> + mtk_jpeg_queue_src_chg_event(ctx);
> + mtk_jpeg_set_queue_data(ctx, param);
> + ctx->state = MTK_JPEG_RUNNING;
> + }
> +end:
> + v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, to_vb2_v4l2_buffer(vb));
> +}
> +
> +static void *mtk_jpeg_buf_remove(struct mtk_jpeg_ctx *ctx,
> + enum v4l2_buf_type type)
> +{
> + if (V4L2_TYPE_IS_OUTPUT(type))
> + return v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> + else
> + return v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> +}
> +
> +static int mtk_jpeg_start_streaming(struct vb2_queue *q, unsigned int count)
> +{
> + struct mtk_jpeg_ctx *ctx = vb2_get_drv_priv(q);
> + int ret = 0;
> +
> + ret = pm_runtime_get_sync(ctx->jpeg->dev);
> +
If start_streaming returns an error, you must call
v4l2_m2m_buf_done(to_vb2_v4l2_buffer(vb), VB2_BUF_STATE_QUEUED) for all
queued buffers. This is similar to what happens in stop_streaming, but
with a different VB2_BUF_STATE.
> + return ret > 0 ? 0 : ret;
> +}
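To illustrate the start_streaming comment: below is a minimal userspace
model of the pattern, not driver code. All names in it (model_queue,
model_return_all, model_start_streaming, model_stop_streaming, the
buf_state enum) are hypothetical stand-ins for the vb2/m2m types and
helpers, and power_ret stands in for the pm_runtime_get_sync() return
value. It only shows the intended control flow: on failure, every buffer
the driver holds goes back in the "queued" state before the error is
returned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the vb2 buffer states. */
enum buf_state { BUF_OWNED_BY_DRIVER, BUF_STATE_QUEUED, BUF_STATE_ERROR };

#define MAX_BUFS 4

/* Hypothetical model of the buffers currently held by the driver. */
struct model_queue {
	enum buf_state bufs[MAX_BUFS];
	size_t count; /* number of buffers still owned by the driver */
};

/* Hand every driver-owned buffer back in the given state. */
static void model_return_all(struct model_queue *q, enum buf_state state)
{
	assert(q->count <= MAX_BUFS);
	while (q->count > 0)
		q->bufs[--q->count] = state;
}

/*
 * Models start_streaming(): a negative power_ret means power-up failed,
 * so all buffers are returned as QUEUED and the error is propagated.
 * Zero or positive values count as success.
 */
static int model_start_streaming(struct model_queue *q, int power_ret)
{
	if (power_ret < 0) {
		model_return_all(q, BUF_STATE_QUEUED);
		return power_ret;
	}
	return 0;
}

/* Models stop_streaming(): same loop, but buffers go back as ERROR. */
static void model_stop_streaming(struct model_queue *q)
{
	model_return_all(q, BUF_STATE_ERROR);
}
```

In the real driver the loop would presumably reuse mtk_jpeg_buf_remove()
as stop_streaming already does, passing VB2_BUF_STATE_QUEUED instead of
VB2_BUF_STATE_ERROR.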
> +
> +static void mtk_jpeg_stop_streaming(struct vb2_queue *q)
> +{
> + struct mtk_jpeg_ctx *ctx = vb2_get_drv_priv(q);
> + struct vb2_buffer *vb;
> +
> + /*
> + * STREAMOFF is an acknowledgment for source change event.
> + * Before STREAMOFF, we still have to return the old resolution and
> + * subsampling. Update capture queue when the stream is off.
> + */
> + if (ctx->state == MTK_JPEG_SOURCE_CHANGE &&
> + !V4L2_TYPE_IS_OUTPUT(q->type)) {
> + struct mtk_jpeg_src_buf *src_buf;
> +
> + vb = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> + src_buf = mtk_jpeg_vb2_to_srcbuf(vb);
> + mtk_jpeg_set_queue_data(ctx, &src_buf->dec_param);
> + ctx->state = MTK_JPEG_RUNNING;
> + } else if (V4L2_TYPE_IS_OUTPUT(q->type)) {
> + ctx->state = MTK_JPEG_INIT;
> + }
> +
> + vb = mtk_jpeg_buf_remove(ctx, q->type);
> + while (vb) {
> + v4l2_m2m_buf_done(to_vb2_v4l2_buffer(vb), VB2_BUF_STATE_ERROR);
> + vb = mtk_jpeg_buf_remove(ctx, q->type);
> + }
> +
> + pm_runtime_put_sync(ctx->jpeg->dev);
> +}
> +
> +static struct vb2_ops mtk_jpeg_qops = {
> + .queue_setup = mtk_jpeg_queue_setup,
> + .buf_prepare = mtk_jpeg_buf_prepare,
> + .buf_queue = mtk_jpeg_buf_queue,
> + .wait_prepare = vb2_ops_wait_prepare,
> + .wait_finish = vb2_ops_wait_finish,
> + .start_streaming = mtk_jpeg_start_streaming,
> + .stop_streaming = mtk_jpeg_stop_streaming,
> +};
> +
> +static void mtk_jpeg_set_dec_src(struct mtk_jpeg_ctx *ctx,
> + struct vb2_buffer *src_buf,
> + struct mtk_jpeg_bs *bs)
> +{
> + bs->str_addr = vb2_dma_contig_plane_dma_addr(src_buf, 0);
> + bs->end_addr = bs->str_addr +
> + mtk_jpeg_align(vb2_get_plane_payload(src_buf, 0), 16);
> + bs->size = mtk_jpeg_align(vb2_plane_size(src_buf, 0), 128);
> +}
> +
> +static int mtk_jpeg_set_dec_dst(struct mtk_jpeg_ctx *ctx,
> + struct mtk_jpeg_dec_param *param,
> + struct vb2_buffer *dst_buf,
> + struct mtk_jpeg_fb *fb)
> +{
> + int i;
> +
> + if (param->comp_num != dst_buf->num_planes) {
> + dev_err(ctx->jpeg->dev, "plane number mismatch (%u != %u)\n",
> + param->comp_num, dst_buf->num_planes);
> + return -EINVAL;
> + }
> +
> + for (i = 0; i < dst_buf->num_planes; i++) {
> + if (vb2_plane_size(dst_buf, i) < param->comp_size[i]) {
> + dev_err(ctx->jpeg->dev,
> + "buffer size is underflow (%lu < %u)\n",
> + vb2_plane_size(dst_buf, 0),
> + param->comp_size[i]);
> + return -EINVAL;
> + }
> + fb->plane_addr[i] = vb2_dma_contig_plane_dma_addr(dst_buf, i);
> + }
> +
> + return 0;
> +}
> +
> +static void mtk_jpeg_device_run(void *priv)
> +{
> + struct mtk_jpeg_ctx *ctx = priv;
> + struct mtk_jpeg_dev *jpeg = ctx->jpeg;
> + struct vb2_buffer *src_buf, *dst_buf;
> + enum vb2_buffer_state buf_state = VB2_BUF_STATE_ERROR;
> + unsigned long flags;
> + struct mtk_jpeg_src_buf *jpeg_src_buf;
> + struct mtk_jpeg_bs bs;
> + struct mtk_jpeg_fb fb;
> + int i;
> +
> + src_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> + dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> + jpeg_src_buf = mtk_jpeg_vb2_to_srcbuf(src_buf);
> +
> + if (jpeg_src_buf->flags & MTK_JPEG_BUF_FLAGS_LAST_FRAME) {
> + for (i = 0; i < dst_buf->num_planes; i++)
> + vb2_set_plane_payload(dst_buf, i, 0);
> + buf_state = VB2_BUF_STATE_DONE;
> + goto dec_end;
> + }
> +
> + if (mtk_jpeg_check_resolution_change(ctx, &jpeg_src_buf->dec_param)) {
> + mtk_jpeg_queue_src_chg_event(ctx);
> + ctx->state = MTK_JPEG_SOURCE_CHANGE;
> + v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx);
> + return;
> + }
> +
> + mtk_jpeg_set_dec_src(ctx, src_buf, &bs);
> + if (mtk_jpeg_set_dec_dst(ctx, &jpeg_src_buf->dec_param, dst_buf, &fb))
> + goto dec_end;
> +
> + spin_lock_irqsave(&jpeg->hw_lock, flags);
> + mtk_jpeg_dec_reset(jpeg->dec_reg_base);
> + mtk_jpeg_dec_set_config(jpeg->dec_reg_base,
> + &jpeg_src_buf->dec_param, &bs, &fb);
> +
> + mtk_jpeg_dec_start(jpeg->dec_reg_base);
> + spin_unlock_irqrestore(&jpeg->hw_lock, flags);
> + return;
> +
> +dec_end:
> + v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> + v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> + v4l2_m2m_buf_done(to_vb2_v4l2_buffer(src_buf), buf_state);
> + v4l2_m2m_buf_done(to_vb2_v4l2_buffer(dst_buf), buf_state);
> + v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx);
> +}
> +
> +static int mtk_jpeg_job_ready(void *priv)
> +{
> + struct mtk_jpeg_ctx *ctx = priv;
> +
> + return (ctx->state == MTK_JPEG_RUNNING) ? 1 : 0;
> +}
> +
> +static void mtk_jpeg_job_abort(void *priv)
> +{
> + struct mtk_jpeg_ctx *ctx = priv;
> + struct mtk_jpeg_dev *jpeg = ctx->jpeg;
> + struct vb2_buffer *src_buf, *dst_buf;
> +
> + src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> + dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> + v4l2_m2m_buf_done(to_vb2_v4l2_buffer(src_buf), VB2_BUF_STATE_ERROR);
> + v4l2_m2m_buf_done(to_vb2_v4l2_buffer(dst_buf), VB2_BUF_STATE_ERROR);
> + v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx);
> +}
> +
> +static struct v4l2_m2m_ops mtk_jpeg_m2m_ops = {
> + .device_run = mtk_jpeg_device_run,
> + .job_ready = mtk_jpeg_job_ready,
> + .job_abort = mtk_jpeg_job_abort,
> +};
> +
> +static int mtk_jpeg_queue_init(void *priv, struct vb2_queue *src_vq,
> + struct vb2_queue *dst_vq)
> +{
> + struct mtk_jpeg_ctx *ctx = priv;
> + int ret;
> +
> + src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
> + src_vq->io_modes = VB2_DMABUF | VB2_MMAP | VB2_USERPTR;
I would drop VB2_USERPTR; it makes little sense with dma_contig.
> + src_vq->drv_priv = ctx;
> + src_vq->buf_struct_size = sizeof(struct mtk_jpeg_src_buf);
> + src_vq->ops = &mtk_jpeg_qops;
> + src_vq->mem_ops = &vb2_dma_contig_memops;
> + src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> + src_vq->lock = &ctx->jpeg->lock;
> + src_vq->dev = ctx->jpeg->dev;
> + ret = vb2_queue_init(src_vq);
> + if (ret)
> + return ret;
> +
> + dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
> + dst_vq->io_modes = VB2_DMABUF | VB2_MMAP | VB2_USERPTR;
Ditto: drop VB2_USERPTR here as well.
> + dst_vq->drv_priv = ctx;
> + dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> + dst_vq->ops = &mtk_jpeg_qops;
> + dst_vq->mem_ops = &vb2_dma_contig_memops;
> + dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> + dst_vq->lock = &ctx->jpeg->lock;
> + dst_vq->dev = ctx->jpeg->dev;
> + ret = vb2_queue_init(dst_vq);
> +
> + return ret;
> +}
> +
> +static void mtk_jpeg_clk_on(struct mtk_jpeg_dev *jpeg)
> +{
> + int ret;
> +
> + ret = mtk_smi_larb_get(jpeg->larb);
> + if (ret)
> + dev_err(jpeg->dev, "mtk_smi_larb_get larbvdec fail %d\n", ret);
> + clk_prepare_enable(jpeg->clk_jdec_smi);
> + clk_prepare_enable(jpeg->clk_jdec);
> +}
> +
> +static void mtk_jpeg_clk_off(struct mtk_jpeg_dev *jpeg)
> +{
> + clk_disable_unprepare(jpeg->clk_jdec);
> + clk_disable_unprepare(jpeg->clk_jdec_smi);
> + mtk_smi_larb_put(jpeg->larb);
> +}
> +
> +static irqreturn_t mtk_jpeg_dec_irq(int irq, void *priv)
> +{
> + struct mtk_jpeg_dev *jpeg = priv;
> + struct mtk_jpeg_ctx *ctx;
> + struct vb2_buffer *src_buf, *dst_buf;
> + struct mtk_jpeg_src_buf *jpeg_src_buf;
> + enum vb2_buffer_state buf_state = VB2_BUF_STATE_ERROR;
> + u32 dec_irq_ret;
> + u32 dec_ret;
> + int i;
> +
> + ctx = v4l2_m2m_get_curr_priv(jpeg->m2m_dev);
> + if (!ctx) {
> + v4l2_err(&jpeg->v4l2_dev, "Context is NULL\n");
> + return IRQ_HANDLED;
> + }
> +
> + src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> + dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> + jpeg_src_buf = mtk_jpeg_vb2_to_srcbuf(src_buf);
> +
> + dec_ret = mtk_jpeg_dec_get_int_status(jpeg->dec_reg_base);
> + dec_irq_ret = mtk_jpeg_dec_enum_result(dec_ret);
> +
> + if (dec_irq_ret >= MTK_JPEG_DEC_RESULT_UNDERFLOW)
> + mtk_jpeg_dec_reset(jpeg->dec_reg_base);
> +
> + if (dec_irq_ret != MTK_JPEG_DEC_RESULT_EOF_DONE) {
> + dev_err(jpeg->dev, "decode failed\n");
> + goto dec_end;
> + }
> +
> + for (i = 0; i < dst_buf->num_planes; i++)
> + vb2_set_plane_payload(dst_buf, i,
> + jpeg_src_buf->dec_param.comp_size[i]);
> +
> + buf_state = VB2_BUF_STATE_DONE;
> +
> +dec_end:
> + v4l2_m2m_buf_done(to_vb2_v4l2_buffer(src_buf), buf_state);
> + v4l2_m2m_buf_done(to_vb2_v4l2_buffer(dst_buf), buf_state);
> + v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx);
> + return IRQ_HANDLED;
> +}
> +
> +static void mtk_jpeg_set_default_params(struct mtk_jpeg_ctx *ctx)
> +{
> + struct mtk_jpeg_q_data *q = &ctx->out_q;
> + int i;
> +
> + ctx->colorspace = V4L2_COLORSPACE_JPEG,
> + ctx->ycbcr_enc = V4L2_YCBCR_ENC_DEFAULT;
> + ctx->quantization = V4L2_QUANTIZATION_DEFAULT;
> + ctx->xfer_func = V4L2_XFER_FUNC_DEFAULT;
> +
> + q->fmt = mtk_jpeg_find_format(ctx, V4L2_PIX_FMT_JPEG,
> + MTK_JPEG_FMT_TYPE_OUTPUT);
> + q->w = MTK_JPEG_MIN_WIDTH;
> + q->h = MTK_JPEG_MIN_HEIGHT;
> + q->bytesperline[0] = 0;
> + q->sizeimage[0] = MTK_JPEG_DEFAULT_SIZEIMAGE;
> +
> + q = &ctx->cap_q;
> + q->fmt = mtk_jpeg_find_format(ctx, V4L2_PIX_FMT_YUV420M,
> + MTK_JPEG_FMT_TYPE_CAPTURE);
> + q->w = MTK_JPEG_MIN_WIDTH;
> + q->h = MTK_JPEG_MIN_HEIGHT;
> +
> + for (i = 0; i < q->fmt->colplanes; i++) {
> + u32 stride = q->w * q->fmt->h_sample[i] / 4;
> + u32 h = q->h * q->fmt->v_sample[i] / 4;
> +
> + q->bytesperline[i] = stride;
> + q->sizeimage[i] = stride * h;
> + }
> +}
> +
> +static int mtk_jpeg_open(struct file *file)
> +{
> + struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> + struct video_device *vfd = video_devdata(file);
> + struct mtk_jpeg_ctx *ctx;
> + int ret = 0;
> +
> + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> + if (!ctx)
> + return -ENOMEM;
> +
> + if (mutex_lock_interruptible(&jpeg->lock)) {
> + ret = -ERESTARTSYS;
> + goto free;
> + }
> +
> + v4l2_fh_init(&ctx->fh, vfd);
> + file->private_data = &ctx->fh;
> + v4l2_fh_add(&ctx->fh);
> +
> + ctx->jpeg = jpeg;
> + ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(jpeg->m2m_dev, ctx,
> + mtk_jpeg_queue_init);
> + if (IS_ERR(ctx->fh.m2m_ctx)) {
> + ret = PTR_ERR(ctx->fh.m2m_ctx);
> + goto error;
> + }
> +
> + mtk_jpeg_set_default_params(ctx);
> + mutex_unlock(&jpeg->lock);
> + return 0;
> +
> +error:
> + v4l2_fh_del(&ctx->fh);
> + v4l2_fh_exit(&ctx->fh);
> + mutex_unlock(&jpeg->lock);
> +free:
> + kfree(ctx);
> + return ret;
> +}
> +
> +static int mtk_jpeg_release(struct file *file)
> +{
> + struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> + struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file->private_data);
> +
> + mutex_lock(&jpeg->lock);
> + v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> + v4l2_fh_del(&ctx->fh);
> + v4l2_fh_exit(&ctx->fh);
> + kfree(ctx);
> + mutex_unlock(&jpeg->lock);
> + return 0;
> +}
> +
> +static const struct v4l2_file_operations mtk_jpeg_fops = {
> + .owner = THIS_MODULE,
> + .open = mtk_jpeg_open,
> + .release = mtk_jpeg_release,
> + .poll = v4l2_m2m_fop_poll,
> + .unlocked_ioctl = video_ioctl2,
> + .mmap = v4l2_m2m_fop_mmap,
> +};
> +
> +static int mtk_jpeg_clk_init(struct mtk_jpeg_dev *jpeg)
> +{
> + struct device_node *node;
> + struct platform_device *pdev;
> +
> + node = of_parse_phandle(jpeg->dev->of_node, "mediatek,larb", 0);
> + if (!node)
> + return -EINVAL;
> + pdev = of_find_device_by_node(node);
> + if (WARN_ON(!pdev)) {
> + of_node_put(node);
> + return -EINVAL;
> + }
> + of_node_put(node);
> +
> + jpeg->larb = &pdev->dev;
> +
> + jpeg->clk_jdec = devm_clk_get(jpeg->dev, "jpgdec");
> + if (IS_ERR(jpeg->clk_jdec))
> + return -EINVAL;
> +
> + jpeg->clk_jdec_smi = devm_clk_get(jpeg->dev, "jpgdec-smi");
> + if (IS_ERR(jpeg->clk_jdec_smi))
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +static int mtk_jpeg_probe(struct platform_device *pdev)
> +{
> + struct mtk_jpeg_dev *jpeg;
> + struct resource *res;
> + int dec_irq;
> + int ret;
> +
> + jpeg = devm_kzalloc(&pdev->dev, sizeof(*jpeg), GFP_KERNEL);
> + if (!jpeg)
> + return -ENOMEM;
> +
> + mutex_init(&jpeg->lock);
> + spin_lock_init(&jpeg->hw_lock);
> + jpeg->dev = &pdev->dev;
> +
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + jpeg->dec_reg_base = devm_ioremap_resource(&pdev->dev, res);
> + if (IS_ERR(jpeg->dec_reg_base)) {
> + ret = PTR_ERR(jpeg->dec_reg_base);
> + return ret;
> + }
> +
> + res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
> + dec_irq = platform_get_irq(pdev, 0);
> + if (!res || dec_irq < 0) {
> + dev_err(&pdev->dev, "Failed to get dec_irq %d.\n", dec_irq);
> + ret = -EINVAL;
> + return ret;
> + }
> +
> + ret = devm_request_irq(&pdev->dev, dec_irq, mtk_jpeg_dec_irq, 0,
> + pdev->name, jpeg);
> + if (ret) {
> + dev_err(&pdev->dev, "Failed to request dec_irq %d (%d)\n",
> + dec_irq, ret);
> + ret = -EINVAL;
> + goto err_req_irq;
> + }
> +
> + ret = mtk_jpeg_clk_init(jpeg);
> + if (ret) {
> + dev_err(&pdev->dev, "Failed to init clk, err %d\n", ret);
> + goto err_clk_init;
> + }
> +
> + ret = v4l2_device_register(&pdev->dev, &jpeg->v4l2_dev);
> + if (ret) {
> + dev_err(&pdev->dev, "Failed to register v4l2 device\n");
> + ret = -EINVAL;
> + goto err_dev_register;
> + }
> +
> + jpeg->m2m_dev = v4l2_m2m_init(&mtk_jpeg_m2m_ops);
> + if (IS_ERR(jpeg->m2m_dev)) {
> + v4l2_err(&jpeg->v4l2_dev, "Failed to init mem2mem device\n");
> + ret = PTR_ERR(jpeg->m2m_dev);
> + goto err_m2m_init;
> + }
> +
> + jpeg->dec_vdev = video_device_alloc();
> + if (!jpeg->dec_vdev) {
> + ret = -ENOMEM;
> + goto err_dec_vdev_alloc;
> + }
> + snprintf(jpeg->dec_vdev->name, sizeof(jpeg->dec_vdev->name),
> + "%s-dec", MTK_JPEG_NAME);
> + jpeg->dec_vdev->fops = &mtk_jpeg_fops;
> + jpeg->dec_vdev->ioctl_ops = &mtk_jpeg_ioctl_ops;
> + jpeg->dec_vdev->minor = -1;
> + jpeg->dec_vdev->release = video_device_release;
> + jpeg->dec_vdev->lock = &jpeg->lock;
> + jpeg->dec_vdev->v4l2_dev = &jpeg->v4l2_dev;
> + jpeg->dec_vdev->vfl_dir = VFL_DIR_M2M;
> + jpeg->dec_vdev->device_caps = V4L2_CAP_STREAMING |
> + V4L2_CAP_VIDEO_M2M_MPLANE;
> +
> + ret = video_register_device(jpeg->dec_vdev, VFL_TYPE_GRABBER, 3);
> + if (ret) {
> + v4l2_err(&jpeg->v4l2_dev, "Failed to register video device\n");
> + goto err_dec_vdev_register;
> + }
> +
> + video_set_drvdata(jpeg->dec_vdev, jpeg);
> + v4l2_info(&jpeg->v4l2_dev,
> + "decoder device registered as /dev/video%d (%d,%d)\n",
> + jpeg->dec_vdev->num, VIDEO_MAJOR, jpeg->dec_vdev->minor);
> +
> + platform_set_drvdata(pdev, jpeg);
> +
> + pm_runtime_enable(&pdev->dev);
> +
> + return 0;
> +
> +err_dec_vdev_register:
> + video_device_release(jpeg->dec_vdev);
> +
> +err_dec_vdev_alloc:
> + v4l2_m2m_release(jpeg->m2m_dev);
> +
> +err_m2m_init:
> + v4l2_device_unregister(&jpeg->v4l2_dev);
> +
> +err_dev_register:
> +
> +err_clk_init:
> +
> +err_req_irq:
> +
> + return ret;
> +}
> +
> +static int mtk_jpeg_remove(struct platform_device *pdev)
> +{
> + struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
> +
> + pm_runtime_disable(&pdev->dev);
> + video_unregister_device(jpeg->dec_vdev);
> + video_device_release(jpeg->dec_vdev);
> + v4l2_m2m_release(jpeg->m2m_dev);
> + v4l2_device_unregister(&jpeg->v4l2_dev);
> +
> + return 0;
> +}
> +
> +#ifdef CONFIG_PM
> +static int mtk_jpeg_pm_suspend(struct device *dev)
> +{
> + struct mtk_jpeg_dev *jpeg = dev_get_drvdata(dev);
> +
> + mtk_jpeg_dec_reset(jpeg->dec_reg_base);
> + mtk_jpeg_clk_off(jpeg);
> +
> + return 0;
> +}
> +
> +static int mtk_jpeg_pm_resume(struct device *dev)
> +{
> + struct mtk_jpeg_dev *jpeg = dev_get_drvdata(dev);
> +
> + mtk_jpeg_clk_on(jpeg);
> + mtk_jpeg_dec_reset(jpeg->dec_reg_base);
> +
> + return 0;
> +}
> +#endif /* CONFIG_PM */
> +
> +#ifdef CONFIG_PM_SLEEP
> +static int mtk_jpeg_suspend(struct device *dev)
> +{
> + int ret;
> +
> + if (pm_runtime_suspended(dev))
> + return 0;
> +
> + ret = mtk_jpeg_pm_suspend(dev);
> + return ret;
> +}
> +
> +static int mtk_jpeg_resume(struct device *dev)
> +{
> + int ret;
> +
> + if (pm_runtime_suspended(dev))
> + return 0;
> +
> + ret = mtk_jpeg_pm_resume(dev);
> +
> + return ret;
> +}
> +#endif /* CONFIG_PM_SLEEP */
> +
> +static const struct dev_pm_ops mtk_jpeg_pm_ops = {
> + SET_SYSTEM_SLEEP_PM_OPS(mtk_jpeg_suspend, mtk_jpeg_resume)
> + SET_RUNTIME_PM_OPS(mtk_jpeg_pm_suspend, mtk_jpeg_pm_resume, NULL)
> +};
> +
> +static const struct of_device_id mtk_jpeg_match[] = {
> + {
> + .compatible = "mediatek,mt8173-jpgdec",
> + .data = NULL,
> + },
> + {
> + .compatible = "mediatek,mt2701-jpgdec",
> + .data = NULL,
> + },
> + {},
> +};
> +
> +MODULE_DEVICE_TABLE(of, mtk_jpeg_match);
> +
> +static struct platform_driver mtk_jpeg_driver = {
> + .probe = mtk_jpeg_probe,
> + .remove = mtk_jpeg_remove,
> + .driver = {
> + .owner = THIS_MODULE,
> + .name = MTK_JPEG_NAME,
> + .of_match_table = mtk_jpeg_match,
> + .pm = &mtk_jpeg_pm_ops,
> + },
> +};
> +
> +module_platform_driver(mtk_jpeg_driver);
> +
> +MODULE_DESCRIPTION("MediaTek JPEG codec driver");
> +MODULE_LICENSE("GPL v2");
> diff --git a/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h
> new file mode 100644
> index 0000000..d862e3b
> --- /dev/null
> +++ b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h
> @@ -0,0 +1,141 @@
> +/*
> + * Copyright (c) 2016 MediaTek Inc.
> + * Author: Ming Hsiu Tsai <minghsiu.tsai@mediatek.com>
> + * Rick Chang <rick.chang@mediatek.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#ifndef _MTK_JPEG_CORE_H
> +#define _MTK_JPEG_CORE_H
> +
> +#include <linux/interrupt.h>
> +#include <media/v4l2-ctrls.h>
> +#include <media/v4l2-device.h>
> +#include <media/v4l2-fh.h>
> +
> +#define MTK_JPEG_NAME "mtk-jpeg"
> +
> +#define MTK_JPEG_FMT_FLAG_DEC_OUTPUT BIT(0)
> +#define MTK_JPEG_FMT_FLAG_DEC_CAPTURE BIT(1)
> +
> +#define MTK_JPEG_FMT_TYPE_OUTPUT 1
> +#define MTK_JPEG_FMT_TYPE_CAPTURE 2
> +
> +#define MTK_JPEG_MIN_WIDTH 32
> +#define MTK_JPEG_MIN_HEIGHT 32
> +#define MTK_JPEG_MAX_WIDTH 8192
> +#define MTK_JPEG_MAX_HEIGHT 8192
> +
> +#define MTK_JPEG_DEFAULT_SIZEIMAGE (1 * 1024 * 1024)
> +
> +enum mtk_jpeg_ctx_state {
> + MTK_JPEG_INIT = 0,
> + MTK_JPEG_RUNNING,
> + MTK_JPEG_SOURCE_CHANGE,
> +};
> +
> +/**
> + * struct mtk_jpeg_dev - JPEG IP abstraction
> + * @lock: the mutex protecting this structure
> + * @hw_lock: spinlock protecting the hw device resource
> + * @workqueue: decode work queue
> + * @dev: JPEG device
> + * @v4l2_dev: v4l2 device for mem2mem mode
> + * @m2m_dev: v4l2 mem2mem device data
> + * @alloc_ctx: videobuf2 memory allocator's context
> + * @dec_vdev: video device node for decoder mem2mem mode
> + * @dec_reg_base: JPEG registers mapping
> + * @clk_jdec: JPEG hw working clock
> + * @clk_jdec_smi: JPEG SMI bus clock
> + * @larb: SMI device
> + */
> +struct mtk_jpeg_dev {
> + struct mutex lock;
> + spinlock_t hw_lock;
> + struct workqueue_struct *workqueue;
> + struct device *dev;
> + struct v4l2_device v4l2_dev;
> + struct v4l2_m2m_dev *m2m_dev;
> + void *alloc_ctx;
> + struct video_device *dec_vdev;
> + void __iomem *dec_reg_base;
> + struct clk *clk_jdec;
> + struct clk *clk_jdec_smi;
> + struct device *larb;
> +};
> +
> +/**
> + * struct mtk_jpeg_fmt - driver's internal color format data
> + * @name: format description
> + * @fourcc: the fourcc code, 0 if not applicable
> + * @h_sample: horizontal sample count of plane in 4 * 4 pixel image
> + * @v_sample: vertical sample count of plane in 4 * 4 pixel image
> + * @colplanes: number of color planes (1 for packed formats)
> + * @h_align: horizontal alignment order (align to 2^h_align)
> + * @v_align: vertical alignment order (align to 2^v_align)
> + * @flags: flags describing format applicability
> + */
> +struct mtk_jpeg_fmt {
> + char *name;
> + u32 fourcc;
> + int h_sample[VIDEO_MAX_PLANES];
> + int v_sample[VIDEO_MAX_PLANES];
> + int colplanes;
> + int h_align;
> + int v_align;
> + u32 flags;
> +};
> +
> +/**
> + * mtk_jpeg_q_data - parameters of one queue
> + * @fmt: driver-specific format of this queue
> + * @w: image width
> + * @h: image height
> + * @bytesperline: distance in bytes between the leftmost pixels in two adjacent
> + * lines
> + * @sizeimage: image buffer size in bytes
> + */
> +struct mtk_jpeg_q_data {
> + struct mtk_jpeg_fmt *fmt;
> + u32 w;
> + u32 h;
> + u32 bytesperline[VIDEO_MAX_PLANES];
> + u32 sizeimage[VIDEO_MAX_PLANES];
> +};
> +
> +/**
> + * mtk_jpeg_ctx - the device context data
> + * @jpeg: JPEG IP device for this context
> + * @out_q: source (output) queue information
> + * @cap_q: destination (capture) queue information
> + * @fh: V4L2 file handle
> + * @dec_param parameters for HW decoding
> + * @state: state of the context
> + * @header_valid: set if header has been parsed and valid
> + * @colorspace: enum v4l2_colorspace; supplemental to pixelformat
> + * @ycbcr_enc: enum v4l2_ycbcr_encoding, Y'CbCr encoding
> + * @quantization: enum v4l2_quantization, colorspace quantization
> + * @xfer_func: enum v4l2_xfer_func, colorspace transfer function
> + */
> +struct mtk_jpeg_ctx {
> + struct mtk_jpeg_dev *jpeg;
> + struct mtk_jpeg_q_data out_q;
> + struct mtk_jpeg_q_data cap_q;
> + struct v4l2_fh fh;
> + enum mtk_jpeg_ctx_state state;
> +
> + enum v4l2_colorspace colorspace;
> + enum v4l2_ycbcr_encoding ycbcr_enc;
> + enum v4l2_quantization quantization;
> + enum v4l2_xfer_func xfer_func;
> +};
> +
> +#endif /* _MTK_JPEG_CORE_H */
> diff --git a/drivers/media/platform/mtk-jpeg/mtk_jpeg_hw.c b/drivers/media/platform/mtk-jpeg/mtk_jpeg_hw.c
> new file mode 100644
> index 0000000..a6315f3
> --- /dev/null
> +++ b/drivers/media/platform/mtk-jpeg/mtk_jpeg_hw.c
> @@ -0,0 +1,417 @@
> +/*
> + * Copyright (c) 2016 MediaTek Inc.
> + * Author: Ming Hsiu Tsai <minghsiu.tsai@mediatek.com>
> + * Rick Chang <rick.chang@mediatek.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/io.h>
> +#include <linux/kernel.h>
> +#include <media/videobuf2-core.h>
> +
> +#include "mtk_jpeg_hw.h"
> +
> +#define MTK_JPEG_DUNUM_MASK(val) (((val) - 1) & 0x3)
> +
> +enum mtk_jpeg_color {
> + MTK_JPEG_COLOR_420 = 0x00221111,
> + MTK_JPEG_COLOR_422 = 0x00211111,
> + MTK_JPEG_COLOR_444 = 0x00111111,
> + MTK_JPEG_COLOR_422V = 0x00121111,
> + MTK_JPEG_COLOR_422X2 = 0x00412121,
> + MTK_JPEG_COLOR_422VX2 = 0x00222121,
> + MTK_JPEG_COLOR_400 = 0x00110000
> +};
> +
> +static inline int mtk_jpeg_verify_align(u32 val, int align, u32 reg)
> +{
> + if (val & (align - 1)) {
> + pr_err("mtk-jpeg: write reg %x without %d align\n", reg, align);
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static int mtk_jpeg_decide_format(struct mtk_jpeg_dec_param *param)
> +{
> + param->src_color = (param->sampling_w[0] << 20) |
> + (param->sampling_h[0] << 16) |
> + (param->sampling_w[1] << 12) |
> + (param->sampling_h[1] << 8) |
> + (param->sampling_w[2] << 4) |
> + (param->sampling_h[2]);
> +
> + param->uv_brz_w = 0;
> + switch (param->src_color) {
> + case MTK_JPEG_COLOR_444:
> + param->uv_brz_w = 1;
> + param->dst_fourcc = V4L2_PIX_FMT_YUV422M;
> + break;
> + case MTK_JPEG_COLOR_422X2:
> + case MTK_JPEG_COLOR_422:
> + param->dst_fourcc = V4L2_PIX_FMT_YUV422M;
> + break;
> + case MTK_JPEG_COLOR_422V:
> + case MTK_JPEG_COLOR_422VX2:
> + param->uv_brz_w = 1;
> + param->dst_fourcc = V4L2_PIX_FMT_YUV420M;
> + break;
> + case MTK_JPEG_COLOR_420:
> + param->dst_fourcc = V4L2_PIX_FMT_YUV420M;
> + break;
> + case MTK_JPEG_COLOR_400:
> + param->dst_fourcc = V4L2_PIX_FMT_GREY;
> + break;
> + default:
> + param->dst_fourcc = 0;
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static void mtk_jpeg_calc_mcu(struct mtk_jpeg_dec_param *param)
> +{
> + u32 factor_w, factor_h;
> + u32 i, comp, blk;
> +
> + factor_w = 2 + param->sampling_w[0];
> + factor_h = 2 + param->sampling_h[0];
> + param->mcu_w = (param->pic_w + (1 << factor_w) - 1) >> factor_w;
> + param->mcu_h = (param->pic_h + (1 << factor_h) - 1) >> factor_h;
> + param->total_mcu = param->mcu_w * param->mcu_h;
> + param->unit_num = ((param->pic_w + 7) >> 3) * ((param->pic_h + 7) >> 3);
> + param->blk_num = 0;
> + for (i = 0; i < MTK_JPEG_COMP_MAX; i++) {
> + param->blk_comp[i] = 0;
> + if (i >= param->comp_num)
> + continue;
> + param->blk_comp[i] = param->sampling_w[i] *
> + param->sampling_h[i];
> + param->blk_num += param->blk_comp[i];
> + }
> +
> + param->membership = 0;
> + for (i = 0, blk = 0, comp = 0; i < MTK_JPEG_BLOCK_MAX; i++) {
> + if (i < param->blk_num && comp < param->comp_num) {
> + u32 tmp;
> +
> + tmp = (0x04 + (comp & 0x3));
> + param->membership |= tmp << (i * 3);
> + if (++blk == param->blk_comp[comp]) {
> + comp++;
> + blk = 0;
> + }
> + } else {
> + param->membership |= 7 << (i * 3);
> + }
> + }
> +}
> +
> +static void mtk_jpeg_calc_dma_group(struct mtk_jpeg_dec_param *param)
> +{
> + u32 factor_mcu = 3;
> +
> + if (param->src_color == MTK_JPEG_COLOR_444 &&
> + param->dst_fourcc == V4L2_PIX_FMT_YUV422M)
> + factor_mcu = 4;
> + else if (param->src_color == MTK_JPEG_COLOR_422V &&
> + param->dst_fourcc == V4L2_PIX_FMT_YUV420M)
> + factor_mcu = 4;
> + else if (param->src_color == MTK_JPEG_COLOR_422X2 &&
> + param->dst_fourcc == V4L2_PIX_FMT_YUV422M)
> + factor_mcu = 2;
> + else if (param->src_color == MTK_JPEG_COLOR_400 ||
> + (param->src_color & 0x0FFFF) == 0)
> + factor_mcu = 4;
> +
> + param->dma_mcu = 1 << factor_mcu;
> + param->dma_group = param->mcu_w / param->dma_mcu;
> + param->dma_last_mcu = param->mcu_w % param->dma_mcu;
> + if (param->dma_last_mcu)
> + param->dma_group++;
> + else
> + param->dma_last_mcu = param->dma_mcu;
> +}
> +
> +static int mtk_jpeg_calc_dst_size(struct mtk_jpeg_dec_param *param)
> +{
> + u32 i, padding_w;
> + u32 ds_row_h[3];
> + u32 brz_w[3];
> +
> + brz_w[0] = 0;
> + brz_w[1] = param->uv_brz_w;
> + brz_w[2] = brz_w[1];
> +
> + for (i = 0; i < param->comp_num; i++) {
> + if (brz_w[i] > 3)
> + return -1;
> +
> + padding_w = param->mcu_w * MTK_JPEG_DCTSIZE *
> + param->sampling_w[i];
> + /* output format is 420/422 */
> + param->comp_w[i] = padding_w >> brz_w[i];
> + param->comp_w[i] = mtk_jpeg_align(param->comp_w[i],
> + MTK_JPEG_DCTSIZE);
> + param->img_stride[i] = i ? mtk_jpeg_align(param->comp_w[i], 16)
> + : mtk_jpeg_align(param->comp_w[i], 32);
> + ds_row_h[i] = (MTK_JPEG_DCTSIZE * param->sampling_h[i]);
> + }
> + param->dec_w = param->img_stride[0];
> + param->dec_h = ds_row_h[0] * param->mcu_h;
> +
> + for (i = 0; i < MTK_JPEG_COMP_MAX; i++) {
> + /* They must be equal in frame mode. */
> + param->mem_stride[i] = param->img_stride[i];
> + param->comp_size[i] = param->mem_stride[i] * ds_row_h[i] *
> + param->mcu_h;
> + }
> +
> + param->y_size = param->comp_size[0];
> + param->uv_size = param->comp_size[1];
> + param->dec_size = param->y_size + (param->uv_size << 1);
> +
> + return 0;
> +}
> +
> +int mtk_jpeg_dec_fill_param(struct mtk_jpeg_dec_param *param)
> +{
> + if (mtk_jpeg_decide_format(param))
> + return -1;
> +
> + mtk_jpeg_calc_mcu(param);
> + mtk_jpeg_calc_dma_group(param);
> + if (mtk_jpeg_calc_dst_size(param))
> + return -2;
> +
> + return 0;
> +}
> +
> +u32 mtk_jpeg_dec_get_int_status(void __iomem *base)
> +{
> + u32 ret;
> +
> + ret = readl(base + JPGDEC_REG_INTERRUPT_STATUS) & BIT_INQST_MASK_ALLIRQ;
> + if (ret)
> + writel(ret, base + JPGDEC_REG_INTERRUPT_STATUS);
> +
> + return ret;
> +}
> +
> +u32 mtk_jpeg_dec_enum_result(u32 irq_result)
> +{
> + if (irq_result & BIT_INQST_MASK_EOF)
> + return MTK_JPEG_DEC_RESULT_EOF_DONE;
> + else if (irq_result & BIT_INQST_MASK_PAUSE)
> + return MTK_JPEG_DEC_RESULT_PAUSE;
> + else if (irq_result & BIT_INQST_MASK_UNDERFLOW)
> + return MTK_JPEG_DEC_RESULT_UNDERFLOW;
> + else if (irq_result & BIT_INQST_MASK_OVERFLOW)
> + return MTK_JPEG_DEC_RESULT_OVERFLOW;
> + else if (irq_result & BIT_INQST_MASK_ERROR_BS)
> + return MTK_JPEG_DEC_RESULT_ERROR_BS;
No need for 'else' here since the previous 'if' always returns if true.
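A minimal sketch of the same chain with the redundant 'else' keywords dropped. The EX_ names and bit values are placeholders made up for this example, not the driver's BIT_INQST_MASK_* or MTK_JPEG_DEC_RESULT_* definitions:

```c
#include <stdint.h>

/* Placeholder status bits; the real masks live in the driver's header. */
#define EX_IRQ_EOF        (1u << 0)
#define EX_IRQ_PAUSE      (1u << 1)
#define EX_IRQ_UNDERFLOW  (1u << 2)
#define EX_IRQ_OVERFLOW   (1u << 3)
#define EX_IRQ_ERROR_BS   (1u << 4)

enum ex_dec_result {
	EX_RESULT_EOF_DONE,
	EX_RESULT_PAUSE,
	EX_RESULT_UNDERFLOW,
	EX_RESULT_OVERFLOW,
	EX_RESULT_ERROR_BS,
	EX_RESULT_ERROR_UNKNOWN,
};

/*
 * Each branch returns, so the next test is only reached when the
 * previous one was false. Plain 'if' gives the same priority order.
 */
static enum ex_dec_result ex_dec_enum_result(uint32_t irq_result)
{
	if (irq_result & EX_IRQ_EOF)
		return EX_RESULT_EOF_DONE;
	if (irq_result & EX_IRQ_PAUSE)
		return EX_RESULT_PAUSE;
	if (irq_result & EX_IRQ_UNDERFLOW)
		return EX_RESULT_UNDERFLOW;
	if (irq_result & EX_IRQ_OVERFLOW)
		return EX_RESULT_OVERFLOW;
	if (irq_result & EX_IRQ_ERROR_BS)
		return EX_RESULT_ERROR_BS;

	return EX_RESULT_ERROR_UNKNOWN;
}
```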
> +
> + return MTK_JPEG_DEC_RESULT_ERROR_UNKNOWN;
> +}
> +
> +void mtk_jpeg_dec_start(void __iomem *base)
> +{
> + writel(0, base + JPGDEC_REG_TRIG);
> +}
> +
> +static void mtk_jpeg_dec_soft_reset(void __iomem *base)
> +{
> + writel(0x0000FFFF, base + JPGDEC_REG_INTERRUPT_STATUS);
> + writel(0x00, base + JPGDEC_REG_RESET);
> + writel(0x01, base + JPGDEC_REG_RESET);
> +}
> +
> +static void mtk_jpeg_dec_hard_reset(void __iomem *base)
> +{
> + writel(0x00, base + JPGDEC_REG_RESET);
> + writel(0x10, base + JPGDEC_REG_RESET);
> +}
> +
> +void mtk_jpeg_dec_reset(void __iomem *base)
> +{
> + mtk_jpeg_dec_soft_reset(base);
> + mtk_jpeg_dec_hard_reset(base);
> +}
> +
> +static void mtk_jpeg_dec_set_brz_factor(void __iomem *base, u8 yscale_w,
> + u8 yscale_h, u8 uvscale_w, u8 uvscale_h)
> +{
> + u32 val;
> +
> + val = (uvscale_h << 12) | (uvscale_w << 8) |
> + (yscale_h << 4) | yscale_w;
> + writel(val, base + JPGDEC_REG_BRZ_FACTOR);
> +}
> +
> +static void mtk_jpeg_dec_set_dst_bank0(void __iomem *base, u32 addr_y,
> + u32 addr_u, u32 addr_v)
> +{
> + mtk_jpeg_verify_align(addr_y, 16, JPGDEC_REG_DEST_ADDR0_Y);
> + writel(addr_y, base + JPGDEC_REG_DEST_ADDR0_Y);
> + mtk_jpeg_verify_align(addr_u, 16, JPGDEC_REG_DEST_ADDR0_U);
> + writel(addr_u, base + JPGDEC_REG_DEST_ADDR0_U);
> + mtk_jpeg_verify_align(addr_v, 16, JPGDEC_REG_DEST_ADDR0_V);
> + writel(addr_v, base + JPGDEC_REG_DEST_ADDR0_V);
> +}
> +
> +static void mtk_jpeg_dec_set_dst_bank1(void __iomem *base, u32 addr_y,
> + u32 addr_u, u32 addr_v)
> +{
> + writel(addr_y, base + JPGDEC_REG_DEST_ADDR1_Y);
> + writel(addr_u, base + JPGDEC_REG_DEST_ADDR1_U);
> + writel(addr_v, base + JPGDEC_REG_DEST_ADDR1_V);
> +}
> +
> +static void mtk_jpeg_dec_set_mem_stride(void __iomem *base, u32 stride_y,
> + u32 stride_uv)
> +{
> + writel((stride_y & 0xFFFF), base + JPGDEC_REG_STRIDE_Y);
> + writel((stride_uv & 0xFFFF), base + JPGDEC_REG_STRIDE_UV);
> +}
> +
> +static void mtk_jpeg_dec_set_img_stride(void __iomem *base, u32 stride_y,
> + u32 stride_uv)
> +{
> + writel((stride_y & 0xFFFF), base + JPGDEC_REG_IMG_STRIDE_Y);
> + writel((stride_uv & 0xFFFF), base + JPGDEC_REG_IMG_STRIDE_UV);
> +}
> +
> +static void mtk_jpeg_dec_set_pause_mcu_idx(void __iomem *base, u32 idx)
> +{
> + writel(idx & 0x0003FFFFFF, base + JPGDEC_REG_PAUSE_MCU_NUM);
> +}
> +
> +static void mtk_jpeg_dec_set_dec_mode(void __iomem *base, u32 mode)
> +{
> + writel(mode & 0x03, base + JPGDEC_REG_OPERATION_MODE);
> +}
> +
> +static void mtk_jpeg_dec_set_bs_write_ptr(void __iomem *base, u32 ptr)
> +{
> + mtk_jpeg_verify_align(ptr, 16, JPGDEC_REG_FILE_BRP);
> + writel(ptr, base + JPGDEC_REG_FILE_BRP);
> +}
> +
> +static void mtk_jpeg_dec_set_bs_info(void __iomem *base, u32 addr, u32 size)
> +{
> + mtk_jpeg_verify_align(addr, 16, JPGDEC_REG_FILE_ADDR);
> + mtk_jpeg_verify_align(size, 128, JPGDEC_REG_FILE_TOTAL_SIZE);
> + writel(addr, base + JPGDEC_REG_FILE_ADDR);
> + writel(size, base + JPGDEC_REG_FILE_TOTAL_SIZE);
> +}
> +
> +static void mtk_jpeg_dec_set_comp_id(void __iomem *base, u32 id_y, u32 id_u,
> + u32 id_v)
> +{
> + u32 val;
> +
> + val = ((id_y & 0x00FF) << 24) | ((id_u & 0x00FF) << 16) |
> + ((id_v & 0x00FF) << 8);
> + writel(val, base + JPGDEC_REG_COMP_ID);
> +}
> +
> +static void mtk_jpeg_dec_set_total_mcu(void __iomem *base, u32 num)
> +{
> + writel(num - 1, base + JPGDEC_REG_TOTAL_MCU_NUM);
> +}
> +
> +static void mtk_jpeg_dec_set_comp0_du(void __iomem *base, u32 num)
> +{
> + writel(num - 1, base + JPGDEC_REG_COMP0_DATA_UNIT_NUM);
> +}
> +
> +static void mtk_jpeg_dec_set_du_membership(void __iomem *base, u32 member,
> + u32 gmc, u32 isgray)
> +{
> + if (isgray)
> + member = 0x3FFFFFFC;
> + member |= (isgray << 31) | (gmc << 30);
> + writel(member, base + JPGDEC_REG_DU_CTRL);
> +}
> +
> +static void mtk_jpeg_dec_set_q_table(void __iomem *base, u32 id0, u32 id1,
> + u32 id2)
> +{
> + u32 val;
> +
> + val = ((id0 & 0x0f) << 8) | ((id1 & 0x0f) << 4) | ((id2 & 0x0f) << 0);
> + writel(val, base + JPGDEC_REG_QT_ID);
> +}
> +
> +static void mtk_jpeg_dec_set_dma_group(void __iomem *base, u32 mcu_group,
> + u32 group_num, u32 last_mcu)
> +{
> + u32 val;
> +
> + val = (((mcu_group - 1) & 0x00FF) << 16) |
> + (((group_num - 1) & 0x007F) << 8) |
> + ((last_mcu - 1) & 0x00FF);
> + writel(val, base + JPGDEC_REG_WDMA_CTRL);
> +}
> +
> +static void mtk_jpeg_dec_set_sampling_factor(void __iomem *base, u32 comp_num,
> + u32 y_w, u32 y_h, u32 u_w,
> + u32 u_h, u32 v_w, u32 v_h)
> +{
> + u32 val;
> + u32 y_wh = (MTK_JPEG_DUNUM_MASK(y_w) << 2) | MTK_JPEG_DUNUM_MASK(y_h);
> + u32 u_wh = (MTK_JPEG_DUNUM_MASK(u_w) << 2) | MTK_JPEG_DUNUM_MASK(u_h);
> + u32 v_wh = (MTK_JPEG_DUNUM_MASK(v_w) << 2) | MTK_JPEG_DUNUM_MASK(v_h);
> +
> + if (comp_num == 1)
> + val = 0;
> + else
> + val = (y_wh << 8) | (u_wh << 4) | v_wh;
> + writel(val, base + JPGDEC_REG_DU_NUM);
> +}
> +
> +void mtk_jpeg_dec_set_config(void __iomem *base,
> + struct mtk_jpeg_dec_param *config,
> + struct mtk_jpeg_bs *bs,
> + struct mtk_jpeg_fb *fb)
> +{
> + mtk_jpeg_dec_set_brz_factor(base, 0, 0, config->uv_brz_w, 0);
> + mtk_jpeg_dec_set_dec_mode(base, 0);
> + mtk_jpeg_dec_set_comp0_du(base, config->unit_num);
> + mtk_jpeg_dec_set_total_mcu(base, config->total_mcu);
> + mtk_jpeg_dec_set_bs_info(base, bs->str_addr, bs->size);
> + mtk_jpeg_dec_set_bs_write_ptr(base, bs->end_addr);
> + mtk_jpeg_dec_set_du_membership(base, config->membership, 1,
> + (config->comp_num == 1) ? 1 : 0);
> + mtk_jpeg_dec_set_comp_id(base, config->comp_id[0], config->comp_id[1],
> + config->comp_id[2]);
> + mtk_jpeg_dec_set_q_table(base, config->qtbl_num[0],
> + config->qtbl_num[1], config->qtbl_num[2]);
> + mtk_jpeg_dec_set_sampling_factor(base, config->comp_num,
> + config->sampling_w[0],
> + config->sampling_h[0],
> + config->sampling_w[1],
> + config->sampling_h[1],
> + config->sampling_w[2],
> + config->sampling_h[2]);
> + mtk_jpeg_dec_set_mem_stride(base, config->mem_stride[0],
> + config->mem_stride[1]);
> + mtk_jpeg_dec_set_img_stride(base, config->img_stride[0],
> + config->img_stride[1]);
> + mtk_jpeg_dec_set_dst_bank0(base, fb->plane_addr[0],
> + fb->plane_addr[1], fb->plane_addr[2]);
> + mtk_jpeg_dec_set_dst_bank1(base, 0, 0, 0);
> + mtk_jpeg_dec_set_dma_group(base, config->dma_mcu, config->dma_group,
> + config->dma_last_mcu);
> + mtk_jpeg_dec_set_pause_mcu_idx(base, config->total_mcu);
> +}
Regards,
Hans
* [PATCH RFC 00/12] tda998x updates
From: Jyri Sarha @ 2016-11-11 15:10 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161108122420.GP1041@n2100.armlinux.org.uk>
On 11/08/16 14:24, Russell King - ARM Linux wrote:
> As no one responded to the previous round, I'm not spending so much
> time writing up a description of these changes again. It's also been
> quite a long time, so I've forgotten all the details of the changes,
> so I'll do my best.
>
> Changes from the previous series include:
> - reorder the initial three patches
> - change the (now third patch)... I think to increase the size of the
> locked region.
> - fix edid parsing for infoframe generation - as was pointed out for
> dw-hdmi, parsing the EDID in get_modes() is incorrect, as that method
> will not be called when an override-edid is in effect. We need to
> parse the override-edid. Moreover, infoframe generation should not
> be keyed to whether the monitor is HDMI or not, CEA-861B allows non-
> HDMI to send infoframes.
> - only send audio if audio and infoframes are supported.
>
> Otherwise, these are very much like the previous posting of the series,
> except rebased upon the mali/hdlcd/tda998x change to remove the
> drm_connector_register() call.
>
> https://www.spinics.net/lists/dri-devel/msg121495.html
>
> It'd be nice to have other tda998x users ack and test these patches,
> I've tried to test on Juno, but the Juno situation seems to be a huge
> fail. (HBI0282B completely fails with latest firmware - (a) FPGA image
> incompatibilities io_b118 causes all FPGA AMBA devices to vanish, (b)
> seems no way to get SCPI support on it - adding the BL0 executable
> start address in the SCC registers seems to be incompatible with the
> devchip, causing the PLLs to fail. In discussion with Sudeep over
> these issues, but no idea where things are with it at the moment, other
> than Sudeep needs to investigate. All Linaro firmware releases are
> broken on HBI0282B.)
>
> drivers/gpu/drm/i2c/tda998x_drv.c | 826 ++++++++++++++++++++------------------
> 1 file changed, 429 insertions(+), 397 deletions(-)
>
Reviewed-by: Jyri Sarha <jsarha@ti.com>
For the whole series. I am also happy to test these patches if I can
fetch them from some git repo.
Best regards,
Jyri
* [RFC v2 8/8] iommu/arm-smmu: implement add_reserved_regions callback
From: Joerg Roedel @ 2016-11-11 15:03 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <479aeac0-71f9-a6b8-af6d-e2c25184a818@arm.com>
On Fri, Nov 11, 2016 at 02:34:39PM +0000, Robin Murphy wrote:
> On 10/11/16 16:16, Joerg Roedel wrote:
> > On Thu, Nov 10, 2016 at 04:07:08PM +0000, Robin Murphy wrote:
> >> On 10/11/16 15:46, Joerg Roedel wrote:
> >>> On Fri, Nov 04, 2016 at 11:24:06AM +0000, Eric Auger wrote:
> >>>> + resource_list_for_each_entry(window, &bridge->windows) {
> >>>> + if (resource_type(window->res) != IORESOURCE_MEM &&
> >>>> + resource_type(window->res) != IORESOURCE_IO)
> >>>> + continue;
> >>>
> >>> Why do you care about IO resources?
> >>
> >> [since this is essentially code I wrote]
> >>
> >> Because they occupy some area of the PCI address space, therefore I
> >> assumed that, like memory windows, they would be treated as P2P. Is that
> >> not the case?
> >
> > No, not at all. The IO-space is completely separate from the MEM-space.
> > They are two different address-spaces, addressing different things. And
> > the IO-space is also not translated by any IOMMU I am aware of.
>
> OK. On the particular root complex I have to hand, though, any DMA to
> IOVAs between 0x5f800000 and 0x5fffffff sends an error back to the
> endpoint, and that just so happens to be where the I/O window is placed
> (both on the PCI side and the AXI (i.e. CPU MMIO) side). Whether it's
> that the external MMIO view of the RC's I/O window is explicitly
> duplicated in its PCI memory space as some side-effect of the PCI/AXI
> bridge, or that the thing just doesn't actually respect the access type
> on the PCI side I don't know, but that's how it is (and I spent this
> morning recreating it to make sure I wasn't mistaken).
What you see is that on your platform the io-ports are accessed through an
mmio-window. On x86 you have dedicated instructions to access io-ports.
And the io-port ranges are what the io-resources describe. These
resources do not tell you where the mmio-region for that device's io-ports
is.
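To illustrate the distinction, here is a minimal sketch, not the arm-smmu code: it walks a list of host bridge windows and counts only the memory windows, skipping io-port resources, since those describe port numbers rather than addresses in the memory space. The ex_ names and EX_RES_* flags are stand-ins invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in resource type flags, modelled on the kernel's IORESOURCE_*. */
#define EX_RES_TYPE_BITS  0x1f00u
#define EX_RES_IO         0x0100u
#define EX_RES_MEM        0x0200u

/* Hypothetical, simplified view of one host bridge window. */
struct ex_window {
	uint64_t start;
	uint64_t end;
	unsigned int flags;
};

/*
 * Count the windows that occupy memory address space. An io-port
 * window is skipped: its start/end are port numbers and say nothing
 * about where (or whether) an mmio-region backs those ports.
 */
static size_t ex_count_mem_windows(const struct ex_window *win, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if ((win[i].flags & EX_RES_TYPE_BITS) != EX_RES_MEM)
			continue;
		count++;
	}

	return count;
}
```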
Joerg
* [PATCH V5 3/3] ARM64 LPC: LPC driver implementation on Hip06
From: liviu.dudau at arm.com @ 2016-11-11 14:45 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <EE11001F9E5DDD47B7634E2F8A612F2E1F8F900A@lhreml507-mbx>
On Fri, Nov 11, 2016 at 01:39:35PM +0000, Gabriele Paoloni wrote:
> Hi Arnd
>
> > -----Original Message-----
> > From: Arnd Bergmann [mailto:arnd at arndb.de]
> > Sent: 10 November 2016 16:07
> > To: Gabriele Paoloni
> > Cc: linux-arm-kernel at lists.infradead.org; Yuanzhichang;
> > mark.rutland at arm.com; devicetree at vger.kernel.org;
> > lorenzo.pieralisi at arm.com; minyard at acm.org; linux-pci at vger.kernel.org;
> > benh at kernel.crashing.org; John Garry; will.deacon at arm.com; linux-
> > kernel at vger.kernel.org; xuwei (O); Linuxarm; zourongrong at gmail.com;
> > robh+dt at kernel.org; kantyzc at 163.com; linux-serial at vger.kernel.org;
> > catalin.marinas at arm.com; olof at lixom.net; liviu.dudau at arm.com;
> > bhelgaas at google.com; zhichang.yuan02 at gmail.com
> > Subject: Re: [PATCH V5 3/3] ARM64 LPC: LPC driver implementation on
> > Hip06
> >
> > On Thursday, November 10, 2016 3:36:49 PM CET Gabriele Paoloni wrote:
> > >
> > > Where should we get the range from? For LPC we know that it is going
> > > to work on anything that is not used by PCI I/O space, and this is
> > > why we use [0, PCIBIOS_MIN_IO]
> >
> > It should be allocated the same way we allocate PCI config space
> > segments. This is currently done with the io_range list in
> > drivers/pci/pci.c, which isn't perfect but could be extended
> > if necessary. Based on what others commented here, I'd rather
> > make the differences between ISA/LPC and PCI I/O ranges smaller
> > than larger.
Gabriele,
>
> I am not sure this would make sense...
>
> IMHO all the mechanism around io_range_list is needed to provide the
> "mapping" between I/O tokens and physical CPU addresses.
>
> Currently the available tokens range from 0 to IO_SPACE_LIMIT.
>
> As you know the I/O memory accessors operate on whatever
> __of_address_to_resource sets into the resource (start, end).
>
> With this special device in place we cannot know if a resource is
> assigned with an I/O token or a physical address, unless we forbid
> the I/O tokens to be in a specific range.
>
> So this is why we are changing the offsets of all the functions
> handling io_range_list (to make sure that a range is forbidden to
> the tokens and is available to the physical addresses).
>
> We have chosen this forbidden range to be [0, PCIBIOS_MIN_IO)
> because this is the maximum physical I/O range that a non PCI device
> can operate on and because we believe this does not impose much
> restriction on the available I/O token range; that now is
> [PCIBIOS_MIN_IO, IO_SPACE_LIMIT].
> So we believe that the chosen forbidden range can accommodate
> any special ISA bus device without much constraint on the rest
> of I/O tokens...
Your idea is a good one, however you are abusing PCIBIOS_MIN_IO and you
actually need another variable for "reserving" an area in the I/O space
that can be used for physical addresses rather than I/O tokens.
The one good example for using PCIBIOS_MIN_IO is when your platform/architecture
does not support legacy ISA operations *at all*. In that case someone
sets the PCIBIOS_MIN_IO to a non-zero value to reserve that I/O range
so that it doesn't get used. With Zhichang's patch you now start forcing
those platforms to have a valid address below PCIBIOS_MIN_IO.
For the general case you also have to bear in mind that PCIBIOS_MIN_IO could
be zero. In that case, what is your "forbidden" range? [0, 0)? So it makes
sense to add a new #define, defined only by those architectures/platforms
that want to reserve, on top of PCIBIOS_MIN_IO, another region for which
I/O tokens can't be generated.
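To make the suggestion concrete, here is a minimal sketch in plain userspace
C, not kernel code. The names example_io_layout, reserved_end and the two
helpers are hypothetical and exist nowhere in the tree; reserved_end stands in
for the proposed new #define, and limit stands in for IO_SPACE_LIMIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical layout of the logical I/O space:
 *   [0, reserved_end)       raw physical port addresses (no token)
 *   [reserved_end, limit]   I/O tokens
 * reserved_end == 0 means the platform reserves nothing.
 */
struct example_io_layout {
	unsigned long reserved_end;	/* stand-in for the proposed #define */
	unsigned long limit;		/* stand-in for IO_SPACE_LIMIT */
};

/* True if addr must be treated as a physical port address. */
static inline bool example_is_physical(const struct example_io_layout *l,
				       unsigned long addr)
{
	assert(l != NULL);
	return addr < l->reserved_end;
}

/* True if addr is a value the token allocator is allowed to hand out. */
static inline bool example_is_token(const struct example_io_layout *l,
				    unsigned long addr)
{
	assert(l != NULL);
	return addr >= l->reserved_end && addr <= l->limit;
}
```

With reserved_end set to 0 the reserved range is empty and every value up to
the limit is a token, so platforms that do not opt in are unaffected.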
Best regards,
Liviu
>
> >
> > > > Your current version has
> > > >
> > > > if (arm64_extio_ops->pfout) \
> > > > arm64_extio_ops->pfout(arm64_extio_ops->devpara,\
> > > > addr, value, sizeof(type)); \
> > > >
> > > > Instead, just subtract the start of the range from the logical
> > > > port number to transform it back into a bus-local port number:
> > >
> > > These accessors do not operate on IO tokens:
> > >
> > > If (arm64_extio_ops->start > addr || arm64_extio_ops->end < addr)
> > > addr is not going to be an I/O token; in fact patch 2/3 imposes that
> > > the I/O tokens will start at PCIBIOS_MIN_IO. So from 0 to
> > > PCIBIOS_MIN_IO we have free physical addresses that the accessors
> > > can operate on.
> >
> > Ah, I missed that part. I'd rather not use PCIBIOS_MIN_IO to refer to
> > the logical I/O tokens, the purpose of that macro is really meant
> > for allocating PCI I/O port numbers within the address space of
> > one bus.
>
> As I mentioned above, special devices operate on CPU addresses directly,
> not I/O tokens. For them there is no way to distinguish....
>
> >
> > Note that it's equally likely that whichever next platform needs
> > non-mapped I/O access like this actually needs them for PCI I/O space,
> > and that will use it on addresses registered to a PCI host bridge.
>
> Ok so here you are talking about a platform that has got an I/O range
> under the PCI host controller, right?
> And this I/O range cannot be directly memory mapped but needs special
> redirections for the I/O tokens, right?
>
> In this scenario registering the I/O ranges with the forbidden range
> implemented by the current patch would still allow to redirect I/O
> tokens as long as arm64_extio_ops->start >= PCIBIOS_MIN_IO
>
> So effectively the special PCI host controller
> 1) knows the physical range that needs special redirection
> 2) register such range
> 3) uses pci_pio_to_address() to retrieve the IO tokens for the
> special accessors
> 4) sets arm64_extio_ops->start/end to the IO tokens retrieved in 3)
>
> So to be honest I think this patch can fit well both with
> special PCI controllers that need I/O tokens redirection and with
> special non-PCI controllers that need non-PCI I/O physical
> address redirection...
>
> Thanks (and sorry for the long reply but I didn't know how
> to make the explanation shorter :) )
>
> Gab
>
> >
> > If we separate the two steps:
> >
> > a) assign a range of logical I/O port numbers to a bus
> > b) register a set of helpers for redirecting logical I/O
> > port to a helper function
> >
> > then I think the code will get cleaner and more flexible.
> > It should actually then be able to replace the powerpc
> > specific implementation.
> >
> > Arnd
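The split Arnd describes in (a) and (b) can be sketched as follows. This is a
userspace illustration under stated assumptions, not the arm64_extio_ops code
from the patch: struct example_port_range, example_find_range(),
example_inb() and example_echo_in() are all hypothetical names. Step (a) is
the start/size pair of a range; step (b) is the optional "in" helper attached
to it. The helper receives a bus-local port, obtained by subtracting the
start of the range from the logical port number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: reads one value from a bus-local port. */
typedef unsigned int (*example_in_fn)(void *ctx, unsigned long bus_port);

struct example_port_range {
	unsigned long start;	/* first logical port of the range */
	unsigned long size;	/* number of ports in the range */
	example_in_fn in;	/* NULL: no redirection registered */
	void *ctx;		/* opaque argument passed to the helper */
};

/* Return the range containing the logical port, or NULL if none does. */
static inline const struct example_port_range *
example_find_range(const struct example_port_range *ranges, size_t n,
		   unsigned long port)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (port >= ranges[i].start &&
		    port - ranges[i].start < ranges[i].size)
			return &ranges[i];
	}
	return NULL;
}

/* Toy helper for illustration: returns the low byte of the bus-local port. */
static inline unsigned int example_echo_in(void *ctx, unsigned long bus_port)
{
	(void)ctx;
	return (unsigned int)(bus_port & 0xff);
}

/* Redirected read: 0 on success, -1 if no helper covers the port. */
static inline int example_inb(const struct example_port_range *ranges,
			      size_t n, unsigned long port, unsigned int *val)
{
	const struct example_port_range *r;

	assert(val != NULL);
	r = example_find_range(ranges, n, port);
	if (!r || !r->in)
		return -1;
	/* Logical port -> bus-local port. */
	*val = r->in(r->ctx, port - r->start);
	return 0;
}
```

Because the lookup is keyed on the logical range rather than on
PCIBIOS_MIN_IO, the same structure would serve both an LPC-style bus and a
PCI host bridge whose I/O window is not memory mapped.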
--
====================
| I would like to |
| fix the world, |
| but they're not |
| giving me the |
\ source code! /
---------------
¯\_(ツ)_/¯
* [Linaro-acpi] ACPI namespace details for ARM64
From: Hanjun Guo @ 2016-11-11 14:44 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161111093226.GA13333@red-moon>
On 11/11/2016 05:32 PM, Lorenzo Pieralisi wrote:
> On Thu, Nov 10, 2016 at 04:18:54PM -0700, Al Stone wrote:
>> On 11/09/2016 03:05 PM, Bjorn Helgaas wrote:
>>> Hi all,
>>>
>>> We've been working through the details of getting ACPI to work on
>>> arm64, and there have been lots of questions about what this means for
>>> PCI. I've outlined this for several people individually, but I'm
>>> going to send this separately, apart from a specific patch series, to
>>> make sure we're all on the same page. Please correct my errors and
>>> misunderstandings.
>>>
>>> Bjorn
>>>
>>> [snip....]
>>
>> A big +1 to all of this. This also looks like something that should
>> be added to either PCI, ACPI or arm64 documentation (or even all three).
>
> And to arm64 platforms FW :)
>
>> What do you think?
>
> I do not think there is anything ARM64 specific in Bjorn's description,
> but I do think it is very useful to have it in documentation, these
> bits of information are scattered around ACPI specs and PCI FW specs,
> having a single source would help and would have prevented asking
> Bjorn the same questions 100 times.
>
>> Thank you for putting this together, Bjorn.
>
> +1, Thank you very much for this nice summary Bjorn.
+1, thanks a lot :)
Hanjun
* [RFC v2 8/8] iommu/arm-smmu: implement add_reserved_regions callback
From: Robin Murphy @ 2016-11-11 14:34 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161110161619.GK2078@8bytes.org>
On 10/11/16 16:16, Joerg Roedel wrote:
> On Thu, Nov 10, 2016 at 04:07:08PM +0000, Robin Murphy wrote:
>> On 10/11/16 15:46, Joerg Roedel wrote:
>>> On Fri, Nov 04, 2016 at 11:24:06AM +0000, Eric Auger wrote:
>>>> + resource_list_for_each_entry(window, &bridge->windows) {
>>>> + if (resource_type(window->res) != IORESOURCE_MEM &&
>>>> + resource_type(window->res) != IORESOURCE_IO)
>>>> + continue;
>>>
>>> Why do you care about IO resources?
>>
>> [since this is essentially code I wrote]
>>
>> Because they occupy some area of the PCI address space, therefore I
>> assumed that, like memory windows, they would be treated as P2P. Is that
>> not the case?
>
> No, not at all. The IO-space is completely separate from the MEM-space.
> They are two different address-spaces, addressing different things. And
> the IO-space is also not translated by any IOMMU I am aware of.
OK. On the particular root complex I have to hand, though, any DMA to
IOVAs between 0x5f800000 and 0x5fffffff sends an error back to the
endpoint, and that just so happens to be where the I/O window is placed
(both on the PCI side and the AXI (i.e. CPU MMIO) side). Whether it's
that the external MMIO view of the RC's I/O window is explicitly
duplicated in its PCI memory space as some side-effect of the PCI/AXI
bridge, or that the thing just doesn't actually respect the access type
on the PCI side I don't know, but that's how it is (and I spent this
morning recreating it to make sure I wasn't mistaken).
This thing's ECAM space is similarly visible from the PCI side and
aborts DMA the same way - I've not tried decoding the "PCI hardware
error (0x1010)" that the sky2 network driver reports, but I do note it's
a slightly different number from the one it gives when trying to access
an address matching another device's BAR (actual peer-to-peer is
explicitly unsupported). Admittedly that's not something we can detect
from the host bridge windows at all.
Anyway, you are of course right that in the normal case this should only
matter if devices were doing I/O accesses in the first place, which
makes especially no sense in the context of the DMA API, so perhaps we
could drop the unintuitive IORESOURCE_IO check from here and consider
weird PCI controllers a separate problem to solve.
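For illustration, the window walk with only the memory check would look
roughly like the userspace sketch below. The types and names (example_window,
example_res_type, example_collect_mem_windows) are hypothetical stand-ins so
that the snippet compiles on its own; they are not the resource_entry and
IORESOURCE_* definitions the real code uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the resource type of a host bridge window. */
enum example_res_type {
	EXAMPLE_RES_IO,
	EXAMPLE_RES_MEM,
	EXAMPLE_RES_BUS,
};

struct example_window {
	enum example_res_type type;
	unsigned long long start;	/* inclusive */
	unsigned long long end;		/* inclusive */
};

/*
 * Copy only the memory windows into out[], skipping I/O and bus-number
 * windows. Returns the number of entries written (at most out_len).
 */
static inline size_t
example_collect_mem_windows(const struct example_window *win, size_t n,
			    struct example_window *out, size_t out_len)
{
	size_t i, count = 0;

	assert(win != NULL && (out != NULL || out_len == 0));
	for (i = 0; i < n && count < out_len; i++) {
		if (win[i].type != EXAMPLE_RES_MEM)
			continue;
		out[count++] = win[i];
	}
	return count;
}
```

A filter like this would not cover a root complex such as the one described
above, where DMA to the address range of the I/O window is aborted; that
case would need separate handling.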
Robin.
* [Linaro-acpi] ACPI namespace details for ARM64
From: Sinan Kaya @ 2016-11-11 14:24 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <a0829be4-6324-758b-2fe6-a274f715a3a9@linaro.org>
On 11/10/2016 6:18 PM, Al Stone wrote:
> On 11/09/2016 03:05 PM, Bjorn Helgaas wrote:
>> Hi all,
>>
>> We've been working through the details of getting ACPI to work on
>> arm64, and there have been lots of questions about what this means for
>> PCI. I've outlined this for several people individually, but I'm
>> going to send this separately, apart from a specific patch series, to
>> make sure we're all on the same page. Please correct my errors and
>> misunderstandings.
>>
>> Bjorn
>>
>> [snip....]
>
> A big +1 to all of this. This also looks like something that should
> be added to either PCI, ACPI or arm64 documentation (or even all three).
> What do you think?
I agree. In order to have compliant systems, we have to make PNP0C02 required
in the PCIe appendix of the SBSA specification.
>
> Thank you for putting this together, Bjorn.
>
>
--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
* [PATCH v5 6/8] Documentation: bindings: add compatible specific to legacy SCPI protocol
From: Sudeep Holla @ 2016-11-11 14:19 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <CAL_JsqJ5PxzM_VMP55m+6snkqyWSMSsdCOiUzYWkMTb3LkD5Mw@mail.gmail.com>
On 11/11/16 13:34, Rob Herring wrote:
> On Fri, Nov 11, 2016 at 1:48 AM, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> On 10/11/16 19:03, Olof Johansson wrote:
>>> On Thu, Nov 10, 2016 at 6:34 AM, Sudeep Holla <sudeep.holla@arm.com>
>>> wrote:
>
> [...]
>
>>>> E.g. Amlogic follows most of the legacy protocol though it deviates in
>>>> couple of things which we can handle with platform specific compatible
>>>> (in the following patch in the series). When another user(Rockchip ?)
>>>> make use of this legacy protocol, we can start using those platform
>>>> specific compatible for deviations only.
>>>>
>>>> Is that not acceptable ?
>>>
>>>
>>> If there's no shared legacy feature set, then it's probably less
>>> useful to have a shared less precise compatible value.
>>>
>>
>> There is and will be some shared feature set for sure. At the least the
>> standard command set will be shared.
>>
>>> What the main point I was trying to get across was that we shouldn't
>>> expand the generic binding with per-vendor compatible fields, instead
>>> we should have those as extensions on the side.
>>>
>>
>> Yes I get the point. We will have per-vendor compatibles for handle the
>> deviations but generic one to handle the shared set.
>>
>>> I'm also a little apprehensive of using "legacy", it goes in the same
>>> bucket as "misc". At some point 1.0 will be legacy too, etc.
>>>
>>
>> True and I agree, how about "arm,scpi-pre-1.0" instead ?
>
> That's still meaningless. Convince me that multiple implementations
> are identical, then we can have a common property. For example, how
> many releases did ARM make before 1.0.
>
None officially, so I tend to agree with you on this.
But so far we have seen some commonality between the Rockchip and Amlogic
implementations, which in fact share some commonality with an early
release of SCPI from ARM (they are based on the same SCP code base,
which is closed source and released to partners only). ARM improved the
specification and the code base before the official release, but by then
it had already been adopted (as usual we were late ;)).
IMO, it might be useful to have a more generic, say, "arm,scpi-pre-1.0"
and the platform-specific "amlogic,meson-gxbb-scpi".
--
Regards,
Sudeep
* [PATCH v14 4/9] acpi/arm64: Add GTDT table parse driver
From: Hanjun Guo @ 2016-11-11 13:58 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <5825CBB5.8090104@linaro.org>
On 11/11/2016 09:46 PM, Hanjun Guo wrote:
> Hi Mark,
>
> Sorry for the late reply.
>
> On 10/21/2016 12:37 AM, Mark Rutland wrote:
>> Hi,
>>
>> As a heads-up, on v4.9-rc1 I see conflicts at least against
>> arch/arm64/Kconfig. Luckily git am -3 seems to be able to fix that up
>> automatically, but this will need to be rebased before the next posting
>> and/or merging.
>>
>> On Thu, Sep 29, 2016 at 02:17:12AM +0800, fu.wei at linaro.org wrote:
>>> +static int __init map_gt_gsi(u32 interrupt, u32 flags)
>>> +{
>>> + int trigger, polarity;
>>> +
>>> + if (!interrupt)
>>> + return 0;
>>
>> Urgh.
>>
>> Only the secure interrupt (which we do not need) is optional in this
>> manner, and (hilariously), zero appears to also be a valid GSIV, per
>> figure 5-24 in the ACPI 6.1 spec.
>>
>> So, I think that:
>>
>> (a) we should not bother parsing the secure interrupt
>> (b) we should drop the check above
>> (c) we should report the spec issue to the ASWG
>
> Sorry, I am willing to do that, but I need to figure out the issue here.
> What kind of issue, in detail? Do you mean that zero should not be valid
> for arch timer interrupts?
OK, I think you are referring to "we don't need the secure interrupt",
correct me if I'm wrong (still in jet lag...).
Thanks
Hanjun
* [PATCH v6 2/9] drm/hisilicon/hibmc: Add video memory management
From: Rongrong Zou @ 2016-11-11 13:57 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <CAOw6vbJFitYby+cNezQvdtwJqNkSjhX2=S0QwUrL40gmv-Kozg@mail.gmail.com>
On 2016/11/11 21:25, Sean Paul wrote:
> On Fri, Nov 11, 2016 at 6:16 AM, Rongrong Zou <zourongrong@huawei.com> wrote:
>> On 2016/11/11 1:35, Sean Paul wrote:
>>>
>>> On Fri, Oct 28, 2016 at 3:27 AM, Rongrong Zou <zourongrong@gmail.com>
>>> wrote:
>>>>
>>>> Hibmc have 32m video memory which can be accessed through PCIe by host,
>>>> we use ttm to manage these memory.
>>>>
>>>> Signed-off-by: Rongrong Zou <zourongrong@gmail.com>
>>>> ---
>>>> drivers/gpu/drm/hisilicon/hibmc/Kconfig | 1 +
>>>> drivers/gpu/drm/hisilicon/hibmc/Makefile | 2 +-
>>>> drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c | 12 +
>>>> drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h | 46 +++
>>>> drivers/gpu/drm/hisilicon/hibmc/hibmc_ttm.c | 490
>>>> ++++++++++++++++++++++++
>>>> 5 files changed, 550 insertions(+), 1 deletion(-)
>>>> create mode 100644 drivers/gpu/drm/hisilicon/hibmc/hibmc_ttm.c
>>>>
>>>> diff --git a/drivers/gpu/drm/hisilicon/hibmc/Kconfig
>>>> b/drivers/gpu/drm/hisilicon/hibmc/Kconfig
>>>> index a9af90d..bcb8c18 100644
>>>> --- a/drivers/gpu/drm/hisilicon/hibmc/Kconfig
>>>> +++ b/drivers/gpu/drm/hisilicon/hibmc/Kconfig
>>>> @@ -1,6 +1,7 @@
>>>> config DRM_HISI_HIBMC
>>>> tristate "DRM Support for Hisilicon Hibmc"
>>>> depends on DRM && PCI
>>>> + select DRM_TTM
>>>>
>>>> help
>>>> Choose this option if you have a Hisilicon Hibmc soc chipset.
>>>> diff --git a/drivers/gpu/drm/hisilicon/hibmc/Makefile
>>>> b/drivers/gpu/drm/hisilicon/hibmc/Makefile
>>>> index 97cf4a0..d5c40b8 100644
>>>> --- a/drivers/gpu/drm/hisilicon/hibmc/Makefile
>>>> +++ b/drivers/gpu/drm/hisilicon/hibmc/Makefile
>>>> @@ -1,5 +1,5 @@
>>>> ccflags-y := -Iinclude/drm
>>>> -hibmc-drm-y := hibmc_drm_drv.o hibmc_drm_power.o
>>>> +hibmc-drm-y := hibmc_drm_drv.o hibmc_drm_power.o hibmc_ttm.o
>>>>
>>>> obj-$(CONFIG_DRM_HISI_HIBMC) +=hibmc-drm.o
>>>> #obj-y += hibmc-drm.o
>>>> diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
>>>> b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
>>>> index 4669d42..81f4301 100644
>>>> --- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
>>>> +++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
>>>> @@ -31,6 +31,7 @@
>>>> #ifdef CONFIG_COMPAT
>>>> .compat_ioctl = drm_compat_ioctl,
>>>> #endif
>>>> + .mmap = hibmc_mmap,
>>>> .poll = drm_poll,
>>>> .read = drm_read,
>>>> .llseek = no_llseek,
>>>> @@ -46,6 +47,8 @@ static void hibmc_disable_vblank(struct drm_device
>>>> *dev, unsigned int pipe)
>>>> }
>>>>
>>>> static struct drm_driver hibmc_driver = {
>>>> + .driver_features = DRIVER_GEM,
>>>> +
>>>
>>>
>>> nit: extra space
>>>
>>>> .fops = &hibmc_fops,
>>>> .name = "hibmc",
>>>> .date = "20160828",
>>>> @@ -55,6 +58,10 @@ static void hibmc_disable_vblank(struct drm_device
>>>> *dev, unsigned int pipe)
>>>> .get_vblank_counter = drm_vblank_no_hw_counter,
>>>> .enable_vblank = hibmc_enable_vblank,
>>>> .disable_vblank = hibmc_disable_vblank,
>>>> + .gem_free_object_unlocked = hibmc_gem_free_object,
>>>> + .dumb_create = hibmc_dumb_create,
>>>> + .dumb_map_offset = hibmc_dumb_mmap_offset,
>>>> + .dumb_destroy = drm_gem_dumb_destroy,
>>>> };
>>>>
>>>> static int hibmc_pm_suspend(struct device *dev)
>>>> @@ -163,6 +170,7 @@ static int hibmc_unload(struct drm_device *dev)
>>>> {
>>>> struct hibmc_drm_device *hidev = dev->dev_private;
>>>>
>>>> + hibmc_mm_fini(hidev);
>>>> hibmc_hw_fini(hidev);
>>>> dev->dev_private = NULL;
>>>> return 0;
>>>> @@ -183,6 +191,10 @@ static int hibmc_load(struct drm_device *dev,
>>>> unsigned long flags)
>>>> if (ret)
>>>> goto err;
>>>>
>>>> + ret = hibmc_mm_init(hidev);
>>>> + if (ret)
>>>> + goto err;
>>>> +
>>>> return 0;
>>>>
>>>> err:
>>>> diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h
>>>> b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h
>>>> index 0037341..db8d80e 100644
>>>> --- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h
>>>> +++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h
>>>> @@ -20,6 +20,8 @@
>>>> #define HIBMC_DRM_DRV_H
>>>>
>>>> #include <drm/drmP.h>
>>>> +#include <drm/ttm/ttm_bo_driver.h>
>>>> +#include <drm/drm_gem.h>
>>>
>>>
>>> nit: alphabetize
>>
>>
>> will fix it, thanks.
>>
>>>
>>>>
>>>> struct hibmc_drm_device {
>>>> /* hw */
>>>> @@ -30,6 +32,50 @@ struct hibmc_drm_device {
>>>>
>>>> /* drm */
>>>> struct drm_device *dev;
>>>> +
>>>> + /* ttm */
>>>> + struct {
>>>> + struct drm_global_reference mem_global_ref;
>>>> + struct ttm_bo_global_ref bo_global_ref;
>>>> + struct ttm_bo_device bdev;
>>>> + bool initialized;
>>>> + } ttm;
>>>
>>>
>>> I don't think you gain anything other than keystrokes from the substruct
>>
>>
>> I'm sorry i didn't catch you, i looked at the all drivers used ttm such
>> as ast/bochs/cirrus/mgag200/qxl/virtio_gpu, they all embedded the ttm
>> substruct
>> into the driver-private struct.
>>
>> so do you mean
>> struct hibmc_drm_device {
>> /* hw */
>> void __iomem *mmio;
>> void __iomem *fb_map;
>> unsigned long fb_base;
>> unsigned long fb_size;
>>
>> /* drm */
>> struct drm_device *dev;
>> struct drm_plane plane;
>> struct drm_crtc crtc;
>> struct drm_encoder encoder;
>> struct drm_connector connector;
>> bool mode_config_initialized;
>>
>> /* ttm */
>> struct drm_global_reference mem_global_ref;
>> struct ttm_bo_global_ref bo_global_ref;
>> struct ttm_bo_device bdev;
>> bool initialized;
>> ...
>> };
>> ?
>
> Yeah, that's what I was thinking
>
>>
>>>
>>>> +
>>>> + bool mm_inited;
>>>> };
>>>>
>>>> +struct hibmc_bo {
>>>> + struct ttm_buffer_object bo;
>>>> + struct ttm_placement placement;
>>>> + struct ttm_bo_kmap_obj kmap;
>>>> + struct drm_gem_object gem;
>>>> + struct ttm_place placements[3];
>>>> + int pin_count;
>>>> +};
>>>> +
>>>> +static inline struct hibmc_bo *hibmc_bo(struct ttm_buffer_object *bo)
>>>> +{
>>>> + return container_of(bo, struct hibmc_bo, bo);
>>>> +}
>>>> +
>>>> +static inline struct hibmc_bo *gem_to_hibmc_bo(struct drm_gem_object
>>>> *gem)
>>>> +{
>>>> + return container_of(gem, struct hibmc_bo, gem);
>>>> +}
>>>> +
>>>> +#define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
>>>
>>>
>>> Hide this in ttm.c
>>
>>
>> ok, will do that.
>> thanks for pointing it out.
>>
>>
>>>
>>>> +
>>>> +int hibmc_gem_create(struct drm_device *dev, u32 size, bool iskernel,
>>>> + struct drm_gem_object **obj);
>>>> +
>>>> +int hibmc_mm_init(struct hibmc_drm_device *hibmc);
>>>> +void hibmc_mm_fini(struct hibmc_drm_device *hibmc);
>>>> +int hibmc_bo_pin(struct hibmc_bo *bo, u32 pl_flag, u64 *gpu_addr);
>>>> +void hibmc_gem_free_object(struct drm_gem_object *obj);
>>>> +int hibmc_dumb_create(struct drm_file *file, struct drm_device *dev,
>>>> + struct drm_mode_create_dumb *args);
>>>> +int hibmc_dumb_mmap_offset(struct drm_file *file, struct drm_device
>>>> *dev,
>>>> + u32 handle, u64 *offset);
>>>> +int hibmc_mmap(struct file *filp, struct vm_area_struct *vma);
>>>> +
>>>> #endif
>>>> diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_ttm.c
>>>> b/drivers/gpu/drm/hisilicon/hibmc/hibmc_ttm.c
>>>> new file mode 100644
>>>> index 0000000..0802ebd
>>>> --- /dev/null
>>>> +++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_ttm.c
>>>> @@ -0,0 +1,490 @@
>>>> +/* Hisilicon Hibmc SoC drm driver
>>>> + *
>>>> + * Based on the bochs drm driver.
>>>> + *
>>>> + * Copyright (c) 2016 Huawei Limited.
>>>> + *
>>>> + * Author:
>>>> + * Rongrong Zou <zourongrong@huawei.com>
>>>> + * Rongrong Zou <zourongrong@gmail.com>
>>>> + * Jianhua Li <lijianhua@huawei.com>
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License as published by
>>>> + * the Free Software Foundation; either version 2 of the License, or
>>>> + * (at your option) any later version.
>>>> + *
>>>> + */
>>>> +
>>>> +#include "hibmc_drm_drv.h"
>>>> +#include <ttm/ttm_page_alloc.h>
>>>> +#include <drm/drm_crtc_helper.h>
>>>> +#include <drm/drm_atomic_helper.h>
>>>> +
>>>> +static inline struct hibmc_drm_device *
>>>> +hibmc_bdev(struct ttm_bo_device *bd)
>>>> +{
>>>> + return container_of(bd, struct hibmc_drm_device, ttm.bdev);
>>>> +}
>>>> +
>>>> +static int
>>>> +hibmc_ttm_mem_global_init(struct drm_global_reference *ref)
>>>> +{
>>>> + return ttm_mem_global_init(ref->object);
>>>> +}
>>>> +
>>>> +static void
>>>> +hibmc_ttm_mem_global_release(struct drm_global_reference *ref)
>>>> +{
>>>> + ttm_mem_global_release(ref->object);
>>>> +}
>>>> +
>>>> +static int hibmc_ttm_global_init(struct hibmc_drm_device *hibmc)
>>>> +{
>>>> + struct drm_global_reference *global_ref;
>>>> + int r;
>>>
>>>
>>> nit: try not to use one character variable names unless it's for the
>>> purpose of a loop (ie: i,j). You also use ret elsewhere in the driver,
>>> so it'd be nice to remain consistent
>>
>>
>> the whole file is delivered from bochs ttm, i didn't modify anything except
>> some checkpatch warnings and the 'hibmc_' prefix. Unfortunately, some
>> problems were delivered too.
>
> Yeah, seems like it. Perhaps you can post patches to fix these issues
> in the other drivers too :)
I will do that after this one gets merged :)
>
>>
>>>
>>>> +
>>>> + global_ref = &hibmc->ttm.mem_global_ref;
>>>
>>>
>>> I think using the global_ref local obfuscates what you're doing here.
>>> It saves you 6 characters while typing, but adds a layer of
>>> indirection for all future readers.
>>>
>>>> + global_ref->global_type = DRM_GLOBAL_TTM_MEM;
>>>> + global_ref->size = sizeof(struct ttm_mem_global);
>>>> + global_ref->init = &hibmc_ttm_mem_global_init;
>>>> + global_ref->release = &hibmc_ttm_mem_global_release;
>>>> + r = drm_global_item_ref(global_ref);
>>>> + if (r != 0) {
>>>
>>>
>>> nit: if (r)
>>
>>
>> will fix it,
>> thanks.
>> BTW, i wonder why checkpatch.pl didn't report it.
>>
>>
>>>
>>>> + DRM_ERROR("Failed setting up TTM memory accounting
>>>> subsystem.\n"
>>>> + );
>>>
>>>
>>> Breaking up the line for one character is probably not worthwhile, and
>>> you should really print the error. How about:
>>>
>>> DRM_ERROR("Could not get ref on ttm global ret=%d.\n", ret);
>>
>>
>> i like your solution, thanks.
>>
>>
>>>
>>>
>>>> + return r;
>>>> + }
>>>> +
>>>> + hibmc->ttm.bo_global_ref.mem_glob =
>>>> + hibmc->ttm.mem_global_ref.object;
>>>> + global_ref = &hibmc->ttm.bo_global_ref.ref;
>>>> + global_ref->global_type = DRM_GLOBAL_TTM_BO;
>>>> + global_ref->size = sizeof(struct ttm_bo_global);
>>>> + global_ref->init = &ttm_bo_global_init;
>>>> + global_ref->release = &ttm_bo_global_release;
>>>> + r = drm_global_item_ref(global_ref);
>>>> + if (r != 0) {
>>>> + DRM_ERROR("Failed setting up TTM BO subsystem.\n");
>>>> + drm_global_item_unref(&hibmc->ttm.mem_global_ref);
>>>> + return r;
>>>> + }
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static void
>>>> +hibmc_ttm_global_release(struct hibmc_drm_device *hibmc)
>>>> +{
>>>> + if (!hibmc->ttm.mem_global_ref.release)
>>>
>>>
>>> Are you actually hitting this condition? This seems like it's papering
>>> over something else.
>>
>>
>> it was also delivered from others, i looked at the xxx_ttm_global_init
>> function, 'mem_global_ref.release' is assigned unconditionally, so i
>> think this condition never be hit, it may be hit when release twice,
>> but this won't take place in my driver.
>>
>
> Yeah, that's what I was hoping for. So perhaps we can remove this?
yes, we can.
Regards,
Rongrong.
>
>>>
>>>> + return;
>>>> +
>>>> + drm_global_item_unref(&hibmc->ttm.bo_global_ref.ref);
>>>> + drm_global_item_unref(&hibmc->ttm.mem_global_ref);
>>>> + hibmc->ttm.mem_global_ref.release = NULL;
>>>> +}
>>>> +
>>>> +static void hibmc_bo_ttm_destroy(struct ttm_buffer_object *tbo)
>>>> +{
>>>> + struct hibmc_bo *bo;
>>>> +
>>>> + bo = container_of(tbo, struct hibmc_bo, bo);
>>>
>>>
>>> nit: No need to split this into a separate line.
>>
>>
>> agreed, thanks.
>>
>>>
>>>> +
>>>> + drm_gem_object_release(&bo->gem);
>>>> + kfree(bo);
>>>> +}
>>>> +
>>>> +static bool hibmc_ttm_bo_is_hibmc_bo(struct ttm_buffer_object *bo)
>>>> +{
>>>> + if (bo->destroy == &hibmc_bo_ttm_destroy)
>>>> + return true;
>>>> + return false;
>>>
>>>
>>> return bo->destroy == &hibmc_bo_ttm_destroy;
>>
>>
>> looks better to me.
>>
>>
>>>
>>>> +}
>>>> +
>>>> +static int
>>>> +hibmc_bo_init_mem_type(struct ttm_bo_device *bdev, u32 type,
>>>> + struct ttm_mem_type_manager *man)
>>>> +{
>>>> + switch (type) {
>>>> + case TTM_PL_SYSTEM:
>>>> + man->flags = TTM_MEMTYPE_FLAG_MAPPABLE;
>>>> + man->available_caching = TTM_PL_MASK_CACHING;
>>>> + man->default_caching = TTM_PL_FLAG_CACHED;
>>>> + break;
>>>> + case TTM_PL_VRAM:
>>>> + man->func = &ttm_bo_manager_func;
>>>> + man->flags = TTM_MEMTYPE_FLAG_FIXED |
>>>> + TTM_MEMTYPE_FLAG_MAPPABLE;
>>>> + man->available_caching = TTM_PL_FLAG_UNCACHED |
>>>> + TTM_PL_FLAG_WC;
>>>> + man->default_caching = TTM_PL_FLAG_WC;
>>>> + break;
>>>> + default:
>>>> + DRM_ERROR("Unsupported memory type %u\n", type);
>>>> + return -EINVAL;
>>>> + }
>>>> + return 0;
>>>> +}
>>>> +
>>>> +void hibmc_ttm_placement(struct hibmc_bo *bo, int domain)
>>>> +{
>>>> + u32 c = 0;
>>>
>>>
>>> Can you please use a more descriptive name than 'c'?
>>
>>
>> ok, will do that.
>>
>>>
>>>> + u32 i;
>>>> +
>>>> + bo->placement.placement = bo->placements;
>>>> + bo->placement.busy_placement = bo->placements;
>>>> + if (domain & TTM_PL_FLAG_VRAM)
>>>> + bo->placements[c++].flags = TTM_PL_FLAG_WC |
>>>> + TTM_PL_FLAG_UNCACHED | TTM_PL_FLAG_VRAM;
>>>
>>>
>>> nit: your alignment is off here and below
>>
>>
>> is it correct?
>>
>> if (domain & TTM_PL_FLAG_VRAM)
>> bo->placements[c++].flags = TTM_PL_FLAG_WC |
>> TTM_PL_FLAG_UNCACHED | TTM_PL_FLAG_VRAM;
>> if (domain & TTM_PL_FLAG_SYSTEM)
>> bo->placements[c++].flags = TTM_PL_MASK_CACHING |
>> TTM_PL_FLAG_SYSTEM;
>> if (!c)
>> bo->placements[c++].flags = TTM_PL_MASK_CACHING |
>> TTM_PL_FLAG_SYSTEM;
>>
>
> Pretty much anything other than lining them up one under the other is better
>
>>>
>>>> + if (domain & TTM_PL_FLAG_SYSTEM)
>>>> + bo->placements[c++].flags = TTM_PL_MASK_CACHING |
>>>> + TTM_PL_FLAG_SYSTEM;
>>>> + if (!c)
>>>> + bo->placements[c++].flags = TTM_PL_MASK_CACHING |
>>>> + TTM_PL_FLAG_SYSTEM;
>>>> +
>>>> + bo->placement.num_placement = c;
>>>> + bo->placement.num_busy_placement = c;
>>>> + for (i = 0; i < c; ++i) {
>>>
>>>
>>> nit: we tend towards post-increment in kernel
>>
>>
>> agreed, thanks.
>>
>>
>>>
>>>> + bo->placements[i].fpfn = 0;
>>>> + bo->placements[i].lpfn = 0;
>>>> + }
>>>> +}
>>>> +
>>>> +static void
>>>> +hibmc_bo_evict_flags(struct ttm_buffer_object *bo, struct ttm_placement
>>>> *pl)
>>>> +{
>>>> + struct hibmc_bo *hibmcbo = hibmc_bo(bo);
>>>> +
>>>> + if (!hibmc_ttm_bo_is_hibmc_bo(bo))
>>>> + return;
>>>> +
>>>> + hibmc_ttm_placement(hibmcbo, TTM_PL_FLAG_SYSTEM);
>>>> + *pl = hibmcbo->placement;
>>>> +}
>>>> +
>>>> +static int hibmc_bo_verify_access(struct ttm_buffer_object *bo,
>>>> + struct file *filp)
>>>> +{
>>>> + struct hibmc_bo *hibmcbo = hibmc_bo(bo);
>>>> +
>>>> + return drm_vma_node_verify_access(&hibmcbo->gem.vma_node,
>>>> + filp->private_data);
>>>> +}
>>>> +
>>>> +static int hibmc_ttm_io_mem_reserve(struct ttm_bo_device *bdev,
>>>> + struct ttm_mem_reg *mem)
>>>> +{
>>>> + struct ttm_mem_type_manager *man = &bdev->man[mem->mem_type];
>>>> + struct hibmc_drm_device *hibmc = hibmc_bdev(bdev);
>>>> +
>>>> + mem->bus.addr = NULL;
>>>> + mem->bus.offset = 0;
>>>> + mem->bus.size = mem->num_pages << PAGE_SHIFT;
>>>> + mem->bus.base = 0;
>>>> + mem->bus.is_iomem = false;
>>>> + if (!(man->flags & TTM_MEMTYPE_FLAG_MAPPABLE))
>>>> + return -EINVAL;
>>>> + switch (mem->mem_type) {
>>>> + case TTM_PL_SYSTEM:
>>>> + /* system memory */
>>>> + return 0;
>>>> + case TTM_PL_VRAM:
>>>> + mem->bus.offset = mem->start << PAGE_SHIFT;
>>>> + mem->bus.base = pci_resource_start(hibmc->dev->pdev, 0);
>>>> + mem->bus.is_iomem = true;
>>>> + break;
>>>> + default:
>>>> + return -EINVAL;
>>>> + }
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static void hibmc_ttm_io_mem_free(struct ttm_bo_device *bdev,
>>>> + struct ttm_mem_reg *mem)
>>>> +{
>>>> +}
>>>
>>>
>>> No need to stub this, the caller does a NULL-check before invoking
>>
>>
>> will delete it, thanks.
>>
>>>
>>>> +
>>>> +static void hibmc_ttm_backend_destroy(struct ttm_tt *tt)
>>>> +{
>>>> + ttm_tt_fini(tt);
>>>> + kfree(tt);
>>>> +}
>>>> +
>>>> +static struct ttm_backend_func hibmc_tt_backend_func = {
>>>> + .destroy = &hibmc_ttm_backend_destroy,
>>>> +};
>>>> +
>>>> +static struct ttm_tt *hibmc_ttm_tt_create(struct ttm_bo_device *bdev,
>>>> + unsigned long size,
>>>> + u32 page_flags,
>>>> + struct page *dummy_read_page)
>>>> +{
>>>> + struct ttm_tt *tt;
>>>> +
>>>> + tt = kzalloc(sizeof(*tt), GFP_KERNEL);
>>>> + if (!tt)
>>>
>>>
>>> Print error
>>
>>
>> ok, will do that, thanks.
>>
>>>
>>>> + return NULL;
>>>> + tt->func = &hibmc_tt_backend_func;
>>>> + if (ttm_tt_init(tt, bdev, size, page_flags, dummy_read_page)) {
>>>
>>>
>>> Here too?
>>
>>
>> ditto
>>
>>
>>>
>>>> + kfree(tt);
>>>> + return NULL;
>>>> + }
>>>> + return tt;
>>>> +}
>>>> +
>>>> +static int hibmc_ttm_tt_populate(struct ttm_tt *ttm)
>>>> +{
>>>> + return ttm_pool_populate(ttm);
>>>> +}
>>>> +
>>>> +static void hibmc_ttm_tt_unpopulate(struct ttm_tt *ttm)
>>>> +{
>>>> + ttm_pool_unpopulate(ttm);
>>>> +}
>>>> +
>>>> +struct ttm_bo_driver hibmc_bo_driver = {
>>>> + .ttm_tt_create = hibmc_ttm_tt_create,
>>>> + .ttm_tt_populate = hibmc_ttm_tt_populate,
>>>> + .ttm_tt_unpopulate = hibmc_ttm_tt_unpopulate,
>>>> + .init_mem_type = hibmc_bo_init_mem_type,
>>>> + .evict_flags = hibmc_bo_evict_flags,
>>>> + .move = NULL,
>>>> + .verify_access = hibmc_bo_verify_access,
>>>> + .io_mem_reserve = &hibmc_ttm_io_mem_reserve,
>>>> + .io_mem_free = &hibmc_ttm_io_mem_free,
>>>> + .lru_tail = &ttm_bo_default_lru_tail,
>>>> + .swap_lru_tail = &ttm_bo_default_swap_lru_tail,
>>>> +};
>>>> +
>>>> +int hibmc_mm_init(struct hibmc_drm_device *hibmc)
>>>> +{
>>>> + int ret;
>>>> + struct drm_device *dev = hibmc->dev;
>>>> + struct ttm_bo_device *bdev = &hibmc->ttm.bdev;
>>>> +
>>>> + ret = hibmc_ttm_global_init(hibmc);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + ret = ttm_bo_device_init(&hibmc->ttm.bdev,
>>>> + hibmc->ttm.bo_global_ref.ref.object,
>>>> + &hibmc_bo_driver,
>>>> + dev->anon_inode->i_mapping,
>>>> + DRM_FILE_PAGE_OFFSET,
>>>> + true);
>>>> + if (ret) {
>>>
>>>
>>> Call hibmc_ttm_global_release here?
>>
>>
>> agreed, thanks for pointing it out.
>>
>>>
>>>> + DRM_ERROR("Error initialising bo driver; %d\n", ret);
>>>> + return ret;
>>>> + }
>>>> +
>>>> + ret = ttm_bo_init_mm(bdev, TTM_PL_VRAM,
>>>> + hibmc->fb_size >> PAGE_SHIFT);
>>>> + if (ret) {
>>>
>>>
>>> Clean up here as well?
>>
>>
>> ditto
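A sketch of the unwinding discussed in the two comments above, in userspace C
with stub functions so that it can be compiled on its own. Everything
prefixed example_ is hypothetical; in the driver the corresponding calls
would be hibmc_ttm_global_init(), ttm_bo_device_init(), ttm_bo_init_mm(),
ttm_bo_device_release() and hibmc_ttm_global_release(). The fail_step field
exists only to force a failure at a chosen step.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct example_dev {
	int global_refs;	/* 1 while the global refs are held */
	int bdev_ready;		/* 1 while the bo device is initialised */
	int fail_step;		/* 0: no failure, 1..3: step that fails */
};

static inline int example_global_init(struct example_dev *d)
{
	if (d->fail_step == 1)
		return -ENOMEM;
	d->global_refs = 1;
	return 0;
}

static inline void example_global_release(struct example_dev *d)
{
	d->global_refs = 0;
}

static inline int example_bdev_init(struct example_dev *d)
{
	if (d->fail_step == 2)
		return -ENOMEM;
	d->bdev_ready = 1;
	return 0;
}

static inline void example_bdev_release(struct example_dev *d)
{
	d->bdev_ready = 0;
}

static inline int example_vram_init(struct example_dev *d)
{
	return d->fail_step == 3 ? -ENOMEM : 0;
}

/* Each failure path undoes exactly the steps that already succeeded. */
static inline int example_mm_init(struct example_dev *d)
{
	int ret;

	assert(d != NULL);
	ret = example_global_init(d);
	if (ret)
		return ret;

	ret = example_bdev_init(d);
	if (ret)
		goto err_global;

	ret = example_vram_init(d);
	if (ret)
		goto err_bdev;

	return 0;

err_bdev:
	example_bdev_release(d);
err_global:
	example_global_release(d);
	return ret;
}
```

The labels are ordered so that a later failure falls through the earlier
cleanups, which keeps each error branch to a single goto.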
>>
>>
>>>
>>>> + DRM_ERROR("Failed ttm VRAM init: %d\n", ret);
>>>> + return ret;
>>>> + }
>>>> +
>>>> + hibmc->mm_inited = true;
>>>> + return 0;
>>>> +}
>>>> +
>>>> +void hibmc_mm_fini(struct hibmc_drm_device *hibmc)
>>>> +{
>>>> + if (!hibmc->mm_inited)
>>>> + return;
>>>> +
>>>> + ttm_bo_device_release(&hibmc->ttm.bdev);
>>>> + hibmc_ttm_global_release(hibmc);
>>>> + hibmc->mm_inited = false;
>>>> +}
>>>> +
>>>> +int hibmc_bo_create(struct drm_device *dev, int size, int align,
>>>> + u32 flags, struct hibmc_bo **phibmcbo)
>>>> +{
>>>> + struct hibmc_drm_device *hibmc = dev->dev_private;
>>>> + struct hibmc_bo *hibmcbo;
>>>> + size_t acc_size;
>>>> + int ret;
>>>> +
>>>> + hibmcbo = kzalloc(sizeof(*hibmcbo), GFP_KERNEL);
>>>> + if (!hibmcbo)
>>>> + return -ENOMEM;
>>>> +
>>>> + ret = drm_gem_object_init(dev, &hibmcbo->gem, size);
>>>> + if (ret) {
>>>> + kfree(hibmcbo);
>>>> + return ret;
>>>> + }
>>>> +
>>>> + hibmcbo->bo.bdev = &hibmc->ttm.bdev;
>>>> +
>>>> + hibmc_ttm_placement(hibmcbo, TTM_PL_FLAG_VRAM |
>>>> TTM_PL_FLAG_SYSTEM);
>>>> +
>>>> + acc_size = ttm_bo_dma_acc_size(&hibmc->ttm.bdev, size,
>>>> + sizeof(struct hibmc_bo));
>>>> +
>>>> + ret = ttm_bo_init(&hibmc->ttm.bdev, &hibmcbo->bo, size,
>>>> + ttm_bo_type_device, &hibmcbo->placement,
>>>> + align >> PAGE_SHIFT, false, NULL, acc_size,
>>>> + NULL, NULL, hibmc_bo_ttm_destroy);
>>>> + if (ret)
>>>
>>>
>>> Missing hibmcbo clean up here
>>
>>
>> i looked at all other ttm drivers and all of them return directly when
>> ttm_bo_init
>> failed, however, i think it is better to clean up here, should i call
>> hibmc_bo_unref(&hibmc_bo) here ?
>>
>
> Yeah, that should work (might want to test it, though ;)
>
>
>>>
>>>> + return ret;
>>>> +
>>>> + *phibmcbo = hibmcbo;
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static inline u64 hibmc_bo_gpu_offset(struct hibmc_bo *bo)
>>>> +{
>>>> + return bo->bo.offset;
>>>> +}
>>>
>>>
>>> I don't think this function provides any value
>>
>>
>> do you mean i use bo->bo.offset instead of calling hibmc_bo_gpu_offset()?
>>
>
> yes
>
>>>
>>>> +
>>>> +int hibmc_bo_pin(struct hibmc_bo *bo, u32 pl_flag, u64 *gpu_addr)
>>>> +{
>>>> + int i, ret;
>>>> +
>>>> + if (bo->pin_count) {
>>>> + bo->pin_count++;
>>>> + if (gpu_addr)
>>>> + *gpu_addr = hibmc_bo_gpu_offset(bo);
>>>
>>>
>>> Are you missing a return here?
>>
>>
>> Thanks for pointing it out!
>>
>>
>>>
>>>> + }
>>>> +
>>>> + hibmc_ttm_placement(bo, pl_flag);
>>>> + for (i = 0; i < bo->placement.num_placement; i++)
>>>> + bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT;
>>>> + ret = ttm_bo_validate(&bo->bo, &bo->placement, false, false);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + bo->pin_count = 1;
>>>> + if (gpu_addr)
>>>> + *gpu_addr = hibmc_bo_gpu_offset(bo);
>>>> + return 0;
>>>> +}
>>>> +
>>>> +int hibmc_bo_push_sysram(struct hibmc_bo *bo)
>>>> +{
>>>> + int i, ret;
>>>> +
>>>> + if (!bo->pin_count) {
>>>> + DRM_ERROR("unpin bad %p\n", bo);
>>>> + return 0;
>>>> + }
>>>> + bo->pin_count--;
>>>> + if (bo->pin_count)
>>>> + return 0;
>>>> +
>>>> + if (bo->kmap.virtual)
>>>
>>>
>>> ttm_bo_kunmap already does this check so you don't have to
>>
>>
>> agreed. will remove this condition.
>>
>>>
>>>> + ttm_bo_kunmap(&bo->kmap);
>>>> +
>>>> + hibmc_ttm_placement(bo, TTM_PL_FLAG_SYSTEM);
>>>> + for (i = 0; i < bo->placement.num_placement ; i++)
>>>> + bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT;
>>>> +
>>>> + ret = ttm_bo_validate(&bo->bo, &bo->placement, false, false);
>>>> + if (ret) {
>>>> + DRM_ERROR("pushing to VRAM failed\n");
>>>
>>>
>>> Print ret
>>
>>
>> ok, thanks.
>>
>>
>>>
>>>> + return ret;
>>>> + }
>>>> + return 0;
>>>> +}
>>>> +
>>>> +int hibmc_mmap(struct file *filp, struct vm_area_struct *vma)
>>>> +{
>>>> + struct drm_file *file_priv;
>>>> + struct hibmc_drm_device *hibmc;
>>>> +
>>>> + if (unlikely(vma->vm_pgoff < DRM_FILE_PAGE_OFFSET))
>>>> + return -EINVAL;
>>>> +
>>>> + file_priv = filp->private_data;
>>>> + hibmc = file_priv->minor->dev->dev_private;
>>>> + return ttm_bo_mmap(filp, vma, &hibmc->ttm.bdev);
>>>> +}
>>>> +
>>>> +int hibmc_gem_create(struct drm_device *dev, u32 size, bool iskernel,
>>>> + struct drm_gem_object **obj)
>>>> +{
>>>> + struct hibmc_bo *hibmcbo;
>>>> + int ret;
>>>> +
>>>> + *obj = NULL;
>>>> +
>>>> + size = PAGE_ALIGN(size);
>>>> + if (size == 0)
>>>
>>>
>>> Print error
>>
>>
>> ditto
>>
>>>
>>>> + return -EINVAL;
>>>> +
>>>> + ret = hibmc_bo_create(dev, size, 0, 0, &hibmcbo);
>>>> + if (ret) {
>>>> + if (ret != -ERESTARTSYS)
>>>> + DRM_ERROR("failed to allocate GEM object\n");
>>>
>>>
>>> Print ret
>>
>>
>> ditto
>>
>>>
>>>> + return ret;
>>>> + }
>>>> + *obj = &hibmcbo->gem;
>>>> + return 0;
>>>> +}
>>>> +
>>>> +int hibmc_dumb_create(struct drm_file *file, struct drm_device *dev,
>>>> + struct drm_mode_create_dumb *args)
>>>> +{
>>>> + struct drm_gem_object *gobj;
>>>> + u32 handle;
>>>> + int ret;
>>>> +
>>>> + args->pitch = ALIGN(args->width * ((args->bpp + 7) / 8), 16);
>>>
>>>
>>> What's up with the bpp + 7 here? Perhaps you're looking for DIV_ROUND_UP?
>>
>>
>> Yes, that sounds sane.
>>
>
> sane is what i usually aim for :)
>
> Sean
>
>
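The pitch computation discussed just above can be tried outside the kernel. This is a minimal sketch, assuming userspace stand-ins for the kernel's DIV_ROUND_UP and ALIGN macros; ALIGN_UP and dumb_pitch are names made up for this example, and the alignment is assumed to be a power of two.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel macro of the same name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint32_t)(a) - 1))

/*
 * Bytes per scanline: whole bytes per pixel, rounded up,
 * then the line length rounded up to 16 bytes.
 */
static uint32_t dumb_pitch(uint32_t width, uint32_t bpp)
{
	return ALIGN_UP(width * DIV_ROUND_UP(bpp, 8u), 16u);
}
```

DIV_ROUND_UP(bpp, 8) yields the same value as (bpp + 7) / 8; the macro only makes the rounding intent explicit.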
>>>
>>>
>>>> + args->size = args->pitch * args->height;
>>>> +
>>>> + ret = hibmc_gem_create(dev, args->size, false,
>>>> + &gobj);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + ret = drm_gem_handle_create(file, gobj, &handle);
>>>> + drm_gem_object_unreference_unlocked(gobj);
>>>> + if (ret)
>>>
>>>
>>> Print error here
>>
>>
>> agreed.
>>
>>
>>>
>>>> + return ret;
>>>> +
>>>> + args->handle = handle;
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static void hibmc_bo_unref(struct hibmc_bo **bo)
>>>> +{
>>>> + struct ttm_buffer_object *tbo;
>>>> +
>>>> + if ((*bo) == NULL)
>>>> + return;
>>>> +
>>>> + tbo = &((*bo)->bo);
>>>> + ttm_bo_unref(&tbo);
>>>> + *bo = NULL;
>>>> +}
>>>> +
>>>> +void hibmc_gem_free_object(struct drm_gem_object *obj)
>>>> +{
>>>> + struct hibmc_bo *hibmcbo = gem_to_hibmc_bo(obj);
>>>> +
>>>> + hibmc_bo_unref(&hibmcbo);
>>>> +}
>>>> +
>>>> +static u64 hibmc_bo_mmap_offset(struct hibmc_bo *bo)
>>>> +{
>>>> + return drm_vma_node_offset_addr(&bo->bo.vma_node);
>>>> +}
>>>> +
>>>> +int hibmc_dumb_mmap_offset(struct drm_file *file, struct drm_device
>>>> *dev,
>>>> + u32 handle, u64 *offset)
>>>> +{
>>>> + struct drm_gem_object *obj;
>>>> + struct hibmc_bo *bo;
>>>> +
>>>> + obj = drm_gem_object_lookup(file, handle);
>>>> + if (!obj)
>>>> + return -ENOENT;
>>>> +
>>>> + bo = gem_to_hibmc_bo(obj);
>>>> + *offset = hibmc_bo_mmap_offset(bo);
>>>> +
>>>> + drm_gem_object_unreference_unlocked(obj);
>>>> + return 0;
>>>> +}
>>
>>
>> Regards,
>> Rongrong.
>>
>>>> --
>>>> 1.9.1
>>>>
>>>>
>>>> _______________________________________________
>>>> linux-arm-kernel mailing list
>>>> linux-arm-kernel at lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
--
Regards, Rongrong
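
The two review comments on hibmc_mm_init() ask for earlier setup steps to be undone when a later step fails. A minimal sketch of the usual goto-unwind shape follows. Every fake_* name and the struct are hypothetical stand-ins invented for this example, not the TTM API, and -12 simply stands in for -ENOMEM.

```c
#include <stdbool.h>

/* Hypothetical device state; the flags let a caller force a step to fail. */
struct fake_dev {
	bool fail_bo_device_init;
	bool fail_init_mm;
	int global_refs;	/* +1 on global init, -1 on release */
	int bo_devices;		/* +1 on bo device init, -1 on release */
	bool mm_inited;
};

static int fake_global_init(struct fake_dev *d) { d->global_refs++; return 0; }
static void fake_global_release(struct fake_dev *d) { d->global_refs--; }

static int fake_bo_device_init(struct fake_dev *d)
{
	if (d->fail_bo_device_init)
		return -12;
	d->bo_devices++;
	return 0;
}

static void fake_bo_device_release(struct fake_dev *d) { d->bo_devices--; }
static int fake_init_mm(struct fake_dev *d) { return d->fail_init_mm ? -12 : 0; }

/* Each failure jumps to a label that undoes only the steps already done. */
static int fake_mm_init(struct fake_dev *d)
{
	int ret;

	ret = fake_global_init(d);
	if (ret)
		return ret;

	ret = fake_bo_device_init(d);
	if (ret)
		goto err_global;

	ret = fake_init_mm(d);
	if (ret)
		goto err_bdev;

	d->mm_inited = true;
	return 0;

err_bdev:
	fake_bo_device_release(d);
err_global:
	fake_global_release(d);
	return ret;
}
```

The labels run in reverse order of setup, so a failure in the last step falls through both release calls while a failure in the second step runs only the global release.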
^ permalink raw reply
* [PATCH v3 2/3] pinctrl: sunxi: Add support for fetching pinconf settings from hardware
From: Chen-Yu Tsai @ 2016-11-11 13:57 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161111135159.zqjqjjn7azg4gkzx@lukather>
On Fri, Nov 11, 2016 at 9:51 PM, Maxime Ripard
<maxime.ripard@free-electrons.com> wrote:
> On Fri, Nov 11, 2016 at 05:50:35PM +0800, Chen-Yu Tsai wrote:
>> The sunxi pinctrl driver only caches whatever pinconf setting was last
>> set on a given pingroup. This is not particularly helpful, nor is it
>> correct.
>>
>> Fix this by actually reading the hardware registers and returning
>> the correct results or error codes. Also filter out unsupported
>> pinconf settings. Since this driver has a peculiar setup of 1 pin
>> per group, we can support both pin and pingroup pinconf setting
>> read back with the same code. The sunxi_pconf_reg helper and code
>> structure is inspired by pinctrl-msm.
>>
>> With this done we can also claim to support generic pinconf, by
>> setting .is_generic = true in pinconf_ops.
>>
>> Also remove the cached config value. The behavior of this was never
>> correct, as it only cached 1 setting instead of all of them. Since
>> we can now read back settings directly from the hardware, it is no
>> longer required.
>>
>> Signed-off-by: Chen-Yu Tsai <wens@csie.org>
>> ---
>> drivers/pinctrl/sunxi/pinctrl-sunxi.c | 86 +++++++++++++++++++++++++++++++++--
>> drivers/pinctrl/sunxi/pinctrl-sunxi.h | 1 -
>> 2 files changed, 81 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/pinctrl/sunxi/pinctrl-sunxi.c b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
>> index e04edda8629d..ed71bff39869 100644
>> --- a/drivers/pinctrl/sunxi/pinctrl-sunxi.c
>> +++ b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
>> @@ -438,15 +438,91 @@ static const struct pinctrl_ops sunxi_pctrl_ops = {
>> .get_group_pins = sunxi_pctrl_get_group_pins,
>> };
>>
>> +static int sunxi_pconf_reg(unsigned pin, enum pin_config_param param,
>
> Sorry, this went unnoticed in your previous version, but checkpatch
> reports:
>
> WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
>
> For this one and the next function.
I know, but the function pointer definition in pinconf.h says
'unsigned', not 'unsigned int'.
Having a mismatch between the two is even more confusing...
ChenYu
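
For reference, the two spellings name the same C type, so the checkpatch warning is purely about style. A small sketch, assuming a C11 compiler; pin_get_fn and the demo_* names are hypothetical and much simpler than the real callback in pinconf.h.

```c
/* Callback type declared with the bare spelling (hypothetical, simplified). */
typedef int (*pin_get_fn)(unsigned pin);

/* Implementation declared with the long spelling. */
static int demo_pin_get(unsigned int pin)
{
	return (int)pin + 1;
}

/* No cast is needed here: both spellings denote the same type. */
static const pin_get_fn demo_ops_get = demo_pin_get;

/* Compile-time check that a bare 'unsigned' expression has type 'unsigned int'. */
_Static_assert(_Generic((unsigned)0, unsigned int: 1, default: 0),
	       "unsigned and unsigned int are the same type");
```

So a driver can use either spelling against the existing prototype; what remains is only whether the visible mismatch reads as confusing.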
>
> Once fixed,
> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
>
> Thanks!
> Maxime
>
> --
> Maxime Ripard, Free Electrons
> Embedded Linux and Kernel engineering
> http://free-electrons.com
^ permalink raw reply
* [PATCH v14 5/9] clocksource/drivers/arm_arch_timer: Simplify ACPI support code.
From: Hanjun Guo @ 2016-11-11 13:55 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161021112120.GC16630@leverpostej>
On 10/21/2016 07:21 PM, Mark Rutland wrote:
> On Fri, Oct 21, 2016 at 12:14:01PM +0100, Mark Rutland wrote:
>> On Thu, Oct 20, 2016 at 05:58:17PM +0100, Mark Rutland wrote:
>>> On Thu, Sep 29, 2016 at 02:17:13AM +0800, fu.wei at linaro.org wrote:
>>>> + arch_timer_ppi[PHYS_NONSECURE_PPI] = acpi_gtdt_map_ppi(PHYS_NONSECURE_PPI);
>>>> + arch_timer_ppi[VIRT_PPI] = acpi_gtdt_map_ppi(VIRT_PPI);
>>>> + arch_timer_ppi[HYP_PPI] = acpi_gtdt_map_ppi(HYP_PPI);
>>>> + /* Always-on capability */
>>>> + arch_timer_c3stop = acpi_gtdt_c3stop();
>>>
>>> ... I think we should check the flag on the relevant interrupt, though
>>> that's worth clarifying.
>>
>> I see I misread the spec; this is part of the common flags.
>>
>> Please ignore this point; sorry for the noise.
>
> Actually, I misread the spec this time around; the flag *can* differ per
> interrupt for the sysreg/cp15 timer, but not for the MMIO timers where
> the flag is in a common field.
>
> So please *do* consider the above.
Sorry, I misread the email as well...
I checked the spec again, and it is a per-interrupt flag.
Thanks
Hanjun
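
A rough sketch of what checking the flag on the relevant interrupt could look like. Everything here is hypothetical: the demo_* names, the struct layout, and the bit position are chosen only for illustration and are not taken from the GTDT definition or from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, picked for this sketch only. */
#define DEMO_TIMER_ALWAYS_ON (1u << 2)

enum demo_ppi {
	DEMO_PHYS_SECURE,
	DEMO_PHYS_NONSECURE,
	DEMO_VIRT,
	DEMO_HYP,
	DEMO_MAX_PPI
};

/* One flags word per interrupt, as for the sysreg/cp15 timer. */
struct demo_timer_desc {
	uint32_t flags[DEMO_MAX_PPI];
};

/*
 * The timer is treated as stopping in deep idle unless the interrupt
 * actually in use carries the always-on flag.
 */
static bool demo_c3stop(const struct demo_timer_desc *desc, enum demo_ppi in_use)
{
	return !(desc->flags[in_use] & DEMO_TIMER_ALWAYS_ON);
}
```

With per-interrupt flags the answer depends on which PPI the driver ends up using, which is why a single lookup at init time may not be enough.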
^ permalink raw reply
* [PATCH v3 2/3] pinctrl: sunxi: Add support for fetching pinconf settings from hardware
From: Maxime Ripard @ 2016-11-11 13:51 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161111095036.11803-3-wens@csie.org>
On Fri, Nov 11, 2016 at 05:50:35PM +0800, Chen-Yu Tsai wrote:
> The sunxi pinctrl driver only caches whatever pinconf setting was last
> set on a given pingroup. This is not particularly helpful, nor is it
> correct.
>
> Fix this by actually reading the hardware registers and returning
> the correct results or error codes. Also filter out unsupported
> pinconf settings. Since this driver has a peculiar setup of 1 pin
> per group, we can support both pin and pingroup pinconf setting
> read back with the same code. The sunxi_pconf_reg helper and code
> structure is inspired by pinctrl-msm.
>
> With this done we can also claim to support generic pinconf, by
> setting .is_generic = true in pinconf_ops.
>
> Also remove the cached config value. The behavior of this was never
> correct, as it only cached 1 setting instead of all of them. Since
> we can now read back settings directly from the hardware, it is no
> longer required.
>
> Signed-off-by: Chen-Yu Tsai <wens@csie.org>
> ---
> drivers/pinctrl/sunxi/pinctrl-sunxi.c | 86 +++++++++++++++++++++++++++++++++--
> drivers/pinctrl/sunxi/pinctrl-sunxi.h | 1 -
> 2 files changed, 81 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pinctrl/sunxi/pinctrl-sunxi.c b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
> index e04edda8629d..ed71bff39869 100644
> --- a/drivers/pinctrl/sunxi/pinctrl-sunxi.c
> +++ b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
> @@ -438,15 +438,91 @@ static const struct pinctrl_ops sunxi_pctrl_ops = {
> .get_group_pins = sunxi_pctrl_get_group_pins,
> };
>
> +static int sunxi_pconf_reg(unsigned pin, enum pin_config_param param,
Sorry, this went unnoticed in your previous version, but checkpatch
reports:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
For this one and the next function.
Once fixed,
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Thanks!
Maxime
--
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
^ permalink raw reply
* [PATCH] crypto: arm64/sha2: integrate OpenSSL implementations of SHA256/SHA512
From: Ard Biesheuvel @ 2016-11-11 13:51 UTC (permalink / raw)
To: linux-arm-kernel
This integrates both the accelerated scalar and the NEON implementations
of SHA-224/256 as well as SHA-384/512 from the OpenSSL project.
Relative performance compared to the respective generic C versions:
            | SHA256-scalar | SHA256-NEON* | SHA512 |
------------+---------------+--------------+--------+
Cortex-A53  | 1.63x         | 1.63x        | 2.34x  |
Cortex-A57  | 1.43x         | 1.59x        | 1.95x  |
Cortex-A73  | 1.26x         | 1.56x        | ?      |
The core crypto code was authored by Andy Polyakov of the OpenSSL
project, in collaboration with whom the upstream code was adapted so
that this module can be built from the same version of sha512-armv8.pl.
The version in this patch was taken from OpenSSL commit
866e505e0d66 sha/asm/sha512-armv8.pl: add NEON version of SHA256.
* The core SHA algorithm is fundamentally sequential, but there is a
secondary transformation involved, called the schedule update, which
can be performed independently. The NEON version of SHA-224/SHA-256
only implements this part of the algorithm using NEON instructions,
the sequential part is always done using scalar instructions.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
This supersedes the SHA-256-NEON-only patch I sent out about 6 weeks ago.
Will, Catalin: note that this pulls in a .pl script, and adds a build rule
locally in arch/arm64/crypto to generate .S files on the fly from Perl
scripts. I will leave it to you to decide whether you are ok with this as
is, or whether you prefer .S_shipped files, in which case the Perl script
is only included as a reference (this is how we did it for arch/arm in the
past, but given that it adds about 3000 lines of generated code to the patch,
I think we may want to simply keep it as below)
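
To illustrate the footnote in the commit message: the SHA-256 schedule update depends only on the message block, not on the working variables a..h, which is why it can be computed separately from the sequential rounds. A minimal scalar C sketch of that expansion, using the standard SHA-256 rotation constants (the same values as @sigma0 and @sigma1 in the Perl script); the function names are made up for this example and this is not the code the script generates.

```c
#include <stdint.h>

/* Rotate right; n is always in 1..31 for the constants used below. */
static uint32_t ror32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

/* Small sigma functions of SHA-256: two rotates and one shift each. */
static uint32_t sigma0(uint32_t x)
{
	return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3);
}

static uint32_t sigma1(uint32_t x)
{
	return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10);
}

/*
 * Expand the 16 big-endian message words in w[0..15] to the 64 words
 * consumed by the rounds. No hash state is read or written here.
 */
static void sha256_schedule(uint32_t w[64])
{
	for (int t = 16; t < 64; t++)
		w[t] = sigma1(w[t - 2]) + w[t - 7] + sigma0(w[t - 15]) + w[t - 16];
}
```

Each w[t] depends on earlier schedule words only, so several of them can be produced with vector instructions while the rounds themselves stay scalar.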
arch/arm64/crypto/Kconfig | 8 +
arch/arm64/crypto/Makefile | 15 +
arch/arm64/crypto/sha256-glue.c | 185 +++++
arch/arm64/crypto/sha512-armv8.pl | 778 ++++++++++++++++++++
arch/arm64/crypto/sha512-glue.c | 94 +++
5 files changed, 1080 insertions(+)
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 2cf32e9887e1..5f4a617e2957 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -8,6 +8,14 @@ menuconfig ARM64_CRYPTO
if ARM64_CRYPTO
+config CRYPTO_SHA256_ARM64
+ tristate "SHA-224/SHA-256 digest algorithm for arm64"
+ select CRYPTO_HASH
+
+config CRYPTO_SHA512_ARM64
+ tristate "SHA-384/SHA-512 digest algorithm for arm64"
+ select CRYPTO_HASH
+
config CRYPTO_SHA1_ARM64_CE
tristate "SHA-1 digest algorithm (ARMv8 Crypto Extensions)"
depends on ARM64 && KERNEL_MODE_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index abb79b3cfcfe..861589faf6ef 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -29,6 +29,12 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
aes-neon-blk-y := aes-glue-neon.o aes-neon.o
+obj-$(CONFIG_CRYPTO_SHA256_ARM64) += sha256-arm64.o
+sha256-arm64-y := sha256-glue.o sha256-core.o
+
+obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
+sha512-arm64-y := sha512-glue.o sha512-core.o
+
AFLAGS_aes-ce.o := -DINTERLEAVE=4
AFLAGS_aes-neon.o := -DINTERLEAVE=4
@@ -40,3 +46,12 @@ CFLAGS_crc32-arm64.o := -mcpu=generic+crc
$(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
$(call if_changed_rule,cc_o_c)
+
+quiet_cmd_perl = PERLASM $@
+ cmd_perl = $(PERL) $(<) void $(@)
+
+$(obj)/sha256-core.S: $(src)/sha512-armv8.pl
+ $(call cmd,perl)
+
+$(obj)/sha512-core.S: $(src)/sha512-armv8.pl
+ $(call cmd,perl)
diff --git a/arch/arm64/crypto/sha256-glue.c b/arch/arm64/crypto/sha256-glue.c
new file mode 100644
index 000000000000..a2226f841960
--- /dev/null
+++ b/arch/arm64/crypto/sha256-glue.c
@@ -0,0 +1,185 @@
+/*
+ * Linux/arm64 port of the OpenSSL SHA256 implementation for AArch64
+ *
+ * Copyright (c) 2016 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <crypto/sha256_base.h>
+#include <linux/cryptohash.h>
+#include <linux/types.h>
+#include <linux/string.h>
+
+MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash for arm64");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("sha224");
+MODULE_ALIAS_CRYPTO("sha256");
+
+asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
+ unsigned int num_blks);
+
+asmlinkage void sha256_block_neon(u32 *digest, const void *data,
+ unsigned int num_blks);
+
+static int sha256_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ return sha256_base_do_update(desc, data, len,
+ (sha256_block_fn *)sha256_block_data_order);
+}
+
+static int sha256_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ if (len)
+ sha256_base_do_update(desc, data, len,
+ (sha256_block_fn *)sha256_block_data_order);
+ sha256_base_do_finalize(desc,
+ (sha256_block_fn *)sha256_block_data_order);
+
+ return sha256_base_finish(desc, out);
+}
+
+static int sha256_final(struct shash_desc *desc, u8 *out)
+{
+ return sha256_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg algs[] = { {
+ .digestsize = SHA256_DIGEST_SIZE,
+ .init = sha256_base_init,
+ .update = sha256_update,
+ .final = sha256_final,
+ .finup = sha256_finup,
+ .descsize = sizeof(struct sha256_state),
+ .base.cra_name = "sha256",
+ .base.cra_driver_name = "sha256-arm64",
+ .base.cra_priority = 100,
+ .base.cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .base.cra_blocksize = SHA256_BLOCK_SIZE,
+ .base.cra_module = THIS_MODULE,
+}, {
+ .digestsize = SHA224_DIGEST_SIZE,
+ .init = sha224_base_init,
+ .update = sha256_update,
+ .final = sha256_final,
+ .finup = sha256_finup,
+ .descsize = sizeof(struct sha256_state),
+ .base.cra_name = "sha224",
+ .base.cra_driver_name = "sha224-arm64",
+ .base.cra_priority = 100,
+ .base.cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .base.cra_blocksize = SHA224_BLOCK_SIZE,
+ .base.cra_module = THIS_MODULE,
+} };
+
+static int sha256_update_neon(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ /*
+ * Stacking and unstacking a substantial slice of the NEON register
+ * file may significantly affect performance for small updates when
+ * executing in interrupt context, so fall back to the scalar code
+ * in that case.
+ */
+ if (!may_use_simd())
+ return sha256_base_do_update(desc, data, len,
+ (sha256_block_fn *)sha256_block_data_order);
+
+ kernel_neon_begin();
+ sha256_base_do_update(desc, data, len,
+ (sha256_block_fn *)sha256_block_neon);
+ kernel_neon_end();
+
+ return 0;
+}
+
+static int sha256_finup_neon(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ if (!may_use_simd()) {
+ if (len)
+ sha256_base_do_update(desc, data, len,
+ (sha256_block_fn *)sha256_block_data_order);
+ sha256_base_do_finalize(desc,
+ (sha256_block_fn *)sha256_block_data_order);
+ } else {
+ kernel_neon_begin();
+ if (len)
+ sha256_base_do_update(desc, data, len,
+ (sha256_block_fn *)sha256_block_neon);
+ sha256_base_do_finalize(desc,
+ (sha256_block_fn *)sha256_block_neon);
+ kernel_neon_end();
+ }
+ return sha256_base_finish(desc, out);
+}
+
+static int sha256_final_neon(struct shash_desc *desc, u8 *out)
+{
+ return sha256_finup_neon(desc, NULL, 0, out);
+}
+
+static struct shash_alg neon_algs[] = { {
+ .digestsize = SHA256_DIGEST_SIZE,
+ .init = sha256_base_init,
+ .update = sha256_update_neon,
+ .final = sha256_final_neon,
+ .finup = sha256_finup_neon,
+ .descsize = sizeof(struct sha256_state),
+ .base.cra_name = "sha256",
+ .base.cra_driver_name = "sha256-arm64-neon",
+ .base.cra_priority = 150,
+ .base.cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .base.cra_blocksize = SHA256_BLOCK_SIZE,
+ .base.cra_module = THIS_MODULE,
+}, {
+ .digestsize = SHA224_DIGEST_SIZE,
+ .init = sha224_base_init,
+ .update = sha256_update_neon,
+ .final = sha256_final_neon,
+ .finup = sha256_finup_neon,
+ .descsize = sizeof(struct sha256_state),
+ .base.cra_name = "sha224",
+ .base.cra_driver_name = "sha224-arm64-neon",
+ .base.cra_priority = 150,
+ .base.cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .base.cra_blocksize = SHA224_BLOCK_SIZE,
+ .base.cra_module = THIS_MODULE,
+} };
+
+static int __init sha256_mod_init(void)
+{
+ int ret = crypto_register_shashes(algs, ARRAY_SIZE(algs));
+ if (ret)
+ return ret;
+
+ if (elf_hwcap & HWCAP_ASIMD) {
+ ret = crypto_register_shashes(neon_algs, ARRAY_SIZE(neon_algs));
+ if (ret)
+ crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
+ }
+ return ret;
+}
+
+static void __exit sha256_mod_fini(void)
+{
+ if (elf_hwcap & HWCAP_ASIMD)
+ crypto_unregister_shashes(neon_algs, ARRAY_SIZE(neon_algs));
+ crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
+}
+
+module_init(sha256_mod_init);
+module_exit(sha256_mod_fini);
diff --git a/arch/arm64/crypto/sha512-armv8.pl b/arch/arm64/crypto/sha512-armv8.pl
new file mode 100644
index 000000000000..ffae5f23bcd8
--- /dev/null
+++ b/arch/arm64/crypto/sha512-armv8.pl
@@ -0,0 +1,778 @@
+#! /usr/bin/env perl
+# Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the OpenSSL license (the "License"). You may not use
+# this file except in compliance with the License. You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+
+# ====================================================================
+# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
+# project. The module is, however, dual licensed under OpenSSL and
+# CRYPTOGAMS licenses depending on where you obtain it. For further
+# details see http://www.openssl.org/~appro/cryptogams/.
+#
+# Permission to use under GPLv2 terms is granted.
+# ====================================================================
+#
+# SHA256/512 for ARMv8.
+#
+# Performance in cycles per processed byte and improvement coefficient
+# over code generated with "default" compiler:
+#
+# SHA256-hw SHA256(*) SHA512
+# Apple A7 1.97 10.5 (+33%) 6.73 (-1%(**))
+# Cortex-A53 2.38 15.5 (+115%) 10.0 (+150%(***))
+# Cortex-A57 2.31 11.6 (+86%) 7.51 (+260%(***))
+# Denver 2.01 10.5 (+26%) 6.70 (+8%)
+# X-Gene 20.0 (+100%) 12.8 (+300%(***))
+# Mongoose 2.36 13.0 (+50%) 8.36 (+33%)
+#
+# (*) Software SHA256 results are of lesser relevance, presented
+# mostly for informational purposes.
+# (**) The result is a trade-off: it's possible to improve it by
+# 10% (or by 1 cycle per round), but at the cost of 20% loss
+# on Cortex-A53 (or by 4 cycles per round).
+# (***) Super-impressive coefficients over gcc-generated code are
+# indication of some compiler "pathology", most notably code
+# generated with -mgeneral-regs-only is significantly faster
+# and the gap is only 40-90%.
+#
+# October 2016.
+#
+# Originally it was reckoned that it makes no sense to implement NEON
+# version of SHA256 for 64-bit processors. This is because performance
+# improvement on most wide-spread Cortex-A5x processors was observed
+# to be marginal, same on Cortex-A53 and ~10% on A57. But then it was
+# observed that 32-bit NEON SHA256 performs significantly better than
+# 64-bit scalar version on *some* of the more recent processors. As
+# result 64-bit NEON version of SHA256 was added to provide best
+# all-round performance. For example it executes ~30% faster on X-Gene
+# and Mongoose. [For reference, NEON version of SHA512 is bound to
+# deliver much less improvement, likely *negative* on Cortex-A5x.
+# Which is why NEON support is limited to SHA256.]
+
+$output=pop;
+$flavour=pop;
+
+if ($flavour && $flavour ne "void") {
+ $0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
+ ( $xlate="${dir}arm-xlate.pl" and -f $xlate ) or
+ ( $xlate="${dir}../../perlasm/arm-xlate.pl" and -f $xlate) or
+ die "can't locate arm-xlate.pl";
+
+ open OUT,"| \"$^X\" $xlate $flavour $output";
+ *STDOUT=*OUT;
+} else {
+ open STDOUT,">$output";
+}
+
+if ($output =~ /512/) {
+ $BITS=512;
+ $SZ=8;
+ @Sigma0=(28,34,39);
+ @Sigma1=(14,18,41);
+ @sigma0=(1, 8, 7);
+ @sigma1=(19,61, 6);
+ $rounds=80;
+ $reg_t="x";
+} else {
+ $BITS=256;
+ $SZ=4;
+ @Sigma0=( 2,13,22);
+ @Sigma1=( 6,11,25);
+ @sigma0=( 7,18, 3);
+ @sigma1=(17,19,10);
+ $rounds=64;
+ $reg_t="w";
+}
+
+$func="sha${BITS}_block_data_order";
+
+($ctx,$inp,$num,$Ktbl)=map("x$_",(0..2,30));
+
+@X=map("$reg_t$_",(3..15,0..2));
+@V=($A,$B,$C,$D,$E,$F,$G,$H)=map("$reg_t$_",(20..27));
+($t0,$t1,$t2,$t3)=map("$reg_t$_",(16,17,19,28));
+
+sub BODY_00_xx {
+my ($i,$a,$b,$c,$d,$e,$f,$g,$h)=@_;
+my $j=($i+1)&15;
+my ($T0,$T1,$T2)=(@X[($i-8)&15],@X[($i-9)&15],@X[($i-10)&15]);
+ $T0=@X[$i+3] if ($i<11);
+
+$code.=<<___ if ($i<16);
+#ifndef __ARMEB__
+ rev @X[$i],@X[$i] // $i
+#endif
+___
+$code.=<<___ if ($i<13 && ($i&1));
+ ldp @X[$i+1],@X[$i+2],[$inp],#2*$SZ
+___
+$code.=<<___ if ($i==13);
+ ldp @X[14],@X[15],[$inp]
+___
+$code.=<<___ if ($i>=14);
+ ldr @X[($i-11)&15],[sp,#`$SZ*(($i-11)%4)`]
+___
+$code.=<<___ if ($i>0 && $i<16);
+ add $a,$a,$t1 // h+=Sigma0(a)
+___
+$code.=<<___ if ($i>=11);
+ str @X[($i-8)&15],[sp,#`$SZ*(($i-8)%4)`]
+___
+# While ARMv8 specifies merged rotate-n-logical operation such as
+# 'eor x,y,z,ror#n', it was found to negatively affect performance
+# on Apple A7. The reason seems to be that it requires even 'y' to
+# be available earlier. This means that such merged instruction is
+# not necessarily best choice on critical path... On the other hand
+# Cortex-A5x handles merged instructions much better than disjoint
+# rotate and logical... See (**) footnote above.
+$code.=<<___ if ($i<15);
+ ror $t0,$e,#$Sigma1[0]
+ add $h,$h,$t2 // h+=K[i]
+ eor $T0,$e,$e,ror#`$Sigma1[2]-$Sigma1[1]`
+ and $t1,$f,$e
+ bic $t2,$g,$e
+ add $h,$h,@X[$i&15] // h+=X[i]
+ orr $t1,$t1,$t2 // Ch(e,f,g)
+ eor $t2,$a,$b // a^b, b^c in next round
+ eor $t0,$t0,$T0,ror#$Sigma1[1] // Sigma1(e)
+ ror $T0,$a,#$Sigma0[0]
+ add $h,$h,$t1 // h+=Ch(e,f,g)
+ eor $t1,$a,$a,ror#`$Sigma0[2]-$Sigma0[1]`
+ add $h,$h,$t0 // h+=Sigma1(e)
+ and $t3,$t3,$t2 // (b^c)&=(a^b)
+ add $d,$d,$h // d+=h
+ eor $t3,$t3,$b // Maj(a,b,c)
+ eor $t1,$T0,$t1,ror#$Sigma0[1] // Sigma0(a)
+ add $h,$h,$t3 // h+=Maj(a,b,c)
+ ldr $t3,[$Ktbl],#$SZ // *K++, $t2 in next round
+ //add $h,$h,$t1 // h+=Sigma0(a)
+___
+$code.=<<___ if ($i>=15);
+ ror $t0,$e,#$Sigma1[0]
+ add $h,$h,$t2 // h+=K[i]
+ ror $T1,@X[($j+1)&15],#$sigma0[0]
+ and $t1,$f,$e
+ ror $T2,@X[($j+14)&15],#$sigma1[0]
+ bic $t2,$g,$e
+ ror $T0,$a,#$Sigma0[0]
+ add $h,$h,@X[$i&15] // h+=X[i]
+ eor $t0,$t0,$e,ror#$Sigma1[1]
+ eor $T1,$T1,@X[($j+1)&15],ror#$sigma0[1]
+ orr $t1,$t1,$t2 // Ch(e,f,g)
+ eor $t2,$a,$b // a^b, b^c in next round
+ eor $t0,$t0,$e,ror#$Sigma1[2] // Sigma1(e)
+ eor $T0,$T0,$a,ror#$Sigma0[1]
+ add $h,$h,$t1 // h+=Ch(e,f,g)
+ and $t3,$t3,$t2 // (b^c)&=(a^b)
+ eor $T2,$T2,@X[($j+14)&15],ror#$sigma1[1]
+ eor $T1,$T1,@X[($j+1)&15],lsr#$sigma0[2] // sigma0(X[i+1])
+ add $h,$h,$t0 // h+=Sigma1(e)
+ eor $t3,$t3,$b // Maj(a,b,c)
+ eor $t1,$T0,$a,ror#$Sigma0[2] // Sigma0(a)
+ eor $T2,$T2,@X[($j+14)&15],lsr#$sigma1[2] // sigma1(X[i+14])
+ add @X[$j],@X[$j],@X[($j+9)&15]
+ add $d,$d,$h // d+=h
+ add $h,$h,$t3 // h+=Maj(a,b,c)
+ ldr $t3,[$Ktbl],#$SZ // *K++, $t2 in next round
+ add @X[$j],@X[$j],$T1
+ add $h,$h,$t1 // h+=Sigma0(a)
+ add @X[$j],@X[$j],$T2
+___
+ ($t2,$t3)=($t3,$t2);
+}
+
+$code.=<<___;
+#ifndef __KERNEL__
+# include "arm_arch.h"
+#endif
+
+.text
+
+.extern OPENSSL_armcap_P
+.globl $func
+.type $func,%function
+.align 6
+$func:
+___
+$code.=<<___ if ($SZ==4);
+#ifndef __KERNEL__
+# ifdef __ILP32__
+ ldrsw x16,.LOPENSSL_armcap_P
+# else
+ ldr x16,.LOPENSSL_armcap_P
+# endif
+ adr x17,.LOPENSSL_armcap_P
+ add x16,x16,x17
+ ldr w16,[x16]
+ tst w16,#ARMV8_SHA256
+ b.ne .Lv8_entry
+ tst w16,#ARMV7_NEON
+ b.ne .Lneon_entry
+#endif
+___
+$code.=<<___;
+ stp x29,x30,[sp,#-128]!
+ add x29,sp,#0
+
+ stp x19,x20,[sp,#16]
+ stp x21,x22,[sp,#32]
+ stp x23,x24,[sp,#48]
+ stp x25,x26,[sp,#64]
+ stp x27,x28,[sp,#80]
+ sub sp,sp,#4*$SZ
+
+ ldp $A,$B,[$ctx] // load context
+ ldp $C,$D,[$ctx,#2*$SZ]
+ ldp $E,$F,[$ctx,#4*$SZ]
+ add $num,$inp,$num,lsl#`log(16*$SZ)/log(2)` // end of input
+ ldp $G,$H,[$ctx,#6*$SZ]
+ adr $Ktbl,.LK$BITS
+ stp $ctx,$num,[x29,#96]
+
+.Loop:
+ ldp @X[0],@X[1],[$inp],#2*$SZ
+ ldr $t2,[$Ktbl],#$SZ // *K++
+ eor $t3,$B,$C // magic seed
+ str $inp,[x29,#112]
+___
+for ($i=0;$i<16;$i++) { &BODY_00_xx($i,@V); unshift(@V,pop(@V)); }
+$code.=".Loop_16_xx:\n";
+for (;$i<32;$i++) { &BODY_00_xx($i,@V); unshift(@V,pop(@V)); }
+$code.=<<___;
+ cbnz $t2,.Loop_16_xx
+
+ ldp $ctx,$num,[x29,#96]
+ ldr $inp,[x29,#112]
+ sub $Ktbl,$Ktbl,#`$SZ*($rounds+1)` // rewind
+
+ ldp @X[0],@X[1],[$ctx]
+ ldp @X[2],@X[3],[$ctx,#2*$SZ]
+ add $inp,$inp,#14*$SZ // advance input pointer
+ ldp @X[4],@X[5],[$ctx,#4*$SZ]
+ add $A,$A,@X[0]
+ ldp @X[6],@X[7],[$ctx,#6*$SZ]
+ add $B,$B,@X[1]
+ add $C,$C,@X[2]
+ add $D,$D,@X[3]
+ stp $A,$B,[$ctx]
+ add $E,$E,@X[4]
+ add $F,$F,@X[5]
+ stp $C,$D,[$ctx,#2*$SZ]
+ add $G,$G,@X[6]
+ add $H,$H,@X[7]
+ cmp $inp,$num
+ stp $E,$F,[$ctx,#4*$SZ]
+ stp $G,$H,[$ctx,#6*$SZ]
+ b.ne .Loop
+
+ ldp x19,x20,[x29,#16]
+ add sp,sp,#4*$SZ
+ ldp x21,x22,[x29,#32]
+ ldp x23,x24,[x29,#48]
+ ldp x25,x26,[x29,#64]
+ ldp x27,x28,[x29,#80]
+ ldp x29,x30,[sp],#128
+ ret
+.size $func,.-$func
+
+.align 6
+.type .LK$BITS,%object
+.LK$BITS:
+___
+$code.=<<___ if ($SZ==8);
+ .quad 0x428a2f98d728ae22,0x7137449123ef65cd
+ .quad 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc
+ .quad 0x3956c25bf348b538,0x59f111f1b605d019
+ .quad 0x923f82a4af194f9b,0xab1c5ed5da6d8118
+ .quad 0xd807aa98a3030242,0x12835b0145706fbe
+ .quad 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2
+ .quad 0x72be5d74f27b896f,0x80deb1fe3b1696b1
+ .quad 0x9bdc06a725c71235,0xc19bf174cf692694
+ .quad 0xe49b69c19ef14ad2,0xefbe4786384f25e3
+ .quad 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65
+ .quad 0x2de92c6f592b0275,0x4a7484aa6ea6e483
+ .quad 0x5cb0a9dcbd41fbd4,0x76f988da831153b5
+ .quad 0x983e5152ee66dfab,0xa831c66d2db43210
+ .quad 0xb00327c898fb213f,0xbf597fc7beef0ee4
+ .quad 0xc6e00bf33da88fc2,0xd5a79147930aa725
+ .quad 0x06ca6351e003826f,0x142929670a0e6e70
+ .quad 0x27b70a8546d22ffc,0x2e1b21385c26c926
+ .quad 0x4d2c6dfc5ac42aed,0x53380d139d95b3df
+ .quad 0x650a73548baf63de,0x766a0abb3c77b2a8
+ .quad 0x81c2c92e47edaee6,0x92722c851482353b
+ .quad 0xa2bfe8a14cf10364,0xa81a664bbc423001
+ .quad 0xc24b8b70d0f89791,0xc76c51a30654be30
+ .quad 0xd192e819d6ef5218,0xd69906245565a910
+ .quad 0xf40e35855771202a,0x106aa07032bbd1b8
+ .quad 0x19a4c116b8d2d0c8,0x1e376c085141ab53
+ .quad 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8
+ .quad 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb
+ .quad 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3
+ .quad 0x748f82ee5defb2fc,0x78a5636f43172f60
+ .quad 0x84c87814a1f0ab72,0x8cc702081a6439ec
+ .quad 0x90befffa23631e28,0xa4506cebde82bde9
+ .quad 0xbef9a3f7b2c67915,0xc67178f2e372532b
+ .quad 0xca273eceea26619c,0xd186b8c721c0c207
+ .quad 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178
+ .quad 0x06f067aa72176fba,0x0a637dc5a2c898a6
+ .quad 0x113f9804bef90dae,0x1b710b35131c471b
+ .quad 0x28db77f523047d84,0x32caab7b40c72493
+ .quad 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c
+ .quad 0x4cc5d4becb3e42b6,0x597f299cfc657e2a
+ .quad 0x5fcb6fab3ad6faec,0x6c44198c4a475817
+ .quad 0 // terminator
+___
+$code.=<<___ if ($SZ==4);
+ .long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ .long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ .long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ .long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ .long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ .long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ .long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ .long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ .long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ .long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ .long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ .long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ .long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ .long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ .long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ .long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+ .long 0 //terminator
+___
+$code.=<<___;
+.size .LK$BITS,.-.LK$BITS
+#ifndef __KERNEL__
+.align 3
+.LOPENSSL_armcap_P:
+# ifdef __ILP32__
+ .long OPENSSL_armcap_P-.
+# else
+ .quad OPENSSL_armcap_P-.
+# endif
+#endif
+.asciz "SHA$BITS block transform for ARMv8, CRYPTOGAMS by <appro\@openssl.org>"
+.align 2
+___
+
+if ($SZ==4) {
+my $Ktbl="x3";
+
+my ($ABCD,$EFGH,$abcd)=map("v$_.16b",(0..2));
+my @MSG=map("v$_.16b",(4..7));
+my ($W0,$W1)=("v16.4s","v17.4s");
+my ($ABCD_SAVE,$EFGH_SAVE)=("v18.16b","v19.16b");
+
+$code.=<<___;
+#ifndef __KERNEL__
+.type sha256_block_armv8,%function
+.align 6
+sha256_block_armv8:
+.Lv8_entry:
+ stp x29,x30,[sp,#-16]!
+ add x29,sp,#0
+
+ ld1.32 {$ABCD,$EFGH},[$ctx]
+ adr $Ktbl,.LK256
+
+.Loop_hw:
+ ld1 {@MSG[0]-@MSG[3]},[$inp],#64
+ sub $num,$num,#1
+ ld1.32 {$W0},[$Ktbl],#16
+ rev32 @MSG[0],@MSG[0]
+ rev32 @MSG[1],@MSG[1]
+ rev32 @MSG[2],@MSG[2]
+ rev32 @MSG[3],@MSG[3]
+ orr $ABCD_SAVE,$ABCD,$ABCD // offload
+ orr $EFGH_SAVE,$EFGH,$EFGH
+___
+for($i=0;$i<12;$i++) {
+$code.=<<___;
+ ld1.32 {$W1},[$Ktbl],#16
+ add.i32 $W0,$W0,@MSG[0]
+ sha256su0 @MSG[0],@MSG[1]
+ orr $abcd,$ABCD,$ABCD
+ sha256h $ABCD,$EFGH,$W0
+ sha256h2 $EFGH,$abcd,$W0
+ sha256su1 @MSG[0],@MSG[2],@MSG[3]
+___
+ ($W0,$W1)=($W1,$W0); push(@MSG,shift(@MSG));
+}
+$code.=<<___;
+ ld1.32 {$W1},[$Ktbl],#16
+ add.i32 $W0,$W0,@MSG[0]
+ orr $abcd,$ABCD,$ABCD
+ sha256h $ABCD,$EFGH,$W0
+ sha256h2 $EFGH,$abcd,$W0
+
+ ld1.32 {$W0},[$Ktbl],#16
+ add.i32 $W1,$W1,@MSG[1]
+ orr $abcd,$ABCD,$ABCD
+ sha256h $ABCD,$EFGH,$W1
+ sha256h2 $EFGH,$abcd,$W1
+
+ ld1.32 {$W1},[$Ktbl]
+ add.i32 $W0,$W0,@MSG[2]
+ sub $Ktbl,$Ktbl,#$rounds*$SZ-16 // rewind
+ orr $abcd,$ABCD,$ABCD
+ sha256h $ABCD,$EFGH,$W0
+ sha256h2 $EFGH,$abcd,$W0
+
+ add.i32 $W1,$W1,@MSG[3]
+ orr $abcd,$ABCD,$ABCD
+ sha256h $ABCD,$EFGH,$W1
+ sha256h2 $EFGH,$abcd,$W1
+
+ add.i32 $ABCD,$ABCD,$ABCD_SAVE
+ add.i32 $EFGH,$EFGH,$EFGH_SAVE
+
+ cbnz $num,.Loop_hw
+
+ st1.32 {$ABCD,$EFGH},[$ctx]
+
+ ldr x29,[sp],#16
+ ret
+.size sha256_block_armv8,.-sha256_block_armv8
+#endif
+___
+}
+
+if ($SZ==4) { ######################################### NEON stuff #
+# You'll surely note a lot of similarities with sha256-armv4 module,
+# and of course it's not a coincidence. sha256-armv4 was used as
+# initial template, but was adapted for ARMv8 instruction set and
+# extensively re-tuned for all-round performance.
+
+my @V = ($A,$B,$C,$D,$E,$F,$G,$H) = map("w$_",(3..10));
+my ($t0,$t1,$t2,$t3,$t4) = map("w$_",(11..15));
+my $Ktbl="x16";
+my $Xfer="x17";
+my @X = map("q$_",(0..3));
+my ($T0,$T1,$T2,$T3,$T4,$T5,$T6,$T7) = map("q$_",(4..7,16..19));
+my $j=0;
+
+sub AUTOLOAD() # thunk [simplified] x86-style perlasm
+{ my $opcode = $AUTOLOAD; $opcode =~ s/.*:://; $opcode =~ s/_/\./;
+ my $arg = pop;
+ $arg = "#$arg" if ($arg*1 eq $arg);
+ $code .= "\t$opcode\t".join(',',@_,$arg)."\n";
+}
+
+sub Dscalar { shift =~ m|[qv]([0-9]+)|?"d$1":""; }
+sub Dlo { shift =~ m|[qv]([0-9]+)|?"v$1.d[0]":""; }
+sub Dhi { shift =~ m|[qv]([0-9]+)|?"v$1.d[1]":""; }
+
+sub Xupdate()
+{ use integer;
+ my $body = shift;
+ my @insns = (&$body,&$body,&$body,&$body);
+ my ($a,$b,$c,$d,$e,$f,$g,$h);
+
+ &ext_8 ($T0,@X[0],@X[1],4); # X[1..4]
+ eval(shift(@insns));
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &ext_8 ($T3,@X[2],@X[3],4); # X[9..12]
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &mov (&Dscalar($T7),&Dhi(@X[3])); # X[14..15]
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &ushr_32 ($T2,$T0,$sigma0[0]);
+ eval(shift(@insns));
+ &ushr_32 ($T1,$T0,$sigma0[2]);
+ eval(shift(@insns));
+ &add_32 (@X[0],@X[0],$T3); # X[0..3] += X[9..12]
+ eval(shift(@insns));
+ &sli_32 ($T2,$T0,32-$sigma0[0]);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &ushr_32 ($T3,$T0,$sigma0[1]);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &eor_8 ($T1,$T1,$T2);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &sli_32 ($T3,$T0,32-$sigma0[1]);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &ushr_32 ($T4,$T7,$sigma1[0]);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &eor_8 ($T1,$T1,$T3); # sigma0(X[1..4])
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &sli_32 ($T4,$T7,32-$sigma1[0]);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &ushr_32 ($T5,$T7,$sigma1[2]);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &ushr_32 ($T3,$T7,$sigma1[1]);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &add_32 (@X[0],@X[0],$T1); # X[0..3] += sigma0(X[1..4])
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &sli_u32 ($T3,$T7,32-$sigma1[1]);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &eor_8 ($T5,$T5,$T4);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &eor_8 ($T5,$T5,$T3); # sigma1(X[14..15])
+ eval(shift(@insns));
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &add_32 (@X[0],@X[0],$T5); # X[0..1] += sigma1(X[14..15])
+ eval(shift(@insns));
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &ushr_32 ($T6,@X[0],$sigma1[0]);
+ eval(shift(@insns));
+ &ushr_32 ($T7,@X[0],$sigma1[2]);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &sli_32 ($T6,@X[0],32-$sigma1[0]);
+ eval(shift(@insns));
+ &ushr_32 ($T5,@X[0],$sigma1[1]);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &eor_8 ($T7,$T7,$T6);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &sli_32 ($T5,@X[0],32-$sigma1[1]);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &ld1_32 ("{$T0}","[$Ktbl], #16");
+ eval(shift(@insns));
+ &eor_8 ($T7,$T7,$T5); # sigma1(X[16..17])
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &eor_8 ($T5,$T5,$T5);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &mov (&Dhi($T5), &Dlo($T7));
+ eval(shift(@insns));
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &add_32 (@X[0],@X[0],$T5); # X[2..3] += sigma1(X[16..17])
+ eval(shift(@insns));
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &add_32 ($T0,$T0,@X[0]);
+ while($#insns>=1) { eval(shift(@insns)); }
+ &st1_32 ("{$T0}","[$Xfer], #16");
+ eval(shift(@insns));
+
+ push(@X,shift(@X)); # "rotate" X[]
+}
+
+sub Xpreload()
+{ use integer;
+ my $body = shift;
+ my @insns = (&$body,&$body,&$body,&$body);
+ my ($a,$b,$c,$d,$e,$f,$g,$h);
+
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &ld1_8 ("{@X[0]}","[$inp],#16");
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &ld1_32 ("{$T0}","[$Ktbl],#16");
+ eval(shift(@insns));
+ eval(shift(@insns));
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &rev32 (@X[0],@X[0]);
+ eval(shift(@insns));
+ eval(shift(@insns));
+ eval(shift(@insns));
+ eval(shift(@insns));
+ &add_32 ($T0,$T0,@X[0]);
+ foreach (@insns) { eval; } # remaining instructions
+ &st1_32 ("{$T0}","[$Xfer], #16");
+
+ push(@X,shift(@X)); # "rotate" X[]
+}
+
+sub body_00_15 () {
+ (
+ '($a,$b,$c,$d,$e,$f,$g,$h)=@V;'.
+ '&add ($h,$h,$t1)', # h+=X[i]+K[i]
+ '&add ($a,$a,$t4);'. # h+=Sigma0(a) from the past
+ '&and ($t1,$f,$e)',
+ '&bic ($t4,$g,$e)',
+ '&eor ($t0,$e,$e,"ror#".($Sigma1[1]-$Sigma1[0]))',
+ '&add ($a,$a,$t2)', # h+=Maj(a,b,c) from the past
+ '&orr ($t1,$t1,$t4)', # Ch(e,f,g)
+ '&eor ($t0,$t0,$e,"ror#".($Sigma1[2]-$Sigma1[0]))', # Sigma1(e)
+ '&eor ($t4,$a,$a,"ror#".($Sigma0[1]-$Sigma0[0]))',
+ '&add ($h,$h,$t1)', # h+=Ch(e,f,g)
+ '&ror ($t0,$t0,"#$Sigma1[0]")',
+ '&eor ($t2,$a,$b)', # a^b, b^c in next round
+ '&eor ($t4,$t4,$a,"ror#".($Sigma0[2]-$Sigma0[0]))', # Sigma0(a)
+ '&add ($h,$h,$t0)', # h+=Sigma1(e)
+ '&ldr ($t1,sprintf "[sp,#%d]",4*(($j+1)&15)) if (($j&15)!=15);'.
+ '&ldr ($t1,"[$Ktbl]") if ($j==15);'.
+ '&and ($t3,$t3,$t2)', # (b^c)&=(a^b)
+ '&ror ($t4,$t4,"#$Sigma0[0]")',
+ '&add ($d,$d,$h)', # d+=h
+ '&eor ($t3,$t3,$b)', # Maj(a,b,c)
+ '$j++; unshift(@V,pop(@V)); ($t2,$t3)=($t3,$t2);'
+ )
+}
+
+$code.=<<___;
+#ifdef __KERNEL__
+.globl sha256_block_neon
+#endif
+.type sha256_block_neon,%function
+.align 4
+sha256_block_neon:
+.Lneon_entry:
+ stp x29, x30, [sp, #-16]!
+ mov x29, sp
+ sub sp,sp,#16*4
+
+ adr $Ktbl,.LK256
+ add $num,$inp,$num,lsl#6 // len to point at the end of inp
+
+ ld1.8 {@X[0]},[$inp], #16
+ ld1.8 {@X[1]},[$inp], #16
+ ld1.8 {@X[2]},[$inp], #16
+ ld1.8 {@X[3]},[$inp], #16
+ ld1.32 {$T0},[$Ktbl], #16
+ ld1.32 {$T1},[$Ktbl], #16
+ ld1.32 {$T2},[$Ktbl], #16
+ ld1.32 {$T3},[$Ktbl], #16
+ rev32 @X[0],@X[0] // yes, even on
+ rev32 @X[1],@X[1] // big-endian
+ rev32 @X[2],@X[2]
+ rev32 @X[3],@X[3]
+ mov $Xfer,sp
+ add.32 $T0,$T0,@X[0]
+ add.32 $T1,$T1,@X[1]
+ add.32 $T2,$T2,@X[2]
+ st1.32 {$T0-$T1},[$Xfer], #32
+ add.32 $T3,$T3,@X[3]
+ st1.32 {$T2-$T3},[$Xfer]
+ sub $Xfer,$Xfer,#32
+
+ ldp $A,$B,[$ctx]
+ ldp $C,$D,[$ctx,#8]
+ ldp $E,$F,[$ctx,#16]
+ ldp $G,$H,[$ctx,#24]
+ ldr $t1,[sp,#0]
+ mov $t2,wzr
+ eor $t3,$B,$C
+ mov $t4,wzr
+ b .L_00_48
+
+.align 4
+.L_00_48:
+___
+ &Xupdate(\&body_00_15);
+ &Xupdate(\&body_00_15);
+ &Xupdate(\&body_00_15);
+ &Xupdate(\&body_00_15);
+$code.=<<___;
+ cmp $t1,#0 // check for K256 terminator
+ ldr $t1,[sp,#0]
+ sub $Xfer,$Xfer,#64
+ bne .L_00_48
+
+ sub $Ktbl,$Ktbl,#256 // rewind $Ktbl
+ cmp $inp,$num
+ mov $Xfer, #64
+ csel $Xfer, $Xfer, xzr, eq
+ sub $inp,$inp,$Xfer // avoid SEGV
+ mov $Xfer,sp
+___
+ &Xpreload(\&body_00_15);
+ &Xpreload(\&body_00_15);
+ &Xpreload(\&body_00_15);
+ &Xpreload(\&body_00_15);
+$code.=<<___;
+ add $A,$A,$t4 // h+=Sigma0(a) from the past
+ ldp $t0,$t1,[$ctx,#0]
+ add $A,$A,$t2 // h+=Maj(a,b,c) from the past
+ ldp $t2,$t3,[$ctx,#8]
+ add $A,$A,$t0 // accumulate
+ add $B,$B,$t1
+ ldp $t0,$t1,[$ctx,#16]
+ add $C,$C,$t2
+ add $D,$D,$t3
+ ldp $t2,$t3,[$ctx,#24]
+ add $E,$E,$t0
+ add $F,$F,$t1
+ ldr $t1,[sp,#0]
+ stp $A,$B,[$ctx,#0]
+ add $G,$G,$t2
+ mov $t2,wzr
+ stp $C,$D,[$ctx,#8]
+ add $H,$H,$t3
+ stp $E,$F,[$ctx,#16]
+ eor $t3,$B,$C
+ stp $G,$H,[$ctx,#24]
+ mov $t4,wzr
+ mov $Xfer,sp
+ b.ne .L_00_48
+
+ ldr x29,[x29]
+ add sp,sp,#16*4+16
+ ret
+.size sha256_block_neon,.-sha256_block_neon
+___
+}
+
+$code.=<<___;
+#ifndef __KERNEL__
+.comm OPENSSL_armcap_P,4,4
+#endif
+___
+
+{ my %opcode = (
+ "sha256h" => 0x5e004000, "sha256h2" => 0x5e005000,
+ "sha256su0" => 0x5e282800, "sha256su1" => 0x5e006000 );
+
+ sub unsha256 {
+ my ($mnemonic,$arg)=@_;
+
+ $arg =~ m/[qv]([0-9]+)[^,]*,\s*[qv]([0-9]+)[^,]*(?:,\s*[qv]([0-9]+))?/o
+ &&
+ sprintf ".inst\t0x%08x\t//%s %s",
+ $opcode{$mnemonic}|$1|($2<<5)|($3<<16),
+ $mnemonic,$arg;
+ }
+}
+
+open SELF,$0;
+while(<SELF>) {
+ next if (/^#!/);
+ last if (!s/^#/\/\// and !/^$/);
+ print;
+}
+close SELF;
+
+foreach(split("\n",$code)) {
+
+ s/\`([^\`]*)\`/eval($1)/ge;
+
+ s/\b(sha256\w+)\s+([qv].*)/unsha256($1,$2)/ge;
+
+ s/\bq([0-9]+)\b/v$1.16b/g; # old->new registers
+
+ s/\.[ui]?8(\s)/$1/;
+ s/\.\w?32\b// and s/\.16b/\.4s/g;
+ m/(ld|st)1[^\[]+\[0\]/ and s/\.4s/\.s/g;
+
+ print $_,"\n";
+}
+
+close STDOUT;
diff --git a/arch/arm64/crypto/sha512-glue.c b/arch/arm64/crypto/sha512-glue.c
new file mode 100644
index 000000000000..aff35c9992a4
--- /dev/null
+++ b/arch/arm64/crypto/sha512-glue.c
@@ -0,0 +1,94 @@
+/*
+ * Linux/arm64 port of the OpenSSL SHA512 implementation for AArch64
+ *
+ * Copyright (c) 2016 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include <crypto/internal/hash.h>
+#include <linux/cryptohash.h>
+#include <linux/types.h>
+#include <linux/string.h>
+#include <crypto/sha.h>
+#include <crypto/sha512_base.h>
+#include <asm/neon.h>
+
+MODULE_DESCRIPTION("SHA-384/SHA-512 secure hash for arm64");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("sha384");
+MODULE_ALIAS_CRYPTO("sha512");
+
+asmlinkage void sha512_block_data_order(u32 *digest, const void *data,
+ unsigned int num_blks);
+
+static int sha512_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ return sha512_base_do_update(desc, data, len,
+ (sha512_block_fn *)sha512_block_data_order);
+}
+
+static int sha512_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ if (len)
+ sha512_base_do_update(desc, data, len,
+ (sha512_block_fn *)sha512_block_data_order);
+ sha512_base_do_finalize(desc,
+ (sha512_block_fn *)sha512_block_data_order);
+
+ return sha512_base_finish(desc, out);
+}
+
+static int sha512_final(struct shash_desc *desc, u8 *out)
+{
+ return sha512_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg algs[] = { {
+ .digestsize = SHA512_DIGEST_SIZE,
+ .init = sha512_base_init,
+ .update = sha512_update,
+ .final = sha512_final,
+ .finup = sha512_finup,
+ .descsize = sizeof(struct sha512_state),
+ .base.cra_name = "sha512",
+ .base.cra_driver_name = "sha512-arm64",
+ .base.cra_priority = 150,
+ .base.cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .base.cra_blocksize = SHA512_BLOCK_SIZE,
+ .base.cra_module = THIS_MODULE,
+}, {
+ .digestsize = SHA384_DIGEST_SIZE,
+ .init = sha384_base_init,
+ .update = sha512_update,
+ .final = sha512_final,
+ .finup = sha512_finup,
+ .descsize = sizeof(struct sha512_state),
+ .base.cra_name = "sha384",
+ .base.cra_driver_name = "sha384-arm64",
+ .base.cra_priority = 150,
+ .base.cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .base.cra_blocksize = SHA384_BLOCK_SIZE,
+ .base.cra_module = THIS_MODULE,
+} };
+
+static int __init sha512_mod_init(void)
+{
+ return crypto_register_shashes(algs, ARRAY_SIZE(algs));
+}
+
+static void __exit sha512_mod_fini(void)
+{
+ crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
+}
+
+module_init(sha512_mod_init);
+module_exit(sha512_mod_fini);
--
2.7.4
* [PATCH v14 4/9] acpi/arm64: Add GTDT table parse driver
From: Hanjun Guo @ 2016-11-11 13:46 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161020163719.GC27598@leverpostej>
Hi Mark,
Sorry for the late reply.
On 10/21/2016 12:37 AM, Mark Rutland wrote:
> Hi,
>
> As a heads-up, on v4.9-rc1 I see conflicts at least against
> arch/arm64/Kconfig. Luckily git am -3 seems to be able to fix that up
> automatically, but this will need to be rebased before the next posting
> and/or merging.
>
> On Thu, Sep 29, 2016 at 02:17:12AM +0800, fu.wei@linaro.org wrote:
>> +static int __init map_gt_gsi(u32 interrupt, u32 flags)
>> +{
>> + int trigger, polarity;
>> +
>> + if (!interrupt)
>> + return 0;
>
> Urgh.
>
> Only the secure interrupt (which we do not need) is optional in this
> manner, and (hilariously), zero appears to also be a valid GSIV, per
> figure 5-24 in the ACPI 6.1 spec.
>
> So, I think that:
>
> (a) we should not bother parsing the secure interrupt
> (b) we should drop the check above
> (c) we should report the spec issue to the ASWG
Sorry, I'm willing to do that, but I first need to understand the issue.
What exactly is the problem? Do you mean that zero should not be a valid
GSIV for arch timer interrupts?
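To check my understanding of suggestion (b), here is a small standalone
sketch in plain C. The helper names are made up for illustration and are
not from the patch. With the early return, a timer whose GSIV is
legitimately 0 is silently skipped; without it, 0 is handled like any
other GSIV:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins, not the functions from the patch.
 * Each returns true when the given GSIV would be passed on for mapping.
 */

/* Models the quoted "if (!interrupt) return 0;" early return. */
static bool gsiv_mapped_with_zero_check(uint32_t gsiv)
{
	return gsiv != 0;
}

/* Models suggestion (b): no special case, so GSIV 0 is mapped too. */
static bool gsiv_mapped_without_zero_check(uint32_t gsiv)
{
	(void)gsiv;
	return true;
}
```

If that reading is right, the spec issue would be that zero cannot serve
both as a valid GSIV and as a "not present" marker.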
Thanks
Hanjun
* [PATCH v14 4/9] acpi/arm64: Add GTDT table parse driver
From: Hanjun Guo @ 2016-11-11 13:43 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <CADyBb7tLJpPiMqWs4KZE=uCpo3HX3WcNVju2UKNuPOR0+M4UDw@mail.gmail.com>
On 10/26/2016 07:10 PM, Fu Wei wrote:
> Hi Mark,
>
> On 21 October 2016 at 00:37, Mark Rutland <mark.rutland@arm.com> wrote:
>> Hi,
>>
>> As a heads-up, on v4.9-rc1 I see conflicts at least against
>> arch/arm64/Kconfig. Luckily git am -3 seems to be able to fix that up
>> automatically, but this will need to be rebased before the next posting
>> and/or merging.
>>
>> On Thu, Sep 29, 2016 at 02:17:12AM +0800, fu.wei@linaro.org wrote:
>>> +static int __init map_gt_gsi(u32 interrupt, u32 flags)
>>> +{
>>> + int trigger, polarity;
>>> +
>>> + if (!interrupt)
>>> + return 0;
>>
>> Urgh.
>>
>> Only the secure interrupt (which we do not need) is optional in this
>> manner, and (hilariously), zero appears to also be a valid GSIV, per
>> figure 5-24 in the ACPI 6.1 spec.
>>
>> So, I think that:
>>
>> (a) we should not bother parsing the secure interrupt
>
> If I understand correctly, from this point of view, kernel don't
> handle the secure interrupt.
> But the current arm_arch_timer driver still enable/disable/request
> PHYS_SECURE_PPI
> with PHYS_NONSECURE_PPI.
> That means we still need to parse the secure interrupt.
> Please correct me, if I misunderstand something? :-)
>
>> (b) we should drop the check above
>
> yes, if zero is a valid GSIV, this makes sense.
>
>> (c) we should report the spec issue to the ASWG
>>
>>> +/*
>>> + * acpi_gtdt_c3stop - got c3stop info from GTDT
>>> + *
>>> + * Returns 1 if the timer is powered in deep idle state, 0 otherwise.
>>> + */
>>> +bool __init acpi_gtdt_c3stop(void)
>>> +{
>>> + struct acpi_table_gtdt *gtdt = acpi_gtdt_desc.gtdt;
>>> +
>>> + return !(gtdt->non_secure_el1_flags & ACPI_GTDT_ALWAYS_ON);
>>> +}
>>
>> It looks like this can differ per interrupt. Shouldn't we check the
>> appropriate one?
>
> yes, I think you are right.
I think Mark already clarified this: it's a global flag defined in the
spec, so we don't need to update it.
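For reference, the quoted check boils down to the following standalone
sketch. The names and the bit position are my assumptions for
illustration (the bit is meant to mirror ACPICA's ACPI_GTDT_ALWAYS_ON);
this is not kernel code. Note that it returns true when the always-on
bit is clear, that is, when the timer may stop in deep idle:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, chosen to mirror ACPICA's ACPI_GTDT_ALWAYS_ON (bit 2). */
#define GTDT_ALWAYS_ON (1u << 2)

/*
 * Hypothetical stand-in for the quoted acpi_gtdt_c3stop(): true when the
 * timer is NOT flagged always-on, i.e. it may stop in deep idle states.
 */
static bool timer_c3stop(uint32_t non_secure_el1_flags)
{
	return !(non_secure_el1_flags & GTDT_ALWAYS_ON);
}
```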
Thanks
Hanjun