* [PATCH v2 3/4] Xen: Select correct dom0 console
From: Andre Przywara @ 2016-11-22 15:09 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161122150917.16524-1-andre.przywara@arm.com>
From: Ian Campbell <ian.campbell@citrix.com>
If Xen is enabled, tell Dom0 to use the 'hvc0' console, and fall back to
the usual ttyAMA0 otherwise.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
---
configure.ac | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/configure.ac b/configure.ac
index aff4aad..c959ab8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -92,7 +92,8 @@ AC_ARG_WITH([initrd],
AC_SUBST([FILESYSTEM], [$USE_INITRD])
AM_CONDITIONAL([INITRD], [test "x$USE_INITRD" != "x"])
-C_CMDLINE="console=ttyAMA0 earlyprintk=pl011,0x1c090000"
+AS_IF([test "x$X_IMAGE" = "x"],[C_CONSOLE="ttyAMA0"],[C_CONSOLE="hvc0"])
+C_CMDLINE="console=$C_CONSOLE earlyprintk=pl011,0x1c090000"
AC_ARG_WITH([cmdline],
AS_HELP_STRING([--with-cmdline], [set a command line for the kernel]),
[C_CMDLINE=$withval])
--
2.9.0
^ permalink raw reply related
* [PATCH v2 2/4] Xen: Support adding DT nodes
From: Andre Przywara @ 2016-11-22 15:09 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161122150917.16524-1-andre.przywara@arm.com>
From: Christoffer Dall <christoffer.dall@linaro.org>
Support adding xen,xen-bootargs node via --with-xen-bootargs to the
configure script and automatically add the Dom0 node to the DT as well.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Makefile.am | 23 +++++++++++++++--------
configure.ac | 9 +++++++++
2 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/Makefile.am b/Makefile.am
index f8b9ec9..db97f9c 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -96,21 +96,28 @@ FDT_OFFSET := 0x08000000
if XEN
XEN := -DXEN=$(XEN_IMAGE)
XEN_OFFSET := 0x08200000
+KERNEL_SIZE := $(shell stat -Lc %s $(KERNEL_IMAGE) 2>/dev/null || echo 0)
+DOM0_OFFSET := $(shell echo $$(($(PHYS_OFFSET) + $(KERNEL_OFFSET))))
+XEN_BOOTARGS := xen,xen-bootargs = \"$(XEN_CMDLINE)\"; \
+ \#address-cells = <2>; \
+ \#size-cells = <2>; \
+ module at 1 { \
+ compatible = \"xen,linux-zimage\", \"xen,multiboot-module\"; \
+ reg = <0x0 $(DOM0_OFFSET) 0x0 $(KERNEL_SIZE)>; \
+ };
endif
if INITRD
INITRD_FLAGS := -DUSE_INITRD
+INITRD_CHOSEN := linux,initrd-start = <$(FILESYSTEM_START)>; \
+ linux,initrd-end = <$(FILESYSTEM_END)>;
+endif
+
CHOSEN_NODE := chosen { \
bootargs = \"$(CMDLINE)\"; \
- linux,initrd-start = <$(FILESYSTEM_START)>; \
- linux,initrd-end = <$(FILESYSTEM_END)>; \
- };
-else
-INITRD_FLAGS :=
-CHOSEN_NODE := chosen { \
- bootargs = \"$(CMDLINE)\"; \
+ $(INITRD_CHOSEN) \
+ $(XEN_BOOTARGS) \
};
-endif
CPPFLAGS += $(INITRD_FLAGS)
CFLAGS += -Iinclude/ -I$(ARCH_SRC)/include/
diff --git a/configure.ac b/configure.ac
index f7e24c7..aff4aad 100644
--- a/configure.ac
+++ b/configure.ac
@@ -98,6 +98,12 @@ AC_ARG_WITH([cmdline],
[C_CMDLINE=$withval])
AC_SUBST([CMDLINE], [$C_CMDLINE])
+X_CMDLINE="console=dtuart dtuart=serial0 no-bootscrub"
+AC_ARG_WITH([xen-cmdline],
+ AS_HELP_STRING([--with-xen-cmdline], [set Xen command line]),
+ [X_CMDLINE=$withval])
+AC_SUBST([XEN_CMDLINE], [$X_CMDLINE])
+
# Allow a user to pass --enable-gicv3
AC_ARG_ENABLE([gicv3],
AS_HELP_STRING([--enable-gicv3], [enable GICv3 instead of GICv2]),
@@ -136,4 +142,7 @@ echo " Use GICv3? ${USE_GICV3}"
echo " Boot-wrapper execution state: AArch${BOOTWRAPPER_ES}"
echo " Kernel execution state: AArch${KERNEL_ES}"
echo " Xen image ${XEN_IMAGE:-NONE}"
+if test "x${XEN_IMAGE}" != "x"; then
+echo " Xen command line: ${XEN_CMDLINE}"
+fi
echo ""
--
2.9.0
^ permalink raw reply related
* [PATCH v2 1/4] Support for building in a Xen binary
From: Andre Przywara @ 2016-11-22 15:09 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161122150917.16524-1-andre.przywara@arm.com>
From: Christoffer Dall <christoffer.dall@linaro.org>
Add support for building a Xen binary which includes a Dom0 image and
the Dom0 command-line.
If the user specifies --with-xen=<Xen>, where Xen is an appropriate
AArch64 Xen binary, the build system will generate a xen-system.axf
instead of a linux-system.axf.
Original patch from Ian Campbell, but I modified most of it so all bugs
are probably mine.
[Andre: adapt to newest boot-wrapper branch, increase load address,
fixup Xen image file test]
Cc: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
.gitignore | 1 +
Makefile.am | 10 +++++++---
boot_common.c | 4 ++--
configure.ac | 17 +++++++++++++++++
model.lds.S | 14 ++++++++++++++
5 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/.gitignore b/.gitignore
index 8653852..80770c0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -14,6 +14,7 @@ configure
dtc
fdt.dtb
Image
+Xen
install-sh
Makefile
Makefile.in
diff --git a/Makefile.am b/Makefile.am
index 692d2cc..f8b9ec9 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -85,7 +85,6 @@ TEXT_LIMIT := 0x80080000
endif
LD_SCRIPT := model.lds.S
-IMAGE := linux-system.axf
FS_OFFSET := 0x10000000
FILESYSTEM_START:= $(shell echo $$(($(PHYS_OFFSET) + $(FS_OFFSET))))
@@ -94,6 +93,11 @@ FILESYSTEM_END := $(shell echo $$(($(FILESYSTEM_START) + $(FILESYSTEM_SIZE))))
FDT_OFFSET := 0x08000000
+if XEN
+XEN := -DXEN=$(XEN_IMAGE)
+XEN_OFFSET := 0x08200000
+endif
+
if INITRD
INITRD_FLAGS := -DUSE_INITRD
CHOSEN_NODE := chosen { \
@@ -121,7 +125,7 @@ all: $(IMAGE)
CLEANFILES = $(IMAGE) $(OFILES) model.lds fdt.dtb
-$(IMAGE): $(OFILES) model.lds fdt.dtb $(KERNEL_IMAGE) $(FILESYSTEM)
+$(IMAGE): $(OFILES) model.lds fdt.dtb $(KERNEL_IMAGE) $(FILESYSTEM) $(XEN_IMAGE)
$(LD) $(LDFLAGS) $(OFILES) -o $@ --script=model.lds
%.o: %.S Makefile
@@ -131,7 +135,7 @@ $(IMAGE): $(OFILES) model.lds fdt.dtb $(KERNEL_IMAGE) $(FILESYSTEM)
$(CC) $(CPPFLAGS) $(CFLAGS) $(DEFINES) -c -o $@ $<
model.lds: $(LD_SCRIPT) Makefile
- $(CPP) $(CPPFLAGS) -ansi -DPHYS_OFFSET=$(PHYS_OFFSET) -DMBOX_OFFSET=$(MBOX_OFFSET) -DKERNEL_OFFSET=$(KERNEL_OFFSET) -DFDT_OFFSET=$(FDT_OFFSET) -DFS_OFFSET=$(FS_OFFSET) -DKERNEL=$(KERNEL_IMAGE) -DFILESYSTEM=$(FILESYSTEM) -DTEXT_LIMIT=$(TEXT_LIMIT) -P -C -o $@ $<
+ $(CPP) $(CPPFLAGS) -ansi -DPHYS_OFFSET=$(PHYS_OFFSET) -DMBOX_OFFSET=$(MBOX_OFFSET) -DKERNEL_OFFSET=$(KERNEL_OFFSET) -DFDT_OFFSET=$(FDT_OFFSET) -DFS_OFFSET=$(FS_OFFSET) $(XEN) -DXEN_OFFSET=$(XEN_OFFSET) -DKERNEL=$(KERNEL_IMAGE) -DFILESYSTEM=$(FILESYSTEM) -DTEXT_LIMIT=$(TEXT_LIMIT) -P -C -o $@ $<
fdt.dtb: $(KERNEL_DTB) Makefile gen-cpu-nodes.sh
( $(DTC) -O dts -I dtb $(KERNEL_DTB) ; echo "/ { $(CHOSEN_NODE) $(PSCI_NODE) $(CPUS_NODE) };" ) | $(DTC) -O dtb -o $@ -
diff --git a/boot_common.c b/boot_common.c
index 4947fe3..e7b8e1d 100644
--- a/boot_common.c
+++ b/boot_common.c
@@ -9,7 +9,7 @@
#include <cpu.h>
#include <spin.h>
-extern unsigned long kernel;
+extern unsigned long entrypoint;
extern unsigned long dtb;
void init_platform(void);
@@ -67,7 +67,7 @@ void __noreturn first_spin(unsigned int cpu, unsigned long *mbox,
if (cpu == 0) {
init_platform();
- *mbox = (unsigned long)&kernel;
+ *mbox = (unsigned long)&entrypoint;
sevl();
spin(mbox, invalid, 1);
} else {
diff --git a/configure.ac b/configure.ac
index ab8f5b3..f7e24c7 100644
--- a/configure.ac
+++ b/configure.ac
@@ -40,6 +40,18 @@ AC_ARG_WITH([dtb],
AS_HELP_STRING([--with-dtb], [Specify a particular DTB to use]),
[KERN_DTB="$withval"])
+AC_ARG_WITH([xen],
+ AS_HELP_STRING([--with-xen], [Compile for Xen, and Specify a particular Xen to use]),
+ X_IMAGE=$withval)
+
+AS_IF([test "x$X_IMAGE" == "x"], [],
+ [AS_IF([test ! -f "$X_IMAGE"],
+ [AC_MSG_ERROR([Could not find Xen hypervisor binary: $X_IMAGE])]
+ )]
+)
+AC_SUBST([XEN_IMAGE], [$X_IMAGE])
+AM_CONDITIONAL([XEN], [test "x$X_IMAGE" != "x"])
+
# Ensure that the user has provided us with a sane kernel dir.
m4_define([CHECKFILES], [KERN_DIR,
KERN_DTB,
@@ -50,6 +62,10 @@ m4_foreach([checkfile], [CHECKFILES],
AC_SUBST([KERNEL_IMAGE], [$KERN_IMAGE])
AC_SUBST([KERNEL_DTB], [$KERN_DTB])
+AS_IF([test "x$X_IMAGE" != "x"],
+ [AC_SUBST([IMAGE], ["xen-system.axf"])],
+ [AC_SUBST([IMAGE], ["linux-system.axf"])]
+)
# Allow a user to pass --enable-psci
AC_ARG_ENABLE([psci],
@@ -119,4 +135,5 @@ echo " CPU IDs: ${CPU_IDS}"
echo " Use GICv3? ${USE_GICV3}"
echo " Boot-wrapper execution state: AArch${BOOTWRAPPER_ES}"
echo " Kernel execution state: AArch${KERNEL_ES}"
+echo " Xen image ${XEN_IMAGE:-NONE}"
echo ""
diff --git a/model.lds.S b/model.lds.S
index 51c5684..511f552 100644
--- a/model.lds.S
+++ b/model.lds.S
@@ -16,6 +16,9 @@ OUTPUT_ARCH(aarch64)
#endif
TARGET(binary)
+#ifdef XEN
+INPUT(XEN)
+#endif
INPUT(KERNEL)
INPUT(./fdt.dtb)
@@ -36,6 +39,17 @@ SECTIONS
KERNEL
}
+#ifdef XEN
+ .xen (PHYS_OFFSET + XEN_OFFSET): {
+ xen = .;
+ XEN
+ }
+
+ entrypoint = xen;
+#else
+ entrypoint = kernel;
+#endif
+
.dtb (PHYS_OFFSET + FDT_OFFSET): {
dtb = .;
./fdt.dtb
--
2.9.0
^ permalink raw reply related
* [PATCH v2 0/4] boot-wrapper: arm64: Xen support
From: Andre Przywara @ 2016-11-22 15:09 UTC (permalink / raw)
To: linux-arm-kernel
These patches allow to include a Xen hypervisor binary into a boot-wrapper
ELF file, so that a Foundation Platform or a Fast Model can boot a Xen
system (including a Dom0 kernel).
The original patches have been around for a while, so this series is
merely an update to apply on the latest boot-wrapper HEAD. Also I fixed
minor things here and there (like increasing Xen's load address to
accomodate for Dom0 kernels bigger than 16 MB) and addressed Julien's
review comments.
For testing this just add: "--with-xen=/path/to/src/xen/xen" to the
./configure command line and feed the resulting xen-system.axf file to
the model.
Cheers,
Andre.
Changelog v1 .. v2:
- use "xen-cmdline" instead of bootargs as parameter and variable name
- move hunk in patch 1/4 to make patch 2/4 smaller
- replace AC_FILE_CHECK macro usage to fix cross compilation
Christoffer Dall (3):
Support for building in a Xen binary
Xen: Support adding DT nodes
Explicitly clean linux-system.axf and xen-system.axf
Ian Campbell (1):
Xen: Select correct dom0 console
.gitignore | 1 +
Makefile.am | 35 +++++++++++++++++++++++------------
boot_common.c | 4 ++--
configure.ac | 29 ++++++++++++++++++++++++++++-
model.lds.S | 14 ++++++++++++++
5 files changed, 68 insertions(+), 15 deletions(-)
--
2.9.0
^ permalink raw reply
* [PATCH 1/3] of: base: add support to get machine compatible string
From: Sekhar Nori @ 2016-11-22 15:06 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <5ce9fb9f-459a-562b-2e9f-85d35f9ec035@arm.com>
Hi Sudeep,
On Tuesday 22 November 2016 04:23 PM, Sudeep Holla wrote:
>
>
> On 22/11/16 10:41, Bartosz Golaszewski wrote:
>> Add a function allowing to retrieve the compatible string of the root
>> node of the device tree.
>>
>
> Rob has queued [1] and it's in -next today. You can reuse that if you
> are planning to target this for v4.11 or just use open coding in your
> driver for v4.10 and target this move for v4.11 to avoid cross tree
> dependencies as I already mentioned in your previous thread.
I dont have your original patch in my mailbox, but I wonder if
returning a pointer to property string for a node whose reference has
already been released is safe to do? Probably not an issue for the root
node, but still feels counter-intuitive.
This is the code for reference:
+int of_machine_get_model_name(const char **model)
+{
+ int error;
+
+ if (!of_node_get(of_root))
+ return -EINVAL;
+
+ error = of_property_read_string(of_root, "model", model);
+ if (error)
+ error = of_property_read_string_index(of_root, "compatible",
+ 0, model);
+ of_node_put(of_root);
+
+ return error;
+}
+EXPORT_SYMBOL(of_machine_get_model_name);
Thanks,
Sekhar
^ permalink raw reply
* [PATCH 2/4] arm64: dts: rockchip: Arch counter doesn't tick in system suspend
From: Heiko Stübner @ 2016-11-22 15:02 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1479807866-6957-2-git-send-email-daniel.lezcano@linaro.org>
Am Dienstag, 22. November 2016, 10:44:22 schrieb Daniel Lezcano:
> From: Brian Norris <briannorris@chromium.org>
>
> The "arm,no-tick-in-suspend" property was introduced to note
> implementations where the system counter does not quite follow the ARM
> specification that it "must be implemented in an always-on power
> domain".
>
> Particularly, RK3399's counter stops ticking when we switch from the
> 24MHz clock to the 32KHz clock in low-power suspend, so let's mark it as
> such.
>
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
not sure if needed, but anyway
Acked-by: Heiko Stuebner <heiko@sntech.de>
> ---
> arch/arm64/boot/dts/rockchip/rk3399.dtsi | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> b/arch/arm64/boot/dts/rockchip/rk3399.dtsi index b65c193..d85b651 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> @@ -174,6 +174,7 @@
> <GIC_PPI 14 IRQ_TYPE_LEVEL_LOW 0>,
> <GIC_PPI 11 IRQ_TYPE_LEVEL_LOW 0>,
> <GIC_PPI 10 IRQ_TYPE_LEVEL_LOW 0>;
> + arm,no-tick-in-suspend;
> };
>
> xin24m: xin24m {
^ permalink raw reply
* [GIT PULL] Allwinner clock fixes for 4.9, take 2
From: Maxime Ripard @ 2016-11-22 14:59 UTC (permalink / raw)
To: linux-arm-kernel
Hi Mike, Stephen,
Here are two new fixes for the 4.9 release.
That should be my final PR for both the 4.9 fixes and 4.10
developments.
Thanks!
Maxime
The following changes since commit ac95330b96376550ae7a533d1396272d675adfa2:
clk: sunxi: Fix M factor computation for APB1 (2016-11-04 08:49:46 +0100)
are available in the git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux.git tags/sunxi-clk-fixes-for-4.9-2
for you to fetch changes up to df8e3c1fe1050fa2749c346676d12f4d30dcd935:
clk: sunxi-ng: enable so-said LDOs for A33 SoC's pll-mipi clock (2016-11-21 19:50:49 +0100)
----------------------------------------------------------------
Allwinner clock fixes for 4.9, take 2
Two new patches to fix the gating of the MIPI PLL on the A23, A31 and A33.
----------------------------------------------------------------
Chen-Yu Tsai (1):
clk: sunxi-ng: sun6i-a31: Enable PLL-MIPI LDOs when ungating it
Icenowy Zheng (1):
clk: sunxi-ng: enable so-said LDOs for A33 SoC's pll-mipi clock
drivers/clk/sunxi-ng/ccu-sun6i-a31.c | 2 +-
drivers/clk/sunxi-ng/ccu-sun8i-a33.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20161122/ca7df353/attachment.sig>
^ permalink raw reply
* [GIT PULL] Allwinner arm64 DT changes for 4.10, bis
From: Maxime Ripard @ 2016-11-22 14:47 UTC (permalink / raw)
To: linux-arm-kernel
Hi Arnd, Olof,
Here is a second run at the arm64 PR I sent earlier.
We had a (build) dependency against the clock patches that have been
merged by Stephen and Mike through the clock tree.
Hence, this new PR is based on top of the tag I sent their way, and it
supersedes the previous arm64 PR.
I used this occasion to rename the patches prefix as Olof suggested in
the earlier pull request.
Thanks,
Maxime
The following changes since commit 0f6f9302b819ca352cfd4f42c18ec08d521f9cae:
clk: sunxi-ng: sun8i-h3: Set CLK_SET_RATE_PARENT for audio module clocks (2016-11-11 21:47:41 +0100)
are available in the git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux.git tags/sunxi-dt64-for-4.10-bis
for you to fetch changes up to 5757867e8758f40227e710625df64a471ec81411:
arm64: dts: allwinner: add Pine64 support (2016-11-22 15:43:01 +0100)
----------------------------------------------------------------
Allwinner arm64 DT changes for 4.10, bis
Support for the Allwinner A64, their first armv8 SoC.
----------------------------------------------------------------
Andre Przywara (3):
arm64: dts: allwinner: add Allwinner A64 SoC .dtsi
Documentation: devicetree: add vendor prefix for Pine64
arm64: dts: allwinner: add Pine64 support
Documentation/devicetree/bindings/arm/sunxi.txt | 1 +
.../devicetree/bindings/vendor-prefixes.txt | 1 +
MAINTAINERS | 1 +
arch/arm64/boot/dts/Makefile | 1 +
arch/arm64/boot/dts/allwinner/Makefile | 5 +
.../boot/dts/allwinner/sun50i-a64-pine64-plus.dts | 50 ++++
.../arm64/boot/dts/allwinner/sun50i-a64-pine64.dts | 74 ++++++
arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 263 +++++++++++++++++++++
8 files changed, 396 insertions(+)
create mode 100644 arch/arm64/boot/dts/allwinner/Makefile
create mode 100644 arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-plus.dts
create mode 100644 arch/arm64/boot/dts/allwinner/sun50i-a64-pine64.dts
create mode 100644 arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
--
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20161122/3b840c1a/attachment.sig>
^ permalink raw reply
* [GIT PULL] Allwinner fixes for 4.9, take 2
From: Maxime Ripard @ 2016-11-22 14:31 UTC (permalink / raw)
To: linux-arm-kernel
Hi Arnd, Olof,
Please find attached a commit to rename the GR8 device trees.
Thanks!
Maxime
The following changes since commit b7f865ede20c87073216f77fe97f6fc56666e3da:
ARM: dts: sun8i: fix the pinmux for UART1 (2016-10-25 12:51:06 +0200)
are available in the git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux.git tags/sunxi-fixes-for-4.9-2
for you to fetch changes up to e5cd7ff7058dc6f2133455636809a09b691ee419:
ARM: gr8: Rename the DTSI and relevant DTS (2016-11-22 15:06:04 +0100)
----------------------------------------------------------------
Allwinner fixes for 4.9, second iteration
A renaming of the GR8 DTSI and DTS to make it explicitly part of the sun5i
family.
----------------------------------------------------------------
Maxime Ripard (1):
ARM: gr8: Rename the DTSI and relevant DTS
arch/arm/boot/dts/Makefile | 2 +-
arch/arm/boot/dts/{ntc-gr8-evb.dts => sun5i-gr8-evb.dts} | 2 +-
arch/arm/boot/dts/{ntc-gr8.dtsi => sun5i-gr8.dtsi} | 0
3 files changed, 2 insertions(+), 2 deletions(-)
rename arch/arm/boot/dts/{ntc-gr8-evb.dts => sun5i-gr8-evb.dts} (99%)
rename arch/arm/boot/dts/{ntc-gr8.dtsi => sun5i-gr8.dtsi} (100%)
--
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20161122/b6f72872/attachment.sig>
^ permalink raw reply
* [PATCH] ARM: fix kmemleak for XIP_KERNEL
From: Arnd Bergmann @ 2016-11-22 14:28 UTC (permalink / raw)
To: linux-arm-kernel
The newly added check for RO_AFTER_INIT_DATA in kmemleak breaks ARM whenever
XIP_KERNEL is enabled:
mm/kmemleak.o: In function `kmemleak_scan':
kmemleak.c:(.text.kmemleak_scan+0x2e4): undefined reference to `__end_data_ro_after_init'
kmemleak.c:(.text.kmemleak_scan+0x2e8): undefined reference to `__start_data_ro_after_init'
This adds the start/end symbols for the section even in the case of having
no data in the section, to make the check work while keeping the architecture
specific override in one place.
Fixes: d7c19b066dcf ("mm: kmemleak: scan .data.ro_after_init")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
The patch causing this was merged late into v4.9-rc, this one should
probably go there as well.
I assume the same problem exists on s390, but I have not checked that.
---
arch/arm/kernel/vmlinux-xip.lds.S | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/arm/kernel/vmlinux-xip.lds.S b/arch/arm/kernel/vmlinux-xip.lds.S
index 06c178214629..bf900f5421a1 100644
--- a/arch/arm/kernel/vmlinux-xip.lds.S
+++ b/arch/arm/kernel/vmlinux-xip.lds.S
@@ -3,8 +3,11 @@
* Written by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
*/
-/* No __ro_after_init data in the .rodata section - which will always be ro */
-#define RO_AFTER_INIT_DATA
+/* empty __ro_after_init data in the .rodata section - it will always be ro */
+#define RO_AFTER_INIT_DATA \
+ __start_data_ro_after_init = .; \
+ __end_data_ro_after_init = .;
+
#include <asm-generic/vmlinux.lds.h>
#include <asm/cache.h>
--
2.9.0
^ permalink raw reply related
* [PATCH V8 2/6] thermal: bcm2835: add thermal driver for bcm2835 soc
From: Martin Sperl @ 2016-11-22 14:28 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161119042224.GA25063@localhost.localdomain>
Hi Eduardo!
On 19.11.2016 05:22, Eduardo Valentin wrote:
> Hello Martin,
>
> Thanks for your patience to take the time to explain to me how the
> firmware/linux split is done in your platform. Still, one thing is not
> clear to me.
>
> On Fri, Nov 18, 2016 at 09:32:47AM +0100, kernel at martin.sperl.org wrote:
>>
>
>> The way that firmware works on the RPI is quite different from most others I guess.
>> in principle you got 2 different CPUs on the bcm2835:
>> * ARM, which runs the linux instance
>> * VideoCore 4, which runs the firmware (loading from SD initially) and
>> then booting the ARM.
>>
>> So this Firmware on VC4 is the one that I am talking about.
>> Without the working firmware linux can not boot on arm.
>
> Given that "without the working firmware linux can not boot on arm",
>
> (...)
>
>> As far as I understand the conversion is continuous (as soon as the HW is
>> configured). This case is there primarily to handle the situation where
>> we initialize the HW ourselves (see line 226 and below), and we immediately
>
> and around line 226 we have the comment:
> + /*
> + * right now the FW does set up the HW-block, so we are not
> + * touching the configuration registers.
> + * But if the HW is not enabled, then set it up
> + * using "sane" values used by the firmware right now.
> + */
>
>
>> want to read the ADC value before the first conversion is finished.
>>
>
> then, does the firmware initializes the device or not?
>
Yes, it does (normally)
> What are the cases you would load this driver but still get an
> uninitialized device? That looks like some bug workaround hidden
> somewhere. Do system integrators/engineers need to be aware of this w/a?
> Would the driver work right aways when the subsystem is loaded during
> boot? How about module insertion?
>
I was asked to implement the "initialize" case just in case FW ever
stopped setting up the device itself, so that is why this code is
included.
>
> Who has the ownership of this device?
Joined ownership I suppose...
>
>> The above mentioned ?configuration if not running? reflect the values that
>> the FW is currently setting. We should not change those values as long as the
>> Firmware is also reading the temperature on its own.
>
> hmm.. that looks like racy to me. Again, How do you synchronize accesses to
> this device? What if you configure the device and right after the
> firmware updates the configs? How do you make sure the configs you are
> writing here are the same used by the firmware? What if the firmware
> version changes? What versions of the firmware does this driver support?
>
> Would it make sense to simply always initialize the device? Do you have
> a way to tell the firmware that it should not use the device?
>
> Or, if you want to keep the device driver simply being a dummy reader,
> would it make sense to simply avoid writing configurations to the
> device, and simply retry to check if the firmware gets the device
> initialized?
Again: the device registers are only ever written if the device is not started
already. Otherwise the driver only reads for the ADC register, so there
is no real race here.
>
>>
>>>
>>>> So do you need another version of the patchset that uses that new API?
>>>
>>> I think the API usage is change that can be done together with
>>> clarification for the above questions too: on hardware state,
>>> firmware loading, maybe a master driver dependency, and the ADC
>>> conversion sequence, which are not well clear to me on this driver. As long as
>>> this is clarified and documented in the code (can be simple comments so
>>> it is clear to whoever reads in the future), then I would be OK with
>>> this driver.
>>
>> So how do you want this to get ?documented? in the driver?
>> The setup and Firmware is a generic feature of the SOC, so if we would put
>> some clarifications in this driver, then we would need to put it in every
>> bcm283X driver (which seems unreasonable).
>>
>
> I think a simple comment explaining the firmware dependency and the
> expected pre-conditions to get this driver working in a sane state would
> do it.
>
> A better device initialization would also be appreciated. Based on my
> limited understanding of this platform, and your explanations, this
> device seams to have a serious race condition with firmware while
> accessing this device.
Again: the firmware runs before the ARM is started and for all practical
purposes the firmware is (as of now) configuring the thermal device.
As for the use of thermal_zone_get_offset/slope: looking into the code
it looks like this actually blows up the code, as we now would need to
allocate thermal_zone_params and preset it with the ?correct? values.
So more code to maintain and more memory consumed.
The only advantage I would see is that it would allow to set offset and
slope directly in the device tree.
Thanks,
Martin
^ permalink raw reply
* [GIT PULL] Allwinner DT changes for 4.10
From: Maxime Ripard @ 2016-11-22 14:26 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <CAOesGMh3uph=eU6XhHHn+OYUvGx0hXYGAAbFT0LPKUeah5bggA@mail.gmail.com>
On Mon, Nov 21, 2016 at 03:34:50PM -0800, Olof Johansson wrote:
> On Mon, Nov 21, 2016 at 2:22 PM, Maxime Ripard
> <maxime.ripard@free-electrons.com> wrote:
> > On Mon, Nov 21, 2016 at 08:28:09AM -0800, Olof Johansson wrote:
> >> HI,
> >>
> >> On Mon, Nov 21, 2016 at 5:27 AM, Maxime Ripard
> >> <maxime.ripard@free-electrons.com> wrote:
> >> > Hi Olof,
> >> >
> >> > On Fri, Nov 18, 2016 at 04:23:16PM -0800, Olof Johansson wrote:
> >> >> Hi,
> >> >>
> >> >> On Tue, Nov 15, 2016 at 09:41:22PM +0100, Maxime Ripard wrote:
> >> >> > Hi Arnd, Olof,
> >> >> >
> >> >> > Here is our pull request for the next merge window.
> >> >> >
> >> >> > Thanks!
> >> >> > Maxime
> >> >> >
> >> >> > The following changes since commit 1001354ca34179f3db924eb66672442a173147dc:
> >> >> >
> >> >> > Linux 4.9-rc1 (2016-10-15 12:17:50 -0700)
> >> >> >
> >> >> > are available in the git repository at:
> >> >> >
> >> >> > https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux.git tags/sunxi-dt-for-4.10
> >> >> >
> >> >> > for you to fetch changes up to e39a30cf736144814b0bddb3fff28fbbc2a8be0f:
> >> >> >
> >> >> > ARM: sunxi: Add the missing clocks to the pinctrl nodes (2016-11-15 18:49:47 +0100)
> >> >> >
> >> >> > ----------------------------------------------------------------
> >> >> > Allwinner DT additions for 4.10
> >> >> >
> >> >> > The usual bunch of DT additions, but most notably:
> >> >> > - A31 DRM driver
> >> >> > - A31 audio codec
> >> >> > - WiFi for the A80-Based boards and the CHIP
> >> >> > - Support for the NextThing Co CHIP Pro (the first board with NAND
> >> >> > enabled)
> >> >> > - New boards: NanoPi M1,
> >> >> >
> >> >> [...]
> >> >>
> >> >> > Maxime Ripard (16):
> >> >> > ARM: sun5i: a13-olinuxino: Enable VGA bridge
> >> >> > ARM: gr8: Add the UART3
> >> >> > ARM: gr8: Fix typo in the i2s mclk pin group
> >> >> > ARM: gr8: Add missing pwm channel 1 pin
> >> >> > ARM: gr8: Add UART2 pins
> >> >> > ARM: gr8: Add UART3 pins
> >> >> > ARM: gr8: Add CHIP Pro support
> >> >> > ARM: sun5i: chip: Enable Wi-Fi SDIO chip
> >> >> > ARM: sun5i: Rename A10s pins
> >> >> > ARM: sun5i: Add SPI2 pins
> >> >> > ARM: sun5i: Add RGB 565 LCD pins
> >> >> > ARM: sun5i: chip: Add optional buses
> >> >> > ARM: gr8: evb: Enable SPDIF
> >> >> > ARM: gr8: evb: Add i2s codec
> >> >> > ARM: sun8i: sina33: Enable USB gadget
> >> >> > ARM: sunxi: Add the missing clocks to the pinctrl nodes
> >> >> >
> >> >> [...]
> >> >>
> >> >> > arch/arm/boot/dts/Makefile | 1 +
> >> >> > arch/arm/boot/dts/ntc-gr8-chip-pro.dts | 266 +++++++++++++++++++++
> >> >> > arch/arm/boot/dts/ntc-gr8-evb.dts | 33 +++
> >> >> > arch/arm/boot/dts/ntc-gr8.dtsi | 47 +++-
> >> >> > arch/arm/boot/dts/sun4i-a10.dtsi | 3 +-
> >> >>
> >> >> NTC isn't the SoC manufacturer, and we try to keep the prefixes down to
> >> >> manufacturer to keep the namespace a little more manageable, even if
> >> >> we never got subdirectories setup as on arm64.
> >> >>
> >> >> I think this should probably be sun4i-a10-gr8 or sun4i-r8-gr8 as prefix?
> >> >
> >> > The users really expect a SoC from Nextthing, it's always been
> >> > marketed that way, the marking on the SoC also says so, etc. The fact
> >> > that it's been a design in cooperation with Allwinner, and that the
> >> > design is based on some earlier family is an implementation detail,
> >> > and I'd really like not for it to have the sun5i prefix, it's just
> >> > confusing.
> >>
> >> I don't care so much about what's printed on the top of the package, I
> >> care a lot more about what's on the insides. We've got a long
> >> tradition of not renaming things randomly when companies get acquired
> >> or renames themselves, and I think that same spirit is applicable
> >> here.
> >
> > Yet, we called the imx23 and imx28 that way while it was really a
> > sigmatel design. And I'm pretty sure there's other examples too.
>
> Had Sigmatel had a whole bunch of SoCs in-tree we might have had
> discussions about that as well.
So what happens if and when we have more NextThing SIP in tree, we
rename all of them once we have "a whole bunch"?
This is a circular argument.
> >> Calling it an SoC is inaccurate as well, it's really a
> >> system-in-package. It's just a new way to integrate an SoC into a
> >> module to build boards out of. Compare to Octavo OSD335x, for example.
> >>
> >> System-in-package solutions are going to get more and more common, and
> >> it's going to become really chaotic if we expect to use the prefix of
> >> the company custom-ordering the package for each and every one of
> >> them.
> >
> > This is indeed a SIP, but one with a custom SoC (or whatever the !RAM
> > part in a SIP is called) that was designed by Allwinner *and* Next
> > Thing Co., unlike the OSD335x that has a standard AM335x SoC in it.
>
> Even NTC's marketing material says it's an Allwinner R8, at
> https://getchip.com/pages/chippro: "Powered by a chip you can actually
> buy. R8 SoC + 256MB DDR3. 1 package. 1 price. $6 in any quantity."
I've been working on it for the last couple of monthes, and I can tell
you it isn't an R8. It has more IPs, including some that were never
found in any SoC of that generation before (the others being just a
different mix of what was already there).
> This is really quite similar to the situation with SoM manufacturers
> such as Toradex, who integrate SoC, memory and some I/O on a module
> that you then build your product around. The only difference is the
> level it's integrated at. We've chosen to slice our naming across SoC
> family lines, not who makes the substrate that the SoC is mounted to.
I disagree with the comparison. Toradex bought plain off the shelves
SoCs from Freescale, NextThing didn't. It makes sense for the Toradex
case, but I'm still convinced it isn't in this case.
Anyway... I'll send a patch to fix the DTs we have already in 4.9, and
will resend that pull request.
Maxime
--
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20161122/a637fee0/attachment-0001.sig>
^ permalink raw reply
* [PATCH 2/2] PCI: iproc: avoid maybe-uninitialized warning
From: Arnd Bergmann @ 2016-11-22 14:17 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161122141844.1655574-1-arnd@arndb.de>
gcc notices that calling iproc_pcie_setup_ib with ib->nr_regions==0
would result in an uninitialized return value:
drivers/pci/host/pcie-iproc.c: In function 'iproc_pcie_setup_ib':
drivers/pci/host/pcie-iproc.c:894:6: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This can't really happen, but the correct behavior of the function
is probably to return -EINVAL if it ever did.
Fixes: 4213e15c364e ("PCI: iproc: Make outbound mapping code more generic")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/pci/host/pcie-iproc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
index 857ff5198317..0359569c8d78 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -936,6 +936,7 @@ static int iproc_pcie_setup_ib(struct iproc_pcie *pcie,
}
}
+ ret = -EINVAL;
err_ib:
dev_err(dev, "unable to configure inbound mapping\n");
dev_err(dev, "axi %pap, pci %pap, res size %pap\n",
--
2.9.0
^ permalink raw reply related
* [PATCH 1/2] PCI: iproc: fix 32-bit build
From: Arnd Bergmann @ 2016-11-22 14:17 UTC (permalink / raw)
To: linux-arm-kernel
The newly added code to setup the inbound ranges causes a link error
on 32-bit machines from a 32-bit division:
drivers/pci/host/pcie-iproc.o: In function `iproc_pcie_setup_ib':
pcie-iproc.c:(.text.iproc_pcie_setup_ib+0x14c): undefined reference to `__aeabi_uldivmod'
As both sides of the division are always power-of-two numbers and
we already rely on that, we can use a shift instead.
Fixes: 87c240b19bba ("PCI: iproc: Add inbound DMA mapping support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/pci/host/pcie-iproc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
index d10e6aa32e0d..857ff5198317 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -865,7 +865,7 @@ static int iproc_pcie_ib_write(struct iproc_pcie *pcie, int region_idx,
* Now program the IMAP registers. Each IARR region may have one or
* more IMAP windows.
*/
- size /= nr_windows;
+ size >>= ilog2(nr_windows);
for (window_idx = 0; window_idx < nr_windows; window_idx++) {
val = readl(pcie->base + imap_offset);
val |= lower_32_bits(axi_addr) | IMAP_VALID;
--
2.9.0
^ permalink raw reply related
* Synopsys Ethernet QoS Driver
From: Joao Pinto @ 2016-11-22 14:16 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <7c7798b5-8cd4-ba99-f526-22d3e06e05db@synopsys.com>
Hi Lars and Peppe,
On 21-11-2016 16:11, Joao Pinto wrote:
> On 21-11-2016 15:43, Lars Persson wrote:
>>
>>
>>> 21 nov. 2016 kl. 16:06 skrev Joao Pinto <Joao.Pinto@synopsys.com>:
>>>
>>>> On 21-11-2016 14:25, Giuseppe CAVALLARO wrote:
>>>>> On 11/21/2016 2:28 PM, Lars Persson wrote:
>>>>>
>>>>>
>>>>>> 21 nov. 2016 kl. 13:53 skrev Giuseppe CAVALLARO <peppe.cavallaro@st.com>:
>>>>>>
>>>>>> Hello Joao
>>>>>>
>>>>>>> On 11/21/2016 1:32 PM, Joao Pinto wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>>>> On 21-11-2016 05:29, Rayagond Kokatanur wrote:
>>>>>>>>>> On Sat, Nov 19, 2016 at 7:26 PM, Rabin Vincent <rabin@rab.in> wrote:
>>>>>>>>>> On Fri, Nov 18, 2016 at 02:20:27PM +0000, Joao Pinto wrote:
>>>>>>>>>> For now we are interesting in improving the synopsys QoS driver under
>>>>>>>>>> /nect/ethernet/synopsys. For now the driver structure consists of a
>>>>>>>>>> single file
>>>>>>>>>> called dwc_eth_qos.c, containing synopsys ethernet qos common ops and
>
snip (...)
>>>>>>
>>>>>> Peppe
>>>>>>
>>>>>
>>>>> Hello Joao and others,
>>>>>
>>>
>>> Hi Lars,
>>>
>>>>> As the maintainer of dwc_eth_qos.c I prefer also that we put efforts on the
>>>>> most mature driver, the stmmac.
>>>>>
>>>>> I hope that the code can migrate into an ethernet/synopsys folder to keep the
>>>>> convention of naming the folder after the vendor. This makes it easy for
>>>>> others to find the driver.
>>>>>
>>>>> The dwc_eth_qos.c will eventually be removed and its DT binding interface can
>>>>> then be implemented in the stmmac driver.
>>>
>>> So your ideia is to pick the ethernet/stmmac and rename it to ethernet/synopsys
>>> and try to improve the structure and add the missing QoS features to it?
>>
>> Indeed this is what I prefer.
>
> Ok, it makes sense.
> Just for curiosity the target setup is the following:
> https://www.youtube.com/watch?v=8V-LB5y2Cos
> but instead of using internal drivers, we desire to use mainline drivers only.
>
> Thanks!
Regarding this subject, I am thinking of making the following adaption:
a) delete ethernet/synopsys
b) rename ethernet/stmicro/stmmac to ethernet/synopsys
and send you a patch for you to evaluate. Both agree with the approach?
To have a new work base would be important, because I will add to the "new"
structure some missing QoS features like Multichannel support, CBS and later TSN.
Thanks.
>
>>
>>>
>>>>
>>>> Thanks Lars, I will be happy to support all you on this transition
>>>> and I agree on renaming all.
>>>>
>>>> peppe
>>>>
>>>>
>>>>> - Lars
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> (See http://lists.openwall.net/netdev/2016/02/29/127)
>>>>>>>>>
>>>>>>>>> The former only supports 4.x of the hardware.
>>>>>>>>>
>>>>>>>>> The later supports 4.x and 3.x and already has a platform glue driver
>>>>>>>>> with support for several platforms, a PCI glue driver, and a core driver
>>>>>>>>> with several features not present in the former (for example: TX/RX
>>>>>>>>> interrupt coalescing, EEE, PTP).
>>>>>>>>>
>>>>>>>>> Have you evaluated both drivers? Why have you decided to work on the
>>>>>>>>> former rather than the latter?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
^ permalink raw reply
* [GIT PULL] arm/hdlcd fixes for v4.9-rc7
From: Liviu Dudau @ 2016-11-22 14:14 UTC (permalink / raw)
To: linux-arm-kernel
Hi David,
A late issue discovered by Russell King while testing his setup on Juno.
I would be really happy if it goes into v4.9-rc7 as it fixes a reference
counting problem with the pixelclock clock, but if it is too late for that
then I guess it can go into drm-next.
The following changes since commit a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6:
Linux 4.9-rc5 (2016-11-13 10:32:32 -0800)
are available in the git repository at:
git://linux-arm.org/linux-ld.git for-upstream/hdlcd
for you to fetch changes up to 7a79279e7186c4ac8b753cbd335ecc4ba81b5970:
drm/arm: hdlcd: fix plane base address update (2016-11-22 14:09:06 +0000)
----------------------------------------------------------------
Russell King (1):
drm/arm: hdlcd: fix plane base address update
drivers/gpu/drm/arm/hdlcd_crtc.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
Many thanks,
Liviu
--
====================
| I would like to |
| fix the world, |
| but they're not |
| giving me the |
\ source code! /
---------------
?\_(?)_/?
^ permalink raw reply
* [PATCH] USB: OHCI: use module_platform_driver macro
From: csmanjuvijay at gmail.com @ 2016-11-22 13:49 UTC (permalink / raw)
To: linux-arm-kernel
From: Manjunath Goudar <csmanjuvijay@gmail.com>
Use the module_platform_driver macro to do module init/exit.
This eliminates a lot of boilerplate.This also removes
checkpatch.pl errors.
Signed-off-by: Manjunath Goudar <csmanjuvijay@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Vladimir Zapolskiy <vz@mleia.com>
Cc: Sylvain Lemieux <slemieux.tyco@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-arm-kernel at lists.infradead.org
Cc: linux-usb at vger.kernel.org
Cc: linux-kernel at vger.kernel.org
---
drivers/usb/host/ohci-nxp.c | 38 ++++++++++++++------------------------
1 file changed, 14 insertions(+), 24 deletions(-)
diff --git a/drivers/usb/host/ohci-nxp.c b/drivers/usb/host/ohci-nxp.c
index b7d4756..9908647 100644
--- a/drivers/usb/host/ohci-nxp.c
+++ b/drivers/usb/host/ohci-nxp.c
@@ -56,8 +56,6 @@ static struct hc_driver __read_mostly ohci_nxp_hc_driver;
static struct i2c_client *isp1301_i2c_client;
-extern int usb_disabled(void);
-
static struct clk *usb_host_clk;
static void isp1301_configure_lpc32xx(void)
@@ -127,6 +125,7 @@ static inline void isp1301_vbus_off(void)
static void ohci_nxp_start_hc(void)
{
unsigned long tmp = __raw_readl(USB_OTG_STAT_CONTROL) | HOST_EN;
+
__raw_writel(tmp, USB_OTG_STAT_CONTROL);
isp1301_vbus_on();
}
@@ -134,6 +133,7 @@ static void ohci_nxp_start_hc(void)
static void ohci_nxp_stop_hc(void)
{
unsigned long tmp;
+
isp1301_vbus_off();
tmp = __raw_readl(USB_OTG_STAT_CONTROL) & ~HOST_EN;
__raw_writel(tmp, USB_OTG_STAT_CONTROL);
@@ -142,11 +142,17 @@ static void ohci_nxp_stop_hc(void)
static int ohci_hcd_nxp_probe(struct platform_device *pdev)
{
struct usb_hcd *hcd = 0;
- const struct hc_driver *driver = &ohci_nxp_hc_driver;
struct resource *res;
int ret = 0, irq;
struct device_node *isp1301_node;
+ if (usb_disabled())
+ return -ENODEV;
+
+ pr_info("%s: " DRIVER_DESC "\n", hcd_name);
+
+ ohci_init_driver(&ohci_nxp_hc_driver, NULL);
+
if (pdev->dev.of_node) {
isp1301_node = of_parse_phandle(pdev->dev.of_node,
"transceiver", 0);
@@ -155,9 +161,8 @@ static int ohci_hcd_nxp_probe(struct platform_device *pdev)
}
isp1301_i2c_client = isp1301_get_client(isp1301_node);
- if (!isp1301_i2c_client) {
+ if (!isp1301_i2c_client)
return -EPROBE_DEFER;
- }
ret = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
if (ret)
@@ -165,6 +170,7 @@ static int ohci_hcd_nxp_probe(struct platform_device *pdev)
dev_dbg(&pdev->dev, "%s: " DRIVER_DESC " (nxp)\n", hcd_name);
if (usb_disabled()) {
+
dev_err(&pdev->dev, "USB is disabled\n");
ret = -ENODEV;
goto fail_disable;
@@ -186,7 +192,8 @@ static int ohci_hcd_nxp_probe(struct platform_device *pdev)
isp1301_configure();
- hcd = usb_create_hcd(driver, &pdev->dev, dev_name(&pdev->dev));
+ hcd = usb_create_hcd(&ohci_nxp_hc_driver, &pdev->dev,
+ dev_name(&pdev->dev));
if (!hcd) {
dev_err(&pdev->dev, "Failed to allocate HC buffer\n");
ret = -ENOMEM;
@@ -260,24 +267,7 @@ static struct platform_driver ohci_hcd_nxp_driver = {
.probe = ohci_hcd_nxp_probe,
.remove = ohci_hcd_nxp_remove,
};
-
-static int __init ohci_nxp_init(void)
-{
- if (usb_disabled())
- return -ENODEV;
-
- pr_info("%s: " DRIVER_DESC "\n", hcd_name);
-
- ohci_init_driver(&ohci_nxp_hc_driver, NULL);
- return platform_driver_register(&ohci_hcd_nxp_driver);
-}
-module_init(ohci_nxp_init);
-
-static void __exit ohci_nxp_cleanup(void)
-{
- platform_driver_unregister(&ohci_hcd_nxp_driver);
-}
-module_exit(ohci_nxp_cleanup);
+module_platform_driver(ohci_hcd_nxp_driver);
MODULE_DESCRIPTION(DRIVER_DESC);
MODULE_LICENSE("GPL v2");
--
2.7.4
^ permalink raw reply related
* [PATCH v3] arm64/crypto: Accelerated CRC T10 DIF computation
From: Ard Biesheuvel @ 2016-11-22 13:38 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <CAKv+Gu9x+Z6Wx_jXQvFBuD4nQFF7rhz7M0KZmPWqUgfsfVz-OA@mail.gmail.com>
On 22 November 2016 at 12:53, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 22 November 2016 at 10:14, YueHaibing <yuehaibing@huawei.com> wrote:
>> This is the ARM64 CRC T10 DIF transform accelerated with the ARMv8
>> NEON instruction.The config CRYPTO_CRCT10DIF_NEON should be turned
>> on to enable the feature.The crc_t10dif crypto library function will
>> use this faster algorithm when crct10dif_neon module is loaded.
>>
>
> What is this algorithm commonly used for? In other words, why is it a
> good idea to add support for this algorithm to the kernel?
>
>> Tcrypt benchmark results:
>>
>> HIP06 (mode=320 sec=2)
>>
>> The ratio of bytes/sec crct10dif-neon Vs. crct10dif-generic:
>>
>> TEST neon generic ratio
>> 16 byte blocks, 16 bytes per update, 1 updates 214506112 171095400 1.25
>> 64 byte blocks, 16 bytes per update, 4 updates 139385312 119036352 1.17
>> 64 byte blocks, 64 bytes per update, 1 updates 671523712 198945344 3.38
>> 256 byte blocks, 16 bytes per update, 16 updates 157674880 125146752 1.26
>> 256 byte blocks, 64 bytes per update, 4 updates 491888128 175764096 2.80
>> 256 byte blocks, 256 bytes per update, 1 updates 2123298176 206995200 10.26
>> 1024 byte blocks, 16 bytes per update, 64 updates 161243136 126460416 1.28
>> 1024 byte blocks, 256 bytes per update, 4 updates 1643020800 200027136 8.21
>> 1024 byte blocks, 1024 bytes per update, 1 updates 4238239232 209106432 20.27
>> 2048 byte blocks, 16 bytes per update, 128 updates 162079744 126953472 1.28
>> 2048 byte blocks, 256 bytes per update, 8 updates 1693587456 200867840 8.43
>> 2048 byte blocks, 1024 bytes per update, 2 updates 3424323584 206330880 16.60
>> 2048 byte blocks, 2048 bytes per update, 1 updates 5228207104 208620544 25.06
>> 4096 byte blocks, 16 bytes per update, 256 updates 162304000 126894080 1.28
>> 4096 byte blocks, 256 bytes per update, 16 updates 1731862528 201197568 8.61
>> 4096 byte blocks, 1024 bytes per update, 4 updates 3668625408 207003648 17.72
>> 4096 byte blocks, 4096 bytes per update, 1 updates 5551239168 209127424 26.54
>> 8192 byte blocks, 16 bytes per update, 512 updates 162779136 126984192 1.28
>> 8192 byte blocks, 256 bytes per update, 32 updates 1753702400 201420800 8.71
>> 8192 byte blocks, 1024 bytes per update, 8 updates 3760918528 207351808 18.14
>> 8192 byte blocks, 4096 bytes per update, 2 updates 5483655168 208928768 26.25
>> 8192 byte blocks, 8192 bytes per update, 1 updates 5623377920 209108992 26.89
>>
>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>> Signed-off-by: YangShengkai <yangshengkai@huawei.com>
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
>>
>> ---
>> arch/arm64/crypto/Kconfig | 5 +
>> arch/arm64/crypto/Makefile | 4 +
>> arch/arm64/crypto/crct10dif-neon-asm_64.S | 751 ++++++++++++++++++++++++++++++
>> arch/arm64/crypto/crct10dif-neon_glue.c | 115 +++++
>> 4 files changed, 875 insertions(+)
>> create mode 100644 arch/arm64/crypto/crct10dif-neon-asm_64.S
>> create mode 100644 arch/arm64/crypto/crct10dif-neon_glue.c
>>
>> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
>> index 2cf32e9..2e450bf 100644
>> --- a/arch/arm64/crypto/Kconfig
>> +++ b/arch/arm64/crypto/Kconfig
>> @@ -23,6 +23,11 @@ config CRYPTO_GHASH_ARM64_CE
>> depends on ARM64 && KERNEL_MODE_NEON
>> select CRYPTO_HASH
>>
>> +config CRYPTO_CRCT10DIF_NEON
>> + tristate "CRCT10DIF hardware acceleration using NEON instructions"
>> + depends on ARM64 && KERNEL_MODE_NEON
>> + select CRYPTO_HASH
>> +
>
> Could you please follow the existing pattern:
>
> config CRYPTO_CRCT10DIF_ARM64_NEON
>
>
>> config CRYPTO_AES_ARM64_CE
>> tristate "AES core cipher using ARMv8 Crypto Extensions"
>> depends on ARM64 && KERNEL_MODE_NEON
>> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
>> index abb79b3..6c9ff2c 100644
>> --- a/arch/arm64/crypto/Makefile
>> +++ b/arch/arm64/crypto/Makefile
>> @@ -29,6 +29,10 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
>> obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
>> aes-neon-blk-y := aes-glue-neon.o aes-neon.o
>>
>> +obj-$(CONFIG_CRYPTO_CRCT10DIF_NEON) += crct10dif-neon.o
>> +crct10dif-neon-y := crct10dif-neon-asm_64.o crct10dif-neon_glue.o
>> +AFLAGS_crct10dif-neon-asm_64.o := -march=armv8-a+crypto
>> +
>
> Please drop this line, and add
>
> .cpu generic+crypto
>
> to the .S file
>
>> AFLAGS_aes-ce.o := -DINTERLEAVE=4
>> AFLAGS_aes-neon.o := -DINTERLEAVE=4
>>
>> diff --git a/arch/arm64/crypto/crct10dif-neon-asm_64.S b/arch/arm64/crypto/crct10dif-neon-asm_64.S
>> new file mode 100644
>> index 0000000..2ae3033
>> --- /dev/null
>> +++ b/arch/arm64/crypto/crct10dif-neon-asm_64.S
>> @@ -0,0 +1,751 @@
>> +/*
>> + * Copyright (c) 2016-2017 Hisilicon Limited.
>> + *
>
> Please drop the 2017 here.
>
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + */
>> +
>> +#include <linux/linkage.h>
>> +#include <asm/assembler.h>
>> +
>> +.global crc_t10dif_neon
>
> Please drop this .global, and use ENTRY() below
>
>> +.text
>> +
>> +/* X0 is initial CRC value
>> + * X1 is data buffer
>> + * X2 is the length of buffer
>> + * X3 is the backup buffer(for extend)
>> + * X4 for other extend parameter(for extend)
>> + * Q0, Q1, Q2, Q3 maybe as parameter for other functions,
>> + * the value of Q0, Q1, Q2, Q3 maybe modified.
>> + *
>> + * suggestion:
>> + * 1. dont use general purpose register for calculation
>> + * 2. set data endianness outside of the kernel
>> + * 3. use ext as shifting around
>> + * 4. dont use LD3/LD4, ST3/ST4
>> + */
>> +
>
>
> Whose suggestions are these, and why? I do appreciate comments like
> this, but only if I can learn something from it
>
>> +crc_t10dif_neon:
>
> ENTRY()
>
>> + /* push the register to stack that CRC16 will use */
>> + STP X5, X6, [sp, #-0x10]!
>
> Please use an ordinary stack frame, i.e.,
>
> stp x29, x30, [sp, #-xxx]!
> mov x29, sp
>
> where xxx is the entire allocation you need for stacking callee save registers
>
>> + STP X7, X8, [sp, #-0x10]!
>> + STP X9, X10, [sp, #-0x10]!
>> + STP X11, X12, [sp, #-0x10]!
>> + STP X13, X14, [sp, #-0x10]!
>
> These are not callee save, so no need to stack them
>
>> + STP Q10, Q11, [sp, #-0x20]!
>> + STP Q12, Q13, [sp, #-0x20]!
>> + STP Q4, Q5, [sp, #-0x20]!
>> + STP Q6, Q7, [sp, #-0x20]!
>> + STP Q8, Q9, [sp, #-0x20]!
>> + STP Q14, Q15, [sp, #-0x20]!
>> + STP Q16, Q17, [sp, #-0x20]!
>> + STP Q18, Q19, [sp, #-0x20]!
>> +
>
> What is the point of stacking the NEON registers? Also, as a general
> note, could you switch to lower case throughout the file?
>
>
>> + SUB sp,sp,#0x20
>> +
>
> Please account for locals in the allocation above. Only outgoing
> arguments should be allocated below the frame pointer
>
>
>> + MOV X11, #0 // PUSH STACK FLAG
>> +
>
> What does this comment mean?
>
>> + CMP X2, #0x80
>> + B.LT 2f // _less_than_128, <128
>> +
>
> Redundant comment
>
>> + /* V10/V11/V12/V13 is 128bit.
>> + * we get data 512bit( by cacheline ) each time
>> + */
>> + LDP Q10, Q11, [X1], #0x20
>> + LDP Q12, Q13, [X1], #0x20
>> +
>> + /* move the initial value to V6 register */
>> + LSL X0, X0, #48
>> + EOR V6.16B, V6.16B, V6.16B
>> + MOV V6.D[1], X0
>> +
>> + /* big-little end change. because the data in memory is little-end,
>> + * we deal the data for bigend
>> + */
>> +
>
> What if I am using a big endian kernel? Hint: you probably need to
> wrap these in CPU_LE()
>
>> + REV64 V10.16B, V10.16B
>> + REV64 V11.16B, V11.16B
>> + REV64 V12.16B, V12.16B
>> + REV64 V13.16B, V13.16B
>> + EXT V10.16B, V10.16B, V10.16B, #8
>> + EXT V11.16B, V11.16B, V11.16B, #8
>> + EXT V12.16B, V12.16B, V12.16B, #8
>> + EXT V13.16B, V13.16B, V13.16B, #8
>> +
>> + EOR V10.16B, V10.16B, V6.16B
>> +
>> + SUB X2, X2, #0x80
>> + ADD X5, X1, #0x20
>> +
>> + /* deal data when the size of buffer bigger than 128 bytes */
>> + /* _fold_64_B_loop */
>> + LDR Q6,=0xe658000000000000044c000000000000
>
> Could you move all these non-trivial constants to a separate location
> (after the end of the function), and name them?
>
>> +1:
>> +
>> + LDP Q16, Q17, [X1] ,#0x40
>> + LDP Q18, Q19, [X5], #0x40
>> +
>> + /* carry-less multiply.
>> + * V10 high-64bits carry-less multiply
>> + * V6 high-64bits(PMULL2)
>> + * V11 low-64bits carry-less multiply V6 low-64bits(PMULL)
>> + */
>> +
>> + PMULL2 V4.1Q, V10.2D, V6.2D
>> + PMULL V10.1Q, V10.1D, V6.1D
>> + PMULL2 V5.1Q, V11.2D, V6.2D
>> + PMULL V11.1Q, V11.1D, V6.1D
>> +
>
> These instructions are only available if you have the PMULL extension,
> so this algorithm is not plain NEON.
>
>> + REV64 V16.16B, V16.16B
>> + REV64 V17.16B, V17.16B
>> + REV64 V18.16B, V18.16B
>> + REV64 V19.16B, V19.16B
>> +
>
> Endian swap on LE only?
>
>> + PMULL2 V14.1Q, V12.2D, V6.2D
>> + PMULL V12.1Q, V12.1D, V6.1D
>> + PMULL2 V15.1Q, V13.2D, V6.2D
>> + PMULL V13.1Q, V13.1D, V6.1D
>> +
>> + EXT V16.16B, V16.16B, V16.16B, #8
>> + EOR V10.16B, V10.16B, V4.16B
>> +
>> + EXT V17.16B, V17.16B, V17.16B, #8
>> + EOR V11.16B, V11.16B, V5.16B
>> +
>> + EXT V18.16B, V18.16B, V18.16B, #8
>> + EOR V12.16B, V12.16B, V14.16B
>> +
>> + EXT V19.16B, V19.16B, V19.16B, #8
>> + EOR V13.16B, V13.16B, V15.16B
>> +
>> + SUB X2, X2, #0x40
>> +
>> +
>> + EOR V10.16B, V10.16B, V16.16B
>> + EOR V11.16B, V11.16B, V17.16B
>> +
>> + EOR V12.16B, V12.16B, V18.16B
>> + EOR V13.16B, V13.16B, V19.16B
>> +
>> + CMP X2, #0x0
>> + B.GE 1b // >=0
>> +
>> + LDR Q6, =0x06df0000000000002d56000000000000
>> + MOV V4.16B, V10.16B
>> + /* V10 carry-less 0x06df000000000000([127:64]*[127:64]) */
>> + PMULL V4.1Q, V4.1D, V6.1D //switch PMULL & PMULL2 order
>> + PMULL2 V10.1Q, V10.2D, V6.2D
>> + EOR V11.16B, V11.16B, V4.16B
>> + EOR V11.16B, V11.16B, V10.16B
>> +
>> + MOV V4.16B, V11.16B
>> + PMULL V4.1Q, V4.1D, V6.1D //switch PMULL & PMULL2 order
>> + PMULL2 V11.1Q, V11.2D, V6.2D
>> + EOR V12.16B, V12.16B, V4.16B
>> + EOR V12.16B, V12.16B, V11.16B
>> +
>> + MOV V4.16B, V12.16B
>> + PMULL V4.1Q, V4.1D, V6.1D //switch PMULL & PMULL2 order
>> + PMULL2 V12.1Q, V12.2D, V6.2D
>> + EOR V13.16B, V13.16B, V4.16B
>> + EOR V13.16B, V13.16B, V12.16B
>> +
>> + ADD X2, X2, #48
>> + CMP X2, #0x0
>> + B.LT 3f // _final_reduction_for_128, <0
>> +
>> + /* _16B_reduction_loop */
>> +4:
>> + /* unrelated load as early as possible*/
>> + LDR Q10, [X1], #0x10
>> +
>> + MOV V4.16B, V13.16B
>> + PMULL2 V13.1Q, V13.2D, V6.2D
>> + PMULL V4.1Q, V4.1D, V6.1D
>> + EOR V13.16B, V13.16B, V4.16B
>> +
>> + REV64 V10.16B, V10.16B
>> + EXT V10.16B, V10.16B, V10.16B, #8
>> +
>> + EOR V13.16B, V13.16B, V10.16B
>> +
>> + SUB X2, X2, #0x10
>> + CMP X2, #0x0
>> + B.GE 4b // _16B_reduction_loop, >=0
>> +
>> + /* _final_reduction_for_128 */
>> +3: ADD X2, X2, #0x10
>> + CMP X2, #0x0
>> + B.EQ 5f // _128_done, ==0
>> +
>> + /* _get_last_two_xmms */
>
> Bogus comment. I guess you ported this code from x86, are you sure you
> don't need to credit the original author?
>
>> +6: MOV V12.16B, V13.16B
>> + SUB X1, X1, #0x10
>> + ADD X1, X1, X2
>> + LDR Q11, [X1], #0x10
>> + REV64 V11.16B, V11.16B
>> + EXT V11.16B, V11.16B, V11.16B, #8
>> +
>> + CMP X2, #8
>> + B.EQ 50f
>> + B.LT 51f
>> + B.GT 52f
>> +
>> +50:
>> + /* dont use X register as temp one */
>> + FMOV D14, D12
>> + MOVI D12, #0
>> + MOV V12.D[1],V14.D[0]
>> + B 53f
>> +51:
>> + MOV X9, #64
>> + LSL X13, X2, #3 // <<3 equal x8
>> + SUB X9, X9, X13
>> + MOV X5, V12.D[0] // low 64-bit
>> + MOV X6, V12.D[1] // high 64-bit
>> + LSR X10, X5, X9 // high bit of low 64-bit
>> + LSL X7, X5, X13
>> + LSL X8, X6, X13
>> + ORR X8, X8, X10 // combination of high 64-bit
>> + MOV V12.D[1], X8
>> + MOV V12.D[0], X7
>> +
>> + B 53f
>> +52:
>> + LSL X13, X2, #3 // <<3 equal x8
>> + SUB X13, X13, #64
>> +
>> + DUP V18.2D, X13
>> + FMOV D16, D12
>> + USHL D16, D16, D18
>> + EXT V12.16B, V16.16B, V16.16B, #8
>> +
>> +53:
>> + MOVI D14, #0 //add one zero constant
>> +
>> + CMP X2, #0
>> + B.EQ 30f
>> + CMP X2, #1
>> + B.EQ 31f
>> + CMP X2, #2
>> + B.EQ 32f
>> + CMP X2, #3
>> + B.EQ 33f
>> + CMP X2, #4
>> + B.EQ 34f
>> + CMP X2, #5
>> + B.EQ 35f
>> + CMP X2, #6
>> + B.EQ 36f
>> + CMP X2, #7
>> + B.EQ 37f
>> + CMP X2, #8
>> + B.EQ 38f
>> + CMP X2, #9
>> + B.EQ 39f
>> + CMP X2, #10
>> + B.EQ 40f
>> + CMP X2, #11
>> + B.EQ 41f
>> + CMP X2, #12
>> + B.EQ 42f
>> + CMP X2, #13
>> + B.EQ 43f
>> + CMP X2, #14
>> + B.EQ 44f
>> + CMP X2, #15
>> + B.EQ 45f
>> +
>
> This looks awful. If you make the snippets below a fixed size, you
> could use a computed goto instead
>
>> + // >> 128bit
>> +30:
>> + EOR V13.16B, V13.16B, V13.16B
>> + EOR V8.16B, V8.16B, V8.16B
>> + LDR Q9,=0xffffffffffffffffffffffffffffffff
>
> Shouldn't you initialize q8 here as well.
Never mind, I just spotted the EOR which intializes it to 0x0
> And in general, couldn't you
> use some kind of shift to generate these constants (in all cases
> below)?
>> + B 46f
>> +
>> + // >> 120bit
>> +31:
>> + USHR V13.2D, V13.2D, #56
>> + EXT V13.16B, V13.16B, V14.16B, #8
>> + LDR Q8,=0xff
>> + LDR Q9,=0xffffffffffffffffffffffffffffff00
>> + B 46f
>> +
>> + // >> 112bit
>> +32:
>> + USHR V13.2D, V13.2D, #48
>> + EXT V13.16B, V13.16B, V14.16B, #8
>> + LDR Q8,=0xffff
>> + LDR Q9,=0xffffffffffffffffffffffffffff0000
>> + B 46f
>> +
>> + // >> 104bit
>> +33:
>> + USHR V13.2D, V13.2D, #40
>> + EXT V13.16B, V13.16B, V14.16B, #8
>> + LDR Q8,=0xffffff
>> + LDR Q9,=0xffffffffffffffffffffffffff000000
>> + B 46f
>> +
>> + // >> 96bit
>> +34:
>> + USHR V13.2D, V13.2D, #32
>> + EXT V13.16B, V13.16B, V14.16B, #8
>> + LDR Q8,=0xffffffff
>> + LDR Q9,=0xffffffffffffffffffffffff00000000
>> + B 46f
>> +
>> + // >> 88bit
>> +35:
>> + USHR V13.2D, V13.2D, #24
>> + EXT V13.16B, V13.16B, V14.16B, #8
>> + LDR Q8,=0xffffffffff
>> + LDR Q9,=0xffffffffffffffffffffff0000000000
>> + B 46f
>> +
>> + // >> 80bit
>> +36:
>> + USHR V13.2D, V13.2D, #16
>> + EXT V13.16B, V13.16B, V14.16B, #8
>> + LDR Q8,=0xffffffffffff
>> + LDR Q9,=0xffffffffffffffffffff000000000000
>> + B 46f
>> +
>> + // >> 72bit
>> +37:
>> + USHR V13.2D, V13.2D, #8
>> + EXT V13.16B, V13.16B, V14.16B, #8
>> + LDR Q8,=0xffffffffffffff
>> + LDR Q9,=0xffffffffffffffffff00000000000000
>> + B 46f
>> +
>> + // >> 64bit
>> +38:
>> + EXT V13.16B, V13.16B, V14.16B, #8
>> + LDR Q8,=0xffffffffffffffff
>> + LDR Q9,=0xffffffffffffffff0000000000000000
>> + B 46f
>> +
>> + // >> 56bit
>> +39:
>> + EXT V13.16B, V13.16B, V13.16B, #7
>> + MOV V13.S[3], V14.S[0]
>> + MOV V13.H[5], V14.H[0]
>> + MOV V13.B[9], V14.B[0]
>> +
>> + LDR Q8,=0xffffffffffffffffff
>> + LDR Q9,=0xffffffffffffff000000000000000000
>> + B 46f
>> +
>> + // >> 48bit
>> +40:
>> + EXT V13.16B, V13.16B, V13.16B, #6
>> + MOV V13.S[3], V14.S[0]
>> + MOV V13.H[5], V14.H[0]
>> +
>> + LDR Q8,=0xffffffffffffffffffff
>> + LDR Q9,=0xffffffffffff00000000000000000000
>> + B 46f
>> +
>> + // >> 40bit
>> +41:
>> + EXT V13.16B, V13.16B, V13.16B, #5
>> + MOV V13.S[3], V14.S[0]
>> + MOV V13.B[11], V14.B[0]
>> +
>> + LDR Q8,=0xffffffffffffffffffffff
>> + LDR Q9,=0xffffffffff0000000000000000000000
>> + B 46f
>> +
>> + // >> 32bit
>> +42:
>> + EXT V13.16B, V13.16B, V13.16B, #4
>> + MOV V13.S[3], V14.S[0]
>> +
>> + LDR Q8,=0xffffffffffffffffffffffff
>> + LDR Q9,=0xffffffff000000000000000000000000
>> + B 46f
>> +
>> + // >> 24bit
>> +43:
>> + EXT V13.16B, V13.16B, V13.16B, #3
>> + MOV V13.H[7], V14.H[0]
>> + MOV V13.B[13], V14.B[0]
>> +
>> + LDR Q8,=0xffffffffffffffffffffffffff
>> + LDR Q9,=0xffffff00000000000000000000000000
>> + B 46f
>> +
>> + // >> 16bit
>> +44:
>> + EXT V13.16B, V13.16B, V13.16B, #2
>> + MOV V13.H[7], V14.H[0]
>> +
>> + LDR Q8,=0xffffffffffffffffffffffffffff
>> + LDR Q9,=0xffff0000000000000000000000000000
>> + B 46f
>> +
>> + // >> 8bit
>> +45:
>> + EXT V13.16B, V13.16B, V13.16B, #1
>> + MOV V13.B[15], V14.B[0]
>> +
>> + LDR Q8,=0xffffffffffffffffffffffffffffff
>> + LDR Q9,=0xff000000000000000000000000000000
>> +
>> + // backup V12 first
>> + // pblendvb xmm1, xmm2
>
> Another remnant of the x86 version, please remove
>
>> +46:
>> + AND V12.16B, V12.16B, V9.16B
>> + AND V11.16B, V11.16B, V8.16B
>> + ORR V11.16B, V11.16B, V12.16B
>> +
>> + MOV V12.16B, V11.16B
>> + MOV V4.16B, V13.16B
>> + PMULL2 V13.1Q, V13.2D, V6.2D
>> + PMULL V4.1Q, V4.1D, V6.1D
>> + EOR V13.16B, V13.16B, V4.16B
>> + EOR V13.16B, V13.16B, V12.16B
>> +
>> + /* _128_done. we change the Q6 D[0] and D[1] */
>> +5: LDR Q6, =0x2d560000000000001368000000000000
>> + MOVI D14, #0
>> + MOV V10.16B, V13.16B
>> + PMULL2 V13.1Q, V13.2D, V6.2D
>> +
>> + MOV V10.D[1], V10.D[0]
>> + MOV V10.D[0], V14.D[0] //set zero
>> +
>> + EOR V13.16B, V13.16B, V10.16B
>> +
>> + MOV V10.16B, V13.16B
>> + LDR Q7, =0x00000000FFFFFFFFFFFFFFFFFFFFFFFF
>> + AND V10.16B, V10.16B, V7.16B
>> +
>> + MOV S13, V13.S[3]
>> +
>> + PMULL V13.1Q, V13.1D, V6.1D
>> + EOR V13.16B, V13.16B, V10.16B
>> +
>> + /* _barrett */
>
> What does '_barrett' mean?
>
>> +7: LDR Q6, =0x00000001f65a57f8000000018bb70000
>> + MOVI D14, #0
>> + MOV V10.16B, V13.16B
>> + PMULL2 V13.1Q, V13.2D, V6.2D
>> +
>> + EXT V13.16B, V13.16B, V13.16B, #12
>> + MOV V13.S[0], V14.S[0]
>> +
>> + EXT V6.16B, V6.16B, V6.16B, #8
>> + PMULL2 V13.1Q, V13.2D, V6.2D
>> +
>> + EXT V13.16B, V13.16B, V13.16B, #12
>> + MOV V13.S[0], V14.S[0]
>> +
>> + EOR V13.16B, V13.16B, V10.16B
>> + MOV X0, V13.D[0]
>> +
>> + /* _cleanup */
>> +8: MOV X14, #48
>> + LSR X0, X0, X14
>
> Why the temp x14?
>
>> +99:
>> + ADD sp, sp, #0x20
>> +
>> + LDP Q18, Q19, [sp], #0x20
>> + LDP Q16, Q17, [sp], #0x20
>> + LDP Q14, Q15, [sp], #0x20
>> +
>> + LDP Q8, Q9, [sp], #0x20
>> + LDP Q6, Q7, [sp], #0x20
>> + LDP Q4, Q5, [sp], #0x20
>> + LDP Q12, Q13, [sp], #0x20
>> + LDP Q10, Q11, [sp], #0x20
>> + LDP X13, X14, [sp], #0x10
>> + LDP X11, X12, [sp], #0x10
>> + LDP X9, X10, [sp], #0x10
>> + LDP X7, X8, [sp], #0x10
>> + LDP X5, X6, [sp], #0x10
>> +
>
> None of these registers need to be restored. The only thing you need
> (to mirror the prologue)
>
> ldp x29, x30, [sp], #xxx
> ret
>
> where xxx is the same value you used at the beginning.
>
>
>> + RET
>> +
>> + /* _less_than_128 */
>> +2: CMP X2, #32
>> + B.LT 9f // _less_than_32
>> + LDR Q6, =0x06df0000000000002d56000000000000
>> +
>> + LSL X0, X0, #48
>> + LDR Q10, =0x0
>
> Please use movi here
>
>> + MOV V10.D[1], X0
>> + LDR Q13, [X1], #0x10
>> + REV64 V13.16B, V13.16B
>> + EXT V13.16B, V13.16B, V13.16B, #8
>> +
>> + EOR V13.16B, V13.16B, V10.16B
>> +
>> + SUB X2, X2, #32
>> + B 4b
>> +
>> + /* _less_than_32 */
>> +9: CMP X2, #0
>> + B.EQ 99b // _cleanup
>
> You can use CBZ here
>
>> + LSL X0, X0, #48
>> + LDR Q10,=0x0
>
> Please use movi here
>
>> + MOV V10.D[1], X0
>> +
>> + CMP X2, #16
>> + B.EQ 10f // _exact_16_left
>> + B.LE 11f // _less_than_16_left
>> + LDR Q13, [X1], #0x10
>> +
>> + REV64 V13.16B, V13.16B
>> + EXT V13.16B, V13.16B, V13.16B, #8
>> +
>> + EOR V13.16B, V13.16B, V10.16B
>> + SUB X2, X2, #16
>> + LDR Q6, =0x06df0000000000002d56000000000000
>> + B 6b // _get_last_two_xmms
>
> Another bogus comment
>
>> +
>> + /* _less_than_16_left */
>> +11: CMP X2, #4
>> + B.LT 13f // _only_less_than_4
>> +
>> + /* backup the length of data, we used in _less_than_2_left */
>> + MOV X8, X2
>> + CMP X2, #8
>> + B.LT 14f // _less_than_8_left
>> +
>> + LDR X14, [X1], #8
>> + /* push the data to stack, we backup the data to V10 */
>> + STR X14, [sp, #0]
>> + SUB X2, X2, #8
>> + ADD X11, X11, #8
>> +
>> + /* _less_than_8_left */
>> +14: CMP X2, #4
>> + B.LT 15f // _less_than_4_left
>> +
>> + /* get 32bit data */
>> + LDR W5, [X1], #4
>> +
>> + /* push the data to stack */
>> + STR W5, [sp, X11]
>> + SUB X2, X2, #4
>> + ADD X11, X11, #4
>> +
>> + /* _less_than_4_left */
>> +15: CMP X2, #2
>> + B.LT 16f // _less_than_2_left
>> +
>> + /* get 16bits data */
>> + LDRH W6, [X1], #2
>> +
>> + /* push the data to stack */
>> + STRH W6, [sp, X11]
>> + SUB X2, X2, #2
>> + ADD X11, X11, #2
>> +
>> + /* _less_than_2_left */
>> +16:
>> + /* get 8bits data */
>> + LDRB W7, [X1], #1
>> + STRB W7, [sp, X11]
>> + ADD X11, X11, #1
>> +
>> + /* POP data from stack, store to V13 */
>> + LDR Q13, [sp]
>> + MOVI D14, #0
>> + REV64 V13.16B, V13.16B
>> + MOV V8.16B, V13.16B
>> + MOV V13.D[1], V8.D[0]
>> + MOV V13.D[0], V8.D[1]
>> +
>> + EOR V13.16B, V13.16B, V10.16B
>> + CMP X8, #15
>> + B.EQ 80f
>> + CMP X8, #14
>> + B.EQ 81f
>> + CMP X8, #13
>> + B.EQ 82f
>> + CMP X8, #12
>> + B.EQ 83f
>> + CMP X8, #11
>> + B.EQ 84f
>> + CMP X8, #10
>> + B.EQ 85f
>> + CMP X8, #9
>> + B.EQ 86f
>> + CMP X8, #8
>> + B.EQ 87f
>> + CMP X8, #7
>> + B.EQ 88f
>> + CMP X8, #6
>> + B.EQ 89f
>> + CMP X8, #5
>> + B.EQ 90f
>> + CMP X8, #4
>> + B.EQ 91f
>> + CMP X8, #3
>> + B.EQ 92f
>> + CMP X8, #2
>> + B.EQ 93f
>> + CMP X8, #1
>> + B.EQ 94f
>> + CMP X8, #0
>> + B.EQ 95f
>> +
>
> Again, please use a computed goto instead
>
>> +80:
>> + EXT V13.16B, V13.16B, V13.16B, #1
>> + MOV V13.B[15], V14.B[0]
>> + B 5b
>> +
>> +81:
>> + EXT V13.16B, V13.16B, V13.16B, #2
>> + MOV V13.H[7], V14.H[0]
>> + B 5b
>> +
>> +82:
>> + EXT V13.16B, V13.16B, V13.16B, #3
>> + MOV V13.H[7], V14.H[0]
>> + MOV V13.B[13], V14.B[0]
>> + B 5b
>> +83:
>> +
>> + EXT V13.16B, V13.16B, V13.16B, #4
>> + MOV V13.S[3], V14.S[0]
>> + B 5b
>> +
>> +84:
>> + EXT V13.16B, V13.16B, V13.16B, #5
>> + MOV V13.S[3], V14.S[0]
>> + MOV V13.B[11], V14.B[0]
>> + B 5b
>> +
>> +85:
>> + EXT V13.16B, V13.16B, V13.16B, #6
>> + MOV V13.S[3], V14.S[0]
>> + MOV V13.H[5], V14.H[0]
>> + B 5b
>> +
>> +86:
>> + EXT V13.16B, V13.16B, V13.16B, #7
>> + MOV V13.S[3], V14.S[0]
>> + MOV V13.H[5], V14.H[0]
>> + MOV V13.B[9], V14.B[0]
>> + B 5b
>> +
>> +87:
>> + MOV V13.D[0], V13.D[1]
>> + MOV V13.D[1], V14.D[0]
>> + B 5b
>> +
>> +88:
>> + EXT V13.16B, V13.16B, V13.16B, #9
>> + MOV V13.D[1], V14.D[0]
>> + MOV V13.B[7], V14.B[0]
>> + B 5b
>> +
>> +89:
>> + EXT V13.16B, V13.16B, V13.16B, #10
>> + MOV V13.D[1], V14.D[0]
>> + MOV V13.H[3], V14.H[0]
>> + B 5b
>> +
>> +90:
>> + EXT V13.16B, V13.16B, V13.16B, #11
>> + MOV V13.D[1], V14.D[0]
>> + MOV V13.H[3], V14.H[0]
>> + MOV V13.B[5], V14.B[0]
>> + B 5b
>> +
>> +91:
>> + MOV V13.S[0], V13.S[3]
>> + MOV V13.D[1], V14.D[0]
>> + MOV V13.S[1], V14.S[0]
>> + B 5b
>> +
>> +92:
>> + EXT V13.16B, V13.16B, V13.16B, #13
>> + MOV V13.D[1], V14.D[0]
>> + MOV V13.S[1], V14.S[0]
>> + MOV V13.B[3], V14.B[0]
>> + B 5b
>> +
>> +93:
>> + MOV V15.H[0], V13.H[7]
>> + MOV V13.16B, V14.16B
>> + MOV V13.H[0], V15.H[0]
>> + B 5b
>> +
>> +94:
>> + MOV V15.B[0], V13.B[15]
>> + MOV V13.16B, V14.16B
>> + MOV V13.B[0], V15.B[0]
>> + B 5b
>> +
>> +95:
>> + LDR Q13,=0x0
>
> movi
>
>> + B 5b // _128_done
>> +
>> + /* _exact_16_left */
>> +10:
>> + LD1 { V13.2D }, [X1], #0x10
>> +
>> + REV64 V13.16B, V13.16B
>> + EXT V13.16B, V13.16B, V13.16B, #8
>> + EOR V13.16B, V13.16B, V10.16B
>> + B 5b // _128_done
>> +
>> + /* _only_less_than_4 */
>> +13: CMP X2, #3
>> + MOVI D14, #0
>> + B.LT 17f //_only_less_than_3
>> +
>> + LDR S13, [X1], #4
>> + MOV V13.B[15], V13.B[0]
>> + MOV V13.B[14], V13.B[1]
>> + MOV V13.B[13], V13.B[2]
>> + MOV V13.S[0], V13.S[1]
>> +
>> + EOR V13.16B, V13.16B, V10.16B
>> +
>> + EXT V13.16B, V13.16B, V13.16B, #5
>> +
>> + MOV V13.S[3], V14.S[0]
>> + MOV V13.B[11], V14.B[0]
>> +
>> + B 7b // _barrett
>> + /* _only_less_than_3 */
>> +17:
>> + CMP X2, #2
>> + B.LT 18f // _only_less_than_2
>> +
>> + LDR H13, [X1], #2
>> + MOV V13.B[15], V13.B[0]
>> + MOV V13.B[14], V13.B[1]
>> + MOV V13.H[0], V13.H[1]
>> +
>> + EOR V13.16B, V13.16B, V10.16B
>> +
>> + EXT V13.16B, V13.16B, V13.16B, #6
>> + MOV V13.S[3], V14.S[0]
>> + MOV V13.H[5], V14.H[0]
>> +
>> + B 7b // _barrett
>> +
>> + /* _only_less_than_2 */
>> +18:
>> + LDRB W7, [X1], #1
>> + LDR Q13, = 0x0
>> + MOV V13.B[15], W7
>> +
>> + EOR V13.16B, V13.16B, V10.16B
>> +
>> + EXT V13.16B, V13.16B, V13.16B, #7
>> + MOV V13.S[3], V14.S[0]
>> + MOV V13.H[5], V14.H[0]
>> + MOV V13.B[9], V14.B[0]
>> +
>> + B 7b // _barrett
>
> Please end with ENDPROC()
>
>> diff --git a/arch/arm64/crypto/crct10dif-neon_glue.c b/arch/arm64/crypto/crct10dif-neon_glue.c
>> new file mode 100644
>> index 0000000..a6d8c5c
>> --- /dev/null
>> +++ b/arch/arm64/crypto/crct10dif-neon_glue.c
>> @@ -0,0 +1,115 @@
>> +/*
>> + * Copyright (c) 2016-2017 Hisilicon Limited.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + */
>> +
>> +
>> +#include <linux/types.h>
>> +#include <linux/module.h>
>> +#include <linux/crc-t10dif.h>
>> +#include <crypto/internal/hash.h>
>> +#include <linux/init.h>
>> +#include <linux/string.h>
>> +#include <linux/kernel.h>
>> +
>> +asmlinkage __u16 crc_t10dif_neon(__u16 crc, const unsigned char *buf,
>> + size_t len);
>> +
>> +struct chksum_desc_ctx {
>> + __u16 crc;
>> +};
>> +
>> +/*
>> + * Steps through buffer one byte at at time, calculates reflected
>> + * crc using table.
>> + */
>> +
>
> Where is the table?
>
>> +static int chksum_init(struct shash_desc *desc)
>> +{
>> + struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
>> +
>> + ctx->crc = 0;
>> +
>> + return 0;
>> +}
>> +
>> +static int chksum_update(struct shash_desc *desc, const u8 *data,
>> + unsigned int length)
>> +{
>> + struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
>> +
>> + ctx->crc = crc_t10dif_neon(ctx->crc, data, length);
>
> You need kernel_neon_begin/kernel_neon_end here
>
>> + return 0;
>> +}
>> +
>> +static int chksum_final(struct shash_desc *desc, u8 *out)
>> +{
>> + struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
>> +
>> + *(__u16 *)out = ctx->crc;
>> + return 0;
>> +}
>> +
>> +static int __chksum_finup(__u16 *crcp, const u8 *data, unsigned int len,
>> + u8 *out)
>> +{
>> + *(__u16 *)out = crc_t10dif_neon(*crcp, data, len);
>
> ... and here
>
>> + return 0;
>> +}
>> +
>> +static int chksum_finup(struct shash_desc *desc, const u8 *data,
>> + unsigned int len, u8 *out)
>> +{
>> + struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
>> +
>> + return __chksum_finup(&ctx->crc, data, len, out);
>> +}
>> +
>> +static int chksum_digest(struct shash_desc *desc, const u8 *data,
>> + unsigned int length, u8 *out)
>> +{
>> + struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
>> +
>> + return __chksum_finup(&ctx->crc, data, length, out);
>> +}
>> +
>> +static struct shash_alg alg = {
>> + .digestsize = CRC_T10DIF_DIGEST_SIZE,
>> + .init = chksum_init,
>> + .update = chksum_update,
>> + .final = chksum_final,
>> + .finup = chksum_finup,
>> + .digest = chksum_digest,
>> + .descsize = sizeof(struct chksum_desc_ctx),
>> + .base = {
>> + .cra_name = "crct10dif",
>> + .cra_driver_name = "crct10dif-neon",
>> + .cra_priority = 200,
>> + .cra_blocksize = CRC_T10DIF_BLOCK_SIZE,
>> + .cra_module = THIS_MODULE,
>
>
> Please align the = characters here, and add only a single space after
>
> Note that you can do
>
> .base.cra_name = xxx,
> .base.xxx
>
> as well.
>
>> + }
>> +};
>> +
>> +static int __init crct10dif_arm64_mod_init(void)
>> +{
>> + return crypto_register_shash(&alg);
>
> You need to check here if your CPU has support for the 64x64->128
> PMULL instruction.
>
>> +}
>> +
>> +static void __exit crct10dif_arm64_mod_fini(void)
>> +{
>> + crypto_unregister_shash(&alg);
>> +}
>> +
>> +module_init(crct10dif_arm64_mod_init);
>> +module_exit(crct10dif_arm64_mod_fini);
>> +
>> +MODULE_AUTHOR("YueHaibing <yuehaibing@huawei.com>");
>> +MODULE_DESCRIPTION("T10 DIF CRC calculation accelerated with ARM64 NEON instruction.");
>> +MODULE_LICENSE("GPL");
>> +
>> +MODULE_ALIAS_CRYPTO("crct10dif");
>> +MODULE_ALIAS_CRYPTO("crct10dif-neon");
>> --
>> 2.7.0
>>
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* [RFC PATCH 06/11] ARM: tlbflush: drop dependency on CONFIG_SMP
From: Vladimir Murzin @ 2016-11-22 13:36 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161122100345.GV1041@n2100.armlinux.org.uk>
On 22/11/16 10:03, Russell King - ARM Linux wrote:
> On Tue, Nov 22, 2016 at 09:26:03AM +0000, Vladimir Murzin wrote:
>> It can be referenced in UP case as well.
>
> What's missing is an explanation of why you want this change.
> Exposing the local_* stuff doesn't make sense for UP.
It comes from:
arch/arm/mach-mvebu/pmsu.c: In function 'armada_370_xp_pmsu_idle_enter':
arch/arm/mach-mvebu/pmsu.c:291:2: error: implicit declaration of function 'local_flush_tlb_all' [-Werror=implicit-function-declaration]
local_flush_tlb_all();
^
make[1]: *** [arch/arm/mach-mvebu/pmsu.o] Error 1
and
arch/arm/mach-imx/pm-imx5.c: In function 'mx5_suspend_enter':
arch/arm/mach-imx/pm-imx5.c:227:3: error: implicit declaration of function 'local_flush_tlb_all' [-Werror=implicit-function-declaration]
local_flush_tlb_all();
^
cc1: some warnings being treated as errors
make[1]: *** [arch/arm/mach-imx/pm-imx5.o] Error 1
Maybe there are other users, please, let me know if you want me to count them
all.
Cheers
Vladimir
>
>> Cc: Russell King <linux@armlinux.org.uk>
>> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
>> ---
>> arch/arm/include/asm/tlbflush.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
>> index def9e57..d9a6e2e 100644
>> --- a/arch/arm/include/asm/tlbflush.h
>> +++ b/arch/arm/include/asm/tlbflush.h
>> @@ -641,7 +641,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>>
>> #endif
>>
>> -#elif defined(CONFIG_SMP) /* !CONFIG_MMU */
>> +#else /* !CONFIG_MMU */
>>
>> #ifndef __ASSEMBLY__
>>
>> --
>> 1.7.9.5
>>
>
^ permalink raw reply
* [PATCH] PCI: Add information about describing PCI in ACPI
From: Gabriele Paoloni @ 2016-11-22 13:13 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161121201001.GA4832@bhelgaas-glaptop.roam.corp.google.com>
Hi Bjorn
> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas at kernel.org]
> Sent: 21 November 2016 20:10
> To: Gabriele Paoloni
> Cc: Bjorn Helgaas; linux-pci at vger.kernel.org; linux-
> acpi at vger.kernel.org; linux-kernel at vger.kernel.org; linux-arm-
> kernel at lists.infradead.org; linaro-acpi at lists.linaro.org
> Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
>
> On Mon, Nov 21, 2016 at 05:23:11PM +0000, Gabriele Paoloni wrote:
> > Hi Bjorn
> >
> > > -----Original Message-----
> > > From: linux-pci-owner at vger.kernel.org [mailto:linux-pci-
> > > owner at vger.kernel.org] On Behalf Of Bjorn Helgaas
> > > Sent: 21 November 2016 16:47
> > > To: Gabriele Paoloni
> > > Cc: Bjorn Helgaas; linux-pci at vger.kernel.org; linux-
> > > acpi at vger.kernel.org; linux-kernel at vger.kernel.org; linux-arm-
> > > kernel at lists.infradead.org; linaro-acpi at lists.linaro.org
> > > Subject: Re: [PATCH] PCI: Add information about describing PCI in
> ACPI
> > >
> > > On Mon, Nov 21, 2016 at 08:52:52AM +0000, Gabriele Paoloni wrote:
> > > > Hi Bjorn
> > > >
> > > > > -----Original Message-----
> > > > > From: Bjorn Helgaas [mailto:helgaas at kernel.org]
> > > > > Sent: 18 November 2016 17:54
> > > > > To: Gabriele Paoloni
> > > > > Cc: Bjorn Helgaas; linux-pci at vger.kernel.org; linux-
> > > > > acpi at vger.kernel.org; linux-kernel at vger.kernel.org; linux-arm-
> > > > > kernel at lists.infradead.org; linaro-acpi at lists.linaro.org
> > > > > Subject: Re: [PATCH] PCI: Add information about describing PCI
> in
> > > ACPI
> > > > >
> > > > > On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni
> wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: linux-kernel-owner at vger.kernel.org [mailto:linux-
> kernel-
> > > > > > > owner at vger.kernel.org] On Behalf Of Bjorn Helgaas
> > > > > > > Sent: 17 November 2016 18:00
> > > > >
> > > > > > > +Static tables like MCFG, HPET, ECDT, etc., are *not*
> > > mechanisms
> > > > > for
> > > > > > > +reserving address space! The static tables are for things
> the
> > > OS
> > > > > > > +needs to know early in boot, before it can parse the ACPI
> > > > > namespace.
> > > > > > > +If a new table is defined, an old OS needs to operate
> > > correctly
> > > > > even
> > > > > > > +though it ignores the table. _CRS allows that because it
> is
> > > > > generic
> > > > > > > +and understood by the old OS; a static table does not.
> > > > > >
> > > > > > Right so if my understanding is correct you are saying that
> > > resources
> > > > > > described in the MCFG table should also be declared in
> PNP0C02
> > > > > devices
> > > > > > so that the PNP driver can reserve these resources.
> > > > >
> > > > > Yes.
> > > > >
> > > > > > On the other side the PCI Root bridge driver should not
> reserve
> > > such
> > > > > > resources.
> > > > > >
> > > > > > Well if my understanding is correct I think we have a problem
> > > here:
> > > > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
> > > > > >
> > > > > > As you can see pci_ecam_create() will conflict with the pnp
> > > driver
> > > > > > as it will try to reserve the resources from the MCFG
> table...
> > > > > >
> > > > > > Maybe we need to rework pci_ecam_create() ?
> > > > >
> > > > > I think it's OK as it is.
> > > > >
> > > > > The pnp/system.c driver does try to reserve PNP0C02 resources,
> and
> > > it
> > > > > marks them as "not busy". That way they appear in /proc/iomem
> and
> > > > > won't be allocated for anything else, but they can still be
> > > requested
> > > > > by drivers, e.g., pci/ecam.c, which will mark them "busy".
> > > > >
> > > > > This is analogous to what the PCI core does in
> > > pci_claim_resource().
> > > > > This is really a function of the ACPI/PNP *core*, which should
> > > reserve
> > > > > all _CRS resources for all devices (not just PNP0C02 devices).
> But
> > > > > it's done by pnp/system.c, and only for PNP0C02, because
> there's a
> > > > > bunch of historical baggage there.
> > > > >
> > > > > You'll also notice that in this case, things are out of order:
> > > > > logically the pnp/system.c reservation should happen first, but
> in
> > > > > fact the pci/ecam.c request happens *before* the pnp/system.c
> one.
> > > > > That means the pnp/system.c one might fail and complain "[mem
> ...]
> > > > > could not be reserved".
> > > >
> > > > Correct me if I am wrong...
> > > >
> > > > So currently we are relying on the fact that pci_ecam_create() is
> > > called
> > > > before the pnp driver.
> > > > If the pnp driver came first we would end up in pci_ecam_create()
> > > failing
> > > > here:
> > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
> > > >
> > > > I am not sure but it seems to me like a bit weak condition to
> rely
> > > on...
> > > > what about removing the error condition in pci_ecam_create() and
> > > logging
> > > > just a dev_info()?
> > >
> > > Huh. I'm confused. I *thought* it would be safe to reverse the
> > > order, which would effectively be this:
> > >
> > > system_pnp_probe
> > > reserve_resources_of_dev
> > > reserve_range
> > > request_mem_region([mem 0xb0000000-0xb1ffffff])
> > > ...
> > > pci_ecam_create
> > > request_resource_conflict([mem 0xb0000000-0xb1ffffff])
> > >
> > >
> > > but I experimented with the patch below on qemu, and it failed as
> you
> > > predicted:
> > >
> > > ** res test **
> > > requested [mem 0xa0000000-0xafffffff]
> > > can't claim ECAM area [mem 0xa0000000-0xafffffff]: conflict with
> ECAM
> > > PNP [mem 0xa0000000-0xafffffff]
> > >
> > > I expected the request_resource_conflict() to succeed since it's
> > > completely contained in the "ECAM PNP" region. But I guess I don't
> > > understand kernel/resource.c well enough.
> >
> > I think it fails because effectively the PNP driver is populating the
> > iomem_resource resource tree and therefore pci_ecam_create() finds
> that
> > it cannot add the cfg resource to the same hierarchy as it is already
> > there...
>
> Right. I'm just surprised because the PNP reservation is marked
> "not busy", and a driver (e.g., ECAM) should still be able to request
> the resource.
Yes unfortunately pci_ecam_create() is not flexible on the conflict as
pci_request_regions():
http://lxr.free-electrons.com/source/kernel/resource.c#L1155
if the conflict resource is not busy pci_request_regions() will create
a child resource under the conflict sibling and mark it as busy...
or at least this is my understanding...
>
> > > I'm not sure we need to fix anything yet, since we currently do the
> > > ecam.c request before the system.c one, and any change there would
> be
> > > a long ways off. If/when that *does* change, I think the correct
> fix
> > > would be to change ecam.c so its request succeeds (by changing the
> way
> > > it does the request, fixing kernel/resource.c, or whatever) rather
> > > than to reduce the log level and ignore the failure.
> >
> > Well in my mind I didn't want just to make the error disappear...
> > If all the resources should be reserved by the PNP driver then
> ideally
> > we could take away request_resource_conflict() from
> pci_ecam_create(),
> > but this would make buggy some systems with an already shipped BIOS
> > that relied on pci_ecam_create() reservation rather than PNP
> reservation.
>
> I don't want remove the request from ecam.c. Ideally, there should be
> TWO lines in /proc/iomem: one from system.c for "pnp 00:01" or
> whatever it is, and a second from ecam.c. The first is the generic
> one saying "this region is consumed by a piece of hardware, so don't
> put anything else here." The second is the driver-specific one saying
> "PCI ECAM owns this region, nobody else can use it."
>
> This is the same way we handle PCI BAR resources. Here are two
> examples from my laptop. The first (00:08.0) only has one line:
> it has a BAR that consumes address space, but I don't have a driver
> for it loaded. The second (00:16.0) does have a driver loaded, so it
> has a second line showing that the driver owns the space:
>
> f124a000-f124afff : 0000:00:08.0 # from PCI core
>
> f124d000-f124dfff : 0000:00:16.0 # from PCI core
> f124d000-f124dfff : mei_me # from mei_me driver
>
> > Just removing the error condition and converting dev_err() into
> > dev_info() seems to me like accommodating already shipped BIOS images
> > and flagging a reservation that is already done by somebody else
> > without compromising the functionality of the PCI Root bridge driver
> > (so far the only reason why I can see the error condition there is
> > to catch a buggy MCFG with overlapping addresses; so if this is the
> > case maybe we need to have a different diagnostic check to make sure
> > that the MCFG table is alright)
>
> Ideally I think we should end up with this:
>
> a0000000-afffffff : pnp 00:01
> a0000000-afffffff : PCI ECAM
I think that for PCIe device drivers it works ok because it is guaranteed
that their own pci_request_regions() is called always after
pci_claim_resource() of the bridge that is on top of them...
I.e. pci_claim_resource() reserves the resources as not busy and
pci_request_regions() will create a child busy resource
>
> Realistically right now we'll probably end up with only the "PCI ECAM"
> line in /proc/iomem and a warning from system.c about not being able
> to reserve the space.
>
> If we ever change things to do the generic PNP reservation first, then
> we should fix things so ecam.c can claim the space without an error.
Maybe the patch below could be a sort of solution...effectively pci_ecam
should succeed in reserving a busy resource under the conflict resource
in case of PNP driver allocating a non BUSY resource first...
---
drivers/pci/ecam.c | 16 +++++-----------
drivers/pci/host/pci-thunder-ecam.c | 2 +-
include/linux/pci-ecam.h | 2 +-
3 files changed, 7 insertions(+), 13 deletions(-)
diff --git a/drivers/pci/ecam.c b/drivers/pci/ecam.c
index 43ed08d..999b6ef 100644
--- a/drivers/pci/ecam.c
+++ b/drivers/pci/ecam.c
@@ -66,16 +66,10 @@ struct pci_config_window *pci_ecam_create(struct device *dev,
}
bsz = 1 << ops->bus_shift;
- cfg->res.start = cfgres->start;
- cfg->res.end = cfgres->end;
- cfg->res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- cfg->res.name = "PCI ECAM";
-
- conflict = request_resource_conflict(&iomem_resource, &cfg->res);
- if (conflict) {
+ cfg->res = request_mem_region(cfgres->start, resource_size(cfgres), "PCI ECAM");
+ if (!cfg->res) {
err = -EBUSY;
- dev_err(dev, "can't claim ECAM area %pR: address conflict with %s %pR\n",
- &cfg->res, conflict->name, conflict);
+ dev_err(dev, "can't claim ECAM area %pR\n", &cfg->res);
goto err_exit;
}
@@ -126,8 +120,8 @@ void pci_ecam_free(struct pci_config_window *cfg)
if (cfg->win)
iounmap(cfg->win);
}
- if (cfg->res.parent)
- release_resource(&cfg->res);
+ if (cfg->res->parent)
+ release_region(cfg->res->start, resource_size(cfg->res));
kfree(cfg);
}
diff --git a/drivers/pci/host/pci-thunder-ecam.c b/drivers/pci/host/pci-thunder-ecam.c
index d50a3dc..2e48d9d 100644
--- a/drivers/pci/host/pci-thunder-ecam.c
+++ b/drivers/pci/host/pci-thunder-ecam.c
@@ -117,7 +117,7 @@ static int thunder_ecam_p2_config_read(struct pci_bus *bus, unsigned int devfn,
* the config space access window. Since we are working with
* the high-order 32 bits, shift everything down by 32 bits.
*/
- node_bits = (cfg->res.start >> 32) & (1 << 12);
+ node_bits = (cfg->res->start >> 32) & (1 << 12);
v |= node_bits;
set_val(v, where, size, val);
diff --git a/include/linux/pci-ecam.h b/include/linux/pci-ecam.h
index 7adad20..f30a4ea 100644
--- a/include/linux/pci-ecam.h
+++ b/include/linux/pci-ecam.h
@@ -36,7 +36,7 @@ struct pci_ecam_ops {
* use ECAM.
*/
struct pci_config_window {
- struct resource res;
+ struct resource *res;
struct resource busr;
void *priv;
struct pci_ecam_ops *ops;
--
2.7.4
^ permalink raw reply related
* [PATCH v2 2/2] memory: da8xx-ddrctl: drop the call to of_flat_dt_get_machine_name()
From: Bartosz Golaszewski @ 2016-11-22 12:57 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1479819468-4016-1-git-send-email-bgolaszewski@baylibre.com>
In order to avoid a section mismatch use a locally implemented routine
instead of of_flat_dt_get_machine_name() when printing the error
message.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
---
drivers/memory/da8xx-ddrctl.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/memory/da8xx-ddrctl.c b/drivers/memory/da8xx-ddrctl.c
index a20e7bb..ee0c266 100644
--- a/drivers/memory/da8xx-ddrctl.c
+++ b/drivers/memory/da8xx-ddrctl.c
@@ -14,7 +14,6 @@
#include <linux/module.h>
#include <linux/of.h>
#include <linux/of_device.h>
-#include <linux/of_fdt.h>
#include <linux/platform_device.h>
#include <linux/io.h>
@@ -71,6 +70,25 @@ static const struct da8xx_ddrctl_board_settings da8xx_ddrctl_board_confs[] = {
},
};
+/*
+ * FIXME Remove this function once of/base gets a general routine for getting
+ * the machine model/compatible string.
+ */
+static const char *da8xx_ddrctl_machine_get_compatible(void)
+{
+ struct device_node *root;
+ const char *compatible;
+ int ret = -1;
+
+ root = of_find_node_by_path("/");
+ if (root) {
+ ret = of_property_read_string(root, "compatible", &compatible);
+ of_node_put(root);
+ }
+
+ return ret ? NULL : compatible;
+}
+
static const struct da8xx_ddrctl_config_knob *
da8xx_ddrctl_match_knob(const struct da8xx_ddrctl_setting *setting)
{
@@ -118,7 +136,7 @@ static int da8xx_ddrctl_probe(struct platform_device *pdev)
setting = da8xx_ddrctl_get_board_settings();
if (!setting) {
dev_err(dev, "no settings for board '%s'\n",
- of_flat_dt_get_machine_name());
+ da8xx_ddrctl_machine_get_compatible());
return -EINVAL;
}
--
2.9.3
^ permalink raw reply related
* [PATCH v2 1/2] bus: da8xx-mstpri: drop the call to of_flat_dt_get_machine_name()
From: Bartosz Golaszewski @ 2016-11-22 12:57 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1479819468-4016-1-git-send-email-bgolaszewski@baylibre.com>
In order to avoid a section mismatch use a locally implemented routine
instead of of_flat_dt_get_machine_name() when printing the error
message.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
---
drivers/bus/da8xx-mstpri.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/bus/da8xx-mstpri.c b/drivers/bus/da8xx-mstpri.c
index 85f0b53..bd38170 100644
--- a/drivers/bus/da8xx-mstpri.c
+++ b/drivers/bus/da8xx-mstpri.c
@@ -16,7 +16,6 @@
#include <linux/platform_device.h>
#include <linux/io.h>
#include <linux/regmap.h>
-#include <linux/of_fdt.h>
/*
* REVISIT: Linux doesn't have a good framework for the kind of performance
@@ -190,6 +189,25 @@ static const struct da8xx_mstpri_board_priorities da8xx_mstpri_board_confs[] = {
},
};
+/*
+ * FIXME Remove this function once of/base gets a general routine for getting
+ * the machine model/compatible string.
+ */
+static const char *da8xx_mstpri_machine_get_compatible(void)
+{
+ struct device_node *root;
+ const char *compatible;
+ int ret = -1;
+
+ root = of_find_node_by_path("/");
+ if (root) {
+ ret = of_property_read_string(root, "compatible", &compatible);
+ of_node_put(root);
+ }
+
+ return ret ? NULL : compatible;
+}
+
static const struct da8xx_mstpri_board_priorities *
da8xx_mstpri_get_board_prio(void)
{
@@ -227,7 +245,7 @@ static int da8xx_mstpri_probe(struct platform_device *pdev)
prio_list = da8xx_mstpri_get_board_prio();
if (!prio_list) {
dev_err(dev, "no master priotities defined for board '%s'\n",
- of_flat_dt_get_machine_name());
+ da8xx_mstpri_machine_get_compatible());
return -EINVAL;
}
--
2.9.3
^ permalink raw reply related
* [PATCH v2 0/2] da8xx: fix section mismatch in new drivers
From: Bartosz Golaszewski @ 2016-11-22 12:57 UTC (permalink / raw)
To: linux-arm-kernel
Sekhar noticed there's a section mismatch in the da8xx-mstpri and
da8xx-ddrctl drivers. This is caused by calling
of_flat_dt_get_machine_name() which has an __init annotation.
This series addresses this issue by open coding routines that return
the machine compatible string in both drivers. Once a general function
for that in of/base is merged, we'll remove them.
v1 -> v2:
- drop patch [1/3] from v1
- introduce internal routines in the drivers instead of a general
function in of/base.c
Bartosz Golaszewski (2):
bus: da8xx-mstpri: drop the call to of_flat_dt_get_machine_name()
memory: da8xx-ddrctl: drop the call to of_flat_dt_get_machine_name()
drivers/bus/da8xx-mstpri.c | 22 ++++++++++++++++++++--
drivers/memory/da8xx-ddrctl.c | 22 ++++++++++++++++++++--
2 files changed, 40 insertions(+), 4 deletions(-)
--
2.9.3
^ permalink raw reply
* [PATCH v3] arm64/crypto: Accelerated CRC T10 DIF computation
From: Ard Biesheuvel @ 2016-11-22 12:53 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20161122101455.5312-1-yuehaibing@huawei.com>
On 22 November 2016 at 10:14, YueHaibing <yuehaibing@huawei.com> wrote:
> This is the ARM64 CRC T10 DIF transform accelerated with the ARMv8
> NEON instruction.The config CRYPTO_CRCT10DIF_NEON should be turned
> on to enable the feature.The crc_t10dif crypto library function will
> use this faster algorithm when crct10dif_neon module is loaded.
>
What is this algorithm commonly used for? In other words, why is it a
good idea to add support for this algorithm to the kernel?
> Tcrypt benchmark results:
>
> HIP06 (mode=320 sec=2)
>
> The ratio of bytes/sec crct10dif-neon Vs. crct10dif-generic:
>
> TEST neon generic ratio
> 16 byte blocks, 16 bytes per update, 1 updates 214506112 171095400 1.25
> 64 byte blocks, 16 bytes per update, 4 updates 139385312 119036352 1.17
> 64 byte blocks, 64 bytes per update, 1 updates 671523712 198945344 3.38
> 256 byte blocks, 16 bytes per update, 16 updates 157674880 125146752 1.26
> 256 byte blocks, 64 bytes per update, 4 updates 491888128 175764096 2.80
> 256 byte blocks, 256 bytes per update, 1 updates 2123298176 206995200 10.26
> 1024 byte blocks, 16 bytes per update, 64 updates 161243136 126460416 1.28
> 1024 byte blocks, 256 bytes per update, 4 updates 1643020800 200027136 8.21
> 1024 byte blocks, 1024 bytes per update, 1 updates 4238239232 209106432 20.27
> 2048 byte blocks, 16 bytes per update, 128 updates 162079744 126953472 1.28
> 2048 byte blocks, 256 bytes per update, 8 updates 1693587456 200867840 8.43
> 2048 byte blocks, 1024 bytes per update, 2 updates 3424323584 206330880 16.60
> 2048 byte blocks, 2048 bytes per update, 1 updates 5228207104 208620544 25.06
> 4096 byte blocks, 16 bytes per update, 256 updates 162304000 126894080 1.28
> 4096 byte blocks, 256 bytes per update, 16 updates 1731862528 201197568 8.61
> 4096 byte blocks, 1024 bytes per update, 4 updates 3668625408 207003648 17.72
> 4096 byte blocks, 4096 bytes per update, 1 updates 5551239168 209127424 26.54
> 8192 byte blocks, 16 bytes per update, 512 updates 162779136 126984192 1.28
> 8192 byte blocks, 256 bytes per update, 32 updates 1753702400 201420800 8.71
> 8192 byte blocks, 1024 bytes per update, 8 updates 3760918528 207351808 18.14
> 8192 byte blocks, 4096 bytes per update, 2 updates 5483655168 208928768 26.25
> 8192 byte blocks, 8192 bytes per update, 1 updates 5623377920 209108992 26.89
>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> Signed-off-by: YangShengkai <yangshengkai@huawei.com>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
>
> ---
> arch/arm64/crypto/Kconfig | 5 +
> arch/arm64/crypto/Makefile | 4 +
> arch/arm64/crypto/crct10dif-neon-asm_64.S | 751 ++++++++++++++++++++++++++++++
> arch/arm64/crypto/crct10dif-neon_glue.c | 115 +++++
> 4 files changed, 875 insertions(+)
> create mode 100644 arch/arm64/crypto/crct10dif-neon-asm_64.S
> create mode 100644 arch/arm64/crypto/crct10dif-neon_glue.c
>
> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
> index 2cf32e9..2e450bf 100644
> --- a/arch/arm64/crypto/Kconfig
> +++ b/arch/arm64/crypto/Kconfig
> @@ -23,6 +23,11 @@ config CRYPTO_GHASH_ARM64_CE
> depends on ARM64 && KERNEL_MODE_NEON
> select CRYPTO_HASH
>
> +config CRYPTO_CRCT10DIF_NEON
> + tristate "CRCT10DIF hardware acceleration using NEON instructions"
> + depends on ARM64 && KERNEL_MODE_NEON
> + select CRYPTO_HASH
> +
Could you please follow the existing pattern:
config CRYPTO_CRCT10DIF_ARM64_NEON
> config CRYPTO_AES_ARM64_CE
> tristate "AES core cipher using ARMv8 Crypto Extensions"
> depends on ARM64 && KERNEL_MODE_NEON
> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> index abb79b3..6c9ff2c 100644
> --- a/arch/arm64/crypto/Makefile
> +++ b/arch/arm64/crypto/Makefile
> @@ -29,6 +29,10 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
> obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
> aes-neon-blk-y := aes-glue-neon.o aes-neon.o
>
> +obj-$(CONFIG_CRYPTO_CRCT10DIF_NEON) += crct10dif-neon.o
> +crct10dif-neon-y := crct10dif-neon-asm_64.o crct10dif-neon_glue.o
> +AFLAGS_crct10dif-neon-asm_64.o := -march=armv8-a+crypto
> +
Please drop this line, and add
.cpu generic+crypto
to the .S file
> AFLAGS_aes-ce.o := -DINTERLEAVE=4
> AFLAGS_aes-neon.o := -DINTERLEAVE=4
>
> diff --git a/arch/arm64/crypto/crct10dif-neon-asm_64.S b/arch/arm64/crypto/crct10dif-neon-asm_64.S
> new file mode 100644
> index 0000000..2ae3033
> --- /dev/null
> +++ b/arch/arm64/crypto/crct10dif-neon-asm_64.S
> @@ -0,0 +1,751 @@
> +/*
> + * Copyright (c) 2016-2017 Hisilicon Limited.
> + *
Please drop the 2017 here.
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/assembler.h>
> +
> +.global crc_t10dif_neon
Please drop this .global, and use ENTRY() below
> +.text
> +
> +/* X0 is initial CRC value
> + * X1 is data buffer
> + * X2 is the length of buffer
> + * X3 is the backup buffer(for extend)
> + * X4 for other extend parameter(for extend)
> + * Q0, Q1, Q2, Q3 maybe as parameter for other functions,
> + * the value of Q0, Q1, Q2, Q3 maybe modified.
> + *
> + * suggestion:
> + * 1. dont use general purpose register for calculation
> + * 2. set data endianness outside of the kernel
> + * 3. use ext as shifting around
> + * 4. dont use LD3/LD4, ST3/ST4
> + */
> +
Whose suggestions are these, and why? I do appreciate comments like
this, but only if I can learn something from it
> +crc_t10dif_neon:
ENTRY()
> + /* push the register to stack that CRC16 will use */
> + STP X5, X6, [sp, #-0x10]!
Please use an ordinary stack frame, i.e.,
stp x29, x30, [sp, #-xxx]!
mov x29, sp
where xxx is the entire allocation you need for stacking callee save registers
> + STP X7, X8, [sp, #-0x10]!
> + STP X9, X10, [sp, #-0x10]!
> + STP X11, X12, [sp, #-0x10]!
> + STP X13, X14, [sp, #-0x10]!
These are not callee save, so no need to stack them
> + STP Q10, Q11, [sp, #-0x20]!
> + STP Q12, Q13, [sp, #-0x20]!
> + STP Q4, Q5, [sp, #-0x20]!
> + STP Q6, Q7, [sp, #-0x20]!
> + STP Q8, Q9, [sp, #-0x20]!
> + STP Q14, Q15, [sp, #-0x20]!
> + STP Q16, Q17, [sp, #-0x20]!
> + STP Q18, Q19, [sp, #-0x20]!
> +
What is the point of stacking the NEON registers? Also, as a general
note, could you switch to lower case throughout the file?
> + SUB sp,sp,#0x20
> +
Please account for locals in the allocation above. Only outgoing
arguments should be allocated below the frame pointer
> + MOV X11, #0 // PUSH STACK FLAG
> +
What does this comment mean?
> + CMP X2, #0x80
> + B.LT 2f // _less_than_128, <128
> +
Redundant comment
> + /* V10/V11/V12/V13 is 128bit.
> + * we get data 512bit( by cacheline ) each time
> + */
> + LDP Q10, Q11, [X1], #0x20
> + LDP Q12, Q13, [X1], #0x20
> +
> + /* move the initial value to V6 register */
> + LSL X0, X0, #48
> + EOR V6.16B, V6.16B, V6.16B
> + MOV V6.D[1], X0
> +
> + /* big-little end change. because the data in memory is little-end,
> + * we deal the data for bigend
> + */
> +
What if I am using a big endian kernel? Hint: you probably need to
wrap these in CPU_LE()
> + REV64 V10.16B, V10.16B
> + REV64 V11.16B, V11.16B
> + REV64 V12.16B, V12.16B
> + REV64 V13.16B, V13.16B
> + EXT V10.16B, V10.16B, V10.16B, #8
> + EXT V11.16B, V11.16B, V11.16B, #8
> + EXT V12.16B, V12.16B, V12.16B, #8
> + EXT V13.16B, V13.16B, V13.16B, #8
> +
> + EOR V10.16B, V10.16B, V6.16B
> +
> + SUB X2, X2, #0x80
> + ADD X5, X1, #0x20
> +
> + /* deal data when the size of buffer bigger than 128 bytes */
> + /* _fold_64_B_loop */
> + LDR Q6,=0xe658000000000000044c000000000000
Could you move all these non-trivial constants to a separate location
(after the end of the function), and name them?
> +1:
> +
> + LDP Q16, Q17, [X1] ,#0x40
> + LDP Q18, Q19, [X5], #0x40
> +
> + /* carry-less multiply.
> + * V10 high-64bits carry-less multiply
> + * V6 high-64bits(PMULL2)
> + * V11 low-64bits carry-less multiply V6 low-64bits(PMULL)
> + */
> +
> + PMULL2 V4.1Q, V10.2D, V6.2D
> + PMULL V10.1Q, V10.1D, V6.1D
> + PMULL2 V5.1Q, V11.2D, V6.2D
> + PMULL V11.1Q, V11.1D, V6.1D
> +
These instructions are only available if you have the PMULL extension,
so this algorithm is not plain NEON.
> + REV64 V16.16B, V16.16B
> + REV64 V17.16B, V17.16B
> + REV64 V18.16B, V18.16B
> + REV64 V19.16B, V19.16B
> +
Endian swap on LE only?
> + PMULL2 V14.1Q, V12.2D, V6.2D
> + PMULL V12.1Q, V12.1D, V6.1D
> + PMULL2 V15.1Q, V13.2D, V6.2D
> + PMULL V13.1Q, V13.1D, V6.1D
> +
> + EXT V16.16B, V16.16B, V16.16B, #8
> + EOR V10.16B, V10.16B, V4.16B
> +
> + EXT V17.16B, V17.16B, V17.16B, #8
> + EOR V11.16B, V11.16B, V5.16B
> +
> + EXT V18.16B, V18.16B, V18.16B, #8
> + EOR V12.16B, V12.16B, V14.16B
> +
> + EXT V19.16B, V19.16B, V19.16B, #8
> + EOR V13.16B, V13.16B, V15.16B
> +
> + SUB X2, X2, #0x40
> +
> +
> + EOR V10.16B, V10.16B, V16.16B
> + EOR V11.16B, V11.16B, V17.16B
> +
> + EOR V12.16B, V12.16B, V18.16B
> + EOR V13.16B, V13.16B, V19.16B
> +
> + CMP X2, #0x0
> + B.GE 1b // >=0
> +
> + LDR Q6, =0x06df0000000000002d56000000000000
> + MOV V4.16B, V10.16B
> + /* V10 carry-less 0x06df000000000000([127:64]*[127:64]) */
> + PMULL V4.1Q, V4.1D, V6.1D //switch PMULL & PMULL2 order
> + PMULL2 V10.1Q, V10.2D, V6.2D
> + EOR V11.16B, V11.16B, V4.16B
> + EOR V11.16B, V11.16B, V10.16B
> +
> + MOV V4.16B, V11.16B
> + PMULL V4.1Q, V4.1D, V6.1D //switch PMULL & PMULL2 order
> + PMULL2 V11.1Q, V11.2D, V6.2D
> + EOR V12.16B, V12.16B, V4.16B
> + EOR V12.16B, V12.16B, V11.16B
> +
> + MOV V4.16B, V12.16B
> + PMULL V4.1Q, V4.1D, V6.1D //switch PMULL & PMULL2 order
> + PMULL2 V12.1Q, V12.2D, V6.2D
> + EOR V13.16B, V13.16B, V4.16B
> + EOR V13.16B, V13.16B, V12.16B
> +
> + ADD X2, X2, #48
> + CMP X2, #0x0
> + B.LT 3f // _final_reduction_for_128, <0
> +
> + /* _16B_reduction_loop */
> +4:
> + /* unrelated load as early as possible*/
> + LDR Q10, [X1], #0x10
> +
> + MOV V4.16B, V13.16B
> + PMULL2 V13.1Q, V13.2D, V6.2D
> + PMULL V4.1Q, V4.1D, V6.1D
> + EOR V13.16B, V13.16B, V4.16B
> +
> + REV64 V10.16B, V10.16B
> + EXT V10.16B, V10.16B, V10.16B, #8
> +
> + EOR V13.16B, V13.16B, V10.16B
> +
> + SUB X2, X2, #0x10
> + CMP X2, #0x0
> + B.GE 4b // _16B_reduction_loop, >=0
> +
> + /* _final_reduction_for_128 */
> +3: ADD X2, X2, #0x10
> + CMP X2, #0x0
> + B.EQ 5f // _128_done, ==0
> +
> + /* _get_last_two_xmms */
Bogus comment. I guess you ported this code from x86, are you sure you
don't need to credit the original author?
> +6: MOV V12.16B, V13.16B
> + SUB X1, X1, #0x10
> + ADD X1, X1, X2
> + LDR Q11, [X1], #0x10
> + REV64 V11.16B, V11.16B
> + EXT V11.16B, V11.16B, V11.16B, #8
> +
> + CMP X2, #8
> + B.EQ 50f
> + B.LT 51f
> + B.GT 52f
> +
> +50:
> + /* dont use X register as temp one */
> + FMOV D14, D12
> + MOVI D12, #0
> + MOV V12.D[1],V14.D[0]
> + B 53f
> +51:
> + MOV X9, #64
> + LSL X13, X2, #3 // <<3 equal x8
> + SUB X9, X9, X13
> + MOV X5, V12.D[0] // low 64-bit
> + MOV X6, V12.D[1] // high 64-bit
> + LSR X10, X5, X9 // high bit of low 64-bit
> + LSL X7, X5, X13
> + LSL X8, X6, X13
> + ORR X8, X8, X10 // combination of high 64-bit
> + MOV V12.D[1], X8
> + MOV V12.D[0], X7
> +
> + B 53f
> +52:
> + LSL X13, X2, #3 // <<3 equal x8
> + SUB X13, X13, #64
> +
> + DUP V18.2D, X13
> + FMOV D16, D12
> + USHL D16, D16, D18
> + EXT V12.16B, V16.16B, V16.16B, #8
> +
> +53:
> + MOVI D14, #0 //add one zero constant
> +
> + CMP X2, #0
> + B.EQ 30f
> + CMP X2, #1
> + B.EQ 31f
> + CMP X2, #2
> + B.EQ 32f
> + CMP X2, #3
> + B.EQ 33f
> + CMP X2, #4
> + B.EQ 34f
> + CMP X2, #5
> + B.EQ 35f
> + CMP X2, #6
> + B.EQ 36f
> + CMP X2, #7
> + B.EQ 37f
> + CMP X2, #8
> + B.EQ 38f
> + CMP X2, #9
> + B.EQ 39f
> + CMP X2, #10
> + B.EQ 40f
> + CMP X2, #11
> + B.EQ 41f
> + CMP X2, #12
> + B.EQ 42f
> + CMP X2, #13
> + B.EQ 43f
> + CMP X2, #14
> + B.EQ 44f
> + CMP X2, #15
> + B.EQ 45f
> +
This looks awful. If you make the snippets below a fixed size, you
could use a computed goto instead
> + // >> 128bit
> +30:
> + EOR V13.16B, V13.16B, V13.16B
> + EOR V8.16B, V8.16B, V8.16B
> + LDR Q9,=0xffffffffffffffffffffffffffffffff
Shouldn't you initialize q8 here as well. And in general, couldn't you
use some kind of shift to generate these constants (in all cases
below)?
> + B 46f
> +
> + // >> 120bit
> +31:
> + USHR V13.2D, V13.2D, #56
> + EXT V13.16B, V13.16B, V14.16B, #8
> + LDR Q8,=0xff
> + LDR Q9,=0xffffffffffffffffffffffffffffff00
> + B 46f
> +
> + // >> 112bit
> +32:
> + USHR V13.2D, V13.2D, #48
> + EXT V13.16B, V13.16B, V14.16B, #8
> + LDR Q8,=0xffff
> + LDR Q9,=0xffffffffffffffffffffffffffff0000
> + B 46f
> +
> + // >> 104bit
> +33:
> + USHR V13.2D, V13.2D, #40
> + EXT V13.16B, V13.16B, V14.16B, #8
> + LDR Q8,=0xffffff
> + LDR Q9,=0xffffffffffffffffffffffffff000000
> + B 46f
> +
> + // >> 96bit
> +34:
> + USHR V13.2D, V13.2D, #32
> + EXT V13.16B, V13.16B, V14.16B, #8
> + LDR Q8,=0xffffffff
> + LDR Q9,=0xffffffffffffffffffffffff00000000
> + B 46f
> +
> + // >> 88bit
> +35:
> + USHR V13.2D, V13.2D, #24
> + EXT V13.16B, V13.16B, V14.16B, #8
> + LDR Q8,=0xffffffffff
> + LDR Q9,=0xffffffffffffffffffffff0000000000
> + B 46f
> +
> + // >> 80bit
> +36:
> + USHR V13.2D, V13.2D, #16
> + EXT V13.16B, V13.16B, V14.16B, #8
> + LDR Q8,=0xffffffffffff
> + LDR Q9,=0xffffffffffffffffffff000000000000
> + B 46f
> +
> + // >> 72bit
> +37:
> + USHR V13.2D, V13.2D, #8
> + EXT V13.16B, V13.16B, V14.16B, #8
> + LDR Q8,=0xffffffffffffff
> + LDR Q9,=0xffffffffffffffffff00000000000000
> + B 46f
> +
> + // >> 64bit
> +38:
> + EXT V13.16B, V13.16B, V14.16B, #8
> + LDR Q8,=0xffffffffffffffff
> + LDR Q9,=0xffffffffffffffff0000000000000000
> + B 46f
> +
> + // >> 56bit
> +39:
> + EXT V13.16B, V13.16B, V13.16B, #7
> + MOV V13.S[3], V14.S[0]
> + MOV V13.H[5], V14.H[0]
> + MOV V13.B[9], V14.B[0]
> +
> + LDR Q8,=0xffffffffffffffffff
> + LDR Q9,=0xffffffffffffff000000000000000000
> + B 46f
> +
> + // >> 48bit
> +40:
> + EXT V13.16B, V13.16B, V13.16B, #6
> + MOV V13.S[3], V14.S[0]
> + MOV V13.H[5], V14.H[0]
> +
> + LDR Q8,=0xffffffffffffffffffff
> + LDR Q9,=0xffffffffffff00000000000000000000
> + B 46f
> +
> + // >> 40bit
> +41:
> + EXT V13.16B, V13.16B, V13.16B, #5
> + MOV V13.S[3], V14.S[0]
> + MOV V13.B[11], V14.B[0]
> +
> + LDR Q8,=0xffffffffffffffffffffff
> + LDR Q9,=0xffffffffff0000000000000000000000
> + B 46f
> +
> + // >> 32bit
> +42:
> + EXT V13.16B, V13.16B, V13.16B, #4
> + MOV V13.S[3], V14.S[0]
> +
> + LDR Q8,=0xffffffffffffffffffffffff
> + LDR Q9,=0xffffffff000000000000000000000000
> + B 46f
> +
> + // >> 24bit
> +43:
> + EXT V13.16B, V13.16B, V13.16B, #3
> + MOV V13.H[7], V14.H[0]
> + MOV V13.B[13], V14.B[0]
> +
> + LDR Q8,=0xffffffffffffffffffffffffff
> + LDR Q9,=0xffffff00000000000000000000000000
> + B 46f
> +
> + // >> 16bit
> +44:
> + EXT V13.16B, V13.16B, V13.16B, #2
> + MOV V13.H[7], V14.H[0]
> +
> + LDR Q8,=0xffffffffffffffffffffffffffff
> + LDR Q9,=0xffff0000000000000000000000000000
> + B 46f
> +
> + // >> 8bit
> +45:
> + EXT V13.16B, V13.16B, V13.16B, #1
> + MOV V13.B[15], V14.B[0]
> +
> + LDR Q8,=0xffffffffffffffffffffffffffffff
> + LDR Q9,=0xff000000000000000000000000000000
> +
> + // backup V12 first
> + // pblendvb xmm1, xmm2
Another remnant of the x86 version, please remove
> +46:
> + AND V12.16B, V12.16B, V9.16B
> + AND V11.16B, V11.16B, V8.16B
> + ORR V11.16B, V11.16B, V12.16B
> +
> + MOV V12.16B, V11.16B
> + MOV V4.16B, V13.16B
> + PMULL2 V13.1Q, V13.2D, V6.2D
> + PMULL V4.1Q, V4.1D, V6.1D
> + EOR V13.16B, V13.16B, V4.16B
> + EOR V13.16B, V13.16B, V12.16B
> +
> + /* _128_done. we change the Q6 D[0] and D[1] */
> +5: LDR Q6, =0x2d560000000000001368000000000000
> + MOVI D14, #0
> + MOV V10.16B, V13.16B
> + PMULL2 V13.1Q, V13.2D, V6.2D
> +
> + MOV V10.D[1], V10.D[0]
> + MOV V10.D[0], V14.D[0] //set zero
> +
> + EOR V13.16B, V13.16B, V10.16B
> +
> + MOV V10.16B, V13.16B
> + LDR Q7, =0x00000000FFFFFFFFFFFFFFFFFFFFFFFF
> + AND V10.16B, V10.16B, V7.16B
> +
> + MOV S13, V13.S[3]
> +
> + PMULL V13.1Q, V13.1D, V6.1D
> + EOR V13.16B, V13.16B, V10.16B
> +
> + /* _barrett */
What does '_barrett' mean?
> +7: LDR Q6, =0x00000001f65a57f8000000018bb70000
> + MOVI D14, #0
> + MOV V10.16B, V13.16B
> + PMULL2 V13.1Q, V13.2D, V6.2D
> +
> + EXT V13.16B, V13.16B, V13.16B, #12
> + MOV V13.S[0], V14.S[0]
> +
> + EXT V6.16B, V6.16B, V6.16B, #8
> + PMULL2 V13.1Q, V13.2D, V6.2D
> +
> + EXT V13.16B, V13.16B, V13.16B, #12
> + MOV V13.S[0], V14.S[0]
> +
> + EOR V13.16B, V13.16B, V10.16B
> + MOV X0, V13.D[0]
> +
> + /* _cleanup */
> +8: MOV X14, #48
> + LSR X0, X0, X14
Why the temp x14?
> +99:
> + ADD sp, sp, #0x20
> +
> + LDP Q18, Q19, [sp], #0x20
> + LDP Q16, Q17, [sp], #0x20
> + LDP Q14, Q15, [sp], #0x20
> +
> + LDP Q8, Q9, [sp], #0x20
> + LDP Q6, Q7, [sp], #0x20
> + LDP Q4, Q5, [sp], #0x20
> + LDP Q12, Q13, [sp], #0x20
> + LDP Q10, Q11, [sp], #0x20
> + LDP X13, X14, [sp], #0x10
> + LDP X11, X12, [sp], #0x10
> + LDP X9, X10, [sp], #0x10
> + LDP X7, X8, [sp], #0x10
> + LDP X5, X6, [sp], #0x10
> +
None of these registers need to be restored. The only thing you need
(to mirror the prologue)
ldp x29, x30, [sp], #xxx
ret
where xxx is the same value you used at the beginning.
> + RET
> +
> + /* _less_than_128 */
> +2: CMP X2, #32
> + B.LT 9f // _less_than_32
> + LDR Q6, =0x06df0000000000002d56000000000000
> +
> + LSL X0, X0, #48
> + LDR Q10, =0x0
Please use movi here
> + MOV V10.D[1], X0
> + LDR Q13, [X1], #0x10
> + REV64 V13.16B, V13.16B
> + EXT V13.16B, V13.16B, V13.16B, #8
> +
> + EOR V13.16B, V13.16B, V10.16B
> +
> + SUB X2, X2, #32
> + B 4b
> +
> + /* _less_than_32 */
> +9: CMP X2, #0
> + B.EQ 99b // _cleanup
You can use CBZ here
> + LSL X0, X0, #48
> + LDR Q10,=0x0
Please use movi here
> + MOV V10.D[1], X0
> +
> + CMP X2, #16
> + B.EQ 10f // _exact_16_left
> + B.LE 11f // _less_than_16_left
> + LDR Q13, [X1], #0x10
> +
> + REV64 V13.16B, V13.16B
> + EXT V13.16B, V13.16B, V13.16B, #8
> +
> + EOR V13.16B, V13.16B, V10.16B
> + SUB X2, X2, #16
> + LDR Q6, =0x06df0000000000002d56000000000000
> + B 6b // _get_last_two_xmms
Another bogus comment
> +
> + /* _less_than_16_left */
> +11: CMP X2, #4
> + B.LT 13f // _only_less_than_4
> +
> + /* backup the length of data, we used in _less_than_2_left */
> + MOV X8, X2
> + CMP X2, #8
> + B.LT 14f // _less_than_8_left
> +
> + LDR X14, [X1], #8
> + /* push the data to stack, we backup the data to V10 */
> + STR X14, [sp, #0]
> + SUB X2, X2, #8
> + ADD X11, X11, #8
> +
> + /* _less_than_8_left */
> +14: CMP X2, #4
> + B.LT 15f // _less_than_4_left
> +
> + /* get 32bit data */
> + LDR W5, [X1], #4
> +
> + /* push the data to stack */
> + STR W5, [sp, X11]
> + SUB X2, X2, #4
> + ADD X11, X11, #4
> +
> + /* _less_than_4_left */
> +15: CMP X2, #2
> + B.LT 16f // _less_than_2_left
> +
> + /* get 16bits data */
> + LDRH W6, [X1], #2
> +
> + /* push the data to stack */
> + STRH W6, [sp, X11]
> + SUB X2, X2, #2
> + ADD X11, X11, #2
> +
> + /* _less_than_2_left */
> +16:
> + /* get 8bits data */
> + LDRB W7, [X1], #1
> + STRB W7, [sp, X11]
> + ADD X11, X11, #1
> +
> + /* POP data from stack, store to V13 */
> + LDR Q13, [sp]
> + MOVI D14, #0
> + REV64 V13.16B, V13.16B
> + MOV V8.16B, V13.16B
> + MOV V13.D[1], V8.D[0]
> + MOV V13.D[0], V8.D[1]
> +
> + EOR V13.16B, V13.16B, V10.16B
> + CMP X8, #15
> + B.EQ 80f
> + CMP X8, #14
> + B.EQ 81f
> + CMP X8, #13
> + B.EQ 82f
> + CMP X8, #12
> + B.EQ 83f
> + CMP X8, #11
> + B.EQ 84f
> + CMP X8, #10
> + B.EQ 85f
> + CMP X8, #9
> + B.EQ 86f
> + CMP X8, #8
> + B.EQ 87f
> + CMP X8, #7
> + B.EQ 88f
> + CMP X8, #6
> + B.EQ 89f
> + CMP X8, #5
> + B.EQ 90f
> + CMP X8, #4
> + B.EQ 91f
> + CMP X8, #3
> + B.EQ 92f
> + CMP X8, #2
> + B.EQ 93f
> + CMP X8, #1
> + B.EQ 94f
> + CMP X8, #0
> + B.EQ 95f
> +
Again, please use a computed goto instead
> +80:
> + EXT V13.16B, V13.16B, V13.16B, #1
> + MOV V13.B[15], V14.B[0]
> + B 5b
> +
> +81:
> + EXT V13.16B, V13.16B, V13.16B, #2
> + MOV V13.H[7], V14.H[0]
> + B 5b
> +
> +82:
> + EXT V13.16B, V13.16B, V13.16B, #3
> + MOV V13.H[7], V14.H[0]
> + MOV V13.B[13], V14.B[0]
> + B 5b
> +83:
> +
> + EXT V13.16B, V13.16B, V13.16B, #4
> + MOV V13.S[3], V14.S[0]
> + B 5b
> +
> +84:
> + EXT V13.16B, V13.16B, V13.16B, #5
> + MOV V13.S[3], V14.S[0]
> + MOV V13.B[11], V14.B[0]
> + B 5b
> +
> +85:
> + EXT V13.16B, V13.16B, V13.16B, #6
> + MOV V13.S[3], V14.S[0]
> + MOV V13.H[5], V14.H[0]
> + B 5b
> +
> +86:
> + EXT V13.16B, V13.16B, V13.16B, #7
> + MOV V13.S[3], V14.S[0]
> + MOV V13.H[5], V14.H[0]
> + MOV V13.B[9], V14.B[0]
> + B 5b
> +
> +87:
> + MOV V13.D[0], V13.D[1]
> + MOV V13.D[1], V14.D[0]
> + B 5b
> +
> +88:
> + EXT V13.16B, V13.16B, V13.16B, #9
> + MOV V13.D[1], V14.D[0]
> + MOV V13.B[7], V14.B[0]
> + B 5b
> +
> +89:
> + EXT V13.16B, V13.16B, V13.16B, #10
> + MOV V13.D[1], V14.D[0]
> + MOV V13.H[3], V14.H[0]
> + B 5b
> +
> +90:
> + EXT V13.16B, V13.16B, V13.16B, #11
> + MOV V13.D[1], V14.D[0]
> + MOV V13.H[3], V14.H[0]
> + MOV V13.B[5], V14.B[0]
> + B 5b
> +
> +91:
> + MOV V13.S[0], V13.S[3]
> + MOV V13.D[1], V14.D[0]
> + MOV V13.S[1], V14.S[0]
> + B 5b
> +
> +92:
> + EXT V13.16B, V13.16B, V13.16B, #13
> + MOV V13.D[1], V14.D[0]
> + MOV V13.S[1], V14.S[0]
> + MOV V13.B[3], V14.B[0]
> + B 5b
> +
> +93:
> + MOV V15.H[0], V13.H[7]
> + MOV V13.16B, V14.16B
> + MOV V13.H[0], V15.H[0]
> + B 5b
> +
> +94:
> + MOV V15.B[0], V13.B[15]
> + MOV V13.16B, V14.16B
> + MOV V13.B[0], V15.B[0]
> + B 5b
> +
> +95:
> + LDR Q13,=0x0
movi
> + B 5b // _128_done
> +
> + /* _exact_16_left */
> +10:
> + LD1 { V13.2D }, [X1], #0x10
> +
> + REV64 V13.16B, V13.16B
> + EXT V13.16B, V13.16B, V13.16B, #8
> + EOR V13.16B, V13.16B, V10.16B
> + B 5b // _128_done
> +
> + /* _only_less_than_4 */
> +13: CMP X2, #3
> + MOVI D14, #0
> + B.LT 17f //_only_less_than_3
> +
> + LDR S13, [X1], #4
> + MOV V13.B[15], V13.B[0]
> + MOV V13.B[14], V13.B[1]
> + MOV V13.B[13], V13.B[2]
> + MOV V13.S[0], V13.S[1]
> +
> + EOR V13.16B, V13.16B, V10.16B
> +
> + EXT V13.16B, V13.16B, V13.16B, #5
> +
> + MOV V13.S[3], V14.S[0]
> + MOV V13.B[11], V14.B[0]
> +
> + B 7b // _barrett
> + /* _only_less_than_3 */
> +17:
> + CMP X2, #2
> + B.LT 18f // _only_less_than_2
> +
> + LDR H13, [X1], #2
> + MOV V13.B[15], V13.B[0]
> + MOV V13.B[14], V13.B[1]
> + MOV V13.H[0], V13.H[1]
> +
> + EOR V13.16B, V13.16B, V10.16B
> +
> + EXT V13.16B, V13.16B, V13.16B, #6
> + MOV V13.S[3], V14.S[0]
> + MOV V13.H[5], V14.H[0]
> +
> + B 7b // _barrett
> +
> + /* _only_less_than_2 */
> +18:
> + LDRB W7, [X1], #1
> + LDR Q13, = 0x0
> + MOV V13.B[15], W7
> +
> + EOR V13.16B, V13.16B, V10.16B
> +
> + EXT V13.16B, V13.16B, V13.16B, #7
> + MOV V13.S[3], V14.S[0]
> + MOV V13.H[5], V14.H[0]
> + MOV V13.B[9], V14.B[0]
> +
> + B 7b // _barrett
Please end with ENDPROC()
> diff --git a/arch/arm64/crypto/crct10dif-neon_glue.c b/arch/arm64/crypto/crct10dif-neon_glue.c
> new file mode 100644
> index 0000000..a6d8c5c
> --- /dev/null
> +++ b/arch/arm64/crypto/crct10dif-neon_glue.c
> @@ -0,0 +1,115 @@
> +/*
> + * Copyright (c) 2016-2017 Hisilicon Limited.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +
> +#include <linux/types.h>
> +#include <linux/module.h>
> +#include <linux/crc-t10dif.h>
> +#include <crypto/internal/hash.h>
> +#include <linux/init.h>
> +#include <linux/string.h>
> +#include <linux/kernel.h>
> +
> +asmlinkage __u16 crc_t10dif_neon(__u16 crc, const unsigned char *buf,
> + size_t len);
> +
> +struct chksum_desc_ctx {
> + __u16 crc;
> +};
> +
> +/*
> + * Steps through buffer one byte at at time, calculates reflected
> + * crc using table.
> + */
> +
Where is the table?
> +static int chksum_init(struct shash_desc *desc)
> +{
> + struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
> +
> + ctx->crc = 0;
> +
> + return 0;
> +}
> +
> +static int chksum_update(struct shash_desc *desc, const u8 *data,
> + unsigned int length)
> +{
> + struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
> +
> + ctx->crc = crc_t10dif_neon(ctx->crc, data, length);
You need kernel_neon_begin/kernel_neon_end here
> + return 0;
> +}
> +
> +static int chksum_final(struct shash_desc *desc, u8 *out)
> +{
> + struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
> +
> + *(__u16 *)out = ctx->crc;
> + return 0;
> +}
> +
> +static int __chksum_finup(__u16 *crcp, const u8 *data, unsigned int len,
> + u8 *out)
> +{
> + *(__u16 *)out = crc_t10dif_neon(*crcp, data, len);
... and here
> + return 0;
> +}
> +
> +static int chksum_finup(struct shash_desc *desc, const u8 *data,
> + unsigned int len, u8 *out)
> +{
> + struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
> +
> + return __chksum_finup(&ctx->crc, data, len, out);
> +}
> +
> +static int chksum_digest(struct shash_desc *desc, const u8 *data,
> + unsigned int length, u8 *out)
> +{
> + struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
> +
> + return __chksum_finup(&ctx->crc, data, length, out);
> +}
> +
> +static struct shash_alg alg = {
> + .digestsize = CRC_T10DIF_DIGEST_SIZE,
> + .init = chksum_init,
> + .update = chksum_update,
> + .final = chksum_final,
> + .finup = chksum_finup,
> + .digest = chksum_digest,
> + .descsize = sizeof(struct chksum_desc_ctx),
> + .base = {
> + .cra_name = "crct10dif",
> + .cra_driver_name = "crct10dif-neon",
> + .cra_priority = 200,
> + .cra_blocksize = CRC_T10DIF_BLOCK_SIZE,
> + .cra_module = THIS_MODULE,
Please align the = characters here, and add only a single space after
Note that you can do
.base.cra_name = xxx,
.base.xxx
as well.
> + }
> +};
> +
> +static int __init crct10dif_arm64_mod_init(void)
> +{
> + return crypto_register_shash(&alg);
You need to check here if your CPU has support for the 64x64->128
PMULL instruction.
> +}
> +
> +static void __exit crct10dif_arm64_mod_fini(void)
> +{
> + crypto_unregister_shash(&alg);
> +}
> +
> +module_init(crct10dif_arm64_mod_init);
> +module_exit(crct10dif_arm64_mod_fini);
> +
> +MODULE_AUTHOR("YueHaibing <yuehaibing@huawei.com>");
> +MODULE_DESCRIPTION("T10 DIF CRC calculation accelerated with ARM64 NEON instruction.");
> +MODULE_LICENSE("GPL");
> +
> +MODULE_ALIAS_CRYPTO("crct10dif");
> +MODULE_ALIAS_CRYPTO("crct10dif-neon");
> --
> 2.7.0
>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* [kvm-unit-tests PATCH v10 3/3] arm: pmu: Add CPI checking
From: Andrew Jones @ 2016-11-22 12:52 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <5263e77a-b987-c955-c328-040a11543c94@redhat.com>
On Mon, Nov 21, 2016 at 04:49:20PM -0600, Wei Huang wrote:
>
>
> On 11/21/2016 03:40 PM, Christopher Covington wrote:
> > Hi Wei,
> >
> > On 11/21/2016 03:24 PM, Wei Huang wrote:
> >> From: Christopher Covington <cov@codeaurora.org>
> >
> > I really appreciate your work on these patches. If for any or all of these
> > you have more lines added/modified than me (or using any other better
> > metric), please make sure to change the author to be you with
> > `git commit --amend --reset-author` or equivalent.
>
> Sure, I will if needed. Regarding your comments below, I will fix the
> patch series after Drew's comments, if any.
>
> >
> >> Calculate the numbers of cycles per instruction (CPI) implied by ARM
> >> PMU cycle counter values. The code includes a strict checking facility
> >> intended for the -icount option in TCG mode in the configuration file.
> >>
> >> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> >> Signed-off-by: Wei Huang <wei@redhat.com>
> >> ---
> >> arm/pmu.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >> arm/unittests.cfg | 14 +++++++
> >> 2 files changed, 132 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arm/pmu.c b/arm/pmu.c
> >> index 176b070..129ef1e 100644
> >> --- a/arm/pmu.c
> >> +++ b/arm/pmu.c
> >> @@ -104,6 +104,25 @@ static inline uint32_t id_dfr0_read(void)
> >> asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
> >> return val;
> >> }
> >> +
> >> +/*
> >> + * Extra instructions inserted by the compiler would be difficult to compensate
> >> + * for, so hand assemble everything between, and including, the PMCR accesses
> >> + * to start and stop counting. Total cycles = isb + mcr + 2*loop = 2 + 2*loop.
> ^^^^^^^^^^^^
> I will change the comment above to "Total instrs".
>
> >> + */
> >> +static inline void precise_cycles_loop(int loop, uint32_t pmcr)
> >
> > Nit: I would call this precise_instrs_loop. How many cycles it takes is
> > IMPLEMENTATION DEFINED.
>
> You are right. The cycle indeed depends on the design. Will fix.
>
> >
> >> +{
> >> + asm volatile(
> >> + " mcr p15, 0, %[pmcr], c9, c12, 0\n"
> >> + " isb\n"
> >> + "1: subs %[loop], %[loop], #1\n"
> >> + " bgt 1b\n"
> >
> > Is there any chance we might need an isb here, to prevent the stop from happening
> > before or during the loop? Where ISBs are required, the Linux best practice is to
>
> In theory, I think this can happen when mcr is executed before all loop
> instructions completed, causing pmccntr_read() to miss some cycles. But
> QEMU TCG mode doesn't support out-order-execution. So the test
> condition, "cpi > 0 && cycles != i * cpi", will never be TRUE. Because
> cpi==0 in KVM, this same test condition won't be TRUE under KVM mode either.
>
> > diligently comment why they are needed. Perhaps it would be a good habit to
> > carry over into kvm-unit-tests.
>
> Agreed. Most isb() instructions were added following CP15 writes (not
> all CP15 writes, but at limited locations). We tried to follow what
> Linux kernel does in perf_event.c. If you feel that any isb() place
> needs special comment, I will be more than happy to add it.
>
> <snip>
No new comments from me. Thanks guys for catching the need to update the
comments.
drew
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox