Linux-ARM-Kernel Archive on lore.kernel.org

Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] ARM: Fix rd_size declaration
From: Bart Van Assche @ 2017-05-03 19:46 UTC (permalink / raw)
  To: linux-arm-kernel

The global variable 'rd_size' is declared as 'int' in source file
arch/arm/kernel/atags_parse.c and as 'unsigned long' in
drivers/block/brd.c. Fix this inconsistency. Additionally, remove
the declarations of rd_image_start, rd_prompt and rd_doload from
parse_tag_ramdisk() since these duplicate existing declarations
in <linux/initrd.h>.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: <yanaijie@huawei.com>
Cc: <zhaohongjiang@huawei.com>
Cc: <miaoxie@huawei.com>
Cc: linux-arm-kernel at lists.infradead.org
Cc: linux-block at vger.kernel.org
---
 arch/arm/kernel/atags_parse.c | 3 +--
 drivers/block/brd.c           | 1 +
 include/linux/initrd.h        | 3 +++
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/atags_parse.c b/arch/arm/kernel/atags_parse.c
index 68c6ae0b9e4c..98fbfd235ac8 100644
--- a/arch/arm/kernel/atags_parse.c
+++ b/arch/arm/kernel/atags_parse.c
@@ -18,6 +18,7 @@
  */
 
 #include <linux/init.h>
+#include <linux/initrd.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
 #include <linux/root_dev.h>
@@ -91,8 +92,6 @@ __tagtable(ATAG_VIDEOTEXT, parse_tag_videotext);
 #ifdef CONFIG_BLK_DEV_RAM
 static int __init parse_tag_ramdisk(const struct tag *tag)
 {
-	extern int rd_size, rd_image_start, rd_prompt, rd_doload;
-
 	rd_image_start = tag->u.ramdisk.start;
 	rd_doload = (tag->u.ramdisk.flags & 1) == 0;
 	rd_prompt = (tag->u.ramdisk.flags & 2) == 0;
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 3adc32a3153b..30a45080a9b4 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -9,6 +9,7 @@
  */
 
 #include <linux/init.h>
+#include <linux/initrd.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/major.h>
diff --git a/include/linux/initrd.h b/include/linux/initrd.h
index 55289d261b4f..bc67b767f9ce 100644
--- a/include/linux/initrd.h
+++ b/include/linux/initrd.h
@@ -10,6 +10,9 @@ extern int rd_prompt;
 /* starting block # of image */
 extern int rd_image_start;
 
+/* size of a single RAM disk */
+extern unsigned long rd_size;
+
 /* 1 if it is not an error if initrd_start < memory_start */
 extern int initrd_below_start_ok;
 
-- 
2.12.2

^ permalink raw reply related

* [PATCH] ARM: Fix rd_size declaration
From: Bart Van Assche @ 2017-05-03 19:48 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20170503194600.16089-1-bart.vanassche@sandisk.com>

On Wed, 2017-05-03 at 12:46 -0700, Bart Van Assche wrote:
> The global variable 'rd_size' is declared as 'int' in source file
> arch/arm/kernel/atags_parse.c and as 'unsigned long' in
> drivers/block/brd.c. Fix this inconsistency. Additionally, remove
> the declarations of rd_image_start, rd_prompt and rd_doload from
> parse_tag_ramdisk() since these duplicate existing declarations
> in <linux/initrd.h>.

This is version 2 of the patch for removing the rd_size declaration.
Compared to v1, the new header file <linux/brd.h> has been dropped.

Bart.

^ permalink raw reply

* [PATCH 1/3] of/pci/dma: fix DMA configuration for PCI masters
From: Rob Herring @ 2017-05-03 19:55 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1493786795-28153-1-git-send-email-oza.oza@broadcom.com>

On Tue, May 2, 2017 at 11:46 PM, Oza Pawandeep <oza.oza@broadcom.com> wrote:
> current device framework and of framework integration assumes
> dma-ranges in a way where memory-mapped devices define their
> dma-ranges. (child-bus-address, parent-bus-address, length).
>
> of_dma_configure is specifically written to take care of memory
> mapped devices. but no implementation exists for pci to take
> care of pcie based memory ranges.
>
> for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
> world dma-ranges.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>
> this patch serves following:
>
> 1) exposes interface to the pci host driver for their
> inbound memory ranges
>
> 2) provide an interface to callers such as of_dma_get_ranges.
> so then the returned size get best possible (largest) dma_mask.
> because PCI RC drivers do not call APIs such as
> dma_set_coherent_mask() and hence rather it shows its addressing
> capabilities based on dma-ranges.
> for e.g.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
> we should get dev->coherent_dma_mask=0x7fffffffff.
>
> 3) this patch handles multiple inbound windows and dma-ranges.
> it is left to the caller, how it wants to use them.
> the new function returns the resources in a standard and unform way
>
> 4) this way the callers of for e.g. of_dma_get_ranges
> does not need to change.
>
> 5) leaves scope of adding PCI flag handling for inbound memory
> by the new function.
>
> Bug: SOC-5216
> Change-Id: Ie045386df91e1e0587846bb147ae40d96f6d7d2e
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40428
> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
> Reviewed-by: Ray Jui <ray.jui@broadcom.com>
> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
> Tested-by: CCXSW <ccxswbuild@broadcom.com>

All these non-person, probably internal Broadcom Tested-by and
Reviewed-by's should be removed too.

> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>

Rob

^ permalink raw reply

* [patch] autogain support for bayer10 format (was Re: [patch] propagating controls in libv4l2)
From: Pavel Machek @ 2017-05-03 19:58 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20170503190556.GT23750@n2100.armlinux.org.uk>

On Wed 2017-05-03 20:05:56, Russell King - ARM Linux wrote:
> On Wed, Apr 26, 2017 at 06:43:54PM +0300, Ivaylo Dimitrov wrote:
> > >+static int get_luminosity_bayer10(uint16_t *buf, const struct v4l2_format *fmt)
> > >+{
> > >+	long long avg_lum = 0;
> > >+	int x, y;
> > >+	
> > >+	buf += fmt->fmt.pix.height * fmt->fmt.pix.bytesperline / 4 +
> > >+		fmt->fmt.pix.width / 4;
> > >+
> > >+	for (y = 0; y < fmt->fmt.pix.height / 2; y++) {
> > >+		for (x = 0; x < fmt->fmt.pix.width / 2; x++)
> > 
> > That would take some time :). AIUI, we have NEON support in ARM kernels
> > (CONFIG_KERNEL_MODE_NEON), I wonder if it makes sense (me) to convert the
> > above loop to NEON-optimized when it comes to it? Are there any drawbacks in
> > using NEON code in kernel?
> 
> Using neon without the VFP state saved and restored corrupts userspace's
> FP state.  So, you have to save the entire VFP state to use neon in kernel
> mode.  There are helper functions for this: kernel_neon_begin() and
> kernel_neon_end().
...
> Given that, do we really want to be walking over multi-megabytes of image
> data in the kernel with preemption disabled - it sounds like a recipe for
> a very sluggish system.  I think this should (and can only sensibly be
> done) in userspace.

The patch was for libv4l2. (And I explained why we don't need to
overoptimize this.)
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20170503/e1091649/attachment.sig>

^ permalink raw reply

* [PATCH 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges
From: Rob Herring @ 2017-05-03 20:06 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1493786795-28153-3-git-send-email-oza.oza@broadcom.com>

On Tue, May 2, 2017 at 11:46 PM, Oza Pawandeep <oza.oza@broadcom.com> wrote:
> current device framework and of framework integration assumes
> dma-ranges in a way where memory-mapped devices define their
> dma-ranges. (child-bus-address, parent-bus-address, length).
>
> of_dma_configure is specifically written to take care of memory
> mapped devices. but no implementation exists for pci to take
> care of pcie based memory ranges.
>
> for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
> world dma-ranges.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
>
> this patch fixes this patch fixes the bug in of_dma_get_range,
> which with as is, parses the PCI memory ranges and return wrong
> size as 0.
>
> in order to get largest possible dma_mask. this patch also
> retuns the largest possible size based on dma-ranges,
>
> for e.g.
> dma-ranges = <0x43000000 0x00 0x00 0x00 0x00 0x80 0x00>;
> we should get dev->coherent_dma_mask=0x7fffffffff.
>
> based on which iova allocation space will honour PCI host
> bridge limitations.
>
> Bug: SOC-5216
> Change-Id: I4c534bdd17e70c6b27327d39d1656e8ed0cf56d6
> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
> Reviewed-on: http://gerrit-ccxsw.broadcom.net/40762
> Reviewed-by: vpx_checkpatch status <vpx_checkpatch@broadcom.com>
> Reviewed-by: CCXSW <ccxswbuild@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> Tested-by: vpx_autobuild status <vpx_autobuild@broadcom.com>
> Tested-by: vpx_smoketest status <vpx_smoketest@broadcom.com>
>
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index 02b2903..f7734fc 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -6,6 +6,7 @@
>  #include <linux/ioport.h>
>  #include <linux/module.h>
>  #include <linux/of_address.h>
> +#include <linux/of_pci.h>
>  #include <linux/pci.h>
>  #include <linux/pci_regs.h>
>  #include <linux/sizes.h>
> @@ -830,6 +831,54 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz
>         int ret = 0;
>         u64 dmaaddr;
>
> +#ifdef CONFIG_PCI
> +       struct resource_entry *window;
> +       LIST_HEAD(res);
> +
> +       if (!node)
> +               return -EINVAL;
> +
> +       if (of_bus_pci_match(np)) {

You are not following what I'm saying. Let me spell it out:

- Add a get_dma_ranges() function to of_bus struct. Or maybe should
cover ranges too (e.g. get_ranges). I'm not sure.
- Convert existing contents of this function to
of_bus_default_dma_get_ranges and add that to the default of_bus
struct.
- Make of_dma_get_range call of_bus_match() and then bus->get_dma_ranges.

Rob

^ permalink raw reply

* [PATCH] arm64: Add translation functions for /dev/mem read/write
From: Leif Lindholm @ 2017-05-03 20:18 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <bda22142-6b46-0e11-9d13-4bed0dcfd713@codeaurora.org>

On Wed, May 03, 2017 at 11:07:45AM -0600, Goel, Sameer wrote:
> On 5/3/2017 5:26 AM, Will Deacon wrote:
> > [adding some /dev/mem fans to cc]
> > 
> > On Tue, May 02, 2017 at 02:28:05PM -0600, Sameer Goel wrote:
> >> Port architecture specific xlate and unxlate functions for /dev/mem
> >> read/write. This sets up the mapping for a valid physical address if a
> >> kernel direct mapping is not already present.
> >>
> >> This is a generic issue as a user space app should not be allowed to crash
> >> the kernel.
> > 
> >> This issue was observed when systemd tried to access performance
> >> pointer record from the FPDT table.
> > 
> > Why is it doing that? Is there not a way to get this via /sys?
>
> There is no ACPI FPDT implementation in the kernel, so the userspace
> systemd code is getting the FPDT table contents from /sys
> and parsing the entries. The performance pointer record is a
> reserved address populated by UEFI and the userspace code tries to
> access it using /dev/mem. The physical address is valid, so cannot
> push back on the user space code.

OK, so then we need to add support for parsing this table in the
kernel and exposing the referred-to regions in a controllable fashion.
Maybe something that belongs under /sys/firmware/efi (adding Matt), or
maybe something that deserves its own driver.

The only two use-cases for /dev/mem on arm64 are:
- Implementing interfaces in the kernel takes up-front effort.
- Being able to accidentally panic the kernel from userland.

/
    Leif

> https://github.com/systemd/systemd/blob/master/src/shared/acpi-fpdt.c
> http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf  5.2.23 
> > 
> >> Ported from commit e045fb2a988a ("x86: PAT avoid aliasing in /dev/mem
> >> read/write")
> >>
> >> Crash Signature:
> >>  Unable to handle kernel paging request at virtual address ffff800008ff0000
> >>  pgd = ffff8007de8b2200
> >>  [ffff800008ff0000] *pgd=0000000000000000, *pud=0000000000000000
> >>  Internal error: Oops: 96000007 [#1] SMP
> >> ................
> >>  CPU: 0 PID: 1 Comm: systemd Not tainted 4.10.0 #1
> >>  task: ffff8007c0820000 task.stack: ffff8007c0900000
> >>  PC is at __arch_copy_to_user+0xb4/0x280
> >>  LR is at read_mem+0xc0/0x138
> >>  pc : [<ffff0000084b3bb4>] lr : [<ffff00000869d178>]
> >> pstate: 80000145
> >>  sp : ffff8007c0903d40
> >> ....................
> >>  x3 : ffff800800000000 x2 : 0000000000000008
> >>  x1 : ffff800008ff0000 x0 : 0000fffff6fdac00
> >> ....................
> >>  Call trace:
> >>  Exception stack(0xffff8007c0903b70 to 0xffff8007c0903ca0)
> >>  [<ffff0000084b3bb4>] __arch_copy_to_user+0xb4/0x280
> >>  [<ffff0000082454d0>] __vfs_read+0x48/0x130
> >>  [<ffff0000082467dc>] vfs_read+0x8c/0x148
> >>  [<ffff000008247a34>] SyS_pread64+0x94/0xa8
> >>  [<ffff0000080833b0>] el0_svc_naked+0x24/0x28
> > 
> > So this certainly looks like a kernel bug, but I don't think your patch is
> > the right way to fix it.
> 
> I agree that the reserved regions are not meant to be accessed by the kernel as system 
> ram. So, another option was to to return a NULL for this translation.
> 
> Since, the same usage was working on other architectures I ported over the same code to
> highlight the issue.
> 
> > 
> >>  Code: a88120c7 d503201f d503201f 36180082 (f8408423)
> >>
> >> Signed-off-by: Sameer Goel <sgoel@codeaurora.org>
> >> Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
> >> ---
> >>  arch/arm64/include/asm/io.h |  5 +++++
> >>  arch/arm64/mm/ioremap.c     | 31 +++++++++++++++++++++++++++++++
> >>  2 files changed, 36 insertions(+)
> >>
> >> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> >> index 0c00c87..c869ea4 100644
> >> --- a/arch/arm64/include/asm/io.h
> >> +++ b/arch/arm64/include/asm/io.h
> >> @@ -183,6 +183,11 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
> >>  #define iowrite32be(v,p)	({ __iowmb(); __raw_writel((__force __u32)cpu_to_be32(v), p); })
> >>  #define iowrite64be(v,p)	({ __iowmb(); __raw_writeq((__force __u64)cpu_to_be64(v), p); })
> >>  
> >> +extern void *xlate_dev_mem_ptr(phys_addr_t phys);
> >> +extern void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);
> >> +
> >> +#define xlate_dev_mem_ptr xlate_dev_mem_ptr
> >> +#define unxlate_dev_mem_ptr unxlate_dev_mem_ptr
> >>  #include <asm-generic/io.h>
> >>  
> >>  /*
> >> diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
> >> index c4c8cd4..ba7e63b 100644
> >> --- a/arch/arm64/mm/ioremap.c
> >> +++ b/arch/arm64/mm/ioremap.c
> >> @@ -24,6 +24,7 @@
> >>  #include <linux/mm.h>
> >>  #include <linux/vmalloc.h>
> >>  #include <linux/io.h>
> >> +#include <linux/memblock.h>
> >>  
> >>  #include <asm/fixmap.h>
> >>  #include <asm/tlbflush.h>
> >> @@ -105,6 +106,36 @@ void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
> >>  EXPORT_SYMBOL(ioremap_cache);
> >>  
> >>  /*
> >> + * Convert a physical pointer to a virtual kernel pointer for /dev/mem
> >> + * access
> >> + */
> >> +void *xlate_dev_mem_ptr(phys_addr_t phys)
> >> +{
> >> +	unsigned long start  = phys &  PAGE_MASK;
> >> +	unsigned long offset = phys & ~PAGE_MASK;
> >> +	void *vaddr;
> >> +
> >> +	/* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
> >> +	if (page_is_ram(start >> PAGE_SHIFT) && memblock_is_memory(phys))
> >> +		return __va(phys);
> >> +
> >> +	vaddr = ioremap_cache(start, PAGE_SIZE);
> > 
> > Blindly using ioremap like this looks unsafe, since we could accidentally
> > set conflict with the attributes of a mapping used by something else (e.g.
> > firmware running on another CPU).
> > 
> > I'd like to understand more about the crash, so we can see work out how to
> > fix this properly.
> >
> This does opens up access to any valid physical address. In the short term we 
> can block this crash by return NULL from this function if the memblock is MEMBLOCK_NOMAP.
> 
> Eventually we might need to add another memory type to make sure that it can be mapped.
> I have not though about the exact design here.
> 
> Thanks,
> Sameer
>  
> > Will
> > 
> 
> -- 
>  Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* [PATCH] coresight: use const for device_node structures
From: Mathieu Poirier @ 2017-05-03 20:21 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1493712903-16726-1-git-send-email-leo.yan@linaro.org>

Hi Leo,

On Tue, May 02, 2017 at 04:15:03PM +0800, Leo Yan wrote:
> Almost low level functions from open firmware have used const to
> qualify device_node structures, so add const for device_node
> parameters in of_coresight related functions.
> 
> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  drivers/hwtracing/coresight/of_coresight.c | 4 ++--
>  include/linux/coresight.h                  | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/of_coresight.c b/drivers/hwtracing/coresight/of_coresight.c
> index 629e031..859ad49 100644
> --- a/drivers/hwtracing/coresight/of_coresight.c
> +++ b/drivers/hwtracing/coresight/of_coresight.c
> @@ -52,7 +52,7 @@ of_coresight_get_endpoint_device(struct device_node *endpoint)
>  			       endpoint, of_dev_node_match);
>  }
>  
> -static void of_coresight_get_ports(struct device_node *node,
> +static void of_coresight_get_ports(const struct device_node *node,
>  				   int *nr_inport, int *nr_outport)
>  {
>  	struct device_node *ep = NULL;
> @@ -102,7 +102,7 @@ static int of_coresight_alloc_memory(struct device *dev,
>  }
>  
>  struct coresight_platform_data *of_get_coresight_platform_data(
> -				struct device *dev, struct device_node *node)
> +			struct device *dev, const struct device_node *node)

The original code didn't do things right.  I suggest:

struct coresight_platform_data *
of_get_coresight_platform_data(struct device *dev,
                               const struct device_node *node)


>  {
>  	int i = 0, ret = 0, cpu;
>  	struct coresight_platform_data *pdata;
> diff --git a/include/linux/coresight.h b/include/linux/coresight.h
> index 2a5982c..769f2c8 100644
> --- a/include/linux/coresight.h
> +++ b/include/linux/coresight.h
> @@ -264,10 +264,10 @@ static inline int coresight_timeout(void __iomem *addr, u32 offset,
>  
>  #ifdef CONFIG_OF
>  extern struct coresight_platform_data *of_get_coresight_platform_data(
> -				struct device *dev, struct device_node *node);
> +	struct device *dev, const struct device_node *node);

Same as above.

Please send me another revision and I'll pick this up for the next cycle.

Thanks,
Mathieu

>  #else
>  static inline struct coresight_platform_data *of_get_coresight_platform_data(
> -	struct device *dev, struct device_node *node) { return NULL; }
> +	struct device *dev, const struct device_node *node) { return NULL; }
>  #endif
>  
>  #ifdef CONFIG_PID_NS
> -- 
> 2.7.4
> 

^ permalink raw reply

* [PATCH] clk: sunxi-ng: a31: Correct lcd1-ch1 clock register offset
From: Maxime Ripard @ 2017-05-03 20:28 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20170503031346.29174-1-wens@csie.org>

On Wed, May 03, 2017 at 11:13:46AM +0800, Chen-Yu Tsai wrote:
> The register offset for the lcd1-ch1 clock was incorrectly pointing to
> the lcd0-ch1 clock. This resulted in the lcd0-ch1 clock being disabled
> when the clk core disables unused clocks. This then stops the simplefb
> HDMI output path.
> 
> Reported-by: Bob Ham <rah@settrans.net>
> Fixes: c6e6c96d8fa6 ("clk: sunxi-ng: Add A31/A31s clocks")
> Cc: stable at vger.kernel.org # 4.9.x-
> Signed-off-by: Chen-Yu Tsai <wens@csie.org>

Applied, thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20170503/f114aa5c/attachment.sig>

^ permalink raw reply

* [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock.c
From: Yury Norov @ 2017-05-03 20:32 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <CAMuHMdXks7pXcg+MNvqYyRWHf-WUrNWLrG8GcPWy8ho49Wx7ug@mail.gmail.com>

On Wed, May 03, 2017 at 05:05:29PM +0200, Geert Uytterhoeven wrote:
> On Wed, May 3, 2017 at 4:51 PM, Yury Norov <ynorov@caviumnetworks.com> wrote:
> > --- a/kernel/locking/qrwlock.c
> > +++ b/kernel/locking/qrwlock.c
> > @@ -20,6 +20,7 @@
> >  #include <linux/cpumask.h>
> >  #include <linux/percpu.h>
> >  #include <linux/hardirq.h>
> > +#include <asm/spinlock.h>
> 
> linux/spinlock.h?

Comment in include/linux/spinlock.h says:

 * here's the role of the various spinlock/rwlock related include
 * files:
 *
 * on SMP builds:
 *
 [...]
 *  asm/spinlock.h:       contains the arch_spin_*()/etc.  lowlevel
 *                        implementations, mostly inline assembly code
 *

This, and the fact that include/asm-generic/spinlock.h and
arch/arm64/include/asm/spinlock.h doesn't prevent the direct
inclusion means, for me, that I should include asm/spinlock.h,
because of all that I need only arch_spin_lock() and arch_spin_unlock().

Am I wrong?

Yury

^ permalink raw reply

* [PATCH v2 3/8] clk: sunxi-ng: Add class of phase clocks supporting MMC new timing modes
From: Maxime Ripard @ 2017-05-03 20:34 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20170503031658.29299-4-wens@csie.org>

Hi,

On Wed, May 03, 2017 at 11:16:53AM +0800, Chen-Yu Tsai wrote:
> The MMC clocks on newer SoCs, such as the A83T and H3, support the
> "new timing mode". Under this mode, the output of the clock is divided
> by 2, and the clock delays no longer apply.
> 
> Due to how the clock tree is modeled and setup, we need to model
> this function in two places, the master mmc clock and the two
> child phase clocks. In the mmc clock, we can easily model the
> mode bit as an extra variable post-divider. In the phase clocks,
> we check the bit and return -ENOTSUPP if the bit is set, signaling
> that the phase clocks are not to be used.
> 
> This patch introduces a class of phase clocks that checks the
> timing mode bit.

We've been over this quite some times already...

How do you retrieve the mode used in the clock so that you can also
switch the MMC driver in the new mode?

And do you prevent that from happening on the DT we already have that
use the old clock drivers that do not support this new timing at all?

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20170503/6acaeeb8/attachment.sig>

^ permalink raw reply

* [PATCH 0/9] net: thunderx: Adds XDP support
From: Rami Rosen @ 2017-05-03 20:39 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <CA+sq2CfW89EbiZmp6byc3y3cbATmmyVB=avzrUXG51-x3yxckg@mail.gmail.com>

Thanks, Sunil.

>with network stack: 0.32 Mpps
>with XDP (XDP_TX): 3 Mpps
>and XDP_DROP: 3.8 Mpps

Interesting; May I ask - which packet size did you use ?

Regards,
Rami Rosen

^ permalink raw reply

* [PATCH V8 4/5] PCI/ASPM: save power on values during bridge init
From: Bjorn Helgaas @ 2017-05-03 21:10 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <92EBB4272BF81E4089A7126EC1E7B284667759B1@IRSMSX101.ger.corp.intel.com>

On Tue, May 02, 2017 at 12:02:53PM +0000, Patel, Mayurkumar wrote:
> >On Fri, Apr 21, 2017 at 2:46 AM, Patel, Mayurkumar
> ><mayurkumar.patel@intel.com> wrote:
> >>>On 4/17/2017 12:38 PM, Bjorn Helgaas wrote:
> >>>>> Like you said, what do we do by default is the question. Should we opt
> >>>>> for safe like we are doing, or try to save some power.
> >>>> I think safety is paramount.  Every user should be able to boot safely
> >>>> without any kernel parameters.  We don't want users to have a problem
> >>>> booting and then have to search for a workaround like booting with
> >>>> "pcie_aspm=off".  Most users will never do that.
> >>>
> >>>OK, no problem with leaving the behavior as it is.
> >>>
> >>>My initial approach was #2. We knew this way that user had full control
> >>>over the ASPM policy by changing the BIOS option. Then, Mayurkumar
> >>>complained that ASPM is not enabled following a hotplug insertion to an
> >>>empty slot. That's when I switched to #3 as it sounded like a good thing
> >>>to have for us.
> >>>
> >>>> Here's a long-term strawman proposal, see what you think:
> >>>>
> >>>>   - Deprecate CONFIG_PCIEASPM_DEFAULT, CONFIG_PCIEASPM_POWERSAVE, etc.
> >>>>   - Default aspm_policy is POLICY_DEFAULT always.
> >>>>   - POLICY_DEFAULT means Linux doesn't touch anything: if BIOS enabled
> >>>> ASPM, we leave it that way; we leave ASPM disabled on hot-added
> >>>> devices.
> >>>
> >> I am also ok with leaving the same behavior as now.  But still
> >> following is something open I feel besides, Which may be there in
> >> your comments redundantly.  The current problem is,
> >> pcie_aspm_exit_link_state() disables the ASPM configuration even
> >> if POLICY_DEFAULT was set.
> >
> >We call pcie_aspm_exit_link_state() when removing an endpoint.  When
> >we remove an endpoint, I think disabling ASPM is the right thing to
> >do.  The spec (PCIe r3.1, sec 5.4.1.3) says "Software must not enable
> >L0s in either direction on a given Link unless components on both
> >sides of the Link each support L0s; otherwise, the result is
> >undefined."
> 
> Yes, you are right and per spec also it makes sense that ASPM needs
> to be disabled.  But, if POLICY_DEFAULT is set then, shouldn't BIOS
> take care of disabling ASPM?

No, I don't think so.  POLICY_DEFAULT is a Linux thing and BIOS
doesn't know anything about it.

ASPM can be configured by BIOS before handoff to Linux, but after
handoff it should be managed either entirely by BIOS or entirely by
Linux.  If BIOS wants to retain ASPM control, it would have to tell
the OS *not* to use ASPM, and it would have to use ACPI hotplug.  In
this case, POLICY_DEFAULT is irrelevant because Linux shouldn't do
anything with ASPM.

But normally BIOS allows Linux to control ASPM, and we would use
native PCIe hotplug (pciehp) instead of ACPI hotplug, and BIOS has no
opportunity to enable or disable ASPM on hotplug events.

> >> I am seeing already following problem(or may be influence) with
> >> it. The Endpoint I have does not have does not have "Presence
> >> detect change" mechanism. Hot plug is working with Link status
> >> events.  When link is in L1 or L1SS and if EP is powered off, no
> >> Link status change event are triggered (It might be the expected
> >> behavior in L1 or L1SS).  When next time EP is powered on there
> >> are link down and link up events coming one after other. BIOS
> >> enables ASPM on Root port and Endpoint, but while processing link
> >> status down, pcie_aspm_exit_link_state() clears the ASPM already
> >> which were enabled by BIOS.  If we want to follow above approach
> >> then shall we consider having something similar as following?
> >
> >The proposal was to leave ASPM disabled on hot-added devices.  If
> >the endpoint was powered off and powered back on again, I think
> >that device looks like a hot-added device, doesn't it?
> 
> Yes, it is hot-added device. Also, I understand, for POLICY_DEFAULT,
> OS would/should not touch ASPM(enable/disable), but BIOS could still
> (enable/disable), right?

Maybe I'm misunderstanding your question.  There are two questions
here:

  1) Does the BIOS allow Linux to manage ASPM?

  2) If Linux does manage ASPM, what policy does it use?

If BIOS doesn't allow Linux to manage ASPM, POLICY_DEFAULT is
irrelevant.  If BIOS does allow Linux to manage ASPM, POLICY_DEFAULT
means Linux should use the settings made by BIOS.  The user could
select a different policy, and then Linux would change the ASPM
configuration accordingly.

> Currently, what happens in my system is as following, (each 2nd
> power cycle/hotplug of Endpoint disables ASPM):
> 
> 
> First Power cycle (When ASPM L1 is already enabled): device gets
> powered off -> there are no Link status events, so no pcie hotplug
> interrupt and pcie_aspm_exit_link_state() triggered.  

If the Downstream Port leading to your Endpoint is hotplug capable,
doesn't the spec require that it can report link state changes (PCIe
r3.1, sec 7.8.6, 7.8.10, 7.8.11)?

> When the device gets powered on again -> Link down/Link up events
> are coming back to back.  First Link down is served. (BIOS checks
> for the Link status and enables ASPM already, as the device is
> actually powered back). OS calls pcie_aspm_exit_link_state() and
> ASPM gets disabled by OS.
> 
> Second Power cycle (When ASPM L1 is disabled after above): device
> gets powered off -> there are link status events, pcie hotplug
> interrupt is triggered and pcie_aspm_exit_link_state() triggered.
> OS disables ASPM. BIOS checks Link status and disables ASPM too.
> When the device gets powered on -> BIOS enables ASPM and as this is
> pcie hotplug insertion, OS does not interfere and we have ASPM
> enabled.

I don't understand this sequence.  If we're using native PCIe hotplug,
BIOS should not be involved to enable ASPM when the device is powered
on.

> The above sequence happens each 2nd power cycle of the hotplug
> device.
> 
> So One could still argue if POLICY_DEFAULT is set, then why OS
> disables ASPM if it is not meant to touch configuration.  This is
> why I proposed following kind of change, so that OS would not touch
> ASPM, if POLICY_DEFAULT is set.  Also, With the below change,
> everything relies on BIOS for ASPM when POLICY_DEFAULT is set and I
> see above problem gets resolved. Also, the existing ASPM behavior
> does not have impact, unless specific BIOS does not disable ASPM on
> Root Port when device gets removed.

^ permalink raw reply

* [RFC PATCH] mtd: spi-nor: handle signal case as failure
From: Cyrille Pitchen @ 2017-05-03 21:23 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1493805189-25327-1-git-send-email-der.herr@hofr.at>

Hi all,

Le 03/05/2017 ? 11:53, Nicholas Mc Guire a ?crit :
>  The problem is that stm32_qspi_wait_cmd() will indicate success in case of
>  being interrupted. The if condition is incomplete here as
>  wait_for_copletion_interruptible_timeout can return -ERESTARTSYS but this
>  is not handled by if(!wait_for_completion_interruptible_timeout()).
>  While somewhat unlikely it probably would be hard to figure out what went
>  wrong if the signal case does occur.
> 
> Fixes: commit 0d43d7ab277a ("mtd: spi-nor: add driver for STM32 quad spi flash controller")
> Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
> ---
> Problem found by experimental coccinelle script
> 
> Its not clear if its sufficient to simply treat this case as failure or
> if it might need some debug output to allow differentiation of signal
> and timeout case. 
> 
> Patch was compile tested with: stm32_defconfig +CONFIG_MTD=y, CONFIG_MTD_SPI_NOR=y,
> CONFIG_SPI_STM32_QUADSPI=m
> 
> Patch is against v4.11-rc8 (localversion-next is next-20170503)
> 
>  drivers/mtd/spi-nor/stm32-quadspi.c  | 10 ++++++++--
>  1 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mtd/spi-nor/stm32-quadspi.c b/drivers/mtd/spi-nor/stm32-quadspi.c
> index 1056e74..27147ad 100644
> --- a/drivers/mtd/spi-nor/stm32-quadspi.c
> +++ b/drivers/mtd/spi-nor/stm32-quadspi.c
> @@ -159,6 +159,7 @@ static int stm32_qspi_wait_cmd(struct stm32_qspi *qspi)
>  {
>  	u32 cr;
>  	int err = 0;
> +	long timeout = 0;
>  
>  	if (readl_relaxed(qspi->io_base + QUADSPI_SR) & SR_TCF)
>  		return 0;
> @@ -167,8 +168,13 @@ static int stm32_qspi_wait_cmd(struct stm32_qspi *qspi)
>  	cr = readl_relaxed(qspi->io_base + QUADSPI_CR);
>  	writel_relaxed(cr | CR_TCIE, qspi->io_base + QUADSPI_CR);
>  
> -	if (!wait_for_completion_interruptible_timeout(&qspi->cmd_completion,
> -						       msecs_to_jiffies(1000)))

Ludovic, can we just use wait_for_completion_timeout() instead?
I think so but you surely know better than me :)
Do you plan to receive signal? maybe to abort the current SPI flash command?

Best regards,

Cyrille

> +	timeout = wait_for_completion_interruptible_timeout(
> +				&qspi->cmd_completion, msecs_to_jiffies(1000));
> +
> +	/* since the calling side only cares about success of failure
> +	 * returning -ETIMEDOUT even when interrupted should be ok here
> +	 */
> +	if (timeout == 0 || timeout == -ERESTARTSYS)
>  		err = -ETIMEDOUT;
>  
>  	writel_relaxed(cr, qspi->io_base + QUADSPI_CR);
> 

^ permalink raw reply

* [PATCH 2/5] mtd: nand: gpmi: add i.MX 7 SoC support
From: Stefan Agner @ 2017-05-03 21:24 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <d0162bbe-c2de-73d1-0d37-9bdd68ceb94b@gmail.com>

On 2017-05-02 04:18, Marek Vasut wrote:
> On 05/02/2017 11:17 AM, Boris Brezillon wrote:
>> Hi Han,
>>
>> On Fri, 21 Apr 2017 13:29:16 -0500
>> Han Xu <xhnjupt@gmail.com> wrote:
>>
>>>>>
>>>>>>>> But then, adding the type would only require 2-3 lines of change if I
>>>>>>>> add it to the GPMI_IS_MX6 macro...
>>>>>>>
>>>>>>> Then at least add a comment because using type = IMX6SX right under
>>>>>>> gpmi_data_mx7d can trigger some head-scratching. And put my R-B on V2.
>>>>>>
>>>>>> FWIW, I mentioned it in the commit message.
>>>>>>
>>>>>> I think rather then adding a comment it is cleaner to just add IS_IMX7D
>>>>>> and add it to the GPMI_IS_MX6 macro. That does not need a comment since
>>>>>> it implicitly says we have a i.MX 7 but treat it like i.MX 6 and it is a
>>>>>> rather small change. Does that sound acceptable?
>>>>>
>>>>> Sure, that's even better, thanks.
>>>>>
>>>>> btw isn't there some single-core mx7 (mx7s ?) , maybe we should just go
>>>>> with mx7 (without the d suffix). I dunno if it has GPMI NAND though, so
>>>>> maybe mx7d is the right thing to do here ...
>>>>>
>>>>
>>>> There is a Solo version yes, and it has GPMI NAND too. However, almost
>>>> all i.MX 7 IPs have been named imx7d by NXP for some reason (including
>>>> compatible strings, see grep -r -e imx7 Documentation/), so I thought I
>>>> stay consistent here...
> 
> I missed the DT bit, sorry. the DT bindings say:
>   - compatible : should be "fsl,<chip>-gpmi-nand"
> so if FSL invented their own buggy bindings, they need to get them
> through Rob :) IMO for MX7, this should be "imx7-gpmi-nand" , unless
> there's some incentive to discern the solo/dual chips and/or there
> is a future imx7 coming up with different GPMI NAND block version.
> 

There is no incentive to discern solo/dual, afaict this are the same
chips, only some IP's "fused away"... From GPMI NAND perspective, they
are definitely the same.

So that leaves us with "imx7-gpmi-nand" vs. "imx7d-gpmi-nand".

All device tree compatible strings as well as Kconfig symbol,
preprocessor defines and function prefixes have been named "imx7d" or
IMX7D so far... Although they work just fine for i.MX7 Solo too... Not
sure why that exactly ended up to be that way, but I guess the reason
was that NXP started to upstream with the dual...

For the sake of continuity, I would prefer if we stick to that (as my
current patchsets do)... It is like imx6q, which also works for dual...

$ grep -r -w compatible.*imx7d  arch/arm/boot/dts/*.dts* | wc -l
68
$ grep -r -w compatible.*imx7^d  arch/arm/boot/dts/*.dts* | wc -l
0


>>> Hi Guys,
>>>
>>> Yes, there should be a i.MX7 Solo version with one core fused out. IMO, can
>>> we use QUIRK to distinguish them rather than SoC name. I know I also sent
>>> some patch set with SoC Name but I prefer to use QUIRK now.
>>
>> Not sure what this means. Are you okay with Stefan's v2?
> 
> IMO the GPMI controller in solo and dual should be the same, so there's
> no need to have quirks for it.

Agreed.

--
Stefan

^ permalink raw reply

* [PATCH] ARM: Fix __show_regs output timestamps
From: Russell King - ARM Linux @ 2017-05-03 21:30 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1493840651.22125.24.camel@perches.com>

On Wed, May 03, 2017 at 12:44:11PM -0700, Joe Perches wrote:
> On Wed, 2017-05-03 at 20:23 +0100, Russell King - ARM Linux wrote:
> > On Wed, Apr 26, 2017 at 10:39:49AM -0700, Joe Perches wrote:
> > > Multiple line formats are not preferred as the second and
> > > subsequent lines may not have timestamps.
> > > 
> > > Lacking timestamps makes reading the output a bit difficult.
> > > This also makes arm/arm64 output more similar.
> > > 
> > > Previous:
> > > 
> > > [ 1514.093231] pc : [<bf79c304>]    lr : [<bf79ced8>]    psr: a00f0013
> > > sp : ecdd7e20  ip : 00000000  fp : ffffffff
> > > 
> > > New:
> > > 
> > > [ 1514.093231] pc : [<bf79c304>]    lr : [<bf79ced8>]    psr: a00f0013
> > > [ 1514.105316] sp : ecdd7e20  ip : 00000000  fp : ffffffff
> > > 
> > > Signed-off-by: Joe Perches <joe@perches.com>
> > 
> > Hi Joe,
> > 
> > Could you put this in my patch system please, I'm unlikely to remember to
> > apply it otherwise if not already there (massive email backlog.)
> > 
> > Thanks.
> 
> Your patch system bounced my perfectly formatted patch
> because your system wants totally unnecessary additional
> information specific to your workflow.
> 
> No thanks, I don't need the additional work just to
> please your system and neither should anyone else.

Don't expect me to remember to apply your patch then.  I've got days of
catch up, and I'm just not going to remember.  Sorry.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply

* [PATCH v5 20/22] KVM: arm64: vgic-its: Device table save/restore
From: Auger Eric @ 2017-05-03 21:38 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20170503152949.GB28421@cbox>

Hi Christoffer,

On 03/05/2017 17:29, Christoffer Dall wrote:
> On Wed, May 03, 2017 at 04:07:45PM +0200, Auger Eric wrote:
>> Hi Christoffer,
>> On 30/04/2017 22:55, Christoffer Dall wrote:
>>> On Fri, Apr 14, 2017 at 12:15:32PM +0200, Eric Auger wrote:
>>>> This patch saves the device table entries into guest RAM.
>>>> Both flat table and 2 stage tables are supported. DeviceId
>>>> indexing is used.
>>>>
>>>> For each device listed in the device table, we also save
>>>> the translation table using the vgic_its_save/restore_itt
>>>> routines.
>>>>
>>>> On restore, devices are re-allocated and their ite are
>>>> re-built.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> ---
>>>> v4 -> v5:
>>>> - sort the device list by deviceid on device table save
>>>> - use defines for shifts and masks
>>>> - use abi->dte_esz
>>>> - clatify entry sizes for L1 and L2 tables
>>>>
>>>> v3 -> v4:
>>>> - use the new proto for its_alloc_device
>>>> - compute_next_devid_offset, vgic_its_flush/restore_itt
>>>>   become static in this patch
>>>> - change in the DTE entry format with the introduction of the
>>>>   valid bit and next field width decrease; ittaddr encoded
>>>>   on its full range
>>>> - fix handle_l1_entry entry handling
>>>> - correct vgic_its_table_restore error handling
>>>>
>>>> v2 -> v3:
>>>> - fix itt_addr bitmask in vgic_its_restore_dte
>>>> - addition of return 0 in vgic_its_restore_ite moved to
>>>>   the ITE related patch
>>>>
>>>> v1 -> v2:
>>>> - use 8 byte format for DTE and ITE
>>>> - support 2 stage format
>>>> - remove kvm parameter
>>>> - ITT flush/restore moved in a separate patch
>>>> - use deviceid indexing
>>>> ---
>>>>  virt/kvm/arm/vgic/vgic-its.c | 183 +++++++++++++++++++++++++++++++++++++++++--
>>>>  virt/kvm/arm/vgic/vgic.h     |   7 ++
>>>>  2 files changed, 185 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>>>> index b02fc3f..86dfc6c 100644
>>>> --- a/virt/kvm/arm/vgic/vgic-its.c
>>>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>>>> @@ -1682,7 +1682,8 @@ int vgic_its_attr_regs_access(struct kvm_device *dev,
>>>>  	return ret;
>>>>  }
>>>>  
>>>> -u32 compute_next_devid_offset(struct list_head *h, struct its_device *dev)
>>>> +static u32 compute_next_devid_offset(struct list_head *h,
>>>> +				     struct its_device *dev)
>>>>  {
>>>>  	struct list_head *e = &dev->dev_list;
>>>>  	struct its_device *next;
>>>> @@ -1858,7 +1859,7 @@ static int vgic_its_ite_cmp(void *priv, struct list_head *a,
>>>>  		return 1;
>>>>  }
>>>>  
>>>> -int vgic_its_save_itt(struct vgic_its *its, struct its_device *device)
>>>> +static int vgic_its_save_itt(struct vgic_its *its, struct its_device *device)
>>>>  {
>>>>  	const struct vgic_its_abi *abi = vgic_its_get_abi(its);
>>>>  	gpa_t base = device->itt_addr;
>>>> @@ -1877,7 +1878,7 @@ int vgic_its_save_itt(struct vgic_its *its, struct its_device *device)
>>>>  	return 0;
>>>>  }
>>>>  
>>>> -int vgic_its_restore_itt(struct vgic_its *its, struct its_device *dev)
>>>> +static int vgic_its_restore_itt(struct vgic_its *its, struct its_device *dev)
>>>>  {
>>>>  	const struct vgic_its_abi *abi = vgic_its_get_abi(its);
>>>>  	gpa_t base = dev->itt_addr;
>>>> @@ -1895,12 +1896,161 @@ int vgic_its_restore_itt(struct vgic_its *its, struct its_device *dev)
>>>>  }
>>>>  
>>>>  /**
>>>> + * vgic_its_save_dte - Save a device table entry at a given GPA
>>>> + *
>>>> + * @its: ITS handle
>>>> + * @dev: ITS device
>>>> + * @ptr: GPA
>>>> + */
>>>> +static int vgic_its_save_dte(struct vgic_its *its, struct its_device *dev,
>>>> +			     gpa_t ptr, int dte_esz)
>>>> +{
>>>> +	struct kvm *kvm = its->dev->kvm;
>>>> +	u64 val, itt_addr_field;
>>>> +	int ret;
>>>> +	u32 next_offset;
>>>> +
>>>> +	itt_addr_field = dev->itt_addr >> 8;
>>>> +	next_offset = compute_next_devid_offset(&its->device_list, dev);
>>>> +	val = (1ULL << KVM_ITS_DTE_VALID_SHIFT |
>>>> +	       ((u64)next_offset << KVM_ITS_DTE_NEXT_SHIFT) |
>>>
>>> I think this implies that the next field in your ABI points to the next
>>> offset, regardless of whether or not this is in a a level 2 or lavel 1
>>> table.  See more comments on this below (I reviewed this patch from the
>>> bottom up).
>> Not sure I understand your comment.
>>
>> Doc says:
>>  - next: equals to 0 if this entry is the last one; otherwise it
>>    corresponds to the deviceid offset to the next DTE, capped by
>>    2^14 -1.
>>
>> This is independent on 1 or 2 levels as we sort the devices by
>> deviceid's and compute the delta between those id.
> 
> see below.
> 
>>>
>>> I have a feeling this wasn't tested with 2 level device tables.  Could
>>> that be true?
>> No this was tested with 1 & and 2 levels (I hacked the guest to force 2
>> levels). 1 test hole I have though is all my dte's currently belong to
>> the same 2d level page, ie. my deviceid are not scattered enough.
>>>
>>>> +	       (itt_addr_field << KVM_ITS_DTE_ITTADDR_SHIFT) |
>>>> +		(dev->nb_eventid_bits - 1));
>>>> +	val = cpu_to_le64(val);
>>>> +	ret = kvm_write_guest(kvm, ptr, &val, dte_esz);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +/**
>>>> + * vgic_its_restore_dte - restore a device table entry
>>>> + *
>>>> + * @its: its handle
>>>> + * @id: device id the DTE corresponds to
>>>> + * @ptr: kernel VA where the 8 byte DTE is located
>>>> + * @opaque: unused
>>>> + * @next: offset to the next valid device id
>>>> + *
>>>> + * Return: < 0 on error, 0 otherwise
>>>> + */
>>>> +static int vgic_its_restore_dte(struct vgic_its *its, u32 id,
>>>> +				void *ptr, void *opaque, u32 *next)
>>>> +{
>>>> +	struct its_device *dev;
>>>> +	gpa_t itt_addr;
>>>> +	u8 nb_eventid_bits;
>>>> +	u64 entry = *(u64 *)ptr;
>>>> +	bool valid;
>>>> +	int ret;
>>>> +
>>>> +	entry = le64_to_cpu(entry);
>>>> +
>>>> +	valid = entry >> KVM_ITS_DTE_VALID_SHIFT;
>>>> +	nb_eventid_bits = (entry & KVM_ITS_DTE_SIZE_MASK) + 1;
>>>> +	itt_addr = ((entry & KVM_ITS_DTE_ITTADDR_MASK)
>>>> +			>> KVM_ITS_DTE_ITTADDR_SHIFT) << 8;
>>>> +	*next = 1;
>>>> +
>>>> +	if (!valid)
>>>> +		return 0;
>>>> +
>>>> +	/* dte entry is valid */
>>>> +	*next = (entry & KVM_ITS_DTE_NEXT_MASK) >> KVM_ITS_DTE_NEXT_SHIFT;
>>>> +
>>>> +	ret = vgic_its_alloc_device(its, &dev, id,
>>>> +				    itt_addr, nb_eventid_bits);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +	ret = vgic_its_restore_itt(its, dev);
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int vgic_its_device_cmp(void *priv, struct list_head *a,
>>>> +			       struct list_head *b)
>>>> +{
>>>> +	struct its_device *deva = container_of(a, struct its_device, dev_list);
>>>> +	struct its_device *devb = container_of(b, struct its_device, dev_list);
>>>> +
>>>> +	if (deva->device_id < devb->device_id)
>>>> +		return -1;
>>>> +	else
>>>> +		return 1;
>>>> +}
>>>> +
>>>> +/**
>>>>   * vgic_its_save_device_tables - Save the device table and all ITT
>>>>   * into guest RAM
>>>> + *
>>>> + * L1/L2 handling is hidden by vgic_its_check_id() helper which directly
>>>> + * returns the GPA of the device entry
>>>>   */
>>>>  static int vgic_its_save_device_tables(struct vgic_its *its)
>>>>  {
>>>> -	return -ENXIO;
>>>> +	const struct vgic_its_abi *abi = vgic_its_get_abi(its);
>>>> +	struct its_device *dev;
>>>> +	int dte_esz = abi->dte_esz;
>>>> +	u64 baser;
>>>> +
>>>> +	baser = its->baser_device_table;
>>>> +
>>>> +	list_sort(NULL, &its->device_list, vgic_its_device_cmp);
>>>> +
>>>> +	list_for_each_entry(dev, &its->device_list, dev_list) {
>>>> +		int ret;
>>>> +		gpa_t eaddr;
>>>> +
>>>> +		if (!vgic_its_check_id(its, baser,
>>>> +				       dev->device_id, &eaddr))
>>>> +			return -EINVAL;
>>>> +
>>>> +		ret = vgic_its_save_itt(its, dev);
>>>> +		if (ret)
>>>> +			return ret;
>>>> +
>>>> +		ret = vgic_its_save_dte(its, dev, eaddr, dte_esz);
>>>> +		if (ret)
>>>> +			return ret;
>>>> +	}
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/**
>>>> + * handle_l1_entry - callback used for L1 entries (2 stage case)
>>>> + *
>>>> + * @its: its handle
>>>> + * @id: id
>>>
>>> IIUC, this is actually the index of the entry in the L1 table.  I think
>>> this should be clarified.
>> yep
>>>
>>>> + * @addr: kernel VA
>>>> + * @opaque: unused
>>>> + * @next_offset: offset to the next L1 entry: 0 if the last element
>>>> + * was found, 1 otherwise
>>>> + */
>>>> +static int handle_l1_entry(struct vgic_its *its, u32 id, void *addr,
>>>> +			   void *opaque, u32 *next_offset)
>>>
>>> nit: shouldn't this be called handle_l1_device_table_entry ?
>> renamed into handle_l1_dte
>>>
>>>> +{
>>>> +	const struct vgic_its_abi *abi = vgic_its_get_abi(its);
>>>> +	int l2_start_id = id * (SZ_64K / GITS_LVL1_ENTRY_SIZE);
>>>
>>> Hmmm, is this not actually supposed to be (SZ_64K / abi->dte_esz) ?
>> no because 1st level entries have a fixed size of GITS_LVL1_ENTRY_SIZE bytes
> 
> yes, but the ID you calculate is a result of how many IDs each 64K 2nd
> level table can hold, which depends on the size of each entry in the 2nd
> level table, right?  Or am I misunderstanding how this works completely.
Hum damn you're fully right. Thank you for insisting.
GITS_LVL1_ENTRY_SIZE must be passed instead as l1_esz in lookup_table()

Eric
> 
>>>
>>>> +	u64 entry = *(u64 *)addr;
>>>> +	int ret, ite_esz = abi->ite_esz;
>>>
>>> Should this be ite_esz or dte_esz?
>>
>> you're correct. dte_esz should be used.
>>>
>>>> +	gpa_t gpa;
>>>
>>> nit: put declarations with initialization on separate lines.
>> OK
>>>
>>>> +
>>>> +	entry = le64_to_cpu(entry);
>>>> +	*next_offset = 1;
>>>
>>> I think you could attach a comment here saying that level 1 tables have
>>> to be scanned entirely.
>> added. note we exit as soon as the last element is found when scanning
>> l2 tables.
>>>
>>> But this also reminds me.  Does that mean that the next field in the DTE
>>> in your documented ABI format points to the next DTE within that level-2
>>> table, or does it point across to different level-2 tables?  I think
>>> this needs to be clarified in the ABI unless I'm missing something.
>> see above comment on next_index semantic. In the doc I talk about
>> deviceid offset and not of table index.
>>
> 
> ok, I see, I was misled by the definition of lookup_table saying that it
> returns 1 if the last element is identified, which is only true when you
> actually find an element that is valid and where the next field is zero.
> I understood it to mean if it found the last item in the table it was
> scanning.  So it is implied that lookup table can be called in two
> levels and the return value indicates if the element was the last from
> the point of view of the highest level, not in the context the last
> instance was called.
> 
> Note that it's further confusing that the handler function has the
> return value the other way around, where 0 means it's the last element.
> Perhaps you could make this much more readable by introducing a define
> for the return values.
> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply

* [PATCH] arm64: Add translation functions for /dev/mem read/write
From: Goel, Sameer @ 2017-05-03 21:47 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20170503201814.GP1657@bivouac.eciton.net>



On 5/3/2017 2:18 PM, Leif Lindholm wrote:
> On Wed, May 03, 2017 at 11:07:45AM -0600, Goel, Sameer wrote:
>> On 5/3/2017 5:26 AM, Will Deacon wrote:
>>> [adding some /dev/mem fans to cc]
>>>
>>> On Tue, May 02, 2017 at 02:28:05PM -0600, Sameer Goel wrote:
>>>> Port architecture specific xlate and unxlate functions for /dev/mem
>>>> read/write. This sets up the mapping for a valid physical address if a
>>>> kernel direct mapping is not already present.
>>>>
>>>> This is a generic issue as a user space app should not be allowed to crash
>>>> the kernel.
>>>
>>>> This issue was observed when systemd tried to access performance
>>>> pointer record from the FPDT table.
>>>
>>> Why is it doing that? Is there not a way to get this via /sys?
>>
>> There is no ACPI FPDT implementation in the kernel, so the userspace
>> systemd code is getting the FPDT table contents from /sys
>> and parsing the entries. The performance pointer record is a
>> reserved address populated by UEFI and the userspace code tries to
>> access it using /dev/mem. The physical address is valid, so cannot
>> push back on the user space code.
> 
> OK, so then we need to add support for parsing this table in the
> kernel and exposing the referred-to regions in a controllable fashion.
> Maybe something that belongs under /sys/firmware/efi (adding Matt), or
> maybe something that deserves its own driver.
> 
> The only two use-cases for /dev/mem on arm64 are:
> - Implementing interfaces in the kernel takes up-front effort.
> - Being able to accidentally panic the kernel from userland.
> 
We will see this issue with any access using /dev/mem to a MEMBLOCK_NOMAP marked
memblock. The kernel crash issue has to be fixed irrespective of ACPI FPDT support.

- Sameer
> /
>     Leif
> 
>> https://github.com/systemd/systemd/blob/master/src/shared/acpi-fpdt.c
>> http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf  5.2.23 
>>>
>>>> Ported from commit e045fb2a988a ("x86: PAT avoid aliasing in /dev/mem
>>>> read/write")
>>>>
>>>> Crash Signature:
>>>>  Unable to handle kernel paging request at virtual address ffff800008ff0000
>>>>  pgd = ffff8007de8b2200
>>>>  [ffff800008ff0000] *pgd=0000000000000000, *pud=0000000000000000
>>>>  Internal error: Oops: 96000007 [#1] SMP
>>>> ................
>>>>  CPU: 0 PID: 1 Comm: systemd Not tainted 4.10.0 #1
>>>>  task: ffff8007c0820000 task.stack: ffff8007c0900000
>>>>  PC is at __arch_copy_to_user+0xb4/0x280
>>>>  LR is at read_mem+0xc0/0x138
>>>>  pc : [<ffff0000084b3bb4>] lr : [<ffff00000869d178>]
>>>> pstate: 80000145
>>>>  sp : ffff8007c0903d40
>>>> ....................
>>>>  x3 : ffff800800000000 x2 : 0000000000000008
>>>>  x1 : ffff800008ff0000 x0 : 0000fffff6fdac00
>>>> ....................
>>>>  Call trace:
>>>>  Exception stack(0xffff8007c0903b70 to 0xffff8007c0903ca0)
>>>>  [<ffff0000084b3bb4>] __arch_copy_to_user+0xb4/0x280
>>>>  [<ffff0000082454d0>] __vfs_read+0x48/0x130
>>>>  [<ffff0000082467dc>] vfs_read+0x8c/0x148
>>>>  [<ffff000008247a34>] SyS_pread64+0x94/0xa8
>>>>  [<ffff0000080833b0>] el0_svc_naked+0x24/0x28
>>>
>>> So this certainly looks like a kernel bug, but I don't think your patch is
>>> the right way to fix it.
>>
>> I agree that the reserved regions are not meant to be accessed by the kernel as system 
>> ram. So, another option was to to return a NULL for this translation.
>>
>> Since, the same usage was working on other architectures I ported over the same code to
>> highlight the issue.
>>
>>>
>>>>  Code: a88120c7 d503201f d503201f 36180082 (f8408423)
>>>>
>>>> Signed-off-by: Sameer Goel <sgoel@codeaurora.org>
>>>> Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
>>>> ---
>>>>  arch/arm64/include/asm/io.h |  5 +++++
>>>>  arch/arm64/mm/ioremap.c     | 31 +++++++++++++++++++++++++++++++
>>>>  2 files changed, 36 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
>>>> index 0c00c87..c869ea4 100644
>>>> --- a/arch/arm64/include/asm/io.h
>>>> +++ b/arch/arm64/include/asm/io.h
>>>> @@ -183,6 +183,11 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
>>>>  #define iowrite32be(v,p)	({ __iowmb(); __raw_writel((__force __u32)cpu_to_be32(v), p); })
>>>>  #define iowrite64be(v,p)	({ __iowmb(); __raw_writeq((__force __u64)cpu_to_be64(v), p); })
>>>>  
>>>> +extern void *xlate_dev_mem_ptr(phys_addr_t phys);
>>>> +extern void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);
>>>> +
>>>> +#define xlate_dev_mem_ptr xlate_dev_mem_ptr
>>>> +#define unxlate_dev_mem_ptr unxlate_dev_mem_ptr
>>>>  #include <asm-generic/io.h>
>>>>  
>>>>  /*
>>>> diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
>>>> index c4c8cd4..ba7e63b 100644
>>>> --- a/arch/arm64/mm/ioremap.c
>>>> +++ b/arch/arm64/mm/ioremap.c
>>>> @@ -24,6 +24,7 @@
>>>>  #include <linux/mm.h>
>>>>  #include <linux/vmalloc.h>
>>>>  #include <linux/io.h>
>>>> +#include <linux/memblock.h>
>>>>  
>>>>  #include <asm/fixmap.h>
>>>>  #include <asm/tlbflush.h>
>>>> @@ -105,6 +106,36 @@ void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
>>>>  EXPORT_SYMBOL(ioremap_cache);
>>>>  
>>>>  /*
>>>> + * Convert a physical pointer to a virtual kernel pointer for /dev/mem
>>>> + * access
>>>> + */
>>>> +void *xlate_dev_mem_ptr(phys_addr_t phys)
>>>> +{
>>>> +	unsigned long start  = phys &  PAGE_MASK;
>>>> +	unsigned long offset = phys & ~PAGE_MASK;
>>>> +	void *vaddr;
>>>> +
>>>> +	/* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
>>>> +	if (page_is_ram(start >> PAGE_SHIFT) && memblock_is_memory(phys))
>>>> +		return __va(phys);
>>>> +
>>>> +	vaddr = ioremap_cache(start, PAGE_SIZE);
>>>
>>> Blindly using ioremap like this looks unsafe, since we could accidentally
>>> set conflict with the attributes of a mapping used by something else (e.g.
>>> firmware running on another CPU).
>>>
>>> I'd like to understand more about the crash, so we can see work out how to
>>> fix this properly.
>>>
>> This does opens up access to any valid physical address. In the short term we 
>> can block this crash by return NULL from this function if the memblock is MEMBLOCK_NOMAP.
>>
>> Eventually we might need to add another memory type to make sure that it can be mapped.
>> I have not though about the exact design here.
>>
>> Thanks,
>> Sameer
>>  
>>> Will
>>>
>>
>> -- 
>>  Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
> 

-- 
 Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* [PATCH v5 19/22] KVM: arm64: vgic-its: ITT save and restore
From: Auger Eric @ 2017-05-03 21:55 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20170503163742.GA29506@cbox>

Hi Christoffer,

On 03/05/2017 18:37, Christoffer Dall wrote:
> On Wed, May 03, 2017 at 06:08:58PM +0200, Auger Eric wrote:
>> Hi Christoffer,
>>
>> On 30/04/2017 22:14, Christoffer Dall wrote:
>>> On Fri, Apr 14, 2017 at 12:15:31PM +0200, Eric Auger wrote:
>>>> Introduce routines to save and restore device ITT and their
>>>> interrupt table entries (ITE).
>>>>
>>>> The routines will be called on device table save and
>>>> restore. They will become static in subsequent patches.
>>>
>>> Why this bottom-up approach?  Couldn't you start by having the patch
>>> that restores the device table and define the static functions that
>>> return an error there
>> done
>> , and then fill them in with subsequent patches
>>> (liek this one)?
>>>
>>> That would have the added benefit of being able to tell how things are
>>> designed to be called.
>>>
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> ---
>>>> v4 -> v5:
>>>> - ITE are now sorted by eventid on the flush
>>>> - rename *flush* into *save*
>>>> - use macros for shits and masks
>>>> - pass ite_esz to vgic_its_save_ite
>>>>
>>>> v3 -> v4:
>>>> - lookup_table and compute_next_eventid_offset become static in this
>>>>   patch
>>>> - remove static along with vgic_its_flush/restore_itt to avoid
>>>>   compilation warnings
>>>> - next field only computed with a shift (mask removed)
>>>> - handle the case where the last element has not been found
>>>>
>>>> v2 -> v3:
>>>> - add return 0 in vgic_its_restore_ite (was in subsequent patch)
>>>>
>>>> v2: creation
>>>> ---
>>>>  virt/kvm/arm/vgic/vgic-its.c | 128 ++++++++++++++++++++++++++++++++++++++++++-
>>>>  virt/kvm/arm/vgic/vgic.h     |   4 ++
>>>>  2 files changed, 129 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>>>> index 35b2ca1..b02fc3f 100644
>>>> --- a/virt/kvm/arm/vgic/vgic-its.c
>>>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>>>> @@ -23,6 +23,7 @@
>>>>  #include <linux/interrupt.h>
>>>>  #include <linux/list.h>
>>>>  #include <linux/uaccess.h>
>>>> +#include <linux/list_sort.h>
>>>>  
>>>>  #include <linux/irqchip/arm-gic-v3.h>
>>>>  
>>>> @@ -1695,7 +1696,7 @@ u32 compute_next_devid_offset(struct list_head *h, struct its_device *dev)
>>>>  	return min_t(u32, next_offset, VITS_DTE_MAX_DEVID_OFFSET);
>>>>  }
>>>>  
>>>> -u32 compute_next_eventid_offset(struct list_head *h, struct its_ite *ite)
>>>> +static u32 compute_next_eventid_offset(struct list_head *h, struct its_ite *ite)
>>>>  {
>>>>  	struct list_head *e = &ite->ite_list;
>>>>  	struct its_ite *next;
>>>> @@ -1737,8 +1738,8 @@ typedef int (*entry_fn_t)(struct vgic_its *its, u32 id, void *entry,
>>>>   *
>>>>   * Return: < 0 on error, 1 if last element identified, 0 otherwise
>>>>   */
>>>> -int lookup_table(struct vgic_its *its, gpa_t base, int size, int esz,
>>>> -		 int start_id, entry_fn_t fn, void *opaque)
>>>> +static int lookup_table(struct vgic_its *its, gpa_t base, int size, int esz,
>>>> +			int start_id, entry_fn_t fn, void *opaque)
>>>>  {
>>>>  	void *entry = kzalloc(esz, GFP_KERNEL);
>>>>  	struct kvm *kvm = its->dev->kvm;
>>>> @@ -1773,6 +1774,127 @@ int lookup_table(struct vgic_its *its, gpa_t base, int size, int esz,
>>>>  }
>>>>  
>>>>  /**
>>>> + * vgic_its_save_ite - Save an interrupt translation entry at @gpa
>>>> + */
>>>> +static int vgic_its_save_ite(struct vgic_its *its, struct its_device *dev,
>>>> +			      struct its_ite *ite, gpa_t gpa, int ite_esz)
>>>> +{
>>>> +	struct kvm *kvm = its->dev->kvm;
>>>> +	u32 next_offset;
>>>> +	u64 val;
>>>> +
>>>> +	next_offset = compute_next_eventid_offset(&dev->itt_head, ite);
>>>> +	val = ((u64)next_offset << KVM_ITS_ITE_NEXT_SHIFT) |
>>>> +	       ((u64)ite->lpi << KVM_ITS_ITE_PINTID_SHIFT) |
>>>> +		ite->collection->collection_id;
>>>> +	val = cpu_to_le64(val);
>>>> +	return kvm_write_guest(kvm, gpa, &val, ite_esz);
>>>> +}
>>>> +
>>>> +/**
>>>> + * vgic_its_restore_ite - restore an interrupt translation entry
>>>> + * @event_id: id used for indexing
>>>> + * @ptr: pointer to the ITE entry
>>>> + * @opaque: pointer to the its_device
>>>> + * @next: id offset to the next entry
>>>> + */
>>>> +static int vgic_its_restore_ite(struct vgic_its *its, u32 event_id,
>>>> +				void *ptr, void *opaque, u32 *next)
>>>> +{
>>>> +	struct its_device *dev = (struct its_device *)opaque;
>>>> +	struct its_collection *collection;
>>>> +	struct kvm *kvm = its->dev->kvm;
>>>> +	u64 val, *p = (u64 *)ptr;
>>>
>>> nit: initializations on separate line (and possible do that just above
>>> assigning val).
>> done
>>>
>>>> +	struct vgic_irq *irq;
>>>> +	u32 coll_id, lpi_id;
>>>> +	struct its_ite *ite;
>>>> +	int ret;
>>>> +
>>>> +	val = *p;
>>>> +	*next = 1;
>>>> +
>>>> +	val = le64_to_cpu(val);
>>>> +
>>>> +	coll_id = val & KVM_ITS_ITE_ICID_MASK;
>>>> +	lpi_id = (val & KVM_ITS_ITE_PINTID_MASK) >> KVM_ITS_ITE_PINTID_SHIFT;
>>>> +
>>>> +	if (!lpi_id)
>>>> +		return 0;
>>>
>>> are all non-zero LPI IDs valid?  Don't we have a wrapper that tests if
>>> the ID is valid?
>> no, lpi_id must be >= GIC_MIN_LPI=8192; added that check.
>> ABI Doc says lpi_id==0 is interpreted as invalid. Other values <
>> GIC_MIN_LPI cause an -EINVAL error
>>>
>>> (looks like it's possible to add LPIs with the INTID range of SPIs, SGIs
>>> and PPIs here)
>>
>>>
>>>> +
>>>> +	*next = val >> KVM_ITS_ITE_NEXT_SHIFT;
>>>
>>> Don't we need to validate this somehow since it will presumably be used
>>> to forward a pointer somehow by the caller?
>> checked against max number of eventids supported by the device
>>>
>>>> +
>>>> +	collection = find_collection(its, coll_id);
>>>> +	if (!collection)
>>>> +		return -EINVAL;
>>>> +
>>>> +	ret = vgic_its_alloc_ite(dev, &ite, collection,
>>>> +				  lpi_id, event_id);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	irq = vgic_add_lpi(kvm, lpi_id);
>>>> +	if (IS_ERR(irq))
>>>> +		return PTR_ERR(irq);
>>>> +	ite->irq = irq;
>>>> +
>>>> +	/* restore the configuration of the LPI */
>>>> +	ret = update_lpi_config(kvm, irq, NULL);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	update_affinity_ite(kvm, ite);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int vgic_its_ite_cmp(void *priv, struct list_head *a,
>>>> +			    struct list_head *b)
>>>> +{
>>>> +	struct its_ite *itea = container_of(a, struct its_ite, ite_list);
>>>> +	struct its_ite *iteb = container_of(b, struct its_ite, ite_list);
>>>> +
>>>> +	if (itea->event_id < iteb->event_id)
>>>> +		return -1;
>>>> +	else
>>>> +		return 1;
>>>> +}
>>>> +
>>>> +int vgic_its_save_itt(struct vgic_its *its, struct its_device *device)
>>>> +{
>>>> +	const struct vgic_its_abi *abi = vgic_its_get_abi(its);
>>>> +	gpa_t base = device->itt_addr;
>>>> +	struct its_ite *ite;
>>>> +	int ret, ite_esz = abi->ite_esz;
>>>
>>> nit: initializations on separate line
>> OK
>>>
>>>> +
>>>> +	list_sort(NULL, &device->itt_head, vgic_its_ite_cmp);
>>>> +
>>>> +	list_for_each_entry(ite, &device->itt_head, ite_list) {
>>>> +		gpa_t gpa = base + ite->event_id * ite_esz;
>>>> +
>>>> +		ret = vgic_its_save_ite(its, device, ite, gpa, ite_esz);
>>>> +		if (ret)
>>>> +			return ret;
>>>> +	}
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +int vgic_its_restore_itt(struct vgic_its *its, struct its_device *dev)
>>>> +{
>>>> +	const struct vgic_its_abi *abi = vgic_its_get_abi(its);
>>>> +	gpa_t base = dev->itt_addr;
>>>> +	int ret, ite_esz = abi->ite_esz;
>>>> +	size_t max_size = BIT_ULL(dev->nb_eventid_bits) * ite_esz;
>>>
>>> nit: initializations on separate line
>> OK
>>>
>>>> +
>>>> +	ret =  lookup_table(its, base, max_size, ite_esz, 0,
>>>> +			    vgic_its_restore_ite, dev);
>>>
>>> nit: extra white space
>>>
>>>> +
>>>> +	if (ret < 0)
>>>> +		return ret;
>>>> +
>>>> +	/* if the last element has not been found we are in trouble */
>>>> +	return ret ? 0 : -EINVAL;
>>>
>>> hmm, these are values potentially created by the guest in guest RAM,
>>> right?  So do we really abort migration and return an error to userspace
>>> in this case?
>> So we discussed with Peter/dave we shouldn't abort() in qemu in case of
>> such error. The restore table IOCTL will return an error. Up to qemu to
>> print the error. Destination guest will not be functional though.
>>
> 
> ok, I'm just wondering if userspace can make a qualified decision based
> on this error code.  EINVAL typically means that userspace provided
> something incorrect, which I suppose in a sense is true, but this should
> be the only case where we return EINVAL here.
  Userspace must be able to
> tell the cases apart where the guest programmed bogus into memory before
> migration started, in which case we should ignore-and-resume, and where
> QEMU errornously provide some bogus value where the machine state
> becomes unreliable and must be powered down.
guest does not feed much besides few registers the ITS table restore
depends on. In case we want a more subtle error management at userspace
level all the error codes need to be revisited I am afraid. My plan was
to be more rough at the beginning and ignore & resume if ITS table
restore fails.

Thanks

Eric
> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply

* [PATCH 1/3] ARM: dts: imx6qdl-nitrogen6_max: fix rv4162 compatible
From: Fabio Estevam @ 2017-05-03 22:14 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20170503085010.GF18578@dragon>

Hi Shawn,

On Wed, May 3, 2017 at 5:50 AM, Shawn Guo <shawnguo@kernel.org> wrote:

>>       rtc: rtc at 68 {
>> -             compatible = "st,rv4162";
>> +             compatible = "microcrystal,rv4162";
>
> The compatible is not documented?

This compatible has been added in a897bf138c9b443 ("rtc: m41t80: Add
proper compatible for rv4162") in linux-next, so it will be present in
4.12-rc1.

^ permalink raw reply

* [PATCH 1/3] ARM: dts: imx6qdl-nitrogen6_max: fix rv4162 compatible
From: Alexandre Belloni @ 2017-05-03 22:16 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20170503085010.GF18578@dragon>

On 03/05/2017 at 16:50:12 +0800, Shawn Guo wrote:
> On Wed, Apr 19, 2017 at 10:22:02PM +0200, Alexandre Belloni wrote:
> > The rv4162 vendor is microcrystal, not ST.
> > 
> > Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
> > ---
> >  arch/arm/boot/dts/imx6qdl-nitrogen6_max.dtsi | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/boot/dts/imx6qdl-nitrogen6_max.dtsi b/arch/arm/boot/dts/imx6qdl-nitrogen6_max.dtsi
> > index bad3c9f9eeac..b63134e3b51a 100644
> > --- a/arch/arm/boot/dts/imx6qdl-nitrogen6_max.dtsi
> > +++ b/arch/arm/boot/dts/imx6qdl-nitrogen6_max.dtsi
> > @@ -408,7 +408,7 @@
> >  	};
> >  
> >  	rtc: rtc at 68 {
> > -		compatible = "st,rv4162";
> > +		compatible = "microcrystal,rv4162";
> 
> The compatible is not documented?
> 

No, it wasn't. Like for many i2c devices, the driver predates DT and
didn't need any specific code to be used from DT. Basically, the i2c
core was using the i2c_device_ids to match the DT compatible string.
This is changing since cabcf4f6be658b2d4bea26dee89d08fdf67d6811.

I'm planning to document all the undocumented ones soon.

> Shawn
> 
> >  		pinctrl-names = "default";
> >  		pinctrl-0 = <&pinctrl_rv4162>;
> >  		reg = <0x68>;
> > -- 
> > 2.11.0
> > 

-- 
Alexandre Belloni, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply

* [PATCH v5 21/22] KVM: arm64: vgic-its: Fix pending table sync
From: Auger Eric @ 2017-05-03 22:20 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20170430211033.GD1499@lvm>

Hi,

On 30/04/2017 23:10, Christoffer Dall wrote:
> On Fri, Apr 14, 2017 at 12:15:33PM +0200, Eric Auger wrote:
>> In its_sync_lpi_pending_table() we currently ignore the
>> target_vcpu of the LPIs. We sync the pending bit found in
>> the vcpu pending table even if the LPI is not targeting it.
>>
>> Also in vgic_its_cmd_handle_invall() we are supposed to
>> read the config table data for the LPIs associated to the
>> collection ID. At the moment we refresh all LPI config
>> information.
>>
>> This patch passes a vpcu to vgic_copy_lpi_list() so that
>> this latter returns a snapshot of the LPIs targeting this
>> CPU and only those.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  virt/kvm/arm/vgic/vgic-its.c | 24 ++++++++++++------------
>>  1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>> index 86dfc6c..be848be 100644
>> --- a/virt/kvm/arm/vgic/vgic-its.c
>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>> @@ -252,13 +252,13 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
>>  }
>>  
>>  /*
>> - * Create a snapshot of the current LPI list, so that we can enumerate all
>> - * LPIs without holding any lock.
>> - * Returns the array length and puts the kmalloc'ed array into intid_ptr.
>> + * Create a snapshot of the current LPIs targeting @vcpu, so that we can
>> + * enumerate those LPIs without holding any lock.
>> + * Returns their number and puts the kmalloc'ed array into intid_ptr.
>>   */
>> -static int vgic_copy_lpi_list(struct kvm *kvm, u32 **intid_ptr)
>> +static int vgic_copy_lpi_list(struct kvm_vcpu *vcpu, u32 **intid_ptr)
>>  {
>> -	struct vgic_dist *dist = &kvm->arch.vgic;
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>>  	struct vgic_irq *irq;
>>  	u32 *intids;
>>  	int irq_count = dist->lpi_list_count, i = 0;
>> @@ -277,14 +277,14 @@ static int vgic_copy_lpi_list(struct kvm *kvm, u32 **intid_ptr)
>>  	spin_lock(&dist->lpi_list_lock);
>>  	list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) {
>>  		/* We don't need to "get" the IRQ, as we hold the list lock. */
>> -		intids[i] = irq->intid;
>> -		if (++i == irq_count)
>> -			break;
>> +		if (irq->target_vcpu != vcpu)
>> +			continue;
>> +		intids[i++] = irq->intid;
> 
> were we checking the ++i == irq_count condition for no good reason
> before since we can just drop it now?
I didn't get why we had that check.

Thanks

Eric
> 
>>  	}
>>  	spin_unlock(&dist->lpi_list_lock);
>>  
>>  	*intid_ptr = intids;
>> -	return irq_count;
>> +	return i;
>>  }
>>  
>>  /*
>> @@ -333,7 +333,7 @@ static u32 max_lpis_propbaser(u64 propbaser)
>>  }
>>  
>>  /*
>> - * Scan the whole LPI pending table and sync the pending bit in there
>> + * Sync the pending table pending bit of LPIs targeting @vcpu
>>   * with our own data structures. This relies on the LPI being
>>   * mapped before.
>>   */
>> @@ -346,7 +346,7 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
>>  	u32 *intids;
>>  	int nr_irqs, i;
>>  
>> -	nr_irqs = vgic_copy_lpi_list(vcpu->kvm, &intids);
>> +	nr_irqs = vgic_copy_lpi_list(vcpu, &intids);
>>  	if (nr_irqs < 0)
>>  		return nr_irqs;
>>  
>> @@ -1027,7 +1027,7 @@ static int vgic_its_cmd_handle_invall(struct kvm *kvm, struct vgic_its *its,
>>  
>>  	vcpu = kvm_get_vcpu(kvm, collection->target_addr);
>>  
>> -	irq_count = vgic_copy_lpi_list(kvm, &intids);
>> +	irq_count = vgic_copy_lpi_list(vcpu, &intids);
>>  	if (irq_count < 0)
>>  		return irq_count;
>>  
>> -- 
>> 2.5.5
>>
> 
> Assuming that it's ok to remove the irq_count check above, the rest of
> this patch looks good to me.
> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply

* [PATCH v5 22/22] KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES
From: Auger Eric @ 2017-05-03 22:22 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20170430213219.GE1499@lvm>

Hi Christoffer,

On 30/04/2017 23:32, Christoffer Dall wrote:
> On Fri, Apr 14, 2017 at 12:15:34PM +0200, Eric Auger wrote:
>> This patch adds a new attribute to GICV3 KVM device
>> KVM_DEV_ARM_VGIC_GRP_CTRL group. This Allows the userspace to
> 
> nit: allows (lowercase)
> nit: s/the userspace/userspace/
> 
>> flush all GICR pending tables into guest RAM. At the moment
>> we do not offer any restore control as the sync is implicit.
> 
> I would probably remove the last sentence here.
> 
>>
>> As we need the PENDBASER_ADDRESS() in vgic-v3, let's move its
>> definition in the irqchip header. We restore the full length
>> of the field, ie [51:16]. Same for PROPBASER_ADDRESS with full
>> field length of [51:12].
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>> v4 -> v5:
>> - move pending table save code/ctrl into VGICv3 instead of ITS
>> - remove useless gpa_t cast
>> - use the same optimization as in its_sync_lpi_pending_table
>>
>> v3 -> v4:
>> - remove the wrong comment about locking
>> - pass kvm struct instead of its handle
>> - add comment about restore method
>> - remove GITR_PENDABASER.PTZ check
>> - continue if target_vcpu == NULL
>> - new locking strategy
>>
>> v1 -> v2:
>> - do not care about the 1st KB which should be zeroed according to
>>   the spec.
>> ---
>>  arch/arm/include/uapi/asm/kvm.h     |  1 +
>>  arch/arm64/include/uapi/asm/kvm.h   |  1 +
>>  include/linux/irqchip/arm-gic-v3.h  |  2 ++
>>  virt/kvm/arm/vgic/vgic-its.c        |  6 ++---
>>  virt/kvm/arm/vgic/vgic-kvm-device.c | 20 ++++++++++++++
>>  virt/kvm/arm/vgic/vgic-v3.c         | 54 +++++++++++++++++++++++++++++++++++++
>>  virt/kvm/arm/vgic/vgic.h            |  1 +
>>  7 files changed, 81 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
>> index 8e6563c..78fe803 100644
>> --- a/arch/arm/include/uapi/asm/kvm.h
>> +++ b/arch/arm/include/uapi/asm/kvm.h
>> @@ -202,6 +202,7 @@ struct kvm_arch_memory_slot {
>>  #define   KVM_DEV_ARM_VGIC_CTRL_INIT		0
>>  #define   KVM_DEV_ARM_ITS_SAVE_TABLES		1
>>  #define   KVM_DEV_ARM_ITS_RESTORE_TABLES	2
>> +#define   KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES	3
>>  
>>  /* KVM_IRQ_LINE irq field index values */
>>  #define KVM_ARM_IRQ_TYPE_SHIFT		24
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>> index 1e35115..8a3758a 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -222,6 +222,7 @@ struct kvm_arch_memory_slot {
>>  #define   KVM_DEV_ARM_VGIC_CTRL_INIT		0
>>  #define   KVM_DEV_ARM_ITS_SAVE_TABLES           1
>>  #define   KVM_DEV_ARM_ITS_RESTORE_TABLES        2
>> +#define   KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES	3
>>  
>>  /* Device Control API on vcpu fd */
>>  #define KVM_ARM_VCPU_PMU_V3_CTRL	0
>> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
>> index 0c6798c..9d3932f 100644
>> --- a/include/linux/irqchip/arm-gic-v3.h
>> +++ b/include/linux/irqchip/arm-gic-v3.h
>> @@ -158,6 +158,7 @@
>>  #define GICR_PROPBASER_RaWaWb	GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWaWb)
>>  
>>  #define GICR_PROPBASER_IDBITS_MASK			(0x1f)
>> +#define GICR_PROPBASER_ADDRESS(x)    ((x) & GENMASK_ULL(51, 12))
>>  
>>  #define GICR_PENDBASER_SHAREABILITY_SHIFT		(10)
>>  #define GICR_PENDBASER_INNER_CACHEABILITY_SHIFT		(7)
>> @@ -183,6 +184,7 @@
>>  #define GICR_PENDBASER_RaWaWb	GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWaWb)
>>  
>>  #define GICR_PENDBASER_PTZ				BIT_ULL(62)
>> +#define GICR_PENDBASER_ADDRESS(x)    ((x) & GENMASK_ULL(51, 16))
>>  
>>  /*
>>   * Re-Distributor registers, offsets from SGI_base
>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>> index be848be..745c245 100644
>> --- a/virt/kvm/arm/vgic/vgic-its.c
>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>> @@ -189,8 +189,6 @@ static struct its_ite *find_ite(struct vgic_its *its, u32 device_id,
>>   */
>>  #define BASER_ADDRESS(x)	((x) & GENMASK_ULL(47, 16))
>>  #define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(47, 12))
>> -#define PENDBASER_ADDRESS(x)	((x) & GENMASK_ULL(47, 16))
>> -#define PROPBASER_ADDRESS(x)	((x) & GENMASK_ULL(47, 12))
>>  
>>  #define GIC_LPI_OFFSET 8192
>>  
>> @@ -227,7 +225,7 @@ static struct its_collection *find_collection(struct vgic_its *its, int coll_id)
>>  static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
>>  			     struct kvm_vcpu *filter_vcpu)
>>  {
>> -	u64 propbase = PROPBASER_ADDRESS(kvm->arch.vgic.propbaser);
>> +	u64 propbase = GICR_PROPBASER_ADDRESS(kvm->arch.vgic.propbaser);
>>  	u8 prop;
>>  	int ret;
>>  
>> @@ -339,7 +337,7 @@ static u32 max_lpis_propbaser(u64 propbaser)
>>   */
>>  static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
>>  {
>> -	gpa_t pendbase = PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);
>> +	gpa_t pendbase = GICR_PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);
>>  	struct vgic_irq *irq;
>>  	int last_byte_offset = -1;
>>  	int ret = 0;
>> diff --git a/virt/kvm/arm/vgic/vgic-kvm-device.c b/virt/kvm/arm/vgic/vgic-kvm-device.c
>> index 859bfa8..d48743c 100644
>> --- a/virt/kvm/arm/vgic/vgic-kvm-device.c
>> +++ b/virt/kvm/arm/vgic/vgic-kvm-device.c
>> @@ -580,6 +580,24 @@ static int vgic_v3_set_attr(struct kvm_device *dev,
>>  		reg = tmp32;
>>  		return vgic_v3_attr_regs_access(dev, attr, &reg, true);
>>  	}
>> +	case KVM_DEV_ARM_VGIC_GRP_CTRL: {
>> +		int ret;
>> +
>> +		switch (attr->attr) {
>> +		case KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES:
>> +			mutex_lock(&dev->kvm->lock);
>> +
>> +			if (!lock_all_vcpus(dev->kvm)) {
>> +				mutex_unlock(&dev->kvm->lock);
>> +				return -EBUSY;
>> +			}
>> +			ret = vgic_v3_save_pending_tables(dev->kvm);
>> +			unlock_all_vcpus(dev->kvm);
>> +			mutex_unlock(&dev->kvm->lock);
>> +			return ret;
>> +		}
>> +		break;
>> +	}
>>  	}
>>  	return -ENXIO;
>>  }
>> @@ -658,6 +676,8 @@ static int vgic_v3_has_attr(struct kvm_device *dev,
>>  		switch (attr->attr) {
>>  		case KVM_DEV_ARM_VGIC_CTRL_INIT:
>>  			return 0;
>> +		case KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES:
>> +			return 0;
>>  		}
>>  	}
>>  	return -ENXIO;
>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>> index be0f4c3..1f0977f 100644
>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>> @@ -15,6 +15,7 @@
>>  #include <linux/irqchip/arm-gic-v3.h>
>>  #include <linux/kvm.h>
>>  #include <linux/kvm_host.h>
>> +//#include <linux/bitops.h>
>>  #include <kvm/arm_vgic.h>
>>  #include <asm/kvm_mmu.h>
>>  #include <asm/kvm_asm.h>
>> @@ -252,6 +253,59 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
>>  	vgic_v3->vgic_hcr = ICH_HCR_EN;
>>  }
>>  
>> +/**
>> + * vgic_its_save_pending_tables - Save the pending tables into guest RAM
>> + * kvm lock and all vcpu lock must be held
>> + */
>> +int vgic_v3_save_pending_tables(struct kvm *kvm)
>> +{
>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>> +	int last_byte_offset = -1;
>> +	struct vgic_irq *irq;
>> +	int ret;
>> +
>> +	list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) {
>> +		int byte_offset, bit_nr;
>> +		struct kvm_vcpu *vcpu;
>> +		gpa_t pendbase, ptr;
>> +		bool stored;
>> +		u8 val;
>> +
>> +		vcpu = irq->target_vcpu;
>> +		if (!vcpu)
>> +			continue;
>> +
>> +		pendbase = GICR_PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);
>> +
>> +		byte_offset = irq->intid / BITS_PER_BYTE;
>> +		bit_nr = irq->intid % BITS_PER_BYTE;
>> +		ptr = pendbase + byte_offset;
>> +
>> +		if (byte_offset != last_byte_offset) {
>> +			ret = kvm_read_guest(kvm, ptr, &val, 1);
>> +			if (ret)
>> +				return ret;
>> +			last_byte_offset = byte_offset;
>> +		}
>> +
>> +		stored = val & bit_nr;
> 
> didn't you mean 'stored = val & (1 << bit_nr)' ?
yes thanks!
> 
>> +		if (stored == irq->pending_latch)
>> +			continue;
>> +
>> +		if (irq->pending_latch)
>> +			val |= 1 << bit_nr;
>> +		else
>> +			val &= ~(1 << bit_nr);
>> +
>> +		ret = kvm_write_guest(kvm, ptr, &val, 1);
>> +		if (ret)
>> +			return ret;
>> +	}
> 
> This loop could probably be written a bit more efficiently and
> simplicity by reading the memory one word at a time (and remembering to
> do le64_to_cpu) and then doing something like:
> 
> 		if (irq->pending_latch)
> 			old = __test_and_set_bit(bit_nr, &val);
> 		else
> 			old = __test_and_clear_bit(bit_nr, &val);
> 
> 		if (old != val) {
> 			tmp = cpu_to_le64(val);
> 			ret = kvm_write_guest(kvm, ptr, &tmp,
> 					      sizeof(unsigned long));
> 			if (ret)
> 				retur ret;
> 		}
> 
> Further, you could also detect when the word_offset changes and write
> back the entire word with all its changes then, but you'd also have to
> check at the end of the loop then.  Not sure that's worth the
> optimization or easier to read than what you heave.  It's up to you.
I eventually chose to leave the code as it is today. One of the
rationale is it is pretty similar to the its_sync_lpi_pending_table
function in vgic-its.c.

Thanks

Eric
> 
> Thanks,
> -Christoffer
> 
>> +
>> +	return 0;
>> +}
>> +
>> +
>>  /* check for overlapping regions and for regions crossing the end of memory */
>>  static bool vgic_v3_check_base(struct kvm *kvm)
>>  {
>> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
>> index 9bc52ef..535c2fc 100644
>> --- a/virt/kvm/arm/vgic/vgic.h
>> +++ b/virt/kvm/arm/vgic/vgic.h
>> @@ -177,6 +177,7 @@ void vgic_v3_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
>>  void vgic_v3_enable(struct kvm_vcpu *vcpu);
>>  int vgic_v3_probe(const struct gic_kvm_info *info);
>>  int vgic_v3_map_resources(struct kvm *kvm);
>> +int vgic_v3_save_pending_tables(struct kvm *kvm);
>>  int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t dist_base_address);
>>  
>>  int vgic_register_its_iodevs(struct kvm *kvm);
>> -- 
>> 2.5.5
>>

^ permalink raw reply

* [PATCH v8 5/7] coresight: add support for CPU debug module
From: Mathieu Poirier @ 2017-05-03 22:24 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1493719717-27698-6-git-send-email-leo.yan@linaro.org>

On Tue, May 02, 2017 at 06:08:35PM +0800, Leo Yan wrote:
> Coresight includes debug module and usually the module connects with CPU
> debug logic. ARMv8 architecture reference manual (ARM DDI 0487A.k) has
> description for related info in "Part H: External Debug".
> 
> Chapter H7 "The Sample-based Profiling Extension" introduces several
> sampling registers, e.g. we can check program counter value with
> combined CPU exception level, secure state, etc. So this is helpful for
> analysis CPU lockup scenarios, e.g. if one CPU has run into infinite
> loop with IRQ disabled. In this case the CPU cannot switch context and
> handle any interrupt (including IPIs), as the result it cannot handle
> SMP call for stack dump.
> 
> This patch is to enable coresight debug module, so firstly this driver
> is to bind apb clock for debug module and this is to ensure the debug
> module can be accessed from program or external debugger. And the driver
> uses sample-based registers for debug purpose, e.g. when system triggers
> panic, the driver will dump program counter and combined context
> registers (EDCIDSR, EDVIDSR); by parsing context registers so can
> quickly get to know CPU secure state, exception level, etc.
> 
> Some of the debug module registers are located in CPU power domain, so
> this requires the CPU power domain stays on when access related debug
> registers, but the power management for CPU power domain is quite
> dependent on SoC integration for power management. For the platforms
> which with sane power controller implementations, this driver follows
> the method to set EDPRCR to try to pull the CPU out of low power state
> and then set 'no power down request' bit so the CPU has no chance to
> lose power.
> 
> If the SoC has not followed up this design well for power management
> controller, the user should use the command line parameter or sysfs
> to constrain all or partial idle states to ensure the CPU power
> domain is enabled and access coresight CPU debug component safely.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  drivers/hwtracing/coresight/Kconfig               |  14 +
>  drivers/hwtracing/coresight/Makefile              |   1 +
>  drivers/hwtracing/coresight/coresight-cpu-debug.c | 670 ++++++++++++++++++++++
>  3 files changed, 685 insertions(+)
>  create mode 100644 drivers/hwtracing/coresight/coresight-cpu-debug.c
> 
> diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
> index 130cb21..8d55d6d 100644
> --- a/drivers/hwtracing/coresight/Kconfig
> +++ b/drivers/hwtracing/coresight/Kconfig
> @@ -89,4 +89,18 @@ config CORESIGHT_STM
>  	  logging useful software events or data coming from various entities
>  	  in the system, possibly running different OSs
>  
> +config CORESIGHT_CPU_DEBUG
> +	tristate "CoreSight CPU Debug driver"
> +	depends on ARM || ARM64
> +	depends on DEBUG_FS
> +	help
> +	  This driver provides support for coresight debugging module. This
> +	  is primarily used to dump sample-based profiling registers when
> +	  system triggers panic, the driver will parse context registers so
> +	  can quickly get to know program counter (PC), secure state,
> +	  exception level, etc. Before use debugging functionality, platform
> +	  needs to ensure the clock domain and power domain are enabled
> +	  properly, please refer Documentation/trace/coresight-cpu-debug.txt
> +	  for detailed description and the example for usage.
> +
>  endif
> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
> index af480d9..433d590 100644
> --- a/drivers/hwtracing/coresight/Makefile
> +++ b/drivers/hwtracing/coresight/Makefile
> @@ -16,3 +16,4 @@ obj-$(CONFIG_CORESIGHT_SOURCE_ETM4X) += coresight-etm4x.o \
>  					coresight-etm4x-sysfs.o
>  obj-$(CONFIG_CORESIGHT_QCOM_REPLICATOR) += coresight-replicator-qcom.o
>  obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
> +obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
> diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> new file mode 100644
> index 0000000..b77456d
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> @@ -0,0 +1,670 @@
> +/*
> + * Copyright (c) 2017 Linaro Limited. All rights reserved.
> + *
> + * Author: Leo Yan <leo.yan@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + */
> +#include <linux/amba/bus.h>
> +#include <linux/coresight.h>
> +#include <linux/cpu.h>
> +#include <linux/debugfs.h>
> +#include <linux/delay.h>
> +#include <linux/device.h>
> +#include <linux/err.h>
> +#include <linux/init.h>
> +#include <linux/io.h>
> +#include <linux/iopoll.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/moduleparam.h>
> +#include <linux/pm_qos.h>
> +#include <linux/slab.h>
> +#include <linux/smp.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +
> +#include "coresight-priv.h"
> +
> +#define EDPCSR				0x0A0
> +#define EDCIDSR				0x0A4
> +#define EDVIDSR				0x0A8
> +#define EDPCSR_HI			0x0AC
> +#define EDOSLAR				0x300
> +#define EDPRCR				0x310
> +#define EDPRSR				0x314
> +#define EDDEVID1			0xFC4
> +#define EDDEVID				0xFC8
> +
> +#define EDPCSR_PROHIBITED		0xFFFFFFFF
> +
> +/* bits definition for EDPCSR */
> +#define EDPCSR_THUMB			BIT(0)
> +#define EDPCSR_ARM_INST_MASK		GENMASK(31, 2)
> +#define EDPCSR_THUMB_INST_MASK		GENMASK(31, 1)
> +
> +/* bits definition for EDPRCR */
> +#define EDPRCR_COREPURQ			BIT(3)
> +#define EDPRCR_CORENPDRQ		BIT(0)
> +
> +/* bits definition for EDPRSR */
> +#define EDPRSR_DLK			BIT(6)
> +#define EDPRSR_PU			BIT(0)
> +
> +/* bits definition for EDVIDSR */
> +#define EDVIDSR_NS			BIT(31)
> +#define EDVIDSR_E2			BIT(30)
> +#define EDVIDSR_E3			BIT(29)
> +#define EDVIDSR_HV			BIT(28)
> +#define EDVIDSR_VMID			GENMASK(7, 0)
> +
> +/*
> + * bits definition for EDDEVID1:PSCROffset
> + *
> + * NOTE: armv8 and armv7 have different definition for the register,
> + * so consolidate the bits definition as below:
> + *
> + * 0b0000 - Sample offset applies based on the instruction state, we
> + *          rely on EDDEVID to check if EDPCSR is implemented or not
> + * 0b0001 - No offset applies.
> + * 0b0010 - No offset applies, but do not use in AArch32 mode
> 
> + */
> +#define EDDEVID1_PCSR_OFFSET_MASK	GENMASK(3, 0)
> +#define EDDEVID1_PCSR_OFFSET_INS_SET	(0x0)
> +#define EDDEVID1_PCSR_NO_OFFSET_DIS_AARCH32	(0x2)
> +
> +/* bits definition for EDDEVID */
> +#define EDDEVID_PCSAMPLE_MODE		GENMASK(3, 0)
> +#define EDDEVID_IMPL_EDPCSR		(0x1)
> +#define EDDEVID_IMPL_EDPCSR_EDCIDSR	(0x2)
> +#define EDDEVID_IMPL_FULL		(0x3)
> +
> +#define DEBUG_WAIT_SLEEP		1000
> +#define DEBUG_WAIT_TIMEOUT		32000
> +
> +struct debug_drvdata {
> +	void __iomem	*base;
> +	struct device	*dev;
> +	int		cpu;
> +
> +	bool		edpcsr_present;
> +	bool		edcidsr_present;
> +	bool		edvidsr_present;
> +	bool		pc_has_offset;
> +
> +	u32		edpcsr;
> +	u32		edpcsr_hi;
> +	u32		edprsr;
> +	u32		edvidsr;
> +	u32		edcidsr;
> +};
> +
> +static DEFINE_MUTEX(debug_lock);
> +static DEFINE_PER_CPU(struct debug_drvdata *, debug_drvdata);
> +static int debug_count;
> +static struct dentry *debug_debugfs_dir;
> +
> +static bool debug_enable;
> +module_param_named(enable, debug_enable, bool, 0600);
> +MODULE_PARM_DESC(enable, "Knob to enable debug functionality "
> +		 "(default is 0, which means is disabled by default)");
> +
> +static void debug_os_unlock(struct debug_drvdata *drvdata)
> +{
> +	/* Unlocks the debug registers */
> +	writel_relaxed(0x0, drvdata->base + EDOSLAR);
> +
> +	/* Make sure the registers are unlocked before accessing */
> +	wmb();
> +}
> +
> +/*
> + * According to ARM DDI 0487A.k, before access external debug
> + * registers should firstly check the access permission; if any
> + * below condition has been met then cannot access debug
> + * registers to avoid lockup issue:
> + *
> + * - CPU power domain is powered off;
> + * - The OS Double Lock is locked;
> + *
> + * By checking EDPRSR can get to know if meet these conditions.
> + */
> +static bool debug_access_permitted(struct debug_drvdata *drvdata)
> +{
> +	/* CPU is powered off */
> +	if (!(drvdata->edprsr & EDPRSR_PU))
> +		return false;
> +
> +	/* The OS Double Lock is locked */
> +	if (drvdata->edprsr & EDPRSR_DLK)
> +		return false;
> +
> +	return true;
> +}
> +
> +static void debug_force_cpu_powered_up(struct debug_drvdata *drvdata)
> +{
> +	u32 edprcr;
> +
> +try_again:
> +
> +	/*
> +	 * Send request to power management controller and assert
> +	 * DBGPWRUPREQ signal; if power management controller has
> +	 * sane implementation, it should enable CPU power domain
> +	 * in case CPU is in low power state.
> +	 */
> +	edprcr = readl_relaxed(drvdata->base + EDPRCR);
> +	edprcr |= EDPRCR_COREPURQ;
> +	writel_relaxed(edprcr, drvdata->base + EDPRCR);
> +
> +	/* Wait for CPU to be powered up (timeout~=32ms) */
> +	if (readx_poll_timeout_atomic(readl_relaxed, drvdata->base + EDPRSR,
> +			drvdata->edprsr, (drvdata->edprsr & EDPRSR_PU),
> +			DEBUG_WAIT_SLEEP, DEBUG_WAIT_TIMEOUT)) {
> +		/*
> +		 * Unfortunately the CPU cannot be powered up, so return
> +		 * back and later has no permission to access other
> +		 * registers. For this case, should disable CPU low power
> +		 * states to ensure CPU power domain is enabled!
> +		 */
> +		pr_err("%s: power up request for CPU%d failed\n",
> +			__func__, drvdata->cpu);
> +		return;
> +	}
> +
> +	/*
> +	 * At this point the CPU is powered up, so set the no powerdown
> +	 * request bit so we don't lose power and emulate power down.
> +	 */
> +	edprcr = readl_relaxed(drvdata->base + EDPRCR);
> +	edprcr |= EDPRCR_COREPURQ | EDPRCR_CORENPDRQ;
> +	writel_relaxed(edprcr, drvdata->base + EDPRCR);
> +
> +	drvdata->edprsr = readl_relaxed(drvdata->base + EDPRSR);
> +
> +	/* The core power domain got switched off on use, try again */
> +	if (unlikely(!(drvdata->edprsr & EDPRSR_PU)))
> +		goto try_again;
> +}
> +
> +static void debug_read_regs(struct debug_drvdata *drvdata)
> +{
> +	u32 save_edprcr;
> +
> +	CS_UNLOCK(drvdata->base);
> +
> +	/* Unlock os lock */
> +	debug_os_unlock(drvdata);
> +
> +	/* Save EDPRCR register */
> +	save_edprcr = readl_relaxed(drvdata->base + EDPRCR);
> +
> +	/*
> +	 * Ensure CPU power domain is enabled to let registers
> +	 * are accessiable.
> +	 */
> +	debug_force_cpu_powered_up(drvdata);
> +
> +	if (!debug_access_permitted(drvdata))
> +		goto out;
> +
> +	drvdata->edpcsr = readl_relaxed(drvdata->base + EDPCSR);
> +
> +	/*
> +	 * As described in ARM DDI 0487A.k, if the processing
> +	 * element (PE) is in debug state, or sample-based
> +	 * profiling is prohibited, EDPCSR reads as 0xFFFFFFFF;
> +	 * EDCIDSR, EDVIDSR and EDPCSR_HI registers also become
> +	 * UNKNOWN state. So directly bail out for this case.
> +	 */
> +	if (drvdata->edpcsr == EDPCSR_PROHIBITED)
> +		goto out;
> +
> +	/*
> +	 * A read of the EDPCSR normally has the side-effect of
> +	 * indirectly writing to EDCIDSR, EDVIDSR and EDPCSR_HI;
> +	 * at this point it's safe to read value from them.
> +	 */
> +	if (IS_ENABLED(CONFIG_64BIT))
> +		drvdata->edpcsr_hi = readl_relaxed(drvdata->base + EDPCSR_HI);
> +
> +	if (drvdata->edcidsr_present)
> +		drvdata->edcidsr = readl_relaxed(drvdata->base + EDCIDSR);
> +
> +	if (drvdata->edvidsr_present)
> +		drvdata->edvidsr = readl_relaxed(drvdata->base + EDVIDSR);
> +
> +out:
> +	/* Restore EDPRCR register */
> +	writel_relaxed(save_edprcr, drvdata->base + EDPRCR);
> +
> +	CS_LOCK(drvdata->base);
> +}
> +
> +static unsigned long debug_adjust_pc(struct debug_drvdata *drvdata)
> +{
> +	unsigned long arm_inst_offset = 0, thumb_inst_offset = 0;
> +	unsigned long pc;
> +
> +	if (IS_ENABLED(CONFIG_64BIT))
> +		return (unsigned long)drvdata->edpcsr_hi << 32 |
> +		       (unsigned long)drvdata->edpcsr;
> +
> +	pc = (unsigned long)drvdata->edpcsr;
> +
> +	if (drvdata->pc_has_offset) {
> +		arm_inst_offset = 8;
> +		thumb_inst_offset = 4;
> +	}
> +
> +	/* Handle thumb instruction */
> +	if (pc & EDPCSR_THUMB) {
> +		pc = (pc & EDPCSR_THUMB_INST_MASK) - thumb_inst_offset;
> +		return pc;
> +	}
> +
> +	/*
> +	 * Handle arm instruction offset, if the arm instruction
> +	 * is not 4 byte alignment then it's possible the case
> +	 * for implementation defined; keep original value for this
> +	 * case and print info for notice.
> +	 */
> +	if (pc & BIT(1))
> +		pr_emerg("Instruction offset is implementation defined\n");
> +	else
> +		pc = (pc & EDPCSR_ARM_INST_MASK) - arm_inst_offset;
> +
> +	return pc;
> +}
> +
> +static void debug_dump_regs(struct debug_drvdata *drvdata)
> +{
> +	unsigned long pc;
> +
> +	pr_emerg("\tEDPRSR:  %08x (Power:%s DLK:%s)\n", drvdata->edprsr,
> +		 drvdata->edprsr & EDPRSR_PU ? "On" : "Off",
> +		 drvdata->edprsr & EDPRSR_DLK ? "Lock" : "Unlock");
> +
> +	if (!debug_access_permitted(drvdata)) {
> +		pr_emerg("No permission to access debug registers!\n");
> +		return;
> +	}
> +
> +	if (drvdata->edpcsr == EDPCSR_PROHIBITED) {
> +		pr_emerg("CPU is in Debug state or profiling is prohibited!\n");
> +		return;
> +	}
> +
> +	pc = debug_adjust_pc(drvdata);
> +	pr_emerg("\tEDPCSR:  [<%p>] %pS\n", (void *)pc, (void *)pc);
> +
> +	if (drvdata->edcidsr_present)
> +		pr_emerg("\tEDCIDSR: %08x\n", drvdata->edcidsr);
> +
> +	if (drvdata->edvidsr_present)
> +		pr_emerg("\tEDVIDSR: %08x (State:%s Mode:%s Width:%dbits VMID:%x)\n",
> +			 drvdata->edvidsr,
> +			 drvdata->edvidsr & EDVIDSR_NS ? "Non-secure" : "Secure",
> +			 drvdata->edvidsr & EDVIDSR_E3 ? "EL3" :
> +				(drvdata->edvidsr & EDVIDSR_E2 ? "EL2" : "EL1/0"),
> +			 drvdata->edvidsr & EDVIDSR_HV ? 64 : 32,
> +			 drvdata->edvidsr & (u32)EDVIDSR_VMID);
> +}
> +
> +static void debug_init_arch_data(void *info)
> +{
> +	struct debug_drvdata *drvdata = info;
> +	u32 mode, pcsr_offset;
> +	u32 eddevid, eddevid1;
> +
> +	CS_UNLOCK(drvdata->base);
> +
> +	/* Read device info */
> +	eddevid  = readl_relaxed(drvdata->base + EDDEVID);
> +	eddevid1 = readl_relaxed(drvdata->base + EDDEVID1);
> +
> +	CS_LOCK(drvdata->base);
> +
> +	/* Parse implementation feature */
> +	mode = eddevid & EDDEVID_PCSAMPLE_MODE;
> +	pcsr_offset = eddevid1 & EDDEVID1_PCSR_OFFSET_MASK;
> +
> +	drvdata->edpcsr_present  = false;
> +	drvdata->edcidsr_present = false;
> +	drvdata->edvidsr_present = false;
> +	drvdata->pc_has_offset   = false;
> +
> +	switch (mode) {
> +	case EDDEVID_IMPL_FULL:
> +		drvdata->edvidsr_present = true;
> +		/* Fall through */
> +	case EDDEVID_IMPL_EDPCSR_EDCIDSR:
> +		drvdata->edcidsr_present = true;
> +		/* Fall through */
> +	case EDDEVID_IMPL_EDPCSR:
> +		/*
> +		 * In ARM DDI 0487A.k, the EDDEVID1.PCSROffset is used to
> +		 * define if has the offset for PC sampling value; if read
> +		 * back EDDEVID1.PCSROffset == 0x2, then this means the debug
> +		 * module does not sample the instruction set state when
> +		 * armv8 CPU in AArch32 state.
> +		 */
> +		drvdata->edpcsr_present =
> +			((IS_ENABLED(CONFIG_64BIT) && pcsr_offset != 0) ||
> +			 (pcsr_offset != EDDEVID1_PCSR_NO_OFFSET_DIS_AARCH32));
> +
> +		drvdata->pc_has_offset =
> +			(pcsr_offset == EDDEVID1_PCSR_OFFSET_INS_SET);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +/*
> + * Dump out information on panic.
> + */
> +static int debug_notifier_call(struct notifier_block *self,
> +			       unsigned long v, void *p)
> +{
> +	int cpu;
> +	struct debug_drvdata *drvdata;
> +
> +	mutex_lock(&debug_lock);
> +
> +	/* Bail out if the functionality is disabled */
> +	if (!debug_enable)
> +		goto skip_dump;
> +
> +	pr_emerg("ARM external debug module:\n");
> +
> +	for_each_possible_cpu(cpu) {
> +		drvdata = per_cpu(debug_drvdata, cpu);
> +		if (!drvdata)
> +			continue;
> +
> +		pr_emerg("CPU[%d]:\n", drvdata->cpu);
> +
> +		debug_read_regs(drvdata);
> +		debug_dump_regs(drvdata);
> +	}
> +
> +skip_dump:
> +	mutex_unlock(&debug_lock);
> +	return 0;
> +}
> +
> +static struct notifier_block debug_notifier = {
> +	.notifier_call = debug_notifier_call,
> +};
> +
> +static int debug_enable_func(void)
> +{
> +	struct debug_drvdata *drvdata;
> +	int cpu, ret = 0;
> +
> +	for_each_possible_cpu(cpu) {
> +		drvdata = per_cpu(debug_drvdata, cpu);
> +		if (!drvdata)
> +			continue;
> +
> +		ret = pm_runtime_get_sync(drvdata->dev);
> +		if (ret < 0) {
> +			pm_runtime_put_noidle(drvdata->dev);
> +			goto err;
> +		}

Here pm_runtime_get_sync() has failed and as such there is no need to call
pm_runtime_put_noidle() on it.  On the flip side we need to call it on all the
other CPUs that have been enabled before that. To fix this I suggest to
use a cpumask variable and set the bit for the CPUs that are successful.  When
you encounter an error you call pm_runtime_put_noidle() for all the CPUs in the
cpumask variable that you have been keeping.

> +	}
> +
> +err:
> +	return ret;
> +}
> +
> +static int debug_disable_func(void)
> +{
> +	struct debug_drvdata *drvdata;
> +	int cpu, ret = 0;
> +
> +	for_each_possible_cpu(cpu) {
> +		drvdata = per_cpu(debug_drvdata, cpu);
> +		if (!drvdata)
> +			continue;
> +
> +		ret = pm_runtime_put(drvdata->dev);
> +		if (ret < 0)
> +			goto err;
> +	}

I would not bail out when an error has been encountered since there is nothing
you can do anyway.  So record the error (as you did) and keep circling through
the list of processors.  That way you can call pm_runtime_put() successfully on
other CPUs. 

> +
> +err:
> +	return ret;
> +}
> +
> +static ssize_t debug_func_knob_write(struct file *f,
> +		const char __user *buf, size_t count, loff_t *ppos)
> +{
> +	u8 val;
> +	int ret;
> +
> +	ret = kstrtou8_from_user(buf, count, 2, &val);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&debug_lock);
> +
> +	if (val == debug_enable)
> +		goto out;
> +
> +	if (val)
> +		ret = debug_enable_func();
> +	else
> +		ret = debug_disable_func();
> +
> +	if (ret) {
> +		pr_err("%s: unable to %s debug function: %d\n",
> +		       __func__, val ? "enable" : "disable", ret);
> +		goto err;
> +	}
> +
> +	debug_enable = val;
> +out:
> +	ret = count;
> +err:
> +	mutex_unlock(&debug_lock);
> +	return ret;
> +}
> +
> +static ssize_t debug_func_knob_read(struct file *f,
> +		char __user *ubuf, size_t count, loff_t *ppos)
> +{
> +	ssize_t ret;
> +	char buf[3];
> +
> +	mutex_lock(&debug_lock);
> +
> +	snprintf(buf, sizeof(buf), "%d\n", debug_enable);
> +	ret = simple_read_from_buffer(ubuf, count, ppos, buf, sizeof(buf));
> +
> +	mutex_unlock(&debug_lock);

Move this before simple_read_from_buffer()

> +	return ret;
> +}
> +
> +static const struct file_operations debug_func_knob_fops = {
> +	.open	= simple_open,
> +	.read	= debug_func_knob_read,
> +	.write	= debug_func_knob_write,
> +};
> +
> +static int debug_func_init(void)
> +{
> +	struct dentry *file;
> +	int ret;
> +
> +	/* Create debugfs node */
> +	debug_debugfs_dir = debugfs_create_dir("coresight_cpu_debug", NULL);
> +	if (!debug_debugfs_dir) {
> +		pr_err("%s: unable to create debugfs directory\n", __func__);
> +		return -ENOMEM;
> +	}
> +
> +	file = debugfs_create_file("enable", 0644, debug_debugfs_dir, NULL,
> +				   &debug_func_knob_fops);
> +	if (!file) {
> +		pr_err("%s: unable to create enable knob file\n", __func__);
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +
> +	/* Register function to be called for panic */
> +	ret = atomic_notifier_chain_register(&panic_notifier_list,
> +					     &debug_notifier);
> +	if (ret) {
> +		pr_err("%s: unable to register notifier: %d\n",
> +		       __func__, ret);
> +		goto err;
> +	}
> +
> +	return 0;
> +
> +err:
> +	debugfs_remove_recursive(debug_debugfs_dir);
> +	return ret;
> +}
> +
> +static void debug_func_exit(void)
> +{
> +	atomic_notifier_chain_unregister(&panic_notifier_list,
> +					 &debug_notifier);
> +	debugfs_remove_recursive(debug_debugfs_dir);
> +}
> +
> +static int debug_probe(struct amba_device *adev, const struct amba_id *id)
> +{
> +	void __iomem *base;
> +	struct device *dev = &adev->dev;
> +	struct debug_drvdata *drvdata;
> +	struct resource *res = &adev->res;
> +	struct device_node *np = adev->dev.of_node;
> +	int ret;
> +
> +	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
> +	if (!drvdata)
> +		return -ENOMEM;
> +
> +	drvdata->cpu = np ? of_coresight_get_cpu(np) : 0;
> +	if (per_cpu(debug_drvdata, drvdata->cpu)) {
> +		dev_err(dev, "CPU%d drvdata has been initialized, "
> +			"may be caused by binding wrong CPU node in the DT\n",
> +			drvdata->cpu);
> +		return -EBUSY;
> +	}
> +
> +	drvdata->dev = &adev->dev;
> +	amba_set_drvdata(adev, drvdata);
> +
> +	/* Validity for the resource is already checked by the AMBA core */
> +	base = devm_ioremap_resource(dev, res);
> +	if (IS_ERR(base))
> +		return PTR_ERR(base);
> +
> +	drvdata->base = base;
> +
> +	get_online_cpus();
> +	per_cpu(debug_drvdata, drvdata->cpu) = drvdata;
> +	ret = smp_call_function_single(drvdata->cpu, debug_init_arch_data,
> +				       drvdata, 1);
> +	put_online_cpus();
> +
> +	if (ret) {
> +		dev_err(dev, "CPU%d debug arch init failed\n", drvdata->cpu);
> +		goto err;
> +	}
> +
> +	if (!drvdata->edpcsr_present) {
> +		dev_err(dev, "CPU%d sample-based profiling isn't implemented\n",
> +			drvdata->cpu);
> +		ret = -ENXIO;
> +		goto err;
> +	}
> +
> +	if (!debug_count++) {
> +		ret = debug_func_init();
> +		if (ret)
> +			goto err_func_init;
> +	}
> +
> +	mutex_lock(&debug_lock);
> +	if (!debug_enable)
> +		pm_runtime_put(dev);

I would add a comment here - that way your intentions are clear to everyone.

> +	mutex_unlock(&debug_lock);
> +
> +	dev_info(dev, "Coresight debug-CPU%d initialized\n", drvdata->cpu);
> +	return 0;
> +
> +err_func_init:
> +	debug_count--;
> +err:
> +	per_cpu(debug_drvdata, drvdata->cpu) = NULL;
> +	return ret;
> +}
> +
> +static int debug_remove(struct amba_device *adev)
> +{
> +	struct device *dev = &adev->dev;
> +	struct debug_drvdata *drvdata = amba_get_drvdata(adev);
> +
> +	per_cpu(debug_drvdata, drvdata->cpu) = NULL;
> +
> +	mutex_lock(&debug_lock);
> +	if (debug_enable)
> +		pm_runtime_put(dev);
> +	mutex_unlock(&debug_lock);
> +
> +	if (!--debug_count)
> +		debug_func_exit();
> +
> +	return 0;
> +}
> +
> +static struct amba_id debug_ids[] = {
> +	{       /* Debug for Cortex-A53 */
> +		.id	= 0x000bbd03,
> +		.mask	= 0x000fffff,
> +	},
> +	{       /* Debug for Cortex-A57 */
> +		.id	= 0x000bbd07,
> +		.mask	= 0x000fffff,
> +	},
> +	{       /* Debug for Cortex-A72 */
> +		.id	= 0x000bbd08,
> +		.mask	= 0x000fffff,
> +	},
> +	{ 0, 0 },
> +};
> +
> +static struct amba_driver debug_driver = {
> +	.drv = {
> +		.name   = "coresight-cpu-debug",
> +		.suppress_bind_attrs = true,
> +	},
> +	.probe		= debug_probe,
> +	.remove		= debug_remove,
> +	.id_table	= debug_ids,
> +};
> +
> +module_amba_driver(debug_driver);
> +
> +MODULE_AUTHOR("Leo Yan <leo.yan@linaro.org>");
> +MODULE_DESCRIPTION("ARM Coresight CPU Debug Driver");
> +MODULE_LICENSE("GPL");
> -- 
> 2.7.4
> 

^ permalink raw reply

* [PATCH 1/3] ARM: dts: imx6qdl-nitrogen6_max: fix rv4162 compatible
From: Alexandre Belloni @ 2017-05-03 22:27 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <CAOMZO5BhxHbSyHhj-6GhHNSbgQ6H6b1+HudB9MhmUOsFPdd5YA@mail.gmail.com>

On 03/05/2017 at 19:14:42 -0300, Fabio Estevam wrote:
> Hi Shawn,
> 
> On Wed, May 3, 2017 at 5:50 AM, Shawn Guo <shawnguo@kernel.org> wrote:
> 
> >>       rtc: rtc at 68 {
> >> -             compatible = "st,rv4162";
> >> +             compatible = "microcrystal,rv4162";
> >
> > The compatible is not documented?
> 
> This compatible has been added in a897bf138c9b443 ("rtc: m41t80: Add
> proper compatible for rv4162") in linux-next, so it will be present in
> 4.12-rc1.

Yes, and to be clear, both patches are independent and can be applied
in any order because the i?C core will still continue to match the
i2c_device_ids if the compatible doesn't match.

-- 
Alexandre Belloni, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply

* [PATCH 1/2] dt/bindings: Add bindings for Broadcom STB DRAM Sensors
From: Markus Mayer @ 2017-05-03 22:29 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <90f011d6-9241-e860-7a0e-2fb52c2337ce@gmail.com>

Hi Rob,

On 27 April 2017 at 15:00, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 04/27/2017 02:57 PM, Rob Herring wrote:
>>>>> +Optional properties:
>>>>> +  - cell-index: the index of the DPFE instance; will default to 0 if not set
>>
>> Don't use cell-index. It's not a valid property for FDT (only real
>> OpenFirmware).
>
> My bad, I was advising Markus to use this property since it was largely
> used throughout Documentation/devicetree/bindings/. What would be a more
> appropriate way to have the same information? Aliases?

Hopefully this will be the last time we need to pester you about this.
What should we be using instead of cell-index to identify multiple
instances of a device?

To give you an idea of what the code looks like right now:

ret = of_property_read_u32(dev->of_node, "cell-index", &index);
if (ret)
    index = 0;
[...]
dev_set_name(dpfe_dev, "dpfe%u", index);
ret = device_register(dpfe_dev);

Enumerating the devices like this is what we are after.

Thanks,
-Markus

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox