Linux Documentation

Linux Documentation
 help / color / mirror / Atom feed

* Re: [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2
From: Peter Zijlstra @ 2018-05-31 16:08 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi,
	Thomas Gleixner
In-Reply-To: <ebe7f0c1-b2dc-57ad-e5cb-7edf0fee4626@redhat.com>

On Thu, May 31, 2018 at 11:36:39AM -0400, Waiman Long wrote:
> > I'm on the fence myself; the only thing I'm fairly sure of is that tying
> > this particular behaviour to the load-balance knob seems off.
> 
> The main reason for doing it this way is that I don't want to have
> load-balanced partition with no cpu in it. How about we just don't allow
> consume-all at all. Each partition must have at least 1 cpu.

I suspect that might be sufficient. It certainly is for the use-cases
I'm aware of. You always want a system/control set which runs the
regular busy work of running a system.

Then you have one (or more) partitions to run your 'important' work.

> > I also think we should not mix the 'consume all' thing with the
> > 'fully-partitioned' thing, as they are otherwise unrelated.
>
> The "consume all" and "fully-partitioned" look the same to me. Are you
> talking about allocating all the CPUs in a partition to sub-partitions
> so that there is no CPU left in the parent partition?

Not sure what you're asking. "consume all" is allowing sub-partitions to
allocate all CPUs of the parent, such that there are none left.

"fully-partitioned" is N cpus but no load-balancing, also equivalent to
N 1 CPU parititions.

They are distinct things. Disabling load-balancing should not affect how
many CPUs can be allocated to sub-partitions, the moment you hit 1 CPU
the load balancing is effectively off already. Going down to 0 CPUs
isn't a problem for the load-balancer, it wasn't doing anything anyway.

So the question is if someone really needs the one partition without
balancing over N separate paritions.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Peter Zijlstra @ 2018-05-31 16:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Waiman Long, Zefan Li, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <20180531155807.GU1351649@devbig577.frc2.facebook.com>

On Thu, May 31, 2018 at 08:58:07AM -0700, Tejun Heo wrote:
> Tying together what's configured and what's applied may feel
> attractive on the surface but it's a long term headache.
> 
> * It's inconsistent with what other controllers are doing.  All the
>   limit resource configs declare the upper bound the specific cgroup
>   can consume regardless of what's actually available to it.  They
>   limit but don't guarantee access.
> 
> * Which decouples a given cgroup's configurations from its ancestors',
>   which allows an ancestor to take away resources that it granted
>   before and then also giving it back later.  No matter what you do,
>   if you couple configs of cgroup hierarchy, you end up restricting
>   what an ancestor can do to its sub-hierarchy, which can quickly
>   become a difficult operational headache.
> 
> So, let's please stay away from it even if that means a bit of
> overhead in terms of interface.

Urgh, that again :/

I'm still not convinced by your arguments though. The root container can
access all the sub-groups anyway and can grub around in them to take
away resources if it really wants to.

For cpuset in particular randomly restricting on the ancestor level can
create an unrecoverable trainwreck inside a container. Affinities are
not recoverable. Once a runnable task ends up with an empty set, its
affinities are reset and the smaller (empty) set is lost.


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Tejun Heo @ 2018-05-31 16:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Zefan Li, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <20180531161645.GN12180@hirez.programming.kicks-ass.net>

Hello,

On Thu, May 31, 2018 at 06:16:45PM +0200, Peter Zijlstra wrote:
> > So, let's please stay away from it even if that means a bit of
> > overhead in terms of interface.
> 
> Urgh, that again :/

Yeah, well, it's pretty important.

> I'm still not convinced by your arguments though. The root container can
> access all the sub-groups anyway and can grub around in them to take
> away resources if it really wants to.

That's really messy and if you delegated away a subtree, you can't
walk the subtree in a race free way, not easily anyway.

> For cpuset in particular randomly restricting on the ancestor level can
> create an unrecoverable trainwreck inside a container. Affinities are
> not recoverable. Once a runnable task ends up with an empty set, its
> affinities are reset and the smaller (empty) set is lost.

Yeah, for cpuset, it's messier, but it isn't different from hotunplug
scenario, right?  I think the best we can do there is putting ancestor
operation on an equal footing as hotplug ops.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v4] coresight: documentation: update sysfs section
From: Kim Phillips @ 2018-05-31 16:21 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Mathieu Poirier, Randy Dunlap, linux-arm-kernel,
	open list:DOCUMENTATION, linux-kernel
In-Reply-To: <20180516152458.d392b0d1e09393da30e2aeb6@arm.com>

- Align and show updated ls devices output from the TC2, based on
  current driver

- Provide an example from an ETMv4 based system (Juno)

- Reflect changes to the way the RAM write pointer is accessed since
  it got changed in commit 7d83d17795ef ("coresight: tmc: adding sysFS
  management entries").

Cc: Jonathan Corbet <corbet@lwn.net>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Kim Phillips <kim.phillips@arm.com>
---
v4: v3 + Randy and Mathieu's acked-bys, for Jonathan to apply.
v3: address Randy Dunlap's showns->shown,  corrected - and + line merging
v2: address Mathieu's comment about clarifying the sinks on the Juno
    vs. TC2 platforms.

 Documentation/trace/coresight.txt | 43 +++++++++++++++++++------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index 6f0120c3a4f1..15d2a0f1e1b8 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -141,13 +141,25 @@ register the device with the core framework.  The unregister function takes
 a reference to a "struct coresight_device", obtained at registration time.
 
 If everything goes well during the registration process the new devices will
-show up under /sys/bus/coresight/devices, as showns here for a TC2 platform:
+show up under /sys/bus/coresight/devices, as shown here for a TC2 platform:
 
 root:~# ls /sys/bus/coresight/devices/
-replicator  20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
-20010000.etb         20040000.funnel  2201d000.ptm  2203d000.etm
+20010000.etb   20040000.funnel	2201d000.ptm  2203d000.etm  replicator
+20030000.tpiu  2201c000.ptm	2203c000.etm  2203e000.etm
 root:~#
 
+and here for a Juno platform:
+
+root@juno:~# ls /sys/bus/coresight/devices/
+20010000.etf	 20120000.replicator  22040000.etm     230c0000.funnel
+20030000.tpiu	 20130000.funnel      220c0000.funnel  23140000.etm
+20040000.funnel  20140000.etf	      22140000.etm     23240000.etm
+20070000.etr	 20150000.funnel      23040000.etm     23340000.etm
+root@juno:~# 
+
+Note that on Juno users can select the ETF, ETR and TPIU as a sink target while
+on TC2, the ETB and TPIU can be selected.
+
 The functions take a "struct coresight_device", which looks like this:
 
 struct coresight_desc {
@@ -193,16 +205,16 @@ the information carried in "THIS_MODULE".
 How to use the tracer modules
 -----------------------------
 
-Before trace collection can start, a coresight sink needs to be identify.
+Before trace collection can start, a coresight sink needs to be identified.
 There is no limit on the amount of sinks (nor sources) that can be enabled at
 any given moment.  As a generic operation, all device pertaining to the sink
 class will have an "active" entry in sysfs:
 
 root:/sys/bus/coresight/devices# ls
-replicator  20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
-20010000.etb         20040000.funnel  2201d000.ptm  2203d000.etm
+20010000.etb   20040000.funnel	2201d000.ptm  2203d000.etm  replicator
+20030000.tpiu  2201c000.ptm	2203c000.etm  2203e000.etm
 root:/sys/bus/coresight/devices# ls 20010000.etb
-enable_sink  status  trigger_cntr
+enable_sink  mgmt  power  subsystem  trigger_cntr  uevent
 root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
 root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
 1
@@ -216,16 +228,13 @@ trigger a trace capture:
 root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
 root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
 1
-root:/sys/bus/coresight/devices# cat 20010000.etb/status
-Depth:          0x2000
-Status:         0x1
-RAM read ptr:   0x0
-RAM wrt ptr:    0x19d3   <----- The write pointer is moving
-Trigger cnt:    0x0
-Control:        0x1
-Flush status:   0x0
-Flush ctrl:     0x2001
-root:/sys/bus/coresight/devices#
+
+Observe the write pointer moving:
+
+root:/sys/bus/coresight/devices# cat 20010000.etb/mgmt/rwp
+0x1a8
+root:/sys/bus/coresight/devices# cat 20010000.etb/mgmt/rwp
+0x19a6
 
 Trace collection is stopped the same way:
 
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 2/3] drm/doc: Add initial amdgpu driver documentation
From: Michel Dänzer @ 2018-05-31 16:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Jonathan Corbet, dri-devel, linux-doc, linux-kernel
In-Reply-To: <20180531161708.5894-1-michel@daenzer.net>

From: Michel Dänzer <michel.daenzer@amd.com>

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 Documentation/gpu/amdgpu.rst  | 6 ++++++
 Documentation/gpu/drivers.rst | 1 +
 2 files changed, 7 insertions(+)
 create mode 100644 Documentation/gpu/amdgpu.rst

diff --git a/Documentation/gpu/amdgpu.rst b/Documentation/gpu/amdgpu.rst
new file mode 100644
index 000000000000..41a14e4aa4ac
--- /dev/null
+++ b/Documentation/gpu/amdgpu.rst
@@ -0,0 +1,6 @@
+=========================
+ drm/amdgpu AMDgpu driver
+=========================
+
+The drm/amdgpu driver supports all AMD Radeon GPUs based on the Graphics Core
+Next (GCN) architecture.
diff --git a/Documentation/gpu/drivers.rst b/Documentation/gpu/drivers.rst
index e8c84419a2a1..604b6d6975af 100644
--- a/Documentation/gpu/drivers.rst
+++ b/Documentation/gpu/drivers.rst
@@ -4,6 +4,7 @@ GPU Driver Documentation
 
 .. toctree::
 
+   amdgpu
    i915
    meson
    pl111
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 3/3] drm/amdgpu: Add documentation for PRIME related code
From: Michel Dänzer @ 2018-05-31 16:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Jonathan Corbet, dri-devel, linux-doc, linux-kernel
In-Reply-To: <20180531161708.5894-1-michel@daenzer.net>

From: Michel Dänzer <michel.daenzer@amd.com>

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 Documentation/gpu/amdgpu.rst              |  14 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 119 ++++++++++++++++++++++
 2 files changed, 133 insertions(+)

diff --git a/Documentation/gpu/amdgpu.rst b/Documentation/gpu/amdgpu.rst
index 41a14e4aa4ac..f557866f6788 100644
--- a/Documentation/gpu/amdgpu.rst
+++ b/Documentation/gpu/amdgpu.rst
@@ -4,3 +4,17 @@
 
 The drm/amdgpu driver supports all AMD Radeon GPUs based on the Graphics Core
 Next (GCN) architecture.
+
+Core Driver Infrastructure
+==========================
+
+This section covers core driver infrastructure.
+
+PRIME Buffer Sharing
+--------------------
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+   :doc: PRIME Buffer Sharing
+
+.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+   :internal:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 4683626b065f..d1f05489595b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -23,6 +23,14 @@
  *
  * Authors: Alex Deucher
  */
+
+/**
+ * DOC: PRIME Buffer Sharing
+ *
+ * The following callback implementations are used for :ref:`sharing GEM buffer
+ * objects between different devices via PRIME <prime_buffer_sharing>`.
+ */
+
 #include <drm/drmP.h>
 
 #include "amdgpu.h"
@@ -32,6 +40,14 @@
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops;
 
+/**
+ * amdgpu_gem_prime_get_sg_table - &drm_driver.gem_prime_get_sg_table
+ * implementation
+ * @obj: GEM buffer object
+ *
+ * Returns:
+ * A scatter/gather table for the pinned pages of the buffer object's memory.
+ */
 struct sg_table *amdgpu_gem_prime_get_sg_table(struct drm_gem_object *obj)
 {
 	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
@@ -40,6 +56,15 @@ struct sg_table *amdgpu_gem_prime_get_sg_table(struct drm_gem_object *obj)
 	return drm_prime_pages_to_sg(bo->tbo.ttm->pages, npages);
 }
 
+/**
+ * amdgpu_gem_prime_vmap - &dma_buf_ops.vmap implementation
+ * @obj: GEM buffer object
+ *
+ * Sets up an in-kernel virtual mapping of the buffer object's memory.
+ *
+ * Returns:
+ * The virtual address of the mapping or an error pointer.
+ */
 void *amdgpu_gem_prime_vmap(struct drm_gem_object *obj)
 {
 	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
@@ -53,6 +78,13 @@ void *amdgpu_gem_prime_vmap(struct drm_gem_object *obj)
 	return bo->dma_buf_vmap.virtual;
 }
 
+/**
+ * amdgpu_gem_prime_vunmap - &dma_buf_ops.vunmap implementation
+ * @obj: GEM buffer object
+ * @vaddr: virtual address (unused)
+ *
+ * Tears down the in-kernel virtual mapping of the buffer object's memory.
+ */
 void amdgpu_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
 {
 	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
@@ -60,6 +92,17 @@ void amdgpu_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
 	ttm_bo_kunmap(&bo->dma_buf_vmap);
 }
 
+/**
+ * amdgpu_gem_prime_mmap - &drm_driver.gem_prime_mmap implementation
+ * @obj: GEM buffer object
+ * @vma: virtual memory area
+ *
+ * Sets up a userspace mapping of the buffer object's memory in the given
+ * virtual memory area.
+ *
+ * Returns:
+ * 0 on success or negative error code.
+ */
 int amdgpu_gem_prime_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
 {
 	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
@@ -94,6 +137,19 @@ int amdgpu_gem_prime_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma
 	return ret;
 }
 
+/**
+ * amdgpu_gem_prime_import_sg_table - &drm_driver.gem_prime_import_sg_table
+ * implementation
+ * @dev: DRM device
+ * @attach: DMA-buf attachment
+ * @sg: Scatter/gather table
+ *
+ * Import shared DMA buffer memory exported by another device.
+ *
+ * Returns:
+ * A new GEM buffer object of the given DRM device, representing the memory
+ * described by the given DMA-buf attachment and scatter/gather table.
+ */
 struct drm_gem_object *
 amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
 				 struct dma_buf_attachment *attach,
@@ -132,6 +188,19 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
 	return ERR_PTR(ret);
 }
 
+/**
+ * amdgpu_gem_map_attach - &dma_buf_ops.attach implementation
+ * @dma_buf: shared DMA buffer
+ * @target_dev: target device
+ * @attach: DMA-buf attachment
+ *
+ * Makes sure that the shared DMA buffer can be accessed by the target device.
+ * For now, simply pins it to the GTT domain, where it should be accessible by
+ * all DMA devices.
+ *
+ * Returns:
+ * 0 on success or negative error code.
+ */
 static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
 				 struct device *target_dev,
 				 struct dma_buf_attachment *attach)
@@ -181,6 +250,14 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
 	return r;
 }
 
+/**
+ * amdgpu_gem_map_detach - &dma_buf_ops.detach implementation
+ * @dma_buf: shared DMA buffer
+ * @attach: DMA-buf attachment
+ *
+ * This is called when a shared DMA buffer no longer needs to be accessible by
+ * the other device. For now, simply unpins the buffer from GTT.
+ */
 static void amdgpu_gem_map_detach(struct dma_buf *dma_buf,
 				  struct dma_buf_attachment *attach)
 {
@@ -202,6 +279,13 @@ static void amdgpu_gem_map_detach(struct dma_buf *dma_buf,
 	drm_gem_map_detach(dma_buf, attach);
 }
 
+/**
+ * amdgpu_gem_prime_res_obj - &drm_driver.gem_prime_res_obj implementation
+ * @obj: GEM buffer object
+ *
+ * Returns:
+ * The buffer object's reservation object.
+ */
 struct reservation_object *amdgpu_gem_prime_res_obj(struct drm_gem_object *obj)
 {
 	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
@@ -209,6 +293,18 @@ struct reservation_object *amdgpu_gem_prime_res_obj(struct drm_gem_object *obj)
 	return bo->tbo.resv;
 }
 
+/**
+ * amdgpu_gem_begin_cpu_access - &dma_buf_ops.begin_cpu_access implementation
+ * @dma_buf: shared DMA buffer
+ * @direction: direction of DMA transfer
+ *
+ * This is called before CPU access to the shared DMA buffer's memory. If it's
+ * a read access, the buffer is moved to the GTT domain if possible, for optimal
+ * CPU read performance.
+ *
+ * Returns:
+ * 0 on success or negative error code.
+ */
 static int amdgpu_gem_begin_cpu_access(struct dma_buf *dma_buf,
 				       enum dma_data_direction direction)
 {
@@ -253,6 +349,18 @@ static const struct dma_buf_ops amdgpu_dmabuf_ops = {
 	.vunmap = drm_gem_dmabuf_vunmap,
 };
 
+/**
+ * amdgpu_gem_prime_export - &drm_driver.gem_prime_export implementation
+ * @dev: DRM device
+ * @gobj: GEM buffer object
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
+ *
+ * The main work is done by the &drm_gem_prime_export helper, which in turn
+ * uses &amdgpu_gem_prime_res_obj.
+ *
+ * Returns:
+ * Shared DMA buffer representing the GEM buffer object from the given device.
+ */
 struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
 					struct drm_gem_object *gobj,
 					int flags)
@@ -273,6 +381,17 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
 	return buf;
 }
 
+/**
+ * amdgpu_gem_prime_import - &drm_driver.gem_prime_import implementation
+ * @dev: DRM device
+ * @dma_buf: Shared DMA buffer
+ *
+ * The main work is done by the &drm_gem_prime_import helper, which in turn
+ * uses &amdgpu_gem_prime_import_sg_table.
+ *
+ * Returns:
+ * GEM buffer object representing the shared DMA buffer for the given device.
+ */
 struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev,
 					    struct dma_buf *dma_buf)
 {
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 1/3] drm/doc: Add a label for the PRIME Buffer Sharing chapter
From: Michel Dänzer @ 2018-05-31 16:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Jonathan Corbet, dri-devel, linux-doc, linux-kernel

From: Michel Dänzer <michel.daenzer@amd.com>

So that it can be referenced from e.g. DOC comments.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 Documentation/gpu/drm-mm.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 96ebcc2a7b41..21b6b72a9ba8 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -395,6 +395,8 @@ VMA Offset Manager
 .. kernel-doc:: drivers/gpu/drm/drm_vma_manager.c
    :export:
 
+.. _prime_buffer_sharing:
+
 PRIME Buffer Sharing
 ====================
 
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Waiman Long @ 2018-05-31 16:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zefan Li, Peter Zijlstra, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <20180531155807.GU1351649@devbig577.frc2.facebook.com>

On 05/31/2018 11:58 AM, Tejun Heo wrote:
> Hello,
>
> On Thu, May 31, 2018 at 09:22:23AM -0400, Waiman Long wrote:
>>>>>>> As the intersection of g11's cpus and that of g1 is empty, the effective
>>>>>>> cpus of g11 is just that of g1. The check in update_cpumask() is now
>>>>>>> corrected to make sure that cpus in a child cpus must be a subset of
>>>>>>> its parent's cpus. The error "write error: Invalid argument" will now
>>>>>>> be reported in the above case.
>>>>>>>
>>>>>> We made the distinction between user-configured CPUs and effective CPUs
>>>>>> in commit 7e88291beefbb758, so actually it's not a bug.
>>>>>>
>>>>> I remember the original reason is to support restoration of the original
>>>>> cpu after cpu offline->online. We use user-configured CPUs to remember
>>>>> if the cpu should be restored in the cpuset after it's onlined.
>>>> AFAICT you can do that and still have the child a subset of the parent,
>>>> no?
>>>> .
>>> Sure. IIRC this was suggested by Tejun as he had done something similar to devcgroup.
>>>
>> OK, let wait until Tejun has time to chime in. For me, it just look
>> weird to be able to do that.
>>
>> Another corner case that is not handled is when cpus_allowed is empty.
>> In this case, it falls back to the parent's effective cpus. On the other
>> hand, it can also be argued that an empty cpus_allowed is a transient
>> state and a cpuset shouldn't have cpus undefined while creating children.
> Tying together what's configured and what's applied may feel
> attractive on the surface but it's a long term headache.
>
> * It's inconsistent with what other controllers are doing.  All the
>   limit resource configs declare the upper bound the specific cgroup
>   can consume regardless of what's actually available to it.  They
>   limit but don't guarantee access.
>
> * Which decouples a given cgroup's configurations from its ancestors',
>   which allows an ancestor to take away resources that it granted
>   before and then also giving it back later.  No matter what you do,
>   if you couple configs of cgroup hierarchy, you end up restricting
>   what an ancestor can do to its sub-hierarchy, which can quickly
>   become a difficult operational headache.
>
> So, let's please stay away from it even if that means a bit of
> overhead in terms of interface.
>
> Thanks.
>
I am fine with that argument. I will update the patch documentation to
include this information as I think it is important for the users to be
aware of that.

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] cpuset: Enforce that a child's cpus must be a subset of the parent
From: Peter Zijlstra @ 2018-05-31 16:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Waiman Long, Zefan Li, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi
In-Reply-To: <20180531161942.GW1351649@devbig577.frc2.facebook.com>

On Thu, May 31, 2018 at 09:19:42AM -0700, Tejun Heo wrote:
> Hello,
> 
> On Thu, May 31, 2018 at 06:16:45PM +0200, Peter Zijlstra wrote:
> > > So, let's please stay away from it even if that means a bit of
> > > overhead in terms of interface.
> > 
> > Urgh, that again :/
> 
> Yeah, well, it's pretty important.
> 
> > I'm still not convinced by your arguments though. The root container can
> > access all the sub-groups anyway and can grub around in them to take
> > away resources if it really wants to.
> 
> That's really messy and if you delegated away a subtree, you can't
> walk the subtree in a race free way, not easily anyway.

Messy perhaps, but taking away resources you gave out earlier isn't
particularly nice either way around.

Not sure the races matter, if you win, the delegate can't undo it, if
you loose, you try again until you win.

It's not like cgroup stuff gets changed often, so a conflict causing you
to loose should be very rare indeed.

> > For cpuset in particular randomly restricting on the ancestor level can
> > create an unrecoverable trainwreck inside a container. Affinities are
> > not recoverable. Once a runnable task ends up with an empty set, its
> > affinities are reset and the smaller (empty) set is lost.
> 
> Yeah, for cpuset, it's messier, but it isn't different from hotunplug
> scenario, right?  I think the best we can do there is putting ancestor
> operation on an equal footing as hotplug ops.

Right, but hotplug is exceedingly rare, while I get the impression you
think it is perfectly fine to recind on your resource grants.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2
From: Waiman Long @ 2018-05-31 16:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi,
	Thomas Gleixner
In-Reply-To: <20180531160857.GM12180@hirez.programming.kicks-ass.net>

On 05/31/2018 12:08 PM, Peter Zijlstra wrote:
> On Thu, May 31, 2018 at 11:36:39AM -0400, Waiman Long wrote:
>>> I'm on the fence myself; the only thing I'm fairly sure of is that tying
>>> this particular behaviour to the load-balance knob seems off.
>> The main reason for doing it this way is that I don't want to have
>> load-balanced partition with no cpu in it. How about we just don't allow
>> consume-all at all. Each partition must have at least 1 cpu.
> I suspect that might be sufficient. It certainly is for the use-cases
> I'm aware of. You always want a system/control set which runs the
> regular busy work of running a system.
>
> Then you have one (or more) partitions to run your 'important' work.

Good. I will make the change in the next version.

>
>>> I also think we should not mix the 'consume all' thing with the
>>> 'fully-partitioned' thing, as they are otherwise unrelated.
>> The "consume all" and "fully-partitioned" look the same to me. Are you
>> talking about allocating all the CPUs in a partition to sub-partitions
>> so that there is no CPU left in the parent partition?
> Not sure what you're asking. "consume all" is allowing sub-partitions to
> allocate all CPUs of the parent, such that there are none left.
>
> "fully-partitioned" is N cpus but no load-balancing, also equivalent to
> N 1 CPU parititions.

Thanks for the clarification.

> They are distinct things. Disabling load-balancing should not affect how
> many CPUs can be allocated to sub-partitions, the moment you hit 1 CPU
> the load balancing is effectively off already. Going down to 0 CPUs
> isn't a problem for the load-balancer, it wasn't doing anything anyway.
>
> So the question is if someone really needs the one partition without
> balancing over N separate paritions.

Thinking about isolcpus emulation, I now realize that it is more than
just disabling load balancing. it also disables some kernel threads like
kworker from running so that an userspace application can monopolize as
much of a cpu as possible. Disabling kernel threads from running isn't
that hard if it is only done once at boot time. it is trickier if we
have to do it at run time.

Without good isolcpus emulation, disabling load balance kind of loses
its usefulness. So I am going to take out the load_balance flag for now
unless I hear objection otherwise.

Cheers,
Longman



--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH resend*2] VFS: simplify seq_file iteration code and interface
From: NeilBrown @ 2018-05-31 22:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-doc, linux-kernel, linux-fsdevel, Alexander Viro,
	Jonathan Corbet, Linus Torvalds
In-Reply-To: <87fu3dihtf.fsf@notabene.neil.brown.name>

[-- Attachment #1: Type: text/plain, Size: 11466 bytes --]


The documentation for seq_file suggests that it is necessary to be
able to move the iterator to a given offset, however that is not the
case.  If the iterator is stored in the private data and is stable
from one read() syscall to the next, it is only necessary to support
first/next interactions.  Implementing this in a client is a little
clumsy.
- if ->start() is given a pos of zero, it should go to start of
  sequence.
- if ->start() is given the same pos that was given to the most recent
  next() or start(), it should restore the iterator to state just
  before that last call
- if ->start is given another number, it should set the iterator one
  beyond the start just before the last ->start or ->next call.


Also, the documentation says that the implementation can interpret the
pos however it likes (other than zero meaning start), but seq_file
increments the pos sometimes which does impose on the implementation.

This patch simplifies the interface for first/next iteration and
simplifies the code, while maintaining complete backward
compatability.  Now:

- if ->start() is given a pos of zero, it should return an iterator
  placed at the start of the sequence
- if ->start() is given a non-zero pos, it should return the iterator
  in the same state it was after the last ->start or ->next.

This is particularly useful for interators which walk the multiple
chains in a hash table, e.g. using rhashtable_walk*. See
fs/gfs2/glock.c and drivers/staging/lustre/lustre/llite/vvp_dev.c

A large part of achieving this is to *always* call ->next after ->show
has successfully stored all of an entry in the buffer.  Never just
increment the index instead.
Also:
 - always pass &m->index to ->start() and ->next(), never a temp
   variable
 - don't clear ->from when ->count is zero, as ->from is dead when
    ->count is zero.

Some ->next functions do not increment *pos when they return NULL.
To maintain compatability with this, we still need to increment
m->index in one place, if ->next didn't increment it.
Note that such ->next functions are buggy and should be fixed.
A simple demonstration is
   dd if=/proc/swaps bs=1000 skip=1
Choose any block size larger than the size of /proc/swaps.
This will always show the whole last line of /proc/swaps.

This patch doesn't work around buggy next() functions for this case.

Acked-by: Jonathan Corbet <corbet@lwn.net> (For the docs part)
Signed-off-by: NeilBrown <neilb@suse.com>
---

Hi Andrew,
 maybe I've been sending this to the wrong address.  You seem to have
 processed a few seq_file patches lately - maybe you could review/take
 this one too?
Thanks,
NeilBrown


 Documentation/filesystems/seq_file.txt | 63 ++++++++++++++++++++++------------
 fs/seq_file.c                          | 53 +++++++++++-----------------
 2 files changed, 62 insertions(+), 54 deletions(-)

diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.txt
index 9de4303201e1..d412b236a9d6 100644
--- a/Documentation/filesystems/seq_file.txt
+++ b/Documentation/filesystems/seq_file.txt
@@ -66,23 +66,39 @@ kernel 3.10. Current versions require the following update
 
 The iterator interface
 
-Modules implementing a virtual file with seq_file must implement a simple
-iterator object that allows stepping through the data of interest.
-Iterators must be able to move to a specific position - like the file they
-implement - but the interpretation of that position is up to the iterator
-itself. A seq_file implementation that is formatting firewall rules, for
-example, could interpret position N as the Nth rule in the chain.
-Positioning can thus be done in whatever way makes the most sense for the
-generator of the data, which need not be aware of how a position translates
-to an offset in the virtual file. The one obvious exception is that a
-position of zero should indicate the beginning of the file.
+Modules implementing a virtual file with seq_file must implement an
+iterator object that allows stepping through the data of interest
+during a "session" (roughly one read() system call).  If the iterator
+is able to move to a specific position - like the file they implement,
+though with freedom to map the position number to a sequence location
+in whatever way is convenient - the iterator need only exist
+transiently during a session.  If the iterator cannot easily find a
+numerical position but works well with a first/next interface, the
+iterator can be stored in the private data area and continue from one
+session to the next.
+
+A seq_file implementation that is formatting firewall rules from a
+table, for example, could provide a simple iterator that interprets
+position N as the Nth rule in the chain.  A seq_file implementation
+that presents the content of a, potentially volatile, linked list
+might record a pointer into that list, providing that can be done
+without risk of the current location being removed.
+
+Positioning can thus be done in whatever way makes the most sense for
+the generator of the data, which need not be aware of how a position
+translates to an offset in the virtual file. The one obvious exception
+is that a position of zero should indicate the beginning of the file.
 
 The /proc/sequence iterator just uses the count of the next number it
 will output as its position.
 
-Four functions must be implemented to make the iterator work. The first,
-called start() takes a position as an argument and returns an iterator
-which will start reading at that position. For our simple sequence example,
+Four functions must be implemented to make the iterator work. The
+first, called start(), starts a session and takes a position as an
+argument, returning an iterator which will start reading at that
+position.  The pos passed to start() will always be either zero, or
+the most recent pos used in the previous session.
+
+For our simple sequence example,
 the start() function looks like:
 
 	static void *ct_seq_start(struct seq_file *s, loff_t *pos)
@@ -101,11 +117,12 @@ implementations; in most cases the start() function should check for a
 "past end of file" condition and return NULL if need be.
 
 For more complicated applications, the private field of the seq_file
-structure can be used. There is also a special value which can be returned
-by the start() function called SEQ_START_TOKEN; it can be used if you wish
-to instruct your show() function (described below) to print a header at the
-top of the output. SEQ_START_TOKEN should only be used if the offset is
-zero, however.
+structure can be used to hold state from session to session.  There is
+also a special value which can be returned by the start() function
+called SEQ_START_TOKEN; it can be used if you wish to instruct your
+show() function (described below) to print a header at the top of the
+output. SEQ_START_TOKEN should only be used if the offset is zero,
+however.
 
 The next function to implement is called, amazingly, next(); its job is to
 move the iterator forward to the next position in the sequence.  The
@@ -121,9 +138,13 @@ complete. Here's the example version:
 	        return spos;
 	}
 
-The stop() function is called when iteration is complete; its job, of
-course, is to clean up. If dynamic memory is allocated for the iterator,
-stop() is the place to free it.
+The stop() function closes a session; its job, of course, is to clean
+up. If dynamic memory is allocated for the iterator, stop() is the
+place to free it; if a lock was taken by start(), stop() must release
+that lock.  The value that *pos was set to by the last next() call
+before stop() is remembered, and used for the first start() call of
+the next session unless lseek() has been called on the file; in that
+case next start() will be asked to start at position zero.
 
 	static void ct_seq_stop(struct seq_file *s, void *v)
 	{
diff --git a/fs/seq_file.c b/fs/seq_file.c
index 4cc090b50cc5..fd82585ab50f 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -90,23 +90,22 @@ EXPORT_SYMBOL(seq_open);
 
 static int traverse(struct seq_file *m, loff_t offset)
 {
-	loff_t pos = 0, index;
+	loff_t pos = 0;
 	int error = 0;
 	void *p;
 
 	m->version = 0;
-	index = 0;
+	m->index = 0;
 	m->count = m->from = 0;
-	if (!offset) {
-		m->index = index;
+	if (!offset)
 		return 0;
-	}
+
 	if (!m->buf) {
 		m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
 		if (!m->buf)
 			return -ENOMEM;
 	}
-	p = m->op->start(m, &index);
+	p = m->op->start(m, &m->index);
 	while (p) {
 		error = PTR_ERR(p);
 		if (IS_ERR(p))
@@ -123,20 +122,15 @@ static int traverse(struct seq_file *m, loff_t offset)
 		if (pos + m->count > offset) {
 			m->from = offset - pos;
 			m->count -= m->from;
-			m->index = index;
 			break;
 		}
 		pos += m->count;
 		m->count = 0;
-		if (pos == offset) {
-			index++;
-			m->index = index;
+		p = m->op->next(m, p, &m->index);
+		if (pos == offset)
 			break;
-		}
-		p = m->op->next(m, p, &index);
 	}
 	m->op->stop(m, p);
-	m->index = index;
 	return error;
 
 Eoverflow:
@@ -160,7 +154,6 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 {
 	struct seq_file *m = file->private_data;
 	size_t copied = 0;
-	loff_t pos;
 	size_t n;
 	void *p;
 	int err = 0;
@@ -223,16 +216,11 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 		size -= n;
 		buf += n;
 		copied += n;
-		if (!m->count) {
-			m->from = 0;
-			m->index++;
-		}
 		if (!size)
 			goto Done;
 	}
 	/* we need at least one record in buffer */
-	pos = m->index;
-	p = m->op->start(m, &pos);
+	p = m->op->start(m, &m->index);
 	while (1) {
 		err = PTR_ERR(p);
 		if (!p || IS_ERR(p))
@@ -243,8 +231,7 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 		if (unlikely(err))
 			m->count = 0;
 		if (unlikely(!m->count)) {
-			p = m->op->next(m, p, &pos);
-			m->index = pos;
+			p = m->op->next(m, p, &m->index);
 			continue;
 		}
 		if (m->count < m->size)
@@ -256,29 +243,33 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 		if (!m->buf)
 			goto Enomem;
 		m->version = 0;
-		pos = m->index;
-		p = m->op->start(m, &pos);
+		p = m->op->start(m, &m->index);
 	}
 	m->op->stop(m, p);
 	m->count = 0;
 	goto Done;
 Fill:
 	/* they want more? let's try to get some more */
-	while (m->count < size) {
+	while (1) {
 		size_t offs = m->count;
-		loff_t next = pos;
-		p = m->op->next(m, p, &next);
+		loff_t pos = m->index;
+
+		p = m->op->next(m, p, &m->index);
+		if (pos == m->index)
+			/* Buggy ->next function */
+			m->index++;
 		if (!p || IS_ERR(p)) {
 			err = PTR_ERR(p);
 			break;
 		}
+		if (m->count >= size)
+			break;
 		err = m->op->show(m, p);
 		if (seq_has_overflowed(m) || err) {
 			m->count = offs;
 			if (likely(err <= 0))
 				break;
 		}
-		pos = next;
 	}
 	m->op->stop(m, p);
 	n = min(m->count, size);
@@ -287,11 +278,7 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 		goto Efault;
 	copied += n;
 	m->count -= n;
-	if (m->count)
-		m->from = n;
-	else
-		pos++;
-	m->index = pos;
+	m->from = n;
 Done:
 	if (!copied)
 		copied = err;
-- 
2.14.0.rc0.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related

* [PATCH 0/3] Add parameter for disabling ACS redirection for P2P
From: Logan Gunthorpe @ 2018-05-31 23:50 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-doc
  Cc: Stephen Bates, Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Logan Gunthorpe

Hi,

As discussed in our PCI P2PDMA series, we'd like to add a kernel
parameter for selectively disabling ACS redirection for select
bridges. Seeing this turned out to be a small series in itself, we've
decided to send this separately from the P2P work.

This series generalizes the code already done for the resource_alignment
option that already exists. The first patch creates a helper function
to match PCI devices against strings based on the code that already
existed in pci_specified_resource_alignment().

The second patch expands the new helper to optionally take a path of
PCI devfns. This is to address Alex's renumbering concern when using
simple bus-devfns. The implementation is essentially how he described it and
similar to the Intel VT-d spec (Section 8.3.1).

The final patch adds the disable_acs_redir kernel parameter which takes
a list of PCI devices and will disable the ACS P2P Request Redirect,
ACS P2P Completion Redirect and ACS P2P Egress Control bits for the
selected devices. This allows P2P traffic between selected bridges and
seeing it's done at boot, before IOMMU group creating the IOMMU groups
will be created correctly based on the bits.

Thanks,

Logan

--

Changes since v1:

* Reworked pci_dev_str_match_path using strrchr as suggested by Alex
* Collected Christian's Acks

--

Logan Gunthorpe (3):
  PCI: Make specifying PCI devices in kernel parameters reusable
  PCI: Allow specifying devices using a base bus and path of devfns
  PCI: Introduce the disable_acs_redir parameter

 Documentation/admin-guide/kernel-parameters.txt |  39 ++-
 drivers/pci/pci.c                               | 353 ++++++++++++++++++++----
 2 files changed, 331 insertions(+), 61 deletions(-)

--
2.11.0
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v2 3/3] PCI: Introduce the disable_acs_redir parameter
From: Logan Gunthorpe @ 2018-05-31 23:50 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-doc
  Cc: Stephen Bates, Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Logan Gunthorpe
In-Reply-To: <20180531235010.5279-1-logang@deltatee.com>

In order to support P2P traffic on a segment of the PCI hierarchy,
we must be able to disable the ACS redirect bits for select
PCI bridges. The bridges must be selected before the devices are
discovered by the kernel and the IOMMU groups created. Therefore,
a kernel command line parameter is created to specify devices
which must have their ACS bits disabled.

The new parameter takes a list of devices separated by a semicolon.
Each device specified will have it's ACS redirect bits disabled.
This is similar to the existing 'resource_alignment' parameter and just
like it we also create a sysfs bus attribute which can be used to
read the parameter. Writing the parameter is not supported
as it would require forcibly hot plugging the affected device as
well as all devices whose IOMMU groups might change.

The ACS Request P2P Request Redirect, P2P Completion Redirect and P2P
Egress Control bits are disabled which is sufficient to always allow
passing P2P traffic uninterrupted. The bits are set after the kernel
(optionally) enables the ACS bits itself. It is also done regardless of
whether the kernel sets the bits or not seeing some BIOS firmware is known
to set the bits on boot.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   9 +++
 drivers/pci/pci.c                               | 103 +++++++++++++++++++++++-
 2 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bc51b316f485..e4a761a2821b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3179,6 +3179,15 @@
 				Adding the window is slightly risky (it may
 				conflict with unreported devices), so this
 				taints the kernel.
+		disable_acs_redir=<pci_dev>[; ...]
+				Specify one or more PCI devices (in the format
+				specified above) separated by semicolons.
+				Each device specified will have the PCI ACS
+				redirect capabilities forced off which will
+				allow P2P traffic between devices through
+				bridges without forcing it upstream. Note:
+				this removes isolation between devices and
+				will make the IOMMU groups less granular.
 
 	pcie_aspm=	[PCIE] Forcibly enable or disable PCIe Active State Power
 			Management.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 39f11bd0ee03..7a5f3b97d0ff 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2993,6 +2993,92 @@ void pci_request_acs(void)
 	pci_acs_enable = 1;
 }
 
+#define DISABLE_ACS_REDIR_PARAM_SIZE COMMAND_LINE_SIZE
+static char disable_acs_redir_param[DISABLE_ACS_REDIR_PARAM_SIZE] = {0};
+static DEFINE_SPINLOCK(disable_acs_redir_lock);
+
+static ssize_t pci_set_disable_acs_redir_param(const char *buf, size_t count)
+{
+	if (count > DISABLE_ACS_REDIR_PARAM_SIZE - 1)
+		count = DISABLE_ACS_REDIR_PARAM_SIZE - 1;
+	spin_lock(&disable_acs_redir_lock);
+	strncpy(disable_acs_redir_param, buf, count);
+	disable_acs_redir_param[count] = '\0';
+	spin_unlock(&disable_acs_redir_lock);
+	return count;
+}
+
+static ssize_t pci_disable_acs_redir_show(struct bus_type *bus, char *buf)
+{
+	size_t count;
+
+	spin_lock(&disable_acs_redir_lock);
+	count = snprintf(buf, PAGE_SIZE, "%s\n", disable_acs_redir_param);
+	spin_unlock(&disable_acs_redir_lock);
+	return count;
+}
+
+static BUS_ATTR(disable_acs_redir, 0444, pci_disable_acs_redir_show, NULL);
+
+static int __init pci_disable_acs_redir_sysfs_init(void)
+{
+	return bus_create_file(&pci_bus_type, &bus_attr_disable_acs_redir);
+}
+late_initcall(pci_disable_acs_redir_sysfs_init);
+
+/**
+ * pci_disable_acs_redir - disable ACS redirect capabilities
+ * @dev: the PCI device
+ *
+ * For only devices specified in the disable_acs_redir parameter.
+ */
+static void pci_disable_acs_redir(struct pci_dev *dev)
+{
+	int ret = 0;
+	const char *p;
+	int pos;
+	u16 ctrl;
+
+	spin_lock(&disable_acs_redir_lock);
+
+	p = disable_acs_redir_param;
+	while (*p) {
+		ret = pci_dev_str_match(dev, p, &p);
+		if (ret < 0) {
+			pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
+				     disable_acs_redir_param);
+
+			break;
+		} else if (ret == 1) {
+			/* Found a match */
+			break;
+		}
+
+		if (*p != ';' && *p != ',') {
+			/* End of param or invalid format */
+			break;
+		}
+		p++;
+	}
+	spin_unlock(&disable_acs_redir_lock);
+
+	if (ret != 1)
+		return;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+	if (!pos)
+		return;
+
+	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+
+	/* P2P Request & Completion Redirect */
+	ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
+
+	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
+
+	pci_info(dev, "disabled ACS redirect\n");
+}
+
 /**
  * pci_std_enable_acs - enable ACS on devices using standard ACS capabilites
  * @dev: the PCI device
@@ -3032,12 +3118,22 @@ static void pci_std_enable_acs(struct pci_dev *dev)
 void pci_enable_acs(struct pci_dev *dev)
 {
 	if (!pci_acs_enable)
-		return;
+		goto disable_acs_redir;
 
 	if (!pci_dev_specific_enable_acs(dev))
-		return;
+		goto disable_acs_redir;
 
 	pci_std_enable_acs(dev);
+
+disable_acs_redir:
+	/*
+	 * Note: pci_disable_acs_redir() must be called even if
+	 * ACS is not enabled by the kernel because the firmware
+	 * may have unexpectedly set the flags. So if we are told
+	 * to disable it, we should always disable it after setting
+	 * the kernel's default preferences.
+	 */
+	pci_disable_acs_redir(dev);
 }
 
 static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
@@ -5990,6 +6086,9 @@ static int __init pci_setup(char *str)
 				pcie_bus_config = PCIE_BUS_PEER2PEER;
 			} else if (!strncmp(str, "pcie_scan_all", 13)) {
 				pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
+			} else if (!strncmp(str, "disable_acs_redir=", 18)) {
+				pci_set_disable_acs_redir_param(str + 18,
+					strlen(str + 18));
 			} else {
 				printk(KERN_ERR "PCI: Unknown option `%s'\n",
 						str);
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 2/3] PCI: Allow specifying devices using a base bus and path of devfns
From: Logan Gunthorpe @ 2018-05-31 23:50 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-doc
  Cc: Stephen Bates, Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Logan Gunthorpe
In-Reply-To: <20180531235010.5279-1-logang@deltatee.com>

When specifying PCI devices on the kernel command line using a
BDF, the bus numbers can change when adding or replacing a device,
changing motherboard firmware, or applying kernel parameters like
pci=assign-buses. When this happens, it is usually undesirable to
apply whatever command line tweak to the wrong device.

Therefore, it is useful to be able to specify devices with a base
bus number and the path of devfns needed to get to it. (Similar to
the "device scope" structure in the Intel VT-d spec, Section 8.3.1.)

Thus, we add an option to specify devices in the following format:

path:[<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]

The path can be any segment within the PCI hierarchy of any length and
determined through the use of 'lspci -t'. When specified this way, it is
less likely that a renumbered bus will result in a valid device specification
and the tweak won't be applied to the wrong device.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  12 ++-
 drivers/pci/pci.c                               | 101 +++++++++++++++++++++++-
 2 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e58cc671ff92..bc51b316f485 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2989,9 +2989,10 @@
 
 				Some options herein operate on a specific device
 				or a set of devices (<pci_dev>). These are
-				specified in one of two formats:
+				specified in one of three formats:
 
 				[<domain>:]<bus>:<slot>.<func>
+				path:[<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]
 				pci:<vendor>:<device>[:<subvendor>:<subdevice>]
 
 				Note: the first format specifies a PCI
@@ -2999,9 +3000,12 @@
 				if new hardware is inserted, if motherboard
 				firmware changes, or due to changes caused
 				by other kernel parameters. The second format
-				selects devices using IDs from the
-				configuration space which may match multiple
-				devices in the system.
+				specifies a path from a device through
+				a path of multiple slot/function addresses
+				(this is more robust against renumbering
+				issues). The third format selects devices using
+				IDs from the configuration space which may match
+				multiple devices in the system.
 
 		earlydump	[X86] dump PCI config space before the kernel
 			        changes anything
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 85fec5e2640b..39f11bd0ee03 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -184,22 +184,111 @@ EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
 #endif
 
 /**
+ * pci_dev_str_match_path - test if a path string matches a device
+ * @dev:    the PCI device to test
+ * @p:      string to match the device against
+ * @endptr: pointer to the string after the match
+ *
+ * Test if a string (typically from a kernel parameter) formated as a
+ * path of slot/function addresses matches a PCI device. The string must
+ * be of the form:
+ *
+ *   [<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]
+ *
+ * A path for a device can be obtained using 'lspci -t'. Using a path
+ * is more robust against renumbering of devices than using only
+ * a single bus, slot and function address.
+ *
+ * Returns 1 if the string matches the device, 0 if it does not and
+ * a negative error code if it fails to parse the string.
+ */
+static int pci_dev_str_match_path(struct pci_dev *dev, const char *path,
+				  const char **endptr)
+{
+	int ret;
+	int seg, bus, slot, func;
+	char *wpath, *p;
+	char end;
+
+	*endptr = strchrnul(path, ';');
+
+	wpath = kmemdup_nul(path, *endptr - path, GFP_KERNEL);
+	if (!wpath)
+		return -ENOMEM;
+
+	while (1) {
+		p = strrchr(wpath, '/');
+		if (!p)
+			break;
+		ret = sscanf(p, "/%x.%x%c", &slot, &func, &end);
+		if (ret != 2) {
+			ret = -EINVAL;
+			goto free_and_exit;
+		}
+
+		if (dev->devfn != PCI_DEVFN(slot, func)) {
+			ret = 0;
+			goto free_and_exit;
+		}
+
+		/*
+		 * Note: we don't need to get a reference to the upstream
+		 * bridge because we hold a reference to the top level
+		 * device which should hold a reference to the bridge,
+		 * and so on.
+		 */
+		dev = pci_upstream_bridge(dev);
+		if (!dev) {
+			ret = 0;
+			goto free_and_exit;
+		}
+
+		*p = 0;
+	}
+
+	ret = sscanf(wpath, "%x:%x:%x.%x%c", &seg, &bus, &slot,
+		     &func, &end);
+	if (ret != 4) {
+		seg = 0;
+		ret = sscanf(wpath, "%x:%x.%x%c", &bus, &slot, &func, &end);
+		if (ret != 3) {
+			ret = -EINVAL;
+			goto free_and_exit;
+		}
+	}
+
+	ret = (seg == pci_domain_nr(dev->bus) &&
+	       bus == dev->bus->number &&
+	       dev->devfn == PCI_DEVFN(slot, func));
+
+free_and_exit:
+	kfree(wpath);
+	return ret;
+}
+
+/**
  * pci_dev_str_match - test if a string matches a device
  * @dev:    the PCI device to test
  * @p:      string to match the device against
  * @endptr: pointer to the string after the match
  *
  * Test if a string (typically from a kernel parameter) matches a
- * specified. The string may be of one of two forms formats:
+ * specified. The string may be of one of three formats:
  *
  *   [<domain>:]<bus>:<slot>.<func>
+ *   path:[<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]
  *   pci:<vendor>:<device>[:<subvendor>:<subdevice>]
  *
  * The first format specifies a PCI bus/slot/function address which
  * may change if new hardware is inserted, if motherboard firmware changes,
  * or due to changes caused in kernel parameters.
  *
- * The second format matches devices using IDs in the configuration
+ * The second format specifies a PCI bus/slot/function root address and
+ * a path of slot/function addresses to the specific device from the root.
+ * The path for a device can be determined through the use of 'lspci -t'.
+ * This format is more robust against renumbering issues than the first format.
+
+ * The third format matches devices using IDs in the configuration
  * space which may match multiple devices in the system. A value of 0
  * for any field will match all devices.
  *
@@ -236,7 +325,15 @@ static int pci_dev_str_match(struct pci_dev *dev, const char *p,
 		    (!subsystem_device ||
 			    subsystem_device == dev->subsystem_device))
 			goto found;
+	} else if (strncmp(p, "path:", 5) == 0) {
+		/* PCI Root Bus and a path of Slot,Function IDs */
+		p += 5;
 
+		ret = pci_dev_str_match_path(dev, p, &p);
+		if (ret < 0)
+			return ret;
+		else if (ret)
+			goto found;
 	} else {
 		/* PCI Bus,Slot,Function ids are specified */
 		ret = sscanf(p, "%x:%x:%x.%x%n", &seg, &bus, &slot,
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v2 1/3] PCI: Make specifying PCI devices in kernel parameters reusable
From: Logan Gunthorpe @ 2018-05-31 23:50 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-doc
  Cc: Stephen Bates, Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Logan Gunthorpe
In-Reply-To: <20180531235010.5279-1-logang@deltatee.com>

Separate out the code to match a PCI device with a string (typically
originating from a kernel parameter) from the
pci_specified_resource_alignment() function into its own helper
function.

While we are at it, this change fixes the kernel style of the function
(fixing a number of long lines and extra parentheses).

Additionally, make the analogous change to the kernel parameter
documentation: Separating the description of how to specify a PCI device
into it's own section at the head of the pci= parameter.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  26 +++-
 drivers/pci/pci.c                               | 153 +++++++++++++++---------
 2 files changed, 120 insertions(+), 59 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f2040d46f095..e58cc671ff92 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2985,7 +2985,24 @@
 			See header of drivers/block/paride/pcd.c.
 			See also Documentation/blockdev/paride.txt.
 
-	pci=option[,option...]	[PCI] various PCI subsystem options:
+	pci=option[,option...]	[PCI] various PCI subsystem options.
+
+				Some options herein operate on a specific device
+				or a set of devices (<pci_dev>). These are
+				specified in one of two formats:
+
+				[<domain>:]<bus>:<slot>.<func>
+				pci:<vendor>:<device>[:<subvendor>:<subdevice>]
+
+				Note: the first format specifies a PCI
+				bus/slot/function address which may change
+				if new hardware is inserted, if motherboard
+				firmware changes, or due to changes caused
+				by other kernel parameters. The second format
+				selects devices using IDs from the
+				configuration space which may match multiple
+				devices in the system.
+
 		earlydump	[X86] dump PCI config space before the kernel
 			        changes anything
 		off		[X86] don't probe for the PCI bus
@@ -3114,11 +3131,10 @@
 				window. The default value is 64 megabytes.
 		resource_alignment=
 				Format:
-				[<order of align>@][<domain>:]<bus>:<slot>.<func>[; ...]
-				[<order of align>@]pci:<vendor>:<device>\
-						[:<subvendor>:<subdevice>][; ...]
+				[<order of align>@]<pci_dev>[; ...]
 				Specifies alignment and device to reassign
-				aligned memory resources.
+				aligned memory resources. How to
+				specify the device is described above.
 				If <order of align> is not specified,
 				PAGE_SIZE is used as alignment.
 				PCI-PCI bridge can be specified, if resource
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index dbfe7c4f3776..85fec5e2640b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -183,6 +183,88 @@ void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
 EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
 #endif
 
+/**
+ * pci_dev_str_match - test if a string matches a device
+ * @dev:    the PCI device to test
+ * @p:      string to match the device against
+ * @endptr: pointer to the string after the match
+ *
+ * Test if a string (typically from a kernel parameter) matches a
+ * specified. The string may be of one of two forms formats:
+ *
+ *   [<domain>:]<bus>:<slot>.<func>
+ *   pci:<vendor>:<device>[:<subvendor>:<subdevice>]
+ *
+ * The first format specifies a PCI bus/slot/function address which
+ * may change if new hardware is inserted, if motherboard firmware changes,
+ * or due to changes caused in kernel parameters.
+ *
+ * The second format matches devices using IDs in the configuration
+ * space which may match multiple devices in the system. A value of 0
+ * for any field will match all devices.
+ *
+ * Returns 1 if the string matches the device, 0 if it does not and
+ * a negative error code if the string cannot be parsed.
+ */
+static int pci_dev_str_match(struct pci_dev *dev, const char *p,
+			     const char **endptr)
+{
+	int ret;
+	int seg, bus, slot, func, count;
+	unsigned short vendor, device, subsystem_vendor, subsystem_device;
+
+	if (strncmp(p, "pci:", 4) == 0) {
+		/* PCI vendor/device (subvendor/subdevice) ids are specified */
+		p += 4;
+		ret = sscanf(p, "%hx:%hx:%hx:%hx%n", &vendor, &device,
+			     &subsystem_vendor, &subsystem_device, &count);
+		if (ret != 4) {
+			ret = sscanf(p, "%hx:%hx%n", &vendor, &device, &count);
+			if (ret != 2)
+				return -EINVAL;
+
+			subsystem_vendor = 0;
+			subsystem_device = 0;
+		}
+
+		p += count;
+
+		if ((!vendor || vendor == dev->vendor) &&
+		    (!device || device == dev->device) &&
+		    (!subsystem_vendor ||
+			    subsystem_vendor == dev->subsystem_vendor) &&
+		    (!subsystem_device ||
+			    subsystem_device == dev->subsystem_device))
+			goto found;
+
+	} else {
+		/* PCI Bus,Slot,Function ids are specified */
+		ret = sscanf(p, "%x:%x:%x.%x%n", &seg, &bus, &slot,
+			     &func, &count);
+		if (ret != 4) {
+			seg = 0;
+			ret = sscanf(p, "%x:%x.%x%n", &bus, &slot,
+				     &func, &count);
+			if (ret != 3)
+				return -EINVAL;
+		}
+
+		p += count;
+
+		if (seg == pci_domain_nr(dev->bus) &&
+		    bus == dev->bus->number &&
+		    slot == PCI_SLOT(dev->devfn) &&
+		    func == PCI_FUNC(dev->devfn))
+			goto found;
+	}
+
+	*endptr = p;
+	return 0;
+
+found:
+	*endptr = p;
+	return 1;
+}
 
 static int __pci_find_next_cap_ttl(struct pci_bus *bus, unsigned int devfn,
 				   u8 pos, int cap, int *ttl)
@@ -5462,10 +5544,10 @@ static DEFINE_SPINLOCK(resource_alignment_lock);
 static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 							bool *resize)
 {
-	int seg, bus, slot, func, align_order, count;
-	unsigned short vendor, device, subsystem_vendor, subsystem_device;
+	int align_order, count;
 	resource_size_t align = pcibios_default_alignment();
-	char *p;
+	const char *p;
+	int ret;
 
 	spin_lock(&resource_alignment_lock);
 	p = resource_alignment_param;
@@ -5485,58 +5567,21 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 		} else {
 			align_order = -1;
 		}
-		if (strncmp(p, "pci:", 4) == 0) {
-			/* PCI vendor/device (subvendor/subdevice) ids are specified */
-			p += 4;
-			if (sscanf(p, "%hx:%hx:%hx:%hx%n",
-				&vendor, &device, &subsystem_vendor, &subsystem_device, &count) != 4) {
-				if (sscanf(p, "%hx:%hx%n", &vendor, &device, &count) != 2) {
-					printk(KERN_ERR "PCI: Can't parse resource_alignment parameter: pci:%s\n",
-						p);
-					break;
-				}
-				subsystem_vendor = subsystem_device = 0;
-			}
-			p += count;
-			if ((!vendor || (vendor == dev->vendor)) &&
-				(!device || (device == dev->device)) &&
-				(!subsystem_vendor || (subsystem_vendor == dev->subsystem_vendor)) &&
-				(!subsystem_device || (subsystem_device == dev->subsystem_device))) {
-				*resize = true;
-				if (align_order == -1)
-					align = PAGE_SIZE;
-				else
-					align = 1 << align_order;
-				/* Found */
-				break;
-			}
-		}
-		else {
-			if (sscanf(p, "%x:%x:%x.%x%n",
-				&seg, &bus, &slot, &func, &count) != 4) {
-				seg = 0;
-				if (sscanf(p, "%x:%x.%x%n",
-						&bus, &slot, &func, &count) != 3) {
-					/* Invalid format */
-					printk(KERN_ERR "PCI: Can't parse resource_alignment parameter: %s\n",
-						p);
-					break;
-				}
-			}
-			p += count;
-			if (seg == pci_domain_nr(dev->bus) &&
-				bus == dev->bus->number &&
-				slot == PCI_SLOT(dev->devfn) &&
-				func == PCI_FUNC(dev->devfn)) {
-				*resize = true;
-				if (align_order == -1)
-					align = PAGE_SIZE;
-				else
-					align = 1 << align_order;
-				/* Found */
-				break;
-			}
+
+		ret = pci_dev_str_match(dev, p, &p);
+		if (ret == 1) {
+			*resize = true;
+			if (align_order == -1)
+				align = PAGE_SIZE;
+			else
+				align = 1 << align_order;
+			break;
+		} else if (ret < 0) {
+			pr_info("PCI: Can't parse resource_alignment parameter: pci:%s\n",
+				p);
+			break;
 		}
+
 		if (*p != ';' && *p != ',') {
 			/* End of param or invalid format */
 			break;
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH v1 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS
From: Marc Zyngier @ 2018-06-01  9:17 UTC (permalink / raw)
  To: Dongjiu Geng, rkrcmar, corbet, christoffer.dall, linux,
	catalin.marinas, will.deacon, kvm, linux-doc, james.morse,
	linux-arm-kernel, linux-kernel, linux-acpi
In-Reply-To: <1527772139-19665-3-git-send-email-gengdongjiu@huawei.com>

On 31/05/18 14:08, Dongjiu Geng wrote:
> For the migrating VMs, user space may need to know the exception
> state. For example, in the machine A, KVM make an SError pending,
> when migrate to B, KVM also needs to pend an SError.
> 
> This new IOCTL exports user-invisible states related to SError.
> Together with appropriate user space changes, user space can get/set
> the SError exception state to do migrate/snapshot/suspend.
> 
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> --
> this series patch is separated from https://www.spinics.net/lists/kvm/msg168917.html
> change since V12:
> 1. change (vcpu->arch.hcr_el2 & HCR_VSE) to !!(vcpu->arch.hcr_el2 & HCR_VSE) in kvm_arm_vcpu_get_events()
> 
> Change since V11:
> Address James's comments, thanks James
> 1. Align the struct of kvm_vcpu_events to 64 bytes
> 2. Avoid exposing the stale ESR value in the kvm_arm_vcpu_get_events()
> 3. Change variables 'injected' name to 'serror_pending' in the kvm_arm_vcpu_set_events()
> 4. Change to sizeof(events) from sizeof(struct kvm_vcpu_events) in kvm_arch_vcpu_ioctl()
> 
> Change since V10:
> Address James's comments, thanks James
> 1. Merge the helper function with the user.
> 2. Move the ISS_MASK into pend_guest_serror() to clear top bits
> 3. Make kvm_vcpu_events struct align to 4 bytes
> 4. Add something check in the kvm_arm_vcpu_set_events()
> 5. Check kvm_arm_vcpu_get/set_events()'s return value.
> 6. Initialise kvm_vcpu_events to 0 so that padding transferred to user-space doesn't
>    contain kernel stack.
> ---
>  Documentation/virtual/kvm/api.txt    | 31 ++++++++++++++++++++++++++++---
>  arch/arm/include/asm/kvm_host.h      |  6 ++++++
>  arch/arm/kvm/guest.c                 | 12 ++++++++++++
>  arch/arm64/include/asm/kvm_emulate.h |  5 +++++
>  arch/arm64/include/asm/kvm_host.h    |  7 +++++++
>  arch/arm64/include/uapi/asm/kvm.h    | 13 +++++++++++++
>  arch/arm64/kvm/guest.c               | 36 ++++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/inject_fault.c        |  7 ++++++-
>  arch/arm64/kvm/reset.c               |  1 +
>  virt/kvm/arm/arm.c                   | 21 +++++++++++++++++++++
>  10 files changed, 135 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index fdac969..8896737 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -835,11 +835,13 @@ struct kvm_clock_data {
>  
>  Capability: KVM_CAP_VCPU_EVENTS
>  Extended by: KVM_CAP_INTR_SHADOW
> -Architectures: x86
> +Architectures: x86, arm, arm64
>  Type: vm ioctl
>  Parameters: struct kvm_vcpu_event (out)
>  Returns: 0 on success, -1 on error
>  
> +X86:
> +
>  Gets currently pending exceptions, interrupts, and NMIs as well as related
>  states of the vcpu.
>  
> @@ -881,15 +883,32 @@ Only two fields are defined in the flags field:
>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>    smi contains a valid state.
>  
> +ARM, ARM64:
> +
> +Gets currently pending SError exceptions as well as related states of the vcpu.
> +
> +struct kvm_vcpu_events {
> +	struct {
> +		__u8 serror_pending;
> +		__u8 serror_has_esr;
> +		/* Align it to 8 bytes */
> +		__u8 pad[6];
> +		__u64 serror_esr;
> +	} exception;
> +	__u32 reserved[12];
> +};
> +
>  4.32 KVM_SET_VCPU_EVENTS
>  
> -Capability: KVM_CAP_VCPU_EVENTS
> +Capebility: KVM_CAP_VCPU_EVENTS
>  Extended by: KVM_CAP_INTR_SHADOW
> -Architectures: x86
> +Architectures: x86, arm, arm64
>  Type: vm ioctl
>  Parameters: struct kvm_vcpu_event (in)
>  Returns: 0 on success, -1 on error
>  
> +X86:
> +
>  Set pending exceptions, interrupts, and NMIs as well as related states of the
>  vcpu.
>  
> @@ -910,6 +929,12 @@ shall be written into the VCPU.
>  
>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>  
> +ARM, ARM64:
> +
> +Set pending SError exceptions as well as related states of the vcpu.
> +
> +See KVM_GET_VCPU_EVENTS for the data structure.
> +
>  
>  4.33 KVM_GET_DEBUGREGS
>  
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index c7c28c8..39f9901 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -213,6 +213,12 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
>  int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
>  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> +int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
> +			struct kvm_vcpu_events *events);
> +
> +int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> +			struct kvm_vcpu_events *events);
> +
>  unsigned long kvm_call_hyp(void *hypfn, ...);
>  void force_vm_exit(const cpumask_t *mask);
>  
> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> index a18f33e..c685f0e 100644
> --- a/arch/arm/kvm/guest.c
> +++ b/arch/arm/kvm/guest.c
> @@ -261,6 +261,18 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>  	return -EINVAL;
>  }
>  
> +int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
> +			struct kvm_vcpu_events *events)
> +{
> +	return -EINVAL;
> +}
> +
> +int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> +			struct kvm_vcpu_events *events)
> +{
> +	return -EINVAL;
> +}
> +
>  int __attribute_const__ kvm_target_cpu(void)
>  {
>  	switch (read_cpuid_part()) {
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 1dab3a9..18f61ff 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -81,6 +81,11 @@ static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
>  	return (unsigned long *)&vcpu->arch.hcr_el2;
>  }
>  
> +static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.vsesr_el2;
> +}
> +
>  static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
>  {
>  	vcpu->arch.vsesr_el2 = vsesr;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 469de8a..357304a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -335,6 +335,11 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
>  int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
>  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> +int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
> +			struct kvm_vcpu_events *events);
> +
> +int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> +			struct kvm_vcpu_events *events);
>  
>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
> @@ -363,6 +368,8 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  int kvm_perf_init(void);
>  int kvm_perf_teardown(void);
>  
> +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
> +
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>  
>  void __kvm_set_tpidr_el2(u64 tpidr_el2);
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 04b3256..df4faee 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -39,6 +39,7 @@
>  #define __KVM_HAVE_GUEST_DEBUG
>  #define __KVM_HAVE_IRQ_LINE
>  #define __KVM_HAVE_READONLY_MEM
> +#define __KVM_HAVE_VCPU_EVENTS
>  
>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>  
> @@ -153,6 +154,18 @@ struct kvm_sync_regs {
>  struct kvm_arch_memory_slot {
>  };
>  
> +/* for KVM_GET/SET_VCPU_EVENTS */
> +struct kvm_vcpu_events {
> +	struct {
> +		__u8 serror_pending;
> +		__u8 serror_has_esr;
> +		/* Align it to 8 bytes */
> +		__u8 pad[6];
> +		__u64 serror_esr;
> +	} exception;
> +	__u32 reserved[12];
> +};
> +
>  /* If you need to interpret the index values, here is the key: */
>  #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
>  #define KVM_REG_ARM_COPROC_SHIFT	16
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 56a0260..71d3841 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -289,6 +289,42 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>  	return -EINVAL;
>  }
>  
> +int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
> +			struct kvm_vcpu_events *events)
> +{
> +	events->exception.serror_pending = !!(vcpu->arch.hcr_el2 & HCR_VSE);
> +	events->exception.serror_has_esr =
> +			cpus_have_const_cap(ARM64_HAS_RAS_EXTN) &&
> +					(!!vcpu_get_vsesr(vcpu));

This is odd. Isn't VSESR==0 a valid value? And isn't
serror_has_esr always true when ARM64_HAS_RAS_EXTN is set?

> +
> +	if (events->exception.serror_pending &&
> +		events->exception.serror_has_esr)
> +		events->exception.serror_esr = vcpu_get_vsesr(vcpu);
> +	else
> +		events->exception.serror_esr = 0;
> +
> +	return 0;
> +}
> +
> +int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> +			struct kvm_vcpu_events *events)
> +{
> +	bool serror_pending = events->exception.serror_pending;
> +	bool has_esr = events->exception.serror_has_esr;
> +
> +	if (serror_pending && has_esr) {
> +		if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
> +			return -EINVAL;
> +
> +		kvm_set_sei_esr(vcpu, events->exception.serror_esr);
> +

Spurious blank line

> +	} else if (serror_pending) {
> +		kvm_inject_vabt(vcpu);
> +	}
> +
> +	return 0;
> +}
> +
>  int __attribute_const__ kvm_target_cpu(void)
>  {
>  	unsigned long implementor = read_cpuid_implementor();
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index d8e7165..9e0ca56 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -166,7 +166,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
>  
>  static void pend_guest_serror(struct kvm_vcpu *vcpu, u64 esr)
>  {
> -	vcpu_set_vsesr(vcpu, esr);
> +	vcpu_set_vsesr(vcpu, esr & ESR_ELx_ISS_MASK);
>  	*vcpu_hcr(vcpu) |= HCR_VSE;
>  }
>  
> @@ -186,3 +186,8 @@ void kvm_inject_vabt(struct kvm_vcpu *vcpu)
>  {
>  	pend_guest_serror(vcpu, ESR_ELx_ISV);
>  }
> +
> +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome)
> +{
> +	pend_guest_serror(vcpu, syndrome);
> +}

I think it'd make more sense to rename pend_guest_serror to
kvm_set_sei_esr and be done with it.

> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 38c8a64..20e919a 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -82,6 +82,7 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
>  		break;
>  	case KVM_CAP_SET_GUEST_DEBUG:
>  	case KVM_CAP_VCPU_ATTRIBUTES:
> +	case KVM_CAP_VCPU_EVENTS:
>  		r = 1;
>  		break;
>  	default:
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index a4c1b76..8b43968 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -1107,6 +1107,27 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  		r = kvm_arm_vcpu_has_attr(vcpu, &attr);
>  		break;
>  	}
> +	case KVM_GET_VCPU_EVENTS: {
> +		struct kvm_vcpu_events events;
> +
> +		memset(&events, 0, sizeof(events));

You could write this as

		struct kvm_cpu_events events = { };

but it'd make more sense if kvm_arm_vcpu_get_events() did all the work
rather than having this split responsibility.

> +		if (kvm_arm_vcpu_get_events(vcpu, &events))
> +			return -EINVAL;
> +
> +		if (copy_to_user(argp, &events, sizeof(events)))
> +			return -EFAULT;
> +
> +		return 0;
> +	}
> +	case KVM_SET_VCPU_EVENTS: {
> +		struct kvm_vcpu_events events;
> +
> +		if (copy_from_user(&events, argp,
> +				sizeof(struct kvm_vcpu_events)))

Prefer using sizeof(events) instead.

> +			return -EFAULT;
> +
> +		return kvm_arm_vcpu_set_events(vcpu, &events);
> +	}
>  	default:
>  		r = -EINVAL;
>  	}
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 1/4] clk: renesas: cpg-mssr: Stop using printk format %pCr
From: Geert Uytterhoeven @ 2018-06-01  9:28 UTC (permalink / raw)
  To: Jia-Ju Bai, Jonathan Corbet, Michael Turquette, Stephen Boyd,
	Zhang Rui, Eduardo Valentin, Eric Anholt, Stefan Wahren,
	Greg Kroah-Hartman
  Cc: Sergey Senozhatsky, Petr Mladek, Linus Torvalds, Steven Rostedt,
	linux-doc, linux-clk, linux-pm, linux-serial, linux-arm-kernel,
	linux-renesas-soc, linux-kernel, Geert Uytterhoeven
In-Reply-To: <1527845302-12159-1-git-send-email-geert+renesas@glider.be>

Printk format "%pCr" will be removed soon, as clk_get_rate() must not be
called in atomic context.

Replace it by open-coding the operation.  This is safe here, as the code
runs in task context.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/clk/renesas/renesas-cpg-mssr.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/clk/renesas/renesas-cpg-mssr.c b/drivers/clk/renesas/renesas-cpg-mssr.c
index 49e510691eeeab07..f4b013e9352d9efc 100644
--- a/drivers/clk/renesas/renesas-cpg-mssr.c
+++ b/drivers/clk/renesas/renesas-cpg-mssr.c
@@ -258,8 +258,9 @@ struct clk *cpg_mssr_clk_src_twocell_get(struct of_phandle_args *clkspec,
 		dev_err(dev, "Cannot get %s clock %u: %ld", type, clkidx,
 		       PTR_ERR(clk));
 	else
-		dev_dbg(dev, "clock (%u, %u) is %pC at %pCr Hz\n",
-			clkspec->args[0], clkspec->args[1], clk, clk);
+		dev_dbg(dev, "clock (%u, %u) is %pC at %lu Hz\n",
+			clkspec->args[0], clkspec->args[1], clk,
+			clk_get_rate(clk));
 	return clk;
 }
 
@@ -326,7 +327,7 @@ static void __init cpg_mssr_register_core_clk(const struct cpg_core_clk *core,
 	if (IS_ERR_OR_NULL(clk))
 		goto fail;
 
-	dev_dbg(dev, "Core clock %pC at %pCr Hz\n", clk, clk);
+	dev_dbg(dev, "Core clock %pC at %lu Hz\n", clk, clk_get_rate(clk));
 	priv->clks[id] = clk;
 	return;
 
@@ -392,7 +393,7 @@ static void __init cpg_mssr_register_mod_clk(const struct mssr_mod_clk *mod,
 	if (IS_ERR(clk))
 		goto fail;
 
-	dev_dbg(dev, "Module clock %pC at %pCr Hz\n", clk, clk);
+	dev_dbg(dev, "Module clock %pC at %lu Hz\n", clk, clk_get_rate(clk));
 	priv->clks[id] = clk;
 	priv->smstpcr_saved[clock->index / 32].mask |= BIT(clock->index % 32);
 	return;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 3/4] serial: sh-sci: Stop using printk format %pCr
From: Geert Uytterhoeven @ 2018-06-01  9:28 UTC (permalink / raw)
  To: Jia-Ju Bai, Jonathan Corbet, Michael Turquette, Stephen Boyd,
	Zhang Rui, Eduardo Valentin, Eric Anholt, Stefan Wahren,
	Greg Kroah-Hartman
  Cc: Sergey Senozhatsky, Petr Mladek, Linus Torvalds, Steven Rostedt,
	linux-doc, linux-clk, linux-pm, linux-serial, linux-arm-kernel,
	linux-renesas-soc, linux-kernel, Geert Uytterhoeven
In-Reply-To: <1527845302-12159-1-git-send-email-geert+renesas@glider.be>

Printk format "%pCr" will be removed soon, as clk_get_rate() must not be
called in atomic context.

Replace it by open-coding the operation.  This is safe here, as the code
runs in task context.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/tty/serial/sh-sci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index b46b146524ce7495..c181eb37f98509e6 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -2724,8 +2724,8 @@ static int sci_init_clocks(struct sci_port *sci_port, struct device *dev)
 			dev_dbg(dev, "failed to get %s (%ld)\n", clk_names[i],
 				PTR_ERR(clk));
 		else
-			dev_dbg(dev, "clk %s is %pC rate %pCr\n", clk_names[i],
-				clk, clk);
+			dev_dbg(dev, "clk %s is %pC rate %lu\n", clk_names[i],
+				clk, clk_get_rate(clk));
 		sci_port->clks[i] = IS_ERR(clk) ? NULL : clk;
 	}
 	return 0;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 4/4] lib/vsprintf: Remove atomic-unsafe support for %pCr
From: Geert Uytterhoeven @ 2018-06-01  9:28 UTC (permalink / raw)
  To: Jia-Ju Bai, Jonathan Corbet, Michael Turquette, Stephen Boyd,
	Zhang Rui, Eduardo Valentin, Eric Anholt, Stefan Wahren,
	Greg Kroah-Hartman
  Cc: Sergey Senozhatsky, Petr Mladek, Linus Torvalds, Steven Rostedt,
	linux-doc, linux-clk, linux-pm, linux-serial, linux-arm-kernel,
	linux-renesas-soc, linux-kernel, Geert Uytterhoeven
In-Reply-To: <1527845302-12159-1-git-send-email-geert+renesas@glider.be>

"%pCr" formats the current rate of a clock, and calls clk_get_rate().
The latter obtains a mutex, hence it must not be called from atomic
context.

Remove support for this rarely-used format, as vsprintf() (and e.g.
printk()) must be callable from any context.

Any remaining out-of-tree users will start seeing the clock's name
printed instead of its rate.

Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Fixes: 900cca2944254edd ("lib/vsprintf: add %pC{,n,r} format specifiers for clocks")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 Documentation/core-api/printk-formats.rst | 3 +--
 lib/vsprintf.c                            | 3 ---
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index eb30efdd2e789616..25dc591cb1108790 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -419,11 +419,10 @@ struct clk
 
 	%pC	pll1
 	%pCn	pll1
-	%pCr	1560000000
 
 For printing struct clk structures. %pC and %pCn print the name
 (Common Clock Framework) or address (legacy clock framework) of the
-structure; %pCr prints the current clock rate.
+structure.
 
 Passed by reference.
 
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 247a7e0bf24f6f74..a48aaa79d352313a 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -1469,9 +1469,6 @@ char *clock(char *buf, char *end, struct clk *clk, struct printf_spec spec,
 		return string(buf, end, NULL, spec);
 
 	switch (fmt[1]) {
-	case 'r':
-		return number(buf, end, clk_get_rate(clk), spec);
-
 	case 'n':
 	default:
 #ifdef CONFIG_COMMON_CLK
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 0/4] lib/vsprintf: Remove atomic-unsafe support for printk format %pCr
From: Geert Uytterhoeven @ 2018-06-01  9:28 UTC (permalink / raw)
  To: Jia-Ju Bai, Jonathan Corbet, Michael Turquette, Stephen Boyd,
	Zhang Rui, Eduardo Valentin, Eric Anholt, Stefan Wahren,
	Greg Kroah-Hartman
  Cc: Sergey Senozhatsky, Petr Mladek, Linus Torvalds, Steven Rostedt,
	linux-doc, linux-clk, linux-pm, linux-serial, linux-arm-kernel,
	linux-renesas-soc, linux-kernel, Geert Uytterhoeven

	Hi all,

"%pCr" formats the current rate of a clock, and calls clk_get_rate().
The latter obtains a mutex, hence it must not be called from atomic
context.  As vsprintf() (and e.g. printk()) must be callable from any
context, it's better to remove support for this (rarely-used) format.

This patch series:
  - Changes all existing users of "%pCr" to print the result of
    clk_get_rate() directly, which is safe as they all do this in task
    context only,
  - Removes support for the "%pCr" printk format.

Note that any remaining out-of-tree users will start seeing the clock's
name printed instead of its rate.

Thanks for your comments!

Geert Uytterhoeven (4):
  clk: renesas: cpg-mssr: Stop using printk format %pCr
  thermal: bcm2835: Stop using printk format %pCr
  serial: sh-sci: Stop using printk format %pCr
  lib/vsprintf: Remove atomic-unsafe support for %pCr

 Documentation/core-api/printk-formats.rst  | 3 +--
 drivers/clk/renesas/renesas-cpg-mssr.c     | 9 +++++----
 drivers/thermal/broadcom/bcm2835_thermal.c | 4 ++--
 drivers/tty/serial/sh-sci.c                | 4 ++--
 lib/vsprintf.c                             | 3 ---
 5 files changed, 10 insertions(+), 13 deletions(-)

-- 
2.7.4

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 2/4] thermal: bcm2835: Stop using printk format %pCr
From: Geert Uytterhoeven @ 2018-06-01  9:28 UTC (permalink / raw)
  To: Jia-Ju Bai, Jonathan Corbet, Michael Turquette, Stephen Boyd,
	Zhang Rui, Eduardo Valentin, Eric Anholt, Stefan Wahren,
	Greg Kroah-Hartman
  Cc: Sergey Senozhatsky, Petr Mladek, Linus Torvalds, Steven Rostedt,
	linux-doc, linux-clk, linux-pm, linux-serial, linux-arm-kernel,
	linux-renesas-soc, linux-kernel, Geert Uytterhoeven
In-Reply-To: <1527845302-12159-1-git-send-email-geert+renesas@glider.be>

Printk format "%pCr" will be removed soon, as clk_get_rate() must not be
called in atomic context.

Replace it by printing the variable that already holds the clock rate.
Note that calling clk_get_rate() is safe here, as the code runs in task
context.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/thermal/broadcom/bcm2835_thermal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/broadcom/bcm2835_thermal.c b/drivers/thermal/broadcom/bcm2835_thermal.c
index a4d6a0e2e9938190..23ad4f9f21438e45 100644
--- a/drivers/thermal/broadcom/bcm2835_thermal.c
+++ b/drivers/thermal/broadcom/bcm2835_thermal.c
@@ -213,8 +213,8 @@ static int bcm2835_thermal_probe(struct platform_device *pdev)
 	rate = clk_get_rate(data->clk);
 	if ((rate < 1920000) || (rate > 5000000))
 		dev_warn(&pdev->dev,
-			 "Clock %pCn running at %pCr Hz is outside of the recommended range: 1.92 to 5MHz\n",
-			 data->clk, data->clk);
+			 "Clock %pCn running at %lu Hz is outside of the recommended range: 1.92 to 5MHz\n",
+			 data->clk, rate);
 
 	/* register of thermal sensor and get info from DT */
 	tz = thermal_zone_of_sensor_register(&pdev->dev, 0, data,
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH 2/4] thermal: bcm2835: Stop using printk format %pCr
From: Stefan Wahren @ 2018-06-01  9:35 UTC (permalink / raw)
  To: Geert Uytterhoeven, Jia-Ju Bai, Jonathan Corbet,
	Michael Turquette, Stephen Boyd, Zhang Rui, Eduardo Valentin,
	Eric Anholt, Greg Kroah-Hartman
  Cc: Sergey Senozhatsky, Petr Mladek, Linus Torvalds, Steven Rostedt,
	linux-doc, linux-clk, linux-pm, linux-serial, linux-arm-kernel,
	linux-renesas-soc, linux-kernel
In-Reply-To: <1527845302-12159-3-git-send-email-geert+renesas@glider.be>



Am 01.06.2018 um 11:28 schrieb Geert Uytterhoeven:
> Printk format "%pCr" will be removed soon, as clk_get_rate() must not be
> called in atomic context.
>
> Replace it by printing the variable that already holds the clock rate.
> Note that calling clk_get_rate() is safe here, as the code runs in task
> context.
>
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Stefan Wahren <stefan.wahren@i2se.com>

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 1/3] PCI: Make specifying PCI devices in kernel parameters reusable
From: Andy Shevchenko @ 2018-06-01 10:39 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Linux Kernel Mailing List, linux-pci, Linux Documentation List,
	Stephen Bates, Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König
In-Reply-To: <20180531235010.5279-2-logang@deltatee.com>

On Fri, Jun 1, 2018 at 2:50 AM, Logan Gunthorpe <logang@deltatee.com> wrote:
> Separate out the code to match a PCI device with a string (typically
> originating from a kernel parameter) from the
> pci_specified_resource_alignment() function into its own helper
> function.
>
> While we are at it, this change fixes the kernel style of the function
> (fixing a number of long lines and extra parentheses).
>
> Additionally, make the analogous change to the kernel parameter
> documentation: Separating the description of how to specify a PCI device
> into it's own section at the head of the pci= parameter.
>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Stephen Bates <sbates@raithlin.com>
> Acked-by: Christian König <christian.koenig@amd.com>

Sorry for maybe being late here.

> -       pci=option[,option...]  [PCI] various PCI subsystem options:
> +       pci=option[,option...]  [PCI] various PCI subsystem options.
> +
> +                               Some options herein operate on a specific device
> +                               or a set of devices (<pci_dev>). These are
> +                               specified in one of two formats:

I would rather to add

 <pci_dev>

here in the same way as done for options.
It would be easy for people to find a referenced paragraph.

> +
> +                               [<domain>:]<bus>:<slot>.<func>
> +                               pci:<vendor>:<device>[:<subvendor>:<subdevice>]
> +
> +                               Note: the first format specifies a PCI
> +                               bus/slot/function address which may change
> +                               if new hardware is inserted, if motherboard
> +                               firmware changes, or due to changes caused
> +                               by other kernel parameters. The second format
> +                               selects devices using IDs from the
> +                               configuration space which may match multiple
> +                               devices in the system.

> +static int pci_dev_str_match(struct pci_dev *dev, const char *p,
> +                            const char **endptr)

This change I hope has no functional alteration, so, can be split to a
separate patch.

-- 
With Best Regards,
Andy Shevchenko
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 2/3] PCI: Allow specifying devices using a base bus and path of devfns
From: Andy Shevchenko @ 2018-06-01 10:41 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Linux Kernel Mailing List, linux-pci, Linux Documentation List,
	Stephen Bates, Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König
In-Reply-To: <20180531235010.5279-3-logang@deltatee.com>

On Fri, Jun 1, 2018 at 2:50 AM, Logan Gunthorpe <logang@deltatee.com> wrote:
> When specifying PCI devices on the kernel command line using a
> BDF, the bus numbers can change when adding or replacing a device,
> changing motherboard firmware, or applying kernel parameters like
> pci=assign-buses. When this happens, it is usually undesirable to
> apply whatever command line tweak to the wrong device.
>
> Therefore, it is useful to be able to specify devices with a base
> bus number and the path of devfns needed to get to it. (Similar to
> the "device scope" structure in the Intel VT-d spec, Section 8.3.1.)
>
> Thus, we add an option to specify devices in the following format:
>
> path:[<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]
>
> The path can be any segment within the PCI hierarchy of any length and
> determined through the use of 'lspci -t'. When specified this way, it is
> less likely that a renumbered bus will result in a valid device specification
> and the tweak won't be applied to the wrong device.
>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Stephen Bates <sbates@raithlin.com>
> Acked-by: Christian König <christian.koenig@amd.com>

> -                               specified in one of two formats:
> +                               specified in one of three formats:

...in one of the following formats:

in the first place and don't fix it each time you add/remove one?


-- 
With Best Regards,
Andy Shevchenko
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 0/4] lib/vsprintf: Remove atomic-unsafe support for printk format %pCr
From: Linus Torvalds @ 2018-06-01 11:00 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: baijiaju1990, Jonathan Corbet, Michael Turquette, sboyd,
	Zhang Rui, Eduardo Valentin, Eric Anholt, Stefan Wahren,
	Greg Kroah-Hartman, Sergey Senozhatsky, Petr Mladek,
	Steven Rostedt, open list:DOCUMENTATION, linux-clk, Linux PM,
	linux-serial, linux-arm-kernel, Linux-Renesas,
	Linux Kernel Mailing List
In-Reply-To: <1527845302-12159-1-git-send-email-geert+renesas@glider.be>

On Fri, Jun 1, 2018 at 4:29 AM Geert Uytterhoeven
<geert+renesas@glider.be> wrote:
>
> This patch series:
>   - Changes all existing users of "%pCr" to print the result of
>     clk_get_rate() directly, which is safe as they all do this in task
>     context only,
>   - Removes support for the "%pCr" printk format.

Looks good to me.

What tree will this go through? The normal printk one? Just checking
that this doesn't end up falling through the cracks because nobody
knows who would take it...

               Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox