Linux Documentation


* Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus
From: Waiman Long @ 2018-05-29 12:40 UTC
  To: Juri Lelli
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin
In-Reply-To: <20180529062703.GA8985@localhost.localdomain>

On 05/29/2018 02:27 AM, Juri Lelli wrote:
> On 28/05/18 21:24, Waiman Long wrote:
>> On 05/28/2018 09:12 PM, Waiman Long wrote:
>>> On 05/24/2018 06:28 AM, Juri Lelli wrote:
>>>> On 17/05/18 16:55, Waiman Long wrote:
>>>>
>>>> [...]
>>>>
>>>>> @@ -849,7 +860,12 @@ static void rebuild_sched_domains_locked(void)
>>>>>  	 * passing doms with offlined cpu to partition_sched_domains().
>>>>>  	 * Anyways, hotplug work item will rebuild sched domains.
>>>>>  	 */
>>>>> -	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>>>>> +	if (!top_cpuset.isolation_count &&
>>>>> +	    !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>>>>> +		goto out;
>>>>> +
>>>>> +	if (top_cpuset.isolation_count &&
>>>>> +	   !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
>>>>>  		goto out;
>>>> Do we cover the case in which hotplug removed one of the isolated cpus
>>>> from cpu_active_mask?
>>> Yes, you are right. That is the remnant of my original patch that allows
>>> only one isolated_cpus at root. Thanks for spotting that.
>> I am sorry. I would like to take it back my previous comment. The code
>> above looks for inconsistency in the state of the effective_cpus mask to
>> find out if it is racing with a hotplug event. If it is, we can skip the
>> domain generation as the hotplug event will do that too. The checks are
>> still valid with the current patchset. So I don't think we need to make
>> any change here.
> Yes, these checks are valid, but don't we also need to check for hotplug
> races w.r.t. isolated CPUs (of some other sub domain)?

It is not actually a race. Both the hotplug event and any changes to cpu
lists or flags are serialized by the cpuset_mutex. The only cost is that
we may do the same work twice and waste CPU cycles, so we do a quick
check to avoid this. The check isn't exhaustive and we can certainly miss
some cases. A more thorough check may need as much time as the sched
domain generation itself, so it would actually waste more CPU cycles on
average, as the chance of a hotplug event is very low.
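
To make the quick check concrete, here is a minimal userspace sketch of
the two tests from the hunk above. It is only an illustration: the masks
are modelled as one bit per CPU in a 64-bit word, and mask_equal(),
mask_subset() and skip_rebuild() are hypothetical stand-ins, not the
kernel's cpumask helpers or code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for cpumask_equal() and cpumask_subset(). */
static bool mask_equal(uint64_t a, uint64_t b)
{
	return a == b;
}

static bool mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;	/* every bit set in a is also set in b */
}

/*
 * Returns true when the rebuild can be skipped because the effective
 * mask looks out of sync with the active mask, i.e. a hotplug update
 * is assumed to be pending and will rebuild the domains anyway.
 */
static bool skip_rebuild(uint64_t effective, uint64_t active,
			 int isolation_count)
{
	/* No isolated CPUs: the two masks are expected to match exactly. */
	if (!isolation_count)
		return !mask_equal(effective, active);

	/* With isolated CPUs, effective only needs to be a subset. */
	return !mask_subset(effective, active);
}
```

In this model, skip_rebuild(0x3, 0x7, 2) returns false even though CPU 3
(assumed isolated here) has gone offline. That is one of the cases the
quick check does not catch, and the rebuild is then simply done twice.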

Cheers,
Longman


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] Documentation: document hung_task_panic kernel parameter
From: Jonathan Corbet @ 2018-05-29 12:41 UTC
  To: Omar Sandoval; +Cc: linux-doc, linux-kernel, kernel-team
In-Reply-To: <53ad4158547699c2d35ba817394750dc2487158a.1526926625.git.osandov@fb.com>

On Mon, 21 May 2018 11:18:17 -0700
Omar Sandoval <osandov@osandov.com> wrote:

> This parameter has been around since commit e162b39a368f ("softlockup:
> decouple hung tasks check from softlockup detection") in 2009 but was
> never documented.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Applied, thanks.

jon

* Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus
From: Juri Lelli @ 2018-05-29 13:12 UTC
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin
In-Reply-To: <8164a41b-3218-c618-64a6-52747344c4db@redhat.com>

On 29/05/18 08:40, Waiman Long wrote:
> On 05/29/2018 02:27 AM, Juri Lelli wrote:
> > On 28/05/18 21:24, Waiman Long wrote:
> >> On 05/28/2018 09:12 PM, Waiman Long wrote:
> >>> On 05/24/2018 06:28 AM, Juri Lelli wrote:
> >>>> On 17/05/18 16:55, Waiman Long wrote:
> >>>>
> >>>> [...]
> >>>>
> >>>>> @@ -849,7 +860,12 @@ static void rebuild_sched_domains_locked(void)
> >>>>>  	 * passing doms with offlined cpu to partition_sched_domains().
> >>>>>  	 * Anyways, hotplug work item will rebuild sched domains.
> >>>>>  	 */
> >>>>> -	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
> >>>>> +	if (!top_cpuset.isolation_count &&
> >>>>> +	    !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
> >>>>> +		goto out;
> >>>>> +
> >>>>> +	if (top_cpuset.isolation_count &&
> >>>>> +	   !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
> >>>>>  		goto out;
> >>>> Do we cover the case in which hotplug removed one of the isolated cpus
> >>>> from cpu_active_mask?
> >>> Yes, you are right. That is the remnant of my original patch that allows
> >>> only one isolated_cpus at root. Thanks for spotting that.
> >> I am sorry. I would like to take it back my previous comment. The code
> >> above looks for inconsistency in the state of the effective_cpus mask to
> >> find out if it is racing with a hotplug event. If it is, we can skip the
> >> domain generation as the hotplug event will do that too. The checks are
> >> still valid with the current patchset. So I don't think we need to make
> >> any change here.
> > Yes, these checks are valid, but don't we also need to check for hotplug
> > races w.r.t. isolated CPUs (of some other sub domain)?
> 
> It is not actually a race. Both the hotplug event and any changes to cpu
> lists or flags are serialized by the cpuset_mutex. It is just that we
> may be doing the same work twice that we are wasting cpu cycles. So we
> are doing a quick check to avoid this. The check isn't exhaustive and we
> can certainly miss some cases. Doing a more throughout check may need as
> much time as doing the sched domain generation itself and so you are
> actually wasting more CPU cycles on average as the chance of a hotplug
> event is very low.

Fair enough.

Thanks,

- Juri

* Re: [RFT v3 0/4] Perf script: Add python script for CoreSight trace disassembler
From: Arnaldo Carvalho de Melo @ 2018-05-29 13:32 UTC
  To: Mathieu Poirier
  Cc: Leo Yan, Jonathan Corbet, Robert Walker, Mike Leach, Kim Phillips,
	Tor Jeremiassen, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	open list:DOCUMENTATION, Linux Kernel Mailing List, coresight
In-Reply-To: <CANLsYkzn5qyzjxMiCPQ1GxyNjhHJp-2H6Lds11HP9rG5xug0FA@mail.gmail.com>

Em Mon, May 28, 2018 at 03:53:42PM -0600, Mathieu Poirier escreveu:
> On 28 May 2018 at 14:03, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > Em Mon, May 28, 2018 at 04:44:59PM +0800, Leo Yan escreveu:
> >> This patch series is to support for using 'perf script' for CoreSight
> >> trace disassembler, for this purpose this patch series adds a new
> >> python script to parse CoreSight tracing event and use command 'objdump'
> >> for disassembled lines, finally this can generate readable program
> >> execution flow for reviewing tracing data.
> >>
> >> Patch 0001 is one fixing patch to generate samples for the start packet
> >> and exception packets.
> >>
> >> Patch 0002 is the prerequisite to add addr into sample dict, so this
> >> value can be used by python script to analyze instruction range.
> >>
> >> Patch 0003 is to add python script for trace disassembler.
> >>
> >> Patch 0004 is to add doc to explain python script usage and give
> >> example for it.
> >>
> >> This patch series has been rebased on acme git tree [1] with the commit
> >> 19422a9f2a3b ("perf tools: Fix kernel_start for PTI on x86") and tested
> >> on Hikey (ARM64 octa CA53 cores).
> >
> > Thanks, applied to perf/core.
> 
> Please hold off on that Arnaldo - I'm currently reviewing the set and
> I think some things can be improved.

Ok, I dropped all but the one adding sample->addr to the python
dictionary, that is ok to cherry pick.

- Arnaldo

* [PATCH v9 0/7] Enable cpuset controller in default hierarchy
From: Waiman Long @ 2018-05-29 13:41 UTC
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long

v9:
 - Rename cpuset.sched.domain to cpuset.sched.domain_root to better
   identify its purpose as the root of a new scheduling domain or
   partition.
 - Clarify in the document about the purpose of domain_root and
   load_balance. Using domain_root is the only way to create a new
   partition.
 - Fix a lockdep warning in update_isolated_cpumask() function.
 - Add a new patch to eliminate call to generate_sched_domains() for
   v2 when a change in cpu list does not touch a domain_root.

v8:
 - Remove cpuset.cpus.isolated and add a new cpuset.sched.domain flag
   and rework the code accordingly.

v7:
 - Add a root-only cpuset.cpus.isolated control file for CPU isolation.
 - Enforce that load_balancing can only be turned off on cpusets with
   CPUs from the isolated list.
 - Update sched domain generation to allow cpusets with CPUs only
   from the isolated CPU list to be in separate root domains.

v6:
 - Hide cpuset control knobs in root cgroup.
 - Rename effective_cpus and effective_mems to cpus.effective and
   mems.effective respectively.
 - Remove cpuset.flags and add cpuset.sched_load_balance instead
   as the behavior of sched_load_balance has changed and so is
   not a simple flag.
 - Update cgroup-v2.txt accordingly.

v5:
 - Add patch 2 to provide the cpuset.flags control knob for the
   sched_load_balance flag which should be the only feature that is
   essential as a replacement of the "isolcpus" kernel boot parameter.

v4:
 - Further minimize the feature set by removing the flags control knob.

v3:
 - Further trim the additional features down to just memory_migrate.
 - Update Documentation/cgroup-v2.txt.

v6 patch: https://lkml.org/lkml/2018/3/21/530
v7 patch: https://lkml.org/lkml/2018/4/19/448
v8 patch: https://lkml.org/lkml/2018/5/17/939

The purpose of this patchset is to provide a basic set of cpuset control
files for cgroup v2. This basic set includes the non-root "cpus", "mems",
"sched.load_balance" and "sched.domain_root". The "cpus.effective" and
"mems.effective" will appear in all cpuset-enabled cgroups.

The new control file that is unique to v2 is "sched.domain_root". It
is a boolean flag file that designates whether a cgroup is the root of a
new scheduling domain or partition with its own list of CPUs that, from
a scheduling perspective, is disjoint from those of other partitions. The
root cgroup is always a scheduling domain root. Multiple levels of
scheduling domains are supported with some limitations, so a container
scheduling domain root can behave like a real root.

When a scheduling domain root cgroup is removed, its list of exclusive
CPUs will be returned to the parent's cpus.effective automatically.

The "sched.load_balance" flag can only be changed in a scheduling
domain root with no child cpuset-enabled cgroups, while the rest
inherit its value from their parents. This ensures that all cpusets
within the same partition will have the same load balancing state. The
"sched.load_balance" flag can no longer be used to create additional
partitions as a side effect.
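
As a rough illustration of the rule above, the following userspace sketch
models a cpuset tree in which only a childless scheduling domain root may
change load balancing and every other cpuset takes its state from the
root of its partition. The struct cs_node type and all function names are
hypothetical, and resolving the value by walking up to the partition root
is an assumption made for brevity; it is not how the patch stores the
flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a cpuset-enabled cgroup. */
struct cs_node {
	struct cs_node *parent;	/* NULL for the root cgroup */
	bool domain_root;	/* models "sched.domain_root" */
	bool load_balance;	/* only meaningful on a domain root */
	int nr_children;	/* child cpuset-enabled cgroups */
};

/* Walk up to the nearest scheduling domain root (the partition root). */
static const struct cs_node *partition_root(const struct cs_node *cs)
{
	while (cs->parent && !cs->domain_root)
		cs = cs->parent;
	return cs;
}

/* Every cpuset in a partition sees the partition root's state. */
static bool effective_load_balance(const struct cs_node *cs)
{
	return partition_root(cs)->load_balance;
}

/* Only a domain root without children may change the flag. */
static bool set_load_balance(struct cs_node *cs, bool on)
{
	if (!cs->domain_root || cs->nr_children > 0)
		return false;
	cs->load_balance = on;
	return true;
}
```

Because the value is always resolved at the partition root in this model,
two cpusets in the same partition cannot disagree on their load balancing
state.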

This patchset does not exclude the possibility of adding more features
in the future after careful consideration.

Patch 1 enables cpuset in cgroup v2 with cpus, mems and their
effective counterparts.

Patch 2 adds a new "sched.domain_root" control file for setting up
multiple scheduling domains or partitions. A scheduling domain root
implies cpu_exclusive.

Patch 3 adds a "sched.load_balance" flag to turn off load balancing in
a scheduling domain or partition.

Patch 4 updates the scheduling domain generation code to work with
the new scheduling domain feature.

Patch 5 exposes cpus.effective and mems.effective to the root cgroup as
enabling child scheduling domains will take CPUs away from the root cgroup.
So it will be nice to monitor what CPUs are left there.

Patch 6 eliminates the need to rebuild sched domains for v2 if cpu list
changes occur to non-domain root cpusets only.

Patch 7 enables printing of debug information about scheduling
domain generation.

Waiman Long (7):
  cpuset: Enable cpuset controller in default hierarchy
  cpuset: Add new v2 cpuset.sched.domain_root flag
  cpuset: Add cpuset.sched.load_balance flag to v2
  cpuset: Make generate_sched_domains() recognize isolated_cpus
  cpuset: Expose cpus.effective and mems.effective on cgroup v2 root
  cpuset: Don't rebuild sched domains if cpu changes in non-domain root
  cpuset: Allow reporting of sched domain generation info

 Documentation/cgroup-v2.txt | 144 +++++++++++++++-
 kernel/cgroup/cpuset.c      | 396 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 518 insertions(+), 22 deletions(-)

-- 
1.8.3.1


* [PATCH v9 7/7] cpuset: Allow reporting of sched domain generation info
From: Waiman Long @ 2018-05-29 13:41 UTC
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long
In-Reply-To: <1527601294-3444-1-git-send-email-longman@redhat.com>

This patch enables us to report sched domain generation information.

If DYNAMIC_DEBUG is enabled, issuing the following command

  echo "file cpuset.c +p" > /sys/kernel/debug/dynamic_debug/control

and setting loglevel to 8 will allow the kernel to show what scheduling
domain changes are being made.
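
For readers unfamiliar with the "%*pbl" format used by the pr_debug()
call in this patch: it prints a bitmap as a list of CPU numbers and
ranges. The userspace sketch below only mimics the shape of that output
for a mask of up to 64 CPUs; format_cpulist() is a hypothetical helper
written for illustration and is not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the set bits of mask as a range list such as "0-4,6,8-10".
 * Returns the number of characters written (excluding the NUL), or -1
 * if buf is too small.
 */
static int format_cpulist(uint64_t mask, char *buf, size_t len)
{
	size_t used = 0;
	int cpu = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	while (cpu < 64) {
		int start, end, n;

		if (!(mask & (UINT64_C(1) << cpu))) {
			cpu++;
			continue;
		}

		/* Found the start of a run; scan to the end of it. */
		start = cpu;
		while (cpu < 64 && (mask & (UINT64_C(1) << cpu)))
			cpu++;
		end = cpu - 1;

		if (start == end)
			n = snprintf(buf + used, len - used, "%s%d",
				     used ? "," : "", start);
		else
			n = snprintf(buf + used, len - used, "%s%d-%d",
				     used ? "," : "", start, end);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

With this helper, a mask with CPUs 0-4, 6 and 8-10 set (0x75F) is
formatted as "0-4,6,8-10", which is the style of list a line such as
"generate_sched_domains dom 0: ..." would carry.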

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 9513f90..71fb2d0 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -820,6 +820,12 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	}
 	BUG_ON(nslot != ndoms);
 
+#ifdef CONFIG_DEBUG_KERNEL
+	for (i = 0; i < ndoms; i++)
+		pr_debug("generate_sched_domains dom %d: %*pbl\n", i,
+			 cpumask_pr_args(doms[i]));
+#endif
+
 done:
 	kfree(csa);
 
-- 
1.8.3.1


* [PATCH v9 1/7] cpuset: Enable cpuset controller in default hierarchy
From: Waiman Long @ 2018-05-29 13:41 UTC
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long
In-Reply-To: <1527601294-3444-1-git-send-email-longman@redhat.com>

Given that thread mode was merged in 4.14, it is now time to
enable cpuset in the default hierarchy (cgroup v2), as it is clearly
a threaded controller.

The cpuset controller has experienced feature creep since its
introduction more than a decade ago. Besides the core cpus and mems
control files to limit cpus and memory nodes, there are a bunch of
additional features that can be controlled from userspace. Some of
the features are of doubtful usefulness and may not be actively used.

This patch enables the cpuset controller in the default hierarchy with
a minimal set of features, namely just the cpus and mems and their
effective_* counterparts.  More features can be added to the default
hierarchy later if there is a real user need for them.

Alternatively, with the unified hierarchy, it may make more sense
to move some of those additional cpuset features, if desired, to
the memory controller or maybe to the cpu controller instead of
staying with cpuset.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt | 90 ++++++++++++++++++++++++++++++++++++++++++---
 kernel/cgroup/cpuset.c      | 48 ++++++++++++++++++++++--
 2 files changed, 130 insertions(+), 8 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 74cdeae..cf7bac6 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -53,11 +53,13 @@ v1 is available under Documentation/cgroup-v1/.
        5-3-2. Writeback
      5-4. PID
        5-4-1. PID Interface Files
-     5-5. Device
-     5-6. RDMA
-       5-6-1. RDMA Interface Files
-     5-7. Misc
-       5-7-1. perf_event
+     5-5. Cpuset
+       5-5-1. Cpuset Interface Files
+     5-6. Device
+     5-7. RDMA
+       5-7-1. RDMA Interface Files
+     5-8. Misc
+       5-8-1. perf_event
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -1435,6 +1437,84 @@ through fork() or clone(). These will return -EAGAIN if the creation
 of a new process would cause a cgroup policy to be violated.
 
 
+Cpuset
+------
+
+The "cpuset" controller provides a mechanism for constraining
+the CPU and memory node placement of tasks to only the resources
+specified in the cpuset interface files in a task's current cgroup.
+This is especially valuable on large NUMA systems where placing jobs
+on properly sized subsets of the systems with careful processor and
+memory placement to reduce cross-node memory access and contention
+can improve overall system performance.
+
+The "cpuset" controller is hierarchical.  That means the controller
+cannot use CPUs or memory nodes not allowed in its parent.
+
+
+Cpuset Interface Files
+~~~~~~~~~~~~~~~~~~~~~~
+
+  cpuset.cpus
+	A read-write multiple values file which exists on non-root
+	cpuset-enabled cgroups.
+
+	It lists the CPUs allowed to be used by tasks within this
+	cgroup.  The CPU numbers are comma-separated numbers or
+	ranges.  For example:
+
+	  # cat cpuset.cpus
+	  0-4,6,8-10
+
+	An empty value indicates that the cgroup is using the same
+	setting as the nearest cgroup ancestor with a non-empty
+	"cpuset.cpus" or all the available CPUs if none is found.
+
+	The value of "cpuset.cpus" stays constant until the next update
+	and won't be affected by any CPU hotplug events.
+
+  cpuset.cpus.effective
+	A read-only multiple values file which exists on non-root
+	cpuset-enabled cgroups.
+
+	It lists the onlined CPUs that are actually allowed to be
+	used by tasks within the current cgroup.  If "cpuset.cpus"
+	is empty, it shows all the CPUs from the parent cgroup that
+	will be available to be used by this cgroup.  Otherwise, it is
+	a subset of "cpuset.cpus".  Its value will be affected by CPU
+	hotplug events.
+
+  cpuset.mems
+	A read-write multiple values file which exists on non-root
+	cpuset-enabled cgroups.
+
+	It lists the memory nodes allowed to be used by tasks within
+	this cgroup.  The memory node numbers are comma-separated
+	numbers or ranges.  For example:
+
+	  # cat cpuset.mems
+	  0-1,3
+
+	An empty value indicates that the cgroup is using the same
+	setting as the nearest cgroup ancestor with a non-empty
+	"cpuset.mems" or all the available memory nodes if none
+	is found.
+
+	The value of "cpuset.mems" stays constant until the next update
+	and won't be affected by any memory nodes hotplug events.
+
+  cpuset.mems.effective
+	A read-only multiple values file which exists on non-root
+	cpuset-enabled cgroups.
+
+	It lists the onlined memory nodes that are actually allowed to
+	be used by tasks within the current cgroup.  If "cpuset.mems"
+	is empty, it shows all the memory nodes from the parent cgroup
+	that will be available to be used by this cgroup.  Otherwise,
+	it is a subset of "cpuset.mems".  Its value will be affected
+	by memory nodes hotplug events.
+
+
 Device controller
 -----------------
 
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b42037e..419b758 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1823,12 +1823,11 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 	return 0;
 }
 
-
 /*
  * for the common functions, 'private' gives the type of file
  */
 
-static struct cftype files[] = {
+static struct cftype legacy_files[] = {
 	{
 		.name = "cpus",
 		.seq_show = cpuset_common_seq_show,
@@ -1931,6 +1930,47 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 };
 
 /*
+ * This is currently a minimal set for the default hierarchy. It can be
+ * expanded later on by migrating more features and control files from v1.
+ */
+static struct cftype dfl_files[] = {
+	{
+		.name = "cpus",
+		.seq_show = cpuset_common_seq_show,
+		.write = cpuset_write_resmask,
+		.max_write_len = (100U + 6 * NR_CPUS),
+		.private = FILE_CPULIST,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
+	{
+		.name = "mems",
+		.seq_show = cpuset_common_seq_show,
+		.write = cpuset_write_resmask,
+		.max_write_len = (100U + 6 * MAX_NUMNODES),
+		.private = FILE_MEMLIST,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
+	{
+		.name = "cpus.effective",
+		.seq_show = cpuset_common_seq_show,
+		.private = FILE_EFFECTIVE_CPULIST,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
+	{
+		.name = "mems.effective",
+		.seq_show = cpuset_common_seq_show,
+		.private = FILE_EFFECTIVE_MEMLIST,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
+	{ }	/* terminate */
+};
+
+
+/*
  *	cpuset_css_alloc - allocate a cpuset css
  *	cgrp:	control group that the new cpuset will be part of
  */
@@ -2104,8 +2144,10 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
 	.post_attach	= cpuset_post_attach,
 	.bind		= cpuset_bind,
 	.fork		= cpuset_fork,
-	.legacy_cftypes	= files,
+	.legacy_cftypes	= legacy_files,
+	.dfl_cftypes	= dfl_files,
 	.early_init	= true,
+	.threaded	= true,
 };
 
 /**
-- 
1.8.3.1


* [PATCH v9 2/7] cpuset: Add new v2 cpuset.sched.domain_root flag
From: Waiman Long @ 2018-05-29 13:41 UTC
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long
In-Reply-To: <1527601294-3444-1-git-send-email-longman@redhat.com>

A new cpuset.sched.domain_root boolean flag is added to cpuset
v2. This new flag, if set, indicates that the cgroup is the root of
a new scheduling domain or partition that includes itself and all its
descendants except those that are scheduling domain roots themselves
and their descendants.

With this new flag, one can directly create as many partitions as
necessary without ever using the v1 trick of turning off load balancing
in specific cpusets to create partitions as a side effect.

This new flag is owned by the parent and will cause the CPUs in the
cpuset to be removed from the effective CPUs of its parent.

This is implemented internally by adding a new isolated_cpus mask that
holds the CPUs belonging to child scheduling domain cpusets so that:

	isolated_cpus | effective_cpus = cpus_allowed
	isolated_cpus & effective_cpus = 0
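
A minimal userspace sketch of these two invariants follows, assuming one
bit per CPU in a 64-bit word and that all allowed CPUs are online (the
patch additionally masks effective_cpus with cpu_online_mask, which this
model leaves out). The struct cs_masks type and the helper names are
hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the three masks kept in a parent cpuset. */
struct cs_masks {
	uint64_t cpus_allowed;
	uint64_t effective_cpus;
	uint64_t isolated_cpus;
};

/* Check both invariants from the commit message. */
static bool masks_consistent(const struct cs_masks *cs)
{
	return (cs->isolated_cpus | cs->effective_cpus) == cs->cpus_allowed &&
	       (cs->isolated_cpus & cs->effective_cpus) == 0;
}

/*
 * Hand CPUs to a child scheduling domain root.  The CPUs must currently
 * be in the parent's effective mask; otherwise nothing is changed.
 */
static bool isolate_cpus(struct cs_masks *parent, uint64_t add)
{
	if (add == 0 || (add & ~parent->effective_cpus))
		return false;
	parent->isolated_cpus |= add;
	parent->effective_cpus = parent->cpus_allowed & ~parent->isolated_cpus;
	return true;
}

/* Return CPUs to the parent, e.g. when the child domain root goes away. */
static void release_cpus(struct cs_masks *parent, uint64_t del)
{
	parent->isolated_cpus &= ~del;
	parent->effective_cpus = parent->cpus_allowed & ~parent->isolated_cpus;
}
```

Starting from a parent with CPUs 0-7 allowed and none isolated, isolating
CPUs 2-3 (0x0C) leaves effective_cpus at 0xF3, and masks_consistent()
holds both before and after the change.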

This new flag can only be turned on in a cpuset if its parent is a
scheduling domain root itself. The state of this flag cannot be changed
if the cpuset has children.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt |  28 +++++
 kernel/cgroup/cpuset.c      | 246 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 271 insertions(+), 3 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index cf7bac6..e7534c5 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1514,6 +1514,34 @@ Cpuset Interface Files
 	it is a subset of "cpuset.mems".  Its value will be affected
 	by memory nodes hotplug events.
 
+  cpuset.sched.domain_root
+	A read-write single value file which exists on non-root
+	cpuset-enabled cgroups.  It is a binary value flag that accepts
+	either "0" (off) or "1" (on).  This flag is set by the parent
+	and is not delegatable.
+
+	If set, it indicates that the current cgroup is the root of a
+	new scheduling domain or partition that comprises itself and
+	all its descendants except those that are scheduling domain
+	roots themselves and their descendants.  The root cgroup is
+	always a scheduling domain root.
+
+	There are constraints on where this flag can be set.  It can
+	only be set in a cgroup if all the following conditions are true.
+
+	1) The "cpuset.cpus" is not empty and the list of CPUs are
+	   exclusive, i.e. they are not shared by any of its siblings.
+	2) The parent cgroup is also a scheduling domain root.
+	3) There are no child cgroups with cpuset enabled.  This is
+	   for eliminating corner cases that have to be handled if such
+	   a condition is allowed.
+
+	Setting this flag will take the CPUs away from the effective
+	CPUs of the parent cgroup.  Once it is set, this flag cannot
+	be cleared if there are any child cgroups with cpuset enabled.
+	Further changes made to "cpuset.cpus" are allowed as long as
+	the first condition above is still true.
+
 
 Device controller
 -----------------
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 419b758..405b072 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -109,6 +109,9 @@ struct cpuset {
 	cpumask_var_t effective_cpus;
 	nodemask_t effective_mems;
 
+	/* Isolated CPUs for scheduling domain children */
+	cpumask_var_t isolated_cpus;
+
 	/*
 	 * This is old Memory Nodes tasks took on.
 	 *
@@ -134,6 +137,9 @@ struct cpuset {
 
 	/* for custom sched domain */
 	int relax_domain_level;
+
+	/* for isolated_cpus */
+	int isolation_count;
 };
 
 static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
@@ -175,6 +181,7 @@ static inline bool task_has_mempolicy(struct task_struct *task)
 	CS_SCHED_LOAD_BALANCE,
 	CS_SPREAD_PAGE,
 	CS_SPREAD_SLAB,
+	CS_SCHED_DOMAIN_ROOT,
 } cpuset_flagbits_t;
 
 /* convenient tests for these bits */
@@ -203,6 +210,11 @@ static inline int is_sched_load_balance(const struct cpuset *cs)
 	return test_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
 }
 
+static inline int is_sched_domain_root(const struct cpuset *cs)
+{
+	return test_bit(CS_SCHED_DOMAIN_ROOT, &cs->flags);
+}
+
 static inline int is_memory_migrate(const struct cpuset *cs)
 {
 	return test_bit(CS_MEMORY_MIGRATE, &cs->flags);
@@ -220,7 +232,7 @@ static inline int is_spread_slab(const struct cpuset *cs)
 
 static struct cpuset top_cpuset = {
 	.flags = ((1 << CS_ONLINE) | (1 << CS_CPU_EXCLUSIVE) |
-		  (1 << CS_MEM_EXCLUSIVE)),
+		  (1 << CS_MEM_EXCLUSIVE) | (1 << CS_SCHED_DOMAIN_ROOT)),
 };
 
 /**
@@ -902,7 +914,19 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
 	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
 		struct cpuset *parent = parent_cs(cp);
 
-		cpumask_and(new_cpus, cp->cpus_allowed, parent->effective_cpus);
+		/*
+		 * If parent has isolated CPUs, include them in the list
+		 * of allowable CPUs.
+		 */
+		if (parent->isolation_count) {
+			cpumask_or(new_cpus, parent->effective_cpus,
+				   parent->isolated_cpus);
+			cpumask_and(new_cpus, new_cpus, cpu_online_mask);
+			cpumask_and(new_cpus, new_cpus, cp->cpus_allowed);
+		} else {
+			cpumask_and(new_cpus, cp->cpus_allowed,
+				    parent->effective_cpus);
+		}
 
 		/*
 		 * If it becomes empty, inherit the effective mask of the
@@ -948,6 +972,162 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
 }
 
 /**
+ * update_isolated_cpumask - update the isolated_cpus mask of parent cpuset
+ * @cpuset:  The cpuset that requests CPU isolation
+ * @oldmask: The old isolated cpumask to be removed from the parent
+ * @newmask: The new isolated cpumask to be added to the parent
+ * Return: 0 if successful, an error code otherwise
+ *
+ * Changes to the isolated CPUs are not allowed if any of CPUs changing
+ * state are in any of the child cpusets of the parent except the requesting
+ * child.
+ *
+ * If the sched_domain_root flag changes, either the oldmask (0=>1) or the
+ * newmask (1=>0) will be NULL.
+ *
+ * Called with cpuset_mutex held.
+ */
+static int update_isolated_cpumask(struct cpuset *cpuset,
+	struct cpumask *oldmask, struct cpumask *newmask)
+{
+	int retval;
+	int adding, deleting;
+	cpumask_var_t addmask, delmask;
+	struct cpuset *parent = parent_cs(cpuset);
+	struct cpuset *sibling;
+	struct cgroup_subsys_state *pos_css;
+	int old_count = parent->isolation_count;
+	bool dying = cpuset->css.flags & CSS_DYING;
+
+	/*
+	 * The new cpumask, if present, must not be empty and its parent
+	 * must be a scheduling domain root.
+	 */
+	if ((newmask && cpumask_empty(newmask)) ||
+	   !is_sched_domain_root(parent))
+		return -EINVAL;
+
+	/*
+	 * The oldmask, if present, must be a subset of parent's isolated
+	 * CPUs.
+	 */
+	if (oldmask && !cpumask_empty(oldmask) && (!parent->isolation_count ||
+		       !cpumask_subset(oldmask, parent->isolated_cpus))) {
+		WARN_ON_ONCE(1);
+		return -EINVAL;
+	}
+
+	/*
+	 * A sched_domain_root state change is not allowed if there are
+	 * online children and the cpuset is not dying.
+	 */
+	if (!dying && (!oldmask || !newmask) &&
+	    css_has_online_children(&cpuset->css))
+		return -EBUSY;
+
+	if (!zalloc_cpumask_var(&addmask, GFP_KERNEL))
+		return -ENOMEM;
+	if (!zalloc_cpumask_var(&delmask, GFP_KERNEL)) {
+		free_cpumask_var(addmask);
+		return -ENOMEM;
+	}
+
+	if (!old_count) {
+		if (!zalloc_cpumask_var(&parent->isolated_cpus, GFP_KERNEL)) {
+			retval = -ENOMEM;
+			goto out;
+		}
+		old_count = 1;
+	}
+
+	retval = -EBUSY;
+	adding = deleting = false;
+	if (newmask)
+		cpumask_copy(addmask, newmask);
+	if (oldmask)
+		deleting = cpumask_andnot(delmask, oldmask, addmask);
+	if (newmask)
+		adding = cpumask_andnot(addmask, newmask, delmask);
+
+	if (!adding && !deleting)
+		goto out_ok;
+
+	/*
+	 * The cpus to be added must be in the parent's effective_cpus mask
+	 * but not in the isolated_cpus mask.
+	 */
+	if (!cpumask_subset(addmask, parent->effective_cpus))
+		goto out;
+	if (parent->isolation_count &&
+	    cpumask_intersects(parent->isolated_cpus, addmask))
+		goto out;
+
+	/*
+	 * Check if any CPUs in addmask or delmask are in a sibling cpuset.
+	 * An empty sibling cpus_allowed means it is the same as parent's
+	 * effective_cpus. This checking is skipped if the cpuset is dying.
+	 */
+	if (dying)
+		goto updated_isolated_cpus;
+
+	rcu_read_lock();
+	cpuset_for_each_child(sibling, pos_css, parent) {
+		if ((sibling == cpuset) || !(sibling->css.flags & CSS_ONLINE))
+			continue;
+		if (cpumask_empty(sibling->cpus_allowed))
+			goto out_unlock;
+		if (adding &&
+		    cpumask_intersects(sibling->cpus_allowed, addmask))
+			goto out_unlock;
+		if (deleting &&
+		    cpumask_intersects(sibling->cpus_allowed, delmask))
+			goto out_unlock;
+	}
+	rcu_read_unlock();
+
+	/*
+	 * Change the isolated CPU list.
+	 * Newly added isolated CPUs will be removed from effective_cpus
+	 * and newly deleted ones will be added back if they are online.
+	 */
+updated_isolated_cpus:
+	spin_lock_irq(&callback_lock);
+	if (adding)
+		cpumask_or(parent->isolated_cpus,
+			   parent->isolated_cpus, addmask);
+
+	if (deleting)
+		cpumask_andnot(parent->isolated_cpus,
+			       parent->isolated_cpus, delmask);
+
+	/*
+	 * New effective_cpus = (cpus_allowed & ~isolated_cpus) &
+	 *			 cpu_online_mask
+	 */
+	cpumask_andnot(parent->effective_cpus, parent->cpus_allowed,
+		       parent->isolated_cpus);
+	cpumask_and(parent->effective_cpus, parent->effective_cpus,
+		    cpu_online_mask);
+
+	parent->isolation_count = cpumask_weight(parent->isolated_cpus);
+	spin_unlock_irq(&callback_lock);
+
+out_ok:
+	retval = 0;
+out:
+	free_cpumask_var(addmask);
+	free_cpumask_var(delmask);
+	if (old_count && !parent->isolation_count)
+		free_cpumask_var(parent->isolated_cpus);
+
+	return retval;
+
+out_unlock:
+	rcu_read_unlock();
+	goto out;
+}
+
+/**
  * update_cpumask - update the cpus_allowed mask of a cpuset and all tasks in it
  * @cs: the cpuset to consider
  * @trialcs: trial cpuset
@@ -988,6 +1168,13 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (retval < 0)
 		return retval;
 
+	if (is_sched_domain_root(cs)) {
+		retval = update_isolated_cpumask(cs, cs->cpus_allowed,
+						 trialcs->cpus_allowed);
+		if (retval < 0)
+			return retval;
+	}
+
 	spin_lock_irq(&callback_lock);
 	cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
 	spin_unlock_irq(&callback_lock);
@@ -1316,6 +1503,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	struct cpuset *trialcs;
 	int balance_flag_changed;
 	int spread_flag_changed;
+	int domain_flag_changed;
 	int err;
 
 	trialcs = alloc_trial_cpuset(cs);
@@ -1327,6 +1515,18 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	else
 		clear_bit(bit, &trialcs->flags);
 
+	/*
+	 *  Turning on sched.domain flag (default hierarchy only) implies
+	 *  an implicit cpu_exclusive. Turning off sched.domain will clear
+	 *  the cpu_exclusive flag.
+	 */
+	if (bit == CS_SCHED_DOMAIN_ROOT) {
+		if (turning_on)
+			set_bit(CS_CPU_EXCLUSIVE, &trialcs->flags);
+		else
+			clear_bit(CS_CPU_EXCLUSIVE, &trialcs->flags);
+	}
+
 	err = validate_change(cs, trialcs);
 	if (err < 0)
 		goto out;
@@ -1337,11 +1537,27 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	spread_flag_changed = ((is_spread_slab(cs) != is_spread_slab(trialcs))
 			|| (is_spread_page(cs) != is_spread_page(trialcs)));
 
+	domain_flag_changed = (is_sched_domain_root(cs) !=
+			       is_sched_domain_root(trialcs));
+
+	if (domain_flag_changed) {
+		err = turning_on
+		    ? update_isolated_cpumask(cs, NULL, cs->cpus_allowed)
+		    : update_isolated_cpumask(cs, cs->cpus_allowed, NULL);
+		if (err < 0)
+			goto out;
+		/*
+		 * At this point, the state has been changed.
+		 * So we can't back out with error anymore.
+		 */
+	}
+
 	spin_lock_irq(&callback_lock);
 	cs->flags = trialcs->flags;
 	spin_unlock_irq(&callback_lock);
 
-	if (!cpumask_empty(trialcs->cpus_allowed) && balance_flag_changed)
+	if (!cpumask_empty(trialcs->cpus_allowed) &&
+	   (balance_flag_changed || domain_flag_changed))
 		rebuild_sched_domains_locked();
 
 	if (spread_flag_changed)
@@ -1596,6 +1812,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	FILE_MEM_EXCLUSIVE,
 	FILE_MEM_HARDWALL,
 	FILE_SCHED_LOAD_BALANCE,
+	FILE_SCHED_DOMAIN_ROOT,
 	FILE_SCHED_RELAX_DOMAIN_LEVEL,
 	FILE_MEMORY_PRESSURE_ENABLED,
 	FILE_MEMORY_PRESSURE,
@@ -1629,6 +1846,9 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 	case FILE_SCHED_LOAD_BALANCE:
 		retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val);
 		break;
+	case FILE_SCHED_DOMAIN_ROOT:
+		retval = update_flag(CS_SCHED_DOMAIN_ROOT, cs, val);
+		break;
 	case FILE_MEMORY_MIGRATE:
 		retval = update_flag(CS_MEMORY_MIGRATE, cs, val);
 		break;
@@ -1790,6 +2010,8 @@ static u64 cpuset_read_u64(struct cgroup_subsys_state *css, struct cftype *cft)
 		return is_mem_hardwall(cs);
 	case FILE_SCHED_LOAD_BALANCE:
 		return is_sched_load_balance(cs);
+	case FILE_SCHED_DOMAIN_ROOT:
+		return is_sched_domain_root(cs);
 	case FILE_MEMORY_MIGRATE:
 		return is_memory_migrate(cs);
 	case FILE_MEMORY_PRESSURE_ENABLED:
@@ -1966,6 +2188,14 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 		.flags = CFTYPE_NOT_ON_ROOT,
 	},
 
+	{
+		.name = "sched.domain_root",
+		.read_u64 = cpuset_read_u64,
+		.write_u64 = cpuset_write_u64,
+		.private = FILE_SCHED_DOMAIN_ROOT,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
 	{ }	/* terminate */
 };
 
@@ -2075,6 +2305,9 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
  * If the cpuset being removed has its flag 'sched_load_balance'
  * enabled, then simulate turning sched_load_balance off, which
  * will call rebuild_sched_domains_locked().
+ *
+ * If the cpuset has the 'sched_domain_root' flag enabled, simulate
+ * turning sched_domain_root off.
  */
 
 static void cpuset_css_offline(struct cgroup_subsys_state *css)
@@ -2083,6 +2316,13 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
 
 	mutex_lock(&cpuset_mutex);
 
+	/*
+	 * Calling update_flag() may fail, so we have to call
+	 * update_isolated_cpumask directly to be sure.
+	 */
+	if (is_sched_domain_root(cs))
+		update_isolated_cpumask(cs, cs->cpus_allowed, NULL);
+
 	if (is_sched_load_balance(cs))
 		update_flag(CS_SCHED_LOAD_BALANCE, cs, 0);
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v9 6/7] cpuset: Don't rebuild sched domains if cpu changes in non-domain root
From: Waiman Long @ 2018-05-29 13:41 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long
In-Reply-To: <1527601294-3444-1-git-send-email-longman@redhat.com>

With cpuset v1, any change to the list of allowable CPUs in a cpuset
may change the sched domain configuration, depending on the load
balancing states and the CPU lists of its parent and its children.

With cpuset v2 (on the default hierarchy), there are more restrictions
on how the load balancing state of a cpuset can change. As a result,
only changes made in a sched domain root can change the corresponding
sched domain. CPU changes to any non-domain root cpuset will not
change the sched domain configuration, so we don't need to call
rebuild_sched_domains_locked() for changes in a non-domain root
cpuset, which saves CPU cycles.
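
As a rough userspace illustration of the condition described above (not
part of the patch), the rebuild decision can be modelled as a single
predicate. The struct and function names below are hypothetical
stand-ins for the kernel's struct cpuset and its flag helpers, and CPU
masks are reduced to a 64-bit word:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct cpuset. */
struct cpuset_model {
	uint64_t cpus_allowed;	/* one bit per CPU */
	bool load_balance;	/* models CS_SCHED_LOAD_BALANCE */
	bool domain_root;	/* models CS_SCHED_DOMAIN_ROOT */
};

/*
 * Legacy hierarchy: any non-empty, load-balanced cpuset triggers a rebuild.
 * Default hierarchy: the cpuset must also be a sched domain root.
 */
static bool needs_sched_domain_rebuild(const struct cpuset_model *cp,
				       bool on_default_hierarchy)
{
	assert(cp != NULL);
	if (cp->cpus_allowed == 0 || !cp->load_balance)
		return false;
	return !on_default_hierarchy || cp->domain_root;
}
```

On the default hierarchy, a load-balanced cpuset that is not a domain
root therefore never requests a rebuild in this model.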

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index f6ae483..9513f90 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -971,11 +971,15 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
 		update_tasks_cpumask(cp);
 
 		/*
-		 * If the effective cpumask of any non-empty cpuset is changed,
-		 * we need to rebuild sched domains.
+		 * On legacy hierarchy, if the effective cpumask of any non-
+		 * empty cpuset is changed, we need to rebuild sched domains.
+		 * On default hierarchy, the cpuset needs to be a sched
+		 * domain root as well.
 		 */
 		if (!cpumask_empty(cp->cpus_allowed) &&
-		    is_sched_load_balance(cp))
+		    is_sched_load_balance(cp) &&
+		   (!cgroup_subsys_on_dfl(cpuset_cgrp_subsys) ||
+		    is_sched_domain_root(cp)))
 			need_rebuild_sched_domains = true;
 
 		rcu_read_lock();
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v9 5/7] cpuset: Expose cpus.effective and mems.effective on cgroup v2 root
From: Waiman Long @ 2018-05-29 13:41 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long
In-Reply-To: <1527601294-3444-1-git-send-email-longman@redhat.com>

Because setting "cpuset.sched.domain_root" in a direct child of root
can remove CPUs from the root's effective CPU list, it is useful to
know which CPUs are left in the root cgroup for scheduling purposes.
So the "cpuset.cpus.effective" control file is now exposed in the v2
cgroup root.

For consistency, the "cpuset.mems.effective" control file is exposed
as well.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt | 4 ++--
 kernel/cgroup/cpuset.c      | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 681a809..b97f211 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1474,7 +1474,7 @@ Cpuset Interface Files
 	and won't be affected by any CPU hotplug events.
 
   cpuset.cpus.effective
-	A read-only multiple values file which exists on non-root
+	A read-only multiple values file which exists on all
 	cpuset-enabled cgroups.
 
 	It lists the onlined CPUs that are actually allowed to be
@@ -1504,7 +1504,7 @@ Cpuset Interface Files
 	and won't be affected by any memory nodes hotplug events.
 
   cpuset.mems.effective
-	A read-only multiple values file which exists on non-root
+	A read-only multiple values file which exists on all
 	cpuset-enabled cgroups.
 
 	It lists the onlined memory nodes that are actually allowed to
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 71cd920..f6ae483 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2214,14 +2214,12 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 		.name = "cpus.effective",
 		.seq_show = cpuset_common_seq_show,
 		.private = FILE_EFFECTIVE_CPULIST,
-		.flags = CFTYPE_NOT_ON_ROOT,
 	},
 
 	{
 		.name = "mems.effective",
 		.seq_show = cpuset_common_seq_show,
 		.private = FILE_EFFECTIVE_MEMLIST,
-		.flags = CFTYPE_NOT_ON_ROOT,
 	},
 
 	{
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v9 4/7] cpuset: Make generate_sched_domains() recognize isolated_cpus
From: Waiman Long @ 2018-05-29 13:41 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long
In-Reply-To: <1527601294-3444-1-git-send-email-longman@redhat.com>

The generate_sched_domains() function and the hotplug code are modified
to use the newly introduced isolated_cpus mask when generating sched
domains.
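
As a rough userspace illustration (not part of the patch), the two mask
operations this change touches can be modelled with 64-bit words: the
hotplug-race check in rebuild_sched_domains_locked() and the masking of
isolated CPUs in the hotplug work function. All names below are
hypothetical stand-ins, not kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the top cpuset; one bit per CPU. */
struct top_model {
	uint64_t effective_cpus;
	uint64_t isolated_cpus;
	unsigned int isolation_count;	/* number of isolated CPUs */
};

/* True if every bit set in @a is also set in @b. */
static bool mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * Models the check before domain generation: true means the rebuild is
 * skipped because a hotplug event is expected to redo it.
 */
static bool racing_with_hotplug(const struct top_model *top, uint64_t active)
{
	assert(top != NULL);
	if (top->isolation_count)
		return !mask_subset(top->effective_cpus, active);
	return top->effective_cpus != active;
}

/* Models the hotplug path: isolated CPUs stay out of effective_cpus. */
static void hotplug_update(struct top_model *top, uint64_t active)
{
	assert(top != NULL);
	if (top->isolation_count)
		active &= ~top->isolated_cpus;
	top->effective_cpus = active;
}
```

With isolated CPUs present, the top cpuset's effective_cpus is a strict
subset of the active mask, which is why the model switches from an
equality test to a subset test.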

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b94d4a0..71cd920 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -672,13 +672,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	int ndoms = 0;		/* number of sched domains in result */
 	int nslot;		/* next empty doms[] struct cpumask slot */
 	struct cgroup_subsys_state *pos_css;
+	bool root_load_balance = is_sched_load_balance(&top_cpuset);
 
 	doms = NULL;
 	dattr = NULL;
 	csa = NULL;
 
 	/* Special case for the 99% of systems with one, full, sched domain */
-	if (is_sched_load_balance(&top_cpuset)) {
+	if (root_load_balance && !top_cpuset.isolation_count) {
 		ndoms = 1;
 		doms = alloc_sched_domains(ndoms);
 		if (!doms)
@@ -701,6 +702,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	csn = 0;
 
 	rcu_read_lock();
+	if (root_load_balance)
+		csa[csn++] = &top_cpuset;
 	cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
 		if (cp == &top_cpuset)
 			continue;
@@ -711,6 +714,9 @@ static int generate_sched_domains(cpumask_var_t **domains,
 		 * parent's cpus, so just skip them, and then we call
 		 * update_domain_attr_tree() to calc relax_domain_level of
 		 * the corresponding sched domain.
+		 *
+		 * If root is load-balancing, we can skip @cp if it
+		 * is a subset of the root's effective_cpus.
 		 */
 		if (!cpumask_empty(cp->cpus_allowed) &&
 		    !(is_sched_load_balance(cp) &&
@@ -718,11 +724,16 @@ static int generate_sched_domains(cpumask_var_t **domains,
 					 housekeeping_cpumask(HK_FLAG_DOMAIN))))
 			continue;
 
+		if (root_load_balance &&
+		    cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
+			continue;
+
 		if (is_sched_load_balance(cp))
 			csa[csn++] = cp;
 
-		/* skip @cp's subtree */
-		pos_css = css_rightmost_descendant(pos_css);
+		/* skip @cp's subtree if not a scheduling domain root */
+		if (!is_sched_domain_root(cp))
+			pos_css = css_rightmost_descendant(pos_css);
 	}
 	rcu_read_unlock();
 
@@ -849,7 +860,12 @@ static void rebuild_sched_domains_locked(void)
 	 * passing doms with offlined cpu to partition_sched_domains().
 	 * Anyways, hotplug work item will rebuild sched domains.
 	 */
-	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
+	if (!top_cpuset.isolation_count &&
+	    !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
+		goto out;
+
+	if (top_cpuset.isolation_count &&
+	   !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
 		goto out;
 
 	/* Generate domain masks and attrs */
@@ -2635,6 +2651,11 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	cpumask_copy(&new_cpus, cpu_active_mask);
 	new_mems = node_states[N_MEMORY];
 
+	/*
+	 * If isolated_cpus is populated, it is likely that the check below
+	 * will produce a false positive on cpus_updated when the cpu list
+	 * isn't changed. It is extra work, but it is better to be safe.
+	 */
 	cpus_updated = !cpumask_equal(top_cpuset.effective_cpus, &new_cpus);
 	mems_updated = !nodes_equal(top_cpuset.effective_mems, new_mems);
 
@@ -2643,6 +2664,10 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 		spin_lock_irq(&callback_lock);
 		if (!on_dfl)
 			cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
+
+		if (top_cpuset.isolation_count)
+			cpumask_andnot(&new_cpus, &new_cpus,
+					top_cpuset.isolated_cpus);
 		cpumask_copy(top_cpuset.effective_cpus, &new_cpus);
 		spin_unlock_irq(&callback_lock);
 		/* we don't mess with cpumasks of tasks in top_cpuset */
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2
From: Waiman Long @ 2018-05-29 13:41 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long
In-Reply-To: <1527601294-3444-1-git-send-email-longman@redhat.com>

The sched.load_balance flag is needed to enable CPU isolation similar to
what can be done with the "isolcpus" kernel boot parameter. Its value
can only be changed in a scheduling domain with no child cpusets. On
a non-scheduling domain cpuset, the value of sched.load_balance is
inherited from its parent. This is to make sure that all the cpusets
within the same scheduling domain or partition have the same load
balancing state.

This flag is set by the parent and is not delegatable.
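
As a rough userspace illustration of the rules described above (not part
of the patch), the inheritance at creation time and the restriction on
changing the flag can be modelled as follows. The struct and function
names are hypothetical stand-ins, and the CPU mask is reduced to a
64-bit word:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct cpuset. */
struct lb_cpuset {
	bool load_balance;
	bool domain_root;
	bool has_online_children;
	uint64_t effective_cpus;	/* one bit per CPU */
};

/* Models child creation: v2 inherits the parent's flag, v1 defaults to on. */
static int lb_init_child(const struct lb_cpuset *parent,
			 struct lb_cpuset *child, bool on_dfl)
{
	assert(parent != NULL && child != NULL);
	*child = (struct lb_cpuset){ 0 };
	if (!on_dfl) {
		child->load_balance = true;
		return 0;
	}
	/* No child may be created under a parent with no effective CPUs. */
	if (parent->effective_cpus == 0)
		return -EINVAL;
	child->load_balance = parent->load_balance;
	return 0;
}

/* Models a flag write: v2 allows a change only in a childless domain root. */
static int lb_set(struct lb_cpuset *cs, bool value, bool on_dfl)
{
	assert(cs != NULL);
	if (value == cs->load_balance)
		return 0;	/* nothing changes, nothing to validate */
	if (on_dfl && (!cs->domain_root || cs->has_online_children))
		return -EINVAL;
	cs->load_balance = value;
	return 0;
}
```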

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt | 26 +++++++++++++++++++++
 kernel/cgroup/cpuset.c      | 55 +++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index e7534c5..681a809 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1542,6 +1542,32 @@ Cpuset Interface Files
 	Further changes made to "cpuset.cpus" is allowed as long as
 	the first condition above is still true.
 
+	A parent scheduling domain root cgroup cannot distribute all
+	its CPUs to its child scheduling domain root cgroups unless
+	its load balancing flag is turned off.
+
+  cpuset.sched.load_balance
+	A read-write single value file which exists on non-root
+	cpuset-enabled cgroups.  It is a binary value flag that accepts
+	either "0" (off) or "1" (on).  This flag is set by the parent
+	and is not delegatable.  It is on by default in the root cgroup.
+
+	When it is on, tasks within this cpuset will be load-balanced
+	by the kernel scheduler.  Tasks will be moved from CPUs with
+	high load to other CPUs within the same cpuset with less load
+	periodically.
+
+	When it is off, there will be no load balancing among CPUs on
+	this cgroup.  Tasks will stay in the CPUs they are running on
+	and will not be moved to other CPUs.
+
+	The load balancing state of a cgroup can only be changed on a
+	scheduling domain root cgroup with no cpuset-enabled children.
+	All cgroups within a scheduling domain or partition must have
+	the same load balancing state.	As descendant cgroups of a
+	scheduling domain root are created, they inherit the same load
+	balancing state of their root.
+
 
 Device controller
 -----------------
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 405b072..b94d4a0 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -510,7 +510,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 
 	par = parent_cs(cur);
 
-	/* On legacy hiearchy, we must be a subset of our parent cpuset. */
+	/* On legacy hierarchy, we must be a subset of our parent cpuset. */
 	ret = -EACCES;
 	if (!is_in_v2_mode() && !is_cpuset_subset(trial, par))
 		goto out;
@@ -1063,6 +1063,14 @@ static int update_isolated_cpumask(struct cpuset *cpuset,
 		goto out;
 
 	/*
+	 * A parent can't distribute all its CPUs to child scheduling
+	 * domain root cpusets unless load balancing is off.
+	 */
+	if (adding && !deleting && is_sched_load_balance(parent) &&
+	    cpumask_equal(addmask, parent->effective_cpus))
+		goto out;
+
+	/*
 	 * Check if any CPUs in addmask or delmask are in a sibling cpuset.
 	 * An empty sibling cpus_allowed means it is the same as parent's
 	 * effective_cpus. This checking is skipped if the cpuset is dying.
@@ -1540,6 +1548,18 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	domain_flag_changed = (is_sched_domain_root(cs) !=
 			       is_sched_domain_root(trialcs));
 
+	/*
+	 * On default hierarchy, a load balance flag change is only allowed
+	 * in a scheduling domain root with no child cpuset as all the
+	 * cpusets within the same scheduling domain/partition must have the
+	 * same load balancing state.
+	 */
+	if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) && balance_flag_changed &&
+	   (!is_sched_domain_root(cs) || css_has_online_children(&cs->css))) {
+		err = -EINVAL;
+		goto out;
+	}
+
 	if (domain_flag_changed) {
 		err = turning_on
 		    ? update_isolated_cpumask(cs, NULL, cs->cpus_allowed)
@@ -2196,6 +2216,14 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 		.flags = CFTYPE_NOT_ON_ROOT,
 	},
 
+	{
+		.name = "sched.load_balance",
+		.read_u64 = cpuset_read_u64,
+		.write_u64 = cpuset_write_u64,
+		.private = FILE_SCHED_LOAD_BALANCE,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
 	{ }	/* terminate */
 };
 
@@ -2209,19 +2237,38 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
 {
 	struct cpuset *cs;
+	struct cgroup_subsys_state *errptr = ERR_PTR(-ENOMEM);
 
 	if (!parent_css)
 		return &top_cpuset.css;
 
 	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
 	if (!cs)
-		return ERR_PTR(-ENOMEM);
+		return errptr;
 	if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL))
 		goto free_cs;
 	if (!alloc_cpumask_var(&cs->effective_cpus, GFP_KERNEL))
 		goto free_cpus;
 
-	set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+	/*
+	 * On default hierarchy, inherit parent's CS_SCHED_LOAD_BALANCE flag.
+	 * Creating new cpuset is also not allowed if the effective_cpus of
+	 * its parent is empty.
+	 */
+	if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys)) {
+		struct cpuset *parent = css_cs(parent_css);
+
+		if (test_bit(CS_SCHED_LOAD_BALANCE, &parent->flags))
+			set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+
+		if (cpumask_empty(parent->effective_cpus)) {
+			errptr = ERR_PTR(-EINVAL);
+			goto free_cpus;
+		}
+	} else {
+		set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+	}
+
 	cpumask_clear(cs->cpus_allowed);
 	nodes_clear(cs->mems_allowed);
 	cpumask_clear(cs->effective_cpus);
@@ -2235,7 +2282,7 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 	free_cpumask_var(cs->cpus_allowed);
 free_cs:
 	kfree(cs);
-	return ERR_PTR(-ENOMEM);
+	return errptr;
 }
 
 static int cpuset_css_online(struct cgroup_subsys_state *css)
-- 
1.8.3.1


^ permalink raw reply related

* Re: [RFT v3 1/4] perf cs-etm: Generate branch sample for missed packets
From: Mathieu Poirier @ 2018-05-29 16:04 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jonathan Corbet, Robert Walker,
	Mike Leach, Kim Phillips, Tor Jeremiassen, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, open list:DOCUMENTATION,
	Linux Kernel Mailing List, coresight
In-Reply-To: <20180529002538.GA11317@leoy-ThinkPad-X240s>

On 28 May 2018 at 18:25, Leo Yan <leo.yan@linaro.org> wrote:
> Hi Mathieu,
>
> On Mon, May 28, 2018 at 04:13:47PM -0600, Mathieu Poirier wrote:
>> Leo and/or Robert,
>>
>> On Mon, May 28, 2018 at 04:45:00PM +0800, Leo Yan wrote:
>> > Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
>> > traces") reworks the samples generation flow from CoreSight trace to
>> > match the correct format so Perf report tool can display the samples
>> > properly.
>> >
>> > But the change has a side effect on branch packet handling: it only
>> > generates branch samples when the previous packet's flag
>> > 'last_instr_taken_branch' is true, so the following three kinds of
>> > packets fail to generate branch samples:
>> >
>> > - The start tracing packet at the beginning of tracing data;
>> > - The exception handling packet;
>> > - If one CS_ETM_TRACE_ON packet is inserted, we also fail to handle it
>> >   for branch samples.  The CS_ETM_TRACE_ON packet itself indicates
>> >   that there is a discontinuity in the trace; in addition, we
>> >   also fail to generate proper branch samples for the packets before
>> >   and after the CS_ETM_TRACE_ON packet.
>> >
>> > This patch adds branch sample handling for the above three kinds of
>> > packets:
>> >
>> > - In function cs_etm__sample(), check if 'prev_packet->sample_type' is
>> >   zero and in this case it generates branch sample for the start tracing
>> >   packet; furthermore, we also need to handle the condition for
>> >   prev_packet::end_addr is zero in the cs_etm__last_executed_instr();
>> >
>> > - In function cs_etm__sample(), check if 'prev_packet->exc' is true and
>> >   generate branch sample for exception handling packet;
>> >
>> > - If a CS_ETM_TRACE_ON packet arrives, we first generate a
>> >   branch sample in the function cs_etm__flush(); this saves complete
>> >   info for the previous CS_ETM_RANGE packet just before the
>> >   CS_ETM_TRACE_ON packet.  We also generate a branch sample for the new
>> >   CS_ETM_RANGE packet after the CS_ETM_TRACE_ON packet.  This has two
>> >   purposes: the first is to save the info for the new CS_ETM_RANGE
>> >   packet, the second is to save the CS_ETM_TRACE_ON packet info so we
>> >   have a hint of a discontinuity in the trace.
>> >
>> >   For a CS_ETM_TRACE_ON packet, the fields 'packet->start_addr' and
>> >   'packet->end_addr' are equal to 0xdeadbeefdeadbeefUL, which is emitted
>> >   by the decoder layer as a dummy value.  This patch converts these
>> >   values to zeros for readability; this is accomplished by functions
>> >   cs_etm__last_executed_instr() and cs_etm__first_executed_instr().  The
>> >   latter is a new function introduced by this patch.
>> >
>> > Reviewed-by: Robert Walker <robert.walker@arm.com>
>> > Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
>> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
>> > ---
>> >  tools/perf/util/cs-etm.c | 93 +++++++++++++++++++++++++++++++++++++-----------
>> >  1 file changed, 73 insertions(+), 20 deletions(-)
>> >
>> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>> > index 822ba91..8418173 100644
>> > --- a/tools/perf/util/cs-etm.c
>> > +++ b/tools/perf/util/cs-etm.c
>> > @@ -495,6 +495,20 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
>> >  static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
>> >  {
>> >     /*
>> > +    * The packet is the start tracing packet if the end_addr is zero,
>> > +    * returns 0 for this case.
>> > +    */
>> > +   if (!packet->end_addr)
>> > +           return 0;
>>
>> What is considered to be the "start tracing packet"?  Right now the only two
>> kinds of packets inserted in the decoder packet buffer queue are INST_RANGE and
>> TRACE_ON.  How can we hit a condition where packet->end_addr == 0?
>
> When the first CS_ETM_RANGE packet arrives, etmq->prev_packet is
> still as initialized by the function cs_etm__alloc_queue(), so
> etmq->prev_packet->end_addr is zero:
>
>     etmq->prev_packet = zalloc(szp);
>
> As you mentioned, we should only have two kinds of packets:
> CS_ETM_RANGE and CS_ETM_TRACE_ON.  Currently we skip handling the
> first CS_ETM_TRACE_ON packet in function cs_etm__flush(); we could
> also refine cs_etm__flush() to handle the first incoming
> CS_ETM_TRACE_ON packet, after which all packets will be CS_ETM_RANGE
> or CS_ETM_TRACE_ON and there is no chance to hit 'packet->end_addr == 0'.
>
> Does this make sense to you?

That is the right way to handle this condition and it gives us a
reliable state machine.

>
> --- Packet dumping when first packet coming ---
> cs_etm__flush: prev_packet: sample_type=0 exc=0 exc_ret=0 cpu=0 start_addr=0x0 end_addr=0x0 last_instr_taken_branch=0
> cs_etm__flush: packet: sample_type=2 exc=0 exc_ret=0 cpu=1 start_addr=0xdeadbeefdeadbeef end_addr=0xdeadbeefdeadbeef last_instr_taken_branch=0
>
>> > +
>> > +   /*
>> > +    * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
>> > +    * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>> > +    */
>> > +   if (packet->end_addr == 0xdeadbeefdeadbeefUL)
>> > +           return 0;
>>
>> As with the above, I find triggering on addresses to be brittle and hard
>> to maintain in the long run.  Packets all have a sample_type field that should
>> be used in cases like this one.  That way we know exactly the condition that is
>> targeted.
>
> Will do this.
>
>> While working on this set, please spin-off another patch that defines
>> CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL and replace all the cases where the
>> numeral is used.  That way we stop using the hard coded value.
>
> Will do this.

Much appreciated.
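
For reference, a minimal userspace sketch of what keying on sample_type
and a named constant could look like. This is only one possible reading
of the suggestions above, not the final patch: the struct, the enum and
the helper names are simplified, hypothetical stand-ins for the ones in
tools/perf/util, and CS_ETM_INVAL_ADDR is the proposed (not yet
existing) define:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CS_ETM_INVAL_ADDR	0xdeadbeefdeadbeefULL	/* proposed name */
#define A64_INSTR_SIZE		4

/* Simplified model; 0 stands for the zero-initialized prev_packet. */
enum model_sample_type {
	MODEL_CS_ETM_RANGE	= 1 << 0,
	MODEL_CS_ETM_TRACE_ON	= 1 << 1,
};

struct model_packet {
	enum model_sample_type sample_type;
	uint64_t start_addr;
	uint64_t end_addr;
};

/* Only range packets carry real addresses; report 0 for anything else. */
static uint64_t model_last_executed_instr(const struct model_packet *packet)
{
	assert(packet != NULL);
	if (packet->sample_type != MODEL_CS_ETM_RANGE)
		return 0;
	/* end_addr is exclusive and A64 instructions are constant size */
	return packet->end_addr - A64_INSTR_SIZE;
}

static uint64_t model_first_executed_instr(const struct model_packet *packet)
{
	assert(packet != NULL);
	if (packet->sample_type != MODEL_CS_ETM_RANGE)
		return 0;
	return packet->start_addr;
}
```

Checking the type rather than the address means the helpers never need
to compare against the dummy value at all; the define is then only used
where the decoder layer fills in the packet.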

>
> As this patch is now big, with more complex logic, I am considering
> splitting it into smaller patches:
>
> - Define CS_ETM_INVAL_ADDR;
> - Fix for CS_ETM_TRACE_ON packet;
> - Fix for exception packet;
>
> Does this make sense to you?  My concern is that this is a
> fix patch, so I am not sure whether splitting it will cause
> trouble when applying the patches to other stable kernels ...

Reverse the order:

- Fix for CS_ETM_TRACE_ON packet;
- Fix for exception packet;
- Define CS_ETM_INVAL_ADDR;

But you may not need to - see next comment.

>
>> > +
>> > +   /*
>> >      * The packet records the execution range with an exclusive end address
>> >      *
>> >      * A64 instructions are constant size, so the last executed
>> > @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
>> >     return packet->end_addr - A64_INSTR_SIZE;
>> >  }
>> >
>> > +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
>> > +{
>> > +   /*
>> > +    * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
>> > +    * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
>> > +    */
>> > +   if (packet->start_addr == 0xdeadbeefdeadbeefUL)
>> > +           return 0;
>>
>> Same comment as above.
>
> Will do this.
>
>> > +
>> > +   return packet->start_addr;
>> > +}
>> > +
>> >  static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
>> >  {
>> >     /*
>> > @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
>> >
>> >     be       = &bs->entries[etmq->last_branch_pos];
>> >     be->from = cs_etm__last_executed_instr(etmq->prev_packet);
>> > -   be->to   = etmq->packet->start_addr;
>> > +   be->to   = cs_etm__first_executed_instr(etmq->packet);
>> >     /* No support for mispredict */
>> >     be->flags.mispred = 0;
>> >     be->flags.predicted = 1;
>> > @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
>> >     sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
>> >     sample.pid = etmq->pid;
>> >     sample.tid = etmq->tid;
>> > -   sample.addr = etmq->packet->start_addr;
>> > +   sample.addr = cs_etm__first_executed_instr(etmq->packet);
>> >     sample.id = etmq->etm->branches_id;
>> >     sample.stream_id = etmq->etm->branches_id;
>> >     sample.period = 1;
>> > @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>> >             etmq->period_instructions = instrs_over;
>> >     }
>> >
>> > -   if (etm->sample_branches &&
>> > -       etmq->prev_packet &&
>> > -       etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>> > -       etmq->prev_packet->last_instr_taken_branch) {
>> > -           ret = cs_etm__synth_branch_sample(etmq);
>> > -           if (ret)
>> > -                   return ret;
>> > +   if (etm->sample_branches && etmq->prev_packet) {
>> > +           bool generate_sample = false;
>> > +
>> > +           /* Generate sample for start tracing packet */
>> > +           if (etmq->prev_packet->sample_type == 0 ||
>>
>> What kind of packet is sample_type == 0 ?
>
> Just as explained above, sample_type == 0 is the packet which
> initialized in the function cs_etm__alloc_queue().
>
>> > +               etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
>> > +                   generate_sample = true;
>> > +
>> > +           /* Generate sample for exception packet */
>> > +           if (etmq->prev_packet->exc == true)
>> > +                   generate_sample = true;
>>
>> Please don't do that.  Exception packets have a type of their own and can be
>> added to the decoder packet queue the same way INST_RANGE and TRACE_ON packets
>> are.  Moreover, exception packets contain an address that, if I'm reading the
>> documentation properly, can be used to keep track of instructions that were
>> executed between the last address of the previous range packet and the address
>> executed just before the exception occurred.  Mike and Rob will have to confirm
>> this as the decoder may be doing all that hard work for us.
>
> Sure, I will wait for Rob and Mike to confirm this.
>
> On my side, I dumped the packets: the exception packet isn't passed to
> the cs-etm.c layer; the decoder layer only sets the flag
> 'packet->exc = true' when an exception packet arrives [1].

That's because we didn't need the information.  Now that we do, we need a
function that will insert a packet in the decoder packet queue, and we need
to deal with the new packet type in the main decoder loop [2].  At that
point your work may not be eligible for stable anymore, and I think that
is fine.  Robert's work was an enhancement over mine and yours is an
enhancement over his.

[2]. https://elixir.bootlin.com/linux/v4.17-rc7/source/tools/perf/util/cs-etm.c#L999
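
As a rough illustration of the idea above (queue the exception as a packet
of its own, then handle it by type in the main loop), here is a minimal
userspace sketch.  All names (sketch_queue, SKETCH_EXCEPTION, ...) are
hypothetical and do not match the actual perf or decoder code; it only shows
the shape of the change, not an implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet types; SKETCH_EXCEPTION is the new one. */
enum sketch_sample_type {
	SKETCH_RANGE = 1,
	SKETCH_TRACE_ON,
	SKETCH_EXCEPTION,
};

struct sketch_packet {
	enum sketch_sample_type sample_type;
	uint64_t start_addr;
	uint64_t end_addr;
};

#define SKETCH_QUEUE_LEN 8

/* Fixed-size FIFO standing in for the decoder packet queue. */
struct sketch_queue {
	struct sketch_packet buf[SKETCH_QUEUE_LEN];
	unsigned int head;	/* next slot to read */
	unsigned int count;	/* packets currently queued */
};

/* Decoder side: queue an exception as a packet of its own. */
static int sketch_queue_exception(struct sketch_queue *q, uint64_t addr)
{
	struct sketch_packet *p;

	assert(q != NULL);
	if (q->count == SKETCH_QUEUE_LEN)
		return -1;	/* queue full */

	p = &q->buf[(q->head + q->count) % SKETCH_QUEUE_LEN];
	p->sample_type = SKETCH_EXCEPTION;
	p->start_addr = addr;
	p->end_addr = addr;
	q->count++;
	return 0;
}

/* Pop the oldest packet; returns false when the queue is empty. */
static bool sketch_get_packet(struct sketch_queue *q, struct sketch_packet *out)
{
	if (!q->count)
		return false;

	*out = q->buf[q->head];
	q->head = (q->head + 1) % SKETCH_QUEUE_LEN;
	q->count--;
	return true;
}

/* Consumer side: dispatch on packet type; returns exceptions seen. */
static unsigned int sketch_run(struct sketch_queue *q)
{
	struct sketch_packet pkt;
	unsigned int exceptions = 0;

	while (sketch_get_packet(q, &pkt)) {
		switch (pkt.sample_type) {
		case SKETCH_EXCEPTION:
			/* A branch sample would be synthesized here. */
			exceptions++;
			break;
		case SKETCH_RANGE:
		case SKETCH_TRACE_ON:
		default:
			break;
		}
	}

	return exceptions;
}
```

With the type carried in the queue, the consumer no longer needs a side flag
such as 'packet->exc' to learn that an exception happened.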

>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c#n364
>
>> > +
>> > +           /* Generate sample for normal branch packet */
>> > +           if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
>> > +               etmq->prev_packet->last_instr_taken_branch)
>> > +                   generate_sample = true;
>> > +
>> > +           if (generate_sample) {
>> > +                   ret = cs_etm__synth_branch_sample(etmq);
>> > +                   if (ret)
>> > +                           return ret;
>> > +           }
>> >     }
>> >
>> >     if (etm->sample_branches || etm->synth_opts.last_branch) {
>> > @@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>> >  static int cs_etm__flush(struct cs_etm_queue *etmq)
>> >  {
>> >     int err = 0;
>> > +   struct cs_etm_auxtrace *etm = etmq->etm;
>> >     struct cs_etm_packet *tmp;
>> >
>> > -   if (etmq->etm->synth_opts.last_branch &&
>> > -       etmq->prev_packet &&
>> > -       etmq->prev_packet->sample_type == CS_ETM_RANGE) {
>> > +   if (!etmq->prev_packet)
>> > +           return 0;
>> > +
>> > +   if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
>> > +           return 0;
>> > +
>> > +   if (etmq->etm->synth_opts.last_branch) {
>>
>> If you add:
>>
>>         if (!etmq->etm->synth_opts.last_branch)
>>                 return 0;
>>
>> You can avoid indenting the whole block.
>
> No, we cannot do it like this here.  Besides handling the
> condition for 'etmq->etm->synth_opts.last_branch', we also need to
> handle 'etm->sample_branches'.  These two conditions are separate and
> decided by different command parameters from 'perf script'.

Pardon me - I didn't see the addition of the new '}' just below.

>
>> >             /*
>> >              * Generate a last branch event for the branches left in the
>> >              * circular buffer at the end of the trace.
>> > @@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
>> >             err = cs_etm__synth_instruction_sample(
>> >                     etmq, addr,
>> >                     etmq->period_instructions);
>> > +           if (err)
>> > +                   return err;
>> >             etmq->period_instructions = 0;
>> > +   }
>> >
>> > -           /*
>> > -            * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>> > -            * the next incoming packet.
>> > -            */
>> > -           tmp = etmq->packet;
>> > -           etmq->packet = etmq->prev_packet;
>> > -           etmq->prev_packet = tmp;
>> > +   if (etm->sample_branches) {
>> > +           err = cs_etm__synth_branch_sample(etmq);
>> > +           if (err)
>> > +                   return err;
>> >     }
>> >
>> > -   return err;
>> > +   /*
>> > +    * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>> > +    * the next incoming packet.
>> > +    */
>> > +   tmp = etmq->packet;
>> > +   etmq->packet = etmq->prev_packet;
>> > +   etmq->prev_packet = tmp;
>>
>> Robert, I remember noticing that when you first submitted the code but forgot to
>> go back to it.  What is the point of swapping the packets?  I understand
>>
>> etmq->prev_packet = etmq->packet;
>>
>> But not
>>
>> etmq->packet = tmp;
>>
>> After all etmq->packet will be clobbered as soon as cs_etm_decoder__get_packet()
>> is called, which is always right after either cs_etm__sample() or
>> cs_etm__flush().
>
> Yeah, I have the same question for this :)
>
> Thanks for the suggestions and the review.
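
One possible reason for the swap, sketched under the assumption that the
get-packet call copies the decoded packet into the buffer that etmq->packet
points to: with a plain assignment both pointers would name the same buffer,
so the next copy would also overwrite the previous packet.  The sketch_*
names below are made up for illustration and are not the perf structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_pkt {
	uint64_t start_addr;
};

struct sketch_etmq {
	struct sketch_pkt *packet;
	struct sketch_pkt *prev_packet;
};

/* Stand-in for the decoder call: copies the next packet into *dst. */
static void sketch_get_packet(struct sketch_pkt *dst, uint64_t addr)
{
	assert(dst != NULL);
	dst->start_addr = addr;
}

/* Plain assignment: both pointers end up naming the same buffer. */
static void sketch_assign(struct sketch_etmq *q)
{
	q->prev_packet = q->packet;
}

/* Swap: the two pointers keep naming two distinct buffers. */
static void sketch_swap(struct sketch_etmq *q)
{
	struct sketch_pkt *tmp = q->packet;

	q->packet = q->prev_packet;
	q->prev_packet = tmp;
}
```

After sketch_swap() the next sketch_get_packet() leaves prev_packet intact;
after sketch_assign() it silently changes prev_packet as well.
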
>
>> Thanks,
>> Mathieu
>>
>>
>>
>> > +   return 0;
>> >  }
>> >
>> >  static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
>> > --
>> > 2.7.4
>> >

* [PATCH v3 1/3] usb: gadget: ccid: add support for USB CCID Gadget Device
From: Marcus Folkesson @ 2018-05-29 18:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jonathan Corbet, Felipe Balbi, davem,
	Mauro Carvalho Chehab, Andrew Morton, Randy Dunlap,
	Ruslan Bilovol, Thomas Gleixner, Kate Stewart
  Cc: linux-usb, linux-doc, linux-kernel, Marcus Folkesson

The Chip Card Interface Device (CCID) protocol is a USB protocol that
allows a smartcard device to be connected to a computer via a card
reader using a standard USB interface, without the need for each manufacturer
of smartcards to provide its own reader or protocol.

This gadget driver makes Linux show up as a CCID device to the host and lets a
userspace daemon act as the smartcard.

This is useful when the Linux gadget itself should act as a cryptographic
device or forward APDUs to an embedded smartcard device.

Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
---

Notes:
    v3:
    	- fix sparse warnings reported by kbuild test robot
    v2:
    	- add the missing changelog text
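
For illustration only, a minimal sketch of the kind of logic the userspace
daemon would contain: answering a PC_to_RDR_GetSlotStatus command with an
RDR_to_PC_SlotStatus message reporting that no card is present.  The message
type values and the 10-byte header layout follow the CCID rev 1.1
specification; the sketch_* names are hypothetical and not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every CCID bulk message starts with a 10-byte header. */
#define SKETCH_CCID_HDR_LEN		10
#define SKETCH_PC_TO_RDR_GETSLOTSTATUS	0x65
#define SKETCH_RDR_TO_PC_SLOTSTATUS	0x81
#define SKETCH_ICC_NOT_PRESENT		0x02	/* bmICCStatus: no card */

/*
 * Build the reply to one command.  Returns the reply length in bytes,
 * or 0 if the command is too short or not handled by this sketch.
 */
static size_t sketch_ccid_reply(const uint8_t *cmd, size_t cmd_len,
				uint8_t *rsp, size_t rsp_size)
{
	assert(cmd != NULL && rsp != NULL);

	if (cmd_len < SKETCH_CCID_HDR_LEN || rsp_size < SKETCH_CCID_HDR_LEN)
		return 0;
	if (cmd[0] != SKETCH_PC_TO_RDR_GETSLOTSTATUS)
		return 0;

	memset(rsp, 0, SKETCH_CCID_HDR_LEN);
	rsp[0] = SKETCH_RDR_TO_PC_SLOTSTATUS;	/* bMessageType */
	/* rsp[1..4]: dwLength stays 0, there is no payload */
	rsp[5] = cmd[5];			/* bSlot, echoed back */
	rsp[6] = cmd[6];			/* bSeq, echoed back */
	rsp[7] = SKETCH_ICC_NOT_PRESENT;	/* bStatus */
	/* rsp[8] (bError) and rsp[9] (bClockStatus) are left at 0 */

	return SKETCH_CCID_HDR_LEN;
}
```

A daemon would read() a command from /dev/ccidg0, pass it to
sketch_ccid_reply() and write() the reply back to the same node.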

 drivers/usb/gadget/Kconfig           |  17 +
 drivers/usb/gadget/function/Makefile |   1 +
 drivers/usb/gadget/function/f_ccid.c | 993 +++++++++++++++++++++++++++++++++++
 drivers/usb/gadget/function/f_ccid.h |  91 ++++
 include/uapi/linux/usb/ccid.h        |  93 ++++
 5 files changed, 1195 insertions(+)
 create mode 100644 drivers/usb/gadget/function/f_ccid.c
 create mode 100644 drivers/usb/gadget/function/f_ccid.h
 create mode 100644 include/uapi/linux/usb/ccid.h

diff --git a/drivers/usb/gadget/Kconfig b/drivers/usb/gadget/Kconfig
index 31cce7805eb2..bdebdf1ffa2b 100644
--- a/drivers/usb/gadget/Kconfig
+++ b/drivers/usb/gadget/Kconfig
@@ -149,6 +149,9 @@ config USB_LIBCOMPOSITE
 config USB_F_ACM
 	tristate
 
+config USB_F_CCID
+	tristate
+
 config USB_F_SS_LB
 	tristate
 
@@ -248,6 +251,20 @@ config USB_CONFIGFS_ACM
 	  ACM serial link.  This function can be used to interoperate with
 	  MS-Windows hosts or with the Linux-USB "cdc-acm" driver.
 
+config USB_CONFIGFS_CCID
+	bool "Chip Card Interface Device (CCID)"
+	depends on USB_CONFIGFS
+	select USB_F_CCID
+	help
+	  The CCID function driver provides generic emulation of a
+	  Chip Card Interface Device (CCID).
+
+	  You will need a user space server talking to /dev/ccidg*,
+	  since the kernel itself does not implement CCID/TPDU/APDU
+	  protocol.
+
+	  For more information, see Documentation/usb/gadget_ccid.rst.
+
 config USB_CONFIGFS_OBEX
 	bool "Object Exchange Model (CDC OBEX)"
 	depends on USB_CONFIGFS
diff --git a/drivers/usb/gadget/function/Makefile b/drivers/usb/gadget/function/Makefile
index 5d3a6cf02218..629851009e1a 100644
--- a/drivers/usb/gadget/function/Makefile
+++ b/drivers/usb/gadget/function/Makefile
@@ -9,6 +9,7 @@ ccflags-y			+= -I$(srctree)/drivers/usb/gadget/udc/
 # USB Functions
 usb_f_acm-y			:= f_acm.o
 obj-$(CONFIG_USB_F_ACM)		+= usb_f_acm.o
+obj-$(CONFIG_USB_F_CCID)	+= f_ccid.o
 usb_f_ss_lb-y			:= f_loopback.o f_sourcesink.o
 obj-$(CONFIG_USB_F_SS_LB)	+= usb_f_ss_lb.o
 obj-$(CONFIG_USB_U_SERIAL)	+= u_serial.o
diff --git a/drivers/usb/gadget/function/f_ccid.c b/drivers/usb/gadget/function/f_ccid.c
new file mode 100644
index 000000000000..47fb229a06db
--- /dev/null
+++ b/drivers/usb/gadget/function/f_ccid.c
@@ -0,0 +1,993 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * f_ccid.c -- Chip Card Interface Device (CCID) function Driver
+ *
+ * Copyright (C) 2018 Marcus Folkesson <marcus.folkesson@gmail.com>
+ *
+ */
+#include <linux/cdev.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/usb/composite.h>
+#include <uapi/linux/usb/ccid.h>
+
+#include "f_ccid.h"
+#include "u_f.h"
+
+/* Number of tx requests to allocate */
+#define N_TX_REQS 4
+
+/* Maximum number of devices */
+#define CCID_MINORS 4
+
+struct ccidg_bulk_dev {
+	atomic_t is_open;
+	atomic_t rx_req_busy;
+	wait_queue_head_t read_wq;
+	wait_queue_head_t write_wq;
+	struct usb_request *rx_req;
+	atomic_t rx_done;
+	struct list_head tx_idle;
+};
+
+struct f_ccidg {
+	struct usb_function_instance	func_inst;
+	struct usb_function function;
+	spinlock_t lock;
+	atomic_t online;
+
+	/* Character device */
+	struct cdev cdev;
+	int minor;
+
+	/* Dynamic attributes */
+	u32 features;
+	u32 protocols;
+	u8 pinsupport;
+	u8 nslots;
+	u8 lcdlayout;
+
+	/* Endpoints */
+	struct usb_ep *in;
+	struct usb_ep *out;
+	struct ccidg_bulk_dev bulk_dev;
+};
+
+/* Interface Descriptor: */
+static struct usb_interface_descriptor ccid_interface_desc = {
+	.bLength =		USB_DT_INTERFACE_SIZE,
+	.bDescriptorType =	USB_DT_INTERFACE,
+	.bNumEndpoints =	2,
+	.bInterfaceClass =	USB_CLASS_CSCID,
+	.bInterfaceSubClass =	0,
+	.bInterfaceProtocol =	0,
+};
+
+/* CCID Class Descriptor */
+static struct ccid_class_descriptor ccid_class_desc = {
+	.bLength =		sizeof(ccid_class_desc),
+	.bDescriptorType =	CCID_DECRIPTOR_TYPE,
+	.bcdCCID =		cpu_to_le16(CCID1_10),
+	/* .bMaxSlotIndex =	DYNAMIC */
+	.bVoltageSupport =	CCID_VOLTS_3_0,
+	/* .dwProtocols =	DYNAMIC */
+	.dwDefaultClock =	cpu_to_le32(3580),
+	.dwMaximumClock =	cpu_to_le32(3580),
+	.bNumClockSupported =	0,
+	.dwDataRate =		cpu_to_le32(9600),
+	.dwMaxDataRate =	cpu_to_le32(9600),
+	.bNumDataRatesSupported = 0,
+	.dwMaxIFSD =		0,
+	.dwSynchProtocols =	0,
+	.dwMechanical =		0,
+	/* .dwFeatures =	DYNAMIC */
+
+	/* extended APDU level Message Length */
+	.dwMaxCCIDMessageLength = cpu_to_le32(0x200),
+	.bClassGetResponse =	0x0,
+	.bClassEnvelope =	0x0,
+	/* .wLcdLayout =	DYNAMIC */
+	/* .bPINSupport =	DYNAMIC */
+	.bMaxCCIDBusySlots =	1
+};
+
+/* Full speed support: */
+static struct usb_endpoint_descriptor ccid_fs_in_desc = {
+	.bLength =		USB_DT_ENDPOINT_SIZE,
+	.bDescriptorType =	USB_DT_ENDPOINT,
+	.bEndpointAddress =	USB_DIR_IN,
+	.bmAttributes =		USB_ENDPOINT_XFER_BULK,
+	.wMaxPacketSize   =	cpu_to_le16(64),
+};
+
+static struct usb_endpoint_descriptor ccid_fs_out_desc = {
+	.bLength =		USB_DT_ENDPOINT_SIZE,
+	.bDescriptorType =	USB_DT_ENDPOINT,
+	.bEndpointAddress =	USB_DIR_OUT,
+	.bmAttributes =		USB_ENDPOINT_XFER_BULK,
+	.wMaxPacketSize   =	 cpu_to_le16(64),
+};
+
+static struct usb_descriptor_header *ccid_fs_descs[] = {
+	(struct usb_descriptor_header *) &ccid_interface_desc,
+	(struct usb_descriptor_header *) &ccid_class_desc,
+	(struct usb_descriptor_header *) &ccid_fs_in_desc,
+	(struct usb_descriptor_header *) &ccid_fs_out_desc,
+	NULL,
+};
+
+/* High speed support: */
+static struct usb_endpoint_descriptor ccid_hs_in_desc = {
+	.bLength =		USB_DT_ENDPOINT_SIZE,
+	.bDescriptorType =	USB_DT_ENDPOINT,
+	.bEndpointAddress =	USB_DIR_IN,
+	.bmAttributes =		USB_ENDPOINT_XFER_BULK,
+	.wMaxPacketSize =	cpu_to_le16(512),
+};
+
+static struct usb_endpoint_descriptor ccid_hs_out_desc = {
+	.bLength =		USB_DT_ENDPOINT_SIZE,
+	.bDescriptorType =	USB_DT_ENDPOINT,
+	.bEndpointAddress =	USB_DIR_OUT,
+	.bmAttributes =		USB_ENDPOINT_XFER_BULK,
+	.wMaxPacketSize =	cpu_to_le16(512),
+};
+
+static struct usb_descriptor_header *ccid_hs_descs[] = {
+	(struct usb_descriptor_header *) &ccid_interface_desc,
+	(struct usb_descriptor_header *) &ccid_class_desc,
+	(struct usb_descriptor_header *) &ccid_hs_in_desc,
+	(struct usb_descriptor_header *) &ccid_hs_out_desc,
+	NULL,
+};
+
+static DEFINE_IDA(ccidg_ida);
+static int major;
+static DEFINE_MUTEX(ccidg_ida_lock); /* protects access to ccidg_ida */
+static struct class *ccidg_class;
+
+static inline struct f_ccidg_opts *to_f_ccidg_opts(struct config_item *item)
+{
+	return container_of(to_config_group(item), struct f_ccidg_opts,
+			    func_inst.group);
+}
+
+static inline struct f_ccidg *func_to_ccidg(struct usb_function *f)
+{
+	return container_of(f, struct f_ccidg, function);
+}
+
+static inline int ccidg_get_minor(void)
+{
+	int ret;
+
+	ret = ida_simple_get(&ccidg_ida, 0, 0, GFP_KERNEL);
+	if (ret >= CCID_MINORS) {
+		ida_simple_remove(&ccidg_ida, ret);
+		ret = -ENODEV;
+	}
+
+	return ret;
+}
+
+static inline void ccidg_put_minor(int minor)
+{
+	ida_simple_remove(&ccidg_ida, minor);
+}
+
+static int ccidg_setup(void)
+{
+	int ret;
+	dev_t dev;
+
+	ccidg_class = class_create(THIS_MODULE, "ccidg");
+	if (IS_ERR(ccidg_class)) {
+		ret = PTR_ERR(ccidg_class);
+		ccidg_class = NULL;
+		return ret;
+	}
+
+	ret = alloc_chrdev_region(&dev, 0, CCID_MINORS, "ccidg");
+	if (ret) {
+		class_destroy(ccidg_class);
+		ccidg_class = NULL;
+		return ret;
+	}
+
+	major = MAJOR(dev);
+
+	return 0;
+}
+
+static void ccidg_cleanup(void)
+{
+	if (major) {
+		unregister_chrdev_region(MKDEV(major, 0), CCID_MINORS);
+		major = 0;
+	}
+
+	class_destroy(ccidg_class);
+	ccidg_class = NULL;
+}
+
+static void ccidg_attr_release(struct config_item *item)
+{
+	struct f_ccidg_opts *opts = to_f_ccidg_opts(item);
+
+	usb_put_function_instance(&opts->func_inst);
+}
+
+static struct configfs_item_operations ccidg_item_ops = {
+	.release	= ccidg_attr_release,
+};
+
+#define F_CCIDG_OPT(name, prec, limit)					\
+static ssize_t f_ccidg_opts_##name##_show(struct config_item *item, char *page)\
+{									\
+	struct f_ccidg_opts *opts = to_f_ccidg_opts(item);		\
+	int result;							\
+									\
+	mutex_lock(&opts->lock);					\
+	result = sprintf(page, "%x\n", opts->name);			\
+	mutex_unlock(&opts->lock);					\
+									\
+	return result;							\
+}									\
+									\
+static ssize_t f_ccidg_opts_##name##_store(struct config_item *item,	\
+					 const char *page, size_t len)	\
+{									\
+	struct f_ccidg_opts *opts = to_f_ccidg_opts(item);		\
+	int ret;							\
+	u##prec num;							\
+									\
+	mutex_lock(&opts->lock);					\
+	if (opts->refcnt) {						\
+		ret = -EBUSY;						\
+		goto end;						\
+	}								\
+									\
+	ret = kstrtou##prec(page, 0, &num);				\
+	if (ret)							\
+		goto end;						\
+									\
+	if (num > limit) {						\
+		ret = -EINVAL;						\
+		goto end;						\
+	}								\
+	opts->name = num;						\
+	ret = len;							\
+									\
+end:									\
+	mutex_unlock(&opts->lock);					\
+	return ret;							\
+}									\
+									\
+CONFIGFS_ATTR(f_ccidg_opts_, name)
+
+F_CCIDG_OPT(features, 32, 0xffffffff);
+F_CCIDG_OPT(protocols, 32, 0x03);
+F_CCIDG_OPT(pinsupport, 8, 0x03);
+F_CCIDG_OPT(lcdlayout, 16, 0xffff);
+F_CCIDG_OPT(nslots, 8, 0xff);
+
+static struct configfs_attribute *ccidg_attrs[] = {
+	&f_ccidg_opts_attr_features,
+	&f_ccidg_opts_attr_protocols,
+	&f_ccidg_opts_attr_pinsupport,
+	&f_ccidg_opts_attr_lcdlayout,
+	&f_ccidg_opts_attr_nslots,
+	NULL,
+};
+
+static struct config_item_type ccidg_func_type = {
+	.ct_item_ops	= &ccidg_item_ops,
+	.ct_attrs	= ccidg_attrs,
+	.ct_owner	= THIS_MODULE,
+};
+
+static void ccidg_req_put(struct f_ccidg *ccidg, struct list_head *head,
+		struct usb_request *req)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ccidg->lock, flags);
+	list_add_tail(&req->list, head);
+	spin_unlock_irqrestore(&ccidg->lock, flags);
+}
+
+static struct usb_request *ccidg_req_get(struct f_ccidg *ccidg,
+					struct list_head *head)
+{
+	unsigned long flags;
+	struct usb_request *req = NULL;
+
+	spin_lock_irqsave(&ccidg->lock, flags);
+	if (!list_empty(head)) {
+		req = list_first_entry(head, struct usb_request, list);
+		list_del(&req->list);
+	}
+	spin_unlock_irqrestore(&ccidg->lock, flags);
+
+	return req;
+}
+
+static void ccidg_bulk_complete_tx(struct usb_ep *ep, struct usb_request *req)
+{
+	struct f_ccidg *ccidg = (struct f_ccidg *)ep->driver_data;
+	struct ccidg_bulk_dev *bulk_dev = &ccidg->bulk_dev;
+	struct usb_composite_dev *cdev	= ccidg->function.config->cdev;
+
+	switch (req->status) {
+	default:
+		VDBG(cdev, "ccid: tx err %d\n", req->status);
+		/* FALLTHROUGH */
+	case -ECONNRESET:		/* unlink */
+	case -ESHUTDOWN:		/* disconnect etc */
+		break;
+	case 0:
+		break;
+	}
+
+	ccidg_req_put(ccidg, &bulk_dev->tx_idle, req);
+	wake_up(&bulk_dev->write_wq);
+}
+
+static void ccidg_bulk_complete_rx(struct usb_ep *ep, struct usb_request *req)
+{
+	struct f_ccidg *ccidg = (struct f_ccidg *)ep->driver_data;
+	struct ccidg_bulk_dev *bulk_dev = &ccidg->bulk_dev;
+	struct usb_composite_dev *cdev	= ccidg->function.config->cdev;
+
+	switch (req->status) {
+
+	/* normal completion */
+	case 0:
+		/* We only care about packets with nonzero length */
+		if (req->actual > 0)
+			atomic_set(&bulk_dev->rx_done, 1);
+		break;
+
+	/* software-driven interface shutdown */
+	case -ECONNRESET:		/* unlink */
+	case -ESHUTDOWN:		/* disconnect etc */
+		VDBG(cdev, "ccid: rx shutdown, code %d\n", req->status);
+		break;
+
+	/* for hardware automagic (such as pxa) */
+	case -ECONNABORTED:		/* endpoint reset */
+		DBG(cdev, "ccid: rx %s reset\n", ep->name);
+		break;
+
+	/* data overrun */
+	case -EOVERFLOW:
+		/* FALLTHROUGH */
+	default:
+		DBG(cdev, "ccid: rx status %d\n", req->status);
+		break;
+	}
+
+	wake_up(&bulk_dev->read_wq);
+}
+
+static struct usb_request *
+ccidg_request_alloc(struct usb_ep *ep, unsigned int len)
+{
+	struct usb_request *req;
+
+	req = usb_ep_alloc_request(ep, GFP_ATOMIC);
+	if (!req)
+		return ERR_PTR(-ENOMEM);
+
+	req->length = len;
+	req->buf = kmalloc(len, GFP_ATOMIC);
+	if (req->buf == NULL) {
+		usb_ep_free_request(ep, req);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return req;
+}
+
+static void ccidg_request_free(struct usb_request *req, struct usb_ep *ep)
+{
+	if (req) {
+		kfree(req->buf);
+		usb_ep_free_request(ep, req);
+	}
+}
+
+static int ccidg_function_setup(struct usb_function *f,
+		const struct usb_ctrlrequest *ctrl)
+{
+	struct f_ccidg *ccidg = container_of(f, struct f_ccidg, function);
+	struct usb_composite_dev *cdev	= f->config->cdev;
+	struct usb_request *req		= cdev->req;
+	int ret				= -EOPNOTSUPP;
+	u16 w_index			= le16_to_cpu(ctrl->wIndex);
+	u16 w_value			= le16_to_cpu(ctrl->wValue);
+	u16 w_length			= le16_to_cpu(ctrl->wLength);
+
+	if (!atomic_read(&ccidg->online))
+		return -ENOTCONN;
+
+	switch (ctrl->bRequestType & USB_TYPE_MASK) {
+	case USB_TYPE_CLASS:
+		{
+		switch (ctrl->bRequest) {
+		case CCIDGENERICREQ_GET_CLOCK_FREQUENCIES:
+			if (w_length > sizeof(ccid_class_desc.dwDefaultClock))
+				break;
+
+			*(__le32 *) req->buf = ccid_class_desc.dwDefaultClock;
+			ret = sizeof(ccid_class_desc.dwDefaultClock);
+			break;
+
+		case CCIDGENERICREQ_GET_DATA_RATES:
+			if (w_length > sizeof(ccid_class_desc.dwDataRate))
+				break;
+
+			*(__le32 *) req->buf = ccid_class_desc.dwDataRate;
+			ret = sizeof(ccid_class_desc.dwDataRate);
+			break;
+
+		default:
+			VDBG(f->config->cdev,
+				"ccid: invalid control req%02x.%02x v%04x i%04x l%d\n",
+				ctrl->bRequestType, ctrl->bRequest,
+				w_value, w_index, w_length);
+		}
+		}
+	}
+
+	/* responded with data transfer or status phase? */
+	if (ret >= 0) {
+		VDBG(f->config->cdev, "ccid: req%02x.%02x v%04x i%04x l%d\n",
+			ctrl->bRequestType, ctrl->bRequest,
+			w_value, w_index, w_length);
+
+		req->length = ret;
+		ret = usb_ep_queue(cdev->gadget->ep0, req, GFP_ATOMIC);
+		if (ret < 0)
+			ERROR(f->config->cdev,
+				"ccid: ep0 enqueue err %d\n", ret);
+	}
+
+	return ret;
+}
+
+static void ccidg_function_disable(struct usb_function *f)
+{
+	struct f_ccidg *ccidg = func_to_ccidg(f);
+	struct ccidg_bulk_dev *bulk_dev = &ccidg->bulk_dev;
+	struct usb_request *req;
+
+	/* Disable endpoints */
+	usb_ep_disable(ccidg->in);
+	usb_ep_disable(ccidg->out);
+
+	/* Free endpoint related requests */
+	if (!atomic_read(&bulk_dev->rx_req_busy))
+		ccidg_request_free(bulk_dev->rx_req, ccidg->out);
+	while ((req = ccidg_req_get(ccidg, &bulk_dev->tx_idle)))
+		ccidg_request_free(req, ccidg->in);
+
+	atomic_set(&ccidg->online, 0);
+
+	/* Wake up threads */
+	wake_up(&bulk_dev->write_wq);
+	wake_up(&bulk_dev->read_wq);
+}
+
+static int ccidg_start_ep(struct f_ccidg *ccidg, struct usb_function *f,
+			struct usb_ep *ep)
+{
+	struct usb_composite_dev *cdev = f->config->cdev;
+	int ret;
+
+	usb_ep_disable(ep);
+
+	ret = config_ep_by_speed(cdev->gadget, f, ep);
+	if (ret) {
+		ERROR(cdev, "ccid: can't configure %s: %d\n", ep->name, ret);
+		return ret;
+	}
+
+	ret = usb_ep_enable(ep);
+	if (ret) {
+		ERROR(cdev, "ccid: can't start %s: %d\n", ep->name, ret);
+		return ret;
+	}
+
+	ep->driver_data = ccidg;
+
+	return ret;
+}
+
+static int ccidg_function_set_alt(struct usb_function *f,
+		unsigned int intf, unsigned int alt)
+{
+	struct f_ccidg *ccidg		= func_to_ccidg(f);
+	struct usb_composite_dev *cdev	= f->config->cdev;
+	struct ccidg_bulk_dev *bulk_dev	= &ccidg->bulk_dev;
+	struct usb_request *req;
+	int ret;
+	int i;
+
+	/* Allocate requests for our endpoints */
+	req = ccidg_request_alloc(ccidg->out,
+			sizeof(struct ccidg_bulk_out_header));
+	if (IS_ERR(req)) {
+		ERROR(cdev, "ccid: unable to allocate memory for out req\n");
+		return PTR_ERR(req);
+	}
+	req->complete = ccidg_bulk_complete_rx;
+	req->context = ccidg;
+	bulk_dev->rx_req = req;
+
+	/* Allocate bunch of in requests */
+	for (i = 0; i < N_TX_REQS; i++) {
+		req = ccidg_request_alloc(ccidg->in,
+				sizeof(struct ccidg_bulk_in_header));
+
+		if (IS_ERR(req)) {
+			ret = PTR_ERR(req);
+			ERROR(cdev,
+				"ccid: unable to allocate memory for in req\n");
+			goto free_bulk_out;
+		}
+		req->complete = ccidg_bulk_complete_tx;
+		req->context = ccidg;
+		ccidg_req_put(ccidg, &bulk_dev->tx_idle, req);
+	}
+
+	/* choose the descriptors and enable endpoints */
+	ret = ccidg_start_ep(ccidg, f, ccidg->in);
+	if (ret)
+		goto free_bulk_in;
+
+	ret = ccidg_start_ep(ccidg, f, ccidg->out);
+	if (ret)
+		goto disable_ep_in;
+
+	atomic_set(&ccidg->online, 1);
+	return ret;
+
+disable_ep_in:
+	usb_ep_disable(ccidg->in);
+free_bulk_in:
+	while ((req = ccidg_req_get(ccidg, &bulk_dev->tx_idle)))
+		ccidg_request_free(req, ccidg->in);
+free_bulk_out:
+	ccidg_request_free(bulk_dev->rx_req, ccidg->out);
+	return ret;
+}
+
+static int ccidg_bulk_open(struct inode *inode, struct file *file)
+{
+	struct f_ccidg *ccidg;
+	struct ccidg_bulk_dev *bulk_dev;
+
+	ccidg = container_of(inode->i_cdev, struct f_ccidg, cdev);
+	bulk_dev = &ccidg->bulk_dev;
+
+	if (!atomic_read(&ccidg->online)) {
+		DBG(ccidg->function.config->cdev, "ccid: device not online\n");
+		return -ENODEV;
+	}
+
+	if (atomic_read(&bulk_dev->is_open)) {
+		DBG(ccidg->function.config->cdev,
+				"ccid: device already opened\n");
+		return -EBUSY;
+	}
+
+	atomic_set(&bulk_dev->is_open, 1);
+
+	file->private_data = ccidg;
+
+	return 0;
+}
+
+static int ccidg_bulk_release(struct inode *inode, struct file *file)
+{
+	struct f_ccidg *ccidg =  file->private_data;
+	struct ccidg_bulk_dev *bulk_dev = &ccidg->bulk_dev;
+
+	atomic_set(&bulk_dev->is_open, 0);
+	return 0;
+}
+
+static ssize_t ccidg_bulk_read(struct file *file, char __user *buf,
+				size_t count, loff_t *pos)
+{
+	struct f_ccidg *ccidg =  file->private_data;
+	struct ccidg_bulk_dev *bulk_dev = &ccidg->bulk_dev;
+	struct usb_request *req;
+	int r = count, xfer;
+	int ret;
+
+	/* Make sure we have enough space for a whole package */
+	if (count < sizeof(struct ccidg_bulk_out_header)) {
+		DBG(ccidg->function.config->cdev,
+				"ccid: too small buffer size. %zu provided, need at least %zu\n",
+				count, sizeof(struct ccidg_bulk_out_header));
+		return -ENOMEM;
+	}
+
+	if (!atomic_read(&ccidg->online))
+		return -ENODEV;
+
+	/* queue a request */
+	req = bulk_dev->rx_req;
+	req->length = count;
+	atomic_set(&bulk_dev->rx_done, 0);
+
+	ret = usb_ep_queue(ccidg->out, req, GFP_KERNEL);
+	if (ret < 0) {
+		ERROR(ccidg->function.config->cdev,
+				"ccid: usb ep queue failed\n");
+		return -EIO;
+	}
+
+	if (!atomic_read(&bulk_dev->rx_done) &&
+			file->f_flags & (O_NONBLOCK | O_NDELAY))
+		return -EAGAIN;
+
+	/* wait for a request to complete */
+	ret = wait_event_interruptible(bulk_dev->read_wq,
+			atomic_read(&bulk_dev->rx_done) ||
+			!atomic_read(&ccidg->online));
+	if (ret < 0) {
+		usb_ep_dequeue(ccidg->out, req);
+		return -ERESTARTSYS;
+	}
+
+	/* Still online? */
+	if (!atomic_read(&ccidg->online))
+		return -ENODEV;
+
+	atomic_set(&bulk_dev->rx_req_busy, 1);
+	xfer = (req->actual < count) ? req->actual : count;
+
+	/* Report a failed copy instead of silently returning the byte count */
+	r = copy_to_user(buf, req->buf, xfer) ? -EFAULT : xfer;
+
+	atomic_set(&bulk_dev->rx_req_busy, 0);
+	if (!atomic_read(&ccidg->online)) {
+		ccidg_request_free(bulk_dev->rx_req, ccidg->out);
+		return -ENODEV;
+	}
+
+	return r;
+}
+
+static ssize_t ccidg_bulk_write(struct file *file, const char __user *buf,
+				 size_t count, loff_t *pos)
+{
+	struct f_ccidg *ccidg =  file->private_data;
+	struct ccidg_bulk_dev *bulk_dev = &ccidg->bulk_dev;
+	struct usb_request *req = NULL;
+	int ret;
+
+	/* Are we online? */
+	if (!atomic_read(&ccidg->online))
+		return -ENODEV;
+
+	/* Avoid Zero Length Packets (ZLP) */
+	if (!count)
+		return 0;
+
+	/* Make sure the data fits in the tx request buffer */
+	if (count > sizeof(struct ccidg_bulk_in_header)) {
+		DBG(ccidg->function.config->cdev,
+				"ccid: too much data. %zu provided, but we can only handle %zu\n",
+				count, sizeof(struct ccidg_bulk_in_header));
+		return -ENOMEM;
+	}
+
+	if (list_empty(&bulk_dev->tx_idle) &&
+			file->f_flags & (O_NONBLOCK | O_NDELAY))
+		return -EAGAIN;
+
+	/* get an idle tx request to use */
+	ret = wait_event_interruptible(bulk_dev->write_wq,
+		((req = ccidg_req_get(ccidg, &bulk_dev->tx_idle))));
+
+	if (ret < 0)
+		return -ERESTARTSYS;
+
+	if (copy_from_user(req->buf, buf, count)) {
+		if (!atomic_read(&ccidg->online)) {
+			ccidg_request_free(req, ccidg->in);
+			return -ENODEV;
+		} else {
+			ccidg_req_put(ccidg, &bulk_dev->tx_idle, req);
+			return -EFAULT;
+		}
+	}
+
+	req->length = count;
+	ret = usb_ep_queue(ccidg->in, req, GFP_KERNEL);
+	if (ret < 0) {
+		ccidg_req_put(ccidg, &bulk_dev->tx_idle, req);
+
+		if (!atomic_read(&ccidg->online)) {
+			/* Free up all requests if we are not online */
+			while ((req = ccidg_req_get(ccidg, &bulk_dev->tx_idle)))
+				ccidg_request_free(req, ccidg->in);
+
+			return -ENODEV;
+		}
+		return -EIO;
+	}
+
+	return count;
+}
+
+static __poll_t ccidg_bulk_poll(struct file *file, poll_table * wait)
+{
+	struct f_ccidg *ccidg =  file->private_data;
+	struct ccidg_bulk_dev *bulk_dev = &ccidg->bulk_dev;
+	__poll_t	ret = 0;
+
+	poll_wait(file, &bulk_dev->read_wq, wait);
+	poll_wait(file, &bulk_dev->write_wq, wait);
+
+	if (!list_empty(&bulk_dev->tx_idle))
+		ret |= EPOLLOUT | EPOLLWRNORM;
+
+	if (atomic_read(&bulk_dev->rx_done))
+		ret |= EPOLLIN | EPOLLRDNORM;
+
+	return ret;
+}
+
+static const struct file_operations f_ccidg_fops = {
+	.owner = THIS_MODULE,
+	.read = ccidg_bulk_read,
+	.write = ccidg_bulk_write,
+	.open = ccidg_bulk_open,
+	.poll = ccidg_bulk_poll,
+	.release = ccidg_bulk_release,
+};
+
+static int ccidg_bulk_device_init(struct f_ccidg *dev)
+{
+	struct ccidg_bulk_dev *bulk_dev = &dev->bulk_dev;
+
+	init_waitqueue_head(&bulk_dev->read_wq);
+	init_waitqueue_head(&bulk_dev->write_wq);
+	INIT_LIST_HEAD(&bulk_dev->tx_idle);
+
+	return 0;
+}
+
+static void ccidg_function_free(struct usb_function *f)
+{
+	struct f_ccidg *ccidg;
+	struct f_ccidg_opts *opts;
+
+	ccidg = func_to_ccidg(f);
+	opts = container_of(f->fi, struct f_ccidg_opts, func_inst);
+
+	kfree(ccidg);
+	mutex_lock(&opts->lock);
+	--opts->refcnt;
+	mutex_unlock(&opts->lock);
+}
+
+static void ccidg_function_unbind(struct usb_configuration *c,
+					struct usb_function *f)
+{
+	struct f_ccidg *ccidg = func_to_ccidg(f);
+
+	device_destroy(ccidg_class, MKDEV(major, ccidg->minor));
+	cdev_del(&ccidg->cdev);
+
+	/* disable/free request and end point */
+	usb_free_all_descriptors(f);
+}
+
+static int ccidg_function_bind(struct usb_configuration *c,
+					struct usb_function *f)
+{
+	struct f_ccidg *ccidg = func_to_ccidg(f);
+	struct usb_ep *ep;
+	struct usb_composite_dev *cdev = c->cdev;
+	struct device *device;
+	dev_t dev;
+	int ifc_id;
+	int ret;
+
+	/* allocate instance-specific interface IDs, and patch descriptors */
+	ifc_id = usb_interface_id(c, f);
+	if (ifc_id < 0) {
+		ERROR(cdev, "ccid: unable to allocate ifc id, err:%d\n",
+				ifc_id);
+		return ifc_id;
+	}
+	ccid_interface_desc.bInterfaceNumber = ifc_id;
+
+	/* allocate instance-specific endpoints */
+	ep = usb_ep_autoconfig(cdev->gadget, &ccid_fs_in_desc);
+	if (!ep) {
+		ERROR(cdev, "ccid: usb epin autoconfig failed\n");
+		ret = -ENODEV;
+		goto ep_auto_in_fail;
+	}
+	ccidg->in = ep;
+	ep->driver_data = ccidg;
+
+	ep = usb_ep_autoconfig(cdev->gadget, &ccid_fs_out_desc);
+	if (!ep) {
+		ERROR(cdev, "ccid: usb epout autoconfig failed\n");
+		ret = -ENODEV;
+		goto ep_auto_out_fail;
+	}
+	ccidg->out = ep;
+	ep->driver_data = ccidg;
+
+	/* fall back to both protocols before filling in the descriptor */
+	if (ccidg->protocols == CCID_PROTOCOL_NOT_SEL) {
+		ccidg->protocols = CCID_PROTOCOL_T0 | CCID_PROTOCOL_T1;
+		INFO(ccidg->function.config->cdev,
+			"ccid: No protocol selected. Support both T0 and T1.\n");
+	}
+
+	/* set descriptor dynamic values */
+	ccid_class_desc.dwFeatures	= cpu_to_le32(ccidg->features);
+	ccid_class_desc.bPINSupport	= ccidg->pinsupport;
+	ccid_class_desc.wLcdLayout	= cpu_to_le16(ccidg->lcdlayout);
+	ccid_class_desc.bMaxSlotIndex	= ccidg->nslots;
+	ccid_class_desc.dwProtocols	= cpu_to_le32(ccidg->protocols);
+
+	ccid_hs_in_desc.bEndpointAddress =
+			ccid_fs_in_desc.bEndpointAddress;
+	ccid_hs_out_desc.bEndpointAddress =
+			ccid_fs_out_desc.bEndpointAddress;
+
+	ret  = usb_assign_descriptors(f, ccid_fs_descs,
+			ccid_hs_descs, NULL, NULL);
+	if (ret)
+		goto ep_auto_out_fail;
+
+	/* create char device */
+	cdev_init(&ccidg->cdev, &f_ccidg_fops);
+	dev = MKDEV(major, ccidg->minor);
+	ret = cdev_add(&ccidg->cdev, dev, 1);
+	if (ret)
+		goto fail_free_descs;
+
+	device = device_create(ccidg_class, NULL, dev, NULL,
+			       "%s%d", "ccidg", ccidg->minor);
+	if (IS_ERR(device)) {
+		ret = PTR_ERR(device);
+		goto del;
+	}
+
+	return 0;
+
+del:
+	cdev_del(&ccidg->cdev);
+fail_free_descs:
+	usb_free_all_descriptors(f);
+ep_auto_out_fail:
+	if (ccidg->out)
+		ccidg->out->driver_data = NULL;
+	ccidg->out = NULL;
+ep_auto_in_fail:
+	if (ccidg->in)
+		ccidg->in->driver_data = NULL;
+	ccidg->in = NULL;
+	ERROR(f->config->cdev, "ccidg_bind FAILED\n");
+
+	return ret;
+}
+
+static struct usb_function *ccidg_alloc(struct usb_function_instance *fi)
+{
+	struct f_ccidg *ccidg;
+	struct f_ccidg_opts *opts;
+	int ret;
+
+	ccidg = kzalloc(sizeof(*ccidg), GFP_KERNEL);
+	if (!ccidg)
+		return ERR_PTR(-ENOMEM);
+
+	spin_lock_init(&ccidg->lock);
+
+	ret = ccidg_bulk_device_init(ccidg);
+	if (ret) {
+		kfree(ccidg);
+		return ERR_PTR(ret);
+	}
+
+	opts = container_of(fi, struct f_ccidg_opts, func_inst);
+
+	mutex_lock(&opts->lock);
+	++opts->refcnt;
+
+	ccidg->minor = opts->minor;
+	ccidg->features = opts->features;
+	ccidg->protocols = opts->protocols;
+	ccidg->pinsupport = opts->pinsupport;
+	ccidg->nslots = opts->nslots;
+	mutex_unlock(&opts->lock);
+
+	ccidg->function.name	= "ccid";
+	ccidg->function.bind	= ccidg_function_bind;
+	ccidg->function.unbind	= ccidg_function_unbind;
+	ccidg->function.set_alt	= ccidg_function_set_alt;
+	ccidg->function.disable	= ccidg_function_disable;
+	ccidg->function.setup	= ccidg_function_setup;
+	ccidg->function.free_func = ccidg_function_free;
+
+	return &ccidg->function;
+}
+
+static void ccidg_free_inst(struct usb_function_instance *f)
+{
+	struct f_ccidg_opts *opts;
+
+	opts = container_of(f, struct f_ccidg_opts, func_inst);
+	mutex_lock(&ccidg_ida_lock);
+
+	ccidg_put_minor(opts->minor);
+	if (ida_is_empty(&ccidg_ida))
+		ccidg_cleanup();
+
+	mutex_unlock(&ccidg_ida_lock);
+
+	kfree(opts);
+}
+
+static struct usb_function_instance *ccidg_alloc_inst(void)
+{
+	struct f_ccidg_opts *opts;
+	struct usb_function_instance *ret;
+	int status = 0;
+
+	opts = kzalloc(sizeof(*opts), GFP_KERNEL);
+	if (!opts)
+		return ERR_PTR(-ENOMEM);
+
+	mutex_init(&opts->lock);
+	opts->func_inst.free_func_inst = ccidg_free_inst;
+	ret = &opts->func_inst;
+
+	mutex_lock(&ccidg_ida_lock);
+
+	if (ida_is_empty(&ccidg_ida)) {
+		status = ccidg_setup();
+		if (status) {
+			ret = ERR_PTR(status);
+			kfree(opts);
+			goto unlock;
+		}
+	}
+
+	opts->minor = ccidg_get_minor();
+	if (opts->minor < 0) {
+		ret = ERR_PTR(opts->minor);
+		kfree(opts);
+		if (ida_is_empty(&ccidg_ida))
+			ccidg_cleanup();
+		goto unlock;
+	}
+
+	config_group_init_type_name(&opts->func_inst.group,
+			"", &ccidg_func_type);
+
+unlock:
+	mutex_unlock(&ccidg_ida_lock);
+	return ret;
+}
+
+DECLARE_USB_FUNCTION_INIT(ccid, ccidg_alloc_inst, ccidg_alloc);
+
+MODULE_DESCRIPTION("USB CCID Gadget driver");
+MODULE_AUTHOR("Marcus Folkesson <marcus.folkesson@gmail.com>");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/usb/gadget/function/f_ccid.h b/drivers/usb/gadget/function/f_ccid.h
new file mode 100644
index 000000000000..f1053ec5c4d9
--- /dev/null
+++ b/drivers/usb/gadget/function/f_ccid.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2018 Marcus Folkesson <marcus.folkesson@gmail.com>
+ */
+
+#ifndef F_CCID_H
+#define F_CCID_H
+
+#define CCID1_10                0x0110
+#define CCID_DECRIPTOR_TYPE     0x21
+#define ABDATA_SIZE		512
+#define SMART_CARD_DEVICE_CLASS	0x0B
+
+/* CCID Class Specific Request */
+#define CCIDGENERICREQ_ABORT                    0x01
+#define CCIDGENERICREQ_GET_CLOCK_FREQUENCIES    0x02
+#define CCIDGENERICREQ_GET_DATA_RATES           0x03
+
+/* Supported voltages */
+#define CCID_VOLTS_AUTO                             0x00
+#define CCID_VOLTS_5_0                              0x01
+#define CCID_VOLTS_3_0                              0x02
+#define CCID_VOLTS_1_8                              0x03
+
+struct f_ccidg_opts {
+	struct usb_function_instance func_inst;
+	int	minor;
+	__u32	features;
+	__u32	protocols;
+	__u8	pinsupport;
+	__u8	nslots;
+	__u8	lcdlayout;
+
+	/*
+	 * Protect the data from concurrent access by read/write
+	 * and create symlink/remove symlink.
+	 */
+	struct mutex	lock;
+	int		refcnt;
+};
+
+struct ccidg_bulk_in_header {
+	__u8	bMessageType;
+	__le32	wLength;
+	__u8	bSlot;
+	__u8	bSeq;
+	__u8	bStatus;
+	__u8	bError;
+	__u8	bSpecific;
+	__u8	abData[ABDATA_SIZE];
+	__u8	bSizeToSend;
+} __packed;
+
+struct ccidg_bulk_out_header {
+	__u8	 bMessageType;
+	__le32	 wLength;
+	__u8	 bSlot;
+	__u8	 bSeq;
+	__u8	 bSpecific_0;
+	__u8	 bSpecific_1;
+	__u8	 bSpecific_2;
+	__u8	 APDU[ABDATA_SIZE];
+} __packed;
+
+struct ccid_class_descriptor {
+	__u8	bLength;
+	__u8	bDescriptorType;
+	__le16	bcdCCID;
+	__u8	bMaxSlotIndex;
+	__u8	bVoltageSupport;
+	__le32	dwProtocols;
+	__le32	dwDefaultClock;
+	__le32	dwMaximumClock;
+	__u8	bNumClockSupported;
+	__le32	dwDataRate;
+	__le32	dwMaxDataRate;
+	__u8	bNumDataRatesSupported;
+	__le32	dwMaxIFSD;
+	__le32	dwSynchProtocols;
+	__le32	dwMechanical;
+	__le32	dwFeatures;
+	__le32	dwMaxCCIDMessageLength;
+	__u8	bClassGetResponse;
+	__u8	bClassEnvelope;
+	__le16	wLcdLayout;
+	__u8	bPINSupport;
+	__u8	bMaxCCIDBusySlots;
+} __packed;
+
+
+#endif
diff --git a/include/uapi/linux/usb/ccid.h b/include/uapi/linux/usb/ccid.h
new file mode 100644
index 000000000000..517897201563
--- /dev/null
+++ b/include/uapi/linux/usb/ccid.h
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2018 Marcus Folkesson <marcus.folkesson@gmail.com>
+ *
+ * This file holds USB constants defined by the CCID Specification.
+ */
+
+#ifndef CCID_H
+#define CCID_H
+
+/* Slot error register when bmCommandStatus = 1 */
+#define CCID_CMD_ABORTED                            0xFF
+#define CCID_ICC_MUTE                               0xFE
+#define CCID_XFR_PARITY_ERROR                       0xFD
+#define CCID_XFR_OVERRUN                            0xFC
+#define CCID_HW_ERROR                               0xFB
+#define CCID_BAD_ATR_TS                             0xF8
+#define CCID_BAD_ATR_TCK                            0xF7
+#define CCID_ICC_PROTOCOL_NOT_SUPPORTED             0xF6
+#define CCID_ICC_CLASS_NOT_SUPPORTED                0xF5
+#define CCID_PROCEDURE_BYTE_CONFLICT                0xF4
+#define CCID_DEACTIVATED_PROTOCOL                   0xF3
+#define CCID_BUSY_WITH_AUTO_SEQUENCE                0xF2
+#define CCID_PIN_TIMEOUT                            0xF0
+#define CCID_PIN_CANCELLED                          0xEF
+#define CCID_CMD_SLOT_BUSY                          0xE0
+
+/* PC to RDR messages (bulk out) */
+#define CCID_PC_TO_RDR_ICCPOWERON                   0x62
+#define CCID_PC_TO_RDR_ICCPOWEROFF                  0x63
+#define CCID_PC_TO_RDR_GETSLOTSTATUS                0x65
+#define CCID_PC_TO_RDR_XFRBLOCK                     0x6F
+#define CCID_PC_TO_RDR_GETPARAMETERS                0x6C
+#define CCID_PC_TO_RDR_RESETPARAMETERS              0x6D
+#define CCID_PC_TO_RDR_SETPARAMETERS                0x61
+#define CCID_PC_TO_RDR_ESCAPE                       0x6B
+#define CCID_PC_TO_RDR_ICCCLOCK                     0x6E
+#define CCID_PC_TO_RDR_T0APDU                       0x6A
+#define CCID_PC_TO_RDR_SECURE                       0x69
+#define CCID_PC_TO_RDR_MECHANICAL                   0x71
+#define CCID_PC_TO_RDR_ABORT                        0x72
+#define CCID_PC_TO_RDR_SETDATARATEANDCLOCKFREQUENCY 0x73
+
+/* RDR to PC messages (bulk in) */
+#define CCID_RDR_TO_PC_DATABLOCK                    0x80
+#define CCID_RDR_TO_PC_SLOTSTATUS                   0x81
+#define CCID_RDR_TO_PC_PARAMETERS                   0x82
+#define CCID_RDR_TO_PC_ESCAPE                       0x83
+#define CCID_RDR_TO_PC_DATARATEANDCLOCKFREQUENCY    0x84
+
+/* Class Features */
+
+/* No special characteristics */
+#define CCID_FEATURES_NADA       0x00000000
+/* Automatic parameter configuration based on ATR data */
+#define CCID_FEATURES_AUTO_PCONF 0x00000002
+/* Automatic activation of ICC on inserting */
+#define CCID_FEATURES_AUTO_ACTIV 0x00000004
+/* Automatic ICC voltage selection */
+#define CCID_FEATURES_AUTO_VOLT  0x00000008
+/* Automatic ICC clock frequency change */
+#define CCID_FEATURES_AUTO_CLOCK 0x00000010
+/* Automatic baud rate change */
+#define CCID_FEATURES_AUTO_BAUD  0x00000020
+/* Automatic parameters negotiation made by the CCID */
+#define CCID_FEATURES_AUTO_PNEGO 0x00000040
+/* Automatic PPS made by the CCID according to the active parameters */
+#define CCID_FEATURES_AUTO_PPS   0x00000080
+/* CCID can set ICC in clock stop mode */
+#define CCID_FEATURES_ICCSTOP    0x00000100
+/* NAD value other than 00 accepted (T=1 protocol in use) */
+#define CCID_FEATURES_NAD        0x00000200
+/* Automatic IFSD exchange as first exchange (T=1 protocol in use) */
+#define CCID_FEATURES_AUTO_IFSD  0x00000400
+/* TPDU level exchanges with CCID */
+#define CCID_FEATURES_EXC_TPDU   0x00010000
+/* Short APDU level exchange with CCID */
+#define CCID_FEATURES_EXC_SAPDU  0x00020000
+/* Short and Extended APDU level exchange with CCID */
+#define CCID_FEATURES_EXC_APDU   0x00040000
+/* USB Wake up signaling supported on card insertion and removal */
+#define CCID_FEATURES_WAKEUP	0x00100000
+
+/* Supported protocols */
+#define CCID_PROTOCOL_NOT_SEL	0x00
+#define CCID_PROTOCOL_T0	0x01
+#define CCID_PROTOCOL_T1	0x02
+
+#define CCID_PINSUPOORT_NONE		0x00
+#define CCID_PINSUPOORT_VERIFICATION	(1 << 0)
+#define CCID_PINSUPOORT_MODIFICATION	(1 << 1)
+
+#endif
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v3 2/3] Documentation: usb: add documentation for USB CCID Gadget Device
From: Marcus Folkesson @ 2018-05-29 18:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jonathan Corbet, Felipe Balbi, davem,
	Mauro Carvalho Chehab, Andrew Morton, Randy Dunlap,
	Ruslan Bilovol, Thomas Gleixner, Kate Stewart
  Cc: linux-usb, linux-doc, linux-kernel, Marcus Folkesson
In-Reply-To: <20180529185021.13738-1-marcus.folkesson@gmail.com>

Add documentation giving a brief description of how to use the
CCID Gadget Device.
This includes a description of all attributes, followed by an example of
how to set up the device with ConfigFS.

Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
---

Notes:
    v3:
    	- correct the grammar (thanks Randy)
    v2:
    	- add the missing changelog text
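
As an illustration of the header layout documented in this patch (not part
of the patch itself), here is a minimal userspace sketch that decodes the
fixed 10-byte CCID header from a buffer such as one returned by read(2) on
/dev/ccidg*. The names ccid_hdr_view and ccid_parse_header are hypothetical
and exist only for this example; only the field offsets come from the table
in the documentation.

```c
#include <stddef.h>
#include <stdint.h>

#define CCID_HEADER_SIZE 10	/* fixed header size, per the documentation table */

/* Hypothetical decoded view of the header; not a structure from the driver. */
struct ccid_hdr_view {
	uint8_t  msg_type;	/* bMessageType, offset 0 */
	uint32_t length;	/* dwLength, offset 1, little-endian on the wire */
	uint8_t  slot;		/* bSlot, offset 5 */
	uint8_t  seq;		/* bSeq, offset 6 */
};

/*
 * Decode the fixed part of a CCID message from buf.
 * Returns 0 on success, or -1 if the buffer is shorter than the header
 * or shorter than the header plus the announced payload length.
 */
int ccid_parse_header(const uint8_t *buf, size_t len,
		      struct ccid_hdr_view *hdr)
{
	if (len < CCID_HEADER_SIZE)
		return -1;

	hdr->msg_type = buf[0];
	hdr->length = (uint32_t)buf[1] | ((uint32_t)buf[2] << 8) |
		      ((uint32_t)buf[3] << 16) | ((uint32_t)buf[4] << 24);
	hdr->slot = buf[5];
	hdr->seq = buf[6];

	/* The payload (abData) starts at offset 10 and must fit in buf. */
	if (len - CCID_HEADER_SIZE < hdr->length)
		return -1;

	return 0;
}
```

A userspace handler would typically call this on every buffer it reads and
dispatch on msg_type, using slot to pick the target card when more than one
slot is configured.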

 Documentation/usb/gadget_ccid.rst | 267 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 267 insertions(+)
 create mode 100644 Documentation/usb/gadget_ccid.rst
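
The example script in the documentation writes 0x000407B8 to the `features`
attribute. The sketch below (again not part of the patch) shows how that
value could be composed from the constants in ccid.h, and checks the rule
that at most one exchange-level value may be set. The CCID_FEATURES_*
values are copied from the header in this series; CCID_FEATURES_EXC_MASK,
ccid_example_features and ccid_features_valid are hypothetical helpers
introduced for this example.

```c
#include <stdint.h>

/* Values copied from include/uapi/linux/usb/ccid.h in this series. */
#define CCID_FEATURES_AUTO_VOLT  0x00000008
#define CCID_FEATURES_AUTO_CLOCK 0x00000010
#define CCID_FEATURES_AUTO_BAUD  0x00000020
#define CCID_FEATURES_AUTO_PPS   0x00000080
#define CCID_FEATURES_ICCSTOP    0x00000100
#define CCID_FEATURES_NAD        0x00000200
#define CCID_FEATURES_AUTO_IFSD  0x00000400
#define CCID_FEATURES_EXC_TPDU   0x00010000
#define CCID_FEATURES_EXC_SAPDU  0x00020000
#define CCID_FEATURES_EXC_APDU   0x00040000

/* Hypothetical mask covering the three exchange-level values. */
#define CCID_FEATURES_EXC_MASK \
	(CCID_FEATURES_EXC_TPDU | CCID_FEATURES_EXC_SAPDU | CCID_FEATURES_EXC_APDU)

/* Compose the value used by the example script in the documentation. */
uint32_t ccid_example_features(void)
{
	return CCID_FEATURES_AUTO_VOLT | CCID_FEATURES_AUTO_CLOCK |
	       CCID_FEATURES_AUTO_BAUD | CCID_FEATURES_AUTO_PPS |
	       CCID_FEATURES_ICCSTOP | CCID_FEATURES_NAD |
	       CCID_FEATURES_AUTO_IFSD | CCID_FEATURES_EXC_APDU;
}

/* Return 1 if at most one exchange-level bit is set, 0 otherwise. */
int ccid_features_valid(uint32_t features)
{
	uint32_t level = features & CCID_FEATURES_EXC_MASK;

	/* A value with zero or one bit set satisfies x & (x - 1) == 0. */
	return (level & (level - 1)) == 0;
}
```

A value with no exchange-level bit set is also accepted, since the
documentation defines that case as character-level exchange.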

diff --git a/Documentation/usb/gadget_ccid.rst b/Documentation/usb/gadget_ccid.rst
new file mode 100644
index 000000000000..524fe9e6ac19
--- /dev/null
+++ b/Documentation/usb/gadget_ccid.rst
@@ -0,0 +1,267 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+CCID Gadget
+============
+
+:Author: Marcus Folkesson <marcus.folkesson@gmail.com>
+
+Introduction
+============
+
+The CCID Gadget will present itself as a CCID device to the host system.
+The device supports two endpoints for now: BULK-IN and BULK-OUT.
+These endpoints are exposed to userspace via /dev/ccidg*.
+
+All CCID commands are sent on the BULK-OUT endpoint. Each command sent to the CCID
+has an associated ending response. Some commands can also have intermediate
+responses. The response is sent on the BULK-IN endpoint.
+See Figure 3-3 in the CCID Specification [1]_ for more details.
+
+The CCID commands must be handled in userspace, since the driver only acts
+as a transport layer for the TPDUs.
+
+
+CCID Commands
+--------------
+
+All CCID commands begin with a 10-byte header followed by an optional
+data field, depending on the message type.
+
++--------+--------------+-------+----------------------------------+
+| Offset | Field        | Size  | Description                      |
++========+==============+=======+==================================+
+| 0      | bMessageType | 1     | Type of message                  |
++--------+--------------+-------+----------------------------------+
+| 1      | dwLength     | 4     | Message specific data length     |
+|        |              |       |                                  |
++--------+--------------+-------+----------------------------------+
+| 5      | bSlot        | 1     | Identifies the slot number       |
+|        |              |       | for this command                 |
++--------+--------------+-------+----------------------------------+
+| 6      | bSeq         | 1     | Sequence number for command      |
++--------+--------------+-------+----------------------------------+
+| 7      | ...          | 3     | Fields depend on message type    |
++--------+--------------+-------+----------------------------------+
+| 10     | abData       | array | Message specific data (OPTIONAL) |
++--------+--------------+-------+----------------------------------+
+
+
+Multiple CCID gadgets
+----------------------
+
+It is possible to create multiple instances of the CCID gadget. However,
+a much more flexible way is to create one gadget and set the `nslots` attribute
+to the index of the highest slot, i.e. the number of desired slots minus one.
+
+All CCID commands specify which slot is the receiver in the `bSlot` field
+of the CCID header.
+
+Usage
+=====
+
+Access from userspace
+----------------------
+All communication is by read(2) and write(2) to the corresponding /dev/ccidg* device.
+Only one file descriptor is allowed to be open to the device at a time.
+
+The buffer provided to read(2) **must be at least** 522 bytes (10-byte header + 512-byte payload),
+since the driver works with whole commands.
+
+The buffer provided to write(2) **must not exceed** 522 bytes (10-byte header + 512-byte payload),
+since the driver works with whole commands.
+
+
+Configuration with configfs
+----------------------------
+
+ConfigFS is used to create and configure the CCID gadget.
+In order to get a device to work as intended, a few attributes must
+be considered.
+
+The attributes are described below followed by an example.
+
+features
+~~~~~~~~~
+
+The `features` attribute writes to the dwFeatures field in the class descriptor.
+See Table 5.1-1 Smart Card Device Descriptors in the CCID Specification [1]_.
+
+The value indicates what intelligent features the CCID has.
+These values are available to user applications as defined in ccid.h [2]_.
+The default value is 0x00000000.
+
+The value is a bitwise OR operation performed on the following values:
+
++------------+----------------------------------------------------------------+
+| Value      | Description                                                    |
++============+================================================================+
+| 0x00000000 | No special characteristics                                     |
++------------+----------------------------------------------------------------+
+| 0x00000002 | Automatic parameter configuration based on ATR data            |
++------------+----------------------------------------------------------------+
+| 0x00000004 | Automatic activation of ICC on inserting                       |
++------------+----------------------------------------------------------------+
+| 0x00000008 | Automatic ICC voltage selection                                |
++------------+----------------------------------------------------------------+
+| 0x00000010 | Automatic ICC clock frequency change according to active       |
+|            | parameters provided by the Host or self determined             |
++------------+----------------------------------------------------------------+
+| 0x00000020 | Automatic baud rate change according to active                 |
+|            | parameters provided by the Host or self determined             |
++------------+----------------------------------------------------------------+
+| 0x00000040 | Automatic parameters negotiation made by the CCID              |
++------------+----------------------------------------------------------------+
+| 0x00000080 | Automatic PPS made by the CCID according to the                |
+|            | active parameters                                              |
++------------+----------------------------------------------------------------+
+| 0x00000100 | CCID can set ICC in clock stop mode                            |
++------------+----------------------------------------------------------------+
+| 0x00000200 | NAD value other than 00 accepted (T=1 protocol in use)         |
++------------+----------------------------------------------------------------+
+| 0x00000400 | Automatic IFSD exchange as first exchange                      |
++------------+----------------------------------------------------------------+
+
+
+Only one of the following values may be present to select a level of exchange:
+
++------------+--------------------------------------------------+
+| Value      | Description                                      |
++============+==================================================+
+| 0x00010000 | TPDU level exchanges with CCID                   |
++------------+--------------------------------------------------+
+| 0x00020000 | Short APDU level exchange with CCID              |
++------------+--------------------------------------------------+
+| 0x00040000 | Short and Extended APDU level exchange with CCID |
++------------+--------------------------------------------------+
+
+If none of these values is set, the level of exchange is
+character.
+
+
+protocols
+~~~~~~~~~~
+The `protocols` attribute writes to the dwProtocols field in the class descriptor.
+See Table 5.1-1 Smart Card Device Descriptors in the CCID Specification [1]_.
+
+The value is a bitwise OR operation performed on the following values:
+
++--------+--------------+
+| Value  | Description  |
++========+==============+
+| 0x0001 | Protocol T=0 |
++--------+--------------+
+| 0x0002 | Protocol T=1 |
++--------+--------------+
+
+If no protocol is selected, both T=0 and T=1 will be supported (`protocols` = 0x0003).
+
+nslots
+~~~~~~
+
+The `nslots` attribute writes to the bMaxSlotIndex field in the class descriptor.
+See Table 5.1-1 Smart Card Device Descriptors in the CCID Specification [1]_.
+
+This is the index of the highest available slot on this device. All slots are consecutive, starting at 00h.
+For example, 0Fh means 16 slots on this device, numbered 00h to 0Fh.
+
+The default value is 0, which means one slot.
+
+
+pinsupport
+~~~~~~~~~~~~
+
+This value indicates what PIN support features the CCID has.
+
+The `pinsupport` attribute writes to the bPINSupport field in the class descriptor.
+See Table 5.1-1 Smart Card Device Descriptors in the CCID Specification [1]_.
+
+
+The value is a bitwise OR operation performed on the following values:
+
++--------+----------------------------+
+| Value  | Description                |
++========+============================+
+| 0x00   | No PIN support             |
++--------+----------------------------+
+| 0x01   | PIN Verification supported |
++--------+----------------------------+
+| 0x02   | PIN Modification supported |
++--------+----------------------------+
+
+The default value is set to 0x00.
+
+
+lcdlayout
+~~~~~~~~~~
+
+Number of lines and characters for the LCD display used to send messages for PIN entry.
+
+The `lcdlayout` attribute writes to the wLcdLayout field in the class descriptor.
+See Table 5.1-1 Smart Card Device Descriptors in the CCID Specification [1]_.
+
+
+The value is set as follows:
+
++--------+------------------------------------+
+| Value  | Description                        |
++========+====================================+
+| 0x0000 | No LCD                             |
++--------+------------------------------------+
+| 0xXXYY | XX: number of lines                |
+|        | YY: number of characters per line. |
++--------+------------------------------------+
+
+The default value is set to 0x0000.
+
+
+Example
+-------
+
+Here is an example of how to set up a CCID gadget with configfs ::
+
+    #!/bin/sh
+
+    CONFIGDIR=/sys/kernel/config
+    GADGET=$CONFIGDIR/usb_gadget/g0
+    FUNCTION=$GADGET/functions/ccid.sc0
+
+    VID=YOUR_VENDOR_ID_HERE
+    PID=YOUR_PRODUCT_ID_HERE
+    UDC=YOUR_UDC_HERE
+
+    #Mount filesystem
+    mount none -t configfs $CONFIGDIR
+
+    #Create the gadget and populate IDs
+    mkdir $GADGET
+    echo $VID > $GADGET/idVendor
+    echo $PID > $GADGET/idProduct
+
+    #Create and configure the CCID function
+    mkdir $FUNCTION
+    echo 0x000407B8 > $FUNCTION/features
+    echo 0x02 > $FUNCTION/protocols
+
+    #Create our English strings
+    mkdir  $GADGET/strings/0x409
+    echo 556677 > $GADGET/strings/0x409/serialnumber
+    echo "Hungry Penguins" > $GADGET/strings/0x409/manufacturer
+    echo "Harpoon With SmartCard"  > $GADGET/strings/0x409/product
+
+    #Create configuration
+    mkdir  $GADGET/configs/c.1
+    mkdir  $GADGET/configs/c.1/strings/0x409
+    echo Config1 > $GADGET/configs/c.1/strings/0x409/configuration
+
+    #Use `Config1` for our CCID gadget
+    ln -s $FUNCTION $GADGET/configs/c.1
+
+    #Execute
+    echo $UDC > $GADGET/UDC
+
+
+References
+==========
+
+.. [1] http://www.usb.org/developers/docs/devclass_docs/DWG_Smart-Card_CCID_Rev110.pdf
+.. [2] include/uapi/linux/usb/ccid.h
-- 
2.16.2


* [PATCH v3 3/3] MAINTAINERS: add USB CCID Gadget Device
From: Marcus Folkesson @ 2018-05-29 18:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jonathan Corbet, Felipe Balbi, davem,
	Mauro Carvalho Chehab, Andrew Morton, Randy Dunlap,
	Ruslan Bilovol, Thomas Gleixner, Kate Stewart
  Cc: linux-usb, linux-doc, linux-kernel, Marcus Folkesson
In-Reply-To: <20180529185021.13738-1-marcus.folkesson@gmail.com>

Add MAINTAINERS entry for USB CCID Gadget Device

Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
---

Notes:
    v3:
    	- No changes
    v2:
    	- No changes

 MAINTAINERS | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 078fd80f664f..e77c3d2bec89 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14541,6 +14541,14 @@ L:	linux-scsi@vger.kernel.org
 S:	Maintained
 F:	drivers/usb/storage/uas.c
 
+USB CCID GADGET
+M:	Marcus Folkesson <marcus.folkesson@gmail.com>
+L:	linux-usb@vger.kernel.org
+S:	Maintained
+F:	drivers/usb/gadget/function/f_ccid.*
+F:	include/uapi/linux/usb/ccid.h
+F:	Documentation/usb/gadget_ccid.rst
+
 USB CDC ETHERNET DRIVER
 M:	Oliver Neukum <oliver@neukum.org>
 L:	linux-usb@vger.kernel.org
-- 
2.16.2


* Re: [PATCH v3 2/3] Documentation: usb: add documentation for USB CCID Gadget Device
From: Randy Dunlap @ 2018-05-29 20:27 UTC (permalink / raw)
  To: Marcus Folkesson, Greg Kroah-Hartman, Jonathan Corbet,
	Felipe Balbi, davem, Mauro Carvalho Chehab, Andrew Morton,
	Ruslan Bilovol, Thomas Gleixner, Kate Stewart
  Cc: linux-usb, linux-doc, linux-kernel
In-Reply-To: <20180529185021.13738-2-marcus.folkesson@gmail.com>

On 05/29/2018 11:50 AM, Marcus Folkesson wrote:
> Add documentation to give a brief description on how to use the
> CCID Gadget Device.
> This includes a description for all attributes followed by an example on
> how to setup the device with ConfigFS.
> 
> Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
> ---
> 
> Notes:
>     v3:
>     	- correct the grammer (thanks Randy)
>     v2:
>     	- add the missing changelog text
> 
>  Documentation/usb/gadget_ccid.rst | 267 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 267 insertions(+)
>  create mode 100644 Documentation/usb/gadget_ccid.rst
> 
> diff --git a/Documentation/usb/gadget_ccid.rst b/Documentation/usb/gadget_ccid.rst
> new file mode 100644
> index 000000000000..524fe9e6ac19
> --- /dev/null
> +++ b/Documentation/usb/gadget_ccid.rst
> @@ -0,0 +1,267 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +============
> +CCID Gadget
> +============
> +
> +:Author: Marcus Folkesson <marcus.folkesson@gmail.com>
> +
> +Introduction
> +============
> +
> +The CCID Gadget will present itself as a CCID device to the host system.
> +The device supports two endpoints for now; BULK IN and BULK OUT.
> +These endpoints are exposed to userspace via /dev/ccidg*.
> +
> +All CCID commands are sent on the BULK-OUT endpoint. Each command sent to the CCID
> +has an associated ending response. Some commands can also have intermediate
> +responses. The response is sent on the BULK-IN endpoint.
> +See Figure 3-3 in the CCID Specification [1]_ for more details.
> +
> +The CCID commands must be handled in userspace since the driver is only working
> +as a transport layer for the TPDUs.

I think that it would be helpful to tell us what the naming of the /dev/ccidg*
endpoints looks like.  Also, how to distinguish the BULK-IN from the BULK-OUT
endpoint.

> +
> +
> +CCID Commands
> +--------------
> +
> +All CCID commands begins with a 10-byte header followed by an optional
> +data field depending on message type.
> +
> ++--------+--------------+-------+----------------------------------+
> +| Offset | Field        | Size  | Description                      |
> ++========+==============+=======+==================================+
> +| 0      | bMessageType | 1     | Type of message                  |
> ++--------+--------------+-------+----------------------------------+
> +| 1      | dwLength     | 4     | Message specific data length     |
> +|        |              |       |                                  |
> ++--------+--------------+-------+----------------------------------+
> +| 5      | bSlot        | 1     | Identifies the slot number       |
> +|        |              |       | for this command                 |
> ++--------+--------------+-------+----------------------------------+
> +| 6      | bSeq         | 1     | Sequence number for command      |
> ++--------+--------------+-------+----------------------------------+
> +| 7      | ...          | 3     | Fields depends on message type   |
> ++--------+--------------+-------+----------------------------------+
> +| 10     | abData       | array | Message specific data (OPTIONAL) |
> ++--------+--------------+-------+----------------------------------+
> +
> +
> +Multiple CCID gadgets
> +----------------------
> +
> +It is possible to create multiple instances of the CCID gadget, however,
> +a much more flexible way is to create one gadget and set the `nslots` attribute
> +to the number of desired CCID devices.
> +
> +All CCID commands specify which slot is the receiver in the `bSlot` field
> +of the CCID header.
> +
> +Usage
> +=====
> +
> +Access from userspace
> +----------------------
> +All communication is by read(2) and write(2) to the corresponding /dev/ccidg* device.
> +Only one file descriptor is allowed to be open to the device at a time.


Reviewed-by: Randy Dunlap <rdunlap@infradead.org>


thanks,
-- 
~Randy

* [PATCH v4 22/27] x86/modules: Add option to start module section after kernel
From: Thomas Garnier @ 2018-05-29 22:15 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Thomas Garnier, Skip Peter Zijlstra, Skip Philippe Ombredanne,
	Skip Greg Kroah-Hartman, Skip Jiri Kosina,
	Skip Alexander Potapenko, Skip Joerg Roedel, Skip Jan Beulich,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Jonathan Corbet, Andy Lutomirski, Andrey Ryabinin,
	Kirill A. Shutemov, Tom Lendacky, Juergen Gross, linux-kernel,
	linux-doc
In-Reply-To: <20180529221625.33541-1-thgarnie@google.com>

Add an option so the module section is just after the mapped kernel. It
will ensure position independent modules are always at the right
distance from the kernel and do not require -mcmodel=large. It also
optimizes the available size for modules by getting rid of the empty
space in the kernel randomization range.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
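
For illustration only (not part of the patch), the sketch below reproduces
the arithmetic behind the new MODULES_VADDR definition: the end of the
kernel image plus one page, rounded up to the next PMD boundary. It assumes
4 KiB pages and a 2 MiB PMD size; the SKETCH_* names and
sketch_modules_vaddr are hypothetical, and kernel_end stands in for the
address of `_end`, which is only known once the kernel is linked.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000ULL	/* 4 KiB page, assumed */
#define SKETCH_PMD_SIZE  0x200000ULL	/* 2 MiB PMD, assumed for x86_64 */

/* Round x up to the next multiple of a; a must be a power of two. */
#define SKETCH_ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/*
 * Mirror of ALIGN(_end + PAGE_SIZE, PMD_SIZE): leave a one-page gap after
 * the kernel image, then start modules on the next PMD boundary.
 */
uint64_t sketch_modules_vaddr(uint64_t kernel_end)
{
	return SKETCH_ALIGN(kernel_end + SKETCH_PAGE_SIZE, SKETCH_PMD_SIZE);
}
```

With a kernel image ending at 0xffffffff82a34000, for example, the module
area would start at 0xffffffff82c00000, so the gap between the kernel and
the first module is always smaller than one PMD plus one page.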
 Documentation/x86/x86_64/mm.txt         | 3 +++
 arch/x86/Kconfig                        | 4 ++++
 arch/x86/include/asm/pgtable_64_types.h | 6 ++++++
 arch/x86/kernel/head64.c                | 5 ++++-
 arch/x86/mm/dump_pagetables.c           | 3 ++-
 5 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 5432a96d31ff..334ab458c82d 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -77,3 +77,6 @@ Their order is preserved but their base will be offset early at boot time.
 Be very careful vs. KASLR when changing anything here. The KASLR address
 range must not overlap with anything except the KASAN shadow area, which is
 correct as KASAN disables KASLR.
+
+If CONFIG_DYNAMIC_MODULE_BASE is enabled, the module section follows the end of
+the mapped kernel.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 177e712201d1..94a00d81ec18 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2198,6 +2198,10 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 
 	   If unsure, leave at the default value.
 
+# Module section starts just after the end of the mapped kernel
+config DYNAMIC_MODULE_BASE
+	bool
+
 config X86_GLOBAL_STACKPROTECTOR
 	bool "Stack cookie using a global variable"
 	depends on CC_STACKPROTECTOR_AUTO
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index adb47552e6bb..3ab25b908879 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -7,6 +7,7 @@
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 #include <asm/kaslr.h>
+#include <asm/sections.h>
 
 /*
  * These are used to make use of C type-checking..
@@ -126,7 +127,12 @@ extern unsigned int ptrs_per_p4d;
 
 #define VMALLOC_END		(VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
 
+#ifdef CONFIG_DYNAMIC_MODULE_BASE
+#define MODULES_VADDR		ALIGN(((unsigned long)_end + PAGE_SIZE), PMD_SIZE)
+#else
 #define MODULES_VADDR		(__START_KERNEL_map + KERNEL_IMAGE_SIZE)
+#endif
+
 /* The module sections ends with the start of the fixmap */
 #define MODULES_END		_AC(0xffffffffff000000, UL)
 #define MODULES_LEN		(MODULES_END - MODULES_VADDR)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index fa661fb97127..3a1ce822e1c0 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -394,12 +394,15 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 	 * Build-time sanity checks on the kernel image and module
 	 * area mappings. (these are purely build-time and produce no code)
 	 */
+#ifndef CONFIG_DYNAMIC_MODULE_BASE
 	BUILD_BUG_ON(MODULES_VADDR < __START_KERNEL_map);
 	BUILD_BUG_ON(MODULES_VADDR - __START_KERNEL_map < KERNEL_IMAGE_SIZE);
-	BUILD_BUG_ON(MODULES_LEN + KERNEL_IMAGE_SIZE > 2*PUD_SIZE);
+	BUILD_BUG_ON(!IS_ENABLED(CONFIG_RANDOMIZE_BASE_LARGE) &&
+		     MODULES_LEN + KERNEL_IMAGE_SIZE > 2*PUD_SIZE);
 	BUILD_BUG_ON((__START_KERNEL_map & ~PMD_MASK) != 0);
 	BUILD_BUG_ON((MODULES_VADDR & ~PMD_MASK) != 0);
 	BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL));
+#endif
 	MAYBE_BUILD_BUG_ON(!(((MODULES_END - 1) & PGDIR_MASK) ==
 				(__START_KERNEL & PGDIR_MASK)));
 	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index cc7ff5957194..dca4098ce4fd 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -105,7 +105,7 @@ static struct addr_marker address_markers[] = {
 	[EFI_END_NR]		= { EFI_VA_END,		"EFI Runtime Services" },
 #endif
 	[HIGH_KERNEL_NR]	= { __START_KERNEL_map,	"High Kernel Mapping" },
-	[MODULES_VADDR_NR]	= { MODULES_VADDR,	"Modules" },
+	[MODULES_VADDR_NR]	= { 0/*MODULES_VADDR*/,	"Modules" },
 	[MODULES_END_NR]	= { MODULES_END,	"End Modules" },
 	[FIXADDR_START_NR]	= { FIXADDR_START,	"Fixmap Area" },
 	[END_OF_SPACE_NR]	= { -1,			NULL }
@@ -600,6 +600,7 @@ static int __init pt_dump_init(void)
 	address_markers[KASAN_SHADOW_START_NR].start_address = KASAN_SHADOW_START;
 	address_markers[KASAN_SHADOW_END_NR].start_address = KASAN_SHADOW_END;
 #endif
+	address_markers[MODULES_VADDR_NR].start_address = MODULES_VADDR;
 #endif
 #ifdef CONFIG_X86_32
 	address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
-- 
2.17.0.921.gf22659ad46-goog


* Re: [PATCH 2/2] Documentation: devices.txt: remove the mk712 touchscreen device from the list
From: Dmitry Torokhov @ 2018-05-29 22:58 UTC (permalink / raw)
  To: Martin Kepplinger
  Cc: corbet, gregkh, logang, stefanha, linux-doc, linux-kernel,
	linux-input
In-Reply-To: <20180402125551.13641-2-martink@posteo.de>

On Mon, Apr 02, 2018 at 02:55:51PM +0200, Martin Kepplinger wrote:
> The input/touchscreen/mk712.c driver has been rewritten for the common
> input event system. in 2005. There shouldn't a special device node be
> created anymore.
> 
> Signed-off-by: Martin Kepplinger <martink@posteo.de>

Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Jon, can you please pick it up?

> ---
> 
> Please review this by looking at the driver too. Thanks,
> 
>                     martin
> 
> 
> 
>  Documentation/admin-guide/devices.txt | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/devices.txt b/Documentation/admin-guide/devices.txt
> index 4ec843123cc3..fb39bbf0789a 100644
> --- a/Documentation/admin-guide/devices.txt
> +++ b/Documentation/admin-guide/devices.txt
> @@ -259,7 +259,6 @@
>  		 11 = /dev/vrtpanel	Vr41xx embedded touch panel
>  		 13 = /dev/vpcmouse	Connectix Virtual PC Mouse
>  		 14 = /dev/touchscreen/ucb1x00  UCB 1x00 touchscreen
> -		 15 = /dev/touchscreen/mk712	MK712 touchscreen
>  		128 = /dev/beep		Fancy beep device
>  		129 =
>  		130 = /dev/watchdog	Watchdog timer port
> -- 
> 2.16.2
> 

-- 
Dmitry

^ permalink raw reply

* Re: [PATCH 1/2] Input: mk712: update documentation web link
From: Dmitry Torokhov @ 2018-05-29 22:59 UTC (permalink / raw)
  To: Martin Kepplinger
  Cc: corbet, gregkh, logang, stefanha, linux-doc, linux-kernel,
	linux-input
In-Reply-To: <20180402125551.13641-1-martink@posteo.de>

On Mon, Apr 02, 2018 at 02:55:50PM +0200, Martin Kepplinger wrote:
> At the mentioned address there's nothing found. By searching information
> on the controller chip still can be found, so update the link to the
> resulting page.
> 
> Signed-off-by: Martin Kepplinger <martink@posteo.de>

Applied, thank you.

> ---
>  drivers/input/touchscreen/mk712.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/input/touchscreen/mk712.c b/drivers/input/touchscreen/mk712.c
> index bd5352824f77..c179060525ae 100644
> --- a/drivers/input/touchscreen/mk712.c
> +++ b/drivers/input/touchscreen/mk712.c
> @@ -17,7 +17,7 @@
>   * found in Gateway AOL Connected Touchpad computers.
>   *
>   * Documentation for ICS MK712 can be found at:
> - *	http://www.idt.com/products/getDoc.cfm?docID=18713923
> + *	https://www.idt.com/general-parts/mk712-touch-screen-controller
>   */
>  
>  /*
> -- 
> 2.16.2
> 

-- 
Dmitry

^ permalink raw reply

* Re: [RFT v3 1/4] perf cs-etm: Generate branch sample for missed packets
From: Leo Yan @ 2018-05-30  0:28 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Arnaldo Carvalho de Melo, Jonathan Corbet, Robert Walker,
	Mike Leach, Kim Phillips, Tor Jeremiassen, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, open list:DOCUMENTATION,
	Linux Kernel Mailing List, coresight
In-Reply-To: <CANLsYkzEU90aVtPMRqf=YHzee8DHq8JZttRuxjLKGz-gRYiBuw@mail.gmail.com>

Hi Mathieu,

On Tue, May 29, 2018 at 10:04:49AM -0600, Mathieu Poirier wrote:

[...]

> > As now this patch is big with more complex logic, so I consider to
> > split it into small patches:
> >
> > - Define CS_ETM_INVAL_ADDR;
> > - Fix for CS_ETM_TRACE_ON packet;
> > - Fix for exception packet;
> >
> > Does this make sense for you?  I have concern that this patch is a
> > fixing patch, so not sure after spliting patches will introduce
> > trouble for applying them for other stable kernels ...
> 
> Reverse the order:
> 
> - Fix for CS_ETM_TRACE_ON packet;
> - Fix for exception packet;
> - Define CS_ETM_INVAL_ADDR;
> 
> But you may not need to - see next comment.

From the context, I think 'you may not need to' refers to my concern
about applying the patches to stable kernels, so I should treat this
patch series as an enhancement and not worry much about stable kernels.

On the other hand, your suggestion could also mean that I do 'not need
to' split this into small patches (though I guess that would be a
misreading of what you meant).

Could you clarify which one you meant?

> >> > +
> >> > +   /*
> >> >      * The packet records the execution range with an exclusive end address
> >> >      *
> >> >      * A64 instructions are constant size, so the last executed
> >> > @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> >> >     return packet->end_addr - A64_INSTR_SIZE;
> >> >  }
> >> >
> >> > +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
> >> > +{
> >> > +   /*
> >> > +    * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
> >> > +    * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
> >> > +    */
> >> > +   if (packet->start_addr == 0xdeadbeefdeadbeefUL)
> >> > +           return 0;
> >>
> >> Same comment as above.
> >
> > Will do this.
> >
> >> > +
> >> > +   return packet->start_addr;
> >> > +}
> >> > +
> >> >  static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
> >> >  {
> >> >     /*
> >> > @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
> >> >
> >> >     be       = &bs->entries[etmq->last_branch_pos];
> >> >     be->from = cs_etm__last_executed_instr(etmq->prev_packet);
> >> > -   be->to   = etmq->packet->start_addr;
> >> > +   be->to   = cs_etm__first_executed_instr(etmq->packet);
> >> >     /* No support for mispredict */
> >> >     be->flags.mispred = 0;
> >> >     be->flags.predicted = 1;
> >> > @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
> >> >     sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
> >> >     sample.pid = etmq->pid;
> >> >     sample.tid = etmq->tid;
> >> > -   sample.addr = etmq->packet->start_addr;
> >> > +   sample.addr = cs_etm__first_executed_instr(etmq->packet);
> >> >     sample.id = etmq->etm->branches_id;
> >> >     sample.stream_id = etmq->etm->branches_id;
> >> >     sample.period = 1;
> >> > @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> >> >             etmq->period_instructions = instrs_over;
> >> >     }
> >> >
> >> > -   if (etm->sample_branches &&
> >> > -       etmq->prev_packet &&
> >> > -       etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> >> > -       etmq->prev_packet->last_instr_taken_branch) {
> >> > -           ret = cs_etm__synth_branch_sample(etmq);
> >> > -           if (ret)
> >> > -                   return ret;
> >> > +   if (etm->sample_branches && etmq->prev_packet) {
> >> > +           bool generate_sample = false;
> >> > +
> >> > +           /* Generate sample for start tracing packet */
> >> > +           if (etmq->prev_packet->sample_type == 0 ||
> >>
> >> What kind of packet is sample_type == 0 ?
> >
> > Just as explained above, sample_type == 0 is the packet which
> > initialized in the function cs_etm__alloc_queue().
> >
> >> > +               etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
> >> > +                   generate_sample = true;
> >> > +
> >> > +           /* Generate sample for exception packet */
> >> > +           if (etmq->prev_packet->exc == true)
> >> > +                   generate_sample = true;
> >>
> >> Please don't do that.  Exception packets have a type of their own and can be
> >> added to the decoder packet queue the same way INST_RANGE and TRACE_ON packets
> >> are.  Moreover exception packet containt an address that, if I'm reading the
> >> documenation properly, can be used to keep track of instructions that were
> >> executed between the last address of the previous range packet and the address
> >> executed just before the exception occurred.  Mike and Rob will have to confirm
> >> this as the decoder may be doing all that hard work for us.
> >
> > Sure, will wait for Rob and Mike to confirm for this.
> >
> > At my side, I dump the packet, the exception packet isn't passed to
> > cs-etm.c layer, the decoder layer only sets the flag
> > 'packet->exc = true' when exception packet is coming [1].
> 
> That's because we didn't need the information.  Now that we do a
> function that will insert a packet in the decoder packet queue and
> deal with the new packet type in the main decoder loop [2].  At that
> point your work may not be eligible for stable anymore and I think it
> is fine.  Robert's work was an enhancement over mine and yours is an
> enhancement over his.
> 
> [2]. https://elixir.bootlin.com/linux/v4.17-rc7/source/tools/perf/util/cs-etm.c#L999

Agreed, I will look into the exception packet and try to add a new
packet type for it.
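
For reference, the "Define CS_ETM_INVAL_ADDR" step discussed above could look roughly like the following. This is only a minimal userspace sketch: the struct is a cut-down stand-in for struct cs_etm_packet, the helper name is shortened, and the macro value is assumed to be the same magic number that appears in the quoted hunk.

```c
#include <stdint.h>

/* Assumed value: the magic number from the quoted hunk, given a name. */
#define CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefULL

/* Cut-down stand-in for struct cs_etm_packet (only the fields used here). */
struct cs_etm_packet_sketch {
	uint64_t start_addr;
	uint64_t end_addr;
};

/*
 * A CS_ETM_TRACE_ON packet carries no valid start address, so report 0
 * for it; otherwise return the first executed instruction address.
 */
uint64_t first_executed_instr(const struct cs_etm_packet_sketch *packet)
{
	if (packet->start_addr == CS_ETM_INVAL_ADDR)
		return 0;

	return packet->start_addr;
}
```

With the named constant, the comparison documents itself and the comment no longer needs to repeat the magic number.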

[...]

Thanks,
Leo Yan

^ permalink raw reply

* Re: [PATCH v3 1/3] usb: gadget: ccid: add support for USB CCID Gadget Device
From: Andy Shevchenko @ 2018-05-30  0:55 UTC (permalink / raw)
  To: Marcus Folkesson
  Cc: Greg Kroah-Hartman, Jonathan Corbet, Felipe Balbi,
	David S. Miller, Mauro Carvalho Chehab, Andrew Morton,
	Randy Dunlap, Ruslan Bilovol, Thomas Gleixner, Kate Stewart, USB,
	Linux Documentation List, Linux Kernel Mailing List
In-Reply-To: <20180529185021.13738-1-marcus.folkesson@gmail.com>

On Tue, May 29, 2018 at 9:50 PM, Marcus Folkesson
<marcus.folkesson@gmail.com> wrote:
> Chip Card Interface Device (CCID) protocol is a USB protocol that
> allows a smartcard device to be connected to a computer via a card
> reader using a standard USB interface, without the need for each manufacturer
> of smartcards to provide its own reader or protocol.
>
> This gadget driver makes Linux show up as a CCID device to the host and let a
> userspace daemon act as the smartcard.
>
> This is useful when the Linux gadget itself should act as a cryptographic
> device or forward APDUs to an embedded smartcard device.

> + * Copyright (C) 2018 Marcus Folkesson <marcus.folkesson@gmail.com>

> + *

Redundant line

> +static DEFINE_IDA(ccidg_ida);

Where is it destroyed?

> +               ccidg_class = NULL;
> +               return PTR_ERR(ccidg_class);

Are you sure?

> +       if (!list_empty(head)) {
> +               req = list_first_entry(head, struct usb_request, list);

list_first_entry_or_null()

> +       req->length = len;

Perhaps assign this only if kmalloc() succeeded?

> +       req->buf = kmalloc(len, GFP_ATOMIC);

> +       if (req->buf == NULL) {

if (!req->buf) ?

> +               usb_ep_free_request(ep, req);
> +               return ERR_PTR(-ENOMEM);
> +       }

> +static void ccidg_request_free(struct usb_request *req, struct usb_ep *ep)
> +{

> +       if (req) {

Is it even possible?

What about

if (!req)
 return;

?

> +               kfree(req->buf);
> +               usb_ep_free_request(ep, req);
> +       }
> +}

> +                       *(__le32 *) req->buf = ccid_class_desc.dwDefaultClock;

Hmm... put_unaligned()? cpu_to_le32()? cpu_to_le32p()?

> +                       *(__le32 *) req->buf = ccid_class_desc.dwDataRate;

Ditto.
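
To illustrate the point about unaligned and endian-safe stores: writing the value one byte at a time avoids both the alignment assumption and the host byte order assumption of the cast. The sketch below is a userspace analogue of what the kernel's put_unaligned_le32() does; store_le32() and load_le32() are hypothetical names. It assumes the value is held in CPU byte order. If the descriptor field is already __le32, only the alignment part applies.

```c
#include <stdint.h>

/* Store a 32-bit value in little-endian order at any byte address. */
void store_le32(void *dst, uint32_t val)
{
	uint8_t *p = dst;

	p[0] = (uint8_t)(val & 0xff);
	p[1] = (uint8_t)((val >> 8) & 0xff);
	p[2] = (uint8_t)((val >> 16) & 0xff);
	p[3] = (uint8_t)((val >> 24) & 0xff);
}

/* Read the value back, again without alignment or endianness assumptions. */
uint32_t load_le32(const void *src)
{
	const uint8_t *p = src;

	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

Both helpers work on an odd address such as buf + 1, where a plain *(uint32_t *) store could fault on architectures that do not allow unaligned access.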

> +               }
> +               }

Indentation.

> +       /* responded with data transfer or status phase? */
> +       if (ret >= 0) {

Why not

if (ret < 0)
 return ret;

?

> +       }
> +
> +       return ret;
> +}

> +       atomic_set(&ccidg->online, 1);
> +       return ret;

return 0; ?

> +       struct f_ccidg *ccidg;

> +       ccidg = container_of(inode->i_cdev, struct f_ccidg, cdev);

One line ?

> +       xfer = (req->actual < count) ? req->actual : count;

min_t()

> +       ret = wait_event_interruptible(bulk_dev->write_wq,
> +               ((req = ccidg_req_get(ccidg, &bulk_dev->tx_idle))));
> +
> +       if (ret < 0)
> +               return -ERESTARTSYS;

Redundant blank line above.

> +static void ccidg_function_free(struct usb_function *f)
> +{

> +       struct f_ccidg_opts *opts;

> +       opts = container_of(f->fi, struct f_ccidg_opts, func_inst);

One line.


> +       mutex_lock(&opts->lock);
> +       --opts->refcnt;

X-- will work as well.

> +       mutex_unlock(&opts->lock);
> +}

> +       struct f_ccidg_opts *opts;

> +       opts = container_of(fi, struct f_ccidg_opts, func_inst);

Perhaps one line ?

> +       ++opts->refcnt;

X++ would work as well.

> +       struct f_ccidg_opts *opts;
> +
> +       opts = container_of(f, struct f_ccidg_opts, func_inst);

Perhaps one line?

> +#define CCID_PINSUPOORT_NONE           0x00

(0 << 0) ?

For the sake of consistency.

> +#define CCID_PINSUPOORT_VERIFICATION   (1 << 1)
> +#define CCID_PINSUPOORT_MODIFICATION   (1 << 2)

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* [PATCH] PCI: Add pci=safemode option
From: Sinan Kaya @ 2018-05-30  3:19 UTC (permalink / raw)
  To: linux-pci, timur
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Jonathan Corbet,
	Bjorn Helgaas, Thomas Gleixner, Ingo Molnar, Christoffer Dall,
	Paul E. McKenney, Marc Zyngier, Kai-Heng Feng, Thymo van Beers,
	Frederic Weisbecker, Konrad Rzeszutek Wilk, Greg Kroah-Hartman,
	David Rientjes, Rafael J. Wysocki, Keith Busch, Dongdong Liu,
	Frederick Lawler, Oza Pawandeep, Gabriele Paoloni,
	open list:DOCUMENTATION, open list

Add a pci=safemode kernel command line parameter that turns off all PCI
Express service drivers as well as all optional PCIe features such as LTR,
Extended Tags, Relaxed Ordering, etc.

Also set the MPS configuration to PCIE_BUS_SAFE so that MPS and MRRS can be
reconfigured by the kernel in case the BIOS hands off a broken
configuration.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
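As an illustration of the option handling only (not part of the patch), here is a minimal userspace sketch of how a comma-separated pci= string could be split and mapped to flags. struct pci_opts, enum bus_config and parse_pci_opts() are hypothetical stand-ins for the globals set in pci_setup(). The sketch uses an exact strcmp() match, whereas the hunk below uses strncmp().

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the pcie_bus_config values. */
enum bus_config { BUS_DEFAULT, BUS_SAFE };

/* Hypothetical container for the flags the options would set. */
struct pci_opts {
	bool safe_mode;
	bool ats_disabled;
	enum bus_config bus_config;
};

/* Split "opt1,opt2,..." in place and set the matching flags. */
void parse_pci_opts(char *str, struct pci_opts *o)
{
	while (str) {
		char *next = strchr(str, ',');

		if (next)
			*next++ = '\0';

		if (!strcmp(str, "safemode")) {
			/* Safe mode also forces the conservative MPS policy. */
			o->safe_mode = true;
			o->bus_config = BUS_SAFE;
		} else if (!strcmp(str, "noats")) {
			o->ats_disabled = true;
		}

		str = next;
	}
}
```

Passing a writable copy of "noats,safemode" sets both flags and selects BUS_SAFE. Unknown options are ignored in this sketch.
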
 Documentation/admin-guide/kernel-parameters.txt | 2 ++
 drivers/pci/pci.c                               | 7 +++++++
 drivers/pci/pci.h                               | 2 ++
 drivers/pci/pcie/portdrv_core.c                 | 2 +-
 drivers/pci/probe.c                             | 6 ++++++
 5 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 641ec9c..247adbb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3153,6 +3153,8 @@
 		noari		do not use PCIe ARI.
 		noats		[PCIE, Intel-IOMMU, AMD-IOMMU]
 				do not use PCIe ATS (and IOMMU device IOTLB).
+		safemode	turns of all optinal PCI features. Useful
+				for bringup/troubleshooting.
 		pcie_scan_all	Scan all possible PCIe devices.  Otherwise we
 				only look for one device below a PCIe downstream
 				port.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d27f771..11f0282 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -115,6 +115,9 @@ static bool pcie_ari_disabled;
 /* If set, the PCIe ATS capability will not be used. */
 static bool pcie_ats_disabled;
 
+/* If set, disables most of the optional PCI features */
+bool pci_safe_mode;
+
 bool pci_ats_disabled(void)
 {
 	return pcie_ats_disabled;
@@ -5845,6 +5848,10 @@ static int __init pci_setup(char *str)
 		if (*str && (str = pcibios_setup(str)) && *str) {
 			if (!strcmp(str, "nomsi")) {
 				pci_no_msi();
+			} else if (!strncmp(str, "safemode", 8)) {
+				pr_info("PCI: safe mode with minimum features\n");
+				pci_safe_mode = true;
+				pcie_bus_config = PCIE_BUS_SAFE;
 			} else if (!strncmp(str, "noats", 5)) {
 				pr_info("PCIe: ATS is disabled\n");
 				pcie_ats_disabled = true;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c358e7a0..4517bcd 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -8,6 +8,8 @@
 
 extern const unsigned char pcie_link_speed[];
 
+extern bool pci_safe_mode;
+
 bool pcie_cap_has_lnkctl(const struct pci_dev *dev);
 
 /* Functions internal to the PCI core code */
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index a5b3b3a..9fe4ed6 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -311,7 +311,7 @@ int pcie_port_device_register(struct pci_dev *dev)
 
 	/* Get and check PCI Express port services */
 	capabilities = get_port_device_capability(dev);
-	if (!capabilities)
+	if (!capabilities || pci_safe_mode)
 		return 0;
 
 	pci_set_master(dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 3840207..295b79c 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2047,6 +2047,9 @@ static void pci_configure_device(struct pci_dev *dev)
 	struct hotplug_params hpp;
 	int ret;
 
+	if (pci_safe_mode)
+		return;
+
 	pci_configure_mps(dev);
 	pci_configure_extended_tags(dev, NULL);
 	pci_configure_relaxed_ordering(dev);
@@ -2213,6 +2216,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	/* Setup MSI caps & disable MSI/MSI-X interrupts */
 	pci_msi_setup_pci_dev(dev);
 
+	if (pci_safe_mode)
+		return;
+
 	/* Buffers for saving PCIe and PCI-X capabilities */
 	pci_allocate_cap_save_buffers(dev);
 
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH] PCI: Add pci=safemode option
From: Greg Kroah-Hartman @ 2018-05-30  4:31 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-pci, timur, linux-arm-msm, linux-arm-kernel,
	Jonathan Corbet, Bjorn Helgaas, Thomas Gleixner, Ingo Molnar,
	Christoffer Dall, Paul E. McKenney, Marc Zyngier, Kai-Heng Feng,
	Thymo van Beers, Frederic Weisbecker, Konrad Rzeszutek Wilk,
	David Rientjes, Rafael J. Wysocki, Keith Busch, Dongdong Liu,
	Frederick Lawler, Oza Pawandeep, Gabriele Paoloni,
	open list:DOCUMENTATION, open list
In-Reply-To: <1527650389-31575-1-git-send-email-okaya@codeaurora.org>

On Tue, May 29, 2018 at 11:19:41PM -0400, Sinan Kaya wrote:
> Adding pci=safemode kernel command line parameter to turn off all PCI
> Express service driver as well as all optional PCIe features such as LTR,
> Extended tags, Relaxed Ordering etc.
> 
> Also setting MPS configuration to PCIE_BUS_SAFE so that MPS and MRRS can be
> reconfigured with by the kernel in case BIOS hands off a broken
> configuration.

Why not fix the BIOS?  That's what sane platforms do :)

> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 2 ++
>  drivers/pci/pci.c                               | 7 +++++++
>  drivers/pci/pci.h                               | 2 ++
>  drivers/pci/pcie/portdrv_core.c                 | 2 +-
>  drivers/pci/probe.c                             | 6 ++++++
>  5 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 641ec9c..247adbb 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3153,6 +3153,8 @@
>  		noari		do not use PCIe ARI.
>  		noats		[PCIE, Intel-IOMMU, AMD-IOMMU]
>  				do not use PCIe ATS (and IOMMU device IOTLB).
> +		safemode	turns of all optinal PCI features. Useful
> +				for bringup/troubleshooting.

s/optinal/optional/ ?

And you should explain what exactly in PCI is "optional".  Who defines
this and where is that list and what can go wrong if those options are
not enabled?

In looking at your patch, I can't determine that at all, so there's no
way that someone just looking at this sentence will be able to
understand.

thanks,

greg k-h

^ permalink raw reply

* [PATCH] PCI: move early dump functionality from x86 arch into the common code
From: Sinan Kaya @ 2018-05-30  4:34 UTC (permalink / raw)
  To: linux-pci, timur
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Bjorn Helgaas,
	Christoffer Dall, Paul E. McKenney, Marc Zyngier, Kai-Heng Feng,
	Thymo van Beers, Frederic Weisbecker, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, David Rientjes, Kate Stewart,
	Philippe Ombredanne, Tom Lendacky, Juergen Gross, Borislav Petkov,
	Mikulas Patocka, Petr Tesarik, Andy Lutomirski, Dou Liyang,
	Ram Pai, Boris Ostrovsky, open list:DOCUMENTATION, open list

Move the early dump functionality into common code so that it is available
for all architectures. There is no need to carry arch-specific reads around,
as the read hooks are already initialized by the time pci_setup_device() is
called during the scan.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
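For readers unfamiliar with the dump format, the following userspace sketch (an illustration, not part of the patch) mimics the approach: read the first 256 bytes of config space one dword at a time, then print them 16 bytes per row with an offset prefix. fake_read_config_dword() is a hypothetical stand-in for pci_read_config_dword() that reads from a byte array, and the row layout only approximates print_hex_dump().

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CFG_SIZE 256

/* Hypothetical stand-in for pci_read_config_dword(): reads from an array. */
int fake_read_config_dword(const uint8_t *cfg, int where, uint32_t *val)
{
	if (where < 0 || where > CFG_SIZE - 4 || (where & 3))
		return -1;	/* reject out-of-range or misaligned offsets */

	memcpy(val, cfg + where, sizeof(*val));
	return 0;
}

/* Dump 256 bytes, 16 per row; returns the row count or -1 on read error. */
int dump_config_space(const uint8_t *cfg, FILE *out)
{
	uint32_t value[CFG_SIZE / 4];
	const uint8_t *bytes = (const uint8_t *)value;
	int i, rows = 0;

	for (i = 0; i < CFG_SIZE; i += 4) {
		if (fake_read_config_dword(cfg, i, &value[i / 4]))
			return -1;
	}

	for (i = 0; i < CFG_SIZE; i++) {
		if (i % 16 == 0)
			fprintf(out, "%08x:", i);
		fprintf(out, " %02x", bytes[i]);
		if (i % 16 == 15) {
			fputc('\n', out);
			rows++;
		}
	}

	return rows;
}
```

Called on a 256-byte buffer, dump_config_space() writes 16 rows to the given stream.
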
 Documentation/admin-guide/kernel-parameters.txt |  2 +-
 arch/x86/include/asm/pci-direct.h               |  5 ---
 arch/x86/kernel/setup.c                         |  5 ---
 arch/x86/pci/common.c                           |  4 --
 arch/x86/pci/early.c                            | 50 -------------------------
 drivers/pci/pci.c                               |  4 ++
 drivers/pci/pci.h                               |  2 +-
 drivers/pci/probe.c                             | 19 ++++++++++
 8 files changed, 25 insertions(+), 66 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c247612..4459270 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2986,7 +2986,7 @@
 			See also Documentation/blockdev/paride.txt.
 
 	pci=option[,option...]	[PCI] various PCI subsystem options:
-		earlydump	[X86] dump PCI config space before the kernel
+		earlydump	dump PCI config space before the kernel
 				changes anything
 		off		[X86] don't probe for the PCI bus
 		bios		[X86-32] force use of PCI BIOS, don't access
diff --git a/arch/x86/include/asm/pci-direct.h b/arch/x86/include/asm/pci-direct.h
index e1084f7..e5e2129 100644
--- a/arch/x86/include/asm/pci-direct.h
+++ b/arch/x86/include/asm/pci-direct.h
@@ -14,9 +14,4 @@ extern void write_pci_config(u8 bus, u8 slot, u8 func, u8 offset, u32 val);
 extern void write_pci_config_byte(u8 bus, u8 slot, u8 func, u8 offset, u8 val);
 extern void write_pci_config_16(u8 bus, u8 slot, u8 func, u8 offset, u16 val);
 
-extern int early_pci_allowed(void);
-
-extern unsigned int pci_early_dump_regs;
-extern void early_dump_pci_device(u8 bus, u8 slot, u8 func);
-extern void early_dump_pci_devices(void);
 #endif /* _ASM_X86_PCI_DIRECT_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 2f86d88..480f250 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -991,11 +991,6 @@ void __init setup_arch(char **cmdline_p)
 		setup_clear_cpu_cap(X86_FEATURE_APIC);
 	}
 
-#ifdef CONFIG_PCI
-	if (pci_early_dump_regs)
-		early_dump_pci_devices();
-#endif
-
 	e820__reserve_setup_data();
 	e820__finish_early_params();
 
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 563049c..d4ec117 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -22,7 +22,6 @@
 unsigned int pci_probe = PCI_PROBE_BIOS | PCI_PROBE_CONF1 | PCI_PROBE_CONF2 |
 				PCI_PROBE_MMCONF;
 
-unsigned int pci_early_dump_regs;
 static int pci_bf_sort;
 int pci_routeirq;
 int noioapicquirk;
@@ -599,9 +598,6 @@ char *__init pcibios_setup(char *str)
 		pci_probe |= PCI_BIG_ROOT_WINDOW;
 		return NULL;
 #endif
-	} else if (!strcmp(str, "earlydump")) {
-		pci_early_dump_regs = 1;
-		return NULL;
 	} else if (!strcmp(str, "routeirq")) {
 		pci_routeirq = 1;
 		return NULL;
diff --git a/arch/x86/pci/early.c b/arch/x86/pci/early.c
index e5f753c..e20d449 100644
--- a/arch/x86/pci/early.c
+++ b/arch/x86/pci/early.c
@@ -51,53 +51,3 @@ void write_pci_config_16(u8 bus, u8 slot, u8 func, u8 offset, u16 val)
 	outw(val, 0xcfc + (offset&2));
 }
 
-int early_pci_allowed(void)
-{
-	return (pci_probe & (PCI_PROBE_CONF1|PCI_PROBE_NOEARLY)) ==
-			PCI_PROBE_CONF1;
-}
-
-void early_dump_pci_device(u8 bus, u8 slot, u8 func)
-{
-	u32 value[256 / 4];
-	int i;
-
-	pr_info("pci 0000:%02x:%02x.%d config space:\n", bus, slot, func);
-
-	for (i = 0; i < 256; i += 4)
-		value[i / 4] = read_pci_config(bus, slot, func, i);
-
-	print_hex_dump(KERN_INFO, "", DUMP_PREFIX_OFFSET, 16, 1, value, 256, false);
-}
-
-void early_dump_pci_devices(void)
-{
-	unsigned bus, slot, func;
-
-	if (!early_pci_allowed())
-		return;
-
-	for (bus = 0; bus < 256; bus++) {
-		for (slot = 0; slot < 32; slot++) {
-			for (func = 0; func < 8; func++) {
-				u32 class;
-				u8 type;
-
-				class = read_pci_config(bus, slot, func,
-							PCI_CLASS_REVISION);
-				if (class == 0xffffffff)
-					continue;
-
-				early_dump_pci_device(bus, slot, func);
-
-				if (func == 0) {
-					type = read_pci_config_byte(bus, slot,
-								    func,
-							       PCI_HEADER_TYPE);
-					if (!(type & 0x80))
-						break;
-				}
-			}
-		}
-	}
-}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7c03701..ae5a2ae 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -115,6 +115,8 @@ static bool pcie_ari_disabled;
 /* If set, the PCIe ATS capability will not be used. */
 static bool pcie_ats_disabled;
 
+bool pci_early_dump;
+
 bool pci_ats_disabled(void)
 {
 	return pcie_ats_disabled;
@@ -5848,6 +5850,8 @@ static int __init pci_setup(char *str)
 				pcie_ats_disabled = true;
 			} else if (!strcmp(str, "noaer")) {
 				pci_no_aer();
+			} else if (!strcmp(str, "earlydump")) {
+				pci_early_dump = true;
 			} else if (!strncmp(str, "realloc=", 8)) {
 				pci_realloc_get_opt(str + 8);
 			} else if (!strncmp(str, "realloc", 7)) {
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c358e7a0..9c66b7d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -7,7 +7,7 @@
 #define PCI_VSEC_ID_INTEL_TBT	0x1234	/* Thunderbolt */
 
 extern const unsigned char pcie_link_speed[];
-
+extern bool pci_early_dump;
 bool pcie_cap_has_lnkctl(const struct pci_dev *dev);
 
 /* Functions internal to the PCI core code */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 3840207..b1f068d 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1549,6 +1549,22 @@ static int pci_intx_mask_broken(struct pci_dev *dev)
 	return 0;
 }
 
+static void early_dump_pci_device(struct pci_dev *pdev)
+{
+	u32 value[256 / 4];
+	int i;
+
+	dev_info(&pdev->dev, "pci 0000:%02x:%02x.%d config space:\n",
+		 pdev->bus->number, PCI_SLOT(pdev->devfn),
+		 PCI_FUNC(pdev->devfn));
+
+	for (i = 0; i < 256; i += 4)
+		pci_read_config_dword(pdev, i, &value[i / 4]);
+
+	print_hex_dump(KERN_INFO, "", DUMP_PREFIX_OFFSET, 16, 1, value,
+		       256, false);
+}
+
 /**
  * pci_setup_device - Fill in class and map information of a device
  * @dev: the device structure to fill
@@ -1598,6 +1614,9 @@ int pci_setup_device(struct pci_dev *dev)
 	pci_printk(KERN_DEBUG, dev, "[%04x:%04x] type %02x class %#08x\n",
 		   dev->vendor, dev->device, dev->hdr_type, dev->class);
 
+	if (pci_early_dump)
+		early_dump_pci_device(dev);
+
 	/* Need to have dev->class ready */
 	dev->cfg_size = pci_cfg_space_size(dev);
 
-- 
2.7.4


^ permalink raw reply related
