Linux Documentation

Linux Documentation
 help / color / mirror / Atom feed

* Re: [PATCH v2 1/3] usb: gadget: ccid: add support for USB CCID Gadget Device
From: Andrzej Pietrasiewicz @ 2018-05-28  9:58 UTC (permalink / raw)
  To: Marcus Folkesson
  Cc: Greg Kroah-Hartman, Jonathan Corbet, Felipe Balbi, davem,
	Mauro Carvalho Chehab, Andrew Morton, Randy Dunlap,
	Ruslan Bilovol, Thomas Gleixner, Kate Stewart, linux-usb,
	linux-doc, linux-kernel
In-Reply-To: <20180528093254.GC4651@gmail.com>

W dniu 28.05.2018 o 11:32, Marcus Folkesson pisze:
> Hi Andrzej,
> 
> Thank you for reviewing.
> 
> On Mon, May 28, 2018 at 11:12:27AM +0200, Andrzej Pietrasiewicz wrote:
>> W dniu 28.05.2018 o 10:38, Marcus Folkesson pisze:
>>> Hi Andrzej,
>>>
>>> On Mon, May 28, 2018 at 09:04:51AM +0200, Andrzej Pietrasiewicz wrote:
>>>> Mi Marcus,
>>>>
>>>> W dniu 26.05.2018 o 23:19, Marcus Folkesson pisze:
>>>>> Chip Card Interface Device (CCID) protocol is a USB protocol that
>>>>> allows a smartcard device to be connected to a computer via a card
>>>>> reader using a standard USB interface, without the need for each manufacturer
>>>>> of smartcards to provide its own reader or protocol.
>>>>>
>>>>> This gadget driver makes Linux show up as a CCID device to the host and let a
>>>>> userspace daemon act as the smartcard.
>>>>>
>>>>> This is useful when the Linux gadget itself should act as a cryptographic
>>>>> device or forward APDUs to an embedded smartcard device.
>>>>>
>>>>> Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
>>>>> ---
>>>>
>>>>>     
>>>>> +config USB_CONFIGFS_CCID
>>>>> +	bool "Chip Card Interface Device (CCID)"
>>>>> +	depends on USB_CONFIGFS
>>>>> +	select USB_F_CCID
>>>>> +	help
>>>>> +	  The CCID function driver provides generic emulation of a
>>>>> +	  Chip Card Interface Device (CCID).
>>>>> +
>>>>> +	  You will need a user space server talking to /dev/ccidg*,
>>>>> +	  since the kernel itself does not implement CCID/TPDU/APDU
>>>>> +	  protocol.
>>>>
>>>> Your function needs a userspace daemon to work.
>>>> It seems you want to use FunctionFS for such a purpose
>>>> instead of creating a new function.
>>>>
>>>> Andrzej
>>>
>>>>> +	  since the kernel itself does not implement CCID/TPDU/APDU
>>> Oops, the driver does handle CCID.
>>
>> Which parts of code do this handling?
> 
> My bad, I was thinking about the USB descriptors and endpoints setup.
> That is of cause not part of the CCID protocol.
> 
>>
>> Is there any kind of state machine usual for protocols?
>> If the protocol is stateless then isn't it just a data format then?
> 
> The protocol is stateless.
> 
>>
>> Which part of this handling must be done in kernel and why?
>>
>> Does the said handling do anything other than forwarding the
>> traffic between USB and a character device?
> 
> No, it forward the CCID messages to the character device to be handled
> by the application.
> 
>>

My opinion is: this wants to be done with FunctionFS.

Andrzej
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2
From: Peter Zijlstra @ 2018-05-28 12:45 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli
In-Reply-To: <9f547771-54e8-118e-80f7-48f99c7b0a12@redhat.com>

On Thu, May 24, 2018 at 02:55:25PM -0400, Waiman Long wrote:
> On 05/24/2018 11:43 AM, Peter Zijlstra wrote:

> > I'm confused... why exactly do we have both domain and load_balance ?
> 
> The domain is for partitioning the CPUs only. It doesn't change the load
> balancing state. So the load_balance flag is still need to turn on and
> off load balancing.

OK, so we have to two boolean flags, giving 4 possible states. Lets just
go through them one by on:

A) domain:0 load_balance:0 -- we have no exclusive domain, but have
   load-balancing disabled across them. AFAICT this should be an invalid
   state.

B) domain:0 load_balance:1 -- we have no exclusive domain, but have
   load-balancing enabled. AFAICT this is the default state and is a
   no-op.

C) domain:1 load_balance:0 -- we have an exclusive domain, and have
   load-balancing disabled across it. This is, AFAICT, identical to
   having a bunch of sub/sibling groups each with a single CPU domain.

D) domain:1 load_balance:1 -- we have an exclusive domain, and have
   load-balancing enabled. This is a partition.

Now, I think I've overlooked the fact that load_balance==1 only really
means something when the parent's load_balance==0, but I'm not sure that
really changes anything.

So, afaict, the above only have two useful states: B and D. Which again
raises the question, why two knobs? What useful configurations does it
allow?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: doc: Italian translation
From: Jonathan Corbet @ 2018-05-28 13:08 UTC (permalink / raw)
  To: Federico Vaga; +Cc: linux-doc, linux-kernel, Alessia Mantegazza
In-Reply-To: <20180527145559.16411-1-federico.vaga@vaga.pv.it>

On Sun, 27 May 2018 16:55:55 +0200
Federico Vaga <federico.vaga@vaga.pv.it> wrote:

> here the doc-guide translated in Italian. This set of patches includes
> some minor changes to the main one. The idea of this first set of patches
> is also to adjust the structure and our expectations.

I've only done a quick scan so far.  It looks reasonable, but I have a
couple of requests:

 - Please include a proper changelog with each patch describing what the
   patch does and why.  Otherwise I'll have to fill them in, and that
   makes me grumpy...:)

 - You might as well add a MAINTAINERS entry.  The other translations
   don't have them, but they should.

 - For extra credit: it's time to move the translations out of the
   top-level index into an index.rst in the translations directory.  Then
   the top-level file can have a single pointer to the translations.

> We tried to translate everything in **Italian**; which means that we avoided
> imported English words wherever there is the possibility to use the Italian
> ones. Of course, this is not always possible, or worst doing the translation
> make it less clear, for example the word "file".

"We"?  Is there somebody else's work represented here too?  If so, they
should appear in the signoff chain (or as a Co-developed-by credit).

> Another point. I noticed that the disclaimer for other translations is
> in English, so I did the same. But actually shouldn't be in Italian, since
> that page will be read by Italian speakers?

Perhaps, though part of the idea is to give the English speakers, too,
some vague clue of what they're looking at.

Thanks,

jon
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: doc: Italian translation
From: Federico Vaga @ 2018-05-28 13:22 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Alessia Mantegazza
In-Reply-To: <20180528070828.3a3c7287@lwn.net>

On Monday, 28 May 2018 15:08:28 CEST Jonathan Corbet wrote:
> On Sun, 27 May 2018 16:55:55 +0200
> 
> Federico Vaga <federico.vaga@vaga.pv.it> wrote:
> > here the doc-guide translated in Italian. This set of patches
> > includes some minor changes to the main one. The idea of this
> > first set of patches is also to adjust the structure and our
> > expectations.
> 
> I've only done a quick scan so far.  It looks reasonable, but I have
> a couple of requests:
> 
>  - Please include a proper changelog with each patch describing what
> the patch does and why.  Otherwise I'll have to fill them in, and
> that makes me grumpy...:)

Ok

>  - You might as well add a MAINTAINERS entry.  The other
> translations don't have them, but they should.

Ok

>  - For extra credit: it's time to move the translations out of the
>    top-level index into an index.rst in the translations directory. 
> Then the top-level file can have a single pointer to the
> translations.

I completely agree on this point. I can prepare a patch that does it.

> > We tried to translate everything in **Italian**; which means that
> > we avoided imported English words wherever there is the
> > possibility to use the Italian ones. Of course, this is not
> > always possible, or worst doing the translation make it less
> > clear, for example the word "file".
> 
> "We"?  Is there somebody else's work represented here too?  If so,
> they should appear in the signoff chain (or as a Co-developed-by
> credit).

I did, in patch 3/5 and 4/5 there is Alessia as "signed-off" (in CC 
here).

> > Another point. I noticed that the disclaimer for other
> > translations is in English, so I did the same. But actually
> > shouldn't be in Italian, since that page will be read by Italian
> > speakers?
> 
> Perhaps, though part of the idea is to give the English speakers,
> too, some vague clue of what they're looking at.

Make sense. I will keep it like this then.

Thanks

> 
> Thanks,
> 
> jon


-- 
Federico Vaga
http://www.federicovaga.it/


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v2] x86/xen: Combine PV features to be disabled in xen_nopv
From: Joao Martins @ 2018-05-28 17:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Konrad Rzeszutek Wilk, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Juergen Gross, Boris Ostrovsky, xen-devel,
	Jonathan Corbet, linux-doc, Joao Martins

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Extend 'xen_nopv' parameter to disable individual features in
the following format:

xen_nopv={ [spin,][ipi] | all }

'spin' to disable PV spinlocks
'ipi' to disable PV IPI
'all' to disable all of the above and PV drivers

'all' ideally would be the set of features that can be disabled with
xen_nopv. Albeit it is disabling PV features and drivers because it has
been the behaviour of all past kernels. Thus users should not see a
difference when specifying no value. Also, deprecate 'xen_nopvspin' as
we are making it part of 'xen_nopv'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Changes since RFC:
 (https://lists.xenproject.org/archives/html/xen-devel/2015-10/msg00898.html
  Comments from Boris, and new changes)
 - Improve subject and commit message
 - Fix Documentation format and deprecate 'xen_nopvspin' instead of removing it
 - Make existing 'xen_nopvspin' command line use the same masking
 helpers as xen_nopv
 - Use xen_nopv_ipi() accordingly
 - Do not create unused irqs when disabling PV IPI
---
 Documentation/admin-guide/kernel-parameters.txt | 13 ++++--
 arch/x86/xen/enlighten_hvm.c                    | 55 ++++++++++++++++++++++---
 arch/x86/xen/smp.c                              | 21 ++++++----
 arch/x86/xen/smp_hvm.c                          | 14 +++++--
 arch/x86/xen/spinlock.c                         |  9 ++--
 arch/x86/xen/xen-ops.h                          |  9 ++++
 6 files changed, 96 insertions(+), 25 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f2040d46f095..3f29ee4741bb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4807,12 +4807,19 @@
 			never -- do not unplug even if version check succeeds
 
 	xen_nopvspin	[X86,XEN]
+			[Deprecated - use xen_nopv=spin]
 			Disables the ticketlock slowpath using Xen PV
 			optimizations.
 
-	xen_nopv	[X86]
-			Disables the PV optimizations forcing the HVM guest to
-			run as generic HVM guest with no PV drivers.
+	xen_nopv=	[X86,XEN]
+			Disables various (or all) PV optimizations forcing the
+			HVM (or PV) guest to run without them. No value
+			specified defaults to 'all'.
+			Format: { [spin,][ipi,] | all }
+			all -- every PV feature on HVM, including PV drivers.
+			spin -- Disables the ticketlock slowpath using Xen PV
+			         optimizations (PV and HVM).
+			ipi -- Disable PV IPIs (on HVM).
 
 	xirc2ps_cs=	[NET,PCMCIA]
 			Format:
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 19c1ff542387..c396671dadd5 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -208,17 +208,62 @@ static void __init xen_hvm_guest_init(void)
 #endif
 }
 
-static bool xen_nopv;
+static unsigned int xen_nopv_feat;
+
+static bool xen_nopv(void)
+{
+	return xen_nopv_feat == XEN_NOPV_ALL;
+}
+
+bool xen_nopv_ipi(void)
+{
+	return xen_nopv_feat & XEN_NOPV_IPI;
+}
+
+bool xen_nopv_spin(void)
+{
+	return xen_nopv_feat & XEN_NOPV_SPIN;
+}
+
+void xen_set_nopv(unsigned int mask)
+{
+	xen_nopv_feat |= mask;
+}
+
 static __init int xen_parse_nopv(char *arg)
 {
-       xen_nopv = true;
-       return 0;
+	char *p, *q;
+	int l;
+
+	xen_nopv_feat = arg ? 0 : XEN_NOPV_ALL;
+
+	for (p = arg; p; p = q) {
+		q = strchr(p, ',');
+		if (q) {
+			l = q - p;
+			q++;
+		} else {
+			l = strlen(p);
+		}
+		if (!strncmp(p, "spin", l))
+			xen_nopv_feat |= XEN_NOPV_SPIN;
+		else if (!strncmp(p, "ipi", l))
+			xen_nopv_feat |= XEN_NOPV_IPI;
+		else if (!strncmp(p, "all", l))
+			xen_nopv_feat = XEN_NOPV_ALL;
+		else
+			pr_warn("unrecognised option '%s' for 'xen_nopv'\n", p);
+	}
+
+	pr_debug("xen_nopv_feat = 0x%x\n", xen_nopv_feat);
+
+	return 0;
 }
 early_param("xen_nopv", xen_parse_nopv);
 
 bool xen_hvm_need_lapic(void)
 {
-	if (xen_nopv)
+	if (xen_nopv())
 		return false;
 	if (xen_pv_domain())
 		return false;
@@ -232,7 +277,7 @@ EXPORT_SYMBOL_GPL(xen_hvm_need_lapic);
 
 static uint32_t __init xen_platform_hvm(void)
 {
-	if (xen_pv_domain() || xen_nopv)
+	if (xen_pv_domain() || xen_nopv())
 		return 0;
 
 	return xen_cpuid_base();
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7a43b2ae19f1..e67a941cd20f 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -64,6 +64,18 @@ int xen_smp_intr_init(unsigned int cpu)
 	int rc;
 	char *resched_name, *callfunc_name, *debug_name;
 
+	debug_name = kasprintf(GFP_KERNEL, "debug%d", cpu);
+	rc = bind_virq_to_irqhandler(VIRQ_DEBUG, cpu, xen_debug_interrupt,
+				     IRQF_PERCPU | IRQF_NOBALANCING,
+				     debug_name, NULL);
+	if (rc < 0)
+		goto fail;
+	per_cpu(xen_debug_irq, cpu).irq = rc;
+	per_cpu(xen_debug_irq, cpu).name = debug_name;
+
+	if (xen_hvm_domain() && xen_nopv_ipi())
+		return 0;
+
 	resched_name = kasprintf(GFP_KERNEL, "resched%d", cpu);
 	rc = bind_ipi_to_irqhandler(XEN_RESCHEDULE_VECTOR,
 				    cpu,
@@ -88,15 +100,6 @@ int xen_smp_intr_init(unsigned int cpu)
 	per_cpu(xen_callfunc_irq, cpu).irq = rc;
 	per_cpu(xen_callfunc_irq, cpu).name = callfunc_name;
 
-	debug_name = kasprintf(GFP_KERNEL, "debug%d", cpu);
-	rc = bind_virq_to_irqhandler(VIRQ_DEBUG, cpu, xen_debug_interrupt,
-				     IRQF_PERCPU | IRQF_NOBALANCING,
-				     debug_name, NULL);
-	if (rc < 0)
-		goto fail;
-	per_cpu(xen_debug_irq, cpu).irq = rc;
-	per_cpu(xen_debug_irq, cpu).name = debug_name;
-
 	callfunc_name = kasprintf(GFP_KERNEL, "callfuncsingle%d", cpu);
 	rc = bind_ipi_to_irqhandler(XEN_CALL_FUNCTION_SINGLE_VECTOR,
 				    cpu,
diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c
index f8d39440b292..fb550b4cd273 100644
--- a/arch/x86/xen/smp_hvm.c
+++ b/arch/x86/xen/smp_hvm.c
@@ -66,11 +66,19 @@ void __init xen_hvm_smp_init(void)
 	if (!xen_have_vector_callback)
 		return;
 
+	smp_ops.smp_cpus_done = xen_smp_cpus_done;
+	smp_ops.smp_prepare_boot_cpu = xen_hvm_smp_prepare_boot_cpu;
 	smp_ops.smp_prepare_cpus = xen_hvm_smp_prepare_cpus;
-	smp_ops.smp_send_reschedule = xen_smp_send_reschedule;
 	smp_ops.cpu_die = xen_hvm_cpu_die;
+
+	if (xen_nopv_ipi()) {
+		pr_debug("xen: PV IPI disabled\n");
+		return;
+	}
+
+	pr_debug("xen: PV IPI enabled\n");
+
+	smp_ops.smp_send_reschedule = xen_smp_send_reschedule;
 	smp_ops.send_call_func_ipi = xen_smp_send_call_function_ipi;
 	smp_ops.send_call_func_single_ipi = xen_smp_send_call_function_single_ipi;
-	smp_ops.smp_prepare_boot_cpu = xen_hvm_smp_prepare_boot_cpu;
-	smp_ops.smp_cpus_done = xen_smp_cpus_done;
 }
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index cd97a62394e7..72499a5ef06c 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -21,7 +21,6 @@
 
 static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
 static DEFINE_PER_CPU(char *, irq_name);
-static bool xen_pvspin = true;
 
 static void xen_qlock_kick(int cpu)
 {
@@ -80,7 +79,7 @@ void xen_init_lock_cpu(int cpu)
 	int irq;
 	char *name;
 
-	if (!xen_pvspin) {
+	if (xen_nopv_spin()) {
 		if (cpu == 0)
 			static_branch_disable(&virt_spin_lock_key);
 		return;
@@ -108,7 +107,7 @@ void xen_init_lock_cpu(int cpu)
 
 void xen_uninit_lock_cpu(int cpu)
 {
-	if (!xen_pvspin)
+	if (xen_nopv_spin())
 		return;
 
 	unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL);
@@ -130,7 +129,7 @@ PV_CALLEE_SAVE_REGS_THUNK(xen_vcpu_stolen);
 void __init xen_init_spinlocks(void)
 {
 
-	if (!xen_pvspin) {
+	if (xen_nopv_spin()) {
 		printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
 		return;
 	}
@@ -146,7 +145,7 @@ void __init xen_init_spinlocks(void)
 
 static __init int xen_parse_nopvspin(char *arg)
 {
-	xen_pvspin = false;
+	xen_set_nopv(XEN_NOPV_SPIN);
 	return 0;
 }
 early_param("xen_nopvspin", xen_parse_nopvspin);
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 3b34745d0a52..d212cf034541 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -164,4 +164,13 @@ void xen_hvm_post_suspend(int suspend_cancelled);
 static inline void xen_hvm_post_suspend(int suspend_cancelled) {}
 #endif
 
+#define XEN_NOPV_PLATFORM (1<<0)
+#define XEN_NOPV_SPIN     (1<<1)
+#define XEN_NOPV_IPI      (1<<2)
+#define XEN_NOPV_ALL      (XEN_NOPV_PLATFORM | XEN_NOPV_SPIN | XEN_NOPV_IPI)
+
+void xen_set_nopv(unsigned int mask);
+bool xen_nopv_spin(void);
+bool xen_nopv_ipi(void);
+
 #endif /* XEN_OPS_H */
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2
From: Waiman Long @ 2018-05-28 18:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli
In-Reply-To: <20180528124508.GE3452@worktop.programming.kicks-ass.net>

On 05/28/2018 08:45 AM, Peter Zijlstra wrote:
> On Thu, May 24, 2018 at 02:55:25PM -0400, Waiman Long wrote:
>> On 05/24/2018 11:43 AM, Peter Zijlstra wrote:
>>> I'm confused... why exactly do we have both domain and load_balance ?
>> The domain is for partitioning the CPUs only. It doesn't change the load
>> balancing state. So the load_balance flag is still need to turn on and
>> off load balancing.
> OK, so we have to two boolean flags, giving 4 possible states. Lets just
> go through them one by on:
>
> A) domain:0 load_balance:0 -- we have no exclusive domain, but have
>    load-balancing disabled across them. AFAICT this should be an invalid
>    state.
>
> B) domain:0 load_balance:1 -- we have no exclusive domain, but have
>    load-balancing enabled. AFAICT this is the default state and is a
>    no-op.
>
> C) domain:1 load_balance:0 -- we have an exclusive domain, and have
>    load-balancing disabled across it. This is, AFAICT, identical to
>    having a bunch of sub/sibling groups each with a single CPU domain.
>
> D) domain:1 load_balance:1 -- we have an exclusive domain, and have
>    load-balancing enabled. This is a partition.
>
> Now, I think I've overlooked the fact that load_balance==1 only really
> means something when the parent's load_balance==0, but I'm not sure that
> really changes anything.
>
> So, afaict, the above only have two useful states: B and D. Which again
> raises the question, why two knobs? What useful configurations does it
> allow?

I am working on the v9 patch, and below is the current draft of the
documentation. Hopefully that will clarify some of the concepts that we
are discussing here.

  cpuset.sched.domain_root
        A read-write single value file which exists on non-root
        cpuset-enabled cgroups.  It is a binary value flag that accepts
        either "0" (off) or "1" (on).  This flag is set by the parent
        and is not delegatable.

        If set, it indicates that the current cgroup is the root of a
        new scheduling domain or partition that comprises itself and
        all its descendants except those that are scheduling domain
        roots themselves and their descendants.  The root cgroup is
        always a scheduling domain root.

        There are constraints on where this flag can be set.  It can
        only be set in a cgroup if all the following conditions are true.

        1) The "cpuset.cpus" is not empty and the list of CPUs are
           exclusive, i.e. they are not shared by any of its siblings.
        2) The parent cgroup is also a scheduling domain root.
        3) There is no child cgroups with cpuset enabled.  This is
           for eliminating corner cases that have to be handled if such
           a condition is allowed.

        Setting this flag will take the CPUs away from the effective
        CPUs of the parent cgroup.  Once it is set, this flag cannot
        be cleared if there are any child cgroups with cpuset enabled.
        Further changes made to "cpuset.cpus" is allowed as long as
        the first condition above is still true.

        A parent scheduling domain root cgroup cannot distribute all
        its CPUs to its child scheduling domain root cgroups unless
        its load balancing flag is turned off.

  cpuset.sched.load_balance
        A read-write single value file which exists on non-root
        cpuset-enabled cgroups.  It is a binary value flag that accepts
        either "0" (off) or "1" (on).  This flag is set by the parent
        and is not delegatable.  It is on by default in the root cgroup.

        When it is on, tasks within this cpuset will be load-balanced
        by the kernel scheduler.  Tasks will be moved from CPUs with
        high load to other CPUs within the same cpuset with less load
        periodically.

        When it is off, there will be no load balancing among CPUs on
        this cgroup.  Tasks will stay in the CPUs they are running on
        and will not be moved to other CPUs.

        The load balancing state of a cgroup can only be changed on a
        scheduling domain root cgroup with no cpuset-enabled children.
        All cgroups within a scheduling domain or partition must have
        the same load balancing state.  As descendant cgroups of a
        scheduling domain root are created, they inherit the same load
        balancing state of their root.

The main purpose of using a new domain_root flag is to enable user to
create new partitions without the trick of disabling load_balance in the
parent and enabling it in the child. Now, we can create as many
partitions as we want without ever turning off load balancing in any of
the cpusets. I find it to be more straight forward and easier to
understand than using the load_balance trick.

Of course, turning off load balancing is still useful in some use cases,
so it is supported. To simplify thing, it is mandated that all the
cpusets within a partition must have the same load balancing state. This
is to ensure that we can't use the load_balance trick to create
additional partition underneath it. The domain_root flag is the only way
to create partition.

A) domain_root: 0, load_balance: 0 -- a non-domain root cpuset within a
no load balancing partition.

B) domain_root: 0, load_balance: 1 -- a non-domain root cpuset within a
load balancing partition.

C) domain_root: 1, load_balance: 0 -- a domain root cpuset of a no load
balancing partition.

D) domain_root: 1, load_balance: 1 -- a domain root cpuset of a load
balancing partition.

Hope this help.

Cheers,
Longman



--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFT v3 0/4] Perf script: Add python script for CoreSight trace disassembler
From: Arnaldo Carvalho de Melo @ 2018-05-28 20:03 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mathieu Poirier, Jonathan Corbet, Robert Walker, mike.leach,
	kim.phillips, Tor Jeremiassen, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	linux-doc, linux-kernel, coresight
In-Reply-To: <1527497103-3593-1-git-send-email-leo.yan@linaro.org>

Em Mon, May 28, 2018 at 04:44:59PM +0800, Leo Yan escreveu:
> This patch series is to support for using 'perf script' for CoreSight
> trace disassembler, for this purpose this patch series adds a new
> python script to parse CoreSight tracing event and use command 'objdump'
> for disassembled lines, finally this can generate readable program
> execution flow for reviewing tracing data.
> 
> Patch 0001 is one fixing patch to generate samples for the start packet
> and exception packets.
> 
> Patch 0002 is the prerequisite to add addr into sample dict, so this
> value can be used by python script to analyze instruction range.
> 
> Patch 0003 is to add python script for trace disassembler.
> 
> Patch 0004 is to add doc to explain python script usage and give
> example for it.
> 
> This patch series has been rebased on acme git tree [1] with the commit
> 19422a9f2a3b ("perf tools: Fix kernel_start for PTI on x86") and tested
> on Hikey (ARM64 octa CA53 cores).

Thanks, applied to perf/core.

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFT v3 0/4] Perf script: Add python script for CoreSight trace disassembler
From: Mathieu Poirier @ 2018-05-28 21:53 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Leo Yan, Jonathan Corbet, Robert Walker, Mike Leach, Kim Phillips,
	Tor Jeremiassen, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel,
	open list:DOCUMENTATION, Linux Kernel Mailing List, coresight
In-Reply-To: <20180528200326.GJ25467@kernel.org>

On 28 May 2018 at 14:03, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> Em Mon, May 28, 2018 at 04:44:59PM +0800, Leo Yan escreveu:
>> This patch series is to support for using 'perf script' for CoreSight
>> trace disassembler, for this purpose this patch series adds a new
>> python script to parse CoreSight tracing event and use command 'objdump'
>> for disassembled lines, finally this can generate readable program
>> execution flow for reviewing tracing data.
>>
>> Patch 0001 is one fixing patch to generate samples for the start packet
>> and exception packets.
>>
>> Patch 0002 is the prerequisite to add addr into sample dict, so this
>> value can be used by python script to analyze instruction range.
>>
>> Patch 0003 is to add python script for trace disassembler.
>>
>> Patch 0004 is to add doc to explain python script usage and give
>> example for it.
>>
>> This patch series has been rebased on acme git tree [1] with the commit
>> 19422a9f2a3b ("perf tools: Fix kernel_start for PTI on x86") and tested
>> on Hikey (ARM64 octa CA53 cores).
>
> Thanks, applied to perf/core.

Please hold off on that Arnaldo - I'm currently reviewing the set and
I think some things can be improved.

Thanks,
Mathieu

>
> - Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFT v3 1/4] perf cs-etm: Generate branch sample for missed packets
From: Mathieu Poirier @ 2018-05-28 22:13 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jonathan Corbet, Robert Walker,
	mike.leach, kim.phillips, Tor Jeremiassen, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-doc, linux-kernel, coresight
In-Reply-To: <1527497103-3593-2-git-send-email-leo.yan@linaro.org>

Leo and/or Robert,

On Mon, May 28, 2018 at 04:45:00PM +0800, Leo Yan wrote:
> Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
> traces") reworks the samples generation flow from CoreSight trace to
> match the correct format so Perf report tool can display the samples
> properly.
> 
> But the change has side effect for branch packet handling, it only
> generate branch samples by checking previous packet flag
> 'last_instr_taken_branch' is true, this results in below three kinds
> packets are missed to generate branch samples:
> 
> - The start tracing packet at the beginning of tracing data;
> - The exception handling packet;
> - If one CS_ETM_TRACE_ON packet is inserted, we also miss to handle it
>   for branch samples.  CS_ETM_TRACE_ON packet itself can give the info
>   that there have a discontinuity in the trace, on the other hand we
>   also miss to generate proper branch sample for packets before and
>   after CS_ETM_TRACE_ON packet.
> 
> This patch is to add branch sample handling for up three kinds packets:
> 
> - In function cs_etm__sample(), check if 'prev_packet->sample_type' is
>   zero and in this case it generates branch sample for the start tracing
>   packet; furthermore, we also need to handle the condition for
>   prev_packet::end_addr is zero in the cs_etm__last_executed_instr();
> 
> - In function cs_etm__sample(), check if 'prev_packet->exc' is true and
>   generate branch sample for exception handling packet;
> 
> - If there has one CS_ETM_TRACE_ON packet is coming, we firstly generate
>   branch sample in the function cs_etm__flush(), this can save complete
>   info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON
>   packet.  We also generate branch sample for the new CS_ETM_RANGE
>   packet after CS_ETM_TRACE_ON packet, this have two purposes, the
>   first one purpose is to save the info for the new CS_ETM_RANGE packet,
>   the second purpose is to save CS_ETM_TRACE_ON packet info so we can
>   have hint for a discontinuity in the trace.
> 
>   For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and
>   'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in
>   the decoder layer as dummy value.  This patch is to convert these
>   values to zeros for more readable; this is accomplished by functions
>   cs_etm__last_executed_instr() and cs_etm__first_executed_instr().  The
>   later one is a new function introduced by this patch.
> 
> Reviewed-by: Robert Walker <robert.walker@arm.com>
> Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 93 +++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 73 insertions(+), 20 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 822ba91..8418173 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -495,6 +495,20 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
>  static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
>  {
>  	/*
> +	 * The packet is the start tracing packet if the end_addr is zero,
> +	 * returns 0 for this case.
> +	 */
> +	if (!packet->end_addr)
> +		return 0;

What is considered to be the "start tracing packet"?  Right now the only two
kind of packets inserted in the decoder packet buffer queue are INST_RANGE and
TRACE_ON.  How can we hit a condition where packet->end-addr == 0?


> +
> +	/*
> +	 * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
> +	 * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
> +	 */
> +	if (packet->end_addr == 0xdeadbeefdeadbeefUL)
> +		return 0;

As it is with the above, I find triggering on addresses to be brittle and hard
to maintain on the long run.  Packets all have a sample_type field that should
be used in cases like this one.  That way we know exactly the condition that is
targeted.

While working on this set, please spin-off another patch that defines
CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL and replace all the cases where the
numeral is used.  That way we stop using the hard coded value.

> +
> +	/*
>  	 * The packet records the execution range with an exclusive end address
>  	 *
>  	 * A64 instructions are constant size, so the last executed
> @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
>  	return packet->end_addr - A64_INSTR_SIZE;
>  }
>  
> +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
> +{
> +	/*
> +	 * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
> +	 * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
> +	 */
> +	if (packet->start_addr == 0xdeadbeefdeadbeefUL)
> +		return 0;

Same comment as above.

> +
> +	return packet->start_addr;
> +}
> +
>  static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
>  {
>  	/*
> @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
>  
>  	be       = &bs->entries[etmq->last_branch_pos];
>  	be->from = cs_etm__last_executed_instr(etmq->prev_packet);
> -	be->to	 = etmq->packet->start_addr;
> +	be->to	 = cs_etm__first_executed_instr(etmq->packet);
>  	/* No support for mispredict */
>  	be->flags.mispred = 0;
>  	be->flags.predicted = 1;
> @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
>  	sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
>  	sample.pid = etmq->pid;
>  	sample.tid = etmq->tid;
> -	sample.addr = etmq->packet->start_addr;
> +	sample.addr = cs_etm__first_executed_instr(etmq->packet);
>  	sample.id = etmq->etm->branches_id;
>  	sample.stream_id = etmq->etm->branches_id;
>  	sample.period = 1;
> @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>  		etmq->period_instructions = instrs_over;
>  	}
>  
> -	if (etm->sample_branches &&
> -	    etmq->prev_packet &&
> -	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> -	    etmq->prev_packet->last_instr_taken_branch) {
> -		ret = cs_etm__synth_branch_sample(etmq);
> -		if (ret)
> -			return ret;
> +	if (etm->sample_branches && etmq->prev_packet) {
> +		bool generate_sample = false;
> +
> +		/* Generate sample for start tracing packet */
> +		if (etmq->prev_packet->sample_type == 0 ||

What kind of packet is sample_type == 0 ?

> +		    etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
> +			generate_sample = true;
> +
> +		/* Generate sample for exception packet */
> +		if (etmq->prev_packet->exc == true)
> +			generate_sample = true;

Please don't do that.  Exception packets have a type of their own and can be
added to the decoder packet queue the same way INST_RANGE and TRACE_ON packets
are.  Moreover exception packet containt an address that, if I'm reading the
documenation properly, can be used to keep track of instructions that were
executed between the last address of the previous range packet and the address
executed just before the exception occurred.  Mike and Rob will have to confirm
this as the decoder may be doing all that hard work for us.

> +
> +		/* Generate sample for normal branch packet */
> +		if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> +		    etmq->prev_packet->last_instr_taken_branch)
> +			generate_sample = true;
> +
> +		if (generate_sample) {
> +			ret = cs_etm__synth_branch_sample(etmq);
> +			if (ret)
> +				return ret;
> +		}
>  	}
>  
>  	if (etm->sample_branches || etm->synth_opts.last_branch) {
> @@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
>  static int cs_etm__flush(struct cs_etm_queue *etmq)
>  {
>  	int err = 0;
> +	struct cs_etm_auxtrace *etm = etmq->etm;
>  	struct cs_etm_packet *tmp;
>  
> -	if (etmq->etm->synth_opts.last_branch &&
> -	    etmq->prev_packet &&
> -	    etmq->prev_packet->sample_type == CS_ETM_RANGE) {
> +	if (!etmq->prev_packet)
> +		return 0;
> +
> +	if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
> +		return 0;
> +
> +	if (etmq->etm->synth_opts.last_branch) {

If you add:

        if (!etmq->etm->synth_opts.last_branch)
                return 0;

You can avoid indenting the whole block.

>  		/*
>  		 * Generate a last branch event for the branches left in the
>  		 * circular buffer at the end of the trace.
> @@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
>  		err = cs_etm__synth_instruction_sample(
>  			etmq, addr,
>  			etmq->period_instructions);
> +		if (err)
> +			return err;
>  		etmq->period_instructions = 0;
> +	}
>  
> -		/*
> -		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> -		 * the next incoming packet.
> -		 */
> -		tmp = etmq->packet;
> -		etmq->packet = etmq->prev_packet;
> -		etmq->prev_packet = tmp;
> +	if (etm->sample_branches) {
> +		err = cs_etm__synth_branch_sample(etmq);
> +		if (err)
> +			return err;
>  	}
>  
> -	return err;
> +	/*
> +	 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> +	 * the next incoming packet.
> +	 */
> +	tmp = etmq->packet;
> +	etmq->packet = etmq->prev_packet;
> +	etmq->prev_packet = tmp;

Robert, I remember noticing that when you first submitted the code but forgot to
go back to it.  What is the point of swapping the packets?  I understand

etmq->prev_packet = etmq->packet;

But not 

etmq->packet = tmp;

After all etmq->packet will be clobbered as soon as cs_etm_decoder__get_packet()
is called, which is alwasy right after either cs_etm__sample() or
cs_etm__flush().

Thanks,
Mathieu



> +	return 0;
>  }
>  
>  static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
> -- 
> 2.7.4
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFT v3 1/4] perf cs-etm: Generate branch sample for missed packets
From: Leo Yan @ 2018-05-29  0:25 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Arnaldo Carvalho de Melo, Jonathan Corbet, Robert Walker,
	mike.leach, kim.phillips, Tor Jeremiassen, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-doc, linux-kernel, coresight
In-Reply-To: <20180528221347.GA4109@xps15>

Hi Mathieu,

On Mon, May 28, 2018 at 04:13:47PM -0600, Mathieu Poirier wrote:
> Leo and/or Robert,
> 
> On Mon, May 28, 2018 at 04:45:00PM +0800, Leo Yan wrote:
> > Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
> > traces") reworks the samples generation flow from CoreSight trace to
> > match the correct format so Perf report tool can display the samples
> > properly.
> > 
> > But the change has side effect for branch packet handling, it only
> > generate branch samples by checking previous packet flag
> > 'last_instr_taken_branch' is true, this results in below three kinds
> > packets are missed to generate branch samples:
> > 
> > - The start tracing packet at the beginning of tracing data;
> > - The exception handling packet;
> > - If one CS_ETM_TRACE_ON packet is inserted, we also miss to handle it
> >   for branch samples.  CS_ETM_TRACE_ON packet itself can give the info
> >   that there have a discontinuity in the trace, on the other hand we
> >   also miss to generate proper branch sample for packets before and
> >   after CS_ETM_TRACE_ON packet.
> > 
> > This patch is to add branch sample handling for up three kinds packets:
> > 
> > - In function cs_etm__sample(), check if 'prev_packet->sample_type' is
> >   zero and in this case it generates branch sample for the start tracing
> >   packet; furthermore, we also need to handle the condition for
> >   prev_packet::end_addr is zero in the cs_etm__last_executed_instr();
> > 
> > - In function cs_etm__sample(), check if 'prev_packet->exc' is true and
> >   generate branch sample for exception handling packet;
> > 
> > - If there has one CS_ETM_TRACE_ON packet is coming, we firstly generate
> >   branch sample in the function cs_etm__flush(), this can save complete
> >   info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON
> >   packet.  We also generate branch sample for the new CS_ETM_RANGE
> >   packet after CS_ETM_TRACE_ON packet, this have two purposes, the
> >   first one purpose is to save the info for the new CS_ETM_RANGE packet,
> >   the second purpose is to save CS_ETM_TRACE_ON packet info so we can
> >   have hint for a discontinuity in the trace.
> > 
> >   For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and
> >   'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in
> >   the decoder layer as dummy value.  This patch is to convert these
> >   values to zeros for more readable; this is accomplished by functions
> >   cs_etm__last_executed_instr() and cs_etm__first_executed_instr().  The
> >   later one is a new function introduced by this patch.
> > 
> > Reviewed-by: Robert Walker <robert.walker@arm.com>
> > Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/util/cs-etm.c | 93 +++++++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 73 insertions(+), 20 deletions(-)
> > 
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 822ba91..8418173 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -495,6 +495,20 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
> >  static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> >  {
> >  	/*
> > +	 * The packet is the start tracing packet if the end_addr is zero,
> > +	 * returns 0 for this case.
> > +	 */
> > +	if (!packet->end_addr)
> > +		return 0;
> 
> What is considered to be the "start tracing packet"?  Right now the only two
> kind of packets inserted in the decoder packet buffer queue are INST_RANGE and
> TRACE_ON.  How can we hit a condition where packet->end-addr == 0?

When the first CS_ETM_RANGE packet is coming, etmq->prev_packet is
initialized by the function cs_etm__alloc_queue(), so
etmq->prev_packet->end_addr is zero:

    etmq->prev_packet = zalloc(szp);

As you mentioned, we should only have two kind of packets for
CS_ETM_RANGE and CS_ETM_TRACE_ON.  Currently we skip to handle the
first CS_ETM_TRACE_ON packet in function cs_etm__flush(), we also can
refine the function cs_etm__flush() to handle the first coming
CS_ETM_TRACE_ON packet, after that all packets will be CS_ETM_RANGE
and CS_ETM_TRACE_ON and have no chance to hit 'packet->end_addr = 0'.

Does this make sense for you?

--- Packet dumping when first packet coming ---
cs_etm__flush: prev_packet: sample_type=0 exc=0 exc_ret=0 cpu=0 start_addr=0x0 end_addr=0x0 last_instr_taken_branch=0
cs_etm__flush: packet: sample_type=2 exc=0 exc_ret=0 cpu=1 start_addr=0xdeadbeefdeadbeef end_addr=0xdeadbeefdeadbeef last_instr_taken_branch=0

> > +
> > +	/*
> > +	 * The packet is the CS_ETM_TRACE_ON packet if the end_addr is
> > +	 * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
> > +	 */
> > +	if (packet->end_addr == 0xdeadbeefdeadbeefUL)
> > +		return 0;
> 
> As it is with the above, I find triggering on addresses to be brittle and hard
> to maintain on the long run.  Packets all have a sample_type field that should
> be used in cases like this one.  That way we know exactly the condition that is
> targeted.

Will do this.

> While working on this set, please spin-off another patch that defines
> CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL and replace all the cases where the
> numeral is used.  That way we stop using the hard coded value.

Will do this.

As now this patch is big with more complex logic, so I consider to
split it into small patches:

- Define CS_ETM_INVAL_ADDR;
- Fix for CS_ETM_TRACE_ON packet;
- Fix for exception packet;

Does this make sense for you?  I have concern that this patch is a
fixing patch, so not sure after spliting patches will introduce
trouble for applying them for other stable kernels ...

> > +
> > +	/*
> >  	 * The packet records the execution range with an exclusive end address
> >  	 *
> >  	 * A64 instructions are constant size, so the last executed
> > @@ -505,6 +519,18 @@ static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
> >  	return packet->end_addr - A64_INSTR_SIZE;
> >  }
> >  
> > +static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
> > +{
> > +	/*
> > +	 * The packet is the CS_ETM_TRACE_ON packet if the start_addr is
> > +	 * magic number 0xdeadbeefdeadbeefUL, returns 0 for this case.
> > +	 */
> > +	if (packet->start_addr == 0xdeadbeefdeadbeefUL)
> > +		return 0;
> 
> Same comment as above.

Will do this.

> > +
> > +	return packet->start_addr;
> > +}
> > +
> >  static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
> >  {
> >  	/*
> > @@ -546,7 +572,7 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
> >  
> >  	be       = &bs->entries[etmq->last_branch_pos];
> >  	be->from = cs_etm__last_executed_instr(etmq->prev_packet);
> > -	be->to	 = etmq->packet->start_addr;
> > +	be->to	 = cs_etm__first_executed_instr(etmq->packet);
> >  	/* No support for mispredict */
> >  	be->flags.mispred = 0;
> >  	be->flags.predicted = 1;
> > @@ -701,7 +727,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
> >  	sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
> >  	sample.pid = etmq->pid;
> >  	sample.tid = etmq->tid;
> > -	sample.addr = etmq->packet->start_addr;
> > +	sample.addr = cs_etm__first_executed_instr(etmq->packet);
> >  	sample.id = etmq->etm->branches_id;
> >  	sample.stream_id = etmq->etm->branches_id;
> >  	sample.period = 1;
> > @@ -897,13 +923,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> >  		etmq->period_instructions = instrs_over;
> >  	}
> >  
> > -	if (etm->sample_branches &&
> > -	    etmq->prev_packet &&
> > -	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> > -	    etmq->prev_packet->last_instr_taken_branch) {
> > -		ret = cs_etm__synth_branch_sample(etmq);
> > -		if (ret)
> > -			return ret;
> > +	if (etm->sample_branches && etmq->prev_packet) {
> > +		bool generate_sample = false;
> > +
> > +		/* Generate sample for start tracing packet */
> > +		if (etmq->prev_packet->sample_type == 0 ||
> 
> What kind of packet is sample_type == 0 ?

Just as explained above, sample_type == 0 is the packet which
initialized in the function cs_etm__alloc_queue().

> > +		    etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
> > +			generate_sample = true;
> > +
> > +		/* Generate sample for exception packet */
> > +		if (etmq->prev_packet->exc == true)
> > +			generate_sample = true;
> 
> Please don't do that.  Exception packets have a type of their own and can be
> added to the decoder packet queue the same way INST_RANGE and TRACE_ON packets
> are.  Moreover exception packet containt an address that, if I'm reading the
> documenation properly, can be used to keep track of instructions that were
> executed between the last address of the previous range packet and the address
> executed just before the exception occurred.  Mike and Rob will have to confirm
> this as the decoder may be doing all that hard work for us.

Sure, will wait for Rob and Mike to confirm for this.

At my side, I dump the packet, the exception packet isn't passed to
cs-etm.c layer, the decoder layer only sets the flag
'packet->exc = true' when exception packet is coming [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c#n364

> > +
> > +		/* Generate sample for normal branch packet */
> > +		if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
> > +		    etmq->prev_packet->last_instr_taken_branch)
> > +			generate_sample = true;
> > +
> > +		if (generate_sample) {
> > +			ret = cs_etm__synth_branch_sample(etmq);
> > +			if (ret)
> > +				return ret;
> > +		}
> >  	}
> >  
> >  	if (etm->sample_branches || etm->synth_opts.last_branch) {
> > @@ -922,11 +963,16 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
> >  static int cs_etm__flush(struct cs_etm_queue *etmq)
> >  {
> >  	int err = 0;
> > +	struct cs_etm_auxtrace *etm = etmq->etm;
> >  	struct cs_etm_packet *tmp;
> >  
> > -	if (etmq->etm->synth_opts.last_branch &&
> > -	    etmq->prev_packet &&
> > -	    etmq->prev_packet->sample_type == CS_ETM_RANGE) {
> > +	if (!etmq->prev_packet)
> > +		return 0;
> > +
> > +	if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
> > +		return 0;
> > +
> > +	if (etmq->etm->synth_opts.last_branch) {
> 
> If you add:
> 
>         if (!etmq->etm->synth_opts.last_branch)
>                 return 0;
> 
> You can avoid indenting the whole block.

No, here we cannot do like this.  Except we need to handle the
condition for 'etmq->etm->synth_opts.last_branch', we also need to
handle 'etm->sample_branches'.  These two conditions are saperate and
decide by different command parameters from 'perf script'.

> >  		/*
> >  		 * Generate a last branch event for the branches left in the
> >  		 * circular buffer at the end of the trace.
> > @@ -939,18 +985,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
> >  		err = cs_etm__synth_instruction_sample(
> >  			etmq, addr,
> >  			etmq->period_instructions);
> > +		if (err)
> > +			return err;
> >  		etmq->period_instructions = 0;
> > +	}
> >  
> > -		/*
> > -		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> > -		 * the next incoming packet.
> > -		 */
> > -		tmp = etmq->packet;
> > -		etmq->packet = etmq->prev_packet;
> > -		etmq->prev_packet = tmp;
> > +	if (etm->sample_branches) {
> > +		err = cs_etm__synth_branch_sample(etmq);
> > +		if (err)
> > +			return err;
> >  	}
> >  
> > -	return err;
> > +	/*
> > +	 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> > +	 * the next incoming packet.
> > +	 */
> > +	tmp = etmq->packet;
> > +	etmq->packet = etmq->prev_packet;
> > +	etmq->prev_packet = tmp;
> 
> Robert, I remember noticing that when you first submitted the code but forgot to
> go back to it.  What is the point of swapping the packets?  I understand
> 
> etmq->prev_packet = etmq->packet;
> 
> But not 
> 
> etmq->packet = tmp;
> 
> After all etmq->packet will be clobbered as soon as cs_etm_decoder__get_packet()
> is called, which is alwasy right after either cs_etm__sample() or
> cs_etm__flush().

Yeah, I have the same question for this :)

Thanks for suggestions and reviewing.

> Thanks,
> Mathieu
> 
> 
> 
> > +	return 0;
> >  }
> >  
> >  static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
> > -- 
> > 2.7.4
> > 
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v8 2/6] cpuset: Add new v2 cpuset.sched.domain flag
From: Waiman Long @ 2018-05-29  0:55 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin
In-Reply-To: <20180522125750.GA31040@localhost.localdomain>

On 05/22/2018 08:57 AM, Juri Lelli wrote:
> Hi,
>
> On 17/05/18 16:55, Waiman Long wrote:
>
> [...]
>
>>  /**
>> + * update_isolated_cpumask - update the isolated_cpus mask of parent cpuset
>> + * @cpuset:  The cpuset that requests CPU isolation
>> + * @oldmask: The old isolated cpumask to be removed from the parent
>> + * @newmask: The new isolated cpumask to be added to the parent
>> + * Return: 0 if successful, an error code otherwise
>> + *
>> + * Changes to the isolated CPUs are not allowed if any of CPUs changing
>> + * state are in any of the child cpusets of the parent except the requesting
>> + * child.
>> + *
>> + * If the sched_domain flag changes, either the oldmask (0=>1) or the
>> + * newmask (1=>0) will be NULL.
>> + *
>> + * Called with cpuset_mutex held.
>> + */
>> +static int update_isolated_cpumask(struct cpuset *cpuset,
>> +	struct cpumask *oldmask, struct cpumask *newmask)
>> +{
>> +	int retval;
>> +	int adding, deleting;
>> +	cpumask_var_t addmask, delmask;
>> +	struct cpuset *parent = parent_cs(cpuset);
>> +	struct cpuset *sibling;
>> +	struct cgroup_subsys_state *pos_css;
>> +	int old_count = parent->isolation_count;
>> +	bool dying = cpuset->css.flags & CSS_DYING;
>> +
>> +	/*
>> +	 * Parent must be a scheduling domain with non-empty cpus_allowed.
>> +	 */
>> +	if (!is_sched_domain(parent) || cpumask_empty(parent->cpus_allowed))
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * The oldmask, if present, must be a subset of parent's isolated
>> +	 * CPUs.
>> +	 */
>> +	if (oldmask && !cpumask_empty(oldmask) && (!parent->isolation_count ||
>> +			!cpumask_subset(oldmask, parent->isolated_cpus))) {
>> +		WARN_ON_ONCE(1);
>> +		return -EINVAL;
>> +	}
>> +
>> +	/*
>> +	 * A sched_domain state change is not allowed if there are
>> +	 * online children and the cpuset is not dying.
>> +	 */
>> +	if (!dying && (!oldmask || !newmask) &&
>> +	    css_has_online_children(&cpuset->css))
>> +		return -EBUSY;
>> +
>> +	if (!zalloc_cpumask_var(&addmask, GFP_KERNEL))
>> +		return -ENOMEM;
>> +	if (!zalloc_cpumask_var(&delmask, GFP_KERNEL)) {
>> +		free_cpumask_var(addmask);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	if (!old_count) {
>> +		if (!zalloc_cpumask_var(&parent->isolated_cpus, GFP_KERNEL)) {
>> +			retval = -ENOMEM;
>> +			goto out;
>> +		}
>> +		old_count = 1;
>> +	}
>> +
>> +	retval = -EBUSY;
>> +	adding = deleting = false;
>> +	if (newmask)
>> +		cpumask_copy(addmask, newmask);
>> +	if (oldmask)
>> +		deleting = cpumask_andnot(delmask, oldmask, addmask);
>> +	if (newmask)
>> +		adding = cpumask_andnot(addmask, newmask, delmask);
>> +
>> +	if (!adding && !deleting)
>> +		goto out_ok;
>> +
>> +	/*
>> +	 * The cpus to be added must be in the parent's effective_cpus mask
>> +	 * but not in the isolated_cpus mask.
>> +	 */
>> +	if (!cpumask_subset(addmask, parent->effective_cpus))
>> +		goto out;
>> +	if (parent->isolation_count &&
>> +	    cpumask_intersects(parent->isolated_cpus, addmask))
>> +		goto out;
>> +
>> +	/*
>> +	 * Check if any CPUs in addmask or delmask are in a sibling cpuset.
>> +	 * An empty sibling cpus_allowed means it is the same as parent's
>> +	 * effective_cpus. This checking is skipped if the cpuset is dying.
>> +	 */
>> +	if (dying)
>> +		goto updated_isolated_cpus;
>> +
>> +	cpuset_for_each_child(sibling, pos_css, parent) {
>> +		if ((sibling == cpuset) || !(sibling->css.flags & CSS_ONLINE))
>> +			continue;
>> +		if (cpumask_empty(sibling->cpus_allowed))
>> +			goto out;
>> +		if (adding &&
>> +		    cpumask_intersects(sibling->cpus_allowed, addmask))
>> +			goto out;
>> +		if (deleting &&
>> +		    cpumask_intersects(sibling->cpus_allowed, delmask))
>> +			goto out;
>> +	}
> Just got the below by echoing 1 into cpuset.sched.domain of a sibling with
> "isolated" cpuset.cpus. Guess you are missing proper locking about here
> above.
>
> --->8---
> [ 7509.905005] =============================
> [ 7509.905009] WARNING: suspicious RCU usage
> [ 7509.905014] 4.17.0-rc5+ #11 Not tainted
> [ 7509.905017] -----------------------------
> [ 7509.905023] /home/juri/work/kernel/linux/kernel/cgroup/cgroup.c:3826 cgroup_mutex or RCU read lock required!
> [ 7509.905026] 
>                other info that might help us debug this:

The cause is missing rcu_lock/rcu_unlock in section of the code. It will
be fixed in the next version.

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v8 6/6] cpuset: Allow reporting of sched domain generation info
From: Waiman Long @ 2018-05-29  1:04 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin
In-Reply-To: <20180522135349.GB31040@localhost.localdomain>

On 05/22/2018 09:53 AM, Juri Lelli wrote:
> Hi,
>
> On 17/05/18 16:55, Waiman Long wrote:
>> This patch enables us to report sched domain generation information.
>>
>> If DYNAMIC_DEBUG is enabled, issuing the following command
>>
>>   echo "file cpuset.c +p" > /sys/kernel/debug/dynamic_debug/control
>>
>> and setting loglevel to 8 will allow the kernel to show what scheduling
>> domain changes are being made.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  kernel/cgroup/cpuset.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index fb8aa82b..8f586e8 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -820,6 +820,12 @@ static int generate_sched_domains(cpumask_var_t **domains,
>>  	}
>>  	BUG_ON(nslot != ndoms);
>>  
>> +#ifdef CONFIG_DEBUG_KERNEL
>> +	for (i = 0; i < ndoms; i++)
>> +		pr_debug("generate_sched_domains dom %d: %*pbl\n", i,
>> +			 cpumask_pr_args(doms[i]));
>> +#endif
>> +
> While I'm always in favor of adding debug output, in this case I'm not
> sure it's adding much to what we already print when booting with
> sched_debug kernel command-line param, e.g.
>
> --->8---
>  Kernel command line: BOOT_IMAGE=/vmlinuz-4.17.0-rc5+ ... sched_debug
>
> [...]
>
>  smp: Bringing up secondary CPUs ...
>  x86: Booting SMP configuration:
>  .... node  #0, CPUs:        #1  #2  #3  #4  #5
>  .... node  #1, CPUs:    #6  #7  #8  #9 #10 #11
>  smp: Brought up 2 nodes, 12 CPUs
>  smpboot: Max logical packages: 2
>  smpboot: Total of 12 processors activated (45636.50 BogoMIPS)
>  CPU0 attaching sched-domain(s):
>   domain-0: span=0-5 level=MC
>    groups: 0:{ span=0 cap=1016 }, 1:{ span=1 cap=1011 }, 2:{ span=2 }, 3:{ span=3 cap=1023 }, 4:{ span=4 }, 5:{ span=5 }
>    domain-1: span=0-11 level=NUMA
>     groups: 0:{ span=0-5 cap=6122 }, 6:{ span=6-11 cap=6141 }
>  CPU1 attaching sched-domain(s):
>   domain-0: span=0-5 level=MC
>    groups: 1:{ span=1 cap=1011 }, 2:{ span=2 }, 3:{ span=3 cap=1023 }, 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 cap=1016 }
>    domain-1: span=0-11 level=NUMA
>     groups: 0:{ span=0-5 cap=6122 }, 6:{ span=6-11 cap=6141 }
>  CPU2 attaching sched-domain(s):
>   domain-0: span=0-5 level=MC
>    groups: 2:{ span=2 }, 3:{ span=3 cap=1023 }, 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 cap=1016 }, 1:{ span=1 cap=1011 }
>    domain-1: span=0-11 level=NUMA
>     groups: 0:{ span=0-5 cap=6122 }, 6:{ span=6-11 cap=6141 }
>  CPU3 attaching sched-domain(s):
>   domain-0: span=0-5 level=MC
>    groups: 3:{ span=3 cap=1023 }, 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 cap=1016 }, 1:{ span=1 cap=1011 }, 2:{ span=2 }
>    domain-1: span=0-11 level=NUMA
>     groups: 0:{ span=0-5 cap=6122 }, 6:{ span=6-11 cap=6141 }
>  CPU4 attaching sched-domain(s):
>   domain-0: span=0-5 level=MC
>    groups: 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 cap=1016 }, 1:{ span=1 cap=1011 }, 2:{ span=2 }, 3:{ span=3 cap=1023 }
>    domain-1: span=0-11 level=NUMA
>     groups: 0:{ span=0-5 cap=6122 }, 6:{ span=6-11 cap=6141 }
>  CPU5 attaching sched-domain(s):
>   domain-0: span=0-5 level=MC
>    groups: 5:{ span=5 }, 0:{ span=0 cap=1016 }, 1:{ span=1 cap=1011 }, 2:{ span=2 }, 3:{ span=3 cap=1023 }, 4:{ span=4 }
>    domain-1: span=0-11 level=NUMA
>     groups: 0:{ span=0-5 cap=6122 }, 6:{ span=6-11 cap=6141 }
>  CPU6 attaching sched-domain(s):
>   domain-0: span=6-11 level=MC
>    groups: 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 cap=1021 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }
>    domain-1: span=0-11 level=NUMA
>     groups: 6:{ span=6-11 cap=6141 }, 0:{ span=0-5 cap=6122 }
>  CPU7 attaching sched-domain(s):
>   domain-0: span=6-11 level=MC
>    groups: 7:{ span=7 }, 8:{ span=8 cap=1021 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }
>    domain-1: span=0-11 level=NUMA
>     groups: 6:{ span=6-11 cap=6141 }, 0:{ span=0-5 cap=6122 }
>  CPU8 attaching sched-domain(s):
>   domain-0: span=6-11 level=MC
>    groups: 8:{ span=8 cap=1021 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }
>    domain-1: span=0-11 level=NUMA
>     groups: 6:{ span=6-11 cap=6141 }, 0:{ span=0-5 cap=6122 }
>  CPU9 attaching sched-domain(s):
>   domain-0: span=6-11 level=MC
>    groups: 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 cap=1021 }
>    domain-1: span=0-11 level=NUMA
>     groups: 6:{ span=6-11 cap=6141 }, 0:{ span=0-5 cap=6122 }
>  CPU10 attaching sched-domain(s):
>   domain-0: span=6-11 level=MC
>    groups: 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 cap=1021 }, 9:{ span=9 }
>    domain-1: span=0-11 level=NUMA
>     groups: 6:{ span=6-11 cap=6141 }, 0:{ span=0-5 cap=6122 }
>  CPU11 attaching sched-domain(s):
>   domain-0: span=6-11 level=MC
>    groups: 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 cap=1021 }, 9:{ span=9 }, 10:{ span=10 }
>    domain-1: span=0-11 level=NUMA
>     groups: 6:{ span=6-11 cap=6141 }, 0:{ span=0-5 cap=6122 }
>  span: 0-11 (max cpu_capacity = 1024)
>
> [...]
>
>  generate_sched_domains dom 0: 6-11 <-- this and the one below is what
>  generate_sched_domains dom 1: 0-5      you are adding
>  CPU0 attaching NULL sched-domain.
>  CPU1 attaching NULL sched-domain.
>  CPU2 attaching NULL sched-domain.
>  CPU3 attaching NULL sched-domain.
>  CPU4 attaching NULL sched-domain.
>  CPU5 attaching NULL sched-domain.
>  CPU6 attaching NULL sched-domain.
>  CPU7 attaching NULL sched-domain.
>  CPU8 attaching NULL sched-domain.
>  CPU9 attaching NULL sched-domain.
>  CPU10 attaching NULL sched-domain.
>  CPU11 attaching NULL sched-domain.
>  CPU6 attaching sched-domain(s):
>   domain-0: span=6-11 level=MC
>    groups: 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }
>  CPU7 attaching sched-domain(s):
>   domain-0: span=6-11 level=MC
>    groups: 7:{ span=7 }, 8:{ span=8 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }
>  CPU8 attaching sched-domain(s):
>   domain-0: span=6-11 level=MC
>    groups: 8:{ span=8 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }
>  CPU9 attaching sched-domain(s):
>   domain-0: span=6-11 level=MC
>    groups: 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 }
>  CPU10 attaching sched-domain(s):
>   domain-0: span=6-11 level=MC
>    groups: 10:{ span=10 }, 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 }, 9:{ span=9 }
>  CPU11 attaching sched-domain(s):
>   domain-0: span=6-11 level=MC
>    groups: 11:{ span=11 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 }, 9:{ span=9 }, 10:{ span=10 }
>  span: 6-11 (max cpu_capacity = 1024)
>  CPU0 attaching sched-domain(s):
>   domain-0: span=0-5 level=MC
>    groups: 0:{ span=0 }, 1:{ span=1 }, 2:{ span=2 }, 3:{ span=3 }, 4:{ span=4 }, 5:{ span=5 }
>  CPU1 attaching sched-domain(s):
>   domain-0: span=0-5 level=MC
>    groups: 1:{ span=1 }, 2:{ span=2 }, 3:{ span=3 }, 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 }
>  CPU2 attaching sched-domain(s):
>   domain-0: span=0-5 level=MC
>    groups: 2:{ span=2 }, 3:{ span=3 }, 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 }, 1:{ span=1 }
>  CPU3 attaching sched-domain(s):
>   domain-0: span=0-5 level=MC
>    groups: 3:{ span=3 }, 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 }, 1:{ span=1 }, 2:{ span=2 }
>  CPU4 attaching sched-domain(s):
>   domain-0: span=0-5 level=MC
>    groups: 4:{ span=4 }, 5:{ span=5 }, 0:{ span=0 }, 1:{ span=1 }, 2:{ span=2 }, 3:{ span=3 }
>  CPU5 attaching sched-domain(s):
>   domain-0: span=0-5 level=MC
>    groups: 5:{ span=5 }, 0:{ span=0 }, 1:{ span=1 }, 2:{ span=2 }, 3:{ span=3 }, 4:{ span=4 }
>  span: 0-5 (max cpu_capacity = 1024)
>  --->8---
>
>  Do you think there is still a benefit in printing out what
>  generate_sched_domains does?

For testing purpose the debug messages from generate_sched_domains() is
more concise. The sched_debug message is too verbose from my point of
view. So I think it is still useful.

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus
From: Waiman Long @ 2018-05-29  1:12 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin
In-Reply-To: <20180524102837.GA3948@localhost.localdomain>

On 05/24/2018 06:28 AM, Juri Lelli wrote:
> On 17/05/18 16:55, Waiman Long wrote:
>
> [...]
>
>> @@ -849,7 +860,12 @@ static void rebuild_sched_domains_locked(void)
>>  	 * passing doms with offlined cpu to partition_sched_domains().
>>  	 * Anyways, hotplug work item will rebuild sched domains.
>>  	 */
>> -	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>> +	if (!top_cpuset.isolation_count &&
>> +	    !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>> +		goto out;
>> +
>> +	if (top_cpuset.isolation_count &&
>> +	   !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
>>  		goto out;
> Do we cover the case in which hotplug removed one of the isolated cpus
> from cpu_active_mask?

Yes, you are right. That is the remnant of my original patch that allows
only one isolated_cpus at root. Thanks for spotting that.

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 1/3] usb: gadget: ccid: add support for USB CCID Gadget Device
From: kbuild test robot @ 2018-05-29  1:15 UTC (permalink / raw)
  To: Marcus Folkesson
  Cc: kbuild-all, Greg Kroah-Hartman, Jonathan Corbet, Felipe Balbi,
	davem, Mauro Carvalho Chehab, Andrew Morton, Randy Dunlap,
	Ruslan Bilovol, Thomas Gleixner, Kate Stewart, linux-usb,
	linux-doc, linux-kernel, Marcus Folkesson
In-Reply-To: <20180526211940.25474-1-marcus.folkesson@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 9645 bytes --]

Hi Marcus,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on balbi-usb/next]
[also build test WARNING on v4.17-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Marcus-Folkesson/usb-gadget-ccid-add-support-for-USB-CCID-Gadget-Device/20180529-014427
base:   https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git next
config: s390-allyesconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=s390 

All warnings (new ones prefixed by >>):

   In file included from include/linux/printk.h:332:0,
                    from include/linux/kernel.h:14,
                    from include/linux/list.h:9,
                    from include/linux/kobject.h:19,
                    from include/linux/cdev.h:5,
                    from drivers/usb/gadget/function/f_ccid.c:8:
   drivers/usb/gadget/function/f_ccid.c: In function 'ccidg_bulk_read':
>> drivers/usb/gadget/function/f_ccid.c:612:5: warning: format '%i' expects argument of type 'int', but argument 4 has type 'size_t {aka long unsigned int}' [-Wformat=]
        "ccid: too small buffer size. %i provided, need at least %i\n",
        ^
   include/linux/dynamic_debug.h:135:39: note: in definition of macro 'dynamic_dev_dbg'
      __dynamic_dev_dbg(&descriptor, dev, fmt, \
                                          ^~~
>> include/linux/usb/composite.h:632:2: note: in expansion of macro 'dev_dbg'
     dev_dbg(&(d)->gadget->dev , fmt , ## args)
     ^~~~~~~
>> drivers/usb/gadget/function/f_ccid.c:611:3: note: in expansion of macro 'DBG'
      DBG(ccidg->function.config->cdev,
      ^~~
>> drivers/usb/gadget/function/f_ccid.c:612:5: warning: format '%i' expects argument of type 'int', but argument 5 has type 'long unsigned int' [-Wformat=]
        "ccid: too small buffer size. %i provided, need at least %i\n",
        ^
   include/linux/dynamic_debug.h:135:39: note: in definition of macro 'dynamic_dev_dbg'
      __dynamic_dev_dbg(&descriptor, dev, fmt, \
                                          ^~~
>> include/linux/usb/composite.h:632:2: note: in expansion of macro 'dev_dbg'
     dev_dbg(&(d)->gadget->dev , fmt , ## args)
     ^~~~~~~
>> drivers/usb/gadget/function/f_ccid.c:611:3: note: in expansion of macro 'DBG'
      DBG(ccidg->function.config->cdev,
      ^~~
   drivers/usb/gadget/function/f_ccid.c: In function 'ccidg_bulk_write':
   drivers/usb/gadget/function/f_ccid.c:683:5: warning: format '%i' expects argument of type 'int', but argument 4 has type 'size_t {aka long unsigned int}' [-Wformat=]
        "ccid: too much data. %i provided, but we can only handle %i\n",
        ^
   include/linux/dynamic_debug.h:135:39: note: in definition of macro 'dynamic_dev_dbg'
      __dynamic_dev_dbg(&descriptor, dev, fmt, \
                                          ^~~
>> include/linux/usb/composite.h:632:2: note: in expansion of macro 'dev_dbg'
     dev_dbg(&(d)->gadget->dev , fmt , ## args)
     ^~~~~~~
   drivers/usb/gadget/function/f_ccid.c:682:3: note: in expansion of macro 'DBG'
      DBG(ccidg->function.config->cdev,
      ^~~
   drivers/usb/gadget/function/f_ccid.c:683:5: warning: format '%i' expects argument of type 'int', but argument 5 has type 'long unsigned int' [-Wformat=]
        "ccid: too much data. %i provided, but we can only handle %i\n",
        ^
   include/linux/dynamic_debug.h:135:39: note: in definition of macro 'dynamic_dev_dbg'
      __dynamic_dev_dbg(&descriptor, dev, fmt, \
                                          ^~~
>> include/linux/usb/composite.h:632:2: note: in expansion of macro 'dev_dbg'
     dev_dbg(&(d)->gadget->dev , fmt , ## args)
     ^~~~~~~
   drivers/usb/gadget/function/f_ccid.c:682:3: note: in expansion of macro 'DBG'
      DBG(ccidg->function.config->cdev,
      ^~~
--
   In file included from include/linux/printk.h:332:0,
                    from include/linux/kernel.h:14,
                    from include/linux/list.h:9,
                    from include/linux/kobject.h:19,
                    from include/linux/cdev.h:5,
                    from drivers/usb//gadget/function/f_ccid.c:8:
   drivers/usb//gadget/function/f_ccid.c: In function 'ccidg_bulk_read':
   drivers/usb//gadget/function/f_ccid.c:612:5: warning: format '%i' expects argument of type 'int', but argument 4 has type 'size_t {aka long unsigned int}' [-Wformat=]
        "ccid: too small buffer size. %i provided, need at least %i\n",
        ^
   include/linux/dynamic_debug.h:135:39: note: in definition of macro 'dynamic_dev_dbg'
      __dynamic_dev_dbg(&descriptor, dev, fmt, \
                                          ^~~
>> include/linux/usb/composite.h:632:2: note: in expansion of macro 'dev_dbg'
     dev_dbg(&(d)->gadget->dev , fmt , ## args)
     ^~~~~~~
   drivers/usb//gadget/function/f_ccid.c:611:3: note: in expansion of macro 'DBG'
      DBG(ccidg->function.config->cdev,
      ^~~
   drivers/usb//gadget/function/f_ccid.c:612:5: warning: format '%i' expects argument of type 'int', but argument 5 has type 'long unsigned int' [-Wformat=]
        "ccid: too small buffer size. %i provided, need at least %i\n",
        ^
   include/linux/dynamic_debug.h:135:39: note: in definition of macro 'dynamic_dev_dbg'
      __dynamic_dev_dbg(&descriptor, dev, fmt, \
                                          ^~~
>> include/linux/usb/composite.h:632:2: note: in expansion of macro 'dev_dbg'
     dev_dbg(&(d)->gadget->dev , fmt , ## args)
     ^~~~~~~
   drivers/usb//gadget/function/f_ccid.c:611:3: note: in expansion of macro 'DBG'
      DBG(ccidg->function.config->cdev,
      ^~~
   drivers/usb//gadget/function/f_ccid.c: In function 'ccidg_bulk_write':
   drivers/usb//gadget/function/f_ccid.c:683:5: warning: format '%i' expects argument of type 'int', but argument 4 has type 'size_t {aka long unsigned int}' [-Wformat=]
        "ccid: too much data. %i provided, but we can only handle %i\n",
        ^
   include/linux/dynamic_debug.h:135:39: note: in definition of macro 'dynamic_dev_dbg'
      __dynamic_dev_dbg(&descriptor, dev, fmt, \
                                          ^~~
>> include/linux/usb/composite.h:632:2: note: in expansion of macro 'dev_dbg'
     dev_dbg(&(d)->gadget->dev , fmt , ## args)
     ^~~~~~~
   drivers/usb//gadget/function/f_ccid.c:682:3: note: in expansion of macro 'DBG'
      DBG(ccidg->function.config->cdev,
      ^~~
   drivers/usb//gadget/function/f_ccid.c:683:5: warning: format '%i' expects argument of type 'int', but argument 5 has type 'long unsigned int' [-Wformat=]
        "ccid: too much data. %i provided, but we can only handle %i\n",
        ^
   include/linux/dynamic_debug.h:135:39: note: in definition of macro 'dynamic_dev_dbg'
      __dynamic_dev_dbg(&descriptor, dev, fmt, \
                                          ^~~
>> include/linux/usb/composite.h:632:2: note: in expansion of macro 'dev_dbg'
     dev_dbg(&(d)->gadget->dev , fmt , ## args)
     ^~~~~~~
   drivers/usb//gadget/function/f_ccid.c:682:3: note: in expansion of macro 'DBG'
      DBG(ccidg->function.config->cdev,
      ^~~

vim +612 drivers/usb/gadget/function/f_ccid.c

   599	
   600	static ssize_t ccidg_bulk_read(struct file *file, char __user *buf,
   601					size_t count, loff_t *pos)
   602	{
   603		struct f_ccidg *ccidg =  file->private_data;
   604		struct ccidg_bulk_dev *bulk_dev = &ccidg->bulk_dev;
   605		struct usb_request *req;
   606		int r = count, xfer;
   607		int ret;
   608	
   609		/* Make sure we have enough space for a whole package */
   610		if (count < sizeof(struct ccidg_bulk_out_header)) {
 > 611			DBG(ccidg->function.config->cdev,
 > 612					"ccid: too small buffer size. %i provided, need at least %i\n",
   613					count, sizeof(struct ccidg_bulk_out_header));
   614			return -ENOMEM;
   615		}
   616	
   617		if (!atomic_read(&ccidg->online))
   618			return -ENODEV;
   619	
   620		/* queue a request */
   621		req = bulk_dev->rx_req;
   622		req->length = count;
   623		atomic_set(&bulk_dev->rx_done, 0);
   624	
   625		ret = usb_ep_queue(ccidg->out, req, GFP_KERNEL);
   626		if (ret < 0) {
   627			ERROR(ccidg->function.config->cdev,
   628					"ccid: usb ep queue failed\n");
   629			return -EIO;
   630		}
   631	
   632		if (!atomic_read(&bulk_dev->rx_done) &&
   633				file->f_flags & (O_NONBLOCK | O_NDELAY))
   634			return -EAGAIN;
   635	
   636		/* wait for a request to complete */
   637		ret = wait_event_interruptible(bulk_dev->read_wq,
   638				atomic_read(&bulk_dev->rx_done) ||
   639				!atomic_read(&ccidg->online));
   640		if (ret < 0) {
   641			usb_ep_dequeue(ccidg->out, req);
   642			return -ERESTARTSYS;
   643		}
   644	
   645		/* Still online? */
   646		if (!atomic_read(&ccidg->online))
   647			return -ENODEV;
   648	
   649		atomic_set(&bulk_dev->rx_req_busy, 1);
   650		xfer = (req->actual < count) ? req->actual : count;
   651	
   652		if (copy_to_user(buf, req->buf, xfer))
   653			r = -EFAULT;
   654	
   655		atomic_set(&bulk_dev->rx_req_busy, 0);
   656		if (!atomic_read(&ccidg->online)) {
   657			ccidg_request_free(bulk_dev->rx_req, ccidg->out);
   658			return -ENODEV;
   659		}
   660	
   661		return xfer;
   662	}
   663	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 49294 bytes --]

^ permalink raw reply

* Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus
From: Waiman Long @ 2018-05-29  1:24 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin
In-Reply-To: <45d70c88-e9f5-716a-ee9a-33dc111159cc@redhat.com>

On 05/28/2018 09:12 PM, Waiman Long wrote:
> On 05/24/2018 06:28 AM, Juri Lelli wrote:
>> On 17/05/18 16:55, Waiman Long wrote:
>>
>> [...]
>>
>>> @@ -849,7 +860,12 @@ static void rebuild_sched_domains_locked(void)
>>>  	 * passing doms with offlined cpu to partition_sched_domains().
>>>  	 * Anyways, hotplug work item will rebuild sched domains.
>>>  	 */
>>> -	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>>> +	if (!top_cpuset.isolation_count &&
>>> +	    !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>>> +		goto out;
>>> +
>>> +	if (top_cpuset.isolation_count &&
>>> +	   !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
>>>  		goto out;
>> Do we cover the case in which hotplug removed one of the isolated cpus
>> from cpu_active_mask?
> Yes, you are right. That is the remnant of my original patch that allows
> only one isolated_cpus at root. Thanks for spotting that.

I am sorry. I would like to take it back my previous comment. The code
above looks for inconsistency in the state of the effective_cpus mask to
find out if it is racing with a hotplug event. If it is, we can skip the
domain generation as the hotplug event will do that too. The checks are
still valid with the current patchset. So I don't think we need to make
any change here.

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 1/3] usb: gadget: ccid: add support for USB CCID Gadget Device
From: kbuild test robot @ 2018-05-29  2:57 UTC (permalink / raw)
  To: Marcus Folkesson
  Cc: kbuild-all, Greg Kroah-Hartman, Jonathan Corbet, Felipe Balbi,
	davem, Mauro Carvalho Chehab, Andrew Morton, Randy Dunlap,
	Ruslan Bilovol, Thomas Gleixner, Kate Stewart, linux-usb,
	linux-doc, linux-kernel, Marcus Folkesson
In-Reply-To: <20180526211940.25474-1-marcus.folkesson@gmail.com>

Hi Marcus,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on balbi-usb/next]
[also build test WARNING on v4.17-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Marcus-Folkesson/usb-gadget-ccid-add-support-for-USB-CCID-Gadget-Device/20180529-014427
base:   https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git next
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/usb/gadget/function/f_ccid.c:423:43: sparse: incorrect type in assignment (different base types) @@    expected unsigned int [unsigned] [usertype] <noident> @@    got ed int [unsigned] [usertype] <noident> @@
   drivers/usb/gadget/function/f_ccid.c:423:43:    expected unsigned int [unsigned] [usertype] <noident>
   drivers/usb/gadget/function/f_ccid.c:423:43:    got restricted __le32 [usertype] <noident>
>> drivers/usb/gadget/function/f_ccid.c:424:31: sparse: expression using sizeof(void)
   drivers/usb/gadget/function/f_ccid.c:429:43: sparse: incorrect type in assignment (different base types) @@    expected unsigned int [unsigned] [usertype] <noident> @@    got ed int [unsigned] [usertype] <noident> @@
   drivers/usb/gadget/function/f_ccid.c:429:43:    expected unsigned int [unsigned] [usertype] <noident>
   drivers/usb/gadget/function/f_ccid.c:429:43:    got restricted __le32 [usertype] <noident>
   drivers/usb/gadget/function/f_ccid.c:430:31: sparse: expression using sizeof(void)
>> drivers/usb/gadget/function/f_ccid.c:481:5: sparse: symbol 'ccidg_start_ep' was not declared. Should it be static?
>> drivers/usb/gadget/function/f_ccid.c:669:35: sparse: Using plain integer as NULL pointer
>> drivers/usb/gadget/function/f_ccid.c:831:41: sparse: incorrect type in assignment (different base types) @@    expected unsigned int static [unsigned] [addressable] [toplevel] [usertype] dwFeatures @@    got sable] [toplevel] [usertype] dwFeatures @@
   drivers/usb/gadget/function/f_ccid.c:831:41:    expected unsigned int static [unsigned] [addressable] [toplevel] [usertype] dwFeatures
   drivers/usb/gadget/function/f_ccid.c:831:41:    got restricted __le32 [usertype] <noident>
>> drivers/usb/gadget/function/f_ccid.c:833:41: sparse: incorrect type in assignment (different base types) @@    expected unsigned short static [unsigned] [addressable] [toplevel] [assigned] [usertype] wLcdLayout @@    got level] [assigned] [usertype] wLcdLayout @@
   drivers/usb/gadget/function/f_ccid.c:833:41:    expected unsigned short static [unsigned] [addressable] [toplevel] [assigned] [usertype] wLcdLayout
   drivers/usb/gadget/function/f_ccid.c:833:41:    got restricted __le16 [usertype] <noident>
>> drivers/usb/gadget/function/f_ccid.c:835:41: sparse: incorrect type in assignment (different base types) @@    expected unsigned int static [unsigned] [addressable] [toplevel] [assigned] [usertype] dwProtocols @@    got level] [assigned] [usertype] dwProtocols @@
   drivers/usb/gadget/function/f_ccid.c:835:41:    expected unsigned int static [unsigned] [addressable] [toplevel] [assigned] [usertype] dwProtocols
   drivers/usb/gadget/function/f_ccid.c:835:41:    got restricted __le32 [usertype] <noident>
   In file included from include/linux/printk.h:332:0,
                    from include/linux/kernel.h:14,
                    from include/linux/list.h:9,
                    from include/linux/kobject.h:19,
                    from include/linux/cdev.h:5,
                    from drivers/usb/gadget/function/f_ccid.c:8:
   drivers/usb/gadget/function/f_ccid.c: In function 'ccidg_bulk_read':
   drivers/usb/gadget/function/f_ccid.c:612:5: warning: format '%i' expects argument of type 'int', but argument 4 has type 'size_t {aka long unsigned int}' [-Wformat=]
        "ccid: too small buffer size. %i provided, need at least %in",
        ^
   include/linux/dynamic_debug.h:135:39: note: in definition of macro 'dynamic_dev_dbg'
      __dynamic_dev_dbg(&descriptor, dev, fmt, 32-                                       ^~~
   include/linux/usb/composite.h:632:2: note: in expansion of macro 'dev_dbg'
     dev_dbg(&(d)->gadget->dev , fmt , ## args)
     ^~~~~~~
   drivers/usb/gadget/function/f_ccid.c:611:3: note: in expansion of macro 'DBG'
      DBG(ccidg->function.config->cdev,
      ^~~
   drivers/usb/gadget/function/f_ccid.c:612:5: warning: format '%i' expects argument of type 'int', but argument 5 has type 'long unsigned int' [-Wformat=]
        "ccid: too small buffer size. %i provided, need at least %in",
        ^
   include/linux/dynamic_debug.h:135:39: note: in definition of macro 'dynamic_dev_dbg'
      __dynamic_dev_dbg(&descriptor, dev, fmt, 44-                                       ^~~
   include/linux/usb/composite.h:632:2: note: in expansion of macro 'dev_dbg'
     dev_dbg(&(d)->gadget->dev , fmt , ## args)
     ^~~~~~~
   drivers/usb/gadget/function/f_ccid.c:611:3: note: in expansion of macro 'DBG'
      DBG(ccidg->function.config->cdev,
      ^~~
   drivers/usb/gadget/function/f_ccid.c: In function 'ccidg_bulk_write':
   drivers/usb/gadget/function/f_ccid.c:683:5: warning: format '%i' expects argument of type 'int', but argument 4 has type 'size_t {aka long unsigned int}' [-Wformat=]
        "ccid: too much data. %i provided, but we can only handle %in",
        ^
   include/linux/dynamic_debug.h:135:39: note: in definition of macro 'dynamic_dev_dbg'
      __dynamic_dev_dbg(&descriptor, dev, fmt, 57-                                       ^~~
   include/linux/usb/composite.h:632:2: note: in expansion of macro 'dev_dbg'
     dev_dbg(&(d)->gadget->dev , fmt , ## args)
     ^~~~~~~
   drivers/usb/gadget/function/f_ccid.c:682:3: note: in expansion of macro 'DBG'
      DBG(ccidg->function.config->cdev,
      ^~~
   drivers/usb/gadget/function/f_ccid.c:683:5: warning: format '%i' expects argument of type 'int', but argument 5 has type 'long unsigned int' [-Wformat=]
        "ccid: too much data. %i provided, but we can only handle %in",
        ^
   include/linux/dynamic_debug.h:135:39: note: in definition of macro 'dynamic_dev_dbg'
      __dynamic_dev_dbg(&descriptor, dev, fmt, 69-                                       ^~~
   include/linux/usb/composite.h:632:2: note: in expansion of macro 'dev_dbg'
     dev_dbg(&(d)->gadget->dev , fmt , ## args)
     ^~~~~~~
   drivers/usb/gadget/function/f_ccid.c:682:3: note: in expansion of macro 'DBG'
      DBG(ccidg->function.config->cdev,
      ^~~

Please review and possibly fold the followup patch.

vim +423 drivers/usb/gadget/function/f_ccid.c

   403	
   404	static int ccidg_function_setup(struct usb_function *f,
   405			const struct usb_ctrlrequest *ctrl)
   406	{
   407		struct f_ccidg *ccidg = container_of(f, struct f_ccidg, function);
   408		struct usb_composite_dev *cdev	= f->config->cdev;
   409		struct usb_request *req		= cdev->req;
   410		int ret				= -EOPNOTSUPP;
   411		u16 w_index			= le16_to_cpu(ctrl->wIndex);
   412		u16 w_value			= le16_to_cpu(ctrl->wValue);
   413		u16 w_length			= le16_to_cpu(ctrl->wLength);
   414	
   415		if (!atomic_read(&ccidg->online))
   416			return -ENOTCONN;
   417	
   418		switch (ctrl->bRequestType & USB_TYPE_MASK) {
   419		case USB_TYPE_CLASS:
   420			{
   421			switch (ctrl->bRequest) {
   422			case CCIDGENERICREQ_GET_CLOCK_FREQUENCIES:
 > 423				*(u32 *) req->buf = cpu_to_le32(ccid_class_desc.dwDefaultClock);
 > 424				ret = min_t(u32, w_length,
   425						sizeof(ccid_class_desc.dwDefaultClock));
   426				break;
   427	
   428			case CCIDGENERICREQ_GET_DATA_RATES:
 > 429				*(u32 *) req->buf = cpu_to_le32(ccid_class_desc.dwDataRate);
 > 430				ret = min_t(u32, w_length, sizeof(ccid_class_desc.dwDataRate));
   431				break;
   432	
   433			default:
   434				VDBG(f->config->cdev,
   435					"ccid: invalid control req%02x.%02x v%04x i%04x l%d\n",
   436					ctrl->bRequestType, ctrl->bRequest,
   437					w_value, w_index, w_length);
   438			}
   439			}
   440		}
   441	
   442		/* responded with data transfer or status phase? */
   443		if (ret >= 0) {
   444			VDBG(f->config->cdev, "ccid: req%02x.%02x v%04x i%04x l%d\n",
   445				ctrl->bRequestType, ctrl->bRequest,
   446				w_value, w_index, w_length);
   447	
   448			req->length = ret;
   449			ret = usb_ep_queue(cdev->gadget->ep0, req, GFP_ATOMIC);
   450			if (ret < 0)
   451				ERROR(f->config->cdev,
   452					"ccid: ep0 enqueue err %d\n", ret);
   453		}
   454	
   455		return ret;
   456	}
   457	
   458	static void ccidg_function_disable(struct usb_function *f)
   459	{
   460		struct f_ccidg *ccidg = func_to_ccidg(f);
   461		struct ccidg_bulk_dev *bulk_dev = &ccidg->bulk_dev;
   462		struct usb_request *req;
   463	
   464		/* Disable endpoints */
   465		usb_ep_disable(ccidg->in);
   466		usb_ep_disable(ccidg->out);
   467	
   468		/* Free endpoint related requests */
   469		if (!atomic_read(&bulk_dev->rx_req_busy))
   470			ccidg_request_free(bulk_dev->rx_req, ccidg->out);
   471		while ((req = ccidg_req_get(ccidg, &bulk_dev->tx_idle)))
   472			ccidg_request_free(req, ccidg->in);
   473	
   474		atomic_set(&ccidg->online, 0);
   475	
   476		/* Wake up threads */
   477		wake_up(&bulk_dev->write_wq);
   478		wake_up(&bulk_dev->read_wq);
   479	}
   480	
 > 481	int ccidg_start_ep(struct f_ccidg *ccidg, struct usb_function *f,
   482				struct usb_ep *ep)
   483	{
   484		struct usb_composite_dev *cdev = f->config->cdev;
   485		int ret;
   486	
   487		usb_ep_disable(ep);
   488	
   489		ret = config_ep_by_speed(cdev->gadget, f, ep);
   490		if (ret) {
   491			ERROR(cdev, "ccid: can't configure %s: %d\n", ep->name, ret);
   492			return ret;
   493		}
   494	
   495		ret = usb_ep_enable(ep);
   496		if (ret) {
   497			ERROR(cdev, "ccid: can't start %s: %d\n", ep->name, ret);
   498			return ret;
   499		}
   500	
   501		ep->driver_data = ccidg;
   502	
   503		return ret;
   504	}
   505	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [RFC PATCH] usb: gadget: ccid: ccidg_start_ep() can be static
From: kbuild test robot @ 2018-05-29  2:57 UTC (permalink / raw)
  To: Marcus Folkesson
  Cc: kbuild-all, Greg Kroah-Hartman, Jonathan Corbet, Felipe Balbi,
	davem, Mauro Carvalho Chehab, Andrew Morton, Randy Dunlap,
	Ruslan Bilovol, Thomas Gleixner, Kate Stewart, linux-usb,
	linux-doc, linux-kernel, Marcus Folkesson
In-Reply-To: <20180526211940.25474-1-marcus.folkesson@gmail.com>


Fixes: 0760b8dd303b ("usb: gadget: ccid: add support for USB CCID Gadget Device")
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
---
 f_ccid.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/function/f_ccid.c b/drivers/usb/gadget/function/f_ccid.c
index 9ff8615..35b392c 100644
--- a/drivers/usb/gadget/function/f_ccid.c
+++ b/drivers/usb/gadget/function/f_ccid.c
@@ -478,7 +478,7 @@ static void ccidg_function_disable(struct usb_function *f)
 	wake_up(&bulk_dev->read_wq);
 }
 
-int ccidg_start_ep(struct f_ccidg *ccidg, struct usb_function *f,
+static int ccidg_start_ep(struct f_ccidg *ccidg, struct usb_function *f,
 			struct usb_ep *ep)
 {
 	struct usb_composite_dev *cdev = f->config->cdev;
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus
From: Juri Lelli @ 2018-05-29  6:27 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin
In-Reply-To: <8e610b98-970c-a309-5821-fc8e6aca892f@redhat.com>

On 28/05/18 21:24, Waiman Long wrote:
> On 05/28/2018 09:12 PM, Waiman Long wrote:
> > On 05/24/2018 06:28 AM, Juri Lelli wrote:
> >> On 17/05/18 16:55, Waiman Long wrote:
> >>
> >> [...]
> >>
> >>> @@ -849,7 +860,12 @@ static void rebuild_sched_domains_locked(void)
> >>>  	 * passing doms with offlined cpu to partition_sched_domains().
> >>>  	 * Anyways, hotplug work item will rebuild sched domains.
> >>>  	 */
> >>> -	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
> >>> +	if (!top_cpuset.isolation_count &&
> >>> +	    !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
> >>> +		goto out;
> >>> +
> >>> +	if (top_cpuset.isolation_count &&
> >>> +	   !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
> >>>  		goto out;
> >> Do we cover the case in which hotplug removed one of the isolated cpus
> >> from cpu_active_mask?
> > Yes, you are right. That is the remnant of my original patch that allows
> > only one isolated_cpus at root. Thanks for spotting that.
> 
> I am sorry. I would like to take it back my previous comment. The code
> above looks for inconsistency in the state of the effective_cpus mask to
> find out if it is racing with a hotplug event. If it is, we can skip the
> domain generation as the hotplug event will do that too. The checks are
> still valid with the current patchset. So I don't think we need to make
> any change here.

Yes, these checks are valid, but don't we also need to check for hotplug
races w.r.t. isolated CPUs (of some other sub domain)?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH bpf-next v2 0/3] bpf: add boot parameters for sysctl knobs
From: Jesper Dangaard Brouer @ 2018-05-29  8:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eugene Syromiatnikov, netdev, linux-kernel, linux-doc, Kees Cook,
	Kai-Heng Feng, Daniel Borkmann, Alexei Starovoitov,
	Jonathan Corbet, Jiri Olsa, brouer
In-Reply-To: <20180525194359.qckwj3tnsbufyapz@ast-mbp>

On Fri, 25 May 2018 12:44:01 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Fri, May 25, 2018 at 06:50:09PM +0200, Eugene Syromiatnikov wrote:
> > On Thu, May 24, 2018 at 04:34:51PM -0700, Alexei Starovoitov wrote:  
> > > On Thu, May 24, 2018 at 09:41:08AM +0200, Jesper Dangaard Brouer wrote:  
> > > > On Wed, 23 May 2018 15:02:45 -0700
> > > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > > >   
> > > > > On Wed, May 23, 2018 at 02:18:19PM +0200, Eugene Syromiatnikov wrote:  
> > > > > > Some BPF sysctl knobs affect the loading of BPF programs, and during
> > > > > > system boot/init stages these sysctls are not yet configured.
> > > > > > A concrete example is systemd, that has implemented loading of BPF
> > > > > > programs.
> > > > > > 
> > > > > > Thus, to allow controlling these setting at early boot, this patch set
> > > > > > adds the ability to change the default setting of these sysctl knobs
> > > > > > as well as option to override them via a boot-time kernel parameter
> > > > > > (in order to avoid rebuilding kernel each time a need of changing these
> > > > > > defaults arises).
> > > > > > 
> > > > > > The sysctl knobs in question are kernel.unprivileged_bpf_disable,
> > > > > > net.core.bpf_jit_harden, and net.core.bpf_jit_kallsyms.    
> > > > > 
> > > > > - systemd is root. today it only uses cgroup-bpf progs which require root,
> > > > >   so disabling unpriv during boot time makes no difference to systemd.
> > > > >   what is the actual reason to present time?  
> > systemd also runs a lot of code, some of which is unprivileged.  
> 
> systemd processes sysctl configs first. It's essential for system
> security to do so. If you have concerns in how systemd does that
> please bring it up with systemd folks.
> 
> > > > > - say in the future systemd wants to use so_reuseport+bpf for faster
> > > > >   networking. With unpriv disable during boot, it will force systemd
> > > > >   to do such networking from root, which will lower its security barrier.  
> > No, it will force systemd not to use SO_REUSEPORT BPF.  
> 
> sorry this argument makes no sense to me.
> 
> > > > > - bpf_jit_kallsyms sysctl has immediate effect on loaded programs.
> > > > >   Flipping it during the boot or right after or any time after
> > > > >   is the same thing. Why add such boot flag then?  
> > Well, that one was for completeness.  
> 
> Should we convert all sysctls to bootparams for 'completeness' ?

The bpf_jit_kallsyms option is not there for 'completeness', it is
there because I know Daniel talked about changing the default, and this
would provide an easy option for allowing changing this default (to on)
in different distros before flipping it on in some kernel release.


> > > > > - jit_harden can be turned on by systemd. so turning it during the boot
> > > > >   will make systemd progs to be constant blinded.
> > > > >   Constant blinding protects kernel from unprivileged JIT spraying.
> > > > >   Are you worried that systemd will attack the kernel with JIT spraying?  
> > I'm worried that systemd can be exploited for a JIT spraying attack.  
> 
> I'm afraid we're not on the same page with definition of 'JIT spraying attack'.
> 
> > Another thing I'm concerned with is that the generated code is different,
> > which introduces additional complication during debugging.  
> 
> specifically what kind of complication?
> 

I think kernel.unprivileged_bpf_disable is the best example, why we
need this facility.  Once kernel.unprivileged_bpf_disable is set to 1,
it cannot be turned back to 0.

It is a policy decision that I want unprivileged_bpf_disable=1.
Setting this policy in the sysctl step is too late, because as you
write above, you expect that e.g. systemd or other boot-scripts
want/can load bpf-progs in unpriv mode.  Thus, they can violate my
policy choice.

This also leads to the before mentioned inconsistency issue. The
init-script/systemd-service will during boot successfully load unpriv
bpf-prog, but when I afterwards when I reload (or stop/start) the
service, it will fail as it no longer can load the unpriv bpf program.

If a distribution makes this policy choice, then users cannot change it
back to unprivileged_bpf_disable=0 via sysctl.  Thus, we need a boot
param to allow users to change this policy.  If the disto didn't change
this default, then we still need the boot-param, to allow users to
enforce this policy choice towards init-scripts (as explained above).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] docs/vm: move ksm and transhuge from "user" to "internals" section.
From: Mike Rapoport @ 2018-05-29 10:13 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-mm, Mike Rapoport

After the userspace interface description for KSM and THP was split to
Documentation/admin-guide/mm, the remaining parts belong to the section
describing MM internals.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/vm/index.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
index 8e1cc66..c4ded22 100644
--- a/Documentation/vm/index.rst
+++ b/Documentation/vm/index.rst
@@ -13,8 +13,6 @@ various features of the Linux memory management
 .. toctree::
    :maxdepth: 1
 
-   ksm
-   transhuge
    swap_numa
    zswap
 
@@ -36,6 +34,7 @@ descriptions of data structures and algorithms.
    hmm
    hwpoison
    hugetlbfs_reserv
+   ksm
    mmu_notifier
    numa
    overcommit-accounting
@@ -45,6 +44,7 @@ descriptions of data structures and algorithms.
    remap_file_pages
    slub
    split_page_table_lock
+   transhuge
    unevictable-lru
    z3fold
    zsmalloc
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH] docs/admin-guide/mm: add high level concepts overview
From: Mike Rapoport @ 2018-05-29 11:37 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-mm, linux-kernel

Hi,

The below patch describes some of the concepts in Linux mm. It does not aim
to provide in-depth description of the mm internals, but rather help
unprepared reader to understand cryptic texts, e.g.
Documentation/sysctl/vm.txt.

I covered what seemed to me the essential minimum that is required for
user/administrator to read the existing docs without searching the web for
the explanations for every other term.

--
Sincerely yours,
Mike.

From 2d3ec7ea101a66b1535d5bec4acfc1e0f737fd53 Mon Sep 17 00:00:00 2001
From: Mike Rapoport <rppt@linux.vnet.ibm.com>
Date: Tue, 29 May 2018 14:12:39 +0300
Subject: [PATCH] docs/admin-guide/mm: add high level concepts overview

The are terms that seem obvious to the mm developers, but may be somewhat
obscure for, say, less involved readers.

The concepts overview can be seen as an "extended glossary" that introduces
such terms to the readers of the kernel documentation.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
---
 Documentation/admin-guide/mm/concepts.rst | 222 ++++++++++++++++++++++++++++++
 Documentation/admin-guide/mm/index.rst    |   5 +
 2 files changed, 227 insertions(+)
 create mode 100644 Documentation/admin-guide/mm/concepts.rst

diff --git a/Documentation/admin-guide/mm/concepts.rst b/Documentation/admin-guide/mm/concepts.rst
new file mode 100644
index 0000000..291699c
--- /dev/null
+++ b/Documentation/admin-guide/mm/concepts.rst
@@ -0,0 +1,222 @@
+.. _mm_concepts:
+
+=================
+Concepts overview
+=================
+
+The memory management in Linux is complex system that evolved over the
+years and included more and more functionality to support variety of
+systems from MMU-less microcontrollers to supercomputers. The memory
+management for systems without MMU is called ``nommu`` and it
+definitely deserves a dedicated document, which hopefully will be
+eventually written. Yet, although some of the concepts are the same,
+here we assume that MMU is available and CPU can translate a virtual
+address to a physical address.
+
+.. contents:: :local:
+
+Virtual Memory Primer
+=====================
+
+The physical memory in a computer system is a limited resource and
+even for systems that support memory hotplug there is a hard limit on
+the amount of memory that can be installed. The physical memory is not
+necessary contiguous, it might be accessible as a set of distinct
+address ranges. Besides, different CPU architectures, and even
+different implementations of the same architecture have different view
+how these address ranges defined.
+
+All this makes dealing directly with physical memory quite complex and
+to avoid this complexity a concept of virtual memory was developed.
+
+The virtual memory abstracts the details of physical memory from the
+application software, allows to keep only needed information in the
+physical memory (demand paging) and provides a mechanism for the
+protection and controlled sharing of data between processes.
+
+With virtual memory, each and every memory access uses a virtual
+address. When the CPU decodes the an instruction that reads (or
+writes) from (or to) the system memory, it translates the `virtual`
+address encoded in that instruction to a `physical` address that the
+memory controller can understand.
+
+The physical system memory is divided into page frames, or pages. The
+size of each page is architecture specific. Some architectures allow
+selection of the page size from several supported values; this
+selection is performed at the kernel build time by setting an
+appropriate kernel configuration option.
+
+Each physical memory page can be mapped as one or more virtual
+pages. These mappings are described by page tables that allow
+translation from virtual address used by programs to real address in
+the physical memory. The page tables organized hierarchically.
+
+The tables at the lowest level of the hierarchy contain physical
+addresses of actual pages used by the software. The tables at higher
+levels contain physical addresses of the pages belonging to the lower
+levels. The pointer to the top level page table resides in a
+register. When the CPU performs the address translation, it uses this
+register to access the top level page table. The high bits of the
+virtual address are used to index an entry in the top level page
+table. That entry is then used to access the next level in the
+hierarchy with the next bits of the virtual address as the index to
+that level page table. The lowest bits in the virtual address define
+the offset inside the actual page.
+
+Huge Pages
+==========
+
+The address translation requires several memory accesses and memory
+accesses are slow relatively to CPU speed. To avoid spending precious
+processor cycles on the address translation, CPUs maintain a cache of
+such translations called Translation Lookaside Buffer (or
+TLB). Usually TLB is pretty scarce resource and applications with
+large memory working set will experience performance hit because of
+TLB misses.
+
+Many modern CPU architectures allow mapping of the memory pages
+directly by the higher levels in the page table. For instance, on x86,
+it is possible to map 2M and even 1G pages using entries in the second
+and the third level page tables. In Linux such pages are called
+`huge`. Usage of huge pages significantly reduces pressure on TLB,
+improves TLB hit-rate and thus improves overall system performance.
+
+There are two mechanisms in Linux that enable mapping of the physical
+memory with the huge pages. The first one is `HugeTLB filesystem`, or
+hugetlbfs. It is a pseudo filesystem that uses RAM as its backing
+store. For the files created in this filesystem the data resides in
+the memory and mapped using huge pages. The hugetlbfs is described at
+:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`.
+
+Another, more recent, mechanism that enables use of the huge pages is
+called `Transparent HugePages`, or THP. Unlike the hugetlbfs that
+requires users and/or system administrators to configure what parts of
+the system memory should and can be mapped by the huge pages, THP
+manages such mappings transparently to the user and hence the
+name. See
+:ref:`Documentation/admin-guide/mm/transhuge.rst <admin_guide_transhuge>`
+for more details about THP.
+
+Zones
+=====
+
+Often hardware poses restrictions on how different physical memory
+ranges can be accessed. In some cases, devices cannot perform DMA to
+all the addressable memory. In other cases, the size of the physical
+memory exceeds the maximal addressable size of virtual memory and
+special actions are required to access portions of the memory. Linux
+groups memory pages into `zones` according to their possible
+usage. For example, ZONE_DMA will contain memory that can be used by
+devices for DMA, ZONE_HIGHMEM will contain memory that is not
+permanently mapped into kernel's address space and ZONE_NORMAL will
+contain normally addressed pages.
+
+The actual layout of the memory zones is hardware dependent as not all
+architectures define all zones, and requirements for DMA are different
+for different platforms.
+
+Nodes
+=====
+
+Many multi-processor machines are NUMA - Non-Uniform Memory Access -
+systems. In such systems the memory is arranged into banks that have
+different access latency depending on the "distance" from the
+processor. Each bank is referred as `node` and for each node Linux
+constructs an independent memory management subsystem. A node has it's
+own set of zones, lists of free and used pages and various statistics
+counters. You can find more details about NUMA in
+:ref:`Documentation/vm/numa.rst <numa>` and in
+:ref:`Documentation/admin-guide/mm/numa_memory_policy.rst <numa_memory_policy>`.
+
+Page cache
+==========
+
+The physical memory is volatile and the common case for getting data
+into the memory is to read it from files. Whenever a file is read, the
+data is put into the `page cache` to avoid expensive disk access on
+the subsequent reads. Similarly, when one writes to a file, the data
+is placed in the page cache and eventually gets into the backing
+storage device. The written pages are marked as `dirty` and when Linux
+decides to reuse them for other purposes, it makes sure to synchronize
+the file contents on the device with the updated data.
+
+Anonymous Memory
+================
+
+The `anonymous memory` or `anonymous mappings` represent memory that
+is not backed by a filesystem. Such mappings are implicitly created
+for program's stack and heap or by explicit calls to mmap(2) system
+call. Usually, the anonymous mappings only define virtual memory areas
+that the program is allowed to access. The read accesses will result
+in creation of a page table entry that references a special physical
+page filled with zeroes. When the program performs a write, regular
+physical page will be allocated to hold the written data. The page
+will be marked dirty and if the kernel will decide to repurpose it,
+the dirty page will be swapped out.
+
+Reclaim
+=======
+
+Throughout the system lifetime, a physical page can be used for storing
+different types of data. It can be kernel internal data structures,
+DMA'able buffers for device drivers use, data read from a filesystem,
+memory allocated by user space processes etc.
+
+Depending on the page usage it is treated differently by the Linux
+memory management. The pages that can be freed at any time, either
+because they cache the data available elsewhere, for instance, on a
+hard disk, or because they can be swapped out, again, to the hard
+disk, are called `reclaimable`. The most notable categories of the
+reclaimable pages are page cache and anonymous memory.
+
+In most cases, the pages holding internal kernel data and used as DMA
+buffers cannot be repurposed, and they remain pinned until freed by
+their user. Such pages are called `unreclaimable`. However, in certain
+circumstances, even pages occupied with kernel data structures can be
+reclaimed. For instance, in-memory caches of filesystem metadata can
+be re-read from the storage device and therefore it is possible to
+discard them from the main memory when system is under memory
+pressure.
+
+The process of freeing the reclaimable physical memory pages and
+repurposing them is called (surprise!) `reclaim`. Linux can reclaim
+pages either asynchronously or synchronously, depending on the state
+of the system. When system is not loaded, most of the memory is free
+and allocation request will be satisfied immediately from the free
+pages supply. As the load increases, the amount of the free pages goes
+down and when it reaches a certain threshold (high watermark), an
+allocation request will awaken the ``kswapd`` daemon. It will
+asynchronously scan memory pages and either just free them if the data
+they contain is available elsewhere, or evict to the backing storage
+device (remember those dirty pages?). As memory usage increases even
+more and reaches another threshold - min watermark - an allocation
+will trigger the `direct reclaim`. In this case allocation is stalled
+until enough memory pages are reclaimed to satisfy the request.
+
+Compaction
+==========
+
+As the system runs, tasks allocate and free the memory and it becomes
+fragmented. Although with virtual memory it is possible to present
+scattered physical pages as virtually contiguous range, sometimes it is
+necessary to allocate large physically contiguous memory areas. Such
+need may arise, for instance, when a device driver requires large
+buffer for DMA, or when THP allocates a huge page. Memory `compaction`
+addresses the fragmentation issue. This mechanism moves occupied pages
+from the lower part of a memory zone to free pages in the upper part
+of the zone. When a compaction scan is finished free pages are grouped
+together at the beginning of the zone and allocations of large
+physically contiguous areas become possible.
+
+Like reclaim, the compaction may happen asynchronously in ``kcompactd``
+daemon or synchronously as a result of memory allocation request.
+
+OOM killer
+==========
+
+It may happen, that on a loaded machine memory will be exhausted. When
+the kernel detects that the system runs out of memory (OOM) it invokes
+`OOM killer`. Its mission is simple: all it has to do is to select a
+task to sacrifice for the sake of the overall system health. The
+selected task is killed in a hope that after it exits enough memory
+will be freed to continue normal operation.
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index 8454be6..ceead68 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -15,12 +15,17 @@ are described in Documentation/sysctl/vm.txt and in `man 5 proc`_.
 
 .. _man 5 proc: http://man7.org/linux/man-pages/man5/proc.5.html
 
+Linux memory management has its own jargon and if you are not yet
+familiar with it, consider reading
+:ref:`Documentation/admin-guide/mm/concepts.rst <mm_concepts>`.
+
 Here we document in detail how to interact with various mechanisms in
 the Linux memory management.
 
 .. toctree::
    :maxdepth: 1
 
+   concepts
    hugetlbpage
    idle_page_tracking
    ksm
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH] docs/vm: move ksm and transhuge from "user" to "internals" section.
From: Jonathan Corbet @ 2018-05-29 12:12 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: linux-doc, linux-mm
In-Reply-To: <1527588818-7031-1-git-send-email-rppt@linux.vnet.ibm.com>

On Tue, 29 May 2018 13:13:38 +0300
Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:

> After the userspace interface description for KSM and THP was split to
> Documentation/admin-guide/mm, the remaining parts belong to the section
> describing MM internals.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>

Applied, thanks.

jon
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] docs/admin-guide/mm: add high level concepts overview
From: Justin Skists @ 2018-05-29 12:18 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport; +Cc: linux-doc, linux-mm, linux-kernel
In-Reply-To: <20180529113725.GB13092@rapoport-lnx>


> On 29 May 2018 at 12:37 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> 

> +=================
> +Concepts overview
> +=================
> +
> +The memory management in Linux is complex system that evolved over the
> +years and included more and more functionality to support variety of
> +systems from MMU-less microcontrollers to supercomputers. The memory
> +management for systems without MMU is called ``nommu`` and it
> +definitely deserves a dedicated document, which hopefully will be
> +eventually written. Yet, although some of the concepts are the same,
> +here we assume that MMU is available and CPU can translate a virtual
> +address to a physical address.
> +
> +.. contents:: :local:
> +
> +Virtual Memory Primer
> +=====================
> +
> +The physical memory in a computer system is a limited resource and
> +even for systems that support memory hotplug there is a hard limit on
> +the amount of memory that can be installed. The physical memory is not
> +necessary contiguous, it might be accessible as a set of distinct
> +address ranges. Besides, different CPU architectures, and even
> +different implementations of the same architecture have different view
> +how these address ranges defined.
> +
> +All this makes dealing directly with physical memory quite complex and
> +to avoid this complexity a concept of virtual memory was developed.
> +
> +The virtual memory abstracts the details of physical memory from the
> +application software, allows to keep only needed information in the
> +physical memory (demand paging) and provides a mechanism for the
> +protection and controlled sharing of data between processes.
> +
> +With virtual memory, each and every memory access uses a virtual
> +address. When the CPU decodes the an instruction that reads (or
> +writes) from (or to) the system memory, it translates the `virtual`
> +address encoded in that instruction to a `physical` address that the
> +memory controller can understand.

I spotted an errant "the an" in that paragraph.

I would rewrite that sentence as "When the CPU decodes an instruction that
reads from (or writes to) the system memory," ...

The rest of the document looks good to me, and a nice overview.


Best regards,
Justin.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] docs/admin-guide/mm: add high level concepts overview
From: Jonathan Corbet @ 2018-05-29 12:21 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: linux-doc, linux-mm, linux-kernel
In-Reply-To: <20180529113725.GB13092@rapoport-lnx>

On Tue, 29 May 2018 14:37:25 +0300
Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:

> The are terms that seem obvious to the mm developers, but may be somewhat
> obscure for, say, less involved readers.
> 
> The concepts overview can be seen as an "extended glossary" that introduces
> such terms to the readers of the kernel documentation.

So as I read through this I thought of all kinds of ways it could be
improved, but I suspect that will always be the case.  It's a good intro
as-is, so I've applied it.  Thanks!

jon
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus
From: Waiman Long @ 2018-05-29 12:40 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin
In-Reply-To: <20180529062703.GA8985@localhost.localdomain>

On 05/29/2018 02:27 AM, Juri Lelli wrote:
> On 28/05/18 21:24, Waiman Long wrote:
>> On 05/28/2018 09:12 PM, Waiman Long wrote:
>>> On 05/24/2018 06:28 AM, Juri Lelli wrote:
>>>> On 17/05/18 16:55, Waiman Long wrote:
>>>>
>>>> [...]
>>>>
>>>>> @@ -849,7 +860,12 @@ static void rebuild_sched_domains_locked(void)
>>>>>  	 * passing doms with offlined cpu to partition_sched_domains().
>>>>>  	 * Anyways, hotplug work item will rebuild sched domains.
>>>>>  	 */
>>>>> -	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>>>>> +	if (!top_cpuset.isolation_count &&
>>>>> +	    !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>>>>> +		goto out;
>>>>> +
>>>>> +	if (top_cpuset.isolation_count &&
>>>>> +	   !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
>>>>>  		goto out;
>>>> Do we cover the case in which hotplug removed one of the isolated cpus
>>>> from cpu_active_mask?
>>> Yes, you are right. That is the remnant of my original patch that allows
>>> only one isolated_cpus at root. Thanks for spotting that.
>> I am sorry. I would like to take it back my previous comment. The code
>> above looks for inconsistency in the state of the effective_cpus mask to
>> find out if it is racing with a hotplug event. If it is, we can skip the
>> domain generation as the hotplug event will do that too. The checks are
>> still valid with the current patchset. So I don't think we need to make
>> any change here.
> Yes, these checks are valid, but don't we also need to check for hotplug
> races w.r.t. isolated CPUs (of some other sub domain)?

It is not actually a race. Both the hotplug event and any changes to cpu
lists or flags are serialized by the cpuset_mutex. It is just that we
may be doing the same work twice that we are wasting cpu cycles. So we
are doing a quick check to avoid this. The check isn't exhaustive and we
can certainly miss some cases. Doing a more throughout check may need as
much time as doing the sched domain generation itself and so you are
actually wasting more CPU cycles on average as the chance of a hotplug
event is very low.

Cheers,
Longman


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox