From: Fenghua Yu <fenghua.yu@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>,
"H. Peter Anvin" <h.peter.anvin@intel.com>,
Ingo Molnar <mingo@elte.hu>, Tony Luck <tony.luck@intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Stephane Eranian <eranian@google.com>,
Borislav Petkov <bp@suse.de>, Dave Hansen <dave.hansen@intel.com>,
Nilay Vaish <nilayvaish@gmail.com>, Shaohua Li <shli@fb.com>,
David Carrillo-Cisneros <davidcc@google.com>,
Ravi V Shankar <ravi.v.shankar@intel.com>,
Sai Prakhya <sai.praneeth.prakhya@intel.com>,
Vikas Shivappa <vikas.shivappa@linux.intel.com>,
linux-kernel <linux-kernel@vger.kernel.org>, x86 <x86@kernel.org>
Subject: Re: [PATCH v5 13/18] x86/intel_rdt: Add mkdir to resctrl file system
Date: Fri, 28 Oct 2016 10:51:30 -0700 [thread overview]
Message-ID: <20161028175130.GB24456@linux.intel.com> (raw)
In-Reply-To: <alpine.DEB.2.20.1610261557130.4983@nanos>
On Wed, Oct 26, 2016 at 05:01:32PM +0200, Thomas Gleixner wrote:
> On Sat, 22 Oct 2016, Fenghua Yu wrote:
> > +/*
> > + * Trivial allocator for CLOSIDs. Since h/w only supports a small number,
> > + * we can keep a bitmap of free CLOSIDs in a single integer.
> > + *
> > + * Using a global CLOSID across all resources has some advantages and
> > + * some drawbacks:
> > + * + We can simply set "current->closid" to assign a task to a resource
> > + * group.
> > + * + Context switch code can avoid extra memory references deciding which
> > + * CLOSID to load into the PQR_ASSOC MSR
> > + * - We give up some options in configuring resource groups across multi-socket
> > + * systems.
> > + * - Our choices on how to configure each resource become progressively more
> > + * limited as the number of resources grows.
> > + */
> > +static int closid_free_map;
> > +
> > +static void closid_init(void)
> > +{
> > + struct rdt_resource *r;
> > + int rdt_max_closid;
> > +
> > + /* Compute rdt_max_closid across all resources */
> > + rdt_max_closid = 0;
> > + for_each_enabled_rdt_resource(r)
> > + rdt_max_closid = max(rdt_max_closid, r->num_closid);
>
> So you decided to silently ignore my objections against this approach. Fine
> with me, but that does not solve the problem at all.
>
> Once more:
>
> On a system with L2 and L3 CAT it does not make any sense at all to expose
> the closids which exceed the L2 space. Simply because using them wreckages
> any L2 partitioning done in the valid L2 space.
>
> If you really want to allow that, then:
>
> 1) It must be a opt-in at mount time
>
> 2) It must be documented clearly along with the mount option
>
> > + /*
> > + * CDP is "special". Because we share the L3 CBM MSR array
> > + * between L3DATA and L3CODE, we must not use a CLOSID larger
> > + * than they support. Just check against L3DATA because it
> > + * is the same as L3CODE.
> > + */
> > + r = &rdt_resources_all[RDT_RESOURCE_L3DATA];
> > + if (r->enabled)
> > + rdt_max_closid = min(rdt_max_closid, r->num_closid);
>
> This explicit special casing is crap, really.
>
> for_each_enabled_rdt_resource(r)
> rdt_max_closid = max(rdt_max_closid, r->num_closid);
>
> for_each_enabled_rdt_resource(r) {
> if (!relaxed_max_closid || r->force_min_closid)
> rdt_max_closid = min(rdt_max_closid, r->num_closid);
> }
>
> Handles all cases without 'CDP is special' and whatever nonsense intel will
> come up with in future. All you need to do is to add that force_min_closid
> field into the resource struct and set it for l3data and l3code.
>
> relaxed_max_closid is set at mount time by an appropriate mount option.
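For illustration, the two-pass scheme suggested above could look like the following userspace sketch. This is not kernel code: "struct res" is a simplified stand-in for struct rdt_resource, and relaxed_max_closid / force_min_closid are the names proposed in the review, not an existing API.

```c
#include <assert.h>

/* Simplified stand-in for struct rdt_resource (illustration only). */
struct res {
	int enabled;
	int num_closid;
	int force_min_closid;	/* would be set for L3DATA and L3CODE */
};

int compute_closid_limit(const struct res *r, int n, int relaxed_max_closid)
{
	int rdt_max_closid = 0;
	int i;

	/* Pass 1: the widest CLOSID space any enabled resource offers. */
	for (i = 0; i < n; i++)
		if (r[i].enabled && r[i].num_closid > rdt_max_closid)
			rdt_max_closid = r[i].num_closid;

	/* Pass 2: clamp down per resource unless the mount option relaxes
	 * the limit; resources that force the minimum always clamp. */
	for (i = 0; i < n; i++) {
		if (!r[i].enabled)
			continue;
		if (!relaxed_max_closid || r[i].force_min_closid)
			if (r[i].num_closid < rdt_max_closid)
				rdt_max_closid = r[i].num_closid;
	}

	return rdt_max_closid;
}
```

With an L3 resource offering 16 CLOSIDs and an L2 resource offering 4, the strict mode yields 4 and the relaxed mode yields 16, while a force_min_closid resource clamps even the relaxed mode.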
Can we just do a simple implementation that finds the minimal closid, and
implement the maximum closid and the mount parameter later?
AFAIK, the minimal closid works in all current situations (L3, L3DATA,
L3CODE, and L2) and there is no platform that needs the maximum closid yet.
Here is the updated patch; the only change is to use the minimal closid in
closid_init(). Does it look good?
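To show the minimal-closid behavior in isolation, here is a self-contained userspace sketch of the trivial bitmap allocator from the patch below. It is an illustration only: the per-resource num_closid values are passed in directly instead of walking rdt_resources_all, and -1 stands in for -ENOSPC.

```c
#include <assert.h>
#include <limits.h>
#include <strings.h>	/* ffs() */

int closid_free_map;

void closid_init(const int *num_closid, int nres)
{
	/* Seed above any possible num_closid so min() can take effect. */
	int rdt_min_closid = INT_MAX;
	int i;

	/* The usable CLOSID space is the smallest any resource supports. */
	for (i = 0; i < nres; i++)
		if (num_closid[i] < rdt_min_closid)
			rdt_min_closid = num_closid[i];

	/* A single int can track at most 32 free CLOSIDs. */
	if (rdt_min_closid > 32)
		rdt_min_closid = 32;

	closid_free_map = (rdt_min_closid == 32) ? ~0
						 : (1 << rdt_min_closid) - 1;

	/* CLOSID 0 is always reserved for the default group. */
	closid_free_map &= ~1;
}

int closid_alloc(void)
{
	int closid = ffs(closid_free_map);

	if (closid == 0)
		return -1;	/* stands in for -ENOSPC */
	closid--;
	closid_free_map &= ~(1 << closid);

	return closid;
}

void closid_free(int closid)
{
	closid_free_map |= 1 << closid;
}
```

With resources offering 16 and 4 CLOSIDs, only CLOSIDs 1-3 are allocatable: CLOSID 0 stays reserved and the fourth allocation fails.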
---
arch/x86/include/asm/intel_rdt.h | 9 +++++++++
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 237 +++++++++++++++++++++++++++++++
2 files changed, 246 insertions(+)
diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 39ed561..a6c7d94 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -12,13 +12,20 @@
* @kn: kernfs node
* @rdtgroup_list: linked list for all rdtgroups
* @closid: closid for this rdtgroup
+ * @flags: status bits
+ * @waitcount: how many cpus expect to find this
*/
struct rdtgroup {
struct kernfs_node *kn;
struct list_head rdtgroup_list;
int closid;
+ int flags;
+ atomic_t waitcount;
};
+/* rdtgroup.flags */
+#define RDT_DELETED 1
+
/* List of all resource groups */
extern struct list_head rdt_all_groups;
@@ -154,4 +161,6 @@ union cpuid_0x10_1_edx {
};
void rdt_cbm_update(void *arg);
+struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
+void rdtgroup_kn_unlock(struct kernfs_node *kn);
#endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index ebab170..296ee23 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -26,10 +26,12 @@
#include <linux/seq_file.h>
#include <linux/sched.h>
#include <linux/slab.h>
+#include <linux/cpu.h>
#include <uapi/linux/magic.h>
#include <asm/intel_rdt.h>
+#include <asm/intel_rdt_common.h>
DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
struct kernfs_root *rdt_root;
@@ -39,6 +41,61 @@ LIST_HEAD(rdt_all_groups);
/* Kernel fs node for "info" directory under root */
static struct kernfs_node *kn_info;
+/*
+ * Trivial allocator for CLOSIDs. Since h/w only supports a small number,
+ * we can keep a bitmap of free CLOSIDs in a single integer.
+ *
+ * Using a global CLOSID across all resources has some advantages and
+ * some drawbacks:
+ * + We can simply set "current->closid" to assign a task to a resource
+ * group.
+ * + Context switch code can avoid extra memory references deciding which
+ * CLOSID to load into the PQR_ASSOC MSR
+ * - We give up some options in configuring resource groups across multi-socket
+ * systems.
+ * - Our choices on how to configure each resource become progressively more
+ * limited as the number of resources grows.
+ */
+static int closid_free_map;
+
+static void closid_init(void)
+{
+ struct rdt_resource *r;
+ int rdt_min_closid;
+
+ /* Compute rdt_min_closid across all enabled resources */
+ rdt_min_closid = INT_MAX;
+ for_each_enabled_rdt_resource(r)
+ rdt_min_closid = min(rdt_min_closid, r->num_closid);
+
+ if (rdt_min_closid > 32) {
+ pr_warn("Only using 32 of %d CLOSIDs\n", rdt_min_closid);
+ rdt_min_closid = 32;
+ }
+
+ closid_free_map = BIT_MASK(rdt_min_closid) - 1;
+
+ /* CLOSID 0 is always reserved for the default group */
+ closid_free_map &= ~1;
+}
+
+int closid_alloc(void)
+{
+ int closid = ffs(closid_free_map);
+
+ if (closid == 0)
+ return -ENOSPC;
+ closid--;
+ closid_free_map &= ~(1 << closid);
+
+ return closid;
+}
+
+static void closid_free(int closid)
+{
+ closid_free_map |= 1 << closid;
+}
+
/* set uid and gid of rdtgroup dirs and files to that of the creator */
static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
{
@@ -271,6 +328,54 @@ static int parse_rdtgroupfs_options(char *data)
return ret;
}
+/*
+ * We don't allow rdtgroup directories to be created anywhere
+ * except the root directory. Thus when looking for the rdtgroup
+ * structure for a kernfs node we are either looking at a directory,
+ * in which case the rdtgroup structure is pointed at by the "priv"
+ * field, otherwise we have a file, and need only look to the parent
+ * to find the rdtgroup.
+ */
+static struct rdtgroup *kernfs_to_rdtgroup(struct kernfs_node *kn)
+{
+ if (kernfs_type(kn) == KERNFS_DIR)
+ return kn->priv;
+ else
+ return kn->parent->priv;
+}
+
+struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
+{
+ struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
+
+ atomic_inc(&rdtgrp->waitcount);
+ kernfs_break_active_protection(kn);
+
+ mutex_lock(&rdtgroup_mutex);
+
+ /* Was this group deleted while we waited? */
+ if (rdtgrp->flags & RDT_DELETED)
+ return NULL;
+
+ return rdtgrp;
+}
+
+void rdtgroup_kn_unlock(struct kernfs_node *kn)
+{
+ struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
+
+ mutex_unlock(&rdtgroup_mutex);
+
+ if (atomic_dec_and_test(&rdtgrp->waitcount) &&
+ (rdtgrp->flags & RDT_DELETED)) {
+ kernfs_unbreak_active_protection(kn);
+ kernfs_put(kn);
+ kfree(rdtgrp);
+ } else {
+ kernfs_unbreak_active_protection(kn);
+ }
+}
+
static struct dentry *rdt_mount(struct file_system_type *fs_type,
int flags, const char *unused_dev_name,
void *data)
@@ -302,6 +407,8 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
}
}
+ closid_init();
+
ret = rdtgroup_create_info_dir(rdtgroup_default.kn);
if (ret)
goto out;
@@ -358,10 +465,39 @@ static int reset_all_cbms(struct rdt_resource *r)
}
/*
+ * MSR_IA32_PQR_ASSOC is scoped per logical CPU, so all updates
+ * are always in thread context.
+ */
+static void rdt_reset_pqr_assoc_closid(void *v)
+{
+ struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+
+ state->closid = 0;
+ wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0);
+}
+
+/*
* Forcibly remove all of subdirectories under root.
*/
static void rmdir_all_sub(void)
{
+ struct rdtgroup *rdtgrp, *tmp;
+
+ get_cpu();
+ /* Reset PQR_ASSOC MSR on this cpu. */
+ rdt_reset_pqr_assoc_closid(NULL);
+ /* Reset PQR_ASSOC MSR on the rest of cpus. */
+ smp_call_function_many(cpu_online_mask, rdt_reset_pqr_assoc_closid,
+ NULL, 1);
+ put_cpu();
+ list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
+ /* Remove each rdtgroup other than root */
+ if (rdtgrp == &rdtgroup_default)
+ continue;
+ kernfs_remove(rdtgrp->kn);
+ list_del(&rdtgrp->rdtgroup_list);
+ kfree(rdtgrp);
+ }
kernfs_remove(kn_info);
}
@@ -394,7 +530,108 @@ static struct file_system_type rdt_fs_type = {
.kill_sb = rdt_kill_sb,
};
+static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
+ umode_t mode)
+{
+ struct rdtgroup *parent, *rdtgrp;
+ struct kernfs_node *kn;
+ int ret, closid;
+
+ /* Only allow mkdir in the root directory */
+ if (parent_kn != rdtgroup_default.kn)
+ return -EPERM;
+
+ /* Do not accept '\n' to avoid unparsable situation. */
+ if (strchr(name, '\n'))
+ return -EINVAL;
+
+ parent = rdtgroup_kn_lock_live(parent_kn);
+ if (!parent) {
+ ret = -ENODEV;
+ goto out_unlock;
+ }
+
+ ret = closid_alloc();
+ if (ret < 0)
+ goto out_unlock;
+ closid = ret;
+
+ /* allocate the rdtgroup. */
+ rdtgrp = kzalloc(sizeof(*rdtgrp), GFP_KERNEL);
+ if (!rdtgrp) {
+ ret = -ENOMEM;
+ goto out_closid_free;
+ }
+ rdtgrp->closid = closid;
+ list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups);
+
+ /* kernfs creates the directory for rdtgrp */
+ kn = kernfs_create_dir(parent->kn, name, mode, rdtgrp);
+ if (IS_ERR(kn)) {
+ ret = PTR_ERR(kn);
+ goto out_cancel_ref;
+ }
+ rdtgrp->kn = kn;
+
+ /*
+ * kernfs_remove() will drop the reference count on "kn" which
+ * will free it. But we still need it to stick around for the
+ * rdtgroup_kn_unlock(kn) call below. Take one extra reference
+ * here, which will be dropped inside rdtgroup_kn_unlock().
+ */
+ kernfs_get(kn);
+
+ ret = rdtgroup_kn_set_ugid(kn);
+ if (ret)
+ goto out_destroy;
+
+ kernfs_activate(kn);
+
+ ret = 0;
+ goto out_unlock;
+
+out_destroy:
+ kernfs_remove(rdtgrp->kn);
+out_cancel_ref:
+ list_del(&rdtgrp->rdtgroup_list);
+ kfree(rdtgrp);
+out_closid_free:
+ closid_free(closid);
+out_unlock:
+ rdtgroup_kn_unlock(parent_kn);
+ return ret;
+}
+
+static int rdtgroup_rmdir(struct kernfs_node *kn)
+{
+ struct rdtgroup *rdtgrp;
+ int ret = 0;
+
+ rdtgrp = rdtgroup_kn_lock_live(kn);
+ if (!rdtgrp) {
+ rdtgroup_kn_unlock(kn);
+ return -ENOENT;
+ }
+
+ rdtgrp->flags = RDT_DELETED;
+ closid_free(rdtgrp->closid);
+ list_del(&rdtgrp->rdtgroup_list);
+
+ /*
+ * one extra hold on this, will drop when we kfree(rdtgrp)
+ * in rdtgroup_kn_unlock()
+ */
+ kernfs_get(kn);
+ kernfs_remove(rdtgrp->kn);
+
+ rdtgroup_kn_unlock(kn);
+
+ return ret;
+}
+
static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
+ .mkdir = rdtgroup_mkdir,
+ .rmdir = rdtgroup_rmdir,
};
static int __init rdtgroup_setup_root(void)
--
2.5.0