public inbox for linux-kernel@vger.kernel.org
From: Matthew Helsley <matthltc@us.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: menage@google.com, pj@sgi.com, xemul@openvz.org,
	balbir@in.ibm.com, serue@us.ibm.com,
	linux-kernel@vger.kernel.org,
	containers@lists.linux-foundation.org
Subject: Re: [RFC/PATCH 1/8]: CGroup Files: Add locking mode to cgroups control files
Date: Tue, 13 May 2008 13:38:58 -0700
Message-ID: <1210711138.21217.49.camel@localhost.localdomain>
In-Reply-To: <20080513130127.fcd46a41.akpm@linux-foundation.org>


On Tue, 2008-05-13 at 13:01 -0700, Andrew Morton wrote:
> Fear, doubt and resistance!
> 
> On Mon, 12 May 2008 23:37:08 -0700
> menage@google.com wrote:
> 
> > Different cgroup files have different stability requirements of the
> > cgroups framework while the handler is running; currently most
> > subsystems that don't have their own internal synchronization just
> > call cgroup_lock()/cgroup_unlock(), which takes the global cgroup_mutex.
> > 
> > This patch introduces a range of locking modes that can be requested
> > by a control file; currently these are all implemented internally by
> > taking cgroup_mutex, but expressing the intention will make it simpler
> > to move to a finer-grained locking scheme in the future.
> > 
> 
> This, umm, doesn't seem to do much to make the kernel a simpler place.
> 
> Do we expect to gain much from this?  Hope so...  What?
> 
> > Index: cgroup-2.6.25-mm1/include/linux/cgroup.h
> > ===================================================================
> > --- cgroup-2.6.25-mm1.orig/include/linux/cgroup.h
> > +++ cgroup-2.6.25-mm1/include/linux/cgroup.h
> > @@ -200,11 +200,87 @@ struct cgroup_map_cb {
> >   */
> >  
> >  #define MAX_CFTYPE_NAME 64
> > +
> > +/* locking modes for control files.
> > + *
> > + * These determine what level of guarantee the file handler wishes
> > + * cgroups to provide about the stability of control group entities
> > + * for the duration of the handler callback.
> > + *
> > + * The minimum guarantee is that the subsystem state for this
> > + * subsystem will not be freed (via a call to the subsystem's
> > + * destroy() callback) until after the control file handler
> > + * returns. This guarantee is provided by the fact that the open
> > + * dentry for the control file keeps its parent (cgroup) dentry alive,
> > + * which in turn keeps the cgroup object from being actually freed
> > + * (although it can be moved into the removed state in the
> > + * meantime). This is suitable for subsystems that completely control
> > + * their own synchronization.
> > + *
> > + * Other possible guarantees are given below.
> > + *
> > + * XXX_READ bits are used for a read operation on the control file,
> > + * XXX_WRITE bits are used for a write operation on the control file
> > + */
> 
> Vague handwaving: lockdep doesn't know anything about any of this. 
> Whereas if we were more conventional in using separate locks and
> suitable lock types for each application, we would retain full lockdep
> coverage.
> 
> > +/*
> > + * CFT_LOCK_ATTACH_(READ|WRITE): This operation will not run
> > + * concurrently with a task movement into or out of this cgroup.
> > + */
> > +#define CFT_LOCK_ATTACH_READ 1
> > +#define CFT_LOCK_ATTACH_WRITE 2
> > +#define CFT_LOCK_ATTACH (CFT_LOCK_ATTACH_READ | CFT_LOCK_ATTACH_WRITE)
> > +
> > +/*
> > + * CFT_LOCK_RMDIR_(READ|WRITE): This operation will not run
> > + * concurrently with the removal of the affected cgroup.
> > + */
> > +#define CFT_LOCK_RMDIR_READ 4
> > +#define CFT_LOCK_RMDIR_WRITE 8
> > +#define CFT_LOCK_RMDIR (CFT_LOCK_RMDIR_READ | CFT_LOCK_RMDIR_WRITE)
> > +
> > +/*
> > + * CFT_LOCK_HIERARCHY_(READ|WRITE): This operation will not run
> > + * concurrently with a cgroup creation or removal in this hierarchy,
> > + * or a bind/move/unbind for this subsystem.
> > + */
> > +#define CFT_LOCK_HIERARCHY_READ 16
> > +#define CFT_LOCK_HIERARCHY_WRITE 32
> > +#define CFT_LOCK_HIERARCHY (CFT_LOCK_HIERARCHY_READ | CFT_LOCK_HIERARCHY_WRITE)
> > +
> > +/*
> > + * CFT_LOCK_CGL_(READ|WRITE): This operation is called with
> > + * cgroup_lock() held; it will not run concurrently with any of the
> > + * above operations in any cgroup/hierarchy. This should be considered
> > + * to be the BKL of cgroups - it should be avoided if you can use
> > + * finer-grained locking
> > + */
> > +#define CFT_LOCK_CGL_READ 64
> > +#define CFT_LOCK_CGL_WRITE 128
> > +#define CFT_LOCK_CGL (CFT_LOCK_CGL_READ | CFT_LOCK_CGL_WRITE)
> > +
> > +#define CFT_LOCK_FOR_READ (CFT_LOCK_ATTACH_READ |		\
> > +			   CFT_LOCK_RMDIR_READ |		\
> > +			   CFT_LOCK_HIERARCHY_READ |		\
> > +			   CFT_LOCK_CGL_READ)
> > +
> > +#define CFT_LOCK_FOR_WRITE (CFT_LOCK_ATTACH_WRITE |	\
> > +			    CFT_LOCK_RMDIR_WRITE |	\
> > +			    CFT_LOCK_HIERARCHY_WRITE |	\
> > +			    CFT_LOCK_CGL_WRITE)
> > +
> >  struct cftype {
> >  	/* By convention, the name should begin with the name of the
> >  	 * subsystem, followed by a period */
> >  	char name[MAX_CFTYPE_NAME];
> >  	int private;
> > +
> > +	/*
> > +	 * Determine what locks (if any) are held across calls to
> > +	 * read_X/write_X callback. See lockmode definitions above
> > +	 */
> > +	int lockmode;
> > +
> >  	int (*open) (struct inode *inode, struct file *file);
> >  	ssize_t (*read) (struct cgroup *cgrp, struct cftype *cft,
> >  			 struct file *file,
> > Index: cgroup-2.6.25-mm1/kernel/cgroup.c
> > ===================================================================
> > --- cgroup-2.6.25-mm1.orig/kernel/cgroup.c
> > +++ cgroup-2.6.25-mm1/kernel/cgroup.c
> > @@ -1327,38 +1327,65 @@ enum cgroup_filetype {
> >  	FILE_RELEASE_AGENT,
> >  };
> >  
> > -static ssize_t cgroup_write_X64(struct cgroup *cgrp, struct cftype *cft,
> > -				struct file *file,
> > -				const char __user *userbuf,
> > -				size_t nbytes, loff_t *unused_ppos)
> > +
> > +
> > +/**
> > + * cgroup_file_lock(). Helper for cgroup read/write methods.
> > + * @cgrp:  the cgroup being acted on
> > + * @cft:   the control file being written to or read from
> > + * *write: true if the access is a write access.
> > + *
> > + * Takes any necessary locks as requested by the control file's
> > + * 'lockmode' field; checks (after locking if necessary) that the
> > + * control group is not in the process of being destroyed.
> > + *
> > + * Currently all the locking options are implemented in the same way,
> > + * by taking cgroup_mutex. Future patches will add finer-grained
> > + * locking.
> > + *
> > + * Calls to cgroup_file_lock() should always be paired with calls to
> > + * cgroup_file_unlock(), even if cgroup_file_lock() returns an error.
> > + */
> > +
> > +static int cgroup_file_lock(struct cgroup *cgrp, struct cftype *cft, int write)
> >  {
> > -	char buffer[64];
> > -	int retval = 0;
> > -	char *end;
> > +	int mask = write ? CFT_LOCK_FOR_WRITE : CFT_LOCK_FOR_READ;
> > +	BUILD_BUG_ON(CFT_LOCK_FOR_READ != (CFT_LOCK_FOR_WRITE >> 1));
> >  
> > -	if (!nbytes)
> > -		return -EINVAL;
> > -	if (nbytes >= sizeof(buffer))
> > -		return -E2BIG;
> > -	if (copy_from_user(buffer, userbuf, nbytes))
> > -		return -EFAULT;
> > +	if (cft->lockmode & mask)
> > +		mutex_lock(&cgroup_mutex);
> > +	if (cgroup_is_removed(cgrp))
> > +		return -ENODEV;
> > +	return 0;
> > +}
> > +
> > +/**
> > + * cgroup_file_unlock(): undoes the effect of cgroup_file_lock()
> > + */
> > +
> > +static void cgroup_file_unlock(struct cgroup *cgrp, struct cftype *cft,
> > +			      int write)
> > +{
> > +	int mask = write ? CFT_LOCK_FOR_WRITE : CFT_LOCK_FOR_READ;
> > +	if (cft->lockmode & mask)
> > +		mutex_unlock(&cgroup_mutex);
> > +}
> >  
> > -	buffer[nbytes] = 0;     /* nul-terminate */
> > -	strstrip(buffer);
> > +static ssize_t cgroup_write_X64(struct cgroup *cgrp, struct cftype *cft,
> > +				const char *buffer)
> > +{
> > +	char *end;
> >  	if (cft->write_u64) {
> >  		u64 val = simple_strtoull(buffer, &end, 0);
> >  		if (*end)
> >  			return -EINVAL;
> > -		retval = cft->write_u64(cgrp, cft, val);
> > +		return cft->write_u64(cgrp, cft, val);
> >  	} else {
> >  		s64 val = simple_strtoll(buffer, &end, 0);
> >  		if (*end)
> >  			return -EINVAL;
> > -		retval = cft->write_s64(cgrp, cft, val);
> > +		return cft->write_s64(cgrp, cft, val);
> >  	}
> > -	if (!retval)
> > -		retval = nbytes;
> > -	return retval;
> >  }
> >  
> >  static ssize_t cgroup_common_file_write(struct cgroup *cgrp,
> > @@ -1426,47 +1453,82 @@ out1:
> >  	return retval;
> >  }
> >  
> > -static ssize_t cgroup_file_write(struct file *file, const char __user *buf,
> > +static ssize_t cgroup_file_write(struct file *file, const char __user *userbuf,
> >  						size_t nbytes, loff_t *ppos)
> >  {
> >  	struct cftype *cft = __d_cft(file->f_dentry);
> >  	struct cgroup *cgrp = __d_cgrp(file->f_dentry->d_parent);
> > -
> > -	if (!cft || cgroup_is_removed(cgrp))
> > -		return -ENODEV;
> > -	if (cft->write)
> > -		return cft->write(cgrp, cft, file, buf, nbytes, ppos);
> > -	if (cft->write_u64 || cft->write_s64)
> > -		return cgroup_write_X64(cgrp, cft, file, buf, nbytes, ppos);
> > -	if (cft->trigger) {
> > -		int ret = cft->trigger(cgrp, (unsigned int)cft->private);
> > -		return ret ? ret : nbytes;
> > +	ssize_t retval;
> > +	char static_buffer[64];
> > +	char *buffer = static_buffer;
> > +	ssize_t max_bytes = sizeof(static_buffer) - 1;
> > +	if (!cft->write && !cft->trigger) {
> > +		if (!nbytes)
> > +			return -EINVAL;
> > +		if (nbytes >= max_bytes)
> > +			return -E2BIG;
> > +		if (nbytes >= sizeof(static_buffer)) {
> 
> afaict this can't happen - we would have already returned -E2BIG?
> 
> > +			/* +1 for nul-terminator */
> > +			buffer = kmalloc(nbytes + 1, GFP_KERNEL);
> > +			if (buffer == NULL)
> > +				return -ENOMEM;
> > +		}
> > +		if (copy_from_user(buffer, userbuf, nbytes)) {
> > +			retval = -EFAULT;
> > +			goto out_free;
> > +		}
> > +		buffer[nbytes] = 0;	/* nul-terminate */
> > +		strstrip(buffer);	/* strip -just- trailing whitespace */
> >  	}
> > -	return -EINVAL;
> > -}
> 
> I'm trying to work out what protects static_buffer?

One of us must be having a brain cramp, because it looks to me like the
buffer doesn't need protection: it's on the stack. It's probably me,
but I'm just not seeing how this use is unsafe.

> Why does it need to be static anyway?  64 bytes on-stack is OK.

Uh, it is on the stack. It doesn't use the C keyword "static"; it's
just poorly named.

<snip>

Cheers,
	-Matt Helsley



Thread overview: 27+ messages
2008-05-13  6:37 [RFC/PATCH 0/8]: CGroup Files: Clean up locking and boilerplate menage
2008-05-13  6:37 ` [RFC/PATCH 1/8]: CGroup Files: Add locking mode to cgroups control files menage
2008-05-13  9:23   ` Li Zefan
2008-05-13 21:07     ` Paul Menage
2008-05-14  1:30       ` Li Zefan
2008-05-14  1:40         ` Paul Menage
2008-05-13 20:01   ` Andrew Morton
2008-05-13 20:38     ` Matthew Helsley [this message]
2008-05-13 20:43       ` Andrew Morton
2008-05-13 21:17     ` Paul Menage
2008-05-13 21:32       ` Andrew Morton
2008-05-13 21:46         ` Paul Menage
2008-05-14  1:59         ` Paul Jackson
2008-05-13  6:37 ` [RFC/PATCH 2/8]: CGroup Files: Add a cgroup write_string control file method menage
2008-05-13 20:07   ` Andrew Morton
2008-05-13 21:01     ` Paul Menage
2008-05-13 20:44   ` Matt Helsley
2008-05-13  6:37 ` [RFC/PATCH 3/8]: CGroup Files: Move the release_agent file to use typed handlers menage
2008-05-13 20:08   ` Andrew Morton
2008-05-13 21:32     ` Paul Menage
2008-05-13  6:37 ` [RFC/PATCH 4/8]: CGroup Files: Move notify_on_release file to separate write handler menage
2008-05-13  6:37 ` [RFC/PATCH 5/8]: CGroup Files: Turn attach_task_by_pid directly into a cgroup " menage
2008-05-13  6:37 ` [RFC/PATCH 6/8]: CGroup Files: Remove cpuset_common_file_write() menage
2008-05-13 20:11   ` Andrew Morton
2008-05-13 21:27     ` Paul Menage
2008-05-13  6:37 ` [RFC/PATCH 7/8]: CGroup Files: Convert devcgroup_access_write() into a cgroup write_string() handler menage
2008-05-13  6:37 ` [RFC/PATCH 8/8]: CGroup Files: Convert res_counter_write() to be a cgroups " menage
