From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	koverstreet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	"Alasdair G. Kergon"
	<agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>,
	Mikulas Patocka
	<mpatocka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Glauber Costa <glommer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: [PATCH 11/11] cgroup: use percpu refcnt for cgroup_subsys_states
Date: Fri, 14 Jun 2013 15:31:25 -0700	[thread overview]
Message-ID: <20130614223125.GD6593@mtj.dyndns.org> (raw)
In-Reply-To: <20130614132026.GD10084-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 3244 bytes --]

Hello, Michal.

On Fri, Jun 14, 2013 at 03:20:26PM +0200, Michal Hocko wrote:
> I have no objections to change css reference counting scheme if the
> guarantees we used to have are still valid. I am just missing some
> comparisons. Do you have any numbers that would show benefits clearly?

Mikulas' high scalability dm test case on top of ramdisk was affected
severely when css refcnting was added to track the original issuer's
cgroup context.  That probably is one of the more severe cases.

> You are mentioning that especially controllers that are strongly per-cpu
> oriented will see the biggest improvements. What about others?
> A single atomic_add resp. atomic_dec_return is much less heavy than the

Even with preemption enabled, the percpu ref get/put is under ten
instructions touching two memory areas - the preemption counter,
which is usually very hot anyway, and the percpu refcnt itself.  It
shouldn't be much slower than the atomic ops.  If the kernel has
preemption disabled, percpu_ref is actually likely to be cheaper even
on a single CPU.
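
To make it concrete, the get side boils down to something like the
following - a simplified sketch of the shape of the fast path, not the
verbatim percpu-refcount code (the real one encodes the dead state in
the pointer's low bit):

    static inline void percpu_ref_get(struct percpu_ref *ref)
    {
            unsigned __percpu *pcpu_count;

            /* amounts to preempt_disable() - the hot preemption counter */
            rcu_read_lock_sched();

            pcpu_count = ACCESS_ONCE(ref->pcpu_count);

            if (likely(pcpu_count))
                    /* fast path: bump this CPU's counter, no locked op */
                    __this_cpu_inc(*pcpu_count);
            else
                    /* ref already killed - fall back to the shared atomic_t */
                    atomic_inc(&ref->count);

            rcu_read_unlock_sched();
    }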

So, here are some numbers from the attached test program.  The test is
very simple - inc ref, copy N bytes into a per-cpu buf, dec ref - and
see how many times it can do that in a given amount of time - 15s.
Both the single-CPU and all-CPUs scenarios are tested.  The test is
run inside qemu on my laptop - mobile i7, 2 cores / 4 threads.  Yeah,
I know.  I'll run it on a proper test machine later today.

Single CPU case.  Preemption enabled.  This is the best scenario for
atomic_t.  No cacheline bouncing at all.

    copy size      atomic_t    percpu_ref        diff

            0    1198217077    1747505555      +45.84%
           32     505504457     465714742       -7.87%
           64     511521639     470741887       -7.97%
          128     485319952     434920137      -10.38%
          256     421809359     384871731       -8.76%
          512     330527059     307587622       -6.94%

For some reason, percpu_ref wins if copy size is zero.  I don't know
why that is.  The body isn't optimized out, so it's still doing all
the refcnting.  Maybe the CPU doesn't have enough work to mask
pipeline bubbles from the atomic ops?  In the other cases, it's slower
by around 10% or less, which isn't exactly noise, but this is the
worst possible scenario.  Unless this is the only thing a pinned CPU
is doing, it's unlikely to be noticeable.

Now doing the same thing on multiple CPUs.  Note that while this is
the best scenario for percpu_ref, the hardware the test is run on is
very favorable to atomic_t - it's just two cores on the same package
sharing the L3 cache, so cacheline ping-ponging is relatively cheap.

    copy size      atomic_t    percpu_ref         diff

            0     342659959    3794775739    +1007.45%
           32     401933493    1337286466     +232.71%
           64     385459752    1353978982     +251.26%
          128     401379622    1234780968     +207.63%
          256     401170676    1052511682     +162.36%
          512     387796881     794101245     +104.77%

Even on this machine, the difference is huge.  If the refcnt is used
from different CPUs with any frequency, percpu_ref will destroy
atomic_t.  Also note that percpu_ref will scale perfectly as the
number of CPUs increases, while atomic_t will only get worse.

I'll play with it a bit more on an actual machine and post more
results.  Test program attached.
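
(For reference, the test is driven by module parameters - assuming the
module builds as test-pcpuref.ko, "insmod test-pcpuref.ko all_cpus=1
duration=15 copy_bytes=64" runs the all-CPUs variant for 15s with
64-byte copies; the per-CPU loop counts show up as the XXX lines in
dmesg.)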

Thanks.

-- 
tejun

[-- Attachment #2: test-pcpuref.c --]
[-- Type: text/plain, Size: 2951 bytes --]

#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/atomic.h>
#include <linux/percpu.h>
#include <linux/percpu-refcount.h>
#include <linux/jiffies.h>
#include <linux/workqueue.h>
#include <linux/completion.h>

static struct workqueue_struct *my_wq;

static int test_all_cpus = 0;
module_param_named(all_cpus, test_all_cpus, int, 0444);

static int test_duration = 15;
module_param_named(duration, test_duration, int, 0444);

static int test_copy_bytes = 32;
module_param_named(copy_bytes, test_copy_bytes, int, 0444);

static char test_src_buf[1024];
static DEFINE_PER_CPU(char [1024], test_dst_buf);
static DEFINE_PER_CPU(struct work_struct, test_work);

static atomic_t test_atomic_ref = ATOMIC_INIT(1);

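/*
 * Baseline: take/drop a shared atomic_t ref around each copy and count
 * how many loops complete before the deadline.
 */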
static void test_atomic_workfn(struct work_struct *work)
{
	unsigned long end = jiffies + test_duration * HZ;
	unsigned long cnt = 0;

	do {
		int bytes = test_copy_bytes;

		atomic_inc(&test_atomic_ref);
		while (bytes) {
			int todo = min_t(int, bytes, 1024);

			memcpy(*this_cpu_ptr(&test_dst_buf), test_src_buf, todo);
			bytes -= todo;
		}
		cnt++;
		atomic_dec(&test_atomic_ref);
	} while (time_before(jiffies, end));

	printk("XXX atomic on CPU %d completed %lu loops\n",
	       smp_processor_id(), cnt);
}

static struct percpu_ref test_pcpu_ref;
static DECLARE_COMPLETION(test_cpu_ref_done);

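/* invoked once the ref has been killed and the last reference is dropped */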
static void test_pcpu_release(struct percpu_ref *ref)
{
	complete(&test_cpu_ref_done);
}

static void test_pcpu_workfn(struct work_struct *work)
{
	unsigned long end = jiffies + test_duration * HZ;
	unsigned long cnt = 0;

	do {
		int bytes = test_copy_bytes;

		percpu_ref_get(&test_pcpu_ref);
		while (bytes) {
			int todo = min_t(int, bytes, 1024);

			memcpy(*this_cpu_ptr(&test_dst_buf), test_src_buf, todo);
			bytes -= todo;
		}
		cnt++;
		percpu_ref_put(&test_pcpu_ref);
	} while (time_before(jiffies, end));

	printk("XXX percpu on CPU %d completed %lu loops\n",
	       smp_processor_id(), cnt);
}

static int test_pcpuref_init(void)
{
	int cpu;

	if (percpu_ref_init(&test_pcpu_ref, test_pcpu_release))
		return -ENOMEM;

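	/*
	 * WQ_CPU_INTENSIVE: each work item spins for the whole test
	 * duration, so exempt it from workqueue concurrency management.
	 */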
	my_wq = alloc_workqueue("test-pcpuref", WQ_CPU_INTENSIVE, 0);
	if (!my_wq) {
		percpu_ref_cancel_init(&test_pcpu_ref);
		return -ENOMEM;
	}

	printk("XXX testing atomic_t for %d secs\n", test_duration);

	for_each_online_cpu(cpu) {
		INIT_WORK(per_cpu_ptr(&test_work, cpu), test_atomic_workfn);
		queue_work_on(cpu, my_wq, per_cpu_ptr(&test_work, cpu));
		if (!test_all_cpus)
			break;
	}

	flush_workqueue(my_wq);

	printk("XXX testing percpu_ref for %d secs\n", test_duration);

	for_each_online_cpu(cpu) {
		INIT_WORK(per_cpu_ptr(&test_work, cpu), test_pcpu_workfn);
		queue_work_on(cpu, my_wq, per_cpu_ptr(&test_work, cpu));
		if (!test_all_cpus)
			break;
	}

	flush_workqueue(my_wq);

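	/*
	 * percpu_ref_kill() drops the ref out of percpu mode and puts the
	 * base reference; the release callback fires once all remaining
	 * references are gone.
	 */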
	percpu_ref_kill(&test_pcpu_ref);
	wait_for_completion(&test_cpu_ref_done);

	return 0;
}
module_init(test_pcpuref_init);

static void test_pcpuref_exit(void)
{
	destroy_workqueue(my_wq);
}
module_exit(test_pcpuref_exit);

MODULE_LICENSE("GPL v2");
