All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Kent Overstreet <koverstreet@google.com>
Cc: linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>, Andi Kleen <andi@firstfloor.org>,
	Jens Axboe <axboe@kernel.dk>,
	"Nicholas A. Bellinger" <nab@linux-iscsi.org>
Subject: Re: [PATCH] Percpu tag allocator
Date: Wed, 12 Jun 2013 19:08:35 +0200	[thread overview]
Message-ID: <20130612170835.GA4682@redhat.com> (raw)
In-Reply-To: <1371009804-11596-1-git-send-email-koverstreet@google.com>

On 06/11, Kent Overstreet wrote:
>
> + * This is done by keeping track of which cpus have tags on their percpu
> + * freelists in a bitmap, and then on allocation failure if too many cpus have
> + * tags on their freelists - i.e. if cpus_have_tags * TAG_CPU_SIZE (64) >
> + * nr_tags / 2 - then we steal one remote cpu's freelist (effectively picked at
> + * random).

Interesting... I'll try to read the patch later.

I have to admit, I do not really understand what this bitmap can actually
buy after a quick glance. It is only a hint afaics, and obviously it is
not accurate. alloc_local_tag() never clears the bit, tag_free()->set_bit()
is racy, etc. I guess this is fine, just this doesn't look clear.

But the code looks correct, just a couple of minor nits, feel freee to
ignore.

> +enum {
> +	TAG_FAIL	= -1U,
> +	TAG_MAX		= TAG_FAIL - 1,
> +};

This can probably go to .c, and it seems that TAG_MAX is not needed.
tag_init() can check "nr_tags >= TAG_FAIL" instead.

> +static bool steal_tags(struct percpu_tag_pool *pool,
> +		       struct percpu_tag_cpu_freelist *tags)
> +{
> +	unsigned i, nr_free, cpus_have_tags = 0, cpu = pool->cpu_last_stolen;
> +	struct percpu_tag_cpu_freelist *remote;
> +
> +	for (i = 0; i < DIV_ROUND_UP(nr_cpu_ids, BITS_PER_LONG); i++)
> +		cpus_have_tags += hweight_long(pool->cpus_have_tags[i]);

bitmap_weight(pool->cpus_have_tags, pool->nr_tags).

> +
> +	while (cpus_have_tags-- * TAG_CPU_SIZE > pool->nr_tags / 2) {
> +		cpu = find_next_bit(pool->cpus_have_tags, nr_cpu_ids, cpu);
> +
> +		if (cpu == nr_cpu_ids)
> +			cpu = find_first_bit(pool->cpus_have_tags, nr_cpu_ids);
> +
> +		if (cpu == nr_cpu_ids)
> +			BUG();
> +
> +		pool->cpu_last_stolen = cpu;
> +		remote = per_cpu_ptr(pool->tag_cpu, cpu);
> +
> +		if (remote == tags)
> +			continue;
> +
> +		clear_bit(cpu, pool->cpus_have_tags);
> +
> +		nr_free = xchg(&remote->nr_free, TAG_CPU_STEALING);
> +
> +		if (nr_free) {
> +			memcpy(tags->freelist,
> +			       remote->freelist,
> +			       sizeof(unsigned) * nr_free);
> +			smp_mb();
> +			remote->nr_free = 0;
> +			tags->nr_free = nr_free;
> +			return true;
> +		} else {
> +			remote->nr_free = 0;
> +		}

Both branches clear remote->nr_free.

> +int percpu_tag_pool_init(struct percpu_tag_pool *pool, unsigned long nr_tags)
> +{
> +	unsigned i, order;
> +
> +	memset(pool, 0, sizeof(*pool));
> +
> +	spin_lock_init(&pool->lock);
> +	init_waitqueue_head(&pool->wait);
> +	pool->nr_tags = nr_tags;
> +
> +	/* Guard against overflow */
> +	if (nr_tags > TAG_MAX) {
> +		pr_err("tags.c: nr_tags too large\n");
> +		return -EINVAL;
> +	}
> +
> +	order = get_order(nr_tags * sizeof(unsigned));
> +	pool->freelist = (void *) __get_free_pages(GFP_KERNEL, order);
> +	if (!pool->freelist)
> +		goto err;
> +
> +	for (i = 0; i < nr_tags; i++)
> +		pool->freelist[i] = i;
> +
> +	pool->nr_free = nr_tags;
> +
> +	pool->cpus_have_tags = kzalloc(DIV_ROUND_UP(nr_cpu_ids, BITS_PER_LONG) *
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
BITS_TO_LONGS(nr_cpu_ids)

Oleg.


  reply	other threads:[~2013-06-12 17:13 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-12  4:03 [PATCH] Percpu tag allocator Kent Overstreet
2013-06-12 17:08 ` Oleg Nesterov [this message]
2013-06-12 17:59   ` Kent Overstreet
2013-06-12 19:14     ` Oleg Nesterov
2013-06-12 23:38 ` Andrew Morton
2013-06-13  2:05   ` Kent Overstreet
2013-06-13  3:03     ` Andrew Morton
2013-06-13  3:54       ` Kent Overstreet
2013-06-13  5:46         ` Andrew Morton
2013-06-13 18:53       ` Tejun Heo
2013-06-13 19:04         ` Andrew Morton
2013-06-13 19:15           ` Tejun Heo
2013-06-13 19:23             ` Andrew Morton
2013-06-13 19:35               ` Tejun Heo
2013-06-13 22:10                 ` Andrew Morton
2013-06-13 22:30                   ` Tejun Heo
2013-06-13 22:35                     ` Andrew Morton
2013-06-13 23:13                       ` Tejun Heo
2013-06-13 23:23                         ` Tejun Heo
2013-06-19  1:32               ` Kent Overstreet
2013-06-13 19:08         ` J. Bruce Fields
2013-06-13 19:09         ` Jeff Layton
2013-06-13 21:53         ` Kent Overstreet
2013-06-13 19:06   ` Tejun Heo
2013-06-13 19:13     ` Andrew Morton
2013-06-13 19:21       ` Tejun Heo
2013-06-13 21:14         ` Kent Overstreet
2013-06-13 21:50           ` Tejun Heo
2013-06-13 21:58             ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130612170835.GA4682@redhat.com \
    --to=oleg@redhat.com \
    --cc=andi@firstfloor.org \
    --cc=axboe@kernel.dk \
    --cc=cl@linux-foundation.org \
    --cc=koverstreet@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nab@linux-iscsi.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.