Re: [patch, v2] aio: allocate kiocbs in batches

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@google.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-aio@kvack.org
Subject: Re: [patch, v2] aio: allocate kiocbs in batches
Date: Wed, 21 Sep 2011 14:39:58 -0700	[thread overview]
Message-ID: <20110921143958.df7105db.akpm@google.com> (raw)
In-Reply-To: <x49ipol3jfz.fsf@segfault.boston.devel.redhat.com>

On Wed, 21 Sep 2011 13:16:00 -0400
Jeff Moyer <jmoyer@redhat.com> wrote:

> Hi,
> 
> In testing aio on a fast storage device, I found that the context lock
> takes up a fair amount of cpu time in the I/O submission path.  The
> reason is that we take it for every I/O submitted (see __aio_get_req).
> Since we know how many I/Os are passed to io_submit, we can preallocate
> the kiocbs in batches, reducing the number of times we take and release
> the lock.  In my testing, I was able to reduce the amount of time spent
> in _raw_spin_lock_irq by .56% (average of 3 runs).  The command I used
> to test this was:
>    aio-stress -O -o 2 -o 3 -r 8 -d 128 -b 32 -i 32 -s 16384 <dev>
> 
> I also tested the patch with various numbers of events passed to
> io_submit, and I ran the xfstests aio group of tests to ensure I didn't
> break anything.
>
> ...
>
> +/*
> + * struct kiocb's are allocated in batches to reduce the number of
> + * times the ctx lock is acquired and released.
> + */
> +#define KIOCB_BATCH_SIZE	32
> +struct kiocb_batch {
> +	struct list_head head;
> +	long total;	/* number of requests passed to sys_io_submit */
> +	long allocated;	/* number of requests allocated so far */
> +};

I don't see a reason why `total' and `allocated' need to be 64-bit. 
Making them 32-bit results in smaller code, smaller storage, smaller
d-cache footprint, etc.

Also, they should logically be unsigned types.

>
> ...
>
> +static int kiocb_batch_refill(struct kioctx *ctx, struct kiocb_batch *batch)
> +{
> +	int i;
> +	int to_alloc, avail;
> +	bool called_fput = false;
> +	struct kiocb *req, *n;
> +	struct aio_ring *ring;
> +
> +	to_alloc = min(batch->total - batch->allocated, KIOCB_BATCH_SIZE);

And this generates a compile-time warning due to the long/int mismatch.
Did your compiler not warn here?  (And why did `to_alloc' and `i' get
to be `int'?  The type choices are chaotic in there!)

I'd suggest going with "unsigned" for `total' and `allocated', and make
KIOCB_BATCH_SIZE 32U.  Then have a think about the appropriate types
for the derived locals such as `i', `to_alloc' and `avail'.

> +	for (i = 0; i < to_alloc; i++) {
> +		req = __aio_get_req(ctx);
> +		if (!req)
> +			/* allocation failed, go with what we've got */
> +			break;
> +		list_add(&req->ki_batch, &batch->head);
> +	}
> +
> +	if (i == 0)
> +		goto out;
> +
> +retry:
>  	spin_lock_irq(&ctx->ctx_lock);
> -	ring = kmap_atomic(ctx->ring_info.ring_pages[0], KM_USER0);
> -	if (ctx->reqs_active < aio_ring_avail(&ctx->ring_info, ring)) {
> +	ring = kmap_atomic(ctx->ring_info.ring_pages[0]);
> +
> +	avail = aio_ring_avail(&ctx->ring_info, ring) - ctx->reqs_active;
> +	BUG_ON(avail < 0);
> +	if (avail == 0 && !called_fput) {
> +		/*
> +		 * Handle a potential starvation case.  It is possible that
> +		 * we hold the last reference on a struct file, causing us
> +		 * to delay the final fput to non-irq context.  In this case,
> +		 * ctx->reqs_active is artificially high.  Calling the fput
> +		 * routine here may free up a slot in the event completion
> +		 * ring, allowing this allocation to succeed.
> +		 */
> +		spin_unlock_irq(&ctx->ctx_lock);
> +		kunmap_atomic(ring);

And there's a bug.  We need to maintain the thread's atomic state
across the kunmap_atomic().  This should have caused a might_sleep()
runtime warning from kunmap_atomic()'s smp_processor_id() (at least). 
That's assuming you tested on a 32-bit highmem box and were able to
exercise this codepath, neither of which seems likely ;)

> +		aio_fput_routine(NULL);
> +		called_fput = true;
> +		goto retry;
> +	}
> +
> +	if (avail < i) {
> +		/* Trim back the number of requests. */
> +		list_for_each_entry_safe(req, n, &batch->head, ki_batch) {
> +			list_del(&req->ki_batch);
> +			kmem_cache_free(kiocb_cachep, req);
> +			if (--i <= avail)
> +				break;
> +		}
> +	}
> +
> +	batch->allocated += i;
> +	list_for_each_entry(req, &batch->head, ki_batch) {
>  		list_add(&req->ki_list, &ctx->active_reqs);
>  		ctx->reqs_active++;
> -		okay = 1;
>  	}
> -	kunmap_atomic(ring, KM_USER0);
> -	spin_unlock_irq(&ctx->ctx_lock);
>  
> -	if (!okay) {
> -		kmem_cache_free(kiocb_cachep, req);
> -		req = NULL;
> -	}
> +	kunmap_atomic(ring);
> +	spin_unlock_irq(&ctx->ctx_lock);

Like that.

> -	return req;
> +out:
> +	return i;
>  }
>
> ...
>

I wouldn't want to do the long->unsigned conversion without runtime
testing it so can you please do a v3?

next prev parent reply	other threads:[~2011-09-21 21:40 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-21 17:16 [patch, v2] aio: allocate kiocbs in batches Jeff Moyer
2011-09-21 21:39 ` Andrew Morton [this message]
2011-09-22 13:24   ` Jeff Moyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110921143958.df7105db.akpm@google.com \
    --to=akpm@google.com \
    --cc=jmoyer@redhat.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.