From: Andrew Morton <akpm@linux-foundation.org>
To: Kent Overstreet <koverstreet@google.com>
Cc: linux-kernel@vger.kernel.org, linux-aio@kvack.org,
	linux-fsdevel@vger.kernel.org, zab@redhat.com, bcrl@kvack.org,
	jmoyer@redhat.com, axboe@kernel.dk, viro@zeniv.linux.org.uk,
	tytso@mit.edu
Subject: Re: [PATCH 14/32] aio: Make aio_read_evt() more efficient, convert to hrtimers
Date: Mon, 7 Jan 2013 17:00:55 -0800
Message-ID: <20130107170055.aec2b6f0.akpm@linux-foundation.org>
In-Reply-To: <20130108002821.GM26407@google.com>

On Mon, 7 Jan 2013 16:28:21 -0800
Kent Overstreet <koverstreet@google.com> wrote:

> On Thu, Jan 03, 2013 at 03:19:20PM -0800, Andrew Morton wrote:
> > On Wed, 26 Dec 2012 17:59:52 -0800
> > Kent Overstreet <koverstreet@google.com> wrote:
> > 
> > > Previously, aio_read_event() pulled a single completion off the
> > > ringbuffer at a time, locking and unlocking each time.  Changed it to
> > > pull off as many events as it can at a time, and copy them directly to
> > > userspace.
> > > 
> > > This also fixes a bug where if copying the event to userspace failed,
> > > we'd lose the event.
> > > 
> > > Also convert it to wait_event_interruptible_hrtimeout(), which
> > > simplifies it quite a bit.
> > > 
> > > ...
> > >
> > > -static int aio_read_evt(struct kioctx *ioctx, struct io_event *ent)
> > > +static int aio_read_events_ring(struct kioctx *ctx,
> > > +				struct io_event __user *event, long nr)
> > >  {
> > > -	struct aio_ring_info *info = &ioctx->ring_info;
> > > +	struct aio_ring_info *info = &ctx->ring_info;
> > >  	struct aio_ring *ring;
> > > -	unsigned long head;
> > > -	int ret = 0;
> > > +	unsigned head, pos;
> > > +	int ret = 0, copy_ret;
> > > +
> > > +	if (!mutex_trylock(&info->ring_lock)) {
> > > +		__set_current_state(TASK_RUNNING);
> > > +		mutex_lock(&info->ring_lock);
> > > +	}
> > 
> > You're not big on showing your homework, I see :(
> 
> No :(

Am still awaiting the patch which explains to people what the above
code is doing!

> > I agree that calling mutex_lock() in state TASK_[UN]INTERRUPTIBLE is at
> > least poor practice.  Assuming this is what the code is trying to do. 
> > But if aio_read_events_ring() is indeed called in state
> > TASK_[UN]INTERRUPTIBLE then the effect of the above code is to put the
> > task into an *unknown* state.
> 
> So - yes, aio_read_events_ring() is called after calling
> prepare_to_wait(TASK_INTERRUPTIBLE).
> 
> The problem is that lock kind of has to be a mutex, because it's got to
> call copy_to_user() under it, and it's got to take the lock to check
> whether it needs to sleep (i.e. after putting itself on the waitlist).
> 
> Though - (correct me if I'm wrong) the task state is not now unknown,
> it's either unchanged (still TASK_INTERRUPTIBLE) or TASK_RUNNING.

I call that "unknown" :)

> So
> it'll get to the schedule() part of the wait_event() loop in
> TASK_RUNNING state, but AFAIK that should be ok... just perhaps less
> than ideal.

aio_read_events_ring() is called via the
wait_event_interruptible_hrtimeout() macro's call to `condition' - to
work out whether aio_read_events_ring() should terminate.

A problem we should think about is "under what circumstances will
aio_read_events_ring() set us into TASK_RUNNING?".  We don't want
aio_read_events_ring() to do this too often because it will cause
schedule() to fall through and we end up in a busy loop, chewing CPU. 

AFAICT, aio_read_events_ring() will usually return non-zero if it
flipped us into TASK_RUNNING state.  An exception is where the
mutex_trylock() failed, in which case the thread slept in mutex_lock(),
which will help with the CPU-chewing.  But aio_read_events_ring() can
then end up returning 0 but in state TASK_RUNNING, which will cause a
small cpu-chew in wait_event_interruptible_hrtimeout().

I think :( It is unfortunately complex and it would be nice to make
this dynamic behaviour more clear and solid.  Or at least documented!
Explain how this code avoids getting stuck in a cpu-burning loop, to
help prevent people from causing one when they later change the code.
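
To make the busy-loop concern concrete, here is a rough userspace model
of the wait loop.  It is only a sketch: the enum values, struct model,
condition(), model_schedule(), model_wait_event() and run_model() are
all made-up stand-ins, not the kernel's definitions, and a "sleep" here
simply means "schedule() did not return immediately".  The assumption
being modelled is that schedule() falls straight through when the task
is already in TASK_RUNNING.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel task states (not the real values). */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct model {
	enum task_state state;	/* state of the modelled task */
	bool clobbers_state;	/* predicate leaves us in TASK_RUNNING */
	int ready_after;	/* predicate succeeds on this evaluation */
	int calls;		/* predicate evaluations so far */
	int sleeps;		/* schedule() calls that really slept */
	int fallthroughs;	/* schedule() calls that returned at once */
};

/* Models the wait_event `condition': may flip the task state as a side effect. */
static bool condition(struct model *m)
{
	m->calls++;
	if (m->clobbers_state)
		m->state = TASK_RUNNING;
	return m->calls >= m->ready_after;
}

/* Models schedule(): it only sleeps if the task is not TASK_RUNNING. */
static void model_schedule(struct model *m)
{
	if (m->state == TASK_RUNNING)
		m->fallthroughs++;	/* wasted iteration: CPU is burnt */
	else
		m->sleeps++;
	m->state = TASK_RUNNING;	/* woken up, or never slept */
}

/* Models the shape of a wait_event-style loop. */
static void model_wait_event(struct model *m)
{
	for (;;) {
		m->state = TASK_INTERRUPTIBLE;	/* prepare_to_wait() */
		if (condition(m))
			break;
		model_schedule(m);
	}
	m->state = TASK_RUNNING;		/* finish_wait() */
}

/* Runs one scenario and returns the counters. */
static struct model run_model(bool clobbers_state, int ready_after)
{
	struct model m = { .state = TASK_RUNNING,
			   .clobbers_state = clobbers_state,
			   .ready_after = ready_after };

	assert(ready_after >= 1);
	model_wait_event(&m);
	return m;
}
```

Calling run_model(false, 4) and run_model(true, 4) and comparing the
sleeps and fallthroughs counters shows the difference: a predicate that
returns 0 while leaving the task in TASK_RUNNING turns every would-be
sleep into a spin around the loop.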

> However - I was told that calling mutex_lock() in TASK_INTERRUPTIBLE
> state was bad, but thinking about it more I'm not seeing how that's the
> case. Either mutex_lock() finds the lock uncontended and doesn't touch
> the task state, or it does and leaves it in TASK_RUNNING when it
> returns.
> 
> IOW, I don't see how it'd behave any differently from what I'm doing.
> 
> Any light you could shed would be most appreciated.

Well, the problem with running mutex_lock() in TASK_[UN]INTERRUPTIBLE
is just that: it may or may not flip you into TASK_RUNNING, so what the
heck is the caller thinking of?  It's strange to set the task state a
particular way, then call a function which will randomly go and undo
that.
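
The two locking patterns under discussion can be compared with a small
model.  This is a sketch of the behaviour as described in this thread,
not of the real mutex implementation: model_mutex_lock() and
model_trylock_then_lock() are hypothetical helpers, and the assumption
is that an uncontended mutex_lock() leaves the task state alone while a
contended one returns in TASK_RUNNING.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel task states (not the real values). */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

/*
 * Model of a bare mutex_lock(): uncontended leaves the state untouched,
 * contended sleeps and comes back in TASK_RUNNING.
 */
static enum task_state model_mutex_lock(enum task_state state, bool contended)
{
	return contended ? TASK_RUNNING : state;
}

/*
 * Model of the pattern in the patch: try the lock first, and only if
 * that fails reset the state and take the sleeping path.
 */
static enum task_state model_trylock_then_lock(enum task_state state,
					       bool contended)
{
	if (contended) {			/* mutex_trylock() failed */
		state = TASK_RUNNING;		/* __set_current_state() */
		state = model_mutex_lock(state, true);
	}
	return state;
}
```

Under these assumptions both helpers give the same answer for the same
inputs, and in both cases the caller cannot tell from the code alone
whether it will come back in TASK_INTERRUPTIBLE or TASK_RUNNING: it
depends on whether the lock happened to be contended.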

The cause of all this is the wish to use a wait_event `condition'
predicate which must take a mutex.  hrm.

> > IOW, I don't have the foggiest clue what you're trying to do here and
> > you owe us all a code comment.  At least.
> 
> Yeah, will do.

Excited!

> This look better for the types?

yup.


Also, it's unclear why kioctx.shadow_tail exists.  Some overviewy
explanation at its definition site is needed, IMO.
