Re: [PATCH] aio: Add command to wait completion of all requests

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Benjamin LaHaise <bcrl@kvack.org>
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: avagin@openvz.org, linux-kernel@vger.kernel.org,
	viro@zeniv.linux.org.uk, gorcunov@openvz.org,
	akpm@linux-foundation.org, xemul@virtuozzo.com
Subject: Re: [PATCH] aio: Add command to wait completion of all requests
Date: Tue, 13 Jun 2017 10:42:48 -0400	[thread overview]
Message-ID: <20170613144248.GK7230@kvack.org> (raw)
In-Reply-To: <149700173837.15252.8419518498235874341.stgit@localhost.localdomain>

On Fri, Jun 09, 2017 at 12:49:34PM +0300, Kirill Tkhai wrote:
> During implementation aio support for Checkpoint-Restore
> in Userspace project (CRIU), we found, that current
> kernel functionality does not allow to wait for
> completion of all submitted requests. Checkpoint software
> can't determine if there is no in-flight requests, i.e.
> the process aio rings memory is ready for dumping.

Simply put: no.  There is no way this command will ever complete under
various conditions - if another thread submits i/o, if the in flight i/o
is waiting on non-disk i/o, etc.  If you want to wait until all aios of a
task are complete, kill the process and wait for the zombie to be reaped.  
This is a simplistic approach to checkpointing that does not solve all
issues related to checkpoint aio.

		-ben

> The patch solves this problem. Also, the interface, it
> introduces, may be useful for other in-userspace users.
> Here new IOCB_CMD_WAIT_ALL cmd, which idea is to make
> the caller wait till the sum of completed and available
> requests becomes equal to (ctx->nr_events - 1). If they
> are equal, then there is no aio requests in-flight in
> the system.
> 
> Since available requests are per-cpu, we need synchronization
> with their other readers and writers. Variable ctx->dead
> is used for that, and we set it in 1 during synchronization.
> Now let's look how we become sure, that concurrents can
> see it on other cpus.
> 
> There is two cases. When the task is the only user of its
> memory, it races with aio_complete() only. This function
> takes ctx->completion_lock, so we simply use it to synchronize
> during per-cpu reading.
> 
> When there are more users of current->mm, we base on that
> put_reqs_available() and get_reqs_available() are already
> interrupts-free. Primitives local_irq_disable and
> local_irq_enable() work as rcu_read_xxx_sched() barriers,
> and waiter uses synchronize_sched() to propogate ctx->dead
> visability on other cpus. This RCU primitive works as
> full memory barrier, so nobody will use percpu reqs_available
> after we returned from it, and the waiting comes down
> to the first case further actions. The good point
> is the solution does not affect on existing code
> performance in any way, as it does not change it.
> 
> Couple more words about checkpoint/restore. We need
> especially this functionality, and we are not going to
> dump in-flight request for now, because the experiments
> show, that if dumping of a container was failed by
> another subsystem reasons, it's not always possible
> to queue cancelled requests back (file descriptors
> may be already closed, memory pressure). So, here is
> nothing about dumping.
> 
> It seams, this functionality will be useful not only
> for us, and others may use the new command like
> other standard aio commands.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>  fs/aio.c                     |   71 +++++++++++++++++++++++++++++++++++++++---
>  include/uapi/linux/aio_abi.h |    1 +
>  2 files changed, 67 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index f52d925ee259..447fa4283c7c 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -899,7 +899,11 @@ static void put_reqs_available(struct kioctx *ctx, unsigned nr)
>  	struct kioctx_cpu *kcpu;
>  	unsigned long flags;
>  
> -	local_irq_save(flags);
> +	local_irq_save(flags); /* Implies rcu_read_lock_sched() */
> +	if (atomic_read(&ctx->dead)) {
> +		atomic_add(nr, &ctx->reqs_available);
> +		goto unlock;
> +	}
>  	kcpu = this_cpu_ptr(ctx->cpu);
>  	kcpu->reqs_available += nr;
>  
> @@ -907,8 +911,8 @@ static void put_reqs_available(struct kioctx *ctx, unsigned nr)
>  		kcpu->reqs_available -= ctx->req_batch;
>  		atomic_add(ctx->req_batch, &ctx->reqs_available);
>  	}
> -
> -	local_irq_restore(flags);
> +unlock:
> +	local_irq_restore(flags); /* Implies rcu_read_unlock_sched() */
>  }
>  
>  static bool get_reqs_available(struct kioctx *ctx)
> @@ -917,7 +921,9 @@ static bool get_reqs_available(struct kioctx *ctx)
>  	bool ret = false;
>  	unsigned long flags;
>  
> -	local_irq_save(flags);
> +	local_irq_save(flags); /* Implies rcu_read_lock_sched() */
> +	if (atomic_read(&ctx->dead))
> +		goto out;
>  	kcpu = this_cpu_ptr(ctx->cpu);
>  	if (!kcpu->reqs_available) {
>  		int old, avail = atomic_read(&ctx->reqs_available);
> @@ -937,7 +943,7 @@ static bool get_reqs_available(struct kioctx *ctx)
>  	ret = true;
>  	kcpu->reqs_available--;
>  out:
> -	local_irq_restore(flags);
> +	local_irq_restore(flags); /* Implies rcu_read_unlock_sched() */
>  	return ret;
>  }
>  
> @@ -1533,6 +1539,58 @@ static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
>  	return ret;
>  }
>  
> +static bool reqs_completed(struct kioctx *ctx)
> +{
> +	unsigned available;
> +	spin_lock_irq(&ctx->completion_lock);
> +	available = atomic_read(&ctx->reqs_available) + ctx->completed_events;
> +	spin_unlock_irq(&ctx->completion_lock);
> +
> +	return (available == ctx->nr_events - 1);
> +}
> +
> +static int aio_wait_all(struct kioctx *ctx)
> +{
> +	unsigned users, reqs = 0;
> +	struct kioctx_cpu *kcpu;
> +	int cpu, ret;
> +
> +	if (atomic_xchg(&ctx->dead, 1))
> +		return -EBUSY;
> +
> +	users = atomic_read(&current->mm->mm_users);
> +	if (users > 1) {
> +		/*
> +		 * Wait till concurrent threads and aio_complete() see
> +		 * dead flag. Implies full memory barrier on all cpus.
> +		 */
> +		synchronize_sched();
> +	} else {
> +		/*
> +		 * Sync with aio_complete() to be sure it puts reqs_available,
> +		 * when dead flag is already seen.
> +		 */
> +		spin_lock_irq(&ctx->completion_lock);
> +	}
> +
> +	for_each_possible_cpu(cpu) {
> +		kcpu = per_cpu_ptr(ctx->cpu, cpu);
> +		reqs += kcpu->reqs_available;
> +		kcpu->reqs_available = 0;
> +	}
> +
> +	if (users == 1)
> +		spin_unlock_irq(&ctx->completion_lock);
> +
> +	atomic_add(reqs, &ctx->reqs_available);
> +
> +	ret = wait_event_interruptible(ctx->wait, reqs_completed(ctx));
> +
> +	atomic_set(&ctx->dead, 0);
> +
> +	return ret;
> +}
> +
>  static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  			 struct iocb *iocb, bool compat)
>  {
> @@ -1556,6 +1614,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  		return -EINVAL;
>  	}
>  
> +	if (iocb->aio_lio_opcode == IOCB_CMD_WAIT_ALL)
> +		return aio_wait_all(ctx);
> +
>  	req = aio_get_req(ctx);
>  	if (unlikely(!req))
>  		return -EAGAIN;
> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
> index bb2554f7fbd1..29291946d4f1 100644
> --- a/include/uapi/linux/aio_abi.h
> +++ b/include/uapi/linux/aio_abi.h
> @@ -44,6 +44,7 @@ enum {
>  	IOCB_CMD_NOOP = 6,
>  	IOCB_CMD_PREADV = 7,
>  	IOCB_CMD_PWRITEV = 8,
> +	IOCB_CMD_WAIT_ALL = 9,
>  };
>  
>  /*
> 

-- 
"Thought is the essence of where you are now."

next prev parent reply	other threads:[~2017-06-13 15:09 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-09  9:49 [PATCH] aio: Add command to wait completion of all requests Kirill Tkhai
2017-06-13  8:14 ` Cyrill Gorcunov
2017-06-13  9:45   ` Kirill Tkhai
2017-06-13 10:32     ` Cyrill Gorcunov
2017-06-13 14:42 ` Benjamin LaHaise [this message]
2017-06-13 15:11   ` Kirill Tkhai
2017-06-13 15:26     ` Benjamin LaHaise
2017-06-13 16:17       ` Kirill Tkhai
2017-06-13 23:26         ` Benjamin LaHaise
2017-06-14  9:11           ` Kirill Tkhai
2017-06-14 14:47             ` Benjamin LaHaise

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170613144248.GK7230@kvack.org \
    --to=bcrl@kvack.org \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@openvz.org \
    --cc=gorcunov@openvz.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xemul@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.