From: Asias He <asias@redhat.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support
Date: Wed, 18 Jul 2012 09:22:48 +0800
Message-ID: <50060FE8.4040607@redhat.com>
In-Reply-To: <x49mx2yyzi5.fsf@segfault.boston.devel.redhat.com>

On 07/18/2012 03:10 AM, Jeff Moyer wrote:
> Asias He <asias@redhat.com> writes:
>
>> vhost-blk is an in-kernel virtio-blk device accelerator.
>>
>> This patch is based on Liu Yuan's implementation with various
>> improvements and bug fixes. Notably, this patch processes guest
>> notifications and host completions in parallel, which gives about a 60%
>> performance improvement over Liu Yuan's implementation.
>
> So, first off, some basic questions.  Is it correct to assume that you
> tested this with buffered I/O (files opened *without* O_DIRECT)?
> I'm pretty sure that if you used O_DIRECT, you'd run into problems (which
> are solved by the patch set posted by Shaggy, based on Zach Brown's work
> of many moons ago).  Note that, with buffered I/O, the submission path
> is NOT asynchronous.  So, any speedups you've reported are extremely
> suspect.  ;-)

I always used O_DIRECT to test this patchset, and I mostly used a raw
block device as the guest image. Is that the reason why I did not hit the
problem you mentioned? Btw, I have also run this patchset with a
file-based image, and I still do not see problems like I/O hangs.

> Next, did you look at Shaggy's patch set?  I think it would be best to
> focus your efforts on testing *that*, and implementing your work on top
> of it.

I guess you mean this one:

http://marc.info/?l=linux-fsdevel&m=133312234313122

I did not notice it until James pointed it out.

I talked with Zach and Shaggy. Shaggy said he is still working on that
patch set and will send it out soon.


> Having said that, I did do some review of this patch, inlined below.

Thanks, Jeff!

>> +static int vhost_blk_setup(struct vhost_blk *blk)
>> +{
>> +	struct kioctx *ctx;
>> +
>> +	if (blk->ioctx)
>> +		return 0;
>> +
>> +	blk->ioevent_nr = blk->vq.num;
>> +	ctx = ioctx_alloc(blk->ioevent_nr);
>> +	if (IS_ERR(ctx)) {
>> +		pr_err("Failed to ioctx_alloc");
>> +		return PTR_ERR(ctx);
>> +	}
>> +	put_ioctx(ctx);
>> +	blk->ioctx = ctx;
>> +
>> +	blk->ioevent = kmalloc(sizeof(struct io_event) * blk->ioevent_nr,
>> +			       GFP_KERNEL);
>> +	if (!blk->ioevent) {
>> +		pr_err("Failed to allocate memory for io_events");
>> +		return -ENOMEM;
>
> You've just leaked blk->ioctx.

Yes. Will fix.

>> +	}
>> +
>> +	blk->reqs = kmalloc(sizeof(struct vhost_blk_req) * blk->ioevent_nr,
>> +			    GFP_KERNEL);
>> +	if (!blk->reqs) {
>> +		pr_err("Failed to allocate memory for vhost_blk_req");
>> +		return -ENOMEM;
>
> And here.

Yes. Will fix.
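
For reference, the two leaks flagged above are the kind usually handled with
goto-based unwinding, where each failure jumps to a label that releases
everything acquired so far. The following is only a userspace sketch of that
pattern, not the fix for this patch: every demo_* name is hypothetical,
calloc()/free() stand in for ioctx_alloc()/kmalloc(), and the correct kernel
call for tearing down the ioctx is deliberately not shown.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel types. */
struct demo_ctx   { int nr; };
struct demo_event { long data[4]; };
struct demo_req   { long data[8]; };

struct demo_blk {
	struct demo_ctx *ioctx;
	struct demo_event *ioevent;
	struct demo_req *reqs;
	size_t ioevent_nr;
};

/* Test hooks: fail the Nth allocation (1-based); 0 means never fail. */
static int demo_fail_at;
static int demo_alloc_count;
static int demo_live;		/* allocations not yet freed */

static void *demo_alloc(size_t n, size_t size)
{
	void *p;

	if (++demo_alloc_count == demo_fail_at)
		return NULL;
	p = calloc(n, size);
	if (p)
		demo_live++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live--;
		free(p);
	}
}

static int demo_blk_setup(struct demo_blk *blk, size_t nr)
{
	if (blk->ioctx)
		return 0;

	blk->ioevent_nr = nr;
	blk->ioctx = demo_alloc(1, sizeof(*blk->ioctx));
	if (!blk->ioctx)
		return -ENOMEM;

	blk->ioevent = demo_alloc(nr, sizeof(*blk->ioevent));
	if (!blk->ioevent)
		goto err_free_ctx;

	blk->reqs = demo_alloc(nr, sizeof(*blk->reqs));
	if (!blk->reqs)
		goto err_free_ioevent;

	return 0;

	/* Unwind in reverse order of acquisition. */
err_free_ioevent:
	demo_free(blk->ioevent);
	blk->ioevent = NULL;
err_free_ctx:
	demo_free(blk->ioctx);
	blk->ioctx = NULL;
	return -ENOMEM;
}
```

Resetting the pointers to NULL on failure matters here because the early
"already set up" check keys off blk->ioctx; a stale pointer would make a
later retry return 0 without allocating anything.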

>
>> +	}
>> +
>> +	return 0;
>> +}
>> +
> [snip]
>> +static int vhost_blk_io_submit(struct vhost_blk *blk, struct file *file,
>> +			       struct vhost_blk_req *req,
>> +			       struct iovec *iov, u64 nr_vecs, loff_t offset,
>> +			       int opcode)
>> +{
>> +	struct kioctx *ioctx = blk->ioctx;
>> +	mm_segment_t oldfs = get_fs();
>> +	struct kiocb_batch batch;
>> +	struct blk_plug plug;
>> +	struct kiocb *iocb;
>> +	int ret;
>> +
>> +	if (!try_get_ioctx(ioctx)) {
>> +		pr_info("Failed to get ioctx");
>> +		return -EAGAIN;
>> +	}
>
> Using try_get_ioctx directly gives me a slightly uneasy feeling.  I
> understand that you don't need to do the lookup, but at least wrap it
> and check for ->dead.

OK.
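
A rough idea of what such a wrapper could look like, written as a userspace
sketch with C11 atomics rather than kernel primitives. The demo_* names and
the struct layout are assumptions made for illustration, the reference count
is assumed to behave like an increment-if-nonzero counter, and the sketch
does not model the ctx_lock that the patch holds when it checks ->dead.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the AIO context; not the kernel's kioctx. */
struct demo_ioctx {
	atomic_int users;	/* reference count; 0 means no owners left */
	atomic_bool dead;	/* set once the context is shutting down */
};

/* Take a reference only if the count is still non-zero. */
static bool demo_try_get(struct demo_ioctx *ctx)
{
	int old = atomic_load(&ctx->users);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ctx->users, &old, old + 1))
			return true;
	}
	return false;
}

static void demo_put(struct demo_ioctx *ctx)
{
	atomic_fetch_sub(&ctx->users, 1);
}

/* Wrapper: take a reference, then back out if the context is dead. */
static bool demo_get_live_ioctx(struct demo_ioctx *ctx)
{
	if (!demo_try_get(ctx))
		return false;
	if (atomic_load(&ctx->dead)) {
		demo_put(ctx);
		return false;
	}
	return true;
}
```

With a wrapper like this, callers get a single success/failure answer and
never hold a reference to a context that was already marked dead at the time
of the check.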

>
>> +
>> +	atomic_long_inc_not_zero(&file->f_count);
>> +	eventfd_ctx_get(blk->ectx);
>> +
>> +	/* TODO: batch to 1 is not good! */
>
> Agreed.  You should set up the batching in vhost_blk_handle_guest_kick.
> The way you've written the code, the batching is not at all helpful.

Yes, that's why there is a TODO.
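
The structural change being asked for is to create one batch per guest kick
and let it span every pending request, instead of creating a batch of one
inside the per-request submit path. A toy userspace sketch of that shape
follows; all demo_* names are hypothetical, requests are plain ints, and
nothing here reflects how kiocb_batch works internally.

```c
#include <stddef.h>

/* Hypothetical batch object; only counts what was queued into it. */
struct demo_batch {
	size_t capacity;
	size_t queued;
};

static int demo_batch_inits;	/* how many times a batch was set up */

static void demo_batch_init(struct demo_batch *b, size_t n)
{
	b->capacity = n;
	b->queued = 0;
	demo_batch_inits++;
}

/* Per-request path: uses the caller's batch, never creates its own. */
static void demo_submit_one(struct demo_batch *b, int req)
{
	(void)req;		/* a real handler would build the I/O here */
	b->queued++;
}

/* Finish the batch and report how many requests it covered. */
static size_t demo_batch_finish(struct demo_batch *b)
{
	size_t n = b->queued;

	b->queued = 0;
	return n;
}

/* Hypothetical kick handler: one batch spans all pending requests. */
static size_t demo_handle_kick(const int *reqs, size_t nr)
{
	struct demo_batch batch;
	size_t i;

	demo_batch_init(&batch, nr);
	for (i = 0; i < nr; i++)
		demo_submit_one(&batch, reqs[i]);
	return demo_batch_finish(&batch);
}
```

The point of the sketch is only the placement: setup and teardown happen once
around the loop, so their cost is amortized over all requests drained by a
single kick.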

>> +	kiocb_batch_init(&batch, 1);
>> +	blk_start_plug(&plug);
>> +
>> +	iocb = aio_get_req(ioctx, &batch);
>> +	if (unlikely(!iocb)) {
>> +		ret = -EAGAIN;
>> +		goto out;
>> +	}
>> +
>> +	iocb->ki_filp	= file;
>> +	iocb->ki_pos	= offset;
>> +	iocb->ki_buf	= (void *)iov;
>> +	iocb->ki_left	= nr_vecs;
>> +	iocb->ki_nbytes	= nr_vecs;
>> +	iocb->ki_opcode	= opcode;
>> +	iocb->ki_obj.user = req;
>> +	iocb->ki_eventfd  = blk->ectx;
>> +
>> +	set_fs(KERNEL_DS);
>> +	ret = aio_setup_iocb(iocb, false);
>> +	set_fs(oldfs);
>> +	if (unlikely(ret))
>> +		goto out_put_iocb;
>> +
>> +	spin_lock_irq(&ioctx->ctx_lock);
>> +	if (unlikely(ioctx->dead)) {
>> +		spin_unlock_irq(&ioctx->ctx_lock);
>> +		ret = -EINVAL;
>> +		goto out_put_iocb;
>> +	}
>> +	aio_run_iocb(iocb);
>> +	spin_unlock_irq(&ioctx->ctx_lock);
>> +
>> +	aio_put_req(iocb);
>> +
>> +	blk_finish_plug(&plug);
>> +	kiocb_batch_free(ioctx, &batch);
>> +	put_ioctx(ioctx);
>> +
>> +	return ret;
>> +out_put_iocb:
>> +	aio_put_req(iocb); /* Drop extra ref to req */
>> +	aio_put_req(iocb); /* Drop I/O ref to req */
>> +out:
>> +	put_ioctx(ioctx);
>> +	return ret;
>> +}
>> +
>
> You've duplicated a lot of io_submit_one.  I'd rather see that factored
> out than to have to maintain two copies.

Agreed.

> Again, what I'd *really* like to see is you rebase on top of Shaggy's
> work.

Sure. Let's wait for Shaggy's new version.


-- 
Asias
