From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity <avi@redhat.com>
Subject: Re: [PATCH] ioeventfd: Introduce KVM_IOEVENTFD_FLAG_PIPE
Date: Mon, 04 Jul 2011 14:19:39 +0300
Message-ID: <4E11A1CB.2080709@redhat.com>
References: <1309712689-4290-1-git-send-email-levinsasha928@gmail.com> <20110704103207.GA11386@redhat.com> <4E1199B3.2010507@redhat.com> <20110704110723.GD11386@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Sasha Levin <levinsasha928@gmail.com>, kvm@vger.kernel.org,
	Ingo Molnar <mingo@elte.hu>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Pekka Enberg <penberg@kernel.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:35060 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754905Ab1GDLT7 (ORCPT <rfc822;kvm@vger.kernel.org>);
	Mon, 4 Jul 2011 07:19:59 -0400
In-Reply-To: <20110704110723.GD11386@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 07/04/2011 02:07 PM, Michael S. Tsirkin wrote:
> On Mon, Jul 04, 2011 at 01:45:07PM +0300, Avi Kivity wrote:
> >  On 07/04/2011 01:32 PM, Michael S. Tsirkin wrote:
> >  >On Sun, Jul 03, 2011 at 08:04:49PM +0300, Sasha Levin wrote:
> >  >>   The new flag allows passing a write side of a pipe instead of an
> >  >>   eventfd to be notified of writes to the specified memory region.
> >  >>
> >  >>   Instead of signaling an event, the value written to the memory region
> >  >>   is written to the pipe.
> >  >>
> >  >>   Using a pipe instead of an eventfd is useful when any value can be
> >  >>   written to the memory region but we're interested in receiving the
> >  >>   actual value instead of just a notification.
> >  >>
> >  >>   A simple example for practical use is the serial port. We are not
> >  >>   interested in an exit every time a char is written to the port, but
> >  >>   we do need to know what was written so we could handle it on the guest.
> >  >
> >  >Looking at this example, how would you handle a pipe full condition?
> >  >We can't buffer unlimited amount of data in the host.
> >
> >  Stall.
>
> Right, but the guest gets no indication that the pipe is full.
> Something like virtio would let the guest do something useful
> instead of stalling the vcpu.

That's not a problem.  The vcpu blocks, which lets the other process get 
the cpu and run with it.  If there are not enough cpu resources, we'll 
indeed stall the vcpu, but that happens whenever you're overcommitted 
anyway.

> Also noting that the fd can be set not to block, or that
> a signal can interrupt the write. Both cases are not errors.

One thing we can do is return via the normal KVM_EXIT_MMIO method and 
hope userspace knows how to handle this.  Otherwise I don't see what we 
can do.
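
A rough user-space sketch of that fallback, assuming the forwarding
side uses a non-blocking send. The enum and forward_or_fallback() are
hypothetical names for illustration only; nothing like them exists in
the KVM API, and the real decision would be made in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical outcomes; not part of the KVM API. */
enum fwd_result {
    FWD_SENT,      /* record was queued on the socket */
    FWD_FALLBACK,  /* buffer full or signal: use the normal KVM_EXIT_MMIO path */
    FWD_ERROR      /* anything else, e.g. the peer went away */
};

/*
 * Try to hand one record to the mmio server without blocking.
 * If the socket buffer is full or a signal interrupted the send,
 * tell the caller to fall back instead of losing the data.
 */
enum fwd_result forward_or_fallback(int sock, const void *rec, size_t len)
{
    ssize_t n;

    assert(rec != NULL && len > 0);

    n = send(sock, rec, len, MSG_DONTWAIT | MSG_NOSIGNAL);
    if (n == (ssize_t)len)
        return FWD_SENT;
    if (n < 0 && (errno == EAGAIN || errno == EINTR))
        return FWD_FALLBACK;
    return FWD_ERROR;
}
```

Whether userspace can make sense of an exit for a region it handed
off to an fd is exactly the part we would have to hope for.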

> >  >
> >  >If pipe is non-blocking, or if we get a signal,
> >  >this might fail or return a value < len.
> >  >Data will be lost then, won't it?
> >
> >  Yes.  Need a loop-until-buffer-exhausted-or-error.
>
> Signal handling becomes a problem. You don't want a
> full pipe to prevent qemu from getting killed or
> getting a timer alert.
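
For reference, a minimal user-space sketch of the
loop-until-buffer-exhausted-or-error idea quoted above. The helper
name write_until_done() is made up for illustration, and the real
loop would live in the kernel. It stops on EINTR or EAGAIN and
reports how much was written, so the caller can service the signal
rather than sit behind a full pipe.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep writing until all of buf has gone out,
 * a signal or a full non-blocking fd stops us, or a real error occurs.
 * Returns the number of bytes written so far (possibly less than len),
 * or -1 on an error other than EINTR/EAGAIN.
 */
ssize_t write_until_done(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN)
                break;          /* let the caller handle the signal or wait */
            return -1;
        }
        done += (size_t)n;      /* short write: go around again */
    }
    return (ssize_t)done;
}
```

A short return still leaves the caller with a half-written value in
the pipe, which is the atomicity problem.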

Maybe we should require AF_UNIX SOCK_SEQPACKET connection.  That gives 
us atomicity, and drops the need for a mutex.
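
To illustrate what I mean by atomicity, here is a small user-space
sketch, assuming Linux AF_UNIX semantics. struct mmio_record is a
hypothetical layout invented for the example; the patch defines no
such structure. Two records are queued back to back and each recv()
returns exactly one of them, even though the receive buffer has room
for both.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record layout, for illustration only. */
struct mmio_record {
    unsigned long long addr;
    unsigned int len;
    unsigned char data[8];
};

/*
 * Returns 0 if each recv() yields exactly one whole record,
 * -1 otherwise (including socket errors).
 */
int seqpacket_keeps_boundaries(void)
{
    struct mmio_record a = { 0x3f8, 1, { 'A' } };
    struct mmio_record b = { 0x3f8, 1, { 'B' } };
    struct mmio_record in[2];
    ssize_t first, second;
    int sv[2];
    int ok;

    /* The receive buffer could hold both records at once. */
    assert(sizeof(in) >= sizeof(a) + sizeof(b));

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
        return -1;

    /* Each send() is one record; it is never split or merged. */
    ok = send(sv[0], &a, sizeof(a), 0) == (ssize_t)sizeof(a) &&
         send(sv[0], &b, sizeof(b), 0) == (ssize_t)sizeof(b);

    if (ok) {
        first = recv(sv[1], in, sizeof(in), 0);
        ok = first == (ssize_t)sizeof(a) && in[0].data[0] == 'A';
    }
    if (ok) {
        second = recv(sv[1], in, sizeof(in), 0);
        ok = second == (ssize_t)sizeof(b) && in[0].data[0] == 'B';
    }

    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

Since a record either goes out whole or not at all, several vcpus
can send on the same socket without their data interleaving, hence
no mutex.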

> >
> >  We should allow unix domain sockets as well.  In fact, for
> >  read/write support, we need this to be a unix domain socket.
>
> Sockets are actually better at this than pipes
> as you can at least make the writes
> non-blocking by passing in a message flag.

I'm not sure we want that.  How do we handle it?

If the socket buffers get filled up, it's time for the vcpu to wait for 
the mmio server process.  Let the scheduler sort things out.

btw, like vhost-net and other thread offloads, this sort of trick is 
dangerous.  When you have excess cpu resources throughput improves, but 
once the system is loaded, the workload is needlessly spread across more 
cores than strictly necessary and communication is done by context 
switches instead of user/system transitions.

> If we support sockets, do we really need to support
> pipes at all?

I think not.

-- 
error compiling committee.c: too many arguments to function