From: Anthony Liguori
Subject: Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally
Date: Thu, 01 Oct 2009 11:50:41 -0500
Message-ID: <4AC4DDE1.1070803@codemonkey.ws>
References: <6e46fe952ba8d1896e3cab5b24232828d3f827a9.1253272938.git.quintela@redhat.com>
 <20090922131901.GA22109@infradead.org>
 <20090922133438.GA12443@infradead.org>
 <4AC4BB4E.2000906@codemonkey.ws>
 <4AC4C030.5000004@redhat.com>
In-Reply-To: <4AC4C030.5000004@redhat.com>
To: Avi Kivity
Cc: Juan Quintela, Christoph Hellwig, kvm@vger.kernel.org

Avi Kivity wrote:
> On 10/01/2009 04:23 PM, Anthony Liguori wrote:
>> Juan Quintela wrote:
>>> Discussed this with Anthony. signalfd is complicated for qemu
>>> upstream (too difficult to use properly),
>>
>> It's not an issue of being difficult.
>>
>> To emulate signalfd, we need to create a thread that writes to a pipe
>> from a signal handler. The problem is that a write() can return a
>> partial result and following the partial result, we can end up
>> getting an EAGAIN. We have no way to queue signals beyond that point
>> and we have no sane way to deal with partial writes.
>
> Pipe buffers are multiples of the signalfd size. As long as we
> read and write signalfd-sized blocks, we won't get partial writes.
> It's true that depending on an implementation detail is bad practice,
> but this is emulation code, and if it helps simplify everything else,
> I think it's fine to use it.

That's a pretty hairy detail to rely upon.

>> Instead, how we do this in upstream QEMU is that we install a signal
>> handler and write one byte to the fd. If we get EAGAIN, that's fine
>> because all we care about is that at least one byte exists in the
>> fd's buffer. This requires that we use an fd-per-signal which means
>> we end up with a different model than signalfd.
>>
>> The reason to use signalfd over what we do in upstream QEMU is that
>> signalfd can allow us to mask the signals which means fewer EINTRs. I
>> don't think that's a huge advantage and the inability to do backwards
>> compatibility in a sane way means that emulated signalfd is not
>> workable.
>
> signalfd is several microseconds faster than signals + pipes. Do we
> have so much performance we can throw some of it away?

Do we have any indication that this difference is actually observable?
This seems like very premature optimization.

>> The same is generally true for eventfd.
>
> eventfd emulation will also never get partial writes.

But you cannot emulate eventfd faithfully because eventfd is supposed to
be additive. If you write 1 to an eventfd 50 times, you should be able
to read a set of integers that add up to 50. If you hit EAGAIN in a
signal handler, you have no way of handling that.

As I said earlier, the better thing to do is have a higher-level
interface that has a subset of the behavior of eventfd/signalfd that we
can emulate correctly.

Regards,

Anthony Liguori
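For illustration, here is a minimal C sketch of the upstream approach described in the thread: a signal handler writes one byte to a non-blocking pipe and simply ignores EAGAIN, since a full pipe already guarantees at least one pending byte. It assumes a plain POSIX system and one pipe per signal (the fd-per-signal model). The names sig_pipe, notify_handler, notify_init and notify_drain are hypothetical and are not QEMU's actual code; error handling is kept minimal.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: one non-blocking pipe for one signal (fd-per-signal). */
static int sig_pipe[2] = { -1, -1 };

/* Signal handler: only async-signal-safe write() is called. */
static void notify_handler(int signo)
{
    int saved_errno = errno;  /* write() may clobber errno */
    char byte = 0;
    ssize_t n;

    (void)signo;
    do {
        n = write(sig_pipe[1], &byte, 1);
    } while (n < 0 && errno == EINTR);
    /* n < 0 with EAGAIN means the pipe is full: a byte is already
     * pending, so the reader still wakes up. Dropping this byte loses
     * the count of signals, not the notification itself. */
    errno = saved_errno;
}

/* Create the non-blocking pipe and install the handler for signo.
 * Returns 0 on success, -1 on error. */
static int notify_init(int signo)
{
    struct sigaction sa;
    int i;

    if (pipe(sig_pipe) < 0)
        return -1;
    for (i = 0; i < 2; i++) {
        int flags = fcntl(sig_pipe[i], F_GETFL);
        if (flags < 0 || fcntl(sig_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = notify_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Called from the main loop when sig_pipe[0] is readable: drain every
 * pending byte. Returns 1 if at least one notification was pending,
 * else 0. The number of bytes read is NOT the number of signals. */
static int notify_drain(void)
{
    char buf[64];
    int pending = 0;

    for (;;) {
        ssize_t n = read(sig_pipe[0], buf, sizeof(buf));
        if (n > 0) {
            pending = 1;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        break;  /* EAGAIN: the pipe is empty */
    }
    return pending;
}
```

In a real main loop, sig_pipe[0] would be watched with select() or poll(), and notify_drain() called when it becomes readable. Because any number of signals may collapse into "at least one byte", the reader learns only that something happened, not how many times. That is enough for a signal notification, but it cannot reproduce eventfd's additive counter, which is the limitation raised above.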