From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally
Date: Thu, 01 Oct 2009 19:00:58 +0200
Message-ID: <4AC4E04A.4010304@redhat.com>
References: <6e46fe952ba8d1896e3cab5b24232828d3f827a9.1253272938.git.quintela@redhat.com>
 <20090922131901.GA22109@infradead.org>
 <20090922133438.GA12443@infradead.org>
 <4AC4BB4E.2000906@codemonkey.ws>
 <4AC4C030.5000004@redhat.com>
 <4AC4DDE1.1070803@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Juan Quintela , Christoph Hellwig , kvm@vger.kernel.org
To: Anthony Liguori
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:64301 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1750706AbZJARBG (ORCPT ); Thu, 1 Oct 2009 13:01:06 -0400
In-Reply-To: <4AC4DDE1.1070803@codemonkey.ws>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 10/01/2009 06:50 PM, Anthony Liguori wrote:
> Avi Kivity wrote:
>> On 10/01/2009 04:23 PM, Anthony Liguori wrote:
>>> Juan Quintela wrote:
>>>> Discussed with Anthony about it. signalfd is complicated for qemu
>>>> upstream (too difficult to use properly),
>>>
>>> It's not an issue of being difficult.
>>>
>>> To emulate signalfd, we need to create a thread that writes to a
>>> pipe from a signal handler. The problem is that a write() can
>>> return a partial result and following the partial result, we can end
>>> up getting an EAGAIN. We have no way to queue signals beyond that
>>> point and we have no sane way to deal with partial writes.
>>
>> pipe buffers are multiples of the signalfd size. As long as we
>> read and write signalfd-sized blocks, we won't get partial writes.
>> It's true that depending on an implementation detail is bad practice,
>> but this is emulation code, and if it helps simplifying everything else,
>> I think it's fine to use it.
>
> That's a pretty hairy detail to rely upon..

Well, it's a POSIX detail, as I quoted below. I'm not in love with it,
but it should work.

>
>>> Instead, how we do this in upstream QEMU is that we install a signal
>>> handler and write one byte to the fd. If we get EAGAIN, that's fine
>>> because all we care about is that at least one byte exists in the
>>> fd's buffer. This requires that we use an fd-per-signal which means
>>> we end up with a different model than signalfd.
>>>
>>> The reason to use signalfd over what we do in upstream QEMU is that
>>> signalfd can allow us to mask the signals which means less EINTRs.
>>> I don't think that's a huge advantage and the inability to do
>>> backwards compatibility in a sane way means that emulated signalfd
>>> is not workable.
>>
>> signalfd is several microseconds faster than signals + pipes. Do we
>> have so much performance we can throw some of it away?
>
> Do we have any indication that this difference is actually
> observable? This seems like very premature optimization.

Multiply the signal rate by "a few microseconds"; if you get more than
0.1% CPU, it's worthwhile in my opinion. The code is localized, and
signalfd is a better interface than signals.

>
>>> The same is generally true for eventfd.
>>
>> eventfd emulation will also never get partial writes.
>
> But you cannot emulate eventfd faithfully because eventfd is supposed
> to be additive. If you write 1 50x to eventfd, you should be able to
> read a set of integers that add up to 50. If you hit EAGAIN in a
> signal handler, you have no way of handling that.

We never rely on the count anyway. You can simply ignore EAGAIN.

> As I said earlier, the better thing to do is have a higher level
> interface that has a subset of the behavior of eventfd/signalfd that
> we can emulate correctly.

Sure, but it's more work. Copying an existing interface is easier.
It's not like there's no other work in qemu left to be done.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
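
For readers following the thread, here is a rough sketch of the two pipe behaviours being argued about. It is not the qemu code and it does not install any signal handlers; it only exercises a non-blocking pipe. The function names and the 128-byte record layout are made up for illustration (128 bytes was picked to mirror the padded size of struct signalfd_siginfo on Linux). It assumes a POSIX system, where a non-blocking write of at most PIPE_BUF bytes either transfers everything or fails with EAGAIN.

```python
import os
import select
import struct

# Hypothetical fixed-size record: a 4-byte signal number padded to 128 bytes.
RECORD = struct.Struct("=I124x")


def make_nonblocking_pipe():
    """Return (read_fd, write_fd) with O_NONBLOCK set on both ends."""
    rfd, wfd = os.pipe()
    os.set_blocking(rfd, False)
    os.set_blocking(wfd, False)
    return rfd, wfd


def notify(wfd):
    """One-byte wake-up, as in the 'write one byte to the fd' scheme.

    EAGAIN (BlockingIOError) is ignored: a full pipe already holds at
    least one byte, so the reader will wake up anyway.
    """
    try:
        os.write(wfd, b"\0")
        return True
    except BlockingIOError:
        return False


def post_record(wfd, signo):
    """All-or-nothing write of one fixed-size record.

    Because the record is no larger than PIPE_BUF, the write is never
    partial. On EAGAIN the record is dropped, which is the queueing
    limitation raised in the thread.
    """
    data = RECORD.pack(signo)
    assert len(data) <= select.PIPE_BUF
    try:
        return os.write(wfd, data) == len(data)
    except BlockingIOError:
        return False


def drain(rfd):
    """Read everything currently buffered in the pipe."""
    chunks = []
    while True:
        try:
            chunk = os.read(rfd, 65536)
        except BlockingIOError:
            break  # pipe is empty
        if not chunk:
            break  # write end closed
        chunks.append(chunk)
    return b"".join(chunks)
```

Calling post_record() in a loop until it returns False and then drain() should always yield a whole number of records, which is the "no partial writes" point. The notify() variant shows why ignoring EAGAIN is harmless when only the wake-up matters and the count is not relied upon.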