From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5022B544.2030203@linux.vnet.ibm.com>
Date: Wed, 08 Aug 2012 14:51:48 -0400
From: Corey Bryant
MIME-Version: 1.0
References: <1344355108-14786-1-git-send-email-coreyb@linux.vnet.ibm.com> <50227D91.7090609@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v7 0/6] file descriptor passing using fd sets
To: Stefan Hajnoczi
Cc: kwolf@redhat.com, aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com,
 libvir-list@redhat.com, qemu-devel@nongnu.org, lcapitulino@redhat.com,
 eblake@redhat.com

On 08/08/2012 11:58 AM, Stefan Hajnoczi wrote:
> On Wed, Aug 8, 2012 at 3:54 PM, Corey Bryant wrote:
>>
>>
>> On 08/08/2012 09:04 AM, Stefan Hajnoczi wrote:
>>>
>>> On Tue, Aug 7, 2012 at 4:58 PM, Corey Bryant wrote:
>>>>
>>>> libvirt's sVirt security driver provides SELinux MAC isolation for
>>>> Qemu guest processes and their corresponding image files. In other
>>>> words, sVirt uses SELinux to prevent a QEMU process from opening
>>>> files that do not belong to it.
>>>>
>>>> sVirt provides this support by labeling guests and resources with
>>>> security labels that are stored in file system extended attributes.
>>>> Some file systems, such as NFS, do not support the extended
>>>> attribute security namespace, and therefore cannot support sVirt
>>>> isolation.
>>>>
>>>> A solution to this problem is to provide fd passing support, where
>>>> libvirt opens files and passes file descriptors to QEMU. This,
>>>> along with SELinux policy to prevent QEMU from opening files, can
>>>> provide image file isolation for NFS files stored on the same NFS
>>>> mount.
>>>>
>>>> This patch series adds the add-fd, remove-fd, and query-fdsets
>>>> QMP monitor commands, which allow file descriptors to be passed
>>>> via SCM_RIGHTS, and assigned to specified fd sets. This allows
>>>> fd sets to be created per file with fds having, for example,
>>>> different access rights. When QEMU needs to reopen a file with
>>>> different access rights, it can search for a matching fd in the
>>>> fd set. Fd sets also allow for easy tracking of fds per file,
>>>> helping to prevent fd leaks.
>>>>
>>>> Support is also added to the block layer to allow QEMU to dup an
>>>> fd from an fdset when the filename is of the /dev/fdset/nnn format,
>>>> where nnn is the fd set ID.
>>>>
>>>> No new SELinux policy is required to prevent open of NFS files
>>>> (files with type nfs_t).
The virt_use_nfs boolean type simply
>>>> needs to be set to false, and open will be prevented (and dup will
>>>> be allowed). For example:
>>>>
>>>> # setsebool virt_use_nfs 0
>>>> # getsebool virt_use_nfs
>>>> virt_use_nfs --> off
>>>>
>>>> Corey Bryant (6):
>>>>   qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg
>>>>   qapi: Introduce add-fd, remove-fd, query-fdsets
>>>>   monitor: Clean up fd sets on monitor disconnect
>>>>   block: Convert open calls to qemu_open
>>>>   block: Convert close calls to qemu_close
>>>>   block: Enable qemu_open/close to work with fd sets
>>>>
>>>>  block/raw-posix.c |  42 ++++-----
>>>>  block/raw-win32.c |   6 +-
>>>>  block/vdi.c       |   5 +-
>>>>  block/vmdk.c      |  25 +++--
>>>>  block/vpc.c       |   4 +-
>>>>  block/vvfat.c     |  16 ++--
>>>>  cutils.c          |   5 +
>>>>  monitor.c         | 273 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  monitor.h         |   5 +
>>>>  osdep.c           | 117 +++++++++++++++++++++++
>>>>  qapi-schema.json  | 110 +++++++++++++++++++++
>>>>  qemu-char.c       |  12 ++-
>>>>  qemu-common.h     |   2 +
>>>>  qemu-tool.c       |  20 ++++
>>>>  qerror.c          |   4 +
>>>>  qerror.h          |   3 +
>>>>  qmp-commands.hx   | 131 +++++++++++++++++++++++++
>>>>  savevm.c          |   4 +-
>>>>  18 files changed, 730 insertions(+), 54 deletions(-)
>>>
>>>
>>> Are there tests for this feature? Do you have test scripts used
>>> during development?
>>
>>
>> Yes I have some C code that I've been using for testing. I can clean it up
>> and provide it if you'd like.
>
> That would be very useful. tests/ has test cases. For the block
> layer tests/qemu-iotests/ is especially relevant, that's where a lot
> of the test cases go. If you look at test case 030 you'll see how a
> Python script interacts with QMP to test image streaming -
> unfortunately I think Python doesn't natively support SCM_RIGHTS. But
> a test script would be very useful so it can be used as a regression
> test in the future.
>

Sure I'll take a look. Hopefully a C test is ok if I can't use
SCM_RIGHTS in Python.
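
On the SCM_RIGHTS question: Python 3.3 and later expose socket.sendmsg() and socket.recvmsg(), which can carry SCM_RIGHTS ancillary data, so a script-based test may be possible on newer interpreters. What follows is only a minimal sketch of the fd-passing mechanism itself. It assumes a Unix socketpair standing in for the monitor socket, and the helper names send_fd, recv_fd and demo are hypothetical; nothing here is taken from the patch series or from qemu-iotests.

```python
import array
import os
import socket
import tempfile


def send_fd(sock, fd, payload=b"x"):
    """Send one file descriptor as SCM_RIGHTS ancillary data (hypothetical helper)."""
    fds = array.array("i", [fd])
    # At least one byte of ordinary data must accompany the ancillary data.
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one file descriptor passed via SCM_RIGHTS (hypothetical helper)."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo():
    # Assumption: a socketpair stands in for the real monitor connection.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as image:
        image.write(b"disk image bytes")
        image.flush()
        send_fd(sender, image.fileno())
        received = recv_fd(receiver)
        try:
            # The received fd refers to the same open file as the sender's fd.
            os.lseek(received, 0, os.SEEK_SET)
            return os.read(received, 16)
        finally:
            os.close(received)
            sender.close()
            receiver.close()
```

Calling demo() reads the file contents back through the descriptor that arrived over the socket, which is the same property a regression test for add-fd would rely on.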
>>>
>>> Here's what I've gathered:
>>>
>>> Applications use add-fd to add file descriptors to fd sets. An fd set
>>> contains one or more file descriptors, each with different access
>>> modes (O_RDONLY, O_RDWR, O_WRONLY). File descriptors can be retrieved
>>> from the fd set and are matched by their access modes. This allows
>>> QEMU to reopen files with different access modes.
>>>
>>> File descriptors stay in their fd set until explicitly removed by the
>>> remove-fd command or when all monitor clients have disconnected. This
>>> ensures that file descriptors are not leaked after a monitor client
>>> crashes. Automatic removal on monitor close is postponed until all
>>> duped fds have been fd - this means QEMU can still reopen an in-use fd
>>
>>
>> I assume you mean "... until all duped fds have been *closed* - ..."
>
> Yes, my typo :)
>

Great, then your understanding of how this works is correct. :)

>>> after a client disconnects.
>>>
>>> Does this sound right?
>>
>>
>> Yes, exactly.
>>
>> I should point out there is an issue that needs to be cleaned up in the
>> future. There are short windows of time where refcount can get to zero
>> while an image file is in use. This is because the file is being reopened.
>> For example, I've noticed this occurs when format= is not specified on the
>> device_add command and the file is probed, and when mounting/unmounting a
>> file system. Hopefully this can be treated as a follow-up issue.
>
> The block layer doesn't treat this as a "reopen" today. Supriya
> Kannery has a patch series for bdrv_reopen() which would also need to
> be integrated with fd sets to ensure the refcount doesn't hit 0 and
> cause a cleanup.
>

Great, Supriya's patches sound like what is needed.

Also, I noticed that I'm missing a patch in my series. I need to make
sure that /dev/fdset/nnn is not detected as a floppy drive (/dev/fdx).
That was causing a close/open.

--
Regards,
Corey
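
The fd set behaviour summarised in this thread (fds matched by access mode, handed out as dups, and a set counted as in use until every dup is closed) can be modelled roughly as below. This is an illustration under stated assumptions, not the code in the patch series: FdSet, dup_matching, close_dup, in_use and parse_fdset_path are hypothetical names, and the real bookkeeping lives in QEMU's C sources.

```python
import fcntl
import os
import re
import tempfile

# Mask covering the three access modes named in the thread.
ACCESS_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def parse_fdset_path(filename):
    """Return the fd set ID for /dev/fdset/nnn, or None for any other name."""
    match = re.fullmatch(r"/dev/fdset/(\d+)", filename)
    return int(match.group(1)) if match else None


class FdSet:
    """Hypothetical model of one fd set: fds for a single file, one per access mode."""

    def __init__(self, fdset_id):
        self.fdset_id = fdset_id
        self.fds = []    # fds added by the management application
        self.dups = []   # dups currently handed out to users of the set

    def add_fd(self, fd):
        self.fds.append(fd)

    def dup_matching(self, flags):
        """Dup the first fd whose access mode equals the requested one."""
        wanted = flags & ACCESS_MASK
        for fd in self.fds:
            if (fcntl.fcntl(fd, fcntl.F_GETFL) & ACCESS_MASK) == wanted:
                new_fd = os.dup(fd)
                self.dups.append(new_fd)
                return new_fd
        raise LookupError("no fd with a matching access mode in this set")

    def close_dup(self, fd):
        self.dups.remove(fd)
        os.close(fd)

    def in_use(self):
        # The set counts as in use while any dup remains open.
        return bool(self.dups)


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        fdset = FdSet(parse_fdset_path("/dev/fdset/1"))
        ro = os.open(tmp.name, os.O_RDONLY)
        rw = os.open(tmp.name, os.O_RDWR)
        fdset.add_fd(ro)
        fdset.add_fd(rw)
        dup_fd = fdset.dup_matching(os.O_RDWR)
        in_use_before = fdset.in_use()
        os.write(dup_fd, b"ok")
        fdset.close_dup(dup_fd)
        in_use_after = fdset.in_use()
        data = os.read(ro, 2)
        os.close(ro)
        os.close(rw)
        return fdset.fdset_id, in_use_before, in_use_after, data
```

In this model a name such as /dev/fd0 does not parse as an fd set path, which mirrors the need, noted above, to keep /dev/fdset/nnn from being mistaken for a floppy device.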