From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:41776) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1StsU9-0003wk-0B for qemu-devel@nongnu.org; Tue, 24 Jul 2012 23:42:06 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1StsU7-0007C9-9u for qemu-devel@nongnu.org; Tue, 24 Jul 2012 23:42:04 -0400 Received: from e36.co.us.ibm.com ([32.97.110.154]:44579) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1StsU7-0007Bw-0R for qemu-devel@nongnu.org; Tue, 24 Jul 2012 23:42:03 -0400 Received: from /spool/local by e36.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 24 Jul 2012 21:42:01 -0600 Received: from d03relay05.boulder.ibm.com (d03relay05.boulder.ibm.com [9.17.195.107]) by d03dlp02.boulder.ibm.com (Postfix) with ESMTP id B81E63E40039 for ; Wed, 25 Jul 2012 03:41:57 +0000 (WET) Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by d03relay05.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q6P3fwTa081070 for ; Tue, 24 Jul 2012 21:41:58 -0600 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q6P3fvHs001263 for ; Tue, 24 Jul 2012 21:41:58 -0600 Message-ID: <500F6B04.4020508@linux.vnet.ibm.com> Date: Tue, 24 Jul 2012 23:41:56 -0400 From: Corey Bryant MIME-Version: 1.0 References: <1343048885-1701-1-git-send-email-coreyb@linux.vnet.ibm.com> <1343048885-1701-7-git-send-email-coreyb@linux.vnet.ibm.com> <500E901D.3080801@redhat.com> In-Reply-To: <500E901D.3080801@redhat.com> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, libvir-list@redhat.com, qemu-devel@nongnu.org, lcapitulino@redhat.com, eblake@redhat.com On 07/24/2012 08:07 AM, Kevin Wolf wrote: > Am 23.07.2012 15:08, schrieb Corey Bryant: >> When qemu_open is passed a filename of the "/dev/fdset/nnn" >> format (where nnn is the fdset ID), an fd with matching access >> mode flags will be searched for within the specified monitor >> fd set. If the fd is found, a dup of the fd will be returned >> from qemu_open. >> >> Each fd set has a reference count. The purpose of the reference >> count is to determine if an fd set contains file descriptors that >> have open dup() references that have not yet been closed. It is >> incremented on qemu_open and decremented on qemu_close. It is >> not until the refcount is zero that file desriptors in an fd set >> can be closed. If an fd set has dup() references open, then we >> must keep the other fds in the fd set open in case a reopen >> of the file occurs that requires an fd with a different access >> mode. >> >> Signed-off-by: Corey Bryant >> >> v2: >> -Get rid of file_open and move dup code to qemu_open >> (kwolf@redhat.com) >> -Use strtol wrapper instead of atoi (kwolf@redhat.com) >> >> v3: >> -Add note about fd leakage (eblake@redhat.com) >> >> v4 >> -Moved patch to be later in series (lcapitulino@redhat.com) >> -Update qemu_open to check access mode flags and set flags that >> can be set (eblake@redhat.com, kwolf@redhat.com) >> >> v5: >> -This patch was overhauled quite a bit in this version, with >> the addition of fd set and refcount support. >> -Use qemu_set_cloexec() on dup'd fd (eblake@redhat.com) >> -Modify flags set by fcntl on dup'd fd (eblake@redhat.com) >> -Reduce syscalls when setting flags for dup'd fd (eblake@redhat.com) >> -Fix O_RDWR, O_RDONLY, O_WRONLY checks (eblake@redhat.com) >> --- >> block/raw-posix.c | 24 +++++----- >> block/raw-win32.c | 2 +- >> block/vmdk.c | 4 +- >> block/vpc.c | 2 +- >> block/vvfat.c | 12 ++--- >> cutils.c | 5 ++ >> monitor.c | 85 +++++++++++++++++++++++++++++++++ >> monitor.h | 4 ++ >> osdep.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++++++++- >> qemu-common.h | 3 +- >> qemu-tool.c | 12 +++++ >> 11 files changed, 267 insertions(+), 24 deletions(-) >> >> diff --git a/block/raw-posix.c b/block/raw-posix.c >> index a172de3..5d0a801 100644 >> --- a/block/raw-posix.c >> +++ b/block/raw-posix.c >> @@ -271,7 +271,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename, >> out_free_buf: >> qemu_vfree(s->aligned_buf); >> out_close: >> - qemu_close(fd); >> + qemu_close(fd, filename); >> return -errno; >> } > > Hm, not a nice interface where qemu_close() needs the filename and > (worse) could be given a wrong filename. Maybe it would be better to > maintain a list of fd -> fdset mappings in qemu_open/close? > I agree, I don't really like it either. We already have a list of fd -> fdset mappings (mon_fdset_fd_t -> mon_fdset_t). Would it be too costly to loop through all the fdsets/fds at the beginning of every qemu_close()? > But if we decided to keep it like this, please use the right interface > from the beginning in patch 5 instead of updating it here. > Ok >> @@ -2551,6 +2551,91 @@ static void monitor_fdsets_set_in_use(Monitor *mon, bool in_use) >> } >> } >> >> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id) >> +{ >> + mon_fdset_t *mon_fdset; >> + >> + if (!mon) { >> + return; >> + } >> + >> + QLIST_FOREACH(mon_fdset, &mon->fdsets, next) { >> + if (mon_fdset->id == fdset_id) { >> + mon_fdset->refcount++; >> + break; >> + } >> + } >> +} >> + >> +void monitor_fdset_decrement_refcount(Monitor *mon, int64_t fdset_id) >> +{ >> + mon_fdset_t *mon_fdset; >> + >> + if (!mon) { >> + return; >> + } >> + >> + QLIST_FOREACH(mon_fdset, &mon->fdsets, next) { >> + if (mon_fdset->id == fdset_id) { >> + mon_fdset->refcount--; >> + if (mon_fdset->refcount == 0) { >> + monitor_fdset_cleanup(mon_fdset); >> + } >> + break; >> + } >> + } >> +} > > These two functions are almost the same. Would a > monitor_fdset_update_refcount(mon, fdset_id, value) make sense? These > functions could then be kept as thin wrappers around it, or they could > even be dropped completely. > This makes sense and I'll try one of these approaches. I actually started to do something along these lines in v5 but reverted back to the two independent functions because it was easier to read the code. >> + >> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags) >> +{ >> + mon_fdset_t *mon_fdset; >> + mon_fdset_fd_t *mon_fdset_fd; >> + int mon_fd_flags; >> + >> + if (!mon) { >> + errno = ENOENT; >> + return -1; >> + } >> + >> + QLIST_FOREACH(mon_fdset, &mon->fdsets, next) { >> + if (mon_fdset->id != fdset_id) { >> + continue; >> + } >> + QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) { >> + if (mon_fdset_fd->removed) { >> + continue; >> + } >> + >> + mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL); >> + if (mon_fd_flags == -1) { >> + return -1; >> + } >> + >> + switch (flags & O_ACCMODE) { >> + case O_RDWR: >> + if ((mon_fd_flags & O_ACCMODE) == O_RDWR) { >> + return mon_fdset_fd->fd; >> + } >> + break; >> + case O_RDONLY: >> + if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) { >> + return mon_fdset_fd->fd; >> + } >> + break; >> + case O_WRONLY: >> + if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) { >> + return mon_fdset_fd->fd; >> + } >> + break; >> + } > > I think you mean: > > if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) { > return mon_fdset_fd->fd; > } Yes, that would be a bit simpler wouldn't it. :) > > What about other flags that cannot be set with fcntl(), like O_SYNC on > older kernels or possibly non-Linux? (The block layer doesn't use it any > more, but I think we want to keep the function generally useful) > I see what you're getting at here. Basically you could have 2 fds in an fdset with the same access mode flags, but one has O_SYNC on and the other has O_SYNC off. That seems like it would make sense to implement. As a work-around, I think a user could just create a separate fdset for the same file with different O_SYNC value. But from a client perspective, it would be nicer to have this taken care of for you. >> + } >> + errno = EACCES; >> + return -1; >> + } >> + errno = ENOENT; >> + return -1; >> +} >> + >> /* mon_cmds and info_cmds would be sorted at runtime */ >> static mon_cmd_t mon_cmds[] = { >> #include "hmp-commands.h" > >> @@ -75,6 +76,79 @@ int qemu_madvise(void *addr, size_t len, int advice) >> #endif >> } >> >> +/* >> + * Dups an fd and sets the flags >> + */ >> +static int qemu_dup(int fd, int flags) >> +{ >> + int i; >> + int ret; >> + int serrno; >> + int dup_flags; >> + int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, >> + O_NONBLOCK, 0 }; >> + >> + if (flags & O_CLOEXEC) { >> + ret = fcntl(fd, F_DUPFD_CLOEXEC, 0); >> + if (ret == -1 && errno == EINVAL) { >> + ret = dup(fd); >> + if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) { >> + goto fail; >> + } >> + } >> + } else { >> + ret = dup(fd); >> + } >> + >> + if (ret == -1) { >> + goto fail; >> + } >> + >> + dup_flags = fcntl(ret, F_GETFL); >> + if (dup_flags == -1) { >> + goto fail; >> + } >> + >> + if ((flags & O_SYNC) != (dup_flags & O_SYNC)) { >> + errno = EINVAL; >> + goto fail; >> + } > > It's worth trying to set it before failing, newer kernels can do it. But > as I said above, if you can fail here, it makes sense to consider O_SYNC > when selecting the right file descriptor from the fdset. > I'm on a 3.4.4 Fedora kernel that doesn't appear to support fcntl(O_SYNC), but perhaps I'm doing something wrong. Here's my test code (shortened for simplicty): int main() { int fd; int flags; fd = open("/tmp/corey", O_RDWR | O_CREAT, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH); flags = fcntl(fd, F_GETFL); printf("flags=%08x\n", flags); //A fcntl(fd, F_SETFL, O_SYNC); printf("O_SYNC=%08x\n", O_SYNC); flags = fcntl(fd, F_GETFL); printf("flags=%08x\n", flags); //B } fcntl doesn't fail, but the flags I print out at A are the same as the flags printed out at B, so it appears that O_SYNC doesn't get set. >> + /* Set/unset flags that we can with fcntl */ >> + i = 0; >> + while (setfl_flags[i] != 0) { >> + if (flags & setfl_flags[i]) { >> + dup_flags = (dup_flags | setfl_flags[i]); >> + } else { >> + dup_flags = (dup_flags & ~setfl_flags[i]); >> + } >> + i++; >> + } > > What about this instead of the loop: > > int setfl_flags = O_APPEND | O_ASYNC | ... ; > > dup_flags &= ~setfl_flags; > dup_flags |= (flags & setfl_flags); > > I like your suggestion, it's much simpler. >> + >> + if (fcntl(ret, F_SETFL, dup_flags) == -1) { >> + goto fail; >> + } >> + >> + /* Truncate the file in the cases that open() would truncate it */ >> + if (flags & O_TRUNC || >> + ((flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))) { >> + if (ftruncate(ret, 0) == -1) { >> + goto fail; >> + } >> + } > > O_CREAT | O_EXCL kind of loses its meaning here, but okay, it's hard to > do better with file descriptors. > I agree and I don't know if we can do any better. >> + >> + qemu_set_cloexec(ret); > > Wait... If O_CLOEXEC is set, you set the flag immediately and if it > isn't you set it at the end of the function? What's the intended use > case for not using O_CLOEXEC then? > This is a mistake. I think I just need to be using qemu_set_cloexec() instead of fcntl_setfl() earlier in this function and get rid of this latter call to qemu_set_cloexec(). -- Regards, Corey