From: Kevin Wolf <kwolf@redhat.com>
To: Corey Bryant <coreyb@linux.vnet.ibm.com>
Cc: aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com,
libvir-list@redhat.com, qemu-devel@nongnu.org,
lcapitulino@redhat.com, eblake@redhat.com
Subject: Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
Date: Wed, 25 Jul 2012 10:22:46 +0200 [thread overview]
Message-ID: <500FACD6.700@redhat.com> (raw)
In-Reply-To: <500F6B04.4020508@linux.vnet.ibm.com>
Am 25.07.2012 05:41, schrieb Corey Bryant:
>>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>>> index a172de3..5d0a801 100644
>>> --- a/block/raw-posix.c
>>> +++ b/block/raw-posix.c
>>> @@ -271,7 +271,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>>> out_free_buf:
>>> qemu_vfree(s->aligned_buf);
>>> out_close:
>>> - qemu_close(fd);
>>> + qemu_close(fd, filename);
>>> return -errno;
>>> }
>>
>> Hm, not a nice interface where qemu_close() needs the filename and
>> (worse) could be given a wrong filename. Maybe it would be better to
>> maintain a list of fd -> fdset mappings in qemu_open/close?
>>
>
> I agree, I don't really like it either.
>
> We already have a list of fd -> fdset mappings (mon_fdset_fd_t ->
> mon_fdset_t). Would it be too costly to loop through all the fdsets/fds
> at the beginning of every qemu_close()?
I don't think so. qemu_close() is not a fast path and happens almost
never, and the list is short enough that searching it isn't a problem
anyway.
>>> + switch (flags & O_ACCMODE) {
>>> + case O_RDWR:
>>> + if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
>>> + return mon_fdset_fd->fd;
>>> + }
>>> + break;
>>> + case O_RDONLY:
>>> + if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
>>> + return mon_fdset_fd->fd;
>>> + }
>>> + break;
>>> + case O_WRONLY:
>>> + if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) {
>>> + return mon_fdset_fd->fd;
>>> + }
>>> + break;
>>> + }
>>
>> I think you mean:
>>
>> if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
>> return mon_fdset_fd->fd;
>> }
>
> Yes, that would be a bit simpler wouldn't it. :)
>
>>
>> What about other flags that cannot be set with fcntl(), like O_SYNC on
>> older kernels or possibly non-Linux? (The block layer doesn't use it any
>> more, but I think we want to keep the function generally useful)
>>
>
> I see what you're getting at here. Basically you could have 2 fds in an
> fdset with the same access mode flags, but one has O_SYNC on and the
> other has O_SYNC off. That seems like it would make sense to implement.
> As a work-around, I think a user could just create a separate fdset
> for the same file with different O_SYNC value. But from a client
> perspective, it would be nicer to have this taken care of for you.
Now that the block layer doesn't use O_SYNC any more, it's more of a
theoretical point. I don't think there's any other place, where we'd
need to switch O_SYNC during runtime.
Taking it into consideration is complicated by the fact that some
kernels allow to fcntl() O_SYNC and others don't, so enforcing a match
here wouldn't feel completely right either.
Maybe just leave it as it is. :-)
>
>>> + }
>>> + errno = EACCES;
>>> + return -1;
>>> + }
>>> + errno = ENOENT;
>>> + return -1;
>>> +}
>>> +
>>> /* mon_cmds and info_cmds would be sorted at runtime */
>>> static mon_cmd_t mon_cmds[] = {
>>> #include "hmp-commands.h"
>>
>>> @@ -75,6 +76,79 @@ int qemu_madvise(void *addr, size_t len, int advice)
>>> #endif
>>> }
>>>
>>> +/*
>>> + * Dups an fd and sets the flags
>>> + */
>>> +static int qemu_dup(int fd, int flags)
>>> +{
>>> + int i;
>>> + int ret;
>>> + int serrno;
>>> + int dup_flags;
>>> + int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
>>> + O_NONBLOCK, 0 };
>>> +
>>> + if (flags & O_CLOEXEC) {
>>> + ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
>>> + if (ret == -1 && errno == EINVAL) {
>>> + ret = dup(fd);
>>> + if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) {
>>> + goto fail;
>>> + }
>>> + }
>>> + } else {
>>> + ret = dup(fd);
>>> + }
>>> +
>>> + if (ret == -1) {
>>> + goto fail;
>>> + }
>>> +
>>> + dup_flags = fcntl(ret, F_GETFL);
>>> + if (dup_flags == -1) {
>>> + goto fail;
>>> + }
>>> +
>>> + if ((flags & O_SYNC) != (dup_flags & O_SYNC)) {
>>> + errno = EINVAL;
>>> + goto fail;
>>> + }
>>
>> It's worth trying to set it before failing, newer kernels can do it. But
>> as I said above, if you can fail here, it makes sense to consider O_SYNC
>> when selecting the right file descriptor from the fdset.
>>
>
> I'm on a 3.4.4 Fedora kernel that doesn't appear to support
> fcntl(O_SYNC), but perhaps I'm doing something wrong. Here's my test
> code (shortened for simplicty): [...]
Hm, true. So it seems that patch has never made it into the kernel, in
fact...
>>> +
>>> + qemu_set_cloexec(ret);
>>
>> Wait... If O_CLOEXEC is set, you set the flag immediately and if it
>> isn't you set it at the end of the function? What's the intended use
>> case for not using O_CLOEXEC then?
>>
>
> This is a mistake. I think I just need to be using qemu_set_cloexec()
> instead of fcntl_setfl() earlier in this function and get rid of this
> latter call to qemu_set_cloexec().
Yes, probably. And in fact, I think this shouldn't even be conditional
on flags & O_CLOEXEC. The whole reason qemu_open() was introduced
originally was to always set O_CLOEXEC.
Kevin
next prev parent reply other threads:[~2012-07-25 8:23 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-07-23 13:07 [Qemu-devel] [PATCH v5 0/6] file descriptor passing using fd sets Corey Bryant
2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 1/6] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg Corey Bryant
2012-07-23 22:50 ` Eric Blake
2012-07-24 2:19 ` Corey Bryant
2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 2/6] qapi: Introduce add-fd, remove-fd, query-fdsets Corey Bryant
2012-07-25 18:16 ` Eric Blake
2012-07-26 2:55 ` Corey Bryant
2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 3/6] monitor: Clean up fd sets on monitor disconnect Corey Bryant
2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 4/6] block: Convert open calls to qemu_open Corey Bryant
2012-07-25 19:22 ` Eric Blake
2012-07-26 3:11 ` Corey Bryant
2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 5/6] block: Convert close calls to qemu_close Corey Bryant
2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets Corey Bryant
2012-07-23 13:14 ` Corey Bryant
2012-08-02 22:21 ` Corey Bryant
2012-08-06 9:15 ` Kevin Wolf
2012-08-06 13:32 ` Corey Bryant
2012-08-06 13:51 ` Kevin Wolf
2012-08-06 14:15 ` Corey Bryant
2012-08-07 16:43 ` Corey Bryant
2012-07-24 12:07 ` Kevin Wolf
2012-07-25 3:41 ` Corey Bryant
2012-07-25 8:22 ` Kevin Wolf [this message]
2012-07-25 19:25 ` Eric Blake
2012-07-26 3:21 ` Corey Bryant
2012-07-26 13:13 ` Eric Blake
2012-07-26 13:16 ` Kevin Wolf
2012-07-27 4:07 ` Corey Bryant
2012-07-25 19:43 ` Eric Blake
2012-07-26 3:57 ` Corey Bryant
2012-07-26 9:07 ` Kevin Wolf
2012-07-27 3:59 ` Corey Bryant
2012-07-27 4:03 ` Corey Bryant
2012-08-02 15:08 ` Corey Bryant
2012-07-24 12:09 ` [Qemu-devel] [PATCH v5 0/6] file descriptor passing using " Kevin Wolf
2012-07-25 3:42 ` Corey Bryant
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=500FACD6.700@redhat.com \
--to=kwolf@redhat.com \
--cc=aliguori@us.ibm.com \
--cc=coreyb@linux.vnet.ibm.com \
--cc=eblake@redhat.com \
--cc=lcapitulino@redhat.com \
--cc=libvir-list@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).