From: Avi Kivity <avi@redhat.com>
To: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: t.hirofuchi@aist.go.jp, qemu-devel@nongnu.org,
kvm@vger.kernel.org, satoshi.itoh@aist.go.jp
Subject: Re: [Qemu-devel] [PATCH 2/2] umem: chardevice for kvm postcopy
Date: Thu, 29 Dec 2011 14:47:26 +0200 [thread overview]
Message-ID: <4EFC615E.4040901@redhat.com> (raw)
In-Reply-To: <20111229122250.GF19274@valinux.co.jp>
On 12/29/2011 02:22 PM, Isaku Yamahata wrote:
> >
> > A simpler approach is the open("/dev/umem") returns an mmap()able fd.
> > You need to call an ioctl() to set the size, etc. but only you only
> > operate on that fd.
>
> So you are suggesting that /dev/umem and /dev/umemctl should be introduced
> and split the functionality.
No; perhaps I'm missing some functionality, but I'm suggesting
fd = open("/dev/umem");
ftruncate(fd, size);
struct umem_config config = { ... };
ioctl(fd, UMEM_CONFIG, &config);
mmap(..., fd, size);
> >
> > IMO, it would be better to avoid names, and use opaque __u64 identifiers
> > assigned by userspace, or perhaps file descriptors generated by the
> > kernel. With names come the complications of namespaces, etc. One user
> > can DoS another by grabbing a name that it knows the other user wants to
> > use.
>
> So how about the kernel assigning identifiers which is system global?
Depends on what you do with the identifiers. Something like reattach
needs security, you can't just reattach to any random umem segment.
It's really best to stick with file descriptors, which already have a
security model.
>
>
> > > +
> > > +struct umem_create {
> > > + __u64 size; /* in bytes */
> > > + __s32 umem_fd;
> > > + __s32 shmem_fd;
> > > + __u32 async_req_max;
> > > + __u32 sync_req_max;
> > > + struct umem_name name;
> > > +};
> > > +
> > > +struct umem_page_request {
> > > + __u64 __user *pgoffs;
> >
> > Pointers change their size in 32-bit and 64-bit userspace, best to avoid
> > them.
>
> Ah yes, right. How about following?
> struct {
> __u32 nr;
> __u32 padding;
> __u64 pgoffs[0];
> }
Sure.
If we use a pipe to transport requests, you can just send them as a
sequence of __u64 addresses.
> > > +
> > > +/* ioctl for umem fd */
> > > +#define UMEM_GET_PAGE_REQUEST _IOWR(UMEMIO, 0x10, struct umem_page_request)
> > > +#define UMEM_MARK_PAGE_CACHED _IOW (UMEMIO, 0x11, struct umem_page_cached)
> >
> > You could make the GET_PAGE_REQUEST / MARK_PAGE_CACHED protocol run over
> > file descriptors, instead of an ioctl. It allows you to implement the
> > other side in either the kernel or userspace. This is similar to how
> > kvm uses an eventfd for communication with vhost-net in the kernel, or
> > an implementation in userspace.
>
> Do you mean that read/write on file descriptors is better than ioctl?
> Okay, it would be easy to convert ioctl into read/write.
Yes, they already provide synchronization. And if you want to implement
a umem provider over RDMA in the kernel, then it's easy to add it; it's
not trivial for the kernel to issue ioctls but reads/writes are easy.
It's also easy to pass file descriptors among processes.
How do FUSE/CUSE pass requests?
--
error compiling committee.c: too many arguments to function
next prev parent reply other threads:[~2011-12-29 12:47 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-12-29 1:26 [Qemu-devel] [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy Isaku Yamahata
2011-12-29 1:26 ` [Qemu-devel] [PATCH 1/2] export necessary symbols Isaku Yamahata
2011-12-29 1:26 ` [Qemu-devel] [PATCH 2/2] umem: chardevice for kvm postcopy Isaku Yamahata
2011-12-29 11:17 ` Avi Kivity
2011-12-29 12:22 ` Isaku Yamahata
2011-12-29 12:47 ` Avi Kivity [this message]
2012-01-05 4:08 ` [Qemu-devel] 回复: " thfbjyddx
2012-01-05 10:48 ` [Qemu-devel] 回??: " Isaku Yamahata
2012-01-05 11:10 ` Tommy
2012-01-05 12:18 ` Isaku Yamahata
2012-01-05 15:02 ` Tommy Tang
[not found] ` <4F05BB68.9050302@hotmail.com>
2012-01-05 15:05 ` Tommy Tang
2012-01-06 7:02 ` thfbjyddx
2012-01-06 17:13 ` [Qemu-devel] 回??: [PATCH 2/2] umem: chardevice for kvm?postcopy Isaku Yamahata
2011-12-29 1:31 ` [Qemu-devel] [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy Isaku Yamahata
2011-12-29 11:24 ` Avi Kivity
2011-12-29 12:39 ` Isaku Yamahata
2011-12-29 12:55 ` Avi Kivity
2011-12-29 13:49 ` Isaku Yamahata
2011-12-29 13:52 ` Avi Kivity
2011-12-29 14:18 ` Isaku Yamahata
2011-12-29 14:35 ` Avi Kivity
2011-12-29 14:49 ` Isaku Yamahata
2011-12-29 14:55 ` Avi Kivity
2011-12-29 15:53 ` Isaku Yamahata
2011-12-29 16:00 ` Avi Kivity
2011-12-29 16:01 ` Avi Kivity
2012-01-02 17:05 ` Andrea Arcangeli
2012-01-02 17:55 ` Paolo Bonzini
2012-01-03 14:25 ` Andrea Arcangeli
2012-01-12 13:57 ` Avi Kivity
2012-01-13 2:06 ` Andrea Arcangeli
2012-01-04 3:03 ` Isaku Yamahata
2012-01-12 13:59 ` Avi Kivity
2012-01-13 1:09 ` Benoit Hudzia
2012-01-13 1:31 ` Takuya Yoshikawa
2012-01-13 9:40 ` Benoit Hudzia
2012-01-13 2:03 ` Isaku Yamahata
2012-01-13 2:15 ` Isaku Yamahata
2012-01-13 9:55 ` Benoit Hudzia
2012-01-13 9:48 ` Benoit Hudzia
2012-01-13 2:09 ` Andrea Arcangeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4EFC615E.4040901@redhat.com \
--to=avi@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=qemu-devel@nongnu.org \
--cc=satoshi.itoh@aist.go.jp \
--cc=t.hirofuchi@aist.go.jp \
--cc=yamahata@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).