From: Christoph Hellwig <hch@infradead.org>
To: Hans de Goede <hdegoede@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
David Howells <dhowells@redhat.com>,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v12 resend] fs: Add VirtualBox guest shared folder (vboxsf) support
Date: Mon, 12 Aug 2019 07:17:01 -0700
Message-ID: <20190812141701.GA31267@infradead.org>
In-Reply-To: <b95eaa46-098d-0954-34b4-a96c7ed7ffa2@redhat.com>
On Mon, Aug 12, 2019 at 03:35:39PM +0200, Hans de Goede wrote:
> > Also these casts to uintptr_t for a call that reads data look very
> > odd.
>
> Yes, as I already discussed with Al, that is because vboxsf_read
> can be (and is) used with both kernel and userspace buffer pointers.
>
> In case of a userspace pointer the underlying IPC code to the host takes
> care of copy to / from user for us. That IPC code can also be used from
> userspace through ioctls on /dev/vboxguest, so the handling of both
> in kernel and userspace addresses is something which it must be able
> to handle anyways, at which point we might as well use it in vboxsf too.
>
> But since both Al and you pointed this out as being ugly, I will add
> 2 separate vboxsf_read_user and vboxsf_read_kernel functions for the
> next version, then the cast (and the true flag) can both go away.
What might be even better is to pass a struct iov_iter to the low-level
function. That gets you 90% of the way to implementing the read_iter and
write_iter methods, as well as a versatile low-level primitive that
can deal with kernel and user addresses, as well as pages.
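To illustrate the idea outside the kernel, here is a minimal userspace sketch of a low-level read helper that takes an iterator instead of a raw pointer plus a "user or kernel" flag. It is only an analogy: struct buf_iter, buf_iter_init(), buf_iter_copy_to() and demo_read() are hypothetical names invented for this sketch and are not the kernel's iov_iter API; only struct iovec from <sys/uio.h> is real. The point it shows is that the caller describes the destination once and the low-level function never needs to know what kind of memory it is filling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>    /* struct iovec */

/* Hypothetical userspace stand-in for an iterator over destination buffers. */
struct buf_iter {
	const struct iovec *iov;   /* current segment */
	size_t nr_segs;            /* segments left, including the current one */
	size_t seg_off;            /* bytes already used in the current segment */
	size_t count;              /* total bytes left across all segments */
};

static void buf_iter_init(struct buf_iter *it, const struct iovec *iov,
			  size_t nr_segs)
{
	size_t i;

	assert(it != NULL);
	it->iov = iov;
	it->nr_segs = nr_segs;
	it->seg_off = 0;
	it->count = 0;
	for (i = 0; i < nr_segs; i++)
		it->count += iov[i].iov_len;
}

/* Copy up to len bytes from src into the iterator and advance it.
 * Returns the number of bytes actually copied. */
static size_t buf_iter_copy_to(struct buf_iter *it, const void *src, size_t len)
{
	const char *p = src;
	size_t copied = 0;

	while (copied < len && it->nr_segs > 0) {
		size_t room = it->iov->iov_len - it->seg_off;
		size_t n = len - copied < room ? len - copied : room;

		if (n > 0)
			memcpy((char *)it->iov->iov_base + it->seg_off,
			       p + copied, n);
		copied += n;
		it->seg_off += n;
		it->count -= n;
		if (it->seg_off == it->iov->iov_len) {
			/* segment is full, move on to the next one */
			it->iov++;
			it->nr_segs--;
			it->seg_off = 0;
		}
	}
	return copied;
}

/* Hypothetical low-level read: "backing" plays the role of the host file. */
static size_t demo_read(const char *backing, size_t size, size_t pos,
			struct buf_iter *to)
{
	size_t avail = pos < size ? size - pos : 0;
	size_t want = to->count < avail ? to->count : avail;

	if (want == 0)
		return 0;
	return buf_iter_copy_to(to, backing + pos, want);
}
```

A caller fills an array of struct iovec, calls buf_iter_init(), and hands the iterator to demo_read(); the same helper then serves a single flat buffer and a scattered one without any cast or flag.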
> > > + /* Make sure any pending writes done through mmap are flushed */
> >
> > Why?
>
> I believe that if we were doing everything through the page-cache then a regular
> write to the same range as a write done through mmap, with the regular write
> happening after (in time) the mmap write, will overwrite the mmap
> written data, we want the same behavior here.
But what happens if you mmap and write at the same, or at least
nearly the same, time?
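For reference, the sequential ordering Hans describes can be observed from userspace on an ordinary page-cache-backed filesystem. The sketch below assumes a Linux/POSIX system with a writable /tmp; mmap_then_write() is a hypothetical helper name, and the program says nothing about vboxsf itself or about the racing case asked about above. It stores through a shared mapping, flushes with msync(), then issues a regular pwrite() over the first half of the same range and reads the file back.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 and fills out[0..len) with the final
 * file contents, or -1 on any error. */
static int mmap_then_write(char *out, size_t len)
{
	char path[] = "/tmp/mmap-demo-XXXXXX";
	int fd = mkstemp(path);
	char *map;
	int ret = -1;

	if (fd < 0)
		return -1;
	unlink(path);                       /* file vanishes once fd is closed */
	if (ftruncate(fd, (off_t)len) != 0)
		goto out_close;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	memset(map, 'M', len);              /* earlier store through the mapping */
	if (msync(map, len, MS_SYNC) != 0)  /* flush the mapped write */
		goto out_unmap;

	memset(out, 'W', len / 2);          /* later regular write, first half */
	if (pwrite(fd, out, len / 2, 0) != (ssize_t)(len / 2))
		goto out_unmap;

	if (pread(fd, out, len, 0) != (ssize_t)len)
		goto out_unmap;
	ret = 0;

out_unmap:
	munmap(map, len);
out_close:
	close(fd);
	return ret;
}
```

With the two writes strictly ordered like this, the later pwrite() is expected to win over the range it covers, while the rest of the file keeps the mapped data.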
>
> > > + err = filemap_fdatawait_range(inode->i_mapping, pos, pos + nwritten);
> > > + if (err)
> > > + return err;
> >
> > Also this whole write function seems to miss i_rwsem.
>
> Hmm, I do not see e.g. v9fs_file_write_iter take that either, nor a couple
> of other similar not block-backed filesystems. Will this still
> be necessary after converting to the iter interfaces?
Yes.
> The problem is that the IPC to the host which we build upon only offers
> regular read / write calls. So the most consistent (also cache coherent)
> mapping which we can offer is to directly map read -> read and
> write -> write without the pagecache. Ideally we would be able to just
> say sorry cannot do mmap, but too many apps rely on mmap and the
> out of tree driver has this mmap "emulation" which means not offering
> it in the mainline version would be a serious regression.
>
> In essence this is the same situation as a bunch of network filesystems
> are in and I've looked at several for inspiration. Looking again at
> e.g. v9fs_file_write_iter it does similar regular read -> read mapping
> with invalidation of the page-cache for mmap users.
v9 is probably not a good idea to copy in general. While not the best
idea to copy directly either, I'd rather look at nfs - that is another
protocol without a real distributed lock manager, but at least the
NFS close-to-open semantics are reasonably well defined and allow using
the pagecache.
> I must admit that I've mostly cargo-culted this from other fs code
> such as the 9p code, or the cifs code which has:
>
> /*
> * If the page is mmap'ed into a process' page tables, then we need to make
> * sure that it doesn't change while being written back.
> */
> static vm_fault_t
> cifs_page_mkwrite(struct vm_fault *vmf)
> {
> struct page *page = vmf->page;
>
> lock_page(page);
> return VM_FAULT_LOCKED;
> }
>
> The if (page->mapping != inode->i_mapping) is used in several places
> including the 9p code, but as you can see not in the cifs code. I couldn't
> really find a rationale for that check, so I'm fine with dropping that check.
If you don't implement ->page_mkwrite the caller will just lock the page
for you.
Thread overview: 6+ messages
2019-08-11 16:31 [PATCH v12 resend 0/1] fs: Add VirtualBox guest shared folder (vboxsf) Hans de Goede
2019-08-11 16:31 ` [PATCH v12 resend] fs: Add VirtualBox guest shared folder (vboxsf) support Hans de Goede
2019-08-12 11:49 ` Christoph Hellwig
2019-08-12 13:35 ` Hans de Goede
2019-08-12 14:17 ` Christoph Hellwig [this message]
2019-08-12 15:53 ` Hans de Goede