linux-fsdevel.vger.kernel.org archive mirror
From: Christoph Hellwig <hch@infradead.org>
To: Hans de Goede <hdegoede@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	David Howells <dhowells@redhat.com>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v12 resend] fs: Add VirtualBox guest shared folder (vboxsf) support
Date: Mon, 12 Aug 2019 07:17:01 -0700
Message-ID: <20190812141701.GA31267@infradead.org>
In-Reply-To: <b95eaa46-098d-0954-34b4-a96c7ed7ffa2@redhat.com>

On Mon, Aug 12, 2019 at 03:35:39PM +0200, Hans de Goede wrote:
> > Also these casts to uintptr_t for a call that reads data look very
> > odd.
> 
> Yes, as I already discussed with Al, that is because vboxsf_read
> can be (and is) used with both kernel and userspace buffer pointers.
> 
> In case of a userspace pointer the underlying IPC code to the host takes
> care of copy to / from user for us. That IPC code can also be used from
> userspace through ioctls on /dev/vboxguest, so the handling of both
> in kernel and userspace addresses is something which it must be able
> to handle anyways, at which point we might as well use it in vboxsf too.
> 
> But since both Al and you pointed this out as being ugly, I will add
> 2 separate vboxsf_read_user and vboxsf_read_kernel functions for the
> next version, then the cast (and the true flag) can both go away.

What might be even better is to pass a struct iov_iter to the low-level
function.  That gets you 90% of implementing the read_iter and
write_iter methods, as well as a versatile low-level primitive that
can deal with kernel and user addresses, as well as pages.
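
To illustrate the shape of that idea, here is a minimal userspace sketch
in plain C.  It is an analogy only and does not use the kernel's
struct iov_iter API: struct seg, struct seg_iter, seg_iter_init(),
copy_to_seg_iter() and demo_read() are all hypothetical names, and the
4-byte transfer limit is made up.  The point it shows is that the
low-level read takes an iterator and never needs to know how many
buffers sit behind it or where they came from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one destination buffer. */
struct seg {
	void *base;
	size_t len;
};

/* Hypothetical, much simplified iterator over destination buffers. */
struct seg_iter {
	const struct seg *segs;	/* remaining segments */
	size_t nr_segs;		/* number of remaining segments */
	size_t seg_off;		/* bytes already filled in segs[0] */
	size_t count;		/* total bytes of room left */
};

static void seg_iter_init(struct seg_iter *it, const struct seg *segs,
			  size_t nr_segs)
{
	size_t i;

	it->segs = segs;
	it->nr_segs = nr_segs;
	it->seg_off = 0;
	it->count = 0;
	for (i = 0; i < nr_segs; i++)
		it->count += segs[i].len;
}

/* Copy up to len bytes into the iterator and advance it. */
static size_t copy_to_seg_iter(const void *src, size_t len,
			       struct seg_iter *it)
{
	const char *p = src;
	size_t done = 0;

	while (done < len && it->nr_segs > 0) {
		const struct seg *s = it->segs;
		size_t room = s->len - it->seg_off;
		size_t n = len - done < room ? len - done : room;

		memcpy((char *)s->base + it->seg_off, p + done, n);
		done += n;
		it->seg_off += n;
		it->count -= n;
		if (it->seg_off == s->len) {
			/* Segment is full, move on to the next one. */
			it->segs++;
			it->nr_segs--;
			it->seg_off = 0;
		}
	}
	return done;
}

/*
 * Hypothetical low-level read: a single entry point, regardless of
 * what kind of destination the iterator describes.  "data" and "size"
 * stand in for the file contents held by the host.
 */
static size_t demo_read(const char *data, size_t size, size_t pos,
			struct seg_iter *to)
{
	size_t total = 0;

	assert(pos <= size);
	while (to->count > 0 && pos < size) {
		size_t chunk = size - pos;
		size_t n;

		if (chunk > to->count)
			chunk = to->count;
		if (chunk > 4)		/* made-up per-call transfer limit */
			chunk = 4;
		n = copy_to_seg_iter(data + pos, chunk, to);
		pos += n;
		total += n;
	}
	return total;
}
```

A caller would fill an array of struct seg, call seg_iter_init() on it
and hand the iterator to demo_read(); the same call then serves a
single buffer or a scattered set of buffers without casts or flags.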

> > > +	/* Make sure any pending writes done through mmap are flushed */
> > 
> > Why?
> 
> I believe that if we were doing everything through the page-cache then a regular
> write to the same range as a write done through mmap, with the regular write
> happening after (in time) the mmap write, will overwrite the mmap
> written data, we want the same behavior here.

But what happens if you mmap and write at the same, or at least
nearly the same, time?

> 
> > > +	err = filemap_fdatawait_range(inode->i_mapping, pos, pos + nwritten);
> > > +	if (err)
> > > +		return err;
> > 
> > Also this whole write function seems to miss i_rwsem.
> 
> Hmm, I do not see e.g. v9fs_file_write_iter take that either, nor a couple
> of other similar not block-backed filesystems. Will this still
> be necessary after converting to the iter interfaces?

Yes.

> The problem is that the IPC to the host which we build upon only offers
> regular read / write calls. So the most consistent (also cache coherent)
> mapping which we can offer is to directly map read -> read and
> write -> write without the pagecache. Ideally we would be able to just
> say sorry cannot do mmap, but too many apps rely on mmap and the
> out of tree driver has this mmap "emulation" which means not offering
> it in the mainline version would be a serious regression.
> 
> In essence this is the same situation as a bunch of network filesystems
> are in and I've looked at several for inspiration. Looking again at
> e.g. v9fs_file_write_iter it does similar regular read -> read mapping
> with invalidation of the page-cache for mmap users.

v9fs is probably not a good idea to copy in general.  While not the best
idea to copy directly either, I'd rather look at nfs - that is another
protocol without a real distributed lock manager, but at least the
NFS close-to-open semantics are reasonably well defined and allow using
the pagecache.

> I must admit that I've mostly cargo-culted this from other fs code
> such as the 9p code, or the cifs code which has:
> 
> /*
>  * If the page is mmap'ed into a process' page tables, then we need to make
>  * sure that it doesn't change while being written back.
>  */
> static vm_fault_t
> cifs_page_mkwrite(struct vm_fault *vmf)
> {
>         struct page *page = vmf->page;
> 
>         lock_page(page);
>         return VM_FAULT_LOCKED;
> }
> 
> The if (page->mapping != inode->i_mapping) is used in several places
> including the 9p code, but as you can see not in the cifs code. I couldn't
> really find a rationale for that check, so I'm fine with dropping that check.

If you don't implement ->page_mkwrite the caller will just lock the page
for you.

Thread overview: 6+ messages
2019-08-11 16:31 [PATCH v12 resend 0/1] fs: Add VirtualBox guest shared folder (vboxsf) Hans de Goede
2019-08-11 16:31 ` [PATCH v12 resend] fs: Add VirtualBox guest shared folder (vboxsf) support Hans de Goede
2019-08-12 11:49   ` Christoph Hellwig
2019-08-12 13:35     ` Hans de Goede
2019-08-12 14:17       ` Christoph Hellwig [this message]
2019-08-12 15:53         ` Hans de Goede
