Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

linux-btrfs.vger.kernel.org archive mirror

From: Jeff Mahoney <jeffm@suse.com>
To: Josef Bacik <jbacik@fusionio.com>
Cc: Mark Fasheh <mfasheh@suse.de>,
	Christoph Hellwig <hch@infradead.org>,
	viro@ZenIV.linux.org.uk, linux-fsdevel@vger.kernel.org,
	linux-btrfs@vger.kernel.org,
	Chris Mason <chris.mason@fusionio.com>,
	Andrew Vagin <avagin@gmail.com>
Subject: Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat
Date: Tue, 10 Sep 2013 17:21:54 -0400
Message-ID: <522F8D72.6060303@suse.com>
In-Reply-To: <20130910155607.GC2446@localhost.localdomain>

On 9/10/13 11:56 AM, Josef Bacik wrote:
> On Tue, Sep 10, 2013 at 08:36:55AM -0700, Mark Fasheh wrote:
>> On Mon, Aug 12, 2013 at 04:47:52AM -0700, Christoph Hellwig wrote:
>>> On Thu, Aug 08, 2013 at 11:44:54AM -0400, Josef Bacik wrote:
>>>> On Thu, Aug 08, 2013 at 06:48:05AM -0700, Christoph Hellwig wrote:
>>>>> On Thu, Aug 08, 2013 at 09:02:07AM -0400, Josef Bacik wrote:
>>>>>> This won't work; try having 10000 subvolumes with dirty inodes, run sync, then
>>>>>> go skiing. You'll have time :).  Thanks,
>>>>>
>>>>> Why would the dirty inodes make any difference?  If you share the bdi
>>>>> between the subvolumes the sync workflow should be exactly the same
>>>>> still.
>>>>>
>>>>
>>>> If we could disentangle vfsmounts from sbs and have it so you could have
>>>> multiple vfsmounts with just one sb that would solve at least the in-kernel
>>>> confusion, but I think we still have the userspace confusion.  Thanks,
>>>
>>> I think it would mostly solve userspace confusion, as userspace only
>>> sees mounts and the device names.
>>>
>>> But please fix this up properly instead of propagating the effects of
>>> the nasty btrfs hack that should never have been merged in that form
>>> further up the stack.
>>
>> Can one of you explain how this solves the problem that userspace is getting
>> different devices for the same inode?
>>
>> Seriously, I've been looking into it and I'm a bit lost. I followed the
>> conversation until here, but I don't see how any of the proposed changes
>> actually *fix* anything? Also, what is the relationship between vfsmounts
>> and sb today? Wouldn't a bind mount produce the situation of more than 1
>> vfsmount per sb that is described above?
>>
>> Sincerely, someone who would like to fix this ABI breakage that has been
>> going on for years.
> 
> And let me restate the problem so we're all on the same page.
> 
> Btrfs has subvolumes, completely separate trees within the file system.  These
> trees get their own object numbering, which in turn is how we do our inode
> numbers.  So if you have multiple subvolumes, they will likely have the same
> inode numbers within the same file system.  This screws up things like rsync
> which say "hey look, these two inodes are the same, let's skip them."  So we have
> an anonymous dev so we can make them look different.
> 
> Now if we were to make each subvol its own vfsmount (essentially a bind mount)
> and remove the anonymous device that wouldn't fix the problem _at all_.  The
> file system would appear to be the same to rsync and it wouldn't back stuff up.
> So we still need some way of telling userspace that this object is different.
> 
> I'm not convinced vfsmounts are the way to do this; it doesn't do anything other
> than add a whole lot of complexity to our mounting/subvolume mechanism that is
> already relatively complex.  Thanks,

Agreed. It's hugely wasteful as well. We can have thousands of
subvolumes even on modest systems like workstations when automated
snapshots are involved. Using a vfsmount for each subvolume would make
/proc/mounts pretty useless. Having a separate superblock for each one,
at 1k a pop, would waste a ton of memory considering that they'll be
identical except for the dev_t.

The only way vfsmounts would work is if we added a dev_t there, which
would usually be set to ->mnt_sb->s_dev except for the btrfs case. That
still doesn't solve the polluted /proc/mounts, though.

-Jeff

-- 
Jeff Mahoney
SUSE Labs

Thread overview: 12+ messages
2013-08-07 19:57 [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat Mark Fasheh
2013-08-07 20:18 ` Christoph Hellwig
2013-08-07 20:51   ` Josef Bacik
2013-08-08 12:13     ` Christoph Hellwig
2013-08-08 13:02       ` Josef Bacik
2013-08-08 13:48         ` Christoph Hellwig
2013-08-08 15:31           ` Josef Bacik
2013-08-08 15:44           ` Josef Bacik
2013-08-12 11:47             ` Christoph Hellwig
2013-09-10 15:36               ` Mark Fasheh
2013-09-10 15:56                 ` Josef Bacik
2013-09-10 21:21                   ` Jeff Mahoney [this message]
