Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

From: Jeff Mahoney <jeffm@suse.com>
To: Josef Bacik <jbacik@fusionio.com>
Cc: Mark Fasheh <mfasheh@suse.de>,
	Christoph Hellwig <hch@infradead.org>,
	viro@ZenIV.linux.org.uk, linux-fsdevel@vger.kernel.org,
	linux-btrfs@vger.kernel.org,
	Chris Mason <chris.mason@fusionio.com>,
	Andrew Vagin <avagin@gmail.com>
Subject: Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat
Date: Tue, 10 Sep 2013 17:21:54 -0400
Message-ID: <522F8D72.6060303@suse.com>
In-Reply-To: <20130910155607.GC2446@localhost.localdomain>

On 9/10/13 11:56 AM, Josef Bacik wrote:
> On Tue, Sep 10, 2013 at 08:36:55AM -0700, Mark Fasheh wrote:
>> On Mon, Aug 12, 2013 at 04:47:52AM -0700, Christoph Hellwig wrote:
>>> On Thu, Aug 08, 2013 at 11:44:54AM -0400, Josef Bacik wrote:
>>>> On Thu, Aug 08, 2013 at 06:48:05AM -0700, Christoph Hellwig wrote:
>>>>> On Thu, Aug 08, 2013 at 09:02:07AM -0400, Josef Bacik wrote:
>>>>>> This won't work, try having 10000 subvolumes with dirty inodes and do sync then
>>>>>> go skiing, you'll have time :).  Thanks,
>>>>>
>>>>> Why would the dirty inodes make any difference?  If you share the bdi
>>>>> between the subvolumes the sync workflow should be exactly the same
>>>>> still.
>>>>>
>>>>
>>>> If we could disentangle vfsmounts from sb's and have it so you could have
>>>> multiple vfsmounts with just one sb that would solve at least the in-kernel
>>>> confusion, but I think we still have the userspace confusion.  Thanks,
>>>
>>> I think it would mostly solve userspace confusion, as userspace only
>>> sees mounts and the device names.
>>>
>>> But please fix this up properly instead of propagating the effects of
>>> the nasty btrfs hack that should never have been merged in that form
>>> further up the stack.
>>
>> Can one of you explain how this solves the problem that userspace is getting
>> different devices for the same inode?
>>
>> Seriously, I've been looking into it and I'm a bit lost. I followed the
>> conversation until here but I don't see how any of the proposed changes
>> actually *fix* anything? Also, what is the relationship between vfsmounts
>> and sb today? Wouldn't a bind mount produce the situation of more than 1
>> vfsmount per sb that is described above?
>>
>> Sincerely, someone who would like to fix this ABI breakage that has been
>> going on for years.
> 
> And let me restate the problem so we're all on the same page.
> 
> Btrfs has subvolumes, completely separate trees within the file system.  These
> trees get their own object numbering, which in turn is how we do our inode
> numbers.  So if you have multiple subvolumes, they will likely have the same
> inode numbers within the same file system.  This screws up things like rsync
> which say "hey look, these two inodes are the same, let's skip them."  So we have
> an anonymous dev so we can make them look different.
> 
> Now if we were to make each subvol its own vfsmount (essentially a bind mount)
> and remove the anonymous device that wouldn't fix the problem _at all_.  The
> file system would appear to be the same to rsync and it wouldn't back stuff up.
> So we still need some way of telling userspace that this object is different.
> 
> I'm not convinced vfsmounts is the way to do this, it doesn't do anything other
> than add a whole lot of complexity to our mounting/subvolume mechanism that is
> already relatively complex.  Thanks,

Agreed. It's hugely wasteful as well. We can have thousands of
subvolumes even on modest systems like workstations when automated
snapshots are involved. Using a vfsmount for each subvolume would make
/proc/mounts pretty useless. Having a separate superblock for each one,
at 1k a pop, would waste a ton of memory considering that they'll be
identical except for the dev_t.

The only way vfsmounts would work is if we added a dev_t there, which
would usually be set to ->mnt_sb->s_dev except for the btrfs case. That
still doesn't solve the polluted /proc/mounts, though.
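
For anyone following along, the userspace check at stake in the quoted
text can be sketched as below. This is a minimal illustration, assuming
a tool identifies a file by the (st_dev, st_ino) pair from stat(2); it
is not rsync's actual code. The helper names and the device and inode
numbers are made up for the example.

```python
import os
import tempfile


def file_identity(path):
    """Return the (st_dev, st_ino) pair commonly used to identify a file."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_file(path_a, path_b):
    """True when both paths report the same device and inode number."""
    return file_identity(path_a) == file_identity(path_b)


def distinct_files(identities):
    """Count how many different files a tool would see in a list of
    (dev, ino) pairs, treating equal pairs as the same file."""
    return len(set(identities))


def demo_hardlink():
    """A hard link shares dev and inode; an unrelated file does not."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original")
        hardlink = os.path.join(tmp, "hardlink")
        other = os.path.join(tmp, "other")
        for path in (original, other):
            with open(path, "w") as handle:
                handle.write("data")
        os.link(original, hardlink)
        return same_file(original, hardlink), same_file(original, other)


# Hypothetical numbers: two subvolumes that both contain inode 257.
# With a per-subvolume anonymous device the pairs differ.
WITH_ANON_DEV = [(0x21, 257), (0x22, 257)]
# With one shared device the pairs collide and look like a single file.
WITH_SHARED_DEV = [(0x20, 257), (0x20, 257)]
```

With the made-up numbers above, distinct_files(WITH_ANON_DEV) gives 2
and distinct_files(WITH_SHARED_DEV) gives 1, which is the collision the
anonymous dev exists to avoid.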

-Jeff

-- 
Jeff Mahoney
SUSE Labs


