From: Jeff Mahoney <jeffm@suse.com>
To: Josef Bacik <jbacik@fusionio.com>
Cc: Mark Fasheh <mfasheh@suse.de>,
Christoph Hellwig <hch@infradead.org>,
viro@ZenIV.linux.org.uk, linux-fsdevel@vger.kernel.org,
linux-btrfs@vger.kernel.org,
Chris Mason <chris.mason@fusionio.com>,
Andrew Vagin <avagin@gmail.com>
Subject: Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat
Date: Tue, 10 Sep 2013 17:21:54 -0400
Message-ID: <522F8D72.6060303@suse.com>
In-Reply-To: <20130910155607.GC2446@localhost.localdomain>
On 9/10/13 11:56 AM, Josef Bacik wrote:
> On Tue, Sep 10, 2013 at 08:36:55AM -0700, Mark Fasheh wrote:
>> On Mon, Aug 12, 2013 at 04:47:52AM -0700, Christoph Hellwig wrote:
>>> On Thu, Aug 08, 2013 at 11:44:54AM -0400, Josef Bacik wrote:
>>>> On Thu, Aug 08, 2013 at 06:48:05AM -0700, Christoph Hellwig wrote:
>>>>> On Thu, Aug 08, 2013 at 09:02:07AM -0400, Josef Bacik wrote:
>>>>>> This won't work, try having 10000 subvolumes with dirty inodes and do sync then
>>>>>> go skiing, you'll have time :). Thanks,
>>>>>
>>>>> Why would the dirty inodes make any difference? If you share the bdi
>>>>> between the subvolumes the sync workflow should be exactly the same
>>>>> still.
>>>>>
>>>>
>>>> If we could dis-entangle vfsmounts from sb's and have it so you could have
>>>> multiple vfsmounts with just one sb that would solve at least the in-kernel
>>>> confusion, but I think we still have the userspace confusion. Thanks,
>>>
>>> I think it would mostly solve userspace confusion, as userspace only
>>> sees mounts and the device names.
>>>
>>> But please fix this up properly instead of propagating the effects of
>>> the nasty btrfs hack that should never have been merged in that form
>>> further up the stack.
>>
>> Can one of you explain how this solves the problem that userspace is getting
>> different devices for the same inode?
>>
>> Seriously, I've been looking into it and I'm a bit lost. I followed the
>> conversation until here but I don't see how any of the proposed changes
>> actually *fix* anything? Also, what is the relationship between vfsmounts
>> and sb today? Wouldn't a bind mount produce the situation of more than 1
>> vfsmount per sb that is described above?
>>
>> Sincerely, someone who would like to fix this ABI breakage that has been
>> going on for years.
>
> And let me restate the problem so we're all on the same page.
>
> Btrfs has subvolumes, completely separate trees within the file system. These
> trees get their own object numbering, which in turn is how we do our inode
> numbers. So if you have multiple subvolumes, they will likely have the same
> inode numbers within the same file system. This screws up things like rsync
> which say "hey look, these two inodes are the same, let's skip them." So we have
> an anonymous dev so we can make them look different.
>
> Now if we were to make each subvol its own vfsmount (essentially a bind mount)
> and remove the anonymous device that wouldn't fix the problem _at all_. The
> file system would appear to be the same to rsync and it wouldn't back stuff up.
> So we still need some way of telling userspace that this object is different.
>
> I'm not convinced vfsmounts is the way to do this, it doesn't do anything other
> than add a whole lot of complexity to our mounting/subvolume mechanism that is
> already relatively complex. Thanks,
Agreed. It's hugely wasteful as well. We can have thousands of
subvolumes even on modest systems like workstations when automated
snapshots are involved. Using a vfsmount for each subvolume would make
/proc/mounts pretty useless. Having a separate superblock for each one,
at 1k a pop, would waste a ton of memory considering that they'll be
identical except for the dev_t.
The only way vfsmounts would work is if we added a dev_t there, which
would usually be set to ->mnt_sb->s_dev except for the btrfs case. That
still doesn't solve the polluted /proc/mounts, though.
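
For illustration, here is a rough userspace sketch of the mismatch being
discussed: the device field of a /proc/PID/maps line compared against
what stat(2) reports for the same path. The helper names are
hypothetical and not part of any patch in this thread. The sketch
assumes the usual maps layout of "address perms offset dev inode
pathname", with dev printed as hexadecimal major:minor. On a btrfs
subvolume the two device numbers would be expected to differ, since
stat reports the subvolume's anonymous dev.

```python
import os


def parse_maps_dev_ino(line):
    """Return (major, minor, inode) from one /proc/PID/maps line.

    Assumed layout: address perms offset dev inode [pathname],
    with dev printed as hexadecimal major:minor.
    """
    fields = line.split()
    major_hex, minor_hex = fields[3].split(":")
    return int(major_hex, 16), int(minor_hex, 16), int(fields[4])


def stat_dev_ino(path):
    """Return (major, minor, inode) as reported by stat(2) for path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev), st.st_ino


def maps_matches_stat(line, path):
    """True if the maps line and stat agree on both device and inode."""
    return parse_maps_dev_ino(line) == stat_dev_ino(path)
```

A tool that keys files on (st_dev, st_ino) can only correlate a mapping
with a file on disk when maps_matches_stat() holds for it.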
-Jeff
--
Jeff Mahoney
SUSE Labs
Thread overview: 12+ messages
2013-08-07 19:57 [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat Mark Fasheh
2013-08-07 20:18 ` Christoph Hellwig
2013-08-07 20:51 ` Josef Bacik
2013-08-08 12:13 ` Christoph Hellwig
2013-08-08 13:02 ` Josef Bacik
2013-08-08 13:48 ` Christoph Hellwig
2013-08-08 15:31 ` Josef Bacik
2013-08-08 15:44 ` Josef Bacik
2013-08-12 11:47 ` Christoph Hellwig
2013-09-10 15:36 ` Mark Fasheh
2013-09-10 15:56 ` Josef Bacik
2013-09-10 21:21 ` Jeff Mahoney [this message]