From: Neil Brown <neilb@suse.de>
To: Andreas Dilger <adilger@sun.com>
Cc: Theodore Tso <tytso@mit.edu>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	hch@infradead.org, viro@zeniv.linux.org.uk, corbet@lwn.net,
	linux-fsdevel@vger.kernel.org, sfrench@us.ibm.com
Subject: Re: [PATCH -V3] Generic name to handle and open by handle syscalls
Date: Sat, 24 Apr 2010 11:08:12 +1000
Message-ID: <20100424110812.40989988@notabene.brown>
In-Reply-To: <F4F339E7-3C44-4DAB-9C89-5E665D3CDE24@sun.com>

On Fri, 23 Apr 2010 18:19:59 -0600
Andreas Dilger <adilger@sun.com> wrote:

> On 2010-04-23, at 07:23, Theodore Tso wrote:
> > 
> > Something to consider is whether anything bad happens if there are multiple filesystems mounted with the same UUID. I can think of two ways this could happen. One is when we make a read-only LVM snapshot of a filesystem, and then mount it to back up a stable snapshot. This might happen if the sysadmin is trying to back up a SQL database, for example; the database gets frozen, we take a snapshot, and then we unfreeze the database and mount the snapshot. Now suppose we try to open-by-handle the mysql database --- should the system return a file from the r/o frozen snapshot, or from the r/w file system?
> 
> I'd say from the r/w LV in virtually all cases.  We are safe from totally egregious errors, because the inode+generation will prevent totally incorrect files from being returned, but newer/older versions of the same file/directory may be found.
> 
> > Something we might do is to add a check and refuse mounting file systems with duplicate UUIDs, and change the LVM snapshot code to run some kind of hook after a snapshot which runs a "tune2fs -U random" on the snapshot. For r/o LVM snapshots, we could also put in a hack that if there are two file systems mounted, one r/o and one r/w, we return the r/w file system.
> 
> I think this may break things if we change the UUID when a snapshot is created, because we don't know what userspace might be using the UUID for.  That said, I totally agree that returning the r/w LV makes sense.  The LVM code itself understands which LV is the primary and which is the snapshot, so it likely means that the "lookup the UUID" code might need to be smarter.
> 
> Probably the simplest thing is that when a filesystem is mounted and a second filesystem with the same UUID is mounted later, the second one is skipped.  If we keep the UUID list in FIFO order, that should be sufficient to ensure that the "primary" version is returned first.
> 

I really think this sounds too much like 'policy'.  It is not a trivially
obvious algorithm for selecting the 'right' filesystem.  It depends on the
order things have happened, which might be right for the case that you are
thinking of, but might be wrong for some other case.

I haven't been following the conversation closely so I might have missed
something, but why don't we leave the mapping from handle->filesystem up to
userspace and just do the "filesystem+handle -> file" part in the kernel?
(i.e. just what nfsd does).

From the kernel's perspective, the only unique identifier for a file system
is a (sometimes fictitious or arbitrary) device number.  Using anything else
(except maybe a mount point) in a kernel interface just seems wrong.

Maybe map the filesystem part of the handle from UUID (or whatever) to devno
in userspace, then pass the devno+file-part-of-handle to the kernel to
perform the final mapping.
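
As a rough illustration of that split, here is a minimal userspace sketch
in Python. Everything in it is hypothetical: the Mount record, the
prefer_read_write policy and resolve_devno are names made up for this
example, and the mount table is assumed to have been gathered by some other
means. The point is only that the choice between duplicate UUIDs is made by
a replaceable userspace policy, and what would reach the kernel is a plain
device number plus the file part of the handle.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Mount:
    """Hypothetical userspace record describing one mounted filesystem."""
    uuid: str
    devno: int        # device number, e.g. as built by os.makedev(major, minor)
    read_only: bool


def prefer_read_write(candidates: List[Mount]) -> Optional[Mount]:
    """One possible policy: first r/w mount, else first candidate, else None."""
    for mount in candidates:
        if not mount.read_only:
            return mount
    return candidates[0] if candidates else None


def resolve_devno(
    mounts: List[Mount],
    uuid: str,
    policy: Callable[[List[Mount]], Optional[Mount]] = prefer_read_write,
) -> Optional[int]:
    """Map the filesystem part of a handle (a UUID here) to a device number.

    The policy decides which mount wins when several share the UUID; the
    caller would then hand devno + file-part-of-handle to the kernel.
    """
    candidates = [m for m in mounts if m.uuid == uuid]
    chosen = policy(candidates)
    return None if chosen is None else chosen.devno
```

A backup tool that wants the snapshot rather than the live filesystem would
simply pass a different policy; nothing about that choice needs to live in
the kernel.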

NeilBrown
