linux-fsdevel.vger.kernel.org archive mirror
From: "Theodore Y. Ts'o" <tytso@mit.edu>
To: Daegyu Han <hdg9400@gmail.com>
Cc: Eric Sandeen <sandeen@sandeen.net>, linux-fsdevel@vger.kernel.org
Subject: Re: Sharing ext4 on target storage to multiple initiators using NVMeoF
Date: Tue, 17 Sep 2019 08:54:23 -0400
Message-ID: <20190917125423.GE6762@mit.edu>
In-Reply-To: <CAARcW+pLLABT9sq5LHykKmrcNjct8h64_6ePKeVGsOzeLgG8Tg@mail.gmail.com>

On Tue, Sep 17, 2019 at 09:44:00AM +0900, Daegyu Han wrote:
> It started with my curiosity.
> I know this is not the right way to use a local filesystem and someone
> would feel weird.
> I just wanted to organize the situation and experiment like that.
> 
> I thought it would work if I flushed Node B's cached file system
> metadata with the drop cache, but I didn't.
> 
> I've googled for something other than the mount and unmount process,
> and I saw a StackOverflow article telling file systems to sync via
> blockdev --flushbufs.
> 
> So I do the blockdev --flushbufs after the drop cache.
> However, I still do not know why I can read the data stored in the
> shared storage via Node B.

There are many problems, but the primary one is that Node B has
caches.  If it has a cached version of the inode table block, why
should it reread it after Node A has modified it?  The VFS also
has negative dentry caches.  This is very important for search path
performance.  Consider for example the compiler which may need to look
in many directories for a particular header file.  If the C program has:

#include "amazing.h"

The C compiler may need to look in a dozen or more directories trying
to find the header file amazing.h.  And each successive C compiler
process will need to keep looking in all of those same directories.
So the kernel will keep a "negative cache", so if
/usr/include/amazing.h doesn't exist, it won't ask the file system
when the 2nd, 3rd, 4th, 5th, ... compiler process tries to open
/usr/include/amazing.h.
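
To make the staleness problem concrete, here is a toy model in Python.
It is only a sketch, not how the kernel's dentry cache is implemented;
ToyDisk and ToyNode are made-up names, and the "disk" is just a set of
path names shared by two nodes that each keep a private lookup cache.

```python
class ToyDisk:
    """Stands in for the shared block device: just a set of existing paths."""

    def __init__(self):
        self.paths = set()
        self.lookups = 0  # how many times a node actually asked the disk

    def exists(self, path):
        self.lookups += 1
        return path in self.paths


class ToyNode:
    """One host with a private lookup cache, including negative entries."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}  # path -> True (exists) or False (negative entry)

    def lookup(self, path):
        if path in self.cache:
            return self.cache[path]  # answered without touching the disk
        found = self.disk.exists(path)
        self.cache[path] = found
        return found

    def create(self, path):
        self.disk.paths.add(path)
        self.cache[path] = True  # only the creating node's cache is updated


disk = ToyDisk()
node_a, node_b = ToyNode(disk), ToyNode(disk)

first = node_b.lookup("/usr/include/amazing.h")   # miss: asks the disk
second = node_b.lookup("/usr/include/amazing.h")  # served from the negative entry
lookups_after_two = disk.lookups

node_a.create("/usr/include/amazing.h")           # Node A adds the file
stale = node_b.lookup("/usr/include/amazing.h")   # Node B still says "not there"
```

Nothing in the model ever tells Node B to drop its negative entry, so it
keeps answering from its cache after Node A has created the file.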

You can disable all of the caches, but that makes the file system
terribly, terribly slow.  Network file systems have schemes that let
them cache safely, because the network file system protocol has a way
for a client to be told that its cached information must be reread.
Local disk file systems don't have anything like this.
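
As a rough illustration of that kind of scheme, here is a toy
invalidation model in Python.  It does not follow any real protocol;
ToyServer, ToyClient and the "inode-12" key are invented for the
example.  The server remembers which clients hold a cached copy of a
key and tells them to drop it when someone else writes.

```python
class ToyServer:
    """Tracks which clients cache each key and invalidates them on write."""

    def __init__(self):
        self.data = {}
        self.holders = {}  # key -> set of clients holding a cached copy

    def read(self, client, key):
        self.holders.setdefault(key, set()).add(client)
        return self.data.get(key)

    def write(self, writer, key, value):
        self.data[key] = value
        # Tell every other caching client that its copy is now stale.
        for client in self.holders.get(key, set()) - {writer}:
            client.invalidate(key)
        self.holders[key] = {writer}


class ToyClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.server_reads = 0  # reads that had to go to the server

    def read(self, key):
        if key not in self.cache:
            self.server_reads += 1
            self.cache[key] = self.server.read(self, key)
        return self.cache[key]

    def write(self, key, value):
        self.server.write(self, key, value)
        self.cache[key] = value

    def invalidate(self, key):
        self.cache.pop(key, None)


server = ToyServer()
a, b = ToyClient(server), ToyClient(server)

a.write("inode-12", "v1")
before = b.read("inode-12")   # fetched from the server, then cached
b.read("inode-12")            # served from b's cache
a.write("inode-12", "v2")     # server invalidates b's cached copy
after = b.read("inode-12")    # b has to reread from the server
```

The caching is safe only because every write goes through something
that knows who is caching what.  Two nodes writing directly to a shared
block device have no such party in the middle.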

There are shared-disk file systems that are designed for
multi-initiator setups.  Examples of this include gfs and ocfs2 in
Linux.  You will find that they often trade performance for
scalability to support multiple initiators.

You can use ext4 for fallback schemes, where the primary server has
exclusive access to the disk, and when the primary dies, the fallback
server can take over.  The ext4 multi-mount protection scheme is
designed for those sorts of use cases, and it's used by Lustre
servers.  But only one system is actively reading or writing to the
disk at a time, and the fallback server has to replay the journal, and
ensure that the primary server won't "come back to life".  Those are
sometimes called STONITH schemes ("shoot the other node in the head"),
and might involve network controlled power strips, etc.
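
A minimal sketch of the takeover sequence, in Python, under the
assumption that the fallback server fences the primary before it
touches the disk.  All of the names here (ToySharedDisk, take_over,
fence_primary) are hypothetical, and the fencing callback stands in for
whatever mechanism, such as a network controlled power strip, actually
confirms that the primary is off.

```python
class ToySharedDisk:
    """Toy stand-in for the shared device: an owner and a journal flag."""

    def __init__(self, owner, journal_dirty):
        self.owner = owner
        self.journal_dirty = journal_dirty


def take_over(disk, standby, fence_primary):
    """Let `standby` take the disk only after the primary is confirmed fenced.

    `fence_primary` is a hypothetical callable that returns True once the
    primary is known to be powered off.
    """
    if not fence_primary():
        raise RuntimeError("refusing takeover: primary is not fenced")
    steps = ["fence primary"]
    if disk.journal_dirty:
        disk.journal_dirty = False  # stands in for journal replay
        steps.append("replay journal")
    disk.owner = standby
    steps.append("mount")
    return steps


disk = ToySharedDisk(owner="primary", journal_dirty=True)
steps = take_over(disk, "fallback", fence_primary=lambda: True)

unfenced = ToySharedDisk(owner="primary", journal_dirty=True)
try:
    take_over(unfenced, "fallback", fence_primary=lambda: False)
    refused = False
except RuntimeError:
    refused = True
```

If the fence cannot be confirmed, the fallback server never touches the
disk, so at most one system is ever writing to it.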

Regards,

						- Ted

