Re: [RFC PATCH 1/6] Implement reexport helper library

All of lore.kernel.org
 help / color / mirror / Atom feed

From: bfields <bfields@fieldses.org>
To: Richard Weinberger <richard@nod.at>
Cc: linux-nfs <linux-nfs@vger.kernel.org>,
	david <david@sigma-star.at>,
	luis turcitu <luis.turcitu@appsbroker.com>,
	david young <david.young@appsbroker.com>,
	david oberhollenzer <david.oberhollenzer@sigma-star.at>,
	trond myklebust <trond.myklebust@hammerspace.com>,
	anna schumaker <anna.schumaker@netapp.com>,
	chris chilvers <chris.chilvers@appsbroker.com>
Subject: Re: [RFC PATCH 1/6] Implement reexport helper library
Date: Wed, 9 Mar 2022 10:28:46 -0500	[thread overview]
Message-ID: <20220309152846.GC6633@fieldses.org> (raw)
In-Reply-To: <2098326047.128064.1646838131178.JavaMail.zimbra@nod.at>

On Wed, Mar 09, 2022 at 04:02:11PM +0100, Richard Weinberger wrote:
> Bruce,
> 
> ----- Ursprüngliche Mail -----
> > Von: "bfields" <bfields@fieldses.org>
> >> Concurrent access to the database is synchronized using a shared rwlock (on
> >> shared memory).
> >> reexpdb_init.lock is used to make sure that creating and initializing the shared
> >> memory/lock
> >> happens once.
> > 
> > Could you point me to sqlite documentation that explains why the user
> > would need to do their own locking?
> 
> https://www.sqlite.org/rescode.html#busy
>  
> > I assumed sqlite would do any necessary locking for you.  It seems like
> > a core function for a database.
> 
> Well, SQLite does locking but no queuing.
> So, as soon somebody is writing the data base it is locked and all other
> read/writes will fail either with SQLITE_BUSY or SQLITE_LOCKED.
> It is up to the user how to react on that.
>  
> That's why I chose to use a shared rwlock where a task can *wait* upon
> conflicting access.
> 
> Maybe there is a better way do it, dunno.

Oh, got it, thanks for the explanation.

Assuming writes are rare, maybe a dumb retry loop would be adequate.
Sounds like that's what we'd need anyway if we were to share the
database between cooperating re-export servers.  (Would we have a
performance problem in that case, if several reexport servers start at
once and all start trying to populate the shared database?  I don't
know.)

Anyway, it's a judgement call, fair enough.  Might be worth a brief
comment, at least.

> >> > What are the two tables used for?  Naively I'd've thought the
> >> > "subvolumes" table was redundant.
> >> 
> >> fsidnums is used to store generated and predefined fsid numbers.
> >> It is only used in reexport modes auto-fsidnum and predefined-fsidnum.
> >> 
> >> subvolumes contains a list of subvolumes which a are likely in use by
> >> a client. Up start all these paths will get touched such that they can
> >> be exported.
> > 
> > The fsidnums also contains that same list of paths, right?  So I don't
> > understand why we need both.
> 
> In the current design generated fsidnums will stay forever while the paths
> in subvolumes can get cleaned.
>  
> > Also, if we're depending on touching all the paths on startup, something
> > is wrong.
> 
> I think we talked about that already and agreed that it should work without
> touching. So far I didn't had a chance to investigate into this.

OK.  Do you think you could look into that, and strip this down to the
one auto-fsidnum case, and then continue the discussion?  I think that'd
clarify things.

As I say, I wouldn't necessarily be opposed to later adding a reexport=
option back in, but for now I'd first like to see if we can find the
simplest patches that will solve the problem in one good-enough way.

> > What we want to do is touch the path when we get an upcall for the given
> > fsid.  That way we don't have to assume, for example, that the system
> > will never expire mounts that haven't been used recently.
> > 
> >> >> +/*
> >> >> + * This query is a little tricky. We use SQL to find and claim the smallest
> >> >> free fsid number.
> >> > 
> >> > Yes, that is a little tricky.  Is it necessary?  My SQL Is rusty, but
> >> > the database should be able to pick a unique value for us, shouldn't it?
> >> 
> >> SQLite can generate a unique value, but we cannot select the range.
> >> It will give a value between 0 and 2^64.
> >> We need an id between 1 and 2^32.
> > 
> > Surely that CHECK constraint doesn't somehow cause sqlite to generate
> > non-unique primary keys?  At worst I'd think it would cause INSERTs to
> > fail if the ordinary primary-key-choosing algorithm chooses something
> > over 2^32.
> 
> The CHECK is just a paranoid check. My SQL INSERT generates ids starting with 1.
> Sure, if you run it 2^32 times, it will fail due to the CHECK.

OK.

--b.

next prev parent reply	other threads:[~2022-03-09 15:28 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-17 13:15 [RFC PATCH 0/6] nfs-utils: Improving NFS re-exports Richard Weinberger
2022-02-17 13:15 ` [RFC PATCH 1/6] Implement reexport helper library Richard Weinberger
2022-03-08 21:44   ` J. Bruce Fields
2022-03-09  9:43     ` Richard Weinberger
2022-03-09 14:19       ` bfields
2022-03-09 15:02         ` Richard Weinberger
2022-03-09 15:28           ` bfields [this message]
2022-02-17 13:15 ` [RFC PATCH 2/6] exports: Implement new export option reexport= Richard Weinberger
2022-03-08 22:10   ` J. Bruce Fields
2022-03-09  9:43     ` Richard Weinberger
2022-02-17 13:15 ` [RFC PATCH 3/6] export: Implement logic behind reexport= Richard Weinberger
2022-02-17 13:15 ` [RFC PATCH 4/6] export: Record mounted volumes Richard Weinberger
2022-02-17 13:15 ` [RFC PATCH 5/6] nfsd: statfs() every known subvolume upon start Richard Weinberger
2022-02-17 13:15 ` [RFC PATCH 6/6] export: Garbage collect orphaned subvolumes " Richard Weinberger
2022-02-17 16:33 ` [RFC PATCH 0/6] nfs-utils: Improving NFS re-exports J. Bruce Fields
2022-02-17 17:27   ` Richard Weinberger
2022-02-17 19:27     ` bfields
2022-02-17 20:15       ` Richard Weinberger
2022-02-17 20:18         ` bfields
2022-02-17 20:29           ` Richard Weinberger
2022-03-07  9:25   ` Richard Weinberger
2022-03-07 22:29     ` bfields
2022-04-19 20:20       ` Steve Dickson
2022-04-19 20:31         ` Richard Weinberger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220309152846.GC6633@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=anna.schumaker@netapp.com \
    --cc=chris.chilvers@appsbroker.com \
    --cc=david.oberhollenzer@sigma-star.at \
    --cc=david.young@appsbroker.com \
    --cc=david@sigma-star.at \
    --cc=linux-nfs@vger.kernel.org \
    --cc=luis.turcitu@appsbroker.com \
    --cc=richard@nod.at \
    --cc=trond.myklebust@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.