From: Mateusz Guzik <mguzik@redhat.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Yann Droneaud <ydroneaud@opteya.com>,
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install
Date: Mon, 20 Apr 2015 17:10:55 +0200
Message-ID: <20150420151054.GD2513@mguzik>
In-Reply-To: <20150420134326.GC2513@mguzik>
On Mon, Apr 20, 2015 at 03:43:26PM +0200, Mateusz Guzik wrote:
> On Mon, Apr 20, 2015 at 03:06:33PM +0200, Mateusz Guzik wrote:
> > On Sat, Apr 18, 2015 at 12:02:52AM +0100, Al Viro wrote:
> > > On Sat, Apr 18, 2015 at 12:16:48AM +0200, Mateusz Guzik wrote:
> > >
> > > > I would say this makes the use of seq counter impossible. Even if we
> > > > decided to fall back to a lock on retry, we cannot know what to do if
> > > > the slot is reserved - it very well could be that something called
> > > > close, and something else reserved the slot, so putting the file inside
> > > > could be really bad. In fact we would be putting a file for which we
> > > > don't have a reference anymore.
> > > >
> > > > However, not all hope is lost and I still think we can speed things up.
> > > >
> > > > A locking primitive which only locks stuff for current cpu and has
> > > > another mode where it locks stuff for all cpus would do the trick just
> > > > fine. I'm not a linux guy, quick search suggests 'lglock' would do what
> > > > I want.
> > > >
> > > > table reallocation is an extremely rare operation, so this should be
> > > > fine. It would take the lock 'globally' for given table.
> > >
> > > It would also mean percpu_alloc() for each descriptor table...
> >
> > Well as it was noted I have not checked how it's implemented at the time
> > of writing the message. I agree embedding something like this into files
> > struct is a non-starter.
> >
> > I would say this could work with a small set of locks, selected by hashing
> > struct files pointer.
> >
> > Table resizing is supposed to be extremely rare - most processes should
> > not need it at all (if they do, the default size is too small and should
> > be adjusted). Not only that, the lock is only needed if the process in
> > question is multithreaded.
> >
> > So I would say this would not contend in real-world workloads, but still
> > looks crappy.
> >
> > Unfortunately the whole thing loses original appeal of a simple hack
> > with no potential performance drawbacks. Maybe I'll hack it up later and
> > run some tests anyway.
> >
>
> I just came up with another stupid hack, but this time it could really
> work just fine.
>
> Note that the entire issue stems from the fact that the table can be
> resized at any moment. If only we had a guarantee the table "stands
> still", we would not even need that sequence counter. fd_install could
> just plop the file in.
>
> So a stupid hack which comes to mind tells the kernel to make sure the
> table is big enough and then never resize it ever again (inherited on
> fork, cleared on exec):
> prctl(FDTABLE_SIZE_FIXED, BIGNUM);
>
> or
>
> dup2(0, BIGNUM); /* sizes the table appropriately */
> close(BIGNUM);
> prctl(FDTABLE_SIZE_FIXED);
>
> Thoughts?
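
For reference, the dup2/close half of the quoted idea can already be tried from userspace. A minimal sketch follows. FDTABLE_SIZE_FIXED does not exist, so the prctl step is left out, and presize_fd_table() is a made-up helper name. It opens /dev/null rather than relying on fd 0 being valid, and it refuses to touch a target descriptor that is already in use.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make the kernel size the fd table so that it
 * covers 'target', then release the descriptor again.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int presize_fd_table(int target)
{
	int src, ret;

	/* Do not clobber a descriptor somebody else is using. */
	if (fcntl(target, F_GETFD) != -1) {
		errno = EBUSY;
		return -1;
	}

	src = open("/dev/null", O_RDONLY);
	if (src < 0)
		return -1;

	/* open() already landed on the target slot: nothing to dup. */
	if (src == target)
		return close(src);

	ret = dup2(src, target);	/* forces the table to cover 'target' */
	close(src);
	if (ret < 0)
		return -1;
	assert(ret == target);

	return close(target);		/* slot is free again */
}
```

The target has to stay below RLIMIT_NOFILE, otherwise dup2 fails with EBADF. This only sizes the table. Keeping it from being resized later is the part that would need the new prctl.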
Sorry for the spam, but I came up with another hack. :)

The idea is that we can have a variable which signifies that a given
thread is playing with the fd table in fd_install (kind of a lock
embedded into task_struct). We would also have a flag in the files struct
indicating that a thread would like to resize the table.

expand_fdtable would set the flag and iterate over all threads, waiting
for each of them to have the var set to 0.

fd_install would set the var, test the flag and, if the flag is set, just
unset the var and take the spin lock associated with the table.

This way the common case (nobody resizes the table) is lockless.
Resizing can get expensive, but that should be totally fine.
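
To make the scheme above concrete, here is a rough userspace model of it, not kernel code. Everything in it is made up for illustration: struct task_model stands in for the new task_struct field, struct files_model for the files struct and its table, a pthread mutex for the spin lock, and a fixed array for "all threads". It uses sequentially consistent C11 atomics so that either the installer sees the flag or the resizer sees the var. It also assumes the slot was reserved beforehand and that threads are registered before any concurrent use.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_THREADS 8

/* Hypothetical stand-in for the per-thread var in task_struct. */
struct task_model {
	atomic_int in_fd_install;
};

/* Hypothetical stand-in for the files struct plus its fd table. */
struct files_model {
	atomic_bool resize_wanted;	/* "somebody wants to resize" flag */
	pthread_mutex_t file_lock;	/* models the table's spin lock */
	void **fd;
	size_t max_fds;
	struct task_model *threads[MODEL_MAX_THREADS];
	size_t nthreads;
};

/* Returns 0 on success, nonzero on failure. */
static int files_model_init(struct files_model *f, size_t max_fds)
{
	f->fd = calloc(max_fds, sizeof(*f->fd));
	if (!f->fd)
		return -1;
	f->max_fds = max_fds;
	f->nthreads = 0;
	atomic_init(&f->resize_wanted, false);
	return pthread_mutex_init(&f->file_lock, NULL);
}

/* Must be called before the thread starts using the table. */
static void files_model_add_thread(struct files_model *f, struct task_model *t)
{
	assert(f->nthreads < MODEL_MAX_THREADS);
	atomic_init(&t->in_fd_install, 0);
	f->threads[f->nthreads++] = t;
}

static void fd_install_model(struct files_model *f, struct task_model *self,
			     size_t fd, void *file)
{
	atomic_store(&self->in_fd_install, 1);
	if (atomic_load(&f->resize_wanted)) {
		/* Slow path: back off and wait for the resizer via the lock. */
		atomic_store(&self->in_fd_install, 0);
		pthread_mutex_lock(&f->file_lock);
		assert(fd < f->max_fds);
		f->fd[fd] = file;
		pthread_mutex_unlock(&f->file_lock);
		return;
	}
	/* Fast path: the table cannot move while our var is set. */
	assert(fd < f->max_fds);
	f->fd[fd] = file;
	atomic_store(&self->in_fd_install, 0);
}

/* Returns 0 on success, -1 if the new table cannot be allocated. */
static int expand_fdtable_model(struct files_model *f, size_t new_max)
{
	void **old, **new_fd = calloc(new_max, sizeof(*new_fd));
	size_t i;

	if (!new_fd)
		return -1;
	pthread_mutex_lock(&f->file_lock);
	assert(new_max >= f->max_fds);
	atomic_store(&f->resize_wanted, true);
	/* Wait for every thread to leave its lockless section. */
	for (i = 0; i < f->nthreads; i++)
		while (atomic_load(&f->threads[i]->in_fd_install))
			sched_yield();
	memcpy(new_fd, f->fd, f->max_fds * sizeof(*new_fd));
	old = f->fd;
	f->fd = new_fd;
	f->max_fds = new_max;
	atomic_store(&f->resize_wanted, false);
	pthread_mutex_unlock(&f->file_lock);
	free(old);
	return 0;
}
```

The model holds the lock across the whole resize and frees the old table right away, which is simpler than what the real code would have to do with RCU readers around.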

As a hack in a hack we could abuse rcu's counter to serve as the "lock".

Thoughts?
--
Mateusz Guzik