linux-fsdevel.vger.kernel.org archive mirror
* Re: [lkp-robot] [fs/locks]  9d21d181d0: will-it-scale.per_process_ops -14.1% regression
       [not found] <20170601020556.GE16905@yexl-desktop>
@ 2017-06-01 11:41 ` Jeff Layton
  2017-06-01 11:49   ` Benjamin Coddington
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2017-06-01 11:41 UTC (permalink / raw)
  To: kernel test robot, Benjamin Coddington
  Cc: Alexander Viro, bfields, linux-fsdevel, linux-nfs, lkp,
	Christoph Hellwig

On Thu, 2017-06-01 at 10:05 +0800, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed a -14.1% regression of will-it-scale.per_process_ops due to commit:
> 
> 
> commit: 9d21d181d06acab9a8e80eac2ec4eed77b656793 ("fs/locks: Set fl_nspid at file_lock allocation")
> url: https://github.com/0day-ci/linux/commits/Benjamin-Coddington/fs-locks-Alloc-file_lock-where-practical/20170527-050700
> 
> 

Ouch, that's a rather nasty performance hit. In hindsight, maybe we
shouldn't move those off the stack after all? Heck, if it's that
significant, maybe we should move the F_SETLK callers to allocate these
on the stack as well?
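
Something like this in fcntl_setlk(), say (just a sketch of the idea;
error handling and the 64-bit/compat variants are left out):

    int error;
    struct file_lock fl;

    /* request lock lives on the stack; the fast path never touches
     * the file_lock slab */
    locks_init_lock(&fl);
    error = flock_to_posix_lock(filp, &fl, flock);
    if (error)
        return error;

    error = do_lock_file_wait(filp, cmd, &fl);

    /* release anything fl_ops/fl_lmops pinned on the request copy */
    locks_release_private(&fl);
    return error;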

> in testcase: will-it-scale
> on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
> with following parameters:
> 
> 	test: lock1
> 	cpufreq_governor: performance
> 
> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
> 
> In addition to that, the commit also has significant impact on the following tests:
> 
> +------------------+----------------------------------------------------------------+
> > testcase: change | will-it-scale: will-it-scale.per_process_ops -4.9% regression  |
> > test machine     | 16 threads Intel(R) Atom(R) CPU 3958 @ 2.00GHz with 64G memory |
> > test parameters  | cpufreq_governor=performance                                   |
> >                  | mode=process                                                   |
> >                  | nr_task=100%                                                   |
> >                  | test=lock1                                                     |
> 
> +------------------+----------------------------------------------------------------+
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> To reproduce:
> 
>         git clone https://github.com/01org/lkp-tests.git
>         cd lkp-tests
>         bin/lkp install job.yaml  # job file is attached in this email
>         bin/lkp run     job.yaml
> 
> testcase/path_params/tbox_group/run: will-it-scale/lock1-performance/lkp-ivb-d04
> 
> 09790e423b32fba4  9d21d181d06acab9a8e80eac2e  
> ----------------  --------------------------  
>       0.51              19%       0.60 ±  7%  will-it-scale.scalability
>    2462089             -14%    2114597        will-it-scale.per_process_ops
>    2195246             -26%    1631578        will-it-scale.per_thread_ops
>        350                         356        will-it-scale.time.system_time
>      28.89             -24%      22.06        will-it-scale.time.user_time
>      32.78                       31.97        turbostat.PkgWatt
>      15.58              -5%      14.80        turbostat.CorWatt
>      19284                       18803        vmstat.system.in
>      32208              -4%      31052        vmstat.system.cs
>       1630 ±173%      2e+04      18278 ± 27%  latency_stats.avg.perf_event_alloc.SYSC_perf_event_open.SyS_perf_event_open.entry_SYSCALL_64_fastpath
>       1630 ±173%      2e+04      18278 ± 27%  latency_stats.max.perf_event_alloc.SYSC_perf_event_open.SyS_perf_event_open.entry_SYSCALL_64_fastpath
>       1630 ±173%      2e+04      18278 ± 27%  latency_stats.sum.perf_event_alloc.SYSC_perf_event_open.SyS_perf_event_open.entry_SYSCALL_64_fastpath
>  1.911e+09 ±  6%       163%  5.022e+09 ±  5%  perf-stat.cache-references
>      27.58 ± 12%        17%      32.14 ±  7%  perf-stat.iTLB-load-miss-rate%
>    9881103              -4%    9527607        perf-stat.context-switches
>  9.567e+11 ±  9%       -14%  8.181e+11 ±  9%  perf-stat.dTLB-loads
>   6.85e+11 ±  4%       -16%  5.761e+11 ±  6%  perf-stat.branch-instructions
>  3.469e+12 ±  4%       -17%  2.893e+12 ±  6%  perf-stat.instructions
>       1.24 ±  4%       -19%       1.00        perf-stat.ipc
>       3.18 ±  8%       -62%       1.19 ± 19%  perf-stat.cache-miss-rate%
> 
> 
> 
>                              perf-stat.cache-references
> 
>   8e+09 ++------------------------------------------------------------------+
>         |                                                                   |
>   7e+09 ++                                             O       O            |
>         |                  O                                                |
>   6e+09 ++                                  O                               |
>         |                                                             O     |
>   5e+09 ++O                       O             O    O       O          O   O
>         O   O O  O O O O O   O O    O O   O   O    O       O        O     O |
>   4e+09 ++                              O                O       O          |
>         |                                                                   |
>   3e+09 ++                                                                  |
>         |   *.                 *..        *.                                |
>   2e+09 *+ +  *..         .*. +   *. .*. +  *.    .*.*.   .*.*. .*..*       |
>         | *      *.*.*.*.*   *      *   *     *.*.     *.*     *            |
>   1e+09 ++------------------------------------------------------------------+
> 
> 
>                           will-it-scale.time.user_time
> 
>   30 ++--*-------------------*-----------*----------------------------------+
>   29 *+*    *.*.*.*..*.*.*.*    *.*.*.*.   *.*.     *. .*.    .*.*.*        |
>      |                                         *. ..  *   *.*.              |
>   28 ++                                          *                          |
>   27 ++                                                                     |
>      |                                                                      |
>   26 ++                                                                     |
>   25 ++                                                                     |
>   24 ++                                                                     |
>      |                                                                      |
>   23 ++  O    O                   O O O  O O O O    O   O   O    O          |
>   22 O+O    O   O O      O O O  O                O    O        O     O  O O |
>      |               O                                    O        O        O
>   21 ++                O                                                    |
>   20 ++---------------------------------------------------------------------+
> 
> 
>                           will-it-scale.time.system_time
> 
>   358 ++--------------------------------------------------------------------+
>   357 O+O    O   O      O       O                 O                     O   O
>       |   O    O   O O    O O     O  O O O          O O    O O O   O O    O |
>   356 ++                      O            O O  O       O        O          |
>   355 ++                                                                    |
>       |                                                                     |
>   354 ++                                                                    |
>   353 ++                                                                    |
>   352 ++                                                                    |
>       |                                                                     |
>   351 ++                                          *. .*.    .*.             |
>   350 *+*.  .*   *     .*.*.*. .*    *.*. .*     +  *   *..*   *.*.*        |
>       |   *.  + + + .*.       *  + ..    *  +  .*                           |
>   349 ++       *   *              *          *.                             |
>   348 ++--------------------------------------------------------------------+
> 
> 
>                              will-it-scale.per_thread_ops
> 
>   2.3e+06 ++----------------------------------------------------------------+
>           |                                                                 |
>   2.2e+06 ++*.*.   .*. .*..*. .*.*.     .*.*.         .*.*..*.*.*.*.*       |
>           *     *.*   *      *     *.*.*     *.*. .*.*                      |
>   2.1e+06 ++                                     *                          |
>     2e+06 ++                                                                |
>           |                                                                 |
>   1.9e+06 ++                                                                |
>           |                                                                 |
>   1.8e+06 ++                                                                |
>   1.7e+06 ++      O                    O O   O       O        O             |
>           O O O     O              O       O       O     O        O     O   |
>   1.6e+06 ++    O     O      O O O   O         O O     O    O   O   O O   O O
>           |             O  O                                                |
>   1.5e+06 ++----------------------------------------------------------------+
> 
>   [*] bisect-good sample
>   [O] bisect-bad  sample
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 
> Thanks,
> Xiaolong

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [lkp-robot] [fs/locks]  9d21d181d0: will-it-scale.per_process_ops -14.1% regression
  2017-06-01 11:41 ` [lkp-robot] [fs/locks] 9d21d181d0: will-it-scale.per_process_ops -14.1% regression Jeff Layton
@ 2017-06-01 11:49   ` Benjamin Coddington
  2017-06-01 12:59     ` Jeff Layton
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Coddington @ 2017-06-01 11:49 UTC (permalink / raw)
  To: Jeff Layton
  Cc: kernel test robot, Alexander Viro, bfields, linux-fsdevel,
	linux-nfs, lkp, Christoph Hellwig

On 1 Jun 2017, at 7:41, Jeff Layton wrote:

> On Thu, 2017-06-01 at 10:05 +0800, kernel test robot wrote:
>> Greeting,
>>
>> FYI, we noticed a -14.1% regression of will-it-scale.per_process_ops 
>> due to commit:
>>
>>
>> commit: 9d21d181d06acab9a8e80eac2ec4eed77b656793 ("fs/locks: Set 
>> fl_nspid at file_lock allocation")
>> url: 
>> https://github.com/0day-ci/linux/commits/Benjamin-Coddington/fs-locks-Alloc-file_lock-where-practical/20170527-050700
>>
>>
>
> Ouch, that's a rather nasty performance hit. In hindsight, maybe we
> shouldn't move those off the stack after all? Heck, if it's that
> significant, maybe we should move the F_SETLK callers to allocate 
> these
> on the stack as well?

We can do that.  But I think this is picking up the locks_mandatory_area()
allocation which is now removed.  The attached config has
CONFIG_MANDATORY_FILE_LOCKING=y, so there's an allocation on every
read/write.
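
For reference, the gate on that path is in rw_verify_area(); this is
paraphrased from memory, so the details may differ:

    /* fs/read_write.c:rw_verify_area(), approximately; with
     * CONFIG_MANDATORY_FILE_LOCKING=y this check is evaluated on every
     * read and write, and locks_mandatory_area() is where the file_lock
     * now gets built */
    if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
        retval = locks_mandatory_area(inode, file, pos, pos + count - 1,
                                      read_write == READ ? F_RDLCK : F_WRLCK);
        if (retval < 0)
            return retval;
    }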

Ben

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [lkp-robot] [fs/locks]  9d21d181d0: will-it-scale.per_process_ops -14.1% regression
  2017-06-01 11:49   ` Benjamin Coddington
@ 2017-06-01 12:59     ` Jeff Layton
  2017-06-01 15:14       ` J. Bruce Fields
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2017-06-01 12:59 UTC (permalink / raw)
  To: Benjamin Coddington
  Cc: kernel test robot, Alexander Viro, bfields, linux-fsdevel,
	linux-nfs, lkp, Christoph Hellwig

On Thu, 2017-06-01 at 07:49 -0400, Benjamin Coddington wrote:
> On 1 Jun 2017, at 7:41, Jeff Layton wrote:
> 
> > On Thu, 2017-06-01 at 10:05 +0800, kernel test robot wrote:
> > > Greeting,
> > > 
> > > FYI, we noticed a -14.1% regression of will-it-scale.per_process_ops 
> > > due to commit:
> > > 
> > > 
> > > commit: 9d21d181d06acab9a8e80eac2ec4eed77b656793 ("fs/locks: Set 
> > > fl_nspid at file_lock allocation")
> > > url: 
> > > https://github.com/0day-ci/linux/commits/Benjamin-Coddington/fs-locks-Alloc-file_lock-where-practical/20170527-050700
> > > 
> > > 
> > 
> > Ouch, that's a rather nasty performance hit. In hindsight, maybe we
> > shouldn't move those off the stack after all? Heck, if it's that
> > significant, maybe we should move the F_SETLK callers to allocate 
> > these
> > on the stack as well?
> 
> We can do that.  But, I think this is picking up the 
> locks_mandatory_area()
> allocation which is now removed.  The attached config has
> CONFIG_MANDATORY_FILE_LOCKING=y,  so there's allocation on every 
> read/write.
> 

I'm not so sure. That would only be the case if the thing were marked
for mandatory locking (a really rare thing).

The test is really simple and I don't think any read/write activity is
involved:

    https://github.com/antonblanchard/will-it-scale/blob/master/tests/lock1.c
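
For reference, the hot loop there boils down to something like this (a
paraphrase of the testcase, not the exact will-it-scale source):

    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        char tmpl[] = "/tmp/willitscale.XXXXXX";
        int fd = mkstemp(tmpl);
        struct flock lck = {
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 1,
        };

        if (fd < 0)
            return 1;
        unlink(tmpl);

        /* each process hammers lock/unlock on its own file, so the only
         * kernel work being measured is the F_SETLK path itself */
        for (;;) {
            lck.l_type = F_WRLCK;
            fcntl(fd, F_SETLK, &lck);
            lck.l_type = F_UNLCK;
            fcntl(fd, F_SETLK, &lck);
        }
    }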

...and the 0 day bisected it down to this patch, IIUC:

    https://github.com/0day-ci/linux/commit/9d21d181d06acab9a8e80eac2ec4eed77b656793

It seems likely that it's the extra get_pid/put_pid in the allocation
and free codepath. I expected those to be pretty cheap, but maybe
they're not?
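
The patch itself isn't quoted above, but going by the subject line the
alloc/free paths presumably grew something along these lines (a guess at
the shape of the change, not the actual diff):

    /* on every file_lock allocation (locks_alloc_lock() and friends): */
    fl->fl_nspid = get_pid(task_tgid(current));

    /* ...and the matching drop when the lock is freed: */
    if (fl->fl_nspid) {
        put_pid(fl->fl_nspid);
        fl->fl_nspid = NULL;
    }
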
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [lkp-robot] [fs/locks]  9d21d181d0: will-it-scale.per_process_ops -14.1% regression
  2017-06-01 12:59     ` Jeff Layton
@ 2017-06-01 15:14       ` J. Bruce Fields
  2017-06-01 15:48         ` Jeff Layton
  0 siblings, 1 reply; 10+ messages in thread
From: J. Bruce Fields @ 2017-06-01 15:14 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Benjamin Coddington, kernel test robot, Alexander Viro,
	linux-fsdevel, linux-nfs, lkp, Christoph Hellwig

On Thu, Jun 01, 2017 at 08:59:21AM -0400, Jeff Layton wrote:
> On Thu, 2017-06-01 at 07:49 -0400, Benjamin Coddington wrote:
> > On 1 Jun 2017, at 7:41, Jeff Layton wrote:
> > 
> > > On Thu, 2017-06-01 at 10:05 +0800, kernel test robot wrote:
> > > > Greeting,
> > > > 
> > > > FYI, we noticed a -14.1% regression of will-it-scale.per_process_ops 
> > > > due to commit:
> > > > 
> > > > 
> > > > commit: 9d21d181d06acab9a8e80eac2ec4eed77b656793 ("fs/locks: Set 
> > > > fl_nspid at file_lock allocation")
> > > > url: 
> > > > https://github.com/0day-ci/linux/commits/Benjamin-Coddington/fs-locks-Alloc-file_lock-where-practical/20170527-050700
> > > > 
> > > > 
> > > 
> > > Ouch, that's a rather nasty performance hit. In hindsight, maybe we
> > > shouldn't move those off the stack after all? Heck, if it's that
> > > significant, maybe we should move the F_SETLK callers to allocate 
> > > these
> > > on the stack as well?
> > 
> > We can do that.  But, I think this is picking up the 
> > locks_mandatory_area()
> > allocation which is now removed.  The attached config has
> > CONFIG_MANDATORY_FILE_LOCKING=y,  so there's allocation on every 
> > read/write.
> > 
> 
> I'm not so sure. That would only be the case if the thing were marked
> for mandatory locking (a really rare thing).
> 
> The test is really simple and I don't think any read/write activity is
> involved:
> 
>     https://github.com/antonblanchard/will-it-scale/blob/master/tests/lock1.c

So it's just F_WRLCK/F_UNLCK in a loop spread across multiple cores?
I'd think real workloads do some work while holding the lock, and a 15%
regression on just the pure lock/unlock loop might not matter?  But best
to be careful, I guess.

--b.

> 
> ...and the 0 day bisected it down to this patch, IIUC:
> 
>     https://github.com/0day-ci/linux/commit/9d21d181d06acab9a8e80eac2ec4eed77b656793
> 
> It seems likely that it's the extra get_pid/put_pid in the allocation
> and free codepath. I expected those to be pretty cheap, but maybe
> they're not?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [lkp-robot] [fs/locks]  9d21d181d0: will-it-scale.per_process_ops -14.1% regression
  2017-06-01 15:14       ` J. Bruce Fields
@ 2017-06-01 15:48         ` Jeff Layton
  2017-06-05 18:34           ` Benjamin Coddington
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2017-06-01 15:48 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Benjamin Coddington, kernel test robot, Alexander Viro,
	linux-fsdevel, linux-nfs, lkp, Christoph Hellwig

On Thu, 2017-06-01 at 11:14 -0400, J. Bruce Fields wrote:
> On Thu, Jun 01, 2017 at 08:59:21AM -0400, Jeff Layton wrote:
> > On Thu, 2017-06-01 at 07:49 -0400, Benjamin Coddington wrote:
> > > On 1 Jun 2017, at 7:41, Jeff Layton wrote:
> > > 
> > > > On Thu, 2017-06-01 at 10:05 +0800, kernel test robot wrote:
> > > > > Greeting,
> > > > > 
> > > > > FYI, we noticed a -14.1% regression of will-it-scale.per_process_ops 
> > > > > due to commit:
> > > > > 
> > > > > 
> > > > > commit: 9d21d181d06acab9a8e80eac2ec4eed77b656793 ("fs/locks: Set 
> > > > > fl_nspid at file_lock allocation")
> > > > > url: 
> > > > > https://github.com/0day-ci/linux/commits/Benjamin-Coddington/fs-locks-Alloc-file_lock-where-practical/20170527-050700
> > > > > 
> > > > > 
> > > > 
> > > > Ouch, that's a rather nasty performance hit. In hindsight, maybe we
> > > > shouldn't move those off the stack after all? Heck, if it's that
> > > > significant, maybe we should move the F_SETLK callers to allocate 
> > > > these
> > > > on the stack as well?
> > > 
> > > We can do that.  But, I think this is picking up the 
> > > locks_mandatory_area()
> > > allocation which is now removed.  The attached config has
> > > CONFIG_MANDATORY_FILE_LOCKING=y,  so there's allocation on every 
> > > read/write.
> > > 
> > 
> > I'm not so sure. That would only be the case if the thing were marked
> > for mandatory locking (a really rare thing).
> > 
> > The test is really simple and I don't think any read/write activity is
> > involved:
> > 
> >     https://github.com/antonblanchard/will-it-scale/blob/master/tests/lock1.c
> 
> So it's just F_WRLCK/F_UNLCK in a loop spread across multiple cores?
> I'd think real workloads do some work while holding the lock, and a 15%
> regression on just the pure lock/unlock loop might not matter?  But best
> to be careful, I guess.
> 
> --b.
> 

Yeah, that's my take.

I was assuming that getting a pid reference would be essentially free,
but it doesn't seem to be.

So, I think we probably want to avoid taking it for a file_lock that we
use to request a lock, but do take it for a file_lock that is used to
record a lock. How best to code that up, I'm not quite sure...

> > 
> > ...and the 0 day bisected it down to this patch, IIUC:
> > 
> >     https://github.com/0day-ci/linux/commit/9d21d181d06acab9a8e80eac2ec4eed77b656793
> > 
> > It seems likely that it's the extra get_pid/put_pid in the allocation
> > and free codepath. I expected those to be pretty cheap, but maybe
> > they're not?
> 
> 

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [lkp-robot] [fs/locks]  9d21d181d0: will-it-scale.per_process_ops -14.1% regression
  2017-06-01 15:48         ` Jeff Layton
@ 2017-06-05 18:34           ` Benjamin Coddington
  2017-06-05 22:02             ` Jeff Layton
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Coddington @ 2017-06-05 18:34 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, kernel test robot, Alexander Viro, linux-fsdevel,
	linux-nfs, lkp, Christoph Hellwig

On 1 Jun 2017, at 11:48, Jeff Layton wrote:

> On Thu, 2017-06-01 at 11:14 -0400, J. Bruce Fields wrote:
>> On Thu, Jun 01, 2017 at 08:59:21AM -0400, Jeff Layton wrote:
>>> I'm not so sure. That would only be the case if the thing were 
>>> marked
>>> for mandatory locking (a really rare thing).
>>>
>>> The test is really simple and I don't think any read/write activity 
>>> is
>>> involved:
>>>
>>>     https://github.com/antonblanchard/will-it-scale/blob/master/tests/lock1.c
>>
>> So it's just F_WRLCK/F_UNLCK in a loop spread across multiple cores?
>> I'd think real workloads do some work while holding the lock, and a 
>> 15%
>> regression on just the pure lock/unlock loop might not matter?  But 
>> best
>> to be careful, I guess.
>>
>> --b.
>>
>
> Yeah, that's my take.
>
> I was assuming that getting a pid reference would be essentially free,
> but it doesn't seem to be.
>
> So, I think we probably want to avoid taking it for a file_lock that 
> we
> use to request a lock, but do take it for a file_lock that is used to
> record a lock. How best to code that up, I'm not quite sure...

Maybe it's as simple as only setting fl_nspid in locks_insert_lock_ctx(),
but that seems to just take us back to the problem of getting the pid wrong
if the lock is inserted later by a different worker than the one that
created the request.

I have a mind now to just drop fl_nspid from struct file_lock completely,
and instead just carry fl_pid, and when we do F_GETLK, we can do:

task = find_task_by_pid_ns(fl_pid, init_pid_ns)
fl_nspid = task_pid_nr_ns(task, task_active_pid_ns(current))

That moves all the work off into the F_GETLK case, which I think is not
used so much.
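
Fleshed out a little, that's something like this (untested sketch; the
helper name is made up, and the lookup has to happen under rcu_read_lock()):

    static pid_t locks_translate_pid(pid_t init_nr)
    {
        pid_t nr = 0;
        struct task_struct *task;

        rcu_read_lock();
        task = find_task_by_pid_ns(init_nr, &init_pid_ns);
        if (task)
            nr = task_pid_nr_ns(task, task_active_pid_ns(current));
        rcu_read_unlock();

        /* 0 if the owner has exited or isn't visible in this namespace */
        return nr;
    }

F_GETLK and /proc/locks would then report locks_translate_pid(fl->fl_pid)
instead of dereferencing fl_nspid.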

Ben

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [lkp-robot] [fs/locks]  9d21d181d0: will-it-scale.per_process_ops -14.1% regression
  2017-06-05 18:34           ` Benjamin Coddington
@ 2017-06-05 22:02             ` Jeff Layton
  2017-06-06 13:00               ` Benjamin Coddington
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2017-06-05 22:02 UTC (permalink / raw)
  To: Benjamin Coddington
  Cc: J. Bruce Fields, kernel test robot, Alexander Viro, linux-fsdevel,
	linux-nfs, lkp, Christoph Hellwig

On Mon, 2017-06-05 at 14:34 -0400, Benjamin Coddington wrote:
> On 1 Jun 2017, at 11:48, Jeff Layton wrote:
> 
> > On Thu, 2017-06-01 at 11:14 -0400, J. Bruce Fields wrote:
> > > On Thu, Jun 01, 2017 at 08:59:21AM -0400, Jeff Layton wrote:
> > > > I'm not so sure. That would only be the case if the thing were 
> > > > marked
> > > > for mandatory locking (a really rare thing).
> > > > 
> > > > The test is really simple and I don't think any read/write activity 
> > > > is
> > > > involved:
> > > > 
> > > >     https://github.com/antonblanchard/will-it-scale/blob/master/tests/lock1.c
> > > 
> > > So it's just F_WRLCK/F_UNLCK in a loop spread across multiple cores?
> > > I'd think real workloads do some work while holding the lock, and a 
> > > 15%
> > > regression on just the pure lock/unlock loop might not matter?  But 
> > > best
> > > to be careful, I guess.
> > > 
> > > --b.
> > > 
> > 
> > Yeah, that's my take.
> > 
> > I was assuming that getting a pid reference would be essentially free,
> > but it doesn't seem to be.
> > 
> > So, I think we probably want to avoid taking it for a file_lock that 
> > we
> > use to request a lock, but do take it for a file_lock that is used to
> > record a lock. How best to code that up, I'm not quite sure...
> 
> Maybe as simple as only setting fl_nspid in locks_insert_lock_ctx(), but
> that seems to just take us back to the problem of getting the pid wrong 
> if
> the lock is inserted later by a different worker than created the 
> request.
> 
> I have a mind now to just drop fl_nspid off the struct file_lock 
> completely,
> and instead just carry fl_pid, and when we do F_GETLK, we can do:
> 
> task = find_task_by_pid_ns(fl_pid, init_pid_ns)
> fl_nspid = task_pid_nr_ns(task, task_active_pid_ns(current))
> 
> That moves all the work off into the F_GETLK case, which I think is not 
> used
> so much.
> 

Actually I think what might work best is to:

- have locks_copy_conflock also copy the fl_nspid and take a reference
to it (as your patch #2 does)

- only set fl_nspid and take a reference there in locks_insert_lock_ctx
if it's not already set

- allow ->lock operations (like nfs) to set fl_nspid before they call
locks_lock_inode_wait to set the local lock. Might need to take a nspid
reference before dispatching an RPC so that you get the right thread
context.
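
Roughly, that would look like this (a sketch of the idea only, not a real
patch):

    /* locks_copy_conflock(): carry the pid reference across to the copy
     * (get_pid() tolerates NULL) */
    new->fl_nspid = get_pid(fl->fl_nspid);

    /* locks_insert_lock_ctx(): pin the pid only once a lock is actually
     * being recorded, and only if an ->lock implementation (e.g. NFS)
     * didn't already fill it in from the right context */
    if (!fl->fl_nspid)
        fl->fl_nspid = get_pid(task_tgid(current));
    list_add_tail(&fl->fl_list, before);
    locks_insert_global_locks(fl);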

Would that work?
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [lkp-robot] [fs/locks]  9d21d181d0: will-it-scale.per_process_ops -14.1% regression
  2017-06-05 22:02             ` Jeff Layton
@ 2017-06-06 13:00               ` Benjamin Coddington
  2017-06-06 13:15                 ` Jeff Layton
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Coddington @ 2017-06-06 13:00 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, kernel test robot, Alexander Viro, linux-fsdevel,
	linux-nfs, lkp, Christoph Hellwig



On 5 Jun 2017, at 18:02, Jeff Layton wrote:

> On Mon, 2017-06-05 at 14:34 -0400, Benjamin Coddington wrote:
>> On 1 Jun 2017, at 11:48, Jeff Layton wrote:
>>
>>> On Thu, 2017-06-01 at 11:14 -0400, J. Bruce Fields wrote:
>>>> On Thu, Jun 01, 2017 at 08:59:21AM -0400, Jeff Layton wrote:
>>>>> I'm not so sure. That would only be the case if the thing were
>>>>> marked
>>>>> for mandatory locking (a really rare thing).
>>>>>
>>>>> The test is really simple and I don't think any read/write 
>>>>> activity
>>>>> is
>>>>> involved:
>>>>>
>>>>>     https://github.com/antonblanchard/will-it-scale/blob/master/tests/lock1.c
>>>>
>>>> So it's just F_WRLCK/F_UNLCK in a loop spread across multiple 
>>>> cores?
>>>> I'd think real workloads do some work while holding the lock, and a
>>>> 15%
>>>> regression on just the pure lock/unlock loop might not matter?  But
>>>> best
>>>> to be careful, I guess.
>>>>
>>>> --b.
>>>>
>>>
>>> Yeah, that's my take.
>>>
>>> I was assuming that getting a pid reference would be essentially 
>>> free,
>>> but it doesn't seem to be.
>>>
>>> So, I think we probably want to avoid taking it for a file_lock that
>>> we
>>> use to request a lock, but do take it for a file_lock that is used 
>>> to
>>> record a lock. How best to code that up, I'm not quite sure...
>>
>> Maybe as simple as only setting fl_nspid in locks_insert_lock_ctx(), 
>> but
>> that seems to just take us back to the problem of getting the pid 
>> wrong
>> if
>> the lock is inserted later by a different worker than created the
>> request.
>>
>> I have a mind now to just drop fl_nspid off the struct file_lock
>> completely,
>> and instead just carry fl_pid, and when we do F_GETLK, we can do:
>>
>> task = find_task_by_pid_ns(fl_pid, init_pid_ns)
>> fl_nspid = task_pid_nr_ns(task, task_active_pid_ns(current))
>>
>> That moves all the work off into the F_GETLK case, which I think is 
>> not
>> used
>> so much.
>>
>
> Actually I think what might work best is to:
>
> - have locks_copy_conflock also copy the fl_nspid and take a reference
> to it (as your patch #2 does)
>
> - only set fl_nspid and take a reference there in 
> locks_insert_lock_ctx
> if it's not already set
>
> - allow ->lock operations (like nfs) to set fl_nspid before they call
> locks_lock_inode_wait to set the local lock. Might need to take a 
> nspid
> reference before dispatching an RPC so that you get the right thread
> context.

It would, but I think fl_nspid is completely unnecessary.  The reason we
have it is so that we can translate the pid number into other namespaces,
the most common case being that F_GETLK and views of /proc/locks within a
namespace represent the same pid numbers as the processes in that namespace
that are holding the locks.

It is much simpler to just keep using fl_pid as the pid number in the init
namespace, but move the translation of that pid number to lookup time
rather than creation time.

Ben

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [lkp-robot] [fs/locks]  9d21d181d0: will-it-scale.per_process_ops -14.1% regression
  2017-06-06 13:00               ` Benjamin Coddington
@ 2017-06-06 13:15                 ` Jeff Layton
  2017-06-06 13:21                   ` Benjamin Coddington
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2017-06-06 13:15 UTC (permalink / raw)
  To: Benjamin Coddington
  Cc: J. Bruce Fields, kernel test robot, Alexander Viro, linux-fsdevel,
	linux-nfs, lkp, Christoph Hellwig

On Tue, 2017-06-06 at 09:00 -0400, Benjamin Coddington wrote:
> 
> On 5 Jun 2017, at 18:02, Jeff Layton wrote:
> 
> > On Mon, 2017-06-05 at 14:34 -0400, Benjamin Coddington wrote:
> > > On 1 Jun 2017, at 11:48, Jeff Layton wrote:
> > > 
> > > > On Thu, 2017-06-01 at 11:14 -0400, J. Bruce Fields wrote:
> > > > > On Thu, Jun 01, 2017 at 08:59:21AM -0400, Jeff Layton wrote:
> > > > > > I'm not so sure. That would only be the case if the thing were
> > > > > > marked
> > > > > > for mandatory locking (a really rare thing).
> > > > > > 
> > > > > > The test is really simple and I don't think any read/write 
> > > > > > activity
> > > > > > is
> > > > > > involved:
> > > > > > 
> > > > > >     https://github.com/antonblanchard/will-it-scale/blob/master/tests/lock1.c
> > > > > 
> > > > > So it's just F_WRLCK/F_UNLCK in a loop spread across multiple 
> > > > > cores?
> > > > > I'd think real workloads do some work while holding the lock, and a
> > > > > 15%
> > > > > regression on just the pure lock/unlock loop might not matter?  But
> > > > > best
> > > > > to be careful, I guess.
> > > > > 
> > > > > --b.
> > > > > 
> > > > 
> > > > Yeah, that's my take.
> > > > 
> > > > I was assuming that getting a pid reference would be essentially 
> > > > free,
> > > > but it doesn't seem to be.
> > > > 
> > > > So, I think we probably want to avoid taking it for a file_lock that
> > > > we
> > > > use to request a lock, but do take it for a file_lock that is used 
> > > > to
> > > > record a lock. How best to code that up, I'm not quite sure...
> > > 
> > > Maybe as simple as only setting fl_nspid in locks_insert_lock_ctx(), 
> > > but
> > > that seems to just take us back to the problem of getting the pid 
> > > wrong
> > > if
> > > the lock is inserted later by a different worker than created the
> > > request.
> > > 
> > > I have a mind now to just drop fl_nspid off the struct file_lock
> > > completely,
> > > and instead just carry fl_pid, and when we do F_GETLK, we can do:
> > > 
> > > task = find_task_by_pid_ns(fl_pid, init_pid_ns)
> > > fl_nspid = task_pid_nr_ns(task, task_active_pid_ns(current))
> > > 
> > > That moves all the work off into the F_GETLK case, which I think is 
> > > not
> > > used
> > > so much.
> > > 
> > 
> > Actually I think what might work best is to:
> > 
> > - have locks_copy_conflock also copy the fl_nspid and take a reference
> > to it (as your patch #2 does)
> > 
> > - only set fl_nspid and take a reference there in 
> > locks_insert_lock_ctx
> > if it's not already set
> > 
> > - allow ->lock operations (like nfs) to set fl_nspid before they call
> > locks_lock_inode_wait to set the local lock. Might need to take a 
> > nspid
> > reference before dispatching an RPC so that you get the right thread
> > context.
> 
> It would, but I think fl_nspid is completely unnecessary.  The reason we
> have it is so that we can translate the pid number into other namespaces,
> the
> most common case being that F_GETLK and views of /proc/locks within a
> namespace represent the same pid numbers as the processes in that 
> namespace
> that are holding the locks.
> 
> It is much simpler to just keep using fl_pid as the pid number in the 
> init
> namespace, but move the translation of that pid number to lookup time,
> rather than creation time.
> 

I think that would also work and I like the idea of getting rid of a
field in file_lock.

So, to be clear:

fl_pid would then store the pid of the process in the init_pid_ns, and
you'd just translate it as appropriate to the requestor's namespace?

If we want to go that route, then you'll probably still need a flag of
some sort to indicate that the fl_pid is to be expressed "as is", for
remote filesystems.

OTOH, if the lock is held remotely, I wonder if we'd be better off
simply reporting the pid as '-1', like we do with OFD locks. Hardly
anything pays attention to l_pid anyway and it's more or less
meaningless once the filesystem extends beyond the machine you're on.

That said, I'd be inclined to do that in a separate set so we could
revert it if it caused problems somewhere.
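
The reporting side might then end up looking something like this (purely
illustrative; FL_REMOTE_PID is a made-up name for whatever flag ends up
marking "fl_pid is not a local init-ns pid", and locks_translate_pid() is
the sort of lookup-time helper sketched earlier in the thread):

    if (fl->fl_flags & FL_REMOTE_PID)
        flock->l_pid = -1;      /* or copy fl->fl_pid through as-is */
    else
        flock->l_pid = locks_translate_pid(fl->fl_pid);
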
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [lkp-robot] [fs/locks]  9d21d181d0: will-it-scale.per_process_ops -14.1% regression
  2017-06-06 13:15                 ` Jeff Layton
@ 2017-06-06 13:21                   ` Benjamin Coddington
  0 siblings, 0 replies; 10+ messages in thread
From: Benjamin Coddington @ 2017-06-06 13:21 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, kernel test robot, Alexander Viro, linux-fsdevel,
	linux-nfs, lkp, Christoph Hellwig

On 6 Jun 2017, at 9:15, Jeff Layton wrote:

> On Tue, 2017-06-06 at 09:00 -0400, Benjamin Coddington wrote:
>> It would, but I think fl_nspid is completely unnecessary.  The reason we
>> have it is so that we can translate the pid number into other namespaces,
>> the
>> most common case being that F_GETLK and views of /proc/locks within a
>> namespace represent the same pid numbers as the processes in that
>> namespace
>> that are holding the locks.
>>
>> It is much simpler to just keep using fl_pid as the pid number in the
>> init
>> namespace, but move the translation of that pid number to lookup time,
>> rather than creation time.
>>
>
> I think that would also work and I like the idea of getting rid of a
> field in file_lock.
>
> So, to be clear:
>
> fl_pid would then store the pid of the process in the init_pid_ns, and
> you'd just translate it as appropriate to the requestor's namespace?

Right!

> If we want to go that route, then you'll probably still need a flag of
> some sort to indicate that the fl_pid is to be expressed "as is", for
> remote filesystems.

Yes, the hack I have now still uses that flag.

> OTOH, if the lock is held remotely, I wonder if we'd be better off
> simply reporting the pid as '-1', like we do with OFD locks. Hardly
> anything pays attention to l_pid anyway and it's more or less
> meaningless once the filesystem extends beyond the machine you're on.
>
> That said, I'd be inclined to do that in a separate set so we could
> revert it if it caused problems somewhere.

I think keeping that change separate is a good idea.  In fact, it might be
best to just stay away from it unless there's a compelling reason for it.
If l_sysid ever happens, then there's no need to change the behavior of
l_pid back again.

I'll send some patches in a bit when I am happy with them.

Ben

^ permalink raw reply	[flat|nested] 10+ messages in thread


Thread overview: 10+ messages
     [not found] <20170601020556.GE16905@yexl-desktop>
2017-06-01 11:41 ` [lkp-robot] [fs/locks] 9d21d181d0: will-it-scale.per_process_ops -14.1% regression Jeff Layton
2017-06-01 11:49   ` Benjamin Coddington
2017-06-01 12:59     ` Jeff Layton
2017-06-01 15:14       ` J. Bruce Fields
2017-06-01 15:48         ` Jeff Layton
2017-06-05 18:34           ` Benjamin Coddington
2017-06-05 22:02             ` Jeff Layton
2017-06-06 13:00               ` Benjamin Coddington
2017-06-06 13:15                 ` Jeff Layton
2017-06-06 13:21                   ` Benjamin Coddington
