* 2.6.9-rc1: page_referenced_one() CPU consumption
@ 2004-09-10 10:51 Nikita Danilov
From: Nikita Danilov @ 2004-09-10 10:51 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Hugh Dickins
Hello,
in 2.6.9-rc1 page_referenced_one() is among the top CPU consumers (which
wasn't the case for 2.6.8-rc2) in the host kernel when running a heavily
loaded UML. readprofile -b shows that the time is spent in
spin_lock(&mm->page_table_lock), so, I reckon, the recent "rmaplock: kill
page_map_lock" changes are probably not completely unrelated.
Without any deep investigation, one possible scenario is that multiple
threads are doing the following (as part of direct reclaim):

    refill_inactive_zone()
      page_referenced()
        page_referenced_file()       /* (1) mapping->i_mmap_lock doesn't
                                        serialize them */
          page_referenced_one()
            spin_lock(&mm->page_table_lock)  /* (2) everybody is
                                                serialized here */
(1) and (2) will be true if we have one huge address space with a lot of
VMAs, which seems to be exactly what UML does:
$ wc /proc/<UML-host-pid>/maps
4134 28931 561916
This didn't happen before, because page_referenced_one() used to
try-lock.
Nikita.
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
From: Hugh Dickins @ 2004-09-10 12:14 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Linux Kernel Mailing List

On Fri, 10 Sep 2004, Nikita Danilov wrote:
>
> in 2.6.9-rc1 page_referenced_one() is among the top CPU consumers (which
> wasn't the case for 2.6.8-rc2) in the host kernel when running a heavily
> loaded UML. readprofile -b shows that the time is spent in
> spin_lock(&mm->page_table_lock), so, I reckon, the recent "rmaplock: kill
> page_map_lock" changes are probably not completely unrelated.
>
> Without any deep investigation, one possible scenario is that multiple
> threads are doing the following (as part of direct reclaim):
>
>     refill_inactive_zone()
>       page_referenced()
>         page_referenced_file()       /* (1) mapping->i_mmap_lock doesn't
>                                         serialize them */
>           page_referenced_one()
>             spin_lock(&mm->page_table_lock)  /* (2) everybody is
>                                                 serialized here */
>
> (1) and (2) will be true if we have one huge address space with a lot of
> VMAs, which seems to be exactly what UML does:
>
>     $ wc /proc/<UML-host-pid>/maps
>     4134 28931 561916
>
> This didn't happen before, because page_referenced_one() used to
> try-lock.

I'd be very surprised if you're wrong.  I remarked on that in the
ChangeLog comment: "Though I suppose it's possible that we'll find that
vmscan makes better progress with trylocks than spinning - we're free to
choose trylocks again if so."

I'm quite content to go back to a trylock in page_referenced_one - and
in try_to_unmap_one?  But yours is the first report of an issue there,
so I'm inclined to wait for more reports (which should come flooding in
now you mention it!), and input from those with a better grasp than I
of how vmscan pans out in practice (Andrew, Nick, Con spring to mind).
Hugh
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
From: Hugh Dickins @ 2004-09-10 12:21 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Linux Kernel Mailing List

On Fri, 10 Sep 2004, Hugh Dickins wrote:
>
> I'm quite content to go back to a trylock in page_referenced_one - and
> in try_to_unmap_one?  But yours is the first report of an issue there,
> so I'm inclined to wait for more reports (which should come flooding in
> now you mention it!), and input from those with a better grasp than I
> of how vmscan pans out in practice (Andrew, Nick, Con spring to mind).

Just want to add that there'd be little point in changing that back
to a trylock if vmscan ends up cycling hopelessly around a larger
loop - though if the larger loop is more preemptible, that's a plus.

Hugh
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
From: Nick Piggin @ 2004-09-11 1:01 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Nikita Danilov, Linux Kernel Mailing List

Hugh Dickins wrote:
> On Fri, 10 Sep 2004, Hugh Dickins wrote:
>
>> I'm quite content to go back to a trylock in page_referenced_one - and
>> in try_to_unmap_one?  But yours is the first report of an issue there,
>> so I'm inclined to wait for more reports (which should come flooding in
>> now you mention it!), and input from those with a better grasp than I
>> of how vmscan pans out in practice (Andrew, Nick, Con spring to mind).
>
> Just want to add, that there'd be little point in changing that back
> to a trylock, if vmscan ends up cycling hopelessly around a larger
> loop - though if the larger loop is more preemptible, that's a plus.

Yeah - I'm not sure why a trylock would perform better.  If it is just
one big address space, and memory needs to be freed, presumably the
scanner will just choose a different page, and try the lock again.

Feel like doing a few more quick tests, Nikita? ;)
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
From: Nikita Danilov @ 2004-09-12 15:53 UTC (permalink / raw)
To: Nick Piggin; +Cc: Hugh Dickins, Linux Kernel Mailing List

Nick Piggin writes:
> Yeah - I'm not sure why a trylock would perform better.  If it is just
> one big address space, and memory needs to be freed, presumably the
> scanner will just choose a different page, and try the lock again.
>
> Feel like doing a few more quick tests, Nikita? ;)

Ok, here are my highly unscientific results.

Work-load: copying a 1G _byte_ file from an XNU lustre client to the
UML lustre server running in the 2.6.9-rc1 host.
Top CPU consumers according to readprofile:

2.6.9-rc1 vanilla:

      3312 prio_tree_parent            41.4000
      3483 ide_do_request               3.9806
      4899 page_referenced_file        25.1231
      6138 __copy_from_user_ll         78.6923
      7461 get_offset_pmtmr            54.0652
      7492 __copy_to_user_ll           96.0513
      8042 finish_task_switch          76.5905
      9657 prio_tree_next              76.0394
     10083 wait_task_stopped           11.7517
     11080 sigprocmask                 53.0144
     13345 prio_tree_right             65.4167
     14956 vma_prio_tree_next         173.9070
     15838 prio_tree_left              92.6199
     27049 eligible_child             124.0780
     28533 try_to_unmap_one            64.8477
     33810 system_call                768.4091
     49865 __preempt_spin_lock        547.9670
     56045 do_wait                     47.5360
    109964 page_referenced_one        388.5654
   1529155 mwait_idle               19604.5513
   2011318 total                        0.7514

2.6.9-rc1 with the patch (below) applied:

      2999 prio_tree_parent            37.4875
      3012 ide_outbsync               301.2000
      3272 ide_do_request               3.7394
      4365 page_referenced_file        22.3846
      6031 __copy_to_user_ll           77.3205
      6296 __copy_from_user_ll         80.7179
      7563 get_offset_pmtmr            54.8043
      7698 finish_task_switch          73.3143
      9133 prio_tree_next              71.9134
      9817 wait_task_stopped           11.4417
     13242 prio_tree_right             64.9118
     13620 vma_prio_tree_next         158.3721
     15736 prio_tree_left              92.0234
     17768 sigprocmask                 85.0144
     26260 try_to_unmap_one            59.9543
     27096 eligible_child             124.2936
     28141 system_call                639.5682
     41002 __preempt_spin_lock        450.5714
     58325 do_wait                     49.4699
    101512 page_referenced_one        347.6438
   1648567 mwait_idle               21135.4744
   2107521 total                        0.7874

Patch:

----------------------------------------------------------------------
===== mm/rmap.c 1.77 vs edited =====
--- 1.77/mm/rmap.c	2004-08-24 13:08:39 +04:00
+++ edited/mm/rmap.c	2004-09-12 19:05:26 +04:00
@@ -268,7 +268,8 @@
 	if (address == -EFAULT)
 		goto out;
 
-	spin_lock(&mm->page_table_lock);
+	if (!spin_trylock(&mm->page_table_lock))
+		goto out;
 
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
----------------------------------------------------------------------

I ran the tests a few times, and the difference between the patched and
un-patched kernels is within noise, so you are right, the try-lock does
not help.
But now I have a new great idea instead. :)

I think page_referenced() should transfer dirtiness to the struct page
as it scans pte's.  Basically, the earlier we mark a page dirty, the
better file system write-back performs, because the page has more
chances to be bulk-written by ->writepages().  This is better than my
previous patches to this end (which used a separate function to
transfer dirtiness from pte's to the page), because

 - locking overhead is avoided

 - it's simpler.

Nick, are you still in business of benchmarking random VM patches? :-)

Nikita.
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
From: Nick Piggin @ 2004-09-13 4:53 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Hugh Dickins, Linux Kernel Mailing List

Nikita Danilov wrote:
> I ran the tests a few times, and the difference between the patched and
> un-patched kernels is within noise, so you are right, the try-lock does
> not help.

Well I'm glad - because I much prefer the spin_lock over the trylock :)

> But now I have a new great idea instead. :)
>
> I think page_referenced() should transfer dirtiness to the struct page
> as it scans pte's.  Basically, the earlier we mark a page dirty, the
> better file system write-back performs, because the page has more
> chances to be bulk-written by ->writepages().  This is better than my
> previous patches to this end (which used a separate function to
> transfer dirtiness from pte's to the page), because
>
>  - locking overhead is avoided
>
>  - it's simpler.
>
> Nick, are you still in business of benchmarking random VM patches? :-)

Yeah I am, and I do have that patch sitting around.  It can *really*
help for writeout via mapped memory (obviously it doesn't help write()).
I think Andrew's response was that it can theoretically cause writeout
for workloads that don't want it, so I should come up with at least one
real-world improvement!