* 2.6.9-rc1: page_referenced_one() CPU consumption
@ 2004-09-10 10:51 Nikita Danilov
From: Nikita Danilov @ 2004-09-10 10:51 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Hugh Dickins
Hello,
in 2.6.9-rc1 page_referenced_one() is among the top CPU consumers (which
wasn't the case for 2.6.8-rc2) in the host kernel when running a heavily
loaded UML. readprofile -b shows that the time is spent in
spin_lock(&mm->page_table_lock), so, I reckon, the recent "rmaplock: kill
page_map_lock" changes are probably not completely unrelated.
Without any deep investigation, one possible scenario is that multiple
threads are doing the following (as part of direct reclaim):

    refill_inactive_zone()
      page_referenced()
        page_referenced_file()       /* (1) mapping->i_mmap_lock doesn't
                                        serialize them */
          page_referenced_one()
            spin_lock(&mm->page_table_lock)  /* (2) everybody is
                                                serialized here */
(1) and (2) will be true if we have one huge address space with a lot of
VMAs, which seems to be exactly what UML does:
$ wc /proc/<UML-host-pid>/maps
4134 28931 561916
This didn't happen before, because page_referenced_one() used to
try-lock.
Nikita.
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
From: Hugh Dickins @ 2004-09-10 12:14 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Linux Kernel Mailing List

On Fri, 10 Sep 2004, Nikita Danilov wrote:
>
> in 2.6.9-rc1 page_referenced_one() is among the top CPU consumers (which
> wasn't the case for 2.6.8-rc2) in the host kernel when running a heavily
> loaded UML. readprofile -b shows that the time is spent in
> spin_lock(&mm->page_table_lock), so, I reckon, the recent "rmaplock: kill
> page_map_lock" changes are probably not completely unrelated.
>
> Without any deep investigation, one possible scenario is that multiple
> threads are doing the following (as part of direct reclaim):
>
>     refill_inactive_zone()
>       page_referenced()
>         page_referenced_file()       /* (1) mapping->i_mmap_lock doesn't
>                                         serialize them */
>           page_referenced_one()
>             spin_lock(&mm->page_table_lock)  /* (2) everybody is
>                                                 serialized here */
>
> (1) and (2) will be true if we have one huge address space with a lot of
> VMAs, which seems to be exactly what UML does:
>
>     $ wc /proc/<UML-host-pid>/maps
>     4134 28931 561916
>
> This didn't happen before, because page_referenced_one() used to
> try-lock.

I'd be very surprised if you're wrong.  I remarked on that in the
ChangeLog comment: "Though I suppose it's possible that we'll find that
vmscan makes better progress with trylocks than spinning - we're free to
choose trylocks again if so."

I'm quite content to go back to a trylock in page_referenced_one - and
in try_to_unmap_one?  But yours is the first report of an issue there,
so I'm inclined to wait for more reports (which should come flooding in
now you mention it!), and input from those with a better grasp than I
of how vmscan pans out in practice (Andrew, Nick, Con spring to mind).
Hugh
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
From: Hugh Dickins @ 2004-09-10 12:21 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Linux Kernel Mailing List

On Fri, 10 Sep 2004, Hugh Dickins wrote:
>
> I'm quite content to go back to a trylock in page_referenced_one - and
> in try_to_unmap_one?  But yours is the first report of an issue there,
> so I'm inclined to wait for more reports (which should come flooding in
> now you mention it!), and input from those with a better grasp than I
> of how vmscan pans out in practice (Andrew, Nick, Con spring to mind).

Just want to add that there'd be little point in changing that back
to a trylock if vmscan ends up cycling hopelessly around a larger
loop - though if the larger loop is more preemptible, that's a plus.

Hugh
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
From: Nick Piggin @ 2004-09-11 1:01 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Nikita Danilov, Linux Kernel Mailing List

Hugh Dickins wrote:
> On Fri, 10 Sep 2004, Hugh Dickins wrote:
>
>> I'm quite content to go back to a trylock in page_referenced_one - and
>> in try_to_unmap_one?  But yours is the first report of an issue there,
>> so I'm inclined to wait for more reports (which should come flooding in
>> now you mention it!), and input from those with a better grasp than I
>> of how vmscan pans out in practice (Andrew, Nick, Con spring to mind).
>
> Just want to add, that there'd be little point in changing that back
> to a trylock, if vmscan ends up cycling hopelessly around a larger
> loop - though if the larger loop is more preemptible, that's a plus.

Yeah - I'm not sure why a trylock would perform better.  If it is just
one big address space, and memory needs to be freed, presumably the
scanner will just choose a different page, and try the lock again.

Feel like doing a few more quick tests, Nikita? ;)
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
From: Nikita Danilov @ 2004-09-12 15:53 UTC (permalink / raw)
To: Nick Piggin; +Cc: Hugh Dickins, Linux Kernel Mailing List

Nick Piggin writes:
> Yeah - I'm not sure why a trylock would perform better.  If it is just
> one big address space, and memory needs to be freed, presumably the
> scanner will just choose a different page, and try the lock again.
>
> Feel like doing a few more quick tests, Nikita? ;)

Ok, here are my highly unscientific results.

Work-load: copying a 1G _byte_ file from an XNU lustre client to the
UML lustre server running in the 2.6.9-rc1 host.
Top CPU consumers according to readprofile:

2.6.9-rc1 vanilla:

      3312 prio_tree_parent            41.4000
      3483 ide_do_request               3.9806
      4899 page_referenced_file        25.1231
      6138 __copy_from_user_ll         78.6923
      7461 get_offset_pmtmr            54.0652
      7492 __copy_to_user_ll           96.0513
      8042 finish_task_switch          76.5905
      9657 prio_tree_next              76.0394
     10083 wait_task_stopped           11.7517
     11080 sigprocmask                 53.0144
     13345 prio_tree_right             65.4167
     14956 vma_prio_tree_next         173.9070
     15838 prio_tree_left              92.6199
     27049 eligible_child             124.0780
     28533 try_to_unmap_one            64.8477
     33810 system_call                768.4091
     49865 __preempt_spin_lock        547.9670
     56045 do_wait                     47.5360
    109964 page_referenced_one        388.5654
   1529155 mwait_idle               19604.5513
   2011318 total                        0.7514

2.6.9-rc1 with the patch (below) applied:

      2999 prio_tree_parent            37.4875
      3012 ide_outbsync               301.2000
      3272 ide_do_request               3.7394
      4365 page_referenced_file        22.3846
      6031 __copy_to_user_ll           77.3205
      6296 __copy_from_user_ll         80.7179
      7563 get_offset_pmtmr            54.8043
      7698 finish_task_switch          73.3143
      9133 prio_tree_next              71.9134
      9817 wait_task_stopped           11.4417
     13242 prio_tree_right             64.9118
     13620 vma_prio_tree_next         158.3721
     15736 prio_tree_left              92.0234
     17768 sigprocmask                 85.0144
     26260 try_to_unmap_one            59.9543
     27096 eligible_child             124.2936
     28141 system_call                639.5682
     41002 __preempt_spin_lock        450.5714
     58325 do_wait                     49.4699
    101512 page_referenced_one        347.6438
   1648567 mwait_idle               21135.4744
   2107521 total                        0.7874

Patch:

----------------------------------------------------------------------
===== mm/rmap.c 1.77 vs edited =====
--- 1.77/mm/rmap.c	2004-08-24 13:08:39 +04:00
+++ edited/mm/rmap.c	2004-09-12 19:05:26 +04:00
@@ -268,7 +268,8 @@
 	if (address == -EFAULT)
 		goto out;
 
-	spin_lock(&mm->page_table_lock);
+	if (!spin_trylock(&mm->page_table_lock))
+		goto out;
 
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
----------------------------------------------------------------------

I ran the tests a few times, and the difference between the patched and
un-patched kernels is within noise, so you are right, the try-lock does
not help.
But now I have a new great idea instead. :)

I think page_referenced() should transfer dirtiness to the struct page
as it scans pte's.  Basically, the earlier we mark a page dirty, the
better file system write-back performs, because the page has more
chances to be bulk-written by ->writepages().  This is better than my
previous patches to this end (which used a separate function to
transfer dirtiness from pte's to the page), because

 - locking overhead is avoided

 - it's simpler.

Nick, are you still in business of benchmarking random VM patches? :-)

Nikita.
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
From: Nick Piggin @ 2004-09-13 4:53 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Hugh Dickins, Linux Kernel Mailing List

Nikita Danilov wrote:
> I ran the tests a few times, and the difference between the patched and
> un-patched kernels is within noise, so you are right, the try-lock does
> not help.

Well I'm glad - because I much prefer the spin_lock over the trylock :)

> But now I have a new great idea instead. :)
>
> I think page_referenced() should transfer dirtiness to the struct page
> as it scans pte's.  Basically, the earlier we mark a page dirty, the
> better file system write-back performs, because the page has more
> chances to be bulk-written by ->writepages().  This is better than my
> previous patches to this end (which used a separate function to
> transfer dirtiness from pte's to the page), because
>
>  - locking overhead is avoided
>
>  - it's simpler.
>
> Nick, are you still in business of benchmarking random VM patches? :-)

Yeah I am, and I do have that patch sitting around.  It can *really*
help for writeout via mapped memory (obviously it doesn't help write()).
I think Andrew's response was that it can theoretically cause writeout
for workloads that don't want it, so I should come up with at least one
real-world improvement!