* thrashing on file pages
From: Luigi Semenzato @ 2017-04-05 1:01 UTC
To: Linux Memory Management List

Greetings MM community, and apologies for being out of touch.

We're running into an MM problem which we encountered in the early
versions of Chrome OS, about 7 years ago, which is that under certain
interactive loads we thrash on executable pages.

At the time, Mandeep Baines solved this problem by introducing a
min_filelist_kbytes parameter, which simply stops the scanning of the
file list whenever the number of pages in it is below that threshold.
This works surprisingly well for Chrome OS because the Chrome browser
has a known text size and is the only large user program.
Additionally we use Feedback-Directed Optimization to keep the hot
code together in the same pages.

But given that Chromebooks can run Android apps, the picture is
changing. We can bump min_filelist_kbytes, but we no longer have an
upper bound for the working set of a workflow which cycles through
multiple Android apps. Tab/app switching is more natural and
therefore more frequent on laptops than it is on phones, and it puts a
bigger strain on the MM.

I should mention that we manage memory also by OOM-killing Android
apps and discarding Chrome tabs before the system runs out of memory.
We also reassign kernel-OOM-kill priorities for the cases in which our
user-level killing code isn't quick enough.

In our attempts to avoid the thrashing, we played around with
swappiness. Dmitry Torokhov (three desks down from mine) suggested
shifting the upper bound of 100 to 200, which makes sense because we
use zram to reclaim anonymous pages, and paging back from zram is a
lot faster than reading from SSD. So I have played around with
swappiness up to 190 but I can still reproduce the thrashing. I have
noticed this code in vmscan.c:

        if (!sc->priority && swappiness) {
                scan_balance = SCAN_EQUAL;
                goto out;
        }

which suggests that under heavy pressure, swappiness is ignored. I
removed this code, but that didn't help either. I am not fully
convinced that my experiments are fully repeatable (quite the
opposite), and there may be variations in the point at which thrashing
starts, but the bottom line is that it still starts.

Are we the only ones with this problem? It's possible, since Android
by design can be aggressive in killing processes, and conversely
Chrome OS is popular in the low end of the market, where devices with
2GB of RAM are still common, and memory exhaustion can be reached
pretty easily. I noticed that vmscan.c has code which tries to
protect pages with the VM_EXEC flag from premature eviction, so the
problem might have been seen before in some form.

I'll be grateful for any suggestion, advice, or other information. Thanks!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
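The min_filelist_kbytes behaviour described in the message above can be pictured with a small user-space sketch. This is not the Chrome OS patch: the helper name file_scan_allowed, the 4 KiB page size, and the exact comparison are assumptions made only for illustration.

```c
#include <stdbool.h>

/* Assumption: 4 KiB pages, so one page accounts for 4 kbytes. */
#define PAGE_KBYTES 4UL

/*
 * Hypothetical helper (not a kernel function): decides whether reclaim
 * may scan the file LRU lists.
 *
 * file_pages:          pages currently on the file lists
 * min_filelist_kbytes: protection threshold; 0 disables the protection
 */
bool file_scan_allowed(unsigned long file_pages,
		       unsigned long min_filelist_kbytes)
{
	unsigned long file_kbytes = file_pages * PAGE_KBYTES;

	/* Below the threshold the file lists are left alone. */
	return file_kbytes >= min_filelist_kbytes;
}
```

With a 100000 kbyte threshold, a file list of 20000 pages (80000 kbytes) would no longer be scanned, so reclaim has to find memory elsewhere, for example in anonymous pages backed by zram.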
* Re: thrashing on file pages
From: Luigi Semenzato @ 2017-04-11 19:25 UTC
To: Linux Memory Management List

Maybe this message was too long. Quick summary:

Are we (Chrome OS) the only ones who experience thrashing from
excessive eviction of code pages?

Chrome OS added a mechanism (also called "the hacky patch")

https://codereview.chromium.org/4128001

which stops the scanning of file lists below a fixed threshold
(configurable with sysctl). This has worked very well. Would it be
worth upstreaming? Are there alternatives?

We have other ways of freeing up memory---specifically we close Chrome
tabs (and Android apps, now). But, depending on allocation speed, we
may get behind with the freeing, and end up thrashing to the point
that even OOM kills are seriously delayed.

And furthermore: are we the only ones who would like to see the max
value for swappiness be raised from 100 to 200? This seems reasonable
when the swap device is much faster than the file backing device.

These may not be issues on servers, where the load is carefully
controlled. But they seem hard to avoid on consumer devices.

Your reply will help millions of people! (Us too, but that's just a
side effect.)

Thanks :)
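To make the swappiness question concrete: in kernels of that period, as far as I recall, get_scan_count() in mm/vmscan.c derives two static weights from swappiness, namely anon_prio = swappiness and file_prio = 200 - anon_prio. The sketch below only reproduces that arithmetic in user space. The struct and function names are hypothetical, and the real scan split also depends on recent scan/rotate statistics that are left out here.

```c
/* Hypothetical container for the two static reclaim weights. */
struct scan_prio {
	unsigned int anon;	/* weight for anonymous (swap/zram) pages */
	unsigned int file;	/* weight for file-backed pages */
};

/*
 * Sketch of the static weighting only.
 * Assumption: the caller has already limited swappiness to 0..200.
 */
struct scan_prio prio_from_swappiness(unsigned int swappiness)
{
	struct scan_prio p;

	p.anon = swappiness;
	p.file = 200 - swappiness;
	return p;
}
```

Under this weighting, the current maximum of 100 gives anonymous and file pages equal weight, while 190 would leave file pages a weight of 10 against 190. That is why raising the limit looks attractive when the swap device (zram) is much faster than the device backing the file pages.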
* Re: thrashing on file pages
From: Minchan Kim @ 2017-04-13 5:42 UTC
To: Luigi Semenzato
Cc: Linux Memory Management List, timmurray, Johannes Weiner, vinmenon

Hi Luigi,

On Tue, Apr 04, 2017 at 06:01:50PM -0700, Luigi Semenzato wrote:
> In our attempts to avoid the thrashing, we played around with
> swappiness. Dmitry Torokhov (three desks down from mine) suggested
> shifting the upper bound of 100 to 200, which makes sense because we

It does make sense, but see below.

> use zram to reclaim anonymous pages, and paging back from zram is a
> lot faster than reading from SSD. So I have played around with
> swappiness up to 190 but I can still reproduce the thrashing. I have
> noticed this code in vmscan.c:
>
>         if (!sc->priority && swappiness) {
>                 scan_balance = SCAN_EQUAL;
>                 goto out;
>         }
>
> which suggests that under heavy pressure, swappiness is ignored. I
> removed this code, but that didn't help either. I am not fully
> convinced that my experiments are fully repeatable (quite the
> opposite), and there may be variations in the point at which thrashing
> starts, but the bottom line is that it still starts.

If sc->priority is zero, it probably means the VM has already reclaimed
a lot of the working set. That might be one reason you cannot see a
difference.

I think the more likely culprit is the following:

get_scan_count:

        if (!inactive_file_is_low(lruvec) && lruvec_lru_size() >> sc->priority) {
                scan_balance = SCAN_FILE;
                goto out;
        }

And it works together with

shrink_list:
        if (is_active_lru(lru))
                if (inactive_list_is_low(lru))
                        shrink_active_list(lru);

This means the VM prefers reclaiming file-backed pages over anonymous
pages until the condition below is met.

get_scan_count:

        if (global_reclaim(sc)) {
                if (zonefile + zonefree <= high_wmark_pages(zone))
                        scan_balance = SCAN_ANON;
        }

This means the VM will protect some amount of file-backed pages, but
the amount it protects depends on the high watermark, which is derived
from min_free_kbytes. In recent kernels you can control that size via
watermark_scale_factor without touching min_free_kbytes. So you can
mimic min_filelist_kbytes with it, although the high watermark has an
upper limit (20%).
(795ae7a0de6b, mm: scale kswapd watermarks in proportion to memory)

>
> Are we the only ones with this problem? It's possible, since Android

No. You are not alone.
http://lkml.kernel.org/r/20170317231636.142311-1-timmurray@google.com

Johannes is preparing some patches (aggressive anonymous page reclaim
+ thrashing detection).

https://lwn.net/Articles/690069/
https://marc.info/?l=linux-mm&m=148351203826308

I hope we can make progress in the discussion and find a solution.
Please join the discussion if you are interested. :)

Thanks.
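The two checks quoted in the reply above, and the remark about the 20% limit, can be tried out in a user-space model. The sketch below rests on several assumptions: the struct and function names are hypothetical, the order of the checks (watermark test first, then the inactive-file test) is my reading of kernels from that time, and the watermark formula (high = min + 2 * max(min / 4, managed * watermark_scale_factor / 10000)) is how I understand commit 795ae7a0de6b. It is an illustration, not kernel code.

```c
#include <stdbool.h>

enum scan_balance { SCAN_EQUAL, SCAN_FRACT, SCAN_ANON, SCAN_FILE };

/* Hypothetical snapshot of the values the two checks look at. */
struct reclaim_state {
	bool global_reclaim;		/* reclaim is not memcg-limit driven */
	bool inactive_file_is_low;	/* inactive file list is small */
	unsigned long inactive_file;	/* pages on the inactive file list */
	unsigned long zonefile;		/* file pages */
	unsigned long zonefree;		/* free pages */
	unsigned long high_wmark;	/* high watermark, in pages */
	int priority;			/* sc->priority, 0 = highest pressure */
};

/* Assumed formula: the watermark gap scales with managed memory. */
unsigned long high_wmark_sketch(unsigned long min_pages,
				unsigned long managed_pages,
				unsigned long scale_factor)
{
	/* managed * scale / 10000 without overflowing the product */
	unsigned long gap = managed_pages / 10000 * scale_factor +
			    managed_pages % 10000 * scale_factor / 10000;

	if (gap < min_pages / 4)
		gap = min_pages / 4;
	return min_pages + 2 * gap;
}

enum scan_balance pick_scan_balance(const struct reclaim_state *s)
{
	/* File + free pages at or below the high watermark: reclaim anon. */
	if (s->global_reclaim && s->zonefile + s->zonefree <= s->high_wmark)
		return SCAN_ANON;

	/* Enough inactive file pages at this priority: reclaim file only. */
	if (!s->inactive_file_is_low && (s->inactive_file >> s->priority))
		return SCAN_FILE;

	/* Otherwise use the swappiness-weighted split. */
	return SCAN_FRACT;
}
```

On a 2GB machine (524288 pages of 4 KiB) with a minimum watermark of 1024 pages, a scale factor of 10 gives a high watermark of 2072 pages in this model, while a scale factor of 1000 gives 105880 pages, roughly 20% of memory. Raising the factor therefore widens the range in which file plus free pages trigger SCAN_ANON, which is the sense in which it can stand in for min_filelist_kbytes.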
* Re: thrashing on file pages
From: Luigi Semenzato @ 2017-04-21 18:15 UTC
To: Minchan Kim
Cc: Linux Memory Management List, Tim Murray, Johannes Weiner, vinmenon

Thank you very much Minchan.

I took a look at Johannes's proposal. It all makes sense but I'd like
to point out one additional issue, which is partly a time scale issue.

In Chrome OS (and this potentially applies to Android) one common use
pattern is to do some work in one browser tab, then switch to another
tab and do some work there and so on (think of apps instead of tabs on
Android). Thus there is a loose notion of a "working set of tabs".

For Chrome OS, it is important that the tab working set fit in memory
(RAM + swap). If it does not, some tabs in the set get "discarded"
while using the others: i.e. the browser releases most of their
resources, including their JavaScript and DOM state.

Thus, swapping is *much* better than discarding, and usually faster.
Then it is quite all right for a renderer process (a process backing
one or more tabs) to make very little progress for some time, while it
pages in its code and data (mostly data in the case of Chrome OS).
The length of "some time" depends on the application, but in this case
(interactive application) could be as long as a small number of
seconds.

Thus there should be a way of nullifying any actions that may be taken
as a result of thrashing detection, because in these cases the
thrashing is expected and preferable to the alternatives.
* Re: thrashing on file pages
From: Minchan Kim @ 2017-04-24 7:05 UTC
To: Luigi Semenzato
Cc: Linux Memory Management List, Tim Murray, Johannes Weiner, vinmenon

On Fri, Apr 21, 2017 at 11:15:11AM -0700, Luigi Semenzato wrote:
> Thus there should be a way of nullifying any actions that may be taken
> as a result of thrashing detection, because in these cases the
> thrashing is expected and preferable to the alternatives.

Once we are able to quantify memory pressure, it will be easier to
have a relative scale for discriminating memory pressure, as Johannes
mentioned. Building on that idea, we could implement Tim's "reclaiming
priorities per mem cgroup" in a more scientific way, IMHO. With that,
you could exempt some groups from reclaim even while thrashing happens.
Thread overview: 5+ messages
2017-04-05  1:01 thrashing on file pages Luigi Semenzato
2017-04-11 19:25 ` Luigi Semenzato
2017-04-13  5:42 ` Minchan Kim
2017-04-21 18:15   ` Luigi Semenzato
2017-04-24  7:05     ` Minchan Kim