From: Minchan Kim <minchan@kernel.org>
To: Luigi Semenzato <semenzato@google.com>
Cc: linux-mm@kvack.org, Dan Magenheimer <dan.magenheimer@oracle.com>,
Sonny Rao <sonnyrao@google.com>, Bryan Freed <bfreed@google.com>,
Hugh Dickins <hughd@google.com>
Subject: Re: zram, OOM, and speed of allocation
Date: Mon, 3 Dec 2012 15:42:12 +0900
Message-ID: <20121203064212.GA4569@blaptop>
In-Reply-To: <CAA25o9RiNfwtoeMBk=PLg-X_2wPSHuYLztONw1KToeOx9pUHGw@mail.gmail.com>
Hi Luigi,
On Thu, Nov 29, 2012 at 11:31:46AM -0800, Luigi Semenzato wrote:
> Oh well, I found the problem, it's laptop_mode. We keep it on by
> default. When I turn it off, I can allocate as fast as I can, and no
> OOMs happen until swap is exhausted.
>
> I don't think this is a desirable behavior even for laptop_mode, so if
> anybody wants to help me debug it (or wants my help in debugging it)
> do let me know.
Interesting.
Here is a quick trial patch.
Could you try it on your kernel without my previous patch,
"wakeup kswapd in direct reclaim path"?
If you still have trouble with kswapd stopping, please apply both patches.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32bc955..4a7fe5d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -725,6 +725,7 @@ typedef struct pglist_data {
struct task_struct *kswapd; /* Protected by lock_memory_hotplug() */
int kswapd_max_order;
enum zone_type classzone_idx;
+ bool may_writepage;
} pg_data_t;
#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 53dcde9..1952420 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -68,6 +68,11 @@ struct scan_control {
/* This context's GFP mask */
gfp_t gfp_mask;
+ /*
+ * If laptop_mode is true, you don't need to set may_writepage.
+ * Otherwise, you should set may_writepage explicitly.
+ */
+ bool laptop_mode;
int may_writepage;
/* Can mapped pages be reclaimed? */
@@ -1846,6 +1851,15 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 	unsigned long nr_reclaimed, nr_scanned;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	struct blk_plug plug;
+	struct zone *zone = lruvec_zone(lruvec);
+	pg_data_t *pgdat = zone->zone_pgdat;
+
+	if (sc->laptop_mode) {
+		if (pgdat->may_writepage)
+			sc->may_writepage = 1;
+		else
+			sc->may_writepage = 0;
+	}
 
 restart:
 	nr_reclaimed = 0;
@@ -2145,11 +2159,9 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 		 * writeout. So in laptop mode, write out the whole world.
 		 */
 		writeback_threshold = sc->nr_to_reclaim + sc->nr_to_reclaim / 2;
-		if (total_scanned > writeback_threshold) {
-			wakeup_flusher_threads(laptop_mode ? 0 : total_scanned,
+		if (total_scanned > writeback_threshold)
+			wakeup_flusher_threads(sc->laptop_mode ? 0 : total_scanned,
 						WB_REASON_TRY_TO_FREE_PAGES);
-			sc->may_writepage = 1;
-		}
 
 		/* Take a nap, wait for some writeback to complete */
 		if (!sc->hibernation_mode && sc->nr_scanned &&
@@ -2289,7 +2301,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
unsigned long nr_reclaimed;
struct scan_control sc = {
.gfp_mask = gfp_mask,
- .may_writepage = !laptop_mode,
+ .laptop_mode = laptop_mode,
.nr_to_reclaim = SWAP_CLUSTER_MAX,
.may_unmap = 1,
.may_swap = 1,
@@ -2331,7 +2343,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *memcg,
struct scan_control sc = {
.nr_scanned = 0,
.nr_to_reclaim = SWAP_CLUSTER_MAX,
- .may_writepage = !laptop_mode,
+ .laptop_mode = laptop_mode,
.may_unmap = 1,
.may_swap = !noswap,
.order = 0,
@@ -2370,7 +2382,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
unsigned long nr_reclaimed;
int nid;
struct scan_control sc = {
- .may_writepage = !laptop_mode,
+ .laptop_mode = laptop_mode,
.may_unmap = 1,
.may_swap = !noswap,
.nr_to_reclaim = SWAP_CLUSTER_MAX,
@@ -2585,7 +2597,7 @@ loop_again:
total_scanned = 0;
sc.priority = DEF_PRIORITY;
sc.nr_reclaimed = 0;
- sc.may_writepage = !laptop_mode;
+ sc.laptop_mode = laptop_mode;
count_vm_event(PAGEOUTRUN);
do {
@@ -2722,7 +2734,7 @@ loop_again:
*/
if (total_scanned > SWAP_CLUSTER_MAX * 2 &&
total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
- sc.may_writepage = 1;
+ zone->zone_pgdat->may_writepage = true;
if (zone->all_unreclaimable) {
if (end_zone && end_zone == i)
@@ -2749,6 +2761,7 @@ loop_again:
* speculatively avoid congestion waits
*/
zone_clear_flag(zone, ZONE_CONGESTED);
+ zone->zone_pgdat->may_writepage = false;
if (i <= *classzone_idx)
balanced += zone->present_pages;
}
@@ -3112,6 +3125,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
.gfp_mask = GFP_HIGHUSER_MOVABLE,
.may_swap = 1,
.may_unmap = 1,
+ .laptop_mode = false,
.may_writepage = 1,
.nr_to_reclaim = nr_to_reclaim,
.hibernation_mode = 1,
@@ -3299,6 +3313,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
struct task_struct *p = current;
struct reclaim_state reclaim_state;
struct scan_control sc = {
+ .laptop_mode = false,
.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
.may_swap = 1,
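
The flag handling in the patch above can be read as a small state machine:
kswapd raises a per-node may_writepage flag when scanning outpaces reclaim,
clears it in what appears to be the balanced-zone branch, and a reclaimer
running with laptop_mode set copies the node flag on entry to
shrink_lruvec(). The following is only a userspace sketch of that reading.
The struct and function names are hypothetical stand-ins, not kernel API,
and MODEL_SWAP_CLUSTER_MAX assumes the kernel's SWAP_CLUSTER_MAX value of 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: mirrors the kernel's SWAP_CLUSTER_MAX (32 pages). */
#define MODEL_SWAP_CLUSTER_MAX 32UL

/* Hypothetical stand-in for the new pg_data_t field. */
struct node_model {
	bool may_writepage;	/* raised by kswapd when reclaim falls behind */
};

/* Hypothetical stand-in for the two scan_control fields involved. */
struct scan_model {
	bool laptop_mode;
	int may_writepage;
};

/*
 * Models the shrink_lruvec() hunk: in laptop mode the reclaimer takes
 * its may_writepage setting from the node; otherwise it keeps whatever
 * the caller put in the scan control.
 */
static void inherit_writepage(struct scan_model *sc,
			      const struct node_model *node)
{
	assert(sc != NULL && node != NULL);
	if (sc->laptop_mode)
		sc->may_writepage = node->may_writepage ? 1 : 0;
}

/*
 * Models the balance_pgdat() check: allow writeback once many pages
 * have been scanned but comparatively few were reclaimed.
 */
static void kswapd_note_progress(struct node_model *node,
				 unsigned long total_scanned,
				 unsigned long nr_reclaimed)
{
	assert(node != NULL);
	if (total_scanned > MODEL_SWAP_CLUSTER_MAX * 2 &&
	    total_scanned > nr_reclaimed + nr_reclaimed / 2)
		node->may_writepage = true;
}

/* Models the reset next to zone_clear_flag(zone, ZONE_CONGESTED). */
static void zone_balanced(struct node_model *node)
{
	assert(node != NULL);
	node->may_writepage = false;
}
```

In this model a caller with laptop_mode false keeps the may_writepage value
it was initialized with, which is why the new comment in struct scan_control
asks such callers to set it explicitly.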
>
> Thanks!
> Luigi
>
> On Thu, Nov 29, 2012 at 10:46 AM, Luigi Semenzato <semenzato@google.com> wrote:
> > Minchan:
> >
> > I tried your suggestion to move the call to wake_all_kswapd from after
> > "restart:" to after "rebalance:". The behavior is still similar, but
> > slightly improved. Here's what I see.
> >
> > Allocating as fast as I can: 1.5 GB of the 3 GB of zram swap are used,
> > then OOM kills happen, and the system ends up with 1 GB swap used, 2
> > unused.
> >
> > Allocating 10 MB/s: some kills happen when only 1 to 1.5 GB are used,
> > and continue happening while swap fills up. Eventually swap fills up
> > completely. This is better than before (could not go past about 1 GB
> > of swap used), but there are too many kills too early. I would like
> > to see no OOM kills until swap is full or almost full.
> >
> > Allocating 20 MB/s: almost as good as with 10 MB/s, but more kills
> > happen earlier, and not all swap space is used (400 MB free at the
> > end).
> >
> > This is with 200 processes using 20 MB each, and 2:1 compression ratio.
> >
> > So it looks like kswapd is still not aggressive enough in pushing
> > pages out. What's the best way of changing that? Play around with
> > the watermarks?
> >
> > Incidentally, I also tried removing the min_filelist_kbytes hacky
> > patch, but, as usual, the system thrashes so badly that it's
> > impossible to complete any experiment. I set it to a lower minimum
> > amount of free file pages, 10 MB instead of the 50 MB which we use
> > normally, and I could run with some thrashing, but I got the same
> > results.
> >
> > Thanks!
> > Luigi
> >
> >
> > On Wed, Nov 28, 2012 at 4:31 PM, Luigi Semenzato <semenzato@google.com> wrote:
> >> I am beginning to understand why zram appears to work fine on our x86
> >> systems but not on our ARM systems. The bottom line is that swapping
> >> doesn't work as I would expect when allocation is "too fast".
> >>
> >> In one of my tests, opening 50 tabs simultaneously in a Chrome browser
> >> on devices with 2 GB of RAM and a zram-disk of 3 GB (uncompressed), I
> >> was observing that on the x86 device all of the zram swap space was
> >> used before OOM kills happened, but on the ARM device I would see OOM
> >> kills when only about 1 GB (out of 3) was swapped out.
> >>
> >> I wrote a simple program to understand this behavior. The program
> >> (called "hog") allocates memory and fills it with a mix of
> >> incompressible data (from /dev/urandom) and highly compressible data
> >> (1's, just to avoid zero pages) in a given ratio. The memory is never
> >> touched again.
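
For anyone who wants to reproduce this, the fill step described above might
look roughly like the sketch below. It is not Luigi's actual "hog" source.
The function names, the use of byte value 1 for the compressible part, and
the per-chunk split between random and constant data are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Fill buf[0..len) so that about `random_fraction` of it is
 * incompressible (bytes read from `src`, for example /dev/urandom) and
 * the rest is highly compressible (byte value 1 rather than 0, so the
 * pages are not zero pages).
 * Returns 0 on success, -1 if random data was needed but unavailable.
 */
static int fill_chunk(unsigned char *buf, size_t len,
		      double random_fraction, FILE *src)
{
	size_t nrand;

	assert(buf != NULL);
	assert(random_fraction >= 0.0 && random_fraction <= 1.0);

	nrand = (size_t)((double)len * random_fraction);
	if (nrand > 0) {
		if (src == NULL || fread(buf, 1, nrand, src) != nrand)
			return -1;
	}
	memset(buf + nrand, 1, len - nrand);
	return 0;
}

/*
 * Allocate one chunk and fill it. A hog-style caller keeps the returned
 * pointer and never touches the memory again.
 * Returns NULL on allocation or fill failure.
 */
static unsigned char *hog_chunk(size_t len, double random_fraction,
				FILE *src)
{
	unsigned char *buf = malloc(len);

	if (buf == NULL)
		return NULL;
	if (fill_chunk(buf, len, random_fraction, src) != 0) {
		free(buf);
		return NULL;
	}
	return buf;
}
```

With random_fraction set to 0.5 the data should compress at roughly 2:1,
assuming the constant part compresses to almost nothing. Limiting the
allocation speed then comes down to sleeping between calls to hog_chunk().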
> >>
> >> It turns out that if I don't limit the allocation speed, I see
> >> premature OOM kills also on the x86 device. If I limit the allocation
> >> to 10 MB/s, the premature OOM kills stop happening on the x86 device,
> >> but still happen on the ARM device. If I further limit the allocation
> >> speed to 5 MB/s, the premature OOM kills disappear also from the ARM
> >> device.
> >>
> >> I have noticed a few time constants in the MM whose value is not well
> >> explained, and I am wondering if the code is tuned for some ideal
> >> system that doesn't behave like ours (considering, for instance, that
> >> zram is much faster than swapping to a disk device, but it also uses
> >> more CPU). If this is plausible, I am wondering if anybody has
> >> suggestions for changes that I could try out to obtain a better
> >> behavior with a higher allocation speed.
> >>
> >> Thanks!
> >> Luigi
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Kind regards,
Minchan Kim