linux-mm.kvack.org archive mirror
From: Vivek Goyal <vgoyal@redhat.com>
To: Greg Thelen <gthelen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	containers@lists.osdl.org, linux-fsdevel@vger.kernel.org,
	Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Chad Talbott <ctalbott@google.com>,
	Justin TerAvest <teravest@google.com>, Jan Kara <jack@suse.cz>
Subject: Re: [PATCH v6 8/9] memcg: check memcg dirty limits in page writeback
Date: Mon, 14 Mar 2011 13:54:08 -0400
Message-ID: <20110314175408.GE31120@redhat.com>
In-Reply-To: <1299869011-26152-9-git-send-email-gthelen@google.com>

On Fri, Mar 11, 2011 at 10:43:30AM -0800, Greg Thelen wrote:
> If the current process is in a non-root memcg, then
> balance_dirty_pages() will consider the memcg dirty limits as well as
> the system-wide limits.  This allows different cgroups to have distinct
> dirty limits which trigger direct and background writeback at different
> levels.
> 
> If called with a mem_cgroup, then throttle_vm_writeout() queries the
> given cgroup for its dirty memory usage limits.
> 
> Signed-off-by: Andrea Righi <arighi@develer.com>
> Signed-off-by: Greg Thelen <gthelen@google.com>
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Acked-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
> Changelog since v5:
> - Simplified this change by using mem_cgroup_balance_dirty_pages() rather than
>   cramming the somewhat different logic into balance_dirty_pages().  This means
>   the global (non-memcg) dirty limits are not passed around in the
>   struct dirty_info, so there's less change to existing code.

Yes, there is less change to existing code, but now we also have separate
throttling logic for cgroups.

I thought we were moving in the direction of IO-less throttling,
where the bdi threads always do the IO. Jan Kara has also implemented
logic to distribute the completed IO pages uniformly across the waiting
threads.

Keeping it separate for cgroups reduces the complexity, but it also forks
the balancing logic between the root cgroup and other cgroups. So if Jan
Kara's changes go in, they will not automatically be used for memory cgroups.

I am not sure how good an idea it is to use separate throttling logic
for non-root cgroups.
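
To make the concern concrete, here is a rough user-space sketch of the
call structure the hunk in balance_dirty_pages() creates. It is not kernel
code. The struct, the enum and the simple "over threshold" test are
hypothetical stand-ins; in particular, the internals of
mem_cgroup_balance_dirty_pages() are not part of this patch, so the memcg
check below is only an assumption about its general shape.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of dirty usage against one limit. */
struct limit_state {
	unsigned long nr_dirty;
	unsigned long dirty_thresh;
};

/* Bitmask values naming the throttling loop(s) a task would enter. */
enum throttle_path {
	PATH_NONE   = 0,
	PATH_MEMCG  = 1,	/* models mem_cgroup_balance_dirty_pages() */
	PATH_GLOBAL = 2		/* models the existing for (;;) loop */
};

/*
 * Model of the v6 structure: the memcg check runs first as its own
 * path, then the unchanged global loop runs.  memcg == NULL models a
 * task in the root cgroup.  Assumption: "over the limit" is reduced
 * to a plain comparison here.
 */
static unsigned int throttle_paths(const struct limit_state *global,
				   const struct limit_state *memcg)
{
	unsigned int paths = PATH_NONE;

	/* Path 1: separate memcg throttling, skipped for the root cgroup. */
	if (memcg != NULL && memcg->nr_dirty > memcg->dirty_thresh)
		paths |= PATH_MEMCG;

	/* Path 2: global throttling, the only place a rework of the
	 * global loop would land. */
	if (global->nr_dirty > global->dirty_thresh)
		paths |= PATH_GLOBAL;

	return paths;
}
```

A task that is only over its memcg limit gets PATH_MEMCG and never reaches
the global loop's throttling, so an improvement made only inside path 2
would not apply to it.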

Thanks
Vivek 

> 
> Changelog since v4:
> - Added missing 'struct mem_cgroup' forward declaration in writeback.h.
> - Made throttle_vm_writeout() memcg aware.
> - Removed previously added dirty_writeback_pages() which is no longer needed.
> - Added logic to balance_dirty_pages() to throttle if over foreground memcg
>   limit.
> 
> Changelog since v3:
> - Leave determine_dirtyable_memory() static.  v3 made it non-static.
> - balance_dirty_pages() now considers both system and memcg dirty limits and
>   usage data.  This data is retrieved with global_dirty_info() and
>   memcg_dirty_info().  
> 
>  include/linux/writeback.h |    3 ++-
>  mm/page-writeback.c       |   34 ++++++++++++++++++++++++++++------
>  mm/vmscan.c               |    2 +-
>  3 files changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 0ead399..a45d895 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -8,6 +8,7 @@
>  #include <linux/fs.h>
>  
>  struct backing_dev_info;
> +struct mem_cgroup;
>  
>  extern spinlock_t inode_lock;
>  
> @@ -92,7 +93,7 @@ void laptop_mode_timer_fn(unsigned long data);
>  #else
>  static inline void laptop_sync_completion(void) { }
>  #endif
> -void throttle_vm_writeout(gfp_t gfp_mask);
> +void throttle_vm_writeout(gfp_t gfp_mask, struct mem_cgroup *mem_cgroup);
>  
>  /* These are exported to sysctl. */
>  extern int dirty_background_ratio;
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index d8005b0..f6a8dd6 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -473,7 +473,8 @@ unsigned long bdi_dirty_limit(struct backing_dev_info *bdi, unsigned long dirty)
>   * data.  It looks at the number of dirty pages in the machine and will force
>   * the caller to perform writeback if the system is over `vm_dirty_ratio'.
>   * If we're over `background_thresh' then the writeback threads are woken to
> - * perform some writeout.
> + * perform some writeout.  The current task may have per-memcg dirty
> + * limits, which are also checked.
>   */
>  static void balance_dirty_pages(struct address_space *mapping,
>  				unsigned long write_chunk)
> @@ -488,6 +489,8 @@ static void balance_dirty_pages(struct address_space *mapping,
>  	bool dirty_exceeded = false;
>  	struct backing_dev_info *bdi = mapping->backing_dev_info;
>  
> +	mem_cgroup_balance_dirty_pages(mapping, write_chunk);
> +
 
>  	for (;;) {
>  		struct writeback_control wbc = {
>  			.sync_mode	= WB_SYNC_NONE,
> @@ -651,23 +654,42 @@ void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
>  }
>  EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr);
>  
> -void throttle_vm_writeout(gfp_t gfp_mask)
> +/*
> + * Throttle the current task if it is near dirty memory usage limits.  Both
> + * global dirty memory limits and (if @mem_cgroup is given) per-cgroup dirty
> + * memory limits are checked.
> + *
> + * If near limits, then wait for usage to drop.  Dirty usage should drop because
> + * dirty producers should have used balance_dirty_pages(), which would have
> + * scheduled writeback.
> + */
> +void throttle_vm_writeout(gfp_t gfp_mask, struct mem_cgroup *mem_cgroup)
>  {
>  	unsigned long background_thresh;
>  	unsigned long dirty_thresh;
> +	struct dirty_info memcg_info;
> +	bool do_memcg;
>  
>          for ( ; ; ) {
>  		global_dirty_limits(&background_thresh, &dirty_thresh);
> +		do_memcg = mem_cgroup && mem_cgroup_hierarchical_dirty_info(
> +			determine_dirtyable_memory(), true, mem_cgroup,
> +			&memcg_info);
>  
>                  /*
>                   * Boost the allowable dirty threshold a bit for page
>                   * allocators so they don't get DoS'ed by heavy writers
>                   */
>                  dirty_thresh += dirty_thresh / 10;      /* wheeee... */
> -
> -                if (global_page_state(NR_UNSTABLE_NFS) +
> -			global_page_state(NR_WRITEBACK) <= dirty_thresh)
> -                        	break;
> +		if (do_memcg)
> +			memcg_info.dirty_thresh += memcg_info.dirty_thresh / 10;
> +
> +		if ((global_page_state(NR_UNSTABLE_NFS) +
> +		     global_page_state(NR_WRITEBACK) <= dirty_thresh) &&
> +		    (!do_memcg ||
> +		     (memcg_info.nr_unstable_nfs +
> +		      memcg_info.nr_writeback <= memcg_info.dirty_thresh)))
> +			break;
>                  congestion_wait(BLK_RW_ASYNC, HZ/10);
>  
>  		/*
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 060e4c1..035d2ea 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1939,7 +1939,7 @@ restart:
>  					sc->nr_scanned - nr_scanned, sc))
>  		goto restart;
>  
> -	throttle_vm_writeout(sc->gfp_mask);
> +	throttle_vm_writeout(sc->gfp_mask, sc->mem_cgroup);
>  }
>  
>  /*
> -- 
> 1.7.3.1
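
For readers following the throttle_vm_writeout() hunk, the exit condition
of its loop can be modelled in user space roughly as below. This is a
sketch under stated assumptions, not the kernel implementation: struct
dirty_sample and the helper names are hypothetical, and a NULL memcg
pointer stands in for do_memcg being false (which in the patch also
happens when mem_cgroup_hierarchical_dirty_info() returns false).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the counters and threshold the patch reads. */
struct dirty_sample {
	unsigned long nr_unstable_nfs;
	unsigned long nr_writeback;
	unsigned long dirty_thresh;
};

/* Boost a threshold by 10%, as the patch does for page allocators. */
static unsigned long boosted(unsigned long thresh)
{
	return thresh + thresh / 10;
}

/* True when unstable NFS plus writeback pages fit under the boosted limit. */
static bool under_limit(const struct dirty_sample *s)
{
	return s->nr_unstable_nfs + s->nr_writeback <= boosted(s->dirty_thresh);
}

/*
 * Mirrors the break condition in the patch: the global check must pass,
 * and the memcg check must pass too when memcg data is available.
 * memcg == NULL models do_memcg == false.
 */
static bool may_stop_throttling(const struct dirty_sample *global,
				const struct dirty_sample *memcg)
{
	return under_limit(global) && (memcg == NULL || under_limit(memcg));
}
```

With a global sample of 100 writeback pages against a threshold of 100,
the boosted limit is 110 and the global check passes. A memcg sample of
6 + 6 pages against a threshold of 10 (boosted to 11) still keeps the
caller waiting.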


