[RFC PATCH V2] Enforce RSS+Swap rlimit

From: Jerome Marchand <jmarchan@redhat.com>
To: Balbir Singh <bsingharora@gmail.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org
Subject: [RFC PATCH V2] Enforce RSS+Swap rlimit
Date: Tue, 15 Nov 2011 14:10:02 +0100
Message-ID: <4EC264AA.30306@redhat.com>
In-Reply-To: <4EB3FA89.6090601@redhat.com>


Changes since V1: rebased on 3.2-rc1.

Currently, the RSS rlimit is not enforced. We cannot forbid a process from
exceeding its RSS limit while still allowing it to swap out: forcing it to swap
would hurt the performance of the whole system, even when memory resources are
plentiful.

Therefore, instead of enforcing a limit on RSS usage alone, this patch enforces
a limit on the RSS+swap value. This is similar to the memsw limit of the memory
cgroup. When a page fault would push a process's RSS+swap usage past the
RLIMIT_RSS hard (max) limit, the fault fails and the process receives a SIGBUS
signal.
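
The check the patch adds before each counter increment can be modelled in user space. The sketch below is a hypothetical illustration, not kernel code. struct mm_counters, memsw_pages() and fault_would_sigbus() are invented names. They stand in for the per-mm counters, get_mm_memsw() and the inline test in the fault handlers. The counters are in pages while rlim_max is in bytes, hence the shift. The comparison uses >= because it runs before the new page is counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>

/* Hypothetical user-space stand-ins for the kernel's per-mm counters.
 * All values are in pages. */
struct mm_counters {
	unsigned long filepages;  /* models MM_FILEPAGES */
	unsigned long anonpages;  /* models MM_ANONPAGES */
	unsigned long swapents;   /* models MM_SWAPENTS */
};

/* Models get_mm_memsw(): resident pages plus swap entries. */
static unsigned long memsw_pages(const struct mm_counters *c)
{
	return c->filepages + c->anonpages + c->swapents;
}

/*
 * Models the test the patch places before inc_mm_counter_fast():
 * the fault is refused if the usage is already at the hard limit,
 * because counting one more page would exceed it.  page_shift is a
 * parameter here instead of the kernel's PAGE_SHIFT constant.
 */
static bool fault_would_sigbus(const struct mm_counters *c,
			       rlim_t rlim_max, unsigned int page_shift)
{
	/* Guard against an undefined shift in this user-space model. */
	assert(page_shift < sizeof(rlim_t) * 8);
	return memsw_pages(c) >= (rlim_max >> page_shift);
}
```

With 32 pages accounted and a hard limit of exactly 32 pages, the next fault is refused. With RLIM_INFINITY, the shifted limit is so large that the check never fires, so processes without a limit are unaffected.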

My tests show that the code in do_anonymous_page() and __do_fault() does
prevent processes from getting more memory than the limit, and I haven't seen
any adverse effects. However, so far I have no test coverage of the code in
do_wp_page(), and I'm not sure how to test it.
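
One possible way to exercise the do_wp_page() hunk is sketched below. It is untested against this patch and rests on two assumptions:

- In 3.2, a read fault on private anonymous memory maps the shared zero page, for which vm_normal_page() returns NULL.
- Per the diff context, the branch of do_wp_page() with no old page is the only one the patch checks there.

Under those assumptions, writing to a page that was first read should reach that check. touch_pages() and set_rss_hard_limit() are hypothetical helpers. Run the test in a fresh or child process, because an unprivileged process cannot raise its hard limit again after lowering it.

```c
#define _GNU_SOURCE  /* MAP_ANONYMOUS, sigsetjmp() under strict modes */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

static sigjmp_buf sigbus_env;

static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(sigbus_env, 1);  /* abandon the faulting write */
}

/* Hypothetical helper: lower the RLIMIT_RSS hard limit to 'bytes'. */
static int set_rss_hard_limit(rlim_t bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_RSS, &rl) != 0)
		return -1;
	rl.rlim_max = bytes;
	if (rl.rlim_cur > bytes)
		rl.rlim_cur = bytes;  /* soft limit may not exceed hard */
	return setrlimit(RLIMIT_RSS, &rl);
}

/*
 * Hypothetical helper: write one byte to each of npages private
 * anonymous pages and return how many writes completed before a
 * SIGBUS (npages if none arrived).  With read_first set, every page
 * is read first, so the write is expected to be a zero-page COW
 * handled by do_wp_page() instead of do_anonymous_page().
 */
static size_t touch_pages(size_t npages, int read_first)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	struct sigaction sa, old;
	volatile size_t done = 0;
	volatile unsigned char sink = 0;
	unsigned char *buf;

	buf = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, &old);

	if (sigsetjmp(sigbus_env, 1) == 0) {
		if (read_first)
			for (size_t i = 0; i < npages; i++)
				sink += buf[i * page];  /* map the zero page */
		for (; done < npages; done++)
			buf[done * page] = 1;       /* may raise SIGBUS */
	}

	sigaction(SIGBUS, &old, NULL);
	munmap(buf, npages * page);
	return done;
}
```

On a patched kernel, one would set the hard limit some pages above the process's current RSS+swap (file pages such as libc count too). One would then call touch_pages(n, 1) with n well above that margin and expect a result smaller than n. On an unpatched kernel, all n writes complete because RLIMIT_RSS is ignored.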

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
---
 include/linux/mm.h |    7 +++++++
 mm/memory.c        |   21 +++++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3dc3a8c..3b54ff1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1092,6 +1092,13 @@ static inline unsigned long get_mm_rss(struct mm_struct *mm)
 		get_mm_counter(mm, MM_ANONPAGES);
 }
 
+static inline unsigned long get_mm_memsw(struct mm_struct *mm)
+{
+	return get_mm_counter(mm, MM_FILEPAGES) +
+		get_mm_counter(mm, MM_ANONPAGES) +
+		get_mm_counter(mm, MM_SWAPENTS);
+}
+
 static inline unsigned long get_mm_hiwater_rss(struct mm_struct *mm)
 {
 	return max(mm->hiwater_rss, get_mm_rss(mm));
diff --git a/mm/memory.c b/mm/memory.c
index 829d437..b0463c2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2661,8 +2661,14 @@ gotten:
 				dec_mm_counter_fast(mm, MM_FILEPAGES);
 				inc_mm_counter_fast(mm, MM_ANONPAGES);
 			}
-		} else
+		} else {
+			if (get_mm_memsw(mm) >=
+			    rlimit_max(RLIMIT_RSS) >> PAGE_SHIFT) {
+				ret |= VM_FAULT_SIGBUS;
+				goto release;
+			}
 			inc_mm_counter_fast(mm, MM_ANONPAGES);
+		}
 		flush_cache_page(vma, address, pte_pfn(orig_pte));
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
@@ -2713,6 +2719,7 @@ gotten:
 	} else
 		mem_cgroup_uncharge_page(new_page);
 
+release:
 	if (new_page)
 		page_cache_release(new_page);
 unlock:
@@ -3073,6 +3080,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct page *page;
 	spinlock_t *ptl;
 	pte_t entry;
+	int ret = 0;
 
 	pte_unmap(page_table);
 
@@ -3109,6 +3117,10 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (!pte_none(*page_table))
 		goto release;
 
+	if (get_mm_memsw(mm) >=  rlimit_max(RLIMIT_RSS) >> PAGE_SHIFT) {
+		ret = VM_FAULT_SIGBUS;
+		goto release;
+	}
 	inc_mm_counter_fast(mm, MM_ANONPAGES);
 	page_add_new_anon_rmap(page, vma, address);
 setpte:
@@ -3118,7 +3130,7 @@ setpte:
 	update_mmu_cache(vma, address, page_table);
 unlock:
 	pte_unmap_unlock(page_table, ptl);
-	return 0;
+	return ret;
 release:
 	mem_cgroup_uncharge_page(page);
 	page_cache_release(page);
@@ -3263,6 +3275,10 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		entry = mk_pte(page, vma->vm_page_prot);
 		if (flags & FAULT_FLAG_WRITE)
 			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		if (get_mm_memsw(mm) >=  rlimit_max(RLIMIT_RSS) >> PAGE_SHIFT) {
+			ret = VM_FAULT_SIGBUS;
+			goto unlock;
+		}
 		if (anon) {
 			inc_mm_counter_fast(mm, MM_ANONPAGES);
 			page_add_new_anon_rmap(page, vma, address);
@@ -3287,6 +3303,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			anon = 1; /* no anon but release faulted_page */
 	}
 
+unlock:
 	pte_unmap_unlock(page_table, ptl);
 
 	if (dirty_page) {

Thread overview:
2011-11-04 14:45 [RFC PATCH] Enforce RSS+Swap rlimit Jerome Marchand
2011-11-15 13:10 ` [RFC PATCH V2] Enforce RSS+Swap rlimit Jerome Marchand (this message)
2011-11-16  0:02   ` KOSAKI Motohiro
2011-11-16  9:40     ` Jerome Marchand
2011-11-16 10:09   ` Balbir Singh
