linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] ksm changes
@ 2009-05-02 22:16 Izik Eidus
  2009-05-02 22:16 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Izik Eidus
  0 siblings, 1 reply; 30+ messages in thread
From: Izik Eidus @ 2009-05-02 22:16 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

Following patchs touch 4 diffrent areas inside ksm:

1) Patchs 1 - 3: Change the api to be more robust and make more sense.
                 This include:
                     * Limiting the number of memory regions user can
                       register inside ksm per file descriptor that
                       he open.

                     * Reject overlap memory addresses registrations.

                     * change KSM_REMOVE_MEMORY_REGION ioctl to make
                       more sense, untill this patchs user was able
                       to register servel memory regions per file
                       descriptor, but when he called to
                       KSM_REMOVE_MEMORY_REGION, he had no way to tell
                       what memory region he want to remove, as a
                       result each call to KSM_REMOVE_MEMORY_REGION
                       nuked all the regions inside the fd.
                       Now KSM_REMOVE_MEMORY_REGION is working on
                       specific addresses.

2) Patch 4: Use generic helper functions to deal with the vma prot.
            
3) Patch 5: Return ksm to be build on all archs (Now after patch 4,
            ksm shouldnt break any arch).

4) Patch 6: change the miscdevice minor number - lets wait to Alan
            saying he is happy with this change before we apply.

Thanks.

Izik Eidus (6):
  ksm: limiting the num of mem regions user can register per fd.
  ksm: dont allow overlap memory addresses registrations.
  ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  ksm: change the prot handling to use the generic helper functions
  ksm: build system make it compile for all archs
  ksm: use another miscdevice minor number.

 Documentation/devices.txt  |    1 +
 include/linux/miscdevice.h |    2 +-
 mm/Kconfig                 |    1 -
 mm/ksm.c                   |  112 +++++++++++++++++++++++++++++++++++++-------
 4 files changed, 96 insertions(+), 20 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd.
  2009-05-02 22:16 [PATCH 0/6] ksm changes Izik Eidus
@ 2009-05-02 22:16 ` Izik Eidus
  2009-05-02 22:16   ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Izik Eidus
                     ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Izik Eidus @ 2009-05-02 22:16 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

Right now user can open /dev/ksm fd and register unlimited number of
regions, such behavior may allocate unlimited amount of kernel memory
and get the whole host into out of memory situation.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 6165276..d58db6b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -48,6 +48,9 @@ static int rmap_hash_size;
 module_param(rmap_hash_size, int, 0);
 MODULE_PARM_DESC(rmap_hash_size, "Hash table size for the reverse mapping");
 
+static int regions_per_fd;
+module_param(regions_per_fd, int, 0);
+
 /*
  * ksm_mem_slot - hold information for an userspace scanning range
  * (the scanning for this region will be from addr untill addr +
@@ -67,6 +70,7 @@ struct ksm_mem_slot {
  */
 struct ksm_sma {
 	struct list_head sma_slots;
+	int nregions;
 };
 
 /**
@@ -453,6 +457,11 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 	struct ksm_mem_slot *slot;
 	int ret = -EPERM;
 
+	if ((ksm_sma->nregions + 1) > regions_per_fd) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
 	if (!slot) {
 		ret = -ENOMEM;
@@ -473,6 +482,7 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 
 	list_add_tail(&slot->link, &slots);
 	list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
+	ksm_sma->nregions++;
 
 	up_write(&slots_lock);
 	return 0;
@@ -511,6 +521,7 @@ static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma)
 		mmput(slot->mm);
 		list_del(&slot->sma_link);
 		kfree(slot);
+		ksm_sma->nregions--;
 	}
 	up_write(&slots_lock);
 	return 0;
@@ -1389,6 +1400,7 @@ static int ksm_dev_ioctl_create_shared_memory_area(void)
 	}
 
 	INIT_LIST_HEAD(&ksm_sma->sma_slots);
+	ksm_sma->nregions = 0;
 
 	fd = anon_inode_getfd("ksm-sma", &ksm_sma_fops, ksm_sma, 0);
 	if (fd < 0)
@@ -1631,6 +1643,9 @@ static int __init ksm_init(void)
 	if (r)
 		goto out_free1;
 
+	if (!regions_per_fd)
+		regions_per_fd = 1024;
+
 	ksm_thread = kthread_run(ksm_scan_thread, NULL, "kksmd");
 	if (IS_ERR(ksm_thread)) {
 		printk(KERN_ERR "ksm: creating kthread failed\n");
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-02 22:16 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Izik Eidus
@ 2009-05-02 22:16   ` Izik Eidus
  2009-05-02 22:16     ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Izik Eidus
  2009-05-03  2:08   ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Rik van Riel
  2009-05-03  2:08   ` Rik van Riel
  2 siblings, 1 reply; 30+ messages in thread
From: Izik Eidus @ 2009-05-02 22:16 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

subjects say it all.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index d58db6b..982dfff 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -451,21 +451,71 @@ static void remove_page_from_tree(struct mm_struct *mm,
 	remove_rmap_item_from_tree(rmap_item);
 }
 
+static inline int is_intersecting_address(unsigned long addr,
+					  unsigned long begin,
+					  unsigned long end)
+{
+	if (addr >= begin && addr < end)
+		return 1;
+	return 0;
+}
+
+/*
+ * is_overlap_mem - check if there is overlapping with memory that was already
+ * registred.
+ *
+ * note - this function must to be called under slots_lock
+ */
+static int is_overlap_mem(struct ksm_memory_region *mem)
+{
+	struct ksm_mem_slot *slot;
+
+	list_for_each_entry(slot, &slots, link) {
+		unsigned long mem_end;
+		unsigned long slot_end;
+
+		cond_resched();
+
+		if (current->mm != slot->mm)
+			continue;
+
+		mem_end = mem->addr + (unsigned long)mem->npages * PAGE_SIZE;
+		slot_end = slot->addr + (unsigned long)slot->npages * PAGE_SIZE;
+
+		if (is_intersecting_address(mem->addr, slot->addr, slot_end) ||
+		    is_intersecting_address(mem_end - 1, slot->addr, slot_end))
+			return 1;
+		if (is_intersecting_address(slot->addr, mem->addr, mem_end) ||
+		    is_intersecting_address(slot_end - 1, mem->addr, mem_end))
+			return 1;
+	}
+
+	return 0;
+}
+
 static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 						struct ksm_memory_region *mem)
 {
 	struct ksm_mem_slot *slot;
 	int ret = -EPERM;
 
+	if (!mem->npages)
+		goto out;
+
+	down_write(&slots_lock);
+
 	if ((ksm_sma->nregions + 1) > regions_per_fd) {
 		ret = -EBUSY;
-		goto out;
+		goto out_unlock;
 	}
 
+	if (is_overlap_mem(mem))
+		goto out_unlock;
+
 	slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
 	if (!slot) {
 		ret = -ENOMEM;
-		goto out;
+		goto out_unlock;
 	}
 
 	/*
@@ -478,8 +528,6 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 	slot->addr = mem->addr;
 	slot->npages = mem->npages;
 
-	down_write(&slots_lock);
-
 	list_add_tail(&slot->link, &slots);
 	list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
 	ksm_sma->nregions++;
@@ -489,6 +537,8 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 
 out_free:
 	kfree(slot);
+out_unlock:
+	up_write(&slots_lock);
 out:
 	return ret;
 }
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-02 22:16   ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Izik Eidus
@ 2009-05-02 22:16     ` Izik Eidus
  2009-05-02 22:16       ` [PATCH 4/6] ksm: change the prot handling to use the generic helper functions Izik Eidus
  2009-05-04 19:43       ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Hugh Dickins
  0 siblings, 2 replies; 30+ messages in thread
From: Izik Eidus @ 2009-05-02 22:16 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

This patch change the KSM_REMOVE_MEMORY_REGION ioctl to be specific per
memory region (instead of flushing all the registred memory regions inside
the file descriptor like it happen now)

The previoes api was:
user register memory regions using KSM_REGISTER_MEMORY_REGION inside the fd,
and then when he wanted to remove just one memory region, he had to remove them
all using KSM_REMOVE_MEMORY_REGION.

This patch change this beahivor by chaning the KSM_REMOVE_MEMORY_REGION
ioctl to recive another paramter that it is the begining of the virtual
address that is wanted to be removed.

(user can still remove all the memory regions all at once, by just closing
the file descriptor)

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   31 +++++++++++++++++++++----------
 1 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 982dfff..c14019f 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -561,17 +561,20 @@ static void remove_mm_from_hash_and_tree(struct mm_struct *mm)
 	list_del(&slot->link);
 }
 
-static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma)
+static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma,
+					      unsigned long addr)
 {
 	struct ksm_mem_slot *slot, *node;
 
 	down_write(&slots_lock);
 	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
-		remove_mm_from_hash_and_tree(slot->mm);
-		mmput(slot->mm);
-		list_del(&slot->sma_link);
-		kfree(slot);
-		ksm_sma->nregions--;
+		if (addr == slot->addr) {
+			remove_mm_from_hash_and_tree(slot->mm);
+			mmput(slot->mm);
+			list_del(&slot->sma_link);
+			kfree(slot);
+			ksm_sma->nregions--;
+		}
 	}
 	up_write(&slots_lock);
 	return 0;
@@ -579,12 +582,20 @@ static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma)
 
 static int ksm_sma_release(struct inode *inode, struct file *filp)
 {
+	struct ksm_mem_slot *slot, *node;
 	struct ksm_sma *ksm_sma = filp->private_data;
-	int r;
 
-	r = ksm_sma_ioctl_remove_memory_region(ksm_sma);
+	down_write(&slots_lock);
+	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
+		remove_mm_from_hash_and_tree(slot->mm);
+		mmput(slot->mm);
+		list_del(&slot->sma_link);
+		kfree(slot);
+	}
+	up_write(&slots_lock);
+
 	kfree(ksm_sma);
-	return r;
+	return 0;
 }
 
 static long ksm_sma_ioctl(struct file *filp,
@@ -607,7 +618,7 @@ static long ksm_sma_ioctl(struct file *filp,
 		break;
 	}
 	case KSM_REMOVE_MEMORY_REGION:
-		r = ksm_sma_ioctl_remove_memory_region(sma);
+		r = ksm_sma_ioctl_remove_memory_region(sma, arg);
 		break;
 	}
 
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/6] ksm: change the prot handling to use the generic helper functions
  2009-05-02 22:16     ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Izik Eidus
@ 2009-05-02 22:16       ` Izik Eidus
  2009-05-02 22:16         ` [PATCH 5/6] ksm: build system make it compile for all archs Izik Eidus
  2009-05-04 19:43       ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Hugh Dickins
  1 sibling, 1 reply; 30+ messages in thread
From: Izik Eidus @ 2009-05-02 22:16 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

This is needed to avoid breaking some architectures.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index c14019f..bfbbe1d 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -766,8 +766,8 @@ static int try_to_merge_two_pages_alloc(struct mm_struct *mm1,
 		up_read(&mm1->mmap_sem);
 		return ret;
 	}
-	prot = vma->vm_page_prot;
-	pgprot_val(prot) &= ~_PAGE_RW;
+
+	prot = vm_get_page_prot(vma->vm_flags & ~VM_WRITE);
 
 	copy_user_highpage(kpage, page1, addr1, vma);
 	ret = try_to_merge_one_page(mm1, vma, page1, kpage, prot);
@@ -784,8 +784,7 @@ static int try_to_merge_two_pages_alloc(struct mm_struct *mm1,
 			return ret;
 		}
 
-		prot = vma->vm_page_prot;
-		pgprot_val(prot) &= ~_PAGE_RW;
+		prot = vm_get_page_prot(vma->vm_flags & ~VM_WRITE);
 
 		ret = try_to_merge_one_page(mm2, vma, page2, kpage,
 					    prot);
@@ -831,8 +830,9 @@ static int try_to_merge_two_pages_noalloc(struct mm_struct *mm1,
 		up_read(&mm1->mmap_sem);
 		return ret;
 	}
-	prot = vma->vm_page_prot;
-	pgprot_val(prot) &= ~_PAGE_RW;
+
+	prot = vm_get_page_prot(vma->vm_flags & ~VM_WRITE);
+
 	ret = try_to_merge_one_page(mm1, vma, page1, page2, prot);
 	up_read(&mm1->mmap_sem);
 	if (!ret)
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/6] ksm: build system make it compile for all archs
  2009-05-02 22:16       ` [PATCH 4/6] ksm: change the prot handling to use the generic helper functions Izik Eidus
@ 2009-05-02 22:16         ` Izik Eidus
  2009-05-02 22:16           ` [PATCH 6/6] ksm: use another miscdevice minor number Izik Eidus
  0 siblings, 1 reply; 30+ messages in thread
From: Izik Eidus @ 2009-05-02 22:16 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

The known issues with cross platform support were fixed,
so we return it back to compile on all archs.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/Kconfig |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index f59b1e4..fb8ac63 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -228,7 +228,6 @@ config MMU_NOTIFIER
 
 config KSM
 	tristate "Enable KSM for page sharing"
-	depends on X86
 	help
 	  Enable the KSM kernel module to allow page sharing of equal pages
 	  among different tasks.
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/6] ksm: use another miscdevice minor number.
  2009-05-02 22:16         ` [PATCH 5/6] ksm: build system make it compile for all archs Izik Eidus
@ 2009-05-02 22:16           ` Izik Eidus
  0 siblings, 0 replies; 30+ messages in thread
From: Izik Eidus @ 2009-05-02 22:16 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

The old number was registered already by another project.
The new number is #234.

Thanks.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 Documentation/devices.txt  |    1 +
 include/linux/miscdevice.h |    2 +-
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/Documentation/devices.txt b/Documentation/devices.txt
index 53d64d3..a0c3259 100644
--- a/Documentation/devices.txt
+++ b/Documentation/devices.txt
@@ -443,6 +443,7 @@ Your cooperation is appreciated.
 		231 = /dev/snapshot	System memory snapshot device
 		232 = /dev/kvm		Kernel-based virtual machine (hardware virtualization extensions)
 		233 = /dev/kmview	View-OS A process with a view
+		234 = /dev/ksm		Dynamic page sharing
 		240-254			Reserved for local use
 		255			Reserved for MISC_DYNAMIC_MINOR
 
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index 297c0bb..c7b8e9b 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -30,7 +30,7 @@
 #define HPET_MINOR		228
 #define FUSE_MINOR		229
 #define KVM_MINOR		232
-#define KSM_MINOR		233
+#define KSM_MINOR		234
 #define MISC_DYNAMIC_MINOR	255
 
 struct device;
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd.
  2009-05-02 22:16 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Izik Eidus
  2009-05-02 22:16   ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Izik Eidus
@ 2009-05-03  2:08   ` Rik van Riel
  2009-05-03  9:06     ` Izik Eidus
  2009-05-03  2:08   ` Rik van Riel
  2 siblings, 1 reply; 30+ messages in thread
From: Rik van Riel @ 2009-05-03  2:08 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

On Sun,  3 May 2009 01:16:07 +0300
Izik Eidus <ieidus@redhat.com> wrote:

> Right now user can open /dev/ksm fd and register unlimited number of
> regions, such behavior may allocate unlimited amount of kernel memory
> and get the whole host into out of memory situation.

How many times can a process open /dev/ksm?

If a process can open /dev/ksm a thousand times and then
register 1000 regions through each file descriptor, this
patch does not help all that much...

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd.
  2009-05-02 22:16 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Izik Eidus
  2009-05-02 22:16   ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Izik Eidus
  2009-05-03  2:08   ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Rik van Riel
@ 2009-05-03  2:08   ` Rik van Riel
  2 siblings, 0 replies; 30+ messages in thread
From: Rik van Riel @ 2009-05-03  2:08 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

On Sun,  3 May 2009 01:16:07 +0300
Izik Eidus <ieidus@redhat.com> wrote:

> Right now user can open /dev/ksm fd and register unlimited number of
> regions, such behavior may allocate unlimited amount of kernel memory
> and get the whole host into out of memory situation.
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> ---
>  mm/ksm.c |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 6165276..d58db6b 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -48,6 +48,9 @@ static int rmap_hash_size;
>  module_param(rmap_hash_size, int, 0);
>  MODULE_PARM_DESC(rmap_hash_size, "Hash table size for the reverse
> mapping"); 
> +static int regions_per_fd;
> +module_param(regions_per_fd, int, 0);
> +
>  /*
>   * ksm_mem_slot - hold information for an userspace scanning range
>   * (the scanning for this region will be from addr untill addr +
> @@ -67,6 +70,7 @@ struct ksm_mem_slot {
>   */
>  struct ksm_sma {
>  	struct list_head sma_slots;
> +	int nregions;
>  };
>  
>  /**
> @@ -453,6 +457,11 @@ static int
> ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma, struct
> ksm_mem_slot *slot; int ret = -EPERM;
>  
> +	if ((ksm_sma->nregions + 1) > regions_per_fd) {
> +		ret = -EBUSY;
> +		goto out;
> +	}
> +
>  	slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
>  	if (!slot) {
>  		ret = -ENOMEM;
> @@ -473,6 +482,7 @@ static int
> ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma, 
>  	list_add_tail(&slot->link, &slots);
>  	list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
> +	ksm_sma->nregions++;
>  
>  	up_write(&slots_lock);
>  	return 0;
> @@ -511,6 +521,7 @@ static int
> ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma)
> mmput(slot->mm); list_del(&slot->sma_link);
>  		kfree(slot);
> +		ksm_sma->nregions--;
>  	}
>  	up_write(&slots_lock);
>  	return 0;
> @@ -1389,6 +1400,7 @@ static int
> ksm_dev_ioctl_create_shared_memory_area(void) }
>  
>  	INIT_LIST_HEAD(&ksm_sma->sma_slots);
> +	ksm_sma->nregions = 0;
>  
>  	fd = anon_inode_getfd("ksm-sma", &ksm_sma_fops, ksm_sma, 0);
>  	if (fd < 0)
> @@ -1631,6 +1643,9 @@ static int __init ksm_init(void)
>  	if (r)
>  		goto out_free1;
>  
> +	if (!regions_per_fd)
> +		regions_per_fd = 1024;
> +
>  	ksm_thread = kthread_run(ksm_scan_thread, NULL, "kksmd");
>  	if (IS_ERR(ksm_thread)) {
>  		printk(KERN_ERR "ksm: creating kthread failed\n");


-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd.
  2009-05-03  2:08   ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Rik van Riel
@ 2009-05-03  9:06     ` Izik Eidus
  0 siblings, 0 replies; 30+ messages in thread
From: Izik Eidus @ 2009-05-03  9:06 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Rik van Riel wrote:
> On Sun,  3 May 2009 01:16:07 +0300
> Izik Eidus <ieidus@redhat.com> wrote:
>
>   
>> Right now user can open /dev/ksm fd and register unlimited number of
>> regions, such behavior may allocate unlimited amount of kernel memory
>> and get the whole host into out of memory situation.
>>     
>
> How many times can a process open /dev/ksm?
>
> If a process can open /dev/ksm a thousand times and then
> register 1000 regions through each file descriptor, this
> patch does not help all that much...
>
>   
The idea is that the limitation is now on the maximum file descriptors 
user can open.
So for each such file descriptor user can open 1024 structures that are 
just few bytes each.

The whole propose of this patch is to avoid while (1) { 
IOCTL(REGISTER_MEMORY_REGION) } and oom the host.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-02 22:16     ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Izik Eidus
  2009-05-02 22:16       ` [PATCH 4/6] ksm: change the prot handling to use the generic helper functions Izik Eidus
@ 2009-05-04 19:43       ` Hugh Dickins
  2009-05-04 20:37         ` Izik Eidus
  1 sibling, 1 reply; 30+ messages in thread
From: Hugh Dickins @ 2009-05-04 19:43 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	nickpiggin

On Sun, 3 May 2009, Izik Eidus wrote:

> This patch change the KSM_REMOVE_MEMORY_REGION ioctl to be specific per
> memory region (instead of flushing all the registred memory regions inside
> the file descriptor like it happen now)
> 
> The previoes api was:
> user register memory regions using KSM_REGISTER_MEMORY_REGION inside the fd,
> and then when he wanted to remove just one memory region, he had to remove them
> all using KSM_REMOVE_MEMORY_REGION.
> 
> This patch change this beahivor by chaning the KSM_REMOVE_MEMORY_REGION
> ioctl to recive another paramter that it is the begining of the virtual
> address that is wanted to be removed.
> 
> (user can still remove all the memory regions all at once, by just closing
> the file descriptor)
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>

I realize that it's ridiculous to break my silence with a comment
on this particular patch, when I've not yet commented on KSM as a
whole.  (In the last few days I have at last managed to set aside
some time to give KSM the attention it deserves, but I'm still
not yet through and ready to comment.)

However, although this patch is on the right lines (certainly you
should be allowing to remove individual regions rather than just
all at once), I believe the patch is seriously broken and corrupting
as is, so thought I'd better speak up now.

remove_mm_from_hash_and_tree(slot->mm) is still doing its own
silly loop through the slots:
	list_for_each_entry(slot, &slots, link)
		if (slot->mm == mm)
			break;
So it will be operating on whatever it finds first, in general
the wrong slot, and I expect havoc to follow once you kfree(slot).

Easily fixed: replace remove_mm_from_hash_and_tree(mm)
by remove_slot_from_hash_and_tree(slot).

Hugh

> ---
>  mm/ksm.c |   31 +++++++++++++++++++++----------
>  1 files changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 982dfff..c14019f 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -561,17 +561,20 @@ static void remove_mm_from_hash_and_tree(struct mm_struct *mm)
>  	list_del(&slot->link);
>  }
>  
> -static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma)
> +static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma,
> +					      unsigned long addr)
>  {
>  	struct ksm_mem_slot *slot, *node;
>  
>  	down_write(&slots_lock);
>  	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
> -		remove_mm_from_hash_and_tree(slot->mm);
> -		mmput(slot->mm);
> -		list_del(&slot->sma_link);
> -		kfree(slot);
> -		ksm_sma->nregions--;
> +		if (addr == slot->addr) {
> +			remove_mm_from_hash_and_tree(slot->mm);
> +			mmput(slot->mm);
> +			list_del(&slot->sma_link);
> +			kfree(slot);
> +			ksm_sma->nregions--;
> +		}
>  	}
>  	up_write(&slots_lock);
>  	return 0;
> @@ -579,12 +582,20 @@ static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma)
>  
>  static int ksm_sma_release(struct inode *inode, struct file *filp)
>  {
> +	struct ksm_mem_slot *slot, *node;
>  	struct ksm_sma *ksm_sma = filp->private_data;
> -	int r;
>  
> -	r = ksm_sma_ioctl_remove_memory_region(ksm_sma);
> +	down_write(&slots_lock);
> +	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
> +		remove_mm_from_hash_and_tree(slot->mm);
> +		mmput(slot->mm);
> +		list_del(&slot->sma_link);
> +		kfree(slot);
> +	}
> +	up_write(&slots_lock);
> +
>  	kfree(ksm_sma);
> -	return r;
> +	return 0;
>  }
>  
>  static long ksm_sma_ioctl(struct file *filp,
> @@ -607,7 +618,7 @@ static long ksm_sma_ioctl(struct file *filp,
>  		break;
>  	}
>  	case KSM_REMOVE_MEMORY_REGION:
> -		r = ksm_sma_ioctl_remove_memory_region(sma);
> +		r = ksm_sma_ioctl_remove_memory_region(sma, arg);
>  		break;
>  	}
>  
> -- 
> 1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-04 19:43       ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Hugh Dickins
@ 2009-05-04 20:37         ` Izik Eidus
  0 siblings, 0 replies; 30+ messages in thread
From: Izik Eidus @ 2009-05-04 20:37 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	nickpiggin

Hugh Dickins wrote:
> On Sun, 3 May 2009, Izik Eidus wrote:
>
>   
>> This patch change the KSM_REMOVE_MEMORY_REGION ioctl to be specific per
>> memory region (instead of flushing all the registred memory regions inside
>> the file descriptor like it happen now)
>>
>> The previoes api was:
>> user register memory regions using KSM_REGISTER_MEMORY_REGION inside the fd,
>> and then when he wanted to remove just one memory region, he had to remove them
>> all using KSM_REMOVE_MEMORY_REGION.
>>
>> This patch change this beahivor by chaning the KSM_REMOVE_MEMORY_REGION
>> ioctl to recive another paramter that it is the begining of the virtual
>> address that is wanted to be removed.
>>
>> (user can still remove all the memory regions all at once, by just closing
>> the file descriptor)
>>
>> Signed-off-by: Izik Eidus <ieidus@redhat.com>
>>     
>
> I realize that it's ridiculous to break my silence

Hello ! :)

>  with a comment
> on this particular patch, when I've not yet commented on KSM as a
> whole.  (In the last few days I have at last managed to set aside
> some time to give KSM the attention it deserves, but I'm still
> not yet through and ready to comment.)
>
> However, although this patch is on the right lines (certainly you
> should be allowing to remove individual regions rather than just
> all at once), I believe the patch is seriously broken and corrupting
> as is, so thought I'd better speak up now.
>
> remove_mm_from_hash_and_tree(slot->mm) is still doing its own
> silly loop through the slots:
> 	list_for_each_entry(slot, &slots, link)
> 		if (slot->mm == mm)
> 			break;
> So it will be operating on whatever it finds first

I just started to write big answer that go over the code path to show 
why you are wrong, and then found the problem.

Thanks i will fix it and resend...


> , in general
> the wrong slot, and I expect havoc to follow once you kfree(slot).
>
> Easily fixed: replace remove_mm_from_hash_and_tree(mm)
> by remove_slot_from_hash_and_tree(slot).
>   

Yea, remove_mm_from_hash_and_tree(mm) is surely something that we dont 
need that was left from the old code base.

Thanks.
> Hugh
>
>   
>> ---
>>  mm/ksm.c |   31 +++++++++++++++++++++----------
>>  1 files changed, 21 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index 982dfff..c14019f 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -561,17 +561,20 @@ static void remove_mm_from_hash_and_tree(struct mm_struct *mm)
>>  	list_del(&slot->link);
>>  }
>>  
>> -static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma)
>> +static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma,
>> +					      unsigned long addr)
>>  {
>>  	struct ksm_mem_slot *slot, *node;
>>  
>>  	down_write(&slots_lock);
>>  	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
>> -		remove_mm_from_hash_and_tree(slot->mm);
>> -		mmput(slot->mm);
>> -		list_del(&slot->sma_link);
>> -		kfree(slot);
>> -		ksm_sma->nregions--;
>> +		if (addr == slot->addr) {
>> +			remove_mm_from_hash_and_tree(slot->mm);
>> +			mmput(slot->mm);
>> +			list_del(&slot->sma_link);
>> +			kfree(slot);
>> +			ksm_sma->nregions--;
>> +		}
>>  	}
>>  	up_write(&slots_lock);
>>  	return 0;
>> @@ -579,12 +582,20 @@ static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma)
>>  
>>  static int ksm_sma_release(struct inode *inode, struct file *filp)
>>  {
>> +	struct ksm_mem_slot *slot, *node;
>>  	struct ksm_sma *ksm_sma = filp->private_data;
>> -	int r;
>>  
>> -	r = ksm_sma_ioctl_remove_memory_region(ksm_sma);
>> +	down_write(&slots_lock);
>> +	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
>> +		remove_mm_from_hash_and_tree(slot->mm);
>> +		mmput(slot->mm);
>> +		list_del(&slot->sma_link);
>> +		kfree(slot);
>> +	}
>> +	up_write(&slots_lock);
>> +
>>  	kfree(ksm_sma);
>> -	return r;
>> +	return 0;
>>  }
>>  
>>  static long ksm_sma_ioctl(struct file *filp,
>> @@ -607,7 +618,7 @@ static long ksm_sma_ioctl(struct file *filp,
>>  		break;
>>  	}
>>  	case KSM_REMOVE_MEMORY_REGION:
>> -		r = ksm_sma_ioctl_remove_memory_region(sma);
>> +		r = ksm_sma_ioctl_remove_memory_region(sma, arg);
>>  		break;
>>  	}
>>  
>> -- 
>> 1.5.6.5
>>     

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-04 22:25 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Izik Eidus
@ 2009-05-04 22:25   ` Izik Eidus
  2009-05-06  0:43     ` Rik van Riel
  0 siblings, 1 reply; 30+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

subjects say it all.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index d58db6b..982dfff 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -451,21 +451,71 @@ static void remove_page_from_tree(struct mm_struct *mm,
 	remove_rmap_item_from_tree(rmap_item);
 }
 
+static inline int is_intersecting_address(unsigned long addr,
+					  unsigned long begin,
+					  unsigned long end)
+{
+	if (addr >= begin && addr < end)
+		return 1;
+	return 0;
+}
+
+/*
+ * is_overlap_mem - check if there is overlapping with memory that was already
+ * registred.
+ *
+ * note - this function must to be called under slots_lock
+ */
+static int is_overlap_mem(struct ksm_memory_region *mem)
+{
+	struct ksm_mem_slot *slot;
+
+	list_for_each_entry(slot, &slots, link) {
+		unsigned long mem_end;
+		unsigned long slot_end;
+
+		cond_resched();
+
+		if (current->mm != slot->mm)
+			continue;
+
+		mem_end = mem->addr + (unsigned long)mem->npages * PAGE_SIZE;
+		slot_end = slot->addr + (unsigned long)slot->npages * PAGE_SIZE;
+
+		if (is_intersecting_address(mem->addr, slot->addr, slot_end) ||
+		    is_intersecting_address(mem_end - 1, slot->addr, slot_end))
+			return 1;
+		if (is_intersecting_address(slot->addr, mem->addr, mem_end) ||
+		    is_intersecting_address(slot_end - 1, mem->addr, mem_end))
+			return 1;
+	}
+
+	return 0;
+}
+
 static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 						struct ksm_memory_region *mem)
 {
 	struct ksm_mem_slot *slot;
 	int ret = -EPERM;
 
+	if (!mem->npages)
+		goto out;
+
+	down_write(&slots_lock);
+
 	if ((ksm_sma->nregions + 1) > regions_per_fd) {
 		ret = -EBUSY;
-		goto out;
+		goto out_unlock;
 	}
 
+	if (is_overlap_mem(mem))
+		goto out_unlock;
+
 	slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
 	if (!slot) {
 		ret = -ENOMEM;
-		goto out;
+		goto out_unlock;
 	}
 
 	/*
@@ -478,8 +528,6 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 	slot->addr = mem->addr;
 	slot->npages = mem->npages;
 
-	down_write(&slots_lock);
-
 	list_add_tail(&slot->link, &slots);
 	list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
 	ksm_sma->nregions++;
@@ -489,6 +537,8 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 
 out_free:
 	kfree(slot);
+out_unlock:
+	up_write(&slots_lock);
 out:
 	return ret;
 }
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-04 22:25   ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Izik Eidus
@ 2009-05-06  0:43     ` Rik van Riel
  2009-05-06  9:46       ` Izik Eidus
  0 siblings, 1 reply; 30+ messages in thread
From: Rik van Riel @ 2009-05-06  0:43 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> subjects say it all.

Not a very useful commit message.

This makes me wonder, though.

What happens if a user mmaps a 30MB memory region, registers it
with KSM and then unmaps the middle 10MB?

> Signed-off-by: Izik Eidus <ieidus@redhat.com>

except for the commit message, Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06  0:43     ` Rik van Riel
@ 2009-05-06  9:46       ` Izik Eidus
  2009-05-06 12:26         ` Rik van Riel
  0 siblings, 1 reply; 30+ messages in thread
From: Izik Eidus @ 2009-05-06  9:46 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Rik van Riel wrote:
> Izik Eidus wrote:
>> subjects say it all.
>
> Not a very useful commit message.
>
> This makes me wonder, though.
>
> What happens if a user mmaps a 30MB memory region, registers it
> with KSM and then unmaps the middle 10MB?

User cant break 30MB into smaller one.
That mean that when you regisiter memory region that is X mb size, you 
can only remove it (as a whole), or add new region.
This should answer the next question you had about why i have just the 
start address for removing the regions.

>
>> Signed-off-by: Izik Eidus <ieidus@redhat.com>
>
> except for the commit message, Acked-by: Rik van Riel <riel@redhat.com>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06  9:46       ` Izik Eidus
@ 2009-05-06 12:26         ` Rik van Riel
  2009-05-06 12:39           ` Izik Eidus
  2009-05-06 13:17           ` Andrea Arcangeli
  0 siblings, 2 replies; 30+ messages in thread
From: Rik van Riel @ 2009-05-06 12:26 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> Rik van Riel wrote:
>> Izik Eidus wrote:
>>> subjects say it all.
>>
>> Not a very useful commit message.
>>
>> This makes me wonder, though.
>>
>> What happens if a user mmaps a 30MB memory region, registers it
>> with KSM and then unmaps the middle 10MB?
> 
> User cant break 30MB into smaller one.

The user can break up the underlying VMAs though.

I am just wondering out loud if we really want two
VMA-like objects in the kernel, the VMA itself and
a separate KSM object, with different semantics.

Maybe this is fine, but I do think it's a question
that needs to be thought about.

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 12:26         ` Rik van Riel
@ 2009-05-06 12:39           ` Izik Eidus
  2009-05-06 13:17           ` Andrea Arcangeli
  1 sibling, 0 replies; 30+ messages in thread
From: Izik Eidus @ 2009-05-06 12:39 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Rik van Riel wrote:
> Izik Eidus wrote:
>> Rik van Riel wrote:
>>> Izik Eidus wrote:
>>>> subjects say it all.
>>>
>>> Not a very useful commit message.
>>>
>>> This makes me wonder, though.
>>>
>>> What happens if a user mmaps a 30MB memory region, registers it
>>> with KSM and then unmaps the middle 10MB?
>>
>> User cant break 30MB into smaller one.
>
> The user can break up the underlying VMAs though.

So? KSM work on contigiouns virtual address, if user will break its 
virtual address and will leave it to be registered inside ksm
get_user_pages() will just fail, and ksm will skip scanning this 
addresses...

Normal usage of ksm is:

1) Allocating big chunck of memory.

2) registering it inside ksm

3) free the memory and remove it from ksm...

>
> I am just wondering out loud if we really want two
> VMA-like objects in the kernel, the VMA itself and
> a separate KSM object, with different semantics. 
>
> Maybe this is fine, but I do think it's a question
> that needs to be thought about.


Yea, we had some talk about that issue, considering the fact that user 
register its memory using ioctl and not systemcall, and considering the 
fact that ksm is loadable module that the kernel doesnt depend on,

How would you prefer to see the interface?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 12:26         ` Rik van Riel
  2009-05-06 12:39           ` Izik Eidus
@ 2009-05-06 13:17           ` Andrea Arcangeli
  2009-05-06 13:28             ` Hugh Dickins
  1 sibling, 1 reply; 30+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 13:17 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Izik Eidus, akpm, linux-kernel, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

On Wed, May 06, 2009 at 08:26:18AM -0400, Rik van Riel wrote:
> The user can break up the underlying VMAs though.
>
> I am just wondering out loud if we really want two
> VMA-like objects in the kernel, the VMA itself and
> a separate KSM object, with different semantics.
>
> Maybe this is fine, but I do think it's a question
> that needs to be thought about.

If we want to keep KVM self contained we need a separate object. If we
want to merge part of KVM into the kernel VM core, then it can use the
vma and use madvise or better its own syscall (usually madvise doesn't
depend on admin starting kernel threads) or similar and the semantics
will change slightly. From a practical point of view I don't think
there's much difference and it can be done later if we change our
mind, given the low amount of apps that uses KVM (but for those few
apps like KVM, KSM can save tons of memory).

For example for the swapping of KSM pages we've been thinking of using
external rmap hooks to avoid the VM to know anything specific to KSM
pages but to still allow their unmapping and swap. Otherwise if there
are other modules like KVM that wants to extend the VM they'll also
have to add their own PG_ bitflags just for allow the swapping of
their own pages in the VM LRUs etc..

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 13:17           ` Andrea Arcangeli
@ 2009-05-06 13:28             ` Hugh Dickins
  2009-05-06 14:02               ` Izik Eidus
  2009-05-06 14:09               ` Andrea Arcangeli
  0 siblings, 2 replies; 30+ messages in thread
From: Hugh Dickins @ 2009-05-06 13:28 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Rik van Riel, Izik Eidus, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Andrea Arcangeli wrote:
> 
> For example for the swapping of KSM pages we've been thinking of using
> external rmap hooks to avoid the VM to know anything specific to KSM
> pages but to still allow their unmapping and swap.

There may prove to be various reasons why it wouldn't work out in practice;
but when thinking of swapping them, it is worth considering if those KSM
pages can just be assigned to a tmpfs file, then leave the swapping to that.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 13:28             ` Hugh Dickins
@ 2009-05-06 14:02               ` Izik Eidus
  2009-05-06 17:11                 ` Hugh Dickins
  2009-05-06 14:09               ` Andrea Arcangeli
  1 sibling, 1 reply; 30+ messages in thread
From: Izik Eidus @ 2009-05-06 14:02 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrea Arcangeli, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

Hugh Dickins wrote:
> On Wed, 6 May 2009, Andrea Arcangeli wrote:
>   
>> For example for the swapping of KSM pages we've been thinking of using
>> external rmap hooks to avoid the VM to know anything specific to KSM
>> pages but to still allow their unmapping and swap.
>>     
>
> There may prove to be various reasons why it wouldn't work out in practice;
> but when thinking of swapping them, it is worth considering if those KSM
> pages can just be assigned to a tmpfs file, then leave the swapping to that.
>
> Hugh
>   

The problem here (as i see it) is reverse mapping for this vmas that 
point into the shared page.
Right now linux use the ->index to find this pages and then unpresent 
them...
But even if we move into allocating them inside tmpfs, who will know how 
to unpresent the virtual addresses when we want to swap the page?

I had old patch that did that extrnal rmap thing, it added few callbacks 
inside rmap.c (you had AnonPage FilePage and ExtRmapPage) and if any 
driver wanted to mangae the reverse mapping by itself it would just 
marked the pages as ExtRmapPages and will then tell the 
try_to_unmap_one() the virtual address it need to unmap...

As far as i remember it required some small changes into memory.c

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 13:28             ` Hugh Dickins
  2009-05-06 14:02               ` Izik Eidus
@ 2009-05-06 14:09               ` Andrea Arcangeli
  2009-05-06 14:21                 ` Alan Cox
  1 sibling, 1 reply; 30+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 14:09 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Rik van Riel, Izik Eidus, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 02:28:33PM +0100, Hugh Dickins wrote:
> There may prove to be various reasons why it wouldn't work out in practice;
> but when thinking of swapping them, it is worth considering if those KSM
> pages can just be assigned to a tmpfs file, then leave the swapping to that.

Not sure if I understand but the vma handled by KSM is anonymous, how
can you assign those pages to a tmpfs file, the anon vma won't permit
that, all regular anon methods will be called for swapin etc... What I
mean is that some change in core VM looks required and I plan those to
be external-rmap kind, KSM agnostic. But perhaps we can reuse some
shmem code yes, I didn't think about that yet. Anyway I'd rather
discuss this later, this isn't the time yet. I'm quite optimistic that
to make KSM swap it won't be a big change. For now there's a limit on
the max number of ksm pages that can be allocated at any given time so
to avoid OOM conditions, like the swap-compress logic that limits the
swapdevice size to less than ram.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 14:09               ` Andrea Arcangeli
@ 2009-05-06 14:21                 ` Alan Cox
  2009-05-06 14:46                   ` Hugh Dickins
  0 siblings, 1 reply; 30+ messages in thread
From: Alan Cox @ 2009-05-06 14:21 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Rik van Riel, Izik Eidus, akpm, linux-kernel,
	chrisw, device, linux-mm, nickpiggin

> the max number of ksm pages that can be allocated at any given time so
> to avoid OOM conditions, like the swap-compress logic that limits the
> swapdevice size to less than ram.

Are those pages accounted for in the vm_overcommit logic, as if you
allocate a big chunk of memory as KSM will do you need the worst case
vm_overcommit behaviour preserved and that means keeping the stats
correct.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 14:21                 ` Alan Cox
@ 2009-05-06 14:46                   ` Hugh Dickins
  2009-05-06 14:56                     ` Andrea Arcangeli
  2009-05-06 14:57                     ` Izik Eidus
  0 siblings, 2 replies; 30+ messages in thread
From: Hugh Dickins @ 2009-05-06 14:46 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrea Arcangeli, Rik van Riel, Izik Eidus, akpm, linux-kernel,
	chrisw, device, linux-mm, nickpiggin

On Wed, 6 May 2009, Alan Cox wrote:
> > the max number of ksm pages that can be allocated at any given time so
> > to avoid OOM conditions, like the swap-compress logic that limits the
> > swapdevice size to less than ram.

(I don't know anything about that swap-compress logic and limitation.)

> 
> Are those pages accounted for in the vm_overcommit logic, as if you
> allocate a big chunk of memory as KSM will do you need the worst case
> vm_overcommit behaviour preserved and that means keeping the stats
> correct.

As I understand it, KSM won't affect the vm_overcommit behaviour at all.
Those pages Izik refers to are not allocated up front, they're just a
limit on the number of process pages which may get held in core at any
one time, through being shared via the KSM mechanism.

KSM is not evading vm_committed_space at all, not opening a backdoor
away from the ordinary mmaps: just collapsing duplicated pages in
what's been mapped in the usual way, down to single copies.

So the vm_commited_space accounting is exactly as before: it would
be a bit odd to be running KSM along with OVERCOMMIT_NEVER, but it
doesn't change its calculations at all - it will and will have to
be as pessimistic as it ever was.

The only difference would be in how much memory (mostly lowmem)
KSM's own data structures will take up - as usual, the kernel
data structures aren't being accounted, but do take up memory.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 14:46                   ` Hugh Dickins
@ 2009-05-06 14:56                     ` Andrea Arcangeli
  2009-05-06 23:55                       ` Minchan Kim
  2009-05-06 14:57                     ` Izik Eidus
  1 sibling, 1 reply; 30+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 14:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Cox, Rik van Riel, Izik Eidus, akpm, linux-kernel, chrisw,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 03:46:31PM +0100, Hugh Dickins wrote:
> As I understand it, KSM won't affect the vm_overcommit behaviour at all.

In short vm_overcommit is a virtual thing, KSM only makes virtual
takes less physical than before. One issue in KSM that was mentioned
was the cgroup accounting if you merge two pages in different groups
but that is kind of a corner case and it'll be handled "somehow" :)

> The only difference would be in how much memory (mostly lowmem)
> KSM's own data structures will take up - as usual, the kernel
> data structures aren't being accounted, but do take up memory.

Oh yeah, on 32bit systems that would be a problem... That lowmem is
taken for eacy virtual address scanned. One more reason to still allow
ksm to all users only selectively through chown/chmod with ioctl or
sysfs permissions with syscall/madvise. Luckily most systems where ksm
is used are 64bit. We don't plan to kmap_atomic around the
rmap_item/tree_item. No ram is allocated in the holes though, so if
there's not a real anonymous page allocated the rmap_item will not be
allocated either (without requiring pending update ;).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 14:46                   ` Hugh Dickins
  2009-05-06 14:56                     ` Andrea Arcangeli
@ 2009-05-06 14:57                     ` Izik Eidus
  1 sibling, 0 replies; 30+ messages in thread
From: Izik Eidus @ 2009-05-06 14:57 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Cox, Andrea Arcangeli, Rik van Riel, akpm, linux-kernel,
	chrisw, device, linux-mm, nickpiggin

Hugh Dickins wrote:
> On Wed, 6 May 2009, Alan Cox wrote:
>   
>>> the max number of ksm pages that can be allocated at any given time so
>>> to avoid OOM conditions, like the swap-compress logic that limits the
>>> swapdevice size to less than ram.
>>>       
>
> (I don't know anything about that swap-compress logic and limitation.)
>
>   
>> Are those pages accounted for in the vm_overcommit logic, as if you
>> allocate a big chunk of memory as KSM will do you need the worst case
>> vm_overcommit behaviour preserved and that means keeping the stats
>> correct.
>>     
>
> As I understand it, KSM won't affect the vm_overcommit behaviour at all.
> Those pages Izik refers to are not allocated up front, they're just a
> limit on the number of process pages which may get held in core at any
> one time, through being shared via the KSM mechanism.
>   

Exactly, this pages are not swappable (now), so we allow the sysadmin to 
control the maximum value of them.
> KSM is not evading vm_committed_space at all, not opening a backdoor
> away from the ordinary mmaps: just collapsing duplicated pages in
> what's been mapped in the usual way, down to single copies.
>
> So the vm_commited_space accounting is exactly as before: it would
> be a bit odd to be running KSM along with OVERCOMMIT_NEVER, but it
> doesn't change its calculations at all - it will and will have to
> be as pessimistic as it ever was.
>
> The only difference would be in how much memory (mostly lowmem)
> KSM's own data structures will take up - as usual, the kernel
> data structures aren't being accounted, but do take up memory.
>
> Hugh
>   

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 14:02               ` Izik Eidus
@ 2009-05-06 17:11                 ` Hugh Dickins
  0 siblings, 0 replies; 30+ messages in thread
From: Hugh Dickins @ 2009-05-06 17:11 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Andrea Arcangeli, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Izik Eidus wrote:
> Hugh Dickins wrote:
> >
> > There may prove to be various reasons why it wouldn't work out in practice;
> > but when thinking of swapping them, it is worth considering if those KSM
> > pages can just be assigned to a tmpfs file, then leave the swapping to that.
> 
> The problem here (as i see it) is reverse mapping for this vmas that point
> into the shared page.
> Right now linux use the ->index to find this pages and then unpresent them...
> But even if we move into allocating them inside tmpfs, who will know how to
> unpresent the virtual addresses when we want to swap the page?

Yes, you're right, tmpfs wouldn't be helping you at all with that problem,
so doubtful whether it has any help to offer here.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 14:56                     ` Andrea Arcangeli
@ 2009-05-06 23:55                       ` Minchan Kim
  2009-05-07  0:19                         ` Chris Wright
  2009-05-07 10:46                         ` Andrea Arcangeli
  0 siblings, 2 replies; 30+ messages in thread
From: Minchan Kim @ 2009-05-06 23:55 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Alan Cox, Rik van Riel, Izik Eidus, akpm,
	linux-kernel, chrisw, device, linux-mm, nickpiggin

Hi, Andrea.

On Wed, 6 May 2009 16:56:42 +0200
Andrea Arcangeli <aarcange@redhat.com> wrote:

> On Wed, May 06, 2009 at 03:46:31PM +0100, Hugh Dickins wrote:
> > As I understand it, KSM won't affect the vm_overcommit behaviour at all.
> 
> In short vm_overcommit is a virtual thing, KSM only makes virtual
> takes less physical than before. One issue in KSM that was mentioned
> was the cgroup accounting if you merge two pages in different groups
> but that is kind of a corner case and it'll be handled "somehow" :)
> 
> > The only difference would be in how much memory (mostly lowmem)
> > KSM's own data structures will take up - as usual, the kernel
> > data structures aren't being accounted, but do take up memory.
> 
> Oh yeah, on 32bit systems that would be a problem... That lowmem is
> taken for eacy virtual address scanned. One more reason to still allow
> ksm to all users only selectively through chown/chmod with ioctl or
> sysfs permissions with syscall/madvise. Luckily most systems where ksm
> is used are 64bit. We don't plan to kmap_atomic around the
> rmap_item/tree_item. No ram is allocated in the holes though, so if

Hmm. Don't you consider 32-bit system ?

In http://www.mail-archive.com/kvm@vger.kernel.org/msg13043.html, 
Jared siad, it's also good in embedded system. (but I don't know well his testing environement).
Many embedded system is so I/O bouneded that we can use much CPU time in there. 
I hope this feature will help saving memory in embedded system. 

One more thing about interface. 

Ksm map regions are dynamic characteritic ?
I mean sometime A application calls ioctl(0x800000, 0x10000) and sometime it calls ioctl(0xb7000000, 0x20000);
Of course, It depends on application's behavior. 

For using this feature now, we have to add ioctl and recompile applications.
It means we have to know application internal well and to need source code. 
It would prevent various experiements and easy use.

I want to use this feature without appliation internal knowledge easily. 
Maybe it can be useless without appliation behavior knowledge.
But it will help various application experiments without much knowledge of application and recompile. 

ex) echo 'pid 0x8050000 0x100000' > sysfs or procfs or cgroup. 

Personally, I support cgroup interface but don't have a good idea now. 
It can help fork-like application and we can group same address of KSM range among tasks. 

> there's not a real anonymous page allocated the rmap_item will not be
> allocated either (without requiring pending update ;).
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


-- 
Kinds Regards
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 23:55                       ` Minchan Kim
@ 2009-05-07  0:19                         ` Chris Wright
  2009-05-07 10:46                         ` Andrea Arcangeli
  1 sibling, 0 replies; 30+ messages in thread
From: Chris Wright @ 2009-05-07  0:19 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrea Arcangeli, Hugh Dickins, Alan Cox, Rik van Riel,
	Izik Eidus, akpm, linux-kernel, chrisw, device, linux-mm,
	nickpiggin

* Minchan Kim (minchan.kim@gmail.com) wrote:
> I want to use this feature without appliation internal knowledge easily. 
> Maybe it can be useless without appliation behavior knowledge.
> But it will help various application experiments without much knowledge of application and recompile. 
> 
> ex) echo 'pid 0x8050000 0x100000' > sysfs or procfs or cgroup. 

Yes, this is common request.  Izik made some changes to enable a pid
based registration, just not been cleaned up and made available yet.

thanks,
-chris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 23:55                       ` Minchan Kim
  2009-05-07  0:19                         ` Chris Wright
@ 2009-05-07 10:46                         ` Andrea Arcangeli
  2009-05-07 12:01                           ` Minchan Kim
  1 sibling, 1 reply; 30+ messages in thread
From: Andrea Arcangeli @ 2009-05-07 10:46 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Alan Cox, Rik van Riel, Izik Eidus, akpm,
	linux-kernel, chrisw, device, linux-mm, nickpiggin

On Thu, May 07, 2009 at 08:55:47AM +0900, Minchan Kim wrote:
> Hmm. Don't you consider 32-bit system ?

Sorry I was too short, don't worry, I meant hugemem 32bit systems,
like 32G. If there's not much highmem, no problem can ever
happen. Just like pagetables had to be moved to highmem on 32G 32bit
systems to make them workable, KSM on those systems may generate lots
of lowmem and triggering early OOM conditions when allocating inodes
or other slab objects etc... and we don't plan to move those
rmap_items that represents physical pages by the chain of the virtual
addresses that maps them in highmem.

> Many embedded system is so I/O bouneded that we can use much CPU time in there. 

Embedded systems with >4G of ram should run 64bit these days, so I
don't see a problem.

> I hope this feature will help saving memory in embedded system. 

It will (assuming that there are apps that are duplicating anonymous
memory of course ;).

> One more thing about interface. 
> 
> Ksm map regions are dynamic characteritic ?
> I mean sometime A application calls ioctl(0x800000, 0x10000) and sometime it calls ioctl(0xb7000000, 0x20000);
> Of course, It depends on application's behavior. 

Looks like the ioctl API is going away in favour of madvise so it'll
function like madvise, if you munmap the region the KSM registration
will go away.

> ex) echo 'pid 0x8050000 0x100000' > sysfs or procfs or cgroup. 

This was answered by Chris, and surely this is feasible, as it is
feasible for kksmd to scan the whole system regardless of any
madvise. Some sysfs mangling should allow it.

However regardless of the highmem issue (this applies to 64bit systems
too) you've to keep in mind that for kksmd to keep track all pages
under scan it has to build rbtree and allocate rmap_items and
tree_items for each page tracked, those objects take some memory, so
if there's not much ram sharing you may waste more memory in the kksmd
allocations than in the amount of memory actually freed by KSM. This
is why it's better to selectively only register ranges that we know in
advance there's an high probability to free memory.

Thanks!
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-07 10:46                         ` Andrea Arcangeli
@ 2009-05-07 12:01                           ` Minchan Kim
  0 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2009-05-07 12:01 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Minchan Kim, Hugh Dickins, Alan Cox, Rik van Riel, Izik Eidus,
	akpm, linux-kernel, chrisw, device, linux-mm, nickpiggin

>> Many embedded system is so I/O bouneded that we can use much CPU time in there.
>
> Embedded systems with >4G of ram should run 64bit these days, so I
> don't see a problem.

What I mean is that many embedded applications don't use so much cpu
time that we can use extra cpu time to scan identical pages for KSM.
:)

>> One more thing about interface.
>>
>> Ksm map regions are dynamic characteritic ?
>> I mean sometime A application calls ioctl(0x800000, 0x10000) and sometime it calls ioctl(0xb7000000, 0x20000);
>> Of course, It depends on application's behavior.
>
> Looks like the ioctl API is going away in favour of madvise so it'll
> function like madvise, if you munmap the region the KSM registration
> will go away.
>
>> ex) echo 'pid 0x8050000 0x100000' > sysfs or procfs or cgroup.
>
> This was answered by Chris, and surely this is feasible, as it is
> feasible for kksmd to scan the whole system regardless of any
> madvise. Some sysfs mangling should allow it.
>
> However regardless of the highmem issue (this applies to 64bit systems
> too) you've to keep in mind that for kksmd to keep track all pages
> under scan it has to build rbtree and allocate rmap_items and
> tree_items for each page tracked, those objects take some memory, so
> if there's not much ram sharing you may waste more memory in the kksmd
> allocations than in the amount of memory actually freed by KSM. This
> is why it's better to selectively only register ranges that we know in
> advance there's an high probability to free memory.

Indeed.

This interface can use for just simple test and profiling.
If it don't add memory pressure and latency, we can use it without
modifying source code.
Unfortunately, it's not in usual case. ;-)

So if KSM can provide profiling information, we can tune easily than now.

ex)
pfn : 0x12, shared [pid 103, vaddr 0x80010000] [pid 201, vaddr 0x800ac000] .....
pfn : 0x301, shared [pid 103, vaddr 0x80020000] [pid 203, vaddr
0x801ac000] .....
...
...

If KSM can provide this profiling information, firstly we try to use
ksm without madive and next we can add madvise call on most like
sharable vma range using profiling data.

> Thanks!
> Andrea
>



-- 
Thanks,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2009-05-07 12:01 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-02 22:16 [PATCH 0/6] ksm changes Izik Eidus
2009-05-02 22:16 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Izik Eidus
2009-05-02 22:16   ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Izik Eidus
2009-05-02 22:16     ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Izik Eidus
2009-05-02 22:16       ` [PATCH 4/6] ksm: change the prot handling to use the generic helper functions Izik Eidus
2009-05-02 22:16         ` [PATCH 5/6] ksm: build system make it compile for all archs Izik Eidus
2009-05-02 22:16           ` [PATCH 6/6] ksm: use another miscdevice minor number Izik Eidus
2009-05-04 19:43       ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Hugh Dickins
2009-05-04 20:37         ` Izik Eidus
2009-05-03  2:08   ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Rik van Riel
2009-05-03  9:06     ` Izik Eidus
2009-05-03  2:08   ` Rik van Riel
  -- strict thread matches above, loose matches on Subject: below --
2009-05-04 22:25 [PATCH 0/6] ksm changes (v2) Izik Eidus
2009-05-04 22:25 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Izik Eidus
2009-05-04 22:25   ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Izik Eidus
2009-05-06  0:43     ` Rik van Riel
2009-05-06  9:46       ` Izik Eidus
2009-05-06 12:26         ` Rik van Riel
2009-05-06 12:39           ` Izik Eidus
2009-05-06 13:17           ` Andrea Arcangeli
2009-05-06 13:28             ` Hugh Dickins
2009-05-06 14:02               ` Izik Eidus
2009-05-06 17:11                 ` Hugh Dickins
2009-05-06 14:09               ` Andrea Arcangeli
2009-05-06 14:21                 ` Alan Cox
2009-05-06 14:46                   ` Hugh Dickins
2009-05-06 14:56                     ` Andrea Arcangeli
2009-05-06 23:55                       ` Minchan Kim
2009-05-07  0:19                         ` Chris Wright
2009-05-07 10:46                         ` Andrea Arcangeli
2009-05-07 12:01                           ` Minchan Kim
2009-05-06 14:57                     ` Izik Eidus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).