From: Andrea Arcangeli <aarcange@redhat.com>
To: Chris Wright <chrisw@sous-sol.org>
Cc: CAI Qian <caiqian@redhat.com>,
Andrea Righi <andrea@betterlinux.com>,
Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
Mel Gorman <mel@csn.ul.ie>, Izik Eidus <ieidus@redhat.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
Date: Thu, 2 Jun 2011 18:44:58 +0200 [thread overview]
Message-ID: <20110602164458.GG19505@random.random> (raw)
In-Reply-To: <20110602153641.GJ23047@sequoia.sous-sol.org>
On Thu, Jun 02, 2011 at 08:36:41AM -0700, Chris Wright wrote:
> * Andrea Arcangeli (aarcange@redhat.com) wrote:
> > On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> > > * CAI Qian (caiqian@redhat.com) wrote:
> > > > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > > > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > >
> > > Right, that's just what the program is trying to do, segfault.
> > >
> > > > +++ killed by SIGSEGV (core dumped) +++
> > > > Segmentation fault (core dumped)
> > > >
> > > > Did I miss anything?
> > >
> > > I found it works but not 100% of the time.
> > >
> > > So I just run the bug in a loop.
> >
> > echo 0 >scan_millisecs helps.
>
> BTW, here's my stack trace (I dropped back to 2.6.39 just to see if it
> happened to be recent regression). It looks like mm_slot is off the list:
>
> R10: dead000000200200 R11: dead000000100100
Yes it had to be use after free.
I cooked this patch, still untested but it builds. Will test it soon.
===
Subject: ksm: fix __ksm_exit vs ksm scan SMP race
From: Andrea Arcangeli <aarcange@redhat.com>
If the KSM scan releases the ksm_mmlist_lock after the mm_users already dropped
to zero but before __ksm_exit had a chance runs, both the KSM scan and
__ksm_exit will free the slot. This fixes the SMP race condition by using
test_and_bit_set in __ksm_exit to see if __ksm_exit arrived before the KSM
scan or not.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
diff --git a/mm/ksm.c b/mm/ksm.c
index d708b3e..47ef4c1 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -645,10 +645,16 @@ static int unmerge_and_remove_all_rmap_items(void)
if (ksm_test_exit(mm)) {
hlist_del(&mm_slot->link);
list_del(&mm_slot->mm_list);
+ /*
+ * After releasing ksm_mmlist_lock __ksm_exit
+ * can run and we already changed mm_slot, so
+ * notify it with MMF_VM_MERGEABLE not to free
+ * this again.
+ */
+ clear_bit(MMF_VM_MERGEABLE, &mm->flags);
spin_unlock(&ksm_mmlist_lock);
free_mm_slot(mm_slot);
- clear_bit(MMF_VM_MERGEABLE, &mm->flags);
up_read(&mm->mmap_sem);
mmdrop(mm);
} else {
@@ -1377,10 +1383,15 @@ next_mm:
*/
hlist_del(&slot->link);
list_del(&slot->mm_list);
+ /*
+ * After releasing ksm_mmlist_lock __ksm_exit can run
+ * and we already changed mm_slot, so notify it with
+ * MMF_VM_MERGEABLE not to free this again.
+ */
+ clear_bit(MMF_VM_MERGEABLE, &mm->flags);
spin_unlock(&ksm_mmlist_lock);
free_mm_slot(slot);
- clear_bit(MMF_VM_MERGEABLE, &mm->flags);
up_read(&mm->mmap_sem);
mmdrop(mm);
} else {
@@ -1463,6 +1474,11 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
VM_NONLINEAR | VM_MIXEDMAP | VM_SAO))
return 0; /* just ignore the advice */
+ /*
+ * It should be safe to test_bit instead of
+ * test_and_bit_set because the madvise generic caller
+ * holds the mmap_sem write mode.
+ */
if (!test_bit(MMF_VM_MERGEABLE, &mm->flags)) {
err = __ksm_enter(mm);
if (err)
@@ -1511,6 +1527,10 @@ int __ksm_enter(struct mm_struct *mm)
list_add_tail(&mm_slot->mm_list, &ksm_scan.mm_slot->mm_list);
spin_unlock(&ksm_mmlist_lock);
+ /*
+ * It should be safe to set it outside ksm_mmlist_lock because
+ * we hold a mm_user pin on the mm so __ksm_exit can't run.
+ */
set_bit(MMF_VM_MERGEABLE, &mm->flags);
atomic_inc(&mm->mm_count);
@@ -1538,9 +1558,28 @@ void __ksm_exit(struct mm_struct *mm)
mm_slot = get_mm_slot(mm);
if (mm_slot && ksm_scan.mm_slot != mm_slot) {
if (!mm_slot->rmap_list) {
- hlist_del(&mm_slot->link);
- list_del(&mm_slot->mm_list);
- easy_to_free = 1;
+ /*
+ * If MMF_VM_MERGEABLE isn't set it was freed
+ * by the scan immediately after mm_count
+ * reached zero (visible by the scan) but
+ * before __ksm_exit() run, so we don't need
+ * to do anything here. We don't even need to
+ * wait for the KSM scan to release the
+ * mmap_sem as it's not working on the mm
+ * anymore but it's just releasing it, and it
+ * probably already did and dropped its
+ * mm_count too (it would however be safe to
+ * take mmap_sem here even if MMF_VM_MERGEABLE
+ * is already clear, as the actual mm can't be
+ * freed until we return and we run mmdrop
+ * too, but it's unnecessary).
+ */
+ if (test_and_clear_bit(MMF_VM_MERGEABLE, &mm->flags)) {
+ hlist_del(&mm_slot->link);
+ list_del(&mm_slot->mm_list);
+ easy_to_free = 1;
+ } else
+ mm_slot = NULL;
} else {
list_move(&mm_slot->mm_list,
&ksm_scan.mm_slot->mm_list);
@@ -1550,7 +1589,6 @@ void __ksm_exit(struct mm_struct *mm)
if (easy_to_free) {
free_mm_slot(mm_slot);
- clear_bit(MMF_VM_MERGEABLE, &mm->flags);
mmdrop(mm);
} else if (mm_slot) {
down_write(&mm->mmap_sem);
WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange@redhat.com>
To: Chris Wright <chrisw@sous-sol.org>
Cc: CAI Qian <caiqian@redhat.com>,
Andrea Righi <andrea@betterlinux.com>,
Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
Mel Gorman <mel@csn.ul.ie>, Izik Eidus <ieidus@redhat.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
Date: Thu, 2 Jun 2011 18:44:58 +0200 [thread overview]
Message-ID: <20110602164458.GG19505@random.random> (raw)
In-Reply-To: <20110602153641.GJ23047@sequoia.sous-sol.org>
On Thu, Jun 02, 2011 at 08:36:41AM -0700, Chris Wright wrote:
> * Andrea Arcangeli (aarcange@redhat.com) wrote:
> > On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> > > * CAI Qian (caiqian@redhat.com) wrote:
> > > > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > > > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > >
> > > Right, that's just what the program is trying to do, segfault.
> > >
> > > > +++ killed by SIGSEGV (core dumped) +++
> > > > Segmentation fault (core dumped)
> > > >
> > > > Did I miss anything?
> > >
> > > I found it works but not 100% of the time.
> > >
> > > So I just run the bug in a loop.
> >
> > echo 0 >scan_millisecs helps.
>
> BTW, here's my stack trace (I dropped back to 2.6.39 just to see if it
> happened to be recent regression). It looks like mm_slot is off the list:
>
> R10: dead000000200200 R11: dead000000100100
Yes it had to be use after free.
I cooked this patch, still untested but it builds. Will test it soon.
===
Subject: ksm: fix __ksm_exit vs ksm scan SMP race
From: Andrea Arcangeli <aarcange@redhat.com>
If the KSM scan releases the ksm_mmlist_lock after the mm_users already dropped
to zero but before __ksm_exit had a chance runs, both the KSM scan and
__ksm_exit will free the slot. This fixes the SMP race condition by using
test_and_bit_set in __ksm_exit to see if __ksm_exit arrived before the KSM
scan or not.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
diff --git a/mm/ksm.c b/mm/ksm.c
index d708b3e..47ef4c1 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -645,10 +645,16 @@ static int unmerge_and_remove_all_rmap_items(void)
if (ksm_test_exit(mm)) {
hlist_del(&mm_slot->link);
list_del(&mm_slot->mm_list);
+ /*
+ * After releasing ksm_mmlist_lock __ksm_exit
+ * can run and we already changed mm_slot, so
+ * notify it with MMF_VM_MERGEABLE not to free
+ * this again.
+ */
+ clear_bit(MMF_VM_MERGEABLE, &mm->flags);
spin_unlock(&ksm_mmlist_lock);
free_mm_slot(mm_slot);
- clear_bit(MMF_VM_MERGEABLE, &mm->flags);
up_read(&mm->mmap_sem);
mmdrop(mm);
} else {
@@ -1377,10 +1383,15 @@ next_mm:
*/
hlist_del(&slot->link);
list_del(&slot->mm_list);
+ /*
+ * After releasing ksm_mmlist_lock __ksm_exit can run
+ * and we already changed mm_slot, so notify it with
+ * MMF_VM_MERGEABLE not to free this again.
+ */
+ clear_bit(MMF_VM_MERGEABLE, &mm->flags);
spin_unlock(&ksm_mmlist_lock);
free_mm_slot(slot);
- clear_bit(MMF_VM_MERGEABLE, &mm->flags);
up_read(&mm->mmap_sem);
mmdrop(mm);
} else {
@@ -1463,6 +1474,11 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
VM_NONLINEAR | VM_MIXEDMAP | VM_SAO))
return 0; /* just ignore the advice */
+ /*
+ * It should be safe to test_bit instead of
+ * test_and_bit_set because the madvise generic caller
+ * holds the mmap_sem write mode.
+ */
if (!test_bit(MMF_VM_MERGEABLE, &mm->flags)) {
err = __ksm_enter(mm);
if (err)
@@ -1511,6 +1527,10 @@ int __ksm_enter(struct mm_struct *mm)
list_add_tail(&mm_slot->mm_list, &ksm_scan.mm_slot->mm_list);
spin_unlock(&ksm_mmlist_lock);
+ /*
+ * It should be safe to set it outside ksm_mmlist_lock because
+ * we hold a mm_user pin on the mm so __ksm_exit can't run.
+ */
set_bit(MMF_VM_MERGEABLE, &mm->flags);
atomic_inc(&mm->mm_count);
@@ -1538,9 +1558,28 @@ void __ksm_exit(struct mm_struct *mm)
mm_slot = get_mm_slot(mm);
if (mm_slot && ksm_scan.mm_slot != mm_slot) {
if (!mm_slot->rmap_list) {
- hlist_del(&mm_slot->link);
- list_del(&mm_slot->mm_list);
- easy_to_free = 1;
+ /*
+ * If MMF_VM_MERGEABLE isn't set it was freed
+ * by the scan immediately after mm_count
+ * reached zero (visible by the scan) but
+ * before __ksm_exit() run, so we don't need
+ * to do anything here. We don't even need to
+ * wait for the KSM scan to release the
+ * mmap_sem as it's not working on the mm
+ * anymore but it's just releasing it, and it
+ * probably already did and dropped its
+ * mm_count too (it would however be safe to
+ * take mmap_sem here even if MMF_VM_MERGEABLE
+ * is already clear, as the actual mm can't be
+ * freed until we return and we run mmdrop
+ * too, but it's unnecessary).
+ */
+ if (test_and_clear_bit(MMF_VM_MERGEABLE, &mm->flags)) {
+ hlist_del(&mm_slot->link);
+ list_del(&mm_slot->mm_list);
+ easy_to_free = 1;
+ } else
+ mm_slot = NULL;
} else {
list_move(&mm_slot->mm_list,
&ksm_scan.mm_slot->mm_list);
@@ -1550,7 +1589,6 @@ void __ksm_exit(struct mm_struct *mm)
if (easy_to_free) {
free_mm_slot(mm_slot);
- clear_bit(MMF_VM_MERGEABLE, &mm->flags);
mmdrop(mm);
} else if (mm_slot) {
down_write(&mm->mmap_sem);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2011-06-02 16:46 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-01 22:20 [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan() Andrea Righi
2011-06-01 22:20 ` Andrea Righi
2011-06-02 1:53 ` Chris Wright
2011-06-02 1:53 ` Chris Wright
2011-06-02 7:09 ` CAI Qian
2011-06-02 7:09 ` CAI Qian
2011-06-02 14:19 ` Andrea Righi
2011-06-02 14:19 ` Andrea Righi
2011-06-02 16:48 ` Chris Wright
2011-06-02 16:48 ` Chris Wright
2011-06-02 17:29 ` Hugh Dickins
2011-06-02 17:29 ` Hugh Dickins
2011-06-02 17:43 ` Andrea Arcangeli
2011-06-02 17:43 ` Andrea Arcangeli
2011-06-03 17:06 ` Hugh Dickins
2011-06-03 17:06 ` Hugh Dickins
2011-06-03 18:13 ` Andrea Arcangeli
2011-06-03 18:13 ` Andrea Arcangeli
2011-06-02 17:35 ` [PATCH] ksm: fix race between ksmd and exiting task Chris Wright
2011-06-02 17:35 ` Chris Wright
2011-06-02 20:12 ` Andrea Righi
2011-06-02 20:12 ` Andrea Righi
2011-06-02 21:23 ` Chris Wright
2011-06-02 21:23 ` Chris Wright
2011-06-03 16:37 ` Hugh Dickins
2011-06-03 16:37 ` Hugh Dickins
2011-06-04 0:54 ` [PATCH] ksm: fix NULL pointer dereference in scan_get_next_rmap_item Chris Wright
2011-06-04 0:54 ` Chris Wright
2011-06-02 20:10 ` [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan() Andrea Righi
2011-06-02 20:10 ` Andrea Righi
2011-06-03 4:42 ` CAI Qian
2011-06-03 4:42 ` CAI Qian
2011-06-02 14:31 ` Chris Wright
2011-06-02 14:31 ` Chris Wright
2011-06-02 14:36 ` Andrea Arcangeli
2011-06-02 14:36 ` Andrea Arcangeli
2011-06-02 15:36 ` Chris Wright
2011-06-02 15:36 ` Chris Wright
2011-06-02 16:44 ` Andrea Arcangeli [this message]
2011-06-02 16:44 ` Andrea Arcangeli
2011-06-02 20:15 ` Andrea Righi
2011-06-02 20:15 ` Andrea Righi
2011-06-02 21:35 ` Andrea Arcangeli
2011-06-02 21:35 ` Andrea Arcangeli
2011-06-03 4:50 ` CAI Qian
2011-06-03 4:50 ` CAI Qian
2011-06-03 4:44 ` CAI Qian
2011-06-03 4:44 ` CAI Qian
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110602164458.GG19505@random.random \
--to=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=andrea@betterlinux.com \
--cc=caiqian@redhat.com \
--cc=chrisw@sous-sol.org \
--cc=hughd@google.com \
--cc=ieidus@redhat.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.