From: John Blackwood <john.blackwood@ccur.com>
To: linux-kernel@vger.kernel.org
Cc: Andrea Arcangeli <andrea@suse.de>, Andi Kleen <ak@suse.de>
Subject: [PATCH] mm/mempolicy.c linux-2.6.12.3
Date: Fri, 22 Jul 2005 11:58:54 -0400
Message-ID: <42E117BE.5000003@ccur.com>

Hello Andrea,

I believe that we are seeing a problem with one change from the patch

   2005/01/03 20:15:21-08:00 andrea@novell.com
   [PATCH] mempolicy optimisation

in the mpol_free_shared_policy() routine in mm/mempolicy.c, where an
rb_erase() line was removed.

The corresponding portion of the original patch is shown below:

@@ -1086,11 +1084,11 @@
         while (next) {
                 n = rb_entry(next, struct sp_node, nd);
                 next = rb_next(&n->nd);
-               rb_erase(&n->nd, &p->root);
                 mpol_free(n->policy);
                 kmem_cache_free(sn_cache, n);
         }
         spin_unlock(&p->lock);
+       p->root = RB_ROOT;
  }


When we build a 2.6.11.4 debug kernel on a 2-CPU NUMA-enabled Opteron
system and run the following set of commands:

echo "1" > /tmp/numatest
numactl --length=0x4000 --shm /tmp/numatest --localalloc
numactl --length=0x2000 --offset=0 --shm /tmp/numatest --membind=0
numactl --length=0x2000 --offset=0x2000 --shm /tmp/numatest --membind=1
ipcs
ipcrm -M "the_key_value_of_this_shm_area"


On the ipcrm call above, the system will oops:

general protection fault: 0000 [1] PREEMPT SMP

Entering kdb (current=0xffff81008161a0f0, pid 3102) on processor 1 Oops: <NULL>
due to oops @ 0xffffffff802e51f3
      r15 = 0x0000000000000100      r14 = 0xffff81007d7a5470
      r13 = 0xffff810001eea460      r12 = 0xffff81007c4324f8
      rbp = 0xffff81007c4324f8      rbx = 0xffff81007c4324f8
      r11 = 0x0000000000000202      r10 = 0x00002aaaaabc3db0
       r9 = 0xffff81007c4324a8       r8 = 0x0000000000000000
      rax = 0x000000000000000f      rcx = 0x0000000000000000
      rdx = 0x0000000000000000      rsi = 0x000000000000006b
      rdi = 0x6b6b6b6b6b6b6b6b orig_rax = 0xffffffffffffffff
      rip = 0xffffffff802e51f3       cs = 0x0000000000000010
   eflags = 0x0000000000010202      rsp = 0xffff81007d06bc40
       ss = 0xffff81007d06a000 &regs = 0xffff81007d06bba8
[1]kdb> bt
Stack traceback for pid 3102
0xffff81008161a0f0     3102     2761  1    1   R  0xffff81008161a480 *ipcrm
RSP           RIP                Function (args)
0xffff81007d06bc40 0xffffffff802e51f3 rb_next+0xb (0xffff810001eea7d8, 0xffff810001eea518, 0xffff810001eea518, 0x0, 0xffff810001eea518)
0xffff81007d06bc58 0xffffffff80189c74 mpol_free_shared_policy+0x3a
0xffff81007d06bc88 0xffffffff8018d306 shmem_destroy_inode+0x1f
0xffff81007d06bc98 0xffffffff801ab0e8 destroy_inode+0x31 (0xffff81007d7a5484)
0xffff81007d06bca8 0xffffffff801ac5a6 generic_delete_inode+0x10c
0xffff81007d06bcc8 0xffffffff801ac796 iput+0x77 (0xffff8100815673c8)
0xffff81007d06bcd8 0xffffffff801a915c dput+0x1c0 (0xffff81007d3e8408, 0x10000, 0x0, 0x0, 0xffff81007d3e8408)
0xffff81007d06bcf8 0xffffffff80190a9a __fput+0x128 (0xffff81007d3e8408)
0xffff81007d06bd28 0xffffffff8019096d fput+0x14
0xffff81007d06bd38 0xffffffff802d1a8c shm_destroy+0x4d
0xffff81007d06bd48 0xffffffff802d26d9 sys_shmctl+0x689


The rdi = 0x6b6b6b6b6b6b6b6b above (and additional debugging that I did)
shows that we called rb_next(&n->nd) in the while loop and got back
an rb_node that we had already deallocated via kmem_cache_free() earlier
in that same while loop.

As a result, on the debug kernel, the already deallocated rb_node is
filled with 0x6b bytes (the slab free-poison pattern), so its fields
read as 0x6b6b6b6b6b6b6b6b and we oops when we try to use that value
as a pointer.
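
To make the poison value concrete, here is a minimal user-space sketch.
It is not kernel code: struct fake_node, POISON_FREE_BYTE and
poisoned_field() are names made up for this illustration, and the
pointer fields are modelled as 64-bit integers so the result does not
depend on the build host. It only shows that a field of an object
filled with 0x6b bytes reads back as 0x6b6b6b6b6b6b6b6b.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte written over freed objects by slab debugging (assumed name). */
#define POISON_FREE_BYTE 0x6b

/* Hypothetical stand-in for struct rb_node: three 64-bit "pointers". */
struct fake_node {
	uint64_t parent, left, right;
};

/*
 * Fill an object with the poison byte, as a debug allocator would do
 * when the object is freed, and return what one field then contains.
 */
static uint64_t poisoned_field(void)
{
	struct fake_node n;

	assert(sizeof(n.right) == 8);
	memset(&n, POISON_FREE_BYTE, sizeof(n));
	return n.right;
}
```

Calling poisoned_field() yields 0x6b6b6b6b6b6b6b6b, the same value
seen in rdi in the register dump.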

When I put back the rb_erase() line, the above test, and lots
of others like it, start working again without oopsing.

I'm no rb tree expert, but it seems that we still need to have the
rb_erase() line in the while loop:

diff -u linux-2.6.12.3/mm/mempolicy.c new/mm/mempolicy.c
--- linux-2.6.12.3/mm/mempolicy.c	2005-07-15 17:18:57.000000000 -0400
+++ new/mm/mempolicy.c	2005-07-22 11:12:56.000000000 -0400
@@ -1104,6 +1104,7 @@
  	while (next) {
  		n = rb_entry(next, struct sp_node, nd);
  		next = rb_next(&n->nd);
+		rb_erase(&n->nd, &p->root);
  		mpol_free(n->policy);
  		kmem_cache_free(sn_cache, n);
  	}
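
For illustration only, the difference between the two loops can be
reproduced in user space. The sketch below is an assumption-laden
model, not the kernel code: it uses a plain three-node binary tree with
parent links instead of a red-black tree, marks nodes with a "freed"
flag instead of really freeing them (so the stale reads stay well
defined), and every name in it (struct node, look(), successor(),
unlink_min(), teardown()) is hypothetical. teardown(0) models the loop
without rb_erase(), teardown(1) the loop with the node unlinked before
it is freed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rb_node: BST node with parent link. */
struct node {
	int freed;	/* set instead of really freeing the node */
	struct node *parent, *left, *right;
};

static int stale_reads;	/* times the walk looked inside a "freed" node */

/* Every dereference in the walk goes through here so it can be counted. */
static struct node *look(struct node *n)
{
	if (n->freed)
		stale_reads++;
	return n;
}

/* Leftmost node of the tree. */
static struct node *first(struct node *root)
{
	if (!root)
		return NULL;
	while (look(root)->left)
		root = root->left;
	return root;
}

/* In-order successor using the usual parent-pointer walk. */
static struct node *successor(struct node *n)
{
	struct node *p;

	if (look(n)->right) {
		n = n->right;
		while (look(n)->left)
			n = n->left;
		return n;
	}
	/* Climb while we are the right child of our parent. */
	while ((p = look(n)->parent) && n == look(p)->right)
		n = p;
	return p;
}

/* Unlink a node with no left child (always true for the minimum). */
static void unlink_min(struct node **root, struct node *n)
{
	struct node *child = n->right;

	if (child)
		child->parent = n->parent;
	if (!n->parent)
		*root = child;
	else if (n->parent->left == n)
		n->parent->left = child;
	else
		n->parent->right = child;
}

/* Tear the tree down in order; returns the number of stale reads. */
static int teardown(int unlink_first)
{
	struct node a = {0}, b = {0}, c = {0};	/* in-order: a, b, c */
	struct node *root = &b, *next, *n;

	b.left = &a;  a.parent = &b;
	b.right = &c; c.parent = &b;

	stale_reads = 0;
	next = first(root);
	while (next) {
		n = next;
		next = successor(n);
		if (unlink_first)
			unlink_min(&root, n);
		n->freed = 1;	/* stands in for kmem_cache_free() */
	}
	if (unlink_first)
		assert(root == NULL);
	return stale_reads;
}
```

In this model teardown(0) reports stale reads, because the successor
walk from the last node climbs through its already-freed parent, while
teardown(1) reports none. In the real kernel the stale node holds
poison rather than its old contents, so the walk follows a garbage
pointer instead of terminating cleanly.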



Thank you for your time and consideration.

