Re: [PATCH] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Herton R. Krzesinski" <herton@redhat.com>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Rafael Aquini <aquini@redhat.com>, Joe Perches <joe@perches.com>,
	Aristeu Rozanski <aris@redhat.com>,
	djeffery@redhat.com
Subject: Re: [PATCH] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
Date: Tue, 11 Aug 2015 13:48:33 -0300	[thread overview]
Message-ID: <20150811164833.GA29948@dhcppc4.redhat.com> (raw)
In-Reply-To: <55C8F533.1090007@colorfullife.com>

On Mon, Aug 10, 2015 at 09:02:11PM +0200, Manfred Spraul wrote:
> Hi Herton,
> 
> On 08/10/2015 05:31 PM, Herton R. Krzesinski wrote:
> >Well without the synchronize_rcu() and with the semid list loop fix I was still
> >able to get issues, and I thought the problem is related to racing with IPC_RMID
> >on freeary again. This is one scenario I would imagine:
> >
> >                A                                                  B
> >
> >freeary()
> >   list_del(&un->list_id)
> >   spin_lock(&un->ulp->lock)
> >   un->semid = -1
> >   list_del_rcu(&un->list_proc)
> >     __list_del_entry(&un->list_proc)
> >       __list_del(entry->prev, entry->next)      exit_sem()
> >         next->prev = prev;                        ...
> >         prev->next = next;                        ...
> >         ...                                       un = list_entry_rcu(ulp->list_proc.next...)
> >     (&un->list_proc)->prev = LIST_POISON2         if (&un->list_proc == &ulp->list_proc) <true, last un removed by thread A>
> >   ...                                             kfree(ulp)
> >   spin_unlock(&un->ulp->lock) <---- bug
> >
> >Now that is a very tight window, but I had problems even when I tried this patch
> >first:
> >
> >(...)
> >-               if (&un->list_proc == &ulp->list_proc)
> >-                       semid = -1;
> >-                else
> >-                       semid = un->semid;
> >+               if (&un->list_proc == &ulp->list_proc) {
> >+                       rcu_read_unlock();
> What about:
> + spin_unlock_wait(&ulp->lock);

spin_unlock_wait works, thanks (both in theory and in practice, I tested).

> >+                       break;
> >+               }
> >+               spin_lock(&ulp->lock);
> >+               semid = un->semid;
> >+               spin_unlock(&ulp->lock);
> >
> >+               /* exit_sem raced with IPC_RMID, nothing to do */
> >                 if (semid == -1) {
> >                         rcu_read_unlock();
> >-                       break;
> >+                       synchronize_rcu();
> >+                       continue;
> >                 }
> >(...)
> >
> >So even with the bad/uneeded synchronize_rcu() which I had placed there, I could
> >still get issues (however the testing on patch above was on an older kernel than
> >latest upstream, from RHEL 6, I can test without synchronize_rcu() on latest
> >upstream, however the affected code is the same). That's when I thought of
> >scenario above. I was able to get this oops:
> Adding sleep() usually help, too. But it is ugly, so let's try to understand
> the race and to fix it.

Yes, the race should be what I described earlier, and it's understood. I also
now have proof in practice that without synchronization before exit, I'm able to
trigger a crash on latest upstream kernel as well. This is the change I tested
with latest stock 4.2-rc6:

diff --git a/ipc/sem.c b/ipc/sem.c
index bc3d530..3b8b66b 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -2074,17 +2074,26 @@ void exit_sem(struct task_struct *tsk)
                rcu_read_lock();
                un = list_entry_rcu(ulp->list_proc.next,
                                    struct sem_undo, list_proc);
-               if (&un->list_proc == &ulp->list_proc)
-                       semid = -1;
-                else
-                       semid = un->semid;
+               if (&un->list_proc == &ulp->list_proc) {
+                       /* we must wait for freeary() before freeing this ulp,
+                        * in case we raced with last sem_undo. There is a small
+                        * possibility where we exit while freeary() didn't
+                        * finish unlocking sem_undo_list */
+                       /*spin_unlock_wait(&ulp->lock);*/
+                       rcu_read_unlock();
+                       break;
+               }
+               spin_lock(&ulp->lock);
+               semid = un->semid;
+               spin_unlock(&ulp->lock);
 
+               /* exit_sem raced with IPC_RMID, nothing to do */
                if (semid == -1) {
                        rcu_read_unlock();
-                       break;
+                       continue;
                }
 
-               sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, un->semid);
+               sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, semid);
                /* exit_sem raced with IPC_RMID, nothing to do */
                if (IS_ERR(sma)) {
                        rcu_read_unlock();


I commented the synchronization for testing (the spin_unlock_wait call).

Because of the comment/removing the spin_unlock_wait, I'm then able to trigger
the following crash:

[  587.768363] BUG: spinlock wrong owner on CPU#2, test/419
[  587.768434] general protection fault: 0000 [#1] SMP 
[  587.768454] Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6
table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_rng virtio_balloon virtio_console virtio_net iosf_mbi crct10dif_pclmul c
rc32_pclmul ghash_clmulni_intel pcspkr snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcor
e qxl ttm drm_kms_helper drm crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
[  587.768712] CPU: 2 PID: 419 Comm: test Not tainted 4.2.0-rc6+ #1
[  587.768732] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[  587.768762] task: ffff88003ad68340 ti: ffff88003a0ec000 task.ti: ffff88003a0ec000
[  587.768785] RIP: 0010:[<ffffffff810d60b3>]  [<ffffffff810d60b3>] spin_dump+0x53/0xc0
[  587.768818] RSP: 0018:ffff88003a0efd68  EFLAGS: 00010202
[  587.768836] RAX: 000000000000002c RBX: ffff88003d016f08 RCX: 0000000000000000
[  587.768859] RDX: ffff88003fd11518 RSI: ffff88003fd0eb68 RDI: ffff88003fd0eb68
[  587.768882] RBP: ffff88003a0efd78 R08: 0000000000000000 R09: 000000006b6b6b6b
[  587.768904] R10: 000000000000025f R11: ffffc90000b30020 R12: 6b6b6b6b6b6b6b6b
[  587.768927] R13: ffff88003a5e5c40 R14: ffff88003d1eb340 R15: ffff88003a5e5c40
[  587.768949] FS:  00007fc2fcd89700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[  587.768981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  587.769002] CR2: 00007fc2fcb84dd4 CR3: 000000003d230000 CR4: 00000000000406e0
[  587.769004] Stack:
[  587.769004]  ffff88003d016f08 ffffffff81a4f4ad ffff88003a0efd98 ffffffff810d6150
[  587.769004]  ffff88003d016f08 ffff88003a5e5cb8 ffff88003a0efdb8 ffffffff810d61e2
[  587.769004]  ffff88003a5e5c40 ffff88003a5e5c90 ffff88003a0efdc8 ffffffff8175d50e
[  587.769004] Call Trace:
[  587.769004]  [<ffffffff810d6150>] spin_bug+0x30/0x40
[  587.769004]  [<ffffffff810d61e2>] do_raw_spin_unlock+0x82/0xa0
[  587.769004]  [<ffffffff8175d50e>] _raw_spin_unlock+0xe/0x10
[  587.769004]  [<ffffffff812f45f2>] freeary+0x82/0x2a0
[  587.769004]  [<ffffffff8175d58e>] ? _raw_spin_lock+0xe/0x10
[  587.769004]  [<ffffffff812f5f2e>] semctl_down.clone.0+0xce/0x160
[  587.769004]  [<ffffffff810633fa>] ? __do_page_fault+0x19a/0x430
[  587.769004]  [<ffffffff8112e5e8>] ? __audit_syscall_entry+0xa8/0x100
[  587.769004]  [<ffffffff812f61f6>] SyS_semctl+0x236/0x2c0
[  587.769004]  [<ffffffff810215ce>] ? syscall_trace_leave+0xde/0x130
[  587.769004]  [<ffffffff8175d7ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[  587.769004] Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2d a4 81 31 c0 65 8b 15 8b 40 f3 7e e8 21 33 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89 
[  587.769004] RIP  [<ffffffff810d60b3>] spin_dump+0x53/0xc0
[  587.769004]  RSP <ffff88003a0efd68>
[  587.769563] ---[ end trace 844d186084062ba4 ]---
[  612.022002] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:446]
[  612.022002] Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_rng virtio_balloon virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore qxl ttm drm_kms_helper drm crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
[  612.022002] CPU: 3 PID: 446 Comm: test Tainted: G      D         4.2.0-rc6+ #1
[  612.022002] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[  612.022002] task: ffff88003e3ab040 ti: ffff88003ac20000 task.ti: ffff88003ac20000
[  612.022002] RIP: 0010:[<ffffffff8101cb36>]  [<ffffffff8101cb36>] native_read_tsc+0x6/0x20
[  612.022002] RSP: 0018:ffff88003ac23c08  EFLAGS: 00000206
[  612.022002] RAX: 00000000e0df8c7d RBX: 800000002060b065 RCX: 00000000e0df8c5f
[  612.022002] RDX: 00000000000001a4 RSI: 0000000000000000 RDI: 0000000000000001
[  612.022002] RBP: ffff88003ac23c08 R08: 0000000000000000 R09: ffff88003aea5cc0
[  612.022002] R10: 00007ffdf0e89830 R11: 0000000000000008 R12: ffffffff8106a938
[  612.022002] R13: ffff88003ac23b98 R14: ffff88003a295300 R15: ffff88003a295000
[  612.022002] FS:  00007fc2fcd89700(0000) GS:ffff88003fd80000(0000) knlGS:0000000000000000
[  612.022002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  612.022002] CR2: 00007fc2fc8dffd0 CR3: 000000003905d000 CR4: 00000000000406e0
[  612.022002] Stack:
[  612.022002]  ffff88003ac23c38 ffffffff8138bd30 ffff88003a5e5c40 00000000ac762d50
[  612.022002]  000000003662aefc 0000000000000001 ffff88003ac23c48 ffffffff8138bcaf
[  612.022002]  ffff88003ac23c78 ffffffff810d6386 ffff88003a5e5c40 ffff880037a0f880
[  612.022002] Call Trace:
[  612.022002]  [<ffffffff8138bd30>] delay_tsc+0x40/0x70
[  612.022002]  [<ffffffff8138bcaf>] __delay+0xf/0x20
[  612.022002]  [<ffffffff810d6386>] do_raw_spin_lock+0x96/0x140
[  612.022002]  [<ffffffff8175d58e>] _raw_spin_lock+0xe/0x10
[  612.022002]  [<ffffffff812f4cb1>] sem_lock_and_putref+0x11/0x70
[  612.022002]  [<ffffffff812f59df>] SYSC_semtimedop+0x7bf/0x960
[  612.022002]  [<ffffffff811becd6>] ? handle_mm_fault+0xbf6/0x1880
[  612.022002]  [<ffffffff8175d50e>] ? _raw_spin_unlock+0xe/0x10
[  612.022002]  [<ffffffff81194e16>] ? free_pcppages_bulk+0x436/0x5a0
[  612.022002]  [<ffffffff810633fa>] ? __do_page_fault+0x19a/0x430
[  612.022002]  [<ffffffff811e4866>] ? kfree_debugcheck+0x16/0x40
[  612.022002]  [<ffffffff810633fa>] ? __do_page_fault+0x19a/0x430
[  612.022002]  [<ffffffff8112e5e8>] ? __audit_syscall_entry+0xa8/0x100
[  612.022002]  [<ffffffff81021686>] ? do_audit_syscall_entry+0x66/0x70
[  612.022002]  [<ffffffff810217c9>] ? syscall_trace_enter_phase1+0x139/0x160
[  612.022002]  [<ffffffff812f5b8e>] SyS_semtimedop+0xe/0x10
[  612.022002]  [<ffffffff812f36f0>] SyS_semop+0x10/0x20
[  612.022002]  [<ffffffff8175d7ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[  612.022002] Code: c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 55 48 89 e5 0f 31 <89> c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9 c3 66 2e 0f 1f 84 
[  612.022002] Kernel panic - not syncing: softlockup: hung tasks

Then in another build with the spin_unlock_wait in, I left the test case running
for some hours, I didn't see issues. In theory also I expect there is no more
problems.

I'll submit a new version of the patch, it's same diff as above but without the
comment.

> 
> Best regards,
>     Manfred

Thanks,
-- 
[]'s
Herton

     prev parent reply	other threads:[~2015-08-11 16:48 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-07 17:09 [PATCH] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits Herton R. Krzesinski
2015-08-07 19:30 ` Aristeu Rozanski
2015-08-09 17:49 ` Manfred Spraul
2015-08-10 15:31   ` Herton R. Krzesinski
2015-08-10 19:02     ` Manfred Spraul
2015-08-11 16:48       ` Herton R. Krzesinski [this message]

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:bc3d530 dfblob:3b8b66b )
 OR (
bs:"Re: [PATCH] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150811164833.GA29948@dhcppc4.redhat.com \
    --to=herton@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=aquini@redhat.com \
    --cc=aris@redhat.com \
    --cc=dave@stgolabs.net \
    --cc=djeffery@redhat.com \
    --cc=joe@perches.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=manfred@colorfullife.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.