futex_wake_op deadlock?

Linux MIPS Architecture development

* futex_wake_op deadlock?
From: Kaz Kylheku @ 2007-11-16 23:52 UTC
  To: linux-mips

Hey everyone,

From time to time, on 2.6.17.7, I see a deadlock situation go off. The
soft lockup tick occurs in the middle of do_futex, which is heavily
inlined.  The system is actually hosed; it's not one of those
recoverable CPU busy situations that can sometimes trigger the lockup
detector.

The instruction that is interrupted by the soft lockup tick appears to
be in the assembly code (__futex_atomic_op) used by the futex_wake_op
function; the case is FUTEX_OP_SET.  It's the instruction just before
the load-linked; i.e. the interrupt is outside of the ll/sc loop.

I can't figure out how the code would get into a loop here. The ll/sc
logic should eventually succeed. There is a large loop in the overall
futex operation, but that is bounded by an iteration variable
(attempt++).

(I checked the 2.6.17 head, but there doesn't appear to be any
futex-related work).

This lockup has reproduced more than once for us. Once at bootup, and
several times on shutdown.

The call stack always includes several do_futex frames, and a
compat_sys_futex/handle_sysn32 at the top of the chain.

This is from syslog (the unusual format is due to running metalog rather
than syslog in our distribution, and the human-readable time in the
square-bracketed printk timestamps is a locally developed patch):

Jan  3 02:47:02 [kernel] [02:47:02.953075]  [<ffffffff8016de8c>] softlockup_tick+0x1bc/0x208
Jan  3 02:47:02 [kernel] [02:47:02.953121]  [<ffffffff8014cc54>] update_process_times+0x9c/0xe8
Jan  3 02:47:02 [kernel] [02:47:02.953158]  [<ffffffff801098bc>] ll_local_timer_interrupt+0x94/0xa8
Jan  3 02:47:02 [kernel] [02:47:02.953194]  [<ffffffff801026a0>] plat_irq_dispatch+0x120/0x1a0
Jan  3 02:47:02 [kernel] [02:47:02.953221]  [<ffffffff80163758>] do_futex+0x870/0xb58
Jan  3 02:47:02 [kernel] [02:47:02.953251]  [<ffffffff801637e0>] do_futex+0x8f8/0xb58
Jan  3 02:47:02 [kernel] [02:47:02.953275]  [<ffffffff8047b16c>] __lock_text_end+0x1b3c/0x474c
Jan  3 02:47:02 [kernel] [02:47:02.953312]  [<ffffffff8036fc40>] sys_sendto+0xe8/0x140
Jan  3 02:47:02 [kernel] [02:47:02.953345]  [<ffffffff80163fac>] compat_sys_futex+0x84/0x188
Jan  3 02:47:02 [kernel] [02:47:02.953372]  [<ffffffff80116314>] handle_sysn32+0x54/0xb0

The sys_sendto is a red herring, since the backtrace function dumps
every single word on the stack as an address, not having any frame
pointers to go by.
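Without frame pointers the dumper can only guess which stack words are return addresses. A minimal sketch of such a heuristic scan, assuming a simple range check against made-up kernel text bounds (`KTEXT_START`, `KTEXT_END`, `looks_like_text()` and `scan_stack()` are hypothetical names, not the actual MIPS unwinder), shows how a stale value like the sys_sendto address lands in the trace:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical kernel text bounds, for illustration only. */
#define KTEXT_START 0xffffffff80100000ULL
#define KTEXT_END   0xffffffff80500000ULL

/* Treat any word inside the text range as a possible return address. */
int looks_like_text(uint64_t word)
{
    return word >= KTEXT_START && word < KTEXT_END;
}

/* Walk a raw stack snapshot and print every plausible text address.
 * Nothing distinguishes a live return address from a stale value left
 * behind by an earlier call, so both get reported. Returns the count. */
size_t scan_stack(const uint64_t *stack, size_t nwords)
{
    size_t found = 0;

    assert(stack != NULL || nwords == 0);
    for (size_t i = 0; i < nwords; i++) {
        if (looks_like_text(stack[i])) {
            printf(" [<%016llx>]\n", (unsigned long long)stack[i]);
            found++;
        }
    }
    return found;
}
```

Fed a snapshot holding the do_futex and compat_sys_futex addresses from the trace above, a user-space pointer, a zero word and a leftover sys_sendto value, scan_stack() reports three entries, the stale one included.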

The code surrounding ffffffff80163758:

ffffffff8016374c:	00023000 	sll	a2,v0,0x0
ffffffff80163750:	08058c77 	j	ffffffff801631dc <do_futex+0x2f4>
ffffffff80163754:	00034000 	sll	a4,v1,0x0
ffffffff80163758:	0000102d 	move	v0,zero      <----<<
ffffffff8016375c:	c2030000 	ll	v1,0(s0)
ffffffff80163760:	00a0082d 	move	at,a1
ffffffff80163764:	e2010000 	sc	at,0(s0)
ffffffff80163768:	1020fffc 	beqz	at,ffffffff8016375c <do_futex+0x874>
ffffffff8016376c:	00000000 	nop
ffffffff80163770:	0000000f 	sync
ffffffff80163774:	8f870024 	lw	a3,36(gp)
ffffffff80163778:	00023000 	sll	a2,v0,0x0
ffffffff8016377c:	08058c77 	j	ffffffff801631dc <do_futex+0x2f4>

You can tell from the "move at, a1" that it's the FUTEX_OP_SET case.
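To make the expected behaviour concrete: FUTEX_OP_SET stores oparg into the user word and hands back the previous value, and the ll/sc pair simply retries until the store-conditional succeeds. A userspace model of that loop, using C11 atomic_compare_exchange_weak() as a stand-in for ll/sc (an assumption; the kernel uses raw ll/sc in inline assembly, and `model_futex_op_set()` is a hypothetical name), looks like this:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace model of the FUTEX_OP_SET case of __futex_atomic_op():
 *   1: ll   oldval, (uaddr)   load the current value, start the link
 *      move $1, oparg         SET: the new value is just oparg
 *   2: sc   $1, (uaddr)       store only if nothing intervened
 *      beqz $1, 1b            otherwise retry from the load
 * atomic_compare_exchange_weak() may fail spuriously, as sc may, so it
 * sits in the same kind of retry loop. Returns the replaced value. */
int model_futex_op_set(atomic_int *uaddr, int oparg)
{
    assert(uaddr != NULL);

    int oldval = atomic_load(uaddr);
    while (!atomic_compare_exchange_weak(uaddr, &oldval, oparg))
        ; /* on failure, oldval has been refreshed with the current value */
    return oldval;
}
```

A plain atomic_exchange() would give the same result; the loop is kept only to mirror ll/sc. As long as the word stays mapped, the loop ends as soon as no other CPU intervenes, which is why the hang looked impossible from this code alone.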

* Re: futex_wake_op deadlock?
From: Ralf Baechle @ 2007-11-19 18:48 UTC
  To: Kaz Kylheku; +Cc: linux-mips

On Fri, Nov 16, 2007 at 03:52:47PM -0800, Kaz Kylheku wrote:

> From time to time, on 2.6.17.7, I see a deadlock situation go off. The
> soft lockup tick occurs in the middle of do_futex, which is heavily
> inlined.  The system is actually hosed; it's not one of those
> recoverable CPU busy situations that can sometimes trigger the lockup
> detector.

Can you reproduce this hang also if you're not running in a binary compat
mode, that is, either running o32 binaries on a 32-bit kernel or 64-bit
binaries on a 64-bit kernel?

  Ralf

* RE: futex_wake_op deadlock?
From: Kaz Kylheku @ 2007-11-19 21:27 UTC
  To: Ralf Baechle; +Cc: linux-mips

Ralf Baechle wrote:
> On Fri, Nov 16, 2007 at 03:52:47PM -0800, Kaz Kylheku wrote:
> 
>> From time to time, on 2.6.17.7, I see a deadlock situation go off.
>> The soft lockup tick occurs in the middle of do_futex, which is
>> heavily inlined.  The system is actually hosed; it's not one of those
>> recoverable CPU busy situations that can sometimes trigger the lockup
>> detector.
> 
> Can you reproduce this hang also if you're not running in a binary compat
> mode, that is, either running o32 binaries on a 32-bit kernel or
> 64-bit binaries on a 64-bit kernel?

I have hacked up a little test program which hosed my board within
seconds. The system is not completely hung. However:

- I can't kill the test program with Ctrl-C.
- I can log into the box with telnet.
- If I run "ps aux" to see all processes, the ps command hangs partway
through the table, and cannot be killed with Ctrl-C.
- System hangs on soft reboot attempt; requires hard reset.

The program basically uses several threads to beat up on FUTEX_WAKE_OP.

The key trick is that there is an interfering thread which does a
mmap/munmap on the futexes in parallel with the threads which are using
them.

If I just stick the futexes into a permanently good memory location,
nothing bad happens; the program just churns away taking up 400% of the
CPU time across the four cores of the 1480. If you call the function
with permanently bad addresses, nothing bad happens either; the syscalls
bail nicely with EFAULT.

The idea is to tickle some race condition or other bug in the
interaction between futexes and mmap.  I put a little delay into the
interfering thread so that the memory is held in a good state most of
the time, with a quick unmap/remap. We want the memory to be good most
of the time, but have an unmap happen now and then at an inopportune
moment, while the kernel is executing the futex code on one or more
cores.

This needs to be compiled with -pthread, obviously, and you need -lrt to
link in the library for clock_nanosleep.

#include <stdio.h>      /* perror */
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <time.h>
#include <pthread.h>    /* pthread_create, pthread_exit */
#include <sys/syscall.h>
#include <sys/mman.h>

#define FUTEX_WAIT              0
#define FUTEX_WAKE              1
#define FUTEX_FD                2
#define FUTEX_REQUEUE           3
#define FUTEX_CMP_REQUEUE       4
#define FUTEX_WAKE_OP           5

#define FUTEX_OP_SET            0       /* *(int *)UADDR2 = OPARG; */
#define FUTEX_OP_ADD            1       /* *(int *)UADDR2 += OPARG; */
#define FUTEX_OP_OR             2       /* *(int *)UADDR2 |= OPARG; */
#define FUTEX_OP_ANDN           3       /* *(int *)UADDR2 &= ~OPARG; */
#define FUTEX_OP_XOR            4       /* *(int *)UADDR2 ^= OPARG; */

#define FUTEX_OP_OPARG_SHIFT    8       /* Use (1 << OPARG) instead of
OPARG.  */

#define FUTEX_OP_CMP_EQ         0       /* if (oldval == CMPARG) wake */
#define FUTEX_OP_CMP_NE         1       /* if (oldval != CMPARG) wake */
#define FUTEX_OP_CMP_LT         2       /* if (oldval < CMPARG) wake */
#define FUTEX_OP_CMP_LE         3       /* if (oldval <= CMPARG) wake */
#define FUTEX_OP_CMP_GT         4       /* if (oldval > CMPARG) wake */
#define FUTEX_OP_CMP_GE         5       /* if (oldval >= CMPARG) wake */

#define NUM_THREADS 8

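int futex_wake_op(int *addr1, int *addr2,
                  int nr_wake_1, int nr_wake_2, int encoded_op)
{
    /* For FUTEX_WAKE_OP, nr_wake_2 travels in the timeout argument slot.
       Return the result so the caller's "result < 0" check is meaningful. */
    return (int) syscall(SYS_futex, addr1, FUTEX_WAKE_OP, nr_wake_1,
                         (long) nr_wake_2, addr2, encoded_op);
}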

int futex1 = 0, futex2 = 0;

struct {
    int futex1;
    int futex2;
} *shared;

void *mapper(void *arg)
{
    for (;;) {
        struct timespec delay;
        void *mem;

        delay.tv_sec = 0;
        delay.tv_nsec = 100000000;

        mem = mmap(0, 16384, PROT_READ | PROT_WRITE, MAP_PRIVATE |
MAP_ANONYMOUS, -1, 0);

        if (mem == (void *) -1) {
            perror("mmap");
            exit(EXIT_FAILURE);
        }

        shared = mem;

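        /* Relative sleep: with TIMER_ABSTIME, 0.1 s after the monotonic
           clock's epoch is long past and the call returns immediately. */
        clock_nanosleep(CLOCK_MONOTONIC, 0, &delay, 0);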

        if (munmap(mem, 16384) < 0) {
            perror("munmap");
            exit(EXIT_FAILURE);
        }
    }
}

void *waker(void *arg)
{
    unsigned int rand_state = 1;    /* rand_r() takes an unsigned int * */

    for (;;) {
        int val = rand_r(&rand_state) & 0xFFFF;
        const int op = (FUTEX_OP_SET << 28) | (FUTEX_OP_CMP_GT << 24) |
val;
        int result = futex_wake_op(&shared->futex1, &shared->futex2, 1,
1, op);

        if (result < 0 && errno != EFAULT) {
            perror("futex_wake_op");
            exit(EXIT_FAILURE);
        }
    }

    /* notreached */
    return 0;
}

int main(void)
{
    int i;
    srand(1);

    for (i = 0; i < NUM_THREADS; i++) {
        pthread_t thr;
        void *(*func)(void *) = (i == 0) ? mapper : waker;
        int result = errno = pthread_create(&thr, 0, func, 0);
        if (result != 0) {
            perror("pthread_create");
            return EXIT_FAILURE;
        }
    }

    pthread_exit(0);
}
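The waker packs the whole operation into one int. Assuming the field layout of the FUTEX_OP() macro in <linux/futex.h> (4-bit op, 4-bit cmp, 12-bit oparg, 12-bit cmparg) and that the kernel sign-extends both 12-bit fields, the random 16-bit val supplies cmparg in its low 12 bits and a small oparg (0 to 15) above that. A decoder sketch, with hypothetical names (`struct wake_op`, `decode_wake_op()`, `wake_op_cmp()`), makes the split visible:

```c
#include <assert.h>

/* Layout of FUTEX_WAKE_OP's encoded_op argument:
 *   bits 28-31 op, bits 24-27 cmp, bits 12-23 oparg, bits 0-11 cmparg.
 * If op has FUTEX_OP_OPARG_SHIFT (8) set, oparg means (1 << oparg). */
struct wake_op {
    int op, cmp, oparg, cmparg;
};

/* Sign-extend a 12-bit field. */
static int sext12(unsigned int v)
{
    v &= 0xfff;
    return (v & 0x800) ? (int)v - 0x1000 : (int)v;
}

struct wake_op decode_wake_op(unsigned int encoded)
{
    struct wake_op w;

    w.op     = (encoded >> 28) & 0xf;
    w.cmp    = (encoded >> 24) & 0xf;
    w.oparg  = sext12(encoded >> 12);
    w.cmparg = sext12(encoded);
    return w;
}

/* Test applied to the old value of *uaddr2 to decide whether waiters
 * on uaddr2 are woken too; cmp codes match FUTEX_OP_CMP_*. */
int wake_op_cmp(int cmp, int oldval, int cmparg)
{
    switch (cmp) {
    case 0: return oldval == cmparg;  /* FUTEX_OP_CMP_EQ */
    case 1: return oldval != cmparg;  /* FUTEX_OP_CMP_NE */
    case 2: return oldval <  cmparg;  /* FUTEX_OP_CMP_LT */
    case 3: return oldval <= cmparg;  /* FUTEX_OP_CMP_LE */
    case 4: return oldval >  cmparg;  /* FUTEX_OP_CMP_GT */
    case 5: return oldval >= cmparg;  /* FUTEX_OP_CMP_GE */
    default: return -1;               /* unknown code; the kernel fails the call */
    }
}
```

With op = FUTEX_OP_SET and cmp = FUTEX_OP_CMP_GT as in waker(), a val of 0x5abc decodes to oparg 5 and cmparg -1348, so that call stores 5 into futex2 and also wakes a futex2 waiter only if the old value was greater than -1348.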

* RE: futex_wake_op deadlock?
From: Kaz Kylheku @ 2007-11-19 21:42 UTC
  To: Ralf Baechle; +Cc: linux-mips

Earlier, I wrote:
> I have hacked up a little test program which hosed my board within
> seconds. The system is not completely hung. However:
> 
> - I can't kill the test program with Ctrl-C.
> - I can log into the box with telnet.
> - If I run "ps aux" to see all processes, the ps command hangs partway
> through the table, and cannot be killed with Ctrl-C.
> - System hangs on soft reboot attempt; requires hard reset.

Furthermore: my console loglevel was too high to see the crash on the
serial console, but, sure enough, the syslog has this:

Nov 19 14:19:57 [kernel] [14:19:57.846017] BUG: soft lockup detected on CPU#1!
Nov 19 14:19:57 [kernel] [14:19:57.846051] Call Trace:
Nov 19 14:19:58 [kernel] [14:19:57.846069]  [<ffffffff8016de8c>] softlockup_tick+0x1bc/0x208
Nov 19 14:19:58 [kernel] [14:19:57.846112]  [<ffffffff8014cc54>] update_process_times+0x9c/0xe8
Nov 19 14:19:58 [kernel] [14:19:57.846147]  [<ffffffff801098bc>] ll_local_timer_interrupt+0x94/0xa8
Nov 19 14:19:58 [kernel] [14:19:57.846180]  [<ffffffff801098bc>] ll_local_timer_interrupt+0x94/0xa8
Nov 19 14:19:58 [kernel] [14:19:57.846205]  [<ffffffff801026a0>] plat_irq_dispatch+0x120/0x1a0
Nov 19 14:19:58 [kernel] [14:19:57.846232]  [<ffffffff80163f28>] compat_sys_futex+0x0/0x188
Nov 19 14:19:58 [kernel] [14:19:57.846258]  [<ffffffff801637e0>] do_futex+0x8f8/0xb58
Nov 19 14:19:58 [kernel] [14:19:57.846281]  [<ffffffff8011db28>] tlb_do_page_fault_1+0x110/0x128
Nov 19 14:19:58 [kernel] [14:19:57.846317]  [<ffffffff80163758>] do_futex+0x870/0xb58
Nov 19 14:19:58 [kernel] [14:19:57.846339]  [<ffffffff80163f28>] compat_sys_futex+0x0/0x188
Nov 19 14:19:58 [kernel] [14:19:57.846364]  [<ffffffff80163170>] do_futex+0x288/0xb58
Nov 19 14:19:58 [kernel] [14:19:57.846385]  [<ffffffff801637e0>] do_futex+0x8f8/0xb58
Nov 19 14:19:58 [kernel] [14:19:57.846407]  [<ffffffff80163764>] do_futex+0x87c/0xb58
Nov 19 14:19:58 [kernel] [14:19:57.846430]  [<ffffffff80177500>] __alloc_pages+0x70/0x398
Nov 19 14:19:58 [kernel] [14:19:57.846456]  [<ffffffff80130d1c>] try_to_wake_up+0x3c4/0x4f8
Nov 19 14:19:58 [kernel] [14:19:57.846489]  [<ffffffff802f3c28>] __up_read+0xe8/0x130
Nov 19 14:19:58 [kernel] [14:19:57.846528]  [<ffffffff80163fac>] compat_sys_futex+0x84/0x188
Nov 19 14:19:58 [kernel] [14:19:57.846552]  [<ffffffff80116314>] handle_sysn32+0x54/0xb0
Nov 19 14:19:58 [kernel] [14:19:57.846578]  [<ffffffff80163f28>] compat_sys_futex+0x0/0x188

* Re: futex_wake_op deadlock?
From: Ralf Baechle @ 2007-11-20 11:21 UTC
  To: Kaz Kylheku; +Cc: linux-mips

On Mon, Nov 19, 2007 at 01:27:37PM -0800, Kaz Kylheku wrote:

> >> From time to time, on 2.6.17.7, I see a deadlock situation go off.
> >> The soft lockup tick occurs in the middle of do_futex, which is
> >> heavily inlined.  The system is actually hosed; it's not one of those
> >> recoverable CPU busy situations that can sometimes trigger the lockup
> >> detector.
> > 
> > Can you reproduce this hang also if you're not running in a binary compat
> > mode, that is, either running o32 binaries on a 32-bit kernel or
> > 64-bit binaries on a 64-bit kernel?
> 
> I have hacked up a little test program which hosed my board within
> seconds. The system is not completely hung. However:

Cute.  So looking again at the futex code this morning it was quite
obvious what happened.  The ll/sc loops in __futex_atomic_op() had the
usual fixups necessary for memory accesses to userspace from kernel
space installed:

        __asm__ __volatile__(
        "       .set    push                            \n"
        "       .set    noat                            \n"
        "       .set    mips3                           \n"
        "1:     ll      %1, %4  # __futex_atomic_op     \n"
        "       .set    mips0                           \n"
        "       " insn  "                               \n"
        "       .set    mips3                           \n"
        "2:     sc      $1, %2                          \n"
        "       beqz    $1, 1b                          \n"
        __WEAK_LLSC_MB
        "3:                                             \n"
        "       .set    pop                             \n"
        "       .set    mips0                           \n"
        "       .section .fixup,\"ax\"                  \n"
        "4:     li      %0, %6                          \n"
        "       j       2b                              \n"	<-----
        "       .previous                               \n"
        "       .section __ex_table,\"a\"               \n"
        "       "__UA_ADDR "\t1b, 4b                    \n"
        "       "__UA_ADDR "\t2b, 4b                    \n"
        "       .previous                               \n"
        : "=r" (ret), "=&r" (oldval), "=R" (*uaddr)
        : "0" (0), "R" (*uaddr), "Jr" (oparg), "i" (-EFAULT)
        : "memory");

Notice the branch at the end of the fixup code; it goes back to the
SC instruction.  The SC instruction took an exception so it will not have
changed $1, so the loop will continue endlessly unless by coincidence the
value to be stored from $1 happened to be zero.
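The difference between `j 2b` and `j 3b` can be reduced to a small state machine. This is only a model under simplifying assumptions, not the kernel's exception path: the hypothetical `page_mapped` flag says whether the user page is present for the whole run, any access to an unmapped page goes straight to the fixup at label 4, and a step budget stands in for the soft lockup detector (`run_futex_op()` and `SPUN_OUT` are made-up names).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum label { L1_LL, L2_SC, L3_DONE, L4_FIXUP };

#define SPUN_OUT 1  /* hypothetical sentinel: label 3 was never reached */

/* Run the ll/insn/sc/beqz sequence with the fixup branching to
 * fixup_target: L2_SC models the old "j 2b", L3_DONE the patched "j 3b".
 * Returns the value left in ret (0 or -EFAULT), or SPUN_OUT if the
 * loop was still going after max_steps steps. */
int run_futex_op(bool page_mapped, enum label fixup_target, int max_steps)
{
    enum label pc = L1_LL;
    int ret = 0;

    for (int step = 0; step < max_steps; step++) {
        switch (pc) {
        case L1_LL:     /* 1: ll %1, %4 ; insn */
            pc = page_mapped ? L2_SC : L4_FIXUP;
            break;
        case L2_SC:     /* 2: sc $1, %2 ; beqz $1, 1b (sc assumed to succeed) */
            pc = page_mapped ? L3_DONE : L4_FIXUP;
            break;
        case L4_FIXUP:  /* 4: li %0, -EFAULT ; j <fixup_target> */
            ret = -EFAULT;
            pc = fixup_target;
            break;
        case L3_DONE:   /* 3: leave the asm block */
            return ret;
        }
    }
    return SPUN_OUT;
}
```

With the page unmapped, run_futex_op(false, L2_SC, n) returns SPUN_OUT for any budget, because labels 2 and 4 just feed each other, while run_futex_op(false, L3_DONE, n) returns -EFAULT almost immediately; with the page mapped, both targets return 0.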

Obviously this one was MIPS specific and may hit all supported ABIs.  So
my initial suspicion this might be the issue David Miller recently
discovered in the binary compat code isn't true.  And it's a local DoS
probably for all of 2.6.16 and up.

Patch below.  It fixes your test case on a 32-bit kernel for me.

  Ralf

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

diff --git a/include/asm-mips/futex.h b/include/asm-mips/futex.h
index 3e7e30d..17f082c 100644
--- a/include/asm-mips/futex.h
+++ b/include/asm-mips/futex.h
@@ -35,7 +35,7 @@
 		"	.set	mips0				\n"	\
 		"	.section .fixup,\"ax\"			\n"	\
 		"4:	li	%0, %6				\n"	\
-		"	j	2b				\n"	\
+		"	j	3b				\n"	\
 		"	.previous				\n"	\
 		"	.section __ex_table,\"a\"		\n"	\
 		"	"__UA_ADDR "\t1b, 4b			\n"	\
@@ -61,7 +61,7 @@
 		"	.set	mips0				\n"	\
 		"	.section .fixup,\"ax\"			\n"	\
 		"4:	li	%0, %6				\n"	\
-		"	j	2b				\n"	\
+		"	j	3b				\n"	\
 		"	.previous				\n"	\
 		"	.section __ex_table,\"a\"		\n"	\
 		"	"__UA_ADDR "\t1b, 4b			\n"	\

* RE: futex_wake_op deadlock?
From: Kaz Kylheku @ 2007-11-20 18:06 UTC
  To: Ralf Baechle; +Cc: linux-mips

Ralf Baechle wrote:
>         __asm__ __volatile__(
>         "       .set    push                            \n"
>         "       .set    noat                            \n"
>         "       .set    mips3                           \n"
>         "1:     ll      %1, %4  # __futex_atomic_op     \n"
>         "       .set    mips0                           \n"
>         "       " insn  "                               \n"
>         "       .set    mips3                           \n"
>         "2:     sc      $1, %2                          \n"
>         "       beqz    $1, 1b                          \n"
>         __WEAK_LLSC_MB
>         "3:                                             \n"
>         "       .set    pop                             \n"
>         "       .set    mips0                           \n"
>         "       .section .fixup,\"ax\"                  \n"
>         "4:     li      %0, %6                          \n"
>         "       j       2b                              \n"	<-----
>         "       .previous                               \n"
>         "       .section __ex_table,\"a\"               \n"
>         "       "__UA_ADDR "\t1b, 4b                    \n"
>         "       "__UA_ADDR "\t2b, 4b                    \n"
>         "       .previous                               \n"
>         : "=r" (ret), "=&r" (oldval), "=R" (*uaddr)
>         : "0" (0), "R" (*uaddr), "Jr" (oparg), "i" (-EFAULT)
>         : "memory");
> 
> Notice the branch at the end of the fixup code, it goes back to the
> SC instruction. 

Hi Ralf,

I had gone through all that code, but didn't see it!

The problem is I didn't pay enough attention because I didn't suspect it
enough.

I was misled by the backtrace address in the soft lockup dump, which
points to one instruction /before/ the ll instruction. So I thought that
the lockup was somewhere outside of that loop, right?

Does the backward branch on MIPS set up the instruction pointer in such
a way that if an interrupt goes off, it can be pointing to the previous
instruction? I thought about that possibility.

* Re: futex_wake_op deadlock?
From: Ralf Baechle @ 2007-11-20 18:16 UTC
  To: Kaz Kylheku; +Cc: linux-mips

On Tue, Nov 20, 2007 at 10:06:44AM -0800, Kaz Kylheku wrote:

> The problem is I didn't pay enough attention because I didn't suspect it
> enough.
> 
> I was misled by the backtrace address in the soft lockup dump, which
> points to one instruction /before/ the ll instruction. So I thought that
> the lockup is somewhere outside of that loop, right?
> 
> Does the backward branch on MIPS set up the instruction pointer in such
> a way that if an interrupt goes off, it can be pointing to the previous
> instruction? I thought about that possibility.

The EPC will always point to the instruction which caused the exception
with the one special case where an instruction in a branch delay slot
was causing the exception.  If that's the case the EPC will point at the
branch and the BD bit in the cause register (bit 31) will be set to
indicate this special case.
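The rule translates into a one-line helper. This is a sketch assuming a classic MIPS core without compact branches, so a delay slot is always the 4-byte instruction right after its branch; `faulting_pc()` and `CAUSE_BD` are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define CAUSE_BD (1u << 31)   /* Cause register bit 31: exception in a delay slot */

/* Given EPC and Cause as saved on exception entry, return the address of
 * the instruction that was executing when the exception was taken.
 * With BD set, EPC points at the branch and the culprit is the delay-slot
 * instruction immediately after it. */
uint64_t faulting_pc(uint64_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}
```

EPC points at the branch rather than the delay slot because execution resumes at EPC, and the branch has to be executed again for its delay slot to make sense.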

  Ralf

* RE: futex_wake_op deadlock?
From: Kaz Kylheku @ 2007-11-20 18:24 UTC
  To: Ralf Baechle; +Cc: linux-mips

Ralf Baechle wrote:
> Patch below.  It fixes your test case on a 32-bit kernel for me.

I'm running it now on 64 bit. The test case isn't causing any ill
effects.

Thanks a lot, Ralf!

* Re: futex_wake_op deadlock?
From: David Daney @ 2007-11-20 18:29 UTC
  To: Ralf Baechle; +Cc: Kaz Kylheku, linux-mips

Ralf Baechle wrote:

> 
> Notice the branch at the end of the fixup code, it goes back to the
> SC instruction.  The SC instruction took an exception so it will not have
> changed $1 so the loop will continue endless unless by coincidence the
> value to be stored from $1 happened to be zero.
> 
> Obviously this one was MIPS specific and may hit all supported ABIs.  So
> my initial suspicion this might be the issue David Miller recently
> discovered in the binary compat code isn't true.  And it's a local DoS
> probably for all of 2.6.16 and up.
> 

Mostly similar code is in 2.6.15, so I think it is affected as well.
2.6.12, on the other hand, doesn't seem to have futex.h.

David Daney

* Re: futex_wake_op deadlock?
From: Ralf Baechle @ 2007-11-20 19:00 UTC
  To: David Daney; +Cc: Kaz Kylheku, linux-mips

On Tue, Nov 20, 2007 at 10:29:47AM -0800, David Daney wrote:

>> Notice the branch at the end of the fixup code, it goes back to the
>> SC instruction.  The SC instruction took an exception so it will not have
>> changed $1 so the loop will continue endless unless by coincidence the
>> value to be stored from $1 happened to be zero.
>>
>> Obviously this one was MIPS specific and may hit all supported ABIs.  So
>> my initial suspicion this might be the issue David Miller recently
>> discovered in the binary compat code isn't true.  And it's a local DoS
>> probably for all of 2.6.16 and up.
>>
>
> Mostly similar code is in 2.6.15, so I think it is affected as well.
> 2.6.12, on the other hand, doesn't seem to have futex.h.

It originally appeared in the lmo kernel for 2.6.14-rc1 and a little
after the 2.6.14 release in kernel.org.

If I say 2.6.16 then it's simply that I don't ever look at anything that
doesn't have a -stable branch.

  Ralf
