From: "Matthieu Fertré" <matthieu.fertre@kerlabs.com>
To: Darren Hart <dvhart@linux.intel.com>
Cc: Louis Rilling <louis.rilling@kerlabs.com>,
linux-kernel@vger.kernel.org,
Rusty Russell <rusty@rustcorp.com.au>,
Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RESEND PATCH] futex: fix key reference counter in case of requeue.
Date: Mon, 18 Oct 2010 14:51:14 +0200
Message-ID: <4CBC42C2.4040604@kerlabs.com>
In-Reply-To: <4CB8A7EB.6050303@linux.intel.com>
[-- Attachment #1: Type: text/plain, Size: 3502 bytes --]
Hi Darren,
On 15/10/2010 21:13, Darren Hart wrote:
> On 10/14/2010 04:30 AM, Louis Rilling wrote:
>> From: Matthieu Fertré<matthieu.fertre@kerlabs.com>
>
> Hi Matthew,
>
>>
>> This patch ensures that we are referring to the right key when dropping
>> reference for the futex_wait operation.
>>
>> The following scenario explains a typical case where the bug was
>> happening:
>>
>> Process P calls futex_wait() on futex identified by 'key1'. 2 references
>> are taken on this key: one for the struct futex_q itself, and one for the
>> futex_wait operation.
>> If now, process P is requeued on a futex identified by 'key2', its
>> futex_q->key is updated from 'key1' to 'key2' and a reference is got
>> to 'key2' and one is dropped to 'key1'.
>> Later, another process calls futex_wake(): it gets a reference to
>> 'key2', wakes process P, and drops reference to 'key2'.
>> Once process P is woken up, it should unqueue, drop reference to 'key2'
>> (the one referring to the futex_q, this is done in unqueue_me())
>> and to 'key1' (the one referring to futex_wait operation). Without this
>> patch it drops reference to 'key2' instead of 'key1'.
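To make the accounting in the scenario above easier to follow, here is a
toy user-space model of it. This is only a sketch, not kernel code: the
struct and function names are made up for illustration, each key is
reduced to a plain counter, and "patched" stands for remembering the key
that futex_wait() originally took its reference on.

```c
/* Toy model of the reference counts; hypothetical names, not kernel code. */
struct refs {
	int key1;	/* references held on 'key1' */
	int key2;	/* references held on 'key2' */
};

struct refs run_scenario(int patched)
{
	struct refs r = { 0, 0 };
	int *qkey;	/* models futex_q->key */
	int *wait_key;	/* models the key saved by the futex_wait operation */

	/* futex_wait(): one reference for the futex_q, one for the operation */
	r.key1 += 2;
	qkey = &r.key1;
	wait_key = &r.key1;

	/* requeue: futex_q->key moves from 'key1' to 'key2' */
	r.key2 += 1;
	r.key1 -= 1;
	qkey = &r.key2;

	/* futex_wake() from another process: get and drop 'key2' */
	r.key2 += 1;
	r.key2 -= 1;

	/* P wakes up: unqueue_me() drops the futex_q reference */
	*qkey -= 1;

	/* P drops the reference of the futex_wait operation */
	if (patched)
		*wait_key -= 1;	/* the key the reference was taken on */
	else
		*qkey -= 1;	/* whatever futex_q->key points to now */

	return r;
}
```

With patched == 0 the model ends with key1 at 1 and key2 at -1, i.e. one
leaked and one over-dropped reference; with patched == 1 both counters
return to 0.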
>
> Nice catch. How did this manifest itself? Did you catch it just by code
> inspection?
I found it while testing the distributed futex implementation in
Kerrighed (www.kerrighed.org).
After digging deeper, I noticed that the bug comes from the vanilla Linux
kernel 2.6.30 (the one on which the current version of Kerrighed is based).
I then checked whether the bug still existed in the latest Linux rc or
whether it had already been fixed.
I have attached the test that reveals the bug on my system. The test
runs some basic wait/wake/requeue scenarios on futexes "hosted" in a SysV
shared memory segment. It is composed of one executable and 2 scripts
that are meant to be used with LTP. To run it without LTP, you can replace
the calls to tst_resm/tst_brkm with echo in the shell scripts.
Without debugging facilities, it may BUG while destroying the shared
segment. As far as I remember, with some kernel hacking features
enabled, there was no crash, only a complaint in the kernel log, but
I don't remember exactly what it complained about.
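For readers who just want to see the wait/requeue/wake sequence in one
place, here is a minimal single-program sketch. It is not the attached
test: it assumes an anonymous shared mapping instead of SysV segments, a
single host, and made-up names (futex_op, run_requeue_scenario). It only
exercises the sequence; it does not by itself detect the refcount
problem.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Thin wrapper around the raw syscall (hypothetical helper name). */
static long futex_op(int *uaddr, int op, int val, const void *arg4,
		     int *uaddr2)
{
	return syscall(SYS_futex, uaddr, op, val, arg4, uaddr2, 0);
}

/*
 * Returns 0 if one waiter was requeued from f[0] to f[1] and then woken
 * through f[1]; returns -1 on any failure.
 */
int run_requeue_scenario(void)
{
	struct timespec limit = { .tv_sec = 5, .tv_nsec = 0 };
	int *f, i, status;
	long n = 0;
	pid_t pid;

	/* Two futex words in memory shared across fork(). */
	f = mmap(NULL, 2 * sizeof(int), PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (f == MAP_FAILED)
		return -1;
	f[0] = f[1] = 0;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Waiter: sleeps on f[0], bounded by a 5 s timeout. */
		long r = futex_op(&f[0], FUTEX_WAIT, 0, &limit, NULL);
		_exit(r == 0 ? 0 : 1);
	}

	/* Retry until the child is queued: wake 0, requeue 1 onto f[1]. */
	for (i = 0; i < 1000 && n != 1; i++) {
		n = futex_op(&f[0], FUTEX_REQUEUE, 0,
			     (void *)(intptr_t)1, &f[1]);
		if (n != 1)
			usleep(1000);
	}

	/* Wake the requeued waiter through the second futex. */
	if (n == 1)
		n = futex_op(&f[1], FUTEX_WAKE, 1, NULL, NULL);

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	munmap(f, 2 * sizeof(int));

	return (n == 1 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
		? 0 : -1;
}
```

Calling run_requeue_scenario() from a small main() and checking for a 0
return is enough to drive it; the retry loop is there because the parent
cannot know when the child has actually been queued.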
>
> I've been trying to develop a futex test suite to catch issues with the
> futex implementation, as well as to test any changes made to avoid
> regressions. Mind having a look?
>
> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git;a=summary
I already had a git checkout of this futex test suite :)
It did not fit my tests, since I was checking the behavior of
distributed futexes inside Kerrighed. (I need tests run by separate
processes spread across different nodes, accessing the futex through SysV
shared memory segments.)
Regards,
Matthieu
>
>> Signed-off-by: Matthieu Fertré<matthieu.fertre@kerlabs.com>
>> Signed-off-by: Louis Rilling<louis.rilling@kerlabs.com>
>> ---
>> kernel/futex.c | 8 ++++++--
>> 1 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/futex.c b/kernel/futex.c
>> index 6a3a5fa..bed6717 100644
>> --- a/kernel/futex.c
>> +++ b/kernel/futex.c
>> @@ -1791,6 +1791,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
>>  	struct restart_block *restart;
>>  	struct futex_hash_bucket *hb;
>>  	struct futex_q q;
>> +	union futex_key key;
>
> We should be able to do this properly without requiring an additional
> key variable. I think tglx has proposed a suitable fix - but it needs
> testing to avoid any subtle regressions.
>
[-- Attachment #2: futex-shm-tool.c --]
[-- Type: text/x-csrc, Size: 4890 bytes --]
/*
 * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
 * Written by David Howells (dhowells@redhat.com)
 *
 * Copyright (C) 2010 Kerlabs - Matthieu Fertré
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/mman.h>
#include <asm/unistd.h>
#include <linux/futex.h>
#include <sys/time.h>

/* Raw futex(2) wrapper around the system call. */
static inline int futex(int *uaddr, int op, int val,
			const struct timespec *utime, int *uaddr2, int val3)
{
	return syscall(__NR_futex, uaddr, op, val, utime, uaddr2, val3);
}

/* Abort with perror() if X is -1; also catches (void *)-1 from shmat/mmap. */
#define SYSERROR(X, Y)					\
	do {						\
		if ((long)(X) == -1L) {			\
			perror(Y);			\
			exit(EXIT_FAILURE);		\
		}					\
	} while (0)

int shmkey = 23;
int shmkey2 = 24;
int nr_wake = 1;
int nr_requeue = 1;
int bitset = 0;
int quiet = 0;
struct timespec utime;
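
void print_usage(const char *cmd)
{
	printf("%s -h: show this help\n", cmd);
	printf("%s [-q] [-b bitset] [-k shmkey] [-K shmkey2] [-r nr_requeue]"
	       " [-t timeout_sec] [-w nr_wake] <action>\n", cmd);
	printf("actions: badfutex, wait, wake, requeue, delete\n");
}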

/* Wait on the futex stored in the shared segment identified by shmkey. */
void dowait(void)
{
	int shmid, ret, *f, n;
	struct timespec *timeout = NULL;

	shmid = shmget(shmkey, 4, IPC_CREAT|0666);
	SYSERROR(shmid, "shmget");

	f = shmat(shmid, NULL, 0);
	SYSERROR(f, "shmat");

	n = *f;

	/* -t 0 (the default) means wait without a timeout */
	if (utime.tv_sec)
		timeout = &utime;

	if (bitset) {
		if (!quiet)
			printf("WAIT_BITSET: %p{%x} bits: %x\n", f, n, bitset);
		ret = futex(f, FUTEX_WAIT_BITSET, n, timeout, NULL, bitset);
	} else {
		if (!quiet)
			printf("WAIT: %p{%x}\n", f, n);
		ret = futex(f, FUTEX_WAIT, n, timeout, NULL, 0);
	}
	SYSERROR(ret, "futex_wait");

	if (!quiet)
		printf("WAITED: %d\n", ret);

	ret = shmdt(f);
	SYSERROR(ret, "shmdt");
}

/* Wake up to nr_wake waiters; returns the number of processes woken. */
int dowake(void)
{
	int shmid, ret, *f, nr_proc;

	shmid = shmget(shmkey, 4, IPC_CREAT|0666);
	SYSERROR(shmid, "shmget");

	f = shmat(shmid, NULL, 0);
	SYSERROR(f, "shmat");

	(*f)++;

	if (bitset) {
		if (!quiet)
			printf("WAKE_BITSET: %p{%x} bits: %x\n", f, *f, bitset);
		ret = futex(f, FUTEX_WAKE_BITSET, nr_wake, NULL, NULL, bitset);
	} else {
		if (!quiet)
			printf("WAKE: %p{%x}\n", f, *f);
		ret = futex(f, FUTEX_WAKE, nr_wake, NULL, NULL, 0);
	}
	SYSERROR(ret, "futex_wake");

	if (!quiet)
		printf("WOKE: %d\n", ret);
	nr_proc = ret;

	ret = shmdt(f);
	SYSERROR(ret, "shmdt");

	if (!ret)
		ret = nr_proc;

	return ret;
}

/*
 * Wake up to nr_wake waiters on the first futex and requeue up to
 * nr_requeue of the remaining ones onto the second futex.
 * Returns the number of processes woken or requeued.
 */
int dorequeue(void)
{
	int shmid1, shmid2, ret, *f1, *f2, nr_proc;

	shmid1 = shmget(shmkey, 4, IPC_CREAT|0666);
	SYSERROR(shmid1, "shmget");

	shmid2 = shmget(shmkey2, 4, IPC_CREAT|0666);
	SYSERROR(shmid2, "shmget");

	f1 = shmat(shmid1, NULL, 0);
	SYSERROR(f1, "shmat");

	f2 = shmat(shmid2, NULL, 0);
	SYSERROR(f2, "shmat");

	/* requeue: nr_requeue is passed in the timeout argument slot */
	ret = futex(f1, FUTEX_REQUEUE, nr_wake,
		    (const struct timespec *) (long) nr_requeue, f2, 0);
	SYSERROR(ret, "futex_requeue");

	if (!quiet)
		printf("WOKE or REQUEUED: %d\n", ret);
	nr_proc = ret;

	/* detaching shms */
	ret = shmdt(f1);
	SYSERROR(ret, "shmdt");

	ret = shmdt(f2);
	SYSERROR(ret, "shmdt");

	if (!ret)
		ret = nr_proc;

	return ret;
}

void badfutex(void)
{
	int *x;
	int ret;

	x = mmap(NULL, 16384, PROT_READ, MAP_PRIVATE|MAP_ANON, -1, 0);
	SYSERROR(x, "mmap");

	ret = futex(x, FUTEX_WAIT, 0, NULL, NULL, 0);
	SYSERROR(ret, "futex");
}

void deletekey(void)
{
	int shmid, ret;

	shmid = shmget(shmkey, 4, 0666);
	SYSERROR(shmid, "shmget");

	ret = shmctl(shmid, IPC_RMID, NULL);
	SYSERROR(ret, "shmctl(IPC_RMID)");

	if (!quiet)
		printf("SHM %d (id=%d) DELETED\n", shmkey, shmid);
}

void parse_args(int argc, char *argv[])
{
	int c;

	utime.tv_sec = 0;
	utime.tv_nsec = 0;

	while (1) {
		c = getopt(argc, argv, "hqb:k:K:r:t:w:");
		if (c == -1)
			break;

		switch (c) {
		case 'h':
			print_usage(argv[0]);
			exit(EXIT_SUCCESS);
			break;
		case 'q':
			quiet = 1;
			break;
		case 'b':
			bitset = atoi(optarg);
			break;
		case 'k':
			shmkey = atoi(optarg);
			break;
		case 'K':
			shmkey2 = atoi(optarg);
			break;
		case 'r':
			nr_requeue = atoi(optarg);
			break;
		case 't':
			utime.tv_sec = atoi(optarg);
			break;
		case 'w':
			nr_wake = atoi(optarg);
			break;
		default:
			/* unknown option or missing argument */
			print_usage(argv[0]);
			exit(EXIT_FAILURE);
		}
	}
}

int main(int argc, char **argv)
{
	char *action;
	int ret = 0;

	parse_args(argc, argv);

	if (argc - optind == 0) {
		print_usage(argv[0]);
		exit(EXIT_FAILURE);
	}

	action = argv[optind];

	if (!quiet)
		printf("Command: %s\n", action);

	if (strcmp(action, "badfutex") == 0)
		badfutex();
	else if (strcmp(action, "wait") == 0)
		dowait();
	else if (strcmp(action, "wake") == 0)
		ret = dowake();
	else if (strcmp(action, "requeue") == 0)
		ret = dorequeue();
	else if (strcmp(action, "delete") == 0)
		deletekey();
	else {
		fprintf(stderr, "Unknown command\n");
		print_usage(argv[0]);
		exit(EXIT_FAILURE);
	}

	/* for wake/requeue, the exit status is the number of processes affected */
	exit(ret);
}
[-- Attachment #3: futex_shm01.sh --]
[-- Type: application/x-shellscript, Size: 3373 bytes --]
[-- Attachment #4: lib_futex.sh --]
[-- Type: application/x-shellscript, Size: 3044 bytes --]