public inbox for linux-kernel@vger.kernel.org
From: "Matthieu Fertré" <matthieu.fertre@kerlabs.com>
To: Darren Hart <dvhart@linux.intel.com>
Cc: Louis Rilling <louis.rilling@kerlabs.com>,
	linux-kernel@vger.kernel.org,
	Rusty Russell <rusty@rustcorp.com.au>,
	Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RESEND PATCH] futex: fix key reference counter in case of requeue.
Date: Mon, 18 Oct 2010 14:51:14 +0200
Message-ID: <4CBC42C2.4040604@kerlabs.com>
In-Reply-To: <4CB8A7EB.6050303@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 3502 bytes --]

Hi Darren,

On 15/10/2010 21:13, Darren Hart wrote:
> On 10/14/2010 04:30 AM, Louis Rilling wrote:
>> From: Matthieu Fertré<matthieu.fertre@kerlabs.com>
> 
> Hi Matthew,
> 
>>
>> This patch ensures that we are referring to the right key when dropping
>> reference for the futex_wait operation.
>>
>> The following scenario explains a typical case where the bug was
>> happening:
>>
>> Process P calls futex_wait() on futex identified by 'key1'. 2 references
>> are taken on this key: one for the struct futex_q itself, and one for the
>> futex_wait operation.
>> If now, process P is requeued on a futex identified by 'key2', its
>> futex_q->key is updated from 'key1' to 'key2' and a reference is got
>> to 'key2' and one is dropped to 'key1'.
>> Later, another process calls futex_wake(): it gets a reference to
>> 'key2', wakes process P, and drops reference to 'key2'.
>> Once process P is woken up, it should unqueue, drop reference to 'key2'
>> (the one referring to the futex_q, this is done in unqueue_me())
>> and to 'key1' (the one referring to futex_wait operation). Without this
>> patch it drops reference to 'key2' instead of 'key1'.
> 
> Nice catch. How did this manifest itself? Did you catch it just by code
> inspection?
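
The reference accounting in the quoted scenario can be followed with a small
user-space model. This is only a sketch: struct key, struct waiter and the
helper names below are made up for illustration and are not kernel code; the
counters merely mimic the get/drop steps listed in the patch description.

```c
/* Toy stand-ins for a futex key and a futex_q; illustrative names only. */
struct key { int refs; };
struct waiter { struct key *q_key; };	/* models futex_q->key */

static void get_ref(struct key *k)  { k->refs++; }
static void drop_ref(struct key *k) { k->refs--; }

/* futex_wait() entry: two references are taken on key1. */
static void wait_begin(struct waiter *w, struct key *k1)
{
	w->q_key = k1;
	get_ref(k1);	/* held on behalf of the futex_q */
	get_ref(k1);	/* held on behalf of the futex_wait operation */
}

/* Requeue: the futex_q reference moves from the old key to the new one. */
static void requeue(struct waiter *w, struct key *k2)
{
	get_ref(k2);
	drop_ref(w->q_key);
	w->q_key = k2;
}

/*
 * Wake-up path of the waiter. With fixed == 0 the operation reference is
 * dropped on whatever key the futex_q currently points to (the behaviour
 * described as buggy); with fixed != 0 it is dropped on the original key.
 */
static void wait_end(struct waiter *w, struct key *k1, int fixed)
{
	struct key *q_key = w->q_key;

	drop_ref(q_key);		/* unqueue_me(): futex_q reference */
	drop_ref(fixed ? k1 : q_key);	/* futex_wait operation reference */
}

/* Play the whole scenario and report the final counters of both keys. */
void run_scenario(int fixed, int *refs_key1, int *refs_key2)
{
	struct key k1 = { 0 }, k2 = { 0 };
	struct waiter w;

	wait_begin(&w, &k1);
	requeue(&w, &k2);
	get_ref(&k2);			/* futex_wake() by another process */
	drop_ref(&k2);
	wait_end(&w, &k1, fixed);
	*refs_key1 = k1.refs;
	*refs_key2 = k2.refs;
}
```

In this model, run_scenario(0, ...) leaves one reference leaked on key1 and
one reference too few on key2, while run_scenario(1, ...) ends with both
counters back at zero.
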


I found it while testing the distributed implementation of futex in
Kerrighed (www.kerrighed.org).

After looking into it more closely, I noticed that the bug comes from the
vanilla Linux kernel 2.6.30 (the one on which the current version of
Kerrighed is based). I then checked whether the bug still existed in the
latest Linux rc or whether it had already been fixed.

I have attached the test that reveals the bug on my system. The test
runs some basic wait/wake/requeue scenarios on a futex "hosted" in a SysV
shared memory segment. It is composed of one executable and 2 scripts
that are meant to be used with LTP. To run it without LTP, you can replace
the calls to tst_resm/tst_brkm with echo in the shell scripts.
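
For readers without LTP at hand, the wait/requeue/wake sequence can also be
sketched as a single C function. This is not the attached test: the helper
names (futex_op, new_shm_futex, retry_until_one, requeue_scenario), the use
of IPC_PRIVATE segments and the retry limits are assumptions made for this
sketch, and it has not been run on the kernels discussed in this thread.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Raw futex(2) call; val2 travels in the timeout slot for FUTEX_REQUEUE. */
static long futex_op(int *uaddr, int op, int val, unsigned long val2,
		     int *uaddr2)
{
	return syscall(SYS_futex, uaddr, op, val, (void *)val2, uaddr2, 0);
}

/* Attach a fresh SysV segment holding one zero-initialised futex word. */
static int *new_shm_futex(void)
{
	int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
	int *f;

	if (id == -1)
		return NULL;
	f = shmat(id, NULL, 0);
	/* Mark for removal now; the segment lives until the last detach. */
	shmctl(id, IPC_RMID, NULL);
	return f == (void *)-1 ? NULL : f;
}

/* Retry the operation until it reports exactly one affected waiter. */
static int retry_until_one(int *f1, int op, int val, unsigned long val2,
			   int *f2)
{
	for (int i = 0; i < 500; i++) {
		if (futex_op(f1, op, val, val2, f2) == 1)
			return 0;
		usleep(10000);
	}
	return -1;
}

/* Returns 0 if a waiter on f1 was requeued to f2 and then woken on f2. */
int requeue_scenario(void)
{
	int *f1 = new_shm_futex(), *f2 = new_shm_futex();
	int status = 0;
	pid_t pid;

	if (!f1 || !f2)
		return -1;
	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0)	/* waiter: blocks on f1 ("key1") while *f1 == 0 */
		_exit(futex_op(f1, FUTEX_WAIT, 0, 0, NULL) == 0 ? 0 : 1);

	/* Move the waiter to f2 ("key2") without waking it, then wake it. */
	if (retry_until_one(f1, FUTEX_REQUEUE, 0, 1, f2) ||
	    retry_until_one(f2, FUTEX_WAKE, 1, 0, NULL)) {
		kill(pid, SIGKILL);
		waitpid(pid, NULL, 0);
		return -1;
	}
	waitpid(pid, &status, 0);
	shmdt(f1);
	shmdt(f2);
	return WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

The parent retries because it cannot know when the child has actually gone
to sleep in FUTEX_WAIT; a requeue or wake that finds no waiter returns 0.
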

Without debugging facilities, it may BUG while destroying the shared
segment. As far as I remember, with some kernel hacking features
enabled, the kernel complained in its log but did not crash; I don't
remember exactly what it complained about.

> 
> I've been trying to develop a futex test suite to catch issues with the
> futex implementation, as well as to test any changes made to avoid
> regressions. Mind having a look?
> 
> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git;a=summary

I already had a git checkout of this futex test suite :)
It did not fit my tests, since I was checking the behavior of
distributed futexes inside Kerrighed. (I need tests run by separate
processes spread over different nodes and accessing the futex through
SysV shared segments.)

Regards,

Matthieu

> 
>> Signed-off-by: Matthieu Fertré<matthieu.fertre@kerlabs.com>
>> Signed-off-by: Louis Rilling<louis.rilling@kerlabs.com>
>> ---
>>   kernel/futex.c |    8 ++++++--
>>   1 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/futex.c b/kernel/futex.c
>> index 6a3a5fa..bed6717 100644
>> --- a/kernel/futex.c
>> +++ b/kernel/futex.c
>> @@ -1791,6 +1791,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
>>       struct restart_block *restart;
>>       struct futex_hash_bucket *hb;
>>       struct futex_q q;
>> +    union futex_key key;
> 
> We should be able to do this properly without requiring an additional
> key variable. I think tglx has proposed a suitable fix - but it needs
> testing to avoid any subtle regressions.
> 


[-- Attachment #2: futex-shm-tool.c --]
[-- Type: text/x-csrc, Size: 4890 bytes --]

/*
 * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
 * Written by David Howells (dhowells@redhat.com)
 *
 * Copyright (C) 2010 Kerlabs - Matthieu Fertré
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 */

#define _GNU_SOURCE	/* for syscall() and MAP_ANON */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/mman.h>
#include <asm/unistd.h>
#include <linux/futex.h>
#include <sys/time.h>
#include <time.h>	/* struct timespec */

/* glibc provides no futex() wrapper, so issue the raw system call */
static inline int futex(int *uaddr, int op, int val,
			const struct timespec *utime, int *uaddr2, int val3)
{
	return syscall(__NR_futex, uaddr, op, val, utime, uaddr2, val3);
}

#define SYSERROR(X, Y)					\
do {							\
	if ((long)(X) == -1L) {				\
		perror(Y);				\
		exit(EXIT_FAILURE);			\
	}						\
} while(0)

int shmkey = 23;
int shmkey2 = 24;
int nr_wake = 1;
int nr_requeue = 1;
int bitset = 0;
int quiet = 0;
struct timespec utime;

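void print_usage(const char* cmd)
{
	printf("%s [options] wait|wake|requeue|delete|badfutex\n", cmd);
	printf("  -h: show this help\n");
	printf("  -q: quiet\n");
	printf("  -b bitset, -k shmkey, -K shmkey2 (requeue target)\n");
	printf("  -r nr_requeue, -t timeout (seconds), -w nr_wake\n");
}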

void dowait(void)
{
	int shmid, ret, *f, n;
	struct timespec *timeout = NULL;

	shmid = shmget(shmkey, 4, IPC_CREAT|0666);
	SYSERROR(shmid, "shmget");

	f = shmat(shmid, NULL, 0);
	SYSERROR(f, "shmat");

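	/*
	 * Snapshot the futex word: FUTEX_WAIT only blocks if *f still
	 * holds this value when the kernel checks it.
	 */
	n = *f;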

	if (utime.tv_sec)
		timeout = &utime;

	if (bitset) {
		if (!quiet)
			printf("WAIT_BITSET: %p{%x} bits: %x\n", f, n, bitset);
		ret = futex(f, FUTEX_WAIT_BITSET, n, timeout, NULL, bitset);
	} else {
		if (!quiet)
			printf("WAIT: %p{%x}\n", f, n);
		ret = futex(f, FUTEX_WAIT, n, timeout, NULL, 0);
	}
	SYSERROR(ret, "futex_wait");
	if (!quiet)
		printf("WAITED: %d\n", ret);

	ret = shmdt(f);
	SYSERROR(ret, "shmdt");
}

int dowake(void)
{
	int shmid, ret, *f, nr_proc;

	shmid = shmget(shmkey, 4, IPC_CREAT|0666);
	SYSERROR(shmid, "shmget");

	f = shmat(shmid, NULL, 0);
	SYSERROR(f, "shmat");

	/* change the futex word so that a late waiter does not block on the old value */
	(*f)++;

	if (bitset) {
		if (!quiet)
			printf("WAKE_BITSET: %p{%x} bits: %x\n", f, *f, bitset);
		ret = futex(f, FUTEX_WAKE_BITSET, nr_wake, NULL, NULL, bitset);
	} else {
		if (!quiet)
			printf("WAKE: %p{%x}\n", f, *f);
		ret = futex(f, FUTEX_WAKE, nr_wake, NULL, NULL, 0);
	}

	SYSERROR(ret, "futex_wake");
	if (!quiet)
		printf("WOKE: %d\n", ret);
	nr_proc = ret;

	ret = shmdt(f);
	SYSERROR(ret, "shmdt");

	if (!ret)
		ret = nr_proc;

	return ret;
}

int dorequeue(void)
{
	int shmid1, shmid2, ret, *f1, *f2, nr_proc;

	shmid1 = shmget(shmkey, 4, IPC_CREAT|0666);
	SYSERROR(shmid1, "shmget");

	shmid2 = shmget(shmkey2, 4, IPC_CREAT|0666);
	SYSERROR(shmid2, "shmget");

	f1 = shmat(shmid1, NULL, 0);
	SYSERROR(f1, "shmat");

	f2 = shmat(shmid2, NULL, 0);
	SYSERROR(f2, "shmat");

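	/*
	 * Requeue: wake at most nr_wake waiters on f1 and move at most
	 * nr_requeue of the remaining ones to f2. FUTEX_REQUEUE takes the
	 * requeue limit in the timeout argument slot, hence the cast.
	 */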
	ret = futex(f1, FUTEX_REQUEUE, nr_wake,
		    (const struct timespec *) (long) nr_requeue, f2, 0);
	SYSERROR(ret, "futex_requeue");

	if (!quiet)
		printf("WOKE or REQUEUED: %d\n", ret);
	nr_proc = ret;

	/* detaching shms */
	ret = shmdt(f1);
	SYSERROR(ret, "shmdt");

	ret = shmdt(f2);
	SYSERROR(ret, "shmdt");

	if (!ret)
		ret = nr_proc;

	return ret;
}

void badfutex(void)
{
	int *x;
	int ret;

	x = mmap(NULL, 16384, PROT_READ, MAP_PRIVATE|MAP_ANON, -1, 0);
	SYSERROR(x, "mmap");

	ret = futex(x, FUTEX_WAIT, 0, NULL, NULL, 0);
	SYSERROR(ret, "futex");
}

void deletekey(void)
{
	int shmid, ret;

	shmid = shmget(shmkey, 4, 0666);
	SYSERROR(shmid, "shmget");

	ret = shmctl(shmid, IPC_RMID, NULL);
	SYSERROR(ret, "shmctl(IPC_RMID)");

	if (!quiet)
		printf("SHM %d (id=%d) DELETED\n", shmkey, shmid);
}

void parse_args(int argc, char *argv[])
{
	int c;

	utime.tv_sec = 0;
	utime.tv_nsec = 0;

	while (1) {
		c = getopt(argc, argv, "hqb:k:K:r:t:w:");
		if (c == -1)
			break;

		switch (c) {
		case 'h':
			print_usage(argv[0]);
			exit(EXIT_SUCCESS);
			break;
		case 'q':
			quiet=1;
			break;

		case 'b':
			bitset = atoi(optarg);
			break;
		case 'k':
			shmkey = atoi(optarg);
			break;
		case 'K':
			shmkey2 = atoi(optarg);
			break;
		case 'r':
			nr_requeue = atoi(optarg);
			break;
		case 't':
			utime.tv_sec = atoi(optarg);
			break;
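		case 'w':
			nr_wake = atoi(optarg);
			break;
		default:
			/* getopt() has already reported the bad option */
			print_usage(argv[0]);
			exit(EXIT_FAILURE);
		}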
	}
}

int main(int argc, char **argv)
{
	char *action;
	int ret = 0;

	parse_args(argc, argv);

	if (argc - optind == 0) {
		print_usage(argv[0]);
		exit(EXIT_FAILURE);
	}

	action = argv[optind];

	if (!quiet)
		printf("Command: %s\n", action);

	if (strcmp(action, "badfutex") == 0)
		badfutex();

	else if (strcmp(action, "wait") == 0)
		dowait();

	else if (strcmp(action, "wake") == 0)
		ret = dowake();

	else if (strcmp(action, "requeue") == 0)
		ret = dorequeue();

	else if (strcmp(action, "delete") == 0)
		deletekey();

	else {
		fprintf(stderr, "Unknown command\n");
		print_usage(argv[0]);
		exit(EXIT_FAILURE);
	}

	exit(ret);
}

[-- Attachment #3: futex_shm01.sh --]
[-- Type: application/x-shellscript, Size: 3373 bytes --]

[-- Attachment #4: lib_futex.sh --]
[-- Type: application/x-shellscript, Size: 3044 bytes --]

