Re: Soft lockups on kerberised NFSv4.0 clients

From: "Tuomas Räsänen" <tuomasjjrasanen@opinsys.fi>
To: Jeff Layton <jlayton@poochiereds.net>
Cc: Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>,
	linux-nfs@vger.kernel.org
Subject: Re: Soft lockups on kerberised NFSv4.0 clients
Date: Tue, 17 Jun 2014 13:51:42 +0000 (UTC)
Message-ID: <1049368555.86792.1403013102335.JavaMail.zimbra@opinsys.fi>
In-Reply-To: <1726881404.72983.1402308693418.JavaMail.zimbra@opinsys.fi>

----- Original Message -----
> From: "Tuomas Räsänen" <tuomasjjrasanen@opinsys.fi>
> 
> The lockup mechanism seems to be as follows: the process (which is always
> firefox) is killed, and it tries to unlock the file (which is always a
> mmapped sqlite3 WAL index) which still has some pending IOs going on. The
> return value of nfs_wait_bit_killable() (-ERESTARTSYS from
> fatal_signal_pending(current)) is ignored and the process just keeps looping
> because io_count seems to be stuck at 1 (I still don't know why).

I wrote a simple program which simulates the behavior described above
and causes soft lockups (see jam.c at the end of this message).

Here's what it does:
- creates and opens jamfile.dat (10M)
- locks the file with flock
- spawns N threads which all:
  - mmap the whole file and write to the map
- unlocks the file after spawning threads

Sometimes the unlocking flock() call blocks for a while, waiting for
pending IOs [*]. If the process is killed during the unlock (sent SIGINT
before the program has printed 'unlock ok'), it seems to get stuck: the
pending IOs never finish and the -ERESTARTSYS from nfs_wait_bit_killable()
is not handled, causing the task to loop inside __nfs_iocounter_wait()
indefinitely.

How to cause soft lockups:

1. Compile: gcc -pthread -o jam jam.c

2. Run ./jam

3. Press C-c shortly after starting the program, after 'unlock' but
   before 'unlock ok' is printed

4. You might need to repeat steps 2 and 3 a couple of times

[*]: Sometimes flock() seems to block for a *very* long time (forever?),
     but sometimes only for a short period of time. But regarding this
     problem, it does not matter: whenever the task is killed during the
     unlock, the process freezes.

Applying the patch from my previous mail fixes the soft lockup issue:
the task no longer gets into an infinite (or at least indefinite) loop,
because an interruptible wait_on_bit() is used instead. But what are
its side effects? Is it a completely brain-dead idea?

jam.c:

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_SIZE (sizeof(char) *  1024 * 1024 * 10)
#define THREADS 4

void *work_on_file(void *const arg)
{
	size_t i;
	int fd;
	char *map;

	fd = *((int *) arg);
	map = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		perror("failed to mmap");
		return NULL;
	}

	printf("write begins\n");
	for (i = 0; i < MAP_SIZE; ++i) {
		map[i] = 'a';
	}
	printf("write ends\n");

	return NULL;
}

int main(void)
{
	int i;
	pthread_t *threads;
	int fd;

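	/* O_CREAT requires a mode argument; without it the file mode is garbage. */
	fd = open("jamfile.dat", O_RDWR | O_CREAT, 0644);
	if (fd == -1 || ftruncate(fd, MAP_SIZE) == -1) {
		perror("failed to create jamfile.dat");
		return -1;
	}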

	threads = malloc(sizeof(pthread_t) * THREADS);

	printf("lock\n");
	if (flock(fd, LOCK_EX) == -1) {
		perror("failed to lock");
		return -1;
	}
	printf("lock ok\n");

	for (i = 0; i < THREADS; ++i) {
		pthread_attr_t attr;
		pthread_attr_init(&attr);
		pthread_create(&threads[i], &attr, &work_on_file, &fd);
		pthread_attr_destroy(&attr);
	}

	printf("unlock\n");
	if (flock(fd, LOCK_UN) == -1) {
		perror("failed to unlock");
		return -1;
	}
	printf("unlock ok\n");

	for (i = 0; i < THREADS; ++i) {
		pthread_join(threads[i], NULL);
	}

	free(threads);

	return close(fd);
}

-- 
Tuomas
