All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: + poll-allow-f_op-poll-to-sleep-take-4.patch added to -mm tree
@ 2008-11-25 17:30 Oleg Nesterov
  2008-11-25 21:08 ` Davide Libenzi
  2008-11-26  4:33 ` Tejun Heo
  0 siblings, 2 replies; 15+ messages in thread
From: Oleg Nesterov @ 2008-11-25 17:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Eric Van Hensbergen, Ron Minnich, Ingo Molnar, Christoph Hellwig,
	Miklos Szeredi, Davide Libenzi, Brad Boyer, Al Viro,
	Roland McGrath, Mauro Carvalho Chehab, Andrew Morton,
	linux-kernel

Hi!

Minor question about mbs, just trying to understand the patch...

> +static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +	struct poll_wqueues *pwq = wait->private;
> +	DECLARE_WAITQUEUE(dummy_wait, pwq->polling_task);
> +
> +	set_mb(pwq->triggered, 1);
> +
> +	/*
> +	 * Perform the default wake up operation using a dummy
> +	 * waitqueue.
> +	 *
> +	 * TODO: This is hacky but there currently is no interface to
> +	 * pass in @sync.  @sync is scheduled to be removed and once
> +	 * that happens, wake_up_process() can be used directly.
> +	 */
> +	return default_wake_function(&dummy_wait, mode, sync, key);
> +}
> +
> +int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
> +			  ktime_t *expires, unsigned long slack)
> +{
> +	int rc = -EINTR;
> +
> +	set_current_state(state);
> +	if (!pwq->triggered)
> +		rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS);
> +	__set_current_state(TASK_RUNNING);

So, why do we need this mb() in pollwake() ?

try_to_wake_up() has a full barrier semantics, note the wmb() before
task_rq_lock(). Since spin_lock() itself is STORE, the setting of
pwq->triggered can't be further re-ordered with the reading of p->state.

Or any other reason ?

> +	/* clear triggered for the next iteration */
> +	pwq->triggered = 0;

And don't we (in theory) actually need the mb() here instead?

Let's suppose do_poll() starts the next iteration, so we are doing

	pwq->triggered = 0;

	->poll(file)
		if (!check_file(file))
			return 0;

		return POLLXXX;

We don't have any barriers in between (unless fget_light bumps
->f_count), so this can be reordered as

	->poll(file)
		if (!check_file(file))
			return 0;

		pwq->triggered = 0;

And, if pollwake() happens in between we can miss the event, no?

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread
* + poll-allow-f_op-poll-to-sleep-take-4.patch added to -mm tree
@ 2008-11-24 22:37 akpm
  0 siblings, 0 replies; 15+ messages in thread
From: akpm @ 2008-11-24 22:37 UTC (permalink / raw)
  To: mm-commits
  Cc: htejun, davidel, ericvh, flar, hch, mchehab, mingo, mszeredi,
	rminnich, roland, tj, viro


The patch titled
     poll: allow f_op->poll to sleep
has been added to the -mm tree.  Its filename is
     poll-allow-f_op-poll-to-sleep-take-4.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
out what to do about this

The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/

------------------------------------------------------
Subject: poll: allow f_op->poll to sleep
From: Tejun Heo <htejun@gmail.com>

f_op->poll is the only vfs operation which is not allowed to sleep.  It's
because poll and select implementation used task state to synchronize
against wake ups, which doesn't have to be the case anymore as wait/wake
interface can now use custom wake up functions.  The non-sleep restriction
can be a bit tricky because ->poll is not called from an atomic context
and the result of accidentally sleeping in ->poll only shows up as
temporary busy looping when the timing is right or rather wrong.

This patch converts poll/select to use custom wake up function and use
separate triggered variable to synchronize against wake up events.  The
only added overhead is an extra function call during wake up and
negligible.

This patch removes the one non-sleep exception from vfs locking rules and
is beneficial to userland filesystem implementations like FUSE, 9p or
peculiar fs like spufs as it's very difficult for those to implement
non-sleeping poll method.

While at it, make the following cosmetic changes to make poll.h and
select.c checkpatch friendly.

* s/type * symbol/type *symbol/		   : three places in poll.h
* remove blank line before EXPORT_SYMBOL() : two places in select.c

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Brad Boyer <flar@allandria.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Roland McGrath <roland@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/filesystems/Locking |    2 
 drivers/media/video/v4l1-compat.c |    4 -
 fs/select.c                       |   58 +++++++++++++++++++++-------
 include/linux/poll.h              |   15 +++++--
 4 files changed, 58 insertions(+), 21 deletions(-)

diff -puN Documentation/filesystems/Locking~poll-allow-f_op-poll-to-sleep-take-4 Documentation/filesystems/Locking
--- a/Documentation/filesystems/Locking~poll-allow-f_op-poll-to-sleep-take-4
+++ a/Documentation/filesystems/Locking
@@ -394,7 +394,7 @@ prototypes:
 };
 
 locking rules:
-	All except ->poll() may block.
+	All may block.
 			BKL
 llseek:			no	(see below)
 read:			no
diff -puN drivers/media/video/v4l1-compat.c~poll-allow-f_op-poll-to-sleep-take-4 drivers/media/video/v4l1-compat.c
--- a/drivers/media/video/v4l1-compat.c~poll-allow-f_op-poll-to-sleep-take-4
+++ a/drivers/media/video/v4l1-compat.c
@@ -203,7 +203,6 @@ static int poll_one(struct file *file, s
 	table = &pwq->pt;
 	for (;;) {
 		int mask;
-		set_current_state(TASK_INTERRUPTIBLE);
 		mask = file->f_op->poll(file, table);
 		if (mask & POLLIN)
 			break;
@@ -212,9 +211,8 @@ static int poll_one(struct file *file, s
 			retval = -ERESTARTSYS;
 			break;
 		}
-		schedule();
+		poll_schedule(pwq, TASK_INTERRUPTIBLE);
 	}
-	set_current_state(TASK_RUNNING);
 	poll_freewait(pwq);
 	return retval;
 }
diff -puN fs/select.c~poll-allow-f_op-poll-to-sleep-take-4 fs/select.c
--- a/fs/select.c~poll-allow-f_op-poll-to-sleep-take-4
+++ a/fs/select.c
@@ -109,11 +109,11 @@ static void __pollwait(struct file *filp
 void poll_initwait(struct poll_wqueues *pwq)
 {
 	init_poll_funcptr(&pwq->pt, __pollwait);
+	pwq->polling_task = current;
 	pwq->error = 0;
 	pwq->table = NULL;
 	pwq->inline_index = 0;
 }
-
 EXPORT_SYMBOL(poll_initwait);
 
 static void free_poll_entry(struct poll_table_entry *entry)
@@ -142,12 +142,10 @@ void poll_freewait(struct poll_wqueues *
 		free_page((unsigned long) old);
 	}
 }
-
 EXPORT_SYMBOL(poll_freewait);
 
-static struct poll_table_entry *poll_get_entry(poll_table *_p)
+static struct poll_table_entry *poll_get_entry(struct poll_wqueues *p)
 {
-	struct poll_wqueues *p = container_of(_p, struct poll_wqueues, pt);
 	struct poll_table_page *table = p->table;
 
 	if (p->inline_index < N_INLINE_POLL_ENTRIES)
@@ -159,7 +157,6 @@ static struct poll_table_entry *poll_get
 		new_table = (struct poll_table_page *) __get_free_page(GFP_KERNEL);
 		if (!new_table) {
 			p->error = -ENOMEM;
-			__set_current_state(TASK_RUNNING);
 			return NULL;
 		}
 		new_table->entry = new_table->entries;
@@ -171,20 +168,57 @@ static struct poll_table_entry *poll_get
 	return table->entry++;
 }
 
+static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
+{
+	struct poll_wqueues *pwq = wait->private;
+	DECLARE_WAITQUEUE(dummy_wait, pwq->polling_task);
+
+	set_mb(pwq->triggered, 1);
+
+	/*
+	 * Perform the default wake up operation using a dummy
+	 * waitqueue.
+	 *
+	 * TODO: This is hacky but there currently is no interface to
+	 * pass in @sync.  @sync is scheduled to be removed and once
+	 * that happens, wake_up_process() can be used directly.
+	 */
+	return default_wake_function(&dummy_wait, mode, sync, key);
+}
+
 /* Add a new entry */
 static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
 				poll_table *p)
 {
-	struct poll_table_entry *entry = poll_get_entry(p);
+	struct poll_wqueues *pwq = container_of(p, struct poll_wqueues, pt);
+	struct poll_table_entry *entry = poll_get_entry(pwq);
 	if (!entry)
 		return;
 	get_file(filp);
 	entry->filp = filp;
 	entry->wait_address = wait_address;
-	init_waitqueue_entry(&entry->wait, current);
+	init_waitqueue_func_entry(&entry->wait, pollwake);
+	entry->wait.private = pwq;
 	add_wait_queue(wait_address, &entry->wait);
 }
 
+int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
+			  ktime_t *expires, unsigned long slack)
+{
+	int rc = -EINTR;
+
+	set_current_state(state);
+	if (!pwq->triggered)
+		rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS);
+	__set_current_state(TASK_RUNNING);
+
+	/* clear triggered for the next iteration */
+	pwq->triggered = 0;
+
+	return rc;
+}
+EXPORT_SYMBOL(poll_schedule_timeout);
+
 /**
  * poll_select_set_timeout - helper function to setup the timeout value
  * @to:		pointer to timespec variable for the final timeout
@@ -340,8 +374,6 @@ int do_select(int n, fd_set_bits *fds, s
 	for (;;) {
 		unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp;
 
-		set_current_state(TASK_INTERRUPTIBLE);
-
 		inp = fds->in; outp = fds->out; exp = fds->ex;
 		rinp = fds->res_in; routp = fds->res_out; rexp = fds->res_ex;
 
@@ -411,10 +443,10 @@ int do_select(int n, fd_set_bits *fds, s
 			to = &expire;
 		}
 
-		if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
+		if (!poll_schedule_timeout(&table, TASK_INTERRUPTIBLE,
+					   to, slack))
 			timed_out = 1;
 	}
-	__set_current_state(TASK_RUNNING);
 
 	poll_freewait(&table);
 
@@ -666,7 +698,6 @@ static int do_poll(unsigned int nfds,  s
 	for (;;) {
 		struct poll_list *walk;
 
-		set_current_state(TASK_INTERRUPTIBLE);
 		for (walk = list; walk != NULL; walk = walk->next) {
 			struct pollfd * pfd, * pfd_end;
 
@@ -709,10 +740,9 @@ static int do_poll(unsigned int nfds,  s
 			to = &expire;
 		}
 
-		if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
+		if (!poll_schedule_timeout(wait, TASK_INTERRUPTIBLE, to, slack))
 			timed_out = 1;
 	}
-	__set_current_state(TASK_RUNNING);
 	return count;
 }
 
diff -puN include/linux/poll.h~poll-allow-f_op-poll-to-sleep-take-4 include/linux/poll.h
--- a/include/linux/poll.h~poll-allow-f_op-poll-to-sleep-take-4
+++ a/include/linux/poll.h
@@ -46,9 +46,9 @@ static inline void init_poll_funcptr(pol
 }
 
 struct poll_table_entry {
-	struct file * filp;
+	struct file *filp;
 	wait_queue_t wait;
-	wait_queue_head_t * wait_address;
+	wait_queue_head_t *wait_address;
 };
 
 /*
@@ -56,7 +56,9 @@ struct poll_table_entry {
  */
 struct poll_wqueues {
 	poll_table pt;
-	struct poll_table_page * table;
+	struct poll_table_page *table;
+	struct task_struct *polling_task;
+	int triggered;
 	int error;
 	int inline_index;
 	struct poll_table_entry inline_entries[N_INLINE_POLL_ENTRIES];
@@ -64,6 +66,13 @@ struct poll_wqueues {
 
 extern void poll_initwait(struct poll_wqueues *pwq);
 extern void poll_freewait(struct poll_wqueues *pwq);
+extern int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
+				 ktime_t *expires, unsigned long slack);
+
+static inline int poll_schedule(struct poll_wqueues *pwq, int state)
+{
+	return poll_schedule_timeout(pwq, state, NULL, 0);
+}
 
 /*
  * Scaleable version of the fd_set.
_

Patches currently in -mm which might be from htejun@gmail.com are

piix3-warn-softer-about-enabling-passive-release.patch
dma-apitxt-fix-description-of-pci_map_sg-dma_map_sg-scatterlists-handling.patch
poll-allow-f_op-poll-to-sleep-take-4.patch


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2008-11-28 16:09 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-25 17:30 + poll-allow-f_op-poll-to-sleep-take-4.patch added to -mm tree Oleg Nesterov
2008-11-25 21:08 ` Davide Libenzi
2008-11-26  4:33 ` Tejun Heo
2008-11-26  4:40   ` [PATCH] poll: allow f_op->poll to sleep, take#5 Tejun Heo
2008-11-26  6:27     ` Davide Libenzi
2008-11-26  6:39       ` Tejun Heo
2008-11-26 19:36         ` Davide Libenzi
2008-11-27  9:18           ` Tejun Heo
2008-11-27  9:37             ` [PATCH] poll: allow f_op->poll to sleep, take#6 Tejun Heo
2008-11-28  4:35               ` Davide Libenzi
2008-11-28  4:44                 ` Tejun Heo
2008-11-28 16:08                 ` Oleg Nesterov
2008-11-27 20:13           ` [PATCH] poll: allow f_op->poll to sleep, take#5 Oleg Nesterov
2008-11-26  4:49   ` + poll-allow-f_op-poll-to-sleep-take-4.patch added to -mm tree Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2008-11-24 22:37 akpm

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.