* poll() in 2.6 and beyond
@ 2004-03-02 18:21 Richard B. Johnson
2004-03-02 20:04 ` Roland Dreier
0 siblings, 1 reply; 29+ messages in thread
From: Richard B. Johnson @ 2004-03-02 18:21 UTC (permalink / raw)
To: Linux kernel
Poll in 2.6.0; when a driver routine calls poll_wait()
it returns <<immediately>> to somewhere in the
kernel, then waits for my wake_up_interuptible(), before
returning control to a user sleeping in poll(). This means
that the user gets the wrong poll return value! It
doesn't get the value it was given as a result of the
interrupt, but the value that existed (0) before the
interrupt occurred.
Poll should not return from poll_wait() until it gets
a wake_up_interruptible() call. The wait variable,
info->pwait, below, has been initialized by executing
init_waitqueue_head(&info->pwait) in the initialization
code. This code works in 2.4.24.
What do I do to make it work in 2.6.0 and beyond? There
are no hints in the 2.6 drivers as they all seem to be
written like this and they presumably work.
/*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/*
* The interrupt service routine.
*/
static void pci_isr(int irq, void *p, struct pt_regs *regs)
{
spin_lock(&info->lock);
DEB(printk(KERN_INFO"%s : Interrupt!\n", devname));
info->poll_flag |= POLLIN|DEF_POLL;
wake_up_interruptible(&info->pwait);
spin_unlock(&info->lock);
}
/*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/*
* Device poll routine.
*/
static size_t poll(struct file *fp, struct poll_table_struct *wait)
{
size_t poll_flag;
size_t flags;
DEB(printk(KERN_INFO"%s : poll called\n", devname));
poll_wait(fp, &info->pwait, wait);
lockit(TRUE, &flags);
poll_flag = info->poll_flag;
info->poll_flag = 0;
lockit(FALSE, &flags);
DEB(printk(KERN_INFO"%s : poll returns\n", devname));
return poll_flag;
}
Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.
^ permalink raw reply [flat|nested] 29+ messages in thread* Re: poll() in 2.6 and beyond 2004-03-02 18:21 poll() in 2.6 and beyond Richard B. Johnson @ 2004-03-02 20:04 ` Roland Dreier 2004-03-02 20:24 ` Richard B. Johnson 0 siblings, 1 reply; 29+ messages in thread From: Roland Dreier @ 2004-03-02 20:04 UTC (permalink / raw) To: root; +Cc: Linux kernel Richard> Poll in 2.6.0; when a driver routine calls poll_wait() it Richard> returns <<immediately>> to somewhere in the kernel, then Richard> waits for my wake_up_interuptible(), before returning Richard> control to a user sleeping in poll(). This means that the Richard> user gets the wrong poll return value! It doesn't get the Richard> value it was given as a result of the interrupt, but the Richard> value that existed (0) before the interrupt occurred. Nothing has changed in 2.6 that I know of. poll_wait() always returned immediately to the driver. The driver's poll method is supposed to call poll_wait() on the wait queues that indicate a change in poll status, and then return with the current status. Read the description of "poll and select" in LDD: <http://www.xml.com/ldd/chapter/book/ch05.html#t3> - Roland ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 20:04 ` Roland Dreier @ 2004-03-02 20:24 ` Richard B. Johnson 2004-03-02 21:00 ` Roland Dreier 0 siblings, 1 reply; 29+ messages in thread From: Richard B. Johnson @ 2004-03-02 20:24 UTC (permalink / raw) To: Roland Dreier; +Cc: Linux kernel On Tue, 2 Mar 2004, Roland Dreier wrote: > Richard> Poll in 2.6.0; when a driver routine calls poll_wait() it > Richard> returns <<immediately>> to somewhere in the kernel, then > Richard> waits for my wake_up_interuptible(), before returning > Richard> control to a user sleeping in poll(). This means that the > Richard> user gets the wrong poll return value! It doesn't get the > Richard> value it was given as a result of the interrupt, but the > Richard> value that existed (0) before the interrupt occurred. > > Nothing has changed in 2.6 that I know of. poll_wait() always > returned immediately to the driver. The driver's poll method is > supposed to call poll_wait() on the wait queues that indicate a change > in poll status, and then return with the current status. > > Read the description of "poll and select" in LDD: > <http://www.xml.com/ldd/chapter/book/ch05.html#t3> > > - Roland I'm talking about the driver! When a open fd called poll() or select(), in user-mode code, the driver's poll() was called, and the driver's poll() would call poll_wait(). Poll_wait() used to NOT return until the driver executed wake_up_interruptible() on that wait-queue. When poll_wait() returned, the driver would return to the caller with the new poll- status. Now, when the driver calls poll_wait(), it returns immediately to the driver. The driver then returns with the wrong poll status because nothing changed yet. This return, doesn't go back to the user-mode caller, but remains within the kernel until a wake_up_interruptible() has been received. Then it returns to the original caller, with the (wrong) status that existed before the wake_up_interruptible(). When the cycle repeats, the status from the previous event gets returned, so if the events are all the same (POLLIN), only the first one is wrong and nobody is the wiser. However, when there are multiple events, they appear to the user out-of-sequence and muck up user-mode code. Cheers, Dick Johnson Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips). Note 96.31% of all statistics are fiction. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 20:24 ` Richard B. Johnson @ 2004-03-02 21:00 ` Roland Dreier 2004-03-02 21:26 ` Richard B. Johnson 0 siblings, 1 reply; 29+ messages in thread From: Roland Dreier @ 2004-03-02 21:00 UTC (permalink / raw) To: root; +Cc: Linux kernel Richard> I'm talking about the driver! When a open fd called Richard> poll() or select(), in user-mode code, the driver's Richard> poll() was called, and the driver's poll() would call Richard> poll_wait(). Poll_wait() used to NOT return until the Richard> driver executed wake_up_interruptible() on that Richard> wait-queue. When poll_wait() returned, the driver would Richard> return to the caller with the new poll- status. I don't think so. Even in kernel 2.4, poll_wait() just calls __pollwait(). I don't see anything in __pollwait() that sleeps. Think about it. How would the kernel handle userspace calling poll() with more than one file descriptor if each individual driver slept? I'll repeat my earlier suggestion. Read the description of "poll and select" in LDD: <http://www.xml.com/ldd/chapter/book/ch05.html#t3> If you refuse to understand the documented interface, I don't think anyone can help you. - Roland ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 21:00 ` Roland Dreier @ 2004-03-02 21:26 ` Richard B. Johnson 2004-03-02 21:39 ` Roland Dreier 0 siblings, 1 reply; 29+ messages in thread From: Richard B. Johnson @ 2004-03-02 21:26 UTC (permalink / raw) To: Roland Dreier; +Cc: Linux kernel On Tue, 2 Mar 2004, Roland Dreier wrote: > Richard> I'm talking about the driver! When a open fd called > Richard> poll() or select(), in user-mode code, the driver's > Richard> poll() was called, and the driver's poll() would call > Richard> poll_wait(). Poll_wait() used to NOT return until the > Richard> driver executed wake_up_interruptible() on that > Richard> wait-queue. When poll_wait() returned, the driver would > Richard> return to the caller with the new poll- status. > > I don't think so. Even in kernel 2.4, poll_wait() just calls > __pollwait(). I don't see anything in __pollwait() that sleeps. > Think about it. How would the kernel handle userspace calling poll() > with more than one file descriptor if each individual driver slept? > Well what to you think they do? Spin? > I'll repeat my earlier suggestion. Read the description of "poll and > select" in LDD: > <http://www.xml.com/ldd/chapter/book/ch05.html#t3> > > If you refuse to understand the documented interface, I don't think > anyone can help you. I am quite familiar with the operation of poll(), thank you. Cheers, Dick Johnson Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips). Note 96.31% of all statistics are fiction. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 21:26 ` Richard B. Johnson @ 2004-03-02 21:39 ` Roland Dreier 2004-03-02 21:59 ` Richard B. Johnson 0 siblings, 1 reply; 29+ messages in thread From: Roland Dreier @ 2004-03-02 21:39 UTC (permalink / raw) To: root; +Cc: Linux kernel Roland> I don't think so. Even in kernel 2.4, poll_wait() just Roland> calls __pollwait(). I don't see anything in __pollwait() Roland> that sleeps. Think about it. How would the kernel handle Roland> userspace calling poll() with more than one file Roland> descriptor if each individual driver slept? Richard> Well what to you think they do? Spin? I don't know why I continue this, but.... can you point out the line in the kernel 2.4 source for __pollwait() where it sleeps? Or think about it. Suppose a user called poll() with two fds, each of which belonged to a different driver. Suppose each driver slept in its poll method. If the first driver never became ready (and stayed asleep), how would poll() return to user space if the second driver became ready? What actually happens is that each driver registers with the kernel the wait queues that it will wake up when it becomes ready. But the core kernel is responsible for sleeping, outside of the driver code. Really, read: <http://www.xml.com/ldd/chapter/book/ch05.html#t3> You might believe you're familiar with poll() but I think it would help to read what the Linux Device Drivers book has to say. It answers the exact question you're asking, and if you don't believe me you might believe the definitive published book. - Roland ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 21:39 ` Roland Dreier @ 2004-03-02 21:59 ` Richard B. Johnson 2004-03-02 22:41 ` Dave Dillow 2004-03-02 22:56 ` Roland Dreier 0 siblings, 2 replies; 29+ messages in thread From: Richard B. Johnson @ 2004-03-02 21:59 UTC (permalink / raw) To: Roland Dreier; +Cc: Linux kernel On Tue, 2 Mar 2004, Roland Dreier wrote: > Roland> I don't think so. Even in kernel 2.4, poll_wait() just > Roland> calls __pollwait(). I don't see anything in __pollwait() > Roland> that sleeps. Think about it. How would the kernel handle > Roland> userspace calling poll() with more than one file > Roland> descriptor if each individual driver slept? > > Richard> Well what to you think they do? Spin? > > I don't know why I continue this, but.... can you point out the line > in the kernel 2.4 source for __pollwait() where it sleeps? > You are playing games with semantics because you are wrong. The code in fs/select.c about line 101, adds the current caller to the wait-queue. This wait-queue is the mechanism by which the current caller sleeps, i.e., gives the CPU up to somebody else. That caller's thread will not return past that line until a wake_up_interruptible() call has been made for/from the driver or interface handling that file descriptor. In this manner any number of file discriptors may be handled because the poll() routine for each of then makes its own entry into the wait-queue using the described mechanism. > Or think about it. Suppose a user called poll() with two fds, each of > which belonged to a different driver. Suppose each driver slept in > its poll method. If the first driver never became ready (and stayed > asleep), how would poll() return to user space if the second driver > became ready? > Explained above. [SNIPPED bullshit] Cheers, Dick Johnson Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips). Note 96.31% of all statistics are fiction. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 21:59 ` Richard B. Johnson @ 2004-03-02 22:41 ` Dave Dillow 2004-03-02 22:56 ` Roland Dreier 1 sibling, 0 replies; 29+ messages in thread From: Dave Dillow @ 2004-03-02 22:41 UTC (permalink / raw) To: root; +Cc: Roland Dreier, Linux kernel On Tue, 2004-03-02 at 16:59, Richard B. Johnson wrote: > On Tue, 2 Mar 2004, Roland Dreier wrote: > > I don't know why I continue this, but.... can you point out the line > > in the kernel 2.4 source for __pollwait() where it sleeps? > The code in fs/select.c about line 101, adds the current caller > to the wait-queue. This wait-queue is the mechanism by which the > current caller sleeps, i.e., gives the CPU up to somebody else. Actually, no, it does not work that way. The wait queue is the mechanism by which a process can be *woken up* . The call to add_wait_queue() does not put the process to sleep. The process is put to sleep in do_poll(), using schedule_timeout() with the current state as TASK_INTERRUPTIBLE. This is on line 453. Your driver will eventually have to wake up sleepers on the queue. do_poll() will also wake up when the timeout expires, or a signal is sent to the poll()'ing process. These semantics have not changed since 2.4. The implementation has changed a bit, but the driver interface has not changed. Read the documentation the Roland pointed you to. -- Dave Dillow <dave@thedillows.org> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 21:59 ` Richard B. Johnson 2004-03-02 22:41 ` Dave Dillow @ 2004-03-02 22:56 ` Roland Dreier 2004-03-02 23:16 ` Richard B. Johnson 1 sibling, 1 reply; 29+ messages in thread From: Roland Dreier @ 2004-03-02 22:56 UTC (permalink / raw) To: root; +Cc: Linux kernel Richard> You are playing games with semantics because you are Richard> wrong. The code in fs/select.c about line 101, adds the Richard> current caller to the wait-queue. I assume you mean the call to add_wait_queue() there. That does not sleep. Look at the implementation. add_wait_queue() is defined in kernel/fork.c -- it just does some locking and calls __add_wait_queue(). __add_wait_queue() is really nothing more than a list_add(). There's nothing more to it and nothing that goes to sleep. Where do you think add_wait_queue() goes to sleep? Richard> This wait-queue is the mechanism by which the current Richard> caller sleeps, i.e., gives the CPU up to somebody else. Richard> That caller's thread will not return past that line until Richard> a wake_up_interruptible() call has been made for/from the Richard> driver or interface handling that file descriptor. In Richard> this manner any number of file discriptors may be handled Richard> because the poll() routine for each of then makes its own Richard> entry into the wait-queue using the described mechanism. But there's only one thread around: the user space process that called into the kernel via poll(). If the first driver goes to sleep, which thread do you think is going to wake up and call into the second driver? - Roland ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 22:56 ` Roland Dreier @ 2004-03-02 23:16 ` Richard B. Johnson 2004-03-02 23:21 ` Roland Dreier 0 siblings, 1 reply; 29+ messages in thread From: Richard B. Johnson @ 2004-03-02 23:16 UTC (permalink / raw) To: Roland Dreier; +Cc: Linux kernel On Tue, 2 Mar 2004, Roland Dreier wrote: > Richard> You are playing games with semantics because you are > Richard> wrong. The code in fs/select.c about line 101, adds the > Richard> current caller to the wait-queue. > > I assume you mean the call to add_wait_queue() there. That does not > sleep. Look at the implementation. add_wait_queue() is defined in > kernel/fork.c -- it just does some locking and calls > __add_wait_queue(). __add_wait_queue() is really nothing more than > a list_add(). There's nothing more to it and nothing that goes to > sleep. Where do you think add_wait_queue() goes to sleep? > > Richard> This wait-queue is the mechanism by which the current > Richard> caller sleeps, i.e., gives the CPU up to somebody else. > Richard> That caller's thread will not return past that line until > Richard> a wake_up_interruptible() call has been made for/from the > Richard> driver or interface handling that file descriptor. In > Richard> this manner any number of file discriptors may be handled > Richard> because the poll() routine for each of then makes its own > Richard> entry into the wait-queue using the described mechanism. > > But there's only one thread around: the user space process that called > into the kernel via poll(). If the first driver goes to sleep, which > thread do you think is going to wake up and call into the second > driver? > There are two routines where the CPU is actually given up, do_select() and do_poll(). Search for schedule_timeout(). Once the scheduler has the CPU, it's available for any of the other drivers. It's also available for the timer queues, other tasks, and the interrupts. Cheers, Dick Johnson Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips). Note 96.31% of all statistics are fiction. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 23:16 ` Richard B. Johnson @ 2004-03-02 23:21 ` Roland Dreier 0 siblings, 0 replies; 29+ messages in thread From: Roland Dreier @ 2004-03-02 23:21 UTC (permalink / raw) To: root; +Cc: Linux kernel Richard> There are two routines where the CPU is actually given Richard> up, do_select() and do_poll(). Search for Richard> schedule_timeout(). Once the scheduler has the CPU, it's Richard> available for any of the other drivers. It's also Richard> available for the timer queues, other tasks, and the Richard> interrupts. OK, fine, I can't argue with that. But it has nothing to do with the discussion at hand. You still haven't said where poll_wait() sleeps in kernel 2.4. You claimed it was in add_wait_queue(), but add_wait_queue() doesn't sleep (it's just a list_add() guarded by a lock). Also, why would another driver for the same poll() call run? There's only one thread around that cares about this call to poll() -- the userspace process that originally called poll(). If one driver sleeps, no other drivers will run until that driver wakes up. - Roland ^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <1vmPm-4lU-11@gated-at.bofh.it>]
[parent not found: <1vonq-6dr-37@gated-at.bofh.it>]
[parent not found: <1voGY-6vC-41@gated-at.bofh.it>]
[parent not found: <1vpjt-7dl-17@gated-at.bofh.it>]
[parent not found: <1vpCV-7wY-41@gated-at.bofh.it>]
[parent not found: <1vpWa-7Py-19@gated-at.bofh.it>]
* Re: poll() in 2.6 and beyond [not found] ` <1vpWa-7Py-19@gated-at.bofh.it> @ 2004-03-02 22:53 ` Bill Davidsen 2004-03-02 22:57 ` Roland Dreier 2004-03-02 23:32 ` Richard B. Johnson 0 siblings, 2 replies; 29+ messages in thread From: Bill Davidsen @ 2004-03-02 22:53 UTC (permalink / raw) To: Roland Dreier; +Cc: Linux Kernel Mailing List Roland Dreier wrote: > I don't know why I continue this, but.... can you point out the line > in the kernel 2.4 source for __pollwait() where it sleeps? > > Or think about it. Suppose a user called poll() with two fds, each of > which belonged to a different driver. Suppose each driver slept in > its poll method. If the first driver never became ready (and stayed > asleep), how would poll() return to user space if the second driver > became ready? > > What actually happens is that each driver registers with the kernel > the wait queues that it will wake up when it becomes ready. But the > core kernel is responsible for sleeping, outside of the driver code. Could you maybe go back to the initial report, which is that after poll() gets wrong status? It's nice to argue about where the process waits, but the issue is if it gets the same status with 2.4 and 2.6, and if not which one should be fixed. Richard: can you show this with a small demo program? I assume you didn't find this just by reading code ;-) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 22:53 ` Bill Davidsen @ 2004-03-02 22:57 ` Roland Dreier 2004-03-02 23:32 ` Richard B. Johnson 1 sibling, 0 replies; 29+ messages in thread From: Roland Dreier @ 2004-03-02 22:57 UTC (permalink / raw) To: Bill Davidsen; +Cc: Linux Kernel Mailing List Bill> Could you maybe go back to the initial report, which is that Bill> after poll() gets wrong status? It's nice to argue about Bill> where the process waits, but the issue is if it gets the Bill> same status with 2.4 and 2.6, and if not which one should be Bill> fixed. I'm sure the problem is a buggy driver. The original question represents a complete misunderstanding of how poll() is implemented, so it's no surprise that the driver breaks. - Roland ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 22:53 ` Bill Davidsen 2004-03-02 22:57 ` Roland Dreier @ 2004-03-02 23:32 ` Richard B. Johnson 2004-03-03 0:07 ` John Muir 2004-03-03 3:57 ` David Dillow 1 sibling, 2 replies; 29+ messages in thread From: Richard B. Johnson @ 2004-03-02 23:32 UTC (permalink / raw) To: Bill Davidsen; +Cc: Roland Dreier, Linux Kernel Mailing List [-- Attachment #1: Type: TEXT/PLAIN, Size: 2366 bytes --] On Tue, 2 Mar 2004, Bill Davidsen wrote: > Roland Dreier wrote: > > > I don't know why I continue this, but.... can you point out the line > > in the kernel 2.4 source for __pollwait() where it sleeps? > > > > Or think about it. Suppose a user called poll() with two fds, each of > > which belonged to a different driver. Suppose each driver slept in > > its poll method. If the first driver never became ready (and stayed > > asleep), how would poll() return to user space if the second driver > > became ready? > > > > What actually happens is that each driver registers with the kernel > > the wait queues that it will wake up when it becomes ready. But the > > core kernel is responsible for sleeping, outside of the driver code. > > Could you maybe go back to the initial report, which is that after > poll() gets wrong status? It's nice to argue about where the process > waits, but the issue is if it gets the same status with 2.4 and 2.6, and > if not which one should be fixed. > > Richard: can you show this with a small demo program? I assume you > didn't find this just by reading code ;-) Yes. The code I attached earlier shows that the poll() in a driver gets called (correctly), then it calls poll_wait(). Unfortunately the call to poll_wait() returns immediately so that the return value from the driver's poll() is whatever it was before some event occurred that the driver was going to signal with wake_up_interruptible(). The attached code clearly demonstrates this. It doesn't even contain any code to execute wake_up_interruptible(). When an event occurs in the driver that would have set the poll_flag to POLLIN, and executed wake_up_interruptible, the old status, the stuff that was returned when poll_wait() returned immediately instead of waiting for the wake up, gets returned to the user-mode program. Now, if the user-mode program calls poll() again, which is likely, it gets the status that was returned from the previous event so it "seems" to work. However, it is always one event behind so you need two events to recognize the first one. I attached the module and demo program again. It clearly shows that poll_wait() gets called and then immediately returns without waiting... Cheers, Dick Johnson Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips). Note 96.31% of all statistics are fiction. [-- Attachment #2: Type: APPLICATION/octet-stream, Size: 1453 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 23:32 ` Richard B. Johnson @ 2004-03-03 0:07 ` John Muir 2004-03-03 1:18 ` Richard B. Johnson 2004-03-03 3:57 ` David Dillow 1 sibling, 1 reply; 29+ messages in thread From: John Muir @ 2004-03-03 0:07 UTC (permalink / raw) To: root; +Cc: linux-kernel Richard B. Johnson wrote: > > Could you maybe go back to the initial report, which is that after > > poll() gets wrong status? It's nice to argue about where the process > > waits, but the issue is if it gets the same status with 2.4 and 2.6, > and > > if not which one should be fixed. > > > > Richard: can you show this with a small demo program? I assume you > > didn't find this just by reading code ;-) > > Yes. The code I attached earlier shows that the poll() in a driver > gets called (correctly), then it calls poll_wait(). Unfortunately > the call to poll_wait() returns immediately so that the return > value from the driver's poll() is whatever it was before some > event occurred that the driver was going to signal with > wake_up_interruptible(). > The documentation referred to earlier (http://www.xml.com/ldd/chapter/book/ch05.html#t3) clearly states that the poll function implementation for a device should: "1. Call /poll_wait/ on one or more wait queues that could indicate a change in the poll status. 2. Return a bit mask describing operations that could be immediately performed without blocking." What happens is if your device has data ready RIGHT NOW (without any waiting), the information regarding that is returned. Now, if you look closely at the implementation of do_poll(), it will loop forever until any device returns that data is available (or the timeout occurs). After data is available, do_pollfd() function no longer adds the devices to the wait queue. What this means is that the first time through each device is added to the wait queue. After the process is woken up from schedule_timeout (or the timeout occurs), then it will loop through each device again to either add that device to the wait queue again, or determine that there is data ready to be read/written, or an error or whatever. do_pollfd() then increases count, and do_poll() exits from it's loop (and wait queues are cleaned-up in sys_poll()). So, yes, the poll_wait() returns immediately, it should not wait, and your poll method should return the device's CURRENT status in the poll mask. Don't worry, because when your device wakes the waiting process when data is ready, the device's poll function is called again and you again can return the CURRENT status. Please let me know if I'm misunderstanding this, but quite frankly, poll_wait CANNOT wait, for the very reasons described in previous posts: when multiple devices are polled, then poll_wait is called FOR EACH DEVICE. Cheers, ..John ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-03 0:07 ` John Muir @ 2004-03-03 1:18 ` Richard B. Johnson 2004-03-03 4:04 ` Roland Dreier 0 siblings, 1 reply; 29+ messages in thread From: Richard B. Johnson @ 2004-03-03 1:18 UTC (permalink / raw) To: John Muir; +Cc: linux-kernel On Tue, 2 Mar 2004, John Muir wrote: > Richard B. Johnson wrote: > > > > Could you maybe go back to the initial report, which is that after > > > poll() gets wrong status? It's nice to argue about where the process > > > waits, but the issue is if it gets the same status with 2.4 and 2.6, > > and > > > if not which one should be fixed. > > > > > > Richard: can you show this with a small demo program? I assume you > > > didn't find this just by reading code ;-) > > > > Yes. The code I attached earlier shows that the poll() in a driver > > gets called (correctly), then it calls poll_wait(). Unfortunately > > the call to poll_wait() returns immediately so that the return > > value from the driver's poll() is whatever it was before some > > event occurred that the driver was going to signal with > > wake_up_interruptible(). > > > The documentation referred to earlier > (http://www.xml.com/ldd/chapter/book/ch05.html#t3) clearly states that > the poll function implementation for a device should: > "1. Call /poll_wait/ on one or more wait queues that could indicate a > change in the poll status. > 2. Return a bit mask describing operations that could be immediately > performed without blocking." > What happens is if your device has data ready RIGHT NOW (without any > waiting), the information regarding that is returned. > > Now, if you look closely at the implementation of do_poll(), it will > loop forever until any device returns that data is available (or the > timeout occurs). After data is available, do_pollfd() function no longer > adds the devices to the wait queue. > > What this means is that the first time through each device is added to > the wait queue. After the process is woken up from schedule_timeout (or > the timeout occurs), then it will loop through each device again to > either add that device to the wait queue again, or determine that there > is data ready to be read/written, or an error or whatever. do_pollfd() > then increases count, and do_poll() exits from it's loop (and wait > queues are cleaned-up in sys_poll()). > > So, yes, the poll_wait() returns immediately, it should not wait, and > your poll method should return the device's CURRENT status in the poll > mask. Don't worry, because when your device wakes the waiting process > when data is ready, the device's poll function is called again and you > again can return the CURRENT status. > Well the device's poll function isn't getting called the second time with 2.6.0. I never checked it in 2.4.x because it always worked. This problem occurs in a driver that only returns the fact that one event occurred. When it failed to report the event when built with a newer kernel, I added diagnostics which showed that the poll in the driver was only called once --and that the return from poll_wait happened immediately. So, if the poll_wait isn't a wait-function, but just some add-wakeup to the queue function, then its name probably should have been changed when it changed. At one time it did, truly, wait until it was awakened with wake_up_interruptible. > Please let me know if I'm misunderstanding this, but quite frankly, > poll_wait CANNOT wait, for the very reasons described in previous posts: > when multiple devices are polled, then poll_wait is called FOR EACH DEVICE. > > Cheers, > > ..John > > Cheers, Dick Johnson Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips). Note 96.31% of all statistics are fiction. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-03 1:18 ` Richard B. Johnson @ 2004-03-03 4:04 ` Roland Dreier 2004-03-03 12:38 ` Richard B. Johnson 0 siblings, 1 reply; 29+ messages in thread From: Roland Dreier @ 2004-03-03 4:04 UTC (permalink / raw) To: root; +Cc: linux-kernel Richard> Well the device's poll function isn't getting called the Richard> second time with 2.6.0. I never checked it in 2.4.x Richard> because it always worked. This problem occurs in a Richard> driver that only returns the fact that one event Richard> occurred. When it failed to report the event when built Richard> with a newer kernel, I added diagnostics which showed Richard> that the poll in the driver was only called once --and Richard> that the return from poll_wait happened immediately. Your driver is buggy. It's not surprising since you fundamentally don't understand the kernel interface you're trying to use. Richard> So, if the poll_wait isn't a wait-function, but just some Richard> add-wakeup to the queue function, then its name probably Richard> should have been changed when it changed. At one time it Richard> did, truly, wait until it was awakened with Richard> wake_up_interruptible. When did it change? Show me a kernel version where poll_wait() waited until the driver woke it up. (Kernel versions at least as far back as 1.0 are readily available from kernel.org, so it should be easy for you) - Roland ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-03 4:04 ` Roland Dreier @ 2004-03-03 12:38 ` Richard B. Johnson 2004-03-03 14:29 ` Davide Libenzi 0 siblings, 1 reply; 29+ messages in thread From: Richard B. Johnson @ 2004-03-03 12:38 UTC (permalink / raw) To: Roland Dreier; +Cc: linux-kernel On Tue, 2 Mar 2004, Roland Dreier wrote: > Richard> Well the device's poll function isn't getting called the > Richard> second time with 2.6.0. I never checked it in 2.4.x > Richard> because it always worked. This problem occurs in a > Richard> driver that only returns the fact that one event > Richard> occurred. When it failed to report the event when built > Richard> with a newer kernel, I added diagnostics which showed > Richard> that the poll in the driver was only called once --and > Richard> that the return from poll_wait happened immediately. > > Your driver is buggy. It's not surprising since you fundamentally > don't understand the kernel interface you're trying to use. > > Richard> So, if the poll_wait isn't a wait-function, but just some > Richard> add-wakeup to the queue function, then its name probably > Richard> should have been changed when it changed. At one time it > Richard> did, truly, wait until it was awakened with > Richard> wake_up_interruptible. > > When did it change? Show me a kernel version where poll_wait() waited > until the driver woke it up. (Kernel versions at least as far back as > 1.0 are readily available from kernel.org, so it should be easy for > you) > > - Roland > I never said the DRIVER woke up, but that poll sleeps. FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND 100 0 1 0 6 0 224 148 do_select S ? 0:01 /sbin/init auto [SNIPPED....] 40 0 115 114 9 0 1452 728 pipe_wait S ? 0:00 /usr/sbin/nmbd -D 100000 0 11109 9459 16 0 1044 476 R 1 0:00 ps -laxw 0 0 11107 9504 12 0 692 216 do_poll S 2 0:00 ./tester .... and if it DOESN'T then ps is buggy and/or the entry in /proc in buggy. Clearly this task is sleeping in do_poll. Cheers, Dick Johnson Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips). Note 96.31% of all statistics are fiction. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-03 12:38 ` Richard B. Johnson @ 2004-03-03 14:29 ` Davide Libenzi 0 siblings, 0 replies; 29+ messages in thread From: Davide Libenzi @ 2004-03-03 14:29 UTC (permalink / raw) To: Richard B. Johnson; +Cc: Roland Dreier, linux-kernel On Wed, 3 Mar 2004, Richard B. Johnson wrote: > Clearly this task is sleeping in do_poll. I don't believe anyone ever argued that do_poll() sleeps. You were saying that poll_wait() was sleeping, that is different. - Davide ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-02 23:32 ` Richard B. Johnson 2004-03-03 0:07 ` John Muir @ 2004-03-03 3:57 ` David Dillow 2004-03-03 18:23 ` Richard B. Johnson 1 sibling, 1 reply; 29+ messages in thread From: David Dillow @ 2004-03-03 3:57 UTC (permalink / raw) To: root; +Cc: Bill Davidsen, Roland Dreier, Linux Kernel Mailing List On Tue, 2004-03-02 at 18:32, Richard B. Johnson wrote: > Yes. The code I attached earlier shows that the poll() in a driver > gets called (correctly), then it calls poll_wait(). Unfortunately > the call to poll_wait() returns immediately so that the return > value from the driver's poll() is whatever it was before some > event occurred that the driver was going to signal with > wake_up_interruptible(). You've been handed a clue enough times now that you should understand that poll_wait() does not, and has never, put the process to sleep. If you can show a case where do_poll() returns stale data, then by all means do so. We will be happy to fix any such error in the kernel. You say do_poll() looses the status returned from your driver's poll method. If your driver is truly returning a nonzero status from the poll() method call, then a simple read of the code in do_pollfd() will show that the only way it looses information from that event mask is if your user space is not setting that event type in pollfd.events. If I were you, I'd check two things: 1) that your poll method is really returning a non-zero status when you think it is 2) that your user space program is really asking for all events you think it is I think you'll find your problem is not this well-used mechanism in the kernel. Dave ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-03 3:57 ` David Dillow @ 2004-03-03 18:23 ` Richard B. Johnson 2004-03-03 19:29 ` Dave Dillow 2004-03-03 22:25 ` Linus Torvalds 0 siblings, 2 replies; 29+ messages in thread From: Richard B. Johnson @ 2004-03-03 18:23 UTC (permalink / raw) To: David Dillow; +Cc: Bill Davidsen, Roland Dreier, Linux Kernel Mailing List [-- Attachment #1: Type: TEXT/PLAIN, Size: 5554 bytes --] On Tue, 2 Mar 2004, David Dillow wrote: > On Tue, 2004-03-02 at 18:32, Richard B. Johnson wrote: > > Yes. The code I attached earlier shows that the poll() in a driver > > gets called (correctly), then it calls poll_wait(). Unfortunately > > the call to poll_wait() returns immediately so that the return > > value from the driver's poll() is whatever it was before some > > event occurred that the driver was going to signal with > > wake_up_interruptible(). > > You've been handed a clue enough times now that you should understand > that poll_wait() does not, and has never, put the process to sleep. > > If you can show a case where do_poll() returns stale data, then by all > means do so. We will be happy to fix any such error in the kernel. > > You say do_poll() looses the status returned from your driver's poll > method. If your driver is truly returning a nonzero status from the > poll() method call, then a simple read of the code in do_pollfd() will > show that the only way it looses information from that event mask is if > your user space is not setting that event type in pollfd.events. > > If I were you, I'd check two things: > 1) that your poll method is really returning a non-zero status when you > think it is > 2) that your user space program is really asking for all events you > think it is > > I think you'll find your problem is not this well-used mechanism in the > kernel. > > Dave The very great problems that exist with poll on linux-2.6.0 are being quashed by those who just like to argue. Therefore, I wrote some code that emulates the environment in which I discovered the poll failure. Experts can decide whatever they want about the inner workings of poll(). I supposed that if `ps` showed that a task was sleeping in poll() then it must be sleeping in poll(). So, even it that's wrong, here is irrefutable proof that there is a problem with polling events getting lost on 2.6.0. The attached code will execute without errors in 2.4.24 and below. Here is the test code running on Linux-2.4.24 Script started on Wed Mar 3 10:24:05 2004 # make gcc -Wall -O2 -D__KERNEL__ -DMODULE -DMAJOR_NR=199 -I/usr/src/linux-2.4.24/include -c -o thingy.o thingy.c as -o rtc_hwr.o rtc_hwr.S ld -i -o driver.o thingy.o rtc_hwr.o gcc -Wall -o tester -O2 tester.c rm -f THING mknod THING c 199 0 # insmod driver.o # lsmod Module Size Used by driver 1928 0 (unused) nfs 48008 0 (autoclean) lockd 37308 0 (autoclean) [nfs] sunrpc 66236 0 (autoclean) [nfs lockd] vxidrvr 2748 0 (unused) vximsg 5328 0 [vxidrvr] ramdisk 5596 0 (unused) ipchains 42204 0 nls_cp437 4376 4 (autoclean) 3c59x 28252 1 isofs 17368 0 (unused) loop 8856 0 sr_mod 12188 0 (unused) cdrom 27936 0 [sr_mod] BusLogic 35832 7 sd_mod 10472 14 scsi_mod 56236 3 [sr_mod BusLogic sd_mod] # ./tester You have mail in /var/spool/mail/root # exit exit Script done on Wed Mar 3 11:05:56 2004 This ran for about 1/2 hour ( 10:24 -> 11:05) with no errors. This is the same thing, running on this exact same machine, booted with Linux-2.6.0-test8 Script started on Wed Mar 3 13:00:18 2004 # make gcc -Wall -O2 -D__KERNEL__ -DMODULE -DMAJOR_NR=199 -I/usr/src/linux-2.6.0-test8/include -c -o thingy.o thingy.c as -o rtc_hwr.o rtc_hwr.S ld -i -o driver.o thingy.o rtc_hwr.o gcc -Wall -o tester -O2 tester.c rm -f THING mknod THING c 199 0 # insmod driver.o # ./tester ERROR: (51 != 1) Lost events : 50 ERROR: (59371 != 59369) Lost events : 52 ERROR: (75501 != 75498) Lost events : 55 ERROR: (94511 != 94509) Lost events : 57 ERROR: (164660 != 164658) Lost events : 59 ERROR: (194867 != 194864) Lost events : 62 ERROR: (206892 != 206890) Lost events : 64 ERROR: (214232 != 214230) Lost events : 66 ERROR: (308340 != 308337) Lost events : 69 ERROR: (353062 != 353059) Lost events : 72 ERROR: (444541 != 444539) Lost events : 74 ERROR: (446516 != 446514) Lost events : 76 ERROR: (447906 != 447904) Lost events : 78 # exit exit Script done on Wed Mar 3 13:04:39 2004 It ran from 13:00 to 13:04, accumulating 78 errors. A review of the test code shows that the driver uses IRQ8 from the RTC timer chip. The driver doesn't care if some interrupt ticks are lost because it just increments a memory variable every time that an interrupt occurs. After that variable is incremented, the interrupt service routine sets a global flag to POLLIN and sends a wake_up_interruptible() to whomever is sleeping in poll(). Spin-locks are put around critical sections to prevent undefined results. In the user-mode code, the task sleeping in poll() reads the variable using read(). The user-mode code doesn't care what the value is, only that it must be one-greater than the last one read. If it isn't, an error message is written to standard error. Observations: If the call to mlockall() is removed in the test-code, the behavior seems to be better, just the opposite of what one would suspect. The priority (nice value) doesn't seem to affect anything. The code runs without any errors on 2.4.24, 2.4.22, 2.4.21, and 2.4.18. The only source I have on this machine for 2.6.x is for 2.6.0-test8. Cheers, Dick Johnson Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips). Note 96.31% of all statistics are fiction. [-- Attachment #2: Type: APPLICATION/octet-stream, Size: 3572 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-03 18:23 ` Richard B. Johnson @ 2004-03-03 19:29 ` Dave Dillow 2004-03-03 20:10 ` Richard B. Johnson 2004-03-03 22:25 ` Linus Torvalds 1 sibling, 1 reply; 29+ messages in thread From: Dave Dillow @ 2004-03-03 19:29 UTC (permalink / raw) To: root; +Cc: Bill Davidsen, Roland Dreier, Linux Kernel Mailing List On Wed, 2004-03-03 at 13:23, Richard B. Johnson wrote: > The very great problems that exist with poll on linux-2.6.0 > are being quashed by those who just like to argue. No, the argument has always been that your understanding of poll()'s internals is not entirely correct. We have simply asked you to post code that shows poll()'s problems, which you have finally provided. Sort of. > Therefore, > I wrote some code that emulates the environment in which I > discovered the poll failure. Experts can decide whatever they > want about the inner workings of poll(). I supposed that if > `ps` showed that a task was sleeping in poll() then it must > be sleeping in poll(). This we all agree on -- poll() sleeps. Duh. No argument there. poll_wait() doesn't and never has, which was your original assertion. But on to the code! > So, even it that's wrong, here is > irrefutable proof that there is a problem with polling events > getting lost on 2.6.0. Ahem, no, not so much. What you have here is proof that your user program is not getting control again withing 0.488ms of the interrupt happening. That does not mean poll() is loosing events. You are definately seeing some significant latency -- 50 lost increments is ~25ms. What else is running when you perform this test? Can you repeat with a more recent kernel? Can you repeat in single user mode, with it being the only process present? With as few extra modules loaded as possible? I still think your problem is not poll() -- if there were problems there, bug reports would be coming out of the woodwork. -- Dave Dillow <dave@thedillows.org> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-03 19:29 ` Dave Dillow @ 2004-03-03 20:10 ` Richard B. Johnson 0 siblings, 0 replies; 29+ messages in thread From: Richard B. Johnson @ 2004-03-03 20:10 UTC (permalink / raw) To: Dave Dillow; +Cc: Bill Davidsen, Roland Dreier, Linux Kernel Mailing List On Wed, 3 Mar 2004, Dave Dillow wrote: > On Wed, 2004-03-03 at 13:23, Richard B. Johnson wrote: > > The very great problems that exist with poll on linux-2.6.0 > > are being quashed by those who just like to argue. > > No, the argument has always been that your understanding of poll()'s > internals is not entirely correct. We have simply asked you to post code > that shows poll()'s problems, which you have finally provided. Sort of. > > > Therefore, > > I wrote some code that emulates the environment in which I > > discovered the poll failure. Experts can decide whatever they > > want about the inner workings of poll(). I supposed that if > > `ps` showed that a task was sleeping in poll() then it must > > be sleeping in poll(). > > This we all agree on -- poll() sleeps. Duh. No argument there. > poll_wait() doesn't and never has, which was your original assertion. > > But on to the code! > > > So, even it that's wrong, here is > > irrefutable proof that there is a problem with polling events > > getting lost on 2.6.0. > > Ahem, no, not so much. What you have here is proof that your user > program is not getting control again withing 0.488ms of the interrupt > happening. That does not mean poll() is loosing events. > Well that should mean the same thing in the final wash. > You are definately seeing some significant latency -- 50 lost increments > is ~25ms. > > What else is running when you perform this test? Can you repeat with a > more recent kernel? Can you repeat in single user mode, with it being > the only process present? With as few extra modules loaded as possible? > I would need to install a more recent kernel and I think I should. That will mean some re-write of the module code, so I am told. I wrote the module and another Engineer, Terry Skillman, wrote the test-code and the Makefile. He also performed the tests. He says that it will not compile on a newer 2.6.x so I would have to do more work if true. I will have to put that off until Monday because I have to take a work-break. Nothing else is running. Although there is a network board, there is no network line from the hub to that machine when the tests are made. > I still think your problem is not poll() -- if there were problems > there, bug reports would be coming out of the woodwork. > -- > Dave Dillow <dave@thedillows.org> > Cheers, Dick Johnson Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips). Note 96.31% of all statistics are fiction. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-03 18:23 ` Richard B. Johnson 2004-03-03 19:29 ` Dave Dillow @ 2004-03-03 22:25 ` Linus Torvalds 2004-03-03 22:42 ` Richard B. Johnson 2004-03-03 22:52 ` Linus Torvalds 1 sibling, 2 replies; 29+ messages in thread From: Linus Torvalds @ 2004-03-03 22:25 UTC (permalink / raw) To: Richard B. Johnson Cc: David Dillow, Bill Davidsen, Roland Dreier, Linux Kernel Mailing List On Wed, 3 Mar 2004, Richard B. Johnson wrote: > > are being quashed by those who just like to argue. Therefore, > I wrote some code that emulates the environment in which I > discovered the poll failure. No. I think you wrote some code that shows the bug you have. Your "poll()" function IS BUGGY. Look at this: static size_t poll(struct file *fp, struct poll_table_struct *wait) { size_t poll_flag; size_t flags; DEB(printk(KERN_INFO"%s : poll() called\n", devname)); poll_wait(fp, &pwait, wait); DEB(printk(KERN_INFO"%s : poll() returned\n", devname)); spin_lock_irqsave(&rtc_lock, flags); poll_flag = global_poll_flag; *** global_poll_flag = 0; spin_unlock_irqrestore(&rtc_lock, flags); return poll_flag; } you are clearing your own flag that says "there are events pending", so if you call your "poll()" function twice, on the second time it will say "there are no events pending". You should clear the "events pending" flag only when you literally remove the event (ie at "read()" time, not at "poll()" time). Because the select() code _will_ call down to the "poll()" functions multiple times if it gets woken up for any bogus reason. See if that fixes anything. It may well be that 2.6.x calls down to the low-level driver "poll()" function more than it should. That would be a mis-feature, and worth looking at, but I think you should try to fix your test first, since right now the bug is questionable. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-03 22:25 ` Linus Torvalds @ 2004-03-03 22:42 ` Richard B. Johnson 2004-03-03 23:14 ` Linus Torvalds 2004-03-03 22:52 ` Linus Torvalds 1 sibling, 1 reply; 29+ messages in thread From: Richard B. Johnson @ 2004-03-03 22:42 UTC (permalink / raw) To: Linus Torvalds Cc: David Dillow, Bill Davidsen, Roland Dreier, Linux Kernel Mailing List On Wed, 3 Mar 2004, Linus Torvalds wrote: > > > On Wed, 3 Mar 2004, Richard B. Johnson wrote: > > > > are being quashed by those who just like to argue. Therefore, > > I wrote some code that emulates the environment in which I > > discovered the poll failure. > > No. I think you wrote some code that shows the bug you have. > > Your "poll()" function IS BUGGY. > > Look at this: > > static size_t poll(struct file *fp, struct poll_table_struct *wait) > { > size_t poll_flag; > size_t flags; > DEB(printk(KERN_INFO"%s : poll() called\n", devname)); > poll_wait(fp, &pwait, wait); > DEB(printk(KERN_INFO"%s : poll() returned\n", devname)); > spin_lock_irqsave(&rtc_lock, flags); > poll_flag = global_poll_flag; > *** global_poll_flag = 0; > spin_unlock_irqrestore(&rtc_lock, flags); > return poll_flag; > } > > you are clearing your own flag that says "there are events pending", so if > you call your "poll()" function twice, on the second time it will say > "there are no events pending". > But I am clearing it under a spin-lock. > You should clear the "events pending" flag only when you literally remove > the event (ie at "read()" time, not at "poll()" time). Because the > select() code _will_ call down to the "poll()" functions multiple times if > it gets woken up for any bogus reason. Oh wow. That is the BUG! I didn't know it could call in multiple times. > > See if that fixes anything. > > It may well be that 2.6.x calls down to the low-level driver "poll()" > function more than it should. That would be a mis-feature, and worth > looking at, but I think you should try to fix your test first, since right > now the bug is questionable. > > Linus And YES! If I clear the flag only after it is read. It "fixes" the observed problem! I don't know if it is a BUG or a FEATURE, but you put your finger right on it! Thanks. Cheers, Dick Johnson Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips). Note 96.31% of all statistics are fiction. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-03 22:42 ` Richard B. Johnson @ 2004-03-03 23:14 ` Linus Torvalds 0 siblings, 0 replies; 29+ messages in thread From: Linus Torvalds @ 2004-03-03 23:14 UTC (permalink / raw) To: Richard B. Johnson Cc: David Dillow, Bill Davidsen, Roland Dreier, Linux Kernel Mailing List On Wed, 3 Mar 2004, Richard B. Johnson wrote: > > And YES! If I clear the flag only after it is read. It "fixes" the > observed problem! Ok. That solves one problem, but it does seem to point to the fact that 2.6.x calls down to poll() more than 2.4.x does. It really _is_ normal to have multiple calls to "poll()" for the same fd as a result of a single "poll()" system call, but if we actually get a positive hit (ie the ->poll() call returns a bit that we consider to be a success by comparing it to what we _wanted_ to see), then we should have stopped doing that. And since your test program always sets "POLLIN", any ->poll() call that has the POLLIN flag set should have ended up being the last one (since it would have marked a "success"). So there's still something I don't understand, and that seems to differ between 2.4.x and 2.6.x. Can anybody else see it? Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-03 22:25 ` Linus Torvalds 2004-03-03 22:42 ` Richard B. Johnson @ 2004-03-03 22:52 ` Linus Torvalds 2004-03-03 23:07 ` Richard B. Johnson 1 sibling, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2004-03-03 22:52 UTC (permalink / raw) To: Richard B. Johnson Cc: David Dillow, Bill Davidsen, Roland Dreier, Linux Kernel Mailing List On Wed, 3 Mar 2004, Linus Torvalds wrote: > > You should clear the "events pending" flag only when you literally remove > the event (ie at "read()" time, not at "poll()" time). Because the > select() code _will_ call down to the "poll()" functions multiple times if > it gets woken up for any bogus reason. Hmm.. The above is all still true and your poll() implementation is bad, but looking at your test program, the problematic case really shouldn't trigger (we should call poll() multiple times only when it returns zero). To trigger that bug, you'd have to occasionally call poll() with the POLLIN bit clear in the incoming events, which your test program doesn't ever do. Anyway, you should move the clearing to read(), but there may well be something else going on too. What's the frequency you are programming the thing to send interrupts at? Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond 2004-03-03 22:52 ` Linus Torvalds @ 2004-03-03 23:07 ` Richard B. Johnson 0 siblings, 0 replies; 29+ messages in thread From: Richard B. Johnson @ 2004-03-03 23:07 UTC (permalink / raw) To: Linus Torvalds Cc: David Dillow, Bill Davidsen, Roland Dreier, Linux Kernel Mailing List On Wed, 3 Mar 2004, Linus Torvalds wrote: > > > On Wed, 3 Mar 2004, Linus Torvalds wrote: > > > > You should clear the "events pending" flag only when you literally remove > > the event (ie at "read()" time, not at "poll()" time). Because the > > select() code _will_ call down to the "poll()" functions multiple times if > > it gets woken up for any bogus reason. > > Hmm.. The above is all still true and your poll() implementation is bad, > but looking at your test program, the problematic case really shouldn't > trigger (we should call poll() multiple times only when it returns zero). > > To trigger that bug, you'd have to occasionally call poll() with the > POLLIN bit clear in the incoming events, which your test program doesn't > ever do. > Well, when I removed the local poll_flag, using only the global one, and cleared it to zero after the long long was fetched under the lock (in the read routine), my observed problems go away. static size_t poll(struct file *fp, struct poll_table_struct *wait) { DEB(printk(KERN_INFO"%s : poll() called\n", devname)); poll_wait(fp, &pwait, wait); DEB(printk(KERN_INFO"%s : poll() returned\n", devname)); return global_poll_flag; } static int read(struct file *fp, char *buf, size_t count, loff_t *ppos) { long long tmp; size_t flags; spin_lock_irqsave(&rtc_lock, flags); tmp = rtc_tick; global_poll_flag = 0; spin_unlock_irqrestore(&rtc_lock, flags); if(copy_to_user(buf, &tmp, sizeof(tmp))) return -EFAULT; return sizeof(tmp); } > Anyway, you should move the clearing to read(), but there may well be > something else going on too. > > What's the frequency you are programming the thing to send interrupts at? > 2048 ticks/second, trivial to handle. > Linus > - Cheers, Dick Johnson Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips). Note 96.31% of all statistics are fiction. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: poll() in 2.6 and beyond @ 2004-03-03 3:06 linux 0 siblings, 0 replies; 29+ messages in thread From: linux @ 2004-03-03 3:06 UTC (permalink / raw) To: linux-kernel, root > I'm talking about the driver! When a open fd called poll() or select(), > in user-mode code, the driver's poll() was called, and the driver's poll() > would call poll_wait(). Poll_wait() used to NOT return until the driver > executed wake_up_interruptible() on that wait-queue. When poll_wait() > returned, the driver would return to the caller with the new poll- > status. poll_wait has ALWAYS, since it was introduces during the select/poll changeover in 2.1 development, been a non-sleeping, immediately returning function. Its predecessor, select_wait(), has been a non-sleeping function since 1.0. > So, if the poll_wait isn't a wait-function, but just some add-wakeup > to the queue function, then its name probably should have been > changed when it changed. At one time it did, truly, wait until > it was awakened with wake_up_interruptible. There is no "when it changed". It has never changed. Go look at the 2.2.20 code on http://lxr.linux.no/. *************************************************************** * * * poll_wait: * * - DOES NOT sleep. * * - NEVER HAS slept, in any kernel version, EVER. * * - WOULD NOT WORK if it did sleep, for reasons that are * * so BLATANTLY OBVIOUS that arguing about it after it's * * been REPEATEDLY pointed out is a sign that the person * * arguing needs to go and visit the rest home with those * * nice, young men in their clean, white coats. * * * *************************************************************** It has aways, since select_wait in Linux 1.0, been nothing more than an "add-wakeup-to-the-queue" function. The last time the code changed significantly was the select/poll changeover in 2.1.x, and even then it was very similar. When a particular filp's poll method is called, there are two things that have to get done: 1) Check if the wakeup conditions are already satisfied (the "no-wait" case), and 2) Schedule the task for wakeup when the filp's condition changes (the "wait" case) Now, 2) only has to be done if 1) fails, but we can't do things in that order because there's a race condition. if the condition changed between the two steps, but doesn't change after that, we'll never wake up. So we have to do 2), and THEN check for 1). This is the fundamental race condition of sleeping until a condition becomes true, so anyone who entertains the remotest channce of ever writing functioning kernel code should be excruciatingly familiar with it, and its solution. For the terminally dim, everyone hold hands and follow me. Remember when we read Kernighan & Ritchie aloud and don't get lost... In the 2.2.20 kernel, pollwait is defined (include/linux/poll.h) as: 14 struct poll_table_entry { 15 struct file * filp; 16 struct wait_queue wait; 17 struct wait_queue ** wait_address; 18 }; 19 20 typedef struct poll_table_struct { 21 struct poll_table_struct * next; 22 unsigned int nr; 23 struct poll_table_entry * entry; 24 } poll_table; 25 26 #define __MAX_POLL_TABLE_ENTRIES ((PAGE_SIZE - sizeof (poll_table)) / sizeof (struct poll_table_entry)) 27 28 extern void __pollwait(struct file * filp, struct wait_queue ** wait_address, poll_table *p); 29 30 extern inline void poll_wait(struct file * filp, struct wait_queue ** wait_address, poll_table *p) 31 { 32 if (p && wait_address) 33 __pollwait(filp, wait_address, p); 34 } This ia a trivial wrapper around __pollwait(). Nothing else in the function could possibly take more than a few clock cycles. __pollwait is defined (fs/select.c) as: 94 void __pollwait(struct file * filp, struct wait_queue ** wait_address, poll_table *p) 95 { 96 for (;;) { 97 if (p->nr < __MAX_POLL_TABLE_ENTRIES) { 98 struct poll_table_entry * entry; 99 entry = p->entry + p->nr; 100 entry->filp = filp; 101 filp->f_count++; 102 entry->wait_address = wait_address; 103 entry->wait.task = current; 104 entry->wait.next = NULL; 105 add_wait_queue(wait_address,&entry->wait); 106 p->nr++; 107 return; 108 } 109 p = p->next; 110 } 111 } This does a little bit of bookkeeping and calls add_wait_queue(). Nothing else in the function could possibly take more than a few dozen clock cycles. Now look at add_wait_queue(), which is defined (include/linux/sched.h) as nothing more than calls to three other functions: 745 extern inline void __add_wait_queue(struct wait_queue ** p, struct wait_queue * wait) 746 { 747 wait->next = *p ? : WAIT_QUEUE_HEAD(p); 748 *p = wait; 749 } 750 751 extern rwlock_t waitqueue_lock; 752 753 extern inline void add_wait_queue(struct wait_queue ** p, struct wait_queue * wait) 754 { 755 unsigned long flags; 756 757 write_lock_irqsave(&waitqueue_lock, flags); 758 __add_wait_queue(p, wait); 759 write_unlock_irqrestore(&waitqueue_lock, flags); 760 } write_lock_irqsave() and write_unlock_irqsave() can get a little bit complicated, so pay careful attention. We're going to consider just the i386, non-SMP case. SMP locking is for big boys and girls who did all their homework and got As on their tests. They are defined (include/asm-i386/spinlock.h) as: 115 #define write_lock_irqsave(lock, flags) \ 116 do { save_flags(flags); cli(); } while (0) 117 #define write_unlock_irqrestore(lock, flags) \ 118 restore_flags(flags) These, in turn, go through some wrappers in include/asm-i386/system.h: 198 #define cli() __cli() 199 #define sti() __sti() 200 #define save_flags(x) __save_flags(x) 201 #define restore_flags(x) __restore_flags(x) and end up calling the primitive assembly routines in the same file: 176 /* interrupt control.. */ 177 #define __sti() __asm__ __volatile__ ("sti": : :"memory") 178 #define __cli() __asm__ __volatile__ ("cli": : :"memory") 179 #define __save_flags(x) \ 180 __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */ :"memory") 181 #define __restore_flags(x) \ 182 __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory") These are one or two machine instructions each. Finally, _add__wait_queue, above, is just two assignments. Just to be absolutely sure we've explored every function-like thing that the most deluded paranoid could think might sleep, I'll mention that WAIT_QUEUE_HEAD is defined in include/linux/wait.h as: 22 #define WAIT_QUEUE_HEAD(x) ((struct wait_queue *)((x)-1)) This, also, cannot possibly sleep. To recap, looking back in the wayback machine to 2.2.20, the complete call/macro graph of poll_wait is: poll_wait __pollwait add_wait_queue write_lock_irqsave save_flags __save_flags cli __cli __add_wait_queue WAIT_QUEUE_HEAD write_unlock_irqrestore restore_flags __restore_flags and NONE OF THESE FUNCTIONS SLEEP. Therefore, poll_wait DID NOT USED TO SLEEP. 2.4 and later allocate the poll_table pages on demand (with GFP_KERNEL) rather than preallocating everything, so *that* part can sleep, but it doesn't depend on the filp being polled, and the rest CANNOT. ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2004-03-03 23:08 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-03-02 18:21 poll() in 2.6 and beyond Richard B. Johnson
2004-03-02 20:04 ` Roland Dreier
2004-03-02 20:24 ` Richard B. Johnson
2004-03-02 21:00 ` Roland Dreier
2004-03-02 21:26 ` Richard B. Johnson
2004-03-02 21:39 ` Roland Dreier
2004-03-02 21:59 ` Richard B. Johnson
2004-03-02 22:41 ` Dave Dillow
2004-03-02 22:56 ` Roland Dreier
2004-03-02 23:16 ` Richard B. Johnson
2004-03-02 23:21 ` Roland Dreier
[not found] <1vmPm-4lU-11@gated-at.bofh.it>
[not found] ` <1vonq-6dr-37@gated-at.bofh.it>
[not found] ` <1voGY-6vC-41@gated-at.bofh.it>
[not found] ` <1vpjt-7dl-17@gated-at.bofh.it>
[not found] ` <1vpCV-7wY-41@gated-at.bofh.it>
[not found] ` <1vpWa-7Py-19@gated-at.bofh.it>
2004-03-02 22:53 ` Bill Davidsen
2004-03-02 22:57 ` Roland Dreier
2004-03-02 23:32 ` Richard B. Johnson
2004-03-03 0:07 ` John Muir
2004-03-03 1:18 ` Richard B. Johnson
2004-03-03 4:04 ` Roland Dreier
2004-03-03 12:38 ` Richard B. Johnson
2004-03-03 14:29 ` Davide Libenzi
2004-03-03 3:57 ` David Dillow
2004-03-03 18:23 ` Richard B. Johnson
2004-03-03 19:29 ` Dave Dillow
2004-03-03 20:10 ` Richard B. Johnson
2004-03-03 22:25 ` Linus Torvalds
2004-03-03 22:42 ` Richard B. Johnson
2004-03-03 23:14 ` Linus Torvalds
2004-03-03 22:52 ` Linus Torvalds
2004-03-03 23:07 ` Richard B. Johnson
-- strict thread matches above, loose matches on Subject: below --
2004-03-03 3:06 linux
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox