From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751793AbWFCVah (ORCPT ); Sat, 3 Jun 2006 17:30:37 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751633AbWFCVah (ORCPT ); Sat, 3 Jun 2006 17:30:37 -0400 Received: from ms-smtp-03.nyroc.rr.com ([24.24.2.57]:30122 "EHLO ms-smtp-03.nyroc.rr.com") by vger.kernel.org with ESMTP id S1751289AbWFCVag (ORCPT ); Sat, 3 Jun 2006 17:30:36 -0400 Subject: Re: [patch 4/5] [PREEMPT_RT] Changing interrupt handlers from running in thread to hardirq and back runtime. From: Steven Rostedt To: Esben Nielsen Cc: linux-kernel@vger.kernel.org, Ingo Molnar In-Reply-To: References: <20060602165336.147812000@localhost> Content-Type: text/plain Date: Sat, 03 Jun 2006 17:30:30 -0400 Message-Id: <1149370230.13993.161.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.4.2.1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 2006-06-02 at 23:23 +0100, Esben Nielsen wrote: > Makes it possible for the 8250 serial driver to have it's interrupt handler > in both hard-irq and threaded context under PREEMPT_RT. > > Index: linux-2.6.16-rt23.spin_mutex/drivers/serial/8250.c > =================================================================== > --- linux-2.6.16-rt23.spin_mutex.orig/drivers/serial/8250.c > +++ linux-2.6.16-rt23.spin_mutex/drivers/serial/8250.c > @@ -140,7 +140,7 @@ struct uart_8250_port { > }; > > struct irq_info { > - spinlock_t lock; > + spin_mutex_t lock; > struct list_head *head; > }; > > @@ -1287,6 +1287,47 @@ serial8250_handle_port(struct uart_8250_ > spin_unlock(&up->port.lock); > } > > + > +static int serial8250_change_context(int irq, void *dev_id, > + enum change_context_cmd cmd) > +{ > + struct irq_info *i = dev_id; > + struct list_head *l; > + > + switch(cmd) { > + case IRQ_TO_HARDIRQ: > + /* Spin mutexes not available: */ > + if(!spin_mutexes_can_spin()) > + return -ENOSYS; > + > + /* First do the inner locks */ > + l = i->head; > + do { > + struct uart_8250_port *up; > + up = list_entry(l, struct uart_8250_port, list); > + spin_mutex_to_spin(&up->port.lock); > + l = l->next; > + } while(l != i->head && l != NULL); Now this shows the flaw in your whole design. Don't get me wrong, I'm very impressed that you got this working for you. But for this to be accepted, it can't be too intrusive. It is also very error prone since you must know not only the locks in the handlers, but also the locks in all the functions that you call. And each device must update them. Two things: 1) What if you have two serials, and one is threaded and one is not. It might mutex a lock that should be a spin. 2) You forgot to update sysrq_key_table_lock. That can be called from the serial interrupt as well. So even just working with two devices it's not trivial to get things right. Imagine what it would take for the whole kernel. Not to mention the maintenance nightmare it would cause. I do give you an A+ for effort, but I'm not sure if this is the solution. Frankly, it scares me, unless you can find a sure fire way to update all the locks properly, that would not cause problems elsewhere. Maybe the -rt way can introduce a new paradigm. Maybe we shouldn't have top halves at all, but instead a way to register threads to devices. Think about it, the big picture, what are interrupts for? They are to tell us that an event happened, and if there is nothing waiting for that event, then it was worthless. So really, what we could do (and this would perhaps also be good for mainline, although it would be very hard to get accepted) is to have registering functions for each device. So when a thread waits on a device, it is registered with the irq that the device has, instead of a wait queue. So each interrupt would have a priority sorted list of threads that are waiting for that interrupt. When the interrupt happens, the highest priority thread would wake up to handle it, instead of an interrupt handler. If the interrupt belonged to another thread, the one that was woken up would pass it off to the thread waiting and go back to sleep. This also allows the interrupt handlers to schedule and even write to user space, once it finds out that the thread working on the interrupt is the thread in question. Let's say that we have a network card, which are commonly shared with other devices, and it also has to do a lot before it knows what process it is working on for. And lets just say it is shared (uncommonly) with a IDE device. Now an interrupt comes in on this line. There's a list of all the highest priority tasks waiting on each device (not all tasks, just the highest for each device, like the pi_list in the rt code). So the highest priority tasks for each device is woken up (the highest for the network card and the highest for the IDE) and the interrupt handler would return (not unlike the irq threads of the current -rt, except, we wouldn't have irq threads but tasks that are waiting on them). These tasks would be in the registering function of the device, which would know how to handle the interrupt. Now if the current process is higher in priority, then it wont be preempted by this interrupt. But if a higher priority process is waiting, then the current process is preempted. So this is like interrupt handlers inheriting priorities of processes waiting. So the process now checks to see if the interrupt belongs to it, and if it does, then simply handle the case. Otherwise, if it does not belong to the device, simply go back to sleep and wait again. If this belongs to the device, but for another waiter, using our example, the network card, the process realized that it belonged to another process on another port, it could wake up that process and pass information on how much work the first process has done, so that the other process doesn't repeat the work. I'm just making this up as I go, but it really looks like this can be examined more in an academic arena. Perhaps it would even lead to less context switches, since the process waiting would already be woken up on interrupt. The above idea is based on what interrupts are for. They don't happen for no reasons, but there's almost always someone waiting for it. -- Steve > + spin_mutex_to_spin(&i->lock); > + break; > + case IRQ_CAN_THREAD: > + break; > + case IRQ_TO_THREADED: > + spin_mutex_to_mutex(&i->lock); > + l = i->head; > + > + do { > + struct uart_8250_port *up; > + up = list_entry(l, struct uart_8250_port, list); > + spin_mutex_to_mutex(&up->port.lock); > + l = l->next; > + } while(l != i->head && l != NULL); > + break; > + } > + > + return 0; > +} > + > /* > * This is the serial driver's interrupt routine. > * > @@ -1393,8 +1434,10 @@ static int serial_link_irq_chain(struct > i->head = &up->list; > spin_unlock_irq(&i->lock); > > - ret = request_irq(up->port.irq, serial8250_interrupt, > - irq_flags, "serial", i); > + ret = request_irq2(up->port.irq, serial8250_interrupt, > + irq_flags | SA_MUST_THREAD_RT, > + "serial", i, > + serial8250_change_context); > if (ret < 0) > serial_do_unlink(i, up); > } > Index: linux-2.6.16-rt23.spin_mutex/include/linux/serial_core.h > =================================================================== > --- linux-2.6.16-rt23.spin_mutex.orig/include/linux/serial_core.h > +++ linux-2.6.16-rt23.spin_mutex/include/linux/serial_core.h > @@ -206,7 +206,7 @@ struct uart_icount { > typedef unsigned int __bitwise__ upf_t; > > struct uart_port { > - spinlock_t lock; /* port lock */ > + spin_mutex_t lock; /* port lock */ > unsigned int iobase; /* in/out[bwl] */ > unsigned char __iomem *membase; /* read/write[bwl] */ > unsigned int irq; /* irq number */ > > -- > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Steven Rostedt Senior Programmer Kihon Technologies (607)786-4830