Andrew Morton wrote: > On Tue, 29 May 2007 13:31:05 -0400 Mark Hounschell wrote: > >> Changes in floppy.c from 2.6.17 and 2.6.18 have broken an application I have. I have tracked >> it down to a single line of code. When the following patch is applied to the version in 2.6.18 >> my application works. >> >> --- linux-2.6.18/drivers/block/floppy.c 2006-09-19 23:42:06.000000000 -0400 >> +++ linux-2.6.18-crt/drivers/block/floppy.c 2007-05-29 09:12:20.000000000 -0400 >> @@ -893,7 +893,6 @@ >> set_current_state(TASK_RUNNING); >> remove_wait_queue(&fdc_wait, &wait); >> >> - flush_scheduled_work(); >> } >> command_status = FD_COMMAND_NONE; >> >> I don't claim to understand the changes from 2.6.17 to 2.6.18 except for the devfs removal. >> All I can say is this one line of code kills the application. I have tried to write a short pgm >> that shows my problem but everything else I write seems to work. The application only runs >> on SMP machines and uses process and irq affinities with real-time scheduling. When I turn >> off process and irq affinities the application runs. >> >> I have tried kernels up through 2.6.21.1 with the same results. All kernels from 2.6.18 up >> require that I remove this one line of code or my application does not work? > > Interesting. I'd expect that the calling process is spinning, with realtime > policy and is expecting some other process to do something (ie: run a workqueue). > > If you keep the process and irq affinities, and disable the realtime policy > does that also prevent the problem? > Yes it does. > It would be interesting it you could capture a few task traces while it is stuck: > echo 1 > /proc/sys/kernel/sysrq then do ALT-SYSRQ-P a bunch of times and ALT-SYSRQ-T, > see if you can work out where the CPU is stuck. > I've attached the syslog output as a result of doing the above. I can't really make any kind of determination from it myself as I don't really knowing what I'm looking at. > ALso, 2.6.22-rc3 might have accidentally fixed this. > No. Same thing there. The traces attached are using 2.6.22-rc3. Basically the main RT-process (which is a CPU bound process on processor-2) signals a thread to do some I/O. That RT-thread (running on the other processor) does a simple ioctl(Q->DevSpec1, FDSETPRM, &medprm) and there is no return from the call. That thread is hung. Thanks Mark