From: Mark Hounschell <markh@compro.net>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>, linux-kernel@vger.kernel.org
Subject: Re: floppy.c soft lockup
Date: Thu, 31 May 2007 16:18:30 -0400
Message-ID: <465F2D96.9060502@compro.net>
In-Reply-To: <20070531192256.GA88@tv-sign.ru>
Oleg Nesterov wrote:
> On 05/31, Mark Hounschell wrote:
>> Oleg Nesterov wrote:
>>> On 05/31, Mark Hounschell wrote:
>>>> Basically the main RT-process (which is a CPU bound process on processor-2) signals a
>>>> thread to do some I/O. That RT-thread (running on the other processor) does a simple
>>> If the main RT-process monopolizes processor-2, flush_workqueue() (or cancel_work_sync())
>>> can hang of course, we can do nothing.
>>>
>>>> ioctl(Q->DevSpec1, FDSETPRM, &medprm)
>>>>
>>>> and there is no return from the call. That thread is hung.
>>> What happens if you kill the main RT-process?
>>>
>> When I kill the main process, all its threads go away too, including the floppy thread.
>> Nothing notable happens with this kernel.
>
> Aha, I missed the word "thread"; this is a single process.
>
> Still, this means that flush_workqueue() completes when the other sub-threads go away;
> otherwise the thread doing ioctl() couldn't exit.
>
> Thank you very much.
>
> So, the main question is: is it possible that one of the RT processes/threads pins itself
> to some CPU and eats 100% of that CPU's time?
>
The main process is pinned to processor 2, with all _non-kernel_ processes/threads forced over to processor 1.
Any already-affinitized processes and any kernel threads are left as is; only user-land tasks are moved. The main process
is definitely _not_ relinquishing its processor (2) intentionally. All the I/O threads, the floppy thread included, run
on the other processor (1). During this failure only one or two of the I/O threads are actually doing anything.
I assume that whatever the kernel/floppy driver does on behalf of the floppy thread is done on processor 1?
Processor 1 has lots of CPU time available. Processor 2 is running balls to the wall.
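For anyone trying to reproduce this, the setup described above (one SCHED_FIFO task pinned to a CPU, everything else moved off it) can be sketched with the Linux sched_setaffinity(2) and sched_setscheduler(2) calls. This is a hedged sketch, not the application from this thread: the function names are hypothetical, and the CPU standing in for "processor 2" is an arbitrary choice (the lowest-numbered CPU the caller is allowed to use).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper: pin the calling thread to the lowest-numbered CPU in
 * its current affinity mask. Returns that CPU number, or -1 on failure. */
int pin_self_to_first_allowed_cpu(void)
{
	cpu_set_t allowed, one;

	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &allowed))
			continue;
		CPU_ZERO(&one);
		CPU_SET(cpu, &one);
		if (sched_setaffinity(0, sizeof(one), &one) != 0)
			return -1;
		return cpu;
	}
	return -1;
}

/* Hypothetical helper: switch the calling thread to SCHED_FIFO at 'prio'.
 * Returns 0, or -errno (typically -EPERM without the needed privileges).
 * A SCHED_FIFO thread that never blocks keeps every non-RT thread bound to
 * the same CPU from running there. */
int make_self_fifo(int prio)
{
	struct sched_param sp;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = prio;
	if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
		return -errno;
	return 0;
}
```

Calling pin_self_to_first_allowed_cpu(), then make_self_fifo(), and then spinning without ever blocking is the situation Oleg asks about: per-CPU kernel worker threads bound to that CPU (such as the events/N keventd threads) are ordinary non-RT threads and never get CPU time there, at least on kernels without RT throttling.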
>> On the previous kernel (2.6.18) I would get a dump
>> from the floppy driver in the syslog when I killed the process.
>
> Could you send me this output? Just in case...
>
Today, 2.6.18 behaves the same as 2.6.22-rc3, with no dump. I hate it when that happens. Maybe it was
on my box at home; I'll verify when I get there. I have nothing to send from here, though.
>>> --- OLD/drivers/block/floppy.c~ 2007-04-03 13:04:58.000000000 +0400
>>> +++ OLD/drivers/block/floppy.c 2007-05-31 20:50:18.000000000 +0400
>>> @@ -862,6 +862,8 @@ static void set_fdc(int drive)
>>>  		FDCS->reset = 1;
>>>  }
>>>  
>>> +static DECLARE_WORK(floppy_work, NULL);
>>> +
>>>  /* locks the driver */
>>>  static int _lock_fdc(int drive, int interruptible, int line)
>>>  {
>>> @@ -893,7 +895,7 @@ static int _lock_fdc(int drive, int inte
>>>  		set_current_state(TASK_RUNNING);
>>>  		remove_wait_queue(&fdc_wait, &wait);
>>>  
>>> -		flush_scheduled_work();
>>> +		cancel_work_sync(&floppy_work);
>>>  	}
>>>  	command_status = FD_COMMAND_NONE;
>>>  
>>> @@ -992,8 +994,6 @@ static void empty(void)
>>>  {
>>>  }
>>>  
>>> -static DECLARE_WORK(floppy_work, NULL);
>>> -
>>>  static void schedule_bh(void (*handler) (void))
>>>  {
>>>  	PREPARE_WORK(&floppy_work, (work_func_t)handler);
>>>
>> The patch does make it work.
>
> I do not understand floppy.c at all, so I am not sure this patch is correct.
>
> Even if it is correct, this patch doesn't solve the problem (assuming we really understand
> what's going on). cancel_work_sync() may still hang if floppy_work->func() happens to be
> running on the starved CPU. This is unlikely, but possible.
>
> Thanks!
>
> Oleg.
>
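Oleg's distinction between flush_scheduled_work() and cancel_work_sync() can be illustrated with a toy, single-threaded C model. It is not kernel code: every name is hypothetical, and it only encodes the waiting rules assumed here for keventd in kernels of this era. There is one worker per CPU; a flush waits on every CPU whose worker has anything queued or running; a cancel simply dequeues a pending item and waits only if that one item's handler is already running.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of per-CPU keventd waiting rules; NOT kernel code.
 * All names here are hypothetical. */
#define MODEL_NCPU 2

enum work_state { WORK_IDLE, WORK_PENDING, WORK_RUNNING };

struct model_work {
	enum work_state state;
	int cpu;		/* CPU whose worker owns the item */
};

struct model_cpu {
	bool starved;		/* a pinned RT task never yields this CPU */
	int outstanding;	/* items queued or running on this CPU's worker */
};

struct model_cpu model_cpus[MODEL_NCPU];

/* schedule_work() analogue: put an idle item on one CPU's worker list. */
void model_queue(struct model_work *w, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NCPU);
	assert(w->state == WORK_IDLE);
	w->state = WORK_PENDING;
	w->cpu = cpu;
	model_cpus[cpu].outstanding++;
}

/* The worker picked the item up: its handler started but has not returned. */
void model_start(struct model_work *w)
{
	assert(w->state == WORK_PENDING);
	w->state = WORK_RUNNING;
}

/* flush_scheduled_work() analogue: waits on EVERY CPU's worker, so anything
 * outstanding on a starved CPU means the flush never returns. */
bool model_flush_all_completes(void)
{
	for (int c = 0; c < MODEL_NCPU; c++)
		if (model_cpus[c].starved && model_cpus[c].outstanding > 0)
			return false;
	return true;
}

/* cancel_work_sync() analogue: concerns ONE item only. A pending item is just
 * taken off its list; only an item whose handler is running is waited for. */
bool model_cancel_sync_completes(const struct model_work *w)
{
	if (w->state != WORK_RUNNING)
		return true;
	return !model_cpus[w->cpu].starved;
}
```

With CPU 1 starved and some unrelated item queued there, model_flush_all_completes() returns false while cancelling a floppy item queued on CPU 0 completes; clearing the starved flag (the RT process exiting) lets the flush complete, matching what happens when the main process is killed. The remaining false case for model_cancel_sync_completes() is the one Oleg calls unlikely but possible: the floppy handler was already running on the hogged CPU when the RT task took it over.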