* Current qc_defer implementation may lead to infinite recursion
@ 2008-02-10 17:54 Elias Oltmanns
2008-02-11 5:06 ` Tejun Heo
0 siblings, 1 reply; 12+ messages in thread
From: Elias Oltmanns @ 2008-02-10 17:54 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-ide
Hi Tejun,
due to your commit 31cc23b34913bc173680bdc87af79e551bf8cc0d libata now
sets max_host_blocked and max_device_blocked to 1 for all devices it
manages. Under certain conditions this may lead to system lockups due to
infinite recursion as I have explained to James on the scsi list (kept
you cc-ed). James told me that it was the business of libata to make
sure that such a recursion cannot happen.
In my discussion with James I imprudently claimed that this was easy to
fix in libata. However, after giving the matter some thought, I'm not at
all sure as to what exactly should be done about it. The easy bit is
that max_host_blocked and max_device_blocked should be left alone as
long as the low level driver does not provide the ->qc_defer() callback.
But even if the driver has defined this callback, ata_std_qc_defer() for
one will not prevent this recursion on a uniprocessor, whereas things
might work out well on an SMP system due to the lock fiddling in the
scsi midlayer.
As a conclusion, the current implementation makes it imperative to leave
max_host_blocked and max_device_blocked alone on a uniprocessor system.
For SMP systems the current implementation might just be fine but even
there it might just as well be a good idea to make the adjustment
depending on ->qc_defer != NULL.
Any ideas?
Regards,
Elias
* Re: Current qc_defer implementation may lead to infinite recursion
2008-02-10 17:54 Current qc_defer implementation may lead to infinite recursion Elias Oltmanns
@ 2008-02-11 5:06 ` Tejun Heo
2008-02-11 7:57 ` Elias Oltmanns
From: Tejun Heo @ 2008-02-11 5:06 UTC (permalink / raw)
To: Elias Oltmanns; +Cc: linux-ide
Elias Oltmanns wrote:
> Hi Tejun,
>
> due to your commit 31cc23b34913bc173680bdc87af79e551bf8cc0d libata now
> sets max_host_blocked and max_device_blocked to 1 for all devices it
> manages. Under certain conditions this may lead to system lockups due to
> infinite recursion as I have explained to James on the scsi list (kept
> you cc-ed). James told me that it was the business of libata to make
> sure that such a recursion cannot happen.
>
> In my discussion with James I imprudently claimed that this was easy to
> fix in libata. However, after giving the matter some thought, I'm not at
> all sure as to what exactly should be done about it. The easy bit is
> that max_host_blocked and max_device_blocked should be left alone as
> long as the low level driver does not provide the ->qc_defer() callback.
> But even if the driver has defined this callback, ata_std_qc_defer() for
> one will not prevent this recursion on a uniprocessor, whereas things
> might work out well on an SMP system due to the lock fiddling in the
> scsi midlayer.
>
> As a conclusion, the current implementation makes it imperative to leave
> max_host_blocked and max_device_blocked alone on a uniprocessor system.
> For SMP systems the current implementation might just be fine but even
> there it might just as well be a good idea to make the adjustment
> depending on ->qc_defer != NULL.
Hmmm... The reason why max_host_blocked and max_device_blocked are set
to 1 is to let libata re-consider status after each command completion
as blocked status can be rather complex w/ PMP. I haven't really
followed the code yet but you're saying that blocked count of 2 should
be used for that behavior, right?
Another strange thing is that there hasn't been any such lock up /
infinite recursion report till now although the ->qc_defer mechanism has
been used widely for some time now. Can you reproduce the problem w/o
the disk shock protection?
Thanks.
--
tejun
* Re: Current qc_defer implementation may lead to infinite recursion
2008-02-11 5:06 ` Tejun Heo
@ 2008-02-11 7:57 ` Elias Oltmanns
2008-02-11 8:43 ` Tejun Heo
From: Elias Oltmanns @ 2008-02-11 7:57 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-ide
Tejun Heo <htejun@gmail.com> wrote:
> Elias Oltmanns wrote:
>> Hi Tejun,
>>
>> due to your commit 31cc23b34913bc173680bdc87af79e551bf8cc0d libata now
>> sets max_host_blocked and max_device_blocked to 1 for all devices it
>> manages. Under certain conditions this may lead to system lockups due to
>> infinite recursion as I have explained to James on the scsi list (kept
>> you cc-ed). James told me that it was the business of libata to make
>> sure that such a recursion cannot happen.
>>
>> In my discussion with James I imprudently claimed that this was easy to
>> fix in libata. However, after giving the matter some thought, I'm not at
>> all sure as to what exactly should be done about it. The easy bit is
>> that max_host_blocked and max_device_blocked should be left alone as
>> long as the low level driver does not provide the ->qc_defer() callback.
>> But even if the driver has defined this callback, ata_std_qc_defer() for
>> one will not prevent this recursion on a uniprocessor, whereas things
>> might work out well on an SMP system due to the lock fiddling in the
>> scsi midlayer.
>>
>> As a conclusion, the current implementation makes it imperative to leave
>> max_host_blocked and max_device_blocked alone on a uniprocessor system.
>> For SMP systems the current implementation might just be fine but even
>> there it might just as well be a good idea to make the adjustment
>> depending on ->qc_defer != NULL.
>
> Hmmm... The reason why max_host_blocked and max_device_blocked are set
> to 1 is to let libata re-consider status after each command completion
> as blocked status can be rather complex w/ PMP. I haven't really
> followed the code yet but you're saying that blocked count of 2 should
> be used for that behavior, right?
Not quite. On an SMP system the current implementation will probably do
exactly what you had in mind. In particular, setting max_device_blocked
and max_host_blocked to 1 seems to be the right thing to do in this
case.
>
> Another strange thing is that there hasn't been any such lock up /
> infinite recursion report till now although the ->qc_defer mechanism has
> been used widely for some time now. Can you reproduce the problem w/o
> the disk shock protection?
No, unfortunately, I'm unable to reproduce this without the patch on my
machine. This is for purely technical reasons though because I'm using
ata_piix. Running a vanilla kernel, I'd expect everything to work just
fine except for one case: A non-SMP system using a driver that provides
the ->qc_defer() callback. Currently, the ->qc_defer() callback is the
only thing that can possibly send a non zero return value to the scsi
midlayer. Once it does, however, the driver will only get a chance to
complete some qcs before ->qc_defer() is called again provided that
multithreading is supported.
So, what I'm saying is this: If the low level driver doesn't provide a
->qc_defer() callback, there is no (obvious) reason why
max_device_blocked and max_host_blocked should be set to 1 since libata
won't gain anything by it. However, it is not a bug either, even though
James considers it suboptimal and I will have to think about a solution
for my patch. On the other hand, once a driver defines the ->qc_defer()
callback, we really have a bug because things will go wrong once
->qc_defer() returns non zero on a uniprocessor. So, in this case
max_device_blocked and max_host_blocked should be set to 1 on an SMP
system and *have to* be bigger than 1 otherwise.
Regards,
Elias
* Re: Current qc_defer implementation may lead to infinite recursion
2008-02-11 7:57 ` Elias Oltmanns
@ 2008-02-11 8:43 ` Tejun Heo
2008-02-11 22:03 ` Elias Oltmanns
From: Tejun Heo @ 2008-02-11 8:43 UTC (permalink / raw)
To: Elias Oltmanns; +Cc: linux-ide
Hello,
Elias Oltmanns wrote:
>> Hmmm... The reason why max_host_blocked and max_device_blocked are set
>> to 1 is to let libata re-consider status after each command completion
>> as blocked status can be rather complex w/ PMP. I haven't really
>> followed the code yet but you're saying that blocked count of 2 should
>> be used for that behavior, right?
>
> Not quite. On an SMP system the current implementation will probably do
> exactly what you had in mind. In particular, setting max_device_blocked
> and max_host_blocked to 1 seems to be the right thing to do in this
> case.
I currently can't test the code but SMP or not doesn't make much
difference if your analysis is correct. You're saying that the code will
infinitely recurse if qc_defer returns non-zero but on SMP the recursion
will be terminated by another CPU finishing a pending request, right? I
don't think things will pan out that way. The recursing thread will
have much more than enough time to overflow the stack before any IO
event comes to stop the recursion.
I don't think such infinite recursions can happen because
blk_run_queue() doesn't allow recursing more than once.
>> Another strange thing is that there hasn't been any such lock up /
>> infinite recursion report till now although the ->qc_defer mechanism has
>> been used widely for some time now. Can you reproduce the problem w/o
>> the disk shock protection?
>
> No, unfortunately, I'm unable to reproduce this without the patch on my
> machine. This is for purely technical reasons though because I'm using
> ata_piix. Running a vanilla kernel, I'd expect everything to work just
> fine except for one case: A non-SMP system using a driver that provides
> the ->qc_defer() callback. Currently, the ->qc_defer() callback is the
> only thing that can possibly send a non zero return value to the scsi
> midlayer. Once it does, however, the driver will only get a chance to
> complete some qcs before ->qc_defer() is called again provided that
> multithreading is supported.
>
> So, what I'm saying is this: If the low level driver doesn't provide a
> ->qc_defer() callback, there is no (obvious) reason why
> max_device_blocked and max_host_blocked should be set to 1 since libata
> won't gain anything by it. However, it is not a bug either, even though
> James considers it suboptimal and I will have to think about a solution
> for my patch. On the other hand, once a driver defines the ->qc_defer()
> callback, we really have a bug because things will go wrong once
> ->qc_defer() returns non zero on a uniprocessor. So, in this case
> max_device_blocked and max_host_blocked should be set to 1 on an SMP
> system and *have to* be bigger than 1 otherwise.
Well, we need to support ->qc_defer on UP systems too. I'm still very
skeptical that the scenario you described can happen. The mechanism is
just used too widely for such a problem to go unnoticed. For example,
deferring is exercised when FLUSH is issued to a driver which has
in-flight NCQ commands. If you have a barrier-enabled FS mounted on an
NCQ-enabled hard drive, such events will happen regularly, but we haven't
heard of any similar lock up or stack overflow till now.
Feel free to add ->qc_defer to ata_piix and let it defer commands (say
every 20th, or any other criterion you see fit for testing) but I think
the worst that can happen is a somewhat inefficient deferring sequence
like the following.
1. ->qc_defer returns DEVICE BUSY
2. device_blocked set to 1
3. scsi_queue_insert() ends up calling blk_run_queue()
4. blk_run_queue() recurses, device_blocked cleared and queue processing
starts again.
5. ->qc_defer returns DEVICE BUSY again
6. device_blocked set to 1
7. scsi_queue_insert() ends up calling blk_run_queue()
8. blk_run_queue() detects it's recursing the second time and bails out.
At this point, although it went through an unnecessary loop of queue
processing, everything is well.
Thanks.
--
tejun
* Re: Current qc_defer implementation may lead to infinite recursion
2008-02-11 8:43 ` Tejun Heo
@ 2008-02-11 22:03 ` Elias Oltmanns
2008-02-12 1:14 ` Tejun Heo
From: Elias Oltmanns @ 2008-02-11 22:03 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-ide
Tejun Heo <htejun@gmail.com> wrote:
> Hello,
>
> Elias Oltmanns wrote:
>>> Hmmm... The reason why max_host_blocked and max_device_blocked are set
>>> to 1 is to let libata re-consider status after each command completion
>>> as blocked status can be rather complex w/ PMP. I haven't really
>>> followed the code yet but you're saying that blocked count of 2 should
>>> be used for that behavior, right?
>>
>> Not quite. On an SMP system the current implementation will probably do
>> exactly what you had in mind. In particular, setting max_device_blocked
>> and max_host_blocked to 1 seems to be the right thing to do in this
>> case.
>
[...]
> I don't think such infinite recursions can happen because
> blk_run_queue() doesn't allow recursing more than once.
Quite right. Sorry for missing that. However, the problem remains
although I'm not too sure anymore whether libata is to blame. See below.
>
>>> Another strange thing is that there hasn't been any such lock up /
>>> infinite recursion report till now although the ->qc_defer mechanism has
>>> been used widely for some time now. Can you reproduce the problem w/o
>>> the disk shock protection?
>>
>> No, unfortunately, I'm unable to reproduce this without the patch on my
>> machine. This is for purely technical reasons though because I'm using
>> ata_piix. Running a vanilla kernel, I'd expect everything to work just
>> fine except for one case: A non-SMP system using a driver that provides
>> the ->qc_defer() callback. Currently, the ->qc_defer() callback is the
>> only thing that can possibly send a non zero return value to the scsi
>> midlayer. Once it does, however, the driver will only get a chance to
>> complete some qcs before ->qc_defer() is called again provided that
>> multithreading is supported.
>>
>> So, what I'm saying is this: If the low level driver doesn't provide a
>> ->qc_defer() callback, there is no (obvious) reason why
>> max_device_blocked and max_host_blocked should be set to 1 since libata
>> won't gain anything by it. However, it is not a bug either, even though
>> James considers it suboptimal and I will have to think about a solution
>> for my patch. On the other hand, once a driver defines the ->qc_defer()
>> callback, we really have a bug because things will go wrong once
>> ->qc_defer() returns non zero on a uniprocessor. So, in this case
>> max_device_blocked and max_host_blocked should be set to 1 on an SMP
>> system and *have to* be bigger than 1 otherwise.
>
> Well, we need to support ->qc_defer on UP systems too. I'm still very
> skeptical that the scenario you described can happen. The mechanism is
> just used too widely for such a problem to go unnoticed. For example,
> deferring is exercised when FLUSH is issued to a driver which has
> in-flight NCQ commands. If you have a barrier-enabled FS mounted on an
> NCQ-enabled hard drive, such events will happen regularly, but we haven't
> heard of any similar lock up or stack overflow till now.
>
> Feel free to add ->qc_defer to ata_piix and let it defer commands (say
> every 20th, or any other criterion you see fit for testing)
An excellent idea, although it took me some time to figure out what kind
of a qc_defer replacement would best fit my needs. Indeed, I can
reproduce the system freeze with a suitably crafted qc_defer callback.
See below the patch I've used to get it working. The intention is to
simulate the following scenario:
1. libata has accepted some commands from the midlayer already and has
stored them on the private queue of some ata_port.
2. Eventually, qc_defer declines a command for whatever reason.
3. Even though an infinite recursion does not occur in the way I had
suggested, qc_defer will decline this command again the next time
because the general situation hasn't changed. In particular, all the
commands that had already been enqueued for that port at the last
time will still be there and will not have been processed or even
changed in any way.
Hopefully, I didn't miss the crucial point this time. Please note that
you have to strip down the piix_qc_defer() function in this patch in
order to get the scenario described above. Just get rid of those two
if-blocks whose condition contains the expression
PIIX_QC_DEFER_THRESHOLD + 100
in some way. Leaving them in place will keep the system from freezing
outright but still lead to BUG: messages in dmesg. At least that's what
happens in my case. These messages suggest to me that libata may be not
the culprit after all. However, I have no experience with interpreting
them properly so I'd appreciate it if you had a look at them after you
have verified that my qc_defer function does indeed constitute a valid
test case. I'll append some of the messages I get after the patch.
Thanks for your help,
Elias
---
diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index b406b39..f4dede7 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -167,6 +167,7 @@ static void piix_set_dmamode(struct ata_
static void ich_set_dmamode(struct ata_port *ap, struct ata_device *adev);
static int ich_pata_cable_detect(struct ata_port *ap);
static u8 piix_vmw_bmdma_status(struct ata_port *ap);
+static int piix_qc_defer(struct ata_queued_cmd *qc);
#ifdef CONFIG_PM
static int piix_pci_device_suspend(struct pci_dev *pdev, pm_message_t mesg);
static int piix_pci_device_resume(struct pci_dev *pdev);
@@ -311,6 +312,7 @@ static const struct ata_port_operations
.bmdma_start = ata_bmdma_start,
.bmdma_stop = ata_bmdma_stop,
.bmdma_status = ata_bmdma_status,
+ .qc_defer = piix_qc_defer,
.qc_prep = ata_qc_prep,
.qc_issue = ata_qc_issue_prot,
.data_xfer = ata_data_xfer,
@@ -343,6 +345,7 @@ static const struct ata_port_operations
.bmdma_start = ata_bmdma_start,
.bmdma_stop = ata_bmdma_stop,
.bmdma_status = ata_bmdma_status,
+ .qc_defer = piix_qc_defer,
.qc_prep = ata_qc_prep,
.qc_issue = ata_qc_issue_prot,
.data_xfer = ata_data_xfer,
@@ -371,6 +374,7 @@ static const struct ata_port_operations
.bmdma_start = ata_bmdma_start,
.bmdma_stop = ata_bmdma_stop,
.bmdma_status = ata_bmdma_status,
+ .qc_defer = piix_qc_defer,
.qc_prep = ata_qc_prep,
.qc_issue = ata_qc_issue_prot,
.data_xfer = ata_data_xfer,
@@ -402,6 +406,7 @@ static const struct ata_port_operations
.bmdma_start = ata_bmdma_start,
.bmdma_stop = ata_bmdma_stop,
.bmdma_status = piix_vmw_bmdma_status,
+ .qc_defer = piix_qc_defer,
.qc_prep = ata_qc_prep,
.qc_issue = ata_qc_issue_prot,
.data_xfer = ata_data_xfer,
@@ -419,6 +424,43 @@ static const struct ata_port_operations
.port_start = ata_port_start,
};
+static int piix_qc_defer(struct ata_queued_cmd *qc)
+{
+ static struct ata_port *ap = NULL;
+ struct ata_queued_cmd qcmd[ATA_MAX_QUEUE];
+ static int count = 0;
+#define PIIX_QC_DEFER_THRESHOLD 5000
+
+ if (!ap)
+ ap = qc->ap;
+ else if (ap != qc->ap)
+ return 0;
+
+ if (count > PIIX_QC_DEFER_THRESHOLD + 100)
+ return 0;
+
+ count++;
+ if (count < PIIX_QC_DEFER_THRESHOLD)
+ return 0;
+ else if (count == PIIX_QC_DEFER_THRESHOLD) {
+ int i;
+ if (ap->qc_allocated) {
+ for (i = 0; i < ATA_MAX_QUEUE; i++)
+ qcmd[i] = ap->qcmd[i];
+ printk(KERN_DEBUG "piix_qc_defer(): saved current state\n");
+ msleep(5000);
+ } else
+ count--;
+ } else if (count == PIIX_QC_DEFER_THRESHOLD + 100) {
+ dump_stack();
+ count++;
+ } else if (memcmp(qcmd, ap->qcmd, sizeof(qcmd[0]) * ATA_MAX_QUEUE))
+ return ATA_DEFER_LINK;
+ else
+ count = 0;
+ return 0;
+}
+
static const struct piix_map_db ich5_map_db = {
.mask = 0x7,
.port_enable = 0x3,
---
====================================================================
Feb 11 20:33:05 denkblock kernel: piix_qc_defer(): saved current state
Feb 11 20:33:11 denkblock kernel: BUG: scheduling while atomic: swapper/0/0x00000100
Feb 11 20:33:11 denkblock kernel:
Feb 11 20:33:11 denkblock kernel: Pid: 0, comm: swapper Not tainted (2.6.24.1-dbg-1 #2)
Feb 11 20:33:11 denkblock kernel: EIP: 0060:[<e0021814>] EFLAGS: 00000216 CPU: 0
Feb 11 20:33:11 denkblock kernel: EIP is at acpi_processor_idle+0x245/0x3c5 [processor]
Feb 11 20:33:11 denkblock kernel: EAX: 4e9bda7e EBX: 00008f3f ECX: 00000358 EDX: 00000037
Feb 11 20:33:11 denkblock kernel: ESI: de52dac8 EDI: 007188d5 EBP: de52d800 ESP: c0309fc8
Feb 11 20:33:11 denkblock kernel: DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Feb 11 20:33:11 denkblock kernel: CR0: 8005003b CR2: b7f11e20 CR3: 1c4ea000 CR4: 000006d0
Feb 11 20:33:11 denkblock kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 11 20:33:11 denkblock kernel: DR6: ffff0ff0 DR7: 00000400
Feb 11 20:33:11 denkblock kernel: [cpu_idle+67/93] cpu_idle+0x43/0x5d
Feb 11 20:33:11 denkblock kernel: [start_kernel+572/577] start_kernel+0x23c/0x241
Feb 11 20:33:11 denkblock kernel: [unknown_bootoption+0/405] unknown_bootoption+0x0/0x195
Feb 11 20:33:11 denkblock kernel: =======================
Feb 11 20:33:11 denkblock kernel: bad: scheduling from the idle thread!
Feb 11 20:33:11 denkblock kernel: Pid: 0, comm: swapper Not tainted 2.6.24.1-dbg-1 #2
Feb 11 20:33:11 denkblock kernel: [dequeue_task_idle+21/24] dequeue_task_idle+0x15/0x18
Feb 11 20:33:11 denkblock kernel: [dequeue_task+10/20] dequeue_task+0xa/0x14
Feb 11 20:33:11 denkblock kernel: [deactivate_task+28/39] deactivate_task+0x1c/0x27
Feb 11 20:33:11 denkblock kernel: [schedule+195/759] schedule+0xc3/0x2f7
Feb 11 20:33:11 denkblock kernel: [<e003434e>] scsi_done+0x0/0x16 [scsi_mod]
Feb 11 20:33:11 denkblock kernel: [schedule_timeout+110/139] schedule_timeout+0x6e/0x8b
Feb 11 20:33:11 denkblock kernel: [process_timeout+0/5] process_timeout+0x0/0x5
Feb 11 20:33:11 denkblock kernel: [schedule_timeout+105/139] schedule_timeout+0x69/0x8b
Feb 11 20:33:11 denkblock kernel: [msleep+13/18] msleep+0xd/0x12
Feb 11 20:33:11 denkblock kernel: [<e002c10b>] piix_qc_defer+0xa1/0xff [ata_piix]
Feb 11 20:33:11 denkblock kernel: [<e003434e>] scsi_done+0x0/0x16 [scsi_mod]
Feb 11 20:33:11 denkblock kernel: [<e0059518>] ata_scsi_qc_complete+0x0/0x36d [libata]
Feb 11 20:33:11 denkblock kernel: [<e005556b>] ata_qc_complete_internal+0x0/0xb [libata]
Feb 11 20:33:11 denkblock kernel: [<e00594e4>] ata_scsi_translate+0xd6/0x10a [libata]
Feb 11 20:33:11 denkblock kernel: [<e003434e>] scsi_done+0x0/0x16 [scsi_mod]
Feb 11 20:33:11 denkblock kernel: [<e005bb8f>] ata_scsi_queuecmd+0x17f/0x184 [libata]
Feb 11 20:33:11 denkblock kernel: [<e005920a>] ata_scsi_rw_xlat+0x0/0x1de [libata]
Feb 11 20:33:11 denkblock kernel: [<e0034aa8>] scsi_dispatch_cmd+0x197/0x20b [scsi_mod]
Feb 11 20:33:11 denkblock kernel: [<e0039b33>] scsi_request_fn+0x256/0x2d7 [scsi_mod]
Feb 11 20:33:11 denkblock kernel: [as_completed_request+438/451] as_completed_request+0x1b6/0x1c3
Feb 11 20:33:11 denkblock kernel: [blk_run_queue+42/75] blk_run_queue+0x2a/0x4b
Feb 11 20:33:11 denkblock kernel: [<e0038702>] scsi_next_command+0x25/0x2f [scsi_mod]
Feb 11 20:33:11 denkblock kernel: [<e0038810>] scsi_end_request+0x8f/0x99 [scsi_mod]
Feb 11 20:33:11 denkblock kernel: [<e0039407>] scsi_io_completion+0x150/0x302 [scsi_mod]
Feb 11 20:33:11 denkblock kernel: [blk_done_softirq+74/85] blk_done_softirq+0x4a/0x55
Feb 11 20:33:11 denkblock kernel: [__do_softirq+53/117] __do_softirq+0x35/0x75
Feb 11 20:33:11 denkblock kernel: [do_softirq+34/38] do_softirq+0x22/0x26
Feb 11 20:33:11 denkblock kernel: [irq_exit+41/88] irq_exit+0x29/0x58
Feb 11 20:33:11 denkblock kernel: [do_IRQ+88/107] do_IRQ+0x58/0x6b
Feb 11 20:33:11 denkblock kernel: [tick_notify+356/547] tick_notify+0x164/0x223
Feb 11 20:33:11 denkblock kernel: [acpi_hw_register_read+129/297] acpi_hw_register_read+0x81/0x129
Feb 11 20:33:11 denkblock kernel: [common_interrupt+35/40] common_interrupt+0x23/0x28
Feb 11 20:33:11 denkblock kernel: [<e0021814>] acpi_processor_idle+0x245/0x3c5 [processor]
Feb 11 20:33:11 denkblock kernel: [cpu_idle+67/93] cpu_idle+0x43/0x5d
Feb 11 20:33:11 denkblock kernel: [start_kernel+572/577] start_kernel+0x23c/0x241
Feb 11 20:33:11 denkblock kernel: [unknown_bootoption+0/405] unknown_bootoption+0x0/0x195
Feb 11 20:33:11 denkblock kernel: =======================
This block (except for the first line) is repeated 27 times with
virtually no variance. Then it proceeds like this:
Feb 11 20:33:11 denkblock kernel: BUG: scheduling while atomic: swapper/0/0x00000100
Feb 11 20:33:11 denkblock kernel:
Feb 11 20:33:11 denkblock kernel: Pid: 0, comm: swapper Not tainted (2.6.24.1-dbg-1 #2)
Feb 11 20:33:11 denkblock kernel: EIP: 0060:[<e0021814>] EFLAGS: 00000216 CPU: 0
Feb 11 20:33:11 denkblock kernel: EIP is at acpi_processor_idle+0x245/0x3c5 [processor]
Feb 11 20:33:11 denkblock kernel: EAX: 4e9bda7e EBX: 00008f3f ECX: 00000358 EDX: 00000037
Feb 11 20:33:11 denkblock kernel: ESI: de52dac8 EDI: 007188d5 EBP: de52d800 ESP: c0309fc8
Feb 11 20:33:11 denkblock kernel: DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Feb 11 20:33:11 denkblock kernel: CR0: 8005003b CR2: b7f11e20 CR3: 1c5fb000 CR4: 000006d0
Feb 11 20:33:11 denkblock kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 11 20:33:11 denkblock kernel: DR6: ffff0ff0 DR7: 00000400
Feb 11 20:33:11 denkblock kernel: [cpu_idle+67/93] cpu_idle+0x43/0x5d
Feb 11 20:33:11 denkblock kernel: [start_kernel+572/577] start_kernel+0x23c/0x241
Feb 11 20:33:11 denkblock kernel: [unknown_bootoption+0/405] unknown_bootoption+0x0/0x195
Feb 11 20:33:11 denkblock kernel: =======================
Feb 11 20:33:11 denkblock kernel: bad: scheduling from the idle thread!
Feb 11 20:33:11 denkblock kernel: Pid: 0, comm: swapper Not tainted 2.6.24.1-dbg-1 #2
Feb 11 20:33:11 denkblock kernel: [dequeue_task_idle+21/24] dequeue_task_idle+0x15/0x18
Feb 11 20:33:11 denkblock kernel: [dequeue_task+10/20] dequeue_task+0xa/0x14
Feb 11 20:33:11 denkblock kernel: [deactivate_task+28/39] deactivate_task+0x1c/0x27
Feb 11 20:33:11 denkblock kernel: [schedule+195/759] schedule+0xc3/0x2f7
Feb 11 20:33:11 denkblock kernel: [<e003434e>] scsi_done+0x0/0x16 [scsi_mod]
Feb 11 20:33:11 denkblock kernel: [schedule_timeout+110/139] schedule_timeout+0x6e/0x8b
Feb 11 20:33:11 denkblock kernel: [process_timeout+0/5] process_timeout+0x0/0x5
Feb 11 20:33:11 denkblock kernel: ===
Feb 11 20:33:11 denkblock kernel: Pid: 31, comm: kblockd/0 Not tainted 2.6.24.1-dbg-1 #2
Feb 11 20:33:11 denkblock kernel: [<e002c125>] piix_qc_defer+0xbb/0xff [ata_piix]
Feb 11 20:33:11 denkblock kernel: [__alloc_pages+96/718] __alloc_pages+0x60/0x2ce
Feb 11 20:33:11 denkblock kernel: [clockevents_program_event+224/238] clockevents_program_event+0xe0/0xee
Feb 11 20:33:11 denkblock kernel: [<e0057f20>] ata_build_rw_tf+0x175/0x242 [libata]
Feb 11 20:33:11 denkblock kernel: [<e0059347>] ata_scsi_rw_xlat+0x13d/0x1de [libata]
Feb 11 20:33:11 denkblock kernel: [insert_work+89/92] insert_work+0x59/0x5c
Feb 11 20:33:11 denkblock kernel: [<e0057f20>] ata_build_rw_tf+0x175/0x242 [libata]
Feb 11 20:33:11 denkblock kernel: [<e0059347>] ata_scsi_rw_xlat+0x13d/0x1de [libata]
Feb 11 20:33:11 denkblock kernel: [<e00594e4>] ata_scsi_translate+0xd6/0x10a [libata]
Feb 11 20:33:11 denkblock kernel: [<e003434e>] scsi_done+0x0/0x16 [scsi_mod]
Feb 11 20:33:11 denkblock kernel: [<e005bb8f>] ata_scsi_queuecmd+0x17f/0x184 [libata]
Feb 11 20:33:11 denkblock kernel: [<e005920a>] ata_scsi_rw_xlat+0x0/0x1de [libata]
Feb 11 20:33:11 denkblock kernel: [<e0034aa8>] scsi_dispatch_cmd+0x197/0x20b [scsi_mod]
Feb 11 20:33:11 denkblock kernel: [<e0039b33>] scsi_request_fn+0x256/0x2d7 [scsi_mod]
Feb 11 20:33:11 denkblock kernel: [blk_remove_plug+78/90] blk_remove_plug+0x4e/0x5a
Feb 11 20:33:11 denkblock kernel: [blk_unplug_work+0/12] blk_unplug_work+0x0/0xc
Feb 11 20:33:11 denkblock kernel: [__generic_unplug_device+29/31] __generic_unplug_device+0x1d/0x1f
Feb 11 20:33:11 denkblock kernel: [generic_unplug_device+6/8] generic_unplug_device+0x6/0x8
Feb 11 20:33:11 denkblock kernel: [blk_unplug_work+11/12] blk_unplug_work+0xb/0xc
Feb 11 20:33:11 denkblock kernel: [run_workqueue+107/223] run_workqueue+0x6b/0xdf
Feb 11 20:33:11 denkblock kernel: [worker_thread+0/189] worker_thread+0x0/0xbd
Feb 11 20:33:11 denkblock kernel: [worker_thread+178/189] worker_thread+0xb2/0xbd
Feb 11 20:33:11 denkblock kernel: [autoremove_wake_function+0/53] autoremove_wake_function+0x0/0x35
Feb 11 20:33:11 denkblock kernel: [kthread+54/93] kthread+0x36/0x5d
Feb 11 20:33:11 denkblock kernel: [kthread+0/93] kthread+0x0/0x5d
Feb 11 20:33:11 denkblock kernel: [kernel_thread_helper+7/16] kernel_thread_helper+0x7/0x10
Feb 11 20:33:11 denkblock kernel: =======================
Feb 11 20:33:11 denkblock kernel: bad: scheduling from the idle thread!
Feb 11 20:33:11 denkblock kernel: Pid: 0, comm: swapper Not tainted 2.6.24.1-dbg-1 #2
Feb 11 20:33:11 denkblock kernel: [dequeue_task_idle+21/24] dequeue_task_idle+0x15/0x18
Feb 11 20:33:11 denkblock kernel: [dequeue_task+10/20] dequeue_task+0xa/0x14
Feb 11 20:33:11 denkblock kernel: [deactivate_task+28/39] deactivate_task+0x1c/0x27
Feb 11 20:33:11 denkblock kernel: [schedule+195/759] schedule+0xc3/0x2f7
Feb 11 20:33:11 denkblock kernel: [tick_nohz_restart_sched_tick+275/305] tick_nohz_restart_sched_tick+0x113/0x131
Feb 11 20:33:11 denkblock kernel: [cpu_idle+91/93] cpu_idle+0x5b/0x5d
Feb 11 20:33:11 denkblock kernel: [start_kernel+572/577] start_kernel+0x23c/0x241
Feb 11 20:33:11 denkblock kernel: [unknown_bootoption+0/405] unknown_bootoption+0x0/0x195
Feb 11 20:33:11 denkblock kernel: =======================
Now, the last block beginning with the lines
Feb 11 20:33:11 denkblock kernel: bad: scheduling from the idle thread!
Feb 11 20:33:11 denkblock kernel: Pid: 0, comm: swapper Not tainted 2.6.24.1-dbg-1 #2
is repeated for (p)ages with minor variations. For instance, it would
occasionally read like this:
Feb 11 20:33:11 denkblock kernel: bad: scheduling from the idle thread!
Feb 11 20:33:11 denkblock kernel: Pid: 0, comm: swapper Not tainted 2.6.24.1-dbg-1 #2
Feb 11 20:33:11 denkblock kernel: [dequeue_task_idle+21/24] dequeue_task_idle+0x15/0x18
Feb 11 20:33:11 denkblock kernel: [dequeue_task+10/20] dequeue_task+0xa/0x14
Feb 11 20:33:11 denkblock kernel: [deactivate_task+28/39] deactivate_task+0x1c/0x27
Feb 11 20:33:11 denkblock kernel: [schedule+195/759] schedule+0xc3/0x2f7
Feb 11 20:33:11 denkblock kernel: [tick_nohz_restart_sched_tick+275/305] tick_nohz_restart_sched_tick+0x113/0x131
Feb 11 20:33:11 denkblock kernel: [cpu_idle+91/93] cpu_idle+0x5b/0x5d
Feb 11 20:33:11 denkblock kernel: [start_kernel+572/577] start_kernel+0x23c/0x241
Feb 11 20:33:11 denkblock kernel: [unknown_bootoption+0/405] unknown_bootoption+0x0/0x195
Feb 11 20:33:11 denkblock kernel: =======================
* Re: Current qc_defer implementation may lead to infinite recursion
2008-02-11 22:03 ` Elias Oltmanns
@ 2008-02-12 1:14 ` Tejun Heo
2008-02-12 8:57 ` Elias Oltmanns
From: Tejun Heo @ 2008-02-12 1:14 UTC (permalink / raw)
To: Elias Oltmanns; +Cc: linux-ide
Elias Oltmanns wrote:
> +static int piix_qc_defer(struct ata_queued_cmd *qc)
> +{
> + static struct ata_port *ap = NULL;
> + struct ata_queued_cmd qcmd[ATA_MAX_QUEUE];
missing static?
> + static int count = 0;
> +#define PIIX_QC_DEFER_THRESHOLD 5000
> +
> + if (!ap)
> + ap = qc->ap;
> + else if (ap != qc->ap)
> + return 0;
> +
> + if (count > PIIX_QC_DEFER_THRESHOLD + 100)
> + return 0;
> +
> + count++;
> + if (count < PIIX_QC_DEFER_THRESHOLD)
> + return 0;
> + else if (count == PIIX_QC_DEFER_THRESHOLD) {
> + int i;
> + if (ap->qc_allocated) {
> + for (i = 0; i < ATA_MAX_QUEUE; i++)
> + qcmd[i] = ap->qcmd[i];
> + printk(KERN_DEBUG "piix_qc_defer(): saved current state\n");
> + msleep(5000);
> + } else
> + count--;
> + } else if (count == PIIX_QC_DEFER_THRESHOLD + 100) {
> + dump_stack();
> + count++;
> + } else if (memcmp(qcmd, ap->qcmd, sizeof(qcmd[0]) * ATA_MAX_QUEUE))
memcmp() will always mismatch. qcmd contains garbage.
> + return ATA_DEFER_LINK;
> + else
> + count = 0;
> + return 0;
> +}
> +
> static const struct piix_map_db ich5_map_db = {
> .mask = 0x7,
> .port_enable = 0x3,
> ---
>
>
> ====================================================================
> Feb 11 20:33:05 denkblock kernel: piix_qc_defer(): saved current state
> Feb 11 20:33:11 denkblock kernel: BUG: scheduling while atomic: swapper/0/0x00000100
You can't call msleep(5000) there.
Thanks.
--
tejun
* Re: Current qc_defer implementation may lead to infinite recursion
2008-02-12 1:14 ` Tejun Heo
@ 2008-02-12 8:57 ` Elias Oltmanns
2008-02-12 9:05 ` Tejun Heo
0 siblings, 1 reply; 12+ messages in thread
From: Elias Oltmanns @ 2008-02-12 8:57 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-ide
[-- Attachment #1: Type: text/plain, Size: 3354 bytes --]
Tejun Heo <htejun@gmail.com> wrote:
> Elias Oltmanns wrote:
>> +static int piix_qc_defer(struct ata_queued_cmd *qc)
>> +{
>> + static struct ata_port *ap = NULL;
>> + struct ata_queued_cmd qcmd[ATA_MAX_QUEUE];
>
> missing static?
Oh well, I must have been too tired already yesterday. There are a few
more things I got wrong last time. Please see the new patch at the end
of this email.
This time I applied the patch to 2.6.24.1 and performed a
# cat large-file > /dev/null &
# tail -f /var/log/kern.log
and aborted once the output of dump_stack() had occurred. This proves
that piix_qc_defer() has declined the same command 100 times in
succession. However, this will only happen if the status of all the
commands enqueued for one port hasn't changed in the meantime. This
suggests to me that the threads scheduled for command execution and
completion aren't served for some reason. Any ideas?
The footprint of piix_qc_defer() in my log files looks like this, by the way:
Feb 12 08:53:42 denkblock kernel: piix_qc_defer(): saved current state
Feb 12 08:53:42 denkblock kernel: Pid: 31, comm: kblockd/0 Not tainted 2.6.24.1-dbg-1 #1
Feb 12 08:53:42 denkblock kernel: [<e0042116>] piix_qc_defer+0xac/0xeb [ata_piix]
Feb 12 08:53:42 denkblock kernel: [<e008c4e4>] ata_scsi_translate+0xd6/0x10a [libata]
Feb 12 08:53:42 denkblock kernel: [<e008af20>] ata_build_rw_tf+0x175/0x242 [libata]
Feb 12 08:53:42 denkblock kernel: [<e006734e>] scsi_done+0x0/0x16 [scsi_mod]
Feb 12 08:53:42 denkblock kernel: [<e008eb8f>] ata_scsi_queuecmd+0x17f/0x184 [libata]
Feb 12 08:53:42 denkblock kernel: [<e008c20a>] ata_scsi_rw_xlat+0x0/0x1de [libata]
Feb 12 08:53:42 denkblock kernel: [<e0067aa8>] scsi_dispatch_cmd+0x197/0x20b [scsi_mod]
Feb 12 08:53:42 denkblock kernel: [<e006cb33>] scsi_request_fn+0x256/0x2d7 [scsi_mod]
Feb 12 08:53:42 denkblock kernel: [<e0042136>] piix_qc_defer+0xcc/0xeb [ata_piix]
Feb 12 08:53:42 denkblock kernel: [<c01961d3>] blk_run_queue+0x2a/0x4b
Feb 12 08:53:42 denkblock kernel: [<e006bea2>] scsi_queue_insert+0x84/0x8e [scsi_mod]
Feb 12 08:53:42 denkblock kernel: [<e006a70d>] scsi_delete_timer+0xf/0x50 [scsi_mod]
Feb 12 08:53:42 denkblock kernel: [<e0067adc>] scsi_dispatch_cmd+0x1cb/0x20b [scsi_mod]
Feb 12 08:53:42 denkblock kernel: [<e006cb33>] scsi_request_fn+0x256/0x2d7 [scsi_mod]
Feb 12 08:53:42 denkblock kernel: [<c0195289>] blk_remove_plug+0x4e/0x5a
Feb 12 08:53:42 denkblock kernel: [<c01934c2>] blk_unplug_work+0x0/0xc
Feb 12 08:53:42 denkblock kernel: [<c01952b2>] __generic_unplug_device+0x1d/0x1f
Feb 12 08:53:42 denkblock kernel: [<c0195c07>] generic_unplug_device+0x6/0x8
Feb 12 08:53:42 denkblock kernel: [<c01934cd>] blk_unplug_work+0xb/0xc
Feb 12 08:53:42 denkblock kernel: [<c0122943>] run_workqueue+0x6b/0xdf
Feb 12 08:53:42 denkblock kernel: [<c024dcd2>] schedule+0x1f3/0x20d
Feb 12 08:53:42 denkblock kernel: [<c0122f37>] worker_thread+0x0/0xbd
Feb 12 08:53:42 denkblock kernel: [<c0122fe9>] worker_thread+0xb2/0xbd
Feb 12 08:53:42 denkblock kernel: [<c0125429>] autoremove_wake_function+0x0/0x35
Feb 12 08:53:42 denkblock kernel: [<c01252d2>] kthread+0x36/0x5c
Feb 12 08:53:42 denkblock kernel: [<c012529c>] kthread+0x0/0x5c
Feb 12 08:53:42 denkblock kernel: [<c0104733>] kernel_thread_helper+0x7/0x10
Feb 12 08:53:42 denkblock kernel: =======================
Regards,
Elias
[-- Attachment #2: test-qc_defer.patch --]
[-- Type: text/x-patch, Size: 2965 bytes --]
---
drivers/ata/ata_piix.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 files changed, 42 insertions(+), 0 deletions(-)
diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index b406b39..c987462 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -167,6 +167,7 @@ static void piix_set_dmamode(struct ata_
static void ich_set_dmamode(struct ata_port *ap, struct ata_device *adev);
static int ich_pata_cable_detect(struct ata_port *ap);
static u8 piix_vmw_bmdma_status(struct ata_port *ap);
+static int piix_qc_defer(struct ata_queued_cmd *qc);
#ifdef CONFIG_PM
static int piix_pci_device_suspend(struct pci_dev *pdev, pm_message_t mesg);
static int piix_pci_device_resume(struct pci_dev *pdev);
@@ -311,6 +312,7 @@ static const struct ata_port_operations
.bmdma_start = ata_bmdma_start,
.bmdma_stop = ata_bmdma_stop,
.bmdma_status = ata_bmdma_status,
+ .qc_defer = piix_qc_defer,
.qc_prep = ata_qc_prep,
.qc_issue = ata_qc_issue_prot,
.data_xfer = ata_data_xfer,
@@ -343,6 +345,7 @@ static const struct ata_port_operations
.bmdma_start = ata_bmdma_start,
.bmdma_stop = ata_bmdma_stop,
.bmdma_status = ata_bmdma_status,
+ .qc_defer = piix_qc_defer,
.qc_prep = ata_qc_prep,
.qc_issue = ata_qc_issue_prot,
.data_xfer = ata_data_xfer,
@@ -371,6 +374,7 @@ static const struct ata_port_operations
.bmdma_start = ata_bmdma_start,
.bmdma_stop = ata_bmdma_stop,
.bmdma_status = ata_bmdma_status,
+ .qc_defer = piix_qc_defer,
.qc_prep = ata_qc_prep,
.qc_issue = ata_qc_issue_prot,
.data_xfer = ata_data_xfer,
@@ -402,6 +406,7 @@ static const struct ata_port_operations
.bmdma_start = ata_bmdma_start,
.bmdma_stop = ata_bmdma_stop,
.bmdma_status = piix_vmw_bmdma_status,
+ .qc_defer = piix_qc_defer,
.qc_prep = ata_qc_prep,
.qc_issue = ata_qc_issue_prot,
.data_xfer = ata_data_xfer,
@@ -419,6 +424,43 @@ static const struct ata_port_operations
.port_start = ata_port_start,
};
+static int piix_qc_defer(struct ata_queued_cmd *qc)
+{
+ static struct ata_port *ap = NULL;
+ static struct ata_queued_cmd qcmd[ATA_MAX_QUEUE];
+ static int count = 0;
+#define PIIX_QC_DEFER_THRESHOLD 5000
+
+ if (!ap)
+ ap = qc->ap;
+ else if (ap != qc->ap)
+ return 0;
+
+ if (count > PIIX_QC_DEFER_THRESHOLD + 100)
+ return 0;
+
+ count++;
+ if (count < PIIX_QC_DEFER_THRESHOLD)
+ return 0;
+ else if (count == PIIX_QC_DEFER_THRESHOLD) {
+ int i;
+ if (ap->qc_allocated) {
+ for (i = 0; i < ATA_MAX_QUEUE; i++)
+ qcmd[i] = ap->qcmd[i];
+ printk(KERN_DEBUG "piix_qc_defer(): saved current state\n");
+ return ATA_DEFER_LINK;
+ } else
+ count--;
+ } else if (count == PIIX_QC_DEFER_THRESHOLD + 100) {
+ dump_stack();
+ count++;
+ } else if (!memcmp(qcmd, ap->qcmd, sizeof(qcmd[0]) * ATA_MAX_QUEUE))
+ return ATA_DEFER_LINK;
+ else
+ count = 0;
+ return 0;
+}
+
static const struct piix_map_db ich5_map_db = {
.mask = 0x7,
.port_enable = 0x3,
* Re: Current qc_defer implementation may lead to infinite recursion
2008-02-12 8:57 ` Elias Oltmanns
@ 2008-02-12 9:05 ` Tejun Heo
2008-02-12 9:43 ` Elias Oltmanns
0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2008-02-12 9:05 UTC (permalink / raw)
To: Tejun Heo, linux-ide
Elias Oltmanns wrote:
> Tejun Heo <htejun@gmail.com> wrote:
>> Elias Oltmanns wrote:
>>> +static int piix_qc_defer(struct ata_queued_cmd *qc)
>>> +{
>>> + static struct ata_port *ap = NULL;
>>> + struct ata_queued_cmd qcmd[ATA_MAX_QUEUE];
>> missing static?
>
> Oh well, I must have been too tired already yesterday. There are a few
> more things I got wrong last time. Please see the new patch at the end
> of this email.
>
> This time I applied the patch to 2.6.24.1 and performed a
>
> # cat large-file > /dev/null &
> # tail -f /var/log/kern.log
>
> and aborted once the output of dump_stack() had occurred. This proves
> that piix_qc_defer() has declined the same command 100 times in
> succession. However, this will only happen if the status of all the
> commands enqueued for one port hasn't changed in the meantime. This
> suggests to me that the threads scheduled for command execution and
> completion aren't served for some reason. Any ideas?
Blocked counts of 1 will cause busy looping because when blk_run_queue()
returns because it's recursing too deep, it schedules unplug work right
away, so it will easily loop 100 times. Max blocked counts should be
adjusted to two (needs some testing before actually submitting the
change). But that still shouldn't cause any lock up. What happens if
you remove the 100 times limit? Does the machine hang on IO?
Thanks.
--
tejun
* Re: Current qc_defer implementation may lead to infinite recursion
2008-02-12 9:05 ` Tejun Heo
@ 2008-02-12 9:43 ` Elias Oltmanns
2008-02-12 12:56 ` Tejun Heo
0 siblings, 1 reply; 12+ messages in thread
From: Elias Oltmanns @ 2008-02-12 9:43 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-ide
Tejun Heo <htejun@gmail.com> wrote:
> Elias Oltmanns wrote:
>> This proves that piix_qc_defer() has declined the same command 100
>> times in succession. However, this will only happen if the status of
>> all the commands enqueued for one port hasn't changed in the
>> meantime. This suggests to me that the threads scheduled for command
>> execution and completion aren't served for some reason. Any ideas?
>
> Blocked counts of 1 will cause busy looping because when blk_run_queue()
> returns because it's recursing too deep, it schedules unplug work right
> away, so it will easily loop 100 times. Max blocked counts should be
> adjusted to two (needs some testing before actually submitting the
> change). But that still shouldn't cause any lock up. What happens if
> you remove the 100 times limit? Does the machine hang on IO?
Yes, it does. In fact, I had already verified that before sending the
previous email.
Regards,
Elias
* Re: Current qc_defer implementation may lead to infinite recursion
2008-02-12 9:43 ` Elias Oltmanns
@ 2008-02-12 12:56 ` Tejun Heo
2008-02-18 20:03 ` Elias Oltmanns
2008-04-12 8:02 ` Elias Oltmanns
0 siblings, 2 replies; 12+ messages in thread
From: Tejun Heo @ 2008-02-12 12:56 UTC (permalink / raw)
To: Tejun Heo, linux-ide
Elias Oltmanns wrote:
> Tejun Heo <htejun@gmail.com> wrote:
>> Elias Oltmanns wrote:
>>> This proves that piix_qc_defer() has declined the same command 100
>>> times in succession. However, this will only happen if the status of
>>> all the commands enqueued for one port hasn't changed in the
>>> meantime. This suggests to me that the threads scheduled for command
>>> execution and completion aren't served for some reason. Any ideas?
>> Blocked counts of 1 will cause busy looping because when blk_run_queue()
>> returns because it's recursing too deep, it schedules unplug work right
>> away, so it will easily loop 100 times. Max blocked counts should be
>> adjusted to two (needs some testing before actually submitting the
>> change). But that still shouldn't cause any lock up. What happens if
>> you remove the 100 times limit? Does the machine hang on IO?
>
> Yes, it does. In fact, I had already verified that before sending the
> previous email.
Hmmm.... it's supposed not to lock up although it can cause busy wait.
I'll test it tomorrow.
Thanks.
--
tejun
* Re: Current qc_defer implementation may lead to infinite recursion
2008-02-12 12:56 ` Tejun Heo
@ 2008-02-18 20:03 ` Elias Oltmanns
2008-04-12 8:02 ` Elias Oltmanns
1 sibling, 0 replies; 12+ messages in thread
From: Elias Oltmanns @ 2008-02-18 20:03 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-ide
Hi Tejun,
Tejun Heo <htejun@gmail.com> wrote:
> Elias Oltmanns wrote:
>> Tejun Heo <htejun@gmail.com> wrote:
>>> Elias Oltmanns wrote:
>>>> This proves that piix_qc_defer() has declined the same command 100
>>>> times in succession. However, this will only happen if the status of
>>>> all the commands enqueued for one port hasn't changed in the
>>>> meantime. This suggests to me that the threads scheduled for command
>>>> execution and completion aren't served for some reason. Any ideas?
>>> Blocked counts of 1 will cause busy looping because when blk_run_queue()
>>> returns because it's recursing too deep, it schedules unplug work right
>>> away, so it will easily loop 100 times. Max blocked counts should be
>>> adjusted to two (needs some testing before actually submitting the
>>> change). But that still shouldn't cause any lock up. What happens if
>>> you remove the 100 times limit? Does the machine hang on IO?
>>
>> Yes, it does. In fact, I had already verified that before sending the
>> previous email.
>
> Hmmm.... it's supposed not to lock up although it can cause busy wait.
> I'll test it tomorrow.
Have you had a chance to test yet? What is to be done about it?
Regards,
Elias
* Re: Current qc_defer implementation may lead to infinite recursion
2008-02-12 12:56 ` Tejun Heo
2008-02-18 20:03 ` Elias Oltmanns
@ 2008-04-12 8:02 ` Elias Oltmanns
1 sibling, 0 replies; 12+ messages in thread
From: Elias Oltmanns @ 2008-04-12 8:02 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-ide
Tejun Heo <htejun@gmail.com> wrote:
> Elias Oltmanns wrote:
>> Tejun Heo <htejun@gmail.com> wrote:
>>> Elias Oltmanns wrote:
>>>> This proves that piix_qc_defer() has declined the same command 100
>>>> times in succession. However, this will only happen if the status of
>>>> all the commands enqueued for one port hasn't changed in the
>>>> meantime. This suggests to me that the threads scheduled for command
>>>> execution and completion aren't served for some reason. Any ideas?
>>> Blocked counts of 1 will cause busy looping because when blk_run_queue()
>>> returns because it's recursing too deep, it schedules unplug work right
>>> away, so it will easily loop 100 times. Max blocked counts should be
>>> adjusted to two (needs some testing before actually submitting the
>>> change). But that still shouldn't cause any lock up. What happens if
>>> you remove the 100 times limit? Does the machine hang on IO?
>>
>> Yes, it does. In fact, I had already verified that before sending the
>> previous email.
>
> Hmmm.... it's supposed not to lock up although it can cause busy wait.
The same problem still exists in 2.6.25-rc9. As I understand it, not all
configurations are affected. So perhaps I should bring this to the
attention of those who are working on the scheduler. What do you think?
Regards,
Elias
2008-02-10 17:54 Current qc_defer implementation may lead to infinite recursion Elias Oltmanns
2008-02-11 5:06 ` Tejun Heo
2008-02-11 7:57 ` Elias Oltmanns
2008-02-11 8:43 ` Tejun Heo
2008-02-11 22:03 ` Elias Oltmanns
2008-02-12 1:14 ` Tejun Heo
2008-02-12 8:57 ` Elias Oltmanns
2008-02-12 9:05 ` Tejun Heo
2008-02-12 9:43 ` Elias Oltmanns
2008-02-12 12:56 ` Tejun Heo
2008-02-18 20:03 ` Elias Oltmanns
2008-04-12 8:02 ` Elias Oltmanns