linux-ide.vger.kernel.org archive mirror
From: Sergio Callegari <sergio.callegari@gmail.com>
To: linux-ide@vger.kernel.org
Subject: Likely race condition for some disk interface / peripherals combinations (hung processes, high iowait)
Date: Fri, 21 Aug 2015 00:42:52 +0200
Message-ID: <55D657EC.2020107@gmail.com>

Hi,

Please CC me on replies.

I recently ran into an issue on a system with:

- an ASRock N68S motherboard with an AMD Phenom(tm) II X4 920 processor and an
NVIDIA MCP61 SATA/IDE chipset

- an IDE channel whose master is an HL-DT-ST DVD-RAM GH22NP20 CD-ROM/DVD writer
and whose slave is an IOMEGA ZIP 100 ATAPI floppy

The symptoms can appear many hours after boot and include:

* IOWAIT suddenly jumping high
* Hanging processes, affecting everything that touches the disks and in
particular the IOMEGA drive (for instance, commands like lsblk and blkid hang,
and so do attempts to mount the disk)
* The machine being unable to shut down
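Since the hang only shows up hours after boot, it can help to timestamp its onset. A minimal monitoring sketch, assuming the aggregate "cpu" line layout of /proc/stat described in proc(5), samples the iowait share periodically and lists tasks in uninterruptible sleep (state D). It also assumes, without this having been verified on the affected machine, that the hung lsblk/blkid/mount processes end up in state D, which is typical for tasks blocked on block I/O. All function names are hypothetical.

```python
import os
import time

# Columns of the aggregate "cpu" line in /proc/stat, in order (see proc(5)).
# guest/guest_nice are already included in user/nice, so they are not summed.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into jiffy counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("not the aggregate cpu line: %r" % line)
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels: fewer columns
    return dict(zip(FIELDS, values))


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    # The iowait counter can occasionally go backwards, so clamp negative
    # deltas to zero rather than reporting a negative share.
    delta = {k: max(0, after[k] - before[k]) for k in FIELDS}
    total = sum(delta.values())
    return 100.0 * delta["iowait"] / total if total else 0.0


def parse_pid_stat(text):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of on whitespace.
    start, end = text.index("("), text.rindex(")")
    return text[start + 1:end], text[end + 2]


def blocked_tasks(proc="/proc"):
    """List (pid, comm) of tasks currently in uninterruptible sleep ('D')."""
    found = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                comm, state = parse_pid_stat(f.read())
        except OSError:  # the process exited while we were scanning
            continue
        if state == "D":
            found.append((int(name), comm))
    return found


def read_cpu_sample(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate cpu line."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


if __name__ == "__main__":
    prev = read_cpu_sample()
    while True:
        time.sleep(10)  # sampling interval in seconds (arbitrary choice)
        cur = read_cpu_sample()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, "iowait %.1f%%" % iowait_percent(prev, cur),
              "D-state:", blocked_tasks(), flush=True)
        prev = cur
```

Short D-state blips are normal during ordinary I/O. A task such as blkid that stays in state D across many consecutive samples, together with the iowait jump, is the pattern worth correlating with the kernel log.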

Note that before the symptoms occur, everything works fine, including the
IOMEGA drive. Removing the IOMEGA drive from the system cures the issue.
The system worked fine until recently.

Because the issue behaves like a race and the symptoms appear so late, bisecting
has been extremely painful. It led to commit 045065d. However, that commit is
just a trivial fix to a logic condition: what happens is that the bug fixed by
045065d was papering over a real issue in the kernel.

Some research revealed that the issue affects other people as well. See, for
instance, https://bbs.archlinux.org/viewtopic.php?id=189324 and
https://bugzilla.kernel.org/show_bug.cgi?id=87581

As distributions start adopting kernels that include 045065d, more and more
people are likely to be affected by the issue.

Back in November 2014, Christoph Hellwig submitted a tentative patch to LKML.
See https://lkml.org/lkml/2014/11/20/581.

The patch fixes the issue for me. However, it was not merged because, in
Christoph's own words, it just hides the real issue. Others report that
increasing the delay passed to blk_delay_queue(q, SCSI_QUEUE_DELAY) also works.

Is there any way I can help get to the bottom of this problem?

Best regards,

Sergio Callegari