From: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
To: linux-raid@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Subject: [BUGREPORT] The kernel thread for md RAID10 could cause a md RAID10 array deadlock
Date: Wed, 13 Feb 2008 17:33:44 +0900
Message-ID: <47B2AB68.1040604@ce.jp.nec.com>
This message describes another md RAID10 issue, found by testing
md RAID10 in 2.6.24 with the new SCSI fault injection framework.
Abstract:
When a SCSI command timeout occurs during RAID10 recovery, the kernel
threads for md RAID10 can deadlock the array.
The nr_pending flag set by normal I/O and the barrier flag set by the recovery
thread conflict, so raid10d() and sync_request() end up waiting on each other.
Details:
normal I/O                                      recovery I/O
-----------------------------------------------------------------------------
                                                B-1. kernel thread starts by calling
A-1. A process issues a read request.                md_do_sync()
     make_request() for raid10 is called
     by block layer.
                                                B-2. md_do_sync() calls sync_request
                                                     operation for md raid10.
A-2. In make_request(), wait_barrier()
     increments nr_pending flag.
A-3. A read command is issued to the disk,
     but it takes a lot of time because
     of no response from the disk.
                                                B-3. sync_request() of raid10 calls
                                                     raise_barrier(), increments barrier
                                                     flag, and waits for nr_pending set
                                                     in (A-2) to be cleared.
A-4. raid10_end_read_request() is called
     in the interrupt context. It detects
     read error and wakes up raid10d kernel
     thread.
A-5. raid10d() calls freeze_array() and waits
     for barrier flag incremented in (B-3)
     to be cleared.
(** stalls here because waiting conditions in A-5 and B-3 are never met **)
A-6. raid10d calls fix_read_error() to
     handle read error.                         B-4. barrier flag will be cleared after
                                                     the pending barrier request completes.
A-7. nr_pending flag will be cleared after
     the pending read request completes.
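
The timeline can be replayed on a toy model. The following is only a minimal sketch in Python, not kernel code: `ArrayState`, `replay()` and the two `*_may_proceed()` helpers are hypothetical names, and the wait conditions are taken from the wording of steps A-5 and B-3 in this report, which may differ from the exact expressions in the kernel source.

```python
from dataclasses import dataclass


@dataclass
class ArrayState:
    """Toy stand-in for the two values named in the timeline (not kernel code)."""
    nr_pending: int = 0  # raised by wait_barrier() in A-2
    barrier: int = 0     # raised by raise_barrier() in B-3


def recovery_may_proceed(state: ArrayState) -> bool:
    # B-3 as described in this report: wait until nr_pending from A-2 is cleared.
    return state.nr_pending == 0


def error_handler_may_proceed(state: ArrayState) -> bool:
    # A-5 as described in this report: wait until the barrier from B-3 is cleared.
    return state.barrier == 0


def replay(read_fails: bool) -> ArrayState:
    """Apply steps A-2 and B-3, then either complete or fail the read."""
    state = ArrayState()
    state.nr_pending += 1      # A-2: normal read enters make_request()
    state.barrier += 1         # B-3: recovery raises the barrier
    if not read_fails:
        state.nr_pending -= 1  # read completes normally, recovery can go on
    # On failure (A-4) the request is handed to raid10d and stays pending.
    return state


if __name__ == "__main__":
    for read_fails in (False, True):
        s = replay(read_fails)
        print(read_fails, recovery_may_proceed(s), error_handler_may_proceed(s))
```

With `read_fails=True` neither predicate becomes true, which corresponds to the stall marked between A-5 and A-6.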
The deadlock mechanism:
When a normal I/O is issued during recovery, the nr_pending flag incremented in (A-2)
blocks subsequent recovery I/O until the normal I/O completes. The recovery thread
increments the barrier flag and waits for the nr_pending flag to be decremented (B-3).

Normally, the nr_pending flag is decremented after the I/O has completed successfully,
and the barrier flag is decremented after the barrier request (such as recovery I/O)
has completed successfully.

If a normal read I/O results in a SCSI command timeout, the read request is handled
by the error handler in the raid10d kernel thread, which calls freeze_array().
Because the barrier flag was already set in (B-3), freeze_array() waits for the
barrier request to complete. Meanwhile, the recovery thread is stalled waiting for
the nr_pending flag to be decremented (B-3). As a result, the error handler and the
recovery thread deadlock, each waiting for the other.
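
The mutual wait can also be written as a wait-for graph. This is only an illustrative Python sketch: the node names and the `find_cycle()` helper are made up for this example, and the edges encode the dependencies as described in this report (raid10d waits for the recovery thread to drop the barrier; the recovery thread waits for the failed read, which only raid10d can finish).

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow 'X waits for Y' edges from start; return the cycle if one is hit."""
    seen: List[str] = []
    node = start
    while node in waits_for:
        if node in seen:
            # We came back to a node already on the path: that is the cycle.
            return seen[seen.index(node):]
        seen.append(node)
        node = waits_for[node]
    return None  # the chain ended at something that is not waiting


# Error path: the failed read is owned by raid10d, so recovery waits on raid10d.
error_path = {
    "raid10d": "recovery thread",   # A-5: waits for the barrier to be cleared
    "recovery thread": "raid10d",   # B-3: waits for nr_pending to be cleared
}

# Normal path: the read completes on its own, so the chain terminates.
normal_path = {
    "recovery thread": "normal read",
}

if __name__ == "__main__":
    print(find_cycle(error_path, "raid10d"))
    print(find_cycle(normal_path, "recovery thread"))
```

A non-empty result for the error path means no thread in the chain can make progress without outside intervention.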
This problem can be reproduced with the new SCSI fault injection framework,
using its "no response from the SCSI device" simulation.
The framework is somewhat complicated to use, so I will upload some sample
wrapper shell scripts to make it easier.
--
---------------------------------------------------------
Kenichi TANAKA    | Open Source Software Platform Development Division
                  | Computers Software Operations Unit, NEC Corporation
                  | k-tanaka@ce.jp.nec.com