[Bug] Double acquisition of the same semaphore

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Xuân Baldauf" <xuan--2006--dm-devel--redhat.com@baldauf.org>
To: dm-devel@redhat.com
Subject: [Bug] Double acquisition of the same semaphore
Date: Sun, 24 Dec 2006 04:13:42 +0100	[thread overview]
Message-ID: <458DF066.10708@baldauf.org> (raw)


[-- Attachment #1.1: Type: text/plain, Size: 4523 bytes --]

Hello,

I have a Linux box where block devices are layered in this hierarchy:
dm-crypt over LVM2 over software-RAID5 over LVM over harddisks. Since
that hierarchy was used, there were occassionally crashes, whose
probability at a certain time seems to be proportional to the amount of
I/O done. Now one disk of the RAID5 showed block errors and I tried to
copy large amounts of data using pvmove to other block devices, but
without success, the machine crashes before the copying finishes. Now I
upgraded to linux-2.6.20-rc1 and this yields error messages when
accessing the block device hierarchy:

<4>[   37.732000] =============================================
<4>[   37.732000] [ INFO: possible recursive locking detected ]
<4>[   37.732000] 2.6.20-rc1-nopreempt #21
<4>[   37.732000] ---------------------------------------------
<4>[   37.732000] kcryptd/0/755 is trying to acquire lock:
<4>[   37.732000]  (&md->io_lock){----}, at: [<c02e3bc8>]
dm_request+0x18/0xf0
<4>[   37.732000]
<4>[   37.732000] but task is already holding lock:
<4>[   37.732000]  (&md->io_lock){----}, at: [<c02e3bc8>]
dm_request+0x18/0xf0
<4>[   37.732000]
<4>[   37.732000] other info that might help us debug this:
<4>[   37.732000] 1 lock held by kcryptd/0/755:
<4>[   37.732000]  #0:  (&md->io_lock){----}, at: [<c02e3bc8>]
dm_request+0x18/0xf0
<4>[   37.732000]
<4>[   37.732000] stack backtrace:
<4>[   37.732000]  [<c01042ee>] dump_trace+0x1ce/0x200
<4>[   37.732000]  [<c010433a>] show_trace_log_lvl+0x1a/0x30
<4>[   37.732000]  [<c0104972>] show_trace+0x12/0x20
<4>[   37.732000]  [<c0104a26>] dump_stack+0x16/0x20
<4>[   37.732000]  [<c013a3e5>] __lock_acquire+0xb35/0xe00
<4>[   37.732000]  [<c013a72e>] lock_acquire+0x7e/0xb0
<4>[   37.732000]  [<c01352aa>] down_read+0x3a/0x50
<4>[   37.732000]  [<c02e3bc8>] dm_request+0x18/0xf0
<4>[   37.732000]  [<c0208aed>] generic_make_request+0x1cd/0x380
<4>[   37.732000]  [<c02d1563>] make_request+0x553/0x5f0
<4>[   37.732000]  [<c0208aed>] generic_make_request+0x1cd/0x380
<4>[   37.732000]  [<c02e2bbf>] __map_bio+0xcf/0x130
<4>[   37.732000]  [<c02e36b1>] __split_bio+0x171/0x430
<4>[   37.732000]  [<c02e3c5a>] dm_request+0xaa/0xf0
<4>[   37.732000]  [<c0208aed>] generic_make_request+0x1cd/0x380
<4>[   37.732000]  [<c02e9cbf>] kcryptd_do_work+0x27f/0x3e0
<4>[   37.732000]  [<c012e9c5>] run_workqueue+0xf5/0x1d0
<4>[   37.732000]  [<c012f09a>] worker_thread+0x13a/0x160
<4>[   37.732000]  [<c0131cf3>] kthread+0xa3/0xd0
<4>[   37.732000]  [<c0103d0f>] kernel_thread_helper+0x7/0x18
<4>[   37.732000]  =======================
<3>[   37.733000] BUG: workqueue leaked lock or atomic:
kcryptd/0/0x00000000/755, in_atomic()=0, lockdep_depth(current)=1,
previous_lockdep_depth=0.
<3>[   37.734000]     last function: kcryptd_do_work+0x0/0x3e0
<4>[   37.734000] 1 lock held by kcryptd/0/755:
<4>[   37.734000]  #0:  (&md->io_lock){----}, at: [<c02e3bc8>]
dm_request+0x18/0xf0
<4>[   37.734000]  [<c01042ee>] dump_trace+0x1ce/0x200
<4>[   37.734000]  [<c010433a>] show_trace_log_lvl+0x1a/0x30
<4>[   37.734000]  [<c0104972>] show_trace+0x12/0x20
<4>[   37.734000]  [<c0104a26>] dump_stack+0x16/0x20
<4>[   37.734000]  [<c012ea65>] run_workqueue+0x195/0x1d0
<4>[   37.734000]  [<c012f09a>] worker_thread+0x13a/0x160
<4>[   37.734000]  [<c0131cf3>] kthread+0xa3/0xd0
<4>[   37.735000]  [<c0103d0f>] kernel_thread_helper+0x7/0x18
<4>[   37.735000]  =======================

As you can see, dm_request() finally calls itself, leading to a double
acquisition of the same semaphore &md->io_lock, while double acquisition
of the same semaphore is not supported by standard Linux kernel
semaphores. The semaphore should also be double freed and thus freed to
early. This might explain my I/O-sensitive crashes I suffer.

How can this bug be solved?

   1. Use recursive semaphores? I could not find recursive semaphores
      (semaphores which may be acquired multiple times by the same
      thread) in the Linux kernel.
   2. Pass an "already locked" flag between all the functions, including
      generic_make_request()?
   3. Use a kind of thread-local storage for knowing a list of all locks
      held?
   4. Implement recursive semaphores by myself? (I should leave this to
      the more experienced Linux hackers.)
   5. Rework device-mapper so that there will no a recursive call of
      dm_request() on the same mapped_device? (Is this possible?)

Please help.

Merry Christmas!

Xuân.


[-- Attachment #1.2: Type: text/html, Size: 6137 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

                 reply	other threads:[~2006-12-24  3:13 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=458DF066.10708@baldauf.org \
    --to=xuan--2006--dm-devel--redhat.com@baldauf.org \
    --cc=dm-devel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.