Re: DIO process stuck apparently due to dioread_nolock (3.0)


From: Jan Kara <jack@suse.cz>
To: Michael Tokarev <mjt@tls.msk.ru>
Cc: linux-ext4@vger.kernel.org
Subject: Re: DIO process stuck apparently due to dioread_nolock (3.0)
Date: Thu, 11 Aug 2011 13:59:43 +0200
Message-ID: <20110811115943.GF4755@quack.suse.cz>
In-Reply-To: <4E4262A5.6030903@msgid.tls.msk.ru>

  Hello,

On Wed 10-08-11 14:51:17, Michael Tokarev wrote:
> For a few days I'm evaluating various options to use
> storage.  I'm interested in concurrent direct I/O
> (oracle rdbms workload).
> 
> I noticed that somehow, ext4fs in mixed read-write
> test greatly prefers writes over reads - writes go
> at full speed while reads are almost non-existent.
> 
> Sandeen on IRC pointed me at dioread_nolock mount
> option, which I tried with great results, if not
> one "but".
> 
> There's a deadlock somewhere, which I can't trigger
> "on demand" - I can't hit the right condition.  It
> happened twice in a row already, each time after the
> same scenario (more about that later).
> 
> When it happens, a process doing direct AIO stalls
> infinitely, with the following backtrace:
> 
> [87550.759848] INFO: task oracle:23176 blocked for more than 120 seconds.
> [87550.759892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [87550.759955] oracle          D 0000000000000000     0 23176      1 0x00000000
> [87550.760006]  ffff8820457b47d0 0000000000000082 ffff880600000000 ffff881278e3f7d0
> [87550.760085]  ffff8806215c1fd8 ffff8806215c1fd8 ffff8806215c1fd8 ffff8820457b47d0
> [87550.760163]  ffffea0010bd7c68 ffffffff00000000 ffff882045512ef8 ffffffff810eeda2
> [87550.760245] Call Trace:
> [87550.760285]  [<ffffffff810eeda2>] ? __do_fault+0x422/0x520
> [87550.760327]  [<ffffffff81111ded>] ? kmem_getpages+0x5d/0x170
> [87550.760367]  [<ffffffff81112e58>] ? ____cache_alloc_node+0x48/0x140
> [87550.760430]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
> [87550.760475]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
> [87550.760523]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
> [87550.760566]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
> [87550.760607]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
> [87550.760646]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
> [87550.760689]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b

  Hmm, the stack trace does not quite make sense to me: the part between
__do_fault and aio_rw_vect_retry looks broken. I can imagine we blocked in
ext4_file_write(), but I don't see any place there where we would allocate
memory. By any chance, are there messages like "Unaligned AIO/DIO
on inode ..." in the kernel log?
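
A minimal sketch of one way to check for such messages, assuming the kernel
log has been captured as text (for example, piped from dmesg). The function
name is hypothetical, and only the message prefix quoted above is matched;
nothing is assumed about the rest of the line.

```python
import re
import sys

# Match only the prefix quoted in the message; the remainder of the
# warning line is not assumed here.
PATTERN = re.compile(r"Unaligned AIO/DIO on inode")


def find_unaligned_dio_warnings(log_text):
    """Return the log lines that contain the unaligned AIO/DIO warning prefix."""
    return [line for line in log_text.splitlines() if PATTERN.search(line)]


if __name__ == "__main__":
    # Read the kernel log from standard input and print matching lines.
    for line in find_unaligned_dio_warnings(sys.stdin.read()):
        print(line)
```

Saved under a hypothetical name such as check_unaligned.py, it could be run
as "dmesg | python3 check_unaligned.py"; no output means no matching
message was found in the captured log.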

> At this point, the process in question can't be killed or
> stopped.  Yes it's oracle DB, and I can kill all other processes
> of this instance (this one is lgwr, aka log writer), but the stuck
> process will continue to be stuck, so it is not an inter-process
> deadlock.
> 
> echo "w" > /proc/sysrq-trigger shows only that process, with the
> same stack trace.
> 
> This is 3.0.1 kernel from kernel.org (amd64 arch).  The system is
> a relatively large box (IBM System x3850 X5).  So far, I've seen
> this issue twice, and each time in the following scenario:
> 
> I copy an oracle database from another machine to filesystem
> mounted with dioread_nolock, and right after the copy completes,
> I start the database.  And immediately when Oracle opens its
> DB ("Database opened") I see stuck lgwr process like above.
> 
> So I suspect it happens when there are some unwritten files
> in buffer/page cache and some process tries to do direct
> writes.
> 
> I haven't seen this happening without dioread_nolock, but since
> I don't have an easy reproducer I can't say this mount option
> is a requirement.  So far, I was able to trigger it only after
> large db copy, with small database I created in order to try
> to reproduce it the issue does not happen.
> 
> And sure thing, when it happens, the only way to clean up is
> to forcibly reboot the machine (echo b > sysrq-trigger).
> 
> I'll continue experiments in a hope to find an easier reproducer,
> but the problem is that I've little time left before the machine
> in question will go into production.  So if anyone has hints
> for this issue, please share.. ;)

							Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

