From: Michael Tokarev <mjt@tls.msk.ru>
To: Jiaying Zhang <jiayingz@google.com>
Cc: Jan Kara <jack@suse.cz>, linux-ext4@vger.kernel.org
Subject: Re: DIO process stuck apparently due to dioread_nolock (3.0)
Date: Fri, 12 Aug 2011 10:23:41 +0400
Message-ID: <4E44C6ED.2030506@msgid.tls.msk.ru>
In-Reply-To: <CAFgt=MCgfN-6_uBm=pDaeLqJKJQkSHry91XbYwWq=iBXip3D-w@mail.gmail.com>

12.08.2011 06:46, Jiaying Zhang wrote:
> Hi Michael,
> 
> Could you try your test with the patch I just posted:
> http://marc.info/?l=linux-ext4&m=131311627101278&w=2
> and see whether it fixes the problem?

No, it does not.  I'm now able to trigger it more or
less reliably - I need to decompress a relatively
small (about 70GB) Oracle database and try to start
it (this requires a rebuild of the initrd and a reboot, of course --
the whole thing takes about 15 minutes) - and I see this:

[  945.729965] EXT4-fs (sda11): Unaligned AIO/DIO on inode 5767175 by oracle; performance will be poor.
[  960.915602] SysRq : Show Blocked State
[  960.915650]   task                        PC stack   pid father
[  960.915852] oracle          D 0000000000000000     0  4985      1 0x00000000
[  960.915909]  ffff88103e627040 0000000000000082 ffff881000000000 ffff881078e3f7d0
[  960.917855]  ffff88103f88ffd8 ffff88103f88ffd8 ffff88103f88ffd8 ffff88103e627040
[  960.917953]  0000000001c08400 ffff88203e98c948 ffff88207873e240 ffffffff813527c6
[  960.918045] Call Trace:
[  960.918092]  [<ffffffff813527c6>] ? printk+0x43/0x48
[  960.918153]  [<ffffffffa01432a8>] ? ext4_msg+0x58/0x60 [ext4]
[  960.918201]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
[  960.918252]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
[  960.918301]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
[  960.918348]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
[  960.918392]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
[  960.918436]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
[  960.918483]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b

(Inode 5767175 is one of the Oracle redo logs.)

Jan, I lied to you initially - I didn't even test your first patch,
because I loaded the wrong initrd.  With it applied, the situation
still does not improve, and it looks exactly the same as with this new
patch by Jiaying Zhang:

[  415.288104] EXT4-fs (sda11): Unaligned AIO/DIO on inode 5767177 by oracle; performance will be poor.
[  422.967323] SysRq : Show Blocked State
[  422.967494]   task                        PC stack   pid father
[  422.967872] oracle          D 0000000000000000     0  3743      1 0x00000000
[  422.968112]  ffff88203e5a2810 0000000000000086 ffff882000000000 ffff88103f403080
[  422.968505]  ffff88203eeddfd8 ffff88203eeddfd8 ffff88203eeddfd8 ffff88203e5a2810
[  422.968895]  0000000001c08400 ffff88103f3db348 ffff88103f2fe800 ffffffff813527c6
[  422.969287] Call Trace:
[  422.969397]  [<ffffffff813527c6>] ? printk+0x43/0x48
[  422.969528]  [<ffffffffa0143288>] ? ext4_msg+0x58/0x60 [ext4]
[  422.969643]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
[  422.969758]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
[  422.969873]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
[  422.969985]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
[  422.970093]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
[  422.970200]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
[  422.970312]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b

Note that in both of these cases I now see a slightly different
backtrace -- both mention ext4_llseek and abort_exclusive_wait,
but the rest differs from the one I posted earlier:

>> [   76.982985] EXT4-fs (dm-1): Unaligned AIO/DIO on inode 3407879 by oracle; performance will be poor.
>> [ 1469.734114] SysRq : Show Blocked State
>> [ 1469.734157]   task                        PC stack   pid father
>> [ 1469.734473] oracle          D 0000000000000000     0  6146      1 0x00000000
>> [ 1469.734525]  ffff88103f604810 0000000000000082 ffff881000000000 ffff881079791040
>> [ 1469.734603]  ffff880432c19fd8 ffff880432c19fd8 ffff880432c19fd8 ffff88103f604810
>> [ 1469.734681]  ffffea000ec13590 ffffffff00000000 ffff881438c8dad8 ffffffff810eeda2
>> [ 1469.734760] Call Trace:
>> [ 1469.734800]  [<ffffffff810eeda2>] ? __do_fault+0x422/0x520
>> [ 1469.734863]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
>> [ 1469.734909]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
>> [ 1469.734956]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
>> [ 1469.734999]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
>> [ 1469.735039]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
>> [ 1469.735078]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
>> [ 1469.735122]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b

As Jan already pointed out, this place looks bogus, and
the same can be said about the new backtrace.  So I wonder
if there's some stack corruption going on there as well.

Btw, does ext4_llseek() look sane here?  Note it's called from
aio_submit() -- does it _ever_ implement SEEKs?

Maybe some debugging is necessary here?

Thank you!

/mjt
