From: Jens Axboe <jaxboe@fusionio.com>
To: Theodore Tso <tytso@MIT.EDU>
Cc: Dave Chinner <david@fromorbit.com>,
	Markus Trippelsdorf <markus@trippelsdorf.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Chris Mason <chris.mason@oracle.com>
Subject: Re: [GIT PULL] Core block IO bits for 2.6.39 - early Oops
Date: Fri, 25 Mar 2011 13:14:36 +0100
Message-ID: <4D8C872C.1030805@fusionio.com>
In-Reply-To: <91CCAB14-F9CC-4676-94C3-FBCDD0663FD5@mit.edu>

On 2011-03-25 12:59, Theodore Tso wrote:
> 
> On Mar 25, 2011, at 12:41 AM, Dave Chinner wrote:
> 
>>>
>>> It works insofar as the Oops is gone. But my xfs partitions apparently
>>> still get corrupted (I had to run xfs_repair on several of them, because
>>> they would not mount otherwise).
>>
>> So the patchset is causing repeatable filesystem corruption? Sounds
>> to me like this series is not yet ready for mainline merging. Last
>> thing I want to spend the .39 cycle helping people recover busted
>> filesystems as a result of undercooked block layer changes...
> 
> FYI.   I did a trial merge last night of the ext4 changes with
> the tip of Linus's tree.   The ext4 changes (based on 2.6.38-rc5) 
> survived xfstests -g auto before I merged in Linus's 2.6.39 master
> branch.  After I merged with 2.6.39-tip, I reran xfstests, and it got 
> past test #13 (fsstress), which normally means that everything is
> OK, so I sent a pull request to Linus.    Much later, (-g auto takes a 
> long time) I got an OOPS inside the virtio driver.   Ext4 was nowhere 
> in the stack trace, but of course the block layer was.   Grumbling
> that someone had broken virtio during the merge window, I switched
> my KVM setup to use SATA emulation and used the sda devices
> instead.  This time I got an oops in the block I/O layer, again quite
> late in xfstests.  Somewhere around test #224 or so if I remember
> correctly.
> 
> It was too late last night to do any more investigating, which is why
> I hadn't sent a formal report yet, but next up is for me to retry xfstests
> before merging in my changes, and then to start a git bisect.
> 
> So before accusing some patch series which hasn't been merged
> into 2.6.39 yet, you might want to also worry about some change
> that already has been merged.   Of course the symptoms for me are
> quite different.   I'm not seeing an early oops, but only something
> which shows up when the system is put under a lot of stress
> by xfstests.  So it could be a different problem....
> 
> 								- Ted
> 
> P.S.  And of course there is the chance that there is some
> subtle bug in the ext4 branch, which worked just fine when
> it was just based on 2.6.38-rc5, but which only manifested
> itself when I merged in the tip of Linus's branch.   So I'm not
> __accusing__ the block layer yet, even though the stack traces
> seem to point that way, because I don't have a smoking gun
> yet.   But I do have to admit I'm suspicious....

But this plugging change is merged, so it is a very likely candidate.
With the oddness going on, I suspect that we end up flushing a plug that
resides on a stack that is no longer valid.

Is there a way to check whether a given pointer is valid on the current
stack for this process?

I think we can rule out stack overflows, since the plug context itself
is very small (28 bytes). But if we have something like:

blk_start_plug(&plug1);
        ...
        blk_start_plug(&plug2);
        ...
flush(&plug2);

then that could explain the corruption and lockups.

So I'd really like to have something a la:

        if (is_str_ptr_valid(current, ptr, size))
                ...

to aid the debugging.
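
As a rough illustration of the kind of check asked for above, here is a
minimal user-space sketch of a bounds test for a stack that grows down.
Everything in it is hypothetical: struct stack_bounds and
stack_range_is_live() are names made up for this example and are not
kernel APIs, and the caller is assumed to supply the stack limits and
the current stack pointer. The kernel's object_is_on_stack() helper
does, as far as I know, a similar test of a single address against the
current task's stack, but it would not by itself tell a live frame from
one that has already been unwound.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one task's stack; not a kernel structure. */
struct stack_bounds {
	uintptr_t lo;	/* lowest valid address of the stack area */
	uintptr_t hi;	/* one past the highest valid address */
	uintptr_t sp;	/* current stack pointer (stack grows down) */
};

/*
 * Return true if [ptr, ptr + size) lies entirely inside the live part
 * of the stack, i.e. between the current stack pointer and the top.
 * An object in a frame that has already returned sits below sp and is
 * rejected, which is the case a plain "is it on the stack" test misses.
 */
static bool stack_range_is_live(const struct stack_bounds *b,
				uintptr_t ptr, size_t size)
{
	/* Reject empty ranges and inconsistent bounds. */
	if (size == 0 || b->sp < b->lo || b->sp > b->hi)
		return false;
	/* The object must start at or above sp and below the top. */
	if (ptr < b->sp || ptr >= b->hi)
		return false;
	/* Written as a subtraction so ptr + size cannot overflow. */
	return size <= b->hi - ptr;
}
```

With such a helper, a debugging assertion at flush time could be given
the plug address and sizeof the plug structure, and would trip if the
plug lived in a frame below the current stack pointer.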

-- 
Jens Axboe

