From: Zdenek Kabelac <zkabelac@redhat.com>
To: Hugh Dickins <hughd@google.com>
Cc: linux-kernel@vger.kernel.org, Lukas Czerner <lczerner@redhat.com>,
amwang@redhat.com, Alasdair G Kergon <agk@redhat.com>,
LVM general discussion and development <linux-lvm@redhat.com>
Subject: Re: [linux-lvm] Regression with FALLOC_FL_PUNCH_HOLE in 3.5-rc kernel
Date: Sun, 01 Jul 2012 14:57:52 +0200 [thread overview]
Message-ID: <4FF04950.30503@redhat.com> (raw)
In-Reply-To: <alpine.LSU.2.00.1206301541280.1919@eggly.anvils>
Dne 1.7.2012 01:10, Hugh Dickins napsal(a):
> On Sat, 30 Jun 2012, Zdenek Kabelac wrote:
>> Dne 30.6.2012 21:55, Hugh Dickins napsal(a):
>>> On Sat, 30 Jun 2012, Zdenek Kabelac wrote:
>>>>
>>>> When I've used 3.5-rc kernels - I've noticed kernel deadlocks.
>>>> Ooops log included. After some experimenting - reliable way to hit this
>>>> oops
>>>> is to run lvm test suite for 10 minutes. Since 3.5 merge window does not
>>>> included anything related to this oops I've went for bisect.
>>>
>>> Thanks a lot for reporting, and going to such effort to find
>>> a reproducible testcase that you could bisect on.
>>>
>>>>
>>>> Game result is commit: 3f31d07571eeea18a7d34db9af21d2285b807a17
>>>>
>>>> mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
>>>
>>> But this leaves me very puzzled.
>>>
>>> Is the "lvm test suite" what I find at git.fedorahosted.org/git/lvm2.git
>>> under tests/ ?
>>
>> Yes - that's it -
>>
>> make
>> as root:
>> cd test
>> make check_local
>>
>> (inside test subdirectory should be enough, if not - just report any problem)
>>
>>> If you have something else running at the same time, which happens to use
>>> madvise(,,MADV_REMOVE) on a filesystem which the commit above now enables
>>> it on (I guess ext4 from the =y in your config), then I suppose we should
>>> start searching for improper memory freeing or scribbling in its holepunch
>>> support: something that might be corrupting the dm_region in your oops.
>>
>> What the test is doing - it creates file in LVM_TEST_DIR (default is /tmp)
>> and using loop device to simulate device (small size - it should fit bellow
>> 200MB)
>>
>> Within this file second layer through virtual DM devices is created and
>> simulates various numbers of PV devices to play with.
>
> This sounds much easier to set up than I was expecting:
> thanks for the info, I'll try it later on today.
>
>>
>> So since everything now support TRIM - such operations should be passed
>> down to the backend file - which probably triggers the path.
>
> What filesystem do you have for /tmp?
>
> If tmpfs, then it will make much more sense if we assume your bisection
> endpoint was off by one. Your bisection log was not quite complete;
> and even if it did appear to converge on the commit you cite, you might
> have got (un)lucky when testing the commit before it, and concluded
> "good" when more attempts would have said "bad".
>
> The commit before, 83e4fa9c16e4af7122e31be3eca5d57881d236fe
> "tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE", would be a
> much more likely first bad commit if your /tmp is on tmpfs:
> that does indeed wire up loop to pass TRIM down to tmpfs by
> fallocate - that indeed played a part in my own testing.
>
> Whereas if your /tmp is on ext4, loop has been passing TRIM down
> with fallocate since v3.0. And whichever, madvise(,,MADV_REMOVE)
> should be completely irrelevant.
While I've been aware of the fact that tmpfs was enhanced with trim support -
I've not tried to run on real ext4 filesystem since for my tests I'm using
tmpfs for quite some time to safe rewrites of SSD :)
So now I've checked with real ext4 - and the bug is there as well
so I've went back - it crashes on 3.4, 3.3 and 3.2 as well.
3.1 is the first kernel which does survive (checked 5 repeated runs)
And you are correct, the first commit which causes crash really is
83e4fa9c16e4af when I use tmpfs as backend storage - the problem why I've
missed to properly identify this commit in my bisect is that crash usually
happens on the second pass of the lvm test suite 'make check_local' execution
- and I've been running test just once. To be sure I've run 5 run passes on
3.4.0-08568-gec9516f - which is OK, but 3.4.0-08569-g83e4fa9 is crashing
usually on second run, with commit 3f31d07571e the crash always happens in
the first pass.
I've also checked some rawhide kernel vmlinuz-3.5.0-0.rc2.git0.1.fc18.x86_64
and it's crashing as well - so it's probably not uniqueness of my config.
So is there any primary suspect in 3.2 which is worth to check - or I need
another day to play another bisect game ?
>
>>
>>> I'll be surprised if that is the case, but it's something that you can
>>> easily check by inserting a WARN_ON(1) in mm/madvise.c madvise_remove():
>>> that should tell us what process is using it.
>>
>> I could try that if that will help.
>
> That would help, if you're very sure of your bisection endpoint;
> but if your /tmp is on tmpfs, then I do think it's more likely
> that you've actually found a bug in the commit before.
the only thing which could be tricky is udev support
(by default it's not enabled ---enable-udev_sync)
However Debian based distros are distributing their own rules, which are not
100% compatible with upstream and create some unpredictable issues,
where the slowness is the least problem.)
If you have Fedora Rawhide with latest lvm2 installed - you should get pretty
well configured system for running test dir (unfortunately there is no way
to virtualize udev...)
Zdenek
next prev parent reply other threads:[~2012-07-01 12:57 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-30 11:41 [linux-lvm] Regression with FALLOC_FL_PUNCH_HOLE in 3.5-rc kernel Zdenek Kabelac
2012-06-30 19:55 ` Hugh Dickins
2012-06-30 20:51 ` Zdenek Kabelac
2012-06-30 23:10 ` Hugh Dickins
2012-07-01 12:57 ` Zdenek Kabelac [this message]
2012-07-01 18:45 ` Hugh Dickins
2012-07-01 20:10 ` Zdenek Kabelac
2012-07-01 22:03 ` Zdenek Kabelac
2012-07-02 9:00 ` Lukáš Czerner
2012-07-02 10:35 ` Lukáš Czerner
2012-07-02 13:41 ` Mike Snitzer
2012-07-02 13:33 ` Mike Snitzer
2012-07-04 11:51 ` Zdenek Kabelac
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FF04950.30503@redhat.com \
--to=zkabelac@redhat.com \
--cc=agk@redhat.com \
--cc=amwang@redhat.com \
--cc=hughd@google.com \
--cc=lczerner@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-lvm@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).