From: Konstantin Khlebnikov <khlebnikov@openvz.org>
To: Ondrej Zary <linux@rainbow-software.org>
Cc: "Hugh Dickins" <hughd@google.com>,
"Kernel development list" <linux-kernel@vger.kernel.org>,
"Dave Jones" <davej@redhat.com>,
"Hans de Bruin" <jmdebruin@xmsnet.nl>,
"Linux NFS mailing list" <linux-nfs@vger.kernel.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Toralf Förster" <toralf.foerster@gmx.de>,
"richard -rw- weinberger" <richard.weinberger@gmail.com>
Subject: Re: [bisected commit 0fc9d10] NFS-server corruption with 3.4
Date: Tue, 05 Jun 2012 18:52:07 +0400
Message-ID: <4FCE1D17.1080904@openvz.org>
In-Reply-To: <201206051620.47925.linux@rainbow-software.org>
[-- Attachment #1: Type: text/plain, Size: 4392 bytes --]
Hmm, very interesting!
Please try this patch; it should fix the problem and print some numbers for debugging.
Ondrej Zary wrote:
> On Tuesday 05 June 2012, Konstantin Khlebnikov wrote:
>> Ondrej Zary wrote:
>>> Hello,
>>> I use NFS for deploying HDD images on new machines. My machine has 2nd
>>> network card just for this, running DHCPD, TFTPD and kernel NFS server.
>>> The target machine is set to boot from LAN and boots SystemRescueCD from
>>> my machine with an autorun script that launches Partimage and deploys the
>>> HDD image (400 to 900 MB compressed).
>>>
>>> It worked fine for years, until now. With kernel 3.4, everything
>>> works only for the first time after boot (and not always). Next time
>>> (next machine), partimage aborts almost immediately as it's probably
>>> unable to decompress the image file. md5sum is different on my machine
>>> vs. on the target (through NFS). Also SystemRescueCD boot aborts with md5
>>> error sometimes. Everything works fine after rebooting back to 3.3.
>>>
>>> Bisection found this:
>>>
>>> 0fc9d1040313047edf6a39fd4d7c7defdca97c62 is the first bad commit
>>> commit 0fc9d1040313047edf6a39fd4d7c7defdca97c62
>>> Author: Konstantin Khlebnikov<khlebnikov@openvz.org>
>>> Date: Wed Mar 28 14:42:54 2012 -0700
>>>
>>> radix-tree: use iterators in find_get_pages* functions
>>>
>>> Reverting this commit in 3.4 fixes the problem.
>>
>> [all reporters added to CC] let's keep all in one thread
>>
>> Attached are two patches which might help to debug this regression:
>>
>> "mm: recheck page index in find_get_pages_contig" adds paranoid check into
>> find_get_pages_contig(). It can explain everything, but currently I don't
>> see how this can happen.
>>
>> "mm: debug fing_get_pages speculative restart" reports the lookup-restart
>> condition which was removed by the bisected commit.
>
> My dmesg (after corruption occurred) with these two patches applied:
>
> [ 79.999511] ------------[ cut here ]------------
> [ 79.999564] WARNING: at mm/filemap.c:941 find_get_pages_contig+0x177/0x1b0()
> [ 79.999611] Hardware name: VT82C694X
> [ 79.999617] Modules linked in: nfsd lockd sunrpc des_generic ecb crypto_blkcipher md4 md5 hmac cryptomgr aead cifs crypto_hash crypto_algapi crypto firewire_ohci firewire_core
> [ 79.999653] Pid: 1563, comm: nfsd Not tainted 3.4.0-omega #4
> [ 79.999659] Call Trace:
> [ 79.999729] [<c011ff88>] ? warn_slowpath_common+0x78/0xb0
> [ 79.999744] [<c0175187>] ? find_get_pages_contig+0x177/0x1b0
> [ 79.999753] [<c0175187>] ? find_get_pages_contig+0x177/0x1b0
> [ 79.999763] [<c011ffd9>] ? warn_slowpath_null+0x19/0x20
> [ 79.999772] [<c0175187>] ? find_get_pages_contig+0x177/0x1b0
> [ 79.999805] [<c01c544b>] ? __generic_file_splice_read+0xeb/0x510
> [ 79.999853] [<c01c4040>] ? page_cache_pipe_buf_release+0x10/0x10
> [ 79.999873] [<c04f2589>] ? common_interrupt+0x29/0x30
> [ 79.999900] [<f892c710>] ? _fh_update.isra.11.part.12+0x60/0x60 [nfsd]
> [ 79.999931] [<c022c9f7>] ? exportfs_decode_fh+0xc7/0x250
> [ 79.999981] [<f893133d>] ? exp_get_by_name+0x3d/0x70 [nfsd]
> [ 80.000000] [<c0150215>] ? getboottime+0x35/0x40
> [ 80.007383] [<c04f0da8>] ? __schedule+0x198/0x470
> [ 80.007505] [<f88cbf34>] ? sunrpc_cache_lookup+0x54/0x2d0 [sunrpc]
> [ 80.007574] [<c01c58e3>] ? generic_file_splice_read+0x73/0x110
> [ 80.007590] [<c01254bf>] ? irq_exit+0x4f/0x90
> [ 80.007599] [<c01c5870>] ? __generic_file_splice_read+0x510/0x510
> [ 80.007608] [<c01c4330>] ? do_splice_to+0x60/0x90
> [ 80.007618] [<c01c459a>] ? splice_direct_to_actor+0xaa/0x1c0
> [ 80.007654] [<f892d710>] ? nfsd_buffered_filldir+0x160/0x160 [nfsd]
> [ 80.007700] [<f892dc37>] ? nfsd_vfs_read.isra.16+0x117/0x160 [nfsd]
> [ 80.007715] [<f892e764>] ? nfsd_read+0x1c4/0x280 [nfsd]
> [ 80.007732] [<f89357bf>] ? nfsd3_proc_read+0xcf/0x160 [nfsd]
> [ 80.007745] [<f892a7d0>] ? nfsd_dispatch+0xb0/0x190 [nfsd]
> [ 80.007779] [<f88c3682>] ? svc_process+0x442/0x7c0 [sunrpc]
> [ 80.007825] [<f892a0a3>] ? nfsd+0xa3/0x130 [nfsd]
> [ 80.007838] [<f892a000>] ? 0xf8929fff
> [ 80.007846] [<f892a000>] ? 0xf8929fff
> [ 80.007858] [<c01389bc>] ? kthread+0x6c/0x80
> [ 80.007867] [<c0138950>] ? kthread_freezable_should_stop+0x50/0x50
> [ 80.007896] [<c04f2596>] ? kernel_thread_helper+0x6/0xd
> [ 80.007937] ---[ end trace 0bc8170cf5ac5466 ]---
[-- Attachment #2: mm-fix-find_get_pages_contig --]
[-- Type: text/plain, Size: 761 bytes --]
mm: fix find_get_pages_contig
From: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
mm/filemap.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 79c4b2b..f4343a3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -928,7 +928,10 @@ repeat:
 		 * otherwise we can get both false positives and false
 		 * negatives, which is just confusing to the caller.
 		 */
-		if (page->mapping == NULL || page->index != iter.index) {
+		if (page->mapping == NULL || page->index != index + ret) {
+			if (iter.index != index + ret)
+				printk("%s %lu %lu %u\n", __func__,
+						iter.index, index, ret);
 			page_cache_release(page);
 			break;
 		}
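
For readers following the patch, the changed condition can be modelled in
userspace. This is only a rough sketch, not kernel code: struct model_page
and count_contig() are hypothetical names made up for illustration, and the
array stands in for whatever order the lookup hands pages back in. It shows
why comparing against index + ret (which depends only on the caller's
arguments and the number of pages accepted so far) stops at the first gap,
independently of what the iterator itself reports.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page: only the two fields
 * the patched condition looks at. */
struct model_page {
	unsigned long index;	/* offset of the page within the file */
	int mapped;		/* 0 models page->mapping == NULL */
};

/*
 * Model of the patched loop. "found" holds pages in the order a lookup
 * returned them; "index" is the first offset the caller asked for.
 * Returns how many pages form a gap-free run starting at "index".
 */
static unsigned int count_contig(const struct model_page *found,
				 size_t nr_found, unsigned long index,
				 unsigned int nr_pages)
{
	unsigned int ret = 0;
	size_t i;

	for (i = 0; i < nr_found && ret < nr_pages; i++) {
		/* Same test as the patch: the expected offset is
		 * index + ret, not a value taken from the iterator. */
		if (!found[i].mapped || found[i].index != index + ret)
			break;
		ret++;
	}
	return ret;
}
```

With pages at offsets 10, 11, 13 and 14 and a start index of 10, the model
accepts two pages and stops at the hole before 13, so a caller never
receives data from the wrong offset.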