From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753357AbaI3QWQ (ORCPT ); Tue, 30 Sep 2014 12:22:16 -0400 Received: from mx1.redhat.com ([209.132.183.28]:64955 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751916AbaI3QWN (ORCPT ); Tue, 30 Sep 2014 12:22:13 -0400 Date: Tue, 30 Sep 2014 12:22:01 -0400 From: Dave Jones To: Linus Torvalds Cc: Al Viro , Linux Kernel , Rik van Riel , Ingo Molnar , Michel Lespinasse Subject: Re: pipe/page fault oddness. Message-ID: <20140930162201.GC15903@redhat.com> Mail-Followup-To: Dave Jones , Linus Torvalds , Al Viro , Linux Kernel , Rik van Riel , Ingo Molnar , Michel Lespinasse References: <20140930033327.GA14558@redhat.com> <20140930043309.GA16196@redhat.com> <20140930160510.GA15903@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Sep 30, 2014 at 09:10:26AM -0700, Linus Torvalds wrote: > On Tue, Sep 30, 2014 at 9:05 AM, Dave Jones wrote: > > > > I left it spinning overnight in case someone wanted me to probe it > > further, so I haven't tried reproducing it yet. It took ~12 hours > > yesterday before it got in that state. I'll restart it, and tell it > > to only use pipe fd's, which might speed things up a little. > > Actually, if you haven't restarted it yet, do a few more "Sysrq-T"'s > to see if the stack below the page fault ever changes. Is it *always* > that "second access by fault_in_pages_writeable()" or migth there be > some looping going on in copy_page_to_iter() after all? (Quite > frankly, I don't see how such looping could happen with a good > compiler, and 4.8.3 should be good, but just in case). Here's another snap of the same process.. trinity-c49 R running task 12544 19464 7633 0x00080004 ffffffff910ac790 ffffffff91042e54 ffff8800a09bf9e8 00000064c0206056 0000000000000001 0000000008dd0d40 ffff8800a09bfaa0 ffff8800a13427e8 ffff8800a13427d0 ffff88010b017e00 0000000000000002 ffffffff910ac795 Call Trace: [] ? lock_acquired+0x1d1/0x3c0 [] ? do_raw_spin_unlock+0x5/0x90 [] ? do_raw_spin_trylock+0x5/0x50 [] ? get_parent_ip+0xd/0x50 [] ? _raw_spin_unlock+0x31/0x50 [] ? handle_mm_fault+0x3a7/0xcd0 [] ? __do_page_fault+0x1a4/0x600 [] ? prepare_ftrace_return+0x73/0xe0 [] ? ftrace_graph_caller+0x5a/0x85 [] ? trace_hardirqs_off+0xd/0x10 [] ? retint_restore_args+0xe/0xe [] ? context_tracking_user_exit+0x67/0x1b0 [] ? do_page_fault+0x1e/0x70 [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] ? page_fault+0x22/0x30 [] ? copy_page_to_iter+0x3b3/0x500 [] ? pipe_read+0xdf/0x330 [] ? pipe_write+0x490/0x490 [] ? do_sync_readv_writev+0xa0/0xa0 [] ? do_iter_readv_writev+0x78/0xc0 [] ? do_readv_writev+0xce/0x280 [] ? pipe_write+0x490/0x490 [] ? lock_release_holdtime.part.29+0xe6/0x160 [] ? get_parent_ip+0xd/0x50 [] ? get_parent_ip+0xd/0x50 [] ? preempt_count_sub+0x6b/0xf0 [] ? vfs_readv+0x39/0x50 [] ? SyS_readv+0x5c/0x100 [] ? tracesys+0xdd/0xe2 and a few seconds later.. trinity-c49 R running task 12544 19464 7633 0x00080004 ffff880235073840 ffff8800a09bf920 0000000000000046 ffff8800a09bf930 ffff8800a09bfa28 ffffffff910ac840 0000000000000000 ffff8800a09bf968 0000000000000000 00000000a8ef9693 ffff880244c203c0 00000000a8ef9693 Call Trace: [] ? do_raw_spin_trylock+0x50/0x50 [] ? trace_graph_entry+0x123/0x250 [] ? trace_buffer_lock_reserve+0x1e/0x60 [] ? handle_mm_fault+0x3a7/0xcd0 [] ? trace_hardirqs_on+0xd/0x10 [] ? trace_graph_entry+0x108/0x250 [] ? __do_page_fault+0x234/0x600 [] ? prepare_ftrace_return+0x73/0xe0 [] ? down_write_nested+0xc0/0xc0 [] ? get_parent_ip+0xd/0x50 [] ? __do_page_fault+0x234/0x600 [] ? prepare_ftrace_return+0x73/0xe0 [] ? ftrace_graph_caller+0x5a/0x85 [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] ? context_tracking_user_exit+0x67/0x1b0 [] ? do_page_fault+0x1e/0x70 [] ? trace_hardirqs_off_thunk+0x3a/0x3c [] ? copy_page_to_iter+0x3b3/0x500 [] ? pipe_read+0xdf/0x330 [] ? pipe_write+0x490/0x490 [] ? do_sync_readv_writev+0xa0/0xa0 [] ? do_iter_readv_writev+0x78/0xc0 [] ? do_readv_writev+0xce/0x280 [] ? pipe_write+0x490/0x490 [] ? lock_release_holdtime.part.29+0xe6/0x160 [] ? get_parent_ip+0xd/0x50 [] ? get_parent_ip+0xd/0x50 [] ? preempt_count_sub+0x6b/0xf0 [] ? vfs_readv+0x39/0x50 [] ? SyS_readv+0x5c/0x100 [] ? tracesys+0xdd/0xe2 That second one is odd, because I had disabled all the function tracing before doing the sysrq-t, so still seeing all the ftrace stuff on the stack suprised me. Also puzzling is how every symbol in both cases is '?'. I've got to run to the dentist for an hour or so, so if you still have more things you want me to test on this before I reboot, I'll put that off until I get back. Dave