From: Edward Shishkin <edward.shishkin@gmail.com>
To: Ivan Shapovalov <intelfx100@gmail.com>
Cc: "Oleg Drokin" <green@linuxhacker.ru>,
reiserfs-devel@vger.kernel.org, "Dušan Čolić" <dusanc@gmail.com>
Subject: Re: [PATCH 3/3] reiser4: in our own sync writes, mark pages dirty before marking them writeback.
Date: Sat, 10 Oct 2015 17:03:43 +0200
Message-ID: <561928CF.9030903@gmail.com>
In-Reply-To: <1444473863.4257.10.camel@gmail.com>
On 10/10/2015 12:44 PM, Ivan Shapovalov wrote:
> On 2015-10-09 at 22:23 +0200, Edward Shishkin wrote:
>> On 10/09/2015 07:14 PM, Ivan Shapovalov wrote:
>>> On 2015-10-09 at 16:55 +0200, Edward Shishkin wrote:
>>>> On 10/09/2015 03:50 PM, Ivan Shapovalov wrote:
>>>>> On 2015-10-09 at 15:27 +0200, Edward Shishkin wrote:
>>>>>> Hi Ivan,
>>>>>>
>>>>>> On 10/09/2015 01:16 PM, Ivan Shapovalov wrote:
>>>>>>> Ref.: https://www.mail-archive.com/linux-f2fs-devel%40lists.sourceforge.net/msg02745.html
>>>>>> Do you have a stack trace for reiser4?
>>>>>> How can it be reproduced?
>>>>> I'll rebuild the kernel without the fix and provide you with the
>>>>> oops stack trace ASAP.
>>>>>
>>>>> I guess that it's tied to the config. In my case, it is reproducible on
>>>>> each boot, just as the DE starts up and something issues the first
>>>>> fsync().
>>>> Yes, let's try to find the culprit who doesn't set i_wb...
>>> So, here are the traces I've got after adding an
>>> assert(PageDirty(node->pg)) to queue_jnode():
>>> /* captured by hand as these are panics, not oopses */
>>>
>>> 1.
>>>
>>> queue_jnode()
>>> unformatted_make_reloc()
>>> assign_real_blocknrs()
>>> forward_relocate_unformatted()
>>> forward_alloc_unformatted_journal()
>>> ? coord_num_units()
>>> handle_pos_on_twig()
>>> flush_current_atom()
>>> flush_some_atom()
>>> reiser4_writeout()
>>> reiser4_writeback_inodes()
>>> <...>
>>>
>>> 2.
>>>
>>> znode_make_reloc()
>>> forward_alloc_formatted_wa()
>>> ? zload_ra()
>>> allocate_znode()
>>> alloc_pos_and_ancestors()
>>> flush_current_atom()
>>> reiser4_txn_end()
>>> ? reiser4_txn_end()
>>> reiser4_txn_restart_current()
>>> force_commit_atom()
>>> ? reiser4_txn_restart_current()
>>> txnmgr_force_commit_all()
>>> writepages_cryptcompress()
>>> reiser4_writepages_dispatch()
>>> <...>
>>> sys_fsync()
>>>
>>
>> Thanks Ivan.
>> Not good news, TBH...
>>
>> For formatted nodes we can continue to narrow down the problem
>> (see the attached patch).
> Having applied the patch, I saw loads and loads of warnings (in ~10
> distinct stack traces), but no panics or oopses in the initial location.
> False positives are possible, right?
Yes, lots of false positives and nothing interesting.
The same goes for Dušan's logs. Sorry for the bad idea.
Thanks,
Edward.
>
> The traces:
>
> 1.
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc036de5c>] scan_by_coord+0x62c/0xed0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc036e86d>] scan_unformatted+0x16d/0x320 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc032b1f0>] ? incr_load_count+0x20/0xd0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc036ed9b>] scan_common+0x37b/0x790 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc0370074>] flush_current_atom+0xec4/0x1b40 [reiser4]
> <...>
>
> 2.
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc036b952>] neighbor_in_slum.constprop.12+0x82/0x1c0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc036bc4a>] handle_pos_on_formatted+0x1ba/0xa40 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
> <...>
>
> 3.
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc032f0c3>] unlock_carry_level+0xb3/0xd80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc032fdb0>] done_carry_level+0x20/0x1f0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc0332036>] reiser4_carry+0x396/0x7b0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc032bc0c>] ? reiser4_add_obj+0x9c/0x370 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc033fb4a>] insert_into_item+0x1fa/0x610 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc033ffd4>] reiser4_resize_item+0x74/0x190 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03ec314>] add_entry_cde+0x104/0x2f0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc0329af5>] ? znode_invariant+0x3a5/0xd50 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03aa19e>] reiser4_rename2_common+0xbce/0x1140 [reiser4]
> <...>
>
> 4.
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03f48af>] free_item_convert_data+0x3f/0x150 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03f5656>] detach_convert_idata+0x26/0x110 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03fd0f6>] convert_ctail+0x1016/0x2060 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03648ba>] convert_node+0x22a/0xd30 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc032b40d>] ? zrelse+0x1d/0x70 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc036bfc2>] handle_pos_on_formatted+0x532/0xa40 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
> <...>
>
> 5.
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc032f0c3>] unlock_carry_level+0xb3/0xd80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc032fdb0>] done_carry_level+0x20/0x1f0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc0332036>] reiser4_carry+0x396/0x7b0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc032bc0c>] ? reiser4_add_obj+0x9c/0x370 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc033edda>] insert_with_carry_by_coord+0xea/0x250 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03d6016>] ? free_space_node40+0x16/0x170 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc033f3c6>] insert_by_coord+0x166/0x360 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03fa16f>] ctail_insert_unprepped_cluster+0x1df/0x750 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03c98e3>] prepare_logical_cluster+0x753/0x17f0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03cabdf>] do_write_cryptcompress+0x25f/0xed0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc0347a69>] ? is_in_reiser4_context+0x19/0x30 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03ce8d1>] write_cryptcompress+0xa1/0x1d0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03477fa>] ? _reiser4_init_context+0x6a/0xf0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel: [<ffffffffc03bcc66>] reiser4_write_dispatch+0x166/0x4f0 [reiser4]
> <...>
>
> 6.
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc036611a>] move_flush_pos+0xba/0x2c0 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc036c10e>] handle_pos_on_formatted+0x67e/0xa40 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
> <...>
>
> 7.
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc03f48af>] free_item_convert_data+0x3f/0x150 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc03f5656>] detach_convert_idata+0x26/0x110 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc03fd0f6>] convert_ctail+0x1016/0x2060 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc03648ba>] convert_node+0x22a/0xd30 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc0363b9e>] ? znode_check_flushprepped+0xfe/0x360 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc036bb28>] handle_pos_on_formatted+0x98/0xa40 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel: [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
> <...>
>
> ...and so on.
>
> I haven't checked the code yet; I'll probably try with that assertion converted into a warning and split into two
> (one for formatted and another for unformatted nodes), so that I can check which type of node is responsible
> for generating the final oops in set_page_writeback().
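
The split check described here (the assert(PageDirty(node->pg)) from queue_jnode() turned into a warning and counted separately for formatted and unformatted nodes) could look roughly like the userspace model below. It is a sketch under assumptions: dbg_page, dbg_jnode and dbg_queue_jnode() are hypothetical stand-ins, not reiser4 or kernel API, and only the dirty flag and the node type are modelled. In the kernel itself the equivalent would presumably be one WARN_ON() per node type rather than a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins: only the bits the check needs. */
struct dbg_page {
	bool dirty;			/* models PageDirty() */
};

struct dbg_jnode {
	struct dbg_page *pg;
	bool formatted;			/* znode vs. unformatted jnode */
};

/* Separate counters, so the two node types can be told apart. */
static unsigned long clean_formatted;
static unsigned long clean_unformatted;

/*
 * Models the debug check in queue_jnode(): instead of asserting
 * PageDirty(node->pg) and panicking, warn and count per node type.
 */
static void dbg_queue_jnode(const struct dbg_jnode *node)
{
	if (node->pg->dirty)
		return;

	if (node->formatted)
		clean_formatted++;
	else
		clean_unformatted++;

	fprintf(stderr, "warning: clean %s node queued for writeout\n",
		node->formatted ? "formatted" : "unformatted");
}
```

Queueing one clean formatted node and two clean unformatted nodes leaves clean_formatted at 1 and clean_unformatted at 2, so the node type behind the clean pages becomes visible without a panic.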
>
>> For unformatted nodes only code review
>> can help. Normally, all modifications of unformatted nodes should
>> look like the following:
>>
>> struct page *page = jnode_page(node);
>> char *data;
>>
>> lock_page(page);
>> data = kmap(page);
>> /* modifications are going here */
>> kunmap(page);
>> set_page_dirty_nobuffers(page); /* somebody forgets to do this */
>> unlock_page(page);
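
A minimal userspace model of this discipline, assuming a simplified page with only a lock bit, a dirty bit and a small data buffer (model_page and the helpers below are hypothetical, not the kernel's struct page API). It shows why the forgotten set_page_dirty_nobuffers() matters: both paths leave identical data in the page, but only the correct one leaves the page marked dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a page cache page; not the kernel's struct page. */
struct model_page {
	bool locked;
	bool dirty;
	char data[16];
};

/* Shared modification step: stands in for kmap(), the edit, kunmap(). */
static void model_write(struct model_page *pg, const char *s)
{
	strncpy(pg->data, s, sizeof(pg->data) - 1);
	pg->data[sizeof(pg->data) - 1] = '\0';
}

/* The pattern from the mail: modify under the page lock and mark the
 * page dirty before unlocking it. */
static void modify_correctly(struct model_page *pg, const char *s)
{
	assert(!pg->locked);
	pg->locked = true;		/* lock_page() */
	model_write(pg, s);
	pg->dirty = true;		/* set_page_dirty_nobuffers() */
	pg->locked = false;		/* unlock_page() */
}

/* The suspected bug: identical, except the dirty marking is forgotten. */
static void modify_forgetting_dirty(struct model_page *pg, const char *s)
{
	assert(!pg->locked);
	pg->locked = true;
	model_write(pg, s);
	pg->locked = false;
}

/* Only the dirty bit tells the two results apart. */
static bool model_is_dirty(const struct model_page *pg)
{
	return pg->dirty;
}
```

The buggy path cannot be spotted from the data alone; only the dirty bit differs, which is exactly what the PageDirty() checks discussed in this thread look at.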
>>
>> Modifications of formatted nodes should look like the following:
>>
>> lock_handle lh;
>>
>> init_lh(&lh);
>> longterm_lock_znode(&lh, node, ZNODE_WRITE_LOCK, ZNODE_LOCK_LOPRI);
>> zload(node);
>> /* modifications are going here */
>> zrelse(node);
>> znode_make_dirty(node); /* somebody forgets to do this */
>> longterm_unlock_znode(&lh);
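
The formatted-node pattern can be modelled the same way. This is a sketch under stated assumptions: model_znode and its helpers are hypothetical, and the modified flag is an invention of the model, since an unlock-time check needs some record of whether the node was actually changed rather than merely locked. The check reports a node that was modified but never went through the znode_make_dirty() step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a formatted node; field names are illustrative. */
struct model_znode {
	bool write_locked;	/* models the long-term write lock */
	int load_count;		/* models zload()/zrelse() pairing */
	bool modified;		/* set by whoever changes node contents */
	bool dirty;		/* models znode_make_dirty() */
};

static void model_lock(struct model_znode *z)
{
	assert(!z->write_locked);
	z->write_locked = true;
}

static void model_zload(struct model_znode *z)
{
	z->load_count++;
}

static void model_zrelse(struct model_znode *z)
{
	assert(z->load_count > 0);
	z->load_count--;
}

static void model_make_dirty(struct model_znode *z)
{
	assert(z->write_locked);
	z->dirty = true;
}

/*
 * Unlock with a sanity check: returns true if the node was modified
 * but never marked dirty, i.e. the step that "somebody forgets".
 */
static bool model_unlock(struct model_znode *z)
{
	bool forgot_dirty = z->modified && !z->dirty;

	assert(z->write_locked);
	assert(z->load_count == 0);	/* every zload() matched by zrelse() */
	z->write_locked = false;
	return forgot_dirty;
}
```

The model keeps the order given above (zrelse() before znode_make_dirty()); it only checks that every zload() is paired with a zrelse() and that the dirty step happened before the unlock.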
>>
>> Anyway, we can use your patch 3 as a temporary fixup.
> The most persistent things are those conceived as the most temporary
> ones... ;)
Thread overview: 26+ messages
2015-10-09 11:16 [PATCH 0/3] reiser4: another batch of fixes for 4.2 Ivan Shapovalov
2015-10-09 11:16 ` [PATCH 1/3] reiser4: remove last traces of JNODE_NEW in the debugging code Ivan Shapovalov
2015-10-09 11:16 ` [PATCH 2/3] reiser4: call account_page_redirty() on re-dirtying pages before giving them to entd Ivan Shapovalov
2015-10-09 11:16 ` [PATCH 3/3] reiser4: in our own sync writes, mark pages dirty before marking them writeback Ivan Shapovalov
2015-10-09 13:27 ` Edward Shishkin
2015-10-09 13:50 ` Ivan Shapovalov
2015-10-09 14:55 ` Edward Shishkin
2015-10-09 16:13 ` Ivan Shapovalov
2015-10-09 16:27 ` Oleg Drokin
2015-10-09 16:29 ` Ivan Shapovalov
2015-10-09 17:14 ` Ivan Shapovalov
2015-10-09 20:23 ` Edward Shishkin
2015-10-10 7:19 ` Dušan Čolić
2015-10-10 10:44 ` Ivan Shapovalov
2015-10-10 15:03 ` Edward Shishkin [this message]
2015-10-10 16:51 ` Oleg Drokin
2015-10-12 9:10 ` Edward Shishkin
2015-10-12 9:07 ` Edward Shishkin
2015-10-14 10:05 ` Ivan Shapovalov
2015-10-14 10:55 ` Ivan Shapovalov
2015-10-14 19:06 ` Ivan Shapovalov
2015-10-15 17:20 ` Edward Shishkin
2015-10-24 7:17 ` Ivan Shapovalov
2015-11-04 18:09 ` Ivan Shapovalov
2015-11-09 11:40 ` Edward Shishkin
2015-10-09 15:29 ` Oleg Drokin