From: Edward Shishkin <edward.shishkin@gmail.com>
To: Ivan Shapovalov <intelfx100@gmail.com>
Cc: "Oleg Drokin" <green@linuxhacker.ru>,
	reiserfs-devel@vger.kernel.org, "Dušan Čolić" <dusanc@gmail.com>
Subject: Re: [PATCH 3/3] reiser4: in our own sync writes, mark pages dirty before marking them writeback.
Date: Sat, 10 Oct 2015 17:03:43 +0200
Message-ID: <561928CF.9030903@gmail.com>
In-Reply-To: <1444473863.4257.10.camel@gmail.com>

On 10/10/2015 12:44 PM, Ivan Shapovalov wrote:
> On 2015-10-09 at 22:23 +0200, Edward Shishkin wrote:
>> On 10/09/2015 07:14 PM, Ivan Shapovalov wrote:
>>> On 2015-10-09 at 16:55 +0200, Edward Shishkin wrote:
>>>> On 10/09/2015 03:50 PM, Ivan Shapovalov wrote:
>>>>> On 2015-10-09 at 15:27 +0200, Edward Shishkin wrote:
>>>>>> Hi Ivan,
>>>>>>
>>>>>> On 10/09/2015 01:16 PM, Ivan Shapovalov wrote:
>>>>>>> Ref.: https://www.mail-archive.com/linux-f2fs-devel%40lists.sourceforge.net/msg02745.html
>>>>>> Do you have a stack trace for reiser4?
>>>>>> How to reproduce it?
>>>>> I'll rebuild the kernel without the fix and provide you with the oops'
>>>>> stacktrace asap.
>>>>>
>>>>> I guess that it's tied to the config. In my case, it is reproducible on
>>>>> each boot, just as the DE starts up and something issues the first
>>>>> fsync().
>>>> Yes, let's try to find the culprit who doesn't set i_wb...
>>> So, here are the traces I've got after adding an
>>> assert(PageDirty(node->pg)) to queue_jnode():
>>> /* captured by hand as these are panics, not oopses */
>>>
>>> 1.
>>>
>>> queue_jnode()
>>> unformatted_make_reloc()
>>> assign_real_blocknrs()
>>> forward_relocate_unformatted()
>>> forward_alloc_unformatted_journal()
>>> ? coord_num_units()
>>> handle_pos_on_twig()
>>> flush_current_atom()
>>> flush_some_atom()
>>> reiser4_writeout()
>>> reiser4_writeback_inodes()
>>> <...>
>>>
>>> 2.
>>>
>>> znode_make_reloc()
>>> forward_alloc_formatted_wa()
>>> ? zload_ra()
>>> allocate_znode()
>>> alloc_pos_and_ancestors()
>>> flush_current_atom()
>>> reiser4_txn_end()
>>> ? reiser4_txn_end()
>>> reiser4_txn_restart_current()
>>> force_commit_atom()
>>> ? reiser4_txn_restart_current()
>>> txnmgr_force_commit_all()
>>> writepages_cryptcompress()
>>> reiser4_writepages_dispatch()
>>> <...>
>>> sys_fsync()
>>>
>>
>> Thanks Ivan.
>> Not good news, TBH...
>>
>> For formatted nodes we can continue to narrow down the problem
>> (see the attached patch).
> Having applied the patch, I saw loads and loads of warnings (in ~10
> distinct stacktraces), but no panics or oopses in the initial location.
> The false positives are possible, right?


Yes, a lot of them, and nothing interesting.
The same goes for Dushan's logs. Sorry for the bad idea.

Thanks,
Edward.


>
> The traces:
>
> 1.
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036de5c>] scan_by_coord+0x62c/0xed0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036e86d>] scan_unformatted+0x16d/0x320 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032b1f0>] ? incr_load_count+0x20/0xd0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036ed9b>] scan_common+0x37b/0x790 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0370074>] flush_current_atom+0xec4/0x1b40 [reiser4]
> <...>
>
> 2.
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036b952>] neighbor_in_slum.constprop.12+0x82/0x1c0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036bc4a>] handle_pos_on_formatted+0x1ba/0xa40 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
> <...>
>
> 3.
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032f0c3>] unlock_carry_level+0xb3/0xd80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032fdb0>] done_carry_level+0x20/0x1f0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0332036>] reiser4_carry+0x396/0x7b0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032bc0c>] ? reiser4_add_obj+0x9c/0x370 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033fb4a>] insert_into_item+0x1fa/0x610 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033ffd4>] reiser4_resize_item+0x74/0x190 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03ec314>] add_entry_cde+0x104/0x2f0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0329af5>] ? znode_invariant+0x3a5/0xd50 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03aa19e>] reiser4_rename2_common+0xbce/0x1140 [reiser4]
> <...>
>
> 4.
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03f48af>] free_item_convert_data+0x3f/0x150 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03f5656>] detach_convert_idata+0x26/0x110 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03fd0f6>] convert_ctail+0x1016/0x2060 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03648ba>] convert_node+0x22a/0xd30 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032b40d>] ? zrelse+0x1d/0x70 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036bfc2>] handle_pos_on_formatted+0x532/0xa40 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
> <...>
>
> 5.
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032f0c3>] unlock_carry_level+0xb3/0xd80 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032fdb0>] done_carry_level+0x20/0x1f0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0332036>] reiser4_carry+0x396/0x7b0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032bc0c>] ? reiser4_add_obj+0x9c/0x370 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033edda>] insert_with_carry_by_coord+0xea/0x250 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03d6016>] ? free_space_node40+0x16/0x170 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033f3c6>] insert_by_coord+0x166/0x360 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03fa16f>] ctail_insert_unprepped_cluster+0x1df/0x750 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03c98e3>] prepare_logical_cluster+0x753/0x17f0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03cabdf>] do_write_cryptcompress+0x25f/0xed0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0347a69>] ? is_in_reiser4_context+0x19/0x30 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03ce8d1>] write_cryptcompress+0xa1/0x1d0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03477fa>] ? _reiser4_init_context+0x6a/0xf0 [reiser4]
> Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03bcc66>] reiser4_write_dispatch+0x166/0x4f0 [reiser4]
> <...>
>
> 6.
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036611a>] move_flush_pos+0xba/0x2c0 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036c10e>] handle_pos_on_formatted+0x67e/0xa40 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
> <...>
>
> 7.
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc03f48af>] free_item_convert_data+0x3f/0x150 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc03f5656>] detach_convert_idata+0x26/0x110 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc03fd0f6>] convert_ctail+0x1016/0x2060 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc03648ba>] convert_node+0x22a/0xd30 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc0363b9e>] ? znode_check_flushprepped+0xfe/0x360 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036bb28>] handle_pos_on_formatted+0x98/0xa40 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
> Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
> <...>
>
> ...and so on.
>
> I haven't checked the code yet; I'll probably try with that assertion
> converted into a warning and split into two (one for formatted and another
> for unformatted nodes), so that I can check which type of node is
> responsible for generating the final oops in set_page_writeback().
>
>> For unformatted nodes only code review
>> can help. Normally, all modifications of unformatted nodes should
>> look like the following:
>>
>> struct page *page = jnode_page(node);
>> lock_page(page);
>> char *data = kmap(page);
>> /* modifications are going here */
>> kunmap(page);
>> set_page_dirty_nobuffers(page); /* somebody forgets to do this */
>> unlock_page(page);
>>
>> Modifications of formatted nodes should look like the following:
>>
>> longterm_lock_znode(node);
>> zload(node);
>> /* modifications are going here */
>> zrelse(node);
>> znode_make_dirty(node); /* somebody forgets to do this */
>> longterm_unlock_znode(node);
>>
>> Anyway, we can use your patch 3 as a temporary fixup.
> The most persistent things are those conceived as the most temporary
> ones... ;)
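
The check discussed above (a warning instead of a panic, counted separately
for formatted and unformatted nodes) can be modeled in user space. This is
only a minimal sketch under stated assumptions: struct model_page,
struct model_node, modify_node() and check_dirty_before_queue() are
hypothetical stand-ins invented for illustration, not reiser4 or kernel
code, and the "dirty" flag is a plain bool rather than a real page flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these are the real reiser4 types. */
enum node_kind { NODE_FORMATTED = 0, NODE_UNFORMATTED = 1 };

struct model_page {
	bool locked;
	bool dirty;
};

struct model_node {
	enum node_kind kind;
	struct model_page pg;
};

/* One warning counter per node kind, so the two cases can be told apart. */
static unsigned int warn_count[2];

/*
 * Models the modification pattern from the thread:
 * lock, modify, mark dirty, unlock.
 * forget_dirty simulates the buggy path that skips the dirtying step.
 */
static void modify_node(struct model_node *node, bool forget_dirty)
{
	assert(!node->pg.locked);
	node->pg.locked = true;
	/* modifications would go here */
	if (!forget_dirty)
		node->pg.dirty = true;
	node->pg.locked = false;
}

/*
 * Models a non-fatal check at queue time: returns true if the node is
 * dirty, otherwise prints a warning and bumps the counter for its kind.
 */
static bool check_dirty_before_queue(const struct model_node *node)
{
	if (node->pg.dirty)
		return true;

	warn_count[node->kind]++;
	fprintf(stderr, "warning: %s node queued while clean\n",
		node->kind == NODE_FORMATTED ? "formatted" : "unformatted");
	return false;
}
```

In this model, a node modified with forget_dirty set to true fails the check
and increments only the counter for its own kind, which is the information
the split warning is meant to provide.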

