* [Bug 71791] New: Unlinking a file that was moved to another folder but still open by other process blocks either process (not always reproducible)
@ 2014-03-10 14:01 bugzilla-daemon
2014-03-11 20:14 ` [Bug 71791] " bugzilla-daemon
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: bugzilla-daemon @ 2014-03-10 14:01 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=71791
Bug ID: 71791
Summary: Unlinking a file that was moved to another folder but
still open by other process blocks either process (not
always reproducible)
Product: File System
Version: 2.5
Kernel Version: 3.12.9
Hardware: All
OS: Linux
Tree: PREEMPT_RT
Status: NEW
Severity: normal
Priority: P1
Component: ext4
Assignee: fs_ext4@kernel-bugs.osdl.org
Reporter: edpeur@gmail.com
Regression: No
Either the rm process or more rarely the process that has the file open will
block when a file that was moved to other folder is unlinked while still open
by other process. This is not always reproducible, but like 20% reproducible.
Here is the most common trace:
[27600.716129] INFO: task rm:28925 blocked for more than 120 seconds.
[27600.716139] Tainted: G W 3.12-1-rt-amd64 #1
[27600.716142] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[27600.716147] rm D ffff88007eb8a5f0 0 28925 28924 0x00000004
[27600.716159] ffff88007eb8a280 0000000000000086 ffff88004542bfd8
ffff88004542bfd8
[27600.716162] ffff88004542bfd8 ffff88004542bfd8 ffff88007eb8a280
ffff88007eb8a280
[27600.716165] 0000000000000002 ffffffff81123410 ffff88004542bd50
ffff88004542bdb8
[27600.716172] Call Trace:
[27600.716185] [<ffffffff81123410>] ? wait_on_page_read+0x60/0x60
[27600.716190] [<ffffffff814aecb1>] ? schedule+0x21/0x90
[27600.716194] [<ffffffff814aeda3>] ? io_schedule+0x83/0xd0
[27600.716197] [<ffffffff81123415>] ? sleep_on_page+0x5/0x10
[27600.716201] [<ffffffff814ad594>] ? __wait_on_bit+0x54/0x80
[27600.716204] [<ffffffff8112322f>] ? wait_on_page_bit+0x7f/0x90
[27600.716209] [<ffffffff81082050>] ? wake_atomic_t_function+0x30/0x30
[27600.716213] [<ffffffff81130795>] ? pagevec_lookup+0x15/0x20
[27600.716217] [<ffffffff8113284b>] ? truncate_inode_pages_range+0x26b/0x490
[27600.716265] [<ffffffffa02f944d>] ? ext4_evict_inode+0x13d/0x540 [ext4]
[27600.716270] [<ffffffff8119e055>] ? evict+0xb5/0x1b0
[27600.716279] [<ffffffff8119180b>] ? do_unlinkat+0x19b/0x250
[27600.716282] [<ffffffff81090140>] ? migrate_enable+0xa0/0x1e0
[27600.716286] [<ffffffff811c3795>] ? fsnotify_find_inode_mark+0x65/0x90
[27600.716288] [<ffffffff811c4d64>] ? dnotify_flush+0x34/0x110
[27600.716291] [<ffffffff814b7ab9>] ? system_call_fastpath+0x16/0x1b
And this trace has only happened once:
[49440.736350] INFO: task mplayer:30035 blocked for more than 120 seconds.
[49440.736360] Tainted: G W 3.12-1-rt-amd64 #1
[49440.736362] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[49440.736365] mplayer D ffff88001d4dd9b0 0 30035 23580 0x00000000
[49440.736375] ffff88001d4dd640 0000000000000086 ffff88003e31bfd8
ffff88003e31bfd8
[49440.736377] ffff88003e31bfd8 ffff88003e31bfd8 ffff88001d4dd640
ffff88001d4dd640
[49440.736379] 0000000000000002 ffffffff81123410 ffff88003e31bd40
ffff88003e31bda0
[49440.736383] Call Trace:
[49440.736396] [<ffffffff81123410>] ? wait_on_page_read+0x60/0x60
[49440.736400] [<ffffffff814aecb1>] ? schedule+0x21/0x90
[49440.736403] [<ffffffff814aeda3>] ? io_schedule+0x83/0xd0
[49440.736405] [<ffffffff81123415>] ? sleep_on_page+0x5/0x10
[49440.736407] [<ffffffff814ad594>] ? __wait_on_bit+0x54/0x80
[49440.736409] [<ffffffff8112322f>] ? wait_on_page_bit+0x7f/0x90
[49440.736412] [<ffffffff81082050>] ? wake_atomic_t_function+0x30/0x30
[49440.736416] [<ffffffff81130795>] ? pagevec_lookup+0x15/0x20
[49440.736419] [<ffffffff8113284b>] ? truncate_inode_pages_range+0x26b/0x490
[49440.736491] [<ffffffffa031944d>] ? ext4_evict_inode+0x13d/0x540 [ext4]
[49440.736495] [<ffffffff8119e055>] ? evict+0xb5/0x1b0
[49440.736497] [<ffffffff8119a2b8>] ? dentry_kill+0x218/0x270
[49440.736499] [<ffffffff8119a374>] ? dput+0x64/0x110
[49440.736502] [<ffffffff811857fb>] ? __fput+0x13b/0x200
[49440.736505] [<ffffffff8107e077>] ? task_work_run+0x97/0xd0
[49440.736510] [<ffffffff810149e1>] ? do_notify_resume+0x81/0xe0
[49440.736513] [<ffffffff814b7d76>] ? int_signal+0x12/0x17
This is the kernel used:
http://ftp.debian.org/debian/pool/main/l/linux/linux-image-3.12-1-rt-amd64_3.12.9-1_amd64.deb
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug 71791] Unlinking a file that was moved to another folder but still open by other process blocks either process (not always reproducible)
2014-03-10 14:01 [Bug 71791] New: Unlinking a file that was moved to another folder but still open by other process blocks either process (not always reproducible) bugzilla-daemon
@ 2014-03-11 20:14 ` bugzilla-daemon
2014-03-16 7:20 ` bugzilla-daemon
2014-03-18 15:35 ` bugzilla-daemon
2 siblings, 0 replies; 4+ messages in thread
From: bugzilla-daemon @ 2014-03-11 20:14 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=71791
Jan Kara <jack@suse.cz> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |NEEDINFO
CC| |jack@suse.cz
--- Comment #1 from Jan Kara <jack@suse.cz> ---
So one note is that we are likely waiting for writeback on some page in that
file to finish. Another note is that I don't think another process has the file
open - the stacktrace clearly shows that we are completely deleting the file
which means noone can have the file open (if you can really show otherwise, it
would be a major bug).
I guess the file has been written quite recently and the filesystem is quite
loaded with the IO, isn't it?
Can you try mounting the filesystem with 'noauto_da_alloc' mount option?
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug 71791] Unlinking a file that was moved to another folder but still open by other process blocks either process (not always reproducible)
2014-03-10 14:01 [Bug 71791] New: Unlinking a file that was moved to another folder but still open by other process blocks either process (not always reproducible) bugzilla-daemon
2014-03-11 20:14 ` [Bug 71791] " bugzilla-daemon
@ 2014-03-16 7:20 ` bugzilla-daemon
2014-03-18 15:35 ` bugzilla-daemon
2 siblings, 0 replies; 4+ messages in thread
From: bugzilla-daemon @ 2014-03-16 7:20 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=71791
Zheng Liu <gnehzuil.liu@gmail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |gnehzuil.liu@gmail.com
--- Comment #2 from Zheng Liu <gnehzuil.liu@gmail.com> ---
I guess that the journal mode is 'data=ordered', right? Could you please try
to switch the journal mode to 'data=writeback' and look at whether or not the
problem can be reproduced?
In our product system, we met a hang which is caused by 'data=ordered' under a
heavy IO workload. When the user uses 'rm' command to delete a file,
ext4_begin_ordered_truncate() should be called with 'data=ordered' and it could
trigger write back kernel thread to write out the dirty data. Then
truncate_inode_pages() will wait on write back.
Regards,
- Zheng
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug 71791] Unlinking a file that was moved to another folder but still open by other process blocks either process (not always reproducible)
2014-03-10 14:01 [Bug 71791] New: Unlinking a file that was moved to another folder but still open by other process blocks either process (not always reproducible) bugzilla-daemon
2014-03-11 20:14 ` [Bug 71791] " bugzilla-daemon
2014-03-16 7:20 ` bugzilla-daemon
@ 2014-03-18 15:35 ` bugzilla-daemon
2 siblings, 0 replies; 4+ messages in thread
From: bugzilla-daemon @ 2014-03-18 15:35 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=71791
--- Comment #3 from edpeur@gmail.com ---
Yes, the journal mode was 'data=ordered'.
This problem is reproduced on this external USB drive on my laptop using a
quite light workload.
It is not always reproducible and I had to change the way I work so I might not
be able to reproduce this bug in the near time.
There was another hang when shutdown tried to umount this drive:
[374641.816086] INFO: task shutdown:20015 blocked for more than 120 seconds.
[374641.816096] Tainted: G W 3.12-1-rt-amd64 #1
[374641.816099] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[374641.816103] shutdown D ffff8800b4ce6af0 0 20015 1
0x00000000
[374641.816116] ffff8800b4ce6780 0000000000000086 ffff880103177fd8
ffff880103177fd8
[374641.816119] ffff880103177fd8 ffff880103177fd8 ffff8800b4ce6780
ffff8800b4ce6780
[374641.816122] 0000000000000002 ffffffff81123410 ffff880103177d90
000000000000c394
[374641.816129] Call Trace:
[374641.816142] [<ffffffff81123410>] ? wait_on_page_read+0x60/0x60
[374641.816147] [<ffffffff814aecb1>] ? schedule+0x21/0x90
[374641.816151] [<ffffffff814aeda3>] ? io_schedule+0x83/0xd0
[374641.816154] [<ffffffff81123415>] ? sleep_on_page+0x5/0x10
[374641.816158] [<ffffffff814ad594>] ? __wait_on_bit+0x54/0x80
[374641.816162] [<ffffffff8112322f>] ? wait_on_page_bit+0x7f/0x90
[374641.816166] [<ffffffff81082050>] ? wake_atomic_t_function+0x30/0x30
[374641.816171] [<ffffffff811307b8>] ? pagevec_lookup_tag+0x18/0x20
[374641.816175] [<ffffffff81123318>] ? filemap_fdatawait_range+0xd8/0x150
[374641.816179] [<ffffffff8108e6b9>] ? get_parent_ip+0x9/0x20
[374641.816182] [<ffffffff8108e6b9>] ? get_parent_ip+0x9/0x20
[374641.816186] [<ffffffff814b42d7>] ? add_preempt_count+0x97/0xe0
[374641.816191] [<ffffffff8105ed50>] ? pin_current_cpu+0x90/0x1b0
[374641.817981] [<ffffffff8108e6b9>] ? get_parent_ip+0x9/0x20
[374641.817983] [<ffffffff8105ee7d>] ? unpin_current_cpu+0xd/0x60
[374641.817986] [<ffffffff811ac869>] ? sync_inodes_sb+0x169/0x1f0
[374641.817989] [<ffffffff811b1c80>] ? generic_write_sync+0x60/0x60
[374641.817991] [<ffffffff811877ba>] ? iterate_supers+0xba/0x120
[374641.817993] [<ffffffff811b1ecd>] ? sys_sync+0x2d/0x90
[374641.817996] [<ffffffff814b7ab9>] ? system_call_fastpath+0x16/0x1b
I have upgraded to:
http://ftp.debian.org/debian/pool/main/l/linux/linux-image-3.13-1-amd64_3.13.5-1_amd64.deb
I have enabled 'data=writeback' now. So, I will report if I see this problem
again.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-03-19 17:58 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-10 14:01 [Bug 71791] New: Unlinking a file that was moved to another folder but still open by other process blocks either process (not always reproducible) bugzilla-daemon
2014-03-11 20:14 ` [Bug 71791] " bugzilla-daemon
2014-03-16 7:20 ` bugzilla-daemon
2014-03-18 15:35 ` bugzilla-daemon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).