* Is warn_on() right reply for i/o error?
@ 2014-07-24 15:27 Pavel Machek
2014-07-24 19:29 ` Theodore Ts'o
2014-07-29 12:04 ` Jan Kara
0 siblings, 2 replies; 4+ messages in thread
From: Pavel Machek @ 2014-07-24 15:27 UTC
To: tytso, linux-kernel, adilger.kernel, linux-ext4, jack
Hi!
Just... I know, I should not be unscrewing hard drive cover while
operating.
But on the other hand... WARN_ON() does not sound like right reply for
a disk failure... right?
And on related note... automounting is evil. Oh well.
Pavel
sd 6:0:0:0: [sdf] Unhandled error code
sd 6:0:0:0: [sdf]
Result: hostbyte=0x01 driverbyte=0x00
sd 6:0:0:0: [sdf] CDB:
cdb[0]=0x28: 28 00 00 05 4a 00 00 00 40 00
end_request: I/O error, dev sdf, sector 346624
Buffer I/O error on device sdf, logical block 43328
Buffer I/O error on device sdf, logical block 43329
------------[ cut here ]------------
WARNING: CPU: 0 PID: 4710 at fs/fs-writeback.c:1199 __mark_inode_dirty+0x1be/0x1d0()
bdi-block not registered
Modules linked in:
CPU: 0 PID: 4710 Comm: umount Not tainted 3.16.0-rc5+ #381
Hardware name: /DG41MJ, BIOS MJG4110H.86A.0006.2009.1223.1155 12/23/2009
000004af df661e18 c480956d c4a2bae0 df661e48 c403914a c4a2baf7 df661e74
00001266 c4a2bae0 000004af c41154fe c41154fe d21ffa74 c6263dec d21ffc60
df661e60 c40391ee 00000009 df661e58 c4a2baf7 df661e74 df661e88 c41154fe
Call Trace:
[<c480956d>] dump_stack+0x41/0x52
[<c403914a>] warn_slowpath_common+0x7a/0xa0
[<c41154fe>] ? __mark_inode_dirty+0x1be/0x1d0
[<c41154fe>] ? __mark_inode_dirty+0x1be/0x1d0
[<c40391ee>] warn_slowpath_fmt+0x2e/0x30
[<c41154fe>] __mark_inode_dirty+0x1be/0x1d0
[<c411b806>] __set_page_dirty+0x66/0xb0
[<c411b8a6>] mark_buffer_dirty+0x56/0x80
[<c415bc1d>] ext3_put_super+0x20d/0x250
[<c410a042>] ? evict_inodes+0xb2/0x110
[<c40f4888>] generic_shutdown_super+0x68/0xe0
[<c40f4925>] kill_block_super+0x25/0x70
[<c40f4b88>] deactivate_locked_super+0x48/0x70
[<c40f5161>] deactivate_super+0x51/0x70
[<c410da6f>] mntput_no_expire+0x12f/0x1f0
[<c410f1e7>] ? SyS_umount+0xa7/0x430
[<c410f1e7>] SyS_umount+0xa7/0x430
[<c480e41e>] ? syscall_call+0x7/0xb
[<c40df3e1>] ? vm_munmap+0x41/0x50
[<c480e41e>] syscall_call+0x7/0xb
---[ end trace 6642457659b6f1ae ]---
EXT3-fs (sdf1): I/O error while writing superblock
usb 1-1: new high-speed USB device number 8 using ehci-pci
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: Is warn_on() right reply for i/o error?
From: Theodore Ts'o @ 2014-07-24 19:29 UTC
To: Pavel Machek; +Cc: linux-kernel, adilger.kernel, linux-ext4, jack
On Thu, Jul 24, 2014 at 05:27:22PM +0200, Pavel Machek wrote:
> Hi!
>
> Just... I know, I should not be unscrewing hard drive cover while
> operating.
>
> But on the other hand... WARN_ON() does not sound like right reply for
> a disk failure... right?
Actually, it can be worse than that. If a hard drive disappears out
from under you while writeback is happening, it's possible to get
crashes in bdi_writeback_workfn() because you can have races between
bdi_unregister() and bdi_writeback_workfn(), since the latter requeues
itself and flush_delayed_work() can return while
bdi_writeback_workfn() is still executing.
This looks like it's a related problem, where the block device gets
unregistered (and this happens in the block device layer without it
telling the file system that the disk drive is about to disappear out
from under it), and occasionally, Bad Stuff Happens. :-(
- Ted
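
The requeue problem Ted describes can be illustrated with a small user-space model. This is only a sketch under loud assumptions: it is single-threaded, so it does not reproduce the real concurrency, and every name in it (model_bdi, model_queue_work, model_flush, model_unregister) is hypothetical rather than a kernel API. It shows one thing: if the work function re-arms itself, a single flush during unregister leaves work queued against a device that is already gone, unless queueing is refused once the device is marked unregistered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_bdi {
    bool registered;   /* cleared at the start of unregister */
    bool work_pending; /* models a queued delayed work item */
    int  runs;         /* how many times the work function ran */
};

/* Queue the work; optionally refuse once the device is unregistered. */
static bool model_queue_work(struct model_bdi *bdi, bool check_registered)
{
    if (check_registered && !bdi->registered)
        return false;
    bdi->work_pending = true;
    return true;
}

/* Work function that re-arms itself, like periodic writeback. */
static void model_workfn(struct model_bdi *bdi, bool check_registered)
{
    bdi->runs++;
    model_queue_work(bdi, check_registered);
}

/* Run the one pending item to completion, as a single flush would. */
static void model_flush(struct model_bdi *bdi, bool check_registered)
{
    if (bdi->work_pending) {
        bdi->work_pending = false;
        model_workfn(bdi, check_registered);
    }
}

/*
 * Unregister: mark the device gone, then flush once.
 * Returns true if work is still queued afterwards, which is the hazard.
 */
static bool model_unregister(struct model_bdi *bdi, bool check_registered)
{
    bdi->registered = false;
    model_flush(bdi, check_registered);
    return bdi->work_pending;
}
```

Calling model_unregister() with check_registered set to false leaves work_pending set after the flush; with it set to true the requeue is refused and nothing remains queued. Whether the real code can use exactly this check is a separate question that the model does not answer.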
* Re: Is warn_on() right reply for i/o error?
From: Jan Kara @ 2014-07-29 12:04 UTC
To: Pavel Machek; +Cc: tytso, linux-kernel, adilger.kernel, linux-ext4, jack
Hi!
On Thu 24-07-14 17:27:22, Pavel Machek wrote:
> Just... I know, I should not be unscrewing hard drive cover while
> operating.
>
> But on the other hand... WARN_ON() does not sound like right reply for
> a disk failure... right?
No, it's not. Looks like a race between someone shutting down the BDI and
mark_inode_dirty() running on it. Frankly, we have been playing whack-a-mole
with these races between device removal and the fs still operating on the
device for several years already. I think we should decouple struct
backing_dev_info from struct request_queue and properly refcount it, so that
backing_dev_info can die only after all users of it (fs et al) are done with
it. There are too many references to backing_dev_info from filesystems to
remove it in a race-free way while the fs still uses it. Now only to find
time to do this... ;)
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
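
The refcounting idea in the message above (let backing_dev_info die only after its last user is done) can be sketched in user space with C11 atomics. This is a minimal illustration, not the kernel's implementation and not a proposed patch: struct sketch_bdi and the sketch_bdi_* helpers are hypothetical names, and the object is deliberately a standalone allocation rather than something embedded in a request queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a separately allocated backing_dev_info. */
struct sketch_bdi {
    atomic_int  refs;       /* one reference per user (device, fs, ...) */
    atomic_bool registered; /* false once the device has gone away */
};

/* Allocate with a single reference, owned by the device side. */
static struct sketch_bdi *sketch_bdi_alloc(void)
{
    struct sketch_bdi *bdi = malloc(sizeof(*bdi));

    if (!bdi)
        return NULL;
    atomic_init(&bdi->refs, 1);
    atomic_init(&bdi->registered, true);
    return bdi;
}

/* Take an extra reference, e.g. when a filesystem is mounted. */
static struct sketch_bdi *sketch_bdi_get(struct sketch_bdi *bdi)
{
    atomic_fetch_add(&bdi->refs, 1);
    return bdi;
}

/* Drop a reference; returns true if this call freed the object. */
static bool sketch_bdi_put(struct sketch_bdi *bdi)
{
    int old = atomic_fetch_sub(&bdi->refs, 1);

    assert(old > 0);
    if (old == 1) {
        free(bdi);
        return true;
    }
    return false;
}

/* Device removal only marks the object; it does not free it. */
static void sketch_bdi_unregister(struct sketch_bdi *bdi)
{
    atomic_store(&bdi->registered, false);
}
```

In this sketch, device removal calls sketch_bdi_unregister() and then drops the device's reference. A filesystem that still holds its own reference can keep reading the object, sees registered as false, and the memory is released only when that last reference is put, for example at umount.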
* Re: Is warn_on() right reply for i/o error?
From: Jan Kara @ 2014-07-29 12:05 UTC
To: Theodore Ts'o
Cc: Pavel Machek, linux-kernel, adilger.kernel, linux-ext4, jack
On Thu 24-07-14 15:29:38, Ted Tso wrote:
> On Thu, Jul 24, 2014 at 05:27:22PM +0200, Pavel Machek wrote:
> > Hi!
> >
> > Just... I know, I should not be unscrewing hard drive cover while
> > operating.
> >
> > But on the other hand... WARN_ON() does not sound like right reply for
> > a disk failure... right?
>
> Actually, it can be worse than that. If a hard drive disappears out
> from under you while writeback is happening, it's possible to get
> crashes in bdi_writeback_workfn() because you can have races between
> bdi_unregister() and bdi_writeback_workfn(), since the latter requeues
> itself and flush_delayed_work() can return while
> bdi_writeback_workfn() is still executing.
This should be fixed by commit 5acda9d12dcf1ad0d9a5a2a7c646de3472fa7555,
which I wrote in April. But there are certainly other races in that code...
> This looks like it's a related problem, where the block device gets
> unregistered (and this happens in the block device layer without it
> telling the file system that the disk drive is about to disappear out
> from under it), and occasionally, Bad Stuff Happens. :-(
Yeah.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR