* Intentionally corrupted vfat fs causing BUG @ 2014-10-10 20:57 Sami Liedes 2014-10-11 10:20 ` Richard Weinberger 2014-10-12 12:08 ` OGAWA Hirofumi 0 siblings, 2 replies; 27+ messages in thread From: Sami Liedes @ 2014-10-10 20:57 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: linux-fsdevel Hi! I ran some fuzz tests on a vfat filesystem on 3.17 and found a filesystem that differs from a pristine filesystem by one bit and causes a kernel BUG. This seems to be an old bug; I can also replicate it on a 3.3.4 kernel I happened to have lying around. The set of operations I run for filesystems is this: mount $TARGET_DEV /mnt -t vfat cd /mnt timeout 30 cp -r doc doc2 >&/dev/null timeout 30 find -xdev >&/dev/null timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- >&/dev/null timeout 30 mkdir tmp >&/dev/null timeout 30 echo whoah >tmp/filu >&/dev/null timeout 30 rm -rf /mnt/* >&/dev/null cd / umount /mnt The backtrace seems to indicate that the BUG happens at the rm phase. You can get the pristine filesystem from http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.bz2 The broken filesystem is at http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.24.min.bz2 The only difference is this one bit: --- /dev/fd/63 2014-10-10 23:23:09.424422610 +0300 +++ /dev/fd/62 2014-10-10 23:23:09.424422610 +0300 @@ -1977,7 +1977,7 @@ 0008d230 08 39 08 39 00 00 bc 0d 08 39 13 00 00 00 00 00 |.9.9.....9......| 0008d240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * -0008da00 2e 20 20 20 20 20 20 20 20 20 20 10 00 00 bc 0d |. .....| +0008da00 2e 20 20 20 20 20 20 20 20 20 60 10 00 00 bc 0d |. `.....| 0008da10 08 39 08 39 00 00 bc 0d 08 39 0b 01 00 00 00 00 |.9.9.....9......| 0008da20 2e 2e 20 20 20 20 20 20 20 20 20 10 00 00 bc 0d |.. .....| 0008da30 08 39 08 39 00 00 bc 0d 08 39 13 00 00 00 00 00 |.9.9.....9......| Backtrace on 3.17: [ 1.363073] ------------[ cut here ]------------ [ 1.363437] kernel BUG at fs/namei.c:2430! [ 1.363749] invalid opcode: 0000 [#1] SMP [ 1.364088] CPU: 0 PID: 889 Comm: rm Not tainted 3.17.0+ #32 [ 1.364517] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 1.365291] task: ffff880000066360 ti: ffff8800063b0000 task.ti: ffff8800063b0000 [ 1.365813] RIP: 0010:[<ffffffff8116c998>] [<ffffffff8116c998>] may_delete+0x128/0x140 [ 1.365813] RSP: 0018:ffff8800063b3e38 EFLAGS: 00010293 [ 1.365813] RAX: ffff8800065cf120 RBX: ffff8800065cf240 RCX: ffff8800000663b0 [ 1.365813] RDX: 0000000000000001 RSI: ffff8800065cf240 RDI: ffff880006631858 [ 1.365813] RBP: ffff8800063b3e58 R08: 0000000000000000 R09: 0000000000000001 [ 1.365813] R10: 0000000000000000 R11: 0000000000000044 R12: ffff8800066313b0 [ 1.365813] R13: ffff880006631858 R14: 0000000000000007 R15: 00000000fffffffe [ 1.365813] FS: 0000000000000000(0000) GS:ffff880007c00000(0063) knlGS:00000000f7609940 [ 1.365813] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 [ 1.365813] CR2: 00000000ff967000 CR3: 000000000637f000 CR4: 00000000000006b0 [ 1.365813] Stack: [ 1.365813] ffff8800065cf240 0000000000000000 ffff880006631858 0000000000000007 [ 1.365813] ffff8800063b3e80 ffffffff81173699 ffff880006334000 0000000000000000 [ 1.365813] 0000000008faf1e4 ffff8800063b3f68 ffffffff81173905 ffff8800065cf240 [ 1.365813] Call Trace: [ 1.365813] [<ffffffff81173699>] vfs_rmdir+0x19/0xf0 [ 1.365813] [<ffffffff81173905>] do_rmdir+0x195/0x1d0 [ 1.365813] [<ffffffff810aa11d>] ? trace_hardirqs_on_caller+0x15d/0x200 [ 1.365813] [<ffffffff8165e9cb>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 1.365813] [<ffffffff81173d95>] SyS_unlinkat+0x25/0x40 [ 1.365813] [<ffffffff8188e888>] sysenter_dispatch+0x7/0x2a [ 1.365813] Code: 41 5e 5d c3 0f 1f 80 00 00 00 00 b8 ff ff ff ff eb c4 90 0f 0b 66 0f 1f 44 00 00 48 39 5b 40 75 a2 b8 f0 ff ff ff eb ae 0f 1f 00 <0f> 0b 66 0f 1f 44 00 00 b8 fe ff ff ff eb 9c 66 0f 1f 84 00 00 [ 1.365813] RIP [<ffffffff8116c998>] may_delete+0x128/0x140 [ 1.365813] RSP <ffff8800063b3e38> [ 1.378725] ---[ end trace 15817999647273ef ]--- [ 1.379086] Kernel panic - not syncing: Fatal exception [ 1.379592] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) [ 1.380349] Rebooting in 1 seconds.. Sami ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-10 20:57 Intentionally corrupted vfat fs causing BUG Sami Liedes @ 2014-10-11 10:20 ` Richard Weinberger 2014-10-12 12:08 ` OGAWA Hirofumi 1 sibling, 0 replies; 27+ messages in thread From: Richard Weinberger @ 2014-10-11 10:20 UTC (permalink / raw) To: Sami Liedes, OGAWA Hirofumi, linux-fsdevel, Al Viro On Fri, Oct 10, 2014 at 10:57 PM, Sami Liedes <sami.liedes@iki.fi> wrote: > Hi! > > I ran some fuzz tests on a vfat filesystem on 3.17 and found a > filesystem that differs from a pristine filesystem by one bit and > causes a kernel BUG. This seems to be an old bug; I can also replicate > it on a 3.3.4 kernel I happened to have lying around. > > The set of operations I run for filesystems is this: > > mount $TARGET_DEV /mnt -t vfat > cd /mnt > timeout 30 cp -r doc doc2 >&/dev/null > timeout 30 find -xdev >&/dev/null > timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- >&/dev/null > timeout 30 mkdir tmp >&/dev/null > timeout 30 echo whoah >tmp/filu >&/dev/null > timeout 30 rm -rf /mnt/* >&/dev/null > cd / > umount /mnt > > The backtrace seems to indicate that the BUG happens at the rm phase. > > You can get the pristine filesystem from > > http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.bz2 > > The broken filesystem is at > > http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.24.min.bz2 > > The only difference is this one bit: > > --- /dev/fd/63 2014-10-10 23:23:09.424422610 +0300 > +++ /dev/fd/62 2014-10-10 23:23:09.424422610 +0300 > @@ -1977,7 +1977,7 @@ > 0008d230 08 39 08 39 00 00 bc 0d 08 39 13 00 00 00 00 00 |.9.9.....9......| > 0008d240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| > * > -0008da00 2e 20 20 20 20 20 20 20 20 20 20 10 00 00 bc 0d |. .....| > +0008da00 2e 20 20 20 20 20 20 20 20 20 60 10 00 00 bc 0d |. `.....| The issue here is that this directory entry is faulty. Usually each entry has a "." entry which points to itself. In your image the ->name is different. Hence, there is no "." but a directory named ". `" which points to itself. IOW a directory loop. The vFAT file system misses to check whether there is really a "." and ".." directory. These directories are the only ones which are allowed to be hard links. And it misses to detect loops which triggers the BUG_ON() in the VFS core. For the latter issue I'm not unsure where to fix this. Al? > 0008da10 08 39 08 39 00 00 bc 0d 08 39 0b 01 00 00 00 00 |.9.9.....9......| > 0008da20 2e 2e 20 20 20 20 20 20 20 20 20 10 00 00 bc 0d |.. .....| > 0008da30 08 39 08 39 00 00 bc 0d 08 39 13 00 00 00 00 00 |.9.9.....9......| > > > Backtrace on 3.17: > > [ 1.363073] ------------[ cut here ]------------ > [ 1.363437] kernel BUG at fs/namei.c:2430! > [ 1.363749] invalid opcode: 0000 [#1] SMP > [ 1.364088] CPU: 0 PID: 889 Comm: rm Not tainted 3.17.0+ #32 > [ 1.364517] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 > [ 1.365291] task: ffff880000066360 ti: ffff8800063b0000 task.ti: ffff8800063b0000 > [ 1.365813] RIP: 0010:[<ffffffff8116c998>] [<ffffffff8116c998>] may_delete+0x128/0x140 > [ 1.365813] RSP: 0018:ffff8800063b3e38 EFLAGS: 00010293 > [ 1.365813] RAX: ffff8800065cf120 RBX: ffff8800065cf240 RCX: ffff8800000663b0 > [ 1.365813] RDX: 0000000000000001 RSI: ffff8800065cf240 RDI: ffff880006631858 > [ 1.365813] RBP: ffff8800063b3e58 R08: 0000000000000000 R09: 0000000000000001 > [ 1.365813] R10: 0000000000000000 R11: 0000000000000044 R12: ffff8800066313b0 > [ 1.365813] R13: ffff880006631858 R14: 0000000000000007 R15: 00000000fffffffe > [ 1.365813] FS: 0000000000000000(0000) GS:ffff880007c00000(0063) knlGS:00000000f7609940 > [ 1.365813] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 > [ 1.365813] CR2: 00000000ff967000 CR3: 000000000637f000 CR4: 00000000000006b0 > [ 1.365813] Stack: > [ 1.365813] ffff8800065cf240 0000000000000000 ffff880006631858 0000000000000007 > [ 1.365813] ffff8800063b3e80 ffffffff81173699 ffff880006334000 0000000000000000 > [ 1.365813] 0000000008faf1e4 ffff8800063b3f68 ffffffff81173905 ffff8800065cf240 > [ 1.365813] Call Trace: > [ 1.365813] [<ffffffff81173699>] vfs_rmdir+0x19/0xf0 > [ 1.365813] [<ffffffff81173905>] do_rmdir+0x195/0x1d0 > [ 1.365813] [<ffffffff810aa11d>] ? trace_hardirqs_on_caller+0x15d/0x200 > [ 1.365813] [<ffffffff8165e9cb>] ? trace_hardirqs_on_thunk+0x3a/0x3f > [ 1.365813] [<ffffffff81173d95>] SyS_unlinkat+0x25/0x40 > [ 1.365813] [<ffffffff8188e888>] sysenter_dispatch+0x7/0x2a > [ 1.365813] Code: 41 5e 5d c3 0f 1f 80 00 00 00 00 b8 ff ff ff ff eb c4 90 0f 0b 66 0f 1f 44 00 00 48 39 5b 40 75 a2 b8 f0 ff ff ff eb ae 0f 1f 00 <0f> 0b 66 0f 1f 44 00 00 b8 fe ff ff ff eb 9c 66 0f 1f 84 00 00 > [ 1.365813] RIP [<ffffffff8116c998>] may_delete+0x128/0x140 > [ 1.365813] RSP <ffff8800063b3e38> > [ 1.378725] ---[ end trace 15817999647273ef ]--- > [ 1.379086] Kernel panic - not syncing: Fatal exception > [ 1.379592] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) > [ 1.380349] Rebooting in 1 seconds.. > > Sami > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Thanks, //richard ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-10 20:57 Intentionally corrupted vfat fs causing BUG Sami Liedes 2014-10-11 10:20 ` Richard Weinberger @ 2014-10-12 12:08 ` OGAWA Hirofumi 2014-10-12 19:04 ` Richard Weinberger 1 sibling, 1 reply; 27+ messages in thread From: OGAWA Hirofumi @ 2014-10-12 12:08 UTC (permalink / raw) To: Sami Liedes; +Cc: linux-fsdevel Sami Liedes <sami.liedes@iki.fi> writes: > Hi! Hi, > I ran some fuzz tests on a vfat filesystem on 3.17 and found a > filesystem that differs from a pristine filesystem by one bit and > causes a kernel BUG. This seems to be an old bug; I can also replicate > it on a 3.3.4 kernel I happened to have lying around. > > The set of operations I run for filesystems is this: > > mount $TARGET_DEV /mnt -t vfat > cd /mnt > timeout 30 cp -r doc doc2 >&/dev/null > timeout 30 find -xdev >&/dev/null > timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- >&/dev/null > timeout 30 mkdir tmp >&/dev/null > timeout 30 echo whoah >tmp/filu >&/dev/null > timeout 30 rm -rf /mnt/* >&/dev/null > cd / > umount /mnt > > The backtrace seems to indicate that the BUG happens at the rm phase. > > You can get the pristine filesystem from > > http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.bz2 > > The broken filesystem is at > > http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.24.min.bz2 > > The only difference is this one bit: Hm, "." entry in directly seems to be corrupted. Maybe, possible causes are, memory error, memory corruption, fat bug, other bug? Well, this is 1 bit flip. Can you check memory by memtest or such? Do you know how this reproduce? testimg.vfat was empty, and testimg.vfat.24.min was corrupted already. We would need the way how make corrupted image like testimg.vfat.24.min, to find the cause of this problem. Base image for reproducing this bug, and way to do are very helpful. Thanks. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-12 12:08 ` OGAWA Hirofumi @ 2014-10-12 19:04 ` Richard Weinberger 2014-10-12 20:40 ` Sami Liedes 2014-10-13 7:57 ` OGAWA Hirofumi 0 siblings, 2 replies; 27+ messages in thread From: Richard Weinberger @ 2014-10-12 19:04 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Sami Liedes, linux-fsdevel On Sun, Oct 12, 2014 at 2:08 PM, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote: > Sami Liedes <sami.liedes@iki.fi> writes: > >> Hi! > > Hi, > >> I ran some fuzz tests on a vfat filesystem on 3.17 and found a >> filesystem that differs from a pristine filesystem by one bit and >> causes a kernel BUG. This seems to be an old bug; I can also replicate >> it on a 3.3.4 kernel I happened to have lying around. >> >> The set of operations I run for filesystems is this: >> >> mount $TARGET_DEV /mnt -t vfat >> cd /mnt >> timeout 30 cp -r doc doc2 >&/dev/null >> timeout 30 find -xdev >&/dev/null >> timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- >&/dev/null >> timeout 30 mkdir tmp >&/dev/null >> timeout 30 echo whoah >tmp/filu >&/dev/null >> timeout 30 rm -rf /mnt/* >&/dev/null >> cd / >> umount /mnt >> >> The backtrace seems to indicate that the BUG happens at the rm phase. >> >> You can get the pristine filesystem from >> >> http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.bz2 >> >> The broken filesystem is at >> >> http://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.24.min.bz2 >> >> The only difference is this one bit: > > Hm, "." entry in directly seems to be corrupted. Maybe, possible causes > are, memory error, memory corruption, fat bug, other bug? > > Well, this is 1 bit flip. Can you check memory by memtest or such? > > Do you know how this reproduce? testimg.vfat was empty, and > testimg.vfat.24.min was corrupted already. > > We would need the way how make corrupted image like testimg.vfat.24.min, > to find the cause of this problem. Base image for reproducing this bug, > and way to do are very helpful. You misunderstood Sami's issue. He corrupted the vfat fs intentionally to find issues in the vfat driver. And as he reports he found an nasty issue. Any user can trigger a BUG_ON() using a crafted vfat image. Please note, if you mount exactly the same image using msdos fs the issue does not occur. -- Thanks, //richard ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-12 19:04 ` Richard Weinberger @ 2014-10-12 20:40 ` Sami Liedes 2014-10-13 7:57 ` OGAWA Hirofumi 1 sibling, 0 replies; 27+ messages in thread From: Sami Liedes @ 2014-10-12 20:40 UTC (permalink / raw) To: Richard Weinberger; +Cc: OGAWA Hirofumi, linux-fsdevel [-- Attachment #1: Type: text/plain, Size: 1464 bytes --] On Sun, Oct 12, 2014 at 09:04:19PM +0200, Richard Weinberger wrote: > You misunderstood Sami's issue. He corrupted the vfat fs intentionally > to find issues > in the vfat driver. > And as he reports he found an nasty issue. > Any user can trigger a BUG_ON() using a crafted vfat image. > Please note, if you mount exactly the same image using msdos fs the issue > does not occur. Yeah, you can think of it as either a security issue if you wish, or just as a plain old robustness issue in the age of unreliable USB sticks etc. in that it just would be more ideal to fail gracefully instead of crashing. Anyway, I'm not advocating for any measure of severity (I leave that to others); I just find and report these in the hope that someone finds the reports useful. I personally view these more as robustness than security bugs at the moment, but certainly they can be seen as either. And if some of these get fixed, I will rerun the tests, so I might produce a daunting stream of reports. I know it would be nicer to report everything at once, but usually the issues found first are prevalent enough to mask other issues. By the way, I find it interesting that once I implemented a tool to minimize the differences between a bad fs and a good fs, every bug I have found in any filesystem implementation has minimized to a single bit flip. That suggests to me that fuzz testing is probably not very effective in finding bugs that require more than that. Sami [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-12 19:04 ` Richard Weinberger 2014-10-12 20:40 ` Sami Liedes @ 2014-10-13 7:57 ` OGAWA Hirofumi 2014-10-13 8:22 ` Richard Weinberger 1 sibling, 1 reply; 27+ messages in thread From: OGAWA Hirofumi @ 2014-10-13 7:57 UTC (permalink / raw) To: Richard Weinberger; +Cc: Sami Liedes, linux-fsdevel, Al Viro Richard Weinberger <richard.weinberger@gmail.com> writes: >> >> We would need the way how make corrupted image like testimg.vfat.24.min, >> to find the cause of this problem. Base image for reproducing this bug, >> and way to do are very helpful. > > You misunderstood Sami's issue. He corrupted the vfat fs intentionally > to find issues > in the vfat driver. > And as he reports he found an nasty issue. > Any user can trigger a BUG_ON() using a crafted vfat image. > Please note, if you mount exactly the same image using msdos fs the issue > does not occur. Ah. BTW, msdos doesn't allow ".*" as filename, so not trigger this. But root cause of this is same as double linked dir, "." should not matter. I.e. this issue would be able to reproduce on all FSes if made corrupted image intentionally. If we want to fix intentional corruption like this seriously, I guess we would need something like online-fsck to detect like double link of dir. If we want to avoid only Oops, it might be enough to remove BUG_ON(). I'm still not sure whether this is right direction or not though, because mount operation is root only and untrusted image should run fsck before. But, also, Oops is clearly unexpected. Hmmm... Al? [PATCH] Avoid Oops on corrupted dir in may_delete() Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> --- fs/namei.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff -puN fs/namei.c~fix-oops-on-corrupted-fs fs/namei.c --- linux-3.17/fs/namei.c~fix-oops-on-corrupted-fs 2014-10-13 16:34:28.352999516 +0900 +++ linux-3.17-hirofumi/fs/namei.c 2014-10-13 16:35:19.196803169 +0900 @@ -2427,7 +2427,10 @@ static int may_delete(struct inode *dir, return -ENOENT; BUG_ON(!inode); - BUG_ON(victim->d_parent->d_inode != dir); + /* Easy check of corrupted dir. */ + if (victim->d_parent->d_inode != dir) + return -EBUSY; + audit_inode_child(dir, victim, AUDIT_TYPE_CHILD_DELETE); error = inode_permission(dir, MAY_WRITE | MAY_EXEC); _ -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-13 7:57 ` OGAWA Hirofumi @ 2014-10-13 8:22 ` Richard Weinberger 2014-10-13 8:35 ` OGAWA Hirofumi 0 siblings, 1 reply; 27+ messages in thread From: Richard Weinberger @ 2014-10-13 8:22 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Sami Liedes, linux-fsdevel, Al Viro Am 13.10.2014 um 09:57 schrieb OGAWA Hirofumi: > Richard Weinberger <richard.weinberger@gmail.com> writes: > >>> >>> We would need the way how make corrupted image like testimg.vfat.24.min, >>> to find the cause of this problem. Base image for reproducing this bug, >>> and way to do are very helpful. >> >> You misunderstood Sami's issue. He corrupted the vfat fs intentionally >> to find issues >> in the vfat driver. >> And as he reports he found an nasty issue. >> Any user can trigger a BUG_ON() using a crafted vfat image. >> Please note, if you mount exactly the same image using msdos fs the issue >> does not occur. > > Ah. > > BTW, msdos doesn't allow ".*" as filename, so not trigger this. But root > cause of this is same as double linked dir, "." should not > matter. I.e. this issue would be able to reproduce on all FSes if made > corrupted image intentionally. > > If we want to fix intentional corruption like this seriously, I guess we > would need something like online-fsck to detect like double link of > dir. If we want to avoid only Oops, it might be enough to remove > BUG_ON(). > > I'm still not sure whether this is right direction or not though, > because mount operation is root only and untrusted image should run fsck > before. But, also, Oops is clearly unexpected. Hmmm... This limitation is not true anymore. Plug in a USB stick into a recent Linux desktop, it will automatically mount it... Also think of user namespaces and FUSE. Thanks, //richard > Al? > > > [PATCH] Avoid Oops on corrupted dir in may_delete() > > Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> > --- > > fs/namei.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff -puN fs/namei.c~fix-oops-on-corrupted-fs fs/namei.c > --- linux-3.17/fs/namei.c~fix-oops-on-corrupted-fs 2014-10-13 16:34:28.352999516 +0900 > +++ linux-3.17-hirofumi/fs/namei.c 2014-10-13 16:35:19.196803169 +0900 > @@ -2427,7 +2427,10 @@ static int may_delete(struct inode *dir, > return -ENOENT; > BUG_ON(!inode); > > - BUG_ON(victim->d_parent->d_inode != dir); > + /* Easy check of corrupted dir. */ > + if (victim->d_parent->d_inode != dir) > + return -EBUSY; > + > audit_inode_child(dir, victim, AUDIT_TYPE_CHILD_DELETE); > > error = inode_permission(dir, MAY_WRITE | MAY_EXEC); > _ > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-13 8:22 ` Richard Weinberger @ 2014-10-13 8:35 ` OGAWA Hirofumi 2014-10-13 8:39 ` Richard Weinberger 0 siblings, 1 reply; 27+ messages in thread From: OGAWA Hirofumi @ 2014-10-13 8:35 UTC (permalink / raw) To: Richard Weinberger; +Cc: Sami Liedes, linux-fsdevel, Al Viro Richard Weinberger <richard@nod.at> writes: >> I'm still not sure whether this is right direction or not though, >> because mount operation is root only and untrusted image should run fsck >> before. But, also, Oops is clearly unexpected. Hmmm... > > This limitation is not true anymore. Plug in a USB stick into a recent > Linux desktop, it will automatically mount it... Also think of user > namespaces and FUSE. Not really (well, true, some sort though). It is still controlled by root or capability, right? I.e. still controlled by admin of system. I read user namespaces last time, it doesn't allow to mount the block device by namespace's root. FUSE is allowed to mount by true user (I.e. admin can't disallow it)? I still didn't check it fully. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-13 8:35 ` OGAWA Hirofumi @ 2014-10-13 8:39 ` Richard Weinberger 2014-10-13 8:59 ` OGAWA Hirofumi 0 siblings, 1 reply; 27+ messages in thread From: Richard Weinberger @ 2014-10-13 8:39 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Sami Liedes, linux-fsdevel, Al Viro Am 13.10.2014 um 10:35 schrieb OGAWA Hirofumi: > Richard Weinberger <richard@nod.at> writes: > >>> I'm still not sure whether this is right direction or not though, >>> because mount operation is root only and untrusted image should run fsck >>> before. But, also, Oops is clearly unexpected. Hmmm... >> >> This limitation is not true anymore. Plug in a USB stick into a recent >> Linux desktop, it will automatically mount it... Also think of user >> namespaces and FUSE. > > Not really (well, true, some sort though). It is still controlled by root > or capability, right? I.e. still controlled by admin of system. Fact is, I can plugin a USB stick to my buddies Laptop and make it trigger a BUG_ON. :) > I read user namespaces last time, it doesn't allow to mount the block > device by namespace's root. > > FUSE is allowed to mount by true user (I.e. admin can't disallow it)? I > still didn't check it fully. The question is how long these limits will stay... User namespaces uncovered already a pile of issues wrt. to mounting. Thanks, //richard ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-13 8:39 ` Richard Weinberger @ 2014-10-13 8:59 ` OGAWA Hirofumi 2014-10-13 14:36 ` Richard Weinberger 2014-10-19 16:36 ` Richard Weinberger 0 siblings, 2 replies; 27+ messages in thread From: OGAWA Hirofumi @ 2014-10-13 8:59 UTC (permalink / raw) To: Richard Weinberger; +Cc: Sami Liedes, linux-fsdevel, Al Viro Richard Weinberger <richard@nod.at> writes: > Am 13.10.2014 um 10:35 schrieb OGAWA Hirofumi: >> Richard Weinberger <richard@nod.at> writes: >> >>>> I'm still not sure whether this is right direction or not though, >>>> because mount operation is root only and untrusted image should run fsck >>>> before. But, also, Oops is clearly unexpected. Hmmm... >>> >>> This limitation is not true anymore. Plug in a USB stick into a recent >>> Linux desktop, it will automatically mount it... Also think of user >>> namespaces and FUSE. >> >> Not really (well, true, some sort though). It is still controlled by root >> or capability, right? I.e. still controlled by admin of system. > > Fact is, I can plugin a USB stick to my buddies Laptop and make it trigger a BUG_ON. :) > >> I read user namespaces last time, it doesn't allow to mount the block >> device by namespace's root. >> >> FUSE is allowed to mount by true user (I.e. admin can't disallow it)? I >> still didn't check it fully. > > The question is how long these limits will stay... > User namespaces uncovered already a pile of issues wrt. to mounting. Well, anyway, I don't object like that simple patch. My worry is, I feel we need something like online-fsck finally if we tackled fully to avoid issues (I still didn't analyze about this issue seriously and fully), and measurable overheads. And I myself have interest to online/runtime-fsck (i.e. detect and fix) though, I don't have interest to make it generic operations, and I would not have interest to tackle for all FSes... -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-13 8:59 ` OGAWA Hirofumi @ 2014-10-13 14:36 ` Richard Weinberger 2014-10-19 16:36 ` Richard Weinberger 1 sibling, 0 replies; 27+ messages in thread From: Richard Weinberger @ 2014-10-13 14:36 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Sami Liedes, linux-fsdevel, Al Viro Am 13.10.2014 um 10:59 schrieb OGAWA Hirofumi: >> The question is how long these limits will stay... >> User namespaces uncovered already a pile of issues wrt. to mounting. > > Well, anyway, I don't object like that simple patch. > > My worry is, I feel we need something like online-fsck finally if we > tackled fully to avoid issues (I still didn't analyze about this issue > seriously and fully), and measurable overheads. > > And I myself have interest to online/runtime-fsck (i.e. detect and fix) > though, I don't have interest to make it generic operations, and I would > not have interest to tackle for all FSes... At the end of the day we have to treat every filesystem provided by the user as user input and must not trust it. Thanks, //richard ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-13 8:59 ` OGAWA Hirofumi 2014-10-13 14:36 ` Richard Weinberger @ 2014-10-19 16:36 ` Richard Weinberger 2014-10-23 15:28 ` OGAWA Hirofumi 1 sibling, 1 reply; 27+ messages in thread From: Richard Weinberger @ 2014-10-19 16:36 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Sami Liedes, linux-fsdevel, Al Viro Am 13.10.2014 um 10:59 schrieb OGAWA Hirofumi: > Richard Weinberger <richard@nod.at> writes: > >> Am 13.10.2014 um 10:35 schrieb OGAWA Hirofumi: >>> Richard Weinberger <richard@nod.at> writes: >>> >>>>> I'm still not sure whether this is right direction or not though, >>>>> because mount operation is root only and untrusted image should run fsck >>>>> before. But, also, Oops is clearly unexpected. Hmmm... >>>> >>>> This limitation is not true anymore. Plug in a USB stick into a recent >>>> Linux desktop, it will automatically mount it... Also think of user >>>> namespaces and FUSE. >>> >>> Not really (well, true, some sort though). It is still controlled by root >>> or capability, right? I.e. still controlled by admin of system. >> >> Fact is, I can plugin a USB stick to my buddies Laptop and make it trigger a BUG_ON. :) >> >>> I read user namespaces last time, it doesn't allow to mount the block >>> device by namespace's root. >>> >>> FUSE is allowed to mount by true user (I.e. admin can't disallow it)? I >>> still didn't check it fully. >> >> The question is how long these limits will stay... >> User namespaces uncovered already a pile of issues wrt. to mounting. > > Well, anyway, I don't object like that simple patch. > > My worry is, I feel we need something like online-fsck finally if we > tackled fully to avoid issues (I still didn't analyze about this issue > seriously and fully), and measurable overheads. > > And I myself have interest to online/runtime-fsck (i.e. detect and fix) > though, I don't have interest to make it generic operations, and I would > not have interest to tackle for all FSes... > What about this one? diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c index 6df8d3d..60a28b7 100644 --- a/fs/fat/namei_vfat.c +++ b/fs/fat/namei_vfat.c @@ -736,7 +736,8 @@ static struct dentry *vfat_lookup(struct inode *dir, struct dentry *dentry, } alias = d_find_alias(inode); - if (alias && !vfat_d_anon_disconn(alias)) { + if (alias && !vfat_d_anon_disconn(alias) && + alias->d_parent == dentry->d_parent) { /* * This inode has non anonymous-DCACHE_DISCONNECTED * dentry. This means, the user did ->lookup() by an VFAT suffers from the issue because it is using dentry aliases to have fast lookups for short and long names. Without aliases the VFS will catch the loop just fine as on ext2 (I've tested it). Let's change vfat_lookup() to verify that both the alias and dentry have the same parent otherwise we're facing a loop. Thanks, //richard ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-19 16:36 ` Richard Weinberger @ 2014-10-23 15:28 ` OGAWA Hirofumi 2014-10-23 16:01 ` Al Viro 0 siblings, 1 reply; 27+ messages in thread From: OGAWA Hirofumi @ 2014-10-23 15:28 UTC (permalink / raw) To: Richard Weinberger; +Cc: Sami Liedes, linux-fsdevel, Al Viro Richard Weinberger <richard@nod.at> writes: >> Well, anyway, I don't object like that simple patch. >> >> My worry is, I feel we need something like online-fsck finally if we >> tackled fully to avoid issues (I still didn't analyze about this issue >> seriously and fully), and measurable overheads. >> >> And I myself have interest to online/runtime-fsck (i.e. detect and fix) >> though, I don't have interest to make it generic operations, and I would >> not have interest to tackle for all FSes... >> > > What about this one? Looks like strange. If we want to tackle this at per-FS. We should not return double linked dir at first. Since double linked breaks dir hierarchy, even if this one can avoid that Oops, double linked can be easily the cause of another Oops, deadlock, etc. Well, this patch is untested though. For example, somethings like following. But, again, this fixes only one of cases in double linked. (And to fix fully, my mind was already talked.) Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> --- fs/fat/namei_msdos.c | 9 +++++++++ fs/fat/namei_vfat.c | 9 +++++++++ 2 files changed, 18 insertions(+) diff -puN fs/fat/namei_vfat.c~vfat-detect-dir-loop fs/fat/namei_vfat.c --- linux-tux3/fs/fat/namei_vfat.c~vfat-detect-dir-loop 2014-10-20 17:44:17.874542711 +0900 +++ linux-tux3-hirofumi/fs/fat/namei_vfat.c 2014-10-20 17:42:24.494864616 +0900 @@ -735,6 +735,15 @@ static struct dentry *vfat_lookup(struct goto error; } + /* Simple sanity check to avoid Oops. */ + if (inode == dentry->d_parent->d_inode) { + fat_fs_error(sb, "%s: detected directory loop (i_pos %lld)", + __func__, MSDOS_I(inode)->i_pos); + iput(inode); + err = -EIO; + goto error; + } + alias = d_find_alias(inode); if (alias && !vfat_d_anon_disconn(alias)) { /* diff -puN fs/fat/namei_msdos.c~vfat-detect-dir-loop fs/fat/namei_msdos.c --- linux-tux3/fs/fat/namei_msdos.c~vfat-detect-dir-loop 2014-10-20 17:44:25.858520043 +0900 +++ linux-tux3-hirofumi/fs/fat/namei_msdos.c 2014-10-20 17:45:42.654302012 +0900 @@ -215,6 +215,15 @@ static struct dentry *msdos_lookup(struc case 0: inode = fat_build_inode(sb, sinfo.de, sinfo.i_pos); brelse(sinfo.bh); + + /* Simple sanity check to avoid Oops. */ + if (inode == dentry->d_parent->d_inode) { + fat_fs_error(sb, + "%s: detected directory loop (i_pos %lld)", + __func__, MSDOS_I(inode)->i_pos); + iput(inode); + inode = ERR_PTR(-EIO); + } break; default: inode = ERR_PTR(err); _ -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 15:28 ` OGAWA Hirofumi @ 2014-10-23 16:01 ` Al Viro 2014-10-23 16:16 ` Al Viro 0 siblings, 1 reply; 27+ messages in thread From: Al Viro @ 2014-10-23 16:01 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Richard Weinberger, Sami Liedes, linux-fsdevel On Fri, Oct 24, 2014 at 12:28:58AM +0900, OGAWA Hirofumi wrote: > > What about this one? > > Looks like strange. If we want to tackle this at per-FS. We should not > return double linked dir at first. Since double linked breaks dir > hierarchy, even if this one can avoid that Oops, double linked can be > easily the cause of another Oops, deadlock, etc. > > Well, this patch is untested though. For example, somethings like > following. But, again, this fixes only one of cases in double linked. > (And to fix fully, my mind was already talked.) Hmm... Why hadn't d_splice_alias() caught that, though? Look: in that case we see that inode is non-NULL, a directory and has an alias (namely, dentry->d_parent). So we hit this: new = __d_find_any_alias(inode); if (new) { if (!IS_ROOT(new)) { spin_unlock(&inode->i_lock); dput(new); return ERR_PTR(-EIO); } if (d_ancestor(new, dentry)) { spin_unlock(&inode->i_lock); dput(new); return ERR_PTR(-EIO); } and depending on whether that ->d_parent had been the filesystem root, we hit either the former or the latter. IOW, we should've done exactly that... FWIW, there *is* a bug in that path - we ought to have done iput(inode) on both failure exits in order to follow the calling conventions. But that doesn't look like it would oops right there... Could somebody repost the oops stack trace? The bug in d_splice_alias() is real (and fairly old), but I'd like to understand if there's anything else in the game... ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 16:01 ` Al Viro @ 2014-10-23 16:16 ` Al Viro 2014-10-23 16:45 ` OGAWA Hirofumi 0 siblings, 1 reply; 27+ messages in thread From: Al Viro @ 2014-10-23 16:16 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Richard Weinberger, Sami Liedes, linux-fsdevel On Thu, Oct 23, 2014 at 05:01:06PM +0100, Al Viro wrote: > Hmm... Why hadn't d_splice_alias() caught that, though? Aha. It's not namei_msdos.c part, it's namei_vfat.c one. And there we don't call d_splice_alias() on the affected path... OK, so your check isn't enough. What we need there is this: if (alias && alias->d_parent == dentry->d_parent && ...) Otherwise that d_move() isn't safe at all. Moreover, for directories we don't want to bother with that codepath at all - d_splice_alias() will do that d_move() just fine there. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 16:16 ` Al Viro @ 2014-10-23 16:45 ` OGAWA Hirofumi 2014-10-23 16:50 ` OGAWA Hirofumi ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: OGAWA Hirofumi @ 2014-10-23 16:45 UTC (permalink / raw) To: Al Viro; +Cc: Richard Weinberger, Sami Liedes, linux-fsdevel Al Viro <viro@ZenIV.linux.org.uk> writes: > On Thu, Oct 23, 2014 at 05:01:06PM +0100, Al Viro wrote: > >> Hmm... Why hadn't d_splice_alias() caught that, though? > > Aha. It's not namei_msdos.c part, it's namei_vfat.c one. And there > we don't call d_splice_alias() on the affected path... > > OK, so your check isn't enough. What we need there is this: > if (alias && alias->d_parent == dentry->d_parent && ...) > Otherwise that d_move() isn't safe at all. Hm, sounds like I'm missing something. what case has different ->d_parent on alias if prevented by my check? > Moreover, for directories we don't want to bother with that codepath > at all - d_splice_alias() will do that d_move() just fine there. d_splice_alias() calls __d_find_alias() with want_discon==1, so __d_find_alias() doesn't return dentry, and d_splice_alias() doesn't use d_move() path, right? -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 16:45 ` OGAWA Hirofumi @ 2014-10-23 16:50 ` OGAWA Hirofumi 2014-10-23 16:55 ` Richard Weinberger 2014-10-23 16:55 ` Al Viro 2014-10-23 16:56 ` Al Viro 2 siblings, 1 reply; 27+ messages in thread From: OGAWA Hirofumi @ 2014-10-23 16:50 UTC (permalink / raw) To: Al Viro; +Cc: Richard Weinberger, Sami Liedes, linux-fsdevel OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes: > Al Viro <viro@ZenIV.linux.org.uk> writes: > >> On Thu, Oct 23, 2014 at 05:01:06PM +0100, Al Viro wrote: >> >>> Hmm... Why hadn't d_splice_alias() caught that, though? >> >> Aha. It's not namei_msdos.c part, it's namei_vfat.c one. And there >> we don't call d_splice_alias() on the affected path... >> >> OK, so your check isn't enough. What we need there is this: >> if (alias && alias->d_parent == dentry->d_parent && ...) >> Otherwise that d_move() isn't safe at all. > > Hm, sounds like I'm missing something. what case has different > ->d_parent on alias if prevented by my check? You meant the case of more complex double linked loop? -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 16:50 ` OGAWA Hirofumi @ 2014-10-23 16:55 ` Richard Weinberger 0 siblings, 0 replies; 27+ messages in thread From: Richard Weinberger @ 2014-10-23 16:55 UTC (permalink / raw) To: OGAWA Hirofumi, Al Viro; +Cc: Sami Liedes, linux-fsdevel Am 23.10.2014 um 18:50 schrieb OGAWA Hirofumi: > OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes: > >> Al Viro <viro@ZenIV.linux.org.uk> writes: >> >>> On Thu, Oct 23, 2014 at 05:01:06PM +0100, Al Viro wrote: >>> >>>> Hmm... Why hadn't d_splice_alias() caught that, though? >>> >>> Aha. It's not namei_msdos.c part, it's namei_vfat.c one. And there >>> we don't call d_splice_alias() on the affected path... >>> >>> OK, so your check isn't enough. What we need there is this: >>> if (alias && alias->d_parent == dentry->d_parent && ...) >>> Otherwise that d_move() isn't safe at all. >> >> Hm, sounds like I'm missing something. what case has different >> ->d_parent on alias if prevented by my check? > > You meant the case of more complex double linked loop? Yes. This is why I came up with the alias->d_parent == dentry->d_parent check. Thanks, //richard ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 16:45 ` OGAWA Hirofumi 2014-10-23 16:50 ` OGAWA Hirofumi @ 2014-10-23 16:55 ` Al Viro 2014-10-23 17:21 ` Al Viro ` (2 more replies) 2014-10-23 16:56 ` Al Viro 2 siblings, 3 replies; 27+ messages in thread From: Al Viro @ 2014-10-23 16:55 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Richard Weinberger, Sami Liedes, linux-fsdevel On Fri, Oct 24, 2014 at 01:45:49AM +0900, OGAWA Hirofumi wrote: > d_splice_alias() calls __d_find_alias() with want_discon==1, so > __d_find_alias() doesn't return dentry, and d_splice_alias() doesn't use > d_move() path, right? Hmm... Not in the current mainline (and not because of want_discon - that's gone already). However, with the fixes I've got in the local tree it will both find and move it - same as d_materialise_unique() would in the current mainline. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 16:55 ` Al Viro @ 2014-10-23 17:21 ` Al Viro 2014-10-23 17:58 ` OGAWA Hirofumi 2014-10-23 20:46 ` Sami Liedes 2014-10-23 17:35 ` OGAWA Hirofumi 2014-10-23 17:54 ` J. Bruce Fields 2 siblings, 2 replies; 27+ messages in thread From: Al Viro @ 2014-10-23 17:21 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Richard Weinberger, Sami Liedes, linux-fsdevel On Thu, Oct 23, 2014 at 05:55:33PM +0100, Al Viro wrote: > On Fri, Oct 24, 2014 at 01:45:49AM +0900, OGAWA Hirofumi wrote: > > > d_splice_alias() calls __d_find_alias() with want_discon==1, so > > __d_find_alias() doesn't return dentry, and d_splice_alias() doesn't use > > d_move() path, right? > > Hmm... Not in the current mainline (and not because of want_discon - that's > gone already). However, with the fixes I've got in the local tree it > will both find and move it - same as d_materialise_unique() would in the > current mainline. Untested interim fix follows; as soon as d_splice_alias()/d_materialise_unique() merge happens, we'll be able to clean vfat_lookup() a bit more. a) don't bother with ->d_time for positives - we only check it for negatives anyway. b) make sure to set it at unlink and rmdir time - at *that* point soon-to-be negative dentry matches then-current directory contents c) don't go into renaming of old alias in vfat_lookup() unless it has the same parent (which it will, unless we are seeing corrupted image) *and* is a non-directory d) use (for now) d_materialise_unique() instead of d_splice_alias() - that one will do renames of old directory aliases just fine (and pretty soon so will d_splice_alias(), but this bug is -stable fodder) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c index 6df8d3d..eed856f 100644 --- a/fs/fat/namei_vfat.c +++ b/fs/fat/namei_vfat.c @@ -736,17 +736,17 @@ static struct dentry *vfat_lookup(struct inode *dir, struct dentry *dentry, } alias = d_find_alias(inode); - if (alias && !vfat_d_anon_disconn(alias)) { + if (alias && alias->d_parent == dentry->d_parent && + !S_ISDIR(inode->i_mode) && !vfat_d_anon_disconn(alias)) { /* - * This inode has non anonymous-DCACHE_DISCONNECTED + * This file has non anonymous-DCACHE_DISCONNECTED * dentry. This means, the user did ->lookup() by an * another name (longname vs 8.3 alias of it) in past. * * Switch to new one for reason of locality if possible. */ BUG_ON(d_unhashed(alias)); - if (!S_ISDIR(inode->i_mode)) - d_move(alias, dentry); + d_move(alias, dentry); iput(inode); mutex_unlock(&MSDOS_SB(sb)->s_lock); return alias; @@ -755,12 +755,9 @@ static struct dentry *vfat_lookup(struct inode *dir, struct dentry *dentry, out: mutex_unlock(&MSDOS_SB(sb)->s_lock); - dentry->d_time = dentry->d_parent->d_inode->i_version; - dentry = d_splice_alias(inode, dentry); - if (dentry) - dentry->d_time = dentry->d_parent->d_inode->i_version; - return dentry; - + if (!inode) + dentry->d_time = dir->i_version; + return d_materialise_unique(dentry, inode); error: mutex_unlock(&MSDOS_SB(sb)->s_lock); return ERR_PTR(err); @@ -793,7 +790,6 @@ static int vfat_create(struct inode *dir, struct dentry *dentry, umode_t mode, inode->i_mtime = inode->i_atime = inode->i_ctime = ts; /* timestamp is already written, so mark_inode_dirty() is unneeded. */ - dentry->d_time = dentry->d_parent->d_inode->i_version; d_instantiate(dentry, inode); out: mutex_unlock(&MSDOS_SB(sb)->s_lock); @@ -824,6 +820,7 @@ static int vfat_rmdir(struct inode *dir, struct dentry *dentry) clear_nlink(inode); inode->i_mtime = inode->i_atime = CURRENT_TIME_SEC; fat_detach(inode); + dentry->d_time = dir->i_version; out: mutex_unlock(&MSDOS_SB(sb)->s_lock); @@ -849,6 +846,7 @@ static int vfat_unlink(struct inode *dir, struct dentry *dentry) clear_nlink(inode); inode->i_mtime = inode->i_atime = CURRENT_TIME_SEC; fat_detach(inode); + dentry->d_time = dir->i_version; out: mutex_unlock(&MSDOS_SB(sb)->s_lock); @@ -889,7 +887,6 @@ static int vfat_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) inode->i_mtime = inode->i_atime = inode->i_ctime = ts; /* timestamp is already written, so mark_inode_dirty() is unneeded. */ - dentry->d_time = dentry->d_parent->d_inode->i_version; d_instantiate(dentry, inode); mutex_unlock(&MSDOS_SB(sb)->s_lock); ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 17:21 ` Al Viro @ 2014-10-23 17:58 ` OGAWA Hirofumi 2014-10-23 20:46 ` Sami Liedes 1 sibling, 0 replies; 27+ messages in thread From: OGAWA Hirofumi @ 2014-10-23 17:58 UTC (permalink / raw) To: Al Viro; +Cc: Richard Weinberger, Sami Liedes, linux-fsdevel Al Viro <viro@ZenIV.linux.org.uk> writes: > On Thu, Oct 23, 2014 at 05:55:33PM +0100, Al Viro wrote: >> On Fri, Oct 24, 2014 at 01:45:49AM +0900, OGAWA Hirofumi wrote: >> >> > d_splice_alias() calls __d_find_alias() with want_discon==1, so >> > __d_find_alias() doesn't return dentry, and d_splice_alias() doesn't use >> > d_move() path, right? >> >> Hmm... Not in the current mainline (and not because of want_discon - that's >> gone already). However, with the fixes I've got in the local tree it >> will both find and move it - same as d_materialise_unique() would in the >> current mainline. > > Untested interim fix follows; as soon as d_splice_alias()/d_materialise_unique() > merge happens, we'll be able to clean vfat_lookup() a bit more. > > a) don't bother with ->d_time for positives - we only check it for negatives > anyway. > b) make sure to set it at unlink and rmdir time - at *that* point soon-to-be > negative dentry matches then-current directory contents > c) don't go into renaming of old alias in vfat_lookup() unless it has > the same parent (which it will, unless we are seeing corrupted image) *and* > is a non-directory > d) use (for now) d_materialise_unique() instead of d_splice_alias() - that one > will do renames of old directory aliases just fine (and pretty soon so will > d_splice_alias(), but this bug is -stable fodder) Ah, this calls d_move() for directory? I recalled why it didn't change alias if directory. Changing result of getcwd() may become the surprise of apps... If doesn't surprise, looks good. > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > --- > diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c > index 6df8d3d..eed856f 100644 > --- a/fs/fat/namei_vfat.c > +++ b/fs/fat/namei_vfat.c > @@ -736,17 +736,17 @@ static struct dentry *vfat_lookup(struct inode *dir, struct dentry *dentry, > } > > alias = d_find_alias(inode); > - if (alias && !vfat_d_anon_disconn(alias)) { > + if (alias && alias->d_parent == dentry->d_parent && > + !S_ISDIR(inode->i_mode) && !vfat_d_anon_disconn(alias)) { > /* > - * This inode has non anonymous-DCACHE_DISCONNECTED > + * This file has non anonymous-DCACHE_DISCONNECTED > * dentry. This means, the user did ->lookup() by an > * another name (longname vs 8.3 alias of it) in past. > * > * Switch to new one for reason of locality if possible. > */ > BUG_ON(d_unhashed(alias)); > - if (!S_ISDIR(inode->i_mode)) > - d_move(alias, dentry); > + d_move(alias, dentry); > iput(inode); > mutex_unlock(&MSDOS_SB(sb)->s_lock); > return alias; > @@ -755,12 +755,9 @@ static struct dentry *vfat_lookup(struct inode *dir, struct dentry *dentry, > > out: > mutex_unlock(&MSDOS_SB(sb)->s_lock); > - dentry->d_time = dentry->d_parent->d_inode->i_version; > - dentry = d_splice_alias(inode, dentry); > - if (dentry) > - dentry->d_time = dentry->d_parent->d_inode->i_version; > - return dentry; > - > + if (!inode) > + dentry->d_time = dir->i_version; > + return d_materialise_unique(dentry, inode); > error: > mutex_unlock(&MSDOS_SB(sb)->s_lock); > return ERR_PTR(err); > @@ -793,7 +790,6 @@ static int vfat_create(struct inode *dir, struct dentry *dentry, umode_t mode, > inode->i_mtime = inode->i_atime = inode->i_ctime = ts; > /* timestamp is already written, so mark_inode_dirty() is unneeded. */ > > - dentry->d_time = dentry->d_parent->d_inode->i_version; > d_instantiate(dentry, inode); > out: > mutex_unlock(&MSDOS_SB(sb)->s_lock); > @@ -824,6 +820,7 @@ static int vfat_rmdir(struct inode *dir, struct dentry *dentry) > clear_nlink(inode); > inode->i_mtime = inode->i_atime = CURRENT_TIME_SEC; > fat_detach(inode); > + dentry->d_time = dir->i_version; > out: > mutex_unlock(&MSDOS_SB(sb)->s_lock); > > @@ -849,6 +846,7 @@ static int vfat_unlink(struct inode *dir, struct dentry *dentry) > clear_nlink(inode); > inode->i_mtime = inode->i_atime = CURRENT_TIME_SEC; > fat_detach(inode); > + dentry->d_time = dir->i_version; > out: > mutex_unlock(&MSDOS_SB(sb)->s_lock); > > @@ -889,7 +887,6 @@ static int vfat_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) > inode->i_mtime = inode->i_atime = inode->i_ctime = ts; > /* timestamp is already written, so mark_inode_dirty() is unneeded. */ > > - dentry->d_time = dentry->d_parent->d_inode->i_version; > d_instantiate(dentry, inode); > > mutex_unlock(&MSDOS_SB(sb)->s_lock); -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 17:21 ` Al Viro 2014-10-23 17:58 ` OGAWA Hirofumi @ 2014-10-23 20:46 ` Sami Liedes 1 sibling, 0 replies; 27+ messages in thread From: Sami Liedes @ 2014-10-23 20:46 UTC (permalink / raw) To: Al Viro; +Cc: OGAWA Hirofumi, Richard Weinberger, linux-fsdevel [-- Attachment #1: Type: text/plain, Size: 1174 bytes --] On Thu, Oct 23, 2014 at 06:21:58PM +0100, Al Viro wrote: > Untested interim fix follows; as soon as d_splice_alias()/d_materialise_unique() > merge happens, we'll be able to clean vfat_lookup() a bit more. > > a) don't bother with ->d_time for positives - we only check it for negatives > anyway. > b) make sure to set it at unlink and rmdir time - at *that* point soon-to-be > negative dentry matches then-current directory contents > c) don't go into renaming of old alias in vfat_lookup() unless it has > the same parent (which it will, unless we are seeing corrupted image) *and* > is a non-directory > d) use (for now) d_materialise_unique() instead of d_splice_alias() - that one > will do renames of old directory aliases just fine (and pretty soon so will > d_splice_alias(), but this bug is -stable fodder) > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Sami Liedes <sami.liedes@iki.fi> I can verify that this patch fixes the crash on 3.17.1. However I have not tested that it doesn't break something else like non-fuzzed filesystems :-) I'm going to rerun the fuzz tests on vfat with this patch applied. Sami [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 16:55 ` Al Viro 2014-10-23 17:21 ` Al Viro @ 2014-10-23 17:35 ` OGAWA Hirofumi 2014-10-23 17:54 ` J. Bruce Fields 2 siblings, 0 replies; 27+ messages in thread From: OGAWA Hirofumi @ 2014-10-23 17:35 UTC (permalink / raw) To: Al Viro; +Cc: Richard Weinberger, Sami Liedes, linux-fsdevel Al Viro <viro@ZenIV.linux.org.uk> writes: > On Fri, Oct 24, 2014 at 01:45:49AM +0900, OGAWA Hirofumi wrote: > >> d_splice_alias() calls __d_find_alias() with want_discon==1, so >> __d_find_alias() doesn't return dentry, and d_splice_alias() doesn't use >> d_move() path, right? > > Hmm... Not in the current mainline (and not because of want_discon - that's > gone already). However, with the fixes I've got in the local tree it > will both find and move it - same as d_materialise_unique() would in the > current mainline. Ah, I see. Checked latest linus tree, looks like added some checks (I was still working on 3.6.15, sorry). So, Richard, can you add comment such as /* Check FS corruption, will handle by d_splice_alias() */ for that less understandable check, and resend patch with your Signed-off-by: (better to Cc: akpm)? Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Thanks. -- OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 16:55 ` Al Viro 2014-10-23 17:21 ` Al Viro 2014-10-23 17:35 ` OGAWA Hirofumi @ 2014-10-23 17:54 ` J. Bruce Fields 2014-10-23 18:05 ` Al Viro 2 siblings, 1 reply; 27+ messages in thread From: J. Bruce Fields @ 2014-10-23 17:54 UTC (permalink / raw) To: Al Viro; +Cc: OGAWA Hirofumi, Richard Weinberger, Sami Liedes, linux-fsdevel On Thu, Oct 23, 2014 at 05:55:33PM +0100, Al Viro wrote: > On Fri, Oct 24, 2014 at 01:45:49AM +0900, OGAWA Hirofumi wrote: > > > d_splice_alias() calls __d_find_alias() with want_discon==1, so > > __d_find_alias() doesn't return dentry, and d_splice_alias() doesn't use > > d_move() path, right? > > Hmm... Not in the current mainline (and not because of want_discon - that's > gone already). However, with the fixes I've got in the local tree it > will both find and move it - same as d_materialise_unique() would in the > current mainline. What happened to the idea of failing with -EIO in that case? In the case of a disk filesystem it seems like it can only happen due to corruption and I'd assumed we're better off failing the first time we notice it. (That said, I haven't thought it through beyond that, so it's not something I have that strong an opinion on.) --b. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 17:54 ` J. Bruce Fields @ 2014-10-23 18:05 ` Al Viro 2014-10-23 18:16 ` J. Bruce Fields 0 siblings, 1 reply; 27+ messages in thread From: Al Viro @ 2014-10-23 18:05 UTC (permalink / raw) To: J. Bruce Fields Cc: OGAWA Hirofumi, Richard Weinberger, Sami Liedes, linux-fsdevel On Thu, Oct 23, 2014 at 01:54:28PM -0400, J. Bruce Fields wrote: > > Hmm... Not in the current mainline (and not because of want_discon - that's > > gone already). However, with the fixes I've got in the local tree it > > will both find and move it - same as d_materialise_unique() would in the > > current mainline. > > What happened to the idea of failing with -EIO in that case? Quiet d_move() is definitely better in this case. Failing with EIO is what d_splice_alias() does *now*, and that's why the patch I've posted uses d_materialise_unique(). Patches in my local tree switch that case of d_splice_alias() to what d_materialise_unique() does (and kill d_materialise_unique() completely - compat #define is left, but all in-tree users are switched to d_splice_alias())... ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 18:05 ` Al Viro @ 2014-10-23 18:16 ` J. Bruce Fields 0 siblings, 0 replies; 27+ messages in thread From: J. Bruce Fields @ 2014-10-23 18:16 UTC (permalink / raw) To: Al Viro; +Cc: OGAWA Hirofumi, Richard Weinberger, Sami Liedes, linux-fsdevel On Thu, Oct 23, 2014 at 07:05:11PM +0100, Al Viro wrote: > On Thu, Oct 23, 2014 at 01:54:28PM -0400, J. Bruce Fields wrote: > > > > Hmm... Not in the current mainline (and not because of want_discon - that's > > > gone already). However, with the fixes I've got in the local tree it > > > will both find and move it - same as d_materialise_unique() would in the > > > current mainline. > > > > What happened to the idea of failing with -EIO in that case? > > Quiet d_move() is definitely better in this case. Can you explain why? (Apologies if I'm missing the obvious.) --b. > Failing with EIO is what > d_splice_alias() does *now*, and that's why the patch I've posted uses > d_materialise_unique(). Patches in my local tree switch that case of > d_splice_alias() to what d_materialise_unique() does (and kill > d_materialise_unique() completely - compat #define is left, but all > in-tree users are switched to d_splice_alias())... ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Intentionally corrupted vfat fs causing BUG 2014-10-23 16:45 ` OGAWA Hirofumi 2014-10-23 16:50 ` OGAWA Hirofumi 2014-10-23 16:55 ` Al Viro @ 2014-10-23 16:56 ` Al Viro 2 siblings, 0 replies; 27+ messages in thread From: Al Viro @ 2014-10-23 16:56 UTC (permalink / raw) To: OGAWA Hirofumi; +Cc: Richard Weinberger, Sami Liedes, linux-fsdevel On Fri, Oct 24, 2014 at 01:45:49AM +0900, OGAWA Hirofumi wrote: > Hm, sounds like I'm missing something. what case has different > ->d_parent on alias if prevented by my check? A cross-directory link. ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2014-10-23 20:46 UTC | newest] Thread overview: 27+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-10-10 20:57 Intentionally corrupted vfat fs causing BUG Sami Liedes 2014-10-11 10:20 ` Richard Weinberger 2014-10-12 12:08 ` OGAWA Hirofumi 2014-10-12 19:04 ` Richard Weinberger 2014-10-12 20:40 ` Sami Liedes 2014-10-13 7:57 ` OGAWA Hirofumi 2014-10-13 8:22 ` Richard Weinberger 2014-10-13 8:35 ` OGAWA Hirofumi 2014-10-13 8:39 ` Richard Weinberger 2014-10-13 8:59 ` OGAWA Hirofumi 2014-10-13 14:36 ` Richard Weinberger 2014-10-19 16:36 ` Richard Weinberger 2014-10-23 15:28 ` OGAWA Hirofumi 2014-10-23 16:01 ` Al Viro 2014-10-23 16:16 ` Al Viro 2014-10-23 16:45 ` OGAWA Hirofumi 2014-10-23 16:50 ` OGAWA Hirofumi 2014-10-23 16:55 ` Richard Weinberger 2014-10-23 16:55 ` Al Viro 2014-10-23 17:21 ` Al Viro 2014-10-23 17:58 ` OGAWA Hirofumi 2014-10-23 20:46 ` Sami Liedes 2014-10-23 17:35 ` OGAWA Hirofumi 2014-10-23 17:54 ` J. Bruce Fields 2014-10-23 18:05 ` Al Viro 2014-10-23 18:16 ` J. Bruce Fields 2014-10-23 16:56 ` Al Viro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).