* torrent hash failures since 3.9.0-rc1
From: Markus Trippelsdorf @ 2013-03-11 17:18 UTC
To: linux-ext4, linux-kernel

I get hash failures on "completed" torrents since 3.9.0-rc1 (Linux 3.8
seems to be fine). What happens is that the torrents apparently complete
successfully. After reboot however the hash check fails and there are
missing (or corrupted) chunks. I've tested this with two different
clients (rtorrent and aria2c) and both are affected. So I think this
might be a filesystem issue.

/dev/sda       ext4      1.4T  666G  640G  51% /var
/dev/sda on /var type ext4 (rw,noatime,data=ordered)

I use ECC memory (and there is nothing in the logs).

--
Markus

* Re: torrent hash failures since 3.9.0-rc1
From: Markus Trippelsdorf @ 2013-03-11 19:17 UTC
To: linux-ext4, linux-kernel

On 2013.03.11 at 18:18 +0100, Markus Trippelsdorf wrote:
> I get hash failures on "completed" torrents since 3.9.0-rc1 (Linux 3.8
> seems to be fine). What happens is that the torrents apparently complete
> successfully. After reboot however the hash check fails and there are
> missing (or corrupted) chunks. I've tested this with two different
> clients (rtorrent and aria2c) and both are affected. So I think this
> might be a filesystem issue.
>
> /dev/sda       ext4      1.4T  666G  640G  51% /var
> /dev/sda on /var type ext4 (rw,noatime,data=ordered)
>
> I use ECC memory (and there is nothing in the logs).

To reproduce this issue just do the following:

 % wget http://torrents.linuxmint.com/torrents/linuxmint-12-gnome-cd-nocodecs-64bit.iso.torrent
 % rtorrent linuxmint-12-gnome-cd-nocodecs-64bit.iso.torrent
 (Wait until the torrent finishes)
 % echo 3 | sudo tee /proc/sys/vm/drop_caches
 (Rehash the torrent (Ctrl-R))

The torrent doesn't rehash successfully and a few hunks are
missing/corrupted and need to be downloaded again.

--
Markus

* Re: torrent hash failures since 3.9.0-rc1
From: Dave Jones @ 2013-03-11 19:41 UTC
To: Markus Trippelsdorf
Cc: linux-ext4, linux-kernel

On Mon, Mar 11, 2013 at 08:17:53PM +0100, Markus Trippelsdorf wrote:
> On 2013.03.11 at 18:18 +0100, Markus Trippelsdorf wrote:
> > I get hash failures on "completed" torrents since 3.9.0-rc1 (Linux 3.8
> > seems to be fine). What happens is that the torrents apparently complete
> > successfully. After reboot however the hash check fails and there are
> > missing (or corrupted) chunks. I've tested this with two different
> > clients (rtorrent and aria2c) and both are affected. So I think this
> > might be a filesystem issue.
> >
> > /dev/sda       ext4      1.4T  666G  640G  51% /var
> > /dev/sda on /var type ext4 (rw,noatime,data=ordered)
> >
> > I use ECC memory (and there is nothing in the logs).
>
> To reproduce this issue just do the following:
>
>  % wget http://torrents.linuxmint.com/torrents/linuxmint-12-gnome-cd-nocodecs-64bit.iso.torrent
>  % rtorrent linuxmint-12-gnome-cd-nocodecs-64bit.iso.torrent
>  (Wait until the torrent finishes)
>  % echo 3 | sudo tee /proc/sys/vm/drop_caches
>  (Rehash the torrent (Ctrl-R))
>
> The torrent doesn't rehash successfully and a few hunks are
> missing/corrupted and need to be downloaded again.

Worked fine for me on two separate machines. Could it be a network problem
perhaps? If something is mangling the packet before it hits the disk,
that would explain it. What NIC do you use?

Or maybe you could isolate it to a filesystem problem using something
like fsx?

	Dave

* Re: torrent hash failures since 3.9.0-rc1
From: Markus Trippelsdorf @ 2013-03-11 20:13 UTC
To: Dave Jones, linux-ext4, linux-kernel

On 2013.03.11 at 15:41 -0400, Dave Jones wrote:
> On Mon, Mar 11, 2013 at 08:17:53PM +0100, Markus Trippelsdorf wrote:
> > On 2013.03.11 at 18:18 +0100, Markus Trippelsdorf wrote:
> > > I get hash failures on "completed" torrents since 3.9.0-rc1 (Linux 3.8
> > > seems to be fine). What happens is that the torrents apparently complete
> > > successfully. After reboot however the hash check fails and there are
> > > missing (or corrupted) chunks. I've tested this with two different
> > > clients (rtorrent and aria2c) and both are affected. So I think this
> > > might be a filesystem issue.
> > >
> > > /dev/sda       ext4      1.4T  666G  640G  51% /var
> > > /dev/sda on /var type ext4 (rw,noatime,data=ordered)
> > >
> > > I use ECC memory (and there is nothing in the logs).
> >
> > To reproduce this issue just do the following:
> >
> >  % wget http://torrents.linuxmint.com/torrents/linuxmint-12-gnome-cd-nocodecs-64bit.iso.torrent
> >  % rtorrent linuxmint-12-gnome-cd-nocodecs-64bit.iso.torrent
> >  (Wait until the torrent finishes)
> >  % echo 3 | sudo tee /proc/sys/vm/drop_caches
> >  (Rehash the torrent (Ctrl-R))
> >
> > The torrent doesn't rehash successfully and a few hunks are
> > missing/corrupted and need to be downloaded again.
>
> Worked fine for me on two separate machines. Could it be a network problem
> perhaps? If something is mangling the packet before it hits the disk,
> that would explain it. What NIC do you use?

I normally use ATL1E, but I've dusted off my E100 and the issue is also
reproducible on the Intel card.

> Or maybe you could isolate it to a filesystem problem using something
> like fsx?

I've found fsx on your homepage, but I have no idea how to use this
tool. Any pointers?

--
Markus

* Re: torrent hash failures since 3.9.0-rc1
From: Theodore Ts'o @ 2013-03-11 20:37 UTC
To: Markus Trippelsdorf
Cc: Dave Jones, linux-ext4, linux-kernel

On Mon, Mar 11, 2013 at 09:13:34PM +0100, Markus Trippelsdorf wrote:
> On 2013.03.11 at 15:41 -0400, Dave Jones wrote:
> > Worked fine for me on two separate machines. Could it be a network problem
> > perhaps? If something is mangling the packet before it hits the disk,
> > that would explain it. What NIC do you use?

I'm not a torrent expert, but I thought it did enough checksumming
such that if the packet got mangled, it would get noticed by the
torrent client before it writes the chunks to disk?

> > Or maybe you could isolate it to a filesystem problem using something
> > like fsx?
>
> I've found fsx on your homepage, but I have no idea how to use this
> tool. Any pointers?

We actually run fsx in a number of different configurations as part of
our regression testing before we send Linus a pull request, and
haven't found any issues. So unless it's a hardware problem, it seems
unlikely to me that your running fsx would turn up anything.

Can you send a dumpe2fs -h of the file system in question, and what
mount options (if any) you are using? Thanks!!

BTW, I'm currently running 3.9-rc2 with some additional fixes from the
ext4 dev branch, and I'm not able to reproduce the problem using
rtorrent on my laptop. How reliably is it reproducing for you? Are
you seeing the problem every time you try this?

						- Ted

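The checksumming Ted refers to can be illustrated roughly as follows. BitTorrent (v1) metadata carries a SHA-1 digest per fixed-size piece, and a rehash recomputes those digests from what is on disk. This is only a sketch of the idea, not rtorrent's or aria2c's actual code; the function names and the 4096-byte piece length are assumptions chosen for illustration.

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_length: int) -> List[bytes]:
    """Return the SHA-1 digest of each fixed-size piece of `data`."""
    return [
        hashlib.sha1(data[off:off + piece_length]).digest()
        for off in range(0, len(data), piece_length)
    ]


def failed_pieces(data: bytes, piece_length: int, expected: List[bytes]) -> List[int]:
    """Return the indices of pieces whose digest differs from `expected`."""
    actual = piece_hashes(data, piece_length)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Hypothetical payload: 16 KiB split into four 4 KiB pieces.
payload = bytes(range(256)) * 64
expected = piece_hashes(payload, 4096)

# Flip one byte inside the second piece, as on-disk corruption would.
damaged = bytearray(payload)
damaged[5000] ^= 0xFF
print(failed_pieces(bytes(damaged), 4096, expected))  # prints [1]
```

Because the digests are checked when a piece arrives, a piece that passed at download time and fails at rehash time points at something that happened between the write and the later read, rather than at the network.
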
* Re: torrent hash failures since 3.9.0-rc1
From: Markus Trippelsdorf @ 2013-03-11 20:46 UTC
To: Theodore Ts'o, Dave Jones, linux-ext4, linux-kernel

On 2013.03.11 at 16:37 -0400, Theodore Ts'o wrote:
> On Mon, Mar 11, 2013 at 09:13:34PM +0100, Markus Trippelsdorf wrote:
> > On 2013.03.11 at 15:41 -0400, Dave Jones wrote:
> > > Worked fine for me on two separate machines. Could it be a network problem
> > > perhaps? If something is mangling the packet before it hits the disk,
> > > that would explain it. What NIC do you use?
>
> I'm not a torrent expert, but I thought it did enough checksumming
> such that if the packet got mangled, it would get noticed by the
> torrent client before it writes the chunks to disk?

Yes, I think that's the idea.

> > > Or maybe you could isolate it to a filesystem problem using something
> > > like fsx?
> >
> > I've found fsx on your homepage, but I have no idea how to use this
> > tool. Any pointers?
>
> We actually run fsx in a number of different configurations as part of
> our regression testing before we send Linus a pull request, and
> haven't found any issues. So unless it's a hardware problem, it seems
> unlikely to me that your running fsx would turn up anything.

Yes, I let it run for a while anyway and it didn't report any failure.

> Can you send a dumpe2fs -h of the file system in question, and what
> mount options (if any) you are using? Thanks!!

# dumpe2fs -h /dev/sda
dumpe2fs 1.42.7 (21-Jan-2013)
Filesystem volume name:   <none>
Last mounted on:          /var
Filesystem UUID:          202f2c93-c6c5-4d70-a63f-d770161138bd
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              91578368
Block count:              366284646
Reserved block count:     18314232
Free blocks:              185850075
Free inodes:              90003798
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      936
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Mon Nov 19 16:02:46 2012
Last mount time:          Mon Mar 11 21:16:23 2013
Last write time:          Mon Mar 11 21:16:23 2013
Mount count:              20
Maximum mount count:      -1
Last checked:             Mon Mar 4 13:32:55 2013
Check interval:           0 (<none>)
Lifetime writes:          2891 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       60164803
Default directory hash:   half_md4
Directory Hash Seed:      e86f34a0-390a-49b6-87a9-3336d861ab81
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x00079bef
Journal start:            1

noatime is the only mount option.

> BTW, I'm currently running 3.9-rc2 with some additional fixes from the
> ext4 dev branch, and I'm not able to reproduce the problem using
> rtorrent on my laptop. How reliably is it reproducing for you? Are
> you seeing the problem every time you try this?

Yes, it's 100% reproducible for me. If I boot a 3.8 kernel the issue
vanishes.

--
Markus

* Re: torrent hash failures since 3.9.0-rc1
From: Theodore Ts'o @ 2013-03-11 21:18 UTC
To: Markus Trippelsdorf
Cc: Dave Jones, linux-ext4, linux-kernel

On Mon, Mar 11, 2013 at 09:46:25PM +0100, Markus Trippelsdorf wrote:
> > BTW, I'm currently running 3.9-rc2 with some additional fixes from the
> > ext4 dev branch, and I'm not able to reproduce the problem using
> > rtorrent on my laptop. How reliably is it reproducing for you? Are
> > you seeing the problem every time you try this?
>
> Yes, it's 100% reproducible for me. If I boot a 3.8 kernel the issue
> vanishes.

Would you be willing to try an experiment?

Try pulling down the master branch from the ext4 git tree here:

	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git

This contains all of the ext4 changes which are in 3.9-rc1, based on
top of 3.8-rc3. See if it reproduces there. If it does, then it
would tend to confirm the hypothesis that the issue was introduced by
one of the ext4 patches that we merged during the 3.9-rc1 merge
window... and then, since you can reproduce the problem, if you
could do a git bisect to find the guilty commit, that would be
greatly appreciated.

If you can't reproduce it from the ext4.git tree, then the problem is
probably caused by some other change that was introduced between 3.8
and 3.9-rc1.

Thanks in advance,

						- Ted

* Re: torrent hash failures since 3.9.0-rc1
From: Markus Trippelsdorf @ 2013-03-11 21:38 UTC
To: Theodore Ts'o, Dave Jones, linux-ext4, linux-kernel

On 2013.03.11 at 17:18 -0400, Theodore Ts'o wrote:
> On Mon, Mar 11, 2013 at 09:46:25PM +0100, Markus Trippelsdorf wrote:
> > > BTW, I'm currently running 3.9-rc2 with some additional fixes from the
> > > ext4 dev branch, and I'm not able to reproduce the problem using
> > > rtorrent on my laptop. How reliably is it reproducing for you? Are
> > > you seeing the problem every time you try this?
> >
> > Yes, it's 100% reproducible for me. If I boot a 3.8 kernel the issue
> > vanishes.
>
> Would you be willing to try an experiment?
>
> Try pulling down the master branch from the ext4 git tree here:
>
> 	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git
>
> This contains all of the ext4 changes which are in 3.9-rc1, based on
> top of 3.8-rc3. See if it reproduces there. If it does, then it
> would tend to confirm the hypothesis that the issue was introduced by
> one of the ext4 patches that we merged during the 3.9-rc1 merge
> window... and then, since you can reproduce the problem, if you
> could do a git bisect to find the guilty commit, that would be
> greatly appreciated.
>
> If you can't reproduce it from the ext4.git tree, then the problem is
> probably caused by some other change that was introduced between 3.8
> and 3.9-rc1.

I've started a full bisection from v3.8 to today's git tree. It will take
~13 steps. However it's already late here in Germany. I will continue
the bisection tomorrow and report back.

--
Markus

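The "~13 steps" estimate follows from bisection halving the candidate range on every test, so the step count grows with the base-2 logarithm of the number of commits. The sketch below shows this on a simplified linear history; real git history is a graph, and the figure of 8192 commits is a made-up round number used only to make the arithmetic visible, not the actual size of the v3.8 to 3.9-rc range.

```python
def bisect_first_bad(commits, is_bad):
    """Binary-search a linear history for the first bad commit.

    Assumes every commit before the first bad one is good, every commit
    from it onward is bad, and the last commit is known to be bad.
    Returns (first_bad_commit, number_of_commits_tested).
    """
    lo, hi = 0, len(commits) - 1  # the first bad commit lies in [lo, hi]
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1  # one kernel build-and-test cycle
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid + 1
    return commits[lo], steps


# Hypothetical history of 8192 commits, with the regression at index 5000.
history = list(range(8192))
culprit, steps = bisect_first_bad(history, lambda c: c >= 5000)
print(culprit, steps)  # prints 5000 13
```

Each step here stands for building and booting a kernel and rerunning the torrent test, which is why even 13 steps is an evening's work.
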
* Re: torrent hash failures since 3.9.0-rc1
From: Markus Trippelsdorf @ 2013-03-11 23:12 UTC
To: Theodore Ts'o, Dave Jones, linux-ext4, linux-kernel, Zheng Liu

On 2013.03.11 at 22:38 +0100, Markus Trippelsdorf wrote:
> On 2013.03.11 at 17:18 -0400, Theodore Ts'o wrote:
> > On Mon, Mar 11, 2013 at 09:46:25PM +0100, Markus Trippelsdorf wrote:
> > > > BTW, I'm currently running 3.9-rc2 with some additional fixes from the
> > > > ext4 dev branch, and I'm not able to reproduce the problem using
> > > > rtorrent on my laptop. How reliably is it reproducing for you? Are
> > > > you seeing the problem every time you try this?
> > >
> > > Yes, it's 100% reproducible for me. If I boot a 3.8 kernel the issue
> > > vanishes.
> >
> > Would you be willing to try an experiment?
> >
> > Try pulling down the master branch from the ext4 git tree here:
> >
> > 	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git
> >
> > This contains all of the ext4 changes which are in 3.9-rc1, based on
> > top of 3.8-rc3. See if it reproduces there. If it does, then it
> > would tend to confirm the hypothesis that the issue was introduced by
> > one of the ext4 patches that we merged during the 3.9-rc1 merge
> > window... and then, since you can reproduce the problem, if you
> > could do a git bisect to find the guilty commit, that would be
> > greatly appreciated.
> >
> > If you can't reproduce it from the ext4.git tree, then the problem is
> > probably caused by some other change that was introduced between 3.8
> > and 3.9-rc1.
>
> I've started a full bisection from v3.8 to today's git tree. It will take
> ~13 steps. However it's already late here in Germany. I will continue
> the bisection tomorrow and report back.

The issue started with:

74cd15cd02708c7188581f279f33a98b2ae8d322 is the first bad commit
commit 74cd15cd02708c7188581f279f33a98b2ae8d322
Author: Zheng Liu <wenqing.lz@taobao.com>
Date:   Mon Feb 18 00:32:55 2013 -0500

    ext4: reclaim extents from extent status tree

Please note that my local rtorrent version was configured with
"--with-posix-fallocate". I'm not sure if distributions also enable this
flag, but it could explain why Ted and Dave weren't able to reproduce
the problem so far.

--
Markus

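The I/O pattern that "--with-posix-fallocate" implies can be sketched as follows: the file is preallocated in full, then pieces are written into the preallocated region in arbitrary order and read back for comparison. This is an illustrative approximation, not rtorrent's implementation; the function names, piece size and piece count are assumptions. It is also not expected to trigger the bug on its own, since the sketch neither drops the page cache (which needs root) nor creates any memory pressure.

```python
import hashlib
import os
import random
import tempfile


def write_preallocated(path, pieces, piece_length):
    """Preallocate the whole file, then write the pieces in shuffled order."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Reserve space up front; the blocks are allocated but not yet written.
        os.posix_fallocate(fd, 0, len(pieces) * piece_length)
        order = list(range(len(pieces)))
        random.shuffle(order)  # torrent pieces arrive out of order
        for i in order:
            os.pwrite(fd, pieces[i], i * piece_length)
        os.fsync(fd)
    finally:
        os.close(fd)


def reread_mismatches(path, pieces, piece_length):
    """Re-read every piece from the file and return indices that differ."""
    bad = []
    with open(path, "rb") as f:
        for i, piece in enumerate(pieces):
            f.seek(i * piece_length)
            on_disk = hashlib.sha1(f.read(piece_length)).digest()
            if on_disk != hashlib.sha1(piece).digest():
                bad.append(i)
    return bad


# Hypothetical workload: 32 random pieces of 16 KiB each in a temp directory.
sample = [os.urandom(16384) for _ in range(32)]
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "payload.bin")
    write_preallocated(target, sample, 16384)
    # An empty list means every piece read back intact.
    print(reread_mismatches(target, sample, 16384))
```

To approximate the reported test more closely, the page cache would have to be dropped between the write and the re-read so that the data really comes back from disk.
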
* Re: torrent hash failures since 3.9.0-rc1
From: Dave Jones @ 2013-03-11 23:26 UTC
To: Markus Trippelsdorf
Cc: Theodore Ts'o, linux-ext4, linux-kernel, Zheng Liu

On Tue, Mar 12, 2013 at 12:12:27AM +0100, Markus Trippelsdorf wrote:
> > I've started a full bisection from v3.8 to today's git tree. It will take
> > ~13 steps. However it's already late here in Germany. I will continue
> > the bisection tomorrow and report back.
>
> The issue started with:
>
> 74cd15cd02708c7188581f279f33a98b2ae8d322 is the first bad commit
> commit 74cd15cd02708c7188581f279f33a98b2ae8d322
> Author: Zheng Liu <wenqing.lz@taobao.com>
> Date:   Mon Feb 18 00:32:55 2013 -0500
>
>     ext4: reclaim extents from extent status tree
>
> Please note that my local rtorrent version was configured with
> "--with-posix-fallocate". I'm not sure if distributions also enable this
> flag, but it could explain why Ted and Dave weren't able to reproduce
> the problem so far.

Looks like Fedora doesn't, so indeed that could explain it.

	Dave

* Re: torrent hash failures since 3.9.0-rc1
From: Zheng Liu @ 2013-03-12 3:00 UTC
To: Markus Trippelsdorf
Cc: Theodore Ts'o, Dave Jones, linux-ext4, linux-kernel, Zheng Liu

On Tue, Mar 12, 2013 at 12:12:27AM +0100, Markus Trippelsdorf wrote:
> On 2013.03.11 at 22:38 +0100, Markus Trippelsdorf wrote:
> > On 2013.03.11 at 17:18 -0400, Theodore Ts'o wrote:
> > > On Mon, Mar 11, 2013 at 09:46:25PM +0100, Markus Trippelsdorf wrote:
> > > > > BTW, I'm currently running 3.9-rc2 with some additional fixes from the
> > > > > ext4 dev branch, and I'm not able to reproduce the problem using
> > > > > rtorrent on my laptop. How reliably is it reproducing for you? Are
> > > > > you seeing the problem every time you try this?
> > > >
> > > > Yes, it's 100% reproducible for me. If I boot a 3.8 kernel the issue
> > > > vanishes.
> > >
> > > Would you be willing to try an experiment?
> > >
> > > Try pulling down the master branch from the ext4 git tree here:
> > >
> > > 	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git
> > >
> > > This contains all of the ext4 changes which are in 3.9-rc1, based on
> > > top of 3.8-rc3. See if it reproduces there. If it does, then it
> > > would tend to confirm the hypothesis that the issue was introduced by
> > > one of the ext4 patches that we merged during the 3.9-rc1 merge
> > > window... and then, since you can reproduce the problem, if you
> > > could do a git bisect to find the guilty commit, that would be
> > > greatly appreciated.
> > >
> > > If you can't reproduce it from the ext4.git tree, then the problem is
> > > probably caused by some other change that was introduced between 3.8
> > > and 3.9-rc1.
> >
> > I've started a full bisection from v3.8 to today's git tree. It will take
> > ~13 steps. However it's already late here in Germany. I will continue
> > the bisection tomorrow and report back.
>
> The issue started with:
>
> 74cd15cd02708c7188581f279f33a98b2ae8d322 is the first bad commit
> commit 74cd15cd02708c7188581f279f33a98b2ae8d322
> Author: Zheng Liu <wenqing.lz@taobao.com>
> Date:   Mon Feb 18 00:32:55 2013 -0500
>
>     ext4: reclaim extents from extent status tree
>
> Please note that my local rtorrent version was configured with
> "--with-posix-fallocate". I'm not sure if distributions also enable this
> flag, but it could explain why Ted and Dave weren't able to reproduce
> the problem so far.

Hi Markus,

Thanks for reporting this problem. My deepest apologies.

As Ted suggested, could you please try the ext4 git tree? I want to
check whether this bug has already been fixed by my latest patch
series.

Thanks in advance,
						- Zheng

* Re: torrent hash failures since 3.9.0-rc1
From: Theodore Ts'o @ 2013-03-12 3:30 UTC
To: Markus Trippelsdorf, Dave Jones, linux-ext4, linux-kernel, Zheng Liu

On Tue, Mar 12, 2013 at 11:00:58AM +0800, Zheng Liu wrote:
>
> Thanks for reporting this problem. My deepest apologies.
>
> As Ted suggested, could you please try the ext4 git tree? I want to
> check whether this bug has already been fixed by my latest patch
> series.

It's definitely worth a try to compile the master branch of the ext4
tree and see if it reproduces or not.

However, I suspect the problem will still be there. Based on the
commit which Markus has identified, I'm guessing it's a race between
the extents_status shrinker and writing into an uninitialized region
of the file (since apparently compiling rtorrent with
--with-posix-fallocate is required).

Markus, how much memory do you have in your system? That may be the
other reason why I haven't been able to reproduce it to date; I had a
lot of free memory when I tried to reproduce the problem, so the slab
shrinker didn't engage.

Regards,

						- Ted

* Re: torrent hash failures since 3.9.0-rc1
From: Theodore Ts'o @ 2013-03-12 3:44 UTC
To: Markus Trippelsdorf, Dave Jones, linux-ext4, linux-kernel, Zheng Liu

On Mon, Mar 11, 2013 at 11:30:54PM -0400, Theodore Ts'o wrote:
> > As Ted suggested, could you please try the ext4 git tree? I want to
> > check whether this bug has already been fixed by my latest patch
> > series.
>
> It's definitely worth a try to compile the master branch of the ext4
> tree and see if it reproduces or not.

Sorry, what you should try is the dev branch of the ext4 tree. That
has the new patches that we are currently QA'ing for 3.9-rc3
(hopefully).

						- Ted

* Re: torrent hash failures since 3.9.0-rc1
From: Markus Trippelsdorf @ 2013-03-12 6:16 UTC
To: Theodore Ts'o, Dave Jones, linux-ext4, linux-kernel, Zheng Liu

On 2013.03.11 at 23:30 -0400, Theodore Ts'o wrote:
> On Tue, Mar 12, 2013 at 11:00:58AM +0800, Zheng Liu wrote:
> >
> > Thanks for reporting this problem. My deepest apologies.
> >
> > As Ted suggested, could you please try the ext4 git tree? I want to
> > check whether this bug has already been fixed by my latest patch
> > series.
>
> It's definitely worth a try to compile the master branch of the ext4
> tree and see if it reproduces or not.

I cannot reproduce the issue on top of "ext4.git dev", so fortunately
the problem seems to be already fixed there.
Thanks.

Do you guys have a hunch which commit is the actual fix?
(Maybe I will "bisect" it later today.)

--
Markus

* Re: torrent hash failures since 3.9.0-rc1
From: Zheng Liu @ 2013-03-12 6:44 UTC
To: Markus Trippelsdorf
Cc: Theodore Ts'o, Dave Jones, linux-ext4, linux-kernel, Zheng Liu

On Tue, Mar 12, 2013 at 07:16:24AM +0100, Markus Trippelsdorf wrote:
> On 2013.03.11 at 23:30 -0400, Theodore Ts'o wrote:
> > On Tue, Mar 12, 2013 at 11:00:58AM +0800, Zheng Liu wrote:
> > >
> > > Thanks for reporting this problem. My deepest apologies.
> > >
> > > As Ted suggested, could you please try the ext4 git tree? I want to
> > > check whether this bug has already been fixed by my latest patch
> > > series.
> >
> > It's definitely worth a try to compile the master branch of the ext4
> > tree and see if it reproduces or not.
>
> I cannot reproduce the issue on top of "ext4.git dev", so fortunately
> the problem seems to be already fixed there.
> Thanks.

Great! Thanks for the confirmation.

>
> Do you guys have a hunch which commit is the actual fix?
> (Maybe I will "bisect" it later today.)

I think one of these two commits fixes it, but I am not sure which one
is the actual fix (I guess it is the former one, ;-) ). Please bisect
it if you can. Thanks in advance.

* 079d7667af20876a59a1d9b0d4d1e15dcf17fa34
  ext4: fix wrong the number of the allocated blocks in
  ext4_split_extent()

* cdee78433c138c2f2018a6884673739af2634787
  ext4: fix wrong m_len value after unwritten extent conversion

Regards,
						- Zheng

* Re: torrent hash failures since 3.9.0-rc1
From: Markus Trippelsdorf @ 2013-03-12 6:48 UTC
To: Theodore Ts'o, Dave Jones, linux-ext4, linux-kernel, Zheng Liu

On 2013.03.12 at 14:44 +0800, Zheng Liu wrote:
> On Tue, Mar 12, 2013 at 07:16:24AM +0100, Markus Trippelsdorf wrote:
> > On 2013.03.11 at 23:30 -0400, Theodore Ts'o wrote:
> > > On Tue, Mar 12, 2013 at 11:00:58AM +0800, Zheng Liu wrote:
> > > >
> > > > Thanks for reporting this problem. My deepest apologies.
> > > >
> > > > As Ted suggested, could you please try the ext4 git tree? I want to
> > > > check whether this bug has already been fixed by my latest patch
> > > > series.
> > >
> > > It's definitely worth a try to compile the master branch of the ext4
> > > tree and see if it reproduces or not.
> >
> > I cannot reproduce the issue on top of "ext4.git dev", so fortunately
> > the problem seems to be already fixed there.
> > Thanks.
>
> Great! Thanks for the confirmation.
>
> >
> > Do you guys have a hunch which commit is the actual fix?
> > (Maybe I will "bisect" it later today.)
>
> I think one of these two commits fixes it, but I am not sure which one
> is the actual fix (I guess it is the former one, ;-) ). Please bisect
> it if you can. Thanks in advance.
>
> * 079d7667af20876a59a1d9b0d4d1e15dcf17fa34
>   ext4: fix wrong the number of the allocated blocks in
>   ext4_split_extent()

Your guess was right. The commit above is the actual fix.

--
Markus

* Re: torrent hash failures since 3.9.0-rc1
From: Zheng Liu @ 2013-03-12 7:16 UTC
To: Markus Trippelsdorf
Cc: Theodore Ts'o, Dave Jones, linux-ext4, linux-kernel, Zheng Liu

On Tue, Mar 12, 2013 at 07:48:10AM +0100, Markus Trippelsdorf wrote:
> On 2013.03.12 at 14:44 +0800, Zheng Liu wrote:
> > On Tue, Mar 12, 2013 at 07:16:24AM +0100, Markus Trippelsdorf wrote:
> > > On 2013.03.11 at 23:30 -0400, Theodore Ts'o wrote:
> > > > On Tue, Mar 12, 2013 at 11:00:58AM +0800, Zheng Liu wrote:
> > > > >
> > > > > Thanks for reporting this problem. My deepest apologies.
> > > > >
> > > > > As Ted suggested, could you please try the ext4 git tree? I want to
> > > > > check whether this bug has already been fixed by my latest patch
> > > > > series.
> > > >
> > > > It's definitely worth a try to compile the master branch of the ext4
> > > > tree and see if it reproduces or not.
> > >
> > > I cannot reproduce the issue on top of "ext4.git dev", so fortunately
> > > the problem seems to be already fixed there.
> > > Thanks.
> >
> > Great! Thanks for the confirmation.
> >
> > >
> > > Do you guys have a hunch which commit is the actual fix?
> > > (Maybe I will "bisect" it later today.)
> >
> > I think one of these two commits fixes it, but I am not sure which one
> > is the actual fix (I guess it is the former one, ;-) ). Please bisect
> > it if you can. Thanks in advance.
> >
> > * 079d7667af20876a59a1d9b0d4d1e15dcf17fa34
> >   ext4: fix wrong the number of the allocated blocks in
> >   ext4_split_extent()
>
> Your guess was right. The commit above is the actual fix.

Thank you so much for verifying it. :-)

Ted, I am wondering if we need to Cc this patch to the stable kernel.
We haven't received any reports complaining about it, but I think it
is worth backporting.

Regards,
						- Zheng

* Re: torrent hash failures since 3.9.0-rc1
From: Theodore Ts'o @ 2013-03-12 13:28 UTC
To: Markus Trippelsdorf, Dave Jones, linux-ext4, linux-kernel, Zheng Liu

On Tue, Mar 12, 2013 at 03:16:06PM +0800, Zheng Liu wrote:
>
> Ted, I am wondering if we need to Cc this patch to the stable kernel.
> We haven't received any reports complaining about it, but I think it
> is worth backporting.

I'll check, but I suspect it will require an explicit backport; it's
not going to apply cleanly automatically, will it?

(i.e., if we include cc: stable@vger.kernel.org, are there some
prerequisite patches that will also have to be backported, and/or we
will need to manually fix up patch conflicts, right?)

						- Ted

* Re: torrent hash failures since 3.9.0-rc1
From: Zheng Liu @ 2013-03-13 10:15 UTC
To: Theodore Ts'o, Markus Trippelsdorf, Dave Jones, linux-ext4, linux-kernel, Zheng Liu

On Tue, Mar 12, 2013 at 09:28:11AM -0400, Theodore Ts'o wrote:
> On Tue, Mar 12, 2013 at 03:16:06PM +0800, Zheng Liu wrote:
> >
> > Ted, I am wondering if we need to Cc this patch to the stable kernel.
> > We haven't received any reports complaining about it, but I think it
> > is worth backporting.
>
> I'll check, but I suspect it will require an explicit backport; it's
> not going to apply cleanly automatically, will it?

I checked the linux-stable tree, and I think it can be applied cleanly
from 3.0.y onward, because ext4_split_extent was introduced in the 3.0
kernel. So maybe we can cc stable@vger.kernel.org. It would be great
if you could double-check it.

Thanks,
						- Zheng

* Re: torrent hash failures since 3.9.0-rc1 2013-03-11 20:46 ` Markus Trippelsdorf 2013-03-11 21:18 ` Theodore Ts'o @ 2013-03-12 8:28 ` Sander 2013-03-12 22:04 ` Dave Chinner 1 sibling, 1 reply; 22+ messages in thread From: Sander @ 2013-03-12 8:28 UTC (permalink / raw) To: Markus Trippelsdorf Cc: Theodore Ts'o, Dave Jones, linux-ext4, linux-kernel Markus Trippelsdorf wrote (ao): > On 2013.03.11 at 16:37 -0400, Theodore Ts'o wrote: > > We actually run fsx in a number of different configurations as part of > > our regression testing before we send Linus a pull request, and > > haven't found any issues. So unless it's a hardware problem, it seems > > unlikely to me that your running fsx would turn up anything. > > Yes, I let it run for a while anyway and it didn't report any failure. > Please note that my local rtorrent version was configured with > "--with-posix-fallocate". Would it be possible to enhance fsx to detect such an issue? Sander ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: torrent hash failures since 3.9.0-rc1 2013-03-12 8:28 ` Sander @ 2013-03-12 22:04 ` Dave Chinner 0 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2013-03-12 22:04 UTC (permalink / raw) To: Sander Cc: Markus Trippelsdorf, Theodore Ts'o, Dave Jones, linux-ext4, linux-kernel On Tue, Mar 12, 2013 at 09:28:54AM +0100, Sander wrote: > Markus Trippelsdorf wrote (ao): > > On 2013.03.11 at 16:37 -0400, Theodore Ts'o wrote: > > > We actually run fsx in a number of different configurations as part of > > > our regression testing before we send Linus a pull request, and > > > haven't found any issues. So unless it's a hardware problem, it seems > > > unlikely to me that your running fsx would turn up anything. > > > > Yes, I let it run for a while anyway and it didn't report any failure. > > > Please note that my local rtorrent version was configured with > > "--with-posix-fallocate". > > Would it be possible to enhance fsx to detect such an issue? fsx in xfstests already uses fallocate() for preallocation and hole punching, so such problems related to these operations can be found using fsx. The issue here, however, involves memory reclaim interactions and so is not something fsx can reproduce in isolation. :/ Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 22+ messages in thread
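[Editor's illustration, not part of the thread] The access pattern discussed above (preallocate with posix_fallocate, write chunks out of order, let the page cache be evicted, then rehash) can be sketched in a few lines of Python. This is only a rough approximation under stated assumptions: it is not fsx, not rtorrent's code, and not the kernel fix. The names CHUNK, write_preallocated, rehash and find_bad_chunks are hypothetical, and posix_fadvise(POSIX_FADV_DONTNEED) is merely advisory, so it is a much weaker stand-in for real memory reclaim or for writing to /proc/sys/vm/drop_caches. A clean run therefore says little about whether a given kernel is affected.

```python
import hashlib
import os
import tempfile

CHUNK = 64 * 1024  # hypothetical chunk size; real torrents define their own piece length


def write_preallocated(path, chunks, order):
    """Preallocate the whole file, then write the chunks in the given order."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Reserve the full file size up front, as a client built with
        # posix_fallocate support would do.
        os.posix_fallocate(fd, 0, len(chunks) * CHUNK)
        for i in order:
            os.pwrite(fd, chunks[i], i * CHUNK)
        # Flush dirty pages so they become eligible for eviction.
        os.fsync(fd)
    finally:
        os.close(fd)


def rehash(path):
    """Ask the kernel to drop cached pages, then hash the file chunk by chunk."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Advisory only: the kernel may ignore this hint (length 0 = to end of file).
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        digests = []
        offset = 0
        while True:
            data = os.pread(fd, CHUNK, offset)
            if not data:
                break
            digests.append(hashlib.sha1(data).hexdigest())
            offset += len(data)
        return digests
    finally:
        os.close(fd)


def find_bad_chunks(path, chunks):
    """Return the indices of chunks whose on-disk hash differs from the expected one."""
    actual = rehash(path)
    bad = []
    for i, chunk in enumerate(chunks):
        expected = hashlib.sha1(chunk).hexdigest()
        if i >= len(actual) or actual[i] != expected:
            bad.append(i)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "payload.bin")
        data = [bytes([i + 1]) * CHUNK for i in range(16)]
        write_preallocated(target, data, order=reversed(range(16)))
        print("bad chunks:", find_bad_chunks(target, data))
```

To exercise a particular filesystem, the temporary directory would have to be created on that mount (for example via the dir= argument of tempfile.TemporaryDirectory); the default location may well be tmpfs.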
* Re: torrent hash failures since 3.9.0-rc1 2013-03-11 20:13 ` Markus Trippelsdorf 2013-03-11 20:37 ` Theodore Ts'o @ 2013-03-11 20:44 ` Dave Jones 1 sibling, 0 replies; 22+ messages in thread From: Dave Jones @ 2013-03-11 20:44 UTC (permalink / raw) To: Markus Trippelsdorf; +Cc: linux-ext4, linux-kernel On Mon, Mar 11, 2013 at 09:13:34PM +0100, Markus Trippelsdorf wrote: > > Worked fine for me on two separate machines. Could it be a network problem > > perhaps ? If something is mangling the packet before it hits the disk, > > that would explain it. What NIC do you use ? > > I normally use ATL1E, but I've dusted off my E100 and the issue is also > reproducible on the Intel card. ok, good to rule that out at least. > > Or maybe you could isolate it to a filesystem problem using something > > like fsx ? > > I've found fsx on your homepage, but I've no idea on how to use this > tool. Any pointers? cd to the mount point you want to test, and then 'fsx test' will create a couple files there, and stress them. Dave ^ permalink raw reply [flat|nested] 22+ messages in thread