* Btrfs raid1 array has issues with rtorrent usage pattern.
@ 2014-10-28 16:49 Alec Blayne
2014-10-29 21:50 ` Dan Merillat
0 siblings, 1 reply; 6+ messages in thread
From: Alec Blayne @ 2014-10-28 16:49 UTC (permalink / raw)
To: linux-btrfs
Hi, it seems that using rtorrent to download onto a btrfs filesystem
creates files that fail to read properly.
For instance, rtorrent crashes, and if I then try to rsync the file it
was writing to someplace else, rsync also fails with the message
"can't map file "$file": Input/Output error (5)".
If I give it time, the file eventually gets into a good state and I can
rsync it somewhere else (as long as rtorrent doesn't keep writing to
it). This doesn't happen with ext4 on the same system.
No btrfs errors, or any other errors, show up in any log. Neither scrubbing
nor balancing turns up any issues. I've tried using a subvolume mounted
with nodatacow and/or flushoncommit, which didn't help. I'm not using
quotas and at some point had a single snapshot that I deleted. The
filesystem was originally created recently (on a 3.16.4+ kernel).
Here's what the array looks like:
Label: 'data' uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811
Total devices 4 FS bytes used 3.14TiB
devid 4 size 2.73TiB used 2.36TiB path /dev/sdd1
devid 5 size 1.82TiB used 1.45TiB path /dev/sdc1
devid 6 size 1.82TiB used 1.45TiB path /dev/sdb1
devid 7 size 1.82TiB used 1.45TiB path /dev/sda1
Btrfs v3.17
Data, RAID1: total=3.34TiB, used=3.13TiB
System, RAID1: total=32.00MiB, used=512.00KiB
Metadata, RAID1: total=10.00GiB, used=7.31GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
On linux 3.17.1: Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28
02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3
AuthenticAMD GNU/Linux
I'm utterly puzzled and have no idea how to dig into this issue.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Btrfs raid1 array has issues with rtorrent usage pattern.
2014-10-28 16:49 Btrfs raid1 array has issues with rtorrent usage pattern Alec Blayne
@ 2014-10-29 21:50 ` Dan Merillat
2014-10-29 23:02 ` Dan Merillat
[not found] ` <54517726.5070507@tevsa.net>
0 siblings, 2 replies; 6+ messages in thread
From: Dan Merillat @ 2014-10-29 21:50 UTC (permalink / raw)
To: Alec Blayne; +Cc: BTRFS
I'm in the middle of debugging the exact same thing. On 3.17.0,
rtorrent dies with SIGBUS.

I've done some debugging; the sequence is something like this:
open a new file
fallocate() to the final size
mmap() all (or a portion) of the file
write to the region
run SHA1 on that mmap'd region to validate the chunk
crash, eventually. Generally not at the same point.

Reading that file (cat > /dev/null) returns -EIO.

Looking at the process maps, the SIGBUS appears to happen in the
middle of a mapped region of a pre-allocated file, i.e. where it
shouldn't. I'm not completely ruling out an rtorrent bug, but its
behavior appears sane to me.

Weirder: "old" files that have been around a while work just fine for
seeding. I've re-hashed my entire collection without an error.

I'm seeing this on both inherit-COW and no-inherit-COW files, and the
filesystem is not using compression.

The interesting part is that when I go back and read the files later,
they sometimes don't throw an I/O error.

Absolutely nothing in dmesg.

I'm working on a testcase that triggers it reliably, but no luck so far.
I thought I had bad RAM, but two people upgrading to 3.17 and seeing the
same bug at around the same time can't be a coincidence. I rebooted
to 3.17 on the 25th; the first new download was on the 28th, and that
failed.

I'm working on a testcase for it that's more reproducible than "go grab
torrent files with rtorrent".
^ permalink raw reply [flat|nested] 6+ messages in thread
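Note: the "cat > /dev/null returns -EIO" check mentioned above can be narrowed down to a file offset. The following is only a minimal sketch, assuming plain pread() on the affected file; the name first_read_error(), the SCAN_DEMO macro and the 1 MiB step are made up for illustration and come from neither rtorrent nor btrfs.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (1024 * 1024)   /* hypothetical scan step: 1 MiB */

/*
 * Read the file front to back with pread().
 * Returns -1 if every read succeeded, otherwise the offset of the first
 * read that failed (0 if the file could not be opened at all). On failure
 * errno is left as set by open()/pread(); in the case described in this
 * thread that would be EIO. *total receives the bytes read successfully.
 */
static long long first_read_error(const char *path, long long *total)
{
    char *buf;
    long long off = 0;
    ssize_t n;
    int fd, saved;

    *total = 0;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    buf = malloc(CHUNK);
    if (!buf) {
        close(fd);
        errno = ENOMEM;
        return 0;
    }

    /* pread() returns 0 at end of file and -1 on error */
    while ((n = pread(fd, buf, CHUNK, off)) > 0)
        off += n;

    saved = errno;            /* free()/close() may overwrite errno */
    free(buf);
    close(fd);
    *total = off;
    if (n < 0) {
        errno = saved;
        return off;
    }
    return -1;
}

#ifdef SCAN_DEMO              /* build with -DSCAN_DEMO for a small CLI */
int main(int argc, char **argv)
{
    long long total, bad;

    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 2;
    }
    bad = first_read_error(argv[1], &total);
    if (bad < 0) {
        printf("%s: read %lld bytes without error\n", argv[1], total);
        return 0;
    }
    perror(argv[1]);
    printf("%s: first failing read at offset %lld\n", argv[1], bad);
    return 1;
}
#endif
```

Since the thread reports that the same file sometimes reads cleanly later, a clean pass from this scan says nothing about an earlier or later attempt.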
* Re: Btrfs raid1 array has issues with rtorrent usage pattern.
2014-10-29 21:50 ` Dan Merillat
@ 2014-10-29 23:02 ` Dan Merillat
[not found] ` <54517726.5070507@tevsa.net>
1 sibling, 0 replies; 6+ messages in thread
From: Dan Merillat @ 2014-10-29 23:02 UTC (permalink / raw)
To: Alec Blayne; +Cc: BTRFS
The following code reliably throws a SIGBUS in the memset, and
cat testfile > /dev/null returns an I/O error. I've sometimes gotten as
high as iteration 900 before a SIGBUS, so don't assume a single clean
run is OK.

Setup: Linux 3.17.0, SATA -> MD (raid5) -> bcache (ssd) -> btrfs.

Working on eliminating more variables.

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MB (1024ull * 1024)
#define TEST_SIZE (4096)    /* file size in MB */

int main(void)
{
    int fd, i, err;
    uint8_t *map = NULL;

    srandom(1024);    /* fixed seed: same offsets on every run */
    fd = open("testfile", O_RDWR|O_CREAT, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* preallocate the whole file before mapping any of it */
    err = posix_fallocate(fd, 0, TEST_SIZE * MB);
    if (err) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
        return 1;
    }
    for (i = 0; i < 1000; i++) {
        /* random MB-aligned offset inside the file */
        off_t location = (random() % (TEST_SIZE - 1)) * MB;

        map = mmap(map, MB, PROT_READ|PROT_WRITE, MAP_SHARED, fd, location);
        if (map == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        /* print the offset in MB, not bytes */
        printf("%d: writing at %04lld mb\n", i, (long long)(location / MB));
        memset(map, 0x5a, MB);    /* the SIGBUS shows up here */
        msync(map, MB, MS_ASYNC);
        munmap(map, MB);
    }
    close(fd);
    return 0;
}
^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <54517726.5070507@tevsa.net>]
* Re: Btrfs raid1 array has issues with rtorrent usage pattern.
[not found] ` <54517726.5070507@tevsa.net>
@ 2014-10-30 3:17 ` Dan Merillat
2014-10-30 7:50 ` Koen Kooi
0 siblings, 1 reply; 6+ messages in thread
From: Dan Merillat @ 2014-10-30 3:17 UTC (permalink / raw)
To: Alec Blayne; +Cc: BTRFS
It's specifically btrfs-related: I was able to reproduce it on a bare
drive (no lvm, no md, no bcache). It's not bad RAM: I was able to
reproduce it on multiple machines running either 3.17 or late RCs.

I've tested 3.18-rc2 for about 2 hours now and can't get any failures,
so that's good. If anyone else can reproduce this, the fix will probably
need to be sent to 3.17-stable.

On Wed, Oct 29, 2014 at 7:24 PM, Alec Blayne <ab@tevsa.net> wrote:
> Really nice to know it's already getting handled :)
>
> I'm already "downgrading" to 3.16.6 now that I know I won't have that
> issue. I was already planning to because of the read-only snapshots issue.
>
> Thank you and good luck debugging!
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Btrfs raid1 array has issues with rtorrent usage pattern.
2014-10-30 3:17 ` Dan Merillat
@ 2014-10-30 7:50 ` Koen Kooi
2014-11-01 18:00 ` Dan Merillat
0 siblings, 1 reply; 6+ messages in thread
From: Koen Kooi @ 2014-10-30 7:50 UTC (permalink / raw)
To: linux-btrfs
Dan Merillat wrote on 30-10-14 04:17:
> It's specifically BTRFS related, I was able to reproduce it on a bare
> drive (no lvm, no md, no bcache). It's not bad RAM, I was able to
> reproduce it on multiple machines running either 3.17 or late RCs.
>
> I've tested 3.18-rc2 for about 2 hours now, can't get any failures, so
> that's good. If anyone else can reproduce this it'll probably need to be
> sent to 3.17-stable.

3.17.2 has a lot of btrfs backports queued[1] already; could you see if the
fix for your problem is already present?

regards,

Koen

[1] https://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/commit/queue-3.17/btrfs-fix-a-deadlock-in-btrfs_dev_replace_finishing.patch?id=2792dbfd1e02a70a8eef7e0cc3f44cb77d6c100f
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Btrfs raid1 array has issues with rtorrent usage pattern.
2014-10-30 7:50 ` Koen Kooi
@ 2014-11-01 18:00 ` Dan Merillat
0 siblings, 0 replies; 6+ messages in thread
From: Dan Merillat @ 2014-11-01 18:00 UTC (permalink / raw)
To: Koen Kooi; +Cc: BTRFS
On Thu, Oct 30, 2014 at 3:50 AM, Koen Kooi <koen@dominion.thruhere.net> wrote:
> 3.17.2 has a lot of btrfs backports queued[1] already, could you see if the
> fix for your problem is already present?

Sorry about all the top-posting; I dislike the way gmail makes it the
default.

Yes, the patches queued for 3.17.2 appear to have fixed it. I didn't
have time to run a bisection to see where it broke between .16 and .17,
though.
^ permalink raw reply [flat|nested] 6+ messages in thread
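Note: going only by the data points in this thread (failures on 3.17.0 and 3.17.1 and on "late RCs", none on 3.16.6 or 3.18-rc2, and the 3.17.2 queue appearing to fix it), a machine can be checked against that range with uname(2). This is a rough sketch under exactly those assumptions; the function names and the DEMO_MAIN macro are made up, and the thread never bisected the regression, so the range is not an authoritative list of affected kernels.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Returns 1 if the release string parses as 3.17.0 or 3.17.1 (this also
 * matches strings such as "3.17.0-rc7" or "3.17.1-gentoo-r1"), 0 otherwise.
 */
static int release_in_reported_range(const char *release)
{
    int major, minor, patch = 0;

    /* accept "X.Y" as well as "X.Y.Z"; a missing Z counts as 0 */
    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
        return 0;
    return major == 3 && minor == 17 && patch < 2;
}

/* Same check against the running kernel, via uname(2). */
static int running_kernel_in_reported_range(void)
{
    struct utsname u;

    if (uname(&u) < 0)
        return 0;
    return release_in_reported_range(u.release);
}

#ifdef DEMO_MAIN              /* build with -DDEMO_MAIN for a small CLI */
int main(void)
{
    if (running_kernel_in_reported_range())
        puts("running kernel is in the 3.17.0-3.17.1 range from this thread");
    else
        puts("running kernel is outside the range reported in this thread");
    return 0;
}
#endif
```

Distribution kernels may carry backported fixes without changing the version triple, so a match here is only a hint to run the reproducer.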
end of thread
Thread overview: 6+ messages
2014-10-28 16:49 Btrfs raid1 array has issues with rtorrent usage pattern Alec Blayne
2014-10-29 21:50 ` Dan Merillat
2014-10-29 23:02 ` Dan Merillat
[not found] ` <54517726.5070507@tevsa.net>
2014-10-30 3:17 ` Dan Merillat
2014-10-30 7:50 ` Koen Kooi
2014-11-01 18:00 ` Dan Merillat