* btrfs-replace OOM on 2GB machine
@ 2015-11-13 16:15 Georg Lukas
2015-11-17 12:55 ` Austin S Hemmelgarn
0 siblings, 1 reply; 3+ messages in thread
From: Georg Lukas @ 2015-11-13 16:15 UTC
To: linux-btrfs
Hi,
while evaluating btrfs for production use I ended up with a degraded
two-disk RAID1 with one disk missing, and wanted to perform a "btrfs
replace" to rebuild the RAID1. However, the replace operation causes
most of my userland to be OOM-killed and aborts eventually, at about
30% progress, on a box with 2GB of physical RAM.
My setup is:
Linux-4.3 with the following patches applied:
- http://www.spinics.net/lists/linux-btrfs/msg46123.html
(needed for degraded mount of RAID1)
- http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/patch/?id=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4
(needed for the Seagate SMRs)
btrfs-progs v4.2.3
A btrfs RAID1 initially built on two dm-crypt containers on top of two
Seagate 8TB SMR disks. For testing purposes, I unmounted the fs,
reformatted one of the two crypto containers, mounted the fs in degraded
mode (which required Anand's patch), and tried different approaches to
get it back to full operation (rebalance to m=d=single, remove the
missing drive, finally a replace), all without success.
The current status is as follows:
# btrfs dev usage /media/archive/
/dev/mapper/archive1, ID: 1
   Device size:         7.28TiB
   Data,single:       837.00GiB
   Data,RAID0:          1.17TiB
   Data,RAID1:        959.00GiB
   Data,DUP:            2.17TiB
   Metadata,single:     2.00GiB
   Metadata,RAID1:      4.00GiB
   Metadata,DUP:        5.00GiB
   System,RAID1:       32.00MiB
   System,DUP:        192.00MiB
   Unallocated:         2.17TiB

missing, ID: 2
   Device size:           0.00B
   Data,RAID0:          1.17TiB
   Data,RAID1:        959.00GiB
   Metadata,RAID1:      4.00GiB
   System,RAID1:       32.00MiB
   Unallocated:         5.17TiB
I then start the replace:
# btrfs replace start 2 /dev/mapper/archive2 /media/archive/
That takes a while and OOM-kills half of my userspace in the process. It
seems like the kernel is allocating and freeing large chunks of memory
during the replace:
                    total   used   free  shared  buffers  cached
Mem:                 1.9G   1.6G   342M    784K     1.8M     14M
-/+ buffers/cache:          1.6G   358M
Swap:                4.0G    48M   4.0G

(5 second pause)
                    total   used   free  shared  buffers  cached
Mem:                 1.9G   157M   1.8G    808K     6.6M     32M
-/+ buffers/cache:          118M   1.8G
Swap:                4.0G    46M   4.0G

(another 5 seconds)
                    total   used   free  shared  buffers  cached
Mem:                 1.9G   1.1G   835M    808K     6.7M     37M
-/+ buffers/cache:          1.1G   879M
Swap:                4.0G    46M   4.0G
That seems to be kernel memory, as the swap is hardly used, despite
default swappiness settings. Furthermore, /proc/meminfo and slabtop have
no indication of how the memory is used; it just vanishes from the
"available" pool.
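
One way to put a number on memory that "just vanishes" is to sample /proc/meminfo during the replace and subtract the fields that are accounted for from MemTotal. The following is only a rough sketch: the helper names and the choice of fields to sum are assumptions made for illustration, and some of the fields overlap slightly, so the result is an estimate rather than an exact figure.

```python
import time

# Fields treated as "accounted for". This selection is an assumption for
# the sketch; adjust it to whatever your kernel actually reports.
ACCOUNTED = ("MemFree", "Buffers", "Cached", "AnonPages",
             "Slab", "PageTables", "KernelStack")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers.

    Values are in kiB, except for plain counters such as HugePages_Total.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def unaccounted_kib(info):
    """Estimate memory (kiB) not covered by the ACCOUNTED fields."""
    return info["MemTotal"] - sum(info.get(key, 0) for key in ACCOUNTED)


def watch(path="/proc/meminfo", interval=5.0, samples=3):
    """Print the estimate a few times, pausing `interval` seconds between."""
    for _ in range(samples):
        with open(path) as handle:
            info = parse_meminfo(handle.read())
        print(time.strftime("%H:%M:%S"), unaccounted_kib(info), "kiB unaccounted")
        time.sleep(interval)
```

Calling watch() in a second terminal while the replace runs would show whether the unaccounted figure rises and falls in step with the "used" column of free.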
Eventually, the replace aborts:
[64326.700731] BTRFS: btrfs_scrub_dev(<missing disk>, 2, /dev/mapper/archive2) failed -12
[64326.700986] ------------[ cut here ]------------
[64326.701024] WARNING: CPU: 1 PID: 36251 at fs/btrfs/dev-replace.c:428 btrfs_dev_replace_start+0x36b/0x390 [btrfs]()
[64326.701062] Modules linked in: btrfs dm_crypt loop sha256_ssse3 sha256_generic hmac drbg ansi_cprng xts gf128mul algif_skcipher af_alg cpuid nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc xor raid6_pq intel_rapl iosf_mbi x86_pkg_temp_thermal iTCO_wdt intel_powerclamp iTCO_vendor_support kvm_intel kvm evdev crct10dif_pclmul crc32_pclmul cryptd snd_pcm snd_timer snd soundcore pcspkr psmouse serio_raw hpwdt hpilo lpc_ich mfd_core 8250_fintek shpchp acpi_power_meter button pcc_cpufreq acpi_cpufreq processor coretemp ipmi_watchdog dm_mod ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse autofs4 ext4 crc16 mbcache jbd2 sg sd_mod usb_storage hid_generic usbhid hid crc32c_intel uhci_hcd thermal ahci libahci libata scsi_mod tg3 ptp pps_core libphy ehci_pci ehci_hcd xhci_pci xhci_hcd
[64326.701579] usbcore usb_common [last unloaded: btrfs]
[64326.701611] CPU: 1 PID: 36251 Comm: btrfs Tainted: G W 4.3.0-gl+ #42
[64326.701647] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 06/06/2014
[64326.701671] ffffffffa06e8b71 ffffffff8129eac3 0000000000000000 ffffffff8106891c
[64326.701720] 00000000fffffff4 ffff880079079800 ffff880006f1a000 ffff880074e2e000
[64326.701769] ffff880006f1aec8 ffffffffa06da7db 00007ffc00000001 ffff880071c42400
[64326.701818] Call Trace:
[64326.701840] [<ffffffff8129eac3>] ? dump_stack+0x40/0x5d
[64326.701864] [<ffffffff8106891c>] ? warn_slowpath_common+0x7c/0xb0
[64326.701896] [<ffffffffa06da7db>] ? btrfs_dev_replace_start+0x36b/0x390 [btrfs]
[64326.701939] [<ffffffffa06a6bbe>] ? btrfs_ioctl+0x1b6e/0x27b0 [btrfs]
[64326.701964] [<ffffffff8116b83a>] ? page_add_file_rmap+0x2a/0x50
[64326.706074] [<ffffffff81160379>] ? do_set_pte+0x99/0xc0
[64326.706100] [<ffffffff81135f49>] ? filemap_map_pages+0x219/0x220
[64326.706123] [<ffffffff81162127>] ? handle_mm_fault+0xdd7/0x16c0
[64326.706149] [<ffffffff811b0b0e>] ? do_vfs_ioctl+0x2be/0x490
[64326.706174] [<ffffffff811b0d51>] ? SyS_ioctl+0x71/0x80
[64326.706198] [<ffffffff815000ee>] ? entry_SYSCALL_64_fastpath+0x12/0x71
[64326.706222] ---[ end trace 37fc29aa3c600bcf ]---
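
For reference, the "failed -12" in the first log line is a negated errno value, and errno 12 on Linux is ENOMEM. The small sketch below decodes it; the helper names are made up for this example. The word 00000000fffffff4 in the stack dump also reads as -12 when its low 32 bits are taken as a signed integer, although treating that word as the return value is only an assumption.

```python
import errno
import os


def decode_kernel_errno(ret):
    """Map a negative kernel return value to (symbolic name, message)."""
    code = -ret
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def as_signed32(word):
    """Interpret the low 32 bits of `word` as a two's-complement integer."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


# Return value reported by btrfs_scrub_dev in the log above.
print(decode_kernel_errno(-12))
# Word taken from the stack dump above.
print(as_signed32(0x00000000FFFFFFF4))
```

So the abort itself is reported as an out-of-memory failure, which fits the OOM kills seen in userspace.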
I'm not sure how to proceed from here, or how to debug this issue. While
the disks are not holding critical data, I'm sure it would benefit the
community (and btrfs' reputation) if this issue could be sorted out.
Kind regards,
Georg
--
|| http://op-co.de ++ GCS d--(++) s: a C+++ UL+++ !P L+++ !E W+++ N ++
|| gpg: 0x962FD2DE || o? K- w---() O M V? PS+ PE-- Y++ PGP+ t+ 5 R+ ||
|| Ge0rG: euIRCnet || X(+++) tv+ b+(++) DI+++ D- G e++++ h- r++ y? ||
++ IRCnet OFTC OPN ||_________________________________________________||
* Re: btrfs-replace OOM on 2GB machine
From: Austin S Hemmelgarn @ 2015-11-17 12:55 UTC
To: Georg Lukas, linux-btrfs
On 2015-11-13 11:15, Georg Lukas wrote:
> Hi,
>
> while evaluating btrfs for production use I ended up with a degraded
> two-disk RAID1 with one disk missing, and wanted to perform a "btrfs
> replace" to rebuild the RAID1. However, the replace operation causes
> most of my userland to be OOM-killed and aborts eventually, at about
> 30% progress, on a box with 2GB of physical RAM.
>
> My setup is:
>
> Linux-4.3 with the following patches applied:
> - http://www.spinics.net/lists/linux-btrfs/msg46123.html
> (needed for degraded mount of RAID1)
> - http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/patch/?id=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4
> (needed for the Seagate SMRs)
>
> btrfs-progs v4.2.3
>
> A btrfs RAID1 initially built on two dm-crypt containers on top of two
> Seagate 8TB SMR disks. For testing purposes, I unmounted the fs,
> reformatted one of the two crypto containers, mounted the fs in degraded
> mode (which required Anand's patch), and tried different approaches to
> get it back to full operation (rebalance to m=d=single, remove the
> missing drive, finally a replace), all without success.
While it probably isn't related to the OOM issue, I would be
particularly wary of using BTRFS on SMR disks; we've had multiple
reports of serious issues with them (and IIRC, they all involved the
same model of 8TB Seagate SMR disk).
>
> The current status is as follows:
>
> # btrfs dev usage /media/archive/
> /dev/mapper/archive1, ID: 1
> Device size: 7.28TiB
> Data,single: 837.00GiB
> Data,RAID0: 1.17TiB
> Data,RAID1: 959.00GiB
> Data,DUP: 2.17TiB
> Metadata,single: 2.00GiB
> Metadata,RAID1: 4.00GiB
> Metadata,DUP: 5.00GiB
> System,RAID1: 32.00MiB
> System,DUP: 192.00MiB
> Unallocated: 2.17TiB
>
> missing, ID: 2
> Device size: 0.00B
> Data,RAID0: 1.17TiB
> Data,RAID1: 959.00GiB
> Metadata,RAID1: 4.00GiB
> System,RAID1: 32.00MiB
> Unallocated: 5.17TiB
Hmm, it looks like things weren't all RAID1: you've got a little over
1TiB of data that was RAID0, and that may be why you can't rebuild the
FS. This shouldn't be causing an OOM condition, but it definitely means
things are not fully recoverable.
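
To make that check mechanical, here is a minimal sketch that scans `btrfs dev usage` output (unquoted, in the layout shown above) and lists the chunk types on the missing device whose profile keeps no copy on a second device. The function names are hypothetical, and the profile set is an assumption limited to the profiles that appear in this thread.

```python
import re

# Profiles with no copy on a second device (assumption for this sketch):
# single and RAID0 store one copy, DUP stores both copies on the same device.
NO_SECOND_DEVICE_COPY = {"single", "DUP", "RAID0"}

# Matches lines such as "   Data,RAID0:   1.17TiB".
CHUNK_LINE = re.compile(r"^\s*(Data|Metadata|System),(\w+):\s+(\S+)\s*$")


def chunks_by_device(text):
    """Group (type, profile, size) tuples under each device heading."""
    devices = {}
    current = None
    for line in text.splitlines():
        if ", ID:" in line:
            current = line.split(",", 1)[0].strip()
            devices[current] = []
            continue
        match = CHUNK_LINE.match(line)
        if match and current is not None:
            devices[current].append(match.groups())
    return devices


def at_risk(text, device="missing"):
    """Return the chunks on `device` that have no copy elsewhere."""
    chunks = chunks_by_device(text).get(device, [])
    return [chunk for chunk in chunks if chunk[1] in NO_SECOND_DEVICE_COPY]
```

Fed the usage output quoted above, at_risk() would report only the Data,RAID0 entry for the missing device; the RAID1 chunks still have their second copy on archive1.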
>
> I then start the replace:
>
> # btrfs replace start 2 /dev/mapper/archive2 /media/archive/
>
> That takes a while, OOM-kills half of my userspace in the process (it
> seems like the kernel is allocating and freeing large chunks of memory
> during the replace:
>
> total used free shared buffers cached
> Mem: 1.9G 1.6G 342M 784K 1.8M 14M
> -/+ buffers/cache: 1.6G 358M
> Swap: 4.0G 48M 4.0G
>
> (5 second pause)
> total used free shared buffers cached
> Mem: 1.9G 157M 1.8G 808K 6.6M 32M
> -/+ buffers/cache: 118M 1.8G
> Swap: 4.0G 46M 4.0G
>
> (another 5 seconds)
> total used free shared buffers cached
> Mem: 1.9G 1.1G 835M 808K 6.7M 37M
> -/+ buffers/cache: 1.1G 879M
> Swap: 4.0G 46M 4.0G
>
> That seems to be kernel memory, as the swap is hardly used, despite
> default swappiness settings. Furthermore, /proc/meminfo and slabtop have
> no indication of how the memory is used; it just vanishes from the
> "available" pool.
This sounds to me like a memory leak in the kernel, but I'm not certain.
I'm going to try and reproduce this without the SMR patch (and
obviously without the SMR drives themselves) in a VM.
>
> Eventually, the replace aborts:
>
> [64326.700731] BTRFS: btrfs_scrub_dev(<missing disk>, 2, /dev/mapper/archive2) failed -12
> [64326.700986] ------------[ cut here ]------------
> [64326.701024] WARNING: CPU: 1 PID: 36251 at fs/btrfs/dev-replace.c:428 btrfs_dev_replace_start+0x36b/0x390 [btrfs]()
> [64326.701062] Modules linked in: btrfs dm_crypt loop sha256_ssse3 sha256_generic hmac drbg ansi_cprng xts gf128mul algif_skcipher af_alg cpuid nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc xor raid6_pq intel_rapl iosf_mbi x86_pkg_temp_thermal iTCO_wdt intel_powerclamp iTCO_vendor_support kvm_intel kvm evdev crct10dif_pclmul crc32_pclmul cryptd snd_pcm snd_timer snd soundcore pcspkr psmouse serio_raw hpwdt hpilo lpc_ich mfd_core 8250_fintek shpchp acpi_power_meter button pcc_cpufreq acpi_cpufreq processor coretemp ipmi_watchdog dm_mod ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse autofs4 ext4 crc16 mbcache jbd2 sg sd_mod usb_storage hid_generic usbhid hid crc32c_intel uhci_hcd thermal ahci libahci libata scsi_mod tg3 ptp pps_core libphy ehci_pci ehci_hcd xhci_pci xhci_hcd
> [64326.701579] usbcore usb_common [last unloaded: btrfs]
> [64326.701611] CPU: 1 PID: 36251 Comm: btrfs Tainted: G W 4.3.0-gl+ #42
> [64326.701647] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 06/06/2014
> [64326.701671] ffffffffa06e8b71 ffffffff8129eac3 0000000000000000 ffffffff8106891c
> [64326.701720] 00000000fffffff4 ffff880079079800 ffff880006f1a000 ffff880074e2e000
> [64326.701769] ffff880006f1aec8 ffffffffa06da7db 00007ffc00000001 ffff880071c42400
> [64326.701818] Call Trace:
> [64326.701840] [<ffffffff8129eac3>] ? dump_stack+0x40/0x5d
> [64326.701864] [<ffffffff8106891c>] ? warn_slowpath_common+0x7c/0xb0
> [64326.701896] [<ffffffffa06da7db>] ? btrfs_dev_replace_start+0x36b/0x390 [btrfs]
> [64326.701939] [<ffffffffa06a6bbe>] ? btrfs_ioctl+0x1b6e/0x27b0 [btrfs]
> [64326.701964] [<ffffffff8116b83a>] ? page_add_file_rmap+0x2a/0x50
> [64326.706074] [<ffffffff81160379>] ? do_set_pte+0x99/0xc0
> [64326.706100] [<ffffffff81135f49>] ? filemap_map_pages+0x219/0x220
> [64326.706123] [<ffffffff81162127>] ? handle_mm_fault+0xdd7/0x16c0
> [64326.706149] [<ffffffff811b0b0e>] ? do_vfs_ioctl+0x2be/0x490
> [64326.706174] [<ffffffff811b0d51>] ? SyS_ioctl+0x71/0x80
> [64326.706198] [<ffffffff815000ee>] ? entry_SYSCALL_64_fastpath+0x12/0x71
> [64326.706222] ---[ end trace 37fc29aa3c600bcf ]---
This actually looks like it could be a different issue: for some
reason BTRFS is trying to scrub the missing disk (which won't work, of
course).
>
> I'm not sure how to proceed from here, or how to debug this issue. While
> the disks are not holding critical data, I'm sure it would benefit the
> community (and btrfs' reputation) if this issue could be sorted out.
* Re: btrfs-replace OOM on 2GB machine
From: Georg Lukas @ 2015-11-17 13:18 UTC
To: linux-btrfs
* Austin S Hemmelgarn <ahferroin7@gmail.com> [2015-11-17 13:56]:
> While it probably isn't related to the OOM issue, I would be particularly
> wary of using BTRFS on SMR disks, we've had multiple reports of serious
> issues with them (and IIRC, they were all the same model of 8TB Seagate SMR
> disks).
Yes, that's exactly the model I have here, but the problems are related
to the SMR support in the kernel, and hopefully not at all to btrfs.
With the latest patch by Martin K. Petersen [*], the disks seem to be
stable and reliable, finally.
[*] https://bugzilla.kernel.org/show_bug.cgi?id=93581
> Hmm, it looks like things weren't all RAID1, you've got a little over 1TiB
> of data that was RAID0, and that may be why you can't rebuild the FS. This
> shouldn't be causing an OOM condition, but it definitely means things are
> not fully recoverable.
I think this was caused by my attempt to rebalance the degraded RAID1
into RAID0, and there are indeed some files on the fs that I can't read
any more. As this is not a production system, I'm not very bothered; I
just wanted to find out if I can get it back to life, which currently
fails on the replace.
> This actually looks like it's a different issue potentially, for some reason
> BTRFS is trying to scrub the missing disk (which won't work of course).
Indeed, scrubbing the degraded disk set did not succeed either.
If you need me to perform any other actions on that disk set, let me
know on- or off-list.
Georg
--
|| http://op-co.de ++ GCS d--(++) s: a C+++ UL+++ !P L+++ !E W+++ N ++
|| gpg: 0x962FD2DE || o? K- w---() O M V? PS+ PE-- Y++ PGP+ t+ 5 R+ ||
|| Ge0rG: euIRCnet || X(+++) tv+ b+(++) DI+++ D- G e++++ h- r++ y? ||
++ IRCnet OFTC OPN ||_________________________________________________||