public inbox for linux-btrfs@vger.kernel.org
* btrfs send: page allocation failure
@ 2014-01-13 12:58 Jim Salter
  2014-01-13 15:17 ` Wang Shilong
  2014-01-13 18:23 ` David Sterba
  0 siblings, 2 replies; 15+ messages in thread
From: Jim Salter @ 2014-01-13 12:58 UTC
  To: linux-btrfs

Hi list -

Getting sporadic page allocation failures in btrfs send. This happened 
once several weeks ago but was fine after a reboot; yesterday I did not 
reboot, but had the failure back-to-back trying to send two different 
snapshots. These are full sends, not incremental, of a bit over 600G of 
data. Test machine has 32G of RAM, with 21G of it free (not including 
cache):

root@gwa-virt1:/data/images/.snapshots# free -m
             total       used       free     shared    buffers     cached
Mem:         32159      31789        369          0          0      21276
-/+ buffers/cache:      10513      21646
Swap:            0          0          0

In both cases (all three, really) the btrfs send failed a bit more than 
half of the way through the send (somewhere around the 380GB mark).

Kern log snippets follow:

Jan 12 14:05:36 gwa-virt1 kernel: [535523.627611] btrfs: page allocation 
failure: order:6, mode:0x104050
Jan 12 14:05:36 gwa-virt1 kernel: [535523.627622] CPU: 6 PID: 9642 Comm: 
btrfs Not tainted 3.13.0-031300rc7-generic #201401041835
Jan 12 14:05:36 gwa-virt1 kernel: [535523.627773] [<ffffffffa0142214>] ? 
btrfs_get_token_64+0x64/0xf0 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.627818] [<ffffffffa01876dd>] ? 
ulist_add_merge+0xcd/0x270 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.627860] [<ffffffffa01876dd>] 
ulist_add_merge+0xcd/0x270 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.627894] [<ffffffffa018615c>] 
find_parent_nodes+0x50c/0x6f0 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.627930] [<ffffffffa018e550>] ? 
compare_refs.isra.23+0x130/0x130 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.627965] [<ffffffffa0187019>] 
iterate_extent_inodes+0xf9/0x270 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.628003] [<ffffffffa014b7a5>] ? 
free_extent_buffer+0x35/0x40 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.628037] [<ffffffffa018dc9d>] 
find_extent_clone.isra.26+0x26d/0x340 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.628072] [<ffffffffa0191207>] 
process_extent+0xd7/0x180 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.628107] [<ffffffffa01918ff>] 
changed_cb+0xdf/0x170 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.628141] [<ffffffffa0191ad2>] 
full_send_tree+0x142/0x280 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.628174] [<ffffffffa0191ccc>] ? 
send_subvol_begin+0xbc/0x2b0 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.628209] [<ffffffffa0191fa0>] 
send_subvol+0xe0/0xf0 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.628244] [<ffffffffa01922f1>] 
btrfs_ioctl_send+0x341/0x520 [btrfs]
Jan 12 14:05:36 gwa-virt1 kernel: [535523.628279] [<ffffffffa01606d3>] 
btrfs_ioctl+0x953/0xac0 [btrfs]

Jan 12 21:34:00 gwa-virt1 kernel: [562448.016512] btrfs: page allocation 
failure: order:5, mode:0x104050
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016518] CPU: 4 PID: 18689 
Comm: btrfs Not tainted 3.13.0-031300rc7-generic #201401041835
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016597] [<ffffffffa0142214>] ? 
btrfs_get_token_64+0x64/0xf0 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016617] [<ffffffffa01876dd>] ? 
ulist_add_merge+0xcd/0x270 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016637] [<ffffffffa01876dd>] 
ulist_add_merge+0xcd/0x270 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016653] [<ffffffffa018615c>] 
find_parent_nodes+0x50c/0x6f0 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016669] [<ffffffffa018e550>] ? 
compare_refs.isra.23+0x130/0x130 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016684] [<ffffffffa0187019>] 
iterate_extent_inodes+0xf9/0x270 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016700] [<ffffffffa014b7a5>] ? 
free_extent_buffer+0x35/0x40 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016716] [<ffffffffa018dc9d>] 
find_extent_clone.isra.26+0x26d/0x340 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016732] [<ffffffffa0191207>] 
process_extent+0xd7/0x180 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016747] [<ffffffffa01918ff>] 
changed_cb+0xdf/0x170 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016763] [<ffffffffa0191ad2>] 
full_send_tree+0x142/0x280 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016778] [<ffffffffa0191ccc>] ? 
send_subvol_begin+0xbc/0x2b0 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016794] [<ffffffffa0191fa0>] 
send_subvol+0xe0/0xf0 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016810] [<ffffffffa01922f1>] 
btrfs_ioctl_send+0x341/0x520 [btrfs]
Jan 12 21:34:00 gwa-virt1 kernel: [562448.016826] [<ffffffffa01606d3>] 
btrfs_ioctl+0x953/0xac0 [btrfs]
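
For scale, the order field in the two failure lines gives the size of the
contiguous block that was requested: an order-N allocation asks for 2^N
physically contiguous pages. A minimal sketch of the arithmetic, assuming
4 KiB pages (typical on x86-64, but not stated in the log); order_to_kib is
a hypothetical helper name used only for illustration:

```shell
#!/bin/sh
# Assumption: 4 KiB pages; check the real value with `getconf PAGESIZE`.
PAGE_KIB=4

# Print the size in KiB of one allocation of the given order.
# An order-N request asks for 2^N physically contiguous pages.
order_to_kib() {
    echo $(( (1 << $1) * PAGE_KIB ))
}

order_to_kib 6   # first failure in the log (order:6)  -> 256
order_to_kib 5   # second failure in the log (order:5) -> 128
```

Under that assumption the failed requests were for 256 KiB and 128 KiB of
contiguous memory, which can be unavailable on a fragmented system even when
plenty of memory is free in total.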



* Re: btrfs send: page allocation failure
  2014-01-13 12:58 btrfs send: page allocation failure Jim Salter
@ 2014-01-13 15:17 ` Wang Shilong
  2014-01-13 15:20   ` Jim Salter
  2014-01-13 18:23 ` David Sterba
  1 sibling, 1 reply; 15+ messages in thread
From: Wang Shilong @ 2014-01-13 15:17 UTC
  To: Jim Salter; +Cc: linux-btrfs

Hello,

I thought carefully about your problem below. I think it is caused by the btrfs *ulist* implementation,
which uses krealloc; that can make memory allocation fail, especially when you use a full send.

Until we replace the current, rather naive *ulist* implementation, I think you can use incremental send
to work around this issue.

Thanks,
Wang




* Re: btrfs send: page allocation failure
  2014-01-13 15:17 ` Wang Shilong
@ 2014-01-13 15:20   ` Jim Salter
  2014-01-13 15:29     ` Wang Shilong
  0 siblings, 1 reply; 15+ messages in thread
From: Jim Salter @ 2014-01-13 15:20 UTC
  To: Wang Shilong; +Cc: linux-btrfs

Er... I can't use incremental send if I can't get one full send to go 
through first. =)

I'm hoping the problem will go away for long enough to get a full send 
completed once I reboot the box, but I can't do that until (much) later 
in the day.




* Re: btrfs send: page allocation failure
  2014-01-13 15:20   ` Jim Salter
@ 2014-01-13 15:29     ` Wang Shilong
  2014-01-13 15:44       ` Wang Shilong
  0 siblings, 1 reply; 15+ messages in thread
From: Wang Shilong @ 2014-01-13 15:29 UTC
  To: Jim Salter; +Cc: linux-btrfs


On 2014-1-13, at 11:20 PM, Jim Salter <jim@jrs-s.net> wrote:

> Er... I can't use incremental send if I can't get one full send to go through first. =)

Sorry, I mean that one approach is to use the '-p' option. You can use:

# btrfs sub create subv
# btrfs sub snapshot -r subv snap
# btrfs sub snapshot -r subv snap1
# btrfs send snap -p snap1 -f 1
# btrfs receive -f 1 backup
# btrfs sub delete snap1   <-- now you can delete snap1 safely

The above approach is much faster; I think you can try it!

Thanks,
Wang



* Re: btrfs send: page allocation failure
  2014-01-13 15:29     ` Wang Shilong
@ 2014-01-13 15:44       ` Wang Shilong
  2014-01-13 16:00         ` Jim Salter
  2014-01-13 16:01         ` Jim Salter
  0 siblings, 2 replies; 15+ messages in thread
From: Wang Shilong @ 2014-01-13 15:44 UTC
  To: Jim Salter; +Cc: linux-btrfs

Just to double-check: which kernel version triggers this problem?
I suppose it is an older kernel? If so, can you try the latest
upstream kernel and see whether the problem still exists?

Thanks,
Wang




* Re: btrfs send: page allocation failure
  2014-01-13 15:44       ` Wang Shilong
@ 2014-01-13 16:00         ` Jim Salter
  2014-01-13 16:09           ` Wang Shilong
  2014-01-13 16:01         ` Jim Salter
  1 sibling, 1 reply; 15+ messages in thread
From: Jim Salter @ 2014-01-13 16:00 UTC
  To: Wang Shilong; +Cc: linux-btrfs

It's a pretty new kernel - 3.13.0-031300rc7-generic.

Sorry if I'm misunderstanding something, but... how can I send with -p
if the parent snapshot doesn't already exist on the target?

What I'm doing is btrfs send /.snapshots/name-of-snapshot | ssh
othermachine btrfs receive /.snapshots. If the parent specified in -p
doesn't exist on othermachine, won't the receive operation fail?
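
To make the two cases concrete, here is a dry-run sketch that only prints the
pipelines being compared; it does not run them. The helper name
build_send_cmd and the parent snapshot path are hypothetical, while
othermachine and /.snapshots are taken from the command above:

```shell
#!/bin/sh
# Dry run only: prints the pipeline instead of executing it.
# build_send_cmd is a hypothetical helper, not part of btrfs-progs.
build_send_cmd() {
    snapshot=$1
    parent=${2:-}   # optional; empty means a full send
    if [ -n "$parent" ]; then
        # Incremental send: the stream is expressed relative to the parent.
        echo "btrfs send -p $parent $snapshot | ssh othermachine btrfs receive /.snapshots"
    else
        # Full send: no parent is referenced.
        echo "btrfs send $snapshot | ssh othermachine btrfs receive /.snapshots"
    fi
}

build_send_cmd /.snapshots/name-of-snapshot
build_send_cmd /.snapshots/name-of-snapshot /.snapshots/hypothetical-parent
```

The first line printed is the full send I am running now; the second is what
a -p send would look like, which is the case I am asking about.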

On 01/13/2014 10:44 AM, Wang Shilong wrote:
> Just double check, what is your kernel version to trigger this problem…
> I suppose this should be an older kernel? If yes,  can you have a try at the latest
> upstream kernel and see if problem still exist? 
>
> Thanks,
> Wang
>
>> 在 2014-1-13,下午11:20,Jim Salter <jim@jrs-s.net> 写道:
>>
>>> Er... I can't use incremental send if I can't get one full send to go through first. =)
>> sory,  i mean one approach is use '-p' option, you can use:
>>
>> # btrfs sub create subv
>> # btrfs sub snapshot -r subv snap
>> # btrfs sub snapshot -r sub  snap1
>> # btrfs send snap -p snap1 -f 1
>> # btrfs receive -f 1 backup
>> # btrfs sub delete snap1 -<-- now you can delete snap1 safely
>>
>> The above approach is much faster, i think you can try it!
>>
>> Thanks,
>> Wang
>>> I'm hoping the problem will go away for long enough to get a full send completed once I reboot the box, but I can't do that until (much) later in the day.
>>>
>>> On 01/13/2014 10:17 AM, Wang Shilong wrote:
>>>> Hello,
>>>>
>>>> I thought carefully about your problem below. I think this is because the btrfs *ulist* implementation uses
>>>> krealloc, which can cause memory allocation failures, especially with a full send.
>>>>
>>>> Until we get rid of the current stupid *ulist* implementation, I think you can use incremental send to work around
>>>> this issue.
>>>>
>>>> Thanks,
>>>> Wang
>>>>
>>>>> Hi list -
>>>>>
>>>>> Getting sporadic page allocation failures in btrfs send. This happened once several weeks ago but was fine after a reboot; yesterday I did not reboot, but had the failure back-to-back trying to send two different snapshots. These are full sends, not incremental, of a bit over 600G of data. Test machine has 32G of RAM, with 21G of it free (not including cache):
>>>>>
>>>>> root@gwa-virt1:/data/images/.snapshots# free -m
>>>>>            total       used       free     shared    buffers cached
>>>>> Mem:         32159      31789        369          0          0 21276
>>>>> -/+ buffers/cache:      10513      21646
>>>>> Swap:            0          0          0
>>>>>
>>>>> In both cases (all three, really) the btrfs send failed a bit more than half of the way through the send (somewhere around the 380GB mark).
>>>>>
>>>>> Kern log snippets follow:
>>>>>
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627611] btrfs: page allocation failure: order:6, mode:0x104050
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627622] CPU: 6 PID: 9642 Comm: btrfs Not tainted 3.13.0-031300rc7-generic #201401041835
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627773] [<ffffffffa0142214>] ? btrfs_get_token_64+0x64/0xf0 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627818] [<ffffffffa01876dd>] ? ulist_add_merge+0xcd/0x270 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627860] [<ffffffffa01876dd>] ulist_add_merge+0xcd/0x270 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627894] [<ffffffffa018615c>] find_parent_nodes+0x50c/0x6f0 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627930] [<ffffffffa018e550>] ? compare_refs.isra.23+0x130/0x130 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627965] [<ffffffffa0187019>] iterate_extent_inodes+0xf9/0x270 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628003] [<ffffffffa014b7a5>] ? free_extent_buffer+0x35/0x40 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628037] [<ffffffffa018dc9d>] find_extent_clone.isra.26+0x26d/0x340 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628072] [<ffffffffa0191207>] process_extent+0xd7/0x180 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628107] [<ffffffffa01918ff>] changed_cb+0xdf/0x170 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628141] [<ffffffffa0191ad2>] full_send_tree+0x142/0x280 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628174] [<ffffffffa0191ccc>] ? send_subvol_begin+0xbc/0x2b0 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628209] [<ffffffffa0191fa0>] send_subvol+0xe0/0xf0 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628244] [<ffffffffa01922f1>] btrfs_ioctl_send+0x341/0x520 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628279] [<ffffffffa01606d3>] btrfs_ioctl+0x953/0xac0 [btrfs]
>>>>>
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016512] btrfs: page allocation failure: order:5, mode:0x104050
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016518] CPU: 4 PID: 18689 Comm: btrfs Not tainted 3.13.0-031300rc7-generic #201401041835
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016597] [<ffffffffa0142214>] ? btrfs_get_token_64+0x64/0xf0 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016617] [<ffffffffa01876dd>] ? ulist_add_merge+0xcd/0x270 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016637] [<ffffffffa01876dd>] ulist_add_merge+0xcd/0x270 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016653] [<ffffffffa018615c>] find_parent_nodes+0x50c/0x6f0 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016669] [<ffffffffa018e550>] ? compare_refs.isra.23+0x130/0x130 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016684] [<ffffffffa0187019>] iterate_extent_inodes+0xf9/0x270 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016700] [<ffffffffa014b7a5>] ? free_extent_buffer+0x35/0x40 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016716] [<ffffffffa018dc9d>] find_extent_clone.isra.26+0x26d/0x340 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016732] [<ffffffffa0191207>] process_extent+0xd7/0x180 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016747] [<ffffffffa01918ff>] changed_cb+0xdf/0x170 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016763] [<ffffffffa0191ad2>] full_send_tree+0x142/0x280 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016778] [<ffffffffa0191ccc>] ? send_subvol_begin+0xbc/0x2b0 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016794] [<ffffffffa0191fa0>] send_subvol+0xe0/0xf0 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016810] [<ffffffffa01922f1>] btrfs_ioctl_send+0x341/0x520 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016826] [<ffffffffa01606d3>] btrfs_ioctl+0x953/0xac0 [btrfs]
>>>>>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: btrfs send: page allocation failure
  2014-01-13 15:44       ` Wang Shilong
  2014-01-13 16:00         ` Jim Salter
@ 2014-01-13 16:01         ` Jim Salter
  1 sibling, 0 replies; 15+ messages in thread
From: Jim Salter @ 2014-01-13 16:01 UTC (permalink / raw)
  To: Wang Shilong; +Cc: linux-btrfs

BTW, this problem occurred with the 3.11 kernel shipping by default in
Ubuntu Saucy as well as this 3.13rc7 daily kernel I'm using currently.

On 01/13/2014 10:44 AM, Wang Shilong wrote:
> Just to double-check, which kernel version triggers this problem?
> I suppose it is an older kernel? If so, can you try the latest
> upstream kernel and see if the problem still exists?
>
> Thanks,
> Wang
>
>> On 2014-1-13, at 11:20 PM, Jim Salter <jim@jrs-s.net> wrote:
>>
>>> Er... I can't use incremental send if I can't get one full send to go through first. =)
>> Sorry, I mean one approach is to use the '-p' option. You can use:
>>
>> # btrfs sub create subv
>> # btrfs sub snapshot -r subv snap
>> # btrfs sub snapshot -r subv snap1
>> # btrfs send snap -p snap1 -f 1
>> # btrfs receive -f 1 backup
>> # btrfs sub delete snap1   <-- now you can delete snap1 safely
>>
>> The above approach is much faster; I think you can try it!
>>
>> Thanks,
>> Wang
>>> I'm hoping the problem will go away for long enough to get a full send completed once I reboot the box, but I can't do that until (much) later in the day.
>>>
>>> On 01/13/2014 10:17 AM, Wang Shilong wrote:
>>>> Hello,
>>>>
>>>> I thought carefully about your problem below. I think this is because the btrfs *ulist* implementation uses
>>>> krealloc, which can cause memory allocation failures, especially with a full send.
>>>>
>>>> Until we get rid of the current stupid *ulist* implementation, I think you can use incremental send to work around
>>>> this issue.
>>>>
>>>> Thanks,
>>>> Wang
>>>>
>>>>> Hi list -
>>>>>
>>>>> Getting sporadic page allocation failures in btrfs send. This happened once several weeks ago but was fine after a reboot; yesterday I did not reboot, but had the failure back-to-back trying to send two different snapshots. These are full sends, not incremental, of a bit over 600G of data. Test machine has 32G of RAM, with 21G of it free (not including cache):
>>>>>
>>>>> root@gwa-virt1:/data/images/.snapshots# free -m
>>>>>            total       used       free     shared    buffers cached
>>>>> Mem:         32159      31789        369          0          0 21276
>>>>> -/+ buffers/cache:      10513      21646
>>>>> Swap:            0          0          0
>>>>>
>>>>> In both cases (all three, really) the btrfs send failed a bit more than half of the way through the send (somewhere around the 380GB mark).
>>>>>
>>>>> Kern log snippets follow:
>>>>>
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627611] btrfs: page allocation failure: order:6, mode:0x104050
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627622] CPU: 6 PID: 9642 Comm: btrfs Not tainted 3.13.0-031300rc7-generic #201401041835
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627773] [<ffffffffa0142214>] ? btrfs_get_token_64+0x64/0xf0 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627818] [<ffffffffa01876dd>] ? ulist_add_merge+0xcd/0x270 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627860] [<ffffffffa01876dd>] ulist_add_merge+0xcd/0x270 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627894] [<ffffffffa018615c>] find_parent_nodes+0x50c/0x6f0 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627930] [<ffffffffa018e550>] ? compare_refs.isra.23+0x130/0x130 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627965] [<ffffffffa0187019>] iterate_extent_inodes+0xf9/0x270 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628003] [<ffffffffa014b7a5>] ? free_extent_buffer+0x35/0x40 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628037] [<ffffffffa018dc9d>] find_extent_clone.isra.26+0x26d/0x340 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628072] [<ffffffffa0191207>] process_extent+0xd7/0x180 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628107] [<ffffffffa01918ff>] changed_cb+0xdf/0x170 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628141] [<ffffffffa0191ad2>] full_send_tree+0x142/0x280 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628174] [<ffffffffa0191ccc>] ? send_subvol_begin+0xbc/0x2b0 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628209] [<ffffffffa0191fa0>] send_subvol+0xe0/0xf0 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628244] [<ffffffffa01922f1>] btrfs_ioctl_send+0x341/0x520 [btrfs]
>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628279] [<ffffffffa01606d3>] btrfs_ioctl+0x953/0xac0 [btrfs]
>>>>>
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016512] btrfs: page allocation failure: order:5, mode:0x104050
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016518] CPU: 4 PID: 18689 Comm: btrfs Not tainted 3.13.0-031300rc7-generic #201401041835
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016597] [<ffffffffa0142214>] ? btrfs_get_token_64+0x64/0xf0 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016617] [<ffffffffa01876dd>] ? ulist_add_merge+0xcd/0x270 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016637] [<ffffffffa01876dd>] ulist_add_merge+0xcd/0x270 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016653] [<ffffffffa018615c>] find_parent_nodes+0x50c/0x6f0 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016669] [<ffffffffa018e550>] ? compare_refs.isra.23+0x130/0x130 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016684] [<ffffffffa0187019>] iterate_extent_inodes+0xf9/0x270 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016700] [<ffffffffa014b7a5>] ? free_extent_buffer+0x35/0x40 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016716] [<ffffffffa018dc9d>] find_extent_clone.isra.26+0x26d/0x340 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016732] [<ffffffffa0191207>] process_extent+0xd7/0x180 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016747] [<ffffffffa01918ff>] changed_cb+0xdf/0x170 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016763] [<ffffffffa0191ad2>] full_send_tree+0x142/0x280 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016778] [<ffffffffa0191ccc>] ? send_subvol_begin+0xbc/0x2b0 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016794] [<ffffffffa0191fa0>] send_subvol+0xe0/0xf0 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016810] [<ffffffffa01922f1>] btrfs_ioctl_send+0x341/0x520 [btrfs]
>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016826] [<ffffffffa01606d3>] btrfs_ioctl+0x953/0xac0 [btrfs]
>>>>>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: btrfs send: page allocation failure
  2014-01-13 16:00         ` Jim Salter
@ 2014-01-13 16:09           ` Wang Shilong
  0 siblings, 0 replies; 15+ messages in thread
From: Wang Shilong @ 2014-01-13 16:09 UTC (permalink / raw)
  To: Jim Salter; +Cc: linux-btrfs


> It's a pretty new kernel - 3.13.0-031300rc7-generic.
> 
> Sorry if I'm misunderstanding something, but... how can I send with -p
> if the parent snapshot doesn't already exist on the target?

Obviously, you cannot...

Sorry, I was assuming you run all the operations on one machine, just like me ^_^

> 
> What I'm doing is btrfs send /.snapshots/name-of-snapshot | ssh
> othermachine btrfs receive /.snapshots. If the parent specified in -p
> doesn't exist on othermachine, won't the receive operation fail?
> 
> On 01/13/2014 10:44 AM, Wang Shilong wrote:
>> Just to double-check, which kernel version triggers this problem?
>> I suppose it is an older kernel? If so, can you try the latest
>> upstream kernel and see if the problem still exists?
>> 
>> Thanks,
>> Wang
>> 
>>> On 2014-1-13, at 11:20 PM, Jim Salter <jim@jrs-s.net> wrote:
>>> 
>>>> Er... I can't use incremental send if I can't get one full send to go through first. =)
>>> Sorry, I mean one approach is to use the '-p' option. You can use:
>>>
>>> # btrfs sub create subv
>>> # btrfs sub snapshot -r subv snap
>>> # btrfs sub snapshot -r subv snap1
>>> # btrfs send snap -p snap1 -f 1
>>> # btrfs receive -f 1 backup
>>> # btrfs sub delete snap1   <-- now you can delete snap1 safely
>>>
>>> The above approach is much faster; I think you can try it!
>>> 
>>> Thanks,
>>> Wang
>>>> I'm hoping the problem will go away for long enough to get a full send completed once I reboot the box, but I can't do that until (much) later in the day.
>>>> 
>>>> On 01/13/2014 10:17 AM, Wang Shilong wrote:
>>>>> Hello,
>>>>> 
>>>>> I thought carefully about your problem below. I think this is because the btrfs *ulist* implementation uses
>>>>> krealloc, which can cause memory allocation failures, especially with a full send.
>>>>>
>>>>> Until we get rid of the current stupid *ulist* implementation, I think you can use incremental send to work around
>>>>> this issue.
>>>>> 
>>>>> Thanks,
>>>>> Wang
>>>>> 
>>>>>> Hi list -
>>>>>> 
>>>>>> Getting sporadic page allocation failures in btrfs send. This happened once several weeks ago but was fine after a reboot; yesterday I did not reboot, but had the failure back-to-back trying to send two different snapshots. These are full sends, not incremental, of a bit over 600G of data. Test machine has 32G of RAM, with 21G of it free (not including cache):
>>>>>> 
>>>>>> root@gwa-virt1:/data/images/.snapshots# free -m
>>>>>>           total       used       free     shared    buffers cached
>>>>>> Mem:         32159      31789        369          0          0 21276
>>>>>> -/+ buffers/cache:      10513      21646
>>>>>> Swap:            0          0          0
>>>>>> 
>>>>>> In both cases (all three, really) the btrfs send failed a bit more than half of the way through the send (somewhere around the 380GB mark).
>>>>>> 
>>>>>> Kern log snippets follow:
>>>>>> 
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627611] btrfs: page allocation failure: order:6, mode:0x104050
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627622] CPU: 6 PID: 9642 Comm: btrfs Not tainted 3.13.0-031300rc7-generic #201401041835
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627773] [<ffffffffa0142214>] ? btrfs_get_token_64+0x64/0xf0 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627818] [<ffffffffa01876dd>] ? ulist_add_merge+0xcd/0x270 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627860] [<ffffffffa01876dd>] ulist_add_merge+0xcd/0x270 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627894] [<ffffffffa018615c>] find_parent_nodes+0x50c/0x6f0 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627930] [<ffffffffa018e550>] ? compare_refs.isra.23+0x130/0x130 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627965] [<ffffffffa0187019>] iterate_extent_inodes+0xf9/0x270 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628003] [<ffffffffa014b7a5>] ? free_extent_buffer+0x35/0x40 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628037] [<ffffffffa018dc9d>] find_extent_clone.isra.26+0x26d/0x340 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628072] [<ffffffffa0191207>] process_extent+0xd7/0x180 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628107] [<ffffffffa01918ff>] changed_cb+0xdf/0x170 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628141] [<ffffffffa0191ad2>] full_send_tree+0x142/0x280 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628174] [<ffffffffa0191ccc>] ? send_subvol_begin+0xbc/0x2b0 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628209] [<ffffffffa0191fa0>] send_subvol+0xe0/0xf0 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628244] [<ffffffffa01922f1>] btrfs_ioctl_send+0x341/0x520 [btrfs]
>>>>>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.628279] [<ffffffffa01606d3>] btrfs_ioctl+0x953/0xac0 [btrfs]
>>>>>> 
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016512] btrfs: page allocation failure: order:5, mode:0x104050
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016518] CPU: 4 PID: 18689 Comm: btrfs Not tainted 3.13.0-031300rc7-generic #201401041835
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016597] [<ffffffffa0142214>] ? btrfs_get_token_64+0x64/0xf0 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016617] [<ffffffffa01876dd>] ? ulist_add_merge+0xcd/0x270 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016637] [<ffffffffa01876dd>] ulist_add_merge+0xcd/0x270 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016653] [<ffffffffa018615c>] find_parent_nodes+0x50c/0x6f0 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016669] [<ffffffffa018e550>] ? compare_refs.isra.23+0x130/0x130 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016684] [<ffffffffa0187019>] iterate_extent_inodes+0xf9/0x270 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016700] [<ffffffffa014b7a5>] ? free_extent_buffer+0x35/0x40 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016716] [<ffffffffa018dc9d>] find_extent_clone.isra.26+0x26d/0x340 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016732] [<ffffffffa0191207>] process_extent+0xd7/0x180 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016747] [<ffffffffa01918ff>] changed_cb+0xdf/0x170 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016763] [<ffffffffa0191ad2>] full_send_tree+0x142/0x280 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016778] [<ffffffffa0191ccc>] ? send_subvol_begin+0xbc/0x2b0 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016794] [<ffffffffa0191fa0>] send_subvol+0xe0/0xf0 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016810] [<ffffffffa01922f1>] btrfs_ioctl_send+0x341/0x520 [btrfs]
>>>>>> Jan 12 21:34:00 gwa-virt1 kernel: [562448.016826] [<ffffffffa01606d3>] btrfs_ioctl+0x953/0xac0 [btrfs]
>>>>>> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: btrfs send: page allocation failure
  2014-01-13 12:58 btrfs send: page allocation failure Jim Salter
  2014-01-13 15:17 ` Wang Shilong
@ 2014-01-13 18:23 ` David Sterba
  2014-01-13 18:36   ` Josef Bacik
  2014-01-13 18:37   ` Jim Salter
  1 sibling, 2 replies; 15+ messages in thread
From: David Sterba @ 2014-01-13 18:23 UTC (permalink / raw)
  To: Jim Salter; +Cc: linux-btrfs, jbacik

On Mon, Jan 13, 2014 at 07:58:48AM -0500, Jim Salter wrote:
> Getting sporadic page allocation failures in btrfs send. This happened once
> several weeks ago but was fine after a reboot; yesterday I did not reboot,
> but had the failure back-to-back trying to send two different snapshots.
> These are full sends, not incremental, of a bit over 600G of data. Test
> machine has 32G of RAM, with 21G of it free (not including cache):
> 
> root@gwa-virt1:/data/images/.snapshots# free -m
>              total       used       free     shared    buffers cached
> Mem:         32159      31789        369          0          0 21276
> -/+ buffers/cache:      10513      21646
> Swap:            0          0          0
> 
> In both cases (all three, really) the btrfs send failed a bit more than half
> of the way through the send (somewhere around the 380GB mark).
> 
> Kern log snippets follow:
> 
> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627611] btrfs: page allocation
> failure: order:6, mode:0x104050
> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627818] [<ffffffffa01876dd>] ?
> ulist_add_merge+0xcd/0x270 [btrfs]

That's the krealloc failure, Josef has a patch that came out of
https://bugzilla.kernel.org/show_bug.cgi?id=60579
but I don't see it merged anywhere.

david

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: btrfs send: page allocation failure
  2014-01-13 18:23 ` David Sterba
@ 2014-01-13 18:36   ` Josef Bacik
  2014-01-13 18:37   ` Jim Salter
  1 sibling, 0 replies; 15+ messages in thread
From: Josef Bacik @ 2014-01-13 18:36 UTC (permalink / raw)
  To: dsterba, Jim Salter, linux-btrfs


On 01/13/2014 01:23 PM, David Sterba wrote:
> On Mon, Jan 13, 2014 at 07:58:48AM -0500, Jim Salter wrote:
>> Getting sporadic page allocation failures in btrfs send. This happened once
>> several weeks ago but was fine after a reboot; yesterday I did not reboot,
>> but had the failure back-to-back trying to send two different snapshots.
>> These are full sends, not incremental, of a bit over 600G of data. Test
>> machine has 32G of RAM, with 21G of it free (not including cache):
>>
>> root@gwa-virt1:/data/images/.snapshots# free -m
>>               total       used       free     shared    buffers cached
>> Mem:         32159      31789        369          0          0 21276
>> -/+ buffers/cache:      10513      21646
>> Swap:            0          0          0
>>
>> In both cases (all three, really) the btrfs send failed a bit more than half
>> of the way through the send (somewhere around the 380GB mark).
>>
>> Kern log snippets follow:
>>
>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627611] btrfs: page allocation
>> failure: order:6, mode:0x104050
>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627818] [<ffffffffa01876dd>] ?
>> ulist_add_merge+0xcd/0x270 [btrfs]
> That's the krealloc failure, Josef has a patch that came out of
> https://bugzilla.kernel.org/show_bug.cgi?id=60579
> but I don't see it merged anywhere.
That patch isn't quite right, which is why it isn't merged. I'll come up
with something better and send it out soonish.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: btrfs send: page allocation failure
  2014-01-13 18:23 ` David Sterba
  2014-01-13 18:36   ` Josef Bacik
@ 2014-01-13 18:37   ` Jim Salter
  2014-01-13 18:56     ` David Sterba
  1 sibling, 1 reply; 15+ messages in thread
From: Jim Salter @ 2014-01-13 18:37 UTC (permalink / raw)
  To: dsterba, linux-btrfs, jbacik

What makes you believe that? The bug filed there appears to be related 
to defragging, which I am not doing either manually or automatically.

On 01/13/2014 01:23 PM, David Sterba wrote:
> On Mon, Jan 13, 2014 at 07:58:48AM -0500, Jim Salter wrote:
>> Getting sporadic page allocation failures in btrfs send. This happened once
>> several weeks ago but was fine after a reboot; yesterday I did not reboot,
>> but had the failure back-to-back trying to send two different snapshots.
>> These are full sends, not incremental, of a bit over 600G of data. Test
>> machine has 32G of RAM, with 21G of it free (not including cache):
>>
>> root@gwa-virt1:/data/images/.snapshots# free -m
>>               total       used       free     shared    buffers cached
>> Mem:         32159      31789        369          0          0 21276
>> -/+ buffers/cache:      10513      21646
>> Swap:            0          0          0
>>
>> In both cases (all three, really) the btrfs send failed a bit more than half
>> of the way through the send (somewhere around the 380GB mark).
>>
>> Kern log snippets follow:
>>
>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627611] btrfs: page allocation
>> failure: order:6, mode:0x104050
>> Jan 12 14:05:36 gwa-virt1 kernel: [535523.627818] [<ffffffffa01876dd>] ?
>> ulist_add_merge+0xcd/0x270 [btrfs]
> That's the krealloc failure, Josef has a patch that came out of
> https://bugzilla.kernel.org/show_bug.cgi?id=60579
> but I don't see it merged anywhere.
>
> david


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: btrfs send: page allocation failure
  2014-01-13 18:37   ` Jim Salter
@ 2014-01-13 18:56     ` David Sterba
  2014-01-13 19:03       ` Jim Salter
  0 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2014-01-13 18:56 UTC (permalink / raw)
  To: Jim Salter; +Cc: linux-btrfs, jbacik

On Mon, Jan 13, 2014 at 01:37:31PM -0500, Jim Salter wrote:
> What makes you believe that? The bug filed there appears to be related to
> defragging, which I am not doing either manually or automatically.

The quota groups are on and the symptoms match the known problem when
there are lots of backrefs.
krealloc is on the stack, although not mentioned explicitly; it is called
through ulist_add_merge. The allocation order in the failure report is 6,
which means that somebody wanted a large contiguous chunk of memory. That
is what krealloc asked for, and it failed. Kernel memory is fragmented and
this kind of allocation is hard to satisfy in the long run.

david
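
For readers following along, here is a rough sketch of what the "order"
in the failure report means in bytes. It assumes 4 KiB pages (typical for
x86-64, but not stated anywhere in this thread); order_to_bytes is just an
illustrative helper name, not a kernel function.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; not confirmed in the thread


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of one physically contiguous block of 2**order pages."""
    return (1 << order) * page_size


# The two failures in the log were order:6 and order:5.
for order in (5, 6):
    kib = order_to_bytes(order) // 1024
    print(f"order {order}: {1 << order} pages, {kib} KiB contiguous")
```

Under that assumption the order:6 request needs 64 physically adjacent
free pages, which is a different question from how much memory is free in
total.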

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: btrfs send: page allocation failure
  2014-01-13 18:56     ` David Sterba
@ 2014-01-13 19:03       ` Jim Salter
  2014-01-14 13:13         ` David Sterba
  0 siblings, 1 reply; 15+ messages in thread
From: Jim Salter @ 2014-01-13 19:03 UTC (permalink / raw)
  To: dsterba, linux-btrfs, jbacik

OK, thanks. If kernel memory fragmentation is a big factor, that would 
also explain why it succeeds after a reboot but does not succeed after 
weeks of uptime...
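
One way to look at fragmentation on a long-running box is
/proc/buddyinfo, which lists per zone how many free blocks exist at each
order. The sketch below is only an illustration: the helper name and the
SAMPLE numbers are made up (they are not from gwa-virt1), and it assumes
the usual "Node N, zone NAME count0 count1 ..." line layout.

```python
def free_blocks_at_or_above(buddyinfo_text: str, min_order: int) -> dict:
    """Map 'node/zone' to the number of free blocks of order >= min_order."""
    result = {}
    for line in buddyinfo_text.splitlines():
        fields = line.split()
        # Expected shape: Node <n>, zone <name> <order-0 count> <order-1 count> ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = fields[1].rstrip(",")
        zone = fields[3]
        counts = [int(c) for c in fields[4:]]
        result[f"{node}/{zone}"] = sum(counts[min_order:])
    return result


# Invented sample: plenty of small free blocks, nothing at order 6 or
# above in the Normal zone.
SAMPLE = """\
Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
Node 0, zone   Normal   9120   2210    310     42      7      1      0      0      0      0      0
"""

print(free_blocks_at_or_above(SAMPLE, 6))
```

On a live system the same function could be fed the contents of
/proc/buddyinfo. A zone with many low-order blocks but none at order 5 or
6 would be consistent with the failures reported here.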

On 01/13/2014 01:56 PM, David Sterba wrote:
> On Mon, Jan 13, 2014 at 01:37:31PM -0500, Jim Salter wrote:
>> What makes you believe that? The bug filed there appears to be related to
>> defragging, which I am not doing either manually or automatically.
> The quota groups are on and the symptoms match the known problem when
> there are lots of backrefs.
> krealloc is on the stack, although not mentioned explicitly; it is called
> through ulist_add_merge. The allocation order in the failure report is 6,
> which means that somebody wanted a large contiguous chunk of memory. That
> is what krealloc asked for, and it failed. Kernel memory is fragmented and
> this kind of allocation is hard to satisfy in the long run.
>
> david


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: btrfs send: page allocation failure
  2014-01-13 19:03       ` Jim Salter
@ 2014-01-14 13:13         ` David Sterba
  2014-01-14 14:58           ` Jim Salter
  0 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2014-01-14 13:13 UTC (permalink / raw)
  To: Jim Salter; +Cc: linux-btrfs, jbacik

On Mon, Jan 13, 2014 at 02:03:52PM -0500, Jim Salter wrote:
> OK, thanks. If kernel memory fragmentation is a big factor, that would also
> explain why it succeeds after a reboot but does not succeed after weeks of
> uptime...

Yes, that's it.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: btrfs send: page allocation failure
  2014-01-14 13:13         ` David Sterba
@ 2014-01-14 14:58           ` Jim Salter
  0 siblings, 0 replies; 15+ messages in thread
From: Jim Salter @ 2014-01-14 14:58 UTC (permalink / raw)
  To: dsterba, linux-btrfs, jbacik

FWIW, for those following along: I took the system down for a reboot last
night at 9PM EST, then tried the full send after the reboot. It took 12
hours or so to complete, but it completed with no issues, as have a few
incremental sends since.

So, yeah, definitely looks related to kernel memory fragmentation AFAICT.

On 01/14/2014 08:13 AM, David Sterba wrote:
> On Mon, Jan 13, 2014 at 02:03:52PM -0500, Jim Salter wrote:
>> OK, thanks. If kernel memory fragmentation is a big factor, that would also
>> explain why it succeeds after a reboot but does not succeed after weeks of
>> uptime...
> Yes, that's it.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-01-14 14:58 UTC | newest]

Thread overview: 15+ messages
2014-01-13 12:58 btrfs send: page allocation failure Jim Salter
2014-01-13 15:17 ` Wang Shilong
2014-01-13 15:20   ` Jim Salter
2014-01-13 15:29     ` Wang Shilong
2014-01-13 15:44       ` Wang Shilong
2014-01-13 16:00         ` Jim Salter
2014-01-13 16:09           ` Wang Shilong
2014-01-13 16:01         ` Jim Salter
2014-01-13 18:23 ` David Sterba
2014-01-13 18:36   ` Josef Bacik
2014-01-13 18:37   ` Jim Salter
2014-01-13 18:56     ` David Sterba
2014-01-13 19:03       ` Jim Salter
2014-01-14 13:13         ` David Sterba
2014-01-14 14:58           ` Jim Salter
