linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chandan Rajendra <chandan@linux.vnet.ibm.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Qgroup accounting issue on kdave/for-next branch
Date: Tue, 29 Nov 2016 15:49:50 +0530	[thread overview]
Message-ID: <2462244.ZlK0p3ShUA@localhost.localdomain> (raw)
In-Reply-To: <dd0cae85-213b-db19-2d69-176707eb7d12@cn.fujitsu.com>

On Tuesday, November 29, 2016 04:41:41 PM Qu Wenruo wrote:
> 
> At 11/29/2016 04:21 PM, Chandan Rajendra wrote:
> > On Tuesday, November 29, 2016 03:55:53 PM Qu Wenruo wrote:
> >> At 11/29/2016 02:36 PM, Chandan Rajendra wrote:
> >>> When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I
> >>> noticed the following call trace.
> >>>
> >>> [   77.335887] ------------[ cut here ]------------
> >>> [   77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443 .btrfs_qgroup_free_refroot+0x188/0x220
> >>> [   77.336303] Modules linked in:
> >>> [   77.336393] CPU: 0 PID: 8325 Comm: umount Not tainted 4.9.0-rc5-00062-g6b74e43 #22
> >>> [   77.336526] task: c00000062b8d4880 task.stack: c00000062b9a4000
> >>> [   77.336638] NIP: c0000000005cf018 LR: c0000000005ceff8 CTR: c0000000005390c0
> >>> [   77.336771] REGS: c00000062b9a7450 TRAP: 0700   Not tainted  (4.9.0-rc5-00062-g6b74e43)
> >>> [   77.336908] MSR: 800000000282b032
> >>> [   77.336960] <
> >>> [   77.337027] SF
> >>> [   77.337053] ,VEC
> >>> [   77.337087] ,VSX
> >>> [   77.337114] ,EE
> >>> [   77.337146] ,FP
> >>> [   77.337173] ,ME
> >>> [   77.337207] ,IR
> >>> [   77.337233] ,DR
> >>> [   77.337267] ,RI
> >>> [   77.337294] >
> >>> [   77.337330]   CR: 88000842  XER: 00000000
> >>> [   77.337392] CFAR: c0000000005c9b5c
> >>> [   77.337443] SOFTE: 1
> >>> [   77.337477]
> >>>                GPR00:
> >>> [   77.337517] c0000000005cefcc
> >>> [   77.337575] c00000062b9a76d0
> >>> [   77.337626] c000000001103a00
> >>> [   77.337714] c00000063e058d40
> >>> [   77.337765]
> >>>                GPR04:
> >>> [   77.337817] c00000062b9a7740
> >>> [   77.337868] 0000000000000008
> >>> [   77.337927] 000000000005f000
> >>> [   77.337978] 000000063eed0000
> >>> [   77.338037]
> >>>                GPR08:
> >>> [   77.338080] c00000062f9629c8
> >>> [   77.338138] 0000000000000001
> >>> [   77.338191] fffffffffffff000
> >>> [   77.338248] c00000063e058e80
> >>> [   77.338300]
> >>>                GPR12:
> >>> [   77.338384] 0000000000000000
> >>> [   77.338435] c00000000fe00000
> >>> [   77.338498] 0000000020000000
> >>> [   77.338548] 0000000000000000
> >>> [   77.338605]
> >>>                GPR16:
> >>> [   77.338645] 0000000000000008
> >>> [   77.338703] 000000004d5906fc
> >>> [   77.338754] 000000004d5a6c08
> >>> [   77.338811] 000000004d54b3d0
> >>> [   77.338866]
> >>>                GPR20:
> >>> [   77.338921] 000001000bcf8440
> >>> [   77.338972] 0000000000000000
> >>> [   77.339030] 0000000000000000
> >>> [   77.339080] c00000062523b078
> >>> [   77.339138]
> >>>                GPR24:
> >>> [   77.339178] c00000062523b080
> >>> [   77.339240] c000000000da2b58
> >>> [   77.339290] 0000000000000000
> >>> [   77.339347] c00000062e539600
> >>> [   77.339398]
> >>>                GPR28:
> >>> [   77.339485] 0000000000060000
> >>> [   77.339536] c00000062523b000
> >>> [   77.339594] c00000062e539600
> >>> [   77.339644] c00000062e539688
> >>>
> >>> [   77.339740] NIP [c0000000005cf018] .btrfs_qgroup_free_refroot+0x188/0x220
> >>> [   77.339852] LR [c0000000005ceff8] .btrfs_qgroup_free_refroot+0x168/0x220
> >>> [   77.339959] Call Trace:
> >>> [   77.339998] [c00000062b9a76d0] [c0000000005cefcc] .btrfs_qgroup_free_refroot+0x13c/0x220
> >>> [   77.340123]  (unreliable)
> >>> [   77.340193] [c00000062b9a7790] [c00000000054210c] .commit_fs_roots+0x19c/0x240
> >>> [   77.340355] [c00000062b9a78a0] [c0000000005451a0] .btrfs_commit_transaction.part.5+0x480/0xbe0
> >>> [   77.340554] [c00000062b9a7970] [c000000000503bd4] .btrfs_sync_fs+0x74/0x1c0
> >>> [   77.340725] [c00000062b9a7a10] [c0000000002d6ce0] .sync_filesystem+0xd0/0x100
> >>> [   77.340891] [c00000062b9a7a90] [c000000000294f88] .generic_shutdown_super+0x38/0x1a0
> >>> [   77.341052] [c00000062b9a7b10] [c000000000295508] .kill_anon_super+0x18/0x30
> >>> [   77.342243] [c00000062b9a7b90] [c000000000503db8] .btrfs_kill_super+0x18/0xd0
> >>> [   77.342411] [c00000062b9a7c10] [c0000000002958c8] .deactivate_locked_super+0x98/0xe0
> >>> [   77.342573] [c00000062b9a7c90] [c0000000002be114] .cleanup_mnt+0x54/0xa0
> >>> [   77.342723] [c00000062b9a7d10] [c0000000000d7e74] .task_work_run+0x124/0x180
> >>> [   77.342862] [c00000062b9a7db0] [c00000000001c354] .do_notify_resume+0xa4/0xb0
> >>> [   77.343030] [c00000062b9a7e30] [c00000000000c344] .ret_from_except_lite+0x70/0x74
> >>> [   77.343187] Instruction dump:
> >>> [   77.343276] 3b400000 e87d0c78 38810070 4bffab1d 60000000 2c230000 4182ff40 e8dfffc8
> >>> [   77.343522] eb630008 7d5c3010 7d294910 7d2900d0 <0b090000> 7fbc3040 41dd0070 f95fffc8
> >>> [   77.343771] ---[ end trace 8130fb71377d4ff8 ]---
> >>> [   77.343857] BTRFS warning (device loop1): qgroup 5 reserved space underflow, have: 389120, to free: 393216
> >>> [   77.735193] BTRFS info (device loop1): disabling disk space caching
> >>> [   77.736629] BTRFS info (device loop1): has skinny extents
> >>>
> >>>
> >>> The above call trace happens because of the following:
> >>>
> >>>  |--------------------------------+-----------------------------------|
> >>>  | Task A                         | Task B                            |
> >>>  |--------------------------------+-----------------------------------|
> >>>  | Write 4k bytes to a file       |                                   |
> >>>  | at offset range [0, 4095]      |                                   |
> >>>  | - __btrfs_buffered_write       |                                   |
> >>>  | -- btrfs_check_data_free_space |                                   |
> >>>  | --- Set EXTENT_QGROUP_RESERVED |                                   |
> >>>  |     bit for range [0, 64k-1]   |                                   |
> >>>  | -- Copy 4k data from userspace |                                   |
> >>>  |    to page cache               |                                   |
> >>>  |                                | Write 4k bytes to the file        |
> >>>  |                                | at offset range [4096, 8191]      |
> >>>  |                                | - __btrfs_buffered_write          |
> >>>  |                                | -- btrfs_check_data_free_space    |
> >>>  |                                | --- Since EXTENT_QGROUP_RESERVED  |
> >>>  |                                | is already set we don't           |
> >>>  |                                | reserve qgroup space              |
> >>>  |                                | -- Assume the call to             |
> >>>  |                                | btrfs_delalloc_reserve_metadata() |
> >>>  |                                | fails                             |
> >>>  |                                | -- btrfs_free_reserved_data_space |
> >>>  |                                | --- Clear EXTENT_QGROUP_RESERVED  |
> >>>  |                                |     file range [4096, 8191]       |
> >>>  |--------------------------------+-----------------------------------|
> >>
> >> Task B part seems a little strange.
> >>
> >> For the lastest for-next branch, for-next-20161125, in
> >> btrfs_delalloc_reserve_metadata(), it will roundup @start and @len.
> >
> > I was using for-next-20161118. But the problem is still reproducible on
> > for-next-20161125. I guess, you meant to mention
> > btrfs_free_reserved_data_space() instead of
> > btrfs_delalloc_reserve_metadata().
> 
> My fault. And yes, I was meant to mention btrfs_free_reserved_data_space().
> >>
> >> So for btrfs_qgroup_free_data(), we will free the range of [0, 64k-1],
> >> just like what we reserved in Task A.
> >
> > So with the new code, Instead of 4k, Task B frees 64k worth of qgroup space
> > that it had *not* reserved.
> 
> For 64K page size, we only support 64K sectorsize. (Without your subpage 
> size support patches)
> And in that case, file extent will be 64K size, so we reserve 64K and 
> free 64K, which is correct.
> 
> > The reservation was originally done by Task A. So
> > when Task A's file extent item is being inserted, it will end up releasing 64k
> > additional qgroup reservation.
> 
> You are right about the case.
> 
> If Tasl A's data isn't written into disk(buffered write), then Task B
> doesn't need to reserve space, just reuse the space Task A reserved.
> 
> And in that case, you're completely right, in error path for Task B, the 
> free will be deadly.
> 
> Two fix seems to be available:
> 1) Stop buffered write for the reserved space of Task A
>     But since Task A succeeded in buffered write, it should assume the
>     data will reach disk, and we will break the assumption.
> 2) Use accurate free_reserved range in error path
>     I think it's possible to export the extent_changeset structure and
>     use them to record what range we reserved, so we can free accurate
>     amount of space.
>     But the change may be a little large. Several interface needs to be
>     changed.
>
> If I misunderstood the problem again, please point it out in case I
> missed something again.
> 
> 
> BTW, does the 64K sectorsize of PPC64 makes metadata much easier to hit 
> ENOSPC?

On ppc64, I am actually hitting EDQUOT easily since nodesize is 64k and also
the 0/5 qgroup is limited to 512k.

> 
> Thanks again for your detailed report,
> Qu
> 
> >
> > Executing "perf record -e btrfs:btrfs_qgroup_release_data -e
> > btrfs:btrfs_qgroup_reserve_data -a -g ./check btrfs/126" gives the following,
> >
> > xfs_io  9536 [002]   454.601476: btrfs:btrfs_qgroup_reserve_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=65536, op=reserve
> > xfs_io  9536 [002]   454.601522: btrfs:btrfs_qgroup_reserve_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=0, op=reserve
> > xfs_io  9536 [002]   454.601530: btrfs:btrfs_qgroup_release_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=65536, op=free
> > kthreadd  9171 [003]   454.704145: btrfs:btrfs_qgroup_release_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=0, op=release
> >
> > Please note that in the above perf ouput, Instead of two tasks, the same task
> > writes two 4k blocks of data.
> >
> 
> 

-- 
chandan


  reply	other threads:[~2016-11-29 15:51 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-29  6:36 Qgroup accounting issue on kdave/for-next branch Chandan Rajendra
2016-11-29  7:10 ` Qu Wenruo
2016-11-29  7:55 ` Qu Wenruo
2016-11-29  8:21   ` Chandan Rajendra
2016-11-29  8:41     ` Qu Wenruo
2016-11-29 10:19       ` Chandan Rajendra [this message]
2016-11-29 15:56 ` Jeff Mahoney
2016-11-29 16:01   ` Jeff Mahoney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2462244.ZlK0p3ShUA@localhost.localdomain \
    --to=chandan@linux.vnet.ibm.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).