From: Chandan Babu R <chandanrlinux@gmail.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
linux-xfs@vger.kernel.org, david@fromorbit.com,
bfoster@redhat.com
Subject: Re: [PATCH 2/7] xfs: Check for per-inode extent count overflow
Date: Sat, 20 Jun 2020 18:23:37 +0530 [thread overview]
Message-ID: <1932444.5ZnB01jybz@garuda> (raw)
In-Reply-To: <20200619143608.GB29528@infradead.org>
On Friday 19 June 2020 8:06:08 PM IST Christoph Hellwig wrote:
> I'm lost in 4 layers of full quotes. Can someone summarize the
> discussion without hundreds of lines of irrelevant quotes?
>
XFS does not check for possible overflow of per-inode extent counter fields
when adding extents to either data or attr fork.
For e.g.
1. Insert 5 million xattrs (each having a value size of 255 bytes) and then
delete 50% of them in an alternating manner.
./benchmark-xattrs -l 255 -n 5000000 -s 50 -f $mntpnt/testfile-0
benchmark-xattrs.c and related sources can be obtained from
https://github.com/chandanr/xfs-xattr-benchmark/blob/master/src/
2. This causes 98511 extents to be created in the attr fork of the inode.
xfsaild/loop0 2035 [003] 9643.390490: probe:xfs_iflush_int: (ffffffffac6225c0) if_nextents=98511 inode=131
3. The incore inode fork extent counter is a signed 32-bit quantity. However
the on-disk extent counter is an unsigned 16-bit quantity and hence cannot
hold 98511 extents.
4. The following incorrect value is stored in the attr extent counter
# xfs_db -f -c 'inode 131' -c 'print core.naextents' /dev/loop0
core.naextents = -32561
As an aside, Please note that the sequence of events causing counter overflow
that have been described in this patch's description is not applicable since
the patches have been rebased on xfs-ifork-cleanup.2 which causes a signed
32-bit (xfs_extnum_t) quantity to be used to store the extent counter in
memory. However, the "on-disk counter overflow" bug still exists as was
described above.
To prevent the overflow bug from occuring silently, this patch now checks for
the overflow condition before incrementing the in-core value and returns an
error if a possible overflow is detected.
Darrick pointed out that returning an error code can cause the transaction to
be aborted because it is most likely to be dirty. Hence his suggestion (to
which I agree) was to check for possible overflow just after starting a
transaction and obtaining the inode's i_lock.
Also, since we are extending the data and xattr extent counters to 32 and 47
bits in the later patches the value of log reservations will change since they
are a function of maximum height of BMBT trees. The maximum of height of BMBT
trees are themselves calculated based on the maximum number of xattr and data
extents. Due to this, "min log size" can end up being larger than what was
calculated during mkfs.xfs time. This can cause mount to fail as shown by the
following call trace which was generated when executing xfs/306 test,
XFS (loop0): Mounting V5 Filesystem
XFS (loop0): Log size 8906 blocks too small, minimum size is 9075 blocks
XFS (loop0): AAIEEE! Log failed size checks. Abort!
XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 711
------------[ cut here ]------------
WARNING: CPU: 0 PID: 12821 at fs/xfs/xfs_message.c:112 assfail+0x25/0x28
Modules linked in:
CPU: 0 PID: 12821 Comm: mount Tainted: G W 5.6.0-rc6-next-20200320-chandan-00003-g071c2af3f4de #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:assfail+0x25/0x28
Code: ff ff 0f 0b c3 0f 1f 44 00 00 41 89 c8 48 89 d1 48 89 f2 48 c7 c6 40 b7 4b b3 e8 82 f9 ff ff 80 3d 83 d6 64 01 00 74 02 0f $
RSP: 0018:ffffb05b414cbd78 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff9d9d501d5000 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffb346dc65
RBP: ffff9da444b49a80 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000000a R11: f000000000000000 R12: 00000000ffffffea
R13: 000000000000000e R14: 0000000000004594 R15: ffff9d9d501d5628
FS: 00007fd6c5d17c80(0000) GS:ffff9da44d800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 00000008a48c0000 CR4: 00000000000006f0
Call Trace:
xfs_log_mount+0xf8/0x300
xfs_mountfs+0x46e/0x950
xfs_fc_fill_super+0x318/0x510
? xfs_mount_free+0x30/0x30
get_tree_bdev+0x15c/0x250
vfs_get_tree+0x25/0xb0
do_mount+0x740/0x9b0
? memdup_user+0x41/0x80
__x64_sys_mount+0x8e/0xd0
do_syscall_64+0x48/0x110
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fd6c5f2ccda
Code: 48 8b 0d b9 e1 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f $
RSP: 002b:00007ffe00dfb9f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000560c1aaa92c0 RCX: 00007fd6c5f2ccda
RDX: 0000560c1aaae110 RSI: 0000560c1aaad040 RDI: 0000560c1aaa94d0
RBP: 00007fd6c607d204 R08: 0000000000000000 R09: 0000560c1aaadde0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000560c1aaa94d0 R15: 0000560c1aaae110
---[ end trace 6436391b468bc652 ]---
XFS (loop0): log mount failed
The corresponding filesystem was created using mkfs options
"-m rmapbt=1,reflink=1 -b size=1k -d size=20m -n size=64k".
To prevent such incidents, we are contemplating on using the following
approach,
1. Use existing constants for max extent counts (i.e. signed 2^16 for xattrs
and signed 2^32 for data extents).
2. Compute max bmbt heights, log reservations and hence min log size during
mount time.
3. Later, during the mount lifetime of the filesystem, when we are about to
overflow the extent counter we use larger values (2^32 for xattr and 2^47
for data) for max extent count and then recompute log reservations and min
logsize. At this point, if on-disk log size is smaller than min log size
we return with an error.
4. Otherwise, we set an RO-feature flag and use the revised log reservations
for future transactions.
Please let me know your opinion on the above approaches.
--
chandan
next prev parent reply other threads:[~2020-06-20 12:54 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-06 8:27 [PATCH 0/7] xfs: Extend per-inode extent counters Chandan Babu R
2020-06-06 8:27 ` [PATCH 1/7] xfs: Fix log reservation calculation for xattr insert operation Chandan Babu R
2020-06-19 14:33 ` Christoph Hellwig
2020-06-20 12:53 ` Chandan Babu R
2020-06-06 8:27 ` [PATCH 2/7] xfs: Check for per-inode extent count overflow Chandan Babu R
2020-06-08 16:24 ` Darrick J. Wong
2020-06-08 16:32 ` Darrick J. Wong
2020-06-09 14:22 ` Chandan Babu R
2020-06-09 17:07 ` Darrick J. Wong
2020-06-10 6:24 ` Chandan Babu R
2020-06-09 14:22 ` Chandan Babu R
2020-06-09 17:10 ` Darrick J. Wong
2020-06-19 14:36 ` Christoph Hellwig
2020-06-19 21:31 ` Darrick J. Wong
2020-06-20 12:53 ` Chandan Babu R [this message]
2020-06-06 8:27 ` [PATCH 3/7] xfs: Compute maximum height of directory BMBT separately Chandan Babu R
2020-06-08 20:59 ` Darrick J. Wong
2020-06-09 14:23 ` Chandan Babu R
2020-06-09 18:40 ` Darrick J. Wong
2020-06-10 6:23 ` Chandan Babu R
2020-06-11 6:38 ` Chandan Babu R
2020-06-06 8:27 ` [PATCH 4/7] xfs: Add "Use Dir BMBT height" argument to XFS_BM_MAXLEVELS() Chandan Babu R
2020-06-08 17:50 ` Darrick J. Wong
2020-06-09 14:23 ` Chandan Babu R
2020-06-06 8:27 ` [PATCH 5/7] xfs: Use 2^27 as the maximum number of directory extents Chandan Babu R
2020-06-08 16:52 ` Darrick J. Wong
2020-06-09 14:23 ` Chandan Babu R
2020-06-06 8:27 ` [PATCH 6/7] xfs: Extend data extent counter to 47 bits Chandan Babu R
2020-06-08 17:14 ` Darrick J. Wong
2020-06-09 14:23 ` Chandan Babu R
2020-08-31 21:05 ` Darrick J. Wong
2020-06-19 14:38 ` Christoph Hellwig
2020-06-20 12:52 ` Chandan Babu R
2020-06-06 8:27 ` [PATCH 7/7] xfs: Extend attr extent counter to 32 bits Chandan Babu R
2020-06-08 17:21 ` Darrick J. Wong
2020-06-09 14:22 ` Chandan Babu R
2020-06-19 14:39 ` Christoph Hellwig
2020-06-20 12:53 ` Chandan Babu R
2020-06-08 17:31 ` [PATCH 0/7] xfs: Extend per-inode extent counters Darrick J. Wong
2020-06-09 14:22 ` Chandan Babu R
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1932444.5ZnB01jybz@garuda \
--to=chandanrlinux@gmail.com \
--cc=bfoster@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox