linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Krzysztof Kozlowski <krzk@kernel.org>
To: Jeff Layton <jlayton@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	viro@zeniv.linux.org.uk, linux-nfs@vger.kernel.org,
	bfields@fieldses.org, neilb@suse.de, jack@suse.de,
	linux-ext4@vger.kernel.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, linux-xfs@vger.kernel.org,
	darrick.wong@oracle.com, david@fromorbit.com,
	linux-btrfs@vger.kernel.org, clm@fb.com, jbacik@fb.com,
	dsterba@suse.com, linux-integrity@vger.kernel.org,
	zohar@linux.vnet.ibm.com, dmitry.kasatkin@gmail.com,
	linux-afs@lists.infradead.org, dhowells@redhat.com,
	jaltman@auristor.com, linux-samsung-soc@vger.kernel.org,
	"Marek Szyprowski" <m.szyprowski@samsung.com>,
	"Bartłomiej Żołnierkiewicz" <b.zolnierkie@samsung.com>,
	"Sylwester Nawrocki" <s.nawrocki@samsung.com>
Subject: Re: [PATCH v4 19/19] fs: handle inode->i_version more efficiently
Date: Mon, 8 Jan 2018 18:29:28 +0100	[thread overview]
Message-ID: <20180108172928.aqndozcpljb73uj6@kozik-lap> (raw)
In-Reply-To: <1515418164.3486.18.camel@kernel.org>

On Mon, Jan 08, 2018 at 08:29:24AM -0500, Jeff Layton wrote:
> On Mon, 2018-01-08 at 14:21 +0100, Krzysztof Kozlowski wrote:
> > On Mon, Jan 8, 2018 at 1:56 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > On Sun, 2018-01-07 at 13:44 +0100, Krzysztof Kozlowski wrote:
> > > > On Fri, Dec 22, 2017 at 1:05 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > From: Jeff Layton <jlayton@redhat.com>
> > > > > 
> > > > > Since i_version is mostly treated as an opaque value, we can exploit that
> > > > > fact to avoid incrementing it when no one is watching. With that change,
> > > > > we can avoid incrementing the counter on writes, unless someone has
> > > > > queried for it since it was last incremented. If the a/c/mtime don't
> > > > > change, and the i_version hasn't changed, then there's no need to dirty
> > > > > the inode metadata on a write.
> > > > > 
> > > > > Convert the i_version counter to an atomic64_t, and use the lowest order
> > > > > bit to hold a flag that will tell whether anyone has queried the value
> > > > > since it was last incremented.
> > > > > 
> > > > > When we go to maybe increment it, we fetch the value and check the flag
> > > > > bit.  If it's clear then we don't need to do anything if the update
> > > > > isn't being forced.
> > > > > 
> > > > > If we do need to update, then we increment the counter by 2, and clear
> > > > > the flag bit, and then use a CAS op to swap it into place. If that
> > > > > works, we return true. If it doesn't then do it again with the value
> > > > > that we fetch from the CAS operation.
> > > > > 
> > > > > On the query side, if the flag is already set, then we just shift the
> > > > > value down by 1 bit and return it. Otherwise, we set the flag in our
> > > > > on-stack value and again use cmpxchg to swap it into place if it hasn't
> > > > > changed. If it has, then we use the value from the cmpxchg as the new
> > > > > "old" value and try again.
> > > > > 
> > > > > This method allows us to avoid incrementing the counter on writes (and
> > > > > dirtying the metadata) under typical workloads. We only need to increment
> > > > > if it has been queried since it was last changed.
> > > > > 
> > > > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > > > > ---
> > > > >  include/linux/fs.h       |   2 +-
> > > > >  include/linux/iversion.h | 208 ++++++++++++++++++++++++++++++++++-------------
> > > > >  2 files changed, 154 insertions(+), 56 deletions(-)
> > > > > 
> > > > 
> > > > Hi,
> > > > 
> > > > On recent linux-next my ARM/Exynos boards fail to boot over nfsroot.
> > > > This commit popped up through bisect (log at the end). Systemd
> > > > timeouts on some device-specific services, including mounting ext4
> > > > /home:
> > > > 
> > > > [ *** ] (1 of 4) A start job is running for…ress polling (1min 41s / no limit)
> > > > [ TIME ] Timed out waiting for device dev-ttySAC2.device.
> > > > Jan 07 13:29:38 [DEPEND] Dependency failed for Serial Getty on ttySAC2.
> > > > Jan 07 13:29:38 [ TIME ] Timed out waiting for device
> > > > dev-disk-by\x2dlabel-home.device.
> > > > Jan 07 13:29:38 [DEPEND] Dependency failed for /home.
> > > > Jan 07 13:29:38 [DEPEND] Dependency failed for Local File Systems.
> > > > Jan 07 13:29:38 [DEPEND] Dependency failed for File System Check on
> > > > /dev/disk/by-label/home.
> > > > Jan 07 13:30:02 [ *** ] (1 of 2) A start job is running for…ress
> > > > polling (1min 53s / no limit)
> > > > 
> > > > Kernel command line:
> > > > console=tty1 console=ttySAC2,115200n8
> > > > ip=192.168.1.11:192.168.1.10:192.168.1.1:255.255.255.0::eth0:none
> > > > nfsrootdebug root=/dev/nfs
> > > > nfsroot=192.168.1.10:/srv/nfs/odroidxu3,vers=4,nolock rootwait rw
> > > > smsc95xx.macaddr=00:1e:06:61:7a:93 no_console_suspend
> > > > 
> > > > /home is /dev/mmcblk1p2:
> > > > kozik@odroidxu3:~$ tune2fs -l /dev/mmcblk1p2
> > > > tune2fs 1.43.7 (16-Oct-2017)
> > > > Filesystem volume name:   home
> > > > Last mounted on:          /home
> > > > Filesystem UUID:          3f9dbeba-2738-45d3-807e-c1b2e21128ed
> > > > Filesystem magic number:  0xEF53
> > > > Filesystem revision #:    1 (dynamic)
> > > > Filesystem features:      has_journal ext_attr resize_inode dir_index
> > > > filetype needs_recovery extent flex_bg sparse_super large_file
> > > > uninit_bg dir_nlink extra_isize
> > > > Filesystem flags:         signed_directory_hash
> > > > Default mount options:    user_xattr acl
> > > > Filesystem state:         clean
> > > > Errors behavior:          Continue
> > > > Filesystem OS type:       Linux
> > > > Inode count:              1430800
> > > > Block count:              5717760
> > > > Reserved block count:     285888
> > > > Free blocks:              5467576
> > > > Free inodes:              1428301
> > > > First block:              0
> > > > Block size:               4096
> > > > Fragment size:            4096
> > > > Reserved GDT blocks:      1022
> > > > Blocks per group:         32768
> > > > Fragments per group:      32768
> > > > Inodes per group:         8176
> > > > Inode blocks per group:   511
> > > > Flex block group size:    16
> > > > Filesystem created:       Thu May 21 12:17:05 2015
> > > > Last mount time:          Thu Dec 21 13:31:26 2017
> > > > Last write time:          Thu Dec 21 13:31:26 2017
> > > > Mount count:              1
> > > > Maximum mount count:      -1
> > > > Last checked:             Thu Dec 21 13:31:25 2017
> > > > Check interval:           0 (<none>)
> > > > Lifetime writes:          126 GB
> > > > Reserved blocks uid:      0 (user root)
> > > > Reserved blocks gid:      0 (group root)
> > > > First inode:              11
> > > > Inode size:           256
> > > > Required extra isize:     28
> > > > Desired extra isize:      28
> > > > Journal inode:            8
> > > > Default directory hash:   half_md4
> > > > Directory Hash Seed:      42e17e23-86b2-4356-ad63-78aa51651d03
> > > > Journal backup:           inode blocks
> > > > 
> > > > 
> > > > Full dmesg log:
> > > > http://www.krzk.eu/#/builders/1/builds/1258/steps/10/logs/serial0
> > > > 
> > > > The regular boot from rootfs on SD card also fails - but without any
> > > > serial console logs (just "Starting kernel...") so the real cause is
> > > > unknown.
> > > > 
> > > > Any hints?
> > > > 
> > > > Best regards,
> > > > Krzysztof
> > > > 
> > > > 
> > > > bisect log:
> > > > # bad: [73005e1a35fd67c644b0645c9e4c1efabd0fe62c] Add linux-next
> > > > specific files for 20180103
> > > > # good: [30a7acd573899fd8b8ac39236eff6468b195ac7d] Linux 4.15-rc6
> > > > git bisect start 'next/master' 'next/stable'
> > > > # bad: [c1d290f9ce8daa2b0a79d2fe48c1b7c3c5370f5a] Merge
> > > > remote-tracking branch 'crypto/master'
> > > > git bisect bad c1d290f9ce8daa2b0a79d2fe48c1b7c3c5370f5a
> > > > # bad: [55695d94f0915121d106cd2d1ab94983a32f3e9a] Merge
> > > > remote-tracking branch 'hid/for-next'
> > > > git bisect bad 55695d94f0915121d106cd2d1ab94983a32f3e9a
> > > > # good: [cffae1eead0dd91be1a3069a8348127bb00158f3] Merge
> > > > remote-tracking branch 'realtek/for-next'
> > > > git bisect good cffae1eead0dd91be1a3069a8348127bb00158f3
> > > > # good: [5f889f1176dc99636c6bf8af7c286decc888c007] Merge
> > > > remote-tracking branch 'btrfs/next'
> > > > git bisect good 5f889f1176dc99636c6bf8af7c286decc888c007
> > > > # good: [984c35877f36bee305e43a1c58176169854d85cf] Merge
> > > > remote-tracking branch 'xfs/for-next'
> > > > git bisect good 984c35877f36bee305e43a1c58176169854d85cf
> > > > # bad: [f9fec502daea2a869232b6dff33ba3de79dd0d61] Merge
> > > > remote-tracking branch 'printk/for-next'
> > > > git bisect bad f9fec502daea2a869232b6dff33ba3de79dd0d61
> > > > # good: [c71d227fc4133f949dae620ed5e3a250b43f2415] make kernel-side
> > > > POLL... arch-independent
> > > > git bisect good c71d227fc4133f949dae620ed5e3a250b43f2415
> > > > # good: [416d20e8c31107f5dfd45d1d80d1e6c8e4871180] Merge branches
> > > > 'work.get_user_pages_fast', 'work.wmci', 'work.sock_recvmsg',
> > > > 'misc.netdrv', 'misc.poll', 'work.mqueue', 'work.whack-a-mole' and
> > > > 'work.misc' into for-next
> > > > git bisect good 416d20e8c31107f5dfd45d1d80d1e6c8e4871180
> > > > # good: [325a1de4a691512a48c1426b943a7b0b9f8d6744] xfs: convert to new
> > > > i_version API
> > > > git bisect good 325a1de4a691512a48c1426b943a7b0b9f8d6744
> > > > # good: [a94fe10fb114c169e7ddaecd8251521886409121] checkpatch: add
> > > > pF/pf deprecation warning
> > > > git bisect good a94fe10fb114c169e7ddaecd8251521886409121
> > > > # good: [6b3911dffd1184fdcd63299a5fee59ac000f2067] btrfs: only dirty
> > > > the inode in btrfs_update_time if something was changed
> > > > git bisect good 6b3911dffd1184fdcd63299a5fee59ac000f2067
> > > > # bad: [448f8c749a7a0ae03505823910ec45a112678048] Merge
> > > > remote-tracking branch 'iversion/iversion-next'
> > > > git bisect bad 448f8c749a7a0ae03505823910ec45a112678048
> > > > # bad: [8618bff776758ebff5b55211e7b5a60a0fc119a5] fs: handle
> > > > inode->i_version more efficiently
> > > > git bisect bad 8618bff776758ebff5b55211e7b5a60a0fc119a5
> > > > # first bad commit: [8618bff776758ebff5b55211e7b5a60a0fc119a5] fs:
> > > > handle inode->i_version more efficiently
> > > 
> > > That's really strange. I'm afraid I have no idea what could be going on.
> > > 
> > > With NFS, we really just treat i_version as an opaque value, so I'm not
> > > sure how this patch in particular would affect anything there. We _do_
> > > increment it if you have a write delegation in some cases, but not many
> > > servers hand those out.
> > 
> > About the NFS server, Arch Linux on Raspberry Pi (so 32-bit, ARMv6):
> > Linux pi 4.9.70-1-ARCH #1 SMP Mon Dec 18 19:38:00 UTC 2017 armv6l GNU/Linux
> > 
> > The /etc/nfs.conf is default except:
> > [nfsd]
> > vers2=n
> > vers3=n
> > 
> > The client logs for nfsroot mounts are:
> > Jan 08 14:07:25 :: running hook [net_nfs4]
> > Jan 08 14:07:25 IP-Config: eth0 hardware address ba:17:70:7e:87:d1 mtu 1500
> > Jan 08 14:07:25 IP-Config: eth0 guessed broadcast address 192.168.1.255
> > Jan 08 14:07:25 IP-Config: eth0 complete (from 192.168.1.10):
> > Jan 08 14:07:25 address: 192.168.1.11 broadcast: 192.168.1.255
> > netmask: 255.255.255.0
> > Jan 08 14:07:25 gateway: 192.168.1.1 dns0 : 0.0.0.0 dns1 : 0.0.0.0
> > Jan 08 14:07:25 rootserver: 192.168.1.10 rootpath:
> > Jan 08 14:07:25 filename :
> > Jan 08 14:07:25 NFS-Mount: 192.168.1.10:/srv/nfs/odroidxu3
> > Jan 08 14:07:25 Waiting 10 seconds for device /dev/nfs ...
> > Jan 08 14:07:36 ERROR: device '/dev/nfs' not found. Skipping fsck.
> > Jan 08 14:07:36 Mount cmd:
> > Jan 08 14:07:36 /opt/tools/buildbot/arch-arm-bin/mount.nfs4 -o
> > vers=4,nolock 192.168.1.10:/srv/nfs/odroidxu3 /new_root
> > 
> > Only root (/) is froom NFS. The /home comes from sdcard (/dev/mmcblk1).
> > 
> > > ext4 will only touch the i_version field if you mount it with '-o
> > > iversion'. Are you doing that here?
> > 
> > The /home is mounted by systemd from /etc/fstab:
> >       LABEL=home /home ext4 defaults 0 2
> > 
> > > 
> > > Have you run the bisect more than once? Is this maybe an intermittent
> > > problem, and the bisect has landed on the wrong commit?
> > 
> > I just run tests on commits around again (but not full bisect) on
> > current next (next-20180108):
> > fbf97ece47d66d22 - works
> > 3da7bcdd695bae43 ("fs: handle inode->i_version more efficiently") -
> > fails: http://www.krzk.eu/#/builders/1/builds/1267
> > 
> > 
> > Best regards,
> > Krzysztof
> 
> Ok, thanks. If you're seeing hangs then that might imply that we have
> some sort of excessive looping going on in the cmpxchg loops.
> 
> Could you apply the patch below and let me know if it causes either of
> the warnings to pop? That might at least point us in the right
> direction:

No new warnings with attached patch (except existing already lockdep:
"INFO: trying to register non-static key.").

Systemd timeouts on mounting /home but after entering rescue shell there
is no problem running mount /home:
	Give root password for maintenance
	(or press Control-D to continue): 
	root@odroidxu3:~# mount /home
	[  220.659331] EXT4-fs (mmcblk1p2): mounted filesystem with ordered data mode. Opts: (null)

Best regards,
Krzysztof


  reply	other threads:[~2018-01-08 17:29 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-22 12:05 [PATCH v4 00/19] fs: rework and optimize i_version handling in filesystems Jeff Layton
2017-12-22 12:05 ` [PATCH v4 01/19] fs: new API for handling inode->i_version Jeff Layton
2017-12-22 23:14   ` NeilBrown
2017-12-22 23:54     ` Jeff Layton
2018-01-02 17:01       ` Jan Kara
2017-12-25 14:50   ` Jeff Layton
2017-12-22 12:05 ` [PATCH v4 02/19] fs: don't take the i_lock in inode_inc_iversion Jeff Layton
2017-12-22 12:05 ` [PATCH v4 03/19] fat: convert to new i_version API Jeff Layton
2017-12-22 12:05 ` [PATCH v4 04/19] affs: " Jeff Layton
2017-12-22 12:05 ` [PATCH v4 05/19] afs: " Jeff Layton
2017-12-22 12:05 ` [PATCH v4 06/19] btrfs: " Jeff Layton
2018-01-08 17:59   ` David Sterba
2017-12-22 12:05 ` [PATCH v4 07/19] exofs: switch " Jeff Layton
2017-12-22 12:05 ` [PATCH v4 08/19] ext2: convert " Jeff Layton
2017-12-22 12:05 ` [PATCH v4 09/19] ext4: " Jeff Layton
2017-12-22 12:05 ` [PATCH v4 10/19] nfs: " Jeff Layton
2017-12-22 12:05 ` [PATCH v4 11/19] nfsd: " Jeff Layton
2017-12-22 12:05 ` [PATCH v4 12/19] ocfs2: " Jeff Layton
2018-01-04 13:34   ` Jeff Layton
2017-12-22 12:05 ` [PATCH v4 13/19] ufs: use " Jeff Layton
2017-12-22 12:05 ` [PATCH v4 14/19] xfs: convert to " Jeff Layton
2017-12-23  0:05   ` Darrick J. Wong
2017-12-22 12:05 ` [PATCH v4 15/19] IMA: switch IMA over " Jeff Layton
2017-12-22 12:05 ` [PATCH v4 16/19] fs: only set S_VERSION when updating times if necessary Jeff Layton
2018-01-02 16:50   ` Jan Kara
2018-01-02 19:03     ` Jeff Layton
2017-12-22 12:05 ` [PATCH v4 17/19] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing Jeff Layton
2017-12-23  0:07   ` Darrick J. Wong
2017-12-22 12:05 ` [PATCH v4 18/19] btrfs: only dirty the inode in btrfs_update_time if something was changed Jeff Layton
2018-01-08 17:59   ` David Sterba
2017-12-22 12:05 ` [PATCH v4 19/19] fs: handle inode->i_version more efficiently Jeff Layton
2018-01-02 17:00   ` Jan Kara
2018-01-07 12:44   ` Krzysztof Kozlowski
2018-01-08 12:56     ` Jeff Layton
2018-01-08 13:21       ` Krzysztof Kozlowski
2018-01-08 13:29         ` Jeff Layton
2018-01-08 17:29           ` Krzysztof Kozlowski [this message]
2018-01-08 18:00             ` Jeff Layton
2018-01-08 18:33               ` Krzysztof Kozlowski
2018-01-08 19:15                 ` Jeff Layton
2018-01-08 20:05                   ` Jeff Layton
2018-01-08 20:17                   ` Krzysztof Kozlowski
2018-01-08 21:39                     ` Jeff Layton
2018-01-09  9:27                       ` Krzysztof Kozlowski
2018-01-08 13:30   ` Matthew Wilcox
2018-01-08 13:46     ` Jeff Layton
2018-01-02 17:20 ` [PATCH v4 05/19] afs: convert to new i_version API David Howells
2018-01-02 18:57   ` Jeff Layton
2018-01-03 16:28   ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180108172928.aqndozcpljb73uj6@kozik-lap \
    --to=krzk@kernel.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=b.zolnierkie@samsung.com \
    --cc=bfields@fieldses.org \
    --cc=clm@fb.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=dmitry.kasatkin@gmail.com \
    --cc=dsterba@suse.com \
    --cc=jack@suse.de \
    --cc=jaltman@auristor.com \
    --cc=jbacik@fb.com \
    --cc=jlayton@kernel.org \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-integrity@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-samsung-soc@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=neilb@suse.de \
    --cc=s.nawrocki@samsung.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zohar@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).