* should we make "-o iversion" the default on ext4 ? @ 2022-07-19 13:51 Jeff Layton 2022-07-20 14:15 ` Lukas Czerner 2022-07-21 22:32 ` Dave Chinner 0 siblings, 2 replies; 14+ messages in thread From: Jeff Layton @ 2022-07-19 13:51 UTC (permalink / raw) To: tytso, adilger.kernel Cc: linux-ext4, linux-fsdevel, lczerner, Benjamin Coddington Back in 2018, I did a patchset [1] to rework the inode->i_version counter handling to be much less expensive, particularly when no-one is querying for it. Testing at the time showed that the cost of enabling i_version on ext4 was close to 0 when nothing is querying it, but I stopped short of trying to make it the default at the time (mostly out of an abundance of caution). Since then, we still see a steady stream of cache-coherency problems with NFSv4 on ext4 when this option is disabled (e.g. [2]). Is it time to go ahead and make this option the default on ext4? I don't see a real downside to doing so, though I'm unclear on how we should approach this. Currently the option is twiddled using MS_I_VERSION flag, and it's unclear to me how we can reverse the sense of such a flag. Thoughts? [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a4b7fd7d34de5765dece2dd08060d2e1f7be3b39 [2]: https://bugzilla.redhat.com/show_bug.cgi?id=2107587 -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-19 13:51 should we make "-o iversion" the default on ext4 ? Jeff Layton @ 2022-07-20 14:15 ` Lukas Czerner 2022-07-20 14:38 ` Jeff Layton 2022-07-21 22:32 ` Dave Chinner 1 sibling, 1 reply; 14+ messages in thread From: Lukas Czerner @ 2022-07-20 14:15 UTC (permalink / raw) To: Jeff Layton Cc: tytso, adilger.kernel, linux-ext4, linux-fsdevel, Benjamin Coddington On Tue, Jul 19, 2022 at 09:51:33AM -0400, Jeff Layton wrote: > Back in 2018, I did a patchset [1] to rework the inode->i_version > counter handling to be much less expensive, particularly when no-one is > querying for it. > > Testing at the time showed that the cost of enabling i_version on ext4 > was close to 0 when nothing is querying it, but I stopped short of > trying to make it the default at the time (mostly out of an abundance of > caution). Since then, we still see a steady stream of cache-coherency > problems with NFSv4 on ext4 when this option is disabled (e.g. [2]). > > Is it time to go ahead and make this option the default on ext4? I don't > see a real downside to doing so, though I'm unclear on how we should > approach this. Currently the option is twiddled using MS_I_VERSION flag, > and it's unclear to me how we can reverse the sense of such a flag. > > Thoughts? > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a4b7fd7d34de5765dece2dd08060d2e1f7be3b39 > [2]: https://bugzilla.redhat.com/show_bug.cgi?id=2107587 Hi, I don't have the results myself yet, but a quick look at how it is done suggests that indeed the impact should be low. But not zero, at least every time the inode is loaded from disk it is scheduled for i_version update on the next attempted increment. Could that have an effect on some particular common workload you can think of? How would we approach making iversion a default? libmount is passing this option to the kernel as just a MS_I_VERSION flag that is set when -o iversion is used and left empty when the -o noiversion is used. This means that while we could make it a default in ext4, we don't have any way of knowing whether the user asked for -o noiversion. So that's not really an option. Updating the mke2fs/tune2fs to allow setting iversion as a default mount option I think has the same problem. So the only way I can see ATM would be to introduce another mountflag for libmount to indicate -o noiversion. This way we can make iversion a default on ext4 without loosing the information about user provided -o noiversion option. Is there a different way I am not seeing? If we can do reasonably extensive testing that will indeed show negligible impact when nothing is querying i_version, then I would be in favor of the change. Thanks! -Lukas > > -- > Jeff Layton <jlayton@kernel.org> > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-20 14:15 ` Lukas Czerner @ 2022-07-20 14:38 ` Jeff Layton 2022-07-20 15:22 ` Lukas Czerner 2022-07-20 15:56 ` Benjamin Coddington 0 siblings, 2 replies; 14+ messages in thread From: Jeff Layton @ 2022-07-20 14:38 UTC (permalink / raw) To: Lukas Czerner Cc: tytso, adilger.kernel, linux-ext4, linux-fsdevel, Benjamin Coddington On Wed, 2022-07-20 at 16:15 +0200, Lukas Czerner wrote: > On Tue, Jul 19, 2022 at 09:51:33AM -0400, Jeff Layton wrote: > > Back in 2018, I did a patchset [1] to rework the inode->i_version > > counter handling to be much less expensive, particularly when no-one is > > querying for it. > > > > Testing at the time showed that the cost of enabling i_version on ext4 > > was close to 0 when nothing is querying it, but I stopped short of > > trying to make it the default at the time (mostly out of an abundance of > > caution). Since then, we still see a steady stream of cache-coherency > > problems with NFSv4 on ext4 when this option is disabled (e.g. [2]). > > > > Is it time to go ahead and make this option the default on ext4? I don't > > see a real downside to doing so, though I'm unclear on how we should > > approach this. Currently the option is twiddled using MS_I_VERSION flag, > > and it's unclear to me how we can reverse the sense of such a flag. > > > > Thoughts? > > > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a4b7fd7d34de5765dece2dd08060d2e1f7be3b39 > > [2]: https://bugzilla.redhat.com/show_bug.cgi?id=2107587 > > Hi, > > I don't have the results myself yet, but a quick look at how it is done > suggests that indeed the impact should be low. But not zero, at least > every time the inode is loaded from disk it is scheduled for i_version > update on the next attempted increment. Could that have an effect on > some particular common workload you can think of? > Yeah, it's not zero, but hopefully any performance hit would end up amortized over the long term use of the inode. In the common situation where the i_version flag hasn't been queried, this just ends up being an extra atomic fetch to look at i_version and detect that. > How would we approach making iversion a default? libmount is passing > this option to the kernel as just a MS_I_VERSION flag that is set when > -o iversion is used and left empty when the -o noiversion is used. This > means that while we could make it a default in ext4, we don't have any > way of knowing whether the user asked for -o noiversion. So that's not > really an option. > > Updating the mke2fs/tune2fs to allow setting iversion as a default mount > option I think has the same problem. > > So the only way I can see ATM would be to introduce another mountflag > for libmount to indicate -o noiversion. This way we can make iversion a > default on ext4 without loosing the information about user provided -o > noiversion option. > > Is there a different way I am not seeing? > Right, implementing this is the difficult bit actually since this uses a MS_* flag. If we do make this the default, we'd definitely want to continue allowing "-o noiversion" to disable it. Could we just reverse the default in libmount? It might cause this to suddenly be enabled in some deployments, but in most cases, people wouldn't even notice and they could still specify -o noiversion to turn it off. Another idea would be to introduce new mount options for this, but that's kind of nasty from a UI standpoint. > > If we can do reasonably extensive testing that will indeed show > negligible impact when nothing is querying i_version, then I would be in > favor of the change. > Excellent! I think that would be best if we can get away with it. A lot of people are currently running ext4-backed nfs servers and aren't using that mount option. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-20 14:38 ` Jeff Layton @ 2022-07-20 15:22 ` Lukas Czerner 2022-07-20 16:42 ` Jeff Layton 2022-07-20 15:56 ` Benjamin Coddington 1 sibling, 1 reply; 14+ messages in thread From: Lukas Czerner @ 2022-07-20 15:22 UTC (permalink / raw) To: Jeff Layton Cc: tytso, adilger.kernel, linux-ext4, linux-fsdevel, Benjamin Coddington On Wed, Jul 20, 2022 at 10:38:31AM -0400, Jeff Layton wrote: > On Wed, 2022-07-20 at 16:15 +0200, Lukas Czerner wrote: --snip-- > > How would we approach making iversion a default? libmount is passing > > this option to the kernel as just a MS_I_VERSION flag that is set when > > -o iversion is used and left empty when the -o noiversion is used. This > > means that while we could make it a default in ext4, we don't have any > > way of knowing whether the user asked for -o noiversion. So that's not > > really an option. > > > > Updating the mke2fs/tune2fs to allow setting iversion as a default mount > > option I think has the same problem. > > > > So the only way I can see ATM would be to introduce another mountflag > > for libmount to indicate -o noiversion. This way we can make iversion a > > default on ext4 without loosing the information about user provided -o > > noiversion option. > > > > Is there a different way I am not seeing? > > > > Right, implementing this is the difficult bit actually since this uses a > MS_* flag. If we do make this the default, we'd definitely want to > continue allowing "-o noiversion" to disable it. > > Could we just reverse the default in libmount? It might cause this to > suddenly be enabled in some deployments, but in most cases, people > wouldn't even notice and they could still specify -o noiversion to turn > it off. Can be done, but that would change the default for everyone. Not sure if that desirable. Also I can image this being a bit confusing. I still think the best approach would be to introduce another MS_ flag for noiversion case. I think there is precedence in the case of MS_STRICTATIME - not exactly the same but similar enough. > Another idea would be to introduce new mount options for this, but > that's kind of nasty from a UI standpoint. > > > > > If we can do reasonably extensive testing that will indeed show > > negligible impact when nothing is querying i_version, then I would be in > > favor of the change. > > > > Excellent! I think that would be best if we can get away with it. A lot > of people are currently running ext4-backed nfs servers and aren't using > that mount option. Could you provide some performance numbers for iversion case? Thanks! -Lukas > -- > Jeff Layton <jlayton@kernel.org> > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-20 15:22 ` Lukas Czerner @ 2022-07-20 16:42 ` Jeff Layton 2022-07-21 14:06 ` Lukas Czerner 0 siblings, 1 reply; 14+ messages in thread From: Jeff Layton @ 2022-07-20 16:42 UTC (permalink / raw) To: Lukas Czerner Cc: tytso, adilger.kernel, linux-ext4, linux-fsdevel, Benjamin Coddington On Wed, 2022-07-20 at 17:22 +0200, Lukas Czerner wrote: > But not zero, at least > every time the inode is loaded from disk it is scheduled for i_version > update on the next attempted increment. Could that have an effect on > some particular common workload you can think of? > FWIW, it's doubtful that you'd even notice this. You'd almost certainly be updating the mtime or ctime on the next change anyway, so updating the i_version in that case is basically free. You will probably need to do some a few extra atomic in-memory operations, but that's probably not noticeable in something I/O constrained. > > Could you provide some performance numbers for iversion case? > I'm writing to a LVM volume on a no-name-brand ssd I have sitting around. fio jobfile is here: [global] name=fio-seq-write filename=fio-seq-write rw=write bs=4k direct=0 numjobs=1 time_based runtime=300 [file1] size=1G ioengine=libaio iodepth=16 iversion support disabled: $ fio ./4k-write.fio file1: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16 fio-3.27 Starting 1 process file1: Laying out IO file (1 file / 1024MiB) Jobs: 1 (f=1): [W(1)][100.0%][w=52.5MiB/s][w=13.4k IOPS][eta 00m:00s] file1: (groupid=0, jobs=1): err= 0: pid=10056: Wed Jul 20 12:28:21 2022 write: IOPS=96.3k, BW=376MiB/s (394MB/s)(110GiB/300001msec); 0 zone resets slat (nsec): min=1112, max=5727.5k, avg=1917.70, stdev=1300.30 clat (nsec): min=1112, max=2146.5M, avg=156067.38, stdev=15568002.13 lat (usec): min=3, max=2146.5k, avg=158.03, stdev=15568.00 clat percentiles (usec): | 1.00th=[ 36], 5.00th=[ 36], 10.00th=[ 37], 20.00th=[ 37], | 30.00th=[ 38], 40.00th=[ 38], 50.00th=[ 38], 60.00th=[ 39], | 70.00th=[ 39], 80.00th=[ 40], 90.00th=[ 42], 95.00th=[ 44], | 99.00th=[ 52], 99.50th=[ 59], 99.90th=[ 77], 99.95th=[ 88], | 99.99th=[ 169] bw ( KiB/s): min=15664, max=1599456, per=100.00%, avg=897761.07, stdev=504329.17, samples=257 iops : min= 3916, max=399864, avg=224440.26, stdev=126082.33, samples=257 lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=98.80% lat (usec) : 100=1.18%, 250=0.02%, 500=0.01% lat (msec) : 10=0.01%, 2000=0.01%, >=2000=0.01% cpu : usr=5.45%, sys=23.92%, ctx=78418, majf=0, minf=14 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: total=0,28889786,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=16 Run status group 0 (all jobs): WRITE: bw=376MiB/s (394MB/s), 376MiB/s-376MiB/s (394MB/s-394MB/s), io=110GiB (118GB), run=300001-300001msec Disk stats (read/write): dm-7: ios=0/22878, merge=0/0, ticks=0/373254, in_queue=373254, util=43.89%, aggrios=0/99746, aggrmerge=0/9246, aggrticks=0/1406831, aggrin_queue=1408420, aggrutil=73.56% sda: ios=0/99746, merge=0/9246, ticks=0/1406831, in_queue=1408420, util=73.56% mounted with -o iversion: $ fio ./4k-write.fio file1: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16 fio-3.27 Starting 1 process Jobs: 1 (f=1): [W(1)][100.0%][eta 00m:00s] file1: (groupid=0, jobs=1): err= 0: pid=10369: Wed Jul 20 12:33:57 2022 write: IOPS=96.2k, BW=376MiB/s (394MB/s)(110GiB/300001msec); 0 zone resets slat (nsec): min=1112, max=1861.5k, avg=1994.58, stdev=890.78 clat (nsec): min=1392, max=2113.3M, avg=156252.71, stdev=15409487.99 lat (usec): min=3, max=2113.3k, avg=158.30, stdev=15409.49 clat percentiles (usec): | 1.00th=[ 37], 5.00th=[ 38], 10.00th=[ 38], 20.00th=[ 38], | 30.00th=[ 39], 40.00th=[ 39], 50.00th=[ 40], 60.00th=[ 40], | 70.00th=[ 41], 80.00th=[ 42], 90.00th=[ 43], 95.00th=[ 45], | 99.00th=[ 53], 99.50th=[ 60], 99.90th=[ 79], 99.95th=[ 90], | 99.99th=[ 174] bw ( KiB/s): min= 304, max=1540000, per=100.00%, avg=870727.42, stdev=499371.78, samples=265 iops : min= 76, max=385000, avg=217681.82, stdev=124842.94, samples=265 lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=98.49% lat (usec) : 100=1.48%, 250=0.02%, 500=0.01% lat (msec) : 2=0.01%, 2000=0.01%, >=2000=0.01% cpu : usr=5.71%, sys=24.49%, ctx=52874, majf=0, minf=18 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: total=0,28856695,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=16 Run status group 0 (all jobs): WRITE: bw=376MiB/s (394MB/s), 376MiB/s-376MiB/s (394MB/s-394MB/s), io=110GiB (118GB), run=300001-300001msec Disk stats (read/write): dm-7: ios=1/16758, merge=0/0, ticks=2/341817, in_queue=341819, util=47.93%, aggrios=1/98153, aggrmerge=0/5691, aggrticks=2/1399496, aggrin_queue=1400893, aggrutil=73.42% sda: ios=1/98153, merge=0/5691, ticks=2/1399496, in_queue=1400893, util=73.42% -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-20 16:42 ` Jeff Layton @ 2022-07-21 14:06 ` Lukas Czerner 2022-07-21 17:03 ` Jeff Layton 0 siblings, 1 reply; 14+ messages in thread From: Lukas Czerner @ 2022-07-21 14:06 UTC (permalink / raw) To: Jeff Layton Cc: tytso, adilger.kernel, linux-ext4, linux-fsdevel, Benjamin Coddington On Wed, Jul 20, 2022 at 12:42:11PM -0400, Jeff Layton wrote: > On Wed, 2022-07-20 at 17:22 +0200, Lukas Czerner wrote: > > > But not zero, at least > > every time the inode is loaded from disk it is scheduled for i_version > > update on the next attempted increment. Could that have an effect on > > some particular common workload you can think of? > > > > FWIW, it's doubtful that you'd even notice this. You'd almost certainly > be updating the mtime or ctime on the next change anyway, so updating > the i_version in that case is basically free. You will probably need to > do some a few extra atomic in-memory operations, but that's probably not > noticeable in something I/O constrained. > > > > > Could you provide some performance numbers for iversion case? > > > > I'm writing to a LVM volume on a no-name-brand ssd I have sitting > around. fio jobfile is here: That's very simplistic test, but fair enough. I've ran 10 iterations of xfstests with and without iversion and there is no significant difference, in fact it's all well within run by run variation. That's true in aggregate as well for individual tests. However there are problems to solve before we attempt to make it a default. With -o iversion ext4/026 and generic/622 fails. The ext4/026 seems to be a real bug and I am not sure about the other one yet. I'll look into it. -Lukas > > [global] > name=fio-seq-write > filename=fio-seq-write > rw=write > bs=4k > direct=0 > numjobs=1 > time_based > runtime=300 > > [file1] > size=1G > ioengine=libaio > iodepth=16 > > iversion support disabled: > > $ fio ./4k-write.fio > file1: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16 > fio-3.27 > Starting 1 process > file1: Laying out IO file (1 file / 1024MiB) > Jobs: 1 (f=1): [W(1)][100.0%][w=52.5MiB/s][w=13.4k IOPS][eta 00m:00s] > file1: (groupid=0, jobs=1): err= 0: pid=10056: Wed Jul 20 12:28:21 2022 > write: IOPS=96.3k, BW=376MiB/s (394MB/s)(110GiB/300001msec); 0 zone resets > slat (nsec): min=1112, max=5727.5k, avg=1917.70, stdev=1300.30 > clat (nsec): min=1112, max=2146.5M, avg=156067.38, stdev=15568002.13 > lat (usec): min=3, max=2146.5k, avg=158.03, stdev=15568.00 > clat percentiles (usec): > | 1.00th=[ 36], 5.00th=[ 36], 10.00th=[ 37], 20.00th=[ 37], > | 30.00th=[ 38], 40.00th=[ 38], 50.00th=[ 38], 60.00th=[ 39], > | 70.00th=[ 39], 80.00th=[ 40], 90.00th=[ 42], 95.00th=[ 44], > | 99.00th=[ 52], 99.50th=[ 59], 99.90th=[ 77], 99.95th=[ 88], > | 99.99th=[ 169] > bw ( KiB/s): min=15664, max=1599456, per=100.00%, avg=897761.07, stdev=504329.17, samples=257 > iops : min= 3916, max=399864, avg=224440.26, stdev=126082.33, samples=257 > lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=98.80% > lat (usec) : 100=1.18%, 250=0.02%, 500=0.01% > lat (msec) : 10=0.01%, 2000=0.01%, >=2000=0.01% > cpu : usr=5.45%, sys=23.92%, ctx=78418, majf=0, minf=14 > IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0% > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0% > issued rwts: total=0,28889786,0,0 short=0,0,0,0 dropped=0,0,0,0 > latency : target=0, window=0, percentile=100.00%, depth=16 > > Run status group 0 (all jobs): > WRITE: bw=376MiB/s (394MB/s), 376MiB/s-376MiB/s (394MB/s-394MB/s), io=110GiB (118GB), run=300001-300001msec > > Disk stats (read/write): > dm-7: ios=0/22878, merge=0/0, ticks=0/373254, in_queue=373254, util=43.89%, aggrios=0/99746, aggrmerge=0/9246, aggrticks=0/1406831, aggrin_queue=1408420, aggrutil=73.56% > sda: ios=0/99746, merge=0/9246, ticks=0/1406831, in_queue=1408420, util=73.56% > > mounted with -o iversion: > > $ fio ./4k-write.fio > file1: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16 > fio-3.27 > Starting 1 process > Jobs: 1 (f=1): [W(1)][100.0%][eta 00m:00s] > file1: (groupid=0, jobs=1): err= 0: pid=10369: Wed Jul 20 12:33:57 2022 > write: IOPS=96.2k, BW=376MiB/s (394MB/s)(110GiB/300001msec); 0 zone resets > slat (nsec): min=1112, max=1861.5k, avg=1994.58, stdev=890.78 > clat (nsec): min=1392, max=2113.3M, avg=156252.71, stdev=15409487.99 > lat (usec): min=3, max=2113.3k, avg=158.30, stdev=15409.49 > clat percentiles (usec): > | 1.00th=[ 37], 5.00th=[ 38], 10.00th=[ 38], 20.00th=[ 38], > | 30.00th=[ 39], 40.00th=[ 39], 50.00th=[ 40], 60.00th=[ 40], > | 70.00th=[ 41], 80.00th=[ 42], 90.00th=[ 43], 95.00th=[ 45], > | 99.00th=[ 53], 99.50th=[ 60], 99.90th=[ 79], 99.95th=[ 90], > | 99.99th=[ 174] > bw ( KiB/s): min= 304, max=1540000, per=100.00%, avg=870727.42, stdev=499371.78, samples=265 > iops : min= 76, max=385000, avg=217681.82, stdev=124842.94, samples=265 > lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=98.49% > lat (usec) : 100=1.48%, 250=0.02%, 500=0.01% > lat (msec) : 2=0.01%, 2000=0.01%, >=2000=0.01% > cpu : usr=5.71%, sys=24.49%, ctx=52874, majf=0, minf=18 > IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0% > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0% > issued rwts: total=0,28856695,0,0 short=0,0,0,0 dropped=0,0,0,0 > latency : target=0, window=0, percentile=100.00%, depth=16 > > Run status group 0 (all jobs): > WRITE: bw=376MiB/s (394MB/s), 376MiB/s-376MiB/s (394MB/s-394MB/s), io=110GiB (118GB), run=300001-300001msec > > Disk stats (read/write): > dm-7: ios=1/16758, merge=0/0, ticks=2/341817, in_queue=341819, util=47.93%, aggrios=1/98153, aggrmerge=0/5691, aggrticks=2/1399496, aggrin_queue=1400893, aggrutil=73.42% > sda: ios=1/98153, merge=0/5691, ticks=2/1399496, in_queue=1400893, util=73.42% > > -- > Jeff Layton <jlayton@kernel.org> > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-21 14:06 ` Lukas Czerner @ 2022-07-21 17:03 ` Jeff Layton 0 siblings, 0 replies; 14+ messages in thread From: Jeff Layton @ 2022-07-21 17:03 UTC (permalink / raw) To: Lukas Czerner Cc: tytso, adilger.kernel, linux-ext4, linux-fsdevel, Benjamin Coddington On Thu, 2022-07-21 at 16:06 +0200, Lukas Czerner wrote: > On Wed, Jul 20, 2022 at 12:42:11PM -0400, Jeff Layton wrote: > > On Wed, 2022-07-20 at 17:22 +0200, Lukas Czerner wrote: > > > > > But not zero, at least > > > every time the inode is loaded from disk it is scheduled for i_version > > > update on the next attempted increment. Could that have an effect on > > > some particular common workload you can think of? > > > > > > > FWIW, it's doubtful that you'd even notice this. You'd almost certainly > > be updating the mtime or ctime on the next change anyway, so updating > > the i_version in that case is basically free. You will probably need to > > do some a few extra atomic in-memory operations, but that's probably not > > noticeable in something I/O constrained. > > > > > > > > Could you provide some performance numbers for iversion case? > > > > > > > I'm writing to a LVM volume on a no-name-brand ssd I have sitting > > around. fio jobfile is here: > > That's very simplistic test, but fair enough. I've ran 10 iterations of > xfstests with and without iversion and there is no significant > difference, in fact it's all well within run by run variation. That's > true in aggregate as well for individual tests. > Yeah. This change was most evident with small I/O sizes, so if there is an effect here it'll likely show up there. > However there are problems to solve before we attempt to make it a > default. With -o iversion ext4/026 and generic/622 fails. The ext4/026 > seems to be a real bug and I am not sure about the other one yet. > > I'll look into it. > Interesting, thanks. Lack of testing with that option enabled is probably another good reason to go ahead and make it the default. Let me know what you find. > -Lukas > > > > > [global] > > name=fio-seq-write > > filename=fio-seq-write > > rw=write > > bs=4k > > direct=0 > > numjobs=1 > > time_based > > runtime=300 > > > > [file1] > > size=1G > > ioengine=libaio > > iodepth=16 > > > > iversion support disabled: > > > > $ fio ./4k-write.fio > > file1: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16 > > fio-3.27 > > Starting 1 process > > file1: Laying out IO file (1 file / 1024MiB) > > Jobs: 1 (f=1): [W(1)][100.0%][w=52.5MiB/s][w=13.4k IOPS][eta 00m:00s] > > file1: (groupid=0, jobs=1): err= 0: pid=10056: Wed Jul 20 12:28:21 2022 > > write: IOPS=96.3k, BW=376MiB/s (394MB/s)(110GiB/300001msec); 0 zone resets > > slat (nsec): min=1112, max=5727.5k, avg=1917.70, stdev=1300.30 > > clat (nsec): min=1112, max=2146.5M, avg=156067.38, stdev=15568002.13 > > lat (usec): min=3, max=2146.5k, avg=158.03, stdev=15568.00 > > clat percentiles (usec): > > | 1.00th=[ 36], 5.00th=[ 36], 10.00th=[ 37], 20.00th=[ 37], > > | 30.00th=[ 38], 40.00th=[ 38], 50.00th=[ 38], 60.00th=[ 39], > > | 70.00th=[ 39], 80.00th=[ 40], 90.00th=[ 42], 95.00th=[ 44], > > | 99.00th=[ 52], 99.50th=[ 59], 99.90th=[ 77], 99.95th=[ 88], > > | 99.99th=[ 169] > > bw ( KiB/s): min=15664, max=1599456, per=100.00%, avg=897761.07, stdev=504329.17, samples=257 > > iops : min= 3916, max=399864, avg=224440.26, stdev=126082.33, samples=257 > > lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=98.80% > > lat (usec) : 100=1.18%, 250=0.02%, 500=0.01% > > lat (msec) : 10=0.01%, 2000=0.01%, >=2000=0.01% > > cpu : usr=5.45%, sys=23.92%, ctx=78418, majf=0, minf=14 > > IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0% > > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0% > > issued rwts: total=0,28889786,0,0 short=0,0,0,0 dropped=0,0,0,0 > > latency : target=0, window=0, percentile=100.00%, depth=16 > > > > Run status group 0 (all jobs): > > WRITE: bw=376MiB/s (394MB/s), 376MiB/s-376MiB/s (394MB/s-394MB/s), io=110GiB (118GB), run=300001-300001msec > > > > Disk stats (read/write): > > dm-7: ios=0/22878, merge=0/0, ticks=0/373254, in_queue=373254, util=43.89%, aggrios=0/99746, aggrmerge=0/9246, aggrticks=0/1406831, aggrin_queue=1408420, aggrutil=73.56% > > sda: ios=0/99746, merge=0/9246, ticks=0/1406831, in_queue=1408420, util=73.56% > > > > mounted with -o iversion: > > > > $ fio ./4k-write.fio > > file1: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16 > > fio-3.27 > > Starting 1 process > > Jobs: 1 (f=1): [W(1)][100.0%][eta 00m:00s] > > file1: (groupid=0, jobs=1): err= 0: pid=10369: Wed Jul 20 12:33:57 2022 > > write: IOPS=96.2k, BW=376MiB/s (394MB/s)(110GiB/300001msec); 0 zone resets > > slat (nsec): min=1112, max=1861.5k, avg=1994.58, stdev=890.78 > > clat (nsec): min=1392, max=2113.3M, avg=156252.71, stdev=15409487.99 > > lat (usec): min=3, max=2113.3k, avg=158.30, stdev=15409.49 > > clat percentiles (usec): > > | 1.00th=[ 37], 5.00th=[ 38], 10.00th=[ 38], 20.00th=[ 38], > > | 30.00th=[ 39], 40.00th=[ 39], 50.00th=[ 40], 60.00th=[ 40], > > | 70.00th=[ 41], 80.00th=[ 42], 90.00th=[ 43], 95.00th=[ 45], > > | 99.00th=[ 53], 99.50th=[ 60], 99.90th=[ 79], 99.95th=[ 90], > > | 99.99th=[ 174] > > bw ( KiB/s): min= 304, max=1540000, per=100.00%, avg=870727.42, stdev=499371.78, samples=265 > > iops : min= 76, max=385000, avg=217681.82, stdev=124842.94, samples=265 > > lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=98.49% > > lat (usec) : 100=1.48%, 250=0.02%, 500=0.01% > > lat (msec) : 2=0.01%, 2000=0.01%, >=2000=0.01% > > cpu : usr=5.71%, sys=24.49%, ctx=52874, majf=0, minf=18 > > IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0% > > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0% > > issued rwts: total=0,28856695,0,0 short=0,0,0,0 dropped=0,0,0,0 > > latency : target=0, window=0, percentile=100.00%, depth=16 > > > > Run status group 0 (all jobs): > > WRITE: bw=376MiB/s (394MB/s), 376MiB/s-376MiB/s (394MB/s-394MB/s), io=110GiB (118GB), run=300001-300001msec > > > > Disk stats (read/write): > > dm-7: ios=1/16758, merge=0/0, ticks=2/341817, in_queue=341819, util=47.93%, aggrios=1/98153, aggrmerge=0/5691, aggrticks=2/1399496, aggrin_queue=1400893, aggrutil=73.42% > > sda: ios=1/98153, merge=0/5691, ticks=2/1399496, in_queue=1400893, util=73.42% > > > > -- > > Jeff Layton <jlayton@kernel.org> > > > -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-20 14:38 ` Jeff Layton 2022-07-20 15:22 ` Lukas Czerner @ 2022-07-20 15:56 ` Benjamin Coddington 2022-07-20 16:06 ` Christoph Hellwig 2022-07-20 16:15 ` Jeff Layton 1 sibling, 2 replies; 14+ messages in thread From: Benjamin Coddington @ 2022-07-20 15:56 UTC (permalink / raw) To: Jeff Layton Cc: Lukas Czerner, tytso, adilger.kernel, linux-ext4, linux-fsdevel On 20 Jul 2022, at 10:38, Jeff Layton wrote: > On Wed, 2022-07-20 at 16:15 +0200, Lukas Czerner wrote: >> >> Is there a different way I am not seeing? >> > > Right, implementing this is the difficult bit actually since this uses a > MS_* flag. If we do make this the default, we'd definitely want to > continue allowing "-o noiversion" to disable it. > > Could we just reverse the default in libmount? It might cause this to > suddenly be enabled in some deployments, but in most cases, people > wouldn't even notice and they could still specify -o noiversion to turn > it off. > > Another idea would be to introduce new mount options for this, but > that's kind of nasty from a UI standpoint. Is it safe to set SB_I_VERSION at export time? If so, export_operations could grow an ->enable_iversion(). Ben ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-20 15:56 ` Benjamin Coddington @ 2022-07-20 16:06 ` Christoph Hellwig 2022-07-20 16:15 ` Jeff Layton 1 sibling, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2022-07-20 16:06 UTC (permalink / raw) To: Benjamin Coddington Cc: Jeff Layton, Lukas Czerner, tytso, adilger.kernel, linux-ext4, linux-fsdevel On Wed, Jul 20, 2022 at 11:56:12AM -0400, Benjamin Coddington wrote: > > Another idea would be to introduce new mount options for this, but > > that's kind of nasty from a UI standpoint. > > Is it safe to set SB_I_VERSION at export time? If so, export_operations > could grow an ->enable_iversion(). No, it's not. Think of unexporting, modifying thing and re-exporting and now the client will see the same change counter for a modified file. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-20 15:56 ` Benjamin Coddington 2022-07-20 16:06 ` Christoph Hellwig @ 2022-07-20 16:15 ` Jeff Layton 2022-07-20 16:29 ` Benjamin Coddington 1 sibling, 1 reply; 14+ messages in thread From: Jeff Layton @ 2022-07-20 16:15 UTC (permalink / raw) To: Benjamin Coddington Cc: Lukas Czerner, tytso, adilger.kernel, linux-ext4, linux-fsdevel On Wed, 2022-07-20 at 11:56 -0400, Benjamin Coddington wrote: > On 20 Jul 2022, at 10:38, Jeff Layton wrote: > > On Wed, 2022-07-20 at 16:15 +0200, Lukas Czerner wrote: > > > > > > Is there a different way I am not seeing? > > > > > > > Right, implementing this is the difficult bit actually since this uses a > > MS_* flag. If we do make this the default, we'd definitely want to > > continue allowing "-o noiversion" to disable it. > > > > Could we just reverse the default in libmount? It might cause this to > > suddenly be enabled in some deployments, but in most cases, people > > wouldn't even notice and they could still specify -o noiversion to turn > > it off. > > > > Another idea would be to introduce new mount options for this, but > > that's kind of nasty from a UI standpoint. > > Is it safe to set SB_I_VERSION at export time? If so, export_operations > could grow an ->enable_iversion(). > That sounds like it might be problematic. Consider the case where a NFSv4 client has cached file data and the change attribute for the file. Server then reboots, but before the export happens a local user makes a change to the file and it doesn't update the i_version. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-20 16:15 ` Jeff Layton @ 2022-07-20 16:29 ` Benjamin Coddington 2022-07-20 16:46 ` Jeff Layton 0 siblings, 1 reply; 14+ messages in thread From: Benjamin Coddington @ 2022-07-20 16:29 UTC (permalink / raw) To: Jeff Layton Cc: Lukas Czerner, tytso, adilger.kernel, linux-ext4, linux-fsdevel On 20 Jul 2022, at 12:15, Jeff Layton wrote: > On Wed, 2022-07-20 at 11:56 -0400, Benjamin Coddington wrote: >> On 20 Jul 2022, at 10:38, Jeff Layton wrote: >>> On Wed, 2022-07-20 at 16:15 +0200, Lukas Czerner wrote: >>>> >>>> Is there a different way I am not seeing? >>>> >>> >>> Right, implementing this is the difficult bit actually since this uses a >>> MS_* flag. If we do make this the default, we'd definitely want to >>> continue allowing "-o noiversion" to disable it. >>> >>> Could we just reverse the default in libmount? It might cause this to >>> suddenly be enabled in some deployments, but in most cases, people >>> wouldn't even notice and they could still specify -o noiversion to turn >>> it off. >>> >>> Another idea would be to introduce new mount options for this, but >>> that's kind of nasty from a UI standpoint. >> >> Is it safe to set SB_I_VERSION at export time? If so, export_operations >> could grow an ->enable_iversion(). >> > > That sounds like it might be problematic. > > Consider the case where a NFSv4 client has cached file data and the > change attribute for the file. Server then reboots, but before the > export happens a local user makes a change to the file and it doesn't > update the i_version. Nfsd currently uses both ctime and i_version if its available, I'd expect that eliminates this case. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-20 16:29 ` Benjamin Coddington @ 2022-07-20 16:46 ` Jeff Layton 0 siblings, 0 replies; 14+ messages in thread From: Jeff Layton @ 2022-07-20 16:46 UTC (permalink / raw) To: Benjamin Coddington Cc: Lukas Czerner, tytso, adilger.kernel, linux-ext4, linux-fsdevel On Wed, 2022-07-20 at 12:29 -0400, Benjamin Coddington wrote: > On 20 Jul 2022, at 12:15, Jeff Layton wrote: > > > On Wed, 2022-07-20 at 11:56 -0400, Benjamin Coddington wrote: > > > On 20 Jul 2022, at 10:38, Jeff Layton wrote: > > > > On Wed, 2022-07-20 at 16:15 +0200, Lukas Czerner wrote: > > > > > > > > > > Is there a different way I am not seeing? > > > > > > > > > > > > > Right, implementing this is the difficult bit actually since this uses a > > > > MS_* flag. If we do make this the default, we'd definitely want to > > > > continue allowing "-o noiversion" to disable it. > > > > > > > > Could we just reverse the default in libmount? It might cause this to > > > > suddenly be enabled in some deployments, but in most cases, people > > > > wouldn't even notice and they could still specify -o noiversion to turn > > > > it off. > > > > > > > > Another idea would be to introduce new mount options for this, but > > > > that's kind of nasty from a UI standpoint. > > > > > > Is it safe to set SB_I_VERSION at export time? If so, export_operations > > > could grow an ->enable_iversion(). > > > > > > > That sounds like it might be problematic. > > > > Consider the case where a NFSv4 client has cached file data and the > > change attribute for the file. Server then reboots, but before the > > export happens a local user makes a change to the file and it doesn't > > update the i_version. > > Nfsd currently uses both ctime and i_version if its available, I'd expect > that eliminates this case. > Good point, that probably would. Still, I'd rather we just enable this wholesale if we can get away with it. There's still some interest in exposing i_version to userland via statx or the like, so I'd rather not assume that only nfsd will care about it. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-19 13:51 should we make "-o iversion" the default on ext4 ? Jeff Layton 2022-07-20 14:15 ` Lukas Czerner @ 2022-07-21 22:32 ` Dave Chinner 2022-07-25 16:22 ` Jeff Layton 1 sibling, 1 reply; 14+ messages in thread From: Dave Chinner @ 2022-07-21 22:32 UTC (permalink / raw) To: Jeff Layton Cc: tytso, adilger.kernel, linux-ext4, linux-fsdevel, lczerner, Benjamin Coddington On Tue, Jul 19, 2022 at 09:51:33AM -0400, Jeff Layton wrote: > Back in 2018, I did a patchset [1] to rework the inode->i_version > counter handling to be much less expensive, particularly when no-one is > querying for it. Yup, there's zero additional overhead for maintaining i_version in XFS when nothing is monitoring it. Updating it comes for free in any transaction that modifies the inode, so when writes occur i_version gets bumped if timestamps change or allocation is required. And when something is monitoring it, the overhead is effectively a single "timestamp" update for each peek at i_version the monitoring agent makes. This is also largely noise.... > Testing at the time showed that the cost of enabling i_version on ext4 > was close to 0 when nothing is querying it, but I stopped short of > trying to make it the default at the time (mostly out of an abundance of > caution). Since then, we still see a steady stream of cache-coherency > problems with NFSv4 on ext4 when this option is disabled (e.g. [2]). > > Is it time to go ahead and make this option the default on ext4? I don't > see a real downside to doing so, though I'm unclear on how we should > approach this. Currently the option is twiddled using MS_I_VERSION flag, > and it's unclear to me how we can reverse the sense of such a flag. XFS only enables SB_I_VERSION based on an on disk format flag - you can't turn it on or off by mount options, so it completely ignores MS_I_VERSION. > Thoughts? My 2c is to behave like XFS: ignore the mount option and always turn it on. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: should we make "-o iversion" the default on ext4 ? 2022-07-21 22:32 ` Dave Chinner @ 2022-07-25 16:22 ` Jeff Layton 0 siblings, 0 replies; 14+ messages in thread From: Jeff Layton @ 2022-07-25 16:22 UTC (permalink / raw) To: Dave Chinner Cc: tytso, adilger.kernel, linux-ext4, linux-fsdevel, lczerner, Benjamin Coddington On Fri, 2022-07-22 at 08:32 +1000, Dave Chinner wrote: > On Tue, Jul 19, 2022 at 09:51:33AM -0400, Jeff Layton wrote: > > Back in 2018, I did a patchset [1] to rework the inode->i_version > > counter handling to be much less expensive, particularly when no-one is > > querying for it. > > Yup, there's zero additional overhead for maintaining i_version in > XFS when nothing is monitoring it. Updating it comes for free in any > transaction that modifies the inode, so when writes > occur i_version gets bumped if timestamps change or allocation is > required. > > And when something is monitoring it, the overhead is effectively a > single "timestamp" update for each peek at i_version the monitoring > agent makes. This is also largely noise.... > > > Testing at the time showed that the cost of enabling i_version on ext4 > > was close to 0 when nothing is querying it, but I stopped short of > > trying to make it the default at the time (mostly out of an abundance of > > caution). Since then, we still see a steady stream of cache-coherency > > problems with NFSv4 on ext4 when this option is disabled (e.g. [2]). > > > > Is it time to go ahead and make this option the default on ext4? I don't > > see a real downside to doing so, though I'm unclear on how we should > > approach this. Currently the option is twiddled using MS_I_VERSION flag, > > and it's unclear to me how we can reverse the sense of such a flag. > > XFS only enables SB_I_VERSION based on an on disk format flag - you > can't turn it on or off by mount options, so it completely ignores > MS_I_VERSION. > > > Thoughts? > > My 2c is to behave like XFS: ignore the mount option and always turn > it on. I'd be fine with that, personally. They could also couple that with a tune2fs flag or something, so you could still disable it if it were a problem for some reason. It's unlikely that anyone will really notice however, so turning it on unconditionally may be the best place to start. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2022-07-25 16:22 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-07-19 13:51 should we make "-o iversion" the default on ext4 ? Jeff Layton 2022-07-20 14:15 ` Lukas Czerner 2022-07-20 14:38 ` Jeff Layton 2022-07-20 15:22 ` Lukas Czerner 2022-07-20 16:42 ` Jeff Layton 2022-07-21 14:06 ` Lukas Czerner 2022-07-21 17:03 ` Jeff Layton 2022-07-20 15:56 ` Benjamin Coddington 2022-07-20 16:06 ` Christoph Hellwig 2022-07-20 16:15 ` Jeff Layton 2022-07-20 16:29 ` Benjamin Coddington 2022-07-20 16:46 ` Jeff Layton 2022-07-21 22:32 ` Dave Chinner 2022-07-25 16:22 ` Jeff Layton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).