public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* generic/232 test failures on 4.14-rc1
@ 2017-09-21 15:48 Eric Whitney
  2017-09-25 13:59 ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Whitney @ 2017-09-21 15:48 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack

I'm seeing generic/232 fail from time to time when running a 4.14-rc1 kernel
on xfstest-bld's most recent kvm-xfstests test appliance.  In one set of
trials, it failed in the same manner 4 out of 10 times when running the 4k test
configuration for ext4.

The failure bisects to "quota: Do not acquire dqio_sem for dquot overwrites in
v2 format" (ab2b86360f6e).  When this patch was reverted in a 4.14-rc1 kernel,
the failure did not reoccur in a series of 20 trials.

Example output from the failed test:

QA output created by 232

Testing fsstress

seed = S
Comparing user usage
218a219
> #3740     --       4       0       0              1     0     0       
245a247
> #45       --       0       0       0              1     0     0     

Note:  I'm also seeing a similar failure for generic/233, but the patch
containing the root cause likely comes somewhere after ab2b86360f6e.  I'll post
another bug report once I locate it.

Thanks,
Eric

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: generic/232 test failures on 4.14-rc1
  2017-09-21 15:48 generic/232 test failures on 4.14-rc1 Eric Whitney
@ 2017-09-25 13:59 ` Jan Kara
  2017-09-26 12:58   ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2017-09-25 13:59 UTC (permalink / raw)
  To: Eric Whitney; +Cc: linux-ext4, jack

On Thu 21-09-17 11:48:46, Eric Whitney wrote:
> I'm seeing generic/232 fail from time to time when running a 4.14-rc1 kernel
> on xfstest-bld's most recent kvm-xfstests test appliance.  In one set of
> trials, it failed in the same manner 4 out of 10 times when running the 4k test
> configuration for ext4.
> 
> The failure bisects to "quota: Do not acquire dqio_sem for dquot overwrites in
> v2 format" (ab2b86360f6e).  When this patch was reverted in a 4.14-rc1 kernel,
> the failure did not reoccur in a series of 20 trials.

Thanks for debugging this! I'd just note that the commit hash of that
change is different for me - d2faa415166b2883428efa92f451774ef44373ac.

> Example output from the failed test:
> 
> QA output created by 232
> 
> Testing fsstress
> 
> seed = S
> Comparing user usage
> 218a219
> > #3740     --       4       0       0              1     0     0       
> 245a247
> > #45       --       0       0       0              1     0     0     
> 
> Note:  I'm also seeing a similar failure for generic/233, but the patch
> containing the root cause likely comes somewhere after ab2b86360f6e.  I'll post
> another bug report once I locate it.

I'll try to debug this further. Thanks for report!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: generic/232 test failures on 4.14-rc1
  2017-09-25 13:59 ` Jan Kara
@ 2017-09-26 12:58   ` Jan Kara
  2017-09-26 21:41     ` Eric Whitney
  2017-09-27  1:19     ` Darrick J. Wong
  0 siblings, 2 replies; 7+ messages in thread
From: Jan Kara @ 2017-09-26 12:58 UTC (permalink / raw)
  To: Eric Whitney; +Cc: linux-ext4, jack

[-- Attachment #1: Type: text/plain, Size: 1453 bytes --]

On Mon 25-09-17 15:59:46, Jan Kara wrote:
> On Thu 21-09-17 11:48:46, Eric Whitney wrote:
> > I'm seeing generic/232 fail from time to time when running a 4.14-rc1 kernel
> > on xfstest-bld's most recent kvm-xfstests test appliance.  In one set of
> > trials, it failed in the same manner 4 out of 10 times when running the 4k test
> > configuration for ext4.
> > 
> > The failure bisects to "quota: Do not acquire dqio_sem for dquot overwrites in
> > v2 format" (ab2b86360f6e).  When this patch was reverted in a 4.14-rc1 kernel,
> > the failure did not reoccur in a series of 20 trials.
> 
> Thanks for debugging this! I'd just note that the commit hash of that
> change is different for me - d2faa415166b2883428efa92f451774ef44373ac.
> 
> > Example output from the failed test:
> > 
> > QA output created by 232
> > 
> > Testing fsstress
> > 
> > seed = S
> > Comparing user usage
> > 218a219
> > > #3740     --       4       0       0              1     0     0       
> > 245a247
> > > #45       --       0       0       0              1     0     0     
> > 
> > Note:  I'm also seeing a similar failure for generic/233, but the patch
> > containing the root cause likely comes somewhere after ab2b86360f6e.  I'll post
> > another bug report once I locate it.
> 
> I'll try to debug this further. Thanks for report!

Attached patch fixes the problem for me. I'll merge it through my tree.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-quota-Fix-quota-corruption-with-generic-232-test.patch --]
[-- Type: text/x-patch, Size: 1876 bytes --]

>From a0ae41c2a9c204374eafd24a928e4352841bd905 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 26 Sep 2017 10:36:05 +0200
Subject: [PATCH] quota: Fix quota corruption with generic/232 test

Eric has reported that since commit d2faa415166b "quota: Do not acquire
dqio_sem for dquot overwrites in v2 format" test generic/232
occasionally fails due to quota information being incorrect. Indeed that
commit was too eager to remove dqio_sem completely from the path that
just overwrites quota structure with updated information. Although that
is innocent on its own, another process that inserts new quota structure
to the same block can perform read-modify-write cycle of that block thus
effectively discarding quota information update if they race in a wrong
way.

Fix the problem by acquiring dqio_sem for reading for overwrites of
quota structure. Note that it *is* possible to completely avoid taking
dqio_sem in the overwrite path however that will require modifying path
inserting / deleting quota structures to avoid RMW cycles of the full
block and for now it is not clear whether it is worth the hassle.

Fixes: d2faa415166b2883428efa92f451774ef44373ac
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/quota/quota_v2.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/quota/quota_v2.c b/fs/quota/quota_v2.c
index c0187cda2c1e..a73e5b34db41 100644
--- a/fs/quota/quota_v2.c
+++ b/fs/quota/quota_v2.c
@@ -328,12 +328,16 @@ static int v2_write_dquot(struct dquot *dquot)
 	if (!dquot->dq_off) {
 		alloc = true;
 		down_write(&dqopt->dqio_sem);
+	} else {
+		down_read(&dqopt->dqio_sem);
 	}
 	ret = qtree_write_dquot(
 			sb_dqinfo(dquot->dq_sb, dquot->dq_id.type)->dqi_priv,
 			dquot);
 	if (alloc)
 		up_write(&dqopt->dqio_sem);
+	else
+		up_read(&dqopt->dqio_sem);
 	return ret;
 }
 
-- 
2.12.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: generic/232 test failures on 4.14-rc1
  2017-09-26 12:58   ` Jan Kara
@ 2017-09-26 21:41     ` Eric Whitney
  2017-09-27  9:34       ` Jan Kara
  2017-09-27  1:19     ` Darrick J. Wong
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Whitney @ 2017-09-26 21:41 UTC (permalink / raw)
  To: Jan Kara; +Cc: Eric Whitney, linux-ext4

* Jan Kara <jack@suse.cz>:
> On Mon 25-09-17 15:59:46, Jan Kara wrote:
> > On Thu 21-09-17 11:48:46, Eric Whitney wrote:
> > > I'm seeing generic/232 fail from time to time when running a 4.14-rc1 kernel
> > > on xfstest-bld's most recent kvm-xfstests test appliance.  In one set of
> > > trials, it failed in the same manner 4 out of 10 times when running the 4k test
> > > configuration for ext4.
> > > 
> > > The failure bisects to "quota: Do not acquire dqio_sem for dquot overwrites in
> > > v2 format" (ab2b86360f6e).  When this patch was reverted in a 4.14-rc1 kernel,
> > > the failure did not reoccur in a series of 20 trials.
> > 
> > Thanks for debugging this! I'd just note that the commit hash of that
> > change is different for me - d2faa415166b2883428efa92f451774ef44373ac.
> > 
> > > Example output from the failed test:
> > > 
> > > QA output created by 232
> > > 
> > > Testing fsstress
> > > 
> > > seed = S
> > > Comparing user usage
> > > 218a219
> > > > #3740     --       4       0       0              1     0     0       
> > > 245a247
> > > > #45       --       0       0       0              1     0     0     
> > > 
> > > Note:  I'm also seeing a similar failure for generic/233, but the patch
> > > containing the root cause likely comes somewhere after ab2b86360f6e.  I'll post
> > > another bug report once I locate it.
> > 
> > I'll try to debug this further. Thanks for report!
> 
> Attached patch fixes the problem for me. I'll merge it through my tree.
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

> From a0ae41c2a9c204374eafd24a928e4352841bd905 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Tue, 26 Sep 2017 10:36:05 +0200
> Subject: [PATCH] quota: Fix quota corruption with generic/232 test
> 
> Eric has reported that since commit d2faa415166b "quota: Do not acquire
> dqio_sem for dquot overwrites in v2 format" test generic/232
> occasionally fails due to quota information being incorrect. Indeed that
> commit was too eager to remove dqio_sem completely from the path that
> just overwrites quota structure with updated information. Although that
> is innocent on its own, another process that inserts new quota structure
> to the same block can perform read-modify-write cycle of that block thus
> effectively discarding quota information update if they race in a wrong
> way.
> 
> Fix the problem by acquiring dqio_sem for reading for overwrites of
> quota structure. Note that it *is* possible to completely avoid taking
> dqio_sem in the overwrite path however that will require modifying path
> inserting / deleting quota structures to avoid RMW cycles of the full
> block and for now it is not clear whether it is worth the hassle.
> 
> Fixes: d2faa415166b2883428efa92f451774ef44373ac
> Reported-by: Eric Whitney <enwlinux@gmail.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/quota/quota_v2.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/quota/quota_v2.c b/fs/quota/quota_v2.c
> index c0187cda2c1e..a73e5b34db41 100644
> --- a/fs/quota/quota_v2.c
> +++ b/fs/quota/quota_v2.c
> @@ -328,12 +328,16 @@ static int v2_write_dquot(struct dquot *dquot)
>  	if (!dquot->dq_off) {
>  		alloc = true;
>  		down_write(&dqopt->dqio_sem);
> +	} else {
> +		down_read(&dqopt->dqio_sem);
>  	}
>  	ret = qtree_write_dquot(
>  			sb_dqinfo(dquot->dq_sb, dquot->dq_id.type)->dqi_priv,
>  			dquot);
>  	if (alloc)
>  		up_write(&dqopt->dqio_sem);
> +	else
> +		up_read(&dqopt->dqio_sem);
>  	return ret;
>  }
>  
> -- 
> 2.12.3
> 

Hi Honza:

That patch works for me - 100 out of 100 trials of generic/232 passed
successfully running a modified 4.14-rc1 kernel on kvm-xfstests' ext4 4k
test configuration.

Tested-by: Eric Whitney <enwlinux@gmail.com>

Thanks!
Eric

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: generic/232 test failures on 4.14-rc1
  2017-09-26 12:58   ` Jan Kara
  2017-09-26 21:41     ` Eric Whitney
@ 2017-09-27  1:19     ` Darrick J. Wong
  2017-09-27  9:33       ` Jan Kara
  1 sibling, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2017-09-27  1:19 UTC (permalink / raw)
  To: Jan Kara; +Cc: Eric Whitney, linux-ext4, xfs

[cc xfs list]

On Tue, Sep 26, 2017 at 02:58:31PM +0200, Jan Kara wrote:
> On Mon 25-09-17 15:59:46, Jan Kara wrote:
> > On Thu 21-09-17 11:48:46, Eric Whitney wrote:
> > > I'm seeing generic/232 fail from time to time when running a 4.14-rc1 kernel
> > > on xfstest-bld's most recent kvm-xfstests test appliance.  In one set of
> > > trials, it failed in the same manner 4 out of 10 times when running the 4k test
> > > configuration for ext4.
> > > 
> > > The failure bisects to "quota: Do not acquire dqio_sem for dquot overwrites in
> > > v2 format" (ab2b86360f6e).  When this patch was reverted in a 4.14-rc1 kernel,
> > > the failure did not reoccur in a series of 20 trials.
> > 
> > Thanks for debugging this! I'd just note that the commit hash of that
> > change is different for me - d2faa415166b2883428efa92f451774ef44373ac.
> > 
> > > Example output from the failed test:
> > > 
> > > QA output created by 232
> > > 
> > > Testing fsstress
> > > 
> > > seed = S
> > > Comparing user usage
> > > 218a219
> > > > #3740     --       4       0       0              1     0     0       
> > > 245a247
> > > > #45       --       0       0       0              1     0     0     
> > > 
> > > Note:  I'm also seeing a similar failure for generic/233, but the patch
> > > containing the root cause likely comes somewhere after ab2b86360f6e.  I'll post
> > > another bug report once I locate it.
> > 
> > I'll try to debug this further. Thanks for report!
> 
> Attached patch fixes the problem for me. I'll merge it through my tree.

Hi Jan,

Ever since 4.14-rc1, I've noticed the same problem (intermittent
failures of generic/{232,233,270}) that Eric Whitney was complaining
about when running xfstests against XFS.  I'll try a proper bisect
tomorrow, but given the big locking rework I wonder if that rings any
bells for you?

--D

> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

> From a0ae41c2a9c204374eafd24a928e4352841bd905 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Tue, 26 Sep 2017 10:36:05 +0200
> Subject: [PATCH] quota: Fix quota corruption with generic/232 test
> 
> Eric has reported that since commit d2faa415166b "quota: Do not acquire
> dqio_sem for dquot overwrites in v2 format" test generic/232
> occasionally fails due to quota information being incorrect. Indeed that
> commit was too eager to remove dqio_sem completely from the path that
> just overwrites quota structure with updated information. Although that
> is innocent on its own, another process that inserts new quota structure
> to the same block can perform read-modify-write cycle of that block thus
> effectively discarding quota information update if they race in a wrong
> way.
> 
> Fix the problem by acquiring dqio_sem for reading for overwrites of
> quota structure. Note that it *is* possible to completely avoid taking
> dqio_sem in the overwrite path however that will require modifying path
> inserting / deleting quota structures to avoid RMW cycles of the full
> block and for now it is not clear whether it is worth the hassle.
> 
> Fixes: d2faa415166b2883428efa92f451774ef44373ac
> Reported-by: Eric Whitney <enwlinux@gmail.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/quota/quota_v2.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/quota/quota_v2.c b/fs/quota/quota_v2.c
> index c0187cda2c1e..a73e5b34db41 100644
> --- a/fs/quota/quota_v2.c
> +++ b/fs/quota/quota_v2.c
> @@ -328,12 +328,16 @@ static int v2_write_dquot(struct dquot *dquot)
>  	if (!dquot->dq_off) {
>  		alloc = true;
>  		down_write(&dqopt->dqio_sem);
> +	} else {
> +		down_read(&dqopt->dqio_sem);
>  	}
>  	ret = qtree_write_dquot(
>  			sb_dqinfo(dquot->dq_sb, dquot->dq_id.type)->dqi_priv,
>  			dquot);
>  	if (alloc)
>  		up_write(&dqopt->dqio_sem);
> +	else
> +		up_read(&dqopt->dqio_sem);
>  	return ret;
>  }
>  
> -- 
> 2.12.3
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: generic/232 test failures on 4.14-rc1
  2017-09-27  1:19     ` Darrick J. Wong
@ 2017-09-27  9:33       ` Jan Kara
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2017-09-27  9:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Jan Kara, Eric Whitney, linux-ext4, xfs

Hi Darrick!

On Tue 26-09-17 18:19:29, Darrick J. Wong wrote:
> On Tue, Sep 26, 2017 at 02:58:31PM +0200, Jan Kara wrote:
> > On Mon 25-09-17 15:59:46, Jan Kara wrote:
> > > On Thu 21-09-17 11:48:46, Eric Whitney wrote:
> > > > I'm seeing generic/232 fail from time to time when running a 4.14-rc1 kernel
> > > > on xfstest-bld's most recent kvm-xfstests test appliance.  In one set of
> > > > trials, it failed in the same manner 4 out of 10 times when running the 4k test
> > > > configuration for ext4.
> > > > 
> > > > The failure bisects to "quota: Do not acquire dqio_sem for dquot overwrites in
> > > > v2 format" (ab2b86360f6e).  When this patch was reverted in a 4.14-rc1 kernel,
> > > > the failure did not reoccur in a series of 20 trials.
> > > 
> > > Thanks for debugging this! I'd just note that the commit hash of that
> > > change is different for me - d2faa415166b2883428efa92f451774ef44373ac.
> > > 
> > > > Example output from the failed test:
> > > > 
> > > > QA output created by 232
> > > > 
> > > > Testing fsstress
> > > > 
> > > > seed = S
> > > > Comparing user usage
> > > > 218a219
> > > > > #3740     --       4       0       0              1     0     0       
> > > > 245a247
> > > > > #45       --       0       0       0              1     0     0     
> > > > 
> > > > Note:  I'm also seeing a similar failure for generic/233, but the patch
> > > > containing the root cause likely comes somewhere after ab2b86360f6e.  I'll post
> > > > another bug report once I locate it.
> > > 
> > > I'll try to debug this further. Thanks for report!
> > 
> > Attached patch fixes the problem for me. I'll merge it through my tree.
> 
> Ever since 4.14-rc1, I've noticed the same problem (intermittent
> failures of generic/{232,233,270}) that Eric Whitney was complaining
> about when running xfstests against XFS.  I'll try a proper bisect
> tomorrow, but given the big locking rework I wonder if that rings any
> bells for you?

Hum, no idea. XFS uses its own thing for quotas so my changes don't
influence it at all. I don't know much about XFS internal quota
implementation so I don't have a good guess what could have caused the
breakage you see. I'm sorry.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: generic/232 test failures on 4.14-rc1
  2017-09-26 21:41     ` Eric Whitney
@ 2017-09-27  9:34       ` Jan Kara
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2017-09-27  9:34 UTC (permalink / raw)
  To: Eric Whitney; +Cc: Jan Kara, linux-ext4

On Tue 26-09-17 17:41:52, Eric Whitney wrote:
> Hi Honza:
> 
> That patch works for me - 100 out of 100 trials of generic/232 passed
> successfully running a modified 4.14-rc1 kernel on kvm-xfstests' ext4 4k
> test configuration.
> 
> Tested-by: Eric Whitney <enwlinux@gmail.com>

Thanks for testing!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-09-27  9:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-21 15:48 generic/232 test failures on 4.14-rc1 Eric Whitney
2017-09-25 13:59 ` Jan Kara
2017-09-26 12:58   ` Jan Kara
2017-09-26 21:41     ` Eric Whitney
2017-09-27  9:34       ` Jan Kara
2017-09-27  1:19     ` Darrick J. Wong
2017-09-27  9:33       ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox