* [PATCH 0/1] Fix generic/390 failure due to quota release after freeze
@ 2024-11-15 18:34 Ojaswin Mujoo
2024-11-15 18:34 ` [PATCH 1/1] quota: flush quota_release_work upon quota writeback Ojaswin Mujoo
0 siblings, 1 reply; 9+ messages in thread
From: Ojaswin Mujoo @ 2024-11-15 18:34 UTC (permalink / raw)
To: linux-ext4, Jan Kara; +Cc: Ritesh Harjani, linux-kernel, linux-fsdevel
Recently we noticed generic/390 failing on powerpc systems. This test
basically does a freeze-unfreeze loop in parallel with fsstress on the
FS to detect any races in the code paths.
We noticed that the test started failing due to kernel WARN_ONs because
quota_release_work workqueue started executing while the FS was frozen
which led to creating new transactions in ext4_release_quota.
Most of the details are in the bug however I'd just like to add that
I'm completely new to quota code so the patch, although fixing the
issue, might be not be logically the right thing to do. So reviews and
suggestions are welcome.
Also, I can only replicate this race on one of my machines reliably and
does not appear on others. I've tested with with fstests -g quota and
don't see any new failures.
Ojaswin Mujoo (1):
quota: flush quota_release_work upon quota writeback
fs/quota/dquot.c | 2 ++
1 file changed, 2 insertions(+)
--
2.43.5
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/1] quota: flush quota_release_work upon quota writeback
2024-11-15 18:34 [PATCH 0/1] Fix generic/390 failure due to quota release after freeze Ojaswin Mujoo
@ 2024-11-15 18:34 ` Ojaswin Mujoo
2024-11-15 20:50 ` Ritesh Harjani
2024-11-18 13:15 ` Jan Kara
0 siblings, 2 replies; 9+ messages in thread
From: Ojaswin Mujoo @ 2024-11-15 18:34 UTC (permalink / raw)
To: linux-ext4, Jan Kara
Cc: Ritesh Harjani, linux-kernel, linux-fsdevel, Disha Goel
One of the paths quota writeback is called from is:
freeze_super()
sync_filesystem()
ext4_sync_fs()
dquot_writeback_dquots()
Since we currently don't always flush the quota_release_work queue in
this path, we can end up with the following race:
1. dquot are added to releasing_dquots list during regular operations.
2. FS freeze starts, however, this does not flush the quota_release_work queue.
3. Freeze completes.
4. Kernel eventually tries to flush the workqueue while FS is frozen which
hits a WARN_ON since transaction gets started during frozen state:
ext4_journal_check_start+0x28/0x110 [ext4] (unreliable)
__ext4_journal_start_sb+0x64/0x1c0 [ext4]
ext4_release_dquot+0x90/0x1d0 [ext4]
quota_release_workfn+0x43c/0x4d0
Which is the following line:
WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
Which ultimately results in generic/390 failing due to dmesg
noise. This was detected on powerpc machine 15 cores.
To avoid this, make sure to flush the workqueue during
dquot_writeback_dquots() so we dont have any pending workitems after
freeze.
Reported-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
---
fs/quota/dquot.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 3dd8d6f27725..2782cfc8c302 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -729,6 +729,8 @@ int dquot_writeback_dquots(struct super_block *sb, int type)
sb->dq_op->write_info(sb, cnt);
dqstats_inc(DQST_SYNCS);
+ flush_delayed_work("a_release_work);
+
return ret;
}
EXPORT_SYMBOL(dquot_writeback_dquots);
--
2.43.5
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] quota: flush quota_release_work upon quota writeback
2024-11-15 18:34 ` [PATCH 1/1] quota: flush quota_release_work upon quota writeback Ojaswin Mujoo
@ 2024-11-15 20:50 ` Ritesh Harjani
2024-11-16 17:59 ` Ojaswin Mujoo
2024-11-18 13:15 ` Jan Kara
1 sibling, 1 reply; 9+ messages in thread
From: Ritesh Harjani @ 2024-11-15 20:50 UTC (permalink / raw)
To: Ojaswin Mujoo, linux-ext4, Jan Kara
Cc: linux-kernel, linux-fsdevel, Disha Goel
Ojaswin Mujoo <ojaswin@linux.ibm.com> writes:
> One of the paths quota writeback is called from is:
>
> freeze_super()
> sync_filesystem()
> ext4_sync_fs()
> dquot_writeback_dquots()
>
> Since we currently don't always flush the quota_release_work queue in
> this path, we can end up with the following race:
>
> 1. dquot are added to releasing_dquots list during regular operations.
> 2. FS freeze starts, however, this does not flush the quota_release_work queue.
> 3. Freeze completes.
> 4. Kernel eventually tries to flush the workqueue while FS is frozen which
> hits a WARN_ON since transaction gets started during frozen state:
>
> ext4_journal_check_start+0x28/0x110 [ext4] (unreliable)
> __ext4_journal_start_sb+0x64/0x1c0 [ext4]
> ext4_release_dquot+0x90/0x1d0 [ext4]
> quota_release_workfn+0x43c/0x4d0
>
> Which is the following line:
>
> WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
>
> Which ultimately results in generic/390 failing due to dmesg
> noise. This was detected on powerpc machine 15 cores.
>
> To avoid this, make sure to flush the workqueue during
> dquot_writeback_dquots() so we dont have any pending workitems after
> freeze.
Not just that, sync_filesystem can also be called from other places and
quota_release_workfn() could write out and and release the dquot
structures if such are found during processing of releasing_dquots list.
IIUC, this was earlier done in the same dqput() context but had races
with dquot_mark_dquot_dirty(). Hence the final dqput() will now add the
dquot structures to releasing_dquots list and will schedule a delayed
workfn which will process the releasing_dquots list.
And so after the final dqput and before the release_workfn gets
scheduled, if dquot gets marked as dirty or dquot_transfer gets called -
then I am suspecting that it could lead to a dirty or an active dquot.
Hence, flushing the delayed quota_release_work at the end of
dquot_writeback_dquots() looks like the right thing to do IMO.
But I can give another look as this part of the code is not that well
known to me.
>
> Reported-by: Disha Goel <disgoel@linux.ibm.com>
> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> ---
Maybe a fixes tag as well?
> fs/quota/dquot.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 3dd8d6f27725..2782cfc8c302 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -729,6 +729,8 @@ int dquot_writeback_dquots(struct super_block *sb, int type)
> sb->dq_op->write_info(sb, cnt);
> dqstats_inc(DQST_SYNCS);
>
> + flush_delayed_work("a_release_work);
> +
> return ret;
> }
> EXPORT_SYMBOL(dquot_writeback_dquots);
> --
> 2.43.5
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] quota: flush quota_release_work upon quota writeback
2024-11-15 20:50 ` Ritesh Harjani
@ 2024-11-16 17:59 ` Ojaswin Mujoo
2024-11-18 1:29 ` Baokun Li
0 siblings, 1 reply; 9+ messages in thread
From: Ojaswin Mujoo @ 2024-11-16 17:59 UTC (permalink / raw)
To: Ritesh Harjani
Cc: linux-ext4, Jan Kara, linux-kernel, linux-fsdevel, Disha Goel,
Baokun Li
On Sat, Nov 16, 2024 at 02:20:26AM +0530, Ritesh Harjani wrote:
> Ojaswin Mujoo <ojaswin@linux.ibm.com> writes:
>
> > One of the paths quota writeback is called from is:
> >
> > freeze_super()
> > sync_filesystem()
> > ext4_sync_fs()
> > dquot_writeback_dquots()
> >
> > Since we currently don't always flush the quota_release_work queue in
> > this path, we can end up with the following race:
> >
> > 1. dquot are added to releasing_dquots list during regular operations.
> > 2. FS freeze starts, however, this does not flush the quota_release_work queue.
> > 3. Freeze completes.
> > 4. Kernel eventually tries to flush the workqueue while FS is frozen which
> > hits a WARN_ON since transaction gets started during frozen state:
> >
> > ext4_journal_check_start+0x28/0x110 [ext4] (unreliable)
> > __ext4_journal_start_sb+0x64/0x1c0 [ext4]
> > ext4_release_dquot+0x90/0x1d0 [ext4]
> > quota_release_workfn+0x43c/0x4d0
> >
> > Which is the following line:
> >
> > WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
> >
> > Which ultimately results in generic/390 failing due to dmesg
> > noise. This was detected on powerpc machine 15 cores.
> >
> > To avoid this, make sure to flush the workqueue during
> > dquot_writeback_dquots() so we dont have any pending workitems after
> > freeze.
>
> Not just that, sync_filesystem can also be called from other places and
> quota_release_workfn() could write out and and release the dquot
> structures if such are found during processing of releasing_dquots list.
> IIUC, this was earlier done in the same dqput() context but had races
> with dquot_mark_dquot_dirty(). Hence the final dqput() will now add the
> dquot structures to releasing_dquots list and will schedule a delayed
> workfn which will process the releasing_dquots list.
Hi Ritesh,
Ohh right, thanks for the context. I see this was done here:
dabc8b207566 quota: fix dqput() to follow the guarantees dquot_srcu
should provide
Which went in v6.5. Let me cc Baokun as well.
>
> And so after the final dqput and before the release_workfn gets
> scheduled, if dquot gets marked as dirty or dquot_transfer gets called -
> then I am suspecting that it could lead to a dirty or an active dquot.
>
> Hence, flushing the delayed quota_release_work at the end of
> dquot_writeback_dquots() looks like the right thing to do IMO.
>
> But I can give another look as this part of the code is not that well
> known to me.
>
> >
> > Reported-by: Disha Goel <disgoel@linux.ibm.com>
> > Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> > ---
>
> Maybe a fixes tag as well?
Right, I'll add that in the next revision. I believe it would be:
Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide")
Regards,
ojaswin
>
> > fs/quota/dquot.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> > index 3dd8d6f27725..2782cfc8c302 100644
> > --- a/fs/quota/dquot.c
> > +++ b/fs/quota/dquot.c
> > @@ -729,6 +729,8 @@ int dquot_writeback_dquots(struct super_block *sb, int type)
> > sb->dq_op->write_info(sb, cnt);
> > dqstats_inc(DQST_SYNCS);
> >
> > + flush_delayed_work("a_release_work);
> > +
> > return ret;
> > }
> > EXPORT_SYMBOL(dquot_writeback_dquots);
> > --
> > 2.43.5
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] quota: flush quota_release_work upon quota writeback
2024-11-16 17:59 ` Ojaswin Mujoo
@ 2024-11-18 1:29 ` Baokun Li
2024-11-18 12:53 ` Jan Kara
0 siblings, 1 reply; 9+ messages in thread
From: Baokun Li @ 2024-11-18 1:29 UTC (permalink / raw)
To: Ojaswin Mujoo
Cc: Ritesh Harjani, linux-ext4, Jan Kara, linux-kernel, linux-fsdevel,
Disha Goel, Yang Erkun
On 2024/11/17 1:59, Ojaswin Mujoo wrote:
> On Sat, Nov 16, 2024 at 02:20:26AM +0530, Ritesh Harjani wrote:
>> Ojaswin Mujoo <ojaswin@linux.ibm.com> writes:
>>
>>> One of the paths quota writeback is called from is:
>>>
>>> freeze_super()
>>> sync_filesystem()
>>> ext4_sync_fs()
>>> dquot_writeback_dquots()
>>>
>>> Since we currently don't always flush the quota_release_work queue in
>>> this path, we can end up with the following race:
>>>
>>> 1. dquot are added to releasing_dquots list during regular operations.
>>> 2. FS freeze starts, however, this does not flush the quota_release_work queue.
>>> 3. Freeze completes.
>>> 4. Kernel eventually tries to flush the workqueue while FS is frozen which
>>> hits a WARN_ON since transaction gets started during frozen state:
>>>
>>> ext4_journal_check_start+0x28/0x110 [ext4] (unreliable)
>>> __ext4_journal_start_sb+0x64/0x1c0 [ext4]
>>> ext4_release_dquot+0x90/0x1d0 [ext4]
>>> quota_release_workfn+0x43c/0x4d0
>>>
>>> Which is the following line:
>>>
>>> WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
>>>
>>> Which ultimately results in generic/390 failing due to dmesg
>>> noise. This was detected on powerpc machine 15 cores.
>>>
>>> To avoid this, make sure to flush the workqueue during
>>> dquot_writeback_dquots() so we dont have any pending workitems after
>>> freeze.
>> Not just that, sync_filesystem can also be called from other places and
>> quota_release_workfn() could write out and and release the dquot
>> structures if such are found during processing of releasing_dquots list.
>> IIUC, this was earlier done in the same dqput() context but had races
>> with dquot_mark_dquot_dirty(). Hence the final dqput() will now add the
>> dquot structures to releasing_dquots list and will schedule a delayed
>> workfn which will process the releasing_dquots list.
> Hi Ritesh,
>
> Ohh right, thanks for the context. I see this was done here:
>
> dabc8b207566 quota: fix dqput() to follow the guarantees dquot_srcu
> should provide
>
> Which went in v6.5. Let me cc Baokun as well.
Hello Ojaswin,
Nice catch! Thanks for fixing this up!
Have you tested the performance impact of this patch? It looks like the
unconditional call to flush_delayed_work() in dquot_writeback_dquots()
may have some performance impact for frequent chown/sync scenarios.
When calling release_dquot(), we will only remove the quota of an object
(user/group/project) from disk if it is not quota-limited and does not
use any inode or block.
Asynchronous removal is now much more performance friendly, not only does
it make full use of the multi-core, but for scenarios where we have to
repeatedly chown between two objects, delayed release avoids the need to
repeatedly allocate/free space in memory and on disk.
Overall, since the actual dirty data is already on the disk, there is no
consistency issue here as it is just clearing unreferenced quota on the
disk, so I thought maybe it would be better to call flush_delayed_work()
in the freeze context.
Thanks,
Baokun
>> And so after the final dqput and before the release_workfn gets
>> scheduled, if dquot gets marked as dirty or dquot_transfer gets called -
>> then I am suspecting that it could lead to a dirty or an active dquot.
>>
>> Hence, flushing the delayed quota_release_work at the end of
>> dquot_writeback_dquots() looks like the right thing to do IMO.
>>
>> But I can give another look as this part of the code is not that well
>> known to me.
>>
>>> Reported-by: Disha Goel <disgoel@linux.ibm.com>
>>> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>>> ---
>> Maybe a fixes tag as well?
> Right, I'll add that in the next revision. I believe it would be:
>
> Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide")
>
> Regards,
> ojaswin
>
>>> fs/quota/dquot.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
>>> index 3dd8d6f27725..2782cfc8c302 100644
>>> --- a/fs/quota/dquot.c
>>> +++ b/fs/quota/dquot.c
>>> @@ -729,6 +729,8 @@ int dquot_writeback_dquots(struct super_block *sb, int type)
>>> sb->dq_op->write_info(sb, cnt);
>>> dqstats_inc(DQST_SYNCS);
>>>
>>> + flush_delayed_work("a_release_work);
>>> +
>>> return ret;
>>> }
>>> EXPORT_SYMBOL(dquot_writeback_dquots);
>>> --
>>> 2.43.5
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] quota: flush quota_release_work upon quota writeback
2024-11-18 1:29 ` Baokun Li
@ 2024-11-18 12:53 ` Jan Kara
2024-11-19 6:29 ` Baokun Li
0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2024-11-18 12:53 UTC (permalink / raw)
To: Baokun Li
Cc: Ojaswin Mujoo, Ritesh Harjani, linux-ext4, Jan Kara, linux-kernel,
linux-fsdevel, Disha Goel, Yang Erkun
On Mon 18-11-24 09:29:19, Baokun Li wrote:
> On 2024/11/17 1:59, Ojaswin Mujoo wrote:
> > On Sat, Nov 16, 2024 at 02:20:26AM +0530, Ritesh Harjani wrote:
> > > Ojaswin Mujoo <ojaswin@linux.ibm.com> writes:
> > >
> > > > One of the paths quota writeback is called from is:
> > > >
> > > > freeze_super()
> > > > sync_filesystem()
> > > > ext4_sync_fs()
> > > > dquot_writeback_dquots()
> > > >
> > > > Since we currently don't always flush the quota_release_work queue in
> > > > this path, we can end up with the following race:
> > > >
> > > > 1. dquot are added to releasing_dquots list during regular operations.
> > > > 2. FS freeze starts, however, this does not flush the quota_release_work queue.
> > > > 3. Freeze completes.
> > > > 4. Kernel eventually tries to flush the workqueue while FS is frozen which
> > > > hits a WARN_ON since transaction gets started during frozen state:
> > > >
> > > > ext4_journal_check_start+0x28/0x110 [ext4] (unreliable)
> > > > __ext4_journal_start_sb+0x64/0x1c0 [ext4]
> > > > ext4_release_dquot+0x90/0x1d0 [ext4]
> > > > quota_release_workfn+0x43c/0x4d0
> > > >
> > > > Which is the following line:
> > > >
> > > > WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
> > > >
> > > > Which ultimately results in generic/390 failing due to dmesg
> > > > noise. This was detected on powerpc machine 15 cores.
> > > >
> > > > To avoid this, make sure to flush the workqueue during
> > > > dquot_writeback_dquots() so we dont have any pending workitems after
> > > > freeze.
> > > Not just that, sync_filesystem can also be called from other places and
> > > quota_release_workfn() could write out and and release the dquot
> > > structures if such are found during processing of releasing_dquots list.
> > > IIUC, this was earlier done in the same dqput() context but had races
> > > with dquot_mark_dquot_dirty(). Hence the final dqput() will now add the
> > > dquot structures to releasing_dquots list and will schedule a delayed
> > > workfn which will process the releasing_dquots list.
> > Hi Ritesh,
> >
> > Ohh right, thanks for the context. I see this was done here:
> >
> > dabc8b207566 quota: fix dqput() to follow the guarantees dquot_srcu
> > should provide
Yup.
> Nice catch! Thanks for fixing this up!
>
> Have you tested the performance impact of this patch? It looks like the
> unconditional call to flush_delayed_work() in dquot_writeback_dquots()
> may have some performance impact for frequent chown/sync scenarios.
Well, but sync(2) or so is expensive anyway. Also dquot_writeback_dquots()
should persist all pending quota modifications and it is true that pending
dquot_release() calls can remove quota structures from the quota file and
thus are by definition pending modifications. So I agree with Ojaswin that
putting the workqueue flush there makes sense and is practically required
for data consistency guarantees.
> When calling release_dquot(), we will only remove the quota of an object
> (user/group/project) from disk if it is not quota-limited and does not
> use any inode or block.
>
> Asynchronous removal is now much more performance friendly, not only does
> it make full use of the multi-core, but for scenarios where we have to
> repeatedly chown between two objects, delayed release avoids the need to
> repeatedly allocate/free space in memory and on disk.
True, but unless you call sync(2) in between these two calls this is going
to still hold.
> Overall, since the actual dirty data is already on the disk, there is no
> consistency issue here as it is just clearing unreferenced quota on the
> disk, so I thought maybe it would be better to call flush_delayed_work()
> in the freeze context.
To summarise, I don't think real-life workloads are going to observe the
benefit and conceptually the call really belongs more to
dquot_writeback_dquots().
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] quota: flush quota_release_work upon quota writeback
2024-11-15 18:34 ` [PATCH 1/1] quota: flush quota_release_work upon quota writeback Ojaswin Mujoo
2024-11-15 20:50 ` Ritesh Harjani
@ 2024-11-18 13:15 ` Jan Kara
2024-11-19 5:42 ` Ojaswin Mujoo
1 sibling, 1 reply; 9+ messages in thread
From: Jan Kara @ 2024-11-18 13:15 UTC (permalink / raw)
To: Ojaswin Mujoo
Cc: linux-ext4, Jan Kara, Ritesh Harjani, linux-kernel, linux-fsdevel,
Disha Goel
On Sat 16-11-24 00:04:49, Ojaswin Mujoo wrote:
> One of the paths quota writeback is called from is:
>
> freeze_super()
> sync_filesystem()
> ext4_sync_fs()
> dquot_writeback_dquots()
>
> Since we currently don't always flush the quota_release_work queue in
> this path, we can end up with the following race:
>
> 1. dquot are added to releasing_dquots list during regular operations.
> 2. FS freeze starts, however, this does not flush the quota_release_work queue.
> 3. Freeze completes.
> 4. Kernel eventually tries to flush the workqueue while FS is frozen which
> hits a WARN_ON since transaction gets started during frozen state:
>
> ext4_journal_check_start+0x28/0x110 [ext4] (unreliable)
> __ext4_journal_start_sb+0x64/0x1c0 [ext4]
> ext4_release_dquot+0x90/0x1d0 [ext4]
> quota_release_workfn+0x43c/0x4d0
>
> Which is the following line:
>
> WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
>
> Which ultimately results in generic/390 failing due to dmesg
> noise. This was detected on powerpc machine 15 cores.
>
> To avoid this, make sure to flush the workqueue during
> dquot_writeback_dquots() so we dont have any pending workitems after
> freeze.
>
> Reported-by: Disha Goel <disgoel@linux.ibm.com>
> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Thanks for debugging this!
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 3dd8d6f27725..2782cfc8c302 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -729,6 +729,8 @@ int dquot_writeback_dquots(struct super_block *sb, int type)
> sb->dq_op->write_info(sb, cnt);
> dqstats_inc(DQST_SYNCS);
>
> + flush_delayed_work("a_release_work);
> +
I'd rather do this at the start of dquot_writeback_dquots(). Chances are
this saves some retry loops in the dirty list iterations. That being said I
don't think this is enough as I'm thinking about it. iput() can be called
anytime while the filesystem is frozen (just freeze the filesystem and do
echo 3 >/proc/sys/vm/drop_caches) which will consequently call dquot_drop()
-> dqput(). This should not be really freeing the dquot on-disk structure
(the inode itself is still accounted there) but nevertheless it may end up
dropping the last dquot in-memory reference and ext4_release_dquot() will
call ext4_journal_start() and complain. So I think on top of this patch
which makes sense on its own and deals with 99.9% of cases, we also need
ext4 specific fix which uses sb_start_intwrite() to get freeze protection
in ext4_release_dquot() (and in principle we always needed this, delayed
dquot releasing does not influence this particular problem). Some care will
be needed if the transaction is already started when ext4_release_dquot()
is called - you can take inspiration in how ext4_evict_inode() handles
this.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] quota: flush quota_release_work upon quota writeback
2024-11-18 13:15 ` Jan Kara
@ 2024-11-19 5:42 ` Ojaswin Mujoo
0 siblings, 0 replies; 9+ messages in thread
From: Ojaswin Mujoo @ 2024-11-19 5:42 UTC (permalink / raw)
To: Jan Kara
Cc: linux-ext4, Jan Kara, Ritesh Harjani, linux-kernel, linux-fsdevel,
Disha Goel
On Mon, Nov 18, 2024 at 02:15:12PM +0100, Jan Kara wrote:
> On Sat 16-11-24 00:04:49, Ojaswin Mujoo wrote:
> > One of the paths quota writeback is called from is:
> >
> > freeze_super()
> > sync_filesystem()
> > ext4_sync_fs()
> > dquot_writeback_dquots()
> >
> > Since we currently don't always flush the quota_release_work queue in
> > this path, we can end up with the following race:
> >
> > 1. dquot are added to releasing_dquots list during regular operations.
> > 2. FS freeze starts, however, this does not flush the quota_release_work queue.
> > 3. Freeze completes.
> > 4. Kernel eventually tries to flush the workqueue while FS is frozen which
> > hits a WARN_ON since transaction gets started during frozen state:
> >
> > ext4_journal_check_start+0x28/0x110 [ext4] (unreliable)
> > __ext4_journal_start_sb+0x64/0x1c0 [ext4]
> > ext4_release_dquot+0x90/0x1d0 [ext4]
> > quota_release_workfn+0x43c/0x4d0
> >
> > Which is the following line:
> >
> > WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
> >
> > Which ultimately results in generic/390 failing due to dmesg
> > noise. This was detected on powerpc machine 15 cores.
> >
> > To avoid this, make sure to flush the workqueue during
> > dquot_writeback_dquots() so we dont have any pending workitems after
> > freeze.
> >
> > Reported-by: Disha Goel <disgoel@linux.ibm.com>
> > Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>
> Thanks for debugging this!
>
> > diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> > index 3dd8d6f27725..2782cfc8c302 100644
> > --- a/fs/quota/dquot.c
> > +++ b/fs/quota/dquot.c
> > @@ -729,6 +729,8 @@ int dquot_writeback_dquots(struct super_block *sb, int type)
> > sb->dq_op->write_info(sb, cnt);
> > dqstats_inc(DQST_SYNCS);
> >
> > + flush_delayed_work("a_release_work);
> > +
>
> I'd rather do this at the start of dquot_writeback_dquots(). Chances are
> this saves some retry loops in the dirty list iterations. That being said I
Hi Jan, thanks for review :)
> don't think this is enough as I'm thinking about it. iput() can be called
> anytime while the filesystem is frozen (just freeze the filesystem and do
> echo 3 >/proc/sys/vm/drop_caches) which will consequently call dquot_drop()
> -> dqput(). This should not be really freeing the dquot on-disk structure
> (the inode itself is still accounted there) but nevertheless it may end up
> dropping the last dquot in-memory reference and ext4_release_dquot() will
> call ext4_journal_start() and complain. So I think on top of this patch
> which makes sense on its own and deals with 99.9% of cases, we also need
> ext4 specific fix which uses sb_start_intwrite() to get freeze protection
> in ext4_release_dquot() (and in principle we always needed this, delayed
> dquot releasing does not influence this particular problem). Some care will
> be needed if the transaction is already started when ext4_release_dquot()
> is called - you can take inspiration in how ext4_evict_inode() handles
> this.
That's a good point Jan, this could indeed happen if we drop caches
destroying an inode pinned in the lru cache. Thanks for the pointers,
I'll try to look into hardening ext4_release_dquot() as you suggested
and send a v2.
Regards,
ojaswin
>
> Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] quota: flush quota_release_work upon quota writeback
2024-11-18 12:53 ` Jan Kara
@ 2024-11-19 6:29 ` Baokun Li
0 siblings, 0 replies; 9+ messages in thread
From: Baokun Li @ 2024-11-19 6:29 UTC (permalink / raw)
To: Jan Kara
Cc: Ojaswin Mujoo, Ritesh Harjani, linux-ext4, Jan Kara, linux-kernel,
linux-fsdevel, Disha Goel, Yang Erkun
On 2024/11/18 20:53, Jan Kara wrote:
> On Mon 18-11-24 09:29:19, Baokun Li wrote:
>> On 2024/11/17 1:59, Ojaswin Mujoo wrote:
>>> On Sat, Nov 16, 2024 at 02:20:26AM +0530, Ritesh Harjani wrote:
>>>> Ojaswin Mujoo <ojaswin@linux.ibm.com> writes:
>>>>
>>>>> One of the paths quota writeback is called from is:
>>>>>
>>>>> freeze_super()
>>>>> sync_filesystem()
>>>>> ext4_sync_fs()
>>>>> dquot_writeback_dquots()
>>>>>
>>>>> Since we currently don't always flush the quota_release_work queue in
>>>>> this path, we can end up with the following race:
>>>>>
>>>>> 1. dquot are added to releasing_dquots list during regular operations.
>>>>> 2. FS freeze starts, however, this does not flush the quota_release_work queue.
>>>>> 3. Freeze completes.
>>>>> 4. Kernel eventually tries to flush the workqueue while FS is frozen which
>>>>> hits a WARN_ON since transaction gets started during frozen state:
>>>>>
>>>>> ext4_journal_check_start+0x28/0x110 [ext4] (unreliable)
>>>>> __ext4_journal_start_sb+0x64/0x1c0 [ext4]
>>>>> ext4_release_dquot+0x90/0x1d0 [ext4]
>>>>> quota_release_workfn+0x43c/0x4d0
>>>>>
>>>>> Which is the following line:
>>>>>
>>>>> WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
>>>>>
>>>>> Which ultimately results in generic/390 failing due to dmesg
>>>>> noise. This was detected on powerpc machine 15 cores.
>>>>>
>>>>> To avoid this, make sure to flush the workqueue during
>>>>> dquot_writeback_dquots() so we dont have any pending workitems after
>>>>> freeze.
>>>> Not just that, sync_filesystem can also be called from other places and
>>>> quota_release_workfn() could write out and and release the dquot
>>>> structures if such are found during processing of releasing_dquots list.
>>>> IIUC, this was earlier done in the same dqput() context but had races
>>>> with dquot_mark_dquot_dirty(). Hence the final dqput() will now add the
>>>> dquot structures to releasing_dquots list and will schedule a delayed
>>>> workfn which will process the releasing_dquots list.
>>> Hi Ritesh,
>>>
>>> Ohh right, thanks for the context. I see this was done here:
>>>
>>> dabc8b207566 quota: fix dqput() to follow the guarantees dquot_srcu
>>> should provide
> Yup.
>
>> Nice catch! Thanks for fixing this up!
>>
>> Have you tested the performance impact of this patch? It looks like the
>> unconditional call to flush_delayed_work() in dquot_writeback_dquots()
>> may have some performance impact for frequent chown/sync scenarios.
> Well, but sync(2) or so is expensive anyway. Also dquot_writeback_dquots()
> should persist all pending quota modifications and it is true that pending
> dquot_release() calls can remove quota structures from the quota file and
> thus are by definition pending modifications. So I agree with Ojaswin that
> putting the workqueue flush there makes sense and is practically required
> for data consistency guarantees.
Make sense.
>> When calling release_dquot(), we will only remove the quota of an object
>> (user/group/project) from disk if it is not quota-limited and does not
>> use any inode or block.
>>
>> Asynchronous removal is now much more performance friendly, not only does
>> it make full use of the multi-core, but for scenarios where we have to
>> repeatedly chown between two objects, delayed release avoids the need to
>> repeatedly allocate/free space in memory and on disk.
> True, but unless you call sync(2) in between these two calls this is going
> to still hold.
Yeah without sync or syncfs, it's the same as before.
>> Overall, since the actual dirty data is already on the disk, there is no
>> consistency issue here as it is just clearing unreferenced quota on the
>> disk, so I thought maybe it would be better to call flush_delayed_work()
>> in the freeze context.
> To summarise, I don't think real-life workloads are going to observe the
> benefit and conceptually the call really belongs more to
> dquot_writeback_dquots().
>
> Honza
Okay, thanks for the feedback!
Regards,
Baokun
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-11-19 6:30 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-15 18:34 [PATCH 0/1] Fix generic/390 failure due to quota release after freeze Ojaswin Mujoo
2024-11-15 18:34 ` [PATCH 1/1] quota: flush quota_release_work upon quota writeback Ojaswin Mujoo
2024-11-15 20:50 ` Ritesh Harjani
2024-11-16 17:59 ` Ojaswin Mujoo
2024-11-18 1:29 ` Baokun Li
2024-11-18 12:53 ` Jan Kara
2024-11-19 6:29 ` Baokun Li
2024-11-18 13:15 ` Jan Kara
2024-11-19 5:42 ` Ojaswin Mujoo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox