* [fs] inode_lru_isolate(): Move counter increment into spinlock section
@ 2013-12-18 19:24 Christoph Lameter
2013-12-19 4:23 ` Dave Chinner
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Lameter @ 2013-12-18 19:24 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-kernel, Alexander Viro
The counter increments in inode_lru_isolate() are done via __count_vm_events()
after the spinlocks have been dropped, with preemption enabled, so
concurrent increments can race and updates can be lost.
Move the counter increments to the point where the lru_lock has been
reacquired, so that the counters are incremented safely.
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/fs/inode.c
===================================================================
--- linux.orig/fs/inode.c 2013-12-18 13:14:43.211693438 -0600
+++ linux/fs/inode.c 2013-12-18 13:15:58.489266129 -0600
@@ -715,21 +715,21 @@ inode_lru_isolate(struct list_head *item
 	}

 	if (inode_has_buffers(inode) || inode->i_data.nrpages) {
+		unsigned long reap = 0;
 		__iget(inode);
 		spin_unlock(&inode->i_lock);
 		spin_unlock(lru_lock);
 		if (remove_inode_buffers(inode)) {
-			unsigned long reap;
 			reap = invalidate_mapping_pages(&inode->i_data, 0, -1);
-			if (current_is_kswapd())
-				__count_vm_events(KSWAPD_INODESTEAL, reap);
-			else
-				__count_vm_events(PGINODESTEAL, reap);
 			if (current->reclaim_state)
 				current->reclaim_state->reclaimed_slab += reap;
 		}
 		iput(inode);
 		spin_lock(lru_lock);
+		if (current_is_kswapd())
+			__count_vm_events(KSWAPD_INODESTEAL, reap);
+		else
+			__count_vm_events(PGINODESTEAL, reap);
 		return LRU_RETRY;
 	}
* Re: [fs] inode_lru_isolate(): Move counter increment into spinlock section
From: Dave Chinner @ 2013-12-19 4:23 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Dave Chinner, linux-kernel, Alexander Viro
On Wed, Dec 18, 2013 at 07:24:46PM +0000, Christoph Lameter wrote:
> The counter increments in inode_lru_isolate() are done via __count_vm_events()
> after the spinlocks have been dropped, with preemption enabled, so
> concurrent increments can race and updates can be lost.
That's a nasty, undocumented problem that __count_vm_events() has.
Nobody who is modifying the fs/inode.c code is likely to know about
this, so just moving the code under an unrelated lock is not
sufficient to prevent this from happening again. Hence I'd prefer
that you just change it to use count_vm_events() rather than try to
be tricksy by replacing the landmine in the code that we've already
stepped on once.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [fs] inode_lru_isolate(): Move counter increment into spinlock section
From: Christoph Lameter @ 2013-12-19 15:26 UTC (permalink / raw)
To: Dave Chinner; +Cc: Dave Chinner, linux-kernel, Alexander Viro
On Thu, 19 Dec 2013, Dave Chinner wrote:
> On Wed, Dec 18, 2013 at 07:24:46PM +0000, Christoph Lameter wrote:
> > The counter increments in inode_lru_isolate() are done via __count_vm_events()
> > after the spinlocks have been dropped, with preemption enabled, so
> > concurrent increments can race and updates can be lost.
>
> That's a nasty, undocumented problem that __count_vm_events() has.
AFAICT that is a pretty well-established and known issue. It only
affects configurations where the fallback code for the counter increments is used.
> Nobody who is modifying the fs/inode.c code is likely to know about
> this, so just moving the code under an unrelated lock is not
> sufficient to prevent this from happening again. Hence I'd prefer
> that you just change it to use count_vm_events() rather than try to
> be tricksy by replacing the landmine in the code that we've already
> stepped on once.
I have a patchset here that is supposed to be merged soon that will detect
these cases.
Moving the code is IMHO the simplest solution. Otherwise count_vm_events
will have to disable interrupts on platforms that do not support fast RMW
operations.
* Re: [fs] inode_lru_isolate(): Move counter increment into spinlock section
From: Dave Chinner @ 2013-12-20 3:35 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Dave Chinner, linux-kernel, Alexander Viro
On Thu, Dec 19, 2013 at 09:26:12AM -0600, Christoph Lameter wrote:
> On Thu, 19 Dec 2013, Dave Chinner wrote:
>
> > On Wed, Dec 18, 2013 at 07:24:46PM +0000, Christoph Lameter wrote:
> > > The counter increments in inode_lru_isolate() are done via __count_vm_events()
> > > after the spinlocks have been dropped, with preemption enabled, so
> > > concurrent increments can race and updates can be lost.
> >
> > That's a nasty, undocumented problem that __count_vm_events() has.
>
> AFAICT that is a pretty well-established and known issue. It only
> affects configurations where the fallback code for the counter increments is used.
Maybe for mm developers. Not so for filesystem people, and certainly not
for the person who made the last set of modifications, even though I do
spend a fair bit of time around the fringes of the MM code...
> > Nobody who is modifying the fs/inode.c code is likely to know about
> > this, so just moving the code under an unrelated lock is not
> > sufficient to prevent this from happening again. Hence I'd prefer
> > that you just change it to use count_vm_events() rather than try to
> > be tricksy by replacing the landmine in the code that we've already
> > stepped on once.
>
> I have a patchset here that is supposed to be merged soon that will detect
> these cases.
>
> Moving the code is IMHO the simplest solution. Otherwise count_vm_events
> will have to disable interrupts on platforms that do not support fast RMW
> operations.
If count_vm_events requires irqs to be disabled to behave correctly,
then putting __count_vm_events under a spin lock is still not irq
safe. Either way, this isn't in a performance critical path, so I'd
much prefer the simpler, safer option be used rather than leave a
landmine for other unsuspecting developers.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [fs] inode_lru_isolate(): Move counter increment into spinlock section
From: Christoph Lameter @ 2013-12-20 17:52 UTC (permalink / raw)
To: Dave Chinner; +Cc: Dave Chinner, linux-kernel, Alexander Viro
On Fri, 20 Dec 2013, Dave Chinner wrote:
> > Moving the code is IMHO the simplest solution. Otherwise count_vm_events
> > will have to disable interrupts on platforms that do not support fast RMW
> > operations.
>
> If count_vm_events requires irqs to be disabled to behave correctly,
> then putting __count_vm_events under a spin lock is still not irq
> safe. Either way, this isn't in a performance critical path, so I'd
> much prefer the simpler, safer option be used rather than leave a
> landmine for other unsuspecting developers.
What we need here is just preempt safety. But there are no counter
operations that are preempt safe without also being interrupt safe
(such operations were removed because they were seen as too expensive),
so we fall back to the interrupt-safe variant.
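[Editor's illustration, not part of the original mail: the fallback being described can be sketched in kernel-style pseudocode. This is not the actual mm/vmstat.c implementation; the macro names and the per_cpu_counter array are invented to show the shape of the two variants.]

```c
/* Pseudocode sketch, assuming the architecture provides no
 * single-instruction per-cpu add. Not the real vmstat code. */

/* "Raw" variant: a plain read-modify-write. The caller must
 * guarantee that neither preemption nor an interrupt can run
 * between the load and the store. */
#define __count_event(item, delta) \
	(per_cpu_counter[item] += (delta))

/* Safe variant: masks interrupts around the read-modify-write,
 * which also excludes preemption -- this is the cost the fallback
 * pays on such platforms. */
#define count_event(item, delta)                \
	do {                                    \
		unsigned long flags;            \
		local_irq_save(flags);          \
		__count_event(item, delta);     \
		local_irq_restore(flags);       \
	} while (0)
```

A spinlock that does not disable interrupts gives the raw variant protection against other CPUs' lock holders only, not against an interrupt on the local CPU, which is the point Dave raises above.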
* Re: [fs] inode_lru_isolate(): Move counter increment into spinlock section
From: Christoph Lameter @ 2013-12-20 18:05 UTC (permalink / raw)
To: Dave Chinner; +Cc: Dave Chinner, linux-kernel, Alexander Viro
On Fri, 20 Dec 2013, Dave Chinner wrote:
> If count_vm_events requires irqs to be disabled to behave correctly,
> then putting __count_vm_events under a spin lock is still not irq
> safe. Either way, this isn't in a performance critical path, so I'd
> much prefer the simpler, safer option be used rather than leave a
> landmine for other unsuspecting developers.
Subject: [fs] Use safe counter increment operations
The counter increments in inode_lru_isolate() are done via
__count_vm_events() with preemption enabled, so concurrent
increments can race and updates can be lost.
Use count_vm_events() instead.
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/fs/inode.c
===================================================================
--- linux.orig/fs/inode.c 2013-12-20 12:02:14.409583380 -0600
+++ linux/fs/inode.c 2013-12-20 12:02:14.409583380 -0600
@@ -722,9 +722,9 @@ inode_lru_isolate(struct list_head *item
 			unsigned long reap;
 			reap = invalidate_mapping_pages(&inode->i_data, 0, -1);
 			if (current_is_kswapd())
-				__count_vm_events(KSWAPD_INODESTEAL, reap);
+				count_vm_events(KSWAPD_INODESTEAL, reap);
 			else
-				__count_vm_events(PGINODESTEAL, reap);
+				count_vm_events(PGINODESTEAL, reap);
 			if (current->reclaim_state)
 				current->reclaim_state->reclaimed_slab += reap;
 		}