* Re: Feedback on preemptible kernel patch
@ 2001-09-15 19:18 Robert Love
2001-09-16 1:28 ` Daniel Phillips
2001-09-18 11:05 ` Feedback on preemptible kernel patch xfs jury gerold
0 siblings, 2 replies; 9+ messages in thread
From: Robert Love @ 2001-09-15 19:18 UTC (permalink / raw)
To: phillips; +Cc: linux-kernel
On Sun, 2001-09-09 at 23:24, Daniel Phillips wrote:
> This may not be your fault. It's a GFP_NOFS recursive allocation - this
> comes either from grow_buffers or ReiserFS, probably the former. In
> either case, it means we ran completely out of free pages, even though
> the caller is willing to wait. Hmm. It smells like a loophole in vm
> scanning.
Hi, Daniel. If you remember, a few users of the preemption patch
reported instability and/or syslog messages such as:
Sep 9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
Sep 9 23:08:02 sjoerd last message repeated 93 times
Sep 9 23:08:02 sjoerd kernel: cation failed (gfp=0x70/1).
Sep 9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
Sep 9 23:08:02 sjoerd last message repeated 281 times
It now seems that all of them are indeed using ReiserFS. There are no
other reported problems with the preemption patch, except from those
users...
I am beginning to muse over the source, looking at when kmalloc is
called with GFP_NOFS in ReiserFS, and then the path that code takes in
the VM source.
I assume the kernel VM code has a hole somewhere, and the request is
falling through? It should wait, even if no pages are free, right?
Where should I begin looking? How does it relate to ReiserFS? How is
preemption related?
Thank you very much,
--
Robert M. Love
rml at ufl.edu
rml at tech9.net
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Feedback on preemptible kernel patch
2001-09-15 19:18 Feedback on preemptible kernel patch Robert Love
@ 2001-09-16 1:28 ` Daniel Phillips
2001-09-16 1:54 ` Daniel Phillips
2001-09-18 11:05 ` Feedback on preemptible kernel patch xfs jury gerold
1 sibling, 1 reply; 9+ messages in thread
From: Daniel Phillips @ 2001-09-16 1:28 UTC (permalink / raw)
To: Robert Love; +Cc: linux-kernel, Marcelo Tosatti, Chris Mason
On September 15, 2001 09:18 pm, Robert Love wrote:
> On Sun, 2001-09-09 at 23:24, Daniel Phillips wrote:
> > This may not be your fault. It's a GFP_NOFS recursive allocation - this
> > comes either from grow_buffers or ReiserFS, probably the former. In
> > either case, it means we ran completely out of free pages, even though
> > the caller is willing to wait. Hmm. It smells like a loophole in vm
> > scanning.
>
> Hi, Daniel. If you remember, a few users of the preemption patch
> reported instability and/or syslog messages such as:
>
> Sep 9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
> Sep 9 23:08:02 sjoerd last message repeated 93 times
> Sep 9 23:08:02 sjoerd kernel: cation failed (gfp=0x70/1).
> Sep 9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
> Sep 9 23:08:02 sjoerd last message repeated 281 times
>
> It now seems that all of them are indeed using ReiserFS. There are no
> other reported problems with the preemption patch, except from those
> users...
>
> I am beginning to muse over the source, looking at when kmalloc is
> called with GFP_NOFS in ReiserFS, and then the path that code takes in
> the VM source.
>
> I assume the kernel VM code has a hole somewhere, and the request is
> falling through? It should wait, even if no pages are free, right?
>
> Where should I begin looking? How does it relate to ReiserFS?
The only other path that uses NO_FS is grow_buffers. But the page probably
got dirtied in generic_file_write, which already put buffers on it. A NOFS
allocation could also be triggered by Ext2 (and other filesystems) by having
lots of dirty mmaps: when page_launder calls page->writepage then the page
won't have buffers on it. That's probably not what's happening though.
I don't think NOFS is causing the problem by the way, it's just a convenient
marker to recognize where the allocation is coming from.
What happens is, page_launder calls reiserfs_writepage, which for some
reason recursively allocates a page (I don't have time to look for the exact
path - it's probably for the journal - but whoever has the problem can check
it via show_trace). We are now in a recursive allocation situation (with
PF_MEMALLOC), so page_launder doesn't get called and we drop through to the
"failed" message.
It's not nice for __alloc_pages to fail back to a caller that's willing to
wait. See below for one idea about what to do about it.
> How is preemption related?
I'll speculate: page_launder is now yielding to other tasks when it releases
spinlocks to do a writepage. One of them is likely to come back in and
attempt another allocation while we're at rock bottom.
If that's true then I think we should consider something I've wanted to try:
make callers block on a wait queue in __alloc_pages when memory is really
tight.
Hmm. We could do that just in this specific case of PF_MEMALLOC+GFP_WAIT.
Semaphores work well for this kind of thing, something like:
	if (!(current->flags & PF_MEMALLOC)) {
		<the existing reclaim/launder logic>
	} else if (gfp_mask & __GFP_WAIT) {
		wakeup_kswapd();
		atomic_inc(&memwaiters);
		down(&memwait);
		goto try_again;
	}
Then in kswapd:
	waiters = atomic_read(&memwaiters);
	atomic_sub(waiters, &memwaiters);
	while (waiters--)
		up(&memwait);
when we detect that free memory is restored to something reasonable. This
won't deadlock on memwait because kswapd doesn't use __GFP_WAIT.
We also have to make kswapd count wakeups so we can be sure it doesn't sleep
while somebody is waiting in __alloc_pages. The only way I know to do this
reliably is with another semaphore:
void wakeup_kswapd(void)
{
	up(&kswapd_sleep);
}
and kswapd downs that semaphore instead of doing interruptible_sleep_on_timeout.
Additionally, a timer has to up() the semaphore periodically, to recover the
sleep_on_timeout behaviour.
Sound like overkill? The alternative is to let GFP_WAIT allocations fail, which
forces users like journalling filesystems to busy-wait and load up the runqueue.
Sorry I don't have time to code this just now, but I'd like to give it a try
next week if the problem's still there. Or if you're in the mood...
--
Daniel
* Re: Feedback on preemptible kernel patch
2001-09-16 1:28 ` Daniel Phillips
@ 2001-09-16 1:54 ` Daniel Phillips
0 siblings, 0 replies; 9+ messages in thread
From: Daniel Phillips @ 2001-09-16 1:54 UTC (permalink / raw)
To: Robert Love; +Cc: linux-kernel, Marcelo Tosatti, Chris Mason
On September 16, 2001 03:28 am, Daniel Phillips wrote:
> On September 15, 2001 09:18 pm, Robert Love wrote:
> > On Sun, 2001-09-09 at 23:24, Daniel Phillips wrote:
> > > This may not be your fault. It's a GFP_NOFS recursive allocation - this
> > > comes either from grow_buffers or ReiserFS, probably the former. In
> > > either case, it means we ran completely out of free pages, even though
> > > the caller is willing to wait. Hmm. It smells like a loophole in vm
> > > scanning.
Oh, wait, I was working off 2.4.9 source, and your user had the problem with
2.4.9-pre4, where we have GFP_NOHIGHIO. So - reinterpreting the bits - all
those failures are bounce buffer allocations. People are also getting these
failures without your patch, so relax ;-)
Maybe allowing preemption inside page_launder makes it happen more often.
--
Daniel
* Re: Feedback on preemptible kernel patch xfs
2001-09-15 19:18 Feedback on preemptible kernel patch Robert Love
2001-09-16 1:28 ` Daniel Phillips
@ 2001-09-18 11:05 ` jury gerold
2001-09-18 22:52 ` Robert Love
1 sibling, 1 reply; 9+ messages in thread
From: jury gerold @ 2001-09-18 11:05 UTC (permalink / raw)
To: Robert Love; +Cc: linux-kernel
I used your patch on 2.4.10-pre10-xfs from the SGI cvs tree.
2 files had to be changed:
fs/xfs_support/atomic.h and fs/xfs_support/mutex.h
needed an include of sched.h.
root filesystem is ext2, everything else is XFS
Athlon optimisation is switched on
chipset is VIA
the nvidia kernel module for OpenGL acceleration is running
HiSax ISDN driver for internet access
USB web cam
I have tried heavy filesystem operations (cp -ar x y && rm -rf y)
with a big compile job -j2 and some OpenGL programs together with the
web cam
On the USB side I had some "_comp parameters have gone AWOL" messages in
the syslog from the cpia driver,
but I remember them from a non-preemption kernel as well.
So far everything is stable.
I like the idea, but I have not made any tests on the latency yet.
Regards
Gerold
Robert Love wrote:
>It now seems that all of them are indeed using ReiserFS. There are no
>other reported problems with the preemption patch, except from those
>users...
>
* Re: Feedback on preemptible kernel patch xfs
2001-09-18 11:05 ` Feedback on preemptible kernel patch xfs jury gerold
@ 2001-09-18 22:52 ` Robert Love
2001-09-20 1:49 ` Gerold Jury
0 siblings, 1 reply; 9+ messages in thread
From: Robert Love @ 2001-09-18 22:52 UTC (permalink / raw)
To: jury gerold; +Cc: linux-kernel
On Tue, 2001-09-18 at 07:05, jury gerold wrote:
> I used your patch on 2.4.10-pre10-xfs from the SGI cvs tree.
> 2 files had to be changed
> fs/xfs_support/atomic.h and fs/xfs_support/mutex.h
> needed a include sched.h
Thank you for reporting this.
I just made a diff against xfs-cvs-20010917 with your changes.
Obviously I can't merge the changes into the main patch since not
everyone has XFS, but I will make the patch available.
> rootfilesystem is ext2 everything else is xfs
> athlon optimisation is switched on
> chipset is via
> the nvidia kernel module for OpenGL acceleration is running
> hisax isdn driver for internet access
> USB web cam
>
> I have tried heavy filesystem operations (cp -ar x y && rm -rf y)
> with a big compile job -j2 and some OpenGL programs together with the
> web cam
Good. Then we can probably mark XFS as preempt-safe (no reason to think
otherwise). Those file operations were on the XFS partitions, right?
> on the USB side i had some "_comp parameters have gone AWOL" messages in
> the syslog from the cpia driver
> but i remember them from a no preemtion kernel as well
Yeah, probably from the mainline kernel...
> so far everything is stable
Excellent.
> i like the idea but i have not made any tests on the latency yet
If you do, please post.
Thanks for the feedback,
--
Robert M. Love
rml at ufl.edu
rml at tech9.net
* Re: Feedback on preemptible kernel patch xfs
2001-09-20 1:49 ` Gerold Jury
@ 2001-09-20 0:56 ` Robert Love
2001-09-21 12:29 ` Gerold Jury
0 siblings, 1 reply; 9+ messages in thread
From: Robert Love @ 2001-09-20 0:56 UTC (permalink / raw)
To: Gerold Jury; +Cc: linux-kernel
On Wed, 2001-09-19 at 21:49, Gerold Jury wrote:
> First the good news.
> Even my most ugly ideas were not able to crash your preemptible
> 2.4.10-pre10-xfs
Good to hear.
> But, to be sure i repeated everything, neither latencytest-0.42 nor
> my own tests could find a difference with or without the preemptible
> patch. I do not know if i can expect a lower latency at this stage of
> development.
I am surprised; you should see a difference, especially with the
latencytest. Silly question, but you both applied the patch and enabled
the config statement, right?
No, at this stage of development we are seeing greatly reduced latency
times with the patch. Continued work will go into improving the locking
mechanisms, but that will come later and improve the kernel overall.
> A maximum of 15 msec latency with all the stress I managed to put on the
> machine is not that bad anyway.
No, 15ms is very good. I am seeing things in the 5-10ms range here, but much,
much higher without preemption. Odd.
> The CPU is a 1.1GHz Athlon. I forgot to mention this.
Oh, good. We earlier had problems with an Athlon-optimized kernel, but
we have solved those problems.
> I will continue to test the preempt patches.
Thank you.
> Do you want me to test anything special ?
I can't think of a benchmark that tests the various aspects of a filesystem
(file creation/deletion, directory seeking and listing, etc.), but one would
be great for seeing whether XFS improves with preemption.
You can test raw disk I/O with dbench ftp://samba.org/pub/tridge/dbench/
... try 16 threads (dbench -16).
--
Robert M. Love
rml at ufl.edu
rml at tech9.net
* Re: Feedback on preemptible kernel patch xfs
2001-09-18 22:52 ` Robert Love
@ 2001-09-20 1:49 ` Gerold Jury
2001-09-20 0:56 ` Robert Love
0 siblings, 1 reply; 9+ messages in thread
From: Gerold Jury @ 2001-09-20 1:49 UTC (permalink / raw)
To: Robert Love; +Cc: linux-kernel
Hi Robert
First the good news.
Even my most ugly ideas were not able to crash your preemptible 2.4.10-pre10-xfs.
But, to be sure, I repeated everything; neither latencytest-0.42 nor my own tests could
find a difference with or without the preemptible patch.
I do not know if I can expect a lower latency at this stage of development.
A maximum of 15 msec latency with all the stress I managed to put on the machine
is not that bad anyway.
The CPU is a 1.1GHz Athlon. I forgot to mention this.
I will continue to test the preempt patches.
Do you want me to test anything special ?
Best Regards
Gerold
* Re: Feedback on preemptible kernel patch xfs
2001-09-20 0:56 ` Robert Love
@ 2001-09-21 12:29 ` Gerold Jury
2001-09-21 19:50 ` Robert Love
0 siblings, 1 reply; 9+ messages in thread
From: Gerold Jury @ 2001-09-21 12:29 UTC (permalink / raw)
To: Robert Love; +Cc: linux-kernel
On Thursday 20 September 2001 02:56, Robert Love wrote:
> I am surprised, you should see a difference, especially with the
> latencytest. Silly question, but you both applied the patch and enabled
> the config statement, right?
>
Really, I have checked twice.
The patch could, by the way, write a line to the syslog when enabled.
All the filesystem operations happened on the xfs partitions.
I noticed more equally distributed read/write operations with smaller slices
during big copy jobs on xfs.
This effect may well come from the preemption patch. I used a spare partition
for the test, so the filesystem was in the same state with both kernels
during the tests.
XFS usually delays the write operations and does them in bigger blocks.
The behavior of XFS has changed in this direction across kernel versions
anyway, but is clearly different with the preemption patch.
I will redo the latency tests with the standard Xfree86 nvidia driver.
It may give a different picture.
The graphics test and the /proc test have shown the highest latencies.
Both involve the X server (proc for the xterm).
The other tests have been around 5-6 msec in both cases.
And I will do the dbench test, of course.
Gerold
* Re: Feedback on preemptible kernel patch xfs
2001-09-21 12:29 ` Gerold Jury
@ 2001-09-21 19:50 ` Robert Love
0 siblings, 0 replies; 9+ messages in thread
From: Robert Love @ 2001-09-21 19:50 UTC (permalink / raw)
To: Gerold Jury; +Cc: linux-kernel
On Fri, 2001-09-21 at 08:29, Gerold Jury wrote:
> On Thursday 20 September 2001 02:56, Robert Love wrote:
> > I am surprised, you should see a difference, especially with the
> > latencytest. Silly question, but you both applied the patch and enabled
> > the config statement, right?
> >
> Really, i have checked twice.
> The patch could, by the way, write a line to the syslog when enabled.
OK, I believe you :)
Yes, but I find all the `NET4.0 loaded!'-style messages crap enough as it
is. If CONFIG_PREEMPT is defined, rest assured the code is in there.
> All the filesystem operations happend on the xfs partitions.
> I noticed more equally distributed read/write operations with smaller slices
> during big copy jobs on xfs.
> This effect may well come from the preemption patch. I used a spare partition
> for the test, so the filesystem was in the same state with both kernels
> during the tests.
> Xfs usually delays the write operations and does them in bigger blocks.
> The behavior of XFS has changed with the kernel versions towards this
> direction anyway but is clearly different with the preemption patch.
>
> I will redo the latency tests with the standard Xfree86 nvidia driver.
> It may give a different picture.
> The graphics test and the /proc test have shown the highest latency's.
> Both involve the xserver (proc for the xterm).
> The other tests have been around 5-6 msec in both cases.
>
> And i will do the dbench test of course.
Very good. Please let me know.
--
Robert M. Love
rml at ufl.edu
rml at tech9.net
2001-09-15 19:18 Feedback on preemptible kernel patch Robert Love
2001-09-16 1:28 ` Daniel Phillips
2001-09-16 1:54 ` Daniel Phillips
2001-09-18 11:05 ` Feedback on preemptible kernel patch xfs jury gerold
2001-09-18 22:52 ` Robert Love
2001-09-20 1:49 ` Gerold Jury
2001-09-20 0:56 ` Robert Love
2001-09-21 12:29 ` Gerold Jury
2001-09-21 19:50 ` Robert Love