* pdflush eating a lot of CPU on heavy NFS I/O
@ 2004-04-28  5:18 Brent Cook
  2004-04-28  6:02 ` Andrew Morton
  2004-04-28  6:06 ` Brett E.
  0 siblings, 2 replies; 22+ messages in thread

From: Brent Cook @ 2004-04-28 5:18 UTC (permalink / raw)
To: Kernel Mailing List

Hi,

Running any kernel from the 2.6.6-rc* series (and a few previous -mm*'s),
the pdflush process starts using nearly 100% CPU indefinitely after a few
minutes of initial NFS traffic, as far as I can tell. To trigger this, I
can either compile a kernel with the source residing on an NFS share, or
compile something bigger, like KDE.

I get the same results on a PIII with an i815 chipset running ReiserFS 3
and on a newer nForce2 board with an Athlon XP running ext3, so I don't
think it has anything to do with the IDE chipsets or filesystems. pdflush
has something to do with writing back FS data, and NFS is the common
factor between the systems that experience this problem. pdflush just
seems to hang when the system is heavily loaded and eats up all CPU
resources even when the system is otherwise idle. Renice fails to
reschedule pdflush. This didn't seem to be a problem with 2.6.5 or 2.4.

Is there something I can do to control pdflush or to provide more
information?

Thanks,
 - Brent

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Andrew Morton @ 2004-04-28 6:02 UTC (permalink / raw)
To: busterbcook; +Cc: linux-kernel

Brent Cook <busterbcook@yahoo.com> wrote:
>
> Running any kernel from the 2.6.6-rc* series (and a few previous
> -mm*'s),

It's a shame this wasn't reported earlier.

> the pdflush process starts using near 100% CPU indefinitely after
> a few minutes of initial NFS traffic, as far as I can tell.

Please confirm that the problem is observed on the NFS client and not the
NFS server?  I'll assume the client.

What other filesystems are in use on the client?

Please describe the NFS mount options and the number of CPUs and the
amount of memory in the machine.  And please send me your .config,
off-list.

Thanks.
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Phy Prabab @ 2004-04-28 7:29 UTC (permalink / raw)
To: Andrew Morton, busterbcook; +Cc: linux-kernel

If you guys could please either cc or keep this on the mailing list I
would greatly appreciate it. I am currently planning to deploy a dual
Xeon FS with 2.6.5+akpm-mm6 and would like to know what the issue is
that Brent is seeing.

Thanks!

--- Andrew Morton <akpm@osdl.org> wrote:
> Brent Cook <busterbcook@yahoo.com> wrote:
> >
> > Running any kernel from the 2.6.6-rc* series (and a few previous
> > -mm*'s),
>
> It's a shame this wasn't reported earlier.
>
> > the pdflush process starts using near 100% CPU indefinitely after
> > a few minutes of initial NFS traffic, as far as I can tell.
>
> Please confirm that the problem is observed on the NFS client and not
> the NFS server?  I'll assume the client.
>
> What other filesystems are in use on the client?
>
> Please describe the NFS mount options and the number of CPUs and the
> amount of memory in the machine.  And please send me your .config,
> off-list.
>
> Thanks.
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Sven Geggus @ 2004-04-28 10:18 UTC (permalink / raw)
To: linux-kernel

Andrew Morton <akpm@osdl.org> wrote:

> It's a shame this wasn't reported earlier.

I did report this behaviour in <c6gi0f$g6i$1@benzin.geggus.net> a few
days ago.

> Please confirm that the problem is observed on the NFS client and not the
> NFS server?  I'll assume the client.

Sure! The problem is observed on the NFS client, a diskless machine in
my case.

> What other filesystems are in use on the client?

None in my case!

> Please describe the NFS mount options and the number of CPUs and the amount
> of memory in the machine.

NFS mount options are the defaults, and my machine does not use an SMP
kernel (AMD Athlon 2000+, single CPU).

Sven

-- 
/*
 * Wirzenius wrote this portably, Torvalds fucked it up :-)
 */
(taken from /usr/src/linux/lib/vsprintf.c)

/me is giggls@ircnet, http://sven.gegg.us/ on the Web
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Brent Cook @ 2004-04-28 13:50 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

On Tue, 27 Apr 2004, Andrew Morton wrote:

> Brent Cook <busterbcook@yahoo.com> wrote:
> >
> > Running any kernel from the 2.6.6-rc* series (and a few previous
> > -mm*'s),
>
> It's a shame this wasn't reported earlier.

Since it was a pretty big deal on my system, I just assumed it was for
other people's too, and that someone else would have reported it by now.
I only got concerned when it persisted between rc's.

> > the pdflush process starts using near 100% CPU indefinitely after
> > a few minutes of initial NFS traffic, as far as I can tell.
>
> Please confirm that the problem is observed on the NFS client and not the
> NFS server?  I'll assume the client.

Yes, both affected machines had the issue when connecting as a client to
a 2.4.25-based NFS server.

> What other filesystems are in use on the client?

One uses ReiserFS on /, the other uses ext3 on /. Here is the mount
table for one machine:

/dev/hda3 on / type ext3 (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
usbfs on /proc/bus/usb type usbfs (rw)
ozma:/home on /home type nfs (rw,addr=192.168.1.1)

Running 2.6.6-rc2-mm1, here is a snapshot of compiling KDE with the
source on the NFS mount, -j2. This is the initial state:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
12091 busterb   25   0 63524  59M  5140 R    38.4 23.8   0:19   0 cc1plus
12199 busterb   25   0 55660  52M  5140 R    38.0 20.8   0:07   0 cc1plus
    7 root      16   0     0    0     0 SW    4.9  0.0   0:03   0 pdflush

About 10 minutes into the process, pdflush starts taking over:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
    7 root      25   0     0    0     0 RW   34.4  0.0   3:05   0 pdflush
17856 busterb   25   0 69400  65M  5140 R    34.4 26.1   0:31   0 cc1plus
19466 busterb   25   0 43732  39M  5140 R    26.3 15.5   0:03   0 cc1plus

After stopping the compile, pdflush remains until a reboot:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
    7 root      25   0     0    0     0 RW   98.0  0.0   3:21   0 pdflush

The network light will flash continually on each machine once pdflush
gets into this state, which makes me think NFS. The machines have 512 MB
and 256 MB of RAM respectively, and a single CPU each.

 - Brent
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Andrew Morton @ 2004-04-28 19:48 UTC (permalink / raw)
To: busterbcook; +Cc: linux-kernel

Brent Cook <busterbcook@yahoo.com> wrote:
>
> Since it was a pretty big deal on my system, I just assumed it was for
> other people's too, and that someone else would have reported it by now.
> I only got concerned when it persisted between rc's.

I think three people have reported it now.

> Yes, both affected machines had the issue when connecting as a client to
> a 2.4.25-based NFS server.
>
> One uses ReiserFS on /, the other uses ext3 on /.  Here is the mount
> table for one machine:

Both machines are exhibiting the problem?

> About 10 minutes into the process, pdflush starts taking over:
>
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>     7 root      25   0     0    0     0 RW   34.4  0.0   3:05   0 pdflush
> 17856 busterb   25   0 69400  65M  5140 R    34.4 26.1   0:31   0 cc1plus
> 19466 busterb   25   0 43732  39M  5140 R    26.3 15.5   0:03   0 cc1plus
> ...
> The network light will flash continually on each machine once pdflush
> gets into this state, which makes me think NFS.

ok..  I spent a couple of hours yesterday trying to get this to happen.
No joy.  Can't make it happen with your .config either.  I'll set up a
2.4.25 server later on.

What version of gcc are you using?

Could you please capture the contents of /proc/meminfo and /proc/vmstat
when it's happening?

Thanks.
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Brent Cook @ 2004-04-28 20:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

On Wed, 28 Apr 2004, Andrew Morton wrote:

> Both machines are exhibiting the problem?

Yes. They both exhibit it about 5-10 minutes after I start compiling from
the NFS share. I'm going to also try a big compile from a local directory
to exercise the local FS and eliminate it as a possibility.

> ok..  I spent a couple of hours yesterday trying to get this to happen.
> No joy.  Can't make it happen with your .config either.  I'll set up a
> 2.4.25 server later on.
>
> What version of gcc are you using?

3.3.3 previously. I just did a build with 3.4 with the same results; I'm
tracking Slackware-current, so my build environment reflects this.

> Could you please capture the contents of /proc/meminfo and /proc/vmstat
> when it's happening?

Will do. I can also provide a test account to you offline if it helps, so
you can see it for yourself.

 - Brent
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Brent Cook @ 2004-04-28 20:39 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

> Could you please capture the contents of /proc/meminfo and /proc/vmstat
> when it's happening?
>
> Thanks.

Here is the top of top for one machine:

 15:36:55 up 7:09, 1 user, load average: 1.00, 1.00, 1.00
48 processes: 46 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  0.1% user  99.8% system  0.0% nice  0.0% iowait  0.0% idle
Mem:  256992k av, 117644k used, 139348k free, 0k shrd, 36464k buff
      50968k active, 51592k inactive
Swap: 514040k av, 0k used, 514040k free              61644k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
    7 root      25   0     0    0     0 RW   99.4  0.0 415:26   0 pdflush

Here are memory stats for the same machine. The other machine's stats are
similar; there doesn't appear to be anything out of the ordinary, and it's
not even touching swap if these numbers are to be believed.

busterb@snowball2:~$ cat /proc/meminfo
MemTotal:       256992 kB
MemFree:        139700 kB
Buffers:         36464 kB
Cached:          61516 kB
SwapCached:          0 kB
Active:          50536 kB
Inactive:        51672 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       256992 kB
LowFree:        139700 kB
SwapTotal:      514040 kB
SwapFree:       514040 kB
Dirty:            1876 kB
Writeback:           0 kB
Mapped:           8552 kB
Slab:            10924 kB
Committed_AS:    14612 kB
PageTables:        356 kB
VmallocTotal:   778164 kB
VmallocUsed:      2936 kB
VmallocChunk:   774708 kB

busterb@snowball2:~$ cat /proc/vmstat
nr_dirty 469
nr_writeback 0
nr_unstable 0
nr_page_table_pages 89
nr_mapped 2138
nr_slab 2730
pgpgin 79849
pgpgout 121656
pswpin 0
pswpout 0
pgalloc_high 0
pgalloc_normal 1812796
pgalloc_dma 18991
pgfree 1866775
pgactivate 55529
pgdeactivate 14634
pgfault 3622942
pgmajfault 2322
pgrefill_high 0
pgrefill_normal 16726
pgrefill_dma 49791
pgsteal_high 0
pgsteal_normal 11781
pgsteal_dma 203
pgscan_kswapd_high 0
pgscan_kswapd_normal 10065
pgscan_kswapd_dma 231
pgscan_direct_high 0
pgscan_direct_normal 2310
pgscan_direct_dma 0
pginodesteal 0
slabs_scanned 4349
kswapd_steal 9758
kswapd_inodesteal 0
pageoutrun 32
allocstall 35
pgrotated 0
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Trond Myklebust @ 2004-04-28 21:19 UTC (permalink / raw)
To: busterbcook; +Cc: Andrew Morton, linux-kernel

On Wed, 2004-04-28 at 16:39, Brent Cook wrote:

> Here is the top of top for one machine:
>
>  15:36:55 up 7:09, 1 user, load average: 1.00, 1.00, 1.00
> 48 processes: 46 sleeping, 2 running, 0 zombie, 0 stopped
> CPU states:  0.1% user  99.8% system  0.0% nice  0.0% iowait  0.0% idle
> Mem:  256992k av, 117644k used, 139348k free, 0k shrd, 36464k buff
>       50968k active, 51592k inactive
> Swap: 514040k av, 0k used, 514040k free              61644k cached
>
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>     7 root      25   0     0    0     0 RW   99.4  0.0 415:26   0 pdflush

Could you please also supply the mount options you are using, as well as
the contents of /proc/mounts corresponding to your NFS partition.

Cheers,
  Trond
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Brent Cook @ 2004-04-29 3:55 UTC (permalink / raw)
To: Trond Myklebust; +Cc: busterbcook, Andrew Morton, linux-kernel

On Wed, 28 Apr 2004, Trond Myklebust wrote:

> Could you please also supply the mount options you are using, as well as
> the contents of /proc/mounts corresponding to your NFS partition.

Here is /proc/mounts on the aforementioned test system:

rootfs / rootfs rw 0 0
/dev/root / reiserfs rw 0 0
devpts /dev/pts devpts rw 0 0
proc /proc proc rw,nodiratime 0 0
none /sys sysfs rw 0 0
usbfs /proc/bus/usb usbfs rw 0 0
ozma:/home /home nfs rw,v3,rsize=8192,wsize=8192,hard,udp,lock,addr=ozma 0 0

 - Brent
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Trond Myklebust @ 2004-04-29 4:21 UTC (permalink / raw)
To: busterbcook; +Cc: Andrew Morton, linux-kernel

On Wed, 2004-04-28 at 23:55, Brent Cook wrote:

> ozma:/home /home nfs rw,v3,rsize=8192,wsize=8192,hard,udp,lock,addr=ozma 0 0

OK, then it's not the case that it is doing synchronous I/O.

I see that we're failing to set wbc->encountered_congestion in the case
where a nonblocking writeback is forced to exit due to congestion. Could
that be causing pdflush to loop, Andrew? If so, does the following patch
help?

--- linux-2.6.6-rc3/fs/nfs/write.c.orig	2004-04-28 22:25:46.000000000 -0400
+++ linux-2.6.6-rc3/fs/nfs/write.c	2004-04-29 00:06:25.000000000 -0400
@@ -347,8 +347,10 @@ int nfs_writepages(struct address_space
 	if (err)
 		return err;
 	while (test_and_set_bit(BDI_write_congested, &bdi->state) != 0) {
-		if (wbc->nonblocking)
+		if (wbc->nonblocking) {
+			wbc->encountered_congestion = 1;
 			return 0;
+		}
 		nfs_wait_on_write_congestion(mapping, 0);
 	}
 	err = nfs_flush_inode(inode, 0, 0, wb_priority(wbc));
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Andrew Morton @ 2004-04-29 4:42 UTC (permalink / raw)
To: Trond Myklebust; +Cc: busterbcook, linux-kernel

Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>
> I see that we're failing to set wbc->encountered_congestion in the case
> where a nonblocking writeback is forced to exit due to congestion. Could
> that be causing pdflush to loop, Andrew?

It should be OK - after writepages completes, the inode-level writeback
code will inspect BDI_write_congested and will propagate that into the
writeback_control for the top-level writeback code to ponder.

The pdflush silliness has been seen on smbfs and reiserfs too, so...

It looks like the problem is that the inode is stuck on the superblock's
s_io list, and we keep calling writepage(s) and writepage(s) keeps on
redirtying the page rather than writing it. If so, moving the inode off
s_io should fix it up.
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Andrew Morton @ 2004-04-29 1:24 UTC (permalink / raw)
To: busterbcook; +Cc: linux-kernel

Brent Cook <busterbcook@yahoo.com> wrote:
>
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>     7 root      25   0     0    0     0 RW   99.4  0.0 415:26   0 pdflush

This is getting very irritating.  Cannot reproduce it with a 2.4 server,
gcc 3.4.0, 2.6.6-rc3 client.  grrr.

Could you please apply the below two patches, then wait for pdflush to go
nuts, then do:

	echo 1 > /proc/sys/debug/0
	echo 0 > /proc/sys/debug/0
	dmesg -s 1000000 > foo

then mail me foo?  It probably won't tell me much, but one has to start
somewhere.

Thanks.

 25-akpm/include/linux/kernel.h |    2 ++
 25-akpm/kernel/sysctl.c        |   19 +++++++++++++++++++
 2 files changed, 21 insertions(+)

diff -puN include/linux/kernel.h~proc-sys-debug include/linux/kernel.h
--- 25/include/linux/kernel.h~proc-sys-debug	Tue Apr 27 17:11:39 2004
+++ 25-akpm/include/linux/kernel.h	Tue Apr 27 17:11:39 2004
@@ -220,6 +220,8 @@ extern void dump_stack(void);
 	1; \
 })
 
+extern int proc_sys_debug[8];
+
 #endif /* __KERNEL__ */
 
 #define SI_LOAD_SHIFT	16
diff -puN kernel/sysctl.c~proc-sys-debug kernel/sysctl.c
--- 25/kernel/sysctl.c~proc-sys-debug	Tue Apr 27 17:11:39 2004
+++ 25-akpm/kernel/sysctl.c	Tue Apr 27 17:11:39 2004
@@ -888,7 +888,26 @@ static ctl_table fs_table[] = {
 	{ .ctl_name = 0 }
 };
 
+int proc_sys_debug[8];
+EXPORT_SYMBOL(proc_sys_debug);
+
 static ctl_table debug_table[] = {
+	{1, "0", &proc_sys_debug[0], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{2, "1", &proc_sys_debug[1], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{3, "2", &proc_sys_debug[2], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{4, "3", &proc_sys_debug[3], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{5, "4", &proc_sys_debug[4], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{6, "5", &proc_sys_debug[5], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{7, "6", &proc_sys_debug[6], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{8, "7", &proc_sys_debug[7], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
 	{ .ctl_name = 0 }
 };
_

 25-akpm/fs/fs-writeback.c |   22 ++++++++++++++++++++++
 25-akpm/fs/mpage.c        |    6 ++++++
 2 files changed, 28 insertions(+)

diff -puN fs/fs-writeback.c~pdflush-debug fs/fs-writeback.c
--- 25/fs/fs-writeback.c~pdflush-debug	Tue Apr 27 17:12:11 2004
+++ 25-akpm/fs/fs-writeback.c	Tue Apr 27 17:19:21 2004
@@ -152,7 +152,23 @@ __sync_single_inode(struct inode *inode,
 
 	spin_unlock(&inode_lock);
 
+	if (proc_sys_debug[0]) {
+		printk("%s: writepages in nr_pages:%lu nr_to_write:%ld"
+			" pages_skipped:%ld en:%d\n",
+			__FUNCTION__,
+			mapping->nrpages, wbc->nr_to_write,
+			wbc->pages_skipped,
+			wbc->encountered_congestion);
+	}
 	ret = do_writepages(mapping, wbc);
+	if (proc_sys_debug[0]) {
+		printk("%s: writepages in nr_pages:%lu nr_to_write:%ld"
+			" pages_skipped:%ld en:%d\n",
+			__FUNCTION__,
+			mapping->nrpages, wbc->nr_to_write,
+			wbc->pages_skipped,
+			wbc->encountered_congestion);
+	}
 
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC))
@@ -328,6 +344,9 @@ sync_sb_inodes(struct super_block *sb, s
 		if (current_is_pdflush() && !writeback_acquire(bdi))
 			break;
 
+		if (proc_sys_debug[0]) {
+			printk("%s: write inode %p\n", __FUNCTION__, inode);
+		}
 		BUG_ON(inode->i_state & I_FREEING);
 		__iget(inode);
 		pages_skipped = wbc->pages_skipped;
@@ -384,6 +403,9 @@ writeback_inodes(struct writeback_contro
 	for (; sb != sb_entry(&super_blocks); sb = sb_entry(sb->s_list.prev)) {
 		if (!list_empty(&sb->s_dirty) || !list_empty(&sb->s_io)) {
 			spin_unlock(&sb_lock);
+			if (proc_sys_debug[0]) {
+				printk("%s: sync sb %p\n", __FUNCTION__, sb);
+			}
 			sync_sb_inodes(sb, wbc);
 			spin_lock(&sb_lock);
 		}
diff -puN fs/mpage.c~pdflush-debug fs/mpage.c
--- 25/fs/mpage.c~pdflush-debug	Tue Apr 27 17:19:41 2004
+++ 25-akpm/fs/mpage.c	Tue Apr 27 17:20:57 2004
@@ -658,6 +658,12 @@ retry:
 		if (writepage) {
 			ret = (*writepage)(page, wbc);
 			if (ret) {
+				if (proc_sys_debug[0]) {
+					printk("%s: writepage "
+						"returned %d\n",
+						__FUNCTION__,
+						ret);
+				}
 				if (ret == -ENOSPC)
 					set_bit(AS_ENOSPC, &mapping->flags);
_
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Brent Cook @ 2004-04-29 3:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

On Wed, 28 Apr 2004, Andrew Morton wrote:

> This is getting very irritating.  Cannot reproduce it with a 2.4 server,
> gcc 3.4.0, 2.6.6-rc3 client.  grrr.
>
> Could you please apply the below two patches, then wait for pdflush to go
> nuts, then do:
>
> 	echo 1 > /proc/sys/debug/0
> 	echo 0 > /proc/sys/debug/0
> 	dmesg -s 1000000 > foo
>
> then mail me foo?  It probably won't tell me much, but one has to start
> somewhere.

That seems like a good start. I compiled arts from KDE from an NFS
directory, and about 4 minutes into it, pdflush appears to be hung up
writing a single inode. dmesg just contains this over and over:

...
sync_sb_inodes: write inode c55d25bc
__sync_single_inode: writepages in nr_pages:25 nr_to_write:949 pages_skipped:0 en:0
__sync_single_inode: writepages in nr_pages:25 nr_to_write:949 pages_skipped:0 en:0
sync_sb_inodes: write inode c55d25bc
__sync_single_inode: writepages in nr_pages:25 nr_to_write:949 pages_skipped:0 en:0
__sync_single_inode: writepages in nr_pages:25 nr_to_write:949 pages_skipped:0 en:0
sync_sb_inodes: write inode c55d25bc
...

After rebooting, I tried repeating the experiment compiling from /tmp,
and pdflush behaved. It didn't matter whether the NFS share was mounted
at the time, just whether the source was compiled from the share or
elsewhere.

FYI, my fstab on this test machine (the PIII with 256MB/i815 chipset) is
pretty boring:

/dev/hda1   swap         swap      defaults         0 0
/dev/hda2   /            reiserfs  defaults         1 1
ozma:/home  /home        nfs       rw               0 0
/dev/cdrom  /mnt/cdrom   iso9660   noauto,owner,ro  0 0
/dev/fd0    /mnt/floppy  auto      noauto,owner     0 0
devpts      /dev/pts     devpts    gid=5,mode=620   0 0
proc        /proc        proc      defaults         0 0
none        /sys         sysfs     defaults         0 0

The share is exported from the 2.4.25 server as follows:

/home snowball2(rw,async,no_root_squash)

 - Brent
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Andrew Morton @ 2004-04-29 4:02 UTC (permalink / raw)
To: busterbcook; +Cc: linux-kernel

Brent Cook <busterbcook@yahoo.com> wrote:
>
> sync_sb_inodes: write inode c55d25bc
> __sync_single_inode: writepages in nr_pages:25 nr_to_write:949 pages_skipped:0 en:0
> __sync_single_inode: writepages in nr_pages:25 nr_to_write:949 pages_skipped:0 en:0

uh-huh.

Does this fix it?

 25-akpm/fs/fs-writeback.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN fs/fs-writeback.c~a fs/fs-writeback.c
--- 25/fs/fs-writeback.c~a	2004-04-28 21:01:37.012603336 -0700
+++ 25-akpm/fs/fs-writeback.c	2004-04-28 21:02:00.701002152 -0700
@@ -191,8 +191,8 @@ __sync_single_inode(struct inode *inode,
 			 */
 			inode->i_state |= I_DIRTY_PAGES;
 			inode->dirtied_when = jiffies;
-			list_move(&inode->i_list, &sb->s_dirty);
 		}
+		list_move(&inode->i_list, &sb->s_dirty);
 	} else if (inode->i_state & I_DIRTY) {
 		/*
 		 * Someone redirtied the inode while were writing back
_
* Re: pdflush eating a lot of CPU on heavy NFS I/O
From: Andrew Morton @ 2004-04-29 4:03 UTC (permalink / raw)
To: busterbcook, linux-kernel

Andrew Morton <akpm@osdl.org> wrote:
>
> Does this fix it?

If not, please try this new debug patch, against -rc3.

diff -puN include/linux/kernel.h~proc-sys-debug include/linux/kernel.h
--- 25/include/linux/kernel.h~proc-sys-debug	Tue Apr 27 17:11:39 2004
+++ 25-akpm/include/linux/kernel.h	Tue Apr 27 17:11:39 2004
@@ -220,6 +220,8 @@ extern void dump_stack(void);
 	1; \
 })
 
+extern int proc_sys_debug[8];
+
 #endif /* __KERNEL__ */
 
 #define SI_LOAD_SHIFT	16
diff -puN kernel/sysctl.c~proc-sys-debug kernel/sysctl.c
--- 25/kernel/sysctl.c~proc-sys-debug	Tue Apr 27 17:11:39 2004
+++ 25-akpm/kernel/sysctl.c	Tue Apr 27 17:11:39 2004
@@ -888,7 +888,26 @@ static ctl_table fs_table[] = {
 	{ .ctl_name = 0 }
 };
 
+int proc_sys_debug[8];
+EXPORT_SYMBOL(proc_sys_debug);
+
 static ctl_table debug_table[] = {
+	{1, "0", &proc_sys_debug[0], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{2, "1", &proc_sys_debug[1], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{3, "2", &proc_sys_debug[2], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{4, "3", &proc_sys_debug[3], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{5, "4", &proc_sys_debug[4], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{6, "5", &proc_sys_debug[5], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{7, "6", &proc_sys_debug[6], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
+	{8, "7", &proc_sys_debug[7], sizeof(int), 0644, NULL,
+	 &proc_dointvec_minmax, &sysctl_intvec, NULL, NULL, NULL},
 	{ .ctl_name = 0 }
 };
_

 25-akpm/fs/fs-writeback.c |   32 ++++++++++++++++++++++++++++++++
 25-akpm/fs/mpage.c        |    6 ++++++
 2 files changed, 38 insertions(+)

diff -puN fs/fs-writeback.c~pdflush-debug fs/fs-writeback.c
--- 25/fs/fs-writeback.c~pdflush-debug	2004-04-28 20:52:05.820437744 -0700
+++ 25-akpm/fs/fs-writeback.c	2004-04-28 21:01:05.062460496 -0700
@@ -152,7 +152,23 @@ __sync_single_inode(struct inode *inode,
 
 	spin_unlock(&inode_lock);
 
+	if (proc_sys_debug[0]) {
+		printk("%s: writepages in nr_pages:%lu nr_to_write:%ld"
+			" pages_skipped:%ld en:%d\n",
+			__FUNCTION__,
+			mapping->nrpages, wbc->nr_to_write,
+			wbc->pages_skipped,
+			wbc->encountered_congestion);
+	}
 	ret = do_writepages(mapping, wbc);
+	if (proc_sys_debug[0]) {
+		printk("%s: writepages out nr_pages:%lu nr_to_write:%ld"
+			" pages_skipped:%ld en:%d\n",
+			__FUNCTION__,
+			mapping->nrpages, wbc->nr_to_write,
+			wbc->pages_skipped,
+			wbc->encountered_congestion);
+	}
 
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC))
@@ -180,6 +196,8 @@ __sync_single_inode(struct inode *inode,
 			 * writeout as soon as the queue becomes
 			 * uncongested.
 			 */
+			if (proc_sys_debug[0])
+				printk("%s:%d\n", __FILE__, __LINE__);
 			inode->i_state |= I_DIRTY_PAGES;
 		} else {
 			/*
@@ -192,22 +210,30 @@ __sync_single_inode(struct inode *inode,
 			inode->i_state |= I_DIRTY_PAGES;
 			inode->dirtied_when = jiffies;
 			list_move(&inode->i_list, &sb->s_dirty);
+			if (proc_sys_debug[0])
+				printk("%s:%d\n", __FILE__, __LINE__);
 		}
 	} else if (inode->i_state & I_DIRTY) {
 		/*
 		 * Someone redirtied the inode while were writing back
 		 * the pages: nothing to do.
 		 */
+		if (proc_sys_debug[0])
+			printk("%s:%d\n", __FILE__, __LINE__);
 	} else if (atomic_read(&inode->i_count)) {
 		/*
 		 * The inode is clean, inuse
 		 */
 		list_move(&inode->i_list, &inode_in_use);
+		if (proc_sys_debug[0])
+			printk("%s:%d\n", __FILE__, __LINE__);
 	} else {
 		/*
 		 * The inode is clean, unused
 		 */
		list_move(&inode->i_list, &inode_unused);
+		if (proc_sys_debug[0])
+			printk("%s:%d\n", __FILE__, __LINE__);
 	}
 	wake_up_inode(inode);
@@ -328,6 +354,9 @@ sync_sb_inodes(struct super_block *sb, s
 		if (current_is_pdflush() && !writeback_acquire(bdi))
 			break;
 
+		if (proc_sys_debug[0]) {
+			printk("%s: write inode %p\n", __FUNCTION__, inode);
+		}
 		BUG_ON(inode->i_state & I_FREEING);
 		__iget(inode);
 		pages_skipped = wbc->pages_skipped;
@@ -384,6 +413,9 @@ writeback_inodes(struct writeback_contro
 	for (; sb != sb_entry(&super_blocks); sb = sb_entry(sb->s_list.prev)) {
 		if (!list_empty(&sb->s_dirty) || !list_empty(&sb->s_io)) {
 			spin_unlock(&sb_lock);
+			if (proc_sys_debug[0]) {
+				printk("%s: sync sb %p\n", __FUNCTION__, sb);
+			}
 			sync_sb_inodes(sb, wbc);
 			spin_lock(&sb_lock);
 		}
diff -puN fs/mpage.c~pdflush-debug fs/mpage.c
--- 25/fs/mpage.c~pdflush-debug	2004-04-28 20:52:05.821437592 -0700
+++ 25-akpm/fs/mpage.c	2004-04-28 20:52:05.825436984 -0700
@@ -658,6 +658,12 @@ retry:
 		if (writepage) {
 			ret = (*writepage)(page, wbc);
 			if (ret) {
+				if (proc_sys_debug[0]) {
+					printk("%s: writepage "
+						"returned %d\n",
+						__FUNCTION__,
+						ret);
+				}
 				if (ret == -ENOSPC)
 					set_bit(AS_ENOSPC, &mapping->flags);
_
* Re: pdflush eating a lot of CPU on heavy NFS I/O
  2004-04-29  4:02           ` Andrew Morton
  2004-04-29  4:03             ` Andrew Morton
@ 2004-04-29  4:37             ` Brent Cook
  2004-04-29  4:47               ` Andrew Morton
  1 sibling, 1 reply; 22+ messages in thread
From: Brent Cook @ 2004-04-29  4:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: busterbcook, linux-kernel

On Wed, 28 Apr 2004, Andrew Morton wrote:

> Brent Cook <busterbcook@yahoo.com> wrote:
> >
> > sync_sb_inodes: write inode c55d25bc
> > __sync_single_inode: writepages in nr_pages:25 nr_to_write:949
> > pages_skipped:0 en:0
> > __sync_single_inode: writepages in nr_pages:25 nr_to_write:949
> > pages_skipped:0 en:0
>
> uh-huh.
>
> Does this fix it?

I'm going to run a compile/load test overnight, but the test that
triggered it every time previously failed to do so with this patch.
pdflush is behaving so far, and I'll say you've figured it out for now,
with the final verdict in about 8 hours.

Does this mean that, if there were too many dirty pages and not enough
time to write them all back, the dirty page list just stopped being
traversed, stuck on a single page? That would explain why this was
easier to trigger on NFS than on a local FS, due to the extra latency,
etc. Maybe I should look into tuning my NFS server while we're at it,
as yours must be much faster ;)

 - Brent

>  25-akpm/fs/fs-writeback.c |    2 +-
>  1 files changed, 1 insertion(+), 1 deletion(-)
>
> diff -puN fs/fs-writeback.c~a fs/fs-writeback.c
> --- 25/fs/fs-writeback.c~a	2004-04-28 21:01:37.012603336 -0700
> +++ 25-akpm/fs/fs-writeback.c	2004-04-28 21:02:00.701002152 -0700
> @@ -191,8 +191,8 @@ __sync_single_inode(struct inode *inode,
>  			 */
>  			inode->i_state |= I_DIRTY_PAGES;
>  			inode->dirtied_when = jiffies;
> -			list_move(&inode->i_list, &sb->s_dirty);
>  		}
> +		list_move(&inode->i_list, &sb->s_dirty);
>  	} else if (inode->i_state & I_DIRTY) {
>  		/*
>  		 * Someone redirtied the inode while were writing back
>
> _
>

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: pdflush eating a lot of CPU on heavy NFS I/O
  2004-04-29  4:37             ` Brent Cook
@ 2004-04-29  4:47               ` Andrew Morton
  2004-04-29  5:29                 ` Andrew Morton
  2004-04-29 13:19                 ` Brent Cook
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2004-04-29  4:47 UTC (permalink / raw)
  To: busterbcook; +Cc: linux-kernel

Brent Cook <busterbcook@yahoo.com> wrote:
>
> On Wed, 28 Apr 2004, Andrew Morton wrote:
>
> > Brent Cook <busterbcook@yahoo.com> wrote:
> > >
> > > sync_sb_inodes: write inode c55d25bc
> > > __sync_single_inode: writepages in nr_pages:25 nr_to_write:949
> > > pages_skipped:0 en:0
> > > __sync_single_inode: writepages in nr_pages:25 nr_to_write:949
> > > pages_skipped:0 en:0
> >
> > uh-huh.
> >
> > Does this fix it?
>
> I'm going to run a compile/load test overnight, but the test that
> triggered it every time previously failed to do so with this patch.

OK, thanks.  A better patch would be:

diff -puN fs/fs-writeback.c~writeback-livelock-fix-2 fs/fs-writeback.c
--- 25/fs/fs-writeback.c~writeback-livelock-fix-2	2004-04-28 21:19:32.779061976 -0700
+++ 25-akpm/fs/fs-writeback.c	2004-04-28 21:20:11.080239312 -0700
@@ -176,11 +176,12 @@ __sync_single_inode(struct inode *inode,
 		if (wbc->for_kupdate) {
 			/*
 			 * For the kupdate function we leave the inode
-			 * where it is on sb_dirty so it will get more
+			 * at the head of sb_dirty so it will get more
 			 * writeout as soon as the queue becomes
 			 * uncongested.
 			 */
 			inode->i_state |= I_DIRTY_PAGES;
+			list_move_tail(&inode->i_list, &sb->s_dirty);
 		} else {
 			/*
 			 * Otherwise fully redirty the inode so that
_

> pdflush is behaving so far, and I'll say you've figured it out for now,
> with the final verdict in about 8 hours.
>
> Does this mean that, if there were too many dirty pages and not enough
> time to write them all back, that the dirty page list just stopped being
> traversed, stuck on a single page?

No.  There's all sorts of livelock avoidance code in there, and I keep
forgetting that sometimes writepage won't write the dang page at all -
instead it just redirties the page (and hence the inode).

Now, that redirtying of the inode _should_ have moved the inode off the
s_io list and onto the s_dirty list.  But for some reason it looks like
it didn't, so we get stuck in a loop.  I need to think about it a bit
more.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: pdflush eating a lot of CPU on heavy NFS I/O
  2004-04-29  4:47               ` Andrew Morton
@ 2004-04-29  5:29                 ` Andrew Morton
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2004-04-29  5:29 UTC (permalink / raw)
  To: busterbcook, linux-kernel, Trond Myklebust

Andrew Morton <akpm@osdl.org> wrote:
>
> Now, that redirtying of the inode _should_ have moved the inode off the
> s_io list and onto the s_dirty list.  But for some reason it looks like it
> didn't, so we get stuck in a loop.  I need to think about it a bit more.

OK, it looks like nfs_writepages() might have encountered the congested
flag and baled out without doing anything - the inode is still on the
temporary s_io list, and no pages were redirtied, hence the inode wasn't
redirtied, hence it remains stuck on the s_io list.  The patch I sent
will fix that up.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: pdflush eating a lot of CPU on heavy NFS I/O
  2004-04-29  4:47               ` Andrew Morton
  2004-04-29  5:29                 ` Andrew Morton
@ 2004-04-29 13:19                 ` Brent Cook
  1 sibling, 0 replies; 22+ messages in thread
From: Brent Cook @ 2004-04-29 13:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: busterbcook, linux-kernel

On Wed, 28 Apr 2004, Andrew Morton wrote:

> Brent Cook <busterbcook@yahoo.com> wrote:
> >
> > On Wed, 28 Apr 2004, Andrew Morton wrote:
> >
> > > Brent Cook <busterbcook@yahoo.com> wrote:
> > > >
> > > > sync_sb_inodes: write inode c55d25bc
> > > > __sync_single_inode: writepages in nr_pages:25 nr_to_write:949
> > > > pages_skipped:0 en:0
> > > > __sync_single_inode: writepages in nr_pages:25 nr_to_write:949
> > > > pages_skipped:0 en:0
> > >
> > > uh-huh.
> > >
> > > Does this fix it?
> >
> > I'm going to run a compile/load test overnight, but the test that
> > triggered it every time previously failed to do so with this patch.
>
> OK, thanks.  A better patch would be:

No, thank you! The overnight test was successful. I have been running
this better patch for a little while, and it is no worse. I think you
have solved the bigger problem, which was the runaway process, at least
for me.

So, moving the inode to the tail of the s_dirty list gives it higher
priority to be written back next time? That sounds better than just
redirtying it; the poor inode has been through enough as it is without
having to wait even longer.

If you want to think about it a little more, pdflush on 2.6.6-rc3 with
this patch still seems to use more resources than it did on 2.6.5. With
heavy NFS traffic, it still uses about 2-3% CPU on 2.6.6-rc3, but on
2.6.5 it averages about 0.1%. Maybe it just wasn't being used to its
full potential in 2.6.5?

Thanks
 - Brent

> diff -puN fs/fs-writeback.c~writeback-livelock-fix-2 fs/fs-writeback.c
> --- 25/fs/fs-writeback.c~writeback-livelock-fix-2	2004-04-28 21:19:32.779061976 -0700
> +++ 25-akpm/fs/fs-writeback.c	2004-04-28 21:20:11.080239312 -0700
> @@ -176,11 +176,12 @@ __sync_single_inode(struct inode *inode,
>  		if (wbc->for_kupdate) {
>  			/*
>  			 * For the kupdate function we leave the inode
> -			 * where it is on sb_dirty so it will get more
> +			 * at the head of sb_dirty so it will get more
>  			 * writeout as soon as the queue becomes
>  			 * uncongested.
>  			 */
>  			inode->i_state |= I_DIRTY_PAGES;
> +			list_move_tail(&inode->i_list, &sb->s_dirty);
>  		} else {
>  			/*
>  			 * Otherwise fully redirty the inode so that
>
> _
>
> > pdflush is behaving so far, and I'll say you've figured it out for now,
> > with the final verdict in about 8 hours.
> >
> > Does this mean that, if there were too many dirty pages and not enough
> > time to write them all back, that the dirty page list just stopped being
> > traversed, stuck on a single page?
>
> No..  There's all sorts of livelock avoidance code in there and I keep on
> forgetting that sometimes writepage won't write the dang page at all -
> instead it just redirties the page (and hence the inode).
>
> Now, that redirtying of the inode _should_ have moved the inode off the
> s_io list and onto the s_dirty list.  But for some reason it looks like it
> didn't, so we get stuck in a loop.  I need to think about it a bit more.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: pdflush eating a lot of CPU on heavy NFS I/O
  2004-04-28 20:39       ` Brent Cook
  2004-04-28 21:19         ` Trond Myklebust
  2004-04-29  1:24         ` Andrew Morton
@ 2004-04-29  3:54         ` Andrew Morton
  2 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2004-04-29  3:54 UTC (permalink / raw)
  To: busterbcook; +Cc: linux-kernel

Brent Cook <busterbcook@yahoo.com> wrote:
>
> Here are memory stats for the same machine. The other machine's stats
> are similar; there doesn't appear to be anything out of the ordinary,
> and it's not even touching swap if these numbers are to be believed.
>
> busterb@snowball2:~$ cat /proc/meminfo
> MemTotal:       256992 kB
> MemFree:        139700 kB
> Buffers:         36464 kB
> Cached:          61516 kB
> SwapCached:          0 kB
> Active:          50536 kB
> Inactive:        51672 kB
> HighTotal:           0 kB
> HighFree:            0 kB
> LowTotal:       256992 kB
> LowFree:        139700 kB
> SwapTotal:      514040 kB
> SwapFree:       514040 kB
> Dirty:            1876 kB
> Writeback:           0 kB
> Mapped:           8552 kB
> Slab:            10924 kB
> Committed_AS:    14612 kB
> PageTables:        356 kB
> VmallocTotal:   778164 kB
> VmallocUsed:      2936 kB
> VmallocChunk:   774708 kB

Was this captured while pdflush was misbehaving?

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: pdflush eating a lot of CPU on heavy NFS I/O
  2004-04-28  5:18 pdflush eating a lot of CPU on heavy NFS I/O Brent Cook
  2004-04-28  6:02 ` Andrew Morton
@ 2004-04-28  6:06 ` Brett E.
  1 sibling, 0 replies; 22+ messages in thread
From: Brett E. @ 2004-04-28  6:06 UTC (permalink / raw)
  To: busterbcook; +Cc: Kernel Mailing List, akpm

Brent Cook wrote:

> This didn't seem to be a problem with 2.6.5 or 2.4. Is there something I
> can do to control pdflush or to provide more information?
>
> Thanks
>  - Brent

Yes, you can control pdflush; I don't know about getting more
information out of it. From Andrew Morton (thanks Andrew!): the
tunables in /proc/sys/vm are documented in
Documentation/filesystems/proc.txt.

^ permalink raw reply	[flat|nested] 22+ messages in thread
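Following the `cat /proc/meminfo` usage earlier in the thread, the writeback tunables Brett refers to can be inspected the same way. A sketch (knob names as documented in Documentation/filesystems/proc.txt for 2.6-era kernels; the fallback keeps it runnable on systems where a knob is absent):

```shell
# Print the main pdflush/writeback knobs and their current values.
for knob in dirty_writeback_centisecs dirty_expire_centisecs \
            dirty_background_ratio dirty_ratio; do
    printf '%s = %s\n' "$knob" \
        "$(cat /proc/sys/vm/$knob 2>/dev/null || echo 'not present')"
done
```

Lowering `dirty_writeback_centisecs`, for example, makes the kupdate-style flush run more often; writing a value back is the usual `echo 500 > /proc/sys/vm/dirty_writeback_centisecs` as root.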
end of thread, other threads:[~2004-04-29 13:16 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-04-28  5:18 pdflush eating a lot of CPU on heavy NFS I/O Brent Cook
2004-04-28  6:02 ` Andrew Morton
2004-04-28  7:29   ` Phy Prabab
2004-04-28 10:18     ` Sven Geggus
2004-04-28 13:50   ` Brent Cook
2004-04-28 19:48     ` Andrew Morton
2004-04-28 20:12       ` Brent Cook
2004-04-28 20:39       ` Brent Cook
2004-04-28 21:19         ` Trond Myklebust
2004-04-29  3:55           ` Brent Cook
2004-04-29  4:21             ` Trond Myklebust
2004-04-29  4:42               ` Andrew Morton
2004-04-29  1:24         ` Andrew Morton
2004-04-29  3:53           ` Brent Cook
2004-04-29  4:02             ` Andrew Morton
2004-04-29  4:03               ` Andrew Morton
2004-04-29  4:37               ` Brent Cook
2004-04-29  4:47                 ` Andrew Morton
2004-04-29  5:29                   ` Andrew Morton
2004-04-29 13:19                   ` Brent Cook
2004-04-29  3:54         ` Andrew Morton
2004-04-28  6:06 ` Brett E.