From: Andrew Morton <akpm@osdl.org>
To: Helge Hafting <helgehaf@aitel.hist.no>
Cc: linux-kernel@vger.kernel.org
Subject: Re: 2.6.17-mm one process gets stuck in infinite loop in the kernel.
Date: Fri, 30 Jun 2006 16:55:32 -0700
Message-ID: <20060630165532.5eadf286.akpm@osdl.org>
In-Reply-To: <20060630215405.GA9744@aitel.hist.no>

Helge Hafting <helgehaf@aitel.hist.no> wrote:
>
> On Thu, Jun 29, 2006 at 10:41:17AM -0700, Andrew Morton wrote:
> > On Thu, 29 Jun 2006 13:25:20 +0200
> > Helge Hafting <helge.hafting@aitel.hist.no> wrote:
> > 
> > > I have seen this with mm2, mm3 and mm4.
> > > Suddenly, the load meter jumps.
> > > Using ps & top, I see one process using 100% cpu.
> > > This is always a process that was exiting; it tends to happen
> > > when I close applications, or during Debian upgrades, which
> > > run lots of short-lived processes.
> > > 
> > > I believe it is running in the kernel, ps lists it with stat "RN"
> > > and it cannot be killed, not even with kill -9 from root.
> > > 
> > > Something wrong with process termination?
> > > 
> > 
> > Please generate a kernel profile when it happens so we can see
> > where it got stuck.
> > 
> > <boot with profile=1>
> > <wait for it to happen>
> > readprofile -r
> > sleep 10
> > readprofile -n -v -m /boot/System.map | sort -n -k 3 | tail -40
> 
> It was easier to reproduce on my home machine, running mm2.
> I followed the recipe above, except typing manually means
> the wait was more than 10s.
> 
> Output from the pipe above:
> ffffffff801f9050 do_get_write_access                          21   0,0170
> ffffffff80111c20 __do_softirq                                 28   0,1591
> ffffffff8012e0e0 vm_stat_account                              28   0,2917
> ffffffff80194890 search_exception_tables                      49   1,5312
> ffffffff801f1380 ext3_journal_start_sb                        82   1,0250
> ffffffff801fd670 __log_space_left                             89   2,7812
> ffffffff8010bf50 __wake_up_bit                               125   2,6042
> ffffffff8010c2f0 put_page                                    141   2,9375
> ffffffff8010dd30 mark_page_accessed                          173   2,1625
> ffffffff801a3340 __filemap_copy_from_user_iovec_inatomic     195   1,7411
> ffffffff801643f0 cond_resched                                200   1,5625
> ffffffff8015f241 error_exit                                  243   1,8409
> ffffffff801a5dd0 balance_dirty_pages_ratelimited_nr          258   0,5375
> ffffffff801f9aa0 journal_start                               321   1,0559
> ffffffff8011ac00 page_waitqueue                              381   3,9688
> ffffffff80139ea0 generic_commit_write                        425   4,4271
> ffffffff80117500 __block_commit_write                        490   2,3558
> ffffffff8010b320 __down_read_trylock                         502  10,4583
> ffffffff8013d100 block_prepare_write                         577  12,0208
> ffffffff80121800 __up_read                                   583   3,3125
> ffffffff80117bc0 unlock_page                                 692  14,4167
> ffffffff801fd710 journal_blocks_per_page                     834  26,0625
> ffffffff8012e0c0 __wake_up                                   894  27,9375
> ffffffff80109d70 kmem_cache_alloc                           1126  17,5938
> ffffffff801e78b0 walk_page_buffers                          1265   7,1875
> ffffffff801ea920 ext3_ordered_commit_write                  1352   5,2812
> ffffffff801074c0 kmem_cache_free                            1580   7,5962
> ffffffff801f11e0 __ext3_journal_stop                        1696  17,6667
> ffffffff801f96b0 start_this_handle                          1871   1,8562
> ffffffff801f8e50 journal_stop                               1906   3,7227
> ffffffff801eacf0 ext3_prepare_write                         1991   5,9256
> ffffffff80113440 find_lock_page                             2053  14,2569
> ffffffff8010f690 generic_file_buffered_write                2139   1,2612
> ffffffff8010deb0 __block_prepare_write                      2284   1,9291
> ffffffff801e85c0 ext3_writepage_trans_blocks                2799  19,4375
> ffffffff80166356 bad_gs                                     3352   0,4129
> ffffffff80179070 search_extable                             3868  34,5357
> ffffffff8010b400 find_vma                                   4201  37,5089
> ffffffff8010a180 do_page_fault                             10753   5,1302
> 0000000000000000 total                                     51658   0,0124


Oh.  This is probably the generic_file_buffered_write() hang, due to
zero-length iovec segments.

If so, the below should fix it up.

The presence of do_page_fault() in that trace is interesting.  At a guess,
I'd say that userspace is passing in a bad iovec.iov_base as well as
iovec.iov_len=0, and the kernel's copy_from_user() implementation is
needlessly dereferencing the pointer, getting a fault, then seeing that it
didn't need to copy anything anyway.  hmm.
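
For reference, the kind of userspace call described above might look like
the sketch below: a writev() whose middle segment has iov_len == 0 and an
iov_base that points at nothing useful (NULL here).  The helper name and the
three-segment layout are made up for illustration, and whether this exact
pattern is what the stuck processes were doing is an assumption.  On a fixed
kernel the call should simply return the total number of bytes written; on
an affected kernel it may hang unkillably, so only try it on a scratch
machine.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "abcd", an empty segment, then "efgh" to
 * 'path' with a single writev() call.  Returns the writev() result, or
 * -1 if the file could not be opened.
 */
static ssize_t writev_with_empty_segment(const char *path)
{
	char a[] = "abcd";
	char b[] = "efgh";
	struct iovec iov[3] = {
		{ .iov_base = a,    .iov_len = 4 },
		{ .iov_base = NULL, .iov_len = 0 },	/* zero-length segment */
		{ .iov_base = b,    .iov_len = 4 },
	};
	ssize_t ret;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	ret = writev(fd, iov, 3);

	close(fd);
	unlink(path);	/* scratch file, not needed afterwards */
	return ret;
}
```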


diff -puN mm/filemap.c~generic_file_buffered_write-handle-zero-length-iovec-segments-stable mm/filemap.c
--- a/mm/filemap.c~generic_file_buffered_write-handle-zero-length-iovec-segments-stable
+++ a/mm/filemap.c
@@ -2125,6 +2125,12 @@ generic_file_buffered_write(struct kiocb
 			break;
 		}
 
+		if (unlikely(bytes == 0)) {
+			status = 0;
+			copied = 0;
+			goto zero_length_segment;
+		}
+
 		status = a_ops->prepare_write(file, page, offset, offset+bytes);
 		if (unlikely(status)) {
 			loff_t isize = i_size_read(inode);
@@ -2154,7 +2160,8 @@ generic_file_buffered_write(struct kiocb
 			page_cache_release(page);
 			continue;
 		}
-		if (likely(copied > 0)) {
+zero_length_segment:
+		if (likely(copied >= 0)) {
 			if (!status)
 				status = copied;
 
diff -puN mm/filemap.h~generic_file_buffered_write-handle-zero-length-iovec-segments-stable mm/filemap.h
--- a/mm/filemap.h~generic_file_buffered_write-handle-zero-length-iovec-segments-stable
+++ a/mm/filemap.h
@@ -88,7 +88,7 @@ filemap_set_next_iovec(const struct iove
 	const struct iovec *iov = *iovp;
 	size_t base = *basep;
 
-	while (bytes) {
+	do {
 		int copy = min(bytes, iov->iov_len - base);
 
 		bytes -= copy;
@@ -97,7 +97,7 @@ filemap_set_next_iovec(const struct iove
 			iov++;
 			base = 0;
 		}
-	}
+	} while (bytes);
 	*iovp = iov;
 	*basep = base;
 }
_
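
To see why the mm/filemap.h hunk matters, here is a small userspace model of
filemap_set_next_iovec() before and after the change.  It is a sketch, not
the kernel code: the function names are invented, and the two lines that
fall between the hunks (advancing `base` and testing for the end of the
segment) are not visible in the patch and are filled in here as an
assumption.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Model of the old loop: with bytes == 0 the body never runs. */
static void set_next_iovec_old(const struct iovec **iovp, size_t *basep,
			       size_t bytes)
{
	const struct iovec *iov = *iovp;
	size_t base = *basep;

	while (bytes) {
		size_t left = iov->iov_len - base;
		size_t copy = bytes < left ? bytes : left;

		bytes -= copy;
		base += copy;			/* assumed, between the hunks */
		if (iov->iov_len == base) {	/* assumed, between the hunks */
			iov++;
			base = 0;
		}
	}
	*iovp = iov;
	*basep = base;
}

/* Model of the patched loop: the body runs once even when bytes == 0. */
static void set_next_iovec_new(const struct iovec **iovp, size_t *basep,
			       size_t bytes)
{
	const struct iovec *iov = *iovp;
	size_t base = *basep;

	do {
		size_t left = iov->iov_len - base;
		size_t copy = bytes < left ? bytes : left;

		bytes -= copy;
		base += copy;
		if (iov->iov_len == base) {
			iov++;
			base = 0;
		}
	} while (bytes);
	*iovp = iov;
	*basep = base;
}
```

With the cursor sitting on a zero-length segment and bytes == 0, the old
version leaves the cursor where it was, so the caller keeps seeing the same
empty segment.  The new version steps past it, while a zero-byte call on a
partly consumed non-empty segment still leaves the cursor alone.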