From: Andrew Morton <akpm@linux-foundation.org>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@vger.kernel.org,
agk@redhat.com, mbroz@redhat.com, chris@arachsys.com
Subject: Re: [PATCH] Memory management livelock
Date: Tue, 23 Sep 2008 16:46:23 -0700
Message-ID: <20080923164623.ce82c1c2.akpm@linux-foundation.org>
In-Reply-To: <Pine.LNX.4.64.0809231902010.19496@hs20-bc2-1.build.redhat.com>
On Tue, 23 Sep 2008 19:11:51 -0400 (EDT)
Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> > > wait_on_page_writeback_range is another example where the livelock
> > > happened; there is no protection at all against starvation.
> >
> > um, OK. So someone else is initiating IO for this inode and this
> > thread *never* gets to initiate any writeback. That's a bit of a
> > surprise.
> >
> > How do we fix that? Maybe decrement nr_to_write for these pages as
> > well?
>
> And what do you want to do with wait_on_page_writeback_range?
Don't know. I was asking you.
> When I
> solved that livelock in write_cache_pages(), I got another livelock in
> wait_on_page_writeback_range.
>
> > > BTW. that .nr_to_write = mapping->nrpages * 2 looks like a dangerous thing
> > > to me.
> > >
> > > Imagine this case: You have two pages with indices 4 and 5 dirty in a
> > > file. You call fsync(). It sets nr_to_write to 4.
> > >
> > > Meanwhile, another process makes pages 0, 1, 2, 3 dirty.
> > >
> > > The fsync() process goes to write_cache_pages, writes the first 4 dirty
> > > pages and exits because it goes over the limit.
> > >
> > > result --- you violate fsync() semantics, pages that were dirty before
> > > call to fsync() are not written when fsync() exits.
> >
> > yup, that's pretty much unfixable, really, unless new locks are added
> > which block threads which are writing to unrelated sections of the
> > file, and that could hurt some workloads quite a lot, I expect.
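The fsync() scenario described above can be replayed in a small user-space toy model. This is a sketch under stated assumptions, not kernel code: struct toy_mapping, toy_write_cache_pages() and toy_fsync_race_misses_pages() are hypothetical names, and the scan only mimics the two properties that matter here, namely that dirty pages are written in index order and that the loop stops once the nr_to_write budget (fixed at fsync() entry as nrpages * 2) is used up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8

/* Hypothetical toy address_space: one dirty flag per page index. */
struct toy_mapping {
	bool dirty[TOY_NPAGES];
	unsigned long nrpages;  /* pages present in the mapping */
};

/* Write dirty pages in index order until the nr_to_write budget is spent.
 * Returns the number of pages written. */
static int toy_write_cache_pages(struct toy_mapping *m, long nr_to_write)
{
	int written = 0;

	for (size_t i = 0; i < TOY_NPAGES && nr_to_write > 0; i++) {
		if (!m->dirty[i])
			continue;
		m->dirty[i] = false;  /* "write" the page */
		written++;
		nr_to_write--;
	}
	return written;
}

/* Replay the race: the budget is fixed when fsync() starts, then another
 * process dirties pages 0..3 before the scan reaches pages 4 and 5.
 * Returns true if the pages that were dirty at fsync() entry are still dirty. */
static bool toy_fsync_race_misses_pages(void)
{
	struct toy_mapping m = { .nrpages = 2 };

	m.dirty[4] = m.dirty[5] = true;
	long nr_to_write = (long)m.nrpages * 2;  /* budget of 4 */

	for (size_t i = 0; i < 4; i++)  /* concurrent writer */
		m.dirty[i] = true;

	toy_write_cache_pages(&m, nr_to_write);  /* writes pages 0..3, then stops */
	return m.dirty[4] && m.dirty[5];
}
```

In the toy, toy_fsync_race_misses_pages() returns true: the whole budget is spent on pages 0..3, so pages 4 and 5, the ones that were dirty when fsync() was called, are still dirty when it returns.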
>
> It is fixable with the patch I sent --- it doesn't take any locks unless
> the starvation happens. Then, you don't have to use .nr_to_write for
> fsync anymore.
I agree that the patch is low-impact and relatively straightforward.
The main problem is making the address_space larger - there can be (and
often are) millions and millions of these things in memory. Making it
larger is a big deal. We should work hard to seek an alternative and
afaict that isn't happening here.

We already have existing code and design which attempts to avoid
livelock without adding stuff to the address_space. Can it be modified
so as to patch up this quite obscure and rarely-occurring problem?
> Another solution could be to record in struct page the jiffies value at
> which the page entered the dirty state and the writeback state. The
> functions that start writeback and wait on writeback could then trivially
> ignore pages that were dirtied or put under writeback while the function
> was in progress.
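The timestamp idea above can be sketched in user space for the dirty half only (the writeback half would work the same way). All names are hypothetical: toy_clock stands in for jiffies, and the per-page dirtied_when field stands in for the extra field Mikulas suggests adding to struct page. The page is stamped only on its clean-to-dirty transition, and the writer skips anything stamped after it started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state: a dirty flag plus the "time" the page last
 * went from clean to dirty. toy_clock stands in for jiffies. */
struct toy_page {
	bool dirty;
	unsigned long dirtied_when;
};

static unsigned long toy_clock;

/* Stamp the page only on the clean -> dirty transition, so re-dirtying an
 * already dirty page does not move its timestamp forward. */
static void toy_set_page_dirty(struct toy_page *p)
{
	if (!p->dirty) {
		p->dirty = true;
		p->dirtied_when = toy_clock;
	}
}

/* Write back only pages that were dirty no later than `start` (the time
 * the caller began). Pages dirtied afterwards are skipped, so a concurrent
 * writer cannot keep the scan busy. Plain <= is fine for this counter;
 * real jiffies wrap and would need time_after()-style comparisons. */
static int toy_write_pages_dirtied_before(struct toy_page *pages, size_t n,
					  unsigned long start)
{
	int written = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].dirty && pages[i].dirtied_when <= start) {
			pages[i].dirty = false;  /* "write" the page */
			written++;
		}
	}
	return written;
}
```

Replaying the earlier fsync() example with this rule, pages 4 and 5 are written and pages 0..3, dirtied after the call started, are left alone. Note that this stores an extra word in every struct page, which has one instance per page of physical memory, so the size objection raised against growing address_space would apply here as well.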
>
> > Hopefully high performance applications are instantiating the file
> > up-front and are using sync_file_range() to prevent these sorts of
> > things from happening. But they probably aren't.
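For reference, an application following the pattern mentioned above might look roughly like the sketch below. It is Linux-specific and the helper names are hypothetical; "instantiating the file up-front" is interpreted here as preallocating it with posix_fallocate(), which is one way to do it. Per the sync_file_range(2) man page, the call writes neither file metadata nor the disk's write cache, so it is not a durability substitute for fsync()/fdatasync().

```c
#define _GNU_SOURCE           /* for sync_file_range() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or open) path and allocate `size` bytes up front, so later
 * writes do not need to allocate blocks. */
static int open_preallocated(const char *path, off_t size)
{
	int fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return -1;

	/* posix_fallocate() returns an error number and does not set errno. */
	int err = posix_fallocate(fd, 0, size);
	if (err != 0) {
		close(fd);
		errno = err;
		return -1;
	}
	return fd;
}

/* Write one record and push only its byte range to disk, instead of
 * calling fsync() on the whole file. */
static int write_and_sync_range(int fd, const void *buf, size_t len, off_t off)
{
	ssize_t n = pwrite(fd, buf, len, off);
	if (n < 0 || (size_t)n != len)
		return -1;

	/* Wait for writeback already in flight on these pages, start writeback
	 * of the dirty ones, then wait for it to complete. */
	return sync_file_range(fd, off, (off_t)len,
			       SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE |
			       SYNC_FILE_RANGE_WAIT_AFTER);
}
```

Because each thread syncs only the range it wrote, a concurrent writer elsewhere in the file does not enlarge the amount of work this call has to do.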
>
> For databases it is quite possible that one thread is writing
> already-journaled data (so it doesn't care when the data are really
> written) while another thread is calling fsync() on the same inode
> simultaneously. fsync() could then mistakenly write the data generated by
> the first thread and skip the data generated by the second thread, which
> it really should write.
>
> Mikulas