public inbox for linux-kernel@vger.kernel.org
From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds <torvalds@transmeta.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>,
	<linux-kernel@vger.kernel.org>,
	Alexander Viro <viro@math.psu.edu>
Subject: Re: DVD blockdevice buffers
Date: 25 May 2001 16:31:37 -0600
Message-ID: <m11ypdgiwm.fsf@frodo.biederman.org>
In-Reply-To: <Pine.LNX.4.31.0105251417070.7867-100000@penguin.transmeta.com>
In-Reply-To: Linus Torvalds's message of "Fri, 25 May 2001 14:18:31 -0700 (PDT)"

Linus Torvalds <torvalds@transmeta.com> writes:

> On 25 May 2001, Eric W. Biederman wrote:
> >
> > I obviously picked a bad name, and a bad place to start.
> > int data_uptodate(struct page *page, unsigned offset, unsigned len)
> >
> > This is really an extension to PG_uptodate, not readpage.
> 
> Ugh.
> 
> The above is just horrible.

Useless, and possibly excess maintenance, but horrible?
Clean concepts and interfaces are horrible?

> It doesn't fix any problems, it is only an ugly work-around for a
> situation that never happens in real life. 

An ugly work-around?  If exposing the details of what happens during
a partial write is ugly, then I submit we have issues.

Changing 
	if (!Page_Uptodate(page)) ....
to
	if (!Page_Uptodate(page) && !data_uptodate(page, offset, nr)) ..

in the one or two places that really only need partial page data
should be trivial and clean.  Probably fewer than 100 lines for the
whole implementation.
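A minimal sketch of what data_uptodate() could look like, assuming (hypothetically) that each page carries one "uptodate" bit per 512-byte block alongside the whole-page PG_uptodate flag. struct sketch_page, mark_range_uptodate() and needs_read() are made-up names for illustration, not existing kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes: 4 KiB pages tracked in 512-byte blocks. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_BLOCK_SIZE 512u
#define SKETCH_BLOCKS     (SKETCH_PAGE_SIZE / SKETCH_BLOCK_SIZE)
#define SKETCH_ALL_BLOCKS ((1u << SKETCH_BLOCKS) - 1u)

/* Hypothetical stand-in for struct page: a whole-page flag plus one
 * "uptodate" bit per block. */
struct sketch_page {
	bool uptodate;           /* plays the role of PG_uptodate */
	uint8_t block_uptodate;  /* bit b set: block b holds valid data */
};

/* Record that [offset, offset + len) now holds valid data.  Only blocks
 * the range covers completely are marked; a partially written block may
 * still contain stale bytes. */
static void mark_range_uptodate(struct sketch_page *page,
				unsigned offset, unsigned len)
{
	unsigned first, end;

	assert(offset + len <= SKETCH_PAGE_SIZE);
	first = (offset + SKETCH_BLOCK_SIZE - 1) / SKETCH_BLOCK_SIZE; /* round up */
	end = (offset + len) / SKETCH_BLOCK_SIZE;                     /* round down */
	for (unsigned b = first; b < end; b++)
		page->block_uptodate |= (uint8_t)(1u << b);
	if (page->block_uptodate == SKETCH_ALL_BLOCKS)
		page->uptodate = true;
}

/* Hypothetical data_uptodate(): true if every block overlapping
 * [offset, offset + len) is valid, so the read can be served from memory. */
static bool data_uptodate(const struct sketch_page *page,
			  unsigned offset, unsigned len)
{
	unsigned first, last;

	assert(offset + len <= SKETCH_PAGE_SIZE);
	if (len == 0)
		return true;
	first = offset / SKETCH_BLOCK_SIZE;
	last = (offset + len - 1) / SKETCH_BLOCK_SIZE;
	for (unsigned b = first; b <= last; b++)
		if (!(page->block_uptodate & (1u << b)))
			return false;
	return true;
}

/* The combined test: go to disk only if neither the whole page nor the
 * requested range is known to be valid. */
static bool needs_read(const struct sketch_page *page,
		       unsigned offset, unsigned nr)
{
	return !page->uptodate && !data_uptodate(page, offset, nr);
}
```

With this sketch, a write of bytes 1024-2047 leaves the page as a whole not uptodate, yet a later read inside that range is served from memory, while a read touching bytes 0-1023 still goes to disk.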

> An application that only
> re-reads the data that it just wrote itself is a _stupid_ application, and
> I'm absolutely not interested in having a new interface that is useless
> for everything _but_ such a stupid application.

Problem:
Reading from memory is always faster than reading from disk.
We already have all the information we need to check whether something is
actually in memory.  We just don't have a way to get at it.  So we do
extra I/O for no benefit.

The case where this comes up all the time is the tail of a file.  We
happen to be able to set PG_uptodate in those cases today, so it
probably isn't a big deal.  I can imagine this getting worse
as PAGE_CACHE_SIZE gets bigger.

Usefulness:
That is hard to tell.  The idea is general and clean.  It
fits in nicely with reading and writing partial pages, so any time we
have that case it could come in handy.

But I do agree the good implementation would probably be to extend buffer
heads so that network filesystems can use them for their
pending I/O.  Then we wouldn't need a virtual function, and could
still find out whether a page is partially populated.
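The buffer-head variant could be sketched like this.  It assumes stripped-down stand-ins for the 2.4 structures (mini_page, mini_bh and their fields are hypothetical, only loosely modelled on page->buffers, b_this_page and the BH_Uptodate bit): the page's buffers form a circular list, and a range is valid only if every buffer overlapping it is uptodate.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down buffer head: each buffer covers b_size
 * bytes of the page, and the buffers of one page form a circular list
 * through b_this_page. */
struct mini_bh {
	unsigned b_size;
	bool b_uptodate;              /* stands in for the BH_Uptodate state bit */
	struct mini_bh *b_this_page;  /* next buffer on the same page */
};

struct mini_page {
	struct mini_bh *buffers;      /* first buffer, or NULL if none attached */
};

/* Walk the page's buffers in order; every buffer overlapping
 * [offset, offset + len) must be uptodate.  No per-filesystem virtual
 * function is involved. */
static bool data_uptodate(const struct mini_page *page,
			  unsigned offset, unsigned len)
{
	const struct mini_bh *bh = page->buffers;
	unsigned start = 0, end = offset + len;

	if (len == 0)
		return true;
	if (bh == NULL)
		return false;  /* no per-buffer state to consult */
	do {
		unsigned bh_end = start + bh->b_size;

		if (bh_end > offset && start < end && !bh->b_uptodate)
			return false;
		start = bh_end;
		bh = bh->b_this_page;
	} while (bh != page->buffers && start < end);
	/* False if the buffers ran out before covering the whole range. */
	return start >= end;
}
```

The point of this layout is that the per-block state already lives in the buffer heads, so the check is a plain walk over existing structures rather than a new hook each filesystem would have to implement.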

I see this as a reduction in cache misses at very little cost, and as
such probably worth it.  If the code maintenance or number of lines is
too high, then it probably isn't worth it.

Applications:
Any application that reads and writes a Btree.  With the Btree page
size smaller than PAGE_CACHE_SIZE I can see this happening too.
With Btrees you traverse the tree multiple times in a row.  You have
no clue what your locality is going to be, and no way to predict which
page you are going to traverse into next.  Btrees are designed to
minimize the number of I/Os, so taking a preventable I/O is not likely
a big deal, but what is the point of using the OS's caching mechanism
if it doesn't help?
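To make the Btree case concrete, here is a sketch assuming hypothetical 1 KiB Btree nodes stored back to back in a file cached in 4 KiB pages (so a node never straddles a page).  node_range() and struct cache_range are illustrative names only; they show how a node number maps to the page and the (offset, len) range a lookup actually needs, which is the range a data_uptodate()-style check would be asked about.

```c
#include <stdint.h>

/* Hypothetical sizes: 1 KiB Btree nodes in a file cached in 4 KiB pages.
 * The node size divides the page size, so no node crosses a page. */
#define CACHE_PAGE_SIZE 4096u
#define BTREE_NODE_SIZE 1024u

struct cache_range {
	uint64_t page_index;  /* which page-cache page holds the node */
	unsigned offset;      /* byte offset of the node inside that page */
	unsigned len;         /* bytes the lookup actually needs */
};

/* Map a node number to the partial-page range a lookup touches. */
static struct cache_range node_range(uint64_t node)
{
	uint64_t pos = node * BTREE_NODE_SIZE;
	struct cache_range r = {
		.page_index = pos / CACHE_PAGE_SIZE,
		.offset = (unsigned)(pos % CACHE_PAGE_SIZE),
		.len = BTREE_NODE_SIZE,
	};
	return r;
}
```

Under these assumptions node 5 lives in page 1 at offset 1024, and nodes 4 through 7 share that page, so writing node 5 and then reading it back touches a page that is only a quarter populated.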

And I can think of other cases where you are doing random I/O
on database-type records, and happen to get locality because you have
multiple transactions dealing with the same data, because you are doing
multiple things to a single person's account.  Normally you will
have the read-modify-write case, but occasionally you will be adding
new records and have the write followed by read-modify-write cases.

So no, I don't think it is only stupid applications that will trigger
these cases.  It is simply applications that are prone to random I/O,
know the OS does a decent job of caching and so haven't written their own
caching layer, and happen to have records that are smaller than
the application page size.

In Unix we don't see a lot of this because we tend to use ASCII-based
files with small data sets, where it is cheaper to read the whole file
into memory at once.  With larger data sets the tradeoffs become
different.  /etc/passwd on a system with hundreds of thousands of users
is a classic example of these tradeoffs changing.

So a non-stupid application would just need to add a user to
/etc/passwd.db, then set their password and change their login
shell, to trigger unexpected locality in a random I/O case.  With
enough RAM this could all happen within a couple of minutes, while
the administrator is setting up an account.

Eric
