All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Neil Brown <neilb@suse.de>
Cc: Linus Torvalds <torvalds@osdl.org>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, mason@suse.com,
	andrea@suse.de, hugh@veritas.com, axboe@suse.de
Subject: Re: [rfc][patch] remove racy sync_page?
Date: Tue, 30 May 2006 17:10:07 +1000	[thread overview]
Message-ID: <447BEFCF.5000406@yahoo.com.au> (raw)
In-Reply-To: <17531.57913.151520.946557@cse.unsw.edu.au>

Neil Brown wrote:
> On Tuesday May 30, nickpiggin@yahoo.com.au wrote:
> 
>>Nick Piggin wrote:
>>

>>For workloads where plugging helps (ie. lots of smaller, contiguous
>>requests going into the IO layer), the request pattern should be
>>pretty good without plugging these days, due to multiple page
>>readahead and writeback.
> 
> 
> Can I please put in a vote for not thinking that every device is disk
> drive?
> 
> I find plugging fairly important for raid5, particularly for write.
> 
> The more whole-stripe writes I can get, the better throughput I get.
> So I tend to keep a raid5 array plugged while any requests are
> arriving, and interpret 'plugged' to mean that incomplete stripes
> don't get processed while full stripes (needing no pre-reading) do get
> processed.
> 
> The only way "large requests" are going to replace plugging is they
> are perfectly aligned, which I don't expect to ever see.

Fair enough, thanks for the input. I was more imagining that IO tends
to come down in decent chunks, but obviously that's still not sufficient
for some. OK.

> 
> As for your original problem.... I wonder if PG_locked is protecting
> too much?  It protects against IO and it also protects against ->mapping
> changes.  So if you want to ensure that ->mapping won't change, you
> need to wait for any pending read request to finish, which seems a bit
> dumb.

I don't think that is the problem. set_page_dirty_lock is really
unlikely to get held up on read IO: that'd mean there were two things
writing into that page at the same time.

 >
> Maybe we need a new bit: PG_maplocked.  You are only allowed to change
> ->mapping or ->index of you hold PG_locked and PG_maplocked, you are
> not allowed to wait for PG_locked while holding PG_maplocked, and
> you can read ->mapping or ->index while PG_locked or PG_maplocked are
> held.
> Think of PG_locked like a mutex and PG_maplocked like a spinlock (and
> probably use bit_spinlock to get it).

Well the original problem is fixed by not doing the sync_page thing in
set_page_dirty_lock. Is there any advantage to having another bit?
Considering a) it will be very unlikely that a page is locked at the
same time one would like to dirty it; and b) that would seem to imply
adding extra atomic ops and barriers to reclaim and truncate (maybe
others).

> 
> Then set_page_dirty_lock would use PG_maplocked to get access to
> ->mapping, and then hold a reference on the address_space while
> calling into balance_dirty_pages ... I wonder how you hold a reference
> on an address space...

inode. Presumably PG_maplocked would pin it? I don't understand
why you've brought balance_dirty_pages into it, though.

> 
> There are presumably few pieces of code that change ->mapping.  Once
> they all take PG_maplocked as well as PG_locked, you can start freeing
> up other code to take PG_maplocked instead of PG_locked....
> 
> Does that make sense at all?  Do we have any spare page bits?

I'm sure it could be made to work, but I don't really see the point.
If someone really wanted to do it, I guess the right way to go is have
a PG_readin counterpart to PG_writeback (or even extend PG_writeback
to PG_io)...

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

WARNING: multiple messages have this Message-ID (diff)
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Neil Brown <neilb@suse.de>
Cc: Linus Torvalds <torvalds@osdl.org>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, mason@suse.com,
	andrea@suse.de, hugh@veritas.com, axboe@suse.de
Subject: Re: [rfc][patch] remove racy sync_page?
Date: Tue, 30 May 2006 17:10:07 +1000	[thread overview]
Message-ID: <447BEFCF.5000406@yahoo.com.au> (raw)
In-Reply-To: <17531.57913.151520.946557@cse.unsw.edu.au>

Neil Brown wrote:
> On Tuesday May 30, nickpiggin@yahoo.com.au wrote:
> 
>>Nick Piggin wrote:
>>

>>For workloads where plugging helps (ie. lots of smaller, contiguous
>>requests going into the IO layer), the request pattern should be
>>pretty good without plugging these days, due to multiple page
>>readahead and writeback.
> 
> 
> Can I please put in a vote for not thinking that every device is disk
> drive?
> 
> I find plugging fairly important for raid5, particularly for write.
> 
> The more whole-stripe writes I can get, the better throughput I get.
> So I tend to keep a raid5 array plugged while any requests are
> arriving, and interpret 'plugged' to mean that incomplete stripes
> don't get processed while full stripes (needing no pre-reading) do get
> processed.
> 
> The only way "large requests" are going to replace plugging is they
> are perfectly aligned, which I don't expect to ever see.

Fair enough, thanks for the input. I was more imagining that IO tends
to come down in decent chunks, but obviously that's still not sufficient
for some. OK.

> 
> As for your original problem.... I wonder if PG_locked is protecting
> too much?  It protects against IO and it also protects against ->mapping
> changes.  So if you want to ensure that ->mapping won't change, you
> need to wait for any pending read request to finish, which seems a bit
> dumb.

I don't think that is the problem. set_page_dirty_lock is really
unlikely to get held up on read IO: that'd mean there were two things
writing into that page at the same time.

 >
> Maybe we need a new bit: PG_maplocked.  You are only allowed to change
> ->mapping or ->index of you hold PG_locked and PG_maplocked, you are
> not allowed to wait for PG_locked while holding PG_maplocked, and
> you can read ->mapping or ->index while PG_locked or PG_maplocked are
> held.
> Think of PG_locked like a mutex and PG_maplocked like a spinlock (and
> probably use bit_spinlock to get it).

Well the original problem is fixed by not doing the sync_page thing in
set_page_dirty_lock. Is there any advantage to having another bit?
Considering a) it will be very unlikely that a page is locked at the
same time one would like to dirty it; and b) that would seem to imply
adding extra atomic ops and barriers to reclaim and truncate (maybe
others).

> 
> Then set_page_dirty_lock would use PG_maplocked to get access to
> ->mapping, and then hold a reference on the address_space while
> calling into balance_dirty_pages ... I wonder how you hold a reference
> on an address space...

inode. Presumably PG_maplocked would pin it? I don't understand
why you've brought balance_dirty_pages into it, though.

> 
> There are presumably few pieces of code that change ->mapping.  Once
> they all take PG_maplocked as well as PG_locked, you can start freeing
> up other code to take PG_maplocked instead of PG_locked....
> 
> Does that make sense at all?  Do we have any spare page bits?

I'm sure it could be made to work, but I don't really see the point.
If someone really wanted to do it, I guess the right way to go is have
a PG_readin counterpart to PG_writeback (or even extend PG_writeback
to PG_io)...

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-05-30  7:10 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-05-29  9:34 [rfc][patch] remove racy sync_page? Nick Piggin
2006-05-29 19:15 ` Andrew Morton
2006-05-29 19:15   ` Andrew Morton
2006-05-30  0:08   ` Nick Piggin
2006-05-30  0:08     ` Nick Piggin
2006-05-30  1:32     ` Andrew Morton
2006-05-30  1:32       ` Andrew Morton
2006-05-30  2:54       ` Nick Piggin
2006-05-30  2:54         ` Nick Piggin
2006-05-30  3:14         ` Andrew Morton
2006-05-30  3:14           ` Andrew Morton
2006-05-30  4:13           ` Nick Piggin
2006-05-30  4:13             ` Nick Piggin
2006-05-30  9:05           ` Jens Axboe
2006-05-30  9:05             ` Jens Axboe
2006-05-31 13:43             ` Nick Piggin
2006-05-31 13:43               ` Nick Piggin
2006-05-31 15:09               ` Hugh Dickins
2006-05-31 15:09                 ` Hugh Dickins
2006-05-31 15:22                 ` Nick Piggin
2006-05-31 15:22                   ` Nick Piggin
2006-05-31 17:51                   ` Jens Axboe
2006-05-31 17:51                     ` Jens Axboe
2006-05-31 17:50               ` Jens Axboe
2006-05-31 17:50                 ` Jens Axboe
2006-05-30  4:20         ` Linus Torvalds
2006-05-30  4:20           ` Linus Torvalds
2006-05-30  5:07           ` Nick Piggin
2006-05-30  5:07             ` Nick Piggin
2006-05-30  5:21             ` Nick Piggin
2006-05-30  5:21               ` Nick Piggin
2006-05-30  6:12               ` Neil Brown
2006-05-30  6:12                 ` Neil Brown
2006-05-30  7:10                 ` Nick Piggin [this message]
2006-05-30  7:10                   ` Nick Piggin
2006-05-31  4:34                   ` Neil Brown
2006-05-31  4:34                     ` Neil Brown
2006-05-30  8:24               ` Nikita Danilov
2006-05-30  8:24                 ` Nikita Danilov
2006-05-30 17:55               ` Linus Torvalds
2006-05-30 17:55                 ` Linus Torvalds
2006-05-31  0:32                 ` Nick Piggin
2006-05-31  0:32                   ` Nick Piggin
2006-05-31  0:56                   ` Linus Torvalds
2006-05-31  0:56                     ` Linus Torvalds
2006-05-31  1:33                     ` Mark Lord
2006-05-31  1:33                       ` Mark Lord
2006-05-31  6:11                       ` Jens Axboe
2006-05-31  6:11                         ` Jens Axboe
2006-05-31 12:55                         ` Mark Lord
2006-05-31 12:55                           ` Mark Lord
2006-05-31 13:02                           ` Jens Axboe
2006-05-31 13:02                             ` Jens Axboe
2006-06-01 13:19                           ` NCQ performance (was Re: [rfc][patch] remove racy sync_page?) Jens Axboe
2006-06-01 13:19                             ` Jens Axboe
2006-06-01 14:56                             ` Avi Kivity
2006-06-01 14:56                               ` Avi Kivity
2006-06-01 15:03                               ` Jens Axboe
2006-06-01 15:03                                 ` Jens Axboe
2006-06-01 18:04                                 ` Jens Axboe
2006-06-01 18:04                                   ` Jens Axboe
2006-06-05  5:30                                   ` Avi Kivity
2006-06-05  5:30                                     ` Avi Kivity
2006-06-05  7:59                                     ` Jens Axboe
2006-06-05  7:59                                       ` Jens Axboe
2006-05-31 12:31                     ` [rfc][patch] remove racy sync_page? Helge Hafting
2006-05-31 12:31                       ` Helge Hafting
2006-05-31 12:36                       ` Arjan van de Ven
2006-05-31 12:36                         ` Arjan van de Ven
2006-05-31 13:29                     ` Nick Piggin
2006-05-31 13:29                       ` Nick Piggin
2006-05-31 13:41                       ` Jens Axboe
2006-05-31 13:41                         ` Jens Axboe
2006-05-31 13:54                         ` Nick Piggin
2006-05-31 13:54                           ` Nick Piggin
2006-05-31 14:43                       ` Linus Torvalds
2006-05-31 14:43                         ` Linus Torvalds
2006-05-31 14:57                         ` Nick Piggin
2006-05-31 14:57                           ` Nick Piggin
2006-05-31 15:13                           ` Linus Torvalds
2006-05-31 15:13                             ` Linus Torvalds
2006-05-31 15:33                             ` Nick Piggin
2006-05-31 15:57                               ` Linus Torvalds
2006-05-31 16:12                                 ` Linus Torvalds
2006-05-31 16:26                                   ` Nick Piggin
2006-05-31 16:19                                 ` Nick Piggin
2006-05-31 16:22                                   ` Nick Piggin
2006-05-31 16:41                                     ` Linus Torvalds
2006-06-02  2:34                                       ` Nick Piggin
2006-06-02  2:39                                         ` Nick Piggin
2006-05-31 16:39                                   ` Linus Torvalds
2006-06-02  2:21                                     ` Nick Piggin
2006-05-31 23:59                                   ` Neil Brown
2006-05-31 15:09                         ` Linus Torvalds
2006-05-31 15:09                           ` Linus Torvalds
2006-05-31 18:13                           ` Jens Axboe
2006-05-31 18:13                             ` Jens Axboe
2006-05-31 18:26                             ` Linus Torvalds
2006-05-31 18:26                               ` Linus Torvalds
2006-05-30  5:36             ` Nick Piggin
2006-05-30 18:31               ` Hugh Dickins
2006-05-30 18:31                 ` Hugh Dickins
2006-05-31  0:21                 ` Nick Piggin
2006-05-31  0:21                   ` Nick Piggin
2006-05-31  3:06                   ` Hugh Dickins
2006-05-31  3:06                     ` Hugh Dickins
2006-05-31 14:30                     ` Hugh Dickins
2006-05-31 14:30                       ` Hugh Dickins
2006-05-31 17:56                     ` Jens Axboe
2006-05-31 17:56                       ` Jens Axboe
2006-05-30  5:51 ` Josef Sipek
2006-05-30  5:51   ` Josef Sipek
2006-05-30  6:44   ` Nick Piggin
2006-05-30  6:44     ` Nick Piggin
2006-05-30  6:50     ` Nick Piggin
2006-05-30  6:50       ` Nick Piggin
2006-05-30 13:12     ` Josef Sipek
2006-05-30 13:12       ` Josef Sipek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=447BEFCF.5000406@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=akpm@osdl.org \
    --cc=andrea@suse.de \
    --cc=axboe@suse.de \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mason@suse.com \
    --cc=neilb@suse.de \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.