All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, mason@suse.com,
	andrea@suse.de, hugh@veritas.com, axboe@suse.de,
	torvalds@osdl.org
Subject: Re: [rfc][patch] remove racy sync_page?
Date: Tue, 30 May 2006 12:54:53 +1000	[thread overview]
Message-ID: <447BB3FD.1070707@yahoo.com.au> (raw)
In-Reply-To: <20060529183201.0e8173bc.akpm@osdl.org>

Andrew Morton wrote:

>On Tue, 30 May 2006 10:08:06 +1000
>Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>
>>Which is what I want to know. I don't exactly have an interesting
>>disk setup.
>>
>
>You don't need one - just a single disk should show up such problems.  I
>forget which workloads though.  Perhaps just a linear read (readahead
>queues the I/O but doesn't unplug, subsequent lock_page() sulks).
>

I guess so. Is plugging still needed now that the IO layer should
get larger requests? Disabling it might result in a small initial
request (although even that may be good for pipelining)...

Otherwise, we could make set_page_dirty_lock use a weird non-unplugging
variant (although maybe that will upset Ken), but I'd rather look
at simplification first ;)

>
>>Yes. So set_page_dirty_lock is broken, right?
>>
>
>yup.
>
>
>>And the wait_on_page_stuff needs an inode ref.
>>Also splice seems to have broken sync_page.
>>
>
>Please describe the splice() problem which you've observed.
>

sync_page wants to get either the current mapping, or a NULL one.
The sync_page methods must then be able to handle running into a
NULL mapping.

With splice, the mapping can change, so you can have the wrong
sync_page callback run against the page.

>
>>Well yes, writing to a page would be the main reason to set it dirty.
>>Is splice broken as well? I'm not sure that it always has a ref on the
>>inode when stealing a page.
>>
>
>Whereabouts?
>

The ->pin() calls in pipe_to_file and pipe_to_sendpage?

>
>>It sounds like you think fixing the set_page_dirty_lock callers wouldn't
>>be too difficult? I wouldn't know (although the ptrace one should be
>>able to be turned into a set_page_dirty, because we're holding mmap_sem).
>>
>
>No, I think it's damn impossible ;)
>
>get_user_pages() has gotten us a random pagecache page.  How do we
>non-racily get at the address_space prior to locking that page?
>
>I don't think we can.
>

But the vma isn't going to disappear because mmap_sem is held; and the
vma should hold a ref on the inode I think?

>
>>You're sure about all other lock_page()rs? I'm not, given that
>>set_page_dirty_lock got it so wrong. But you'd have a better idea than
>>me.
>>
>
>No, I'm not sure.
>
>However it is rare for the kernel to play with pagecache pages against
>which the caller doesn't have an inode ref.  Think: how did the caller look
>up that page in the first place if not from the address_space in the first
>place?
>
>- get_user_pages(): the current problem
>
>- page LRU: OK, uses trylock first.
>
>- pagetable walk??
>

Am I wrong about mmap_sem?

Anyway, it is possible that most of the problems could be solved by locking
the page at the time of lookup, and unlocking it on completion/dirtying...
it's just that that would be a bit of a task. Can we somehow add BUG_ONs to
lock_page to ensure we've got an inode ref?

--

Send instant messages to your online friends http://au.messenger.yahoo.com 

WARNING: multiple messages have this Message-ID (diff)
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, mason@suse.com,
	andrea@suse.de, hugh@veritas.com, axboe@suse.de,
	torvalds@osdl.org
Subject: Re: [rfc][patch] remove racy sync_page?
Date: Tue, 30 May 2006 12:54:53 +1000	[thread overview]
Message-ID: <447BB3FD.1070707@yahoo.com.au> (raw)
In-Reply-To: <20060529183201.0e8173bc.akpm@osdl.org>

Andrew Morton wrote:

>On Tue, 30 May 2006 10:08:06 +1000
>Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>
>>Which is what I want to know. I don't exactly have an interesting
>>disk setup.
>>
>
>You don't need one - just a single disk should show up such problems.  I
>forget which workloads though.  Perhaps just a linear read (readahead
>queues the I/O but doesn't unplug, subsequent lock_page() sulks).
>

I guess so. Is plugging still needed now that the IO layer should
get larger requests? Disabling it might result in a small initial
request (although even that may be good for pipelining)...

Otherwise, we could make set_page_dirty_lock use a weird non-unplugging
variant (although maybe that will upset Ken), but I'd rather look
at simplification first ;)

>
>>Yes. So set_page_dirty_lock is broken, right?
>>
>
>yup.
>
>
>>And the wait_on_page_stuff needs an inode ref.
>>Also splice seems to have broken sync_page.
>>
>
>Please describe the splice() problem which you've observed.
>

sync_page wants to get either the current mapping, or a NULL one.
The sync_page methods must then be able to handle running into a
NULL mapping.

With splice, the mapping can change, so you can have the wrong
sync_page callback run against the page.

>
>>Well yes, writing to a page would be the main reason to set it dirty.
>>Is splice broken as well? I'm not sure that it always has a ref on the
>>inode when stealing a page.
>>
>
>Whereabouts?
>

The ->pin() calls in pipe_to_file and pipe_to_sendpage?

>
>>It sounds like you think fixing the set_page_dirty_lock callers wouldn't
>>be too difficult? I wouldn't know (although the ptrace one should be
>>able to be turned into a set_page_dirty, because we're holding mmap_sem).
>>
>
>No, I think it's damn impossible ;)
>
>get_user_pages() has gotten us a random pagecache page.  How do we
>non-racily get at the address_space prior to locking that page?
>
>I don't think we can.
>

But the vma isn't going to disappear because mmap_sem is held; and the
vma should hold a ref on the inode I think?

>
>>You're sure about all other lock_page()rs? I'm not, given that
>>set_page_dirty_lock got it so wrong. But you'd have a better idea than
>>me.
>>
>
>No, I'm not sure.
>
>However it is rare for the kernel to play with pagecache pages against
>which the caller doesn't have an inode ref.  Think: how did the caller look
>up that page in the first place if not from the address_space in the first
>place?
>
>- get_user_pages(): the current problem
>
>- page LRU: OK, uses trylock first.
>
>- pagetable walk??
>

Am I wrong about mmap_sem?

Anyway, it is possible that most of the problems could be solved by locking
the page at the time of lookup, and unlocking it on completion/dirtying...
it's just that that would be a bit of a task. Can we somehow add BUG_ONs to
lock_page to ensure we've got an inode ref?

--

Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-05-30  2:55 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-05-29  9:34 [rfc][patch] remove racy sync_page? Nick Piggin
2006-05-29 19:15 ` Andrew Morton
2006-05-29 19:15   ` Andrew Morton
2006-05-30  0:08   ` Nick Piggin
2006-05-30  0:08     ` Nick Piggin
2006-05-30  1:32     ` Andrew Morton
2006-05-30  1:32       ` Andrew Morton
2006-05-30  2:54       ` Nick Piggin [this message]
2006-05-30  2:54         ` Nick Piggin
2006-05-30  3:14         ` Andrew Morton
2006-05-30  3:14           ` Andrew Morton
2006-05-30  4:13           ` Nick Piggin
2006-05-30  4:13             ` Nick Piggin
2006-05-30  9:05           ` Jens Axboe
2006-05-30  9:05             ` Jens Axboe
2006-05-31 13:43             ` Nick Piggin
2006-05-31 13:43               ` Nick Piggin
2006-05-31 15:09               ` Hugh Dickins
2006-05-31 15:09                 ` Hugh Dickins
2006-05-31 15:22                 ` Nick Piggin
2006-05-31 15:22                   ` Nick Piggin
2006-05-31 17:51                   ` Jens Axboe
2006-05-31 17:51                     ` Jens Axboe
2006-05-31 17:50               ` Jens Axboe
2006-05-31 17:50                 ` Jens Axboe
2006-05-30  4:20         ` Linus Torvalds
2006-05-30  4:20           ` Linus Torvalds
2006-05-30  5:07           ` Nick Piggin
2006-05-30  5:07             ` Nick Piggin
2006-05-30  5:21             ` Nick Piggin
2006-05-30  5:21               ` Nick Piggin
2006-05-30  6:12               ` Neil Brown
2006-05-30  6:12                 ` Neil Brown
2006-05-30  7:10                 ` Nick Piggin
2006-05-30  7:10                   ` Nick Piggin
2006-05-31  4:34                   ` Neil Brown
2006-05-31  4:34                     ` Neil Brown
2006-05-30  8:24               ` Nikita Danilov
2006-05-30  8:24                 ` Nikita Danilov
2006-05-30 17:55               ` Linus Torvalds
2006-05-30 17:55                 ` Linus Torvalds
2006-05-31  0:32                 ` Nick Piggin
2006-05-31  0:32                   ` Nick Piggin
2006-05-31  0:56                   ` Linus Torvalds
2006-05-31  0:56                     ` Linus Torvalds
2006-05-31  1:33                     ` Mark Lord
2006-05-31  1:33                       ` Mark Lord
2006-05-31  6:11                       ` Jens Axboe
2006-05-31  6:11                         ` Jens Axboe
2006-05-31 12:55                         ` Mark Lord
2006-05-31 12:55                           ` Mark Lord
2006-05-31 13:02                           ` Jens Axboe
2006-05-31 13:02                             ` Jens Axboe
2006-06-01 13:19                           ` NCQ performance (was Re: [rfc][patch] remove racy sync_page?) Jens Axboe
2006-06-01 13:19                             ` Jens Axboe
2006-06-01 14:56                             ` Avi Kivity
2006-06-01 14:56                               ` Avi Kivity
2006-06-01 15:03                               ` Jens Axboe
2006-06-01 15:03                                 ` Jens Axboe
2006-06-01 18:04                                 ` Jens Axboe
2006-06-01 18:04                                   ` Jens Axboe
2006-06-05  5:30                                   ` Avi Kivity
2006-06-05  5:30                                     ` Avi Kivity
2006-06-05  7:59                                     ` Jens Axboe
2006-06-05  7:59                                       ` Jens Axboe
2006-05-31 12:31                     ` [rfc][patch] remove racy sync_page? Helge Hafting
2006-05-31 12:31                       ` Helge Hafting
2006-05-31 12:36                       ` Arjan van de Ven
2006-05-31 12:36                         ` Arjan van de Ven
2006-05-31 13:29                     ` Nick Piggin
2006-05-31 13:29                       ` Nick Piggin
2006-05-31 13:41                       ` Jens Axboe
2006-05-31 13:41                         ` Jens Axboe
2006-05-31 13:54                         ` Nick Piggin
2006-05-31 13:54                           ` Nick Piggin
2006-05-31 14:43                       ` Linus Torvalds
2006-05-31 14:43                         ` Linus Torvalds
2006-05-31 14:57                         ` Nick Piggin
2006-05-31 14:57                           ` Nick Piggin
2006-05-31 15:13                           ` Linus Torvalds
2006-05-31 15:13                             ` Linus Torvalds
2006-05-31 15:33                             ` Nick Piggin
2006-05-31 15:57                               ` Linus Torvalds
2006-05-31 16:12                                 ` Linus Torvalds
2006-05-31 16:26                                   ` Nick Piggin
2006-05-31 16:19                                 ` Nick Piggin
2006-05-31 16:22                                   ` Nick Piggin
2006-05-31 16:41                                     ` Linus Torvalds
2006-06-02  2:34                                       ` Nick Piggin
2006-06-02  2:39                                         ` Nick Piggin
2006-05-31 16:39                                   ` Linus Torvalds
2006-06-02  2:21                                     ` Nick Piggin
2006-05-31 23:59                                   ` Neil Brown
2006-05-31 15:09                         ` Linus Torvalds
2006-05-31 15:09                           ` Linus Torvalds
2006-05-31 18:13                           ` Jens Axboe
2006-05-31 18:13                             ` Jens Axboe
2006-05-31 18:26                             ` Linus Torvalds
2006-05-31 18:26                               ` Linus Torvalds
2006-05-30  5:36             ` Nick Piggin
2006-05-30 18:31               ` Hugh Dickins
2006-05-30 18:31                 ` Hugh Dickins
2006-05-31  0:21                 ` Nick Piggin
2006-05-31  0:21                   ` Nick Piggin
2006-05-31  3:06                   ` Hugh Dickins
2006-05-31  3:06                     ` Hugh Dickins
2006-05-31 14:30                     ` Hugh Dickins
2006-05-31 14:30                       ` Hugh Dickins
2006-05-31 17:56                     ` Jens Axboe
2006-05-31 17:56                       ` Jens Axboe
2006-05-30  5:51 ` Josef Sipek
2006-05-30  5:51   ` Josef Sipek
2006-05-30  6:44   ` Nick Piggin
2006-05-30  6:44     ` Nick Piggin
2006-05-30  6:50     ` Nick Piggin
2006-05-30  6:50       ` Nick Piggin
2006-05-30 13:12     ` Josef Sipek
2006-05-30 13:12       ` Josef Sipek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=447BB3FD.1070707@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=akpm@osdl.org \
    --cc=andrea@suse.de \
    --cc=axboe@suse.de \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mason@suse.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.