All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, mason@suse.com,
	andrea@suse.de, hugh@veritas.com, axboe@suse.de,
	torvalds@osdl.org
Subject: Re: [rfc][patch] remove racy sync_page?
Date: Tue, 30 May 2006 14:13:01 +1000	[thread overview]
Message-ID: <447BC64D.3060706@yahoo.com.au> (raw)
In-Reply-To: <20060529201444.cd89e0d8.akpm@osdl.org>

Andrew Morton wrote:

>On Tue, 30 May 2006 12:54:53 +1000
>Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>
>>I guess so. Is plugging still needed now that the IO layer should
>>get larger requests? Disabling it might result in a small initial
>>request (although even that may be good for pipelining)...
>>
>
>Mysterious question, that.  A few years ago I think Jens tried pulling unplugging
>out, but some devices still want it (magneto-optical storage iirc).  And I
>think we did try removing it, and it caused hurt.
>

Would be nice to get rid of it. I guess that's out of the question
for several releases, even if there was no performance problems.

>
>>Otherwise, we could make set_page_dirty_lock use a weird non-unplugging
>>variant
>>
>
>Yes, that would work.  In fact the number of times when direct-io actually
>calls set_page_dirty_lock() is infinitesimal - I had to jump through hoops
>to even test that code.  The speculative
>set-the-page-dirty-before-we-do-the-IO thing is very effective.  So the
>performace impact of making such a change would be nil.
>

OK, that'll probably be the way to go for upcoming releases. I'll post
a patch soon.

>
>That's for the direct-io case.  Other cases might be hurt more.
>
>Also, perhaps we could poke kblockd to do it for us.
>

Hard, I think, to get back to ->mapping. Well not hard if we just use
a non-syncing lock_page, but in that case we don't need kblockd.

>
>>sync_page wants to get either the current mapping, or a NULL one.
>>The sync_page methods must then be able to handle running into a
>>NULL mapping.
>>
>>With splice, the mapping can change, so you can have the wrong
>>sync_page callback run against the page.
>>
>
>Oh.
>

Don't know what we can do about that off the top of my head. Within
block_sync_page there shouldn't be any problems (at worst, we'd unplug
the wrong dev). Maybe the whole sync_page callback scheme can be removed
so nobody tries to do anything funny with it? Call blk_run_backing_dev
directly.

>
>>>>Well yes, writing to a page would be the main reason to set it dirty.
>>>>Is splice broken as well? I'm not sure that it always has a ref on the
>>>>inode when stealing a page.
>>>>
>>>>
>>>Whereabouts?
>>>
>>>
>>The ->pin() calls in pipe_to_file and pipe_to_sendpage?
>>
>
>One for Jens...
>
>
>>>>It sounds like you think fixing the set_page_dirty_lock callers wouldn't
>>>>be too difficult? I wouldn't know (although the ptrace one should be
>>>>able to be turned into a set_page_dirty, because we're holding mmap_sem).
>>>>
>>>>
>>>No, I think it's damn impossible ;)
>>>
>>>get_user_pages() has gotten us a random pagecache page.  How do we
>>>non-racily get at the address_space prior to locking that page?
>>>
>>>I don't think we can.
>>>
>>>
>>But the vma isn't going to disappear because mmap_sem is held; and the
>>vma should hold a ref on the inode I think?
>>
>
>That's true during the get_user_pages() call.  Be we run
>set_page_dirty_lock() much later, after IO completion.
>

Oh, I thought you specifically meant the ptrace case. Yes, in
general it is much harder.

>>Anyway, it is possible that most of the problems could be solved by locking
>>the page at the time of lookup, and unlocking it on completion/dirtying...
>>it's just that that would be a bit of a task.
>>
>
>But lock_page() requires access to the address_space.  To kick the IO so we
>don't wait for ever.
>

It shouldn't wait for ever, because the unplug timer will go off
and kblockd will do it eventually. And I was imagining that it would
have a pin on the address space at the point it is looked up... But
reworking all callers is just a pipe dream anyway, so nevermind ;)

Where we have synchronous IO, we do want the queue to be unplugged,
however in most set_page_dirty_lock case this is normally not the
case and a locked page should be rare. So hmm yes that would make
sense to have it use a special lock_page...

>
>>Can we somehow add BUG_ONs to
>>lock_page to ensure we've got an inode ref?
>>
>
>WARN_ONs.
>

But is it practical? Or I suspect the warning would only ever trigger
in the really rare racy cases anyway, when dentry, inode, etc have
been reclaimed.

Anyway, I'll come up with a less intrusive patch shortly...

--

Send instant messages to your online friends http://au.messenger.yahoo.com 

WARNING: multiple messages have this Message-ID (diff)
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, mason@suse.com,
	andrea@suse.de, hugh@veritas.com, axboe@suse.de,
	torvalds@osdl.org
Subject: Re: [rfc][patch] remove racy sync_page?
Date: Tue, 30 May 2006 14:13:01 +1000	[thread overview]
Message-ID: <447BC64D.3060706@yahoo.com.au> (raw)
In-Reply-To: <20060529201444.cd89e0d8.akpm@osdl.org>

Andrew Morton wrote:

>On Tue, 30 May 2006 12:54:53 +1000
>Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>
>>I guess so. Is plugging still needed now that the IO layer should
>>get larger requests? Disabling it might result in a small initial
>>request (although even that may be good for pipelining)...
>>
>
>Mysterious question, that.  A few years ago I think Jens tried pulling unplugging
>out, but some devices still want it (magneto-optical storage iirc).  And I
>think we did try removing it, and it caused hurt.
>

Would be nice to get rid of it. I guess that's out of the question
for several releases, even if there was no performance problems.

>
>>Otherwise, we could make set_page_dirty_lock use a weird non-unplugging
>>variant
>>
>
>Yes, that would work.  In fact the number of times when direct-io actually
>calls set_page_dirty_lock() is infinitesimal - I had to jump through hoops
>to even test that code.  The speculative
>set-the-page-dirty-before-we-do-the-IO thing is very effective.  So the
>performace impact of making such a change would be nil.
>

OK, that'll probably be the way to go for upcoming releases. I'll post
a patch soon.

>
>That's for the direct-io case.  Other cases might be hurt more.
>
>Also, perhaps we could poke kblockd to do it for us.
>

Hard, I think, to get back to ->mapping. Well not hard if we just use
a non-syncing lock_page, but in that case we don't need kblockd.

>
>>sync_page wants to get either the current mapping, or a NULL one.
>>The sync_page methods must then be able to handle running into a
>>NULL mapping.
>>
>>With splice, the mapping can change, so you can have the wrong
>>sync_page callback run against the page.
>>
>
>Oh.
>

Don't know what we can do about that off the top of my head. Within
block_sync_page there shouldn't be any problems (at worst, we'd unplug
the wrong dev). Maybe the whole sync_page callback scheme can be removed
so nobody tries to do anything funny with it? Call blk_run_backing_dev
directly.

>
>>>>Well yes, writing to a page would be the main reason to set it dirty.
>>>>Is splice broken as well? I'm not sure that it always has a ref on the
>>>>inode when stealing a page.
>>>>
>>>>
>>>Whereabouts?
>>>
>>>
>>The ->pin() calls in pipe_to_file and pipe_to_sendpage?
>>
>
>One for Jens...
>
>
>>>>It sounds like you think fixing the set_page_dirty_lock callers wouldn't
>>>>be too difficult? I wouldn't know (although the ptrace one should be
>>>>able to be turned into a set_page_dirty, because we're holding mmap_sem).
>>>>
>>>>
>>>No, I think it's damn impossible ;)
>>>
>>>get_user_pages() has gotten us a random pagecache page.  How do we
>>>non-racily get at the address_space prior to locking that page?
>>>
>>>I don't think we can.
>>>
>>>
>>But the vma isn't going to disappear because mmap_sem is held; and the
>>vma should hold a ref on the inode I think?
>>
>
>That's true during the get_user_pages() call.  Be we run
>set_page_dirty_lock() much later, after IO completion.
>

Oh, I thought you specifically meant the ptrace case. Yes, in
general it is much harder.

>>Anyway, it is possible that most of the problems could be solved by locking
>>the page at the time of lookup, and unlocking it on completion/dirtying...
>>it's just that that would be a bit of a task.
>>
>
>But lock_page() requires access to the address_space.  To kick the IO so we
>don't wait for ever.
>

It shouldn't wait for ever, because the unplug timer will go off
and kblockd will do it eventually. And I was imagining that it would
have a pin on the address space at the point it is looked up... But
reworking all callers is just a pipe dream anyway, so nevermind ;)

Where we have synchronous IO, we do want the queue to be unplugged,
however in most set_page_dirty_lock case this is normally not the
case and a locked page should be rare. So hmm yes that would make
sense to have it use a special lock_page...

>
>>Can we somehow add BUG_ONs to
>>lock_page to ensure we've got an inode ref?
>>
>
>WARN_ONs.
>

But is it practical? Or I suspect the warning would only ever trigger
in the really rare racy cases anyway, when dentry, inode, etc have
been reclaimed.

Anyway, I'll come up with a less intrusive patch shortly...

--

Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-05-30  4:13 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-05-29  9:34 [rfc][patch] remove racy sync_page? Nick Piggin
2006-05-29 19:15 ` Andrew Morton
2006-05-29 19:15   ` Andrew Morton
2006-05-30  0:08   ` Nick Piggin
2006-05-30  0:08     ` Nick Piggin
2006-05-30  1:32     ` Andrew Morton
2006-05-30  1:32       ` Andrew Morton
2006-05-30  2:54       ` Nick Piggin
2006-05-30  2:54         ` Nick Piggin
2006-05-30  3:14         ` Andrew Morton
2006-05-30  3:14           ` Andrew Morton
2006-05-30  4:13           ` Nick Piggin [this message]
2006-05-30  4:13             ` Nick Piggin
2006-05-30  9:05           ` Jens Axboe
2006-05-30  9:05             ` Jens Axboe
2006-05-31 13:43             ` Nick Piggin
2006-05-31 13:43               ` Nick Piggin
2006-05-31 15:09               ` Hugh Dickins
2006-05-31 15:09                 ` Hugh Dickins
2006-05-31 15:22                 ` Nick Piggin
2006-05-31 15:22                   ` Nick Piggin
2006-05-31 17:51                   ` Jens Axboe
2006-05-31 17:51                     ` Jens Axboe
2006-05-31 17:50               ` Jens Axboe
2006-05-31 17:50                 ` Jens Axboe
2006-05-30  4:20         ` Linus Torvalds
2006-05-30  4:20           ` Linus Torvalds
2006-05-30  5:07           ` Nick Piggin
2006-05-30  5:07             ` Nick Piggin
2006-05-30  5:21             ` Nick Piggin
2006-05-30  5:21               ` Nick Piggin
2006-05-30  6:12               ` Neil Brown
2006-05-30  6:12                 ` Neil Brown
2006-05-30  7:10                 ` Nick Piggin
2006-05-30  7:10                   ` Nick Piggin
2006-05-31  4:34                   ` Neil Brown
2006-05-31  4:34                     ` Neil Brown
2006-05-30  8:24               ` Nikita Danilov
2006-05-30  8:24                 ` Nikita Danilov
2006-05-30 17:55               ` Linus Torvalds
2006-05-30 17:55                 ` Linus Torvalds
2006-05-31  0:32                 ` Nick Piggin
2006-05-31  0:32                   ` Nick Piggin
2006-05-31  0:56                   ` Linus Torvalds
2006-05-31  0:56                     ` Linus Torvalds
2006-05-31  1:33                     ` Mark Lord
2006-05-31  1:33                       ` Mark Lord
2006-05-31  6:11                       ` Jens Axboe
2006-05-31  6:11                         ` Jens Axboe
2006-05-31 12:55                         ` Mark Lord
2006-05-31 12:55                           ` Mark Lord
2006-05-31 13:02                           ` Jens Axboe
2006-05-31 13:02                             ` Jens Axboe
2006-06-01 13:19                           ` NCQ performance (was Re: [rfc][patch] remove racy sync_page?) Jens Axboe
2006-06-01 13:19                             ` Jens Axboe
2006-06-01 14:56                             ` Avi Kivity
2006-06-01 14:56                               ` Avi Kivity
2006-06-01 15:03                               ` Jens Axboe
2006-06-01 15:03                                 ` Jens Axboe
2006-06-01 18:04                                 ` Jens Axboe
2006-06-01 18:04                                   ` Jens Axboe
2006-06-05  5:30                                   ` Avi Kivity
2006-06-05  5:30                                     ` Avi Kivity
2006-06-05  7:59                                     ` Jens Axboe
2006-06-05  7:59                                       ` Jens Axboe
2006-05-31 12:31                     ` [rfc][patch] remove racy sync_page? Helge Hafting
2006-05-31 12:31                       ` Helge Hafting
2006-05-31 12:36                       ` Arjan van de Ven
2006-05-31 12:36                         ` Arjan van de Ven
2006-05-31 13:29                     ` Nick Piggin
2006-05-31 13:29                       ` Nick Piggin
2006-05-31 13:41                       ` Jens Axboe
2006-05-31 13:41                         ` Jens Axboe
2006-05-31 13:54                         ` Nick Piggin
2006-05-31 13:54                           ` Nick Piggin
2006-05-31 14:43                       ` Linus Torvalds
2006-05-31 14:43                         ` Linus Torvalds
2006-05-31 14:57                         ` Nick Piggin
2006-05-31 14:57                           ` Nick Piggin
2006-05-31 15:13                           ` Linus Torvalds
2006-05-31 15:13                             ` Linus Torvalds
2006-05-31 15:33                             ` Nick Piggin
2006-05-31 15:57                               ` Linus Torvalds
2006-05-31 16:12                                 ` Linus Torvalds
2006-05-31 16:26                                   ` Nick Piggin
2006-05-31 16:19                                 ` Nick Piggin
2006-05-31 16:22                                   ` Nick Piggin
2006-05-31 16:41                                     ` Linus Torvalds
2006-06-02  2:34                                       ` Nick Piggin
2006-06-02  2:39                                         ` Nick Piggin
2006-05-31 16:39                                   ` Linus Torvalds
2006-06-02  2:21                                     ` Nick Piggin
2006-05-31 23:59                                   ` Neil Brown
2006-05-31 15:09                         ` Linus Torvalds
2006-05-31 15:09                           ` Linus Torvalds
2006-05-31 18:13                           ` Jens Axboe
2006-05-31 18:13                             ` Jens Axboe
2006-05-31 18:26                             ` Linus Torvalds
2006-05-31 18:26                               ` Linus Torvalds
2006-05-30  5:36             ` Nick Piggin
2006-05-30 18:31               ` Hugh Dickins
2006-05-30 18:31                 ` Hugh Dickins
2006-05-31  0:21                 ` Nick Piggin
2006-05-31  0:21                   ` Nick Piggin
2006-05-31  3:06                   ` Hugh Dickins
2006-05-31  3:06                     ` Hugh Dickins
2006-05-31 14:30                     ` Hugh Dickins
2006-05-31 14:30                       ` Hugh Dickins
2006-05-31 17:56                     ` Jens Axboe
2006-05-31 17:56                       ` Jens Axboe
2006-05-30  5:51 ` Josef Sipek
2006-05-30  5:51   ` Josef Sipek
2006-05-30  6:44   ` Nick Piggin
2006-05-30  6:44     ` Nick Piggin
2006-05-30  6:50     ` Nick Piggin
2006-05-30  6:50       ` Nick Piggin
2006-05-30 13:12     ` Josef Sipek
2006-05-30 13:12       ` Josef Sipek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=447BC64D.3060706@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=akpm@osdl.org \
    --cc=andrea@suse.de \
    --cc=axboe@suse.de \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mason@suse.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.