From: Timur Tabi <timur.tabi@ammasso.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Libor Michalek <libor@topspin.com>, Andrew Morton <akpm@osdl.org>,
	Andrea Arcangeli <andrea@suse.de>,
	linux-kernel@vger.kernel.org, openib-general@openib.org
Subject: Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Date: Sat, 07 May 2005 09:45:50 -0500
Message-ID: <427CD49E.6080300@ammasso.com>
In-Reply-To: <Pine.LNX.4.61.0505071304010.4713@goblin.wat.veritas.com>

Hugh Dickins wrote:

> Oh, well, maybe, but what is the real problem?
> Are you sure that copy-on-write doesn't come into it?

No, but I do know that my test case doesn't call fork(), so it's reproducible without 
involving COW.  Of course, I'm sure someone's going to tell me now that COW comes into 
effect even without fork().  If so, please explain.

> I haven't reread through the whole thread, but my recollection is
> that you never quite said what the real problem is: you'd found some
> time ago that get_user_pages sometimes failed to pin the pages for
> your complex app, so were forced to mlock too; but couldn't provide
> any simple test case for the failure (which can indeed be a lot of
> work to devise), so we were all in the dark as to what went wrong.

The short answer: under "extreme" memory pressure, the data inside a page pinned by 
get_user_pages() is swapped out, moved, or deleted (I'm not sure which).  Some other data 
is placed into that physical location.

By extreme memory pressure, I mean having the process allocate and touch as much memory as 
possible.  Something like this:

size_t num_bytes = get_amount_of_physical_ram();  /* placeholder helper */
char *p = malloc(num_bytes);
size_t i;
for (i = 0; i < num_bytes; i += PAGE_SIZE)
   p[i] = 0;   /* touch every page so it gets physical backing */

The above over-simplified code fails on earlier 2.6 kernels (or earlier versions of glibc 
that accompany most distros that use the earlier 2.6 kernels).  Either malloc() returns 
NULL, or the p[i]=0 part causes a segfault.  I haven't bothered to track down why.  But 
when it does work, the page pinned by get_user_pages() changes.
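
A slightly fuller sketch of the loop above, for anyone who wants to reproduce the memory 
pressure.  It assumes a Linux/glibc system where sysconf(_SC_PHYS_PAGES) is available (a 
glibc extension, not POSIX); physical_ram_bytes() and touch_pages() are illustrative 
names, not part of the original test case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumption: physical RAM in bytes, queried through sysconf().
 * _SC_PHYS_PAGES is a glibc/Linux extension. Returns 0 on failure. */
static size_t physical_ram_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || page_size <= 0)
        return 0;
    return (size_t)pages * (size_t)page_size;
}

/* Write one byte per page so the kernel has to back every page with
 * real memory. Returns the number of pages touched (0 for a NULL buffer). */
static size_t touch_pages(volatile char *p, size_t num_bytes, size_t page_size)
{
    size_t touched = 0;

    assert(page_size > 0);
    if (p == NULL)
        return 0;
    for (size_t i = 0; i < num_bytes; i += page_size) {
        p[i] = 0;
        touched++;
    }
    return touched;
}
```

A reproducer would pass physical_ram_bytes() to malloc(), check the result for NULL, and 
hand the buffer to touch_pages().  On 32-bit systems the product computed in 
physical_ram_bytes() can exceed what size_t or the address space can hold, so the request 
would need to be capped there.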

> But you've now found that 2.6.7 and later kernels allow your app to
> work correctly without mlock, good.  get_user_pages is certainly the
> right tool to use for such pinning.  (On the question of whether
> mlock guarantees that user virtual addresses map to the same physical
> addresses, I prefer Arjan's view that it does not; but accept that
> there might prove to be difficulties in holding that position.)

My understanding is that mlock() could in theory allow the page to be moved, but that 
currently nothing in the kernel would actually move it.  However, that could change in the 
future to allow hot-swapping of RAM.

> So, it works now, you've exonerated today's get_user_pages, and you've
> identified at least one get_user_pages fix which went in at that time:
> do we really need to chase this further?

My driver needs to support all 2.4 and 2.6 kernel versions.  My makefile scans the kernel 
source tree with 'grep' to identify various characteristics, and I use #ifdefs to 
conditionally compile code depending on what features are present in the kernel.  I can't 
use the kernel version number, because that's not reliable - distros will incorporate 
patches from future kernels without changing the version ID.

So I need to take into account distro vendors that use an earlier kernel, like 2.6.5, and 
back-port the patch from 2.6.7.  The distro vendor will keep the 2.6.5 version number, 
which is why I can't rely on it.
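
As a rough illustration of the approach described above: the result of the makefile's 
grep can be passed to the compiler as a -D flag, and the driver then picks its pinning 
strategy with #ifdef instead of testing the kernel version number.  This is only a sketch; 
HAVE_VMA_RMAP, PIN_NEEDS_MLOCK, and the enum and function names are hypothetical, not 
identifiers from the actual driver or from the kernel.

```c
#include <assert.h>

/* HAVE_VMA_RMAP is a hypothetical macro: the makefile would add
 * -DHAVE_VMA_RMAP when its grep of mm/rmap.c finds a match. */
#ifdef HAVE_VMA_RMAP
#define PIN_NEEDS_MLOCK 0   /* tree carries the fix: get_user_pages() alone */
#else
#define PIN_NEEDS_MLOCK 1   /* older tree: call mlock() as well */
#endif

enum pin_strategy { PIN_GUP_ONLY, PIN_GUP_AND_MLOCK };

/* Strategy chosen at compile time from the detected feature. */
static enum pin_strategy select_pin_strategy(void)
{
    return PIN_NEEDS_MLOCK ? PIN_GUP_AND_MLOCK : PIN_GUP_ONLY;
}

/* Human-readable name, e.g. for a message printed at module load. */
static const char *pin_strategy_name(enum pin_strategy s)
{
    switch (s) {
    case PIN_GUP_ONLY:
        return "get_user_pages() only";
    case PIN_GUP_AND_MLOCK:
        return "get_user_pages() plus mlock()";
    }
    assert(!"unknown pin strategy");
    return "unknown";
}
```

Because the decision hangs on a detected feature rather than on the version ID, a 2.6.5 
distro kernel with the 2.6.7 patch back-ported would be built the same way as 2.6.7.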

I need to know exactly what the fix is, so that when I scan mm/rmap.c, I know what to look 
for.  Currently, I look for this regex:

try_to_unmap_one.*vm_area_struct

which seems to work.  However, now I think it's just a coincidence.
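
For reference, the same line-oriented check can be sketched in C with POSIX regcomp() and 
regexec(), which makes it easy to try the pattern against sample text.  text_matches() is 
an illustrative name, and the real scan is done by grep from the makefile, not by code 
like this.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if any line of `text` matches the extended regex `pattern`,
 * 0 if no line matches, and -1 if the pattern fails to compile. */
static int text_matches(const char *text, const char *pattern)
{
    regex_t re;
    int rc;

    assert(text != NULL && pattern != NULL);
    /* REG_NEWLINE keeps '.' from matching across line boundaries,
     * which mirrors grep's line-at-a-time behaviour. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | REG_NEWLINE) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With the pattern above, a line that has both try_to_unmap_one and vm_area_struct on it 
matches, while the two names on separate lines do not.  That is also the weakness of the 
check: it keys on the text of a function signature, not on the behaviour of the fix.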

> By the way, please don't be worried when soon the try_to_unmap_one
> comment and code that you identified above disappear.  When I'm
> back in patch submission mode, I'll be sending Andrew a patch which
> removes it, instead reworking can_share_swap_page to rely on the
> page_mapcount instead of page_count, which avoids the ironical
> behaviour my comment refers to, and allows an awkward page migration
> case to proceed (once unpinned).  Andrea and I now both prefer this
> page_mapcount approach.

Ugh, that means my regex is probably going to break.  Not only that, but I don't 
understand what you're saying either.  Trying to understand the VM is really hard.

I guess in this specific case, it doesn't really matter, because calling mlock() when I 
should be calling get_user_pages() is not a bad thing.
