Question about pte_offset_map

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Question about pte_offset_map_lock
@ 2009-12-17  2:46 Minchan Kim
  2009-12-17  8:01 ` Peter Zijlstra
  2009-12-17  9:54 ` Hugh Dickins
  0 siblings, 2 replies; 5+ messages in thread
From: Minchan Kim @ 2009-12-17  2:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Hugh Dickins, Christoph Lameter, Peter Zijlstra,
	KAMEZAWA Hiroyuki

It may be a dumb question.

As I read the code of pte_lock, I have a question. 
Now, there is pte_offset_map_lock following as. 

#define pte_offset_map_lock(mm, pmd, address, ptlp)     \
({                                                      \
        spinlock_t *__ptl = pte_lockptr(mm, pmd);       \
        pte_t *__pte = pte_offset_map(pmd, address);    \
        *(ptlp) = __ptl;                                \
        spin_lock(__ptl);                               \
        __pte;                                          \
})

Why do we grab the lock after getting __pte?
Is it possible that __pte might be changed before we grab the spin_lock?

Some codes in mm checks original pte by pte_same. 
There are not-checked cases in proc. As looking over the cases,
It seems no problem. But in future, new user of pte_offset_map_lock 
could mistake with that?

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question about pte_offset_map_lock
  2009-12-17  2:46 Question about pte_offset_map_lock Minchan Kim
@ 2009-12-17  8:01 ` Peter Zijlstra
  2009-12-17 14:49   ` Minchan Kim
  2009-12-17  9:54 ` Hugh Dickins
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2009-12-17  8:01 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-mm, Hugh Dickins, Christoph Lameter, KAMEZAWA Hiroyuki

On Thu, 2009-12-17 at 11:46 +0900, Minchan Kim wrote:
> It may be a dumb question.
> 
> As I read the code of pte_lock, I have a question. 
> Now, there is pte_offset_map_lock following as. 
> 
> #define pte_offset_map_lock(mm, pmd, address, ptlp)     \
> ({                                                      \
>         spinlock_t *__ptl = pte_lockptr(mm, pmd);       \
>         pte_t *__pte = pte_offset_map(pmd, address);    \
>         *(ptlp) = __ptl;                                \
>         spin_lock(__ptl);                               \
>         __pte;                                          \
> })
> 
> Why do we grab the lock after getting __pte?
> Is it possible that __pte might be changed before we grab the spin_lock?
> 
> Some codes in mm checks original pte by pte_same. 
> There are not-checked cases in proc. As looking over the cases,
> It seems no problem. But in future, new user of pte_offset_map_lock 
> could mistake with that?

I think currently mmap_sem serializes all that. Cases like faults that
take the mmap_sem for reading sometimes need the pte validation to check
if they didn't race with another fault etc.

But since mmap_sem is held for reading the vma can't dissapear and the
memory map is stable in the sense that the page tables will be present
(or can be instantiated when needed), since munmap removes the
pagetables for vmas.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question about pte_offset_map_lock
  2009-12-17  8:01 ` Peter Zijlstra
@ 2009-12-17 14:49   ` Minchan Kim
  0 siblings, 0 replies; 5+ messages in thread
From: Minchan Kim @ 2009-12-17 14:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-mm, Hugh Dickins, Christoph Lameter, KAMEZAWA Hiroyuki

Hi, Peter.

On Thu, Dec 17, 2009 at 5:01 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2009-12-17 at 11:46 +0900, Minchan Kim wrote:
>> It may be a dumb question.
>>
>> As I read the code of pte_lock, I have a question.
>> Now, there is pte_offset_map_lock following as.
>>
>> #define pte_offset_map_lock(mm, pmd, address, ptlp)     \
>> ({                                                      \
>>         spinlock_t *__ptl = pte_lockptr(mm, pmd);       \
>>         pte_t *__pte = pte_offset_map(pmd, address);    \
>>         *(ptlp) = __ptl;                                \
>>         spin_lock(__ptl);                               \
>>         __pte;                                          \
>> })
>>
>> Why do we grab the lock after getting __pte?
>> Is it possible that __pte might be changed before we grab the spin_lock?
>>
>> Some codes in mm checks original pte by pte_same.
>> There are not-checked cases in proc. As looking over the cases,
>> It seems no problem. But in future, new user of pte_offset_map_lock
>> could mistake with that?
>
> I think currently mmap_sem serializes all that. Cases like faults that
> take the mmap_sem for reading sometimes need the pte validation to check
> if they didn't race with another fault etc.
>
> But since mmap_sem is held for reading the vma can't dissapear and the
> memory map is stable in the sense that the page tables will be present
> (or can be instantiated when needed), since munmap removes the
> pagetables for vmas.

Thanks for answering dumb question.

It means sometimes pte split lock depends on mmap_sem.
First of all, Shouldn't we remove this dependency by reordering spinlock(__ptl)
in pte_offset_map_lock  for range locking?

>
>



-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question about pte_offset_map_lock
  2009-12-17  2:46 Question about pte_offset_map_lock Minchan Kim
  2009-12-17  8:01 ` Peter Zijlstra
@ 2009-12-17  9:54 ` Hugh Dickins
  2009-12-17 15:02   ` Minchan Kim
  1 sibling, 1 reply; 5+ messages in thread
From: Hugh Dickins @ 2009-12-17  9:54 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, Christoph Lameter, Peter Zijlstra, KAMEZAWA Hiroyuki

On Thu, 17 Dec 2009, Minchan Kim wrote:
> It may be a dumb question.
> 
> As I read the code of pte_lock, I have a question. 
> Now, there is pte_offset_map_lock following as. 
> 
> #define pte_offset_map_lock(mm, pmd, address, ptlp)     \
> ({                                                      \
>         spinlock_t *__ptl = pte_lockptr(mm, pmd);       \
>         pte_t *__pte = pte_offset_map(pmd, address);    \
>         *(ptlp) = __ptl;                                \
>         spin_lock(__ptl);                               \
>         __pte;                                          \
> })
> 
> Why do we grab the lock after getting __pte?
> Is it possible that __pte might be changed before we grab the spin_lock?
> 
> Some codes in mm checks original pte by pte_same. 
> There are not-checked cases in proc. As looking over the cases,
> It seems no problem. But in future, new user of pte_offset_map_lock 
> could mistake with that?

I think you wouldn't be asking the question if we'd called it __ptep.

It's a (perhaps kmap_atomic) pointer into the page table: the virtual
address of a page table entry, not the page table entry itself.

You're right that the entry itself could change before we get the lock,
and pte_same() is what we use to check that an entry is still what we
were expecting; but the containing page table will remain the same,
until munmap() or exit_mmap() at least.

(For completeness, I ought to add that the entry might even change
while we have the lock: accessed and dirty bits could get set by a
racing thread in userspace.  There are places where we have to be
very careful about not missing a dirty bit, but missing an accessed
bit on rare occasions doesn't matter.)

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question about pte_offset_map_lock
  2009-12-17  9:54 ` Hugh Dickins
@ 2009-12-17 15:02   ` Minchan Kim
  0 siblings, 0 replies; 5+ messages in thread
From: Minchan Kim @ 2009-12-17 15:02 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, Christoph Lameter, Peter Zijlstra, KAMEZAWA Hiroyuki

Hi, Hugh.

On Thu, Dec 17, 2009 at 6:54 PM, Hugh Dickins
<hugh.dickins@tiscali.co.uk> wrote:
> On Thu, 17 Dec 2009, Minchan Kim wrote:
>> It may be a dumb question.
>>
>> As I read the code of pte_lock, I have a question.
>> Now, there is pte_offset_map_lock following as.
>>
>> #define pte_offset_map_lock(mm, pmd, address, ptlp)     \
>> ({                                                      \
>>         spinlock_t *__ptl = pte_lockptr(mm, pmd);       \
>>         pte_t *__pte = pte_offset_map(pmd, address);    \
>>         *(ptlp) = __ptl;                                \
>>         spin_lock(__ptl);                               \
>>         __pte;                                          \
>> })
>>
>> Why do we grab the lock after getting __pte?
>> Is it possible that __pte might be changed before we grab the spin_lock?
>>
>> Some codes in mm checks original pte by pte_same.
>> There are not-checked cases in proc. As looking over the cases,
>> It seems no problem. But in future, new user of pte_offset_map_lock
>> could mistake with that?
>
> I think you wouldn't be asking the question if we'd called it __ptep.

Absolutely.

>
> It's a (perhaps kmap_atomic) pointer into the page table: the virtual
> address of a page table entry, not the page table entry itself.
>
> You're right that the entry itself could change before we get the lock,
> and pte_same() is what we use to check that an entry is still what we
> were expecting; but the containing page table will remain the same,
> until munmap() or exit_mmap() at least

Yes, In unmap case, it can be protected by mmap_sem.  :)

>
> (For completeness, I ought to add that the entry might even change
> while we have the lock: accessed and dirty bits could get set by a
> racing thread in userspace.  There are places where we have to be
> very careful about not missing a dirty bit, but missing an accessed
> bit on rare occasions doesn't matter.)

Indeed!.
Thanks! Hugh.

> Hugh
>



-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-12-17 15:04 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-17  2:46 Question about pte_offset_map_lock Minchan Kim
2009-12-17  8:01 ` Peter Zijlstra
2009-12-17 14:49   ` Minchan Kim
2009-12-17  9:54 ` Hugh Dickins
2009-12-17 15:02   ` Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).