linux-arm-kernel.lists.infradead.org archive mirror
* Why does unmap_area_sections() depend on !CONFIG_SMP?
@ 2013-10-01  9:59 Joonsoo Kim
  2013-10-01 18:23 ` Nicolas Pitre
  0 siblings, 1 reply; 3+ messages in thread
From: Joonsoo Kim @ 2013-10-01  9:59 UTC (permalink / raw)
  To: linux-arm-kernel

Hello, Russell.

I looked at the ioremap code in the ARM tree and found that
unmap_area_sections() is enabled only if !CONFIG_SMP. I can't understand
the comment above this function, which comes from you. Could you
elaborate on this?

I guess that the flush_cache_vunmap() before clearing the page table and
the flush_tlb_kernel_range() after clearing it are sufficient to ensure
cache consistency regardless of the CONFIG_SMP configuration. I think the
4K vunmap() path also relies on this flushing logic.

Please let me know what I am missing here.

Thanks.


* Why does unmap_area_sections() depend on !CONFIG_SMP?
  2013-10-01  9:59 Why does unmap_area_sections() depend on !CONFIG_SMP? Joonsoo Kim
@ 2013-10-01 18:23 ` Nicolas Pitre
  2013-10-02  5:05   ` 김준수
  0 siblings, 1 reply; 3+ messages in thread
From: Nicolas Pitre @ 2013-10-01 18:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 1 Oct 2013, Joonsoo Kim wrote:

> Hello, Russell.
> 
> I looked at the ioremap code in the ARM tree and found that
> unmap_area_sections() is enabled only if !CONFIG_SMP. I can't understand
> the comment above this function, which comes from you. Could you
> elaborate on this?
> 
> I guess that the flush_cache_vunmap() before clearing the page table and
> the flush_tlb_kernel_range() after clearing it are sufficient to ensure
> cache consistency regardless of the CONFIG_SMP configuration. I think the
> 4K vunmap() path also relies on this flushing logic.
> 
> Please let me know what I am missing here.

This is all related to the page table level involved.

Each entry in the first level page table may refer to a second level 
page table covering 1MB worth of virtual space, or it may be a direct 
mapping corresponding to 1MB of contiguous physical memory.

In Linux, all tasks, including kernel threads, have their own first 
level page table.  On process creation, the top entries covering 
TASK_SIZE and above in the first level page table are copied from 
init_mm into the new page table, as the kernel address space is meant to 
be identical across all tasks.

This is however not always the case.  Consider a call to ioremap(), 
which creates a new entry in the kernel virtual space.  In order to 
ensure that the kernel virtual space is indeed the same across all 
tasks, the ioremap code would have to walk the entire task list just to 
update each task's copy of the kernel virtual mapping.  So what we do 
instead is to create the new page table entry in init_mm only, and 
lazily update the other tasks' page tables when they fault on access 
because their own page tables are incomplete.

What about iounmap(), then?  When a mapping is removed, we don't want it 
to be accessible through some random task's page table.  Well, in the 
normal ioremap() case, the actual mapping is created into a second level 
page table which happens to be common to all tasks.  Hence the first 
level page table entry being created is actually a pointer to that 
second level page table, and when a mapping is removed it is only 
removed from that second level page table.  The second level table 
itself remains in memory forever, ready to be reused for any other call 
to ioremap().  Therefore there is no need to update each task's first 
level table again.

So far so good.

Now comes the section mapping for ioremap().  Since this is handled in 
the first level page table only, with no common second level table, we 
needed a mechanism to ensure that any mapping removal gets propagated to 
all first level tables in the system.  This is accomplished with a 
sequence counter, namely vmalloc_seq, which is incremented whenever such 
a change occurs.  Upon every task switch, this counter is checked 
against the master copy to detect when the next task to be scheduled has 
its first level page table out of date, and if so it is updated before 
the new memory context is instated.

But... this works only without SMP.  On SMP, different tasks might be 
running on the other CPUs, and incrementing vmalloc_seq won't have any 
effect on them.  This is why section mappings for ioremap() are not 
available on SMP.

This could probably be fixed by sending an IPI to the other processors, 
forcing them to resync their page table right after clearing the mapping 
from the master table.  But no one implemented it so far.


Nicolas


* Why does unmap_area_sections() depend on !CONFIG_SMP?
  2013-10-01 18:23 ` Nicolas Pitre
@ 2013-10-02  5:05   ` 김준수
  0 siblings, 0 replies; 3+ messages in thread
From: 김준수 @ 2013-10-02  5:05 UTC (permalink / raw)
  To: linux-arm-kernel



> -----Original Message-----
> From: Nicolas Pitre [mailto:nicolas.pitre at linaro.org]
> Sent: Wednesday, October 02, 2013 3:24 AM
> To: Joonsoo Kim
> Cc: Russell King; linux-arm-kernel at lists.infradead.org
> Subject: Re: Why does unmap_area_sections() depend on !CONFIG_SMP?
> 
> On Tue, 1 Oct 2013, Joonsoo Kim wrote:
> 
> > Hello, Russell.
> >
> > I looked at the ioremap code in the ARM tree and found that
> > unmap_area_sections() is enabled only if !CONFIG_SMP. I can't
> > understand the comment above this function, which comes from you.
> > Could you elaborate on this?
> >
> > I guess that the flush_cache_vunmap() before clearing the page table
> > and the flush_tlb_kernel_range() after clearing it are sufficient to
> > ensure cache consistency regardless of the CONFIG_SMP configuration. I
> > think the 4K vunmap() path also relies on this flushing logic.
> >
> > Please let me know what I am missing here.
> 
> This is all related to the page table level involved.
> 
> Each entry in the first level page table may refer to a second level page
> table covering 1MB worth of virtual space, or it may be a direct mapping
> corresponding to 1MB of contiguous physical memory.
> 
> In Linux, all tasks, including kernel threads, have their own first level
> page table.  On process creation, the top entries covering TASK_SIZE and
> above in the first level page table are copied from init_mm into the new
> page table, as the kernel address space is meant to be identical across
> all tasks.
> 
> This is however not always the case.  Consider a call to ioremap(),
> which creates a new entry in the kernel virtual space.
> In order to ensure that the kernel virtual space is indeed the same across
> all tasks, the ioremap code would have to walk the entire task list just
> to update each task's copy of the kernel virtual mapping.  So what we do
> instead is to create the new page table entry in init_mm only, and lazily
> update the other tasks' page tables when they fault on access because
> their own page tables are incomplete.
> 
> What about iounmap(), then?  When a mapping is removed, we don't want it to
> be accessible through some random task's page table.  Well, in the normal
> ioremap() case, the actual mapping is created into a second level page
> table which happens to be common to all tasks.  Hence the first level page
> table entry being created is actually a pointer to that second level page
> table, and when a mapping is removed it is only removed from that second
> level page table.  The second level table itself remains in memory forever,
> ready to be reused for any other call to ioremap().  Therefore there is no
> need to update each task's first level table again.
> 
> So far so good.
> 
> Now comes the section mapping for ioremap().  Since this is handled in
> the first level page table only, with no common second level table, we
> needed a mechanism to ensure that any mapping removal gets propagated to
> all first level tables in the system.  This is accomplished with a
> sequence counter, namely vmalloc_seq, which is incremented whenever such a
> change occurs.  Upon every task switch, this counter is checked against
> the master copy to detect when the next task to be scheduled has its first
> level page table out of date, and if so it is updated before the new
> memory context is instated.
> 
> But... this works only without SMP.  On SMP, different tasks might be
> running on the other CPUs, and incrementing vmalloc_seq won't have any
> effect on them.  This is why section mappings for ioremap() are not
> available on SMP.
> 
> This could probably be fixed by sending an IPI to the other processors,
> forcing them to resync their page table right after clearing the mapping
> from the master table.  But no one implemented it so far.

Hello, Nicolas.

Thanks very much for the kind explanation.
Now I fully understand why it is not available on SMP.
I will investigate this further and try to make the section mapping
for ioremap() work on SMP.

Thanks.

