* [LSF/MM TOPIC] Persistent Memory
From: Matthew Wilcox @ 2013-12-20 17:05 UTC
To: lsf-pc; +Cc: linux-fsdevel, linux-mm
I should like to discuss the current situation with Linux support for
persistent memory. While I expect the current discussion to be long
over by March, I am certain that there will be topics around persistent
memory that have not been settled at that point.
I believe this will mostly be of crossover interest between filesystem
and MM people, and of lesser interest to storage people (since we're
basically avoiding their code).
Subtopics might include
- Using persistent memory for FS metadata
(The XIP code provides persistent memory to userspace. The filesystem
still uses BIOs to fetch its metadata)
- Supporting PMD/PGD mappings for userspace
(Not only does the filesystem have to avoid fragmentation to make this
happen, the VM code has to permit these giant mappings)
- Persistent page cache
(Another way to take advantage of persistent memory would be to place it
in the page cache. But we don't have struct pages for it! What to do?)
- Making XIP and non-XIP codepaths closer to each other
(I think we have a good start on this, but more is needed)
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
* Re: [Lsf-pc] [LSF/MM TOPIC] Persistent Memory
From: Mel Gorman @ 2014-01-08 15:42 UTC
To: Matthew Wilcox; +Cc: lsf-pc, linux-fsdevel, linux-mm
On Fri, Dec 20, 2013 at 10:05:02AM -0700, Matthew Wilcox wrote:
>
> I should like to discuss the current situation with Linux support for
> persistent memory. While I expect the current discussion to be long
> over by March, I am certain that there will be topics around persistent
> memory that have not been settled at that point.
>
> I believe this will mostly be of crossover interest between filesystem
> and MM people, and of lesser interest to storage people (since we're
> basically avoiding their code).
>
> Subtopics might include
> - Using persistent memory for FS metadata
> (The XIP code provides persistent memory to userspace. The filesystem
> still uses BIOs to fetch its metadata)
> - Supporting PMD/PGD mappings for userspace
> (Not only does the filesystem have to avoid fragmentation to make this
> happen, the VM code has to permit these giant mappings)
The filesystem would also have to correctly align the data on disk. All
this implies that the underlying device is byte-addressable, has access
speeds similar to RAM, and is directly accessible from userspace without
the kernel being involved. Without those conditions, I find it hard to
believe that TLB pressure dominates access cost. Then again, I have no
experience with the devices or their intended use case, so I would not
mind an education.
However, if you really wanted the device to be accessible like this then
the shortest solution (and I want to punch myself for even suggesting
this) is to extend hugetlbfs to directly access these devices. It's
almost certainly a bad direction to take, though; there would need to be a
good justification for it. Anything in this direction pushes usage of
persistent devices to userspace while the kernel just provides an
interface. Maybe that is desirable, maybe not.
> - Persistent page cache
> (Another way to take advantage of persistent memory would be to place it
> in the page cache. But we don't have struct pages for it! What to do?)
I don't think the struct pages are really the problem here. Minimally you could
bodge it by creating a pgdat structure and allocating the struct pages for it
similar to how RAM is initialised. However, it completely sucks as a solution
because it causes all sorts of cache management problems, particularly page
aging inversion problems when treated as memory like this. The resulting
API for userspace would hurt. Think of NUMA problems, but much
much worse. Don't do this. The only reason I mention it is because so many
people seem to think it's a great solution at first glance.
Even considering the solution raises the question of "why". Sure, page cache
would be persistent across reboots but the information is readily available
on disk and if the data is read-mostly then who cares. If it's read/write,
making it persistent across a reboot will not improve overall performance. I
can see the need for some data to be persisted across a reboot (application
checkpoint, suspend/resume, crash data, something like bcache even if
sufficiently motivated) but none of that requires page cache support as such.
I'll throw my hands up and say that my lack of familiarity with the
expected use cases handicaps me. We can twist the VM into all sorts of
circles but it'd be nice to know more about *why* we are doing something
before worrying about the how. Maybe I'm the only VM person that suffers
from this particular problem in which case I would appreciate being
pointed in a sensible direction some time before LSF/MM.
--
Mel Gorman
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [Lsf-pc] [LSF/MM TOPIC] Persistent Memory
From: Bob Liu @ 2014-01-09 1:35 UTC
To: Mel Gorman, Matthew Wilcox; +Cc: lsf-pc, linux-fsdevel, linux-mm
On 01/08/2014 11:42 PM, Mel Gorman wrote:
> On Fri, Dec 20, 2013 at 10:05:02AM -0700, Matthew Wilcox wrote:
>>
>> I should like to discuss the current situation with Linux support for
>> persistent memory. While I expect the current discussion to be long
>> over by March, I am certain that there will be topics around persistent
>> memory that have not been settled at that point.
>>
>> I believe this will mostly be of crossover interest between filesystem
>> and MM people, and of lesser interest to storage people (since we're
>> basically avoiding their code).
>>
>> Subtopics might include
>> - Using persistent memory for FS metadata
>> (The XIP code provides persistent memory to userspace. The filesystem
>> still uses BIOs to fetch its metadata)
>> - Supporting PMD/PGD mappings for userspace
>> (Not only does the filesystem have to avoid fragmentation to make this
>> happen, the VM code has to permit these giant mappings)
>
> The filesystem would also have to correctly align the data on disk. All
> this implies that the underlying device is byte-addressable, has access
> speeds similar to RAM, and is directly accessible from userspace without
> the kernel being involved. Without those conditions, I find it hard to
> believe that TLB pressure dominates access cost. Then again, I have no
> experience with the devices or their intended use case, so I would not
> mind an education.
>
> However, if you really wanted the device to be accessible like this then
> the shortest solution (and I want to punch myself for even suggesting
> this) is to extend hugetlbfs to directly access these devices. It's
> almost certainly a bad direction to take, though; there would need to be a
> good justification for it. Anything in this direction pushes usage of
> persistent devices to userspace while the kernel just provides an
> interface. Maybe that is desirable, maybe not.
>
>> - Persistent page cache
>> (Another way to take advantage of persistent memory would be to place it
>> in the page cache. But we don't have struct pages for it! What to do?)
>
I think one potential way is to use persistent memory as a second-level
clean page cache through the cleancache API.
--
Regards,
-Bob
* Re: [Lsf-pc] [LSF/MM TOPIC] Persistent Memory
From: Mel Gorman @ 2014-01-09 10:37 UTC
To: Bob Liu; +Cc: Matthew Wilcox, lsf-pc, linux-fsdevel, linux-mm
On Thu, Jan 09, 2014 at 09:35:45AM +0800, Bob Liu wrote:
>
> On 01/08/2014 11:42 PM, Mel Gorman wrote:
> > On Fri, Dec 20, 2013 at 10:05:02AM -0700, Matthew Wilcox wrote:
> >>
> >> I should like to discuss the current situation with Linux support for
> >> persistent memory. While I expect the current discussion to be long
> >> over by March, I am certain that there will be topics around persistent
> >> memory that have not been settled at that point.
> >>
> >> I believe this will mostly be of crossover interest between filesystem
> >> and MM people, and of lesser interest to storage people (since we're
> >> basically avoiding their code).
> >>
> >> Subtopics might include
> >> - Using persistent memory for FS metadata
> >> (The XIP code provides persistent memory to userspace. The filesystem
> >> still uses BIOs to fetch its metadata)
> >> - Supporting PMD/PGD mappings for userspace
> >> (Not only does the filesystem have to avoid fragmentation to make this
> >> happen, the VM code has to permit these giant mappings)
> >
> > The filesystem would also have to correctly align the data on disk. All
> > this implies that the underlying device is byte-addressable, has access
> > speeds similar to RAM, and is directly accessible from userspace without
> > the kernel being involved. Without those conditions, I find it hard to
> > believe that TLB pressure dominates access cost. Then again, I have no
> > experience with the devices or their intended use case, so I would not
> > mind an education.
> >
> > However, if you really wanted the device to be accessible like this then
> > the shortest solution (and I want to punch myself for even suggesting
> > this) is to extend hugetlbfs to directly access these devices. It's
> > almost certainly a bad direction to take, though; there would need to be a
> > good justification for it. Anything in this direction pushes usage of
> > persistent devices to userspace while the kernel just provides an
> > interface. Maybe that is desirable, maybe not.
> >
> >> - Persistent page cache
> >> (Another way to take advantage of persistent memory would be to place it
> >> in the page cache. But we don't have struct pages for it! What to do?)
> >
>
> I think one potential way is to use persistent memory as a second-level
> clean page cache through the cleancache API.
>
Cleancache is inherently read-mostly. What is the motivation for persisting
that across a reboot when it's much easier to just read it once after
reboot? It seems like a lot of complexity for marginal gain that only
exists very early in the lifetime of the system. There appears to be some
mixing between the use cases for fast storage and persistent memory when
they have different purposes.
I would understand a use case whereby persistent memory was used for
filesystem journals so they could be quickly updated and replayed on power
failures, but that would not need PMD/PGD mapping support or extensive VM
support.
--
Mel Gorman
SUSE Labs