* Re: [RFC PATCH] proc: do not include shmem and driver pages in /proc/meminfo::Cached
2016-02-19 6:40 ` Konstantin Khlebnikov
@ 2016-02-19 6:57 ` Konstantin Khlebnikov
2016-02-19 21:13 ` Andrew Morton
2016-02-29 0:02 ` Hugh Dickins
2 siblings, 0 replies; 9+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-19 6:57 UTC
To: Hugh Dickins
Cc: Johannes Weiner, linux-mm@kvack.org, Andrew Morton, Rik van Riel,
Mel Gorman, Linux Kernel Mailing List, kernel-team
On Fri, Feb 19, 2016 at 9:40 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> On Fri, Feb 19, 2016 at 1:57 AM, Hugh Dickins <hughd@google.com> wrote:
>> On Thu, 18 Feb 2016, Johannes Weiner wrote:
>>
>>> Even before we added MemAvailable, users knew that page cache is
>>> easily convertible to free memory on pressure, and estimated their
>>> "available" memory by looking at the sum of MemFree, Cached, Buffers.
>>> However, "Cached" is calculated using NR_FILE_PAGES, which includes
>>> shmem and random driver pages inserted into the page tables; neither
>>> of which are easily reclaimable, or reclaimable at all. Reclaiming
>>> shmem requires swapping, which is slow. And unlike page cache, which
>>> has fairly conservative dirty limits, all of shmem needs to be written
>>> out before becoming evictable. Without swap, shmem is not evictable at
>>> all. And driver pages certainly never are.
>>>
>>> Calling these pages "Cached" is misleading and has resulted in broken
>>> formulas in userspace. They misrepresent the memory situation and
>>> cause either waste or unexpected OOM kills. With 64-bit and per-cpu
>>> memory we are way past the point where the relationship between
>>> virtual and physical memory is meaningful and users can rely on
>>> overcommit protection. OOM kills can not be avoided without wasting
>>> enormous amounts of memory this way. This shifts the management burden
>>> toward userspace, toward applications monitoring their environment and
>>> adjusting their operations. And so where statistics like /proc/meminfo
>>> used to be more informational, we have more and more software relying
>>> on them to make automated decisions based on utilization.
>>>
>>> But if userspace is supposed to take over responsibility, it needs a
>>> clear and accurate kernel interface to base its judgement on. And one
>>> of the requirements is certainly that memory consumers with wildly
>>> different reclaimability are not conflated. Adding MemAvailable is a
>>> good step in that direction, but there is software like Sigar[1] in
>>> circulation that might not get updated anytime soon. And even then,
>>> new users will continue to go for the intuitive interpretation of the
>>> Cached item. We can't blame them. There are years of tradition behind
>>> it, starting with the way free(1) and vmstat(8) have always reported
>>> free, buffers, cached. And try as we might, using "Cached" for
>>> unevictable memory is never going to be obvious.
>>>
>>> The semantics of Cached including shmem and kernel pages have been
>>> this way forever, dictated by the single-LRU implementation rather
>>> than optimal semantics. So it's an uncomfortable proposal to change it
>>> now. But what other way to fix this for existing users? What other way
>>> to make the interface more intuitive for future users? And what could
>>> break by removing it now? I guess somebody who already subtracts Shmem
>>> from Cached.
>>>
>>> What are your thoughts on this?
>>
>> My thoughts are NAK. A misleading stat is not so bad as a
>> misleading stat whose meaning we change in some random kernel.
>>
>> By all means improve Documentation/filesystems/proc.txt on Cached.
>> By all means promote Active(file)+Inactive(file)-Buffers as often a
>> better measure (though Buffers itself is obscure to me - is it intended
>> usually to approximate resident FS metadata?). By all means work on
>> /proc/meminfo-v2 (though that may entail dispiritingly long discussions).
>>
>> We have to assume that Cached has been useful to some people, and that
>> they've learnt to subtract Shmem from it, if slow or no swap concerns them.
>>
>> Added Konstantin to Cc: he's had valuable experience of people learning
>> to adapt to the numbers that we put out.
>>
>
> I think everything will be ok. Subtraction of shmem isn't a widespread practice,
> more like secret knowledge. This wasn't documented and people who use
> this should be aware that this might stop working at any time. So, ACK.
Actually, NR_FILE_PAGES could be retired after that.
There are only a few places where it is used, and it looks easy to replace
with something else, even more accurate.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [RFC PATCH] proc: do not include shmem and driver pages in /proc/meminfo::Cached
2016-02-19 6:40 ` Konstantin Khlebnikov
2016-02-19 6:57 ` Konstantin Khlebnikov
@ 2016-02-19 21:13 ` Andrew Morton
2016-02-29 0:03 ` Hugh Dickins
2016-02-29 0:02 ` Hugh Dickins
2 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2016-02-19 21:13 UTC
To: Konstantin Khlebnikov
Cc: Hugh Dickins, Johannes Weiner, linux-mm@kvack.org, Rik van Riel,
Mel Gorman, Linux Kernel Mailing List, kernel-team
On Fri, 19 Feb 2016 09:40:45 +0300 Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> >> What are your thoughts on this?
> >
> > My thoughts are NAK. A misleading stat is not so bad as a
> > misleading stat whose meaning we change in some random kernel.
> >
> > By all means improve Documentation/filesystems/proc.txt on Cached.
> > By all means promote Active(file)+Inactive(file)-Buffers as often a
> > better measure (though Buffers itself is obscure to me - is it intended
> > usually to approximate resident FS metadata?). By all means work on
> > /proc/meminfo-v2 (though that may entail dispiritingly long discussions).
> >
> > We have to assume that Cached has been useful to some people, and that
> > they've learnt to subtract Shmem from it, if slow or no swap concerns them.
> >
> > Added Konstantin to Cc: he's had valuable experience of people learning
> > to adapt to the numbers that we put out.
> >
>
> I think everything will be ok. Subtraction of shmem isn't a widespread practice,
> more like secret knowledge. This wasn't documented and people who use
> this should be aware that this might stop working at any time. So, ACK.
It worries me as well - we're deliberately altering the behaviour of
existing userspace code. Not all of those alterations will be welcome!
We could add a shiny new field into meminfo and train people to migrate
to that. But that would just be a sum of already-available fields. In
an ideal world we could solve all of this with documentation and
cluebatting (and some apologizing!).
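To make "a sum of already-available fields" concrete, here is a minimal userspace sketch that derives the two alternatives mentioned in this thread (Cached minus Shmem, and Active(file)+Inactive(file)-Buffers) from the existing /proc/meminfo fields. The helper names and the SAMPLE numbers are made up for illustration; nothing here is part of the proposed patch.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def file_cache_estimates(fields):
    """Return three views of "cached" memory, all in kB."""
    cached = fields["Cached"]
    return {
        # What /proc/meminfo reports today (includes shmem).
        "cached_as_reported": cached,
        # The "secret knowledge" correction discussed in the thread.
        "cached_minus_shmem": cached - fields.get("Shmem", 0),
        # The alternative measure suggested earlier in the thread.
        "file_lru_minus_buffers": (
            fields["Active(file)"] + fields["Inactive(file)"] - fields["Buffers"]
        ),
    }


# Hypothetical numbers, used only when /proc/meminfo is not readable.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:         1000000 kB
Buffers:          100000 kB
Cached:          3000000 kB
Active(file):    1150000 kB
Inactive(file):  1000000 kB
Shmem:            900000 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    for name, kb in file_cache_estimates(parse_meminfo(text)).items():
        print(f"{name}: {kb} kB")
```

The two corrected figures are not expected to agree exactly on a real system; the sketch only shows that both can be computed without any new field.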
* Re: [RFC PATCH] proc: do not include shmem and driver pages in /proc/meminfo::Cached
2016-02-19 21:13 ` Andrew Morton
@ 2016-02-29 0:03 ` Hugh Dickins
2016-02-29 7:03 ` Konstantin Khlebnikov
0 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2016-02-29 0:03 UTC
To: Andrew Morton
Cc: Konstantin Khlebnikov, Hugh Dickins, Johannes Weiner,
linux-mm@kvack.org, Rik van Riel, Mel Gorman,
Linux Kernel Mailing List, kernel-team
On Fri, 19 Feb 2016, Andrew Morton wrote:
> On Fri, 19 Feb 2016 09:40:45 +0300 Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>
> > >> What are your thoughts on this?
> > >
> > > My thoughts are NAK. A misleading stat is not so bad as a
> > > misleading stat whose meaning we change in some random kernel.
> > >
> > > By all means improve Documentation/filesystems/proc.txt on Cached.
> > > By all means promote Active(file)+Inactive(file)-Buffers as often a
> > > better measure (though Buffers itself is obscure to me - is it intended
> > > usually to approximate resident FS metadata?). By all means work on
> > > /proc/meminfo-v2 (though that may entail dispiritingly long discussions).
> > >
> > > We have to assume that Cached has been useful to some people, and that
> > > they've learnt to subtract Shmem from it, if slow or no swap concerns them.
> > >
> > > Added Konstantin to Cc: he's had valuable experience of people learning
> > > to adapt to the numbers that we put out.
> > >
> >
> > I think everything will be ok. Subtraction of shmem isn't a widespread practice,
> > more like secret knowledge. This wasn't documented and people who use
> > this should be aware that this might stop working at any time. So, ACK.
>
> It worries me as well - we're deliberately altering the behaviour of
> existing userspace code. Not all of those alterations will be welcome!
>
> We could add a shiny new field into meminfo and train people to migrate
> to that. But that would just be a sum of already-available fields. In
> an ideal world we could solve all of this with documentation and
> cluebatting (and some apologizing!).
Ah, I missed this, and just sent a redundant addition to the thread;
followed by this doubly redundant addition.
Hugh
* Re: [RFC PATCH] proc: do not include shmem and driver pages in /proc/meminfo::Cached
2016-02-29 0:03 ` Hugh Dickins
@ 2016-02-29 7:03 ` Konstantin Khlebnikov
0 siblings, 0 replies; 9+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-29 7:03 UTC
To: Hugh Dickins
Cc: Andrew Morton, Johannes Weiner, linux-mm@kvack.org, Rik van Riel,
Mel Gorman, Linux Kernel Mailing List, kernel-team
On Mon, Feb 29, 2016 at 3:03 AM, Hugh Dickins <hughd@google.com> wrote:
> On Fri, 19 Feb 2016, Andrew Morton wrote:
>> On Fri, 19 Feb 2016 09:40:45 +0300 Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>>
>> > >> What are your thoughts on this?
>> > >
>> > > My thoughts are NAK. A misleading stat is not so bad as a
>> > > misleading stat whose meaning we change in some random kernel.
>> > >
>> > > By all means improve Documentation/filesystems/proc.txt on Cached.
>> > > By all means promote Active(file)+Inactive(file)-Buffers as often a
>> > > better measure (though Buffers itself is obscure to me - is it intended
>> > > usually to approximate resident FS metadata?). By all means work on
>> > > /proc/meminfo-v2 (though that may entail dispiritingly long discussions).
>> > >
>> > > We have to assume that Cached has been useful to some people, and that
>> > > they've learnt to subtract Shmem from it, if slow or no swap concerns them.
>> > >
>> > > Added Konstantin to Cc: he's had valuable experience of people learning
>> > > to adapt to the numbers that we put out.
>> > >
>> >
>> > I think everything will be ok. Subtraction of shmem isn't a widespread practice,
>> > more like secret knowledge. This wasn't documented and people who use
>> > this should be aware that this might stop working at any time. So, ACK.
>>
>> It worries me as well - we're deliberately altering the behaviour of
>> existing userspace code. Not all of those alterations will be welcome!
>>
>> We could add a shiny new field into meminfo and train people to migrate
>> to that. But that would just be a sum of already-available fields. In
>> an ideal world we could solve all of this with documentation and
>> cluebatting (and some apologizing!).
>
> Ah, I missed this, and just sent a redundant addition to the thread;
> followed by this doubly redundant addition.
"Cached" has been used for ages as the amount of "potentially free memory".
This patch corrects it to its original meaning and makes it closer to that
"potential" meaning at the same time.
MemAvailable means exactly that and nothing else, so the logic behind it could be
tuned and changed in the future. Thus, adding new fields makes no sense.
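As a rough illustration of that point, a userspace tool could prefer MemAvailable whenever the kernel provides it and only fall back to a hand-rolled formula on older kernels. The sketch below assumes a fallback of MemFree + Buffers + Cached - Shmem; that formula and the function names are illustrative assumptions, not something this patch defines.

```python
def read_kb(text, key):
    """Return the integer value of `key` in /proc/meminfo-style text, or None."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and name.strip() == key and parts:
            return int(parts[0])
    return None


def available_kb(text):
    """Prefer MemAvailable; otherwise use an assumed legacy estimate."""
    avail = read_kb(text, "MemAvailable")
    if avail is not None:
        return avail, "MemAvailable"
    free = read_kb(text, "MemFree") or 0
    buffers = read_kb(text, "Buffers") or 0
    cached = read_kb(text, "Cached") or 0
    shmem = read_kb(text, "Shmem") or 0
    # Assumed fallback: the traditional sum with Shmem subtracted by hand.
    return free + buffers + cached - shmem, "fallback"


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            value, source = available_kb(f.read())
        print(f"{value} kB (from {source})")
    except OSError:
        print("/proc/meminfo is not readable on this system")
```

Note that the fallback branch is exactly the kind of code that would subtract Shmem twice if Cached stopped including it, which is the compatibility concern raised elsewhere in this thread.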
BTW, glibc recently switched sysconf(_SC_PHYS_PAGES) / sysconf(_SC_AVPHYS_PAGES)
from /proc/meminfo MemTotal / MemFree to sysinfo(2) totalram / freeram for
performance reasons. It seems possible to expose MemAvailable via sysinfo:
there is space for one field. It is probably also possible to switch
_SC_AVPHYS_PAGES to really available memory and add memcg awareness too.
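For anyone who wants to see what those sysconf values look like today, here is a small sketch using Python's os.sysconf wrapper around the same libc call. The helper names are made up for illustration, and the comment about free versus available memory restates the description above rather than anything verified against a particular glibc version.

```python
import os


def pages_to_kib(pages, page_size):
    """Convert a page count to KiB for a given page size in bytes."""
    return pages * page_size // 1024


def sysconf_memory_kib():
    """Return (total_kib, avphys_kib) from sysconf, or None if unsupported."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        total_pages = os.sysconf("SC_PHYS_PAGES")
        # As described above, this currently tracks free memory,
        # not MemAvailable.
        avphys_pages = os.sysconf("SC_AVPHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        return None
    return (
        pages_to_kib(total_pages, page_size),
        pages_to_kib(avphys_pages, page_size),
    )


if __name__ == "__main__":
    result = sysconf_memory_kib()
    if result is None:
        print("sysconf memory values are not available on this platform")
    else:
        print(f"total: {result[0]} KiB, avphys: {result[1]} KiB")
```

Comparing the second value with MemAvailable from /proc/meminfo on the same machine shows how far "free" and "really available" can diverge.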
* Re: [RFC PATCH] proc: do not include shmem and driver pages in /proc/meminfo::Cached
2016-02-19 6:40 ` Konstantin Khlebnikov
2016-02-19 6:57 ` Konstantin Khlebnikov
2016-02-19 21:13 ` Andrew Morton
@ 2016-02-29 0:02 ` Hugh Dickins
2 siblings, 0 replies; 9+ messages in thread
From: Hugh Dickins @ 2016-02-29 0:02 UTC
To: Konstantin Khlebnikov
Cc: Hugh Dickins, Johannes Weiner, linux-mm@kvack.org, Andrew Morton,
Rik van Riel, Mel Gorman, Linux Kernel Mailing List, kernel-team
On Fri, 19 Feb 2016, Konstantin Khlebnikov wrote:
> On Fri, Feb 19, 2016 at 1:57 AM, Hugh Dickins <hughd@google.com> wrote:
> > On Thu, 18 Feb 2016, Johannes Weiner wrote:
> >
> >> Even before we added MemAvailable, users knew that page cache is
> >> easily convertible to free memory on pressure, and estimated their
> >> "available" memory by looking at the sum of MemFree, Cached, Buffers.
> >> However, "Cached" is calculated using NR_FILE_PAGES, which includes
> >> shmem and random driver pages inserted into the page tables; neither
> >> of which are easily reclaimable, or reclaimable at all. Reclaiming
> >> shmem requires swapping, which is slow. And unlike page cache, which
> >> has fairly conservative dirty limits, all of shmem needs to be written
> >> out before becoming evictable. Without swap, shmem is not evictable at
> >> all. And driver pages certainly never are.
> >>
> >> Calling these pages "Cached" is misleading and has resulted in broken
> >> formulas in userspace. They misrepresent the memory situation and
> >> cause either waste or unexpected OOM kills. With 64-bit and per-cpu
> >> memory we are way past the point where the relationship between
> >> virtual and physical memory is meaningful and users can rely on
> >> overcommit protection. OOM kills can not be avoided without wasting
> >> enormous amounts of memory this way. This shifts the management burden
> >> toward userspace, toward applications monitoring their environment and
> >> adjusting their operations. And so where statistics like /proc/meminfo
> >> used to be more informational, we have more and more software relying
> >> on them to make automated decisions based on utilization.
> >>
> >> But if userspace is supposed to take over responsibility, it needs a
> >> clear and accurate kernel interface to base its judgement on. And one
> >> of the requirements is certainly that memory consumers with wildly
> >> different reclaimability are not conflated. Adding MemAvailable is a
> >> good step in that direction, but there is software like Sigar[1] in
> >> circulation that might not get updated anytime soon. And even then,
> >> new users will continue to go for the intuitive interpretation of the
> >> Cached item. We can't blame them. There are years of tradition behind
> >> it, starting with the way free(1) and vmstat(8) have always reported
> >> free, buffers, cached. And try as we might, using "Cached" for
> >> unevictable memory is never going to be obvious.
> >>
> >> The semantics of Cached including shmem and kernel pages have been
> >> this way forever, dictated by the single-LRU implementation rather
> >> than optimal semantics. So it's an uncomfortable proposal to change it
> >> now. But what other way to fix this for existing users? What other way
> >> to make the interface more intuitive for future users? And what could
> >> break by removing it now? I guess somebody who already subtracts Shmem
> >> from Cached.
> >>
> >> What are your thoughts on this?
> >
> > My thoughts are NAK. A misleading stat is not so bad as a
> > misleading stat whose meaning we change in some random kernel.
> >
> > By all means improve Documentation/filesystems/proc.txt on Cached.
> > By all means promote Active(file)+Inactive(file)-Buffers as often a
> > better measure (though Buffers itself is obscure to me - is it intended
> > usually to approximate resident FS metadata?). By all means work on
> > /proc/meminfo-v2 (though that may entail dispiritingly long discussions).
> >
> > We have to assume that Cached has been useful to some people, and that
> > they've learnt to subtract Shmem from it, if slow or no swap concerns them.
> >
> > Added Konstantin to Cc: he's had valuable experience of people learning
> > to adapt to the numbers that we put out.
> >
>
> I think everything will be ok. Subtraction of shmem isn't a widespread practice,
> more like secret knowledge. This wasn't documented and people who use
> this should be aware that this might stop working at any time. So, ACK.
I'll take your ACK as cancelling my NAK then; but I do still remain
uncomfortable with such a change - I think "we" would do much better
to add fields with the necessary missing information to /proc/meminfo,
than mess around with the meaning of existing fields. But if I'm the
only one who thinks that way, ignore me.
Hugh