* mincore() & fincore()
From: Cédric Villemain @ 2013-07-25 14:58 UTC
To: linux-mm; +Cc: Johannes Weiner

Hello,

First, the changes proposed in this email are intended at least for
PostgreSQL extensions, and maybe for core.

The purpose is to offer better monitoring/tracking of the hot/cold areas
(and read/write patterns) in tables and indexes; by default PostgreSQL
writes those in segments of 1GB.

There are already some possible use cases:

* planning hardware upgrades
* easier configuration (of both PostgreSQL and Linux)
* providing more information to the PostgreSQL planner/executor

My ideas so far are to:

* improve mincore() in Linux and add information to it as FreeBSD does
  (at least adding 'mincore_modified' to track clean vs. dirty pages)
* add fincore() to make the information easier to grab from PostgreSQL
  (no mmap)
* maybe provide some access to those stats in /proc/

libprefetch, mincore() and fincore() have been discussed on Linux mailing
lists for years, and they got good feedback, so I hope it is OK to keep
working on them and provide updated patches.

Johannes, I added you in CC because you're the last one who proposed
something. Should I update your patch?

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
* Re: mincore() & fincore()
From: Cédric Villemain @ 2013-07-25 15:07 UTC
To: linux-mm; +Cc: Johannes Weiner

[sorry, the previous mail was sent earlier than intended]

> First, the changes proposed in this email are intended at least for
> PostgreSQL extensions, and maybe for core.
>
> The purpose is to offer better monitoring/tracking of the hot/cold areas
> (and read/write patterns) in tables and indexes; by default PostgreSQL
> writes those in segments of 1GB.
>
> There are already some possible use cases:
>
> * planning hardware upgrades
> * easier configuration (of both PostgreSQL and Linux)
> * providing more information to the PostgreSQL planner/executor
>
> My ideas so far are to:
>
> * improve mincore() in Linux and add information to it as FreeBSD does
>   (at least adding 'mincore_modified' to track clean vs. dirty pages)
> * add fincore() to make the information easier to grab from PostgreSQL
>   (no mmap)
> * maybe provide some access to those stats in /proc/
>
> libprefetch, mincore() and fincore() have been discussed on Linux mailing
> lists for years, and they got good feedback, so I hope it is OK to keep
> working on them and provide updated patches.

Johannes, I added you in CC because you're the last one who proposed
something. Can I update your patch with the previous suggestions from
reviewers?

I'm also asking for feedback in this area; other ideas are very welcome.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
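For readers unfamiliar with the "no mmap" point in the proposal, here is a minimal sketch of what querying page cache residency of a file looks like with the existing mincore() call: the file has to be mapped first, then mincore() fills one byte per page. The helper name `resident_pages` is hypothetical, the libc call is made through ctypes, and the sketch assumes a Linux system and a file the caller may open read-write. It is an illustration only, not the code from any of the patches discussed in this thread.

```python
import ctypes
import ctypes.util
import mmap
import os


def resident_pages(path):
    """Return one bool per page of `path` (hypothetical helper).

    True means mincore() reported the page as resident in memory.
    """
    size = os.path.getsize(path)
    if size == 0:
        return []

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mincore.argtypes = [
        ctypes.c_void_p,                  # start address of the mapping
        ctypes.c_size_t,                  # length in bytes
        ctypes.POINTER(ctypes.c_ubyte),   # output vector, one byte per page
    ]
    libc.mincore.restype = ctypes.c_int

    npages = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    vec = (ctypes.c_ubyte * npages)()

    with open(path, "r+b") as f:
        # The mapping exists only so that mincore() has an address range
        # to inspect; this is the step a fincore(fd, ...) call would avoid.
        mm = mmap.mmap(f.fileno(), size)
        try:
            buf = ctypes.c_char.from_buffer(mm)
            try:
                if libc.mincore(ctypes.addressof(buf), size, vec) != 0:
                    err = ctypes.get_errno()
                    raise OSError(err, os.strerror(err))
            finally:
                del buf  # release the buffer export before closing the map
        finally:
            mm.close()

    # Only the least significant bit of each byte is defined.
    return [bool(b & 1) for b in vec]
```

With 1GB segments and many relations, mapping every file just to ask about residency is the overhead that a file-descriptor-based fincore() is meant to remove.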
* Re: mincore() & fincore()
From: Johannes Weiner @ 2013-07-25 15:32 UTC
To: Cédric Villemain; +Cc: Andrew Morton, linux-mm

On Thu, Jul 25, 2013 at 05:07:10PM +0200, Cedric Villemain wrote:
> [sorry, the previous mail was sent earlier than intended]
>
> > First, the changes proposed in this email are intended at least for
> > PostgreSQL extensions, and maybe for core.
> >
> > The purpose is to offer better monitoring/tracking of the hot/cold areas
> > (and read/write patterns) in tables and indexes; by default PostgreSQL
> > writes those in segments of 1GB.
> >
> > There are already some possible use cases:
> >
> > * planning hardware upgrades
> > * easier configuration (of both PostgreSQL and Linux)
> > * providing more information to the PostgreSQL planner/executor
> >
> > My ideas so far are to:
> >
> > * improve mincore() in Linux and add information to it as FreeBSD does
> >   (at least adding 'mincore_modified' to track clean vs. dirty pages)
> > * add fincore() to make the information easier to grab from PostgreSQL
> >   (no mmap)
> > * maybe provide some access to those stats in /proc/
> >
> > libprefetch, mincore() and fincore() have been discussed on Linux mailing
> > lists for years, and they got good feedback, so I hope it is OK to keep
> > working on them and provide updated patches.
>
> Johannes, I added you in CC because you're the last one who proposed
> something. Can I update your patch with the previous suggestions from
> reviewers?

Absolutely!

> I'm also asking for feedback in this area; other ideas are very welcome.

Andrew didn't like the one-byte-per-covered-page representation, but all
proposals to express contiguous ranges in a more compact fashion had
worse worst cases and a much more involved interface.

I do wonder if we should model fincore() after mincore() and add a
separate syscall to query page cache coverage with statistical output
(x present [y dirty, z active, whatever] in the specified area) rather
than describing individual pages or contiguous chunks of pages in
address order. That might leave us with better interfaces than trying
to integrate all of this into one arcane syscall.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
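To make the "statistical output" idea concrete, the sketch below reduces a per-page flag vector to the kind of summary Johannes describes (x present, y dirty, z active). The flag bits `PRESENT`, `DIRTY` and `ACTIVE` and the function `summarize` are hypothetical names chosen for illustration; no such interface is defined anywhere in this thread, and a real syscall would compute the counts in the kernel without exposing the vector at all.

```python
# Hypothetical per-page flag bits, for illustration only.
PRESENT = 0x1
DIRTY = 0x2
ACTIVE = 0x4


def summarize(page_flags):
    """Collapse per-page flags into counts for the queried range.

    Dirty and active pages are counted only when they are also present.
    """
    present = [f for f in page_flags if f & PRESENT]
    return {
        "pages": len(page_flags),
        "present": len(present),
        "dirty": sum(1 for f in present if f & DIRTY),
        "active": sum(1 for f in present if f & ACTIVE),
    }


if __name__ == "__main__":
    print(summarize([0, 1, 3, 5, 7, 0]))
```

The summary has a fixed size regardless of how large the queried area is, which is what distinguishes it from both the byte vector and the range-list proposals.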
* Re: mincore() & fincore()
From: Wanpeng Li @ 2013-07-26 1:55 UTC
To: Johannes Weiner; +Cc: Cédric Villemain, Andrew Morton, linux-mm

On Thu, Jul 25, 2013 at 11:32:07AM -0400, Johannes Weiner wrote:
>On Thu, Jul 25, 2013 at 05:07:10PM +0200, Cedric Villemain wrote:
>> [sorry, the previous mail was sent earlier than intended]
>>
>> > First, the changes proposed in this email are intended at least for
>> > PostgreSQL extensions, and maybe for core.
>> >
>> > The purpose is to offer better monitoring/tracking of the hot/cold areas
>> > (and read/write patterns) in tables and indexes; by default PostgreSQL
>> > writes those in segments of 1GB.
>> >
>> > There are already some possible use cases:
>> >
>> > * planning hardware upgrades
>> > * easier configuration (of both PostgreSQL and Linux)
>> > * providing more information to the PostgreSQL planner/executor
>> >
>> > My ideas so far are to:
>> >
>> > * improve mincore() in Linux and add information to it as FreeBSD does
>> >   (at least adding 'mincore_modified' to track clean vs. dirty pages)
>> > * add fincore() to make the information easier to grab from PostgreSQL
>> >   (no mmap)
>> > * maybe provide some access to those stats in /proc/
>> >
>> > libprefetch, mincore() and fincore() have been discussed on Linux mailing
>> > lists for years, and they got good feedback, so I hope it is OK to keep
>> > working on them and provide updated patches.
>>
>> Johannes, I added you in CC because you're the last one who proposed
>> something. Can I update your patch with the previous suggestions from
>> reviewers?
>
>Absolutely!
>
>> I'm also asking for feedback in this area; other ideas are very welcome.
>
>Andrew didn't like the one-byte-per-covered-page representation, but all
>proposals to express contiguous ranges in a

mincore() uses a byte array in which only the least significant bit of
each byte indicates whether the corresponding page is currently resident
in memory. I don't know the history: what is the reason for not using a
bitmap?

>more compact fashion had worse worst cases and a much more involved
>interface.
>
>I do wonder if we should model fincore() after mincore() and add a
>separate syscall to query page cache coverage with statistical output
>(x present [y dirty, z active, whatever] in the specified area) rather
>than describing individual pages or contiguous chunks of pages in
>address order. That might leave us with better interfaces than trying
>to integrate all of this into one arcane syscall.
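The size difference behind the byte-array versus bitmap question can be worked out directly for the 1GB PostgreSQL segments mentioned at the start of the thread. The sketch below assumes 4 KiB pages; `vector_sizes` and `pack_bitmap` are hypothetical helper names used only for this illustration.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages
SEGMENT_SIZE = 1 << 30    # one PostgreSQL segment (1GB)


def vector_sizes(length, page_size=PAGE_SIZE):
    """Return (pages, byte_vector_bytes, bitmap_bytes) for a range."""
    pages = -(-length // page_size)   # ceiling division
    return pages, pages, -(-pages // 8)


def pack_bitmap(vec):
    """Pack a mincore()-style byte vector into a bitmap.

    Only bit 0 of each input byte is kept; page i lands in bit (i % 8)
    of output byte (i // 8).
    """
    bitmap = bytearray((len(vec) + 7) // 8)
    for i, b in enumerate(vec):
        if b & 1:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


if __name__ == "__main__":
    pages, byte_vec, bitmap = vector_sizes(SEGMENT_SIZE)
    print(pages, byte_vec // 1024, bitmap // 1024)
```

Under that assumption a full segment needs 262144 entries: 256 KiB as a byte vector versus 32 KiB as a bitmap. One trade-off worth noting is that a bitmap has no spare bits per page, whereas a byte per page leaves room for extra per-page flags such as the clean/dirty information proposed earlier in the thread.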
* Re: mincore() & fincore()
From: Cédric Villemain @ 2013-07-27 20:08 UTC
To: Wanpeng Li; +Cc: Johannes Weiner, Andrew Morton, linux-mm

> >> Johannes, I added you in CC because you're the last one who proposed
> >> something. Can I update your patch with the previous suggestions from
> >> reviewers?
> >
> >Absolutely!

OK.

> >> I'm also asking for feedback in this area; other ideas are very
> >> welcome.
> >
> >Andrew didn't like the one-byte-per-covered-page representation, but all
> >proposals to express contiguous ranges in a
>
> mincore() uses a byte array in which only the least significant bit of
> each byte indicates whether the corresponding page is currently resident
> in memory. I don't know the history: what is the reason for not using a
> bitmap?
>
> >more compact fashion had worse worst cases and a much more involved
> >interface.
> >
> >I do wonder if we should model fincore() after mincore() and add a
> >separate syscall to query page cache coverage with statistical output
> >(x present [y dirty, z active, whatever] in the specified area) rather
> >than describing individual pages or contiguous chunks of pages in
> >address order. That might leave us with better interfaces than trying
> >to integrate all of this into one arcane syscall.

That should work too. My tool pgfincore (for PostgreSQL) also outputs the
number of groups of contiguous in-memory pages, which gives a quick idea
of the access pattern: from a large number of groups (random) to few
groups (sequential). So for this usage I don't really need the full vector
and page-level information, but some stats are needed to make those sums
useful.

However, another usage is to snapshot/restore in-memory pages, which is
useful in at least two scenarios. The first is a simple server restart:
PostgreSQL is back to full speed faster when you're able to restore the
previous cache content. The second is similar: switching over to a
previously 'cold' server, or preparing a server to receive traffic. For
those use cases, having the per-page details is useful.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
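Both usages described above can be derived from the same per-page vector. The sketch below counts groups of contiguous resident pages (the access-pattern hint) and treats the resulting list of runs as a snapshot that can later be replayed as readahead hints. The names `resident_runs` and `prefetch_runs` are hypothetical, and using `os.posix_fadvise` with `POSIX_FADV_WILLNEED` for the restore step is an assumption made for this illustration, not a description of how pgfincore is implemented.

```python
import os


def resident_runs(vec):
    """Return (first_page, page_count) for each run of resident pages.

    `vec` is a mincore()-style sequence: bit 0 set means resident.
    """
    runs = []
    start = None
    for i, b in enumerate(vec):
        if b & 1:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(vec) - start))
    return runs


def prefetch_runs(fd, runs, page_size=4096):
    """Ask the kernel to read the given runs back into the page cache.

    Assumption: posix_fadvise(WILLNEED) is used as the restore mechanism.
    """
    for first, count in runs:
        os.posix_fadvise(fd, first * page_size, count * page_size,
                         os.POSIX_FADV_WILLNEED)


if __name__ == "__main__":
    runs = resident_runs(bytes([1, 1, 0, 0, 1, 0, 1, 1, 1]))
    print(len(runs), runs)
```

Here `len(runs)` is the group count used as the random-versus-sequential hint, while the runs themselves are the detail that a purely statistical interface would not provide and that snapshot/restore needs.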