* Re: dynamic swap prioritizing
@ 2001-10-10 15:23 Venkatesh Ramamurthy
2001-10-10 15:55 ` 'adilger@turbolabs.com'
0 siblings, 1 reply; 7+ messages in thread
From: Venkatesh Ramamurthy @ 2001-10-10 15:23 UTC
To: 'adilger@turbolabs.com', 'xuan--lkml@baldauf.org'
Cc: 'linux-kernel@vger.kernel.org'
> If this is to be generally useful, it would be good to find things
> like max sequential read speed, max sequential write speed, and max
> seek time (at least). Estimates for max sequential read speed and
> seek time could be found at boot time for each disk relatively
> easily, but write speed may have to be found only at runtime (or
> it could all be fed in to the kernel from user space from benchmarks
> run previously).
Maybe we can find out the statistics for the first time (or when swap is
created) and store this information in the swap partition itself. This would
allow us to compute time consuming statistics only once. Also we need to
create new fields in the swap structure for this purpose.

* Re: dynamic swap prioritizing
2001-10-10 15:23 dynamic swap prioritizing Venkatesh Ramamurthy
@ 2001-10-10 15:55 ` 'adilger@turbolabs.com'
2001-10-10 17:14 ` Richard B. Johnson
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: 'adilger@turbolabs.com' @ 2001-10-10 15:55 UTC
To: Venkatesh Ramamurthy
Cc: 'xuan--lkml@baldauf.org', 'linux-kernel@vger.kernel.org'

On Oct 10, 2001 11:23 -0400, Venkatesh Ramamurthy wrote:
> > If this is to be generally useful, it would be good to find things
> > like max sequential read speed, max sequential write speed, and max
> > seek time (at least). Estimates for max sequential read speed and
> > seek time could be found at boot time for each disk relatively
> > easily, but write speed may have to be found only at runtime (or
> > it could all be fed in to the kernel from user space from benchmarks
> > run previously).
>
> Maybe we can find out the statistics for the first time (or when swap is
> created) and store this information in the swap partition itself. This would
> allow us to compute time consuming statistics only once. Also we need to
> create new fields in the swap structure for this purpose.

I'd rather just have the statistic data in a regular file for ALL disks,
and then send it to the kernel via ioctl or write to a special file that
the kernel will read from. I don't think it is critical to have this
data right at boot time, since it would only be used for optimizing I/O
access and would not be required for a disk to actually work.

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

* Re: dynamic swap prioritizing
2001-10-10 15:55 ` 'adilger@turbolabs.com'
@ 2001-10-10 17:14 ` Richard B. Johnson
2001-10-11 11:30 ` OO swap interface David Nicol
2001-10-12 0:45 ` dynamic swap prioritizing Xuan Baldauf
2 siblings, 0 replies; 7+ messages in thread
From: Richard B. Johnson @ 2001-10-10 17:14 UTC
To: 'adilger@turbolabs.com'
Cc: Venkatesh Ramamurthy, 'xuan--lkml@baldauf.org', 'linux-kernel@vger.kernel.org'

I think that when the kernel needs a page of memory, it needs it NOW!
Swapping a dirty page to other in-memory pages wastes the very page(s)
that you need. The contents of a least-recently used page should be
written to the swap device to free it for immediate use, regardless
of the disk-write speed, and regardless of how close the kernel thinks
the heads are to some track. Other stuff, like "prioritizing", wastes
the resources you are trying to obtain.

Further, all attempts so far to use "elevator" algorithms to speed
disk access fail to provide any measurable improvements in anything.
In fact, buffering until the data will fit on a nearby track wastes
memory pages and the CPU resources necessary to manage them.

In the days when CPUs were slow, memory was scarce, and I/O was
at a crawl, Digital made a VMS system that worked. Using the same
kind of memory handling should be superb nowadays.

(1) You keep a page of zeroed data. This is used by all processes
    for new buffers. A single page handles all. Reads are always
    allowed. Writes cause a page-fault. This is called demand-zero
    paging.

(2) Pages used for shared file mapping are kept in real memory as
    long as possible (run-time libraries).

(3) All other pages are available for swapping. The page-stealer
    grabs the least-recently used pages from sleeping processes
    first. Tasks that are waiting for I/O are the next to have
    their least-recently used pages stolen. Tasks that are waiting
    for kernel services are the last to have their least-recently
    used pages swiped.

The Linux kernel is not a task, so "waiting for kernel services" is
not valid here. Everything else is. Not every machine runs with
gigahertz processors where the CPU overhead of keeping track of pages
is in the noise. Additionally, prioritizing based upon some "goodness"
puts policy in the kernel.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

    I was going to compile a list of innovations that could be
    attributed to Microsoft. Once I realized that Ctrl-Alt-Del
    was handled in the BIOS, I found that there aren't any.
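
The page-stealer ordering in point (3) of the message above can be sketched in C roughly as follows. This is only an illustration of the ordering rule, not VMS or Linux code: the names `task_state`, `page_info` and `pick_victim` are hypothetical, and a real implementation would walk LRU lists rather than scan an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task states; a lower value is stolen from first. */
enum task_state {
    TASK_SLEEPING = 0,     /* stolen from first */
    TASK_WAIT_IO = 1,      /* stolen from next */
    TASK_WAIT_KERNEL = 2   /* stolen from last */
};

struct page_info {
    enum task_state owner_state; /* state of the task owning the page */
    unsigned long last_used;     /* smaller value = used longer ago */
    int shared_file_mapping;     /* nonzero: rule (2), keep resident */
};

/*
 * Return the index of the page to steal, or -1 if no page is eligible.
 * Pages are ranked by owner state first and by age second.
 */
int pick_victim(const struct page_info *pages, size_t n)
{
    int best = -1;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pages[i].shared_file_mapping)
            continue; /* assumption: shared mappings are never stolen here */
        if (best < 0 ||
            pages[i].owner_state < pages[best].owner_state ||
            (pages[i].owner_state == pages[best].owner_state &&
             pages[i].last_used < pages[best].last_used))
            best = (int)i;
    }
    return best;
}
```

Skipping shared file mappings outright is a simplification of "kept in real memory as long as possible"; a fuller version would fall back to them once nothing else is left.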

* OO swap interface
2001-10-10 15:55 ` 'adilger@turbolabs.com'
2001-10-10 17:14 ` Richard B. Johnson
@ 2001-10-11 11:30 ` David Nicol
2001-10-12 0:45 ` dynamic swap prioritizing Xuan Baldauf
2 siblings, 0 replies; 7+ messages in thread
From: David Nicol @ 2001-10-11 11:30 UTC
To: linux-kernel@vger.kernel.org

Here's an idea that has been mulling under my mullet for the
last few weeks: the nbd can be used for a swap device, but since
swap has no reason to inform the drive about what parts of it are
free, it is not possible to have a central nbd server overcommit
for multiple client swapping nodes.

Therefore I wonder how tricky it would be to create a swap interface
that is ignorant of disk geometries.

The swap interface language would accept requests for space, with
unique handles, and would return the swapped-out data on presentation
of the handle. Like a virtual memory hat check.

--
David Nicol 816.235.1187
1,3,7-trimethylxanthine
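
A toy version of the "hat check" idea above might look like this in C. It is a minimal in-memory sketch under stated assumptions: `hat_check`, `hat_check_put`, `hat_check_take`, `SWAP_PAGE_SIZE` and `STORE_SLOTS` are invented for illustration and do not correspond to any nbd or kernel interface. The point it shows is that the store, not the client, knows which slots are free.

```c
#include <assert.h>
#include <string.h>

#define SWAP_PAGE_SIZE 4096  /* assumed page size for the sketch */
#define STORE_SLOTS 8        /* assumed capacity of the toy store */

struct hat_check {
    unsigned char data[STORE_SLOTS][SWAP_PAGE_SIZE];
    int in_use[STORE_SLOTS]; /* the store tracks free space itself */
};

/* Hand a page to the store; returns a handle >= 0, or -1 if it is full. */
int hat_check_put(struct hat_check *s, const unsigned char *page)
{
    assert(s != NULL && page != NULL);
    for (int h = 0; h < STORE_SLOTS; h++) {
        if (!s->in_use[h]) {
            memcpy(s->data[h], page, SWAP_PAGE_SIZE);
            s->in_use[h] = 1;
            return h;
        }
    }
    return -1;
}

/*
 * Present a handle to get the page back. The slot is released, so the
 * store can reuse it for another client. Returns 0 on success, -1 if
 * the handle is not valid.
 */
int hat_check_take(struct hat_check *s, int handle, unsigned char *page)
{
    assert(s != NULL && page != NULL);
    if (handle < 0 || handle >= STORE_SLOTS || !s->in_use[handle])
        return -1;
    memcpy(page, s->data[handle], SWAP_PAGE_SIZE);
    s->in_use[handle] = 0;
    return 0;
}
```

Because every slot is explicitly taken back, a server built along these lines could in principle promise more space to its clients than it has, which is the overcommit case the message asks about.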

* Re: dynamic swap prioritizing
2001-10-10 15:55 ` 'adilger@turbolabs.com'
2001-10-10 17:14 ` Richard B. Johnson
2001-10-11 11:30 ` OO swap interface David Nicol
@ 2001-10-12 0:45 ` Xuan Baldauf
2001-10-12 3:32 ` 'adilger@turbolabs.com'
2 siblings, 1 reply; 7+ messages in thread
From: Xuan Baldauf @ 2001-10-12 0:45 UTC
To: 'adilger@turbolabs.com'
Cc: Venkatesh Ramamurthy, 'xuan--lkml@baldauf.org', 'linux-kernel@vger.kernel.org'

"'adilger@turbolabs.com'" wrote:

> On Oct 10, 2001 11:23 -0400, Venkatesh Ramamurthy wrote:
> > > If this is to be generally useful, it would be good to find things
> > > like max sequential read speed, max sequential write speed, and max
> > > seek time (at least). Estimates for max sequential read speed and
> > > seek time could be found at boot time for each disk relatively
> > > easily, but write speed may have to be found only at runtime (or
> > > it could all be fed in to the kernel from user space from benchmarks
> > > run previously).
> >
> > Maybe we can find out the statistics for the first time (or when swap is
> > created) and store this information in the swap partition itself. This would
> > allow us to compute time consuming statistics only once. Also we need to
> > create new fields in the swap structure for this purpose.
>
> I'd rather just have the statistic data in a regular file for ALL disks,
> and then send it to the kernel via ioctl or write to a special file that
> the kernel will read from. I don't think it is critical to have this
> data right at boot time, since it would only be used for optimizing I/O
> access and would not be required for a disk to actually work.
>
> Cheers, Andreas

Hey people,

why do you want to separate statistics data out? The statistics are not
about disk throughput, head seek times, etc. They are just about the time
between "needing a page" and "getting that page", which is very abstract.
Let's call it the swapin-delay. It does not only depend on disk-throughput
and head seek times, but also on "device busyness".

For every swap device, there is a "swap_business" data structure, which
covers:

- average_swapin_delay
- average_swapin_delay_last_write_timestamp /* timestamp where
  swapin_delay was last written */

There is a "swap_business_memory_timeout" kernel parameter (accessible via
/proc) which represents the length of a time interval from now into the
past. Only disk activity data gathered within this interval should be used
for reasoning about future swap decisions.

For every page fault which requires a page to be swapped in, a timestamp
is written to a data structure covering the swapin process. When the page
is available in memory, a function is called which does the following:

- compute the current_swapin_delay for the current swapin
- my_swap_device->average_swapin_delay =
    (current_swapin_delay
       * (now - average_swapin_delay_last_write_timestamp)
     + my_swap_device->average_swapin_delay
       * (average_swapin_delay_last_write_timestamp
          - (now - swap_business_memory_timeout)))
    / swap_business_memory_timeout;

There are some special cases like "no disk activity". In this case,
swap_business is not updated for that device. But maybe the reason for no
disk activity is that the disk is a swap disk and the values of
"swap_business" were once so bad that this device will not be considered
anymore. That would be a "soft deadlock"...

On swapout, the "average_swapin_delay" field of the "swap_business" data
structure of every swap device is compared against the same field of the
other available swap devices. According to this comparison, a decision is
made where to do the next swapout to.

Because that framework can only bring advantages if there are at least two
swap devices, it can be skipped for the one-swap-device case (most setups
do not have more than one swap device, but maybe just because the 32MB or
64MB graphics card (with plenty of mostly unused RAM) needs to be manually
configured for swap...)

I hope this makes the concept clearer. I cannot see a reason to create
such statistics in advance and feed them to the kernel somehow. For
dynamic systems, you need dynamic statistics, I think. And "the
statistics", in fact, only consist of two variables per swap device. That
is nothing the kernel should be unable to manage in reasonable time.

Of course, such a feature should be tested for real advantages.

Xuân.
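
The update rule and the swapout decision described in the message above can be tried out in user space with a sketch like the following. It keeps the field names from the message, but several things are assumptions made for the sketch: the function names `swap_business_update` and `pick_swapout_device` are hypothetical, `double` is used for readability (kernel code would use integer arithmetic), and the age of the old average is clamped to the timeout so that its weight never becomes negative, a case the message does not address.

```c
#include <assert.h>
#include <stddef.h>

struct swap_business {
    double average_swapin_delay;
    double average_swapin_delay_last_write_timestamp;
};

/*
 * Fold one measured swapin delay into the running average. The new
 * sample is weighted by the time since the last update, the old average
 * by the remainder of the timeout window; the weights sum to `timeout`.
 */
void swap_business_update(struct swap_business *sb, double now,
                          double current_swapin_delay, double timeout)
{
    double age;

    assert(sb != NULL && timeout > 0.0);
    age = now - sb->average_swapin_delay_last_write_timestamp;
    if (age < 0.0)
        age = 0.0;
    if (age > timeout)
        age = timeout; /* assumption: an older average counts for nothing */

    sb->average_swapin_delay =
        (current_swapin_delay * age +
         sb->average_swapin_delay * (timeout - age)) / timeout;
    sb->average_swapin_delay_last_write_timestamp = now;
}

/* Return the index of the device with the lowest average delay, or -1. */
int pick_swapout_device(const struct swap_business *devs, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (best < 0 ||
            devs[i].average_swapin_delay < devs[best].average_swapin_delay)
            best = (int)i;
    }
    return best;
}
```

With an old average of 10, a new sample of 20, a timeout of 100 and 25 time units since the last update, the new average is (20 * 25 + 10 * 75) / 100 = 12.5. Always picking the minimum also reproduces the "soft deadlock" mentioned above: a device that once looked bad is never chosen again, so its average is never refreshed.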

* Re: dynamic swap prioritizing
2001-10-12 0:45 ` dynamic swap prioritizing Xuan Baldauf
@ 2001-10-12 3:32 ` 'adilger@turbolabs.com'
2001-10-12 15:22 ` Xuan Baldauf
0 siblings, 1 reply; 7+ messages in thread
From: 'adilger@turbolabs.com' @ 2001-10-12 3:32 UTC
To: Xuan Baldauf
Cc: Venkatesh Ramamurthy, 'linux-kernel@vger.kernel.org'

On Oct 12, 2001 02:45 +0200, Xuan Baldauf wrote:
> > I'd rather just have the statistic data in a regular file for ALL disks,
> > and then send it to the kernel via ioctl or write to a special file that
> > the kernel will read from. I don't think it is critical to have this
> > data right at boot time, since it would only be used for optimizing I/O
> > access and would not be required for a disk to actually work.
>
> why do you want to separate statistics data out? The statistics are not
> about disk throughput, head seek times, etc. They are just about the time
> between "needing a page" and "getting that page", which is very abstract.
> Let's call it the swapin-delay. It does not only depend on disk-throughput
> and head seek times, but also on "device busyness".

What I am saying is that such information is useful for ALL devices, and
not just swap devices. There was a long thread from Daniel Phillips where
he was working on (1) a few months ago. Why is this data useful?

1) You have dirty pages in RAM, when should you write them? The current
   system is to delay the write as long as possible in case the dirty
   pages are discarded (e.g. temp file) before they need to be written.
   However, if the disk is idle during this time, then doing the write
   immediately will not impose extra overhead, and will mean that the
   dirty page could be freed quickly if there was a need for memory.

2) Swap or MD RAID 1 load balancing. Which device should you write to
   (swap) or read from (RAID 1)? If you know how fast/busy each device
   is, you can make a better decision on this instead of round-robin.

3) Guaranteed rate I/O. For XFS/XLV on SGI IRIX, you can request a
   guaranteed I/O rate for a specific time period (e.g. to record video
   or capture data from an experiment) and the system will tell you if
   it is possible or not. In the IRIX case, they had data on each drive
   to tell them what the performance is in advance, while Linux would
   need to do a drive-by-drive benchmark instead.

A lot of the data needed for this is already part of "sard", but that is
only reporting the data to user space, while some of the above decisions
need to be done inside the kernel on a continuous basis.

Note that I'm NOT saying that having all of this data will improve system
performance (it may slow it down from overhead), but I was just
advocating a broader view of what could be done (and what has already
been done in related areas).

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
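
The admission decision behind point 3 (guaranteed rate I/O) can be illustrated with a very small C sketch. It is not how XFS/XLV on IRIX implements it: `disk_budget` and `reserve_rate` are hypothetical names, the per-disk maximum is assumed to come from a benchmark run beforehand, and the time period of a reservation is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

struct disk_budget {
    unsigned long max_kb_per_sec;      /* assumed: measured by a prior benchmark */
    unsigned long reserved_kb_per_sec; /* sum of the rates already promised */
};

/*
 * Try to reserve `kb_per_sec` on the disk. Returns 1 and records the
 * reservation if the disk can still deliver that rate, or 0 (leaving
 * the budget untouched) if the request cannot be guaranteed.
 */
int reserve_rate(struct disk_budget *d, unsigned long kb_per_sec)
{
    assert(d != NULL);
    assert(d->reserved_kb_per_sec <= d->max_kb_per_sec);

    if (kb_per_sec > d->max_kb_per_sec - d->reserved_kb_per_sec)
        return 0;
    d->reserved_kb_per_sec += kb_per_sec;
    return 1;
}
```

The check only works if the per-disk maximum is known in advance, which is why the message contrasts drive data known ahead of time on IRIX with a drive-by-drive benchmark on Linux.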

* Re: dynamic swap prioritizing
2001-10-12 3:32 ` 'adilger@turbolabs.com'
@ 2001-10-12 15:22 ` Xuan Baldauf
0 siblings, 0 replies; 7+ messages in thread
From: Xuan Baldauf @ 2001-10-12 15:22 UTC
To: 'adilger@turbolabs.com'
Cc: Venkatesh Ramamurthy, 'linux-kernel@vger.kernel.org'

"'adilger@turbolabs.com'" wrote:

> [...]
> I was just
> advocating a broader view of what could be done (and what has already
> been done in related areas).

Okay, understood. :-)

>
> Cheers, Andreas

Xuân.

end of thread (last message 2001-10-12 15:22 UTC)

Thread overview: 7+ messages
2001-10-10 15:23 dynamic swap prioritizing Venkatesh Ramamurthy
2001-10-10 15:55 ` 'adilger@turbolabs.com'
2001-10-10 17:14   ` Richard B. Johnson
2001-10-11 11:30   ` OO swap interface David Nicol
2001-10-12 0:45   ` dynamic swap prioritizing Xuan Baldauf
2001-10-12 3:32     ` 'adilger@turbolabs.com'
2001-10-12 15:22       ` Xuan Baldauf