* Re: dynamic swap prioritizing
@ 2001-10-10 15:23 Venkatesh Ramamurthy
2001-10-10 15:55 ` 'adilger@turbolabs.com'
0 siblings, 1 reply; 7+ messages in thread
From: Venkatesh Ramamurthy @ 2001-10-10 15:23 UTC (permalink / raw)
To: 'adilger@turbolabs.com', 'xuan--lkml@baldauf.org'
Cc: 'linux-kernel@vger.kernel.org'
> If this is to be generally useful, it would be good to find things
> like max sequential read speed, max sequential write speed, and max
> seek time (at least). Estimates for max sequential read speed and
> seek time could be found at boot time for each disk relatively
> easily, but write speed may have to be found only at runtime (or
> it could all be fed in to the kernel from user space from benchmarks
> run previously).
Maybe we can find out the statistics for the first time (or when swap is
created) and store this information in the swap partition itself. This would
allow us to compute time consuming statistics only once. Also we need to
create new fields in the swap structure for this purpose.
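As a rough illustration of what such stored fields could look like, here is a minimal user-space sketch in C. Everything in it is hypothetical: `struct swap_perf_stats`, `SWAP_PERF_MAGIC`, and the two helpers are invented for this example and are not part of any real swap header layout. It only shows the "measure once, then reuse" idea, with a magic number marking whether the record has been written yet.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record; these names do not exist in the real swap header. */
struct swap_perf_stats {
    uint32_t magic;           /* marks the record as valid */
    uint32_t seq_read_kbps;   /* max sequential read speed, KiB/s */
    uint32_t seq_write_kbps;  /* max sequential write speed, KiB/s */
    uint32_t max_seek_us;     /* max seek time, microseconds */
};

#define SWAP_PERF_MAGIC 0x53575053u  /* arbitrary value chosen for this sketch */

/* Copy the record into a reserved area; returns 1 on success, 0 if it does
 * not fit. Byte order is ignored here, a real format would have to fix it. */
static int swap_perf_store(unsigned char *buf, size_t len,
                           const struct swap_perf_stats *in)
{
    if (len < sizeof(*in))
        return 0;
    memcpy(buf, in, sizeof(*in));
    return 1;
}

/* Returns 1 and fills *out if a valid record is present, so the expensive
 * benchmark can be skipped; returns 0 if the statistics must be measured. */
static int swap_perf_load(const unsigned char *buf, size_t len,
                          struct swap_perf_stats *out)
{
    if (len < sizeof(*out))
        return 0;
    memcpy(out, buf, sizeof(*out));
    return out->magic == SWAP_PERF_MAGIC;
}
```

On a freshly created swap area `swap_perf_load()` would return 0, which is the cue to run the benchmark and call `swap_perf_store()` once.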
* Re: dynamic swap prioritizing
2001-10-10 15:23 dynamic swap prioritizing Venkatesh Ramamurthy
@ 2001-10-10 15:55 ` 'adilger@turbolabs.com'
2001-10-10 17:14 ` Richard B. Johnson
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: 'adilger@turbolabs.com' @ 2001-10-10 15:55 UTC (permalink / raw)
To: Venkatesh Ramamurthy
Cc: 'xuan--lkml@baldauf.org',
'linux-kernel@vger.kernel.org'
On Oct 10, 2001 11:23 -0400, Venkatesh Ramamurthy wrote:
> > If this is to be generally useful, it would be good to find things
> > like max sequential read speed, max sequential write speed, and max
> > seek time (at least). Estimates for max sequential read speed and
> > seek time could be found at boot time for each disk relatively
> > easily, but write speed may have to be found only at runtime (or
> > it could all be fed in to the kernel from user space from benchmarks
> > run previously).
>
> Maybe we can find out the statistics for the first time (or when swap is
> created) and store this information in the swap partition itself. This would
> allow us to compute time consuming statistics only once. Also we need to
> create new fields in the swap structure for this purpose.
I'd rather just have the statistic data in a regular file for ALL disks,
and then send it to the kernel via ioctl or write to a special file that
the kernel will read from. I don't think it is critical to have this
data right at boot time, since it would only be used for optimizing I/O
access and would not be required for a disk to actually work.
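The user-space half of that could be as small as the following sketch. The one-line-per-disk file format ("name read_kbps write_kbps seek_us") and the names `struct disk_stats` and `parse_disk_stats()` are assumptions made up for this example. How the parsed values would then reach the kernel (ioctl or a special file) is deliberately left out, since no such interface is defined in this thread.

```c
#include <stdio.h>

/* Hypothetical per-disk benchmark record read from a regular file. */
struct disk_stats {
    char name[32];        /* device name, e.g. "hda" */
    unsigned read_kbps;   /* max sequential read speed, KiB/s */
    unsigned write_kbps;  /* max sequential write speed, KiB/s */
    unsigned seek_us;     /* max seek time, microseconds */
};

/* Parse one line of the assumed format "name read write seek".
 * Returns 1 on success, 0 for comment lines and malformed input. */
static int parse_disk_stats(const char *line, struct disk_stats *out)
{
    if (line[0] == '#')
        return 0;
    return sscanf(line, "%31s %u %u %u", out->name,
                  &out->read_kbps, &out->write_kbps, &out->seek_us) == 4;
}
```

A line such as `hda 30000 25000 12000` would parse into one record; anything with fewer than four fields is rejected.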
Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
* Re: dynamic swap prioritizing
2001-10-10 15:55 ` 'adilger@turbolabs.com'
@ 2001-10-10 17:14 ` Richard B. Johnson
2001-10-11 11:30 ` OO swap interface David Nicol
2001-10-12 0:45 ` dynamic swap prioritizing Xuan Baldauf
2 siblings, 0 replies; 7+ messages in thread
From: Richard B. Johnson @ 2001-10-10 17:14 UTC (permalink / raw)
To: 'adilger@turbolabs.com'
Cc: Venkatesh Ramamurthy, 'xuan--lkml@baldauf.org',
'linux-kernel@vger.kernel.org'
I think that when the kernel needs a page of memory, it needs it
NOW! Swapping a dirty page to other in-memory pages wastes the
very page(s) that you need.
The contents of a least-recently used page should be written to
the swap device to free it for immediate use, regardless of the
disk-write speed, and regardless of how close the kernel thinks
the heads are to some track. Other stuff, like "prioritizing",
wastes the resources you are trying to obtain. Further, all attempts
so far to use "elevator" algorithms to speed up disk access have failed
to provide any measurable improvement in anything. In fact,
buffering until the data will fit on a nearby track wastes
memory pages and the CPU resources necessary to manage them.
In the days when CPUs were slow, memory was scarce, and I/O
was at a crawl, Digital made a VMS system that worked. Using
the same kind of memory handling should be superb nowadays.
(1) You keep a page of zeroed data. This is used by
all processes for new buffers. A single page handles all.
Reads are always allowed. Writes cause a page-fault. This
is called demand-zero paging.
(2) Pages used for shared file mapping are kept in real
memory as long as possible (run-time libraries).
(3) All other pages are available for swapping. The page-
stealer grabs the least-recently used pages from sleeping
processes first. Tasks that are waiting for I/O are the
next to have their least-recently used pages stolen. Tasks
that are waiting for kernel services are the last to have
their least-recently used pages swiped.
The Linux kernel is not a task, so "waiting for kernel services"
is not valid here. Everything else is.
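The stealing order in item (3) can be sketched as a simple priority pick. This is only an illustration of the ordering described above, not VMS or Linux code: `enum task_wait`, `struct task_info`, and `pick_victim()` are hypothetical names, and breaking ties by the age of the oldest page is an assumption of this sketch. The third class is kept to mirror the VMS description even though, as noted, it does not apply to Linux.

```c
#include <stddef.h>

/* Lower value = pages are stolen from this class first. */
enum task_wait {
    WAIT_SLEEPING = 0,  /* sleeping processes */
    WAIT_IO       = 1,  /* tasks waiting for I/O */
    WAIT_KERNEL   = 2   /* tasks waiting for kernel services (VMS only) */
};

struct task_info {
    enum task_wait state;
    unsigned long lru_age;  /* age of the task's least-recently used page;
                               larger means older */
};

/* Return the index of the task to steal a page from next, or -1 if there
 * are no candidates. Class decides first; within a class the task holding
 * the oldest page wins (tie-break assumed for this sketch). */
static int pick_victim(const struct task_info *t, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 ||
            t[i].state < t[best].state ||
            (t[i].state == t[best].state && t[i].lru_age > t[best].lru_age))
            best = (int)i;
    }
    return best;
}
```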
Not every machine runs with gigahertz processors where CPU
overhead of keeping track of pages is in the noise.
Additionally, prioritizing based upon some "goodness" puts policy in the
kernel.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).
I was going to compile a list of innovations that could be
attributed to Microsoft. Once I realized that Ctrl-Alt-Del
was handled in the BIOS, I found that there aren't any.
* OO swap interface
2001-10-10 15:55 ` 'adilger@turbolabs.com'
2001-10-10 17:14 ` Richard B. Johnson
@ 2001-10-11 11:30 ` David Nicol
2001-10-12 0:45 ` dynamic swap prioritizing Xuan Baldauf
2 siblings, 0 replies; 7+ messages in thread
From: David Nicol @ 2001-10-11 11:30 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org
Here's an idea that has been mulling under my mullet for
the last few weeks:
The nbd can be used for a swap device, but since swap has no reason
to inform the drive about what parts of it are free, it is not possible
to have a central nbd server overcommit for multiple client swapping
nodes.
Therefore I wonder how tricky it would be to create a swap interface
that is ignorant of disk geometries. The swap interface language
would accept requests for space, with unique handles, and would
return the swapped-out data on presentation of the handle. Like
a virtual memory hat check.
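A toy, in-memory version of such a hat check might look like the sketch below. All names (`struct hatcheck`, `hc_checkin()`, `hc_checkout()`, the slot count and page size) are invented for illustration, and a real server would sit behind a network protocol rather than a function call. The point it tries to show is that the store learns exactly which handles are live, because checking a page out releases its slot, and that is the information a server would need in order to overcommit.

```c
#include <stdlib.h>
#include <string.h>

#define HC_PAGE_SIZE 4096  /* assumed page size for this sketch */
#define HC_SLOTS     8     /* tiny fixed table, just for illustration */

struct hatcheck {
    unsigned char *slot[HC_SLOTS];  /* NULL means the handle is free */
};

/* Hand a page to the store. Returns a handle >= 0, or -1 if the table is
 * full or memory could not be allocated. */
static int hc_checkin(struct hatcheck *hc, const unsigned char *page)
{
    for (int h = 0; h < HC_SLOTS; h++) {
        if (hc->slot[h] == NULL) {
            hc->slot[h] = malloc(HC_PAGE_SIZE);
            if (hc->slot[h] == NULL)
                return -1;
            memcpy(hc->slot[h], page, HC_PAGE_SIZE);
            return h;
        }
    }
    return -1;
}

/* Present a handle: copies the page back and frees the slot.
 * Returns 0 on success, -1 if the handle is unknown or already redeemed. */
static int hc_checkout(struct hatcheck *hc, int handle, unsigned char *page)
{
    if (handle < 0 || handle >= HC_SLOTS || hc->slot[handle] == NULL)
        return -1;
    memcpy(page, hc->slot[handle], HC_PAGE_SIZE);
    free(hc->slot[handle]);
    hc->slot[handle] = NULL;
    return 0;
}
```

Note that nothing in this interface mentions offsets or geometry; the client only ever sees handles.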
--
David Nicol 816.235.1187
1,3,7-trimethylxanthine
* Re: dynamic swap prioritizing
2001-10-10 15:55 ` 'adilger@turbolabs.com'
2001-10-10 17:14 ` Richard B. Johnson
2001-10-11 11:30 ` OO swap interface David Nicol
@ 2001-10-12 0:45 ` Xuan Baldauf
2001-10-12 3:32 ` 'adilger@turbolabs.com'
2 siblings, 1 reply; 7+ messages in thread
From: Xuan Baldauf @ 2001-10-12 0:45 UTC (permalink / raw)
To: 'adilger@turbolabs.com'
Cc: Venkatesh Ramamurthy, 'xuan--lkml@baldauf.org',
'linux-kernel@vger.kernel.org'
"'adilger@turbolabs.com'" wrote:
> On Oct 10, 2001 11:23 -0400, Venkatesh Ramamurthy wrote:
> > > If this is to be generally useful, it would be good to find things
> > > like max sequential read speed, max sequential write speed, and max
> > > seek time (at least). Estimates for max sequential read speed and
> > > seek time could be found at boot time for each disk relatively
> > > easily, but write speed may have to be found only at runtime (or
> > > it could all be fed in to the kernel from user space from benchmarks
> > > run previously).
> >
> > Maybe we can find out the statistics for the first time (or when swap is
> > created) and store this information in the swap partition itself. This would
> > allow us to compute time consuming statistics only once. Also we need to
> > create new fields in the swap structure for this purpose.
>
> I'd rather just have the statistic data in a regular file for ALL disks,
> and then send it to the kernel via ioctl or write to a special file that
> the kernel will read from. I don't think it is critical to have this
> data right at boot time, since it would only be used for optimizing I/O
> access and would not be required for a disk to actually work.
>
> Cheers, Andreas
Hey people,
why do you want to separate statistics data out? The statistics are not about disk
throughput, head seek times, etc. They are just about the time between "needing a
page" and "getting that page", which is very abstract. Let's call it the
swapin-delay. It does not only depend on disk-throughput and head seek times, but
also on "device business".
For every swap device, there is a "swap_business" data structure, which covers:
- average_swapin_delay
- average_swapin_delay_last_write_timestamp
  /* timestamp when average_swapin_delay was last written */
There is a "swap_business_memory_timeout" kernel parameter (accessible via /proc)
which represents the length of a time interval reaching from now into the past.
Only disk activity data gathered within this interval is used when making
future swap decisions.
For every page fault which requires a page to be swapped in, a timestamp is
written to a data structure covering the swapin process. When the page is
available in memory, a function is called which does the following:
- compute the current_swapin_delay for the current swapin
- my_swap_device->average_swapin_delay =
    (current_swapin_delay
       * (now - average_swapin_delay_last_write_timestamp)
     + my_swap_device->average_swapin_delay
       * (average_swapin_delay_last_write_timestamp
          - (now - swap_business_memory_timeout)))
    / swap_business_memory_timeout;
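The update rule above is a time-weighted average: the new sample is weighted by the time since the last update, the old average by the rest of the timeout window, and the two weights sum to the timeout. A minimal user-space sketch in C follows. It uses `double` for readability (kernel code would use integer arithmetic), shortens the timestamp field name, and adds two things that are assumptions of this sketch rather than part of the description: the elapsed time is clamped to the window so the old average never gets a negative weight, and the timestamp is refreshed on every update.

```c
#include <assert.h>

struct swap_business {
    double average_swapin_delay;
    double last_write_timestamp;  /* average_swapin_delay_last_write_timestamp */
};

/* Fold one measured swapin delay into the running average.
 * All times are in the same (arbitrary) unit. */
static void swap_business_update(struct swap_business *sb,
                                 double current_swapin_delay,
                                 double now, double memory_timeout)
{
    assert(memory_timeout > 0);

    /* Weight of the new sample: time elapsed since the last update. */
    double age = now - sb->last_write_timestamp;
    if (age > memory_timeout)
        age = memory_timeout;  /* assumption: old data fully expired */
    if (age < 0)
        age = 0;               /* assumption: ignore clock going backwards */

    /* (memory_timeout - age) equals last_write - (now - memory_timeout). */
    sb->average_swapin_delay =
        (current_swapin_delay * age +
         sb->average_swapin_delay * (memory_timeout - age)) / memory_timeout;
    sb->last_write_timestamp = now;
}
```

For example, with an old average of 10, a timeout of 10, and a new delay of 20 measured 2 time units after the last update, the result is (20 * 2 + 10 * 8) / 10 = 12.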
There are some special cases like "no disk activity". In this case, swap_business
is not updated for that device. But maybe the reason for no disk activity is that
the disk is a swap disk and the values of "swap_business" were once so bad that
this device will not be considered anymore. That would be a "soft deadlock"...
On swapout, the "average_swapin_delay" field of each swap device's
"swap_business" data structure is compared against the same field of the other
available swap devices. Based on this comparison, a decision is made about where
to send the next swapout.
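In its simplest form the comparison is a minimum search over the devices, as in the sketch below. `struct swap_dev` and `pick_swapout_dev()` are hypothetical names for this example. Picking strictly the lowest average is the most naive policy one could choose, and it does nothing about the "soft deadlock" mentioned above: a device that is never picked never produces new samples, so its bad average never recovers.

```c
#include <stddef.h>

/* Hypothetical per-device view: just a name and the running average. */
struct swap_dev {
    const char *name;
    double average_swapin_delay;
};

/* Return the index of the device with the lowest average swapin delay,
 * or -1 if there are no devices. The first device wins ties. */
static int pick_swapout_dev(const struct swap_dev *devs, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 ||
            devs[i].average_swapin_delay < devs[best].average_swapin_delay)
            best = (int)i;
    }
    return best;
}
```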
Because that framework only can bring advantages if there are at least two swap
devices, it can be skipped for the one-swap-device-case (most setups do not have
more than one swap device, but maybe just because the 32MB or 64MB graphics card
(with plenty of mostly unused RAM) needs to be manually configured for swap...)
I hope this makes the concept clearer. I cannot see a reason to create
such statistics in advance and feed them to the kernel somehow. For dynamic
systems, you need dynamic statistics, I think. And "the statistics", in fact, only
consist of two variables per swap device. That is something the kernel should be
able to manage in reasonable time.
Of course, such a feature should be tested for real advantages.
Xuân.
* Re: dynamic swap prioritizing
2001-10-12 0:45 ` dynamic swap prioritizing Xuan Baldauf
@ 2001-10-12 3:32 ` 'adilger@turbolabs.com'
2001-10-12 15:22 ` Xuan Baldauf
0 siblings, 1 reply; 7+ messages in thread
From: 'adilger@turbolabs.com' @ 2001-10-12 3:32 UTC (permalink / raw)
To: Xuan Baldauf
Cc: Venkatesh Ramamurthy, 'linux-kernel@vger.kernel.org'
On Oct 12, 2001 02:45 +0200, Xuan Baldauf wrote:
> > I'd rather just have the statistic data in a regular file for ALL disks,
> > and then send it to the kernel via ioctl or write to a special file that
> > the kernel will read from. I don't think it is critical to have this
> > data right at boot time, since it would only be used for optimizing I/O
> > access and would not be required for a disk to actually work.
>
> why do you want to separate statistics data out? The statistics are not
> about disk throughput, head seek times, etc. They are just about the time
> between "needing a page" and "getting that page", which is very abstract.
> Let's call it the swapin-delay. It does not only depend on disk-throughput
> and head seek times, but also on "device business".
What I am saying is that such information is useful for ALL devices, and
not just swap devices. There was a long thread from Daniel Phillips a few
months ago, where he was working on item 1) below. Why is this data useful?
1) You have dirty pages in RAM, when should you write them? The current
system is to delay the write as long as possible in case the dirty
pages are discarded (e.g. temp file) before they need to be written.
However, if the disk is idle during this time, then doing the write
immediately will not impose extra overhead, and will mean that the
dirty page could be freed quickly if there was a need for memory.
2) Swap or MD RAID 1 load balancing. Which device should you write to
(swap) or read from (RAID 1)? If you know how fast/busy each device
is, you can make a better decision on this instead of round-robin.
3) Guaranteed rate I/O. For XFS/XLV on SGI IRIX, you can request a
guaranteed I/O rate for a specific time period (e.g. to record video
or capture data from an experiment) and the system will tell you if
it is possible or not. In the IRIX case, they had data on each
drive to tell them what the performance is in advance, while Linux
would need to do a drive-by-drive benchmark instead.
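To make item 3) a bit more concrete, here is a minimal sketch of the kind of admission check that per-drive performance data makes possible. It is not how IRIX implements guaranteed-rate I/O; `struct disk_budget` and `rate_reserve()` are hypothetical, and the time-period aspect of a reservation is ignored entirely. It only shows the "tell you if it is possible or not" step, given a benchmarked rate per drive.

```c
/* Hypothetical per-drive bandwidth budget. */
struct disk_budget {
    unsigned benchmarked_kbps;  /* sustained rate from a prior benchmark */
    unsigned reserved_kbps;     /* sum of guarantees already granted */
};

/* Try to reserve want_kbps on the drive. Returns 1 and records the
 * reservation if the remaining rate covers it; returns 0 and leaves the
 * budget unchanged otherwise. */
static int rate_reserve(struct disk_budget *d, unsigned want_kbps)
{
    if (d->reserved_kbps > d->benchmarked_kbps)
        return 0;  /* already oversubscribed, nothing left to give */
    if (want_kbps > d->benchmarked_kbps - d->reserved_kbps)
        return 0;  /* request exceeds what is left */
    d->reserved_kbps += want_kbps;
    return 1;
}
```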
A lot of the data needed for this is already part of "sard", but that
is only reporting the data to user space, while some of the above
decisions need to be done inside the kernel on a continuous basis.
Note that I'm NOT saying that having all of this data will improve
system performance (it may slow it down from overhead), but I was just
advocating a broader view of what could be done (and what has already
been done in related areas).
Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
* Re: dynamic swap prioritizing
2001-10-12 3:32 ` 'adilger@turbolabs.com'
@ 2001-10-12 15:22 ` Xuan Baldauf
0 siblings, 0 replies; 7+ messages in thread
From: Xuan Baldauf @ 2001-10-12 15:22 UTC (permalink / raw)
To: 'adilger@turbolabs.com'
Cc: Venkatesh Ramamurthy, 'linux-kernel@vger.kernel.org'
"'adilger@turbolabs.com'" wrote:
> [...]
> I was just
> advocating a broader view of what could be done (and what has already
> been done in related areas).
Okay, understood. :-)
>
>
> Cheers, Andreas
Xuân.