* Re: [PATCH 1/3] i/o bandwidth controller documentation
[not found] <200806201602.m5KG2Zx32671@inv.it.uc3m.es>
@ 2008-06-20 16:11 ` Peter T. Breuer
2008-07-04 15:35 ` Andrea Righi
0 siblings, 1 reply; 12+ messages in thread
From: Peter T. Breuer @ 2008-06-20 16:11 UTC (permalink / raw)
To: linux kernel
> + Block device I/O bandwidth controller
How can this work? You will limit the number of available buffer heads
per second?
Unfortunately, the problem is the fs above the block device. If the
block device is artificially slowed then the fs will still happily allow
a process to fill buffers forever until memory is full, while the block
device continues to trickle the buffers away.
What one wants is for the fs buffering to be linked to the underlying
block device i/o speed. One wants the rate at which fs buffers are
filled to be no more than (modulo brief spurts) the rate at which the
device operates.
That way networked block devices have a chance of having some memory
left to send the dirty buffers out to the net with. B/w limiting the
device itself doesn't seem to me to do any good.
Peter
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] i/o bandwidth controller documentation
2008-06-20 16:11 ` [PATCH 1/3] i/o bandwidth controller documentation Peter T. Breuer
@ 2008-07-04 15:35 ` Andrea Righi
0 siblings, 0 replies; 12+ messages in thread
From: Andrea Righi @ 2008-07-04 15:35 UTC (permalink / raw)
To: ptb; +Cc: linux kernel
Peter T. Breuer wrote:
>> + Block device I/O bandwidth controller
>
> How can this work? You will limit the number of available buffer heads
> per second?
>
> Unfortunately, the problem is the fs above the block device. If the
> block device is artificially slowed then the fs will still happily allow
> a process to fill buffers forever until memory is full, while the block
> device continues to trickle the buffers away.
>
> What one wants is for the fs buffering to be linked to the underlying
> block device i/o speed. One wants the rate at which fs buffers are
> filled to be no more than (modulo brief spurts) the rate at which the
> device operates.
>
> That way networked block devices have a chance of having some memory
> left to send the dirty buffers out to the net with. B/w limiting the
> device itself doesn't seem to me to do any good.
>
> Peter
Peter,
I'm only seeing your message now; it seems you didn't add me in To: or Cc:.
Anyway, I totally agree with you, but it seems there's a
misunderstanding here. The block device I/O bandwidth controller *does*
its throttling by slowing down the applications' requests, not the
dispatching of already submitted I/O requests.
IMHO, for the same reason you pointed out, delaying the dispatching of
I/O requests simply leads to excessive page cache and buffer
consumption, because the userspace applications' dirty ratio is never
actually limited.
As reported in the io-throttle documentation:
"This controller allows to limit the I/O bandwidth of specific block
devices for specific process containers (cgroups) imposing additional
delays on I/O requests for those processes that exceed the limits
defined in the control group filesystem."
Do you think we can use a better wording to describe this concept?
-Andrea
* [PATCH 1/3] i/o bandwidth controller documentation
@ 2008-07-04 13:58 Andrea Righi
0 siblings, 0 replies; 12+ messages in thread
From: Andrea Righi @ 2008-07-04 13:58 UTC (permalink / raw)
To: Balbir Singh, Paul Menage
Cc: Carl Henrik Lunde, axboe, matt, roberto, randy.dunlap,
Divyesh Shah, subrata, eric.rannaud, akpm, containers,
linux-kernel, Andrea Righi
Documentation of the block device I/O bandwidth controller: description, usage,
advantages and design.
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
---
Documentation/controllers/io-throttle.txt | 265 +++++++++++++++++++++++++++++
1 files changed, 265 insertions(+), 0 deletions(-)
create mode 100644 Documentation/controllers/io-throttle.txt
diff --git a/Documentation/controllers/io-throttle.txt b/Documentation/controllers/io-throttle.txt
new file mode 100644
index 0000000..578d78e
--- /dev/null
+++ b/Documentation/controllers/io-throttle.txt
@@ -0,0 +1,265 @@
+
+ Block device I/O bandwidth controller
+
+1. Description
+
+This controller makes it possible to limit the I/O bandwidth of specific block
+devices for specific process containers (cgroups) by imposing additional
+delays on I/O requests for those processes that exceed the limits defined in
+the control group filesystem.
+
+Bandwidth limiting rules offer better control over QoS than priority- or
+weight-based solutions, which only express applications' relative performance
+requirements. Moreover, priority-based solutions are subject to performance
+bursts when only low-priority requests are submitted to a general-purpose
+resource dispatcher.
+
+The goal of the I/O bandwidth controller is to improve performance
+predictability and provide performance isolation of different control groups
+sharing the same block devices.
+
+NOTE #1: If you are looking for a way to improve the overall throughput of the
+system, you should probably use a different solution.
+
+NOTE #2: The current implementation does not guarantee minimum bandwidth
+levels; QoS is implemented only by slowing down I/O "traffic" that exceeds
+the limits specified by the user. Minimum I/O rate thresholds can be
+guaranteed if the user configures a proper I/O bandwidth partitioning of the
+block devices shared among the different cgroups (in theory, if the sum of
+all the individual limits defined for a block device doesn't exceed the total
+I/O bandwidth of that device).
+
+2. User Interface
+
+A new I/O bandwidth limitation rule is described using the file
+blockio.bandwidth.
+
+The same file can be used to set multiple rules for different block devices
+relative to the same cgroup.
+
+The syntax to configure a limiting rule is the following:
+
+# /bin/echo DEV:BW:STRATEGY:BUCKET_SIZE > CGROUP/blockio.bandwidth
+
+- DEV is the name of the device the limiting rule is applied to.
+
+- BW is the maximum I/O bandwidth on DEV allowed for CGROUP; the bandwidth
+ must be expressed in bytes/s.
+
+- STRATEGY is the throttling strategy used to throttle the applications' I/O
+ requests from/to device DEV. At the moment two different strategies can be
+ used:
+
+ 0 = leaky bucket: the controller accepts at most B bytes (B = BW * time);
+ further I/O requests are delayed by scheduling a timeout
+ for the tasks that issued them.
+
+ Different I/O flow
+ | | |
+ | v |
+ | v
+ v
+ .......
+ \ /
+ \ / leaky-bucket
+ ---
+ |||
+ vvv
+ Smoothed I/O flow
+
+ 1 = token bucket: BW tokens are added to the bucket every second; the bucket
+ can hold at most BUCKET_SIZE tokens; I/O requests are
+ accepted if there are available tokens in the bucket; when
+ a request of N bytes arrives, N tokens are removed from the
+ bucket; if fewer than N tokens are available, the request
+ is delayed until a sufficient number of tokens is available
+ in the bucket.
+
+ Tokens (I/O rate)
+ o
+ o
+ o
+ ....... <--.
+ \ / | Bucket size (burst limit)
+ \ooo/ |
+ --- <--'
+ |ooo
+ Incoming --->|---> Conforming
+ I/O |oo I/O
+ requests -->|--> requests
+ |
+ ---->|
+
+ Leaky bucket respects the bandwidth limits more precisely than token bucket,
+ because bursty workloads are always smoothed. Token bucket, instead, allows
+ a limited degree of irregularity in the I/O flows (the burst limit) and is
+ therefore more efficient (bursty workloads are not smoothed as long as there
+ are sufficient tokens in the bucket).
+
+- BUCKET_SIZE is used only with token bucket (STRATEGY == 1) and defines the
+ size of the bucket in bytes.
+
+- CGROUP is the name of the limited process container.
+
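For illustration, the two strategies can be sketched in plain C. This is a
simplified userspace model with invented names (tb_account(), lb_delay()); the
in-kernel implementation works on jiffies and differs in detail:

```c
/* Token bucket: "bw" tokens (bytes) are added per second, capped at
 * "bucket_size"; a request of "len" bytes drains "len" tokens.  The
 * return value is the delay (in whole seconds, rounded up) that a task
 * should sleep once the bucket has gone negative.  Illustrative only. */
struct token_bucket {
	long long tokens;      /* may go negative, as in the stats example */
	long long bucket_size; /* burst limit, in bytes */
	long long bw;          /* refill rate, in bytes/s */
};

static long long tb_account(struct token_bucket *tb,
			    long long len, long long elapsed_sec)
{
	tb->tokens += tb->bw * elapsed_sec;
	if (tb->tokens > tb->bucket_size)
		tb->tokens = tb->bucket_size;
	tb->tokens -= len;
	if (tb->tokens >= 0)
		return 0;			/* conforming: no delay */
	return (-tb->tokens + tb->bw - 1) / tb->bw;
}

/* Leaky bucket: at most bw * elapsed bytes may pass; anything above
 * that is converted directly into a delay, so bursts are always
 * smoothed instead of being absorbed by a token reserve. */
static long long lb_delay(long long bw, long long bytes,
			  long long elapsed_sec)
{
	long long allowed = bw * elapsed_sec;

	if (bytes <= allowed)
		return 0;
	return (bytes - allowed + bw - 1) / bw;
}
```

The difference between the two policies is visible in the sketch: the token
bucket lets a full bucket absorb a burst before any delay is imposed, while
the leaky bucket converts every byte above the allowed rate into a delay.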
+All the defined rules and statistics for a specific cgroup can be shown by
+reading the file blockio.bandwidth. The following syntax is used:
+
+$ cat CGROUP/blockio.bandwidth
+MAJOR MINOR BW STRATEGY LEAKY_STAT BUCKET_SIZE BUCKET_FILL TIME_DELTA
+
+- MAJOR is the major device number of DEV (defined above)
+
+- MINOR is the minor device number of DEV (defined above)
+
+- BW, STRATEGY and BUCKET_SIZE are the same parameters defined above
+
+- LEAKY_STAT is the number of bytes currently allowed by the I/O bandwidth
+ controller (only used with the leaky bucket strategy - STRATEGY == 0)
+
+- BUCKET_FILL represents the number of tokens present in the bucket (only
+ used with the token bucket strategy - STRATEGY == 1)
+
+- TIME_DELTA can be one of the following:
+ - the number of jiffies elapsed since the last I/O request (token bucket)
+ - the number of jiffies during which the bytes given by LEAKY_STAT have been
+ accumulated (leaky bucket)
+
+Multiple per-block device rules are reported in multiple rows
+(DEVi, i = 1 .. n):
+
+$ cat CGROUP/blockio.bandwidth
+MAJOR1 MINOR1 BW1 STRATEGY1 LEAKY_STAT1 BUCKET_SIZE1 BUCKET_FILL1 TIME_DELTA1
+MAJOR2 MINOR2 BW2 STRATEGY2 LEAKY_STAT2 BUCKET_SIZE2 BUCKET_FILL2 TIME_DELTA2
+...
+MAJORn MINORn BWn STRATEGYn LEAKY_STATn BUCKET_SIZEn BUCKET_FILLn TIME_DELTAn
+
+I/O bandwidth limiting rules can be removed by setting the BW value to 0.
+
+Examples:
+
+* Mount the cgroup filesystem (blockio subsystem):
+ # mkdir /mnt/cgroup
+ # mount -t cgroup -oblockio blockio /mnt/cgroup
+
+* Instantiate the new cgroup "foo":
+ # mkdir /mnt/cgroup/foo
+ --> the cgroup foo has been created
+
+* Add the current shell process to the cgroup "foo":
+ # /bin/echo $$ > /mnt/cgroup/foo/tasks
+ --> the current shell has been added to the cgroup "foo"
+
+* Give maximum 1MiB/s of I/O bandwidth on /dev/sda for the cgroup "foo", using
+ leaky bucket throttling strategy:
+ # /bin/echo /dev/sda:$((1024 * 1024)):0:0 > \
+ > /mnt/cgroup/foo/blockio.bandwidth
+ # sh
+ --> the subshell 'sh' is running in cgroup "foo" and it can use a maximum I/O
+ bandwidth of 1MiB/s on /dev/sda
+
+* Give maximum 8MiB/s of I/O bandwidth on /dev/sdb for the cgroup "foo", using
+ token bucket throttling strategy, bucket size = 8MiB:
+ # /bin/echo /dev/sdb:$((8 * 1024 * 1024)):1:$((8 * 1024 * 1024)) > \
+ > /mnt/cgroup/foo/blockio.bandwidth
+ # sh
+ --> the subshell 'sh' is running in cgroup "foo" and it can use a maximum I/O
+ bandwidth of 1MiB/s on /dev/sda (controlled by leaky bucket throttling)
+ and 8MiB/s on /dev/sdb (controlled by token bucket throttling)
+
+* Run a benchmark doing I/O on /dev/sda and /dev/sdb; the I/O limits and
+ usage defined for cgroup "foo" can be shown as follows:
+ # cat /mnt/cgroup/foo/blockio.bandwidth
+ 8 16 8388608 1 0 8388608 -522560 48
+ 8 0 1048576 0 737280 0 0 216
+
+* Extend the maximum I/O bandwidth for the cgroup "foo" to 16MiB/s on /dev/sda:
+ # /bin/echo /dev/sda:$((16 * 1024 * 1024)):0:0 > \
+ > /mnt/cgroup/foo/blockio.bandwidth
+ # cat /mnt/cgroup/foo/blockio.bandwidth
+ 8 16 8388608 1 0 8388608 -84432 206436
+ 8 0 16777216 0 0 0 0 15212
+
+* Remove limiting rule on /dev/sdb for cgroup "foo":
+ # /bin/echo /dev/sdb:0:0:0 > /mnt/cgroup/foo/blockio.bandwidth
+ # cat /mnt/cgroup/foo/blockio.bandwidth
+ 8 0 16777216 0 0 0 0 110388
+
+3. Advantages of providing this feature
+
+* Allow I/O traffic shaping for block devices shared among different cgroups
+* Improve I/O performance predictability on block devices shared between
+ different cgroups
+* Limiting rules do not depend on the particular I/O scheduler (anticipatory,
+ deadline, CFQ, noop) or on the type of the underlying block devices
+* The bandwidth limitations are enforced for both synchronous and
+ asynchronous operations, including I/O that passes through the page cache
+ or buffers, and not only direct I/O (see below for details)
+* It is possible to implement a simple user-space application to dynamically
+ adjust the I/O workload of different process containers at run-time,
+ according to the particular users' requirements and applications' performance
+ constraints
+* It is even possible to implement event-based performance throttling
+ mechanisms; for example the same user-space application could actively
+ throttle the I/O bandwidth to reduce power consumption when the battery of a
+ mobile device is running low (power throttling) or when the temperature of a
+ hardware component is too high (thermal throttling)
+* Provides zero overhead for tasks that do not use the block device I/O
+ bandwidth controller
+
+4. Design
+
+I/O throttling is performed by imposing an explicit timeout, via
+schedule_timeout_killable(), on the processes that exceed the I/O bandwidth
+dedicated to the cgroup they belong to. I/O accounting happens per cgroup.
+
+It just works as expected for read operations: the real I/O activity is reduced
+synchronously according to the defined limitations.
+
+Write operations, instead, are throttled based on the dirty-page ratio
+(write throttling in memory), since writes to the real block devices are
+processed asynchronously by different kernel threads (pdflush). However, the
+dirty-page ratio is directly proportional to the actual I/O that will be
+performed on the real block device. So, due to the asynchronous transfers
+through the page cache, the I/O throttling in memory can be considered a form
+of anticipatory throttling of the underlying block devices.
+
+Multiple re-writes to already-dirtied page cache areas are not counted as
+I/O activity. The same holds for multiple re-reads of pages already present
+in the page cache.
+
+This means that a process that re-writes and/or re-reads the same blocks of
+a file multiple times (without re-creating it via truncate(), ftruncate(),
+creat(), etc.) is affected by the I/O limitations only for the actual I/O
+performed to (or from) the underlying block devices.
+
+Multiple rules for different block devices are stored in a linked list, using
+the dev_t number of each block device as the key to uniquely identify each
+element of the list. RCU synchronization is used to protect the whole list
+structure, since the elements in the list are not supposed to change
+frequently (they change only when a new rule is defined or an old rule is
+removed or updated), while reads of the list occur on every operation that
+generates I/O. This provides zero overhead for cgroups that do not use any
+limitation.
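The per-device lookup described above can be pictured with an ordinary singly
linked list keyed by dev_t. This is a simplified userspace sketch with
invented names (iot_rule, iot_find_rule()); the kernel walks the real list
under rcu_read_lock() and modifies it with the RCU list primitives, which are
omitted here:

```c
#include <stddef.h>

typedef unsigned int dev_t_ex;	/* stand-in for the kernel's dev_t */

struct iot_rule {
	dev_t_ex dev;			/* unique key: device number */
	unsigned long long bw;		/* limit in bytes/s */
	struct iot_rule *next;
};

/* Reader side: runs on every operation that generates I/O.  In the
 * kernel this walk happens inside an RCU read-side critical section,
 * so readers never block writers and a cgroup with an empty rule list
 * pays essentially nothing. */
static struct iot_rule *iot_find_rule(struct iot_rule *head, dev_t_ex dev)
{
	struct iot_rule *r;

	for (r = head; r != NULL; r = r->next)
		if (r->dev == dev)
			return r;
	return NULL;		/* no rule: this device is not limited */
}
```

A rule update allocates a new node (or a copy of the old one) and publishes it
with a single pointer assignment, which is what makes the lock-free read side
safe.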
+
+WARNING: per-block device limiting rules always refer to the dev_t device
+number. If a block device is unplugged (e.g. a USB device), the limiting
+rules defined for that device persist and remain valid if a new device that
+uses the same major and minor numbers is plugged into the system.
+
+5. Todo
+
+* Consider an alternative design for general-purpose usage; right now the
+ controller is restricted to the special purpose of improving I/O
+ performance predictability and evaluating more precise response timings for
+ applications doing I/O. To a large degree the block I/O bandwidth
+ controller should implement more complex logic to better evaluate the real
+ cost of I/O operations, depending also on the particular block device
+ profile (e.g. USB stick, optical drive, hard disk, etc.). This would also
+ make it possible to account I/O cost appropriately for seeky workloads with
+ respect to large streaming workloads. Instead of looking at the request
+ stream and trying to predict how expensive the I/O will be, a totally
+ different approach could be to collect request timings (start time /
+ elapsed time) and, based on the collected information, estimate the I/O
+ cost and usage (idea proposed by Andrew Morton <akpm@linux-foundation.org>).
+
+* Correctly handle AIO: at the moment the approach is to make a task sleep
+ even when doing asynchronous I/O. A more reasonable behaviour would be to
+ return EAGAIN from aio_write()/aio_read()
+ (reported by Eric Rannaud <eric.rannaud@gmail.com>).
--
1.5.4.3
* [PATCH 1/3] i/o bandwidth controller documentation
@ 2008-06-20 10:05 Andrea Righi
2008-06-20 17:08 ` Randy Dunlap
0 siblings, 1 reply; 12+ messages in thread
From: Andrea Righi @ 2008-06-20 10:05 UTC (permalink / raw)
To: Balbir Singh, Paul Menage, Carl Henrik Lunde
Cc: axboe, matt, roberto, randy.dunlap, Divyesh Shah, akpm,
containers, linux-kernel, Andrea Righi
Documentation of the block device I/O bandwidth controller: description, usage,
advantages and design.
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
---
Documentation/controllers/io-throttle.txt | 163 +++++++++++++++++++++++++++++
1 files changed, 163 insertions(+), 0 deletions(-)
create mode 100644 Documentation/controllers/io-throttle.txt
diff --git a/Documentation/controllers/io-throttle.txt b/Documentation/controllers/io-throttle.txt
new file mode 100644
index 0000000..e1df98a
--- /dev/null
+++ b/Documentation/controllers/io-throttle.txt
@@ -0,0 +1,163 @@
+
+ Block device I/O bandwidth controller
+
+1. Description
+
+This controller allows to limit the I/O bandwidth of specific block devices for
+specific process containers (cgroups) imposing additional delays on I/O
+requests for those processes that exceed the limits defined in the control
+group filesystem.
+
+Bandwidth limiting rules offer better control over QoS with respect to priority
+or weight-based solutions that only give information about applications'
+relative performance requirements.
+
+The goal of the I/O bandwidth controller is to improve performance
+predictability and QoS of the different control groups sharing the same block
+devices.
+
+NOTE #1: if you're looking for a way to improve the overall throughput of the
+system probably you should use a different solution.
+
+NOTE #2: the current implementation does not guarantee minimum bandwidth
+levels, the QoS is implemented only slowing down i/o "traffic" that exceeds the
+limits specified by the user. Minimum i/o rate thresholds are supposed to be
+guaranteed if the user configures a proper i/o bandwidth partitioning of the
+block devices shared among the different cgroups (theoretically if the sum of
+all the single limits defined for a block device doesn't exceed the total i/o
+bandwidth of that device).
+
+2. User Interface
+
+A new I/O bandwidth limitation rule is described using the file
+blockio.bandwidth.
+
+The same file can be used to set multiple rules for different block devices
+relative to the same cgroup.
+
+The syntax is the following:
+# /bin/echo DEVICE:BANDWIDTH > CGROUP/blockio.bandwidth
+
+- DEVICE is the name of the device the limiting rule is applied to,
+- BANDWIDTH is the maximum I/O bandwidth on DEVICE allowed by CGROUP (we can
+ use a suffix k, K, m, M, g or G to indicate bandwidth values in KB/s, MB/s
+ or GB/s),
+- CGROUP is the name of the limited process container.
+
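The suffix handling can be sketched with a small C helper. This is
illustrative only: parse_bw() is an invented name, and an in-kernel
implementation would use its own memparse()-style parsing. The suffixes are
taken as power-of-two multipliers, consistent with the MiB/s examples below:

```c
#include <stdlib.h>

/* Parse strings like "1M", "512k" or "2G" into bytes/s.  A bare number
 * is taken as bytes/s; an empty or unparsable string yields 0, which
 * also happens to be the "remove this rule" value. */
static unsigned long long parse_bw(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 10);

	if (end == s)
		return 0;		/* no digits at all */
	switch (*end) {
	case 'k': case 'K': v <<= 10; break;	/* KiB/s */
	case 'm': case 'M': v <<= 20; break;	/* MiB/s */
	case 'g': case 'G': v <<= 30; break;	/* GiB/s */
	}
	return v;
}
```

So "/dev/sda1:1M" would configure a 1048576 bytes/s limit, and "/dev/sda1:0"
would remove the rule.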
+Examples:
+
+* Mount the cgroup filesystem (blockio subsystem):
+ # mkdir /mnt/cgroup
+ # mount -t cgroup -oblockio blockio /mnt/cgroup
+
+* Instantiate the new cgroup "foo":
+ # mkdir /mnt/cgroup/foo
+ --> the cgroup foo has been created
+
+* Add the current shell process to the cgroup "foo":
+ # /bin/echo $$ > /mnt/cgroup/foo/tasks
+ --> the current shell has been added to the cgroup "foo"
+
+* Give maximum 1MiB/s of I/O bandwidth on /dev/sda1 for the cgroup "foo":
+ # /bin/echo /dev/sda1:1M > /mnt/cgroup/foo/blockio.bandwidth
+ # sh
+ --> the subshell 'sh' is running in cgroup "foo" and it can use a maximum I/O
+ bandwidth of 1MiB/s on /dev/sda1 (blockio.bandwidth is expressed in
+ KiB/s).
+
+* Give maximum 8MiB/s of I/O bandwidth on /dev/sda5 for the cgroup "foo":
+ # /bin/echo /dev/sda5:8M > /mnt/cgroup/foo/blockio.bandwidth
+ # sh
+ --> the subshell 'sh' is running in cgroup "foo" and it can use a maximum I/O
+ bandwidth of 1MiB/s on /dev/sda1 and 8MiB/s on /dev/sda5.
+ NOTE: each partition needs its own limitation rule! In this case, for
+ example, there's no limitation on the other partitions of /dev/sda.
+
+* Run a benchmark doing I/O on /dev/sda1 and /dev/sda5; I/O limits and usage
+ defined for cgroup "foo" can be shown as following:
+ # cat /mnt/cgroup/foo/blockio.bandwidth
+ === device (8,1) ===
+ bandwidth limit: 1024 KiB/sec
+ current i/o usage: 819 KiB/sec
+ === device (8,5) ===
+ bandwidth limit: 1024 KiB/sec
+ current i/o usage: 3102 KiB/sec
+
+ Devices are reported using (major, minor) numbers when reading
+ blockio.bandwidth.
+
+ The corresponding device names can be retrieved in /proc/diskstats (or in
+ other places as well).
+
+ For example to find the name of the device (8,5):
+ # sed -ne 's/^ \+8 \+5 \([^ ]\+\).*/\1/p' /proc/diskstats
+ sda5
+
+ Current I/O usage can be greater than bandwidth limit, this means the i/o
+ controller is going to impose the limitation.
+
+* Extend the maximum I/O bandwidth for the cgroup "foo" to 8MiB/s:
+ # /bin/echo /dev/sda1:8M > /mnt/cgroup/foo/blockio.bandwidth
+
+* Remove limiting rule on /dev/sda1 for cgroup "foo":
+ # /bin/echo /dev/sda1:0 > /mnt/cgroup/foo/blockio.bandwidth
+
+3. Advantages of providing this feature
+
+* Allow I/O traffic shaping for block device shared among different cgroups
+* Improve I/O performance predictability on block devices shared between
+ different cgroups
+* Limiting rules do not depend of the particular I/O scheduler (anticipatory,
+ deadline, CFQ, noop) and/or the type of the underlying block devices
+* The bandwidth limitations are guaranteed both for synchronous and
+ asynchronous operations, even the I/O passing through the page cache or
+ buffers and not only direct I/O (see below for details)
+* It is possible to implement a simple user-space application to dynamically
+ adjust the I/O workload of different process containers at run-time,
+ according to the particular users' requirements and applications' performance
+ constraints
+* It is even possible to implement event-based performance throttling
+ mechanisms; for example the same user-space application could actively
+ throttle the I/O bandwidth to reduce power consumption when the battery of a
+ mobile device is running low (power throttling) or when the temperature of a
+ hardware component is too high (thermal throttling)
+* Provides zero overhead for non block device I/O bandwidth controller users
+
+4. Design
+
+The I/O throttling is performed imposing an explicit timeout, via
+schedule_timeout_killable() on the processes that exceed the I/O bandwidth
+dedicated to the cgroup they belong to. I/O accounting happens per cgroup.
+
+It just works as expected for read operations: the real I/O activity is reduced
+synchronously according to the defined limitations.
+
+Write operations, instead, are modeled depending of the dirty pages ratio
+(write throttling in memory), since the writes to the real block devices are
+processed asynchronously by different kernel threads (pdflush). However, the
+dirty pages ratio is directly proportional to the actual I/O that will be
+performed on the real block device. So, due to the asynchronous transfers
+through the page cache, the I/O throttling in memory can be considered a form
+of anticipatory throttling to the underlying block devices.
+
+Multiple re-writes in already dirtied page cache areas are not considered for
+accounting the I/O activity. This is valid for multiple re-reads of pages
+already present in the page cache as well.
+
+This means that a process that re-writes and/or re-reads multiple times the
+same blocks in a file (without re-creating it by truncate(), ftruncate(),
+creat(), etc.) is affected by the I/O limitations only for the actual I/O
+performed to (or from) the underlying block devices.
+
+Multiple rules for different block devices are stored in a linked list, using
+the dev_t number of each block device as key to uniquely identify each element
+of the list. RCU synchronization is used to protect the whole list structure,
+since the elements in the list are not supposed to change frequently (they
+change only when a new rule is defined or an old rule is removed or updated),
+while the reads in the list occur at each operation that generates I/O. This
+allows to provide zero overhead for cgroups that do not use any limitation.
+
+WARNING: per-block device limiting rules always refer to the dev_t device
+number. If a block device is unplugged (i.e. a USB device) the limiting rules
+associated to that device persist and they are still valid if a new device is
+plugged in the system and it uses the same major and minor numbers.
--
1.5.4.3
* Re: [PATCH 1/3] i/o bandwidth controller documentation
2008-06-20 10:05 Andrea Righi
@ 2008-06-20 17:08 ` Randy Dunlap
2008-06-21 10:35 ` Andrea Righi
0 siblings, 1 reply; 12+ messages in thread
From: Randy Dunlap @ 2008-06-20 17:08 UTC (permalink / raw)
To: Andrea Righi
Cc: Balbir Singh, Paul Menage, Carl Henrik Lunde, axboe, matt,
roberto, Divyesh Shah, akpm, containers, linux-kernel
On Fri, 20 Jun 2008 12:05:33 +0200 Andrea Righi wrote:
> Documentation of the block device I/O bandwidth controller: description, usage,
> advantages and design.
>
> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
> ---
> Documentation/controllers/io-throttle.txt | 163 +++++++++++++++++++++++++++++
> 1 files changed, 163 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/controllers/io-throttle.txt
>
> diff --git a/Documentation/controllers/io-throttle.txt b/Documentation/controllers/io-throttle.txt
> new file mode 100644
> index 0000000..e1df98a
> --- /dev/null
> +++ b/Documentation/controllers/io-throttle.txt
> @@ -0,0 +1,163 @@
> +
> + Block device I/O bandwidth controller
> +
> +1. Description
> +
> +This controller allows to limit the I/O bandwidth of specific block devices for
> +specific process containers (cgroups) imposing additional delays on I/O
> +requests for those processes that exceed the limits defined in the control
> +group filesystem.
> +
> +Bandwidth limiting rules offer better control over QoS with respect to priority
> +or weight-based solutions that only give information about applications'
> +relative performance requirements.
> +
> +The goal of the I/O bandwidth controller is to improve performance
> +predictability and QoS of the different control groups sharing the same block
> +devices.
> +
> +NOTE #1: if you're looking for a way to improve the overall throughput of the
I would s/if/If/
> +system probably you should use a different solution.
> +
> +NOTE #2: the current implementation does not guarantee minimum bandwidth
s/the/The/
> +levels, the QoS is implemented only slowing down i/o "traffic" that exceeds the
Please consistenly use "I/O" instead of "i/o".
Above comma makes a run-on sentence. A period or semi-colon would be better IMO.
> +limits specified by the user. Minimum i/o rate thresholds are supposed to be
> +guaranteed if the user configures a proper i/o bandwidth partitioning of the
> +block devices shared among the different cgroups (theoretically if the sum of
> +all the single limits defined for a block device doesn't exceed the total i/o
> +bandwidth of that device).
> +
> +2. User Interface
> +
> +A new I/O bandwidth limitation rule is described using the file
> +blockio.bandwidth.
> +
> +The same file can be used to set multiple rules for different block devices
> +relative to the same cgroup.
> +
> +The syntax is the following:
> +# /bin/echo DEVICE:BANDWIDTH > CGROUP/blockio.bandwidth
> +
> +- DEVICE is the name of the device the limiting rule is applied to,
> +- BANDWIDTH is the maximum I/O bandwidth on DEVICE allowed by CGROUP (we can
> + use a suffix k, K, m, M, g or G to indicate bandwidth values in KB/s, MB/s
> + or GB/s),
> +- CGROUP is the name of the limited process container.
> +
> +Examples:
> +
> +* Mount the cgroup filesystem (blockio subsystem):
> + # mkdir /mnt/cgroup
> + # mount -t cgroup -oblockio blockio /mnt/cgroup
> +
> +* Instantiate the new cgroup "foo":
> + # mkdir /mnt/cgroup/foo
> + --> the cgroup foo has been created
> +
> +* Add the current shell process to the cgroup "foo":
> + # /bin/echo $$ > /mnt/cgroup/foo/tasks
> + --> the current shell has been added to the cgroup "foo"
> +
> +* Give maximum 1MiB/s of I/O bandwidth on /dev/sda1 for the cgroup "foo":
> + # /bin/echo /dev/sda1:1M > /mnt/cgroup/foo/blockio.bandwidth
> + # sh
> + --> the subshell 'sh' is running in cgroup "foo" and it can use a maximum I/O
> + bandwidth of 1MiB/s on /dev/sda1 (blockio.bandwidth is expressed in
> + KiB/s).
> +
> +* Give maximum 8MiB/s of I/O bandwidth on /dev/sda5 for the cgroup "foo":
> + # /bin/echo /dev/sda5:8M > /mnt/cgroup/foo/blockio.bandwidth
> + # sh
> + --> the subshell 'sh' is running in cgroup "foo" and it can use a maximum I/O
> + bandwidth of 1MiB/s on /dev/sda1 and 8MiB/s on /dev/sda5.
> + NOTE: each partition needs its own limitation rule! In this case, for
> + example, there's no limitation on the other partitions of /dev/sda.
> +
> +* Run a benchmark doing I/O on /dev/sda1 and /dev/sda5; I/O limits and usage
> + defined for cgroup "foo" can be shown as following:
> + # cat /mnt/cgroup/foo/blockio.bandwidth
> + === device (8,1) ===
> + bandwidth limit: 1024 KiB/sec
> + current i/o usage: 819 KiB/sec
> + === device (8,5) ===
> + bandwidth limit: 1024 KiB/sec
> + current i/o usage: 3102 KiB/sec
Ugh, this makes it look like the output does "pretty printing" (formatting),
which is generally not a good idea. Let some app be responsible for that,
not the kernel. Basically this means don't use leading spaces just to make the
":"s line up in the output.
> +
> + Devices are reported using (major, minor) numbers when reading
> + blockio.bandwidth.
> +
> + The corresponding device names can be retrieved in /proc/diskstats (or in
> + other places as well).
> +
> + For example to find the name of the device (8,5):
> + # sed -ne 's/^ \+8 \+5 \([^ ]\+\).*/\1/p' /proc/diskstats
> + sda5
> +
> + Current I/O usage can be greater than bandwidth limit, this means the i/o
Run-on sentence. Change , to . (with This) or use ;
> + controller is going to impose the limitation.
> +
> +* Extend the maximum I/O bandwidth for the cgroup "foo" to 8MiB/s:
> + # /bin/echo /dev/sda1:8M > /mnt/cgroup/foo/blockio.bandwidth
> +
> +* Remove limiting rule on /dev/sda1 for cgroup "foo":
> + # /bin/echo /dev/sda1:0 > /mnt/cgroup/foo/blockio.bandwidth
> +
> +3. Advantages of providing this feature
> +
> +* Allow I/O traffic shaping for block device shared among different cgroups
> +* Improve I/O performance predictability on block devices shared between
> + different cgroups
> +* Limiting rules do not depend of the particular I/O scheduler (anticipatory,
> + deadline, CFQ, noop) and/or the type of the underlying block devices
> +* The bandwidth limitations are guaranteed both for synchronous and
> + asynchronous operations, even the I/O passing through the page cache or
> + buffers and not only direct I/O (see below for details)
> +* It is possible to implement a simple user-space application to dynamically
> + adjust the I/O workload of different process containers at run-time,
> + according to the particular users' requirements and applications' performance
> + constraints
> +* It is even possible to implement event-based performance throttling
> + mechanisms; for example the same user-space application could actively
> + throttle the I/O bandwidth to reduce power consumption when the battery of a
> + mobile device is running low (power throttling) or when the temperature of a
> + hardware component is too high (thermal throttling)
> +* Provides zero overhead for non block device I/O bandwidth controller users
> +
> +4. Design
> +
> +The I/O throttling is performed imposing an explicit timeout, via
> +schedule_timeout_killable() on the processes that exceed the I/O bandwidth
> +dedicated to the cgroup they belong to. I/O accounting happens per cgroup.
> +
> +It just works as expected for read operations: the real I/O activity is reduced
> +synchronously according to the defined limitations.
> +
> +Write operations, instead, are modeled depending of the dirty pages ratio
> +(write throttling in memory), since the writes to the real block devices are
> +processed asynchronously by different kernel threads (pdflush). However, the
> +dirty pages ratio is directly proportional to the actual I/O that will be
> +performed on the real block device. So, due to the asynchronous transfers
> +through the page cache, the I/O throttling in memory can be considered a form
> +of anticipatory throttling to the underlying block devices.
> +
> +Multiple re-writes in already dirtied page cache areas are not counted in the
> +I/O accounting. The same applies to multiple re-reads of pages already
> +present in the page cache.
> +
> +This means that a process that re-writes and/or re-reads the same blocks of a
> +file multiple times (without re-creating it via truncate(), ftruncate(),
> +creat(), etc.) is affected by the I/O limitations only for the actual I/O
> +performed to (or from) the underlying block devices.
> +
> +Multiple rules for different block devices are stored in a linked list, using
> +the dev_t number of each block device as key to uniquely identify each element
> +of the list. RCU synchronization is used to protect the whole list structure,
> +since the elements in the list are not supposed to change frequently (they
> +change only when a new rule is defined or an old rule is removed or updated),
> +while the list is read at each operation that generates I/O. This makes it
> +possible to provide zero overhead for cgroups that do not use any limitation.
> +
> +WARNING: per-block device limiting rules always refer to the dev_t device
> +number. If a block device is unplugged (i.e. a USB device) the limiting rules
> +associated to that device persist and they are still valid if a new device is
associated with (?)
> +plugged in the system and it uses the same major and minor numbers.
> --
---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/
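As a footnote to the design section quoted above: the timeout it describes amounts to computing how far ahead of its bandwidth budget a cgroup has run. A rough user-space illustration of that arithmetic only, with invented numbers (the real controller accounts per cgroup inside the kernel, in jiffies, via schedule_timeout_killable()):

```shell
# Hypothetical figures; this sketches the delay computation, nothing more.
limit_kibps=1024   # rule: 1 MiB/s, as in "/bin/echo /dev/sda1:1024 > ..."
io_done_kib=4096   # KiB of I/O charged to the cgroup since its last timestamp
elapsed_s=2        # seconds elapsed since that timestamp

# Time the cgroup *should* have taken for that much I/O, minus the time
# it actually took, is how long the offending task must sleep:
sleep_s=$(( io_done_kib / limit_kibps - elapsed_s ))
[ "$sleep_s" -lt 0 ] && sleep_s=0   # under the limit: no delay at all
echo "delay: ${sleep_s}s"           # -> delay: 2s
```

If the cgroup is under its limit the computed delay clamps to zero and the task is not slowed down.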
* Re: [PATCH 1/3] i/o bandwidth controller documentation
2008-06-20 17:08 ` Randy Dunlap
@ 2008-06-21 10:35 ` Andrea Righi
2008-06-22 16:03 ` Randy Dunlap
0 siblings, 1 reply; 12+ messages in thread
From: Andrea Righi @ 2008-06-21 10:35 UTC (permalink / raw)
To: Randy Dunlap
Cc: Balbir Singh, Paul Menage, Carl Henrik Lunde, axboe, matt,
roberto, Divyesh Shah, akpm, containers, linux-kernel
Thanks Randy, I've applied all your fixes to my local documentation,
next patchset version will include them. A few small comments below.
Randy Dunlap wrote:
>> +* Run a benchmark doing I/O on /dev/sda1 and /dev/sda5; I/O limits and usage
>> + defined for cgroup "foo" can be shown as following:
>> + # cat /mnt/cgroup/foo/blockio.bandwidth
>> + === device (8,1) ===
>> + bandwidth limit: 1024 KiB/sec
>> + current i/o usage: 819 KiB/sec
>> + === device (8,5) ===
>> + bandwidth limit: 1024 KiB/sec
>> + current i/o usage: 3102 KiB/sec
>
> Ugh, this makes it look like the output does "pretty printing" (formatting),
> which is generally not a good idea. Let some app be responsible for that,
> not the kernel. Basically this means don't use leading spaces just to make the
> ":"s line up in the output.
Sounds reasonable. I think the output could be further reduced; the
following format should be explanatory enough.
device: %u,%u
bandwidth: %lu KiB/sec
usage: %lu KiB/sec
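For what it's worth, a flat format like this stays trivial to pretty-print in user space; a hypothetical awk sketch over a canned sample file (file name and figures invented for the sketch):

```shell
# Sample controller output in the proposed flat format (invented numbers):
cat > /tmp/blockio.sample <<'EOF'
device: 8,1
bandwidth: 1024 KiB/sec
usage: 819 KiB/sec
device: 8,5
bandwidth: 1024 KiB/sec
usage: 3102 KiB/sec
EOF

# Reconstruct the "pretty" layout entirely in user space:
awk '/^device:/   { printf "=== device (%s) ===\n", $2 }
     /^bandwidth:/{ printf "  bandwidth limit: %s %s\n", $2, $3 }
     /^usage:/    { printf "  current i/o usage: %s %s\n", $2, $3 }' \
    /tmp/blockio.sample
```

So the kernel can emit the minimal key/value lines and leave all formatting to tools like this.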
>> +WARNING: per-block device limiting rules always refer to the dev_t device
>> +number. If a block device is unplugged (i.e. a USB device) the limiting rules
>> +associated to that device persist and they are still valid if a new device is
>
> associated with (?)
what about:
...the limiting rules defined for that device...
-Andrea
* Re: [PATCH 1/3] i/o bandwidth controller documentation
2008-06-21 10:35 ` Andrea Righi
@ 2008-06-22 16:03 ` Randy Dunlap
0 siblings, 0 replies; 12+ messages in thread
From: Randy Dunlap @ 2008-06-22 16:03 UTC (permalink / raw)
To: righi.andrea
Cc: axboe, containers, Balbir Singh, Paul Menage, Carl Henrik Lunde,
matt, roberto, linux-kernel, akpm, Divyesh Shah
--- Original Message ---
> Thanks Randy, I've applied all your fixes to my local
> documentation,
> next patchset version will include them. A few small comments
> below.
>
> >> +WARNING: per-block device limiting rules always refer to the dev_t device
> >> +number. If a block device is unplugged (i.e. a USB device) the limiting rules
> >> +associated to that device persist and they are still valid if a new device is
> >
> > associated with (?)
>
> what about:
>
> ...the limiting rules defined for that device...
Hi Andrea,
Yes, that's fine.
Thanks.
* [PATCH 1/3] i/o bandwidth controller documentation
@ 2008-06-06 22:27 Andrea Righi
2008-06-11 22:42 ` Randy Dunlap
2008-06-18 15:16 ` Carl Henrik Lunde
0 siblings, 2 replies; 12+ messages in thread
From: Andrea Righi @ 2008-06-06 22:27 UTC (permalink / raw)
To: balbir, menage; +Cc: matt, roberto, randy.dunlap, akpm, linux-kernel
Documentation of the block device I/O bandwidth controller: description, usage,
advantages and design.
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
---
Documentation/controllers/io-throttle.txt | 150 +++++++++++++++++++++++++++++
1 files changed, 150 insertions(+), 0 deletions(-)
create mode 100644 Documentation/controllers/io-throttle.txt
diff --git a/Documentation/controllers/io-throttle.txt b/Documentation/controllers/io-throttle.txt
new file mode 100644
index 0000000..5373fa8
--- /dev/null
+++ b/Documentation/controllers/io-throttle.txt
@@ -0,0 +1,150 @@
+
+ Block device I/O bandwidth controller
+
+1. Description
+
+This controller makes it possible to limit the I/O bandwidth of specific block
+devices for specific process containers (cgroups), imposing additional delays
+on I/O requests for those processes that exceed the limits defined in the
+control group filesystem.
+
+Bandwidth limiting rules offer better control over QoS with respect to priority
+or weight-based solutions that only give information about applications'
+relative performance requirements.
+
+The goal of the I/O bandwidth controller is to improve performance
+predictability and QoS of the different control groups sharing the same block
+devices.
+
+NOTE: if you're looking for a way to improve the overall throughput of the
+system, you should probably use a different solution.
+
+2. User Interface
+
+A new I/O bandwidth limitation rule is described using the file
+blockio.bandwidth.
+
+The same file can be used to set multiple rules for different block devices
+relatively to the same cgroup.
+
+The syntax is the following:
+# /bin/echo DEVICE:BANDWIDTH > CGROUP/blockio.bandwidth
+
+- DEVICE is the name of the device the limiting rule is applied to,
+- BANDWIDTH is the maximum I/O bandwidth on DEVICE allowed by CGROUP,
+- CGROUP is the name of the limited process container.
+
+Examples:
+
+* Mount the cgroup filesystem (blockio subsystem):
+ # mkdir /mnt/cgroup
+ # mount -t cgroup -oblockio blockio /mnt/cgroup
+
+* Instantiate the new cgroup "foo":
+ # mkdir /mnt/cgroup/foo
+ --> the cgroup foo has been created
+
+* Add the current shell process to the cgroup "foo":
+ # /bin/echo $$ > /mnt/cgroup/foo/tasks
+ --> the current shell has been added to the cgroup "foo"
+
+* Give maximum 1MiB/s of I/O bandwidth on /dev/sda1 for the cgroup "foo":
+ # /bin/echo /dev/sda1:1024 > /mnt/cgroup/foo/blockio.bandwidth
+ # sh
+ --> the subshell 'sh' is running in cgroup "foo" and it can use a maximum I/O
+ bandwidth of 1MiB/s on /dev/sda1 (blockio.bandwidth is expressed in
+ KiB/s).
+
+* Give maximum 8MiB/s of I/O bandwidth on /dev/sda5 for the cgroup "foo":
+ # /bin/echo /dev/sda5:8192 > /mnt/cgroup/foo/blockio.bandwidth
+ # sh
+ --> the subshell 'sh' is running in cgroup "foo" and it can use a maximum I/O
+ bandwidth of 1MiB/s on /dev/sda1 and 8MiB/s on /dev/sda5.
+ NOTE: each partition needs its own limitation rule! In this case, for
+ example, there's no limitation on /dev/sda2 for cgroup "foo".
+
+* Show the I/O limits defined for cgroup "foo":
+ # cat /mnt/cgroup/foo/blockio.bandwidth
+ === device (8,1) ===
+ bandwidth-max: 1024 KiB/sec
+ requested: 0 bytes
+ last request: 4294933948 jiffies
+ delta: 2660 jiffies
+ === device (8,5) ===
+ bandwidth-max: 8192 KiB/sec
+ requested: 0 bytes
+ last request: 4294935736 jiffies
+ delta: 872 jiffies
+
+ Devices are reported using (major, minor) numbers when reading
+ blockio.bandwidth.
+
+ The corresponding device names can be retrieved in /proc/diskstats (or in
+ other places as well).
+
+ For example to find the name of the device (8,5):
+ # sed -ne 's/^ \+8 \+5 \([^ ]\+\).*/\1/p' /proc/diskstats
+ sda5
+
+* Extend the maximum I/O bandwidth for the cgroup "foo" to 8MiB/s:
+ # /bin/echo /dev/sda1:8192 > /mnt/cgroup/foo/blockio.bandwidth
+
+* Remove limiting rule on /dev/sda1 for cgroup "foo":
+ # /bin/echo /dev/sda1:0 > /mnt/cgroup/foo/blockio.bandwidth
+
+3. Advantages of providing this feature
+
+* Allow QoS for block device I/O among different cgroups
+* Improve I/O performance predictability on block devices shared between
+ different cgroups
+* Limiting rules do not depend on the particular I/O scheduler (anticipatory,
+ deadline, CFQ, noop) or on the type of the underlying block devices
+* The bandwidth limitations are guaranteed both for synchronous and
+ asynchronous operations, even the I/O passing through the page cache or
+ buffers and not only direct I/O (see below for details)
+* It is possible to implement a simple user-space application to dynamically
+ adjust the I/O workload of different process containers at run-time,
+ according to the particular users' requirements and applications' performance
+ constraints
+* It is even possible to implement event-based performance throttling
+ mechanisms; for example the same user-space application could actively
+ throttle the I/O bandwidth to reduce power consumption when the battery of a
+ mobile device is running low (power throttling) or when the temperature of a
+ hardware component is too high (thermal throttling)
+
+4. Design
+
+The I/O throttling is performed by imposing an explicit timeout, via
+schedule_timeout_killable(), on the processes that exceed the I/O bandwidth
+dedicated to the cgroup they belong to.
+
+It just works as expected for read operations: the real I/O activity is reduced
+synchronously according to the defined limitations.
+
+Write operations, instead, are modeled depending on the dirty pages ratio
+(write throttling in memory), since the writes to the real block devices are
+processed asynchronously by different kernel threads (pdflush). However, the
+dirty pages ratio is directly proportional to the actual I/O that will be
+performed on the real block device. So, due to the asynchronous transfers
+through the page cache, the I/O throttling in memory can be considered a form
+of anticipatory throttling to the underlying block devices.
+
+Multiple re-writes in already dirtied page cache areas are not counted in the
+I/O accounting. The same applies to multiple re-reads of pages already
+present in the page cache.
+
+This means that a process that re-writes and/or re-reads the same blocks of a
+file multiple times (without re-creating it via truncate(), ftruncate(),
+creat(), etc.) is affected by the I/O limitations only for the actual I/O
+performed to (or from) the underlying block devices.
+
+Multiple rules for different block devices are stored in an rbtree, using the
+dev_t number of each block device as the key. This reduces the controller
+overhead on systems with many LUNs and different per-LUN I/O bandwidth rules,
+since the worst-case complexity of a search in the rbtree is O(log n).
+
+WARNING: per-block device limiting rules always refer to the dev_t device
+number. If a block device is unplugged (i.e. a USB device) the limiting rules
+associated to that device persist and they are still valid if a new device is
+plugged in the system and it uses the same major and minor numbers.
--
1.5.4.3
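The sed recipe in the patch for resolving a (major, minor) pair via /proc/diskstats can be generalized; a hypothetical helper, run here against a canned diskstats excerpt so the sketch is self-contained (on a real system the third argument would be /proc/diskstats):

```shell
# devname(): map a (major, minor) pair to a device name using
# diskstats-style input. The function name and sample file are
# invented for this sketch.
devname() {  # usage: devname MAJOR MINOR FILE
    awk -v ma="$1" -v mi="$2" '$1 == ma && $2 == mi { print $3 }' "$3"
}

cat > /tmp/diskstats.sample <<'EOF'
   8       1 sda1 4420 1063 114632 4584
   8       5 sda5 1973  612  46622 3122
EOF

devname 8 5 /tmp/diskstats.sample   # -> sda5
```

Since blockio.bandwidth reports devices as (major, minor), a helper like this is all a monitoring script needs to label its output with device names.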
* Re: [PATCH 1/3] i/o bandwidth controller documentation
2008-06-06 22:27 Andrea Righi
@ 2008-06-11 22:42 ` Randy Dunlap
2008-06-11 22:51 ` Andrea Righi
2008-06-18 15:16 ` Carl Henrik Lunde
1 sibling, 1 reply; 12+ messages in thread
From: Randy Dunlap @ 2008-06-11 22:42 UTC (permalink / raw)
To: Andrea Righi; +Cc: balbir, menage, matt, roberto, akpm, linux-kernel
On Sat, 7 Jun 2008 00:27:28 +0200 Andrea Righi wrote:
> Documentation of the block device I/O bandwidth controller: description, usage,
> advantages and design.
>
> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
> ---
> Documentation/controllers/io-throttle.txt | 150 +++++++++++++++++++++++++++++
> 1 files changed, 150 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/controllers/io-throttle.txt
>
> diff --git a/Documentation/controllers/io-throttle.txt b/Documentation/controllers/io-throttle.txt
> new file mode 100644
> index 0000000..5373fa8
> --- /dev/null
> +++ b/Documentation/controllers/io-throttle.txt
> @@ -0,0 +1,150 @@
> +
> + Block device I/O bandwidth controller
> +
> +1. Description
> +
> +This controller makes it possible to limit the I/O bandwidth of specific block
> +devices for specific process containers (cgroups), imposing additional delays
> +on I/O requests for those processes that exceed the limits defined in the
> +control group filesystem.
> +
> +Bandwidth limiting rules offer better control over QoS with respect to priority
> +or weight-based solutions that only give information about applications'
> +relative performance requirements.
> +
> +The goal of the I/O bandwidth controller is to improve performance
> +predictability and QoS of the different control groups sharing the same block
> +devices.
> +
> +NOTE: if you're looking for a way to improve the overall throughput of the
> +system, you should probably use a different solution.
> +
> +2. User Interface
> +
> +A new I/O bandwidth limitation rule is described using the file
> +blockio.bandwidth.
> +
> +The same file can be used to set multiple rules for different block devices
> +relatively to the same cgroup.
relative
> +
> +The syntax is the following:
> +# /bin/echo DEVICE:BANDWIDTH > CGROUP/blockio.bandwidth
> +
> +- DEVICE is the name of the device the limiting rule is applied to,
> +- BANDWIDTH is the maximum I/O bandwidth on DEVICE allowed by CGROUP,
> +- CGROUP is the name of the limited process container.
Thanks.
---
~Randy
"'Daemon' is an old piece of jargon from the UNIX operating system,
where it referred to a piece of low-level utility software, a
fundamental part of the operating system."
* Re: [PATCH 1/3] i/o bandwidth controller documentation
2008-06-11 22:42 ` Randy Dunlap
@ 2008-06-11 22:51 ` Andrea Righi
0 siblings, 0 replies; 12+ messages in thread
From: Andrea Righi @ 2008-06-11 22:51 UTC (permalink / raw)
To: Randy Dunlap; +Cc: balbir, menage, matt, roberto, akpm, linux-kernel
Randy Dunlap wrote:
> On Sat, 7 Jun 2008 00:27:28 +0200 Andrea Righi wrote:
>
>> Documentation of the block device I/O bandwidth controller: description, usage,
>> advantages and design.
>>
>> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
>> ---
>> Documentation/controllers/io-throttle.txt | 150 +++++++++++++++++++++++++++++
>> 1 files changed, 150 insertions(+), 0 deletions(-)
>> create mode 100644 Documentation/controllers/io-throttle.txt
>>
>> diff --git a/Documentation/controllers/io-throttle.txt b/Documentation/controllers/io-throttle.txt
>> new file mode 100644
>> index 0000000..5373fa8
>> --- /dev/null
>> +++ b/Documentation/controllers/io-throttle.txt
>> @@ -0,0 +1,150 @@
>> +
>> + Block device I/O bandwidth controller
>> +
>> +1. Description
>> +
>> +This controller makes it possible to limit the I/O bandwidth of specific block
>> +devices for specific process containers (cgroups), imposing additional delays
>> +on I/O requests for those processes that exceed the limits defined in the
>> +control group filesystem.
>> +
>> +Bandwidth limiting rules offer better control over QoS with respect to priority
>> +or weight-based solutions that only give information about applications'
>> +relative performance requirements.
>> +
>> +The goal of the I/O bandwidth controller is to improve performance
>> +predictability and QoS of the different control groups sharing the same block
>> +devices.
>> +
>> +NOTE: if you're looking for a way to improve the overall throughput of the
>> +system, you should probably use a different solution.
>> +
>> +2. User Interface
>> +
>> +A new I/O bandwidth limitation rule is described using the file
>> +blockio.bandwidth.
>> +
>> +The same file can be used to set multiple rules for different block devices
>> +relatively to the same cgroup.
>
> relative
>
I will fix it in the next version.
Thanks again Randy.
-Andrea
* Re: [PATCH 1/3] i/o bandwidth controller documentation
2008-06-06 22:27 Andrea Righi
2008-06-11 22:42 ` Randy Dunlap
@ 2008-06-18 15:16 ` Carl Henrik Lunde
2008-06-18 22:28 ` Andrea Righi
1 sibling, 1 reply; 12+ messages in thread
From: Carl Henrik Lunde @ 2008-06-18 15:16 UTC (permalink / raw)
To: Andrea Righi
Cc: balbir, menage, matt, roberto, randy.dunlap, akpm, linux-kernel
On Sat, Jun 7, 2008 at 00:27, Andrea Righi <righi.andrea@gmail.com> wrote:
[...]
> +3. Advantages of providing this feature
> +
> +* Allow QoS for block device I/O among different cgroups
I'm not sure if this can be called QoS, as it does not guarantee
anything but throttling?
> +* The bandwidth limitations are guaranteed both for synchronous and
> + asynchronous operations, even the I/O passing through the page cache or
> + buffers and not only direct I/O (see below for details)
The throttling does not seem to cover the I/O path for XFS?
I was unable to throttle processes reading from an XFS file system.
Also I think the name of the function cgroup_io_account is a bit too innocent?
It sounds like an inline function "{ io += bytes; }", not like
something which may sleep.
--
Carl Henrik
* Re: [PATCH 1/3] i/o bandwidth controller documentation
2008-06-18 15:16 ` Carl Henrik Lunde
@ 2008-06-18 22:28 ` Andrea Righi
0 siblings, 0 replies; 12+ messages in thread
From: Andrea Righi @ 2008-06-18 22:28 UTC (permalink / raw)
To: Carl Henrik Lunde
Cc: balbir, menage, matt, roberto, randy.dunlap, akpm, linux-kernel
Carl Henrik Lunde wrote:
> On Sat, Jun 7, 2008 at 00:27, Andrea Righi <righi.andrea@gmail.com> wrote:
> [...]
>> +3. Advantages of providing this feature
>> +
>> +* Allow QoS for block device I/O among different cgroups
>
> I'm not sure if this can be called QoS, as it does not guarantee
> anything but throttling?
That's correct. There's nothing to guarantee minimum bandwidth levels
right now; the "QoS" is implemented only by slowing down i/o "traffic" that
exceeds the limits (probably "i/o traffic shaping" is a better wording).
Minimum thresholds are supposed to be guaranteed if the user configures
a proper i/o bandwidth partitioning of the block devices shared among
the different cgroups (that could mean: the sum of all the single limits
for a device doesn't exceed the total i/o bandwidth of that device... at
least theoretically).
I'll try to clarify this concept better in the documentation that I'll
include in the next patchset version.
I'd also like to explore running the io-throttle controller on top of other
i/o bandwidth controlling solutions (see for example:
http://lkml.org/lkml/2008/4/3/45), in order to combine the limiting feature
of io-throttle with priority / fair queueing algorithms that guarantee
minimum performance levels.
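The "proper partitioning" condition described above is easy to check from user space; a hypothetical sketch with invented numbers (limits in KiB/s, one per cgroup sharing the device):

```shell
# Invented figures: three cgroups share one device whose sustainable
# bandwidth has been measured at ~30 MiB/s.
device_capacity=30720          # KiB/s
cgroup_limits="8192 8192 4096" # each cgroup's blockio.bandwidth limit

sum=0
for l in $cgroup_limits; do
    sum=$(( sum + l ))
done

# Minimum levels are only (theoretically) guaranteed when the limits
# do not oversubscribe the device:
if [ "$sum" -le "$device_capacity" ]; then
    echo "partitioning ok ($sum <= $device_capacity KiB/s)"
else
    echo "oversubscribed ($sum > $device_capacity KiB/s)"
fi
```

A management daemon could run a check like this whenever a rule is added or updated.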
>> +* The bandwidth limitations are guaranteed both for synchronous and
>> + asynchronous operations, even the I/O passing through the page cache or
>> + buffers and not only direct I/O (see below for details)
>
> The throttling does not seem to cover the I/O path for XFS?
> I was unable to throttle processes reading from an XFS file system.
mmmh... works for me. Are you sure you've limited the correct block
device?
> Also I think the name of the function cgroup_io_account is a bit too innocent?
> It sounds like a inline function "{ io += bytes; }", not like
> something which may sleep.
Agree. What about cgroup_acct_and_throttle_io()? Suggestions?
Thanks,
-Andrea
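For checks like the XFS one discussed above, the effective read bandwidth a throttled shell sees can be measured with dd against a file on the filesystem in question (after dropping caches so reads actually hit the device). The arithmetic, demonstrated here on invented numbers:

```shell
# Hypothetical: these figures stand in for what dd would report when run
# inside a shell already added to /mnt/cgroup/foo/tasks, e.g.
#   dd if=/xfs/file of=/dev/null bs=1M count=8
bytes=8388608      # bytes read
seconds=8          # elapsed wall-clock time

kibps=$(( bytes / 1024 / seconds ))
echo "${kibps} KiB/s"   # 8 MiB in 8 s -> 1024 KiB/s, i.e. at a 1 MiB/s limit
```

If the measured rate matches the cgroup's limit, throttling is working on that path; if it matches the raw device speed, the rule is probably on the wrong block device.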