Query about anticipatory IO scheduler dynamic write batch size updation logic

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Vivek Goyal <vgoyal@redhat.com>
To: Jens Axboe <jens.axboe@oracle.com>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>,
	npiggin@suse.de
Cc: Divyesh Shah <dpshah@google.com>, Nauman Rafique <nauman@google.com>
Subject: Query about anticipatory IO scheduler dynamic write batch size updation logic
Date: Thu, 28 May 2009 18:33:30 -0400	[thread overview]
Message-ID: <20090528223330.GC4335@redhat.com> (raw)

Hi Jens, Nick,

I am having a look at anticipatory dynamic write batch size updation logic
and can't understand it. Hence thought it is easier to ask.

IIUC, update_write_batch(), calculates how much time a write batch has
taken and if writes are taking more time than allocated batch size, then
it decreases the number of writes to issue from the write batch and vice
versa.

Currently, update_write_batch() waits for first read request to finish,
and then it comapres the current value of jiffies with "current_batch_expires".

I got two questions here.

- ad->current_batch_expires gets updated with the expiry time of next READ
  batch in as_completed_request() before we call update_write_batch().

  That means effectively we are making sure that as long as writes don't
  consume more than (WRITE + READ) batch size, things are fine.  I am
  guessing that this is not intentional and probably is the side affect
  of following patch.

	commit d585d0b9d73ed999cc7b8cf3cac4a5b01abb544e
	Author: Divyesh Shah <dpshah@google.com>
	Date:   Mon Jun 16 18:37:08 2008 +0200

    	block: Fix the starving writes bug in the anticipatory IO scheduler

- Why do we call update_write_batch() after completion of first read
  request. Shoulnd't it be called after last write request has completed
  from the write batch? Calling it after first read makes it unfair
  beacuse next read might cause a big seek time and we will account that
  into time taken by previous write batch?

For the first issue, I wrote a quick crude test patch as follows.

---
 block/as-iosched.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Index: linux12/block/as-iosched.c
===================================================================
--- linux12.orig/block/as-iosched.c	2009-04-30 00:48:16.000000000 -0400
+++ linux12/block/as-iosched.c	2009-05-28 17:43:31.000000000 -0400
@@ -103,6 +103,8 @@ struct as_data {
 	sector_t new_seek_mean;

 	unsigned long current_batch_expires;
+	unsigned long last_write_batch_expires; /* ideal expiry time of last
+						 * write batch */
 	unsigned long last_check_fifo[2];
 	int changed_batch;		/* 1: waiting for old batch to end */
 	int new_batch;			/* 1: waiting on first read complete */
@@ -811,7 +813,7 @@ static void update_write_batch(struct as
 	unsigned long batch = ad->batch_expire[BLK_RW_ASYNC];
 	long write_time;

-	write_time = (jiffies - ad->current_batch_expires) + batch;
+	write_time = (jiffies - ad->last_write_batch_expires) + batch;
 	if (write_time < 0)
 		write_time = 0;

@@ -847,6 +849,14 @@ static void as_completed_request(struct 
 	}

 	if (ad->changed_batch && ad->nr_dispatched == 1) {
+		/*
+		 * If this was write batch finishing, store the expiry time
+		 * so that it can be used to update write batch size when
+		 * next read request finishes.
+		 */
+		if (ad->batch_data_dir == BLK_RW_SYNC)
+			ad->last_write_batch_expires =
+						ad->current_batch_expires;
 		ad->current_batch_expires = jiffies +
 					ad->batch_expire[ad->batch_data_dir];
 		kblockd_schedule_work(q, &ad->antic_work);

I did a basic test with and without the patch.

I am having a SATA disk and running a reader and a writer with anticipatory
scheduler. I am reading a 2.1 G file and writting the same size file.
Following is my test script.

dd if=/mnt/sdb/testzerofile2 of=/dev/null &
reader=$!
echo "launched reader $reader"

dd if=/dev/zero of=/mnt/sdb/testzerofile1 bs=4K count=524288 &
writer=$!
echo "launched writer $writer"

wait $reader
echo "reader finished"

With current AS (2.6.30-rc4), writer finishes first and gets more
bandwidth.

Without patch
=============

First run
=========
2147483648 bytes (2.1 GB) copied, 49.1937 s, 43.7 MB/s
2147483648 bytes (2.1 GB) copied, 54.8708 s, 39.1 MB/s
reader finished

Second run
==========
2147483648 bytes (2.1 GB) copied, 47.9976 s, 44.7 MB/s
2147483648 bytes (2.1 GB) copied, 58.8639 s, 36.5 MB/s
reader finished

With-patch
==========
First run
---------
2147483648 bytes (2.1 GB) copied, 45.4351 s, 47.3 MB/s
reader finished
2147483648 bytes (2.1 GB) copied, 50.6594 s, 42.4 MB/s

Second run
----------
2147483648 bytes (2.1 GB) copied, 45.8888 s, 46.8 MB/s
reader finished
2147483648 bytes (2.1 GB) copied, 51.1322 s, 42.0 MB/s

Note that without patch, writer wins the race and with patch, reader wins
the race. I am not sure what is the expected behavior from AS but I
thought at least I will ask the question.

Thanks
Vivek

                 reply	other threads:[~2009-05-28 22:33 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090528223330.GC4335@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=dpshah@google.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nauman@google.com \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox