public inbox for linux-scsi@vger.kernel.org
* [PATCH 0/2] Add lld-congestion bit for backing_dev_info
@ 2008-09-19 14:47 Kiyoshi Ueda
  2008-09-19 14:48 ` [PATCH 1/2] lld busy status exporting interface Kiyoshi Ueda
  2008-09-19 14:49 ` [PATCH 2/2] scsi: exports busy status via bdi_lld_congested Kiyoshi Ueda
  0 siblings, 2 replies; 9+ messages in thread
From: Kiyoshi Ueda @ 2008-09-19 14:47 UTC
  To: James.Bottomley; +Cc: linux-kernel, linux-scsi, dm-devel, j-nomura, k-ueda

Hi James,

The following patches export the busy status of SCSI LLDs to
request-based dm-multipath for proper I/O scheduling.
The patches are based on the following commit of scsi-misc-2.6:
---------------------------------------------------------------
commit 7d36ee8aefec15ab00ac08dc5611d6b5eab38d7f
Author: Andrew Vasquez <andrew.vasquez@qlogic.com>
Date:   Thu Sep 11 21:22:54 2008 -0700

    [SCSI] qla2xxx: Update version number to 8.02.01-k8.
---------------------------------------------------------------

There are no major changes since the last post (http://lkml.org/lkml/2008/9/12/100),
except for rebasing from 2.6.27-rc6 onto scsi-misc-2.6.

Summary of the patches:
  1/2: lld busy status exporting interface
  2/2: scsi: exports busy status via bdi_lld_congested

Please review and apply.

The 1st patch changes generic code and is not specific to SCSI,
so I'm not sure whether it is appropriate to ask you to merge
this patch.
If I should push the 1st patch separately through someone else,
please let me know.

Thanks,
Kiyoshi Ueda

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] lld busy status exporting interface
  2008-09-19 14:47 [PATCH 0/2] Add lld-congestion bit for backing_dev_info Kiyoshi Ueda
@ 2008-09-19 14:48 ` Kiyoshi Ueda
  2008-09-19 21:33   ` Andrew Morton
  2008-09-19 14:49 ` [PATCH 2/2] scsi: exports busy status via bdi_lld_congested Kiyoshi Ueda
  1 sibling, 1 reply; 9+ messages in thread
From: Kiyoshi Ueda @ 2008-09-19 14:48 UTC
  To: James.Bottomley; +Cc: j-nomura, k-ueda, dm-devel, linux-kernel, linux-scsi

This patch adds an interface for checking an LLD's busy status
from the block layer.
This resolves the performance problem on request stacking devices
described below.


Some drivers, like the SCSI mid layer, stop dispatching requests when
they detect a busy state on their low-level device (host/bus/device).
This allows other requests to stay in the I/O scheduler's queue,
where they have a chance of being merged.

Request stacking drivers like request-based dm should follow
the same logic.
However, there is no generic interface for the stacking driver
to check whether the underlying device(s) are busy.
If the request stacking driver dispatches and submits requests to
a busy underlying device, the requests will sit in
the underlying device's queue without a chance of being merged.
This causes a performance problem under bursty I/O load.

With this patch, the busy state of the underlying device is exported
via the state flags of the queue's backing_dev_info, so the request
stacking driver can check it and stop dispatching requests while the
device is busy.

The underlying device driver must set/clear the flag appropriately:
   ON:  when the device driver can't process requests immediately.
   OFF: when the device driver can process requests immediately,
        including abnormal situations where the device driver needs
        to kill all requests.
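
The ON/OFF rule above can be illustrated with a small userspace model.
This is a sketch under stated assumptions, not kernel code: the flag
helpers only mirror the names in the patch, plain bit operations stand
in for the kernel's atomic set_bit()/clear_bit(), and struct lower_dev,
lower_submit(), lower_complete() and stacked_dispatch() are hypothetical
names invented for illustration.

```c
/* Userspace model of the BDI_lld_congested protocol (not kernel code).
 * Non-atomic bit operations replace set_bit()/clear_bit(); lower_dev and
 * the functions operating on it are hypothetical illustrations. */
#include <assert.h>
#include <stdbool.h>

enum bdi_state {
	BDI_write_congested,
	BDI_read_congested,
	BDI_lld_congested,	/* The device/host is busy */
};

struct backing_dev_info {
	unsigned long state;
};

static void set_bdi_lld_congested(struct backing_dev_info *bdi)
{
	bdi->state |= 1UL << BDI_lld_congested;
}

static void clear_bdi_lld_congested(struct backing_dev_info *bdi)
{
	bdi->state &= ~(1UL << BDI_lld_congested);
}

static bool bdi_lld_congested(const struct backing_dev_info *bdi)
{
	return bdi->state & (1UL << BDI_lld_congested);
}

/* Hypothetical bottom-level device: at most 'capacity' requests in flight. */
struct lower_dev {
	struct backing_dev_info bdi;
	int in_flight;
	int capacity;
};

static void lower_submit(struct lower_dev *d)
{
	assert(d->in_flight < d->capacity);
	if (++d->in_flight == d->capacity)
		set_bdi_lld_congested(&d->bdi);	/* ON: cannot take more now */
}

static void lower_complete(struct lower_dev *d)
{
	assert(d->in_flight > 0);
	d->in_flight--;
	clear_bdi_lld_congested(&d->bdi);	/* OFF: room again */
}

/* Hypothetical stacking driver: dispatch from its own queue only while the
 * lower device is not busy; the rest stay behind and remain mergeable. */
static int stacked_dispatch(struct lower_dev *d, int *pending)
{
	int dispatched = 0;

	while (*pending > 0 && !bdi_lld_congested(&d->bdi)) {
		lower_submit(d);
		(*pending)--;
		dispatched++;
	}
	return dispatched;
}
```

With a capacity of 2 and 5 pending requests, stacked_dispatch() moves 2
requests down and leaves 3 in the stacking driver's queue; after one
completion it can move one more. Per the OFF rule, a real driver would
also clear the flag when it has to kill all requests, which this model
does not cover.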


Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 include/linux/backing-dev.h |    8 ++++++++
 mm/backing-dev.c            |   12 ++++++++++++
 2 files changed, 20 insertions(+)

Index: scsi-misc-2.6/include/linux/backing-dev.h
===================================================================
--- scsi-misc-2.6.orig/include/linux/backing-dev.h
+++ scsi-misc-2.6/include/linux/backing-dev.h
@@ -26,6 +26,7 @@ enum bdi_state {
 	BDI_pdflush,		/* A pdflush thread is working this device */
 	BDI_write_congested,	/* The write queue is getting full */
 	BDI_read_congested,	/* The read queue is getting full */
+	BDI_lld_congested,	/* The device/host is busy */
 	BDI_unused,		/* Available bits start here */
 };
 
@@ -226,8 +227,15 @@ static inline int bdi_rw_congested(struc
 				  (1 << BDI_write_congested));
 }
 
+static inline int bdi_lld_congested(struct backing_dev_info *bdi)
+{
+	return bdi_congested(bdi, 1 << BDI_lld_congested);
+}
+
 void clear_bdi_congested(struct backing_dev_info *bdi, int rw);
 void set_bdi_congested(struct backing_dev_info *bdi, int rw);
+void clear_bdi_lld_congested(struct backing_dev_info *bdi);
+void set_bdi_lld_congested(struct backing_dev_info *bdi);
 long congestion_wait(int rw, long timeout);
 
 
Index: scsi-misc-2.6/mm/backing-dev.c
===================================================================
--- scsi-misc-2.6.orig/mm/backing-dev.c
+++ scsi-misc-2.6/mm/backing-dev.c
@@ -279,6 +279,18 @@ void set_bdi_congested(struct backing_de
 }
 EXPORT_SYMBOL(set_bdi_congested);
 
+void clear_bdi_lld_congested(struct backing_dev_info *bdi)
+{
+	clear_bit(BDI_lld_congested, &bdi->state);
+}
+EXPORT_SYMBOL_GPL(clear_bdi_lld_congested);
+
+void set_bdi_lld_congested(struct backing_dev_info *bdi)
+{
+	set_bit(BDI_lld_congested, &bdi->state);
+}
+EXPORT_SYMBOL_GPL(set_bdi_lld_congested);
+
 /**
  * congestion_wait - wait for a backing_dev to become uncongested
  * @rw: READ or WRITE


* [PATCH 2/2] scsi: exports busy status via bdi_lld_congested
  2008-09-19 14:47 [PATCH 0/2] Add lld-congestion bit for backing_dev_info Kiyoshi Ueda
  2008-09-19 14:48 ` [PATCH 1/2] lld busy status exporting interface Kiyoshi Ueda
@ 2008-09-19 14:49 ` Kiyoshi Ueda
  2008-09-19 16:06   ` Mike Christie
  1 sibling, 1 reply; 9+ messages in thread
From: Kiyoshi Ueda @ 2008-09-19 14:49 UTC
  To: James.Bottomley; +Cc: j-nomura, k-ueda, dm-devel, linux-kernel, linux-scsi

This patch changes the SCSI mid layer to export its busy status.
The busy flag is not set when SCSI can't dispatch I/Os anymore and
needs to kill them; otherwise, request stacking drivers may hold
requests forever.
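
The clear condition that the patch below adds to scsi_device_unbusy()
can be modeled in userspace as follows. This is a sketch, not the
kernel function: struct host_model and struct sdev_model are
hypothetical stand-ins for the Scsi_Host and scsi_device fields the
patch reads, in_recovery replaces scsi_host_in_recovery(), and the
locking (host_lock, then queue_lock) is omitted because the model is
single-threaded.

```c
/* Userspace model of the clear condition in the patched
 * scsi_device_unbusy(); not kernel code.  All types are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct host_model {
	int host_busy;
	int can_queue;
	bool host_blocked;
	bool host_self_blocked;
	bool in_recovery;	/* stands in for scsi_host_in_recovery() */
};

struct sdev_model {
	struct host_model *host;
	int device_busy;
	int queue_depth;
	bool device_blocked;
	bool lld_congested;	/* stands in for BDI_lld_congested */
};

/* Host-level test; the patch evaluates it under host_lock. */
static bool host_is_congested(const struct host_model *h)
{
	return (h->can_queue > 0 && h->host_busy >= h->can_queue) ||
	       h->host_blocked || h->host_self_blocked || h->in_recovery;
}

/* One command completes.  The flag is cleared only if neither the host
 * nor the device is still busy or blocked; it is never set here. */
static void model_device_unbusy(struct sdev_model *sdev)
{
	struct host_model *h = sdev->host;
	bool host_congested;

	assert(h->host_busy > 0 && sdev->device_busy > 0);
	h->host_busy--;
	host_congested = host_is_congested(h);

	sdev->device_busy--;
	if (sdev->lld_congested && !host_congested &&
	    sdev->device_busy < sdev->queue_depth && !sdev->device_blocked)
		sdev->lld_congested = false;
}
```

In this model, a host that is still marked blocked keeps the flag set.
That is presumably why the patch moves scsi_device_unbusy() in
scsi_finish_command() to after the point where host_blocked and
device_blocked are cleared; otherwise the stale blocked flags would
prevent the clear.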


Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 drivers/scsi/scsi.c     |    4 ++--
 drivers/scsi/scsi_lib.c |   23 ++++++++++++++++++++++-
 2 files changed, 24 insertions(+), 3 deletions(-)

Index: scsi-misc-2.6/drivers/scsi/scsi_lib.c
===================================================================
--- scsi-misc-2.6.orig/drivers/scsi/scsi_lib.c
+++ scsi-misc-2.6/drivers/scsi/scsi_lib.c
@@ -459,17 +459,30 @@ static void scsi_init_cmd_errh(struct sc
 
 void scsi_device_unbusy(struct scsi_device *sdev)
 {
+	int host_congested;
 	struct Scsi_Host *shost = sdev->host;
 	unsigned long flags;
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	shost->host_busy--;
+	if ((shost->can_queue > 0 && shost->host_busy >= shost->can_queue) ||
+	    shost->host_blocked || shost->host_self_blocked ||
+	    scsi_host_in_recovery(shost))
+		host_congested = 1;
+	else
+		host_congested = 0;
+
 	if (unlikely(scsi_host_in_recovery(shost) &&
 		     (shost->host_failed || shost->host_eh_scheduled)))
 		scsi_eh_wakeup(shost);
 	spin_unlock(shost->host_lock);
+
 	spin_lock(sdev->request_queue->queue_lock);
 	sdev->device_busy--;
+	if (bdi_lld_congested(&sdev->request_queue->backing_dev_info) &&
+	    !host_congested && sdev->device_busy < sdev->queue_depth &&
+	    !sdev->device_blocked)
+		clear_bdi_lld_congested(&sdev->request_queue->backing_dev_info);
 	spin_unlock_irqrestore(sdev->request_queue->queue_lock, flags);
 }
 
@@ -1496,9 +1509,14 @@ static void scsi_request_fn(struct reque
 		 * accept it.
 		 */
 		req = elv_next_request(q);
-		if (!req || !scsi_dev_queue_ready(q, sdev))
+		if (!req)
 			break;
 
+		if (!scsi_dev_queue_ready(q, sdev)) {
+			set_bdi_lld_congested(&q->backing_dev_info);
+			break;
+		}
+
 		if (unlikely(!scsi_device_online(sdev))) {
 			sdev_printk(KERN_ERR, sdev,
 				    "rejecting I/O to offline device\n");
@@ -1569,6 +1587,8 @@ static void scsi_request_fn(struct reque
 		rtn = scsi_dispatch_cmd(cmd);
 		spin_lock_irq(q->queue_lock);
 		if(rtn) {
+			set_bdi_lld_congested(&q->backing_dev_info);
+
 			/* we're refusing the command; because of
 			 * the way locks get dropped, we need to 
 			 * check here if plugging is required */
@@ -1593,6 +1613,7 @@ static void scsi_request_fn(struct reque
 	 * later time.
 	 */
 	spin_lock_irq(q->queue_lock);
+	set_bdi_lld_congested(&q->backing_dev_info);
 	blk_requeue_request(q, req);
 	sdev->device_busy--;
 	if(sdev->device_busy == 0)
Index: scsi-misc-2.6/drivers/scsi/scsi.c
===================================================================
--- scsi-misc-2.6.orig/drivers/scsi/scsi.c
+++ scsi-misc-2.6/drivers/scsi/scsi.c
@@ -862,8 +862,6 @@ void scsi_finish_command(struct scsi_cmn
 	struct scsi_driver *drv;
 	unsigned int good_bytes;
 
-	scsi_device_unbusy(sdev);
-
         /*
          * Clear the flags which say that the device/host is no longer
          * capable of accepting new commands.  These are set in scsi_queue.c
@@ -875,6 +873,8 @@ void scsi_finish_command(struct scsi_cmn
         shost->host_blocked = 0;
         sdev->device_blocked = 0;
 
+	scsi_device_unbusy(sdev);
+
 	/*
 	 * If we have valid sense information, then some kind of recovery
 	 * must have taken place.  Make a note of this.


* Re: [PATCH 2/2] scsi: exports busy status via bdi_lld_congested
  2008-09-19 14:49 ` [PATCH 2/2] scsi: exports busy status via bdi_lld_congested Kiyoshi Ueda
@ 2008-09-19 16:06   ` Mike Christie
  2008-09-19 17:28     ` Kiyoshi Ueda
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Christie @ 2008-09-19 16:06 UTC
  To: device-mapper development
  Cc: James.Bottomley, j-nomura, linux-kernel, linux-scsi, k-ueda

Kiyoshi Ueda wrote:
> This patch changes the SCSI mid layer to export its busy status.
> The busy flag is not set when SCSI can't dispatch I/Os anymore and
> needs to kill them; otherwise, request stacking drivers may hold
> requests forever.
> 
> 
> Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
> ---
>  drivers/scsi/scsi.c     |    4 ++--
>  drivers/scsi/scsi_lib.c |   23 ++++++++++++++++++++++-
>  2 files changed, 24 insertions(+), 3 deletions(-)
> 
> Index: scsi-misc-2.6/drivers/scsi/scsi_lib.c
> ===================================================================
> --- scsi-misc-2.6.orig/drivers/scsi/scsi_lib.c
> +++ scsi-misc-2.6/drivers/scsi/scsi_lib.c
> @@ -459,17 +459,30 @@ static void scsi_init_cmd_errh(struct sc
>  
>  void scsi_device_unbusy(struct scsi_device *sdev)
>  {
> +	int host_congested;
>  	struct Scsi_Host *shost = sdev->host;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	shost->host_busy--;
> +	if ((shost->can_queue > 0 && shost->host_busy >= shost->can_queue) ||
> +	    shost->host_blocked || shost->host_self_blocked ||
> +	    scsi_host_in_recovery(shost))
> +		host_congested = 1;
> +	else
> +		host_congested = 0;
> +

You might want to check whether the starget is busy too (scsi-misc has
this, but Jens's tree and Linus's do not). The code from scsi_request_fn
that I snipped below will set the congested bit if
scsi_target_queue_ready returns 0, so up here you would want to clear
it once the target is no longer congested.
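
Mike's suggestion amounts to adding a target-level test to the clear
condition. A minimal userspace sketch, assuming a simplified per-level
view: struct level and may_clear_lld_congested() are hypothetical, the
busy/limit/blocked fields only loosely correspond to the host, target
and device counters discussed here, and host self-blocking and error
recovery are left out.

```c
/* Hypothetical per-level model; not the kernel's scsi_target code. */
#include <assert.h>
#include <stdbool.h>

struct level {
	int busy;	/* commands outstanding at this level */
	int limit;	/* can_queue / queue_depth; <= 0 means unlimited */
	bool blocked;	/* host_blocked, target_blocked, device_blocked ... */
};

static bool level_congested(const struct level *l)
{
	return (l->limit > 0 && l->busy >= l->limit) || l->blocked;
}

/* The flag may be cleared only when host, target AND device all have
 * room; checking just host and device would ignore a full target. */
static bool may_clear_lld_congested(const struct level *host,
				    const struct level *target,
				    const struct level *dev)
{
	assert(host && target && dev);
	return !level_congested(host) && !level_congested(target) &&
	       !level_congested(dev);
}
```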

>  	if (unlikely(scsi_host_in_recovery(shost) &&
>  		     (shost->host_failed || shost->host_eh_scheduled)))
>  		scsi_eh_wakeup(shost);
>  	spin_unlock(shost->host_lock);
> +
>  	spin_lock(sdev->request_queue->queue_lock);
>  	sdev->device_busy--;
> +	if (bdi_lld_congested(&sdev->request_queue->backing_dev_info) &&
> +	    !host_congested && sdev->device_busy < sdev->queue_depth &&
> +	    !sdev->device_blocked)
> +		clear_bdi_lld_congested(&sdev->request_queue->backing_dev_info);
>  	spin_unlock_irqrestore(sdev->request_queue->queue_lock, flags);
>  }
>  



> @@ -1593,6 +1613,7 @@ static void scsi_request_fn(struct reque
>  	 * later time.
>  	 */
>  	spin_lock_irq(q->queue_lock);
> +	set_bdi_lld_congested(&q->backing_dev_info);
>  	blk_requeue_request(q, req);
>  	sdev->device_busy--;
>  	if(sdev->device_busy == 0)


* Re: [PATCH 2/2] scsi: exports busy status via bdi_lld_congested
  2008-09-19 16:06   ` Mike Christie
@ 2008-09-19 17:28     ` Kiyoshi Ueda
  0 siblings, 0 replies; 9+ messages in thread
From: Kiyoshi Ueda @ 2008-09-19 17:28 UTC
  To: michaelc
  Cc: k-ueda, linux-scsi, linux-kernel, James.Bottomley, dm-devel,
	j-nomura

Hi Mike,

On Fri, 19 Sep 2008 11:06:04 -0500, Mike Christie wrote:
> You might want to check whether the starget is busy too (scsi-misc has
> this, but Jens's tree and Linus's do not). The code from scsi_request_fn
> that I snipped below will set the congested bit if
> scsi_target_queue_ready returns 0, so up here you would want to clear
> it once the target is no longer congested.

Yes, thank you for noticing that.
Since I was looking at scsi-misc and your target patch is in
scsi-post-merge, I overlooked it.
I'll rebase my patches and repost.

Thanks,
Kiyoshi Ueda


* Re: [PATCH 1/2] lld busy status exporting interface
  2008-09-19 14:48 ` [PATCH 1/2] lld busy status exporting interface Kiyoshi Ueda
@ 2008-09-19 21:33   ` Andrew Morton
  2008-09-19 23:11     ` Kiyoshi Ueda
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2008-09-19 21:33 UTC
  Cc: k-ueda, linux-scsi, linux-kernel, James.Bottomley, dm-devel,
	j-nomura

On Fri, 19 Sep 2008 10:48:54 -0400 (EDT)
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:

Is this really the right way to do it?

Back in the days when we first did the backing_dev_info.congested_fn()
logic it was decided that there basically was no single place in which
the congested state could be stored.

So we ended up deciding that whenever a caller wants to know a
backing_dev's congested status, it has to call in to the
->congested_fn() and that congested_fn would then call down into all
the constituent low-level drivers/queues/etc asking each one if it is
congested.

I mean, as a simple example which is probably wrong - what happens if a
single backing_dev is implemented via two different disks and
controllers, and they both become congested and then one of them becomes
uncongested?  Is there no way in which the above implementation can
incorrectly flag the backing_dev as being uncongested?


* Re: [PATCH 1/2] lld busy status exporting interface
  2008-09-19 21:33   ` Andrew Morton
@ 2008-09-19 23:11     ` Kiyoshi Ueda
  2008-09-19 23:45       ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Kiyoshi Ueda @ 2008-09-19 23:11 UTC
  To: akpm; +Cc: James.Bottomley, linux-kernel, linux-scsi, dm-devel, j-nomura,
	k-ueda

Hi Andrew,

On Fri, 19 Sep 2008 14:33:44 -0700, Andrew Morton wrote:
> Is this really the right way to do it?

I think so, but I may not have understood your point correctly.
So please elaborate on your concern if my explanation below doesn't
answer your question.


> Back in the days when we first did the backing_dev_info.congested_fn()
> logic it was decided that there basically was no single place in which
> the congested state could be stored.
> 
> So we ended up deciding that whenever a caller wants to know a
> backing_dev's congested status, it has to call in to the
> ->congested_fn() and that congested_fn would then call down into all
> the constituent low-level drivers/queues/etc asking each one if it is
> congested.

bdi_lld_congested() also does that using bdi_congested(), which calls
->congested_fn().
And only real device drivers (e.g. scsi, ide) set/clear the flag.
Stacking drivers like request-based dm don't.
So stacking drivers always check the BDI_lld_congested flag of
the bottom device of the device stack.

The BDI_[write|read]_congested flags have been used for queue
congestion, so that I/O queueing/merging can proceed even if
the LLD is congested.  So I added a new flag.


> I mean, as a simple example which is probably wrong - what happens if a
> single backing_dev is implemented via two different disks and
> controllers, and they both become congested and then one of them becomes
> uncongested?  Is there no way in which the above implementation can
> incorrectly flag the backing_dev as being uncongested?

Do you mean that "a single backing_dev via two disks/controllers" is
a dm device (e.g. a dm-multipath device having 2 paths, a dm-mirror
device having 2 disks)?

If so, dm doesn't set/clear the flag; whether the dm device itself
is congested or not is decided by dm's target driver.

For example, dm-multipath:
  o calls bdi_lld_congested() for each path.
  o if any of the paths is uncongested, decides that the dm device is
    uncongested and dispatches incoming I/Os to the uncongested path.

For example, dm-mirror:
  o calls bdi_lld_congested() for each disk.
  o if the incoming I/O is a READ, uses the same logic as dm-multipath
    above; if the incoming I/O is a WRITE, decides that the dm device
    is uncongested only when all disks are uncongested.
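
The two policies above can be sketched as follows. This is an
illustration only, not dm code: each bool stands for the result of
bdi_lld_congested() on one underlying device, and
multipath_congested(), multipath_pick_path() and mirror_congested()
are hypothetical helpers.

```c
/* Hypothetical aggregation helpers; not taken from dm-mpath or dm-raid1. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* dm-multipath style: the dm device is uncongested if ANY path is. */
static bool multipath_congested(const bool *path_congested, size_t n)
{
	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		if (!path_congested[i])
			return false;
	return true;
}

/* Pick the first uncongested path for dispatch, or -1 if all are busy. */
static int multipath_pick_path(const bool *path_congested, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!path_congested[i])
			return (int)i;
	return -1;
}

/* dm-mirror style: a READ needs only one leg (same rule as multipath);
 * a WRITE goes to every leg, so ALL legs must be uncongested. */
static bool mirror_congested(const bool *leg_congested, size_t n,
			     bool is_write)
{
	assert(n > 0);
	if (!is_write)
		return multipath_congested(leg_congested, n);
	for (size_t i = 0; i < n; i++)
		if (leg_congested[i])
			return true;
	return false;
}
```

In the scenario of two congested devices where one recovers, the
multipath and mirror-READ answers become "uncongested" while the
mirror-WRITE answer stays "congested", because the aggregation happens
in the target rather than in a single shared flag.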

Thanks,
Kiyoshi Ueda


* Re: [PATCH 1/2] lld busy status exporting interface
  2008-09-19 23:11     ` Kiyoshi Ueda
@ 2008-09-19 23:45       ` Andrew Morton
  2008-09-22  4:49         ` Jun'ichi Nomura (NEC)
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2008-09-19 23:45 UTC
  Cc: James.Bottomley, linux-kernel, linux-scsi, dm-devel, j-nomura,
	k-ueda

On Fri, 19 Sep 2008 19:11:22 -0400 (EDT)
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:

> > Back in the days when we first did the backing_dev_info.congested_fn()
> > logic it was decided that there basically was no single place in which
> > the congested state could be stored.
> > 
> > So we ended up deciding that whenever a caller wants to know a
> > backing_dev's congested status, it has to call in to the
> > ->congested_fn() and that congested_fn would then call down into all
> > the constituent low-level drivers/queues/etc asking each one if it is
> > congested.
> 
> bdi_lld_congested() also does that using bdi_congested(), which calls
> ->congested_fn().
> And only real device drivers (e.g. scsi, ide) set/clear the flag.
> Stacking drivers like request-based dm don't.

umm, OK, that should work.

> So stacking drivers always check the BDI_lld_congested flag of
> the bottom device of the device stack.

How does a stacking driver know that the backing_device which it is
looking at is a "lowest level" device?

I don't think it does - only the code which implements that device
knows this, so the stacking driver has to call into that device's
congested_fn(), yes?

In which case one wonders why the state was stored in the
backing_dev_info at all.  Why not store it in the device-private data
to avoid confusion and abuse?

> The BDI_[write|read]_congested flags have been used for queue
> congestion, so that I/O queueing/merging can proceed even if
> the LLD is congested.  So I added a new flag.

iirc, BDI_read/write_congested predated the introduction of the
congested_fn() and perhaps should have been removed once we went to the
congested_fn approach.  But it's been a while since I spent a lot of
time looking in there.

> 
> > I mean, as a simple example which is probably wrong - what happens if a
> > single backing_dev is implemented via two different disks and
> > controllers, and they both become congested and then one of them becomes
> > uncongested?  Is there no way in which the above implementation can
> > incorrectly flag the backing_dev as being uncongested?
> 
> Do you mean that "a single backing_dev via two disks/controllers" is
> a dm device (e.g. a dm-multipath device having 2 paths, a dm-mirror
> device having 2 disks)?

Something along those lines, sure.

> If so, dm doesn't set/clear the flag; whether the dm device itself
> is congested or not is decided by dm's target driver.
> 
> For example, dm-multipath:
>   o calls bdi_lld_congested() for each path.
>   o if any of the paths is uncongested, decides that the dm device is
>     uncongested and dispatches incoming I/Os to the uncongested path.

hm, OK.

> For example, dm-mirror:
>   o calls bdi_lld_congested() for each disk.
>   o if the incoming I/O is a READ, uses the same logic as dm-multipath
>     above; if the incoming I/O is a WRITE, decides that the dm device
>     is uncongested only when all disks are uncongested.
> 
> Thanks,
> Kiyoshi Ueda


* Re: [PATCH 1/2] lld busy status exporting interface
  2008-09-19 23:45       ` Andrew Morton
@ 2008-09-22  4:49         ` Jun'ichi Nomura (NEC)
  0 siblings, 0 replies; 9+ messages in thread
From: Jun'ichi Nomura (NEC) @ 2008-09-22  4:49 UTC
  To: Andrew Morton, Jens Axboe, James.Bottomley
  Cc: Kiyoshi Ueda, dm-devel, linux-kernel, linux-scsi

Hi Andrew,
(and James, Jens, please let us know your opinions on the possible
 changes described below)

Andrew Morton wrote:
> On Fri, 19 Sep 2008 19:11:22 -0400 (EDT)
> Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote:
> 
>>> Back in the days when we first did the backing_dev_info.congested_fn()
>>> logic it was decided that there basically was no single place in which
>>> the congested state could be stored.
>>>
>>> So we ended up deciding that whenever a caller wants to know a
>>> backing_dev's congested status, it has to call in to the
>>> ->congested_fn() and that congested_fn would then call down into all
>>> the constituent low-level drivers/queues/etc asking each one if it is
>>> congested.
>> bdi_lld_congested() also does that using bdi_congested(), which calls
>> ->congested_fn().
>> And only real device drivers (e.g. scsi, ide) set/clear the flag.
>> Stacking drivers like request-based dm don't.
> 
> umm, OK, that should work.
> 
>> So stacking drivers always check the BDI_lld_congested flag of
>> the bottom device of the device stack.
> 
> How does a stacking driver know that the backing_device which it is
> looking at is a "lowest level" device?
> 
> I don't think it does - only the code which implements that device
> knows this, so the stacking driver has to call into that device's
> congested_fn(), yes?

Yes. So the stacking driver calls bdi_congested, which calls
the underlying device's congested_fn if one exists, and eventually
checks the bottom device's congestion state.
Combining the congestion status of multiple devices into one answer
is done by the congested_fn of the stacking device.
E.g. dm-multipath returns 'not congested' if any of its paths is not
congested.
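
The delegation described above can be modeled as follows. This is a
hedged sketch, not the kernel's bdi_congested(): struct bdev_model,
model_bdi_congested() and any_path_congested_fn() are hypothetical
names. The model only assumes the behavior described in this thread,
namely that a device with a congested_fn is asked through it, while a
bottom device answers from its own flag.

```c
/* Hypothetical model of congestion queries through a device stack. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bdev_model;
typedef bool (*congested_fn_t)(const struct bdev_model *dev);

struct bdev_model {
	congested_fn_t congested_fn;	 /* NULL for a bottom device */
	bool lld_congested;		 /* only meaningful at the bottom */
	const struct bdev_model **lower; /* devices directly underneath */
	size_t nr_lower;
};

/* Analogue of bdi_congested(): delegate to congested_fn if one is set,
 * otherwise answer from the device's own flag. */
static bool model_bdi_congested(const struct bdev_model *dev)
{
	assert(dev != NULL);
	if (dev->congested_fn)
		return dev->congested_fn(dev);
	return dev->lld_congested;
}

/* Multipath-like congested_fn: not congested if any lower device isn't.
 * Recursion lets the query reach the bottom of an arbitrarily deep stack. */
static bool any_path_congested_fn(const struct bdev_model *dev)
{
	assert(dev->nr_lower > 0);
	for (size_t i = 0; i < dev->nr_lower; i++)
		if (!model_bdi_congested(dev->lower[i]))
			return false;
	return true;
}
```

Because only the bottom devices carry a flag, a stacked device never
needs to know how deep the stack is; each congested_fn just asks the
devices directly beneath it.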

> In which case one wonders why the state was stored in the
> backing_dev_info at all.  Why not store it in the device-private data
> to avoid confusion and abuse?

It should be possible.
We just followed the existing scheme of BDI_{read,write}_congested
because of their similarity.

I would like to know which part of the patch concerns you:
  1) Exposing set/clear_bdi_lld_congested without explicit comments
     saying that they should be used only by bottom-level devices
  2) A new bdi_state, BDI_lld_congested
  3) Use of backing_dev_info for this purpose

If 1), either or both of the following can easily be done:
  [a] Add a comment in backing-dev.h saying that set/clear_bdi_lld_congested
      should be called only from the bottom device
  [b] Move set/clear_bdi_lld_congested from mm/backing-dev.c
      to block/blk-core.c, renaming them to blk_set/clear_lld_congested,
      so that only a block device that knows what it is doing will
      set/clear the flag

If 2) or 3), I think we need to rewrite the patch in one of these ways:
  [c] Add a new strategy function to request_queue and use it instead,
      e.g. q->lld_busy_fn, which is NULL by default.
      The bottom device sets/clears QUEUE_FLAG_BUSY in its request_queue,
      and the block layer checks the flag if q->lld_busy_fn is NULL.
  [d] Similar to [c], except that the busy flag is stored in
      struct scsi_device and the SCSI device's q->lld_busy_fn() checks it.
      If q->lld_busy_fn is NULL, the block layer just returns
      'not congested'.

Which do you think is better?

>> The BDI_[write|read]_congested flags have been used for queue
>> congestion, so that I/O queueing/merging can proceed even if
>> the LLD is congested.  So I added a new flag.
> 
> iirc, BDI_read/write_congested predated the introduction of the
> congested_fn() and perhaps should have been removed once we went to the
> congested_fn approach.  But it's been a while since I spent a lot of
> time looking in there.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation


end of thread

Thread overview: 9+ messages
2008-09-19 14:47 [PATCH 0/2] Add lld-congestion bit for backing_dev_info Kiyoshi Ueda
2008-09-19 14:48 ` [PATCH 1/2] lld busy status exporting interface Kiyoshi Ueda
2008-09-19 21:33   ` Andrew Morton
2008-09-19 23:11     ` Kiyoshi Ueda
2008-09-19 23:45       ` Andrew Morton
2008-09-22  4:49         ` Jun'ichi Nomura (NEC)
2008-09-19 14:49 ` [PATCH 2/2] scsi: exports busy status via bdi_lld_congested Kiyoshi Ueda
2008-09-19 16:06   ` Mike Christie
2008-09-19 17:28     ` Kiyoshi Ueda
