Re: [PATCH 1/2] block: fix leaks associated with discard request payload

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
       [not found] ` <1277582211-10725-1-git-send-email-snitzer@redhat.com>
@ 2010-06-27 15:29   ` James Bottomley
  2010-06-28 17:16     ` Martin K. Petersen
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: James Bottomley @ 2010-06-27 15:29 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, axboe, dm-devel, linux-kernel, martin.petersen,
	akpm, linux-scsi, FUJITA Tomonori

linux-scsi cc added, since it's a SCSI patch.

On Sat, 2010-06-26 at 15:56 -0400, Mike Snitzer wrote:
> Fix leaks introduced via "block: don't allocate a payload for discard
> request" commit a1d949f5f44.
> 
> sd_done() is not called for REQ_TYPE_BLOCK_PC commands so cleanup
> discard request's payload directly in scsi_finish_command().
> 
> Also cleanup page allocated for discard payload in
> scsi_setup_discard_cmnd's scsi_setup_blk_pc_cmnd error path.
> 
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  block/blk-core.c       |   23 +++++++++++++++++++++++
>  drivers/scsi/scsi.c    |    8 ++++++++
>  drivers/scsi/sd.c      |   18 ++++++++----------
>  include/linux/blkdev.h |    1 +
>  4 files changed, 40 insertions(+), 10 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 98b4cee..07925aa 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1167,6 +1167,29 @@ void blk_add_request_payload(struct request *rq, struct page *page,
>  }
>  EXPORT_SYMBOL_GPL(blk_add_request_payload);
>  
> +/**
> + * blk_clear_request_payload - clear a request's payload
> + * @rq: request to update
> + *
> + * The driver needs to take care of freeing the payload itself.
> + */
> +void blk_clear_request_payload(struct request *rq)
> +{
> +	struct bio *bio = rq->bio;
> +
> +	rq->__data_len = rq->resid_len = 0;
> +	rq->nr_phys_segments = 0;
> +	rq->buffer = NULL;
> +
> +	bio->bi_size = 0;
> +	bio->bi_vcnt = 0;
> +	bio->bi_phys_segments = 0;
> +
> +	bio->bi_io_vec->bv_page = NULL;
> +	bio->bi_io_vec->bv_len = 0;
> +}
> +EXPORT_SYMBOL_GPL(blk_clear_request_payload);
> +
>  void init_request_from_bio(struct request *req, struct bio *bio)
>  {
>  	req->cpu = bio->bi_comp_cpu;
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index ad0ed21..69c7ea4 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -851,6 +851,14 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
>  		 */
>  		if (good_bytes == old_good_bytes)
>  			good_bytes -= scsi_get_resid(cmd);
> +	} else if (cmd->request->cmd_flags & REQ_DISCARD) {
> +		/*
> +		 * If this is a discard request that originated from the kernel
> +		 * we need to free our payload here.  Note that we need to check
> +		 * the request flag as the normal payload rules apply for
> +		 * pass-through UNMAP / WRITE SAME requests.
> +		 */
> +		__free_page(bio_page(cmd->request->bio));

This is another layering violation:  the page is allocated in the Upper
layer and freed in the mid-layer.

I really hate these growing contortions for discard.  They're a clear
signal that we haven't implemented it right.

So let's first work out how it should be done.  I really like Tomo's
idea of doing discard through the normal REQ_TYPE_FS route, which means
we can control the setup in prep and the tear down in done, all confined
to the ULD.

Where I think I'm at is partially what Christoph says:  The command
transformation belongs in the ULD so that's where the allocation and
deallocation should be, and partly what Tomo says in that we should
eliminate the special case paths.

The payload vs actual request size should be a red herring if we've got
everything correct:  only the ULD cares about the request parameters.
Once we've got everything set up, the mid layer and LLD should only care
about the parameters in the command, so we can confine the size changing
part to the ULD doing the discard.

Could someone take a stab at this and see if it works without layering
violations or any other problematic signals?

Thanks,

James

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-27 15:29   ` [PATCH 1/2] block: fix leaks associated with discard request payload James Bottomley
@ 2010-06-28 17:16     ` Martin K. Petersen
  2010-06-29  8:00       ` Boaz Harrosh
  2010-06-29 22:28     ` [dm-devel] " Mikulas Patocka
  2010-06-30  8:32     ` Boaz Harrosh
  2 siblings, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2010-06-28 17:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Snitzer, Christoph Hellwig, axboe, dm-devel, linux-kernel,
	martin.petersen, akpm, linux-scsi, FUJITA Tomonori

>>>>> "James" == James Bottomley <James.Bottomley@suse.de> writes:

James> I really hate these growing contortions for discard.  They're a
James> clear signal that we haven't implemented it right.

James> So let's first work out how it should be done.  I really like
James> Tomo's idea of doing discard through the normal REQ_TYPE_FS
James> route, which means we can control the setup in prep and the tear
James> down in done, all confined to the ULD.

Yeah, this is what I was trying to do a couple of months ago.  Trying to
make discard and write same filesystem class requests so we can split,
merge, etc. like READs and WRITEs.  I still think this is how we should
do it but it's a lot of work.

There are several challenges involved.  I was doing the "payload"
allocation at request allocation time by permitting a buffer trailing
struct request (size defined by ULD depending on req type).  However, we
have a few places in the stack where we memcpy requests and assume them
to be the same size.  That needs to be fixed.  That's also the roadblock
I ran into wrt. 32-byte CDB allocation so for that I ended up allocating
the command in sd.

Also, another major headache of mine is WRITE SAME/UNMAP to DSM TRIM
conversion.  Because of the limitations of the TRIM command format a
single WRITE SAME can turn into effectively hundreds of TRIM commands to
be issued.  I tried to limit this by using UNMAP translation instead.
But we can still get into cases where we need to either loop or allocate
a bunch of TRIMs in the translation layer.  That leaves two options:
Either pass really conservative limits up the stack and loop up there.
Or deal with the allocation/translation stuff at the bottom of the pile.
None of my attempts in these departments turned out to be very nice.
I'm still dreaming of the day where libata moves out from under SCSI so
we don't have to translate square pegs into round holes...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-28 17:16     ` Martin K. Petersen
@ 2010-06-29  8:00       ` Boaz Harrosh
  0 siblings, 0 replies; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-29  8:00 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Mike Snitzer, Christoph Hellwig, axboe, dm-devel,
	linux-kernel, akpm, linux-scsi, FUJITA Tomonori

On 06/28/2010 08:16 PM, Martin K. Petersen wrote:
> I'm still dreaming of the day where libata moves out from under SCSI so
> we don't have to translate square pegs into round holes...
> 

Hi Martin.

Is that on the Radar still? what would you say are the major issues holding
it back?

* Is it just the missing new ULD, Will we be using a new ULD?
* Is it the ata midlayer that would duplicate some of scsi-midlayer.
  (Not so much these days, lots of good code went into blk_rq_)
* Is it the distro's setup procedures changing yet again back to the
  hd vs sd times.

Thanks
Boaz

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-27 15:29   ` [PATCH 1/2] block: fix leaks associated with discard request payload James Bottomley
  2010-06-28 17:16     ` Martin K. Petersen
@ 2010-06-29 22:28     ` Mikulas Patocka
  2010-06-29 23:03       ` James Bottomley
  2010-06-30  8:32     ` Boaz Harrosh
  2 siblings, 1 reply; 20+ messages in thread
From: Mikulas Patocka @ 2010-06-29 22:28 UTC (permalink / raw)
  To: device-mapper development
  Cc: Mike Snitzer, axboe, linux-scsi, martin.petersen, linux-kernel,
	akpm, Christoph Hellwig



On Sun, 27 Jun 2010, James Bottomley wrote:

> linux-scsi cc added, since it's a SCSI patch.
> 
> On Sat, 2010-06-26 at 15:56 -0400, Mike Snitzer wrote:
> > Fix leaks introduced via "block: don't allocate a payload for discard
> > request" commit a1d949f5f44.
> > 
> > sd_done() is not called for REQ_TYPE_BLOCK_PC commands so cleanup
> > discard request's payload directly in scsi_finish_command().
> > 
> > Also cleanup page allocated for discard payload in
> > scsi_setup_discard_cmnd's scsi_setup_blk_pc_cmnd error path.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > ---
> >  block/blk-core.c       |   23 +++++++++++++++++++++++
> >  drivers/scsi/scsi.c    |    8 ++++++++
> >  drivers/scsi/sd.c      |   18 ++++++++----------
> >  include/linux/blkdev.h |    1 +
> >  4 files changed, 40 insertions(+), 10 deletions(-)
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 98b4cee..07925aa 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1167,6 +1167,29 @@ void blk_add_request_payload(struct request *rq, struct page *page,
> >  }
> >  EXPORT_SYMBOL_GPL(blk_add_request_payload);
> >  
> > +/**
> > + * blk_clear_request_payload - clear a request's payload
> > + * @rq: request to update
> > + *
> > + * The driver needs to take care of freeing the payload itself.
> > + */
> > +void blk_clear_request_payload(struct request *rq)
> > +{
> > +	struct bio *bio = rq->bio;
> > +
> > +	rq->__data_len = rq->resid_len = 0;
> > +	rq->nr_phys_segments = 0;
> > +	rq->buffer = NULL;
> > +
> > +	bio->bi_size = 0;
> > +	bio->bi_vcnt = 0;
> > +	bio->bi_phys_segments = 0;
> > +
> > +	bio->bi_io_vec->bv_page = NULL;
> > +	bio->bi_io_vec->bv_len = 0;
> > +}
> > +EXPORT_SYMBOL_GPL(blk_clear_request_payload);
> > +
> >  void init_request_from_bio(struct request *req, struct bio *bio)
> >  {
> >  	req->cpu = bio->bi_comp_cpu;
> > diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> > index ad0ed21..69c7ea4 100644
> > --- a/drivers/scsi/scsi.c
> > +++ b/drivers/scsi/scsi.c
> > @@ -851,6 +851,14 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
> >  		 */
> >  		if (good_bytes == old_good_bytes)
> >  			good_bytes -= scsi_get_resid(cmd);
> > +	} else if (cmd->request->cmd_flags & REQ_DISCARD) {
> > +		/*
> > +		 * If this is a discard request that originated from the kernel
> > +		 * we need to free our payload here.  Note that we need to check
> > +		 * the request flag as the normal payload rules apply for
> > +		 * pass-through UNMAP / WRITE SAME requests.
> > +		 */
> > +		__free_page(bio_page(cmd->request->bio));
> 
> This is another layering violation:  the page is allocated in the Upper
> layer and freed in the mid-layer.
> 
> I really hate these growing contortions for discard.  They're a clear
> signal that we haven't implemented it right.
> 
> So let's first work out how it should be done.  I really like Tomo's
> idea of doing discard through the normal REQ_TYPE_FS route, which means
> we can control the setup in prep and the tear down in done, all confined
> to the ULD.
> 
> Where I think I'm at is partially what Christoph says:  The command
> transformation belongs in the ULD so that's where the allocation and
> deallocation should be, and partly what Tomo says in that we should
> eliminate the special case paths.
> 
> The payload vs actual request size should be a red herring if we've got
> everything correct:  only the ULD cares about the request parameters.
> Once we've got everything set up, the mid layer and LLD should only care
> about the parameters in the command, so we can confine the size changing
> part to the ULD doing the discard.
> 
> Could someone take a stab at this and see if it works without layering
> violations or any other problematic signals?
> 
> Thanks,
> 
> James

Well, I think that you overestimate the importance of scsi code too much.

There is a layering violation in the code. So what --- you either fix the 
layering violation or let it be there and grind your teeth on it. But in 
either case, that layering violation won't affect anyone except scsi 
developers.

On the other hand, if you say "because we want to avoid layering violation 
in SCSI, every issuer of discard request must supply an empty page", you 
create havoc all over the Linux codebase. md, dm, drbd, xvd, virtio --- 
whatever you think of, will be allocating a dummy page when constructing 
a discard request.

If the layering violation spans only scsi code, it can be eventually 
fixed, but this, much worse "layering violation" that will be spanning all 
block device midlayers, won't ever be fixed.

Imagine for example --- a discard request arrivers at a dm-snapshot 
device. The driver splits it into chunks, remaps each chunk to the 
physical chunk, submits the requests, the elevator merges adjacent 
requests and submits fewer bigger requests to the device. Now, if you had 
to allocate a zeroed page each time you are splitting the request, that 
would exhaust memory and burn cpu needlessly. You delete a 100MB file? --- 
fine, allocate a 100MB of zeroed pages.

So I say --- let there be a layering violation in the scsi code, but don't 
put this problem with a page allocation to all the other bio midlayer 
developers.

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-29 22:28     ` [dm-devel] " Mikulas Patocka
@ 2010-06-29 23:03       ` James Bottomley
  2010-06-29 23:51         ` Mike Snitzer
  2010-06-30  0:11         ` [dm-devel] " Mikulas Patocka
  0 siblings, 2 replies; 20+ messages in thread
From: James Bottomley @ 2010-06-29 23:03 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Mike Snitzer, axboe, linux-scsi,
	martin.petersen, linux-kernel, akpm, Christoph Hellwig

On Tue, 2010-06-29 at 18:28 -0400, Mikulas Patocka wrote:
> 
> On Sun, 27 Jun 2010, James Bottomley wrote:
> 
> > linux-scsi cc added, since it's a SCSI patch.
> > 
> > On Sat, 2010-06-26 at 15:56 -0400, Mike Snitzer wrote:
> > > Fix leaks introduced via "block: don't allocate a payload for discard
> > > request" commit a1d949f5f44.
> > > 
> > > sd_done() is not called for REQ_TYPE_BLOCK_PC commands so cleanup
> > > discard request's payload directly in scsi_finish_command().
> > > 
> > > Also cleanup page allocated for discard payload in
> > > scsi_setup_discard_cmnd's scsi_setup_blk_pc_cmnd error path.
> > > 
> > > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > > ---
> > >  block/blk-core.c       |   23 +++++++++++++++++++++++
> > >  drivers/scsi/scsi.c    |    8 ++++++++
> > >  drivers/scsi/sd.c      |   18 ++++++++----------
> > >  include/linux/blkdev.h |    1 +
> > >  4 files changed, 40 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 98b4cee..07925aa 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -1167,6 +1167,29 @@ void blk_add_request_payload(struct request *rq, struct page *page,
> > >  }
> > >  EXPORT_SYMBOL_GPL(blk_add_request_payload);
> > >  
> > > +/**
> > > + * blk_clear_request_payload - clear a request's payload
> > > + * @rq: request to update
> > > + *
> > > + * The driver needs to take care of freeing the payload itself.
> > > + */
> > > +void blk_clear_request_payload(struct request *rq)
> > > +{
> > > +	struct bio *bio = rq->bio;
> > > +
> > > +	rq->__data_len = rq->resid_len = 0;
> > > +	rq->nr_phys_segments = 0;
> > > +	rq->buffer = NULL;
> > > +
> > > +	bio->bi_size = 0;
> > > +	bio->bi_vcnt = 0;
> > > +	bio->bi_phys_segments = 0;
> > > +
> > > +	bio->bi_io_vec->bv_page = NULL;
> > > +	bio->bi_io_vec->bv_len = 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(blk_clear_request_payload);
> > > +
> > >  void init_request_from_bio(struct request *req, struct bio *bio)
> > >  {
> > >  	req->cpu = bio->bi_comp_cpu;
> > > diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> > > index ad0ed21..69c7ea4 100644
> > > --- a/drivers/scsi/scsi.c
> > > +++ b/drivers/scsi/scsi.c
> > > @@ -851,6 +851,14 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
> > >  		 */
> > >  		if (good_bytes == old_good_bytes)
> > >  			good_bytes -= scsi_get_resid(cmd);
> > > +	} else if (cmd->request->cmd_flags & REQ_DISCARD) {
> > > +		/*
> > > +		 * If this is a discard request that originated from the kernel
> > > +		 * we need to free our payload here.  Note that we need to check
> > > +		 * the request flag as the normal payload rules apply for
> > > +		 * pass-through UNMAP / WRITE SAME requests.
> > > +		 */
> > > +		__free_page(bio_page(cmd->request->bio));
> > 
> > This is another layering violation:  the page is allocated in the Upper
> > layer and freed in the mid-layer.
> > 
> > I really hate these growing contortions for discard.  They're a clear
> > signal that we haven't implemented it right.
> > 
> > So let's first work out how it should be done.  I really like Tomo's
> > idea of doing discard through the normal REQ_TYPE_FS route, which means
> > we can control the setup in prep and the tear down in done, all confined
> > to the ULD.
> > 
> > Where I think I'm at is partially what Christoph says:  The command
> > transformation belongs in the ULD so that's where the allocation and
> > deallocation should be, and partly what Tomo says in that we should
> > eliminate the special case paths.
> > 
> > The payload vs actual request size should be a red herring if we've got
> > everything correct:  only the ULD cares about the request parameters.
> > Once we've got everything set up, the mid layer and LLD should only care
> > about the parameters in the command, so we can confine the size changing
> > part to the ULD doing the discard.
> > 
> > Could someone take a stab at this and see if it works without layering
> > violations or any other problematic signals?
> > 
> > Thanks,
> > 
> > James
> 
> Well, I think that you overestimate the importance of scsi code too much.

Not, I think, a deadly sin for a SCSI maintainer.

> There is a layering violation in the code. So what --- you either fix the 
> layering violation or let it be there and grind your teeth on it. But in 
> either case, that layering violation won't affect anyone except scsi 
> developers.

A layering violation is a signal of bad design wherever it occurs, so
that wasn't a SCSI centric argument.

> On the other hand, if you say "because we want to avoid layering violation 
> in SCSI, every issuer of discard request must supply an empty page", you 
> create havoc all over the Linux codebase. md, dm, drbd, xvd, virtio --- 
> whatever you think of, will be allocating a dummy page when constructing 
> a discard request.

Since I didn't actually say any of that, I suggest you re-read text you
quoted above.  The phrase "The command transformation belongs in the ULD
so that's where the allocation and deallocation should be" might be a
relevant one to concentrate on.

> If the layering violation spans only scsi code, it can be eventually 
> fixed, but this, much worse "layering violation" that will be spanning all 
> block device midlayers, won't ever be fixed.
> 
> Imagine for example --- a discard request arrivers at a dm-snapshot 
> device. The driver splits it into chunks, remaps each chunk to the 
> physical chunk, submits the requests, the elevator merges adjacent 
> requests and submits fewer bigger requests to the device. Now, if you had 
> to allocate a zeroed page each time you are splitting the request, that 
> would exhaust memory and burn cpu needlessly. You delete a 100MB file? --- 
> fine, allocate a 100MB of zeroed pages.

This is a straw man:  You've tried to portray a position I've never
taken as mine then attack it ... with what is effectively another bogus
argument.

It's not an either/or choice.  I've asked the relevant parties to
combine the approaches and see if a REQ_TYPE_FS path that does the
allocations in the appropriate place, likely the ULD, produces a good
design.

> So I say --- let there be a layering violation in the scsi code, but don't 
> put this problem with a page allocation to all the other bio midlayer 
> developers.

Thanks for explaining that you have nothing to contribute, I'll make
sure you're not on my list of relevant parties.

James

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-29 23:03       ` James Bottomley
@ 2010-06-29 23:51         ` Mike Snitzer
  2010-06-30  0:11         ` [dm-devel] " Mikulas Patocka
  1 sibling, 0 replies; 20+ messages in thread
From: Mike Snitzer @ 2010-06-29 23:51 UTC (permalink / raw)
  To: James Bottomley
  Cc: axboe, linux-scsi, martin.petersen, linux-kernel,
	device-mapper development, Mikulas Patocka, akpm,
	Christoph Hellwig

On Tue, Jun 29 2010 at  7:03pm -0400,
James Bottomley <James.Bottomley@suse.de> wrote:

> On Tue, 2010-06-29 at 18:28 -0400, Mikulas Patocka wrote:
> > 
> > On Sun, 27 Jun 2010, James Bottomley wrote:
> > 
> > > linux-scsi cc added, since it's a SCSI patch.
> > > 
> > > On Sat, 2010-06-26 at 15:56 -0400, Mike Snitzer wrote:
> > > > Fix leaks introduced via "block: don't allocate a payload for discard
> > > > request" commit a1d949f5f44.
> > > > 
> > > > sd_done() is not called for REQ_TYPE_BLOCK_PC commands so cleanup
> > > > discard request's payload directly in scsi_finish_command().
> > > > 
> > > > Also cleanup page allocated for discard payload in
> > > > scsi_setup_discard_cmnd's scsi_setup_blk_pc_cmnd error path.
> > > > 
> > > > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > > > ---
> > > >  block/blk-core.c       |   23 +++++++++++++++++++++++
> > > >  drivers/scsi/scsi.c    |    8 ++++++++
> > > >  drivers/scsi/sd.c      |   18 ++++++++----------
> > > >  include/linux/blkdev.h |    1 +
> > > >  4 files changed, 40 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > > index 98b4cee..07925aa 100644
> > > > --- a/block/blk-core.c
> > > > +++ b/block/blk-core.c
> > > > @@ -1167,6 +1167,29 @@ void blk_add_request_payload(struct request *rq, struct page *page,
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(blk_add_request_payload);
> > > >  
> > > > +/**
> > > > + * blk_clear_request_payload - clear a request's payload
> > > > + * @rq: request to update
> > > > + *
> > > > + * The driver needs to take care of freeing the payload itself.
> > > > + */
> > > > +void blk_clear_request_payload(struct request *rq)
> > > > +{
> > > > +	struct bio *bio = rq->bio;
> > > > +
> > > > +	rq->__data_len = rq->resid_len = 0;
> > > > +	rq->nr_phys_segments = 0;
> > > > +	rq->buffer = NULL;
> > > > +
> > > > +	bio->bi_size = 0;
> > > > +	bio->bi_vcnt = 0;
> > > > +	bio->bi_phys_segments = 0;
> > > > +
> > > > +	bio->bi_io_vec->bv_page = NULL;
> > > > +	bio->bi_io_vec->bv_len = 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(blk_clear_request_payload);
> > > > +
> > > >  void init_request_from_bio(struct request *req, struct bio *bio)
> > > >  {
> > > >  	req->cpu = bio->bi_comp_cpu;
> > > > diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> > > > index ad0ed21..69c7ea4 100644
> > > > --- a/drivers/scsi/scsi.c
> > > > +++ b/drivers/scsi/scsi.c
> > > > @@ -851,6 +851,14 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
> > > >  		 */
> > > >  		if (good_bytes == old_good_bytes)
> > > >  			good_bytes -= scsi_get_resid(cmd);
> > > > +	} else if (cmd->request->cmd_flags & REQ_DISCARD) {
> > > > +		/*
> > > > +		 * If this is a discard request that originated from the kernel
> > > > +		 * we need to free our payload here.  Note that we need to check
> > > > +		 * the request flag as the normal payload rules apply for
> > > > +		 * pass-through UNMAP / WRITE SAME requests.
> > > > +		 */
> > > > +		__free_page(bio_page(cmd->request->bio));
> > > 
> > > This is another layering violation:  the page is allocated in the Upper
> > > layer and freed in the mid-layer.
> > > 
> > > I really hate these growing contortions for discard.  They're a clear
> > > signal that we haven't implemented it right.
> > > 
> > > So let's first work out how it should be done.  I really like Tomo's
> > > idea of doing discard through the normal REQ_TYPE_FS route, which means
> > > we can control the setup in prep and the tear down in done, all confined
> > > to the ULD.
> > > 
> > > Where I think I'm at is partially what Christoph says:  The command
> > > transformation belongs in the ULD so that's where the allocation and
> > > deallocation should be, and partly what Tomo says in that we should
> > > eliminate the special case paths.
> > > 
> > > The payload vs actual request size should be a red herring if we've got
> > > everything correct:  only the ULD cares about the request parameters.
> > > Once we've got everything set up, the mid layer and LLD should only care
> > > about the parameters in the command, so we can confine the size changing
> > > part to the ULD doing the discard.
> > > 
> > > Could someone take a stab at this and see if it works without layering
> > > violations or any other problematic signals?
> > > 
> > > Thanks,
> > > 
> > > James
> > 
> > Well, I think that you overestimate the importance of scsi code too much.
> 
> Not, I think, a deadly sin for a SCSI maintainer.

Indeed ;)

> > There is a layering violation in the code. So what --- you either fix the 
> > layering violation or let it be there and grind your teeth on it. But in 
> > either case, that layering violation won't affect anyone except scsi 
> > developers.
> 
> A layering violation is a signal of bad design wherever it occurs, so
> that wasn't a SCSI centric argument.
> 
> > On the other hand, if you say "because we want to avoid layering violation 
> > in SCSI, every issuer of discard request must supply an empty page", you 
> > create havoc all over the Linux codebase. md, dm, drbd, xvd, virtio --- 
> > whatever you think of, will be allocating a dummy page when constructing 
> > a discard request.
> 
> Since I didn't actually say any of that, I suggest you re-read text you
> quoted above.  The phrase "The command transformation belongs in the ULD
> so that's where the allocation and deallocation should be" might be a
> relevant one to concentrate on.

Right, freeing the page, that was allocated in SCSI's ULD, from the SCSI
midlayer is a SCSI layering violation.  I think Mikulas was reacting to
the desire to maintain the existing, arguably more problematic, layering
violation that spans the block and SCSI layers.

> > If the layering violation spans only scsi code, it can be eventually 
> > fixed, but this, much worse "layering violation" that will be spanning all 
> > block device midlayers, won't ever be fixed.
> > 
> > Imagine for example --- a discard request arrivers at a dm-snapshot 
> > device. The driver splits it into chunks, remaps each chunk to the 
> > physical chunk, submits the requests, the elevator merges adjacent 
> > requests and submits fewer bigger requests to the device. Now, if you had 
> > to allocate a zeroed page each time you are splitting the request, that 
> > would exhaust memory and burn cpu needlessly. You delete a 100MB file? --- 
> > fine, allocate a 100MB of zeroed pages.
> 
> This is a straw man:  You've tried to portray a position I've never
> taken as mine then attack it ... with what is effectively another bogus
> argument.
> 
> It's not an either/or choice.  I've asked the relevant parties to
> combine the approaches and see if a REQ_TYPE_FS path that does the
> allocations in the appropriate place, likely the ULD, produces a good
> design.

If in the end we can fix up SCSI properly then everyone is happy.  So
lets just keep working toward that.  The various attempts to convert
discard over to REQ_TYPE_FS have fallen short.  Hopefully we'll have a
break through shortly.

Thanks for your guidance James,
Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-29 23:03       ` James Bottomley
  2010-06-29 23:51         ` Mike Snitzer
@ 2010-06-30  0:11         ` Mikulas Patocka
  2010-06-30 14:22           ` James Bottomley
  1 sibling, 1 reply; 20+ messages in thread
From: Mikulas Patocka @ 2010-06-30  0:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: device-mapper development, Mike Snitzer, axboe, linux-scsi,
	martin.petersen, linux-kernel, akpm, Christoph Hellwig

> > If the layering violation spans only scsi code, it can be eventually 
> > fixed, but this, much worse "layering violation" that will be spanning all 
> > block device midlayers, won't ever be fixed.
> > 
> > Imagine for example --- a discard request arrivers at a dm-snapshot 
> > device. The driver splits it into chunks, remaps each chunk to the 
> > physical chunk, submits the requests, the elevator merges adjacent 
> > requests and submits fewer bigger requests to the device. Now, if you had 
> > to allocate a zeroed page each time you are splitting the request, that 
> > would exhaust memory and burn cpu needlessly. You delete a 100MB file? --- 
> > fine, allocate a 100MB of zeroed pages.
> 
> This is a straw man:  You've tried to portray a position I've never
> taken as mine then attack it ... with what is effectively another bogus
> argument.
>
> It's not an either/or choice.

It is either/or choice. If the interface isn't fixed NOW, the existing 
flawed zeroed-page-allocation interface gets into RHEL and I and others 
will have to support it for 7 years.

> I've asked the relevant parties to
> combine the approaches and see if a REQ_TYPE_FS path that does the
> allocations in the appropriate place, likely the ULD, produces a good
> design.

OK, but before you do this research, fix the interface.

> > So I say --- let there be a layering violation in the scsi code, but don't 
> > put this problem with a page allocation to all the other bio midlayer 
> > developers.
> 
> Thanks for explaining that you have nothing to contribute, I'll make
> sure you're not on my list of relevant parties.

You misunderstand what I meant. You admit that there are design problems 
in SCSI. So don't burden other developers with these problems. Don't force 
the others to allocate dummy pages just because you want a cleaner scsi 
code.

You intend to fix the design of SCSI and then remove the dummy pages. But 
by the time you finish it, it will be already late and there will be 
midlayer drivers allocating these dummy pages.

What I mean is that "layering violation" inside one driver is smaller 
problem than misdesigned interface between drivers. So accept the patch 
that creates "layering violation" but cleans up the interface.

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-30  0:11         ` [dm-devel] " Mikulas Patocka
@ 2010-06-30 14:22           ` James Bottomley
  2010-06-30 15:36             ` Mike Snitzer
  2010-07-01 12:28             ` [dm-devel] " Mikulas Patocka
  0 siblings, 2 replies; 20+ messages in thread
From: James Bottomley @ 2010-06-30 14:22 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Mike Snitzer, axboe, linux-scsi,
	martin.petersen, linux-kernel, akpm, Christoph Hellwig

On Tue, 2010-06-29 at 20:11 -0400, Mikulas Patocka wrote:
> > > If the layering violation spans only scsi code, it can be eventually 
> > > fixed, but this, much worse "layering violation" that will be spanning all 
> > > block device midlayers, won't ever be fixed.
> > > 
> > > Imagine for example --- a discard request arrivers at a dm-snapshot 
> > > device. The driver splits it into chunks, remaps each chunk to the 
> > > physical chunk, submits the requests, the elevator merges adjacent 
> > > requests and submits fewer bigger requests to the device. Now, if you had 
> > > to allocate a zeroed page each time you are splitting the request, that 
> > > would exhaust memory and burn cpu needlessly. You delete a 100MB file? --- 
> > > fine, allocate a 100MB of zeroed pages.
> > 
> > This is a straw man:  You've tried to portray a position I've never
> > taken as mine then attack it ... with what is effectively another bogus
> > argument.
> >
> > It's not an either/or choice.
> 
> It is either/or choice. If the interface isn't fixed NOW, the existing 
> flawed zeroed-page-allocation interface gets into RHEL

That's a false dichotomy.  You might see an either apply this hack now
or support the interface choice with RHEL, but upstream has the option
to fix stuff correctly.  RHEL has never needed my blessing to apply
random crap to their kernel before ... why is this patch any different?

> and I and others will have to support it for 7 years.

It's called a business model ... I believe it's what they pay you for.

> > I've asked the relevant parties to
> > combine the approaches and see if a REQ_TYPE_FS path that does the
> > allocations in the appropriate place, likely the ULD, produces a good
> > design.
> 
> OK, but before you do this research, fix the interface.

So even in the RHEL world, I think you'd find that analysing the problem
*before* comping up with a fix is a good way of doing things.

> > > So I say --- let there be a layering violation in the scsi code, but don't 
> > > put this problem with a page allocation to all the other bio midlayer 
> > > developers.
> > 
> > Thanks for explaining that you have nothing to contribute, I'll make
> > sure you're not on my list of relevant parties.
> 
> You misunderstand what I meant. You admit that there are design problems 
> in SCSI.

No I didn't.

And the rest of this rubbish is based on that false premise.  It might
help you to take off your SCSI antipathy and see this as a system
problem: it actually originates in block and spills out from there.
Thus it requires a system solution.

James

> So don't burden other developers with these problems. Don't force 
> the others to allocate dummy pages just because you want a cleaner scsi 
> code.
> 
> You intend to fix the design of SCSI and then remove the dummy pages. But 
> by the time you finish it, it will be already late and there will be 
> midlayer drivers allocating these dummy pages.
> 
> What I mean is that "layering violation" inside one driver is smaller 
> problem than misdesigned interface between drivers. So accept the patch 
> that creates "layering violation" but cleans up the interface.
> 
> Mikulas



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-30 14:22           ` James Bottomley
@ 2010-06-30 15:36             ` Mike Snitzer
  2010-06-30 16:26               ` James Bottomley
  2010-07-01 12:28             ` [dm-devel] " Mikulas Patocka
  1 sibling, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2010-06-30 15:36 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mikulas Patocka, device-mapper development, axboe, linux-scsi,
	martin.petersen, linux-kernel, akpm, Christoph Hellwig

On Wed, Jun 30 2010 at 10:22am -0400,
James Bottomley <James.Bottomley@suse.de> wrote:

> On Tue, 2010-06-29 at 20:11 -0400, Mikulas Patocka wrote:
> > > > If the layering violation spans only scsi code, it can be eventually 
> > > > fixed, but this, much worse "layering violation" that will be spanning all 
> > > > block device midlayers, won't ever be fixed.
> > > > 
> > > > Imagine for example --- a discard request arrivers at a dm-snapshot 
> > > > device. The driver splits it into chunks, remaps each chunk to the 
> > > > physical chunk, submits the requests, the elevator merges adjacent 
> > > > requests and submits fewer bigger requests to the device. Now, if you had 
> > > > to allocate a zeroed page each time you are splitting the request, that 
> > > > would exhaust memory and burn cpu needlessly. You delete a 100MB file? --- 
> > > > fine, allocate a 100MB of zeroed pages.
> > > 
> > > This is a straw man:  You've tried to portray a position I've never
> > > taken as mine then attack it ... with what is effectively another bogus
> > > argument.
> > >
> > > It's not an either/or choice.
> > 
> > It is either/or choice. If the interface isn't fixed NOW, the existing 
> > flawed zeroed-page-allocation interface gets into RHEL
> 
> That's a false dichotomy.  You might see an either apply this hack now
> or support the interface choice with RHEL, but upstream has the option
> to fix stuff correctly.  RHEL has never needed my blessing to apply
> random crap to their kernel before ... why is this patch any different?
> 
> > and I and others will have to support it for 7 years.
> 
> It's called a business model ... I believe it's what they pay you for.
> 
> > > I've asked the relevant parties to
> > > combine the approaches and see if a REQ_TYPE_FS path that does the
> > > allocations in the appropriate place, likely the ULD, produces a good
> > > design.
> > 
> > OK, but before you do this research, fix the interface.
> 
> So even in the RHEL world, I think you'd find that analysing the problem
> *before* comping up with a fix is a good way of doing things.
> 
> > > > So I say --- let there be a layering violation in the scsi code, but don't 
> > > > put this problem with a page allocation to all the other bio midlayer 
> > > > developers.
> > > 
> > > Thanks for explaining that you have nothing to contribute, I'll make
> > > sure you're not on my list of relevant parties.
> > 
> > You misunderstand what I meant. You admit that there are design problems 
> > in SCSI.
> 
> No I didn't.
> 
> And the rest of this rubbish is based on that false premise.  It might
> help you to take off your SCSI antipathy and see this as a system
> problem: it actually originates in block and spills out from there.
> Thus it requires a system solution.

As fun as it is for the others monitoring these lists to see redhat.com
vs suse.de banter I think framing this discussion like you (and Mikulas)
continue to do is a complete distraction.

I tried to elevate (and defuse) the discussion yesterday.  But simply
put: patches speak volumes.  I look forward to working with Tomo, hch
and anyone else who has something to contribute that moves us toward a
real fix for discards.

Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-30 15:36             ` Mike Snitzer
@ 2010-06-30 16:26               ` James Bottomley
  0 siblings, 0 replies; 20+ messages in thread
From: James Bottomley @ 2010-06-30 16:26 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Mikulas Patocka, device-mapper development, axboe, linux-scsi,
	martin.petersen, linux-kernel, akpm, Christoph Hellwig

On Wed, 2010-06-30 at 11:36 -0400, Mike Snitzer wrote:
> On Wed, Jun 30 2010 at 10:22am -0400,
> James Bottomley <James.Bottomley@suse.de> wrote:
> 
> > On Tue, 2010-06-29 at 20:11 -0400, Mikulas Patocka wrote:
> > > > > If the layering violation spans only scsi code, it can be eventually 
> > > > > fixed, but this, much worse "layering violation" that will be spanning all 
> > > > > block device midlayers, won't ever be fixed.
> > > > > 
> > > > > Imagine for example --- a discard request arrivers at a dm-snapshot 
> > > > > device. The driver splits it into chunks, remaps each chunk to the 
> > > > > physical chunk, submits the requests, the elevator merges adjacent 
> > > > > requests and submits fewer bigger requests to the device. Now, if you had 
> > > > > to allocate a zeroed page each time you are splitting the request, that 
> > > > > would exhaust memory and burn cpu needlessly. You delete a 100MB file? --- 
> > > > > fine, allocate a 100MB of zeroed pages.
> > > > 
> > > > This is a straw man:  You've tried to portray a position I've never
> > > > taken as mine then attack it ... with what is effectively another bogus
> > > > argument.
> > > >
> > > > It's not an either/or choice.
> > > 
> > > It is either/or choice. If the interface isn't fixed NOW, the existing 
> > > flawed zeroed-page-allocation interface gets into RHEL
> > 
> > That's a false dichotomy.  You might see an either apply this hack now
> > or support the interface choice with RHEL, but upstream has the option
> > to fix stuff correctly.  RHEL has never needed my blessing to apply
> > random crap to their kernel before ... why is this patch any different?
> > 
> > > and I and others will have to support it for 7 years.
> > 
> > It's called a business model ... I believe it's what they pay you for.
> > 
> > > > I've asked the relevant parties to
> > > > combine the approaches and see if a REQ_TYPE_FS path that does the
> > > > allocations in the appropriate place, likely the ULD, produces a good
> > > > design.
> > > 
> > > OK, but before you do this research, fix the interface.
> > 
> > So even in the RHEL world, I think you'd find that analysing the problem
> > *before* comping up with a fix is a good way of doing things.
> > 
> > > > > So I say --- let there be a layering violation in the scsi code, but don't 
> > > > > put this problem with a page allocation to all the other bio midlayer 
> > > > > developers.
> > > > 
> > > > Thanks for explaining that you have nothing to contribute, I'll make
> > > > sure you're not on my list of relevant parties.
> > > 
> > > You misunderstand what I meant. You admit that there are design problems 
> > > in SCSI.
> > 
> > No I didn't.
> > 
> > And the rest of this rubbish is based on that false premise.  It might
> > help you to take off your SCSI antipathy and see this as a system
> > problem: it actually originates in block and spills out from there.
> > Thus it requires a system solution.
> 
> As fun as it is for the others monitoring these lists to see redhat.com
> vs suse.de banter I think framing this discussion like you (and Mikulas)
> continue to do is a complete distraction.

Well, it's not SUSE v Red Hat, it's upstream v Enterprise ... and it's
partly my job to explain why upstream does correct fixes not enterprise
workarounds (whether the enterprise is RHEL or SLES).  But I agree it's
becoming a pointless distraction.

> I tried to elevate (and defuse) the discussion yesterday.  But simply
> put: patches speak volumes.  I look forward to working with Tomo, hch
> and anyone else who has something to contribute that moves us toward a
> real fix for discards.

Right, so thanks for that.

Most of our problem is tied up in the fact that we need to allocate in
the prepare path, but we don't have a corresponding clear unprepare path
to do the deallocation in.  Introducing that into block might sort out
this tangle better ... error handling on the backend is very convoluted
and we can't really free the page until after it's complete (and the
->done function doesn't mark completion of error handling terms).  I
think the bones of a solution to this might be that
scsi_unprep_request() needs to call into block (instead of setting the
flags itself), say blk_unprep_request.  Block also needs to call
blk_unprep_request based on the REQ_DONTPREP status in its completion
path.  This would then give us a hook to hang the deallocation correctly
on.

James



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-30 14:22           ` James Bottomley
  2010-06-30 15:36             ` Mike Snitzer
@ 2010-07-01 12:28             ` Mikulas Patocka
  2010-07-01 12:46               ` Mike Snitzer
  2010-07-01 12:49               ` [dm-devel] " Alasdair G Kergon
  1 sibling, 2 replies; 20+ messages in thread
From: Mikulas Patocka @ 2010-07-01 12:28 UTC (permalink / raw)
  To: James Bottomley
  Cc: device-mapper development, Mike Snitzer, axboe, linux-scsi,
	martin.petersen, linux-kernel, akpm, Christoph Hellwig

> > It is either/or choice. If the interface isn't fixed NOW, the existing 
> > flawed zeroed-page-allocation interface gets into RHEL
> 
> That's a false dichotomy.  You might see an either apply this hack now
> or support the interface choice with RHEL, but upstream has the option
> to fix stuff correctly.  RHEL has never needed my blessing to apply
> random crap to their kernel before ... why is this patch any different?

We can't apply non-upstream patches (except few exceptions such as 
dm-raid45). It makes sense, non-upstream patches have smaller test 
coverage.

> And the rest of this rubbish is based on that false premise.  It might
> help you to take off your SCSI antipathy and see this as a system
> problem: it actually originates in block and spills out from there.
> Thus it requires a system solution.
> 
> James

Imagine this: I take a FPGA PCI board, I design a storage controller on it 
and this controller will need 3 pages to process a discard request. Now I 
say: I refuse to allocate these 3 pages in the driver because the driver 
would look ugly --- instead, I demand that everyone in the Linux kernel 
who creates a discard request must attach 3 pages to the request for my 
driver.

Do you think it is correct behavior? Would you accept such a driver? I 
guess you wouldn't! But this is the same thing that you are doing with 
SCSI.

Now lets take it a bit further and I say "I may clean up the driver for my 
controller one day, when I do it, I remove that 3-page requirement --- and 
then, everyone who allocated those pages will have to change his code and 
remove the allocations".

And this is what you are intending to do with SCSI.

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-07-01 12:28             ` [dm-devel] " Mikulas Patocka
@ 2010-07-01 12:46               ` Mike Snitzer
  2010-07-01 14:03                 ` Mikulas Patocka
  2010-07-01 12:49               ` [dm-devel] " Alasdair G Kergon
  1 sibling, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2010-07-01 12:46 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: James Bottomley, device-mapper development, axboe, linux-scsi,
	martin.petersen, linux-kernel, akpm, Christoph Hellwig

On Thu, Jul 01 2010 at  8:28am -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> > > It is either/or choice. If the interface isn't fixed NOW, the existing 
> > > flawed zeroed-page-allocation interface gets into RHEL
> > 
> > That's a false dichotomy.  You might see an either apply this hack now
> > or support the interface choice with RHEL, but upstream has the option
> > to fix stuff correctly.  RHEL has never needed my blessing to apply
> > random crap to their kernel before ... why is this patch any different?
> 
> We can't apply non-upstream patches (except few exceptions such as 
> dm-raid45). It makes sense, non-upstream patches have smaller test 
> coverage.
> 
> > And the rest of this rubbish is based on that false premise.  It might
> > help you to take off your SCSI antipathy and see this as a system
> > problem: it actually originates in block and spills out from there.
> > Thus it requires a system solution.
> > 
> > James
> 
> Imagine this: I take a FPGA PCI board, I design a storage controller on it 
> and this controller will need 3 pages to process a discard request. Now I 
> say: I refuse to allocate these 3 pages in the driver because the driver 
> would look ugly --- instead, I demand that everyone in the Linux kernel 
> who creates a discard request must attach 3 pages to the request for my 
> driver.
> 
> Do you think it is correct behavior? Would you accept such a driver? I 
> guess you wouldn't! But this is the same thing that you are doing with 
> SCSI.
> 
> Now lets take it a bit further and I say "I may clean up the driver for my 
> controller one day, when I do it, I remove that 3-page requirement --- and 
> then, everyone who allocated those pages will have to change his code and 
> remove the allocations".
> 
> And this is what you are intending to do with SCSI.

Mikulas,

Jens has already queued up a comprehensive fix (3 patches) that James
and Tomo developed.  Please stop the hostility.. it has no place.

Others,
I'd encourage you to not respond to this thread further ;)

Regards,
Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-07-01 12:46               ` Mike Snitzer
@ 2010-07-01 14:03                 ` Mikulas Patocka
  0 siblings, 0 replies; 20+ messages in thread
From: Mikulas Patocka @ 2010-07-01 14:03 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: axboe, linux-scsi, martin.petersen, linux-kernel,
	device-mapper development, James Bottomley, akpm,
	Christoph Hellwig

> Mikulas,
> 
> Jens has already queued up a comprehensive fix (3 patches) that James
> and Tomo developed.  Please stop the hostility.. it has no place.
> 
> Others,
> I'd encourage you to not respond to this thread further ;)
> 
> Regards,
> Mike

I read mailing lists at longer intervals than my mailbox, so I read the 
James' message I responded to but missed the patches on the list. It's 
good that it's fixed and there is no need to argue about it anymore.

Mikulas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dm-devel] [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-07-01 12:28             ` [dm-devel] " Mikulas Patocka
  2010-07-01 12:46               ` Mike Snitzer
@ 2010-07-01 12:49               ` Alasdair G Kergon
  1 sibling, 0 replies; 20+ messages in thread
From: Alasdair G Kergon @ 2010-07-01 12:49 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: James Bottomley, axboe, Mike Snitzer, linux-scsi, linux-kernel,
	device-mapper development, martin.petersen, akpm,
	Christoph Hellwig

On Thu, Jul 01, 2010 at 08:28:02AM -0400, Mikulas Patocka wrote:
> But this is the same thing that you are doing with SCSI.

Have you read these patches, Mikulas?  Stop digging.

Alasdair

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-27 15:29   ` [PATCH 1/2] block: fix leaks associated with discard request payload James Bottomley
  2010-06-28 17:16     ` Martin K. Petersen
  2010-06-29 22:28     ` [dm-devel] " Mikulas Patocka
@ 2010-06-30  8:32     ` Boaz Harrosh
  2010-06-30  8:42       ` Christoph Hellwig
  2 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-30  8:32 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Snitzer, Christoph Hellwig, axboe, dm-devel, linux-kernel,
	martin.petersen, akpm, linux-scsi, FUJITA Tomonori

On 06/27/2010 06:29 PM, James Bottomley wrote:
<snip>
>> +		/*
>> +		 * If this is a discard request that originated from the kernel
>> +		 * we need to free our payload here.  Note that we need to check
>> +		 * the request flag as the normal payload rules apply for
>> +		 * pass-through UNMAP / WRITE SAME requests.
>> +		 */
>> +		__free_page(bio_page(cmd->request->bio));
> 
> This is another layering violation:  the page is allocated in the Upper
> layer and freed in the mid-layer.
> 

May I ask a silly question? Why the dynamic allocation?

Why not have a const-static single global page at the block-layer somewhere
that will be used for all discard-type operations and be done with it once and
for all. A single page can be used for any size bio , any number of concurrent
discards, any ZERO needed operation. It can also be used by other operations
like padding and others. In fact isn't there one for the libsata padding?

(It could be dynamical allocated on first use for embedded system)

just my $0.017
Boaz

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-30  8:32     ` Boaz Harrosh
@ 2010-06-30  8:42       ` Christoph Hellwig
  2010-06-30 10:25         ` Boaz Harrosh
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-30  8:42 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: James Bottomley, Mike Snitzer, Christoph Hellwig, axboe, dm-devel,
	linux-kernel, martin.petersen, akpm, linux-scsi, FUJITA Tomonori

On Wed, Jun 30, 2010 at 11:32:43AM +0300, Boaz Harrosh wrote:
> May I ask a silly question? Why the dynamic allocation?
> 
> Why not have a const-static single global page at the block-layer somewhere
> that will be used for all discard-type operations and be done with it once and
> for all. A single page can be used for any size bio , any number of concurrent
> discards, any ZERO needed operation. It can also be used by other operations
> like padding and others. In fact isn't there one for the libsata padding?

for UNMAP we need to write into the payload.  And for ATA TRIM we need
to write into the WRITE SAME payload.  That's another layering violation
for those looking for them, btw..


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-30  8:42       ` Christoph Hellwig
@ 2010-06-30 10:25         ` Boaz Harrosh
  2010-06-30 10:41           ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-30 10:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, Mike Snitzer, axboe, dm-devel, linux-kernel,
	martin.petersen, akpm, linux-scsi, FUJITA Tomonori

On 06/30/2010 11:42 AM, Christoph Hellwig wrote:
> On Wed, Jun 30, 2010 at 11:32:43AM +0300, Boaz Harrosh wrote:
>> May I ask a silly question? Why the dynamic allocation?
>>
>> Why not have a const-static single global page at the block-layer somewhere
>> that will be used for all discard-type operations and be done with it once and
>> for all. A single page can be used for any size bio , any number of concurrent
>> discards, any ZERO needed operation. It can also be used by other operations
>> like padding and others. In fact isn't there one for the libsata padding?
> 
> for UNMAP we need to write into the payload.  And for ATA TRIM we need
> to write into the WRITE SAME payload.  

OK, Thanks, I see. Is it one of these operations, (like we have in OSD) where
the CDB information spills into the payload? like the scatter-gather and extent
lists and such. Do we actually use a WRITE_SAME which is not zero? for what use?

> That's another layering violation
> for those looking for them, btw..
> 

Agreed

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-30 10:25         ` Boaz Harrosh
@ 2010-06-30 10:41           ` Christoph Hellwig
  2010-06-30 10:57             ` Boaz Harrosh
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2010-06-30 10:41 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Christoph Hellwig, James Bottomley, Mike Snitzer, axboe, dm-devel,
	linux-kernel, martin.petersen, akpm, linux-scsi, FUJITA Tomonori

On Wed, Jun 30, 2010 at 01:25:01PM +0300, Boaz Harrosh wrote:
> OK, Thanks, I see. Is it one of these operations, (like we have in OSD) where
> the CDB information spills into the payload? like the scatter-gather and extent
> lists and such.

For UNMAP the payload is a list of block number / length pairs, while
the CDB itself doesn't contain any information like that.  It's a rather
awkward command.

> Do we actually use a WRITE_SAME which is not zero? for what use?

The kernel doesn't issue any WRITE SAME without the unmap bit set.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-30 10:41           ` Christoph Hellwig
@ 2010-06-30 10:57             ` Boaz Harrosh
  2010-06-30 12:18               ` Mike Snitzer
  0 siblings, 1 reply; 20+ messages in thread
From: Boaz Harrosh @ 2010-06-30 10:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, Mike Snitzer, axboe, dm-devel, linux-kernel,
	martin.petersen, akpm, linux-scsi, FUJITA Tomonori

On 06/30/2010 01:41 PM, Christoph Hellwig wrote:
> On Wed, Jun 30, 2010 at 01:25:01PM +0300, Boaz Harrosh wrote:
>> OK, Thanks, I see. Is it one of these operations, (like we have in OSD) where
>> the CDB information spills into the payload? like the scatter-gather and extent
>> lists and such.
> 
> For UNMAP the payload is a list of block number / length pairs, while
> the CDB itself doesn't contain any information like that.  It's a rather
> awkward command.
> 

How big can that be? could we, maybe, use the sense_buffer, properly allocated
already?

>> Do we actually use a WRITE_SAME which is not zero? for what use?
> 
> The kernel doesn't issue any WRITE SAME without the unmap bit set.

So if the unmap bit is set then the page can just be zero, right?

I still think a static zero-page is a worth while optimization. And
block-drivers can take care with special needs with a private mem_pool
or something. For the discard-type user and generic block layer the
page is just an implementation specific residue, No?

But don't mind me, I'm just babbling. Not that I'll do anything about it.
Boaz

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] block: fix leaks associated with discard request payload
  2010-06-30 10:57             ` Boaz Harrosh
@ 2010-06-30 12:18               ` Mike Snitzer
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Snitzer @ 2010-06-30 12:18 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Christoph Hellwig, James Bottomley, axboe, dm-devel, linux-kernel,
	martin.petersen, akpm, linux-scsi, FUJITA Tomonori

On Wed, Jun 30 2010 at  6:57am -0400,
Boaz Harrosh <bharrosh@panasas.com> wrote:

> On 06/30/2010 01:41 PM, Christoph Hellwig wrote:
> > On Wed, Jun 30, 2010 at 01:25:01PM +0300, Boaz Harrosh wrote:
> >> OK, Thanks, I see. Is it one of these operations, (like we have in OSD) where
> >> the CDB information spills into the payload? like the scatter-gather and extent
> >> lists and such.
> > 
> > For UNMAP the payload is a list of block number / length pairs, while
> > the CDB itself doesn't contain any information like that.  It's a rather
> > awkward command.
> > 
> 
> How big can that be? could we, maybe, use the sense_buffer, properly allocated
> already?
> 
> >> Do we actually use a WRITE_SAME which is not zero? for what use?
> > 
> > The kernel doesn't issue any WRITE SAME without the unmap bit set.
> 
> So if the unmap bit is set then the page can just be zero, right?
> 
> I still think a static zero-page is a worth while optimization. And
> block-drivers can take care with special needs with a private mem_pool
> or something. For the discard-type user and generic block layer the
> page is just an implementation specific residue, No?

Why should the block layer have any role in managing this page?  Block
layer doesn't care about it, SCSI does.

Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2010-07-01 14:03 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20100622180029.GA15950@redhat.com>
     [not found] ` <1277582211-10725-1-git-send-email-snitzer@redhat.com>
2010-06-27 15:29   ` [PATCH 1/2] block: fix leaks associated with discard request payload James Bottomley
2010-06-28 17:16     ` Martin K. Petersen
2010-06-29  8:00       ` Boaz Harrosh
2010-06-29 22:28     ` [dm-devel] " Mikulas Patocka
2010-06-29 23:03       ` James Bottomley
2010-06-29 23:51         ` Mike Snitzer
2010-06-30  0:11         ` [dm-devel] " Mikulas Patocka
2010-06-30 14:22           ` James Bottomley
2010-06-30 15:36             ` Mike Snitzer
2010-06-30 16:26               ` James Bottomley
2010-07-01 12:28             ` [dm-devel] " Mikulas Patocka
2010-07-01 12:46               ` Mike Snitzer
2010-07-01 14:03                 ` Mikulas Patocka
2010-07-01 12:49               ` [dm-devel] " Alasdair G Kergon
2010-06-30  8:32     ` Boaz Harrosh
2010-06-30  8:42       ` Christoph Hellwig
2010-06-30 10:25         ` Boaz Harrosh
2010-06-30 10:41           ` Christoph Hellwig
2010-06-30 10:57             ` Boaz Harrosh
2010-06-30 12:18               ` Mike Snitzer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox