From: Kent Overstreet <kmo@daterainc.com>
To: Hugh Dickins <hughd@google.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	Jens Axboe <axboe@kernel.dk>, Shaohua Li <shli@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: [PATCH] block: Explicitly handle discard/write same segments
Date: Tue, 4 Feb 2014 02:17:48 -0800
Message-ID: <20140204101748.GA12440@kmo-pixel>
In-Reply-To: <alpine.LSU.2.11.1401310852000.987@eggly.anvils>

Immutable biovecs changed the way biovecs are interpreted - drivers no
longer use bi_vcnt, they have to go by bi_iter.bi_size (to allow for
using part of an existing segment without modifying it).

This breaks with discards and write_same bios, since for those bi_size
has nothing to do with segments in the biovec. So for now, we need a
fairly gross hack - we fortunately know that there will never be more
than one segment for the entire request, so we can special case
discard/write_same.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
---
On Fri, Jan 31, 2014 at 09:17:25AM -0800, Hugh Dickins wrote:
> On Thu, 16 Jan 2014, Kent Overstreet wrote:
> > 
> > Ok, I reread the code and figured it out - the analogous change also has to be
> > made in __blk_segment_map_sg(). I'll mail out a patch for this tomorrow after
> > I've stared at the code more and had less beer.
> 
> I'd been hoping for a patch to try, but now your changes have hit Linus's
> tree: so today we have discard broken there too, crashing as originally
> reported on the NULL struct page pointer in __blk_recalc_rq_segments()'s
> page_to_pfn(bv.bv_page).
> 
> How to reproduce it?  I hope you'll find easier ways, but I get it with
> swapping to SSD (remember "swapon -d" to enable discard).  I'm just doing
> what I've done for years, running a pair of make -j20 kbuilds to tmpfs in
> limited RAM (I use mem=700M with 1.5G of swap: but that would be far too
> little RAM for a general config of current tree), to get plenty of fairly
> chaotic swapping but good forward progress nonetheless (if the sizes are
> too small, then it'll just thrash abysmally or be OOM-killed).
> 
> But please do send me a patch and I'll give it a try - thanks.

Hugh - can you give this patch a try? It passes my tests, but unfortunately
I was never able to reproduce your crash.
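
For anyone following along, here is a rough userspace sketch of the
special-casing described in the commit message. It is not kernel code:
struct toy_bio, the TOY_* flags and the toy_*() helpers are all made up
for illustration, and the walk ignores segment clustering/merging
entirely. It only shows why a size-driven walk cannot be trusted for
discard/write_same, where the size describes the range on disk rather
than the payload.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for REQ_DISCARD / REQ_WRITE_SAME; values arbitrary. */
enum { TOY_DISCARD = 1u << 0, TOY_WRITE_SAME = 1u << 1 };

struct toy_vec {
	unsigned int len;		/* models bv_len */
};

struct toy_bio {
	unsigned int flags;		/* models bi_rw */
	unsigned int size;		/* models bi_iter.bi_size, in bytes */
	unsigned int vcnt;		/* models bi_vcnt */
	const struct toy_vec *vecs;	/* models the biovec array */
};

/*
 * Size-driven walk, as for ordinary reads and writes: consume vecs until
 * the remaining byte count hits zero. The last vec may be used partially.
 * The vcnt bound exists only to keep this toy memory safe.
 */
static unsigned int toy_count_by_size(const struct toy_bio *bio)
{
	unsigned int left = bio->size, n = 0, i = 0;

	while (left && i < bio->vcnt) {
		unsigned int len = bio->vecs[i].len;

		if (len > left)
			len = left;
		left -= len;
		n++;
		i++;
	}
	return n;
}

/* Segment count: discard and write_same are always reported as one segment. */
static unsigned int toy_nr_segments(const struct toy_bio *bio)
{
	if (bio->flags & (TOY_DISCARD | TOY_WRITE_SAME))
		return 1;
	return toy_count_by_size(bio);
}

/*
 * Scatterlist entries to set up: a discard may or may not carry a payload,
 * so vcnt decides; write_same always has exactly one.
 */
static unsigned int toy_map_segments(const struct toy_bio *bio)
{
	if (bio->flags & TOY_DISCARD)
		return bio->vcnt ? 1 : 0;
	if (bio->flags & TOY_WRITE_SAME)
		return 1;
	return toy_count_by_size(bio);
}
```

With this model, a discard whose size is 1 MiB and which has no payload
counts as one segment but maps zero scatterlist entries, while an
ordinary write of 6144 bytes over two 4096-byte vecs counts and maps
two.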

 block/blk-merge.c | 91 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 62 insertions(+), 29 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8f8adaa954..6c583f9c5b 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -21,6 +21,16 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	if (!bio)
 		return 0;
 
+	/*
+	 * This should probably be returning 0, but blk_add_request_payload()
+	 * (Christoph!!!!)
+	 */
+	if (bio->bi_rw & REQ_DISCARD)
+		return 1;
+
+	if (bio->bi_rw & REQ_WRITE_SAME)
+		return 1;
+
 	fbio = bio;
 	cluster = blk_queue_cluster(q);
 	seg_size = 0;
@@ -161,30 +171,60 @@ new_segment:
 	*bvprv = *bvec;
 }
 
-/*
- * map a request to scatterlist, return number of sg entries setup. Caller
- * must make sure sg can hold rq->nr_phys_segments entries
- */
-int blk_rq_map_sg(struct request_queue *q, struct request *rq,
-		  struct scatterlist *sglist)
+static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
+			     struct scatterlist *sglist,
+			     struct scatterlist **sg)
 {
 	struct bio_vec bvec, bvprv = { NULL };
-	struct req_iterator iter;
-	struct scatterlist *sg;
+	struct bvec_iter iter;
 	int nsegs, cluster;
 
 	nsegs = 0;
 	cluster = blk_queue_cluster(q);
 
-	/*
-	 * for each bio in rq
-	 */
-	sg = NULL;
-	rq_for_each_segment(bvec, rq, iter) {
-		__blk_segment_map_sg(q, &bvec, sglist, &bvprv, &sg,
-				     &nsegs, &cluster);
-	} /* segments in rq */
+	if (bio->bi_rw & REQ_DISCARD) {
+		/*
+		 * This is a hack - drivers should be neither modifying the
+		 * biovec, nor relying on bi_vcnt - but because of
+		 * blk_add_request_payload(), a discard bio may or may not have
+		 * a payload we need to set up here (thank you Christoph) and
+		 * bi_vcnt is really the only way of telling if we need to.
+		 */
+
+		if (bio->bi_vcnt)
+			goto single_segment;
+
+		return 0;
+	}
+
+	if (bio->bi_rw & REQ_WRITE_SAME) {
+single_segment:
+		*sg = sglist;
+		bvec = bio_iovec(bio);
+		sg_set_page(*sg, bvec.bv_page, bvec.bv_len, bvec.bv_offset);
+		return 1;
+	}
+
+	for_each_bio(bio)
+		bio_for_each_segment(bvec, bio, iter)
+			__blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
+					     &nsegs, &cluster);
 
+	return nsegs;
+}
+
+/*
+ * map a request to scatterlist, return number of sg entries setup. Caller
+ * must make sure sg can hold rq->nr_phys_segments entries
+ */
+int blk_rq_map_sg(struct request_queue *q, struct request *rq,
+		  struct scatterlist *sglist)
+{
+	struct scatterlist *sg = NULL;
+	int nsegs = 0;
+
+	if (rq->bio)
+		nsegs = __blk_bios_map_sg(q, rq->bio, sglist, &sg);
 
 	if (unlikely(rq->cmd_flags & REQ_COPY_USER) &&
 	    (blk_rq_bytes(rq) & q->dma_pad_mask)) {
@@ -230,20 +270,13 @@ EXPORT_SYMBOL(blk_rq_map_sg);
 int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
 		   struct scatterlist *sglist)
 {
-	struct bio_vec bvec, bvprv = { NULL };
-	struct scatterlist *sg;
-	int nsegs, cluster;
-	struct bvec_iter iter;
-
-	nsegs = 0;
-	cluster = blk_queue_cluster(q);
-
-	sg = NULL;
-	bio_for_each_segment(bvec, bio, iter) {
-		__blk_segment_map_sg(q, &bvec, sglist, &bvprv, &sg,
-				     &nsegs, &cluster);
-	} /* segments in bio */
+	struct scatterlist *sg = NULL;
+	int nsegs;
+	struct bio *next = bio->bi_next;
+	bio->bi_next = NULL;
 
+	nsegs = __blk_bios_map_sg(q, bio, sglist, &sg);
+	bio->bi_next = next;
 	if (sg)
 		sg_mark_end(sg);
 
-- 
1.9.rc1

