From: Alex Elder <elder@inktank.com>
To: Guangliang Zhao <gzhao@suse.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: [PATCH v5] rbd: fix the memory leak of bio_chain_clone
Date: Wed, 10 Oct 2012 09:59:19 -0700 [thread overview]
Message-ID: <5075A967.2020207@inktank.com> (raw)
In-Reply-To: <5074EAF8.90407@inktank.com>
On 10/09/2012 08:26 PM, Alex Elder wrote:
> On 09/11/2012 02:17 PM, Alex Elder wrote:
>> On 09/06/2012 06:30 AM, Guangliang Zhao wrote:
>>> The bio_pair alloced in bio_chain_clone would not be freed,
>>> this will cause a memory leak. It could be freed actually only
>>> after 3 times release, because the reference count of bio_pair
>>> is initialized to 3 when bio_split and bio_pair_release only
>>> drops the reference count.
>>>
>>> The function bio_pair_release must be called three times for
>>> releasing bio_pair, and the callback functions of bios on the
>>> requests will be called when the last release time in bio_pair_release,
>>> however, these functions will also be called in rbd_req_cb. In
>>> other words, they will be called twice, and it may cause serious
>>> consequences.
>>
>>
>> I just want you to know I'm looking at this patch now. This is a
>> pretty complex bit of code though, so it may take me a bit to get
>> back to you.
>
> Sorry about the long delay. I've finally had a chance to look a
> little more closely at your patch.
>
> I had to sort of port what you supplied so it fit the current
> code, which has changed a little since you first sent this.
>
> It looks to me like it should work. Rather than using bio_split()
> when a bio is more than is needed to satisfy a particular
> segment of a request, you create a clone of the bio and pass
> it back to the caller. The next call will use that clone
> rather than the original as it continues processing the next
> segment of the request. The original bio in this case will
> be freed as before, and the clone will be freed (drop a reference)
> in a subsequent call when it gets "used up."
I've done enough testing with this to be satisfied this
works correctly.
> Do you have a test that you used to verify this both performed
> correctly when a split was found and no longer leaked anything?
I am still interested to know if you had a particular way
to verify that that the leak was occurring (or not). But
we obviously won't be leaking any bio_pairs any more...
-Alex
> I'm going to put it through some testing myself. I might want
> to make small revisions to a comment here or there, but otherwise
> I'll take it in unless I find it fails something.
>
> Thanks a lot.
>
> Reviewed-by: Alex Elder <elder@inktank.com>
>
>>> This patch clones bio chain from the origin directly instead of
>>> bio_split. The old bios which will be split may be modified by
>>> the callback fn, so their copys need to be saved(called split_bio).
>>> The new bio chain can be released whenever we don't need it.
>>>
>>> This patch can just handle the split of *single page* bios, but
>>> it's enough here for the following reasons:
>>>
>>> Only bios span across multiple osds need to be split, and these bios
>>> *must* be single page because of rbd_merge_bvec. With the function,
>>> the new bvec will not permitted to merge, if it make the bio cross
>>> the osd boundary, except it is the first one. In other words, there
>>> are two types of bio:
>>>
>>> - the bios don't cross the osd boundary
>>> They have one or more pages. The value of offset will
>>> always be 0 in this case, so nothing will be changed, and
>>> the code changes tmp bios doesn't take effact at all.
>>>
>>> - the bios cross the osd boundary
>>> Each one have only one page. These bios need to be split,
>>> and the offset is used to indicate the next bio, it makes
>>> sense only in this instance.
>>>
>>> The original bios may be modifid by the callback fn before the next
>>> bio_chain_clone() called, when a bio need to be split, so its copy
>>> will be saved.
>>>
>>> Signed-off-by: Guangliang Zhao <gzhao@suse.com>
>>> ---
>>>
>>> drivers/block/rbd.c | 102
>>> ++++++++++++++++++++++++++++++---------------------
>>> 1 file changed, 60 insertions(+), 42 deletions(-)
>>>
>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>> index 9917943..a605e1c 100644
>>> --- a/drivers/block/rbd.c
>>> +++ b/drivers/block/rbd.c
>>> @@ -717,50 +717,70 @@ static void zero_bio_chain(struct bio *chain,
>>> int start_ofs)
>>> }
>>> }
>>>
>>> -/*
>>> +/**
>>> * bio_chain_clone - clone a chain of bios up to a certain length.
>>> - * might return a bio_pair that will need to be released.
>>> + * @old: bio to clone
>>> + * @split_bio: bio which will be split
>>> + * @offset: start point for bio clone
>>> + * @len: length of bio chain
>>> + * @gfp_mask: allocation priority
>>> + *
>>> + * Value of split_bio will be !NULL only when there is a bio need to be
>>> + * split. NULL otherwise.
>>> + *
>>> + * RETURNS:
>>> + * Pointer to new bio chain on success, NULL on failure.
>>> */
>>> -static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
>>> - struct bio_pair **bp,
>>> - int len, gfp_t gfpmask)
>>> +static struct bio *bio_chain_clone(struct bio **old, struct bio
>>> **split_bio,
>>> + int *offset, int len, gfp_t gfpmask)
>>> {
>>> - struct bio *tmp, *old_chain = *old, *new_chain = NULL, *tail =
>>> NULL;
>>> - int total = 0;
>>> -
>>> - if (*bp) {
>>> - bio_pair_release(*bp);
>>> - *bp = NULL;
>>> - }
>>> + struct bio *tmp, *old_chain, *split, *new_chain = NULL, *tail =
>>> NULL;
>>> + int total = 0, need = len;
>>>
>>> + split = *split_bio;
>>> + old_chain = split ? split : *old;
>>> while (old_chain && (total < len)) {
>>> tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
>>> if (!tmp)
>>> goto err_out;
>>>
>>> - if (total + old_chain->bi_size > len) {
>>> - struct bio_pair *bp;
>>> -
>>> - /*
>>> - * this split can only happen with a single paged bio,
>>> - * split_bio will BUG_ON if this is not the case
>>> - */
>>> - dout("bio_chain_clone split! total=%d remaining=%d"
>>> - "bi_size=%u\n",
>>> - total, len - total, old_chain->bi_size);
>>> -
>>> - /* split the bio. We'll release it either in the next
>>> - call, or it will have to be released outside */
>>> - bp = bio_split(old_chain, (len - total) / SECTOR_SIZE);
>>> - if (!bp)
>>> - goto err_out;
>>> -
>>> - __bio_clone(tmp, &bp->bio1);
>>> -
>>> - *next = &bp->bio2;
>>> + __bio_clone(tmp, old_chain);
>>> + tmp->bi_sector += *offset >> SECTOR_SHIFT;
>>> + tmp->bi_io_vec->bv_offset += *offset >> SECTOR_SHIFT;
>>> + /*
>>> + * The bios span across multiple osd objects must be
>>> + * single paged, rbd_merge_bvec would guarantee it.
>>> + * So we needn't worry about other things.
>>> + */
>>> + if (tmp->bi_size - *offset > need) {
>>> + tmp->bi_size = need;
>>> + tmp->bi_io_vec->bv_len = need;
>>> + *offset += need;
>>> + if (!split) {
>>> + /*
>>> + * Because the old bio may be modified by the
>>> + * callback function, its copy should be saved.
>>> + */
>>> + split = bio_clone(old_chain, gfpmask);
>>> + /*
>>> + * This is always the last allocation in this
>>> + * loop, so we only need release the bio chain
>>> + * when failed.
>>> + */
>>> + if (!split)
>>> + goto err_out;
>>> +
>>> + split->bi_next = old_chain->bi_next;
>>> + }
>>> } else {
>>> - __bio_clone(tmp, old_chain);
>>> - *next = old_chain->bi_next;
>>> + old_chain = old_chain->bi_next;
>>> + tmp->bi_size -= *offset;
>>> + tmp->bi_io_vec->bv_len -= *offset;
>>> + *offset = 0;
>>> + if (split) {
>>> + bio_put(split);
>>> + split = NULL;
>>> + }
>>> }
>>>
>>> tmp->bi_bdev = NULL;
>>> @@ -773,9 +793,9 @@ static struct bio *bio_chain_clone(struct bio
>>> **old, struct bio **next,
>>> tail->bi_next = tmp;
>>> tail = tmp;
>>> }
>>> - old_chain = old_chain->bi_next;
>>>
>>> total += tmp->bi_size;
>>> + need -= tmp->bi_size;
>>> }
>>>
>>> BUG_ON(total < len);
>>> @@ -784,6 +804,7 @@ static struct bio *bio_chain_clone(struct bio
>>> **old, struct bio **next,
>>> tail->bi_next = NULL;
>>>
>>> *old = old_chain;
>>> + *split_bio = split;
>>>
>>> return new_chain;
>>>
>>> @@ -1434,16 +1455,15 @@ static void rbd_rq_fn(struct request_queue *q)
>>> {
>>> struct rbd_device *rbd_dev = q->queuedata;
>>> struct request *rq;
>>> - struct bio_pair *bp = NULL;
>>>
>>> while ((rq = blk_fetch_request(q))) {
>>> struct bio *bio;
>>> - struct bio *rq_bio, *next_bio = NULL;
>>> + struct bio *rq_bio, *split_bio = NULL;
>>> bool do_write;
>>> unsigned int size;
>>> u64 op_size = 0;
>>> u64 ofs;
>>> - int num_segs, cur_seg = 0;
>>> + int num_segs, cur_seg = 0, offset = 0;
>>> struct rbd_req_coll *coll;
>>> struct ceph_snap_context *snapc;
>>>
>>> @@ -1507,7 +1527,7 @@ static void rbd_rq_fn(struct request_queue *q)
>>> ofs, size,
>>> NULL, NULL);
>>> kref_get(&coll->kref);
>>> - bio = bio_chain_clone(&rq_bio, &next_bio, &bp,
>>> + bio = bio_chain_clone(&rq_bio, &split_bio, &offset,
>>> op_size, GFP_ATOMIC);
>>> if (!bio) {
>>> rbd_coll_end_req_index(rq, coll, cur_seg,
>>> @@ -1535,12 +1555,10 @@ next_seg:
>>> ofs += op_size;
>>>
>>> cur_seg++;
>>> - rq_bio = next_bio;
>>> } while (size > 0);
>>> kref_put(&coll->kref, rbd_coll_release);
>>>
>>> - if (bp)
>>> - bio_pair_release(bp);
>>> + BUG_ON(split_bio);
>>> spin_lock_irq(q->queue_lock);
>>>
>>> ceph_put_snap_context(snapc);
>>>
>>
>
next prev parent reply other threads:[~2012-10-10 16:59 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-06 11:30 [PATCH v5] rbd: fix the memory leak of bio_chain_clone Guangliang Zhao
2012-09-11 21:17 ` Alex Elder
2012-10-10 3:26 ` Alex Elder
2012-10-10 16:59 ` Alex Elder [this message]
2012-10-10 19:58 ` Alex Elder
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5075A967.2020207@inktank.com \
--to=elder@inktank.com \
--cc=ceph-devel@vger.kernel.org \
--cc=gzhao@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox