All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: lantianyu1986@gmail.com, kys@microsoft.com,
	haiyangz@microsoft.com, sthemmin@microsoft.com,
	sashal@kernel.org, michael.h.kelley@microsoft.com
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	eric.devolder@oracle.com
Subject: Re: [RFC PATCH 3/4] Hyper-V/Balloon: Call add_memory() with dm_device.ha_lock.
Date: Wed, 11 Dec 2019 15:57:08 +0100	[thread overview]
Message-ID: <87pnguc3ln.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <20191210154611.10958-4-Tianyu.Lan@microsoft.com>

lantianyu1986@gmail.com writes:

> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> The ha_lock is to protect hot-add region list ha_region_list.
> When Hyper-V delivers hot-add memory message, handle_pg_range()
> goes through the list to find the hot-add region state
> associated with message and do hot-add memory. The lock
> is released in the loop before calling hv_mem_hot_add()
> and is reacquired in hv_mem_hot_add(). There is a race
> that list entry maybe freed during the slot.

Do I understand correctly that without memory hot remove there's no
race? If yes than we should clarify this in the changelog.

>
> To avoid the race and simply the code, make hv_mem_hot_add()
> under protection of ha_region_list lock. There is a dead lock
> case when run add_memory() under ha_lock. add_memory() calls
> hv_online_page() inside and hv_online_page() also acquires
> ha_lock again. Add lock_thread in the struct hv_dynmem_device
> to record hv_mem_hot_add()'s thread and check lock_thread
> in hv_online_page(). hv_mem_hot_add() thread already holds
> lock during traverse hot add list and so not acquire lock
> in hv_online_page().
>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  drivers/hv/hv_balloon.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index 34bd73526afd..4d1a3b1e2490 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -545,6 +545,7 @@ struct hv_dynmem_device {
>  	 * regions from ha_region_list.
>  	 */
>  	spinlock_t ha_lock;
> +	struct task_struct *lock_thread;
>  
>  	/*
>  	 * A list of hot-add regions.
> @@ -707,12 +708,10 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  	unsigned long start_pfn;
>  	unsigned long processed_pfn;
>  	unsigned long total_pfn = pfn_count;
> -	unsigned long flags;
>  
>  	for (i = 0; i < (size/HA_CHUNK); i++) {
>  		start_pfn = start + (i * HA_CHUNK);
>  
> -		spin_lock_irqsave(&dm_device.ha_lock, flags);
>  		has->ha_end_pfn +=  HA_CHUNK;
>  
>  		if (total_pfn > HA_CHUNK) {
> @@ -724,7 +723,6 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  		}
>  
>  		has->covered_end_pfn +=  processed_pfn;
> -		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>  
>  		init_completion(&dm_device.ol_waitevent);
>  		dm_device.ha_waiting = !memhp_auto_online;
> @@ -745,10 +743,8 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  				 */
>  				do_hot_add = false;
>  			}
> -			spin_lock_irqsave(&dm_device.ha_lock, flags);
>  			has->ha_end_pfn -= HA_CHUNK;
>  			has->covered_end_pfn -=  processed_pfn;
> -			spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>  			break;
>  		}
>  
> @@ -771,8 +767,13 @@ static void hv_online_page(struct page *pg, unsigned int order)
>  	struct hv_hotadd_state *has;
>  	unsigned long flags;
>  	unsigned long pfn = page_to_pfn(pg);
> +	int unlocked;
> +
> +	if (dm_device.lock_thread != current) {

With lock_thread checking you're trying to protect against taking the
spinlock twice (when this is called from add_memory()) but why not just
check that spin_is_locked() AND we sit on the same CPU as the VMBus
channel attached to the balloon device? 

> +		spin_lock_irqsave(&dm_device.ha_lock, flags);
> +		unlocked = 1;
> +	}

We set unlocked to '1' when we're actually locked, aren't we?

>  
> -	spin_lock_irqsave(&dm_device.ha_lock, flags);
>  	list_for_each_entry(has, &dm_device.ha_region_list, list) {
>  		/* The page belongs to a different HAS. */
>  		if ((pfn < has->start_pfn) ||
> @@ -782,7 +783,9 @@ static void hv_online_page(struct page *pg, unsigned int order)
>  		hv_bring_pgs_online(has, pfn, 1UL << order);
>  		break;
>  	}
> -	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
> +
> +	if (unlocked)
> +		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>  }
>  
>  static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
> @@ -860,6 +863,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
>  		pg_start);
>  
>  	spin_lock_irqsave(&dm_device.ha_lock, flags);
> +	dm_device.lock_thread = current;
>  	list_for_each_entry(has, &dm_device.ha_region_list, list) {
>  		/*
>  		 * If the pfn range we are dealing with is not in the current
> @@ -912,9 +916,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
>  			} else {
>  				pfn_cnt = size;
>  			}
> -			spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>  			hv_mem_hot_add(has->ha_end_pfn, size, pfn_cnt, has);
> -			spin_lock_irqsave(&dm_device.ha_lock, flags);

Apart from the deadlock you mention in the commit message, add_memory
does lock_device_hotplug()/unlock_device_hotplug() which is a mutex. If
I'm not mistaken you now take the mutext under a spinlock
(&dm_device.ha_lock). Not good.


>  		}
>  		/*
>  		 * If we managed to online any pages that were given to us,
> @@ -923,6 +925,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
>  		res = has->covered_end_pfn - old_covered_state;
>  		break;
>  	}
> +	dm_device.lock_thread = NULL;
>  	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>  
>  	return res;

-- 
Vitaly


  reply	other threads:[~2019-12-11 14:57 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-10 15:46 [RFC PATCH 0/4] x86/Hyper-V: Add Dynamic memory hot-remove function lantianyu1986
2019-12-10 15:46 ` [RFC PATCH 1/4] mm/resource: Move child to new resource when release mem region lantianyu1986
2019-12-10 15:46 ` [RFC PATCH 2/4] mm/hotplug: Expose is_mem_section_removable() and offline_pages() lantianyu1986
2019-12-11 12:07   ` David Hildenbrand
2019-12-10 15:46 ` [RFC PATCH 3/4] Hyper-V/Balloon: Call add_memory() with dm_device.ha_lock lantianyu1986
2019-12-11 14:57   ` Vitaly Kuznetsov [this message]
2019-12-12  8:24     ` [EXTERNAL] " Tianyu Lan
2019-12-10 15:46 ` [RFC PATCH 4/4] x86/Hyper-V: Add memory hot remove function lantianyu1986
2019-12-11 15:06   ` Vitaly Kuznetsov
2019-12-12 13:37     ` [EXTERNAL] " Tianyu Lan
2019-12-11 19:52   ` kbuild test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pnguc3ln.fsf@vitty.brq.redhat.com \
    --to=vkuznets@redhat.com \
    --cc=Tianyu.Lan@microsoft.com \
    --cc=eric.devolder@oracle.com \
    --cc=haiyangz@microsoft.com \
    --cc=kys@microsoft.com \
    --cc=lantianyu1986@gmail.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.h.kelley@microsoft.com \
    --cc=sashal@kernel.org \
    --cc=sthemmin@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.