Linux-HyperV List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v2] net: mana: Allow variable size indirection table
From: Jakub Kicinski @ 2024-05-30 15:41 UTC (permalink / raw)
  To: Shradha Gupta
  Cc: linux-hardening, netdev, linux-hyperv, linux-kernel, linux-rdma,
	Colin Ian King, Ahmed Zaki, Pavan Chebbi, Souradeep Chakrabarti,
	Konstantin Taranov, Kees Cook, Paolo Abeni, Eric Dumazet,
	David S. Miller, Dexuan Cui, Wei Liu, Haiyang Zhang,
	K. Y. Srinivasan, Leon Romanovsky, Jason Gunthorpe, Ajay Sharma,
	Long Li, Shradha Gupta
In-Reply-To: <1716960955-3195-1-git-send-email-shradhagupta@linux.microsoft.com>

On Tue, 28 May 2024 22:35:55 -0700 Shradha Gupta wrote:
> +	save_table = kcalloc(apc->indir_table_sz, sizeof(u32), GFP_KERNEL);
> +	if (!save_table)
> +		return -ENOMEM;
> +
>  	if (rxfh->indir) {
> -		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
> +		for (i = 0; i < apc->indir_table_sz; i++)
>  			if (rxfh->indir[i] >= apc->num_queues)
>  				return -EINVAL;

leaks save_table

^ permalink raw reply

* Re: [PATCH net-next v2] net: mana: Allow variable size indirection table
From: Leon Romanovsky @ 2024-05-30 14:37 UTC (permalink / raw)
  To: Shradha Gupta, Jakub Kicinski
  Cc: linux-hardening, netdev, linux-hyperv, linux-kernel, linux-rdma,
	Colin Ian King, Ahmed Zaki, Pavan Chebbi, Souradeep Chakrabarti,
	Konstantin Taranov, Kees Cook, Paolo Abeni, Eric Dumazet,
	David S. Miller, Dexuan Cui, Wei Liu, Haiyang Zhang,
	K. Y. Srinivasan, Jason Gunthorpe, Ajay Sharma, Long Li,
	Shradha Gupta
In-Reply-To: <1716960955-3195-1-git-send-email-shradhagupta@linux.microsoft.com>

On Tue, May 28, 2024 at 10:35:55PM -0700, Shradha Gupta wrote:
> Allow variable size indirection table allocation in MANA instead
> of using a constant value MANA_INDIRECT_TABLE_SIZE.
> The size is now derived from the MANA_QUERY_VPORT_CONFIG and the
> indirection table is allocated dynamically.
> 
> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> Reviewed-by: Dexuan Cui <decui@microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
>  Changes in v2:
>  * Rebased to latest net-next tree
>  * Rearranged cleanup code in mana_probe_port to avoid extra operations
> ---
>  drivers/infiniband/hw/mana/qp.c               | 10 +--

Jakub,

Once you are ok with this patch, let me create shared branch for it.
It is -rc1 and Konstantin already submitted some changes to qp.c
https://lore.kernel.org/all/1716366242-558-1-git-send-email-kotaranov@linux.microsoft.com/

This specific patch applies on top of Konstantin's changes cleanly.

Thanks

>  drivers/net/ethernet/microsoft/mana/mana_en.c | 68 ++++++++++++++++---
>  .../ethernet/microsoft/mana/mana_ethtool.c    | 20 ++++--
>  include/net/mana/gdma.h                       |  4 +-
>  include/net/mana/mana.h                       |  9 +--
>  5 files changed, 84 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> index ba13c5abf8ef..2d411a16a127 100644
> --- a/drivers/infiniband/hw/mana/qp.c
> +++ b/drivers/infiniband/hw/mana/qp.c
> @@ -21,7 +21,7 @@ static int mana_ib_cfg_vport_steering(struct mana_ib_dev *dev,
>  
>  	gc = mdev_to_gc(dev);
>  
> -	req_buf_size = struct_size(req, indir_tab, MANA_INDIRECT_TABLE_SIZE);
> +	req_buf_size = struct_size(req, indir_tab, MANA_INDIRECT_TABLE_DEF_SIZE);
>  	req = kzalloc(req_buf_size, GFP_KERNEL);
>  	if (!req)
>  		return -ENOMEM;
> @@ -41,18 +41,18 @@ static int mana_ib_cfg_vport_steering(struct mana_ib_dev *dev,
>  	if (log_ind_tbl_size)
>  		req->rss_enable = true;
>  
> -	req->num_indir_entries = MANA_INDIRECT_TABLE_SIZE;
> +	req->num_indir_entries = MANA_INDIRECT_TABLE_DEF_SIZE;
>  	req->indir_tab_offset = offsetof(struct mana_cfg_rx_steer_req_v2,
>  					 indir_tab);
>  	req->update_indir_tab = true;
>  	req->cqe_coalescing_enable = 1;
>  
>  	/* The ind table passed to the hardware must have
> -	 * MANA_INDIRECT_TABLE_SIZE entries. Adjust the verb
> +	 * MANA_INDIRECT_TABLE_DEF_SIZE entries. Adjust the verb
>  	 * ind_table to MANA_INDIRECT_TABLE_SIZE if required
>  	 */
>  	ibdev_dbg(&dev->ib_dev, "ind table size %u\n", 1 << log_ind_tbl_size);
> -	for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++) {
> +	for (i = 0; i < MANA_INDIRECT_TABLE_DEF_SIZE; i++) {
>  		req->indir_tab[i] = ind_table[i % (1 << log_ind_tbl_size)];
>  		ibdev_dbg(&dev->ib_dev, "index %u handle 0x%llx\n", i,
>  			  req->indir_tab[i]);
> @@ -137,7 +137,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
>  	}
>  
>  	ind_tbl_size = 1 << ind_tbl->log_ind_tbl_size;
> -	if (ind_tbl_size > MANA_INDIRECT_TABLE_SIZE) {
> +	if (ind_tbl_size > MANA_INDIRECT_TABLE_DEF_SIZE) {
>  		ibdev_dbg(&mdev->ib_dev,
>  			  "Indirect table size %d exceeding limit\n",
>  			  ind_tbl_size);
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d087cf954f75..851e1b9761b3 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -481,7 +481,7 @@ static int mana_get_tx_queue(struct net_device *ndev, struct sk_buff *skb,
>  	struct sock *sk = skb->sk;
>  	int txq;
>  
> -	txq = apc->indir_table[hash & MANA_INDIRECT_TABLE_MASK];
> +	txq = apc->indir_table[hash & (apc->indir_table_sz - 1)];
>  
>  	if (txq != old_q && sk && sk_fullsock(sk) &&
>  	    rcu_access_pointer(sk->sk_dst_cache))
> @@ -962,7 +962,16 @@ static int mana_query_vport_cfg(struct mana_port_context *apc, u32 vport_index,
>  
>  	*max_sq = resp.max_num_sq;
>  	*max_rq = resp.max_num_rq;
> -	*num_indir_entry = resp.num_indirection_ent;
> +	if (resp.num_indirection_ent > 0 &&
> +	    resp.num_indirection_ent <= MANA_INDIRECT_TABLE_MAX_SIZE &&
> +	    is_power_of_2(resp.num_indirection_ent)) {
> +		*num_indir_entry = resp.num_indirection_ent;
> +	} else {
> +		netdev_warn(apc->ndev,
> +			    "Setting indirection table size to default %d for vPort %d\n",
> +			    MANA_INDIRECT_TABLE_DEF_SIZE, apc->port_idx);
> +		*num_indir_entry = MANA_INDIRECT_TABLE_DEF_SIZE;
> +	}
>  
>  	apc->port_handle = resp.vport;
>  	ether_addr_copy(apc->mac_addr, resp.mac_addr);
> @@ -1054,14 +1063,13 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
>  				   bool update_default_rxobj, bool update_key,
>  				   bool update_tab)
>  {
> -	u16 num_entries = MANA_INDIRECT_TABLE_SIZE;
>  	struct mana_cfg_rx_steer_req_v2 *req;
>  	struct mana_cfg_rx_steer_resp resp = {};
>  	struct net_device *ndev = apc->ndev;
>  	u32 req_buf_size;
>  	int err;
>  
> -	req_buf_size = struct_size(req, indir_tab, num_entries);
> +	req_buf_size = struct_size(req, indir_tab, apc->indir_table_sz);
>  	req = kzalloc(req_buf_size, GFP_KERNEL);
>  	if (!req)
>  		return -ENOMEM;
> @@ -1072,7 +1080,7 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
>  	req->hdr.req.msg_version = GDMA_MESSAGE_V2;
>  
>  	req->vport = apc->port_handle;
> -	req->num_indir_entries = num_entries;
> +	req->num_indir_entries = apc->indir_table_sz;
>  	req->indir_tab_offset = offsetof(struct mana_cfg_rx_steer_req_v2,
>  					 indir_tab);
>  	req->rx_enable = rx;
> @@ -1111,7 +1119,7 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
>  	}
>  
>  	netdev_info(ndev, "Configured steering vPort %llu entries %u\n",
> -		    apc->port_handle, num_entries);
> +		    apc->port_handle, apc->indir_table_sz);
>  out:
>  	kfree(req);
>  	return err;
> @@ -2344,11 +2352,33 @@ static int mana_create_vport(struct mana_port_context *apc,
>  	return mana_create_txq(apc, net);
>  }
>  
> +static int mana_rss_table_alloc(struct mana_port_context *apc)
> +{
> +	if (!apc->indir_table_sz) {
> +		netdev_err(apc->ndev,
> +			   "Indirection table size not set for vPort %d\n",
> +			   apc->port_idx);
> +		return -EINVAL;
> +	}
> +
> +	apc->indir_table = kcalloc(apc->indir_table_sz, sizeof(u32), GFP_KERNEL);
> +	if (!apc->indir_table)
> +		return -ENOMEM;
> +
> +	apc->rxobj_table = kcalloc(apc->indir_table_sz, sizeof(mana_handle_t), GFP_KERNEL);
> +	if (!apc->rxobj_table) {
> +		kfree(apc->indir_table);
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
>  static void mana_rss_table_init(struct mana_port_context *apc)
>  {
>  	int i;
>  
> -	for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
> +	for (i = 0; i < apc->indir_table_sz; i++)
>  		apc->indir_table[i] =
>  			ethtool_rxfh_indir_default(i, apc->num_queues);
>  }
> @@ -2361,7 +2391,7 @@ int mana_config_rss(struct mana_port_context *apc, enum TRI_STATE rx,
>  	int i;
>  
>  	if (update_tab) {
> -		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++) {
> +		for (i = 0; i < apc->indir_table_sz; i++) {
>  			queue_idx = apc->indir_table[i];
>  			apc->rxobj_table[i] = apc->rxqs[queue_idx]->rxobj;
>  		}
> @@ -2466,7 +2496,6 @@ static int mana_init_port(struct net_device *ndev)
>  	struct mana_port_context *apc = netdev_priv(ndev);
>  	u32 max_txq, max_rxq, max_queues;
>  	int port_idx = apc->port_idx;
> -	u32 num_indirect_entries;
>  	int err;
>  
>  	err = mana_init_port_context(apc);
> @@ -2474,7 +2503,7 @@ static int mana_init_port(struct net_device *ndev)
>  		return err;
>  
>  	err = mana_query_vport_cfg(apc, port_idx, &max_txq, &max_rxq,
> -				   &num_indirect_entries);
> +				   &apc->indir_table_sz);
>  	if (err) {
>  		netdev_err(ndev, "Failed to query info for vPort %d\n",
>  			   port_idx);
> @@ -2723,6 +2752,10 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
>  	if (err)
>  		goto free_net;
>  
> +	err = mana_rss_table_alloc(apc);
> +	if (err)
> +		goto reset_apc;
> +
>  	netdev_lockdep_set_classes(ndev);
>  
>  	ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
> @@ -2739,11 +2772,17 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
>  	err = register_netdev(ndev);
>  	if (err) {
>  		netdev_err(ndev, "Unable to register netdev.\n");
> -		goto reset_apc;
> +		goto free_indir;
>  	}
>  
>  	return 0;
>  
> +free_indir:
> +	apc->indir_table_sz = 0;
> +	kfree(apc->indir_table);
> +	apc->indir_table = NULL;
> +	kfree(apc->rxobj_table);
> +	apc->rxobj_table = NULL;
>  reset_apc:
>  	kfree(apc->rxqs);
>  	apc->rxqs = NULL;
> @@ -2897,6 +2936,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
>  {
>  	struct gdma_context *gc = gd->gdma_context;
>  	struct mana_context *ac = gd->driver_data;
> +	struct mana_port_context *apc;
>  	struct device *dev = gc->dev;
>  	struct net_device *ndev;
>  	int err;
> @@ -2908,6 +2948,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
>  
>  	for (i = 0; i < ac->num_ports; i++) {
>  		ndev = ac->ports[i];
> +		apc = netdev_priv(ndev);
>  		if (!ndev) {
>  			if (i == 0)
>  				dev_err(dev, "No net device to remove\n");
> @@ -2931,6 +2972,11 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
>  		}
>  
>  		unregister_netdevice(ndev);
> +		apc->indir_table_sz = 0;
> +		kfree(apc->indir_table);
> +		apc->indir_table = NULL;
> +		kfree(apc->rxobj_table);
> +		apc->rxobj_table = NULL;
>  
>  		rtnl_unlock();
>  
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index ab2413d71f6c..1667f18046d2 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> @@ -245,7 +245,9 @@ static u32 mana_get_rxfh_key_size(struct net_device *ndev)
>  
>  static u32 mana_rss_indir_size(struct net_device *ndev)
>  {
> -	return MANA_INDIRECT_TABLE_SIZE;
> +	struct mana_port_context *apc = netdev_priv(ndev);
> +
> +	return apc->indir_table_sz;
>  }
>  
>  static int mana_get_rxfh(struct net_device *ndev,
> @@ -257,7 +259,7 @@ static int mana_get_rxfh(struct net_device *ndev,
>  	rxfh->hfunc = ETH_RSS_HASH_TOP; /* Toeplitz */
>  
>  	if (rxfh->indir) {
> -		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
> +		for (i = 0; i < apc->indir_table_sz; i++)
>  			rxfh->indir[i] = apc->indir_table[i];
>  	}
>  
> @@ -273,8 +275,8 @@ static int mana_set_rxfh(struct net_device *ndev,
>  {
>  	struct mana_port_context *apc = netdev_priv(ndev);
>  	bool update_hash = false, update_table = false;
> -	u32 save_table[MANA_INDIRECT_TABLE_SIZE];
>  	u8 save_key[MANA_HASH_KEY_SIZE];
> +	u32 *save_table;
>  	int i, err;
>  
>  	if (!apc->port_is_up)
> @@ -284,13 +286,17 @@ static int mana_set_rxfh(struct net_device *ndev,
>  	    rxfh->hfunc != ETH_RSS_HASH_TOP)
>  		return -EOPNOTSUPP;
>  
> +	save_table = kcalloc(apc->indir_table_sz, sizeof(u32), GFP_KERNEL);
> +	if (!save_table)
> +		return -ENOMEM;
> +
>  	if (rxfh->indir) {
> -		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
> +		for (i = 0; i < apc->indir_table_sz; i++)
>  			if (rxfh->indir[i] >= apc->num_queues)
>  				return -EINVAL;
>  
>  		update_table = true;
> -		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++) {
> +		for (i = 0; i < apc->indir_table_sz; i++) {
>  			save_table[i] = apc->indir_table[i];
>  			apc->indir_table[i] = rxfh->indir[i];
>  		}
> @@ -306,7 +312,7 @@ static int mana_set_rxfh(struct net_device *ndev,
>  
>  	if (err) { /* recover to original values */
>  		if (update_table) {
> -			for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
> +			for (i = 0; i < apc->indir_table_sz; i++)
>  				apc->indir_table[i] = save_table[i];
>  		}
>  
> @@ -316,6 +322,8 @@ static int mana_set_rxfh(struct net_device *ndev,
>  		mana_config_rss(apc, TRI_STATE_TRUE, update_hash, update_table);
>  	}
>  
> +	kfree(save_table);
> +
>  	return err;
>  }
>  
> diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
> index 27684135bb4d..c547756c4284 100644
> --- a/include/net/mana/gdma.h
> +++ b/include/net/mana/gdma.h
> @@ -543,11 +543,13 @@ enum {
>   */
>  #define GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX BIT(2)
>  #define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
> +#define GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT BIT(5)
>  
>  #define GDMA_DRV_CAP_FLAGS1 \
>  	(GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
>  	 GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
> -	 GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
> +	 GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG | \
> +	 GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT)
>  
>  #define GDMA_DRV_CAP_FLAGS2 0
>  
> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> index 561f6719fb4e..59823901b74f 100644
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h
> @@ -30,8 +30,8 @@ enum TRI_STATE {
>  };
>  
>  /* Number of entries for hardware indirection table must be in power of 2 */
> -#define MANA_INDIRECT_TABLE_SIZE 64
> -#define MANA_INDIRECT_TABLE_MASK (MANA_INDIRECT_TABLE_SIZE - 1)
> +#define MANA_INDIRECT_TABLE_MAX_SIZE 512
> +#define MANA_INDIRECT_TABLE_DEF_SIZE 64
>  
>  /* The Toeplitz hash key's length in bytes: should be multiple of 8 */
>  #define MANA_HASH_KEY_SIZE 40
> @@ -410,10 +410,11 @@ struct mana_port_context {
>  	struct mana_tx_qp *tx_qp;
>  
>  	/* Indirection Table for RX & TX. The values are queue indexes */
> -	u32 indir_table[MANA_INDIRECT_TABLE_SIZE];
> +	u32 *indir_table;
> +	u32 indir_table_sz;
>  
>  	/* Indirection table containing RxObject Handles */
> -	mana_handle_t rxobj_table[MANA_INDIRECT_TABLE_SIZE];
> +	mana_handle_t *rxobj_table;
>  
>  	/*  Hash key used by the NIC */
>  	u8 hashkey[MANA_HASH_KEY_SIZE];
> -- 
> 2.34.1
> 

^ permalink raw reply

* Re: [PATCH v5] Drivers: hv: Cosmetic changes for hv.c and balloon.c
From: kernel test robot @ 2024-05-30  8:12 UTC (permalink / raw)
  To: Aditya Nagesh, adityanagesh, kys, haiyangz, wei.liu, decui,
	linux-hyperv, linux-kernel
  Cc: oe-kbuild-all, Aditya Nagesh
In-Reply-To: <1716998695-32135-1-git-send-email-adityanagesh@linux.microsoft.com>

Hi Aditya,

kernel test robot noticed the following build errors:

[auto build test ERROR on next-20240529]
[cannot apply to linus/master v6.10-rc1 v6.9 v6.9-rc7 v6.10-rc1]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Aditya-Nagesh/Drivers-hv-Cosmetic-changes-for-hv-c-and-balloon-c/20240530-000928
base:   next-20240529
patch link:    https://lore.kernel.org/r/1716998695-32135-1-git-send-email-adityanagesh%40linux.microsoft.com
patch subject: [PATCH v5] Drivers: hv: Cosmetic changes for hv.c and balloon.c
config: arm64-randconfig-002-20240530 (https://download.01.org/0day-ci/archive/20240530/202405301550.DKuS2OzK-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240530/202405301550.DKuS2OzK-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405301550.DKuS2OzK-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   drivers/hv/hv_balloon.c: In function 'hot_add_req':
>> drivers/hv/hv_balloon.c:998:9: warning: "/*" within comment [-Wcomment]
     998 |         /*
         |          
>> drivers/hv/hv_balloon.c:980: error: unterminated #ifdef
     980 | #ifdef CONFIG_MEMORY_HOTPLUG
         | 
>> drivers/hv/hv_balloon.c:978:39: error: expected declaration or statement at end of input
     978 |         resp.hdr.size = sizeof(struct dm_hot_add_response);
         |                                       ^~~~~~~~~~~~~~~~~~~
>> drivers/hv/hv_balloon.c:974:34: warning: unused variable 'dm' [-Wunused-variable]
     974 |         struct hv_dynmem_device *dm = &dm_device;
         |                                  ^~
   drivers/hv/hv_balloon.c: At top level:
>> drivers/hv/hv_balloon.c:575:13: warning: 'post_status' declared 'static' but never defined [-Wunused-function]
     575 | static void post_status(struct hv_dynmem_device *dm);
         |             ^~~~~~~~~~~
>> drivers/hv/hv_balloon.c:577:13: warning: 'enable_page_reporting' declared 'static' but never defined [-Wunused-function]
     577 | static void enable_page_reporting(void);
         |             ^~~~~~~~~~~~~~~~~~~~~
>> drivers/hv/hv_balloon.c:579:13: warning: 'disable_page_reporting' declared 'static' but never defined [-Wunused-function]
     579 | static void disable_page_reporting(void);
         |             ^~~~~~~~~~~~~~~~~~~~~~
>> drivers/hv/hv_balloon.c:967:13: warning: 'hot_add_req' defined but not used [-Wunused-function]
     967 | static void hot_add_req(struct work_struct *dummy)
         |             ^~~~~~~~~~~
>> drivers/hv/hv_balloon.c:496:22: warning: 'ha_pages_in_chunk' defined but not used [-Wunused-variable]
     496 | static unsigned long ha_pages_in_chunk;
         |                      ^~~~~~~~~~~~~~~~~
>> drivers/hv/hv_balloon.c:494:13: warning: 'balloon_up_send_buffer' defined but not used [-Wunused-variable]
     494 | static __u8 balloon_up_send_buffer[HV_HYP_PAGE_SIZE];
         |             ^~~~~~~~~~~~~~~~~~~~~~
>> drivers/hv/hv_balloon.c:493:13: warning: 'recv_buffer' defined but not used [-Wunused-variable]
     493 | static __u8 recv_buffer[HV_HYP_PAGE_SIZE];
         |             ^~~~~~~~~~~
>> drivers/hv/hv_balloon.c:478:12: warning: 'dm_ring_size' defined but not used [-Wunused-variable]
     478 | static int dm_ring_size = VMBUS_RING_SIZE(16 * 1024);
         |            ^~~~~~~~~~~~
>> drivers/hv/hv_balloon.c:476:17: warning: 'trans_id' defined but not used [-Wunused-variable]
     476 | static atomic_t trans_id = ATOMIC_INIT(0);
         |                 ^~~~~~~~
>> drivers/hv/hv_balloon.c:469:12: warning: 'hv_hypercall_multi_failure' defined but not used [-Wunused-variable]
     469 | static int hv_hypercall_multi_failure;
         |            ^~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/hv/hv_balloon.c:467:22: warning: 'last_post_time' defined but not used [-Wunused-variable]
     467 | static unsigned long last_post_time;
         |                      ^~~~~~~~~~~~~~
>> drivers/hv/hv_balloon.c:455:13: warning: 'do_hot_add' defined but not used [-Wunused-variable]
     455 | static bool do_hot_add;
         |             ^~~~~~~~~~
>> drivers/hv/hv_balloon.c:453:13: warning: 'allow_hibernation' defined but not used [-Wunused-variable]
     453 | static bool allow_hibernation;
         |             ^~~~~~~~~~~~~~~~~


vim +980 drivers/hv/hv_balloon.c

1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   966  
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15  @967  static void hot_add_req(struct work_struct *dummy)
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   968  {
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   969  	struct dm_hot_add_response resp;
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   970  #ifdef CONFIG_MEMORY_HOTPLUG
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   971  	unsigned long pg_start, pfn_cnt;
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   972  	unsigned long rg_start, rg_sz;
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   973  #endif
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15  @974  	struct hv_dynmem_device *dm = &dm_device;
9aa8b50b2b3d3a7 K. Y. Srinivasan 2012-11-14   975  
9aa8b50b2b3d3a7 K. Y. Srinivasan 2012-11-14   976  	memset(&resp, 0, sizeof(struct dm_hot_add_response));
9aa8b50b2b3d3a7 K. Y. Srinivasan 2012-11-14   977  	resp.hdr.type = DM_MEM_HOT_ADD_RESPONSE;
9aa8b50b2b3d3a7 K. Y. Srinivasan 2012-11-14  @978  	resp.hdr.size = sizeof(struct dm_hot_add_response);
9aa8b50b2b3d3a7 K. Y. Srinivasan 2012-11-14   979  
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15  @980  #ifdef CONFIG_MEMORY_HOTPLUG
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   981  	pg_start = dm->ha_wrk.ha_page_range.finfo.start_page;
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   982  	pfn_cnt = dm->ha_wrk.ha_page_range.finfo.page_cnt;
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   983  
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   984  	rg_start = dm->ha_wrk.ha_region_range.finfo.start_page;
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   985  	rg_sz = dm->ha_wrk.ha_region_range.finfo.page_cnt;
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   986  
1aa0ebde593dfb1 Aditya Nagesh    2024-05-29   987  	if (rg_start == 0 && !dm->host_specified_ha_region) {
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   988  		/*
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   989  		 * Based on the hot-add page range being specified,
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   990  	}
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   991  
7f4f2302a11173d K. Y. Srinivasan 2013-03-18   992  	if (do_hot_add)
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   993  		resp.page_count = process_hot_add(pg_start, pfn_cnt,
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   994  						  rg_start, rg_sz);
549fd280b145e21 Vitaly Kuznetsov 2015-02-28   995  
549fd280b145e21 Vitaly Kuznetsov 2015-02-28   996  	dm->num_pages_added += resp.page_count;
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15   997  #endif
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  @998  	/*
7f4f2302a11173d K. Y. Srinivasan 2013-03-18   999  	 * The result field of the response structure has the
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1000  	 * following semantics:
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1001  	 *
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1002  	 * 1. If all or some pages hot-added: Guest should return success.
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1003  	 *
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1004  	 * 2. If no pages could be hot-added:
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1005  	 *
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1006  	 * If the guest returns success, then the host
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1007  	 * will not attempt any further hot-add operations. This
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1008  	 * signifies a permanent failure.
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1009  	 *
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1010  	 * If the guest returns failure, then this failure will be
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1011  	 * treated as a transient failure and the host may retry the
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1012  	 * hot-add operation after some delay.
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1013  	 */
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15  1014  	if (resp.page_count > 0)
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15  1015  		resp.result = 1;
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1016  	else if (!do_hot_add)
7f4f2302a11173d K. Y. Srinivasan 2013-03-18  1017  		resp.result = 1;
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15  1018  	else
9aa8b50b2b3d3a7 K. Y. Srinivasan 2012-11-14  1019  		resp.result = 0;
9aa8b50b2b3d3a7 K. Y. Srinivasan 2012-11-14  1020  
25bd2b2f1f05347 Dexuan Cui       2019-11-19  1021  	if (!do_hot_add || resp.page_count == 0) {
25bd2b2f1f05347 Dexuan Cui       2019-11-19  1022  		if (!allow_hibernation)
223e1e4d2c16fed Vitaly Kuznetsov 2018-03-04  1023  			pr_err("Memory hot add failed\n");
25bd2b2f1f05347 Dexuan Cui       2019-11-19  1024  		else
25bd2b2f1f05347 Dexuan Cui       2019-11-19  1025  			pr_info("Ignore hot-add request!\n");
25bd2b2f1f05347 Dexuan Cui       2019-11-19  1026  	}
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15  1027  
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15  1028  	dm->state = DM_INITIALIZED;
20138d6cb838aa0 K. Y. Srinivasan 2013-07-17  1029  	resp.hdr.trans_id = atomic_inc_return(&trans_id);
1cac8cd4d146b60 K. Y. Srinivasan 2013-03-15  1030  	vmbus_sendpacket(dm->dev->channel, &resp,
9aa8b50b2b3d3a7 K. Y. Srinivasan 2012-11-14  1031  			sizeof(struct dm_hot_add_response),
9aa8b50b2b3d3a7 K. Y. Srinivasan 2012-11-14  1032  			(unsigned long)NULL,
9aa8b50b2b3d3a7 K. Y. Srinivasan 2012-11-14  1033  			VM_PKT_DATA_INBAND, 0);
9aa8b50b2b3d3a7 K. Y. Srinivasan 2012-11-14  1034  }
9aa8b50b2b3d3a7 K. Y. Srinivasan 2012-11-14  1035  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH v5] Drivers: hv: Cosmetic changes for hv.c and balloon.c
From: Wei Liu @ 2024-05-30  2:45 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Aditya Nagesh, adityanagesh@microsoft.com, kys@microsoft.com,
	haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <SN6PR02MB41572A8E15A990EB162C60FCD4F22@SN6PR02MB4157.namprd02.prod.outlook.com>

On Wed, May 29, 2024 at 04:29:17PM +0000, Michael Kelley wrote:
> From: Aditya Nagesh <adityanagesh@linux.microsoft.com> Sent: Wednesday, May 29, 2024 9:05 AM
> > 
> > Fix issues reported by checkpatch.pl script in hv.c and
> > balloon.c
> >  - Remove unnecessary parentheses
> >  - Remove extra newlines
> >  - Remove extra spaces
> >  - Add spaces between comparison operators
> >  - Remove comparison with NULL in if statements
> > 
> > No functional changes intended
> > 
> > Signed-off-by: Aditya Nagesh <adityanagesh@linux.microsoft.com>
> > Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> > ---
> > [V5]
> > Rebase to hyperv-fixes
> > 
> > [V4]
> > Fix Alignment issue and revert a line since 100 characters are allowed in a line
> > 
> > [V3]
> > Fix alignment issues in multiline function parameters.
> > 
> > [V2]
> > Change Subject from "Drivers: hv: Fix Issues reported by checkpatch.pl script"
> >  to "Drivers: hv: Cosmetic changes for hv.c and balloon.c"
> >  drivers/hv/hv.c         |  37 +++++++-------
> >  drivers/hv/hv_balloon.c | 105 ++++++++++++++--------------------------
> >  2 files changed, 53 insertions(+), 89 deletions(-)
> > 
> 
> [snip]
> 
> > @@ -999,21 +984,14 @@ static void hot_add_req(struct work_struct *dummy)
> >  	rg_start = dm->ha_wrk.ha_region_range.finfo.start_page;
> >  	rg_sz = dm->ha_wrk.ha_region_range.finfo.page_cnt;
> > 
> > -	if ((rg_start == 0) && (!dm->host_specified_ha_region)) {
> > +	if (rg_start == 0 && !dm->host_specified_ha_region) {
> >  		/*
> > -		 * The host has not specified the hot-add region.
> >  		 * Based on the hot-add page range being specified,
> > -		 * compute a hot-add region that can cover the pages
> > -		 * that need to be hot-added while ensuring the alignment
> > -		 * and size requirements of Linux as it relates to hot-add.
> > -		 */
> > -		rg_start = ALIGN_DOWN(pg_start, ha_pages_in_chunk);
> > -		rg_sz = ALIGN(pfn_cnt, ha_pages_in_chunk);
> 
> Hmmm.  The above is not a cosmetic change.  Looks like this
> delta was erroneously introduced in the v5 version.  It wasn't
> there in v4.

This also breaks the build, since now the comment is not terminated.

Please fix this and resubmit.

In general, you should always build test your code before submission.

Thanks,
Wei.

> 
> Everything else LGTM.
> 
> Michael

^ permalink raw reply

* Re: [PATCH v5] Drivers: hv: Cosmetic changes for hv.c and balloon.c
From: kernel test robot @ 2024-05-30  1:06 UTC (permalink / raw)
  To: Aditya Nagesh, adityanagesh, kys, haiyangz, wei.liu, decui,
	linux-hyperv, linux-kernel
  Cc: llvm, oe-kbuild-all, Aditya Nagesh
In-Reply-To: <1716998695-32135-1-git-send-email-adityanagesh@linux.microsoft.com>

Hi Aditya,

kernel test robot noticed the following build errors:

[auto build test ERROR on next-20240529]
[cannot apply to linus/master v6.10-rc1 v6.9 v6.9-rc7 v6.10-rc1]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Aditya-Nagesh/Drivers-hv-Cosmetic-changes-for-hv-c-and-balloon-c/20240530-000928
base:   next-20240529
patch link:    https://lore.kernel.org/r/1716998695-32135-1-git-send-email-adityanagesh%40linux.microsoft.com
patch subject: [PATCH v5] Drivers: hv: Cosmetic changes for hv.c and balloon.c
config: i386-buildonly-randconfig-004-20240530 (https://download.01.org/0day-ci/archive/20240530/202405300835.7RLpoY4A-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240530/202405300835.7RLpoY4A-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405300835.7RLpoY4A-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/hv/hv_balloon.c:980:2: error: unterminated conditional directive
     980 | #ifdef CONFIG_MEMORY_HOTPLUG
         |  ^
>> drivers/hv/hv_balloon.c:2129:23: error: expected '}'
    2129 | MODULE_LICENSE("GPL");
         |                       ^
   drivers/hv/hv_balloon.c:968:1: note: to match this '{'
     968 | {
         | ^
   2 errors generated.


vim +980 drivers/hv/hv_balloon.c

1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   966  
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   967  static void hot_add_req(struct work_struct *dummy)
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   968  {
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   969  	struct dm_hot_add_response resp;
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   970  #ifdef CONFIG_MEMORY_HOTPLUG
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   971  	unsigned long pg_start, pfn_cnt;
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   972  	unsigned long rg_start, rg_sz;
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   973  #endif
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   974  	struct hv_dynmem_device *dm = &dm_device;
9aa8b50b2b3d3a K. Y. Srinivasan 2012-11-14   975  
9aa8b50b2b3d3a K. Y. Srinivasan 2012-11-14   976  	memset(&resp, 0, sizeof(struct dm_hot_add_response));
9aa8b50b2b3d3a K. Y. Srinivasan 2012-11-14   977  	resp.hdr.type = DM_MEM_HOT_ADD_RESPONSE;
9aa8b50b2b3d3a K. Y. Srinivasan 2012-11-14   978  	resp.hdr.size = sizeof(struct dm_hot_add_response);
9aa8b50b2b3d3a K. Y. Srinivasan 2012-11-14   979  
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15  @980  #ifdef CONFIG_MEMORY_HOTPLUG
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   981  	pg_start = dm->ha_wrk.ha_page_range.finfo.start_page;
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   982  	pfn_cnt = dm->ha_wrk.ha_page_range.finfo.page_cnt;
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   983  
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   984  	rg_start = dm->ha_wrk.ha_region_range.finfo.start_page;
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   985  	rg_sz = dm->ha_wrk.ha_region_range.finfo.page_cnt;
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   986  
1aa0ebde593dfb Aditya Nagesh    2024-05-29   987  	if (rg_start == 0 && !dm->host_specified_ha_region) {
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   988  		/*
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   989  		 * Based on the hot-add page range being specified,
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   990  	}
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   991  
7f4f2302a11173 K. Y. Srinivasan 2013-03-18   992  	if (do_hot_add)
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   993  		resp.page_count = process_hot_add(pg_start, pfn_cnt,
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   994  						  rg_start, rg_sz);
549fd280b145e2 Vitaly Kuznetsov 2015-02-28   995  
549fd280b145e2 Vitaly Kuznetsov 2015-02-28   996  	dm->num_pages_added += resp.page_count;
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15   997  #endif
7f4f2302a11173 K. Y. Srinivasan 2013-03-18   998  	/*
7f4f2302a11173 K. Y. Srinivasan 2013-03-18   999  	 * The result field of the response structure has the
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1000  	 * following semantics:
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1001  	 *
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1002  	 * 1. If all or some pages hot-added: Guest should return success.
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1003  	 *
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1004  	 * 2. If no pages could be hot-added:
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1005  	 *
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1006  	 * If the guest returns success, then the host
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1007  	 * will not attempt any further hot-add operations. This
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1008  	 * signifies a permanent failure.
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1009  	 *
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1010  	 * If the guest returns failure, then this failure will be
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1011  	 * treated as a transient failure and the host may retry the
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1012  	 * hot-add operation after some delay.
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1013  	 */
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15  1014  	if (resp.page_count > 0)
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15  1015  		resp.result = 1;
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1016  	else if (!do_hot_add)
7f4f2302a11173 K. Y. Srinivasan 2013-03-18  1017  		resp.result = 1;
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15  1018  	else
9aa8b50b2b3d3a K. Y. Srinivasan 2012-11-14  1019  		resp.result = 0;
9aa8b50b2b3d3a K. Y. Srinivasan 2012-11-14  1020  
25bd2b2f1f0534 Dexuan Cui       2019-11-19  1021  	if (!do_hot_add || resp.page_count == 0) {
25bd2b2f1f0534 Dexuan Cui       2019-11-19  1022  		if (!allow_hibernation)
223e1e4d2c16fe Vitaly Kuznetsov 2018-03-04  1023  			pr_err("Memory hot add failed\n");
25bd2b2f1f0534 Dexuan Cui       2019-11-19  1024  		else
25bd2b2f1f0534 Dexuan Cui       2019-11-19  1025  			pr_info("Ignore hot-add request!\n");
25bd2b2f1f0534 Dexuan Cui       2019-11-19  1026  	}
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15  1027  
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15  1028  	dm->state = DM_INITIALIZED;
20138d6cb838aa K. Y. Srinivasan 2013-07-17  1029  	resp.hdr.trans_id = atomic_inc_return(&trans_id);
1cac8cd4d146b6 K. Y. Srinivasan 2013-03-15  1030  	vmbus_sendpacket(dm->dev->channel, &resp,
9aa8b50b2b3d3a K. Y. Srinivasan 2012-11-14  1031  			sizeof(struct dm_hot_add_response),
9aa8b50b2b3d3a K. Y. Srinivasan 2012-11-14  1032  			(unsigned long)NULL,
9aa8b50b2b3d3a K. Y. Srinivasan 2012-11-14  1033  			VM_PKT_DATA_INBAND, 0);
9aa8b50b2b3d3a K. Y. Srinivasan 2012-11-14  1034  }
9aa8b50b2b3d3a K. Y. Srinivasan 2012-11-14  1035  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* RE: [PATCH v5] Drivers: hv: Cosmetic changes for hv.c and balloon.c
From: Michael Kelley @ 2024-05-29 16:29 UTC (permalink / raw)
  To: Aditya Nagesh, adityanagesh@microsoft.com, kys@microsoft.com,
	haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <1716998695-32135-1-git-send-email-adityanagesh@linux.microsoft.com>

From: Aditya Nagesh <adityanagesh@linux.microsoft.com> Sent: Wednesday, May 29, 2024 9:05 AM
> 
> Fix issues reported by checkpatch.pl script in hv.c and
> balloon.c
>  - Remove unnecessary parentheses
>  - Remove extra newlines
>  - Remove extra spaces
>  - Add spaces between comparison operators
>  - Remove comparison with NULL in if statements
> 
> No functional changes intended
> 
> Signed-off-by: Aditya Nagesh <adityanagesh@linux.microsoft.com>
> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> ---
> [V5]
> Rebase to hyperv-fixes
> 
> [V4]
> Fix Alignment issue and revert a line since 100 characters are allowed in a line
> 
> [V3]
> Fix alignment issues in multiline function parameters.
> 
> [V2]
> Change Subject from "Drivers: hv: Fix Issues reported by checkpatch.pl script"
>  to "Drivers: hv: Cosmetic changes for hv.c and balloon.c"
>  drivers/hv/hv.c         |  37 +++++++-------
>  drivers/hv/hv_balloon.c | 105 ++++++++++++++--------------------------
>  2 files changed, 53 insertions(+), 89 deletions(-)
> 

[snip]

> @@ -999,21 +984,14 @@ static void hot_add_req(struct work_struct *dummy)
>  	rg_start = dm->ha_wrk.ha_region_range.finfo.start_page;
>  	rg_sz = dm->ha_wrk.ha_region_range.finfo.page_cnt;
> 
> -	if ((rg_start == 0) && (!dm->host_specified_ha_region)) {
> +	if (rg_start == 0 && !dm->host_specified_ha_region) {
>  		/*
> -		 * The host has not specified the hot-add region.
>  		 * Based on the hot-add page range being specified,
> -		 * compute a hot-add region that can cover the pages
> -		 * that need to be hot-added while ensuring the alignment
> -		 * and size requirements of Linux as it relates to hot-add.
> -		 */
> -		rg_start = ALIGN_DOWN(pg_start, ha_pages_in_chunk);
> -		rg_sz = ALIGN(pfn_cnt, ha_pages_in_chunk);

Hmmm.  The above is not a cosmetic change.  Looks like this
delta was erroneously introduced in the v5 version.  It wasn't
there in v4.

Everything else LGTM.

Michael

^ permalink raw reply

* [PATCH v5] Drivers: hv: Cosmetic changes for hv.c and balloon.c
From: Aditya Nagesh @ 2024-05-29 16:04 UTC (permalink / raw)
  To: adityanagesh, kys, haiyangz, wei.liu, decui, linux-hyperv,
	linux-kernel
  Cc: Aditya Nagesh

Fix issues reported by checkpatch.pl script in hv.c and
balloon.c
 - Remove unnecessary parentheses
 - Remove extra newlines
 - Remove extra spaces
 - Add spaces between comparison operators
 - Remove comparison with NULL in if statements

No functional changes intended

Signed-off-by: Aditya Nagesh <adityanagesh@linux.microsoft.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
[V5]
Rebase to hyperv-fixes

[V4]
Fix Alignment issue and revert a line since 100 characters are allowed in a line

[V3]
Fix alignment issues in multiline function parameters.

[V2]
Change Subject from "Drivers: hv: Fix Issues reported by checkpatch.pl script"
 to "Drivers: hv: Cosmetic changes for hv.c and balloon.c"
 drivers/hv/hv.c         |  37 +++++++-------
 drivers/hv/hv_balloon.c | 105 ++++++++++++++--------------------------
 2 files changed, 53 insertions(+), 89 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index a8ad728354cb..e0d676c74f14 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -45,8 +45,8 @@ int hv_init(void)
  * This involves a hypercall.
  */
 int hv_post_message(union hv_connection_id connection_id,
-		  enum hv_message_type message_type,
-		  void *payload, size_t payload_size)
+			enum hv_message_type message_type,
+			void *payload, size_t payload_size)
 {
 	struct hv_input_post_message *aligned_msg;
 	unsigned long flags;
@@ -86,7 +86,7 @@ int hv_post_message(union hv_connection_id connection_id,
 			status = HV_STATUS_INVALID_PARAMETER;
 	} else {
 		status = hv_do_hypercall(HVCALL_POST_MESSAGE,
-				aligned_msg, NULL);
+					 aligned_msg, NULL);
 	}
 
 	local_irq_restore(flags);
@@ -111,7 +111,7 @@ int hv_synic_alloc(void)
 
 	hv_context.hv_numa_map = kcalloc(nr_node_ids, sizeof(struct cpumask),
 					 GFP_KERNEL);
-	if (hv_context.hv_numa_map == NULL) {
+	if (!hv_context.hv_numa_map) {
 		pr_err("Unable to allocate NUMA map\n");
 		goto err;
 	}
@@ -120,11 +120,11 @@ int hv_synic_alloc(void)
 		hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
 
 		tasklet_init(&hv_cpu->msg_dpc,
-			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
+			     vmbus_on_msg_dpc, (unsigned long)hv_cpu);
 
 		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
 			hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
-			if (hv_cpu->post_msg_page == NULL) {
+			if (!hv_cpu->post_msg_page) {
 				pr_err("Unable to allocate post msg page\n");
 				goto err;
 			}
@@ -147,14 +147,14 @@ int hv_synic_alloc(void)
 		if (!ms_hyperv.paravisor_present && !hv_root_partition) {
 			hv_cpu->synic_message_page =
 				(void *)get_zeroed_page(GFP_ATOMIC);
-			if (hv_cpu->synic_message_page == NULL) {
+			if (!hv_cpu->synic_message_page) {
 				pr_err("Unable to allocate SYNIC message page\n");
 				goto err;
 			}
 
 			hv_cpu->synic_event_page =
 				(void *)get_zeroed_page(GFP_ATOMIC);
-			if (hv_cpu->synic_event_page == NULL) {
+			if (!hv_cpu->synic_event_page) {
 				pr_err("Unable to allocate SYNIC event page\n");
 
 				free_page((unsigned long)hv_cpu->synic_message_page);
@@ -203,14 +203,13 @@ int hv_synic_alloc(void)
 	return ret;
 }
 
-
 void hv_synic_free(void)
 {
 	int cpu, ret;
 
 	for_each_present_cpu(cpu) {
-		struct hv_per_cpu_context *hv_cpu
-			= per_cpu_ptr(hv_context.cpu_context, cpu);
+		struct hv_per_cpu_context *hv_cpu =
+			per_cpu_ptr(hv_context.cpu_context, cpu);
 
 		/* It's better to leak the page if the encryption fails. */
 		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
@@ -262,8 +261,8 @@ void hv_synic_free(void)
  */
 void hv_synic_enable_regs(unsigned int cpu)
 {
-	struct hv_per_cpu_context *hv_cpu
-		= per_cpu_ptr(hv_context.cpu_context, cpu);
+	struct hv_per_cpu_context *hv_cpu =
+		per_cpu_ptr(hv_context.cpu_context, cpu);
 	union hv_synic_simp simp;
 	union hv_synic_siefp siefp;
 	union hv_synic_sint shared_sint;
@@ -277,8 +276,8 @@ void hv_synic_enable_regs(unsigned int cpu)
 		/* Mask out vTOM bit. ioremap_cache() maps decrypted */
 		u64 base = (simp.base_simp_gpa << HV_HYP_PAGE_SHIFT) &
 				~ms_hyperv.shared_gpa_boundary;
-		hv_cpu->synic_message_page
-			= (void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
+		hv_cpu->synic_message_page =
+			(void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
 		if (!hv_cpu->synic_message_page)
 			pr_err("Fail to map synic message page.\n");
 	} else {
@@ -296,8 +295,8 @@ void hv_synic_enable_regs(unsigned int cpu)
 		/* Mask out vTOM bit. ioremap_cache() maps decrypted */
 		u64 base = (siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT) &
 				~ms_hyperv.shared_gpa_boundary;
-		hv_cpu->synic_event_page
-			= (void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
+		hv_cpu->synic_event_page =
+			(void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
 		if (!hv_cpu->synic_event_page)
 			pr_err("Fail to map synic event page.\n");
 	} else {
@@ -348,8 +347,8 @@ int hv_synic_init(unsigned int cpu)
  */
 void hv_synic_disable_regs(unsigned int cpu)
 {
-	struct hv_per_cpu_context *hv_cpu
-		= per_cpu_ptr(hv_context.cpu_context, cpu);
+	struct hv_per_cpu_context *hv_cpu =
+		per_cpu_ptr(hv_context.cpu_context, cpu);
 	union hv_synic_sint shared_sint;
 	union hv_synic_simp simp;
 	union hv_synic_siefp siefp;
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 4370ad31b5b3..8f94240e5910 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -42,8 +42,6 @@
  * Begin protocol definitions.
  */
 
-
-
 /*
  * Protocol versions. The low word is the minor version, the high word the major
  * version.
@@ -72,8 +70,6 @@ enum {
 	DYNMEM_PROTOCOL_VERSION_CURRENT = DYNMEM_PROTOCOL_VERSION_WIN10
 };
 
-
-
 /*
  * Message Types
  */
@@ -102,7 +98,6 @@ enum dm_message_type {
 	DM_VERSION_1_MAX		= 12
 };
 
-
 /*
  * Structures defining the dynamic memory management
  * protocol.
@@ -116,7 +111,6 @@ union dm_version {
 	__u32 version;
 } __packed;
 
-
 union dm_caps {
 	struct {
 		__u64 balloon:1;
@@ -149,8 +143,6 @@ union dm_mem_page_range {
 	__u64  page_range;
 } __packed;
 
-
-
 /*
  * The header for all dynamic memory messages:
  *
@@ -175,7 +167,6 @@ struct dm_message {
 	__u8 data[]; /* enclosed message */
 } __packed;
 
-
 /*
  * Specific message types supporting the dynamic memory protocol.
  */
@@ -272,7 +263,6 @@ struct dm_status {
 	__u32 io_diff;
 } __packed;
 
-
 /*
  * Message to ask the guest to allocate memory - balloon up message.
  * This message is sent from the host to the guest. The guest may not be
@@ -287,14 +277,13 @@ struct dm_balloon {
 	__u32 reservedz;
 } __packed;
 
-
 /*
  * Balloon response message; this message is sent from the guest
  * to the host in response to the balloon message.
  *
  * reservedz: Reserved; must be set to zero.
  * more_pages: If FALSE, this is the last message of the transaction.
- * if TRUE there will atleast one more message from the guest.
+ * if TRUE there will be at least one more message from the guest.
  *
  * range_count: The number of ranges in the range array.
  *
@@ -315,7 +304,7 @@ struct dm_balloon_response {
  * to the guest to give guest more memory.
  *
  * more_pages: If FALSE, this is the last message of the transaction.
- * if TRUE there will atleast one more message from the guest.
+ * if TRUE there will be at least one more message from the guest.
  *
  * reservedz: Reserved; must be set to zero.
  *
@@ -343,7 +332,6 @@ struct dm_unballoon_response {
 	struct dm_header hdr;
 } __packed;
 
-
 /*
  * Hot add request message. Message sent from the host to the guest.
  *
@@ -391,7 +379,6 @@ enum dm_info_type {
 	MAX_INFO_TYPE
 };
 
-
 /*
  * Header for the information message.
  */
@@ -481,10 +468,10 @@ static unsigned long last_post_time;
 
 static int hv_hypercall_multi_failure;
 
-module_param(hot_add, bool, (S_IRUGO | S_IWUSR));
+module_param(hot_add, bool, 0644);
 MODULE_PARM_DESC(hot_add, "If set attempt memory hot_add");
 
-module_param(pressure_report_delay, uint, (S_IRUGO | S_IWUSR));
+module_param(pressure_report_delay, uint, 0644);
 MODULE_PARM_DESC(pressure_report_delay, "Delay in secs in reporting pressure");
 static atomic_t trans_id = ATOMIC_INIT(0);
 
@@ -503,7 +490,6 @@ enum hv_dm_state {
 	DM_INIT_ERROR
 };
 
-
 static __u8 recv_buffer[HV_HYP_PAGE_SIZE];
 static __u8 balloon_up_send_buffer[HV_HYP_PAGE_SIZE];
 
@@ -599,12 +585,12 @@ static inline bool has_pfn_is_backed(struct hv_hotadd_state *has,
 	struct hv_hotadd_gap *gap;
 
 	/* The page is not backed. */
-	if ((pfn < has->covered_start_pfn) || (pfn >= has->covered_end_pfn))
+	if (pfn < has->covered_start_pfn || pfn >= has->covered_end_pfn)
 		return false;
 
 	/* Check for gaps. */
 	list_for_each_entry(gap, &has->gap_list, list) {
-		if ((pfn >= gap->start_pfn) && (pfn < gap->end_pfn))
+		if (pfn >= gap->start_pfn && pfn < gap->end_pfn)
 			return false;
 	}
 
@@ -784,8 +770,8 @@ static void hv_online_page(struct page *pg, unsigned int order)
 	guard(spinlock_irqsave)(&dm_device.ha_lock);
 	list_for_each_entry(has, &dm_device.ha_region_list, list) {
 		/* The page belongs to a different HAS. */
-		if ((pfn < has->start_pfn) ||
-				(pfn + (1UL << order) > has->end_pfn))
+		if (pfn < has->start_pfn ||
+		    (pfn + (1UL << order) > has->end_pfn))
 			continue;
 
 		hv_bring_pgs_online(has, pfn, 1UL << order);
@@ -846,7 +832,7 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 }
 
 static unsigned long handle_pg_range(unsigned long pg_start,
-					unsigned long pg_count)
+				     unsigned long pg_count)
 {
 	unsigned long start_pfn = pg_start;
 	unsigned long pfn_cnt = pg_count;
@@ -857,7 +843,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 	unsigned long res = 0, flags;
 
 	pr_debug("Hot adding %lu pages starting at pfn 0x%lx.\n", pg_count,
-		pg_start);
+		 pg_start);
 
 	spin_lock_irqsave(&dm_device.ha_lock, flags);
 	list_for_each_entry(has, &dm_device.ha_region_list, list) {
@@ -893,10 +879,9 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 			if (start_pfn > has->start_pfn &&
 			    online_section_nr(pfn_to_section_nr(start_pfn)))
 				hv_bring_pgs_online(has, start_pfn, pgs_ol);
-
 		}
 
-		if ((has->ha_end_pfn < has->end_pfn) && (pfn_cnt > 0)) {
+		if (has->ha_end_pfn < has->end_pfn && pfn_cnt > 0) {
 			/*
 			 * We have some residual hot add range
 			 * that needs to be hot added; hot add
@@ -999,21 +984,14 @@ static void hot_add_req(struct work_struct *dummy)
 	rg_start = dm->ha_wrk.ha_region_range.finfo.start_page;
 	rg_sz = dm->ha_wrk.ha_region_range.finfo.page_cnt;
 
-	if ((rg_start == 0) && (!dm->host_specified_ha_region)) {
+	if (rg_start == 0 && !dm->host_specified_ha_region) {
 		/*
-		 * The host has not specified the hot-add region.
 		 * Based on the hot-add page range being specified,
-		 * compute a hot-add region that can cover the pages
-		 * that need to be hot-added while ensuring the alignment
-		 * and size requirements of Linux as it relates to hot-add.
-		 */
-		rg_start = ALIGN_DOWN(pg_start, ha_pages_in_chunk);
-		rg_sz = ALIGN(pfn_cnt, ha_pages_in_chunk);
 	}
 
 	if (do_hot_add)
 		resp.page_count = process_hot_add(pg_start, pfn_cnt,
-						rg_start, rg_sz);
+						  rg_start, rg_sz);
 
 	dm->num_pages_added += resp.page_count;
 #endif
@@ -1191,11 +1169,10 @@ static void post_status(struct hv_dynmem_device *dm)
 				sizeof(struct dm_status),
 				(unsigned long)NULL,
 				VM_PKT_DATA_INBAND, 0);
-
 }
 
 static void free_balloon_pages(struct hv_dynmem_device *dm,
-			 union dm_mem_page_range *range_array)
+			       union dm_mem_page_range *range_array)
 {
 	int num_pages = range_array->finfo.page_cnt;
 	__u64 start_frame = range_array->finfo.start_page;
@@ -1211,8 +1188,6 @@ static void free_balloon_pages(struct hv_dynmem_device *dm,
 	}
 }
 
-
-
 static unsigned int alloc_balloon_pages(struct hv_dynmem_device *dm,
 					unsigned int num_pages,
 					struct dm_balloon_response *bl_resp,
@@ -1258,7 +1233,6 @@ static unsigned int alloc_balloon_pages(struct hv_dynmem_device *dm,
 			page_to_pfn(pg);
 		bl_resp->range_array[i].finfo.page_cnt = alloc_unit;
 		bl_resp->hdr.size += sizeof(union dm_mem_page_range);
-
 	}
 
 	return i * alloc_unit;
@@ -1312,7 +1286,7 @@ static void balloon_up(struct work_struct *dummy)
 
 		if (num_ballooned == 0 || num_ballooned == num_pages) {
 			pr_debug("Ballooned %u out of %u requested pages.\n",
-				num_pages, dm_device.balloon_wrk.num_pages);
+				 num_pages, dm_device.balloon_wrk.num_pages);
 
 			bl_resp->more_pages = 0;
 			done = true;
@@ -1346,16 +1320,15 @@ static void balloon_up(struct work_struct *dummy)
 
 			for (i = 0; i < bl_resp->range_count; i++)
 				free_balloon_pages(&dm_device,
-						 &bl_resp->range_array[i]);
+						   &bl_resp->range_array[i]);
 
 			done = true;
 		}
 	}
-
 }
 
 static void balloon_down(struct hv_dynmem_device *dm,
-			struct dm_unballoon_request *req)
+			 struct dm_unballoon_request *req)
 {
 	union dm_mem_page_range *range_array = req->range_array;
 	int range_count = req->range_count;
@@ -1369,7 +1342,7 @@ static void balloon_down(struct hv_dynmem_device *dm,
 	}
 
 	pr_debug("Freed %u ballooned pages.\n",
-		prev_pages_ballooned - dm->num_pages_ballooned);
+		 prev_pages_ballooned - dm->num_pages_ballooned);
 
 	if (req->more_pages == 1)
 		return;
@@ -1394,8 +1367,7 @@ static int dm_thread_func(void *dm_dev)
 	struct hv_dynmem_device *dm = dm_dev;
 
 	while (!kthread_should_stop()) {
-		wait_for_completion_interruptible_timeout(
-						&dm_device.config_event, 1*HZ);
+		wait_for_completion_interruptible_timeout(&dm_device.config_event, 1 * HZ);
 		/*
 		 * The host expects us to post information on the memory
 		 * pressure every second.
@@ -1419,9 +1391,8 @@ static int dm_thread_func(void *dm_dev)
 	return 0;
 }
 
-
 static void version_resp(struct hv_dynmem_device *dm,
-			struct dm_version_response *vresp)
+			 struct dm_version_response *vresp)
 {
 	struct dm_version_request version_req;
 	int ret;
@@ -1482,7 +1453,7 @@ static void version_resp(struct hv_dynmem_device *dm,
 }
 
 static void cap_resp(struct hv_dynmem_device *dm,
-			struct dm_capabilities_resp_msg *cap_resp)
+		     struct dm_capabilities_resp_msg *cap_resp)
 {
 	if (!cap_resp->is_accepted) {
 		pr_err("Capabilities not accepted by host\n");
@@ -1515,7 +1486,7 @@ static void balloon_onchannelcallback(void *context)
 		switch (dm_hdr->type) {
 		case DM_VERSION_RESPONSE:
 			version_resp(dm,
-				 (struct dm_version_response *)dm_msg);
+				     (struct dm_version_response *)dm_msg);
 			break;
 
 		case DM_CAPABILITIES_RESPONSE:
@@ -1545,7 +1516,7 @@ static void balloon_onchannelcallback(void *context)
 
 			dm->state = DM_BALLOON_DOWN;
 			balloon_down(dm,
-				 (struct dm_unballoon_request *)recv_buffer);
+				     (struct dm_unballoon_request *)recv_buffer);
 			break;
 
 		case DM_MEM_HOT_ADD_REQUEST:
@@ -1583,17 +1554,15 @@ static void balloon_onchannelcallback(void *context)
 
 		default:
 			pr_warn_ratelimited("Unhandled message: type: %d\n", dm_hdr->type);
-
 		}
 	}
-
 }
 
 #define HV_LARGE_REPORTING_ORDER	9
 #define HV_LARGE_REPORTING_LEN (HV_HYP_PAGE_SIZE << \
 		HV_LARGE_REPORTING_ORDER)
 static int hv_free_page_report(struct page_reporting_dev_info *pr_dev_info,
-		    struct scatterlist *sgl, unsigned int nents)
+			       struct scatterlist *sgl, unsigned int nents)
 {
 	unsigned long flags;
 	struct hv_memory_hint *hint;
@@ -1628,7 +1597,7 @@ static int hv_free_page_report(struct page_reporting_dev_info *pr_dev_info,
 		 */
 
 		/* page reporting for pages 2MB or higher */
-		if (order >= HV_LARGE_REPORTING_ORDER ) {
+		if (order >= HV_LARGE_REPORTING_ORDER) {
 			range->page.largepage = 1;
 			range->page_size = HV_GPA_PAGE_RANGE_PAGE_SIZE_2MB;
 			range->base_large_pfn = page_to_hvpfn(
@@ -1642,23 +1611,21 @@ static int hv_free_page_report(struct page_reporting_dev_info *pr_dev_info,
 			range->page.additional_pages =
 				(sg->length / HV_HYP_PAGE_SIZE) - 1;
 		}
-
 	}
 
 	status = hv_do_rep_hypercall(HV_EXT_CALL_MEMORY_HEAT_HINT, nents, 0,
 				     hint, NULL);
 	local_irq_restore(flags);
 	if (!hv_result_success(status)) {
-
 		pr_err("Cold memory discard hypercall failed with status %llx\n",
-				status);
+		       status);
 		if (hv_hypercall_multi_failure > 0)
 			hv_hypercall_multi_failure++;
 
 		if (hv_result(status) == HV_STATUS_INVALID_PARAMETER) {
 			pr_err("Underlying Hyper-V does not support order less than 9. Hypercall failed\n");
 			pr_err("Defaulting to page_reporting_order %d\n",
-					pageblock_order);
+			       pageblock_order);
 			page_reporting_order = pageblock_order;
 			hv_hypercall_multi_failure++;
 			return -EINVAL;
@@ -1692,7 +1659,7 @@ static void enable_page_reporting(void)
 		pr_err("Failed to enable cold memory discard: %d\n", ret);
 	} else {
 		pr_info("Cold memory discard hint enabled with order %d\n",
-				page_reporting_order);
+			page_reporting_order);
 	}
 }
 
@@ -1775,7 +1742,7 @@ static int balloon_connect_vsp(struct hv_device *dev)
 	if (ret)
 		goto out;
 
-	t = wait_for_completion_timeout(&dm_device.host_event, 5*HZ);
+	t = wait_for_completion_timeout(&dm_device.host_event, 5 * HZ);
 	if (t == 0) {
 		ret = -ETIMEDOUT;
 		goto out;
@@ -1833,7 +1800,7 @@ static int balloon_connect_vsp(struct hv_device *dev)
 	if (ret)
 		goto out;
 
-	t = wait_for_completion_timeout(&dm_device.host_event, 5*HZ);
+	t = wait_for_completion_timeout(&dm_device.host_event, 5 * HZ);
 	if (t == 0) {
 		ret = -ETIMEDOUT;
 		goto out;
@@ -1874,8 +1841,8 @@ static int hv_balloon_debug_show(struct seq_file *f, void *offset)
 	char *sname;
 
 	seq_printf(f, "%-22s: %u.%u\n", "host_version",
-				DYNMEM_MAJOR_VERSION(dm->version),
-				DYNMEM_MINOR_VERSION(dm->version));
+			DYNMEM_MAJOR_VERSION(dm->version),
+			DYNMEM_MINOR_VERSION(dm->version));
 
 	seq_printf(f, "%-22s:", "capabilities");
 	if (ballooning_enabled())
@@ -1924,10 +1891,10 @@ static int hv_balloon_debug_show(struct seq_file *f, void *offset)
 	seq_printf(f, "%-22s: %u\n", "pages_ballooned", dm->num_pages_ballooned);
 
 	seq_printf(f, "%-22s: %lu\n", "total_pages_committed",
-				get_pages_committed(dm));
+		   get_pages_committed(dm));
 
 	seq_printf(f, "%-22s: %llu\n", "max_dynamic_page_count",
-				dm->max_dynamic_page_count);
+		   dm->max_dynamic_page_count);
 
 	return 0;
 }
@@ -1937,7 +1904,7 @@ DEFINE_SHOW_ATTRIBUTE(hv_balloon_debug);
 static void  hv_balloon_debugfs_init(struct hv_dynmem_device *b)
 {
 	debugfs_create_file("hv-balloon", 0444, NULL, b,
-			&hv_balloon_debug_fops);
+			    &hv_balloon_debug_fops);
 }
 
 static void  hv_balloon_debugfs_exit(struct hv_dynmem_device *b)
@@ -2095,7 +2062,6 @@ static int balloon_suspend(struct hv_device *hv_dev)
 	tasklet_enable(&hv_dev->channel->callback_event);
 
 	return 0;
-
 }
 
 static int balloon_resume(struct hv_device *dev)
@@ -2154,7 +2120,6 @@ static  struct hv_driver balloon_drv = {
 
 static int __init init_balloon_drv(void)
 {
-
 	return vmbus_driver_register(&balloon_drv);
 }
 
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCHv11 05/19] x86/relocate_kernel: Use named labels for less confusion
From: Borislav Petkov @ 2024-05-29 15:15 UTC (permalink / raw)
  To: Andrew Cooper, Nikolay Borisov
  Cc: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	x86, Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, kexec, linux-hyperv, linux-acpi,
	linux-coco, linux-kernel
In-Reply-To: <55bc0649-c017-49ab-905d-212f140a403f@citrix.com>

On Wed, May 29, 2024 at 01:33:35PM +0100, Andrew Cooper wrote:
> Seems I've gained a reputation...

Yes you have. You have this weird interest in very deep uarch details
that I can't share. Not at that detail. :-P

> jmp 1f dates back to ye olde 8086, which started the whole trend of the
> instruction pointer just being a figment of the ISA's imagination[1].
> 
> Hardware maintains the pointer to the next byte to fetch (the prefetch
> queue was up to 6 bytes), and there was a micro-op to subtract the
> current length of the prefetch queue from the accumulator.
> 
> In those days, the prefetch queue was not coherent with main memory, and
> jumps (being a discontinuity in the instruction stream) simply flushed
> the prefetch queue.
> 
> This was necessary after modifying executable code, because otherwise
> you could end up executing stale bytes from the prefetch queue and then
> non-stale bytes thereafter.  (Otherwise known as the way to distinguish
> the 8086 from the 8088 because the latter only had a 4 byte prefetch queue.)

Thanks - that certainly wakes up a long-asleep neuron in the back of my
mind...

> Anyway.  It's how you used to spell "serialising operation" before that
> term ever entered the architecture.  Linux still supports CPUs prior to
> the Pentium, so still needs to care about prefetch queues in the 486.
> 
> However, this example appears to be in 64bit code and following a write
> to CR4 which will be fully serialising, so it's probably copy&paste from
> 32bit code where it would be necessary in principle.

Yap, fully agreed. We could try to remove it and see what complains.

Nikolay, wanna do a patch which properly explains the situation?

> https://www.righto.com/2023/01/inside-8086-processors-instruction.html#fn:pc
> 
> In fact, anyone who hasn't should read the entire series on the 8086,
> https://www.righto.com/p/index.html

Oh yeah, already bookmarked.

Thanks Andy!

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCHv11 05/19] x86/relocate_kernel: Use named labels for less confusion
From: Andrew Cooper @ 2024-05-29 12:33 UTC (permalink / raw)
  To: Borislav Petkov, Kirill A. Shutemov
  Cc: Nikolay Borisov, Thomas Gleixner, Ingo Molnar, Dave Hansen, x86,
	Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, kexec, linux-hyperv, linux-acpi,
	linux-coco, linux-kernel
In-Reply-To: <20240529112852.GBZlcRdI3oqBtjKxAV@fat_crate.local>

On 29/05/2024 12:28 pm, Borislav Petkov wrote:
> On Wed, May 29, 2024 at 02:17:29PM +0300, Kirill A. Shutemov wrote:
>>> That jmp 1f becomes redundant now as it simply jumps 1 line below.
>>>
>> Nothing changed wrt this jump. It dates back to initial kexec
>> implementation.
>>
>> See 5234f5eb04ab ("[PATCH] kexec: x86_64 kexec implementation").
>>
>> But I don't see functional need in it.
>>
>> Anyway, it is outside of the scope of the patch.
> Yap, Kirill did what Nikolay should've done - git archeology. Please
> don't forget to do that next time.
>
> And back in the day they didn't comment non-obvious things because
> commenting is for losers. :-\
>
> So that unconditional forward jump either flushes branch prediction on
> some old uarch or something else weird, uarch-special.
>
> I doubt we can remove it just like that.
>
> Lemme add Andy - he should know.

Seems I've gained a reputation...

jmp 1f dates back to ye olde 8086, which started the whole trend of the
instruction pointer just being a figment of the ISA's imagination[1].

Hardware maintains the pointer to the next byte to fetch (the prefetch
queue was up to 6 bytes), and there was a micro-op to subtract the
current length of the prefetch queue from the accumulator.

In those days, the prefetch queue was not coherent with main memory, and
jumps (being a discontinuity in the instruction stream) simply flushed
the prefetch queue.

This was necessary after modifying executable code, because otherwise
you could end up executing stale bytes from the prefetch queue and then
non-stale bytes thereafter.  (Otherwise known as the way to distinguish
the 8086 from the 8088 because the latter only had a 4 byte prefetch queue.)

Anyway.  It's how you used to spell "serialising operation" before that
term ever entered the architecture.  Linux still supports CPUs prior to
the Pentium, so still needs to care about prefetch queues in the 486.

However, this example appears to be in 64bit code and following a write
to CR4 which will be fully serialising, so it's probably copy&paste from
32bit code where it would be necessary in principle.

~Andrew

[1]
https://www.righto.com/2023/01/inside-8086-processors-instruction.html#fn:pc

In fact, anyone who hasn't should read the entire series on the 8086,
https://www.righto.com/p/index.html

^ permalink raw reply

* Re: [PATCHv11 06/19] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
From: Nikolay Borisov @ 2024-05-29 11:39 UTC (permalink / raw)
  To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86
  Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, kexec, linux-hyperv, linux-acpi,
	linux-coco, linux-kernel
In-Reply-To: <20240528095522.509667-7-kirill.shutemov@linux.intel.com>



On 28.05.24 г. 12:55 ч., Kirill A. Shutemov wrote:
> TDX guests run with MCA enabled (CR4.MCE=1b) from the very start. If
> that bit is cleared during CR4 register reprogramming during boot or
> kexec flows, a #VE exception will be raised which the guest kernel
> cannot handle it.
> 
> Therefore, make sure the CR4.MCE setting is preserved over kexec too and
> avoid raising any #VEs.
> 
> The change doesn't affect non-TDX-guest environments.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>

^ permalink raw reply

* Re: [PATCHv11 05/19] x86/relocate_kernel: Use named labels for less confusion
From: Borislav Petkov @ 2024-05-29 11:28 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Cooper
  Cc: Nikolay Borisov, Thomas Gleixner, Ingo Molnar, Dave Hansen, x86,
	Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, kexec, linux-hyperv, linux-acpi,
	linux-coco, linux-kernel
In-Reply-To: <t3zx4f6ynru7qp4oel4syza2alcuxz7q7hxqgf2lxusgobnsnh@vtnecqrsxci5>

On Wed, May 29, 2024 at 02:17:29PM +0300, Kirill A. Shutemov wrote:
> > That jmp 1f becomes redundant now as it simply jumps 1 line below.
> > 
> 
> Nothing changed wrt this jump. It dates back to initial kexec
> implementation.
> 
> See 5234f5eb04ab ("[PATCH] kexec: x86_64 kexec implementation").
> 
> But I don't see functional need in it.
> 
> Anyway, it is outside of the scope of the patch.

Yap, Kirill did what Nikolay should've done - git archeology. Please
don't forget to do that next time.

And back in the day they didn't comment non-obvious things because
commenting is for losers. :-\

So that unconditional forward jump either flushes branch prediction on
some old uarch or something else weird, uarch-special.

I doubt we can remove it just like that.

Lemme add Andy - he should know.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCHv11 05/19] x86/relocate_kernel: Use named labels for less confusion
From: Kirill A. Shutemov @ 2024-05-29 11:17 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, kexec, linux-hyperv, linux-acpi,
	linux-coco, linux-kernel
In-Reply-To: <1e1d1aea-7346-4022-9f5f-402d171adfda@suse.com>

On Wed, May 29, 2024 at 01:47:50PM +0300, Nikolay Borisov wrote:
> 
> 
> On 28.05.24 г. 12:55 ч., Kirill A. Shutemov wrote:
> > From: Borislav Petkov <bp@alien8.de>
> > 
> > That identity_mapped() functions was loving that "1" label to the point
> > of completely confusing its readers.
> > 
> > Use named labels in each place for clarity.
> > 
> > No functional changes.
> > 
> > Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >   arch/x86/kernel/relocate_kernel_64.S | 13 +++++++------
> >   1 file changed, 7 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> > index 56cab1bb25f5..085eef5c3904 100644
> > --- a/arch/x86/kernel/relocate_kernel_64.S
> > +++ b/arch/x86/kernel/relocate_kernel_64.S
> > @@ -148,9 +148,10 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
> >   	 */
> >   	movl	$X86_CR4_PAE, %eax
> >   	testq	$X86_CR4_LA57, %r13
> > -	jz	1f
> > +	jz	.Lno_la57
> >   	orl	$X86_CR4_LA57, %eax
> > -1:
> > +.Lno_la57:
> > +
> >   	movq	%rax, %cr4
> >   	jmp 1f
> 
> That jmp 1f becomes redundant now as it simply jumps 1 line below.
> 

Nothing changed wrt this jump. It dates back to initial kexec
implementation.

See 5234f5eb04ab ("[PATCH] kexec: x86_64 kexec implementation").

But I don't see functional need in it.

Anyway, it is outside of the scope of the patch.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply

* Re: [PATCHv11 05/19] x86/relocate_kernel: Use named labels for less confusion
From: Nikolay Borisov @ 2024-05-29 10:47 UTC (permalink / raw)
  To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86
  Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, kexec, linux-hyperv, linux-acpi,
	linux-coco, linux-kernel
In-Reply-To: <20240528095522.509667-6-kirill.shutemov@linux.intel.com>



On 28.05.24 г. 12:55 ч., Kirill A. Shutemov wrote:
> From: Borislav Petkov <bp@alien8.de>
> 
> That identity_mapped() functions was loving that "1" label to the point
> of completely confusing its readers.
> 
> Use named labels in each place for clarity.
> 
> No functional changes.
> 
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>   arch/x86/kernel/relocate_kernel_64.S | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> index 56cab1bb25f5..085eef5c3904 100644
> --- a/arch/x86/kernel/relocate_kernel_64.S
> +++ b/arch/x86/kernel/relocate_kernel_64.S
> @@ -148,9 +148,10 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
>   	 */
>   	movl	$X86_CR4_PAE, %eax
>   	testq	$X86_CR4_LA57, %r13
> -	jz	1f
> +	jz	.Lno_la57
>   	orl	$X86_CR4_LA57, %eax
> -1:
> +.Lno_la57:
> +
>   	movq	%rax, %cr4
>   
>   	jmp 1f

That jmp 1f becomes redundant now as it simply jumps 1 line below.

> @@ -165,9 +166,9 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
>   	 * used by kexec. Flush the caches before copying the kernel.
>   	 */
>   	testq	%r12, %r12
> -	jz 1f
> +	jz .Lsme_off
>   	wbinvd
> -1:
> +.Lsme_off:
>   
>   	movq	%rcx, %r11
>   	call	swap_pages
> @@ -187,7 +188,7 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
>   	 */
>   
>   	testq	%r11, %r11
> -	jnz 1f
> +	jnz .Lrelocate
>   	xorl	%eax, %eax
>   	xorl	%ebx, %ebx
>   	xorl    %ecx, %ecx
> @@ -208,7 +209,7 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
>   	ret
>   	int3
>   
> -1:
> +.Lrelocate:
>   	popq	%rdx
>   	leaq	PAGE_SIZE(%r10), %rsp
>   	ANNOTATE_RETPOLINE_SAFE

^ permalink raw reply

* Re: [PATCHv11 10/19] x86/mm: Add callbacks to prepare encrypted memory for kexec
From: Borislav Petkov @ 2024-05-29 10:42 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Thomas Gleixner, Ingo Molnar, Dave Hansen, x86, Rafael J. Wysocki,
	Peter Zijlstra, Adrian Hunter, Kuppuswamy Sathyanarayanan,
	Elena Reshetova, Jun Nakajima, Rick Edgecombe, Tom Lendacky,
	Kalra, Ashish, Sean Christopherson, Huang, Kai, Ard Biesheuvel,
	Baoquan He, H. Peter Anvin, K. Y. Srinivasan, Haiyang Zhang,
	kexec, linux-hyperv, linux-acpi, linux-coco, linux-kernel,
	Nikolay Borisov, Tao Liu
In-Reply-To: <20240528095522.509667-11-kirill.shutemov@linux.intel.com>

On Tue, May 28, 2024 at 12:55:13PM +0300, Kirill A. Shutemov wrote:
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index 28ac3cb9b987..6cade48811cc 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -149,12 +149,21 @@ struct x86_init_acpi {
>   * @enc_status_change_finish	Notify HV after the encryption status of a range is changed
>   * @enc_tlb_flush_required	Returns true if a TLB flush is needed before changing page encryption status
>   * @enc_cache_flush_required	Returns true if a cache flush is needed before changing page encryption status
> + * @enc_kexec_begin		Begin the two-step process of conversion shared memory back

s/conversion/converting/

> + *				to private. It stops the new conversions from being started
> + *				and waits in-flight conversions to finish, if possible.

Good.

Now add "The @crash parameter denotes whether the function is being
called in the crash shutdown path."

> + * @enc_kexec_finish		Finish the two-step process of conversion shared memory to

s/conversion/converting/

> + *				private. All memory is private after the call.

"... when the function returns."

> + *				It called with all CPUs but one shutdown and interrupts
> + *				disabled.

"It is called on only one CPU while the others are shut down and with
interrupts disabled."

>   */
>  struct x86_guest {
>  	int (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
>  	int (*enc_status_change_finish)(unsigned long vaddr, int npages, bool enc);
>  	bool (*enc_tlb_flush_required)(bool enc);
>  	bool (*enc_cache_flush_required)(void);
> +	void (*enc_kexec_begin)(bool crash);
> +	void (*enc_kexec_finish)(void);
>  };
>  
>  /**
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index f06501445cd9..74f6305eb9ec 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -128,6 +128,18 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>  #ifdef CONFIG_HPET_TIMER
>  	hpet_disable();
>  #endif
> +
> +	/*
> +	 * Non-crash kexec calls enc_kexec_begin() while scheduling is still
> +	 * active. This allows the callback to wait until all in-flight
> +	 * shared<->private conversions are complete. In a crash scenario,
> +	 * enc_kexec_begin() get call after all but one CPU has been shut down

"gets called" ... "have been shut down"

> +	 * and interrupts have been disabled. This only allows the callback to

only?

> +	 * detect a race with the conversion and report it.
> +	 */
> +	x86_platform.guest.enc_kexec_begin(true);
> +	x86_platform.guest.enc_kexec_finish();
> +

...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* [PATCH net-next v2] net: mana: Allow variable size indirection table
From: Shradha Gupta @ 2024-05-29  5:35 UTC (permalink / raw)
  To: linux-hardening, netdev, linux-hyperv, linux-kernel, linux-rdma
  Cc: Shradha Gupta, Colin Ian King, Ahmed Zaki, Pavan Chebbi,
	Souradeep Chakrabarti, Konstantin Taranov, Kees Cook, Paolo Abeni,
	Jakub Kicinski, Eric Dumazet, David S. Miller, Dexuan Cui,
	Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Leon Romanovsky,
	Jason Gunthorpe, Ajay Sharma, Long Li, Shradha Gupta

Allow variable size indirection table allocation in MANA instead
of using a constant value MANA_INDIRECT_TABLE_SIZE.
The size is now derived from the MANA_QUERY_VPORT_CONFIG and the
indirection table is allocated dynamically.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 Changes in v2:
 * Rebased to latest net-next tree
 * Rearranged cleanup code in mana_probe_port to avoid extra operations
---
 drivers/infiniband/hw/mana/qp.c               | 10 +--
 drivers/net/ethernet/microsoft/mana/mana_en.c | 68 ++++++++++++++++---
 .../ethernet/microsoft/mana/mana_ethtool.c    | 20 ++++--
 include/net/mana/gdma.h                       |  4 +-
 include/net/mana/mana.h                       |  9 +--
 5 files changed, 84 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index ba13c5abf8ef..2d411a16a127 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -21,7 +21,7 @@ static int mana_ib_cfg_vport_steering(struct mana_ib_dev *dev,
 
 	gc = mdev_to_gc(dev);
 
-	req_buf_size = struct_size(req, indir_tab, MANA_INDIRECT_TABLE_SIZE);
+	req_buf_size = struct_size(req, indir_tab, MANA_INDIRECT_TABLE_DEF_SIZE);
 	req = kzalloc(req_buf_size, GFP_KERNEL);
 	if (!req)
 		return -ENOMEM;
@@ -41,18 +41,18 @@ static int mana_ib_cfg_vport_steering(struct mana_ib_dev *dev,
 	if (log_ind_tbl_size)
 		req->rss_enable = true;
 
-	req->num_indir_entries = MANA_INDIRECT_TABLE_SIZE;
+	req->num_indir_entries = MANA_INDIRECT_TABLE_DEF_SIZE;
 	req->indir_tab_offset = offsetof(struct mana_cfg_rx_steer_req_v2,
 					 indir_tab);
 	req->update_indir_tab = true;
 	req->cqe_coalescing_enable = 1;
 
 	/* The ind table passed to the hardware must have
-	 * MANA_INDIRECT_TABLE_SIZE entries. Adjust the verb
+	 * MANA_INDIRECT_TABLE_DEF_SIZE entries. Adjust the verb
 	 * ind_table to MANA_INDIRECT_TABLE_SIZE if required
 	 */
 	ibdev_dbg(&dev->ib_dev, "ind table size %u\n", 1 << log_ind_tbl_size);
-	for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++) {
+	for (i = 0; i < MANA_INDIRECT_TABLE_DEF_SIZE; i++) {
 		req->indir_tab[i] = ind_table[i % (1 << log_ind_tbl_size)];
 		ibdev_dbg(&dev->ib_dev, "index %u handle 0x%llx\n", i,
 			  req->indir_tab[i]);
@@ -137,7 +137,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
 	}
 
 	ind_tbl_size = 1 << ind_tbl->log_ind_tbl_size;
-	if (ind_tbl_size > MANA_INDIRECT_TABLE_SIZE) {
+	if (ind_tbl_size > MANA_INDIRECT_TABLE_DEF_SIZE) {
 		ibdev_dbg(&mdev->ib_dev,
 			  "Indirect table size %d exceeding limit\n",
 			  ind_tbl_size);
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index d087cf954f75..851e1b9761b3 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -481,7 +481,7 @@ static int mana_get_tx_queue(struct net_device *ndev, struct sk_buff *skb,
 	struct sock *sk = skb->sk;
 	int txq;
 
-	txq = apc->indir_table[hash & MANA_INDIRECT_TABLE_MASK];
+	txq = apc->indir_table[hash & (apc->indir_table_sz - 1)];
 
 	if (txq != old_q && sk && sk_fullsock(sk) &&
 	    rcu_access_pointer(sk->sk_dst_cache))
@@ -962,7 +962,16 @@ static int mana_query_vport_cfg(struct mana_port_context *apc, u32 vport_index,
 
 	*max_sq = resp.max_num_sq;
 	*max_rq = resp.max_num_rq;
-	*num_indir_entry = resp.num_indirection_ent;
+	if (resp.num_indirection_ent > 0 &&
+	    resp.num_indirection_ent <= MANA_INDIRECT_TABLE_MAX_SIZE &&
+	    is_power_of_2(resp.num_indirection_ent)) {
+		*num_indir_entry = resp.num_indirection_ent;
+	} else {
+		netdev_warn(apc->ndev,
+			    "Setting indirection table size to default %d for vPort %d\n",
+			    MANA_INDIRECT_TABLE_DEF_SIZE, apc->port_idx);
+		*num_indir_entry = MANA_INDIRECT_TABLE_DEF_SIZE;
+	}
 
 	apc->port_handle = resp.vport;
 	ether_addr_copy(apc->mac_addr, resp.mac_addr);
@@ -1054,14 +1063,13 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
 				   bool update_default_rxobj, bool update_key,
 				   bool update_tab)
 {
-	u16 num_entries = MANA_INDIRECT_TABLE_SIZE;
 	struct mana_cfg_rx_steer_req_v2 *req;
 	struct mana_cfg_rx_steer_resp resp = {};
 	struct net_device *ndev = apc->ndev;
 	u32 req_buf_size;
 	int err;
 
-	req_buf_size = struct_size(req, indir_tab, num_entries);
+	req_buf_size = struct_size(req, indir_tab, apc->indir_table_sz);
 	req = kzalloc(req_buf_size, GFP_KERNEL);
 	if (!req)
 		return -ENOMEM;
@@ -1072,7 +1080,7 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
 	req->hdr.req.msg_version = GDMA_MESSAGE_V2;
 
 	req->vport = apc->port_handle;
-	req->num_indir_entries = num_entries;
+	req->num_indir_entries = apc->indir_table_sz;
 	req->indir_tab_offset = offsetof(struct mana_cfg_rx_steer_req_v2,
 					 indir_tab);
 	req->rx_enable = rx;
@@ -1111,7 +1119,7 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
 	}
 
 	netdev_info(ndev, "Configured steering vPort %llu entries %u\n",
-		    apc->port_handle, num_entries);
+		    apc->port_handle, apc->indir_table_sz);
 out:
 	kfree(req);
 	return err;
@@ -2344,11 +2352,33 @@ static int mana_create_vport(struct mana_port_context *apc,
 	return mana_create_txq(apc, net);
 }
 
+static int mana_rss_table_alloc(struct mana_port_context *apc)
+{
+	if (!apc->indir_table_sz) {
+		netdev_err(apc->ndev,
+			   "Indirection table size not set for vPort %d\n",
+			   apc->port_idx);
+		return -EINVAL;
+	}
+
+	apc->indir_table = kcalloc(apc->indir_table_sz, sizeof(u32), GFP_KERNEL);
+	if (!apc->indir_table)
+		return -ENOMEM;
+
+	apc->rxobj_table = kcalloc(apc->indir_table_sz, sizeof(mana_handle_t), GFP_KERNEL);
+	if (!apc->rxobj_table) {
+		kfree(apc->indir_table);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 static void mana_rss_table_init(struct mana_port_context *apc)
 {
 	int i;
 
-	for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
+	for (i = 0; i < apc->indir_table_sz; i++)
 		apc->indir_table[i] =
 			ethtool_rxfh_indir_default(i, apc->num_queues);
 }
@@ -2361,7 +2391,7 @@ int mana_config_rss(struct mana_port_context *apc, enum TRI_STATE rx,
 	int i;
 
 	if (update_tab) {
-		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++) {
+		for (i = 0; i < apc->indir_table_sz; i++) {
 			queue_idx = apc->indir_table[i];
 			apc->rxobj_table[i] = apc->rxqs[queue_idx]->rxobj;
 		}
@@ -2466,7 +2496,6 @@ static int mana_init_port(struct net_device *ndev)
 	struct mana_port_context *apc = netdev_priv(ndev);
 	u32 max_txq, max_rxq, max_queues;
 	int port_idx = apc->port_idx;
-	u32 num_indirect_entries;
 	int err;
 
 	err = mana_init_port_context(apc);
@@ -2474,7 +2503,7 @@ static int mana_init_port(struct net_device *ndev)
 		return err;
 
 	err = mana_query_vport_cfg(apc, port_idx, &max_txq, &max_rxq,
-				   &num_indirect_entries);
+				   &apc->indir_table_sz);
 	if (err) {
 		netdev_err(ndev, "Failed to query info for vPort %d\n",
 			   port_idx);
@@ -2723,6 +2752,10 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	if (err)
 		goto free_net;
 
+	err = mana_rss_table_alloc(apc);
+	if (err)
+		goto reset_apc;
+
 	netdev_lockdep_set_classes(ndev);
 
 	ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
@@ -2739,11 +2772,17 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	err = register_netdev(ndev);
 	if (err) {
 		netdev_err(ndev, "Unable to register netdev.\n");
-		goto reset_apc;
+		goto free_indir;
 	}
 
 	return 0;
 
+free_indir:
+	apc->indir_table_sz = 0;
+	kfree(apc->indir_table);
+	apc->indir_table = NULL;
+	kfree(apc->rxobj_table);
+	apc->rxobj_table = NULL;
 reset_apc:
 	kfree(apc->rxqs);
 	apc->rxqs = NULL;
@@ -2897,6 +2936,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 {
 	struct gdma_context *gc = gd->gdma_context;
 	struct mana_context *ac = gd->driver_data;
+	struct mana_port_context *apc;
 	struct device *dev = gc->dev;
 	struct net_device *ndev;
 	int err;
@@ -2908,6 +2948,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 
 	for (i = 0; i < ac->num_ports; i++) {
 		ndev = ac->ports[i];
+		apc = netdev_priv(ndev);
 		if (!ndev) {
 			if (i == 0)
 				dev_err(dev, "No net device to remove\n");
@@ -2931,6 +2972,11 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 		}
 
 		unregister_netdevice(ndev);
+		apc->indir_table_sz = 0;
+		kfree(apc->indir_table);
+		apc->indir_table = NULL;
+		kfree(apc->rxobj_table);
+		apc->rxobj_table = NULL;
 
 		rtnl_unlock();
 
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index ab2413d71f6c..1667f18046d2 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -245,7 +245,9 @@ static u32 mana_get_rxfh_key_size(struct net_device *ndev)
 
 static u32 mana_rss_indir_size(struct net_device *ndev)
 {
-	return MANA_INDIRECT_TABLE_SIZE;
+	struct mana_port_context *apc = netdev_priv(ndev);
+
+	return apc->indir_table_sz;
 }
 
 static int mana_get_rxfh(struct net_device *ndev,
@@ -257,7 +259,7 @@ static int mana_get_rxfh(struct net_device *ndev,
 	rxfh->hfunc = ETH_RSS_HASH_TOP; /* Toeplitz */
 
 	if (rxfh->indir) {
-		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
+		for (i = 0; i < apc->indir_table_sz; i++)
 			rxfh->indir[i] = apc->indir_table[i];
 	}
 
@@ -273,8 +275,8 @@ static int mana_set_rxfh(struct net_device *ndev,
 {
 	struct mana_port_context *apc = netdev_priv(ndev);
 	bool update_hash = false, update_table = false;
-	u32 save_table[MANA_INDIRECT_TABLE_SIZE];
 	u8 save_key[MANA_HASH_KEY_SIZE];
+	u32 *save_table;
 	int i, err;
 
 	if (!apc->port_is_up)
@@ -284,13 +286,17 @@ static int mana_set_rxfh(struct net_device *ndev,
 	    rxfh->hfunc != ETH_RSS_HASH_TOP)
 		return -EOPNOTSUPP;
 
+	save_table = kcalloc(apc->indir_table_sz, sizeof(u32), GFP_KERNEL);
+	if (!save_table)
+		return -ENOMEM;
+
 	if (rxfh->indir) {
-		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
+		for (i = 0; i < apc->indir_table_sz; i++)
 			if (rxfh->indir[i] >= apc->num_queues)
 				return -EINVAL;
 
 		update_table = true;
-		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++) {
+		for (i = 0; i < apc->indir_table_sz; i++) {
 			save_table[i] = apc->indir_table[i];
 			apc->indir_table[i] = rxfh->indir[i];
 		}
@@ -306,7 +312,7 @@ static int mana_set_rxfh(struct net_device *ndev,
 
 	if (err) { /* recover to original values */
 		if (update_table) {
-			for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
+			for (i = 0; i < apc->indir_table_sz; i++)
 				apc->indir_table[i] = save_table[i];
 		}
 
@@ -316,6 +322,8 @@ static int mana_set_rxfh(struct net_device *ndev,
 		mana_config_rss(apc, TRI_STATE_TRUE, update_hash, update_table);
 	}
 
+	kfree(save_table);
+
 	return err;
 }
 
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 27684135bb4d..c547756c4284 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -543,11 +543,13 @@ enum {
  */
 #define GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX BIT(2)
 #define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
+#define GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT BIT(5)
 
 #define GDMA_DRV_CAP_FLAGS1 \
 	(GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
 	 GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
-	 GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
+	 GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG | \
+	 GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT)
 
 #define GDMA_DRV_CAP_FLAGS2 0
 
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 561f6719fb4e..59823901b74f 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -30,8 +30,8 @@ enum TRI_STATE {
 };
 
 /* Number of entries for hardware indirection table must be in power of 2 */
-#define MANA_INDIRECT_TABLE_SIZE 64
-#define MANA_INDIRECT_TABLE_MASK (MANA_INDIRECT_TABLE_SIZE - 1)
+#define MANA_INDIRECT_TABLE_MAX_SIZE 512
+#define MANA_INDIRECT_TABLE_DEF_SIZE 64
 
 /* The Toeplitz hash key's length in bytes: should be multiple of 8 */
 #define MANA_HASH_KEY_SIZE 40
@@ -410,10 +410,11 @@ struct mana_port_context {
 	struct mana_tx_qp *tx_qp;
 
 	/* Indirection Table for RX & TX. The values are queue indexes */
-	u32 indir_table[MANA_INDIRECT_TABLE_SIZE];
+	u32 *indir_table;
+	u32 indir_table_sz;
 
 	/* Indirection table containing RxObject Handles */
-	mana_handle_t rxobj_table[MANA_INDIRECT_TABLE_SIZE];
+	mana_handle_t *rxobj_table;
 
 	/*  Hash key used by the NIC */
 	u8 hashkey[MANA_HASH_KEY_SIZE];
-- 
2.34.1


^ permalink raw reply related

* Re: [RFC PATCH] clocksource: hyper-v: Enable the tsc_page for a TDX VM in TD mode
From: Tom Lendacky @ 2024-05-28 14:15 UTC (permalink / raw)
  To: Michael Kelley, Dexuan Cui, Dave Hansen, x86@kernel.org,
	linux-coco@lists.linux.dev, bp@alien8.de,
	dave.hansen@linux.intel.com, Haiyang Zhang, hpa@zytor.com,
	kirill.shutemov@linux.intel.com, KY Srinivasan, luto@kernel.org,
	mingo@redhat.com, peterz@infradead.org,
	sathyanarayanan.kuppuswamy@linux.intel.com, tglx@linutronix.de,
	wei.liu@kernel.org, jason, tytso@mit.edu, ardb@kernel.org
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	Tianyu Lan
In-Reply-To: <SN6PR02MB4157B940ED0A5F39D49A897BD4F52@SN6PR02MB4157.namprd02.prod.outlook.com>

On 5/24/24 17:44, Michael Kelley wrote:
> From: Dexuan Cui <decui@microsoft.com> Sent: Friday, May 24, 2024 1:46 AM
>>> From: Dave Hansen <dave.hansen@intel.com>
>>> Sent: Thursday, May 23, 2024 7:26 AM
>>> [...]
>>> On 5/22/24 19:24, Dexuan Cui wrote:
>>> ...

> 
> My thoughts:
> 
> __bss_decrypted is named as if it applies to any CoCo VM, but really
> it is specific to AMD SEV. It was originally used for a GHCB page, which

IIRC, it was originally used for KVM clock, not the GHCB page, since
plain SEV doesn't use a GHCB, see:

b3f0907c71e0 ("x86/mm: Add .bss..decrypted section to hold shared variables")

> is SEV-specific, and then it proved to be convenient for the Hyper-V TSC
> page. Ideally, we could fix __bss_decrypted to work generally in a
> TDX VM without any dependency on code specific to a hypervisor. But
> looking at some of the details, that may be non-trivial.

In reality, TDX should also make this area shared as that is how this
section is meant to be setup. But up till now, I don't think TDX used
anything in the __bss_decrypted section, so it was never moved to a
common location and has remained SEV specific.

> 
> A narrower solution is to remove the Hyper-V TSC page from
> __bss_decrypted, and use Hyper-V specific code on both TDX and
> SEV-SNP to decrypt just that page (not the entire __bss_decrypted),
> based on whether the Hyper-V guest is running with a paravisor.
>  From Dexuan's patch, it looks like set_memory_decrypted()
> works on TDX at the time that ms_hyperv_init_platform() runs.
> Does it also work on SEV-SNP? The code in kvm_init_platform()
> uses early_set_mem_enc_dec_hypercall() with
> kvm_sev_hc_page_enc_status(), which is SEV only.  So maybe

This is to inform the hypervisor that these pages are now shared, see
below.

> the normal set_memory_decrypted() doesn't work on SEV at
> that point, though I'm not at all clear on what kvm_init_platform is
> trying to do.  Shouldn't __bss_decrypted already be set up correctly?

With SEV, yes, the pagetable is set up correctly. And specific to SNP,
the RMP is set up correctly because of the page state change (PSC) call
which also notifies the hypervisor of the state change.

But since the RMP PSC is SNP specific, SEV and SEV-ES require the
separate hypercall to notify the hypervisor of the state change.

Thanks,
Tom

> 

^ permalink raw reply

* Re: [PATCHv11 01/19] x86/acpi: Extract ACPI MADT wakeup code into a separate file
From: Borislav Petkov @ 2024-05-28 13:47 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Thomas Gleixner, Ingo Molnar, Dave Hansen, x86, Rafael J. Wysocki,
	Peter Zijlstra, Adrian Hunter, Kuppuswamy Sathyanarayanan,
	Elena Reshetova, Jun Nakajima, Rick Edgecombe, Tom Lendacky,
	Kalra, Ashish, Sean Christopherson, Huang, Kai, Ard Biesheuvel,
	Baoquan He, H. Peter Anvin, K. Y. Srinivasan, Haiyang Zhang,
	kexec, linux-hyperv, linux-acpi, linux-coco, linux-kernel,
	Tao Liu
In-Reply-To: <20240528095522.509667-2-kirill.shutemov@linux.intel.com>

On Tue, May 28, 2024 at 12:55:04PM +0300, Kirill A. Shutemov wrote:
> In order to prepare for the expansion of support for the ACPI MADT
> wakeup method, move the relevant code into a separate file.
> 
> Introduce a new configuration option to clearly indicate dependencies
> without the use of ifdefs.
> 
> There have been no functional changes.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Acked-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Baoquan He <bhe@redhat.com>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Tao Liu <ltao@redhat.com>
> ---
>  arch/x86/Kconfig                   |  7 +++
>  arch/x86/include/asm/acpi.h        |  5 ++
>  arch/x86/kernel/acpi/Makefile      |  1 +
>  arch/x86/kernel/acpi/boot.c        | 86 +-----------------------------
>  arch/x86/kernel/acpi/madt_wakeup.c | 82 ++++++++++++++++++++++++++++
>  5 files changed, 96 insertions(+), 85 deletions(-)
>  create mode 100644 arch/x86/kernel/acpi/madt_wakeup.c

Acked-by: Borislav Petkov (AMD) <bp@alien8.de>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCHv11 06/19] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
From: Huang, Kai @ 2024-05-28 11:12 UTC (permalink / raw)
  To: kirill.shutemov@linux.intel.com, tglx@linutronix.de,
	mingo@redhat.com, x86@kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com
  Cc: kexec@lists.infradead.org, ardb@kernel.org,
	linux-coco@lists.linux.dev, ashish.kalra@amd.com,
	thomas.lendacky@amd.com, Hunter, Adrian, Reshetova, Elena,
	linux-kernel@vger.kernel.org, haiyangz@microsoft.com,
	seanjc@google.com, kys@microsoft.com, bhe@redhat.com,
	Nakajima, Jun, hpa@zytor.com, peterz@infradead.org,
	linux-hyperv@vger.kernel.org, Edgecombe, Rick P,
	rafael@kernel.org, sathyanarayanan.kuppuswamy@linux.intel.com,
	linux-acpi@vger.kernel.org
In-Reply-To: <20240528095522.509667-7-kirill.shutemov@linux.intel.com>

On Tue, 2024-05-28 at 12:55 +0300, Kirill A. Shutemov wrote:
> TDX guests run with MCA enabled (CR4.MCE=1b) from the very start. If
> that bit is cleared during CR4 register reprogramming during boot or
> kexec flows, a #VE exception will be raised which the guest kernel
> cannot handle it.

Nit: the ending "it" isn't needed.

> 
> Therefore, make sure the CR4.MCE setting is preserved over kexec too and
> avoid raising any #VEs.
> 
> The change doesn't affect non-TDX-guest environments.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Reviewed-by: Kai Huang <kai.huang@intel.com>

> ---
>  arch/x86/kernel/relocate_kernel_64.S | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> index 085eef5c3904..b668a6be4f6f 100644
> --- a/arch/x86/kernel/relocate_kernel_64.S
> +++ b/arch/x86/kernel/relocate_kernel_64.S
> @@ -5,6 +5,8 @@
>   */
>  
>  #include <linux/linkage.h>
> +#include <linux/stringify.h>
> +#include <asm/alternative.h>
>  #include <asm/page_types.h>
>  #include <asm/kexec.h>
>  #include <asm/processor-flags.h>
> @@ -143,15 +145,17 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
>  
>  	/*
>  	 * Set cr4 to a known state:
> -	 *  - physical address extension enabled
>  	 *  - 5-level paging, if it was enabled before
> +	 *  - Machine check exception on TDX guest, if it was enabled before.
> +	 *    Clearing MCE might not be allowed in TDX guests, depending on setup.
> +	 *  - physical address extension enabled
>  	 */
> -	movl	$X86_CR4_PAE, %eax
> -	testq	$X86_CR4_LA57, %r13
> -	jz	.Lno_la57
> -	orl	$X86_CR4_LA57, %eax
> -.Lno_la57:
> +	movl	$X86_CR4_LA57, %eax
> +	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST
>  
> +	/* R13 contains the original CR4 value, read in relocate_kernel() */
> +	andl	%r13d, %eax
> +	orl	$X86_CR4_PAE, %eax
>  	movq	%rax, %cr4
>  
>  	jmp 1f


^ permalink raw reply

* Re: [PATCHv11 00/19] x86/tdx: Add kexec support
From: Rafael J. Wysocki @ 2024-05-28 10:01 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, kexec, linux-hyperv, linux-acpi,
	linux-coco, linux-kernel
In-Reply-To: <20240528095522.509667-1-kirill.shutemov@linux.intel.com>

On Tue, May 28, 2024 at 11:55 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> The patchset adds bits and pieces to get kexec (and crashkernel) work on
> TDX guest.
>
> The last patch implements CPU offlining according to the approved ACPI
> spec change poposal[1]. It unlocks kexec with all CPUs visible in the target
> kernel. It requires BIOS-side enabling. If it missing we fallback to booting
> 2nd kernel with single CPU.
>
> Please review. I would be glad for any feedback.
>
> [1] https://lore.kernel.org/all/13356251.uLZWGnKmhe@kreacher

For the ACPI-related changes in the series

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

^ permalink raw reply

* [PATCHv11 14/19] x86/acpi: Rename fields in acpi_madt_multiproc_wakeup structure
From: Kirill A. Shutemov @ 2024-05-28  9:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	Kirill A. Shutemov, K. Y. Srinivasan, Haiyang Zhang, kexec,
	linux-hyperv, linux-acpi, linux-coco, linux-kernel, Tao Liu
In-Reply-To: <20240528095522.509667-1-kirill.shutemov@linux.intel.com>

In order to support MADT wakeup structure version 1, provide more
appropriate names for the fields in the structure.

Rename 'mailbox_version' to 'version'. This field signifies the version
of the structure and the related protocols, rather than the version of
the mailbox. This field has not been utilized in the code thus far.

Rename 'base_address' to 'mailbox_address' to clarify the kind of
address it represents. In version 1, the structure includes the reset
vector address. Clear and distinct naming helps to prevent any
confusion.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tao Liu <ltao@redhat.com>
---
 arch/x86/kernel/acpi/madt_wakeup.c | 2 +-
 include/acpi/actbl2.h              | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/acpi/madt_wakeup.c b/arch/x86/kernel/acpi/madt_wakeup.c
index d222be8d7a07..004801b9b151 100644
--- a/arch/x86/kernel/acpi/madt_wakeup.c
+++ b/arch/x86/kernel/acpi/madt_wakeup.c
@@ -75,7 +75,7 @@ int __init acpi_parse_mp_wake(union acpi_subtable_headers *header,
 
 	acpi_table_print_madt_entry(&header->common);
 
-	acpi_mp_wake_mailbox_paddr = mp_wake->base_address;
+	acpi_mp_wake_mailbox_paddr = mp_wake->mailbox_address;
 
 	cpu_hotplug_disable_offlining();
 
diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index ae747c89d92c..fa63362469aa 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -1194,9 +1194,9 @@ struct acpi_madt_generic_translator {
 
 struct acpi_madt_multiproc_wakeup {
 	struct acpi_subtable_header header;
-	u16 mailbox_version;
+	u16 version;
 	u32 reserved;		/* reserved - must be zero */
-	u64 base_address;
+	u64 mailbox_address;
 };
 
 #define ACPI_MULTIPROC_WAKEUP_MB_OS_SIZE        2032
-- 
2.43.0


^ permalink raw reply related

* [PATCHv11 12/19] x86/mm: Make e820__end_ram_pfn() cover E820_TYPE_ACPI ranges
From: Kirill A. Shutemov @ 2024-05-28  9:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	Kirill A. Shutemov, K. Y. Srinivasan, Haiyang Zhang, kexec,
	linux-hyperv, linux-acpi, linux-coco, linux-kernel, Dave Hansen,
	Tao Liu
In-Reply-To: <20240528095522.509667-1-kirill.shutemov@linux.intel.com>

e820__end_of_ram_pfn() is used to calculate max_pfn which, among other
things, guides where direct mapping ends. Any memory above max_pfn is
not going to be present in the direct mapping.

e820__end_of_ram_pfn() finds the end of the RAM based on the highest
E820_TYPE_RAM range. But it doesn't includes E820_TYPE_ACPI ranges into
calculation.

Despite the name, E820_TYPE_ACPI covers not only ACPI data, but also EFI
tables and might be required by kernel to function properly.

Usually the problem is hidden because there is some E820_TYPE_RAM memory
above E820_TYPE_ACPI. But crashkernel only presents pre-allocated crash
memory as E820_TYPE_RAM on boot. If the preallocated range is small, it
can fit under the last E820_TYPE_ACPI range.

Modify e820__end_of_ram_pfn() and e820__end_of_low_ram_pfn() to cover
E820_TYPE_ACPI memory.

The problem was discovered during debugging kexec for TDX guest. TDX
guest uses E820_TYPE_ACPI to store the unaccepted memory bitmap and pass
it between the kernels on kexec.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Tao Liu <ltao@redhat.com>
---
 arch/x86/kernel/e820.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 68b09f718f10..4893d30ce438 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -828,7 +828,7 @@ u64 __init e820__memblock_alloc_reserved(u64 size, u64 align)
 /*
  * Find the highest page frame number we have available
  */
-static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type type)
+static unsigned long __init e820__end_ram_pfn(unsigned long limit_pfn)
 {
 	int i;
 	unsigned long last_pfn = 0;
@@ -839,7 +839,8 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type
 		unsigned long start_pfn;
 		unsigned long end_pfn;
 
-		if (entry->type != type)
+		if (entry->type != E820_TYPE_RAM &&
+		    entry->type != E820_TYPE_ACPI)
 			continue;
 
 		start_pfn = entry->addr >> PAGE_SHIFT;
@@ -865,12 +866,12 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type
 
 unsigned long __init e820__end_of_ram_pfn(void)
 {
-	return e820_end_pfn(MAX_ARCH_PFN, E820_TYPE_RAM);
+	return e820__end_ram_pfn(MAX_ARCH_PFN);
 }
 
 unsigned long __init e820__end_of_low_ram_pfn(void)
 {
-	return e820_end_pfn(1UL << (32 - PAGE_SHIFT), E820_TYPE_RAM);
+	return e820__end_ram_pfn(1UL << (32 - PAGE_SHIFT));
 }
 
 static void __init early_panic(char *msg)
-- 
2.43.0


^ permalink raw reply related

* [PATCHv11 07/19] x86/mm: Make x86_platform.guest.enc_status_change_*() return errno
From: Kirill A. Shutemov @ 2024-05-28  9:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	Kirill A. Shutemov, K. Y. Srinivasan, Haiyang Zhang, kexec,
	linux-hyperv, linux-acpi, linux-coco, linux-kernel, Dave Hansen,
	Tao Liu, Michael Kelley
In-Reply-To: <20240528095522.509667-1-kirill.shutemov@linux.intel.com>

TDX is going to have more than one reason to fail
enc_status_change_prepare().

Change the callback to return errno instead of assuming -EIO;
enc_status_change_finish() changed too to keep the interface symmetric.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
---
 arch/x86/coco/tdx/tdx.c         | 20 +++++++++++---------
 arch/x86/hyperv/ivm.c           | 22 ++++++++++------------
 arch/x86/include/asm/x86_init.h |  4 ++--
 arch/x86/kernel/x86_init.c      |  4 ++--
 arch/x86/mm/mem_encrypt_amd.c   |  8 ++++----
 arch/x86/mm/pat/set_memory.c    | 12 +++++++-----
 6 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index c1cb90369915..26fa47db5782 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -798,28 +798,30 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
 	return true;
 }
 
-static bool tdx_enc_status_change_prepare(unsigned long vaddr, int numpages,
-					  bool enc)
+static int tdx_enc_status_change_prepare(unsigned long vaddr, int numpages,
+					 bool enc)
 {
 	/*
 	 * Only handle shared->private conversion here.
 	 * See the comment in tdx_early_init().
 	 */
-	if (enc)
-		return tdx_enc_status_changed(vaddr, numpages, enc);
-	return true;
+	if (enc && !tdx_enc_status_changed(vaddr, numpages, enc))
+		return -EIO;
+
+	return 0;
 }
 
-static bool tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
+static int tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
 					 bool enc)
 {
 	/*
 	 * Only handle private->shared conversion here.
 	 * See the comment in tdx_early_init().
 	 */
-	if (!enc)
-		return tdx_enc_status_changed(vaddr, numpages, enc);
-	return true;
+	if (!enc && !tdx_enc_status_changed(vaddr, numpages, enc))
+		return -EIO;
+
+	return 0;
 }
 
 void __init tdx_early_init(void)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 768d73de0d09..b4a851d27c7c 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -523,9 +523,9 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
  * transition is complete, hv_vtom_set_host_visibility() marks the pages
  * as "present" again.
  */
-static bool hv_vtom_clear_present(unsigned long kbuffer, int pagecount, bool enc)
+static int hv_vtom_clear_present(unsigned long kbuffer, int pagecount, bool enc)
 {
-	return !set_memory_np(kbuffer, pagecount);
+	return set_memory_np(kbuffer, pagecount);
 }
 
 /*
@@ -536,20 +536,19 @@ static bool hv_vtom_clear_present(unsigned long kbuffer, int pagecount, bool enc
  * with host. This function works as wrap of hv_mark_gpa_visibility()
  * with memory base and size.
  */
-static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bool enc)
+static int hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bool enc)
 {
 	enum hv_mem_host_visibility visibility = enc ?
 			VMBUS_PAGE_NOT_VISIBLE : VMBUS_PAGE_VISIBLE_READ_WRITE;
 	u64 *pfn_array;
 	phys_addr_t paddr;
+	int i, pfn, err;
 	void *vaddr;
 	int ret = 0;
-	bool result = true;
-	int i, pfn;
 
 	pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
 	if (!pfn_array) {
-		result = false;
+		ret = -ENOMEM;
 		goto err_set_memory_p;
 	}
 
@@ -568,10 +567,8 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
 		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
 			ret = hv_mark_gpa_visibility(pfn, pfn_array,
 						     visibility);
-			if (ret) {
-				result = false;
+			if (ret)
 				goto err_free_pfn_array;
-			}
 			pfn = 0;
 		}
 	}
@@ -586,10 +583,11 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
 	 * order to avoid leaving the memory range in a "broken" state. Setting
 	 * the PRESENT bits shouldn't fail, but return an error if it does.
 	 */
-	if (set_memory_p(kbuffer, pagecount))
-		result = false;
+	err = set_memory_p(kbuffer, pagecount);
+	if (err && !ret)
+		ret = err;
 
-	return result;
+	return ret;
 }
 
 static bool hv_vtom_tlb_flush_required(bool private)
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 6149eabe200f..28ac3cb9b987 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -151,8 +151,8 @@ struct x86_init_acpi {
  * @enc_cache_flush_required	Returns true if a cache flush is needed before changing page encryption status
  */
 struct x86_guest {
-	bool (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
-	bool (*enc_status_change_finish)(unsigned long vaddr, int npages, bool enc);
+	int (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
+	int (*enc_status_change_finish)(unsigned long vaddr, int npages, bool enc);
 	bool (*enc_tlb_flush_required)(bool enc);
 	bool (*enc_cache_flush_required)(void);
 };
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index d5dc5a92635a..a7143bb7dd93 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -134,8 +134,8 @@ struct x86_cpuinit_ops x86_cpuinit = {
 
 static void default_nmi_init(void) { };
 
-static bool enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool enc) { return true; }
-static bool enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return true; }
+static int enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool enc) { return 0; }
+static int enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return 0; }
 static bool enc_tlb_flush_required_noop(bool enc) { return false; }
 static bool enc_cache_flush_required_noop(void) { return false; }
 static bool is_private_mmio_noop(u64 addr) {return false; }
diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index 422602f6039b..e7b67519ddb5 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -283,7 +283,7 @@ static void enc_dec_hypercall(unsigned long vaddr, unsigned long size, bool enc)
 #endif
 }
 
-static bool amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool enc)
+static int amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool enc)
 {
 	/*
 	 * To maintain the security guarantees of SEV-SNP guests, make sure
@@ -292,11 +292,11 @@ static bool amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool
 	if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP) && !enc)
 		snp_set_memory_shared(vaddr, npages);
 
-	return true;
+	return 0;
 }
 
 /* Return true unconditionally: return value doesn't matter for the SEV side */
-static bool amd_enc_status_change_finish(unsigned long vaddr, int npages, bool enc)
+static int amd_enc_status_change_finish(unsigned long vaddr, int npages, bool enc)
 {
 	/*
 	 * After memory is mapped encrypted in the page table, validate it
@@ -308,7 +308,7 @@ static bool amd_enc_status_change_finish(unsigned long vaddr, int npages, bool e
 	if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
 		enc_dec_hypercall(vaddr, npages << PAGE_SHIFT, enc);
 
-	return true;
+	return 0;
 }
 
 static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 19fdfbb171ed..498812f067cd 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2196,7 +2196,8 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
 		cpa_flush(&cpa, x86_platform.guest.enc_cache_flush_required());
 
 	/* Notify hypervisor that we are about to set/clr encryption attribute. */
-	if (!x86_platform.guest.enc_status_change_prepare(addr, numpages, enc))
+	ret = x86_platform.guest.enc_status_change_prepare(addr, numpages, enc);
+	if (ret)
 		goto vmm_fail;
 
 	ret = __change_page_attr_set_clr(&cpa, 1);
@@ -2214,16 +2215,17 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
 		return ret;
 
 	/* Notify hypervisor that we have successfully set/clr encryption attribute. */
-	if (!x86_platform.guest.enc_status_change_finish(addr, numpages, enc))
+	ret = x86_platform.guest.enc_status_change_finish(addr, numpages, enc);
+	if (ret)
 		goto vmm_fail;
 
 	return 0;
 
 vmm_fail:
-	WARN_ONCE(1, "CPA VMM failure to convert memory (addr=%p, numpages=%d) to %s.\n",
-		  (void *)addr, numpages, enc ? "private" : "shared");
+	WARN_ONCE(1, "CPA VMM failure to convert memory (addr=%p, numpages=%d) to %s: %d\n",
+		  (void *)addr, numpages, enc ? "private" : "shared", ret);
 
-	return -EIO;
+	return ret;
 }
 
 static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
-- 
2.43.0


^ permalink raw reply related

* [PATCHv11 17/19] x86/mm: Introduce kernel_ident_mapping_free()
From: Kirill A. Shutemov @ 2024-05-28  9:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	Kirill A. Shutemov, K. Y. Srinivasan, Haiyang Zhang, kexec,
	linux-hyperv, linux-acpi, linux-coco, linux-kernel, Tao Liu
In-Reply-To: <20240528095522.509667-1-kirill.shutemov@linux.intel.com>

The helper complements kernel_ident_mapping_init(): it frees the
identity mapping that was previously allocated. It will be used in the
error path to free a partially allocated mapping or if the mapping is no
longer needed.

The caller provides a struct x86_mapping_info with the free_pgd_page()
callback hooked up and the pgd_t to free.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Tao Liu <ltao@redhat.com>
---
 arch/x86/include/asm/init.h |  3 ++
 arch/x86/mm/ident_map.c     | 73 +++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index cc9ccf61b6bd..14d72727d7ee 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -6,6 +6,7 @@
 
 struct x86_mapping_info {
 	void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
+	void (*free_pgt_page)(void *, void *); /* free buf for page table */
 	void *context;			 /* context for alloc_pgt_page */
 	unsigned long page_flag;	 /* page flag for PMD or PUD entry */
 	unsigned long offset;		 /* ident mapping offset */
@@ -16,4 +17,6 @@ struct x86_mapping_info {
 int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
 				unsigned long pstart, unsigned long pend);
 
+void kernel_ident_mapping_free(struct x86_mapping_info *info, pgd_t *pgd);
+
 #endif /* _ASM_X86_INIT_H */
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 968d7005f4a7..3996af7b4abf 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -4,6 +4,79 @@
  * included by both the compressed kernel and the regular kernel.
  */
 
+static void free_pte(struct x86_mapping_info *info, pmd_t *pmd)
+{
+	pte_t *pte = pte_offset_kernel(pmd, 0);
+
+	info->free_pgt_page(pte, info->context);
+}
+
+static void free_pmd(struct x86_mapping_info *info, pud_t *pud)
+{
+	pmd_t *pmd = pmd_offset(pud, 0);
+	int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		if (!pmd_present(pmd[i]))
+			continue;
+
+		if (pmd_leaf(pmd[i]))
+			continue;
+
+		free_pte(info, &pmd[i]);
+	}
+
+	info->free_pgt_page(pmd, info->context);
+}
+
+static void free_pud(struct x86_mapping_info *info, p4d_t *p4d)
+{
+	pud_t *pud = pud_offset(p4d, 0);
+	int i;
+
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		if (!pud_present(pud[i]))
+			continue;
+
+		if (pud_leaf(pud[i]))
+			continue;
+
+		free_pmd(info, &pud[i]);
+	}
+
+	info->free_pgt_page(pud, info->context);
+}
+
+static void free_p4d(struct x86_mapping_info *info, pgd_t *pgd)
+{
+	p4d_t *p4d = p4d_offset(pgd, 0);
+	int i;
+
+	for (i = 0; i < PTRS_PER_P4D; i++) {
+		if (!p4d_present(p4d[i]))
+			continue;
+
+		free_pud(info, &p4d[i]);
+	}
+
+	if (pgtable_l5_enabled())
+		info->free_pgt_page(pgd, info->context);
+}
+
+void kernel_ident_mapping_free(struct x86_mapping_info *info, pgd_t *pgd)
+{
+	int i;
+
+	for (i = 0; i < PTRS_PER_PGD; i++) {
+		if (!pgd_present(pgd[i]))
+			continue;
+
+		free_p4d(info, &pgd[i]);
+	}
+
+	info->free_pgt_page(pgd, info->context);
+}
+
 static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
 			   unsigned long addr, unsigned long end)
 {
-- 
2.43.0


^ permalink raw reply related

* [PATCHv11 10/19] x86/mm: Add callbacks to prepare encrypted memory for kexec
From: Kirill A. Shutemov @ 2024-05-28  9:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	Kirill A. Shutemov, K. Y. Srinivasan, Haiyang Zhang, kexec,
	linux-hyperv, linux-acpi, linux-coco, linux-kernel,
	Nikolay Borisov, Tao Liu
In-Reply-To: <20240528095522.509667-1-kirill.shutemov@linux.intel.com>

AMD SEV and Intel TDX guests allocate shared buffers for performing I/O.
This is done by allocating pages normally from the buddy allocator and
then converting them to shared using set_memory_decrypted().

On kexec, the second kernel is unaware of which memory has been
converted in this manner. It only sees E820_TYPE_RAM. Accessing shared
memory as private is fatal.

Therefore, the memory state must be reset to its original state before
starting the new kernel with kexec.

The process of converting shared memory back to private occurs in two
steps:

- enc_kexec_begin() stops new conversions.

- enc_kexec_finish() unshares all existing shared memory, reverting it
  back to private.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Tao Liu <ltao@redhat.com>
---
 arch/x86/include/asm/x86_init.h |  9 +++++++++
 arch/x86/kernel/crash.c         | 12 ++++++++++++
 arch/x86/kernel/reboot.c        | 12 ++++++++++++
 arch/x86/kernel/x86_init.c      |  4 ++++
 4 files changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 28ac3cb9b987..6cade48811cc 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -149,12 +149,21 @@ struct x86_init_acpi {
  * @enc_status_change_finish	Notify HV after the encryption status of a range is changed
  * @enc_tlb_flush_required	Returns true if a TLB flush is needed before changing page encryption status
  * @enc_cache_flush_required	Returns true if a cache flush is needed before changing page encryption status
+ * @enc_kexec_begin		Begin the two-step process of conversion shared memory back
+ *				to private. It stops the new conversions from being started
+ *				and waits in-flight conversions to finish, if possible.
+ * @enc_kexec_finish		Finish the two-step process of conversion shared memory to
+ *				private. All memory is private after the call.
+ *				It called with all CPUs but one shutdown and interrupts
+ *				disabled.
  */
 struct x86_guest {
 	int (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
 	int (*enc_status_change_finish)(unsigned long vaddr, int npages, bool enc);
 	bool (*enc_tlb_flush_required)(bool enc);
 	bool (*enc_cache_flush_required)(void);
+	void (*enc_kexec_begin)(bool crash);
+	void (*enc_kexec_finish)(void);
 };
 
 /**
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index f06501445cd9..74f6305eb9ec 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -128,6 +128,18 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 #ifdef CONFIG_HPET_TIMER
 	hpet_disable();
 #endif
+
+	/*
+	 * Non-crash kexec calls enc_kexec_begin() while scheduling is still
+	 * active. This allows the callback to wait until all in-flight
+	 * shared<->private conversions are complete. In a crash scenario,
+	 * enc_kexec_begin() get call after all but one CPU has been shut down
+	 * and interrupts have been disabled. This only allows the callback to
+	 * detect a race with the conversion and report it.
+	 */
+	x86_platform.guest.enc_kexec_begin(true);
+	x86_platform.guest.enc_kexec_finish();
+
 	crash_save_cpu(regs, safe_smp_processor_id());
 }
 
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f3130f762784..097313147ad3 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -12,6 +12,7 @@
 #include <linux/delay.h>
 #include <linux/objtool.h>
 #include <linux/pgtable.h>
+#include <linux/kexec.h>
 #include <acpi/reboot.h>
 #include <asm/io.h>
 #include <asm/apic.h>
@@ -716,6 +717,14 @@ static void native_machine_emergency_restart(void)
 
 void native_machine_shutdown(void)
 {
+	/*
+	 * Call enc_kexec_begin() while all CPUs are still active and
+	 * interrupts are enabled. This will allow all in-flight memory
+	 * conversions to finish cleanly.
+	 */
+	if (kexec_in_progress)
+		x86_platform.guest.enc_kexec_begin(false);
+
 	/* Stop the cpus and apics */
 #ifdef CONFIG_X86_IO_APIC
 	/*
@@ -752,6 +761,9 @@ void native_machine_shutdown(void)
 #ifdef CONFIG_X86_64
 	x86_platform.iommu_shutdown();
 #endif
+
+	if (kexec_in_progress)
+		x86_platform.guest.enc_kexec_finish();
 }
 
 static void __machine_emergency_restart(int emergency)
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index a7143bb7dd93..8a79fb505303 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -138,6 +138,8 @@ static int enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool
 static int enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return 0; }
 static bool enc_tlb_flush_required_noop(bool enc) { return false; }
 static bool enc_cache_flush_required_noop(void) { return false; }
+static void enc_kexec_begin_noop(bool crash) {}
+static void enc_kexec_finish_noop(void) {}
 static bool is_private_mmio_noop(u64 addr) {return false; }
 
 struct x86_platform_ops x86_platform __ro_after_init = {
@@ -161,6 +163,8 @@ struct x86_platform_ops x86_platform __ro_after_init = {
 		.enc_status_change_finish  = enc_status_change_finish_noop,
 		.enc_tlb_flush_required	   = enc_tlb_flush_required_noop,
 		.enc_cache_flush_required  = enc_cache_flush_required_noop,
+		.enc_kexec_begin	   = enc_kexec_begin_noop,
+		.enc_kexec_finish	   = enc_kexec_finish_noop,
 	},
 };
 
-- 
2.43.0


^ permalink raw reply related

* [PATCHv11 19/19] ACPI: tables: Print MULTIPROC_WAKEUP when MADT is parsed
From: Kirill A. Shutemov @ 2024-05-28  9:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
	Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
	Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
	Huang, Kai, Ard Biesheuvel, Baoquan He, H. Peter Anvin,
	Kirill A. Shutemov, K. Y. Srinivasan, Haiyang Zhang, kexec,
	linux-hyperv, linux-acpi, linux-coco, linux-kernel, Tao Liu
In-Reply-To: <20240528095522.509667-1-kirill.shutemov@linux.intel.com>

When MADT is parsed, print MULTIPROC_WAKEUP information:

ACPI: MP Wakeup (version[1], mailbox[0x7fffd000], reset[0x7fffe068])

This debug information will be very helpful during bring up.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Tao Liu <ltao@redhat.com>
---
 drivers/acpi/tables.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index b976e5fc3fbc..9e1b01c35070 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -198,6 +198,20 @@ void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
 		}
 		break;
 
+	case ACPI_MADT_TYPE_MULTIPROC_WAKEUP:
+		{
+			struct acpi_madt_multiproc_wakeup *p =
+				(struct acpi_madt_multiproc_wakeup *)header;
+			u64 reset_vector = 0;
+
+			if (p->version >= ACPI_MADT_MP_WAKEUP_VERSION_V1)
+				reset_vector = p->reset_vector;
+
+			pr_debug("MP Wakeup (version[%d], mailbox[%#llx], reset[%#llx])\n",
+				 p->version, p->mailbox_address, reset_vector);
+		}
+		break;
+
 	case ACPI_MADT_TYPE_CORE_PIC:
 		{
 			struct acpi_madt_core_pic *p = (struct acpi_madt_core_pic *)header;
-- 
2.43.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox