linux-arm-kernel.lists.infradead.org archive mirror
From: Nuno Das Neves <nunodasneves@linux.microsoft.com>
To: Michael Kelley <mhklinux@outlook.com>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>
Cc: "kys@microsoft.com" <kys@microsoft.com>,
	"haiyangz@microsoft.com" <haiyangz@microsoft.com>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	"decui@microsoft.com" <decui@microsoft.com>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"arnd@arndb.de" <arnd@arndb.de>,
	"jinankjain@linux.microsoft.com" <jinankjain@linux.microsoft.com>,
	"muminulrussell@gmail.com" <muminulrussell@gmail.com>,
	"skinsburskii@linux.microsoft.com"
	<skinsburskii@linux.microsoft.com>,
	"mrathor@linux.microsoft.com" <mrathor@linux.microsoft.com>,
	"ssengar@linux.microsoft.com" <ssengar@linux.microsoft.com>,
	"apais@linux.microsoft.com" <apais@linux.microsoft.com>,
	"Tianyu.Lan@microsoft.com" <Tianyu.Lan@microsoft.com>,
	"stanislav.kinsburskiy@gmail.com"
	<stanislav.kinsburskiy@gmail.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"prapal@linux.microsoft.com" <prapal@linux.microsoft.com>,
	"muislam@microsoft.com" <muislam@microsoft.com>,
	"anrayabh@linux.microsoft.com" <anrayabh@linux.microsoft.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"corbet@lwn.net" <corbet@lwn.net>
Subject: Re: [PATCH v5 10/10] Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs
Date: Wed, 19 Mar 2025 11:04:22 -0700	[thread overview]
Message-ID: <9791bc6b-d8ad-47ee-8c54-7230d044f8d5@linux.microsoft.com> (raw)
In-Reply-To: <BN7PR02MB4148D51FFF965676AD155A3ED4D92@BN7PR02MB4148.namprd02.prod.outlook.com>

On 3/19/2025 8:26 AM, Michael Kelley wrote:
> From: Michael Kelley <mhklinux@outlook.com> Sent: Tuesday, March 18, 2025 7:10 PM
>>
>> From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Tuesday, March
>> 18, 2025 5:34 PM
>>>
>>> On 3/17/2025 4:51 PM, Michael Kelley wrote:
>>>> From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Wednesday, February 26, 2025 3:08 PM
> 
> [snip]
> 
>>>>> +
>>>>> +	region = mshv_partition_region_by_gfn(partition, mem.guest_pfn);
>>>>> +	if (!region)
>>>>> +		return -EINVAL;
>>> <snip>
>>>>> +	case MSHV_GPAP_ACCESS_TYPE_ACCESSED:
>>>>> +		hv_type_mask = 1;
>>>>> +		if (args.access_op == MSHV_GPAP_ACCESS_OP_CLEAR) {
>>>>> +			hv_flags.clear_accessed = 1;
>>>>> +			/* not accessed implies not dirty */
>>>>> +			hv_flags.clear_dirty = 1;
>>>>> +		} else { // MSHV_GPAP_ACCESS_OP_SET
>>>>
>>>> Avoid C++ style comments.
>>>>
>>> Ack
>>>
>>>>> +			hv_flags.set_accessed = 1;
>>>>> +		}
>>>>> +		break;
>>>>> +	case MSHV_GPAP_ACCESS_TYPE_DIRTY:
>>>>> +		hv_type_mask = 2;
>>>>> +		if (args.access_op == MSHV_GPAP_ACCESS_OP_CLEAR) {
>>>>> +			hv_flags.clear_dirty = 1;
>>>>> +		} else { // MSHV_GPAP_ACCESS_OP_SET
>>>>
>>>> Same here.
>>>>
>>> Ack
>>>
>>>>> +			hv_flags.set_dirty = 1;
>>>>> +			/* dirty implies accessed */
>>>>> +			hv_flags.set_accessed = 1;
>>>>> +		}
>>>>> +		break;
>>>>> +	}
>>>>> +
>>>>> +	states = vzalloc(states_buf_sz);
>>>>> +	if (!states)
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	ret = hv_call_get_gpa_access_states(partition->pt_id, args.page_count,
>>>>> +					    args.gpap_base, hv_flags, &written,
>>>>> +					    states);
>>>>> +	if (ret)
>>>>> +		goto free_return;
>>>>> +
>>>>> +	/*
>>>>> +	 * Overwrite states buffer with bitmap - the bits in hv_type_mask
>>>>> +	 * correspond to bitfields in hv_gpa_page_access_state
>>>>> +	 */
>>>>> +	for (i = 0; i < written; ++i)
>>>>> +		assign_bit(i, (ulong *)states,
>>>>
>>>> Why the cast to ulong *?  I think this argument to assign_bit() is void *, in
>>>> which case the cast wouldn't be needed.
>>>>
>>> It looks like assign_bit() and friends resolve to a set of functions which do
>>> take an unsigned long pointer, e.g.:
>>>
>>> __set_bit() -> generic___set_bit(unsigned long nr, volatile unsigned long *addr)
>>> set_bit() -> arch_set_bit(unsigned int nr, volatile unsigned long *p)
>>> etc...
>>>
>>> So a cast is necessary.
>>
>> Indeed, you are right.  Seems like set_bit() and friends should take a void *.
>> But that's a different kettle of fish.
>>
>>>
>>>> Also, assign_bit() does atomic bit operations. Doing such in a loop like
>>>> here will really hammer the hardware memory bus with atomic
>>>> read-modify-write cycles. Use __assign_bit() instead, which does
>>>> non-atomic operations. You don't need atomic here as no other
>>>> threads are modifying the bit array.
>>>>
>>> I didn't realize it was atomic. I'll change it to __assign_bit().
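The following is a userspace sketch of the atomic vs non-atomic difference, for illustration only. WORD_OF(), MASK_OF(), assign_bit_plain() and assign_bit_atomic() are hypothetical names, and C11 atomics stand in for the kernel's bitops. The sketch assumes that the generic __assign_bit() reduces to one ordinary read-modify-write of an unsigned long word, while assign_bit() performs an atomic read-modify-write (a locked instruction on x86) on every call.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel's BIT_WORD()/BIT_MASK(). */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define WORD_OF(nr)    ((nr) / BITS_PER_ULONG)
#define MASK_OF(nr)    (1UL << ((nr) % BITS_PER_ULONG))

/* Non-atomic: a plain load, modify and store, modelling __assign_bit(). */
static void assign_bit_plain(unsigned long nr, unsigned long *addr, bool value)
{
	if (value)
		addr[WORD_OF(nr)] |= MASK_OF(nr);
	else
		addr[WORD_OF(nr)] &= ~MASK_OF(nr);
}

/* Atomic: one atomic read-modify-write per call, modelling assign_bit(). */
static void assign_bit_atomic(unsigned long nr, _Atomic unsigned long *addr,
			      bool value)
{
	if (value)
		atomic_fetch_or(&addr[WORD_OF(nr)], MASK_OF(nr));
	else
		atomic_fetch_and(&addr[WORD_OF(nr)], ~MASK_OF(nr));
}
```

With a single thread, both variants leave identical words. The atomic one only adds the cost of the locked operation, which is why the non-atomic form suits a buffer that no other thread can touch.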
>>>
>>>>> +			   states[i].as_uint8 & hv_type_mask);
>>>>
>>>> OK, so the starting contents of "states" is an array of bytes. The ending
>>>> contents is an array of bits. This works because every bit in the ending
>>>> bit array is set to either 0 or 1. Overlap occurs on the first iteration
>>>> where the code reads the 0th byte, and writes the 0th bit, which is part of
>>>> the 0th byte. The second iteration reads the 1st byte, and writes the 1st bit,
>>>> which doesn't overlap, and there's no overlap from then on.
>>>>
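The overlap argument can be checked with a small userspace model of the in-place conversion. pack_states_in_place() is a hypothetical helper, not the patch's code. It addresses bit i as bit (i % 8) of byte (i / 8). That matches the kernel's unsigned long bitmap layout only on a little-endian CPU, such as x86_64 or arm64, which is assumed here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: 'states' starts as one state byte per page and is
 * overwritten in place with one bit per page. Assumes little-endian bitmap
 * layout, so bitmap bit i lives in byte i / 8.
 */
static void pack_states_in_place(uint8_t *states, size_t written,
				 uint8_t type_mask)
{
	for (size_t i = 0; i < written; ++i) {
		/*
		 * Byte i is read before any write can reach it: earlier
		 * iterations j < i only wrote byte j / 8, which is < i.
		 */
		int set = (states[i] & type_mask) != 0;
		uint8_t bit = (uint8_t)(1u << (i % 8));

		if (set)
			states[i / 8] |= bit;
		else
			states[i / 8] &= (uint8_t)~bit;
	}
}
```

With states = {0x01, 0x06, 0x03, 0x01, 0x00, 0x02, 0x00, 0x01, 0x03, 0x02}, written = 10 and a mask of 0x01, byte 0 becomes 0x8D and byte 1 becomes 0x05. Bits 0 and 1 of byte 1 are real results. Bit 2 is left over from the original state byte 0x06, however, which is exactly the stale-bit problem described in the next paragraph.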
>>>> Suppose "written" is not a multiple of 8. The last byte of "states" as an
>>>> array of bits will have some bits that have not been set to either 0 or 1 and
>>>> might be leftover garbage from when "states" was an array of bytes. That
>>>> garbage will get copied to user space. Is that OK? Even if user space knows
>>>> enough to ignore those bits, it seems a little dubious to be copying even
>>>> a few bits of garbage to user space.
>>>>
>>>> Some comments might help here.
>>>>
>>> This is a good point. The expectation is indeed that userspace knows which
>>> bits are valid from the returned "written" value, but I agree it's a bit
>>> odd to have some garbage bits in the last byte. How does this look (to be
>>> inserted here directly after the loop):
>>>
>>> +       /* zero the unused bits in the last byte of the returned bitmap */
>>> +       if (written > 0) {
>>> +               u8 last_bits_mask;
>>> +               int last_byte_idx;
>>> +               int bits_rem = written % 8;
>>> +
>>> +               /* bits_rem == 0 when all bits in the last byte were assigned */
>>> +               if (bits_rem > 0) {
>>> +                       /* written > 0 ensures last_byte_idx >= 0 */
>>> +                       last_byte_idx = ((written + 7) / 8) - 1;
>>> +                       /* bits_rem > 0 ensures this masks 1 to 7 bits */
>>> +                       last_bits_mask = (1 << bits_rem) - 1;
>>> +                       states[last_byte_idx].as_uint8 &= last_bits_mask;
>>> +               }
>>> +       }
>>
>> A simpler approach is to "continue" the previous loop.  And if "written"
>> is zero, this additional loop won't do anything either:
>>
>> 	for (i = written; i < ALIGN(written, 8); ++i)
>> 		__clear_bit(i, (ulong *)states);
>>
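The two tail-clearing variants can be compared with a small userspace model. The helper names are hypothetical, ALIGN_UP() stands in for the kernel's ALIGN() with a power-of-two alignment, and the bitmap is treated byte-wise, assuming a little-endian layout. clear_tail_loop() follows the two-line loop above, with __clear_bit() replaced by a plain byte operation. clear_tail_mask() follows the masking block proposed earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ALIGN(); 'a' must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Loop variant: keep clearing bits up to the next byte boundary. */
static void clear_tail_loop(uint8_t *bitmap, size_t written)
{
	for (size_t i = written; i < ALIGN_UP(written, 8); ++i)
		bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* Mask variant: keep only the low 'written % 8' bits of the last byte. */
static void clear_tail_mask(uint8_t *bitmap, size_t written)
{
	size_t bits_rem = written % 8;

	if (written > 0 && bits_rem > 0)
		bitmap[(written + 7) / 8 - 1] &= (uint8_t)((1u << bits_rem) - 1);
}
```

Both helpers leave the same bytes for every value of written. That includes 0 and exact multiples of 8, where neither helper touches the buffer. The loop form is shorter and needs no special cases.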
> One further thought here: Could "written" be less than
> args.page_count at this point? That would require
> hv_call_get_gpa_access_states() to not fail, but still return
> a value for written that is less than args.page_count. If that
> could happen, then the above loop should be:
> 
> 	for (i = written; i < bitmap_buf_sz * 8; ++i)
> 		__clear_bit(i, (ulong *)states);
> 
> so that all the uninitialized bits and bytes that will be written
> back to user space are cleared.
>
Hmmm...now I'm not so sure where the need for "written" came from in
the first place - in practice "written" will always be equal to
args.page_count except on error, but in that case there's a goto
free_return anyway, so the number is never copied to userspace. And
I checked the userspace code - it doesn't expect a partial result
either.

So it seems to be redundant, but I don't really want to remove it just
now.

Your suggestion with bitmap_buf_sz * 8 should be fine, and will make it
straightforward to remove "written" in a future cleanup if that ends up
looking like a good idea.
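Below is a sketch of the bitmap_buf_sz * 8 bound. It assumes that bitmap_buf_sz is the number of bytes copied back by copy_to_user(), i.e. page_count rounded up to whole bytes. That definition is not visible in the quoted code, so bitmap_bytes_for() and clear_unwritten_bits() are hypothetical userspace helpers. The bitmap is again modelled byte-wise, which matches the kernel's unsigned long bitmap layout on little-endian CPUs.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: bitmap_buf_sz == DIV_ROUND_UP(page_count, 8). */
static size_t bitmap_bytes_for(size_t page_count)
{
	return (page_count + 7) / 8;
}

/*
 * Clear every bit from 'written' up to the end of the region that is
 * copied to user space, so a short 'written' leaves no stale bits or bytes.
 */
static void clear_unwritten_bits(uint8_t *bitmap, size_t written,
				 size_t bitmap_buf_sz)
{
	for (size_t i = written; i < bitmap_buf_sz * 8; ++i)
		bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}
```

For page_count = 20 (3 bytes) and written = 5, only bits 0 to 4 survive: byte 0 becomes 0x1F and bytes 1 and 2 become zero. Anything past bitmap_buf_sz is left alone because it is never copied out.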

>>>
>>> The remaining bytes could be memset() to zero but I think it's fine to leave
>>> them.
>>
>> I agree.  The remaining bytes aren't written back to user space anyway
>> since the copy_to_user() uses bitmap_buf_sz.
> 
> Maybe I misunderstood what you meant by "remaining bytes".  I think
> all bits and bytes that are written back to user space should have
> valid data or zeros so that no garbage is written back.
> 
Agreed.

Nuno

> Michael
> 
>>
>>>
>>>>> +
>>>>> +	args.page_count = written;
>>>>> +
>>>>> +	if (copy_to_user(user_args, &args, sizeof(args))) {
>>>>> +		ret = -EFAULT;
>>>>> +		goto free_return;
>>>>> +	}
>>>>> +	if (copy_to_user((void __user *)args.bitmap_ptr, states, bitmap_buf_sz))
>>>>> +		ret = -EFAULT;
>>>>> +
>>>>> +free_return:
>>>>> +	vfree(states);
>>>>> +	return ret;
>>>>> +}




Thread overview: 107+ messages
2025-02-26 23:07 [PATCH v5 00/10] Introduce /dev/mshv root partition driver Nuno Das Neves
2025-02-26 23:07 ` [PATCH v5 01/10] hyperv: Convert Hyper-V status codes to strings Nuno Das Neves
2025-02-26 23:26   ` Stanislav Kinsburskii
2025-02-27  4:22   ` Easwar Hariharan
2025-02-27 23:48     ` Nuno Das Neves
2025-02-27 17:02   ` Roman Kisel
2025-02-27 22:54     ` Easwar Hariharan
2025-02-27 23:08       ` Roman Kisel
2025-02-27 23:25         ` Easwar Hariharan
2025-02-28 17:20           ` Roman Kisel
2025-02-28 20:22             ` Easwar Hariharan
2025-02-28 22:26               ` Roman Kisel
2025-02-27 23:21       ` Roman Kisel
2025-02-28  0:15     ` Nuno Das Neves
2025-02-28 16:40       ` Roman Kisel
2025-03-06 17:57   ` Michael Kelley
2025-03-06 18:09     ` Michael Kelley
2025-03-06 18:40       ` Nuno Das Neves
2025-03-06 18:57         ` Michael Kelley
2025-03-07 19:38     ` Nuno Das Neves
2025-02-26 23:07 ` [PATCH v5 02/10] x86/mshyperv: Add support for extended Hyper-V features Nuno Das Neves
2025-02-26 23:27   ` Stanislav Kinsburskii
2025-02-27 17:59   ` Roman Kisel
2025-02-28  0:17     ` Nuno Das Neves
2025-02-28 16:42       ` Roman Kisel
2025-02-27 18:17   ` Easwar Hariharan
2025-03-06 18:30   ` Michael Kelley
2025-03-12 18:04     ` Nuno Das Neves
2025-03-10 13:17   ` Tianyu Lan
2025-02-26 23:07 ` [PATCH v5 03/10] arm64/hyperv: Add some missing functions to arm64 Nuno Das Neves
2025-02-26 23:27   ` Stanislav Kinsburskii
2025-02-27  5:56   ` Easwar Hariharan
2025-02-28  0:21     ` Nuno Das Neves
2025-03-06 19:05       ` Michael Kelley
2025-03-07 21:36         ` Nuno Das Neves
2025-03-07 21:55           ` Easwar Hariharan
2025-02-27 18:09   ` Roman Kisel
2025-02-26 23:07 ` [PATCH v5 04/10] hyperv: Introduce hv_recommend_using_aeoi() Nuno Das Neves
2025-02-26 23:28   ` Stanislav Kinsburskii
2025-02-27 18:04   ` Roman Kisel
2025-02-28  0:21     ` Nuno Das Neves
2025-02-27 23:03   ` Easwar Hariharan
2025-02-28  0:33     ` Nuno Das Neves
2025-02-28  0:49       ` Easwar Hariharan
2025-03-06 19:12   ` Michael Kelley
2025-03-10 12:51   ` Tianyu Lan
2025-02-26 23:07 ` [PATCH v5 05/10] acpi: numa: Export node_to_pxm() Nuno Das Neves
2025-02-26 23:31   ` Stanislav Kinsburskii
2025-02-27 23:05   ` Easwar Hariharan
2025-03-06 19:16   ` Michael Kelley
2025-03-10 12:50   ` Tianyu Lan
2025-02-26 23:08 ` [PATCH v5 06/10] Drivers/hv: Export some functions for use by root partition module Nuno Das Neves
2025-02-26 23:32   ` Stanislav Kinsburskii
2025-02-27 18:11   ` Roman Kisel
2025-02-28  0:51   ` Easwar Hariharan
2025-03-06 19:23   ` Michael Kelley
2025-03-07 21:38     ` Nuno Das Neves
2025-02-26 23:08 ` [PATCH v5 07/10] Drivers: hv: Introduce per-cpu event ring tail Nuno Das Neves
2025-02-26 23:39   ` Stanislav Kinsburskii
2025-03-07 17:02   ` Michael Kelley
2025-03-07 22:06     ` Nuno Das Neves
2025-03-07 23:21       ` Michael Kelley
2025-03-07 23:31         ` Nuno Das Neves
2025-03-07 23:37         ` Michael Kelley
2025-03-10 13:01   ` Tianyu Lan
2025-03-12 19:44     ` Nuno Das Neves
2025-03-13  7:34       ` Tianyu Lan
2025-03-13 15:56         ` Nuno Das Neves
2025-03-13 16:00           ` Tianyu Lan
2025-02-26 23:08 ` [PATCH v5 08/10] x86: hyperv: Add mshv_handler irq handler and setup function Nuno Das Neves
2025-02-26 23:43   ` Stanislav Kinsburskii
2025-03-01  0:38     ` Nuno Das Neves
2025-03-07 17:38       ` Michael Kelley
2025-03-10 21:46         ` Nuno Das Neves
2025-03-10 22:23           ` Michael Kelley
2025-03-07 17:44   ` Michael Kelley
2025-03-07 23:29     ` Nuno Das Neves
2025-03-07 23:45       ` Michael Kelley
2025-02-26 23:08 ` [PATCH v5 09/10] hyperv: Add definitions for root partition driver to hv headers Nuno Das Neves
2025-02-26 23:51   ` Stanislav Kinsburskii
2025-03-01  0:46     ` Nuno Das Neves
2025-02-27 18:13   ` Roman Kisel
2025-02-28  1:27   ` Easwar Hariharan
2025-03-01  0:52     ` Nuno Das Neves
2025-03-07 17:26   ` Michael Kelley
2025-03-07 23:35     ` Nuno Das Neves
2025-03-10 12:40   ` Tianyu Lan
2025-03-12 20:17     ` Nuno Das Neves
     [not found] ` <1740611284-27506-11-git-send-email-nunodasneves@linux.microsoft.com>
2025-02-27  4:59   ` [PATCH v5 10/10] Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs Easwar Hariharan
2025-03-01  1:29     ` Nuno Das Neves
2025-02-27 18:50   ` Roman Kisel
2025-03-01  1:38     ` Nuno Das Neves
2025-03-06 17:32     ` Wei Liu
2025-03-07 18:06       ` Roman Kisel
2025-03-11 18:01   ` Jeff Johnson
2025-03-14 19:25     ` Nuno Das Neves
2025-03-13 16:43   ` Michael Kelley
2025-03-14  2:15     ` Nuno Das Neves
2025-03-14  3:27       ` Michael Kelley
2025-03-17 23:51   ` Michael Kelley
2025-03-18 17:24     ` Wei Liu
2025-03-18 17:45       ` Michael Kelley
2025-03-18 20:07         ` Wei Liu
2025-03-19  0:34     ` Nuno Das Neves
2025-03-19  2:10       ` Michael Kelley
2025-03-19 15:26         ` Michael Kelley
2025-03-19 18:04           ` Nuno Das Neves [this message]
