public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Dexuan Cui <decui@microsoft.com>
Cc: Jason Wang <jasowang@redhat.com>,
	"gregkh\@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"driverdev-devel\@linuxdriverproject.org" 
	<driverdev-devel@linuxdriverproject.org>,
	"olaf\@aepfle.de" <olaf@aepfle.de>,
	"apw\@canonical.com" <apw@canonical.com>,
	KY Srinivasan <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>
Subject: Re: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer failure
Date: Mon, 01 Dec 2014 10:55:02 +0100	[thread overview]
Message-ID: <871tojzocp.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <F792CF86EFE20D4AB8064279AFBA51C613E691B8@HKNPRD3002MB017.064d.mgd.msft.net> (Dexuan Cui's message of "Mon, 1 Dec 2014 09:47:14 +0000")

Dexuan Cui <decui@microsoft.com> writes:

>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Monday, December 1, 2014 16:23 PM
>> To: Dexuan Cui
>> Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org; driverdev-
>> devel@linuxdriverproject.org; olaf@aepfle.de; apw@canonical.com; KY
>> Srinivasan; vkuznets@redhat.com; Haiyang Zhang
>> Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer
>> failure
>> On Fri, Nov 28, 2014 at 7:54 PM, Dexuan Cui <decui@microsoft.com> wrote:
>> >>  -----Original Message-----
>> >>  From: Jason Wang [mailto:jasowang@redhat.com]
>> >>  Sent: Friday, November 28, 2014 18:13 PM
>> >>  To: Dexuan Cui
>> >>  Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
>> >> driverdev-
>> >>  devel@linuxdriverproject.org; olaf@aepfle.de; apw@canonical.com; KY
>> >>  Srinivasan; vkuznets@redhat.com; Haiyang Zhang
>> >>  Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on
>> >> transfer
>> >>  failure
>> >>  On Fri, Nov 28, 2014 at 4:36 PM, Dexuan Cui <decui@microsoft.com>
>> >> wrote:
>> >>  >>  -----Original Message-----
>> >>  >>  From: Jason Wang [mailto:jasowang@redhat.com]
>> >>  >>  Sent: Friday, November 28, 2014 14:47 PM
>> >>  >>  To: Dexuan Cui
>> >>  >>  Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
>> >>  >> driverdev-
>> >>  >>  devel@linuxdriverproject.org; olaf@aepfle.de;
>> >> apw@canonical.com; KY
>> >>  >>  Srinivasan; vkuznets@redhat.com; Haiyang Zhang
>> >>  >>  Subject: Re: [PATCH v3] hv: hv_fcopy: drop the obsolete message
>> >> on
>> >>  >> transfer
>> >>  >>  failure
>> >>  >>  On Thu, Nov 27, 2014 at 9:09 PM, Dexuan Cui
>> >> <decui@microsoft.com>
>> >>  >> wrote:
>> >>  >>  > In the case the user-space daemon crashes, hangs or is
>> >> killed, we
>> >>  >>  > need to down the semaphore, otherwise, after the daemon starts
>> >>  >> next
>> >>  >>  > time, the obsolete data in fcopy_transaction.message or
>> >>  >>  > fcopy_transaction.fcopy_msg will be used immediately.
>> >>  >>  >
>> >>  >>  > Cc: Jason Wang <jasowang@redhat.com>
>> >>  >>  > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>> >>  >>  > Cc: K. Y. Srinivasan <kys@microsoft.com>
>> >>  >>  > Signed-off-by: Dexuan Cui <decui@microsoft.com>
>> >>  >>  > ---
>> >>  >>  >
>> >>  >>  > v2: I removed the "FCP" prefix as Greg asked.
>> >>  >>  >
>> >>  >>  >     I also updated the output message a little:
>> >>  >>  >     "FCP: failed to acquire the semaphore" -->
>> >>  >>  >     "can not acquire the semaphore: it is benign"
>> >>  >>  >
>> >>  >>  > v3: I added the code in fcopy_release() as Jason Wang
>> >> suggested.
>> >>  >>  >     I removed the pr_debug (it isn't so meaningful)and added a
>> >>  >>  > comment instead.
>> >>  >>  >
>> >>  >>  >  drivers/hv/hv_fcopy.c | 19 +++++++++++++++++++
>> >>  >>  >  1 file changed, 19 insertions(+)
>> >>  >>  >
>> >>  >>  > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
>> >>  >>  > index 23b2ce2..faa6ba6 100644
>> >>  >>  > --- a/drivers/hv/hv_fcopy.c
>> >>  >>  > +++ b/drivers/hv/hv_fcopy.c
>> >>  >>  > @@ -86,6 +86,18 @@ static void fcopy_work_func(struct
>> >> work_struct
>> >>  >>  > *dummy)
>> >>  >>  >  	 * process the pending transaction.
>> >>  >>  >  	 */
>> >>  >>  >  	fcopy_respond_to_host(HV_E_FAIL);
>> >>  >>  > +
>> >>  >>  > +	/* In the case the user-space daemon crashes, hangs or is
>> >>  >> killed, we
>> >>  >>  > +	 * need to down the semaphore, otherwise, after the daemon
>> >>  >> starts
>> >>  >>  > next
>> >>  >>  > +	 * time, the obsolete data in fcopy_transaction.message or
>> >>  >>  > +	 * fcopy_transaction.fcopy_msg will be used immediately.
>> >>  >>  > +	 *
>> >>  >>  > +	 * NOTE: fcopy_read() happens to get the semaphore (very
>> >> rare)?
>> >>  >>  > We're
>> >>  >>  > +	 * still OK, because we've reported the failure to the host.
>> >>  >>  > +	 */
>> >>  >>  > +	if (down_trylock(&fcopy_transaction.read_sema))
>> >>  >>  > +		;
>> >>  >>
>> >>  >>  Sorry, I'm not quite understand how if () ; can help here.
>> >>  >>
>> >>  >>  Btw, a question not relate to this patch.
>> >>  >>
>> >>  >>  What happens if a daemon is resume from SIGSTOP and expires the
>> >>  >> check
>> >>  >>  here?
>> >>  > Hi Jason,
>> >>  > My idea is: here we need down_trylock(), but in case we can't get
>> >> the
>> >>  > semaphore, it's OK anyway:
>> >>  >
>> >>  > Scenario 1):
>> >>  > 1.1: when the daemon is blocked on the pread(), the daemon
>> >> receives
>> >>  > SIGSTOP;
>> >>  > 1.2: the host user runs the PowerShell Copy-VMFile command;
>> >>  > 1.3.1: the driver reports the failure to the host user in 5s and
>> >>  > 1.3.2: the driver down()-es the semaphore;
>> >>  > 1.4: the daemon receives SIGCONT and it will be still blocked on
>> >> the
>> >>  > pread().
>> >>  > Without the down_trylock(), in 1.4, the daemon can receive an
>> >>  > obsolete message.
>> >>  > NOTE: in this scenario, the daemon is not killed.
>> >>  >
>> >>  > Scenario 2):
>> >>  > In senario 1), if the daemon receives SIGCONT between 1.3.1 and
>> >> 1.3.2
>> >>  > and
>> >>  > do down() in fcopy_read(), it will receive the message but: the
>> >>  > driver has
>> >>  > reported the failure to the host user and the driver's 1.3.2 can't
>> >>  > get the
>> >>  > semaphore -- IMO this is acceptably OK, though in the VM, an
>> >>  > incomplete
>> >>  > file will be left there.
>> >>  > BTW, I think in the daemon's hv_start_fcopy() we should add a
>> >>  > close(target_fd) before open()-ing a new one.
>> >>
>> >>  Right, but how about the case when resuming from SIGSTOP but no
>> >> timeout?
>> > Sorry, I don't understand this:
>> > if no timeout, fcopy_read() will get the semaphore and fcopy_write()
>> > will try to cancel fcopy_work.
>> 
>> Yes.
>> >
>> >
>> >>  Looks like in this case userspace() may wait in down_interruptible()
>> >>  until timeout. We probably need something like this:
>> >>
>> >>          if (down_interruptible(&fcopy_transaction.read_sema)) {
>> >>                  up(&fcopy_transaction.read_sema);
>> >>                  return -EINTR;
>> >>          }
>> > until "timeout"?
>> > if the daemon can't get the semaphore, it can only be wake by a
>> > signal(the
>> > daemon doesn't install handler, so by default most signals will kill
>> > the daemon).
>> > In case a signal waking up the daemon doesn't kill the daemon, why
>> > should
>> > we do up()?
>> 
>> True, no need since we do down_trylock() in release().
>> 
>> Btw, there's no EINTR handling in handling pread() return value,
>> may add such one which should be useful for something like debugging.
>> 
>> >
>> >
>> >>
>> >>  This should synchronize with the timeout work for sure.
>> >>  But how about only schedule it after this?
>> >>  It does not may sense to start the timer during interrupt
>> >>  since the file may not even opened and it may take time
>> >>  to handle signals?
>> >>
>> >>  >
>> >>  >>  >
>> >>  >>  > +
>> >>  >>  >  }
>> >>  >>  >
>> >>  >>  >  static int fcopy_handle_handshake(u32 version)
>> >>  >>  > @@ -351,6 +363,13 @@ static int fcopy_release(struct inode
>> >> *inode,
>> >>  >>  > struct file *f)
>> >>  >>  >  	 */
>> >>  >>  >  	in_hand_shake = true;
>> >>  >>  >  	opened = false;
>> >>  >>  > +
>> >>  >>  > +	if (cancel_delayed_work_sync(&fcopy_work)) {
>> >>  >>  > +		/* We haven't up()-ed the semaphore(very rare)? */
>> >>  >>  > +		if (down_trylock(&fcopy_transaction.read_sema))
>> >>  >>  > +			;
>> >>  >>
>> >>  >>  And this.
>> >>  >
>> >>  > Scenario 3):
>> >>  > When the daemon exits(e.g., SIGKILL received), if there is a
>> >>  > fcopy_work
>> >>  > pending (scheduled but not start to run yet), we should cancel the
>> >>  > work (as you suggested) and down() the semaphore, otherwise, the
>> >>  > obsolete message will be received by the next instance of the
>> >> daemon.
>> >>
>> >>  Yes
>> >>  >
>> >>  >
>> >>  > Scenario 4):  in the driver's hv_fcopy_onchannelcallback():
>> >>  >         schedule_delayed_work(&fcopy_work, 5*HZ);
>> >>  >         ----> if fcopy_release() is running on another vcpu, just
>> >>  > before the next line?
>> >>  >         fcopy_send_data();
>> >>  >
>> >>  > In this case, fcopy_release() can cancel fcopy_work, but
>> >>  > can't get the semaphore since it hasn't been up()-ed.
>> >>  > Hmm, in this case,   fcopy_send_data() will do up()  later, and
>> >> we'll
>> >>  > buffer an obsolete message in the driver, and the message will be
>> >>  > fetched by the next instance of the daemon...
>> >>  >
>> >>  > Looks we need a spinlock here?
>> >>
>> >>  Unless fcopy_release() can wait for all data for current transation
>> >>  to be received. Spinlock won't help.
>> >>
>> >>  But an idea is let the daemon the handle such cases. E.g make sure
>> >> the
>> >>  processing begins with START_COPY and end with
>> COMPLETE/CANCEL_COPY.
>> >>  Drop all requests that does not start with START_COPY.
>> >>
>> >>  Thought?
>> > Good idea.
>> > I also think we should reinforce the concept of state machine in the
>> > daemon code.
>> 
>> Yes, it needs.
> I agree.
> Obviously we can do something to make the daemon/driver work better
> in the corner cases.
>
>> >
>> > The daemon/driver communication has so many corner cases...
>> 
>> Looks so, let's first address the issue mentioned in this patch.
> OK.
>
>> I don't have any more comments other than changing
>> 
>>  if(down_trylock(&fcopy_transaction.read_sema))
>>      ;
>> 
>> to
>> 
>> down_trylock(&fcopy_transaction.read_sema);
> Hi Jason,
> This is to address Vitaly's comment in the bugzilla:
> https://bugzilla.redhat.com/show_bug.cgi?id=1162100#c5
>
> down_trylock(&fcopy_transaction.read_sema) will
>
> "
> produces the following compile warning:
> drivers/hv/hv_fcopy.c: In function ‘fcopy_work_func’:
> drivers/hv/hv_fcopy.c:95:2: warning: ignoring return value of ‘down_trylock’, declared with attribute warn_unused_result [-Wunused-result]
>   (void)down_trylock(&fcopy_transaction.read_sema);
> "
>
> Actually I personally don't care about the warning, because we only
> see it when we run some kind of code checker program.  :-)
>
> I can change my v3  to the "normal" style you prefer, if
> there is no strong objection from Vitaly?

Well, I still prefer pr_debug() there: these are being compiled out on
non-debug builds. I'm not objecting to any solution here though, really
a minor issue.

>
> Thanks,
> -- Dexuan

-- 
  Vitaly

  reply	other threads:[~2014-12-01  9:55 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-27 13:09 [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer failure Dexuan Cui
2014-11-28  6:47 ` Jason Wang
2014-11-28  8:36   ` Dexuan Cui
     [not found]     ` <F792CF86EFE20D4AB8064279AFBA51C613E66566@HKNPRD3002MB017.064d.mgd.msft.net >
2014-11-28 10:12       ` Jason Wang
2014-11-28 11:54         ` Dexuan Cui
     [not found]           ` <F792CF86EFE20D4AB8064279AFBA51C613E66AAA@HKNPRD3002MB017.064d.mgd.msft.net >
2014-12-01  8:22             ` Jason Wang
2014-12-01  9:47               ` Dexuan Cui
2014-12-01  9:55                 ` Vitaly Kuznetsov [this message]
     [not found]                 ` <F792CF86EFE20D4AB8064279AFBA51C613E691B8@HKNPRD3002MB017.064d.mgd.msft.net >
2014-12-01 10:18                   ` Jason Wang
2014-12-01 11:00                     ` Dexuan Cui
2014-12-01 15:54                       ` KY Srinivasan
2014-12-02  5:33                         ` Dexuan Cui
     [not found]                           ` <F792CF86EFE20D4AB8064279AFBA51C613E6B93B@HKNPRD3002MB017.064d.mgd.msft.net >
2014-12-02  6:58                             ` Jason Wang
2015-01-13 13:52 ` Vitaly Kuznetsov
2015-01-14  1:43   ` Dexuan Cui
     [not found]     ` <F792CF86EFE20D4AB8064279AFBA51C61EA7CFF9@SIXPRD3002MB028.064d.mgd.msft.net >
2015-01-14  2:28       ` Jason Wang
2015-01-14  3:21     ` KY Srinivasan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871tojzocp.fsf@vitty.brq.redhat.com \
    --to=vkuznets@redhat.com \
    --cc=apw@canonical.com \
    --cc=decui@microsoft.com \
    --cc=driverdev-devel@linuxdriverproject.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=haiyangz@microsoft.com \
    --cc=jasowang@redhat.com \
    --cc=kys@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=olaf@aepfle.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox