All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rusty Russell <rusty@rustcorp.com.au>
To: Amos Kong <akong@redhat.com>, virtualization@lists.linux-foundation.org
Cc: herbert@gondor.apana.org.au, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, m@bues.ch, mpm@selenic.com,
	amit.shah@redhat.com,
	Linus Torvalds <torvalds@linux-foundation.org>,
	mb@bu3sch.de
Subject: Re: [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes
Date: Thu, 18 Sep 2014 12:13:08 +0930	[thread overview]
Message-ID: <87wq91odhf.fsf@rustcorp.com.au> (raw)
In-Reply-To: <1410796949-2221-3-git-send-email-akong@redhat.com>

Amos Kong <akong@redhat.com> writes:

> I started a QEMU (non-smp) guest with one virtio-rng device, and read
> random data from /dev/hwrng by dd:
>
>  # dd if=/dev/hwrng of=/dev/null &
>
> In the same time, if I check hwrng attributes from sysfs by cat:
>
>  # cat /sys/class/misc/hw_random/rng_*
>
> The cat process always gets stuck with slow backend (5 k/s), if we
> use a quick backend (1.2 M/s), the cat process will cost 1 to 2
> minutes. The stuck doesn't exist for smp guest.
>
> Reading syscall enters kernel and call rng_dev_read(), it's user
> context. We used need_resched() to check if other tasks need to
> be run, but it almost always return false, and re-hold the mutex
> lock. The attributes accessing process always fails to hold the
> lock, so the cat gets stuck.
>
> User context doesn't allow other user contexts run on that CPU,
> unless the kernel code sleeps for some reason. This is why the
> need_reshed() always return false here.
>
> This patch removed need_resched() and always schedule other tasks
> then other tasks can have chance to hold the lock and execute
> protected code.

OK, this is going to be a rant.

Your explanation doesn't make sense at all.  Worse, your solution breaks
the advice of Kernighan & Plaugher: "Don't patch bad code - rewrite
it.".

But worst of all, this detailed explanation might have convinced me you
understood the problem better than I did, and applied your patch.

I did some tests.  For me, as expected, the process spends its time
inside the virtio rng read function, holding the mutex and thus blocking
sysfs access; it's not a failure of this code at all.

Your schedule_timeout() "fix" probably just helps by letting the host
refresh entropy, so we spend less time waiting in the read fn.

I will post a series, which unfortunately is only lightly tested, then
I'm going to have some beer to begin my holiday.  That may help me
forget my disappointment at seeing respected fellow developers
monkey-patching random code they don't understand.

Grrr....
Rusty.

> Signed-off-by: Amos Kong <akong@redhat.com>
> ---
>  drivers/char/hw_random/core.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index c591d7e..263a370 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
>  
>  		mutex_unlock(&rng_mutex);
>  
> -		if (need_resched())
> -			schedule_timeout_interruptible(1);
> +		schedule_timeout_interruptible(1);
>  
>  		if (signal_pending(current)) {
>  			err = -ERESTARTSYS;
> -- 
> 1.9.3

WARNING: multiple messages have this Message-ID (diff)
From: Rusty Russell <rusty@rustcorp.com.au>
To: Amos Kong <akong@redhat.com>, virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org, herbert@gondor.apana.org.au, m@bues.ch,
	mb@bu3sch.de, mpm@selenic.com, amit.shah@redhat.com,
	linux-kernel@vger.kernel.org
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>
Subject: Re: [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes
Date: Thu, 18 Sep 2014 12:13:08 +0930	[thread overview]
Message-ID: <87wq91odhf.fsf@rustcorp.com.au> (raw)
In-Reply-To: <1410796949-2221-3-git-send-email-akong@redhat.com>

Amos Kong <akong@redhat.com> writes:

> I started a QEMU (non-smp) guest with one virtio-rng device, and read
> random data from /dev/hwrng by dd:
>
>  # dd if=/dev/hwrng of=/dev/null &
>
> In the same time, if I check hwrng attributes from sysfs by cat:
>
>  # cat /sys/class/misc/hw_random/rng_*
>
> The cat process always gets stuck with slow backend (5 k/s), if we
> use a quick backend (1.2 M/s), the cat process will cost 1 to 2
> minutes. The stuck doesn't exist for smp guest.
>
> Reading syscall enters kernel and call rng_dev_read(), it's user
> context. We used need_resched() to check if other tasks need to
> be run, but it almost always return false, and re-hold the mutex
> lock. The attributes accessing process always fails to hold the
> lock, so the cat gets stuck.
>
> User context doesn't allow other user contexts run on that CPU,
> unless the kernel code sleeps for some reason. This is why the
> need_reshed() always return false here.
>
> This patch removed need_resched() and always schedule other tasks
> then other tasks can have chance to hold the lock and execute
> protected code.

OK, this is going to be a rant.

Your explanation doesn't make sense at all.  Worse, your solution breaks
the advice of Kernighan & Plaugher: "Don't patch bad code - rewrite
it.".

But worst of all, this detailed explanation might have convinced me you
understood the problem better than I did, and applied your patch.

I did some tests.  For me, as expected, the process spends its time
inside the virtio rng read function, holding the mutex and thus blocking
sysfs access; it's not a failure of this code at all.

Your schedule_timeout() "fix" probably just helps by letting the host
refresh entropy, so we spend less time waiting in the read fn.

I will post a series, which unfortunately is only lightly tested, then
I'm going to have some beer to begin my holiday.  That may help me
forget my disappointment at seeing respected fellow developers
monkey-patching random code they don't understand.

Grrr....
Rusty.

> Signed-off-by: Amos Kong <akong@redhat.com>
> ---
>  drivers/char/hw_random/core.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index c591d7e..263a370 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
>  
>  		mutex_unlock(&rng_mutex);
>  
> -		if (need_resched())
> -			schedule_timeout_interruptible(1);
> +		schedule_timeout_interruptible(1);
>  
>  		if (signal_pending(current)) {
>  			err = -ERESTARTSYS;
> -- 
> 1.9.3


  reply	other threads:[~2014-09-18  2:43 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-15 16:02 [PATCH v2 0/3] fix stuck in accessing hwrng attributes Amos Kong
2014-09-15 16:02 ` Amos Kong
2014-09-15 16:02 ` [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection Amos Kong
2014-09-15 16:02   ` Amos Kong
2014-09-15 16:13   ` Michael Büsch
2014-09-15 16:13   ` Michael Büsch
2014-09-16  0:30     ` Amos Kong
2014-09-16  0:30       ` Amos Kong
2014-09-15 16:02 ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong
2014-09-15 16:02   ` Amos Kong
2014-09-18  2:43   ` Rusty Russell [this message]
2014-09-18  2:43     ` Rusty Russell
2014-09-18  2:48     ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell
2014-09-18  2:48       ` Rusty Russell
2014-09-18  2:48       ` [PATCH 2/5] hw_random: use reference counts on each struct hwrng Rusty Russell
2014-09-18  2:48       ` Rusty Russell
2014-09-18 12:22         ` Amos Kong
2014-09-18 12:22           ` Amos Kong
2014-09-18  2:48       ` [PATCH 3/5] hw_random: fix unregister race Rusty Russell
2014-09-18  2:48         ` Rusty Russell
2014-10-21 14:15         ` Herbert Xu
2014-10-21 14:15           ` Herbert Xu
2014-11-03 15:24           ` Amos Kong
2014-11-03 15:24             ` Amos Kong
2014-09-18  2:48       ` [PATCH 4/5] hw_random: don't double-check old_rng Rusty Russell
2014-09-18  2:48         ` Rusty Russell
2014-09-18  2:48       ` [PATCH 5/5] hw_random: don't init list element we're about to add to list Rusty Russell
2014-09-18  2:48         ` Rusty Russell
2014-09-18 12:47     ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong
2014-09-18 12:47       ` Amos Kong
2014-09-15 16:02 ` [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read() Amos Kong
2014-09-15 16:02 ` Amos Kong
2014-09-15 16:13   ` Michael Büsch
2014-09-15 16:13   ` Michael Büsch
2014-09-16  0:27     ` Amos Kong
2014-09-16  0:27       ` Amos Kong
2014-09-16 15:01       ` Michael Büsch
2014-09-16 15:01         ` Michael Büsch
2014-09-17  9:30 ` [PATCH v2 0/3] fix stuck in accessing hwrng attributes Herbert Xu
2014-09-17  9:30   ` Herbert Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wq91odhf.fsf@rustcorp.com.au \
    --to=rusty@rustcorp.com.au \
    --cc=akong@redhat.com \
    --cc=amit.shah@redhat.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m@bues.ch \
    --cc=mb@bu3sch.de \
    --cc=mpm@selenic.com \
    --cc=torvalds@linux-foundation.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.