* Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup
From: Michael S. Tsirkin @ 2010-05-27 13:12 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Sridhar Samudrala, netdev, lkml, kvm@vger.kernel.org,
Andrew Morton, Dmitri Vorobiev, Tejun Heo, Jiri Kosina,
Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100527124448.GA4241@redhat.com>
On Thu, May 27, 2010 at 02:44:48PM +0200, Oleg Nesterov wrote:
> On 05/27, Michael S. Tsirkin wrote:
> >
> > On Tue, May 18, 2010 at 05:04:51PM -0700, Sridhar Samudrala wrote:
> > > Add a new kernel API to create a singlethread workqueue and attach it's
> > > task to current task's cgroup and cpumask.
> > >
> > > Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
> >
> > Could someone familiar with workqueue code please comment on whether
> > this patch is suitable for 2.6.35?
> >
> > It is needed to fix the case where vhost user might cause a kernel
> > thread to consume more CPU than allowed by the cgroup.
> > Should I merge it through the vhost tree?
> > Ack for this?
>
> I don't understand the reasons for this patch, but this doesn't matter.
Depending on userspace application, driver can create a lot of work
for a workqueue to handle. By making the workqueue thread
belong in a cgroup, we make it possible to the CPU and other
resources thus consumed.
--
MST
^ permalink raw reply
* Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup
From: Oleg Nesterov @ 2010-05-27 12:44 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Sridhar Samudrala, netdev, lkml, kvm@vger.kernel.org,
Andrew Morton, Dmitri Vorobiev, Tejun Heo, Jiri Kosina,
Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100527091426.GA6308@redhat.com>
On 05/27, Michael S. Tsirkin wrote:
>
> On Tue, May 18, 2010 at 05:04:51PM -0700, Sridhar Samudrala wrote:
> > Add a new kernel API to create a singlethread workqueue and attach it's
> > task to current task's cgroup and cpumask.
> >
> > Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
>
> Could someone familiar with workqueue code please comment on whether
> this patch is suitable for 2.6.35?
>
> It is needed to fix the case where vhost user might cause a kernel
> thread to consume more CPU than allowed by the cgroup.
> Should I merge it through the vhost tree?
> Ack for this?
I don't understand the reasons for this patch, but this doesn't matter.
I don't really see any need to change workqueue.c,
> > +static struct task_struct *get_singlethread_wq_task(struct workqueue_struct *wq)
> > +{
> > + return (per_cpu_ptr(wq->cpu_wq, singlethread_cpu))->thread;
> > +}
(Not sure this trivial static helper with the single caller makes sense, but
see below)
> > +/* Create a singlethread workqueue and attach it's task to the current task's
> > + * cgroup and set it's cpumask to the current task's cpumask.
> > + */
> > +struct workqueue_struct *create_singlethread_workqueue_in_current_cg(char *name)
> > +{
> > + struct workqueue_struct *wq;
> > + struct task_struct *task;
> > + cpumask_var_t mask;
> > +
> > + wq = create_singlethread_workqueue(name);
> > + if (!wq)
> > + goto out;
> > +
> > + if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> > + goto err;
> > +
> > + if (sched_getaffinity(current->pid, mask))
> > + goto err;
> > +
> > + task = get_singlethread_wq_task(wq);
> > + if (sched_setaffinity(task->pid, mask))
> > + goto err;
> > +
> > + if (cgroup_attach_task_current_cg(task))
> > + goto err;
> > +out:
> > + return wq;
> > +err:
> > + destroy_workqueue(wq);
> > + wq = NULL;
> > + goto out;
> > +}
Instead, cgroup.c (or whoever needs this) can do
struct move_struct {
struct work_struct work;
int ret;
};
static void move_func(struct work_struct *work)
{
struct move_struct *move = container_of(...);
if (cgroup_attach_task_current_cg(current))
ret = -EANY;
}
static struct workqueue_struct *create_singlethread_workqueue_in_current_cg(char *name)
{
struct workqueue_struct *wq;
struct move_struct move = {
.work = __WORK_INITIALIZER(move_func);
};
wq = create_singlethread_workqueue(name);
if (!wq)
return NULL;
queue_work(&move.work);
flush_work(&move.work);
if (move.ret) {
destroy_workqueue(wq);
wq = NULL;
}
return wq;
}
Or. Just export wq_per_cpu() from workqueue.c (probably with a better name) and
use it like the patch does.
But, imho, create_singlethread_workqueue_in_current_cg() does not belong
to workqueue.c.
Oleg.
^ permalink raw reply
* [PATCH 6/11] drivers/net: Eliminate a NULL pointer dereference
From: Julia Lawall @ 2010-05-27 12:33 UTC (permalink / raw)
To: netdev, linux-kernel, kernel-janitors
From: Julia Lawall <julia@diku.dk>
At the point of the print, dev is NULL.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r exists@
expression E,E1;
identifier f;
statement S1,S2,S3;
@@
if ((E == NULL && ...) || ...)
{
... when != if (...) S1 else S2
when != E = E1
* E->f
... when any
return ...;
}
else S3
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
---
drivers/net/3c507.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/3c507.c b/drivers/net/3c507.c
index 82eaf65..ea9b7a0 100644
--- a/drivers/net/3c507.c
+++ b/drivers/net/3c507.c
@@ -551,8 +551,7 @@ static irqreturn_t el16_interrupt(int irq, void *dev_id)
void __iomem *shmem;
if (dev == NULL) {
- pr_err("%s: net_interrupt(): irq %d for unknown device.\n",
- dev->name, irq);
+ pr_err("net_interrupt(): irq %d for unknown device.\n", irq);
return IRQ_NONE;
}
^ permalink raw reply related
* [PATCH 4/11] drivers/net/hamradio: Eliminate a NULL pointer dereference
From: Julia Lawall @ 2010-05-27 12:33 UTC (permalink / raw)
To: Jean-Paul Roubelat, linux-hams, netdev, linux-kernel,
kernel-janitors
From: Julia Lawall <julia@diku.dk>
At the point of the print, dev is NULL.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r exists@
expression E,E1;
identifier f;
statement S1,S2,S3;
@@
if ((E == NULL && ...) || ...)
{
... when != if (...) S1 else S2
when != E = E1
* E->f
... when any
return ...;
}
else S3
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
---
drivers/net/hamradio/yam.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c
index 694132e..4e7d1d0 100644
--- a/drivers/net/hamradio/yam.c
+++ b/drivers/net/hamradio/yam.c
@@ -1151,8 +1151,7 @@ static int __init yam_init_driver(void)
dev = alloc_netdev(sizeof(struct yam_port), name,
yam_setup);
if (!dev) {
- printk(KERN_ERR "yam: cannot allocate net device %s\n",
- dev->name);
+ pr_err("yam: cannot allocate net device\n");
err = -ENOMEM;
goto error;
}
^ permalink raw reply related
* [GIT PULL net-2.6] vhost-net: error handling fixes
From: Michael S. Tsirkin @ 2010-05-27 11:57 UTC (permalink / raw)
To: David Miller; +Cc: kvm, virtualization, netdev, linux-kernel, krkumar2
David,
The following tree includes fixes dealing with error handling
in vhost-net. It is on top of net-2.6.
Please merge it for 2.6.35.
Thanks!
The following changes since commit 8a74ad60a546b13bd1096b2a61a7a5c6fd9ae17c:
net: fix lock_sock_bh/unlock_sock_bh (2010-05-27 00:30:53 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net
Krishna Kumar (1):
vhost: Fix host panic if ioctl called with wrong index
Takuya Yoshikawa (3):
vhost: fix to check the return value of copy_to/from_user() correctly
vhost-net: fix to check the return value of copy_to/from_user() correctly
vhost: fix the memory leak which will happen when memory_access_ok fails
drivers/vhost/net.c | 14 ++++++------
drivers/vhost/vhost.c | 57 +++++++++++++++++++++++++++---------------------
2 files changed, 39 insertions(+), 32 deletions(-)
--
MST
^ permalink raw reply
* Re: ipv6: Add GSO support on forwarding path
From: Herbert Xu @ 2010-05-27 11:54 UTC (permalink / raw)
To: Ralf Baechle; +Cc: David S. Miller, netdev
In-Reply-To: <20100527102429.GA9017@gondor.apana.org.au>
On Thu, May 27, 2010 at 08:24:29PM +1000, Herbert Xu wrote:
>
> Actually, this patch is still not quite right for the GSO case.
> Right now it's only adding the IP header size, not the full header
> size.
>
> I need to think a bit more about this.
Let's do it the old way first, so that it at least works for the
common case. I'll fix it properly later.
ipv6: Add GSO support on forwarding path
Currently we disallow GSO packets on the IPv6 forward path.
This patch fixes this.
Note that I discovered that our existing GSO MTU checks (e.g.,
IPv4 forwarding) are buggy in that they skip the check altogether,
hen they really should be checking gso_size + header instead.
I have also been lazy here in that I haven't bothered to segment
the GSO packet by hand before generating an ICMP message. Someone
should add that to be 100% correct.
Reported-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index cd963f6..89425af 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -507,7 +507,7 @@ int ip6_forward(struct sk_buff *skb)
if (mtu < IPV6_MIN_MTU)
mtu = IPV6_MIN_MTU;
- if (skb->len > mtu) {
+ if (skb->len > mtu && !skb_is_gso(skb)) {
/* Again, force OUTPUT device used as source address */
skb->dev = dst->dev;
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply related
* [GIT PULL net-next-2.6] vhost-net cleanups
From: Michael S. Tsirkin @ 2010-05-27 10:58 UTC (permalink / raw)
To: David Miller; +Cc: kvm, virtualization, netdev, linux-kernel, jdike, tklauser
David,
The following tree is on top of net-next-2.6.
Please merge it for 2.6.36.
Thanks!
The following changes since commit 7a9b149212f3716c598afe973b6261fd58453b7a:
Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 (2010-05-20 21:26:12 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next
Jeff Dike (1):
vhost-net: minor cleanup
Michael S. Tsirkin (1):
vhost: whitespace fix
Tobias Klauser (1):
vhost: Storage class should be before const qualifier
drivers/vhost/net.c | 13 ++++++-------
drivers/vhost/vhost.c | 4 ++--
2 files changed, 8 insertions(+), 9 deletions(-)
--
MST
^ permalink raw reply
* Re: [PATCH 3/3] vhost: fix the memory leak which will happen when memory_access_ok fails
From: Michael S. Tsirkin @ 2010-05-27 10:49 UTC (permalink / raw)
To: Takuya Yoshikawa; +Cc: kvm, virtualization, netdev
In-Reply-To: <20100527190356.cbf2aac7.yoshikawa.takuya@oss.ntt.co.jp>
On Thu, May 27, 2010 at 07:03:56PM +0900, Takuya Yoshikawa wrote:
> We need to free newmem when vhost_set_memory() fails to complete.
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---
Thanks, applied.
> drivers/vhost/vhost.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 9633a3c..1241a22 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -337,8 +337,10 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
> return -EFAULT;
> }
>
> - if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL)))
> + if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL))) {
> + kfree(newmem);
> return -EFAULT;
> + }
> oldmem = d->memory;
> rcu_assign_pointer(d->memory, newmem);
> synchronize_rcu();
> --
> 1.7.0.4
^ permalink raw reply
* Re: [PATCH 2/3] vhost-net: fix to check the return value of copy_to/from_user() correctly
From: Michael S. Tsirkin @ 2010-05-27 10:48 UTC (permalink / raw)
To: Takuya Yoshikawa; +Cc: kvm, virtualization, netdev
In-Reply-To: <20100527190158.4587a2db.yoshikawa.takuya@oss.ntt.co.jp>
On Thu, May 27, 2010 at 07:01:58PM +0900, Takuya Yoshikawa wrote:
> copy_to/from_user() returns the number of bytes that could not be copied.
>
> So we need to check if it is not zero, and in that case, we should return
> the error number -EFAULT rather than directly return the return value from
> copy_to/from_user().
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Thanks, applied.
> ---
> drivers/vhost/net.c | 14 +++++++-------
> 1 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index aa88911..0f41c91 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -593,17 +593,17 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
> int r;
> switch (ioctl) {
> case VHOST_NET_SET_BACKEND:
> - r = copy_from_user(&backend, argp, sizeof backend);
> - if (r < 0)
> - return r;
> + if (copy_from_user(&backend, argp, sizeof backend))
> + return -EFAULT;
> return vhost_net_set_backend(n, backend.index, backend.fd);
> case VHOST_GET_FEATURES:
> features = VHOST_FEATURES;
> - return copy_to_user(featurep, &features, sizeof features);
> + if (copy_to_user(featurep, &features, sizeof features))
> + return -EFAULT;
> + return 0;
> case VHOST_SET_FEATURES:
> - r = copy_from_user(&features, featurep, sizeof features);
> - if (r < 0)
> - return r;
> + if (copy_from_user(&features, featurep, sizeof features))
> + return -EFAULT;
> if (features & ~VHOST_FEATURES)
> return -EOPNOTSUPP;
> return vhost_net_set_features(n, features);
> --
> 1.7.0.4
^ permalink raw reply
* Re: [PATCH 1/3] vhost: fix to check the return value of copy_to/from_user() correctly
From: Michael S. Tsirkin @ 2010-05-27 10:48 UTC (permalink / raw)
To: Takuya Yoshikawa; +Cc: kvm, virtualization, netdev
In-Reply-To: <20100527185803.0c0213a1.yoshikawa.takuya@oss.ntt.co.jp>
On Thu, May 27, 2010 at 06:58:03PM +0900, Takuya Yoshikawa wrote:
> copy_to/from_user() returns the number of bytes that could not be copied.
>
> So we need to check if it is not zero, and in that case, we should return
> the error number -EFAULT rather than directly return the return value from
> copy_to/from_user().
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Thanks, applied.
> ---
> drivers/vhost/vhost.c | 51 ++++++++++++++++++++++++++----------------------
> 1 files changed, 28 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 5c9c657..9633a3c 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -320,10 +320,8 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
> {
> struct vhost_memory mem, *newmem, *oldmem;
> unsigned long size = offsetof(struct vhost_memory, regions);
> - long r;
> - r = copy_from_user(&mem, m, size);
> - if (r)
> - return r;
> + if (copy_from_user(&mem, m, size))
> + return -EFAULT;
> if (mem.padding)
> return -EOPNOTSUPP;
> if (mem.nregions > VHOST_MEMORY_MAX_NREGIONS)
> @@ -333,11 +331,10 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
> return -ENOMEM;
>
> memcpy(newmem, &mem, size);
> - r = copy_from_user(newmem->regions, m->regions,
> - mem.nregions * sizeof *m->regions);
> - if (r) {
> + if (copy_from_user(newmem->regions, m->regions,
> + mem.nregions * sizeof *m->regions)) {
> kfree(newmem);
> - return r;
> + return -EFAULT;
> }
>
> if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL)))
> @@ -389,9 +386,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
> r = -EBUSY;
> break;
> }
> - r = copy_from_user(&s, argp, sizeof s);
> - if (r < 0)
> + if (copy_from_user(&s, argp, sizeof s)) {
> + r = -EFAULT;
> break;
> + }
> if (!s.num || s.num > 0xffff || (s.num & (s.num - 1))) {
> r = -EINVAL;
> break;
> @@ -405,9 +403,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
> r = -EBUSY;
> break;
> }
> - r = copy_from_user(&s, argp, sizeof s);
> - if (r < 0)
> + if (copy_from_user(&s, argp, sizeof s)) {
> + r = -EFAULT;
> break;
> + }
> if (s.num > 0xffff) {
> r = -EINVAL;
> break;
> @@ -419,12 +418,14 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
> case VHOST_GET_VRING_BASE:
> s.index = idx;
> s.num = vq->last_avail_idx;
> - r = copy_to_user(argp, &s, sizeof s);
> + if (copy_to_user(argp, &s, sizeof s))
> + r = -EFAULT;
> break;
> case VHOST_SET_VRING_ADDR:
> - r = copy_from_user(&a, argp, sizeof a);
> - if (r < 0)
> + if (copy_from_user(&a, argp, sizeof a)) {
> + r = -EFAULT;
> break;
> + }
> if (a.flags & ~(0x1 << VHOST_VRING_F_LOG)) {
> r = -EOPNOTSUPP;
> break;
> @@ -477,9 +478,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
> vq->used = (void __user *)(unsigned long)a.used_user_addr;
> break;
> case VHOST_SET_VRING_KICK:
> - r = copy_from_user(&f, argp, sizeof f);
> - if (r < 0)
> + if (copy_from_user(&f, argp, sizeof f)) {
> + r = -EFAULT;
> break;
> + }
> eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> if (IS_ERR(eventfp)) {
> r = PTR_ERR(eventfp);
> @@ -492,9 +494,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
> filep = eventfp;
> break;
> case VHOST_SET_VRING_CALL:
> - r = copy_from_user(&f, argp, sizeof f);
> - if (r < 0)
> + if (copy_from_user(&f, argp, sizeof f)) {
> + r = -EFAULT;
> break;
> + }
> eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> if (IS_ERR(eventfp)) {
> r = PTR_ERR(eventfp);
> @@ -510,9 +513,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
> filep = eventfp;
> break;
> case VHOST_SET_VRING_ERR:
> - r = copy_from_user(&f, argp, sizeof f);
> - if (r < 0)
> + if (copy_from_user(&f, argp, sizeof f)) {
> + r = -EFAULT;
> break;
> + }
> eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> if (IS_ERR(eventfp)) {
> r = PTR_ERR(eventfp);
> @@ -575,9 +579,10 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, unsigned long arg)
> r = vhost_set_memory(d, argp);
> break;
> case VHOST_SET_LOG_BASE:
> - r = copy_from_user(&p, argp, sizeof p);
> - if (r < 0)
> + if (copy_from_user(&p, argp, sizeof p)) {
> + r = -EFAULT;
> break;
> + }
> if ((u64)(unsigned long)p != p) {
> r = -EFAULT;
> break;
> --
> 1.7.0.4
^ permalink raw reply
* Re: ipv6: Add GSO support on forwarding path
From: Herbert Xu @ 2010-05-27 10:24 UTC (permalink / raw)
To: Ralf Baechle; +Cc: David S. Miller, netdev
In-Reply-To: <20100526221601.GA3369@gondor.apana.org.au>
On Thu, May 27, 2010 at 08:16:01AM +1000, Herbert Xu wrote:
>
> OK, it looks like my patch doesn't work because the transport
> header isn't set on the forwarding path. Here's an updated patch:
>
> ipv6: Add GSO support on forwarding path
Actually, this patch is still not quite right for the GSO case.
Right now it's only adding the IP header size, not the full header
size.
I need to think a bit more about this.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: ipv6: Add GSO support on forwarding path
From: Ralf Baechle @ 2010-05-27 10:14 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev
In-Reply-To: <20100527092246.GB15298@linux-mips.org>
On Thu, May 27, 2010 at 10:22:46AM +0100, Ralf Baechle wrote:
> > > I tested this on top of a 2.6.34 release kernel and asked the user
> > > experiencing the problem to re-test and the problem still persists.
> >
> > OK, it looks like my patch doesn't work because the transport
> > header isn't set on the forwarding path. Here's an updated patch:
>
> Still no improvment. I've moved all the collected tcpdumps to
> ftp://www.linux-ax25.org/pub/traces/ including a README file describing
> contents.
Uh.. Of course I should have upgraded the host not the client. And
after having done that things are working fine now. Thanks!
Ralf
^ permalink raw reply
* [PATCH 3/3] vhost: fix the memory leak which will happen when memory_access_ok fails
From: Takuya Yoshikawa @ 2010-05-27 10:03 UTC (permalink / raw)
To: mst; +Cc: kvm, virtualization, netdev
In-Reply-To: <20100527185803.0c0213a1.yoshikawa.takuya@oss.ntt.co.jp>
We need to free newmem when vhost_set_memory() fails to complete.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
drivers/vhost/vhost.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9633a3c..1241a22 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -337,8 +337,10 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
return -EFAULT;
}
- if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL)))
+ if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL))) {
+ kfree(newmem);
return -EFAULT;
+ }
oldmem = d->memory;
rcu_assign_pointer(d->memory, newmem);
synchronize_rcu();
--
1.7.0.4
^ permalink raw reply related
* [PATCH 2/3] vhost-net: fix to check the return value of copy_to/from_user() correctly
From: Takuya Yoshikawa @ 2010-05-27 10:01 UTC (permalink / raw)
To: mst; +Cc: kvm, virtualization, netdev
In-Reply-To: <20100527185803.0c0213a1.yoshikawa.takuya@oss.ntt.co.jp>
copy_to/from_user() returns the number of bytes that could not be copied.
So we need to check if it is not zero, and in that case, we should return
the error number -EFAULT rather than directly return the return value from
copy_to/from_user().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
drivers/vhost/net.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index aa88911..0f41c91 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -593,17 +593,17 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
int r;
switch (ioctl) {
case VHOST_NET_SET_BACKEND:
- r = copy_from_user(&backend, argp, sizeof backend);
- if (r < 0)
- return r;
+ if (copy_from_user(&backend, argp, sizeof backend))
+ return -EFAULT;
return vhost_net_set_backend(n, backend.index, backend.fd);
case VHOST_GET_FEATURES:
features = VHOST_FEATURES;
- return copy_to_user(featurep, &features, sizeof features);
+ if (copy_to_user(featurep, &features, sizeof features))
+ return -EFAULT;
+ return 0;
case VHOST_SET_FEATURES:
- r = copy_from_user(&features, featurep, sizeof features);
- if (r < 0)
- return r;
+ if (copy_from_user(&features, featurep, sizeof features))
+ return -EFAULT;
if (features & ~VHOST_FEATURES)
return -EOPNOTSUPP;
return vhost_net_set_features(n, features);
--
1.7.0.4
^ permalink raw reply related
* [PATCH 1/3] vhost: fix to check the return value of copy_to/from_user() correctly
From: Takuya Yoshikawa @ 2010-05-27 9:58 UTC (permalink / raw)
To: mst; +Cc: kvm, virtualization, netdev
copy_to/from_user() returns the number of bytes that could not be copied.
So we need to check if it is not zero, and in that case, we should return
the error number -EFAULT rather than directly return the return value from
copy_to/from_user().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
drivers/vhost/vhost.c | 51 ++++++++++++++++++++++++++----------------------
1 files changed, 28 insertions(+), 23 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5c9c657..9633a3c 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -320,10 +320,8 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
{
struct vhost_memory mem, *newmem, *oldmem;
unsigned long size = offsetof(struct vhost_memory, regions);
- long r;
- r = copy_from_user(&mem, m, size);
- if (r)
- return r;
+ if (copy_from_user(&mem, m, size))
+ return -EFAULT;
if (mem.padding)
return -EOPNOTSUPP;
if (mem.nregions > VHOST_MEMORY_MAX_NREGIONS)
@@ -333,11 +331,10 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
return -ENOMEM;
memcpy(newmem, &mem, size);
- r = copy_from_user(newmem->regions, m->regions,
- mem.nregions * sizeof *m->regions);
- if (r) {
+ if (copy_from_user(newmem->regions, m->regions,
+ mem.nregions * sizeof *m->regions)) {
kfree(newmem);
- return r;
+ return -EFAULT;
}
if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL)))
@@ -389,9 +386,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
r = -EBUSY;
break;
}
- r = copy_from_user(&s, argp, sizeof s);
- if (r < 0)
+ if (copy_from_user(&s, argp, sizeof s)) {
+ r = -EFAULT;
break;
+ }
if (!s.num || s.num > 0xffff || (s.num & (s.num - 1))) {
r = -EINVAL;
break;
@@ -405,9 +403,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
r = -EBUSY;
break;
}
- r = copy_from_user(&s, argp, sizeof s);
- if (r < 0)
+ if (copy_from_user(&s, argp, sizeof s)) {
+ r = -EFAULT;
break;
+ }
if (s.num > 0xffff) {
r = -EINVAL;
break;
@@ -419,12 +418,14 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
case VHOST_GET_VRING_BASE:
s.index = idx;
s.num = vq->last_avail_idx;
- r = copy_to_user(argp, &s, sizeof s);
+ if (copy_to_user(argp, &s, sizeof s))
+ r = -EFAULT;
break;
case VHOST_SET_VRING_ADDR:
- r = copy_from_user(&a, argp, sizeof a);
- if (r < 0)
+ if (copy_from_user(&a, argp, sizeof a)) {
+ r = -EFAULT;
break;
+ }
if (a.flags & ~(0x1 << VHOST_VRING_F_LOG)) {
r = -EOPNOTSUPP;
break;
@@ -477,9 +478,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
vq->used = (void __user *)(unsigned long)a.used_user_addr;
break;
case VHOST_SET_VRING_KICK:
- r = copy_from_user(&f, argp, sizeof f);
- if (r < 0)
+ if (copy_from_user(&f, argp, sizeof f)) {
+ r = -EFAULT;
break;
+ }
eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
if (IS_ERR(eventfp)) {
r = PTR_ERR(eventfp);
@@ -492,9 +494,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
filep = eventfp;
break;
case VHOST_SET_VRING_CALL:
- r = copy_from_user(&f, argp, sizeof f);
- if (r < 0)
+ if (copy_from_user(&f, argp, sizeof f)) {
+ r = -EFAULT;
break;
+ }
eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
if (IS_ERR(eventfp)) {
r = PTR_ERR(eventfp);
@@ -510,9 +513,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
filep = eventfp;
break;
case VHOST_SET_VRING_ERR:
- r = copy_from_user(&f, argp, sizeof f);
- if (r < 0)
+ if (copy_from_user(&f, argp, sizeof f)) {
+ r = -EFAULT;
break;
+ }
eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
if (IS_ERR(eventfp)) {
r = PTR_ERR(eventfp);
@@ -575,9 +579,10 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, unsigned long arg)
r = vhost_set_memory(d, argp);
break;
case VHOST_SET_LOG_BASE:
- r = copy_from_user(&p, argp, sizeof p);
- if (r < 0)
+ if (copy_from_user(&p, argp, sizeof p)) {
+ r = -EFAULT;
break;
+ }
if ((u64)(unsigned long)p != p) {
r = -EFAULT;
break;
--
1.7.0.4
^ permalink raw reply related
* Re: UDP Fragmentation and DF bit..
From: Andi Kleen @ 2010-05-27 9:43 UTC (permalink / raw)
To: Vincent, Pradeep; +Cc: netdev@vger.kernel.org
In-Reply-To: <C8231CDB.16FC5%pradeepv@amazon.com>
"Vincent, Pradeep" <pradeepv@amazon.com> writes:
> OMan 7 ip¹ declares that ³The system-wide default is controlled by the
> ip_no_pmtu_disc sysctl for SOCK_STREAM sockets, and disabled on all
> others.² which led me to think ODF¹ bit will not be set for UDP packets.
> But..
One should add ip.7 is not really a spec, just documentation
how things were quite a few years ago. It unfortunately
does often not get updated when things change.
>
> In a network environment where MTU-big and MTU-small co-exist (and have
> router¹s fragmentation turned off in favor of PMTU discovery), UDP packets
> that are > MTU-small and < MTU-big find the PMTU effectively but UDP
> packets
I don't understand your mtu-small/big concept. PMTU is per IP and per
flow. So there's only always a single PMTU, not small and big.
Or do you refer to a single IP NAT situation where a single IP
shares different MTUs?
> Is there a reason ODF¹ bit cannot be set on fragmented packets on UDP
> transmission ? I couldn¹t find anything in RFC for IP protocol that
> prohibited DF bit on fragmented packets. Did I miss
> something here ?
> Would it be reasonable if PMTU discovery is performed (DF bit set +
> appropriate icmp logic) even for locally fragmented packets ? I think
> this
DF=1 on fragments would mean the application has to do pmtu discovery
even with fragments for the case when the kernel does not know
the path mtu yet. But if the app supports pmtu discovery it's better
to not use fragments in the first place.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply
* ipsec and gre on the same interface
From: bengan @ 2010-05-27 9:28 UTC (permalink / raw)
To: netdev
Hi,
I've got a set up with two boxes (routers) with ipsec and gre (quagga for the
routing) that works but that includes 2 different interfaces in each box. But
I would like to do this in a box with only one interface for ipsec and gre. Is
that possible? In a vmware setup I have eth0 and eth1 as my two interfaces and
established an endpoint for the gre tunnel in eth1 I can get ipsec and gre
running and forwarding packets. But when I want to do the same thing with only
one interface it doesn't work.
Am I doing something really stupid or should this work? If I'm doing something
really stupid could you explain what?
The setup
Working setup
**** ****
* *----------* *
**** ****
ipsec
<-------->
gre
<---------------->
The setup I would like to get working
**** ****
* *----------* *
**** ****
ipsec
<-------->
gre
<-------->
regards,
/bengan
^ permalink raw reply
* [PATCH net-2.6] be2net: Patch removes redundant while statement in loop.
From: Sarveshwar Bandi @ 2010-05-27 9:32 UTC (permalink / raw)
To: netdev; +Cc: davem
Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
---
drivers/net/benet/be_main.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index aa065c7..54b1427 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -1861,7 +1861,7 @@ static int be_setup(struct be_adapter *a
goto if_destroy;
}
vf++;
- } while (vf < num_vfs);
+ }
} else if (!be_physfn(adapter)) {
status = be_cmd_mac_addr_query(adapter, mac,
MAC_ADDRESS_TYPE_NETWORK, false, adapter->if_handle);
--
1.4.0
^ permalink raw reply related
* Re: ipv6: Add GSO support on forwarding path
From: Ralf Baechle @ 2010-05-27 9:22 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev
In-Reply-To: <20100526221601.GA3369@gondor.apana.org.au>
On Thu, May 27, 2010 at 08:16:01AM +1000, Herbert Xu wrote:
> > I tested this on top of a 2.6.34 release kernel and asked the user
> > experiencing the problem to re-test and the problem still persists.
>
> OK, it looks like my patch doesn't work because the transport
> header isn't set on the forwarding path. Here's an updated patch:
Still no improvment. I've moved all the collected tcpdumps to
ftp://www.linux-ax25.org/pub/traces/ including a README file describing
contents.
Ralf
^ permalink raw reply
* Re: [PATCH 1/3] cgroups: Add an API to attach a task to current task's cgroup
From: Michael S. Tsirkin @ 2010-05-27 9:16 UTC (permalink / raw)
To: Paul Menage
Cc: Sridhar Samudrala, netdev, kvm@vger.kernel.org, lkml, containers,
lizf, Andrew Morton, Ben Blum, KAMEZAWA Hiroyuki
In-Reply-To: <AANLkTilawCLv5_XlStsFdOdLh-dmlIR3aUTyrVS-JF_s@mail.gmail.com>
On Tue, May 25, 2010 at 11:34:12AM -0700, Paul Menage wrote:
> On Tue, May 25, 2010 at 9:53 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, May 20, 2010 at 03:22:15PM -0700, Paul Menage wrote:
> >> On Tue, May 18, 2010 at 5:04 PM, Sridhar Samudrala
> >> <samudrala.sridhar@gmail.com> wrote:
> >> > Add a new kernel API to attach a task to current task's cgroup
> >> > in all the active hierarchies.
> >> >
> >> > Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
> >>
> >> Reviewed-by: Paul Menage <menage@google.com>
> >>
> >> It would be more efficient to just attach directly to current->cgroups
> >> rather than potentially creating/destroying one css_set for each
> >> hierarchy until we've completely converged on current->cgroups - but
> >> that would require a bunch of refactoring of the guts of
> >> cgroup_attach_task() to ensure that the right can_attach()/attach()
> >> callbacks are made. That doesn't really seem worthwhile right now for
> >> the initial use, that I imagine isn't going to be
> >> performance-sensitive.
> >>
> >> Paul
> >
> > Is this patch suitable for 2.6.35?
>
> Should be OK, yes.
>
> Paul
So I'll add your Acked-by tag then, and merge through the vhost tree
together with the patch that uses this.
--
MST
^ permalink raw reply
* Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup
From: Michael S. Tsirkin @ 2010-05-27 9:14 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: netdev, lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
Tejun Heo, Jiri Kosina, Thomas Gleixner, Oleg Nesterov,
Ingo Molnar, Andi Kleen
In-Reply-To: <1274227491.2370.110.camel@w-sridhar.beaverton.ibm.com>
On Tue, May 18, 2010 at 05:04:51PM -0700, Sridhar Samudrala wrote:
> Add a new kernel API to create a singlethread workqueue and attach it's
> task to current task's cgroup and cpumask.
>
> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Could someone familiar with workqueue code please comment on whether
this patch is suitable for 2.6.35?
It is needed to fix the case where vhost user might cause a kernel
thread to consume more CPU than allowed by the cgroup.
Should I merge it through the vhost tree?
Ack for this?
Thanks!
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 9466e86..6d6f301 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -211,6 +211,7 @@ __create_workqueue_key(const char *name, int singlethread,
> #define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1, 0)
> #define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0, 0)
>
> +extern struct workqueue_struct *create_singlethread_workqueue_in_current_cg(char *name);
> extern void destroy_workqueue(struct workqueue_struct *wq);
>
> extern int queue_work(struct workqueue_struct *wq, struct work_struct *work);
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 5bfb213..6ba226e 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -35,6 +35,7 @@
> #include <linux/lockdep.h>
> #define CREATE_TRACE_POINTS
> #include <trace/events/workqueue.h>
> +#include <linux/cgroup.h>
>
> /*
> * The per-CPU workqueue (if single thread, we always use the first
> @@ -193,6 +194,45 @@ static const struct cpumask *cpu_singlethread_map __read_mostly;
> */
> static cpumask_var_t cpu_populated_map __read_mostly;
>
> +static struct task_struct *get_singlethread_wq_task(struct workqueue_struct *wq)
> +{
> + return (per_cpu_ptr(wq->cpu_wq, singlethread_cpu))->thread;
> +}
> +
> +/* Create a singlethread workqueue and attach it's task to the current task's
> + * cgroup and set it's cpumask to the current task's cpumask.
> + */
> +struct workqueue_struct *create_singlethread_workqueue_in_current_cg(char *name)
> +{
> + struct workqueue_struct *wq;
> + struct task_struct *task;
> + cpumask_var_t mask;
> +
> + wq = create_singlethread_workqueue(name);
> + if (!wq)
> + goto out;
> +
> + if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> + goto err;
> +
> + if (sched_getaffinity(current->pid, mask))
> + goto err;
> +
> + task = get_singlethread_wq_task(wq);
> + if (sched_setaffinity(task->pid, mask))
> + goto err;
> +
> + if (cgroup_attach_task_current_cg(task))
> + goto err;
> +out:
> + return wq;
> +err:
> + destroy_workqueue(wq);
> + wq = NULL;
> + goto out;
> +}
> +EXPORT_SYMBOL_GPL(create_singlethread_workqueue_in_current_cg);
> +
> /* If it's single threaded, it isn't in the list of workqueues. */
> static inline int is_wq_single_threaded(struct workqueue_struct *wq)
> {
>
^ permalink raw reply
* [patch 3/3] rds/iw_rdma: remove unneeded kfree()
From: Dan Carpenter @ 2010-05-27 9:11 UTC (permalink / raw)
To: Andy Grover; +Cc: David S. Miller, rds-devel, netdev, kernel-janitors
The "dma_pages" variable is always NULL at this point so the kfree()
isn't needed.
Signed-off-by: Dan Carpenter <error27@gmail.com>
diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index 48d5073..742ab6a 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -334,7 +334,6 @@ static u64 *rds_iw_map_scatterlist(struct rds_iw_device *rds_iwdev,
out_unmap:
ib_dma_unmap_sg(rds_iwdev->dev, sg->list, sg->len, DMA_BIDIRECTIONAL);
sg->dma_len = 0;
- kfree(dma_pages);
return ERR_PTR(ret);
}
^ permalink raw reply related
* [patch 2/3] rds/iw_rdma: spin_lock_irq() is not nestable
From: Dan Carpenter @ 2010-05-27 9:10 UTC (permalink / raw)
To: Andy Grover; +Cc: David S. Miller, rds-devel, netdev, kernel-janitors
The IRQs are already disabled here so we don't need to disable them
again.
Signed-off-by: Dan Carpenter <error27@gmail.com>
diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index 7365fa2..48d5073 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -213,9 +213,9 @@ void rds_iw_add_conn(struct rds_iw_device *rds_iwdev, struct rds_connection *con
BUG_ON(list_empty(&ic->iw_node));
list_del(&ic->iw_node);
- spin_lock_irq(&rds_iwdev->spinlock);
+ spin_lock(&rds_iwdev->spinlock);
list_add_tail(&ic->iw_node, &rds_iwdev->conn_list);
- spin_unlock_irq(&rds_iwdev->spinlock);
+ spin_unlock(&rds_iwdev->spinlock);
spin_unlock_irq(&iw_nodev_conns_lock);
ic->rds_iwdev = rds_iwdev;
^ permalink raw reply related
* [patch 1/3] rds/iw_rdma: using too much stack space
From: Dan Carpenter @ 2010-05-27 9:08 UTC (permalink / raw)
To: Andy Grover; +Cc: David S. Miller, rds-devel, netdev, kernel-janitors
struct rds_sock is pretty big, it uses 944 bytes. The inclusion of
struct sock is what does it, struct sock uses 700 bytes.
Signed-off-by: Dan Carpenter <error27@gmail.com>
diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index 13dc186..742ab6a 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -178,23 +178,29 @@ int rds_iw_update_cm_id(struct rds_iw_device *rds_iwdev, struct rdma_cm_id *cm_i
{
struct sockaddr_in *src_addr, *dst_addr;
struct rds_iw_device *rds_iwdev_old;
- struct rds_sock rs;
+ struct rds_sock *rs;
struct rdma_cm_id *pcm_id;
- int rc;
+ int ret;
+
+ rs = kmalloc(sizeof(*rs), GFP_KERNEL);
+ if (!rs)
+ return -ENOMEM;
src_addr = (struct sockaddr_in *)&cm_id->route.addr.src_addr;
dst_addr = (struct sockaddr_in *)&cm_id->route.addr.dst_addr;
- rs.rs_bound_addr = src_addr->sin_addr.s_addr;
- rs.rs_bound_port = src_addr->sin_port;
- rs.rs_conn_addr = dst_addr->sin_addr.s_addr;
- rs.rs_conn_port = dst_addr->sin_port;
+ rs->rs_bound_addr = src_addr->sin_addr.s_addr;
+ rs->rs_bound_port = src_addr->sin_port;
+ rs->rs_conn_addr = dst_addr->sin_addr.s_addr;
+ rs->rs_conn_port = dst_addr->sin_port;
- rc = rds_iw_get_device(&rs, &rds_iwdev_old, &pcm_id);
- if (rc)
+ ret = rds_iw_get_device(rs, &rds_iwdev_old, &pcm_id);
+ if (ret)
rds_iw_remove_cm_id(rds_iwdev, cm_id);
- return rds_iw_add_cm_id(rds_iwdev, cm_id);
+ ret = rds_iw_add_cm_id(rds_iwdev, cm_id);
+ kfree(rs);
+ return ret;
}
void rds_iw_add_conn(struct rds_iw_device *rds_iwdev, struct rds_connection *conn)
^ permalink raw reply related
* Re: issue about virtio-net driver to suppoprt vhost mergeable buffer with zero-copy to support PS mode
From: Michael S. Tsirkin @ 2010-05-27 8:20 UTC (permalink / raw)
To: Xin, Xiaohui
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, Herbert Xu
In-Reply-To: <F2E9EB7348B8264F86B6AB8151CE2D790B7958E740@shsmsx502.ccr.corp.intel.com>
On Thu, May 27, 2010 at 09:21:02AM +0800, Xin, Xiaohui wrote:
> Michael,
> I'm now looking into the vhost mergeable buffer, and I tried to use it to support PS mode with zero-copy. And I found an issue there that I have to modify the guest virito-net driver.
>
> When guest virtio-net driver submits mergeable buffers, it submits multiple pages outside. In zero-copy case, vhost cannot know which page is used to put header, and which page is used to put payload. Then vhost can only reserves 12 bytes for each page. That means, the page_offset of the payload DMAed into the guest buffer is always 12 bytes. But guest virtio-net driver always use offset 0 to put the data (See receive_mergeable()). That's where the zero-copy use mergeable buffer must modify.
>
> Have I missed something here? And how do you think about it?
>
> Thanks
> Xiaohui
Maybe you can teach the hardware skip the first 12 bytes: qemu will
call an ioctl telling hardware what the virtio header size is.
This is how we plan to do it for tap.
Alternatively, buffers can be used in any order.
So we can have hardware use N buffers for the packet, and then
have vhost put the header in buffer N+1.
--
MST
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox