* Re: [PATCH net-next] can-raw: add msg_flags to distinguish local traffic
From: Daniel Baluta @ 2010-10-15 9:09 UTC (permalink / raw)
To: Kurt Van Dijck
Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w, netdev, Oliver Hartkopp
In-Reply-To: <20101015073709.GA387-MxZ6Iy/zr/UdbCeoMzGj59i2O/JbrIOy@public.gmane.org>
On Fri, Oct 15, 2010 at 10:37 AM, Kurt Van Dijck <kurt.van.dijck-/BeEPy95v10@public.gmane.org> wrote:
> CAN has no addressing scheme. It is currently impossible
> for userspace to tell if a received CAN frame comes from
> another process on the local host, or from a remote CAN
> device.
> This patch adds support for userspace applications to distinguish
> between 'own', 'local' and 'remote' CAN traffic.
> Distinction is made by returning some flags in msg->msg_flags
> in the call to recvmsg.
> MSG_CONFIRM flag means 'own', as in 'transmission confirmation'
> MSG_DONTROUTE flag means 'local', not routed.
> Obviously, msgs with MSG_CONFIRM will have MSG_DONTROUTE set too.
>
> Please note that on SocketCAN mailing list, different opinions
> exist on the exact meaning of MSG_DONTROUTE. Better (=more
> intuitive) alternatives are appreciated.
>
> Signed-off-by: Kurt Van Dijck <kurt.van.dijck-AgBVmzD5pcezQB+pC5nmwQ@public.gmane.org>
> ---
> net/can/raw.c | 33 ++++++++++++++++++++++++++++++---
> 1 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/net/can/raw.c b/net/can/raw.c
> index 7d77e67..f98709e 100644
> --- a/net/can/raw.c
> +++ b/net/can/raw.c
> @@ -90,23 +90,39 @@ struct raw_sock {
> can_err_mask_t err_mask;
> };
>
> +/*
> + * return some space to store extra msg flags in.
> + * We use 1 int beyond the 'struct sockaddr_can' in skb->cb
> + * to store those.
> + * These flags will be used in raw_recvmsg()
> + */
> +static inline int *raw_flags(struct sk_buff *skb)
> +{
> + BUILD_BUG_ON(sizeof(skb->cb)
> + <= (sizeof(struct sockaddr_can) + sizeof(int)));
> + /* return pointer after struct sockaddr_can */
> + return (int *)(&((struct sockaddr_can *)skb->cb)[1]);
Since msg_flags is unsigned, it would be nice if this function returned a pointer to unsigned as well.
> +}
> +
> static inline struct raw_sock *raw_sk(const struct sock *sk)
> {
> return (struct raw_sock *)sk;
> }
>
> -static void raw_rcv(struct sk_buff *skb, void *data)
> +static void raw_rcv(struct sk_buff *oskb, void *data)
> {
> struct sock *sk = (struct sock *)data;
> struct raw_sock *ro = raw_sk(sk);
> struct sockaddr_can *addr;
> + struct sk_buff *skb;
> + int *pflags;
>
> /* check the received tx sock reference */
> - if (!ro->recv_own_msgs && skb->sk == sk)
> + if (!ro->recv_own_msgs && oskb->sk == sk)
> return;
>
> /* clone the given skb to be able to enqueue it into the rcv queue */
> - skb = skb_clone(skb, GFP_ATOMIC);
> + skb = skb_clone(oskb, GFP_ATOMIC);
> if (!skb)
> return;
>
> @@ -123,6 +139,14 @@ static void raw_rcv(struct sk_buff *skb, void *data)
> addr->can_family = AF_CAN;
> addr->can_ifindex = skb->dev->ifindex;
>
> + /* prepare the flags for raw_recvmsg() */
> + pflags = raw_flags(skb);
> + *pflags = 0;
> + if (oskb->sk)
> + *pflags |= MSG_DONTROUTE;
> + if (oskb->sk == sk)
> + *pflags |= MSG_CONFIRM;
> +
> if (sock_queue_rcv_skb(sk, skb) < 0)
> kfree_skb(skb);
> }
> @@ -707,6 +731,9 @@ static int raw_recvmsg(struct kiocb *iocb, struct socket *sock,
> memcpy(msg->msg_name, skb->cb, msg->msg_namelen);
> }
>
> + /* assign the flags that have been recorded in in raw_rcv() */
Small typo: "in" appears twice.
> + msg->msg_flags |= *(raw_flags(skb));
> +
> skb_free_datagram(sk, skb);
>
> return size;
thanks,
Daniel.
^ permalink raw reply
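For readers following this thread, a userspace receiver could interpret the proposed flags roughly as shown below. This is only a minimal sketch, assuming the semantics given in the patch description (MSG_CONFIRM means 'own', MSG_DONTROUTE means 'local', anything else is 'remote'). The enum and the function name classify_can_msg() are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>	/* struct msghdr, MSG_CONFIRM, MSG_DONTROUTE (Linux) */

/* Hypothetical names, for illustration only. */
enum can_origin {
	CAN_ORIGIN_REMOTE,	/* frame came in from the bus */
	CAN_ORIGIN_LOCAL,	/* frame sent by another socket on this host */
	CAN_ORIGIN_OWN,		/* frame sent by this very socket */
};

/*
 * Classify a message that recvmsg() has already filled in, assuming
 * the flag semantics proposed in the patch description.
 */
static enum can_origin classify_can_msg(const struct msghdr *msg)
{
	assert(msg != NULL);

	/* 'own' implies 'local', so test the more specific flag first */
	if (msg->msg_flags & MSG_CONFIRM)
		return CAN_ORIGIN_OWN;
	if (msg->msg_flags & MSG_DONTROUTE)
		return CAN_ORIGIN_LOCAL;
	return CAN_ORIGIN_REMOTE;
}
```

MSG_CONFIRM is tested first because, as the description notes, messages with MSG_CONFIRM also carry MSG_DONTROUTE.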
* [PATCH v13 01/16] Add a new structure for skb buffer from external.
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <1287133937-5538-1-git-send-email-xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
include/linux/skbuff.h | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 77eb60d..696e690 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -211,6 +211,15 @@ struct skb_shared_info {
skb_frag_t frags[MAX_SKB_FRAGS];
};
+/* This structure is for an skb whose pages may point to
+ * an external buffer, which is not allocated from kernel space.
+ * It also contains a destructor for itself.
+ */
+struct skb_ext_page {
+ struct page *page;
+ void (*dtor)(struct skb_ext_page *);
+};
+
/* We divide dataref into two halves. The higher 16 bits hold references
* to the payload part of skb->data. The lower 16 bits hold references to
* the entire skb->data. A clone of a headerless skb holds the length of
--
1.7.3
^ permalink raw reply related
* [PATCH v13 02/16] Add a new struct for device to manipulate external buffer.
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
Add a new field, mp_port, to struct net_device. It is used for
mediate passthru (zero-copy). It contains the capabilities of the
net device driver, a socket, and an external buffer creator.
"External" means that the skb buffers belonging to the device may
not be allocated from kernel space.
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
include/linux/netdevice.h | 25 ++++++++++++++++++++++++-
1 files changed, 24 insertions(+), 1 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 46c36ff..f6b1870 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -325,6 +325,28 @@ enum netdev_state_t {
__LINK_STATE_DORMANT,
};
+/* The structure for mediate passthru (zero-copy). */
+struct mp_port {
+ /* the header len */
+ int hdr_len;
+ /* the max payload len for one descriptor */
+ int data_len;
+ /* the pages for DMA in one time */
+ int npages;
+ /* the socket bound to */
+ struct socket *sock;
+ /* the header len for virtio-net */
+ int vnet_hlen;
+ /* the external buffer page creator */
+ struct skb_ext_page *(*ctor)(struct mp_port *,
+ struct sk_buff *, int);
+ /* the hash function used to find the corresponding
+ * backend ring descriptor info for one external
+ * buffer page.
+ */
+ struct skb_ext_page *(*hash)(struct net_device *,
+ struct page *);
+};
/*
* This structure holds at boot time configured netdevice settings. They
@@ -1045,7 +1067,8 @@ struct net_device {
/* GARP */
struct garp_port *garp_port;
-
+ /* mpassthru */
+ struct mp_port *mp_port;
/* class/net/name entry */
struct device dev;
/* space for optional device, statistics, and wireless sysfs groups */
--
1.7.3
^ permalink raw reply related
* [PATCH v13 03/16] Add a ndo_mp_port_prep pointer to net_device_ops.
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
If the driver wants to allocate external buffers, it can export
its capabilities, such as the skb buffer header length and the
page length that can be used for DMA. The owner of the external
buffers may make use of this.
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
include/linux/netdevice.h | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f6b1870..575777f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -723,6 +723,12 @@ struct netdev_rx_queue {
* int (*ndo_set_vf_port)(struct net_device *dev, int vf,
* struct nlattr *port[]);
* int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
+ *
+ * int (*ndo_mp_port_prep)(struct net_device *dev, struct mp_port *port);
+ * If the driver wants to allocate external buffers,
+ * it can export its capabilities, such as the skb
+ * buffer header length and the page length usable for DMA.
+ * The owner of the external buffers may make use of this.
*/
#define HAVE_NET_DEVICE_OPS
struct net_device_ops {
@@ -795,6 +801,10 @@ struct net_device_ops {
int (*ndo_fcoe_get_wwn)(struct net_device *dev,
u64 *wwn, int type);
#endif
+#if defined(CONFIG_MEDIATE_PASSTHRU) || defined(CONFIG_MEDIATE_PASSTHRU_MODULE)
+ int (*ndo_mp_port_prep)(struct net_device *dev,
+ struct mp_port *port);
+#endif
};
/*
--
1.7.3
^ permalink raw reply related
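To illustrate what "exporting capabilities" might look like from the driver side, here is a small userspace sketch. It assumes only the three capability fields named in patch 02/16 (hdr_len, data_len, npages). The struct mp_port_caps stand-in, the function example_mp_port_prep() and all numeric values are made up for illustration and do not come from any driver in this series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the capability fields of struct mp_port
 * from patch 02/16; the kernel structure has more members.
 */
struct mp_port_caps {
	int hdr_len;	/* the header len */
	int data_len;	/* the max payload len for one descriptor */
	int npages;	/* the pages for DMA in one time */
};

/*
 * Hypothetical callback in the spirit of ndo_mp_port_prep(): the
 * driver fills in what it can handle and returns 0 on success.
 * The values below are invented for the example.
 */
static int example_mp_port_prep(struct mp_port_caps *port)
{
	assert(port != NULL);

	port->hdr_len = 128;
	port->data_len = 2048;
	port->npages = 1;
	return 0;
}
```

The owner of the external buffers would then read these fields to size the buffers it hands to the device.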
* [PATCH v13 05/16] Add a function to indicate if device use external buffer.
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
include/linux/netdevice.h | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8dcf6de..f91d9bb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1739,6 +1739,11 @@ extern gro_result_t napi_gro_frags(struct napi_struct *napi);
extern int netdev_mp_port_prep(struct net_device *dev,
struct mp_port *port);
+static inline bool dev_is_mpassthru(struct net_device *dev)
+{
+ return dev && dev->mp_port;
+}
+
static inline void napi_free_frags(struct napi_struct *napi)
{
kfree_skb(napi->skb);
--
1.7.3
^ permalink raw reply related
* [PATCH v13 06/16] Use callback to deal with skb_release_data() specially.
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
If the buffer is external, use the callback to destroy it.
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
net/core/skbuff.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c83b421..5e6d69c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -210,6 +210,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
/* make sure we initialize shinfo sequentially */
shinfo = skb_shinfo(skb);
+ shinfo->destructor_arg = NULL;
memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
atomic_set(&shinfo->dataref, 1);
@@ -343,6 +344,13 @@ static void skb_release_data(struct sk_buff *skb)
if (skb_has_frags(skb))
skb_drop_fraglist(skb);
+ if (skb->dev && dev_is_mpassthru(skb->dev)) {
+ struct skb_ext_page *ext_page =
+ skb_shinfo(skb)->destructor_arg;
+ if (ext_page && ext_page->dtor)
+ ext_page->dtor(ext_page);
+ }
+
kfree(skb->head);
}
}
--
1.7.3
^ permalink raw reply related
* [PATCH v13 07/16] Modify netdev_alloc_page() to get external buffer
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
Currently, it can get external buffers from the mp device.
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
net/core/skbuff.c | 27 +++++++++++++++++++++++++++
1 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5e6d69c..f39d372 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -262,11 +262,38 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
}
EXPORT_SYMBOL(__netdev_alloc_skb);
+struct page *netdev_alloc_ext_pages(struct net_device *dev, int npages)
+{
+ struct mp_port *port;
+ struct skb_ext_page *ext_page = NULL;
+
+ port = dev->mp_port;
+ if (!port)
+ goto out;
+ ext_page = port->ctor(port, NULL, npages);
+ if (ext_page)
+ return ext_page->page;
+out:
+ return NULL;
+
+}
+EXPORT_SYMBOL(netdev_alloc_ext_pages);
+
+struct page *netdev_alloc_ext_page(struct net_device *dev)
+{
+ return netdev_alloc_ext_pages(dev, 1);
+
+}
+EXPORT_SYMBOL(netdev_alloc_ext_page);
+
struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
{
int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
struct page *page;
+ if (dev_is_mpassthru(dev))
+ return netdev_alloc_ext_page(dev);
+
page = alloc_pages_node(node, gfp_mask, 0);
return page;
}
--
1.7.3
^ permalink raw reply related
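The dispatch added to __netdev_alloc_page() above can be modelled in userspace as follows. This is a simplified sketch, not the kernel code: the struct names, the two-argument ctor (the kernel callback also takes an sk_buff pointer), the fixed 4096-byte page size and the one-page pool_ctor() are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct ext_page {
	void *page;
	void (*dtor)(struct ext_page *);
};

struct port {
	/* creator of external pages, modelled on mp_port->ctor */
	struct ext_page *(*ctor)(struct port *, int npages);
};

struct device {
	struct port *port;	/* NULL for an ordinary device */
};

/* A toy external pool holding exactly one page (hypothetical). */
static char pool_page[4096];
static struct ext_page pool_ext = { pool_page, NULL };

static struct ext_page *pool_ctor(struct port *port, int npages)
{
	(void)port;
	return npages == 1 ? &pool_ext : NULL;
}

/*
 * Same shape as the patched allocator: ask the external creator when
 * a port is attached, fall back to an ordinary allocation otherwise.
 */
static void *device_alloc_page(struct device *dev)
{
	struct ext_page *ext;

	assert(dev != NULL);
	if (dev->port) {
		ext = dev->port->ctor(dev->port, 1);
		return ext ? ext->page : NULL;
	}
	return malloc(4096);
}
```

Callers see the same return type in both cases, which is why existing drivers calling netdev_alloc_page() need no change to receive external pages.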
* [PATCH v13 08/16] Modify netdev_free_page() to release external buffer
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
Currently, it can release external buffers obtained from the mp device.
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
include/linux/skbuff.h | 4 +++-
net/core/skbuff.c | 24 ++++++++++++++++++++++++
2 files changed, 27 insertions(+), 1 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 696e690..8cfde3e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1585,9 +1585,11 @@ static inline struct page *netdev_alloc_page(struct net_device *dev)
return __netdev_alloc_page(dev, GFP_ATOMIC);
}
+extern void __netdev_free_page(struct net_device *dev, struct page *page);
+
static inline void netdev_free_page(struct net_device *dev, struct page *page)
{
- __free_page(page);
+ __netdev_free_page(dev, page);
}
/**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f39d372..02439e0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -299,6 +299,30 @@ struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
}
EXPORT_SYMBOL(__netdev_alloc_page);
+void netdev_free_ext_page(struct net_device *dev, struct page *page)
+{
+ struct skb_ext_page *ext_page = NULL;
+ if (dev_is_mpassthru(dev) && dev->mp_port->hash) {
+ ext_page = dev->mp_port->hash(dev, page);
+ if (ext_page)
+ ext_page->dtor(ext_page);
+ else
+ __free_page(page);
+ }
+}
+EXPORT_SYMBOL(netdev_free_ext_page);
+
+void __netdev_free_page(struct net_device *dev, struct page *page)
+{
+ if (dev_is_mpassthru(dev)) {
+ netdev_free_ext_page(dev, page);
+ return;
+ }
+
+ __free_page(page);
+}
+EXPORT_SYMBOL(__netdev_free_page);
+
void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
int size)
{
--
1.7.3
^ permalink raw reply related
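The release path can be sketched in userspace along the same lines. All names here are hypothetical, and the lookup callback stands in for mp_port->hash. One deliberate difference: in the patch as posted, netdev_free_ext_page() takes no action when the device is mpassthru but has no hash callback, whereas this sketch falls back to an ordinary free in that case. That is a simplification chosen here, not something the series states.

```c
#include <assert.h>
#include <stdlib.h>

struct ext_page {
	void *page;
	void (*dtor)(struct ext_page *);
};

struct device {
	/* page -> external descriptor lookup, modelled on mp_port->hash;
	 * NULL means an ordinary device in this simplified model */
	struct ext_page *(*lookup)(struct device *, void *page);
};

static int dtor_calls;

static void count_dtor(struct ext_page *ext)
{
	(void)ext;
	dtor_calls++;	/* a real owner would take the page back here */
}

/* A toy table with a single external page (hypothetical). */
static char ext_buf[64];
static struct ext_page ext_desc = { ext_buf, count_dtor };

static struct ext_page *one_entry_lookup(struct device *dev, void *page)
{
	(void)dev;
	return page == (void *)ext_buf ? &ext_desc : NULL;
}

static void device_free_page(struct device *dev, void *page)
{
	struct ext_page *ext = NULL;

	assert(dev != NULL);
	if (dev->lookup)
		ext = dev->lookup(dev, page);
	if (ext) {
		/* external page: hand it back to its owner */
		if (ext->dtor)
			ext->dtor(ext);
		return;
	}
	free(page);	/* ordinary page */
}
```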
* [PATCH v13 09/16] Don't do skb recycle, if device use external buffer.
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
net/core/skbuff.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 02439e0..196aa99 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -558,6 +558,12 @@ bool skb_recycle_check(struct sk_buff *skb, int skb_size)
if (skb_shared(skb) || skb_cloned(skb))
return false;
+ /* if the device wants to do mediate passthru, the skb may
+ * get external buffer, so don't recycle
+ */
+ if (dev_is_mpassthru(skb->dev))
+ return 0;
+
skb_release_head_state(skb);
shinfo = skb_shinfo(skb);
--
1.7.3
^ permalink raw reply related
* [PATCH v13 11/16] Add header file for mp device.
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
include/linux/mpassthru.h | 133 +++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 133 insertions(+), 0 deletions(-)
create mode 100644 include/linux/mpassthru.h
diff --git a/include/linux/mpassthru.h b/include/linux/mpassthru.h
new file mode 100644
index 0000000..1115f55
--- /dev/null
+++ b/include/linux/mpassthru.h
@@ -0,0 +1,133 @@
+#ifndef __MPASSTHRU_H
+#define __MPASSTHRU_H
+
+#include <linux/types.h>
+#include <linux/if_ether.h>
+#include <linux/ioctl.h>
+
+/* ioctl defines */
+#define MPASSTHRU_BINDDEV _IOW('M', 213, int)
+#define MPASSTHRU_UNBINDDEV _IO('M', 214)
+#define MPASSTHRU_SET_MEM_LOCKED _IOW('M', 215, unsigned long)
+#define MPASSTHRU_GET_MEM_LOCKED_NEED _IOR('M', 216, unsigned long)
+
+#define COPY_THRESHOLD (L1_CACHE_BYTES * 4)
+#define COPY_HDR_LEN (L1_CACHE_BYTES < 64 ? 64 : L1_CACHE_BYTES)
+
+#define DEFAULT_NEED ((8192*2*2)*4096)
+
+struct frag {
+ u16 offset;
+ u16 size;
+};
+
+#define HASH_BUCKETS (8192*2)
+struct page_info {
+ struct list_head list;
+ struct page_info *next;
+ struct page_info *prev;
+ struct page *pages[MAX_SKB_FRAGS];
+ struct sk_buff *skb;
+ struct page_pool *pool;
+
+ /* The pointer relayed to the skb, to indicate
+ * whether it is externally allocated or kernel allocated
+ */
+ struct skb_ext_page ext_page;
+ /* flag to indicate read or write */
+#define INFO_READ 0
+#define INFO_WRITE 1
+ unsigned flags;
+ /* exact number of locked pages */
+ unsigned pnum;
+
+ /* The fields below are for the backend
+ * driver, currently vhost-net.
+ */
+ /* the kiocb structure related to */
+ struct kiocb *iocb;
+ /* the ring descriptor index */
+ unsigned int desc_pos;
+ /* the iovec coming from backend, we only
+ * need few of them */
+ struct iovec hdr[2];
+ struct iovec iov[2];
+};
+
+struct page_pool {
+ /* the queue for rx side */
+ struct list_head readq;
+ /* the lock to protect readq */
+ spinlock_t read_lock;
+ /* record the original rlimit */
+ struct rlimit o_rlim;
+ /* number of pages userspace wants locked */
+ int locked_pages;
+ /* currently locked pages */
+ int cur_pages;
+ /* the memory locked before */
+ unsigned long orig_locked_vm;
+ /* the device according to */
+ struct net_device *dev;
+ /* the mp_port according to dev */
+ struct mp_port port;
+ /* the hash_table list to find each locked page */
+ struct page_info **hash_table;
+};
+
+static struct kmem_cache *ext_page_info_cache;
+
+#ifdef __KERNEL__
+#if defined(CONFIG_MEDIATE_PASSTHRU) || defined(CONFIG_MEDIATE_PASSTHRU_MODULE)
+struct socket *mp_get_socket(struct file *);
+struct page_pool *page_pool_create(struct net_device *dev,
+ struct socket *sock);
+int async_recvmsg(struct kiocb *iocb, struct page_pool *pool,
+ struct iovec *iov, int count, int flags);
+int async_sendmsg(struct sock *sk, struct kiocb *iocb,
+ struct page_pool *pool, struct iovec *iov,
+ int count);
+void async_data_ready(struct sock *sk, struct page_pool *pool);
+void dev_change_state(struct net_device *dev);
+void page_pool_destroy(struct mm_struct *mm, struct page_pool *pool);
+#else
+#include <linux/err.h>
+#include <linux/errno.h>
+struct file;
+struct socket;
+static inline struct socket *mp_get_socket(struct file *f)
+{
+ return ERR_PTR(-EINVAL);
+}
+static inline struct page_pool *page_pool_create(struct net_device *dev,
+ struct socket *sock)
+{
+ return ERR_PTR(-EINVAL);
+}
+static inline int async_recvmsg(struct kiocb *iocb, struct page_pool *pool,
+ struct iovec *iov, int count, int flags)
+{
+ return -EINVAL;
+}
+static inline int async_sendmsg(struct sock *sk, struct kiocb *iocb,
+ struct page_pool *pool, struct iovec *iov,
+ int count)
+{
+ return -EINVAL;
+}
+static inline void async_data_ready(struct sock *sk, struct page_pool *pool)
+{
+ return;
+}
+static inline void dev_change_state(struct net_device *dev)
+{
+ return;
+}
+static inline void page_pool_destroy(struct mm_struct *mm,
+ struct page_pool *pool)
+{
+ return;
+}
+#endif /* CONFIG_MEDIATE_PASSTHRU */
+#endif /* __KERNEL__ */
+#endif /* __MPASSTHRU_H */
--
1.7.3
^ permalink raw reply related
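Because struct page_info embeds a struct skb_ext_page, a destructor that receives only the skb_ext_page pointer can get back to the enclosing page_info with the usual container_of() idiom. Whether the mp device actually does this is not shown in this part of the series, so the following userspace sketch is an assumption: struct info is a cut-down stand-in for page_info, and info_of() and info_dtor() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct ext_page {
	void *page;
	void (*dtor)(struct ext_page *);
};

/* Cut-down stand-in for struct page_info: only what the sketch needs. */
struct info {
	unsigned int desc_pos;		/* the ring descriptor index */
	struct ext_page ext_page;	/* embedded, as in the header */
};

/* Same idea as the kernel's container_of(), written with offsetof(). */
#define info_of(ptr) \
	((struct info *)((char *)(ptr) - offsetof(struct info, ext_page)))

static unsigned int last_released;

/* Destructor that recovers its owner from the embedded member. */
static void info_dtor(struct ext_page *ext)
{
	struct info *info;

	assert(ext != NULL);
	info = info_of(ext);
	last_released = info->desc_pos;
}
```

Embedding the member this way avoids storing a back pointer in every skb_ext_page.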
* [PATCH v13 13/16] Add a kconfig entry and make entry for mp device.
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
drivers/vhost/Kconfig | 10 ++++++++++
drivers/vhost/Makefile | 2 ++
2 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index e4e2fd1..a6b8cbf 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -9,3 +9,13 @@ config VHOST_NET
To compile this driver as a module, choose M here: the module will
be called vhost_net.
+config MEDIATE_PASSTHRU
+ tristate "mediate passthru network driver (EXPERIMENTAL)"
+ depends on VHOST_NET
+ ---help---
+ Zero-copy network I/O support. We call it mediate passthru to
+ distinguish it from hardware passthru.
+
+ To compile this driver as a module, choose M here: the module will
+ be called mpassthru.
+
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 72dd020..c18b9fc 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -1,2 +1,4 @@
obj-$(CONFIG_VHOST_NET) += vhost_net.o
vhost_net-y := vhost.o net.o
+
+obj-$(CONFIG_MEDIATE_PASSTHRU) += mpassthru.o
--
1.7.3
^ permalink raw reply related
* [PATCH v13 14/16] Provides multiple submits and asynchronous notifications.
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
The vhost-net backend currently supports only synchronous send/recv
operations. This patch provides multiple submits and asynchronous
notifications, which are needed for the zero-copy case.
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
---
drivers/vhost/net.c | 355 +++++++++++++++++++++++++++++++++++++++++++++----
drivers/vhost/vhost.c | 78 +++++++++++
drivers/vhost/vhost.h | 15 ++-
3 files changed, 423 insertions(+), 25 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7c80082..17c599a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -24,6 +24,8 @@
#include <linux/if_arp.h>
#include <linux/if_tun.h>
#include <linux/if_macvlan.h>
+#include <linux/mpassthru.h>
+#include <linux/aio.h>
#include <net/sock.h>
@@ -32,6 +34,7 @@
/* Max number of bytes transferred before requeueing the job.
* Using this limit prevents one virtqueue from starving others. */
#define VHOST_NET_WEIGHT 0x80000
+static struct kmem_cache *notify_cache;
enum {
VHOST_NET_VQ_RX = 0,
@@ -49,6 +52,7 @@ struct vhost_net {
struct vhost_dev dev;
struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
struct vhost_poll poll[VHOST_NET_VQ_MAX];
+ struct kmem_cache *cache;
/* Tells us whether we are polling a socket for TX.
* We only do this when socket buffer fills up.
* Protected by tx vq lock. */
@@ -109,11 +113,184 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
net->tx_poll_state = VHOST_NET_POLL_STARTED;
}
+struct kiocb *notify_dequeue(struct vhost_virtqueue *vq)
+{
+ struct kiocb *iocb = NULL;
+ unsigned long flags;
+
+ spin_lock_irqsave(&vq->notify_lock, flags);
+ if (!list_empty(&vq->notifier)) {
+ iocb = list_first_entry(&vq->notifier,
+ struct kiocb, ki_list);
+ list_del(&iocb->ki_list);
+ }
+ spin_unlock_irqrestore(&vq->notify_lock, flags);
+ return iocb;
+}
+
+static void handle_iocb(struct kiocb *iocb)
+{
+ struct vhost_virtqueue *vq = iocb->private;
+ unsigned long flags;
+
+ spin_lock_irqsave(&vq->notify_lock, flags);
+ list_add_tail(&iocb->ki_list, &vq->notifier);
+ spin_unlock_irqrestore(&vq->notify_lock, flags);
+}
+
+static int is_async_vq(struct vhost_virtqueue *vq)
+{
+ return (vq->link_state == VHOST_VQ_LINK_ASYNC);
+}
+
+static void handle_async_rx_events_notify(struct vhost_net *net,
+ struct vhost_virtqueue *vq,
+ struct socket *sock)
+{
+ struct kiocb *iocb = NULL;
+ struct vhost_log *vq_log = NULL;
+ int rx_total_len = 0;
+ unsigned int head, log, in, out;
+ int size;
+
+ if (!is_async_vq(vq))
+ return;
+
+ if (sock->sk->sk_data_ready)
+ sock->sk->sk_data_ready(sock->sk, 0);
+
+ vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
+ vq->log : NULL;
+
+ while ((iocb = notify_dequeue(vq)) != NULL) {
+ if (!iocb->ki_left) {
+ vhost_add_used_and_signal(&net->dev, vq,
+ iocb->ki_pos, iocb->ki_nbytes);
+ size = iocb->ki_nbytes;
+ head = iocb->ki_pos;
+ rx_total_len += iocb->ki_nbytes;
+
+ if (iocb->ki_dtor)
+ iocb->ki_dtor(iocb);
+ kmem_cache_free(net->cache, iocb);
+
+ /* when logging is enabled, the log must be recomputed,
+ * since these buffers were in the async queue and may not
+ * have received log info earlier.
+ */
+ if (unlikely(vq_log)) {
+ if (!log)
+ __vhost_get_vq_desc(&net->dev, vq,
+ vq->iov,
+ ARRAY_SIZE(vq->iov),
+ &out, &in, vq_log,
+ &log, head);
+ vhost_log_write(vq, vq_log, log, size);
+ }
+ if (unlikely(rx_total_len >= VHOST_NET_WEIGHT)) {
+ vhost_poll_queue(&vq->poll);
+ break;
+ }
+ } else {
+ int i = 0;
+ int count = iocb->ki_left;
+ int hc = count;
+ while (count--) {
+ if (iocb) {
+ vq->heads[i].id = iocb->ki_pos;
+ vq->heads[i].len = iocb->ki_nbytes;
+ size = iocb->ki_nbytes;
+ head = iocb->ki_pos;
+ rx_total_len += iocb->ki_nbytes;
+
+ if (iocb->ki_dtor)
+ iocb->ki_dtor(iocb);
+ kmem_cache_free(net->cache, iocb);
+
+ if (unlikely(vq_log)) {
+ if (!log)
+ __vhost_get_vq_desc(
+ &net->dev, vq, vq->iov,
+ ARRAY_SIZE(vq->iov),
+ &out, &in, vq_log,
+ &log, head);
+ vhost_log_write(
+ vq, vq_log, log, size);
+ }
+ } else
+ break;
+
+ i++;
+ if (count)
+ iocb = notify_dequeue(vq);
+ }
+ vhost_add_used_and_signal_n(
+ &net->dev, vq, vq->heads, hc);
+ }
+ }
+}
+
+static void handle_async_tx_events_notify(struct vhost_net *net,
+ struct vhost_virtqueue *vq)
+{
+ struct kiocb *iocb = NULL;
+ struct list_head *entry, *tmp;
+ unsigned long flags;
+ int tx_total_len = 0;
+
+ if (!is_async_vq(vq))
+ return;
+
+ spin_lock_irqsave(&vq->notify_lock, flags);
+ list_for_each_safe(entry, tmp, &vq->notifier) {
+ iocb = list_entry(entry,
+ struct kiocb, ki_list);
+ if (!iocb->ki_flags)
+ continue;
+ list_del(&iocb->ki_list);
+ vhost_add_used_and_signal(&net->dev, vq,
+ iocb->ki_pos, 0);
+ tx_total_len += iocb->ki_nbytes;
+
+ if (iocb->ki_dtor)
+ iocb->ki_dtor(iocb);
+
+ kmem_cache_free(net->cache, iocb);
+ if (unlikely(tx_total_len >= VHOST_NET_WEIGHT)) {
+ vhost_poll_queue(&vq->poll);
+ break;
+ }
+ }
+ spin_unlock_irqrestore(&vq->notify_lock, flags);
+}
+
+static struct kiocb *create_iocb(struct vhost_net *net,
+ struct vhost_virtqueue *vq,
+ unsigned head)
+{
+ struct kiocb *iocb = NULL;
+
+ if (!is_async_vq(vq))
+ return NULL;
+
+ iocb = kmem_cache_zalloc(net->cache, GFP_KERNEL);
+ if (!iocb)
+ return NULL;
+ iocb->private = vq;
+ iocb->ki_pos = head;
+ iocb->ki_dtor = handle_iocb;
+ if (vq == &net->dev.vqs[VHOST_NET_VQ_RX])
+ iocb->ki_user_data = vq->num;
+ iocb->ki_iovec = vq->hdr;
+ return iocb;
+}
+
/* Expects to be always run from workqueue - which acts as
* read-size critical section for our kind of RCU. */
static void handle_tx(struct vhost_net *net)
{
struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
+ struct kiocb *iocb = NULL;
unsigned out, in, s;
int head;
struct msghdr msg = {
@@ -146,6 +323,10 @@ static void handle_tx(struct vhost_net *net)
if (wmem < sock->sk->sk_sndbuf / 2)
tx_poll_stop(net);
hdr_size = vq->vhost_hlen;
+ if (!vq->vhost_hlen && is_async_vq(vq))
+ hdr_size = vq->sock_hlen;
+
+ handle_async_tx_events_notify(net, vq);
for (;;) {
head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
@@ -157,11 +338,14 @@ static void handle_tx(struct vhost_net *net)
break;
/* Nothing new? Wait for eventfd to tell us they refilled. */
if (head == vq->num) {
- wmem = atomic_read(&sock->sk->sk_wmem_alloc);
- if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
- tx_poll_start(net, sock);
- set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
- break;
+ if (!is_async_vq(vq)) {
+ wmem = atomic_read(&sock->sk->sk_wmem_alloc);
+ if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
+ tx_poll_start(net, sock);
+ set_bit(SOCK_ASYNC_NOSPACE,
+ &sock->flags);
+ break;
+ }
}
if (unlikely(vhost_enable_notify(vq))) {
vhost_disable_notify(vq);
@@ -178,6 +362,13 @@ static void handle_tx(struct vhost_net *net)
s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
msg.msg_iovlen = out;
len = iov_length(vq->iov, out);
+ /* if async operations supported */
+ if (is_async_vq(vq)) {
+ iocb = create_iocb(net, vq, head);
+ if (!iocb)
+ break;
+ }
+
/* Sanity check */
if (!len) {
vq_err(vq, "Unexpected header len for TX: "
@@ -186,12 +377,18 @@ static void handle_tx(struct vhost_net *net)
break;
}
/* TODO: Check specific error and bomb out unless ENOBUFS? */
- err = sock->ops->sendmsg(NULL, sock, &msg, len);
+ err = sock->ops->sendmsg(iocb, sock, &msg, len);
if (unlikely(err < 0)) {
+ if (is_async_vq(vq))
+ kmem_cache_free(net->cache, iocb);
vhost_discard_vq_desc(vq, 1);
tx_poll_start(net, sock);
break;
}
+
+ if (is_async_vq(vq))
+ continue;
+
if (err != len)
pr_debug("Truncated TX packet: "
" len %d != %zd\n", err, len);
@@ -203,6 +400,8 @@ static void handle_tx(struct vhost_net *net)
}
}
+ handle_async_tx_events_notify(net, vq);
+
mutex_unlock(&vq->mutex);
unuse_mm(net->dev.mm);
}
@@ -396,7 +595,8 @@ static void handle_rx_big(struct vhost_net *net)
static void handle_rx_mergeable(struct vhost_net *net)
{
struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
- unsigned uninitialized_var(in), log;
+ unsigned uninitialized_var(in), log, out;
+ struct kiocb *iocb;
struct vhost_log *vq_log;
struct msghdr msg = {
.msg_name = NULL,
@@ -417,28 +617,44 @@ static void handle_rx_mergeable(struct vhost_net *net)
size_t vhost_hlen, sock_hlen;
size_t vhost_len, sock_len;
struct socket *sock = rcu_dereference(vq->private_data);
- if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
+ if (!sock || (skb_queue_empty(&sock->sk->sk_receive_queue) &&
+ !is_async_vq(vq)))
return;
-
use_mm(net->dev.mm);
mutex_lock(&vq->mutex);
vhost_disable_notify(vq);
vhost_hlen = vq->vhost_hlen;
sock_hlen = vq->sock_hlen;
+ /* In the async case, when write logging is enabled, buffers
+ * submitted before logging was enabled may have no log info, so
+ * we recompute the log info when needed. We do this in
+ * handle_async_rx_events_notify().
+ */
+
vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
vq->log : NULL;
- while ((sock_len = peek_head_len(sock->sk))) {
- sock_len += sock_hlen;
- vhost_len = sock_len + vhost_hlen;
- headcount = get_rx_bufs(vq, vq->heads, vhost_len,
+ handle_async_rx_events_notify(net, vq, sock);
+
+ while (is_async_vq(vq) || (sock_len = peek_head_len(sock->sk))) {
+ if (is_async_vq(vq))
+ headcount = vhost_get_vq_desc(&net->dev, vq, vq->iov,
+ ARRAY_SIZE(vq->iov),
+ &out, &in,
+ vq->log, &log);
+ else {
+ sock_len += sock_hlen;
+ vhost_len = sock_len + vhost_hlen;
+ headcount = get_rx_bufs(vq, vq->heads, vhost_len,
&in, vq_log, &log);
+ }
/* On error, stop handling until the next kick. */
if (unlikely(headcount < 0))
break;
/* OK, now we need to know about added descriptors. */
- if (!headcount) {
+ if ((!headcount && !is_async_vq(vq)) ||
+ (headcount == vq->num && is_async_vq(vq))) {
if (unlikely(vhost_enable_notify(vq))) {
/* They have slipped one in as we were
* doing that: check again. */
@@ -450,16 +666,41 @@ static void handle_rx_mergeable(struct vhost_net *net)
break;
}
/* We don't need to be notified again. */
- if (unlikely((vhost_hlen)))
- /* Skip header. TODO: support TSO. */
- move_iovec_hdr(vq->iov, vq->hdr, vhost_hlen, in);
- else
- /* Copy the header for use in VIRTIO_NET_F_MRG_RXBUF:
- * needed because sendmsg can modify msg_iov. */
- copy_iovec_hdr(vq->iov, vq->hdr, sock_hlen, in);
+ if (unlikely((vhost_hlen))) {
+ if (is_async_vq(vq))
+ vq->hdr[0].iov_len = vhost_hlen;
+ else
+ /* Skip header. TODO: support TSO. */
+ move_iovec_hdr(vq->iov, vq->hdr,
+ vhost_hlen, in);
+ } else {
+ if (is_async_vq(vq))
+ vq->hdr[0].iov_len = sock_hlen;
+ else
+ /* Copy the header for use in
+ * VIRTIO_NET_F_MRG_RXBUF:
+ * needed because sendmsg can
+ * modify msg_iov. */
+ copy_iovec_hdr(vq->iov, vq->hdr,
+ sock_hlen, in);
+ }
msg.msg_iovlen = in;
- err = sock->ops->recvmsg(NULL, sock, &msg,
+ if (is_async_vq(vq)) {
+ iocb = create_iocb(net, vq, headcount);
+ if (!iocb)
+ break;
+ }
+ err = sock->ops->recvmsg(iocb, sock, &msg,
sock_len, MSG_DONTWAIT | MSG_TRUNC);
+ if (is_async_vq(vq)) {
+ if (err < 0) {
+ kmem_cache_free(net->cache, iocb);
+ vhost_discard_vq_desc(vq, headcount);
+ break;
+ }
+ continue;
+ }
+
/* Userspace might have consumed the packet meanwhile:
* it's not supposed to do this usually, but might be hard
* to prevent. Discard data we got (if any) and keep going. */
@@ -496,6 +737,8 @@ static void handle_rx_mergeable(struct vhost_net *net)
}
}
+ handle_async_rx_events_notify(net, vq, sock);
+
mutex_unlock(&vq->mutex);
unuse_mm(net->dev.mm);
}
@@ -561,6 +804,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
n->tx_poll_state = VHOST_NET_POLL_DISABLED;
+ n->cache = NULL;
f->private_data = n;
@@ -624,6 +868,21 @@ static void vhost_net_flush(struct vhost_net *n)
vhost_net_flush_vq(n, VHOST_NET_VQ_RX);
}
+static void vhost_async_cleanup(struct vhost_net *n)
+{
+ /* clean the notifier */
+ struct vhost_virtqueue *vq;
+ struct kiocb *iocb = NULL;
+ if (n->cache) {
+ vq = &n->dev.vqs[VHOST_NET_VQ_RX];
+ while ((iocb = notify_dequeue(vq)) != NULL)
+ kmem_cache_free(n->cache, iocb);
+ vq = &n->dev.vqs[VHOST_NET_VQ_TX];
+ while ((iocb = notify_dequeue(vq)) != NULL)
+ kmem_cache_free(n->cache, iocb);
+ }
+}
+
static int vhost_net_release(struct inode *inode, struct file *f)
{
struct vhost_net *n = f->private_data;
@@ -640,6 +899,7 @@ static int vhost_net_release(struct inode *inode, struct file *f)
/* We do an extra flush before freeing memory,
* since jobs can re-queue themselves. */
vhost_net_flush(n);
+ vhost_async_cleanup(n);
kfree(n);
return 0;
}
@@ -691,21 +951,61 @@ static struct socket *get_tap_socket(int fd)
return sock;
}
-static struct socket *get_socket(int fd)
+static struct socket *get_mp_socket(int fd)
+{
+ struct file *file = fget(fd);
+ struct socket *sock;
+ if (!file)
+ return ERR_PTR(-EBADF);
+ sock = mp_get_socket(file);
+ if (IS_ERR(sock))
+ fput(file);
+ return sock;
+}
+
+static struct socket *get_socket(struct vhost_virtqueue *vq, int fd,
+ enum vhost_vq_link_state *state)
{
struct socket *sock;
/* special case to disable backend */
if (fd == -1)
return NULL;
+
+ *state = VHOST_VQ_LINK_SYNC;
+
sock = get_raw_socket(fd);
if (!IS_ERR(sock))
return sock;
sock = get_tap_socket(fd);
if (!IS_ERR(sock))
return sock;
+ /* If we don't have notify_cache, then don't do mpassthru */
+ if (!notify_cache)
+ return ERR_PTR(-ENOTSOCK);
+ /* If we don't have mergeable buffer then don't do mpassthru */
+ if (vhost_has_feature(vq->dev, VIRTIO_NET_F_MRG_RXBUF)) {
+ sock = get_mp_socket(fd);
+ if (!IS_ERR(sock)) {
+ *state = VHOST_VQ_LINK_ASYNC;
+ return sock;
+ }
+ }
return ERR_PTR(-ENOTSOCK);
}
+static void vhost_init_link_state(struct vhost_net *n, int index)
+{
+ struct vhost_virtqueue *vq = n->vqs + index;
+
+ WARN_ON(!mutex_is_locked(&vq->mutex));
+ if (vq->link_state == VHOST_VQ_LINK_ASYNC) {
+ INIT_LIST_HEAD(&vq->notifier);
+ spin_lock_init(&vq->notify_lock);
+ if (!n->cache)
+ n->cache = notify_cache;
+ }
+}
+
static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
{
struct socket *sock, *oldsock;
@@ -729,12 +1029,14 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
r = -EFAULT;
goto err_vq;
}
- sock = get_socket(fd);
+ sock = get_socket(vq, fd, &vq->link_state);
if (IS_ERR(sock)) {
r = PTR_ERR(sock);
goto err_vq;
}
+ vhost_init_link_state(n, index);
+
/* start polling new socket */
oldsock = vq->private_data;
if (sock != oldsock) {
@@ -879,6 +1181,9 @@ static struct miscdevice vhost_net_misc = {
static int vhost_net_init(void)
{
+ notify_cache = kmem_cache_create("vhost_kiocb",
+ sizeof(struct kiocb), 0,
+ SLAB_HWCACHE_ALIGN, NULL);
return misc_register(&vhost_net_misc);
}
module_init(vhost_net_init);
@@ -886,6 +1191,8 @@ module_init(vhost_net_init);
static void vhost_net_exit(void)
{
misc_deregister(&vhost_net_misc);
+ if (notify_cache)
+ kmem_cache_destroy(notify_cache);
}
module_exit(vhost_net_exit);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index dd3d6f7..295d9ab 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1015,6 +1015,84 @@ static int get_indirect(struct vhost_dev *dev, struct vhost_virtqueue *vq,
return 0;
}
+/* To recompute the log */
+int __vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
+ struct iovec iov[], unsigned int iov_size,
+ unsigned int *out_num, unsigned int *in_num,
+ struct vhost_log *log, unsigned int *log_num,
+ unsigned int head)
+{
+ struct vring_desc desc;
+ unsigned int i, found = 0;
+ int ret;
+
+ /* When we start there are none of either input nor output. */
+ *out_num = *in_num = 0;
+ if (unlikely(log))
+ *log_num = 0;
+
+ i = head;
+ do {
+ unsigned iov_count = *in_num + *out_num;
+ if (unlikely(i >= vq->num)) {
+ vq_err(vq, "Desc index is %u > %u, head = %u",
+ i, vq->num, head);
+ return -EINVAL;
+ }
+ if (unlikely(++found > vq->num)) {
+ vq_err(vq, "Loop detected: last one at %u "
+ "vq size %u head %u\n",
+ i, vq->num, head);
+ return -EINVAL;
+ }
+ ret = copy_from_user(&desc, vq->desc + i, sizeof desc);
+ if (unlikely(ret)) {
+ vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
+ i, vq->desc + i);
+ return -EFAULT;
+ }
+ if (desc.flags & VRING_DESC_F_INDIRECT) {
+ ret = get_indirect(dev, vq, iov, iov_size,
+ out_num, in_num,
+ log, log_num, &desc);
+ if (unlikely(ret < 0)) {
+ vq_err(vq, "Failure detected "
+ "in indirect descriptor at idx %d\n", i);
+ return ret;
+ }
+ continue;
+ }
+
+ ret = translate_desc(dev, desc.addr, desc.len, iov + iov_count,
+ iov_size - iov_count);
+ if (unlikely(ret < 0)) {
+ vq_err(vq, "Translation failure %d descriptor idx %d\n",
+ ret, i);
+ return ret;
+ }
+ if (desc.flags & VRING_DESC_F_WRITE) {
+ /* If this is an input descriptor,
+ * increment that count. */
+ *in_num += ret;
+ if (unlikely(log)) {
+ log[*log_num].addr = desc.addr;
+ log[*log_num].len = desc.len;
+ ++*log_num;
+ }
+ } else {
+ /* If it's an output descriptor, they're all supposed
+ * to come before any input descriptors. */
+ if (unlikely(*in_num)) {
+ vq_err(vq, "Descriptor has out after in: "
+ "idx %d\n", i);
+ return -EINVAL;
+ }
+ *out_num += ret;
+ }
+ } while ((i = next_desc(&desc)) != -1);
+
+ return head;
+}
/* This looks in the virtqueue and for the first available buffer, and converts
* it to an iovec for convenient access. Since descriptors consist of some
* number of output then some number of input descriptors, it's actually two
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index afd7729..915336d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -55,6 +55,11 @@ struct vhost_log {
u64 len;
};
+enum vhost_vq_link_state {
+ VHOST_VQ_LINK_SYNC = 0,
+ VHOST_VQ_LINK_ASYNC = 1,
+};
+
/* The virtqueue structure describes a queue attached to a device. */
struct vhost_virtqueue {
struct vhost_dev *dev;
@@ -110,6 +115,10 @@ struct vhost_virtqueue {
/* Log write descriptors */
void __user *log_base;
struct vhost_log log[VHOST_NET_MAX_SG];
+ /* Differentiate async socket for 0-copy from normal */
+ enum vhost_vq_link_state link_state;
+ struct list_head notifier;
+ spinlock_t notify_lock;
};
struct vhost_dev {
@@ -136,7 +145,11 @@ void vhost_dev_cleanup(struct vhost_dev *);
long vhost_dev_ioctl(struct vhost_dev *, unsigned int ioctl, unsigned long arg);
int vhost_vq_access_ok(struct vhost_virtqueue *vq);
int vhost_log_access_ok(struct vhost_dev *);
-
+int __vhost_get_vq_desc(struct vhost_dev *, struct vhost_virtqueue *,
+ struct iovec iov[], unsigned int iov_count,
+ unsigned int *out_num, unsigned int *in_num,
+ struct vhost_log *log, unsigned int *log_num,
+ unsigned int head);
int vhost_get_vq_desc(struct vhost_dev *, struct vhost_virtqueue *,
struct iovec iov[], unsigned int iov_count,
unsigned int *out_num, unsigned int *in_num,
--
1.7.3
* [PATCH v13 15/16] An example how to modify NIC driver to use napi_gro_frags() interface
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
This example is based on the ixgbe driver.
It provides the helper is_rx_buffer_mapped_as_page() to indicate
whether the driver uses the napi_gro_frags() interface.
The example allocates 2 pages for DMA per ring descriptor
using netdev_alloc_page(). When packets arrive, it uses
napi_get_frags() to obtain the skb and napi_gro_frags() to receive the packets.
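For readers unfamiliar with the frags path, the per-buffer accounting done in
the mapped_as_page branch can be modelled in plain userspace C. This is only a
sketch under assumptions: struct model_skb, struct model_frag,
model_add_rx_frag() and the MODEL_* constants are hypothetical stand-ins
invented for illustration, not kernel types or APIs, and a 4096-byte page is
assumed. It mirrors the skb_fill_page_desc() call followed by the len,
data_len and truesize updates in the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size; the driver maps PAGE_SIZE / 2 per receive buffer. */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_HALF_PAGE (MODEL_PAGE_SIZE / 2)
#define MODEL_MAX_FRAGS 16

/* Hypothetical stand-in for one page fragment attached to a packet. */
struct model_frag {
	const void *page;	/* which page holds the data */
	unsigned int offset;	/* where the data starts in that page */
	unsigned int size;	/* number of received bytes */
};

/* Hypothetical stand-in for the few sk_buff fields the diff updates. */
struct model_skb {
	unsigned int len;
	unsigned int data_len;
	unsigned int truesize;
	unsigned int nr_frags;
	struct model_frag frags[MODEL_MAX_FRAGS];
};

/*
 * Attach one received buffer as the next fragment and update the
 * byte counters. Returns 0 on success, -1 if the fragment table is
 * full or the data would not fit in the half-page mapping.
 */
static int model_add_rx_frag(struct model_skb *skb, const void *page,
			     unsigned int offset, unsigned int len)
{
	struct model_frag *frag;

	assert(skb != NULL && page != NULL);

	if (skb->nr_frags >= MODEL_MAX_FRAGS)
		return -1;
	if (len > MODEL_HALF_PAGE || offset > MODEL_PAGE_SIZE - len)
		return -1;

	frag = &skb->frags[skb->nr_frags++];
	frag->page = page;
	frag->offset = offset;
	frag->size = len;

	/* Same three counters the diff bumps after skb_fill_page_desc(). */
	skb->len += len;
	skb->data_len += len;
	skb->truesize += len;
	return 0;
}
```

In this model a packet that spans two descriptors ends up with two fragments
and no linear data, which is why data_len grows together with len.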
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
---
drivers/net/ixgbe/ixgbe.h | 3 +
drivers/net/ixgbe/ixgbe_main.c | 163 +++++++++++++++++++++++++++++++---------
2 files changed, 131 insertions(+), 35 deletions(-)
diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
index 9e15eb9..89367ca 100644
--- a/drivers/net/ixgbe/ixgbe.h
+++ b/drivers/net/ixgbe/ixgbe.h
@@ -131,6 +131,9 @@ struct ixgbe_rx_buffer {
struct page *page;
dma_addr_t page_dma;
unsigned int page_offset;
+ u16 mapped_as_page;
+ struct page *page_skb;
+ unsigned int page_skb_offset;
};
struct ixgbe_queue_stats {
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index e32af43..a4a5263 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1029,6 +1029,12 @@ static inline void ixgbe_release_rx_desc(struct ixgbe_hw *hw,
IXGBE_WRITE_REG(hw, IXGBE_RDT(rx_ring->reg_idx), val);
}
+static bool is_rx_buffer_mapped_as_page(struct ixgbe_rx_buffer *bi,
+ struct net_device *dev)
+{
+ return true;
+}
+
/**
* ixgbe_alloc_rx_buffers - Replace used receive buffers; packet split
* @adapter: address of board private structure
@@ -1045,13 +1051,17 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter,
i = rx_ring->next_to_use;
bi = &rx_ring->rx_buffer_info[i];
+
while (cleaned_count--) {
rx_desc = IXGBE_RX_DESC_ADV(*rx_ring, i);
+ bi->mapped_as_page =
+ is_rx_buffer_mapped_as_page(bi, adapter->netdev);
+
if (!bi->page_dma &&
(rx_ring->flags & IXGBE_RING_RX_PS_ENABLED)) {
if (!bi->page) {
- bi->page = alloc_page(GFP_ATOMIC);
+ bi->page = netdev_alloc_page(adapter->netdev);
if (!bi->page) {
adapter->alloc_rx_page_failed++;
goto no_buffers;
@@ -1068,7 +1078,7 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter,
DMA_FROM_DEVICE);
}
- if (!bi->skb) {
+ if (!bi->mapped_as_page && !bi->skb) {
struct sk_buff *skb;
/* netdev_alloc_skb reserves 32 bytes up front!! */
uint bufsz = rx_ring->rx_buf_len + SMP_CACHE_BYTES;
@@ -1088,6 +1098,19 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter,
rx_ring->rx_buf_len,
DMA_FROM_DEVICE);
}
+
+ if (bi->mapped_as_page && !bi->page_skb) {
+ bi->page_skb = netdev_alloc_page(adapter->netdev);
+ if (!bi->page_skb) {
+ adapter->alloc_rx_page_failed++;
+ goto no_buffers;
+ }
+ bi->page_skb_offset = 0;
+ bi->dma = dma_map_page(&pdev->dev, bi->page_skb,
+ bi->page_skb_offset,
+ (PAGE_SIZE / 2),
+ PCI_DMA_FROMDEVICE);
+ }
/* Refresh the desc even if buffer_addrs didn't change because
* each write-back erases this info. */
if (rx_ring->flags & IXGBE_RING_RX_PS_ENABLED) {
@@ -1165,6 +1188,13 @@ struct ixgbe_rsc_cb {
bool delay_unmap;
};
+static bool is_no_buffer(struct ixgbe_rx_buffer *rx_buffer_info)
+{
+ return (!rx_buffer_info->skb ||
+ !rx_buffer_info->page_skb) &&
+ !rx_buffer_info->page;
+}
+
#define IXGBE_RSC_CB(skb) ((struct ixgbe_rsc_cb *)(skb)->cb)
static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
@@ -1174,6 +1204,7 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
struct ixgbe_adapter *adapter = q_vector->adapter;
struct net_device *netdev = adapter->netdev;
struct pci_dev *pdev = adapter->pdev;
+ struct napi_struct *napi = &q_vector->napi;
union ixgbe_adv_rx_desc *rx_desc, *next_rxd;
struct ixgbe_rx_buffer *rx_buffer_info, *next_buffer;
struct sk_buff *skb;
@@ -1211,32 +1242,68 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
len = le16_to_cpu(rx_desc->wb.upper.length);
}
+ if (is_no_buffer(rx_buffer_info))
+ break;
cleaned = true;
- skb = rx_buffer_info->skb;
- prefetch(skb->data);
- rx_buffer_info->skb = NULL;
- if (rx_buffer_info->dma) {
- if ((adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) &&
- (!(staterr & IXGBE_RXD_STAT_EOP)) &&
- (!(skb->prev))) {
- /*
- * When HWRSC is enabled, delay unmapping
- * of the first packet. It carries the
- * header information, HW may still
- * access the header after the writeback.
- * Only unmap it when EOP is reached
- */
- IXGBE_RSC_CB(skb)->delay_unmap = true;
- IXGBE_RSC_CB(skb)->dma = rx_buffer_info->dma;
- } else {
- dma_unmap_single(&pdev->dev,
- rx_buffer_info->dma,
- rx_ring->rx_buf_len,
- DMA_FROM_DEVICE);
+ if (!rx_buffer_info->mapped_as_page) {
+ skb = rx_buffer_info->skb;
+ prefetch(skb->data);
+ rx_buffer_info->skb = NULL;
+
+ if (rx_buffer_info->dma) {
+ if ((adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) &&
+ (!(staterr & IXGBE_RXD_STAT_EOP)) &&
+ (!(skb->prev))) {
+ /*
+ * When HWRSC is enabled, delay unmapping
+ * of the first packet. It carries the
+ * header information, HW may still
+ * access the header after the writeback.
+ * Only unmap it when EOP is reached
+ */
+ IXGBE_RSC_CB(skb)->delay_unmap = true;
+ IXGBE_RSC_CB(skb)->dma = rx_buffer_info->dma;
+ } else
+ dma_unmap_single(&pdev->dev,
+ rx_buffer_info->dma,
+ rx_ring->rx_buf_len,
+ DMA_FROM_DEVICE);
+ rx_buffer_info->dma = 0;
+ skb_put(skb, len);
+ }
+ } else {
+ skb = napi_get_frags(napi);
+ prefetch(rx_buffer_info->page_skb_offset);
+ rx_buffer_info->skb = NULL;
+ if (rx_buffer_info->dma) {
+ if ((adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) &&
+ (!(staterr & IXGBE_RXD_STAT_EOP)) &&
+ (!(skb->prev))) {
+ /*
+ * When HWRSC is enabled, delay unmapping
+ * of the first packet. It carries the
+ * header information, HW may still
+ * access the header after the writeback.
+ * Only unmap it when EOP is reached
+ */
+ IXGBE_RSC_CB(skb)->delay_unmap = true;
+ IXGBE_RSC_CB(skb)->dma = rx_buffer_info->dma;
+ } else
+ dma_unmap_page(&pdev->dev, rx_buffer_info->dma,
+ PAGE_SIZE / 2,
+ PCI_DMA_FROMDEVICE);
+ rx_buffer_info->dma = 0;
+ skb_fill_page_desc(skb,
+ skb_shinfo(skb)->nr_frags,
+ rx_buffer_info->page_skb,
+ rx_buffer_info->page_skb_offset,
+ len);
+ rx_buffer_info->page_skb = NULL;
+ skb->len += len;
+ skb->data_len += len;
+ skb->truesize += len;
}
- rx_buffer_info->dma = 0;
- skb_put(skb, len);
}
if (upper_len) {
@@ -1283,10 +1350,16 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
skb = ixgbe_transform_rsc_queue(skb, &(rx_ring->rsc_count));
if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) {
if (IXGBE_RSC_CB(skb)->delay_unmap) {
- dma_unmap_single(&pdev->dev,
- IXGBE_RSC_CB(skb)->dma,
- rx_ring->rx_buf_len,
- DMA_FROM_DEVICE);
+ if (!rx_buffer_info->mapped_as_page)
+ dma_unmap_single(&pdev->dev,
+ IXGBE_RSC_CB(skb)->dma,
+ rx_ring->rx_buf_len,
+ DMA_FROM_DEVICE);
+ else
+ dma_unmap_page(&pdev->dev,
+ IXGBE_RSC_CB(skb)->dma,
+ PAGE_SIZE / 2,
+ DMA_FROM_DEVICE);
IXGBE_RSC_CB(skb)->dma = 0;
IXGBE_RSC_CB(skb)->delay_unmap = false;
}
@@ -1304,6 +1377,11 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
rx_buffer_info->dma = next_buffer->dma;
next_buffer->skb = skb;
next_buffer->dma = 0;
+ if (rx_buffer_info->mapped_as_page) {
+ rx_buffer_info->page_skb =
+ next_buffer->page_skb;
+ next_buffer->page_skb = NULL;
+ }
} else {
skb->next = next_buffer->skb;
skb->next->prev = skb;
@@ -1323,7 +1401,8 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
total_rx_bytes += skb->len;
total_rx_packets++;
- skb->protocol = eth_type_trans(skb, adapter->netdev);
+ if (!rx_buffer_info->mapped_as_page)
+ skb->protocol = eth_type_trans(skb, adapter->netdev);
#ifdef IXGBE_FCOE
/* if ddp, not passing to ULD unless for FCP_RSP or error */
if (adapter->flags & IXGBE_FLAG_FCOE_ENABLED) {
@@ -1332,7 +1411,14 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
goto next_desc;
}
#endif /* IXGBE_FCOE */
- ixgbe_receive_skb(q_vector, skb, staterr, rx_ring, rx_desc);
+
+ if (!rx_buffer_info->mapped_as_page)
+ ixgbe_receive_skb(q_vector, skb, staterr,
+ rx_ring, rx_desc);
+ else {
+ skb_record_rx_queue(skb, rx_ring->queue_index);
+ napi_gro_frags(napi);
+ }
next_desc:
rx_desc->wb.upper.status_error = 0;
@@ -3622,9 +3708,16 @@ static void ixgbe_clean_rx_ring(struct ixgbe_adapter *adapter,
rx_buffer_info = &rx_ring->rx_buffer_info[i];
if (rx_buffer_info->dma) {
- dma_unmap_single(&pdev->dev, rx_buffer_info->dma,
- rx_ring->rx_buf_len,
- DMA_FROM_DEVICE);
+ if (!rx_buffer_info->mapped_as_page)
+ dma_unmap_single(&pdev->dev, rx_buffer_info->dma,
+ rx_ring->rx_buf_len,
+ PCI_DMA_FROMDEVICE);
+ else {
+ dma_unmap_page(&pdev->dev, rx_buffer_info->dma,
+ PAGE_SIZE / 2,
+ PCI_DMA_FROMDEVICE);
+ rx_buffer_info->page_skb = NULL;
+ }
rx_buffer_info->dma = 0;
}
if (rx_buffer_info->skb) {
@@ -3651,7 +3744,7 @@ static void ixgbe_clean_rx_ring(struct ixgbe_adapter *adapter,
PAGE_SIZE / 2, DMA_FROM_DEVICE);
rx_buffer_info->page_dma = 0;
}
- put_page(rx_buffer_info->page);
+ netdev_free_page(adapter->netdev, rx_buffer_info->page);
rx_buffer_info->page = NULL;
rx_buffer_info->page_offset = 0;
}
--
1.7.3
* [PATCH v13 16/16] An example how to alloc user buffer based on napi_gro_frags() interface.
From: xiaohui.xin @ 2010-10-15 9:12 UTC (permalink / raw)
To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1287132437.git.xiaohui.xin@intel.com>
From: Xin Xiaohui <xiaohui.xin@intel.com>
This example is based on the ixgbe driver, which uses napi_gro_frags().
It can get buffers directly from the guest side using netdev_alloc_page()
and release guest buffers using netdev_free_page().
Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
---
drivers/net/ixgbe/ixgbe_main.c | 24 ++++++++++++++++++++----
1 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index a4a5263..47663ac 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1032,7 +1032,14 @@ static inline void ixgbe_release_rx_desc(struct ixgbe_hw *hw,
static bool is_rx_buffer_mapped_as_page(struct ixgbe_rx_buffer *bi,
struct net_device *dev)
{
- return true;
+ return dev_is_mpassthru(dev);
+}
+
+static u32 get_page_skb_offset(struct net_device *dev)
+{
+ if (!dev_is_mpassthru(dev))
+ return 0;
+ return dev->mp_port->vnet_hlen;
}
/**
@@ -1105,7 +1112,8 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter,
adapter->alloc_rx_page_failed++;
goto no_buffers;
}
- bi->page_skb_offset = 0;
+ bi->page_skb_offset =
+ get_page_skb_offset(adapter->netdev);
bi->dma = dma_map_page(&pdev->dev, bi->page_skb,
bi->page_skb_offset,
(PAGE_SIZE / 2),
@@ -1242,8 +1250,10 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
len = le16_to_cpu(rx_desc->wb.upper.length);
}
- if (is_no_buffer(rx_buffer_info))
+ if (is_no_buffer(rx_buffer_info)) {
+ printk("no buffers\n");
break;
+ }
cleaned = true;
if (!rx_buffer_info->mapped_as_page) {
@@ -1299,6 +1309,11 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
rx_buffer_info->page_skb,
rx_buffer_info->page_skb_offset,
len);
+ if (dev_is_mpassthru(netdev) &&
+ netdev->mp_port->hash)
+ skb_shinfo(skb)->destructor_arg =
+ netdev->mp_port->hash(netdev,
+ rx_buffer_info->page_skb);
rx_buffer_info->page_skb = NULL;
skb->len += len;
skb->data_len += len;
@@ -1316,7 +1331,8 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
upper_len);
if ((rx_ring->rx_buf_len > (PAGE_SIZE / 2)) ||
- (page_count(rx_buffer_info->page) != 1))
+ (page_count(rx_buffer_info->page) != 1) ||
+ dev_is_mpassthru(netdev))
rx_buffer_info->page = NULL;
else
get_page(rx_buffer_info->page);
--
1.7.3
* [PATCH 08/22] rds: stop including asm-generic/bitops/le.h
From: Akinobu Mita @ 2010-10-15 9:46 UTC (permalink / raw)
To: linux-kernel, linux-arch, Arnd Bergmann, Christoph Hellwig,
Andrew Morton
Cc: Akinobu Mita, Andy Grover, rds-devel, David S. Miller, netdev
In-Reply-To: <1287135981-17604-1-git-send-email-akinobu.mita@gmail.com>
No need to include asm-generic/bitops/le.h as all architectures
provide little endian bit operations now.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andy Grover <andy.grover@oracle.com>
Cc: rds-devel@oss.oracle.com
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
---
net/rds/cong.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/net/rds/cong.c b/net/rds/cong.c
index c6784d5..15a65f0 100644
--- a/net/rds/cong.c
+++ b/net/rds/cong.c
@@ -34,8 +34,6 @@
#include <linux/types.h>
#include <linux/rbtree.h>
-#include <asm-generic/bitops/le.h>
-
#include "rds.h"
/*
--
1.7.1.231.gd0b16
* [PATCH] connector: remove lazy workqueue creation
From: Tejun Heo @ 2010-10-15 9:55 UTC (permalink / raw)
To: Evgeniy Polyakov, netdev@vger.kernel.org, Frederic Weisbecker,
David S. Miller
Commit 1a5645bc (connector: create connector workqueue only while
needed once) implements lazy workqueue creation for connector
workqueue. With cmwq now in place, lazy workqueue creation doesn't
make much sense while adding a lot of complexity. Remove it and
allocate an ordered workqueue during initialization.
This also removes a call to flush_scheduled_work() which is deprecated
and scheduled to be removed.
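The ordering guarantee this patch relies on can be illustrated with a small
userspace model. It is a sketch under assumptions, not the workqueue
implementation: struct model_ordered_wq, model_queue_work(),
model_flush_workqueue() and model_trace_cb() are hypothetical names, and the
fixed-depth ring stands in for a queue that is set up once at init time and
runs work items one at a time in submission order.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_WQ_DEPTH 8

/* Hypothetical work callback: an item id plus caller context. */
typedef void (*model_work_fn)(int id, void *ctx);

struct model_work {
	model_work_fn fn;
	int id;
	void *ctx;
};

/* Hypothetical ordered queue: a ring consumed strictly in FIFO order. */
struct model_ordered_wq {
	struct model_work ring[MODEL_WQ_DEPTH];
	size_t head;	/* next item to run */
	size_t tail;	/* next free slot */
};

/* Records the order in which callbacks actually ran. */
struct model_trace {
	int ids[MODEL_WQ_DEPTH];
	size_t n;
};

/* Set up the queue once, up front, instead of lazily on first use. */
static void model_wq_init(struct model_ordered_wq *wq)
{
	assert(wq != NULL);
	wq->head = 0;
	wq->tail = 0;
}

/* Returns 1 if the item was queued, 0 if the ring is full. */
static int model_queue_work(struct model_ordered_wq *wq, model_work_fn fn,
			    int id, void *ctx)
{
	struct model_work *w;

	assert(wq != NULL && fn != NULL);
	if (wq->tail - wq->head == MODEL_WQ_DEPTH)
		return 0;

	w = &wq->ring[wq->tail % MODEL_WQ_DEPTH];
	w->fn = fn;
	w->id = id;
	w->ctx = ctx;
	wq->tail++;
	return 1;
}

/* Run every pending item in submission order; returns how many ran. */
static size_t model_flush_workqueue(struct model_ordered_wq *wq)
{
	size_t ran = 0;

	assert(wq != NULL);
	while (wq->head != wq->tail) {
		struct model_work w = wq->ring[wq->head % MODEL_WQ_DEPTH];

		wq->head++;
		w.fn(w.id, w.ctx);
		ran++;
	}
	return ran;
}

/* Sample callback: append the item id to the trace. */
static void model_trace_cb(int id, void *ctx)
{
	struct model_trace *t = ctx;

	if (t->n < MODEL_WQ_DEPTH)
		t->ids[t->n++] = id;
}
```

Because the queue exists from initialization onward, the submit path in this
model has a single branch (queued or full) and no fallback to another queue.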
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
---
drivers/connector/cn_queue.c | 75 ++++--------------------------------------
drivers/connector/connector.c | 9 ++---
include/linux/connector.h | 8 ----
3 files changed, 12 insertions(+), 80 deletions(-)
Index: work/drivers/connector/cn_queue.c
===================================================================
--- work.orig/drivers/connector/cn_queue.c
+++ work/drivers/connector/cn_queue.c
@@ -31,48 +31,6 @@
#include <linux/connector.h>
#include <linux/delay.h>
-
-/*
- * This job is sent to the kevent workqueue.
- * While no event is once sent to any callback, the connector workqueue
- * is not created to avoid a useless waiting kernel task.
- * Once the first event is received, we create this dedicated workqueue which
- * is necessary because the flow of data can be high and we don't want
- * to encumber keventd with that.
- */
-static void cn_queue_create(struct work_struct *work)
-{
- struct cn_queue_dev *dev;
-
- dev = container_of(work, struct cn_queue_dev, wq_creation);
-
- dev->cn_queue = create_singlethread_workqueue(dev->name);
- /* If we fail, we will use keventd for all following connector jobs */
- WARN_ON(!dev->cn_queue);
-}
-
-/*
- * Queue a data sent to a callback.
- * If the connector workqueue is already created, we queue the job on it.
- * Otherwise, we queue the job to kevent and queue the connector workqueue
- * creation too.
- */
-int queue_cn_work(struct cn_callback_entry *cbq, struct work_struct *work)
-{
- struct cn_queue_dev *pdev = cbq->pdev;
-
- if (likely(pdev->cn_queue))
- return queue_work(pdev->cn_queue, work);
-
- /* Don't create the connector workqueue twice */
- if (atomic_inc_return(&pdev->wq_requested) == 1)
- schedule_work(&pdev->wq_creation);
- else
- atomic_dec(&pdev->wq_requested);
-
- return schedule_work(work);
-}
-
void cn_queue_wrapper(struct work_struct *work)
{
struct cn_callback_entry *cbq =
@@ -111,11 +69,7 @@ cn_queue_alloc_callback_entry(char *name
static void cn_queue_free_callback(struct cn_callback_entry *cbq)
{
- /* The first jobs have been sent to kevent, flush them too */
- flush_scheduled_work();
- if (cbq->pdev->cn_queue)
- flush_workqueue(cbq->pdev->cn_queue);
-
+ flush_workqueue(cbq->pdev->cn_queue);
kfree(cbq);
}
@@ -193,11 +147,14 @@ struct cn_queue_dev *cn_queue_alloc_dev(
atomic_set(&dev->refcnt, 0);
INIT_LIST_HEAD(&dev->queue_list);
spin_lock_init(&dev->queue_lock);
- init_waitqueue_head(&dev->wq_created);
dev->nls = nls;
- INIT_WORK(&dev->wq_creation, cn_queue_create);
+ dev->cn_queue = alloc_ordered_workqueue(dev->name, 0);
+ if (!dev->cn_queue) {
+ kfree(dev);
+ return NULL;
+ }
return dev;
}
@@ -205,25 +162,9 @@ struct cn_queue_dev *cn_queue_alloc_dev(
void cn_queue_free_dev(struct cn_queue_dev *dev)
{
struct cn_callback_entry *cbq, *n;
- long timeout;
- DEFINE_WAIT(wait);
- /* Flush the first pending jobs queued on kevent */
- flush_scheduled_work();
-
- /* If the connector workqueue creation is still pending, wait for it */
- prepare_to_wait(&dev->wq_created, &wait, TASK_UNINTERRUPTIBLE);
- if (atomic_read(&dev->wq_requested) && !dev->cn_queue) {
- timeout = schedule_timeout(HZ * 2);
- if (!timeout && !dev->cn_queue)
- WARN_ON(1);
- }
- finish_wait(&dev->wq_created, &wait);
-
- if (dev->cn_queue) {
- flush_workqueue(dev->cn_queue);
- destroy_workqueue(dev->cn_queue);
- }
+ flush_workqueue(dev->cn_queue);
+ destroy_workqueue(dev->cn_queue);
spin_lock_bh(&dev->queue_lock);
list_for_each_entry_safe(cbq, n, &dev->queue_list, callback_entry)
Index: work/drivers/connector/connector.c
===================================================================
--- work.orig/drivers/connector/connector.c
+++ work/drivers/connector/connector.c
@@ -133,7 +133,8 @@ static int cn_call_callback(struct sk_bu
__cbq->data.skb == NULL)) {
__cbq->data.skb = skb;
- if (queue_cn_work(__cbq, &__cbq->work))
+ if (queue_work(dev->cbdev->cn_queue,
+ &__cbq->work))
err = 0;
else
err = -EINVAL;
@@ -148,13 +149,11 @@ static int cn_call_callback(struct sk_bu
d->callback = __cbq->data.callback;
d->free = __new_cbq;
- __new_cbq->pdev = __cbq->pdev;
-
INIT_WORK(&__new_cbq->work,
&cn_queue_wrapper);
- if (queue_cn_work(__new_cbq,
- &__new_cbq->work))
+ if (queue_work(dev->cbdev->cn_queue,
+ &__new_cbq->work))
err = 0;
else {
kfree(__new_cbq);
Index: work/include/linux/connector.h
===================================================================
--- work.orig/include/linux/connector.h
+++ work/include/linux/connector.h
@@ -88,12 +88,6 @@ struct cn_queue_dev {
unsigned char name[CN_CBQ_NAMELEN];
struct workqueue_struct *cn_queue;
- /* Sent to kevent to create cn_queue only when needed */
- struct work_struct wq_creation;
- /* Tell if the wq_creation job is pending/completed */
- atomic_t wq_requested;
- /* Wait for cn_queue to be created */
- wait_queue_head_t wq_created;
struct list_head queue_list;
spinlock_t queue_lock;
@@ -141,8 +135,6 @@ int cn_netlink_send(struct cn_msg *, u32
int cn_queue_add_callback(struct cn_queue_dev *dev, char *name, struct cb_id *id, void (*callback)(struct cn_msg *, struct netlink_skb_parms *));
void cn_queue_del_callback(struct cn_queue_dev *dev, struct cb_id *id);
-int queue_cn_work(struct cn_callback_entry *cbq, struct work_struct *work);
-
struct cn_queue_dev *cn_queue_alloc_dev(char *name, struct sock *);
void cn_queue_free_dev(struct cn_queue_dev *dev);
* (unknown),
From: Marc Kleine-Budde @ 2010-10-15 10:33 UTC (permalink / raw)
To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA
Moin,
this series of patches improves the mcp251x driver. It first fixes the
local_softirq_pending problem. Then the number of SPI transfers is reduced
in order to optimise the driver.
This series has been tested with a mcp2515 on i.MX35.
Please review and test,
cheers, Marc
The following changes since commit cd2638a86c7b90e77ce623c09de2a26177f2a5c1:
Carolyn Wyborny (1):
igb: add check for fiber/serdes devices to igb_set_spd_dplx;
are available in the git repository at:
git://git.pengutronix.de/git/mkl/linux-2.6.git can/mcp251x-for-net-next
Marc Kleine-Budde (4):
can: mcp251x: fix NOHZ local_softirq_pending 08 warning
can: mcp251x: write intf only when needed
can: mcp251x: define helper functions mcp251x_is_2510, mcp251x_is_2515
can: mcp251x: optimize 2515, rx int gets cleared automatically
Sascha Hauer (3):
can: mcp251x: increase rx_errors on overflow, not only rx_over_errors
can: mcp251x: allow to read two registers in one spi transfer
can: mcp251x: read-modify-write eflag only when needed
drivers/net/can/mcp251x.c | 77 +++++++++++++++++++++++++++++++++++----------
1 files changed, 60 insertions(+), 17 deletions(-)
* [PATCH 1/7] can: mcp251x: fix NOHZ local_softirq_pending 08 warning
From: Marc Kleine-Budde @ 2010-10-15 10:33 UTC (permalink / raw)
To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1287138845-20561-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
This patch replaces netif_rx() with netif_rx_ni(), which has to be used
from the threaded interrupt handler, i.e. process context.
Thanks to Christian Pellegrin for pointing at the right fix:
481a8199142c050b72bff8a1956a49fd0a75bbe0 by Oliver Hartkopp.
Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
drivers/net/can/mcp251x.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index b11a0cb..c06e023 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -451,7 +451,7 @@ static void mcp251x_hw_rx(struct spi_device *spi, int buf_idx)
priv->net->stats.rx_packets++;
priv->net->stats.rx_bytes += frame->can_dlc;
- netif_rx(skb);
+ netif_rx_ni(skb);
}
static void mcp251x_hw_sleep(struct spi_device *spi)
@@ -676,7 +676,7 @@ static void mcp251x_error_skb(struct net_device *net, int can_id, int data1)
if (skb) {
frame->can_id = can_id;
frame->data[1] = data1;
- netif_rx(skb);
+ netif_rx_ni(skb);
} else {
dev_err(&net->dev,
"cannot allocate error skb\n");
--
1.7.0.4
* [PATCH 2/7] can: mcp251x: increase rx_errors on overflow, not only rx_over_errors
From: Marc Kleine-Budde @ 2010-10-15 10:34 UTC (permalink / raw)
To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1287138845-20561-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
From: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
drivers/net/can/mcp251x.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index c06e023..fdea752 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -816,10 +816,14 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
if (intf & CANINTF_ERRIF) {
/* Handle overflow counters */
if (eflag & (EFLG_RX0OVR | EFLG_RX1OVR)) {
- if (eflag & EFLG_RX0OVR)
+ if (eflag & EFLG_RX0OVR) {
net->stats.rx_over_errors++;
- if (eflag & EFLG_RX1OVR)
+ net->stats.rx_errors++;
+ }
+ if (eflag & EFLG_RX1OVR) {
net->stats.rx_over_errors++;
+ net->stats.rx_errors++;
+ }
can_id |= CAN_ERR_CRTL;
data1 |= CAN_ERR_CRTL_RX_OVERFLOW;
}
--
1.7.0.4
^ permalink raw reply related
* Re: [PATCH net-next 2/5] tipc: Simplify bearer shutdown logic
From: Neil Horman @ 2010-10-15 10:48 UTC (permalink / raw)
To: Paul Gortmaker; +Cc: davem, netdev, allan.stephens
In-Reply-To: <20101014235825.GA5048@windriver.com>
On Thu, Oct 14, 2010 at 07:58:26PM -0400, Paul Gortmaker wrote:
> [Re: [PATCH net-next 2/5] tipc: Simplify bearer shutdown logic] On 13/10/2010 (Wed 10:39) Neil Horman wrote:
>
> > On Tue, Oct 12, 2010 at 08:25:55PM -0400, Paul Gortmaker wrote:
> > > From: Allan Stephens <allan.stephens@windriver.com>
> > >
> > > Disable all active bearers when TIPC is shut down without having to do
> > > a name-based search to locate each bearer object.
> > >
> > It seems like you're doing a good deal more in this patch than just disabling
> > all active bearers without doing a name search. The description is implemented
> > in the for loop of tipc_bearer_stop. What's the rest of it for?
>
> It seems the original needlessly bloated out the patch size by
> swapping the order of tipc_bearer_find_interface & bearer_find
> in the file (now fixed) - and you are right, the locking change
> wasn't properly covered in the commit log. The extra test you'd
> suggested tossing out is also now gone.
>
> This change doesn't explicitly depend on any other changes,
> so if it is now OK, the option is there for it to be applied
> independently of the others that haven't been reworked yet.
>
> Thanks,
> Paul.
>
>
> From 1771ad642cb076dbeb71e3533a25cb2f07df9cd8 Mon Sep 17 00:00:00 2001
> From: Allan Stephens <allan.stephens@windriver.com>
> Date: Sat, 4 Sep 2010 09:29:04 -0400
> Subject: [PATCH] tipc: Simplify bearer shutdown logic
>
> Optimize processing in TIPC's bearer shutdown code, including:
>
> 1. Remove an unnecessary check to see if TIPC bearers can exist.
> 2. Don't release spinlocks before calling a media-specific disabling
> routine, since the routine can't sleep.
> 3. Make bearer_disable() operate directly on a struct bearer, instead
> of needlessly taking a name and then mapping that to the struct.
>
> Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
> net/tipc/bearer.c | 38 +++++++++++---------------------------
> 1 files changed, 11 insertions(+), 27 deletions(-)
>
> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
> index 9c10c6b..fd9c06c 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -288,9 +288,6 @@ static struct bearer *bearer_find(const char *name)
> struct bearer *b_ptr;
> u32 i;
>
> - if (tipc_mode != TIPC_NET_MODE)
> - return NULL;
> -
> for (i = 0, b_ptr = tipc_bearers; i < MAX_BEARERS; i++, b_ptr++) {
> if (b_ptr->active && (!strcmp(b_ptr->publ.name, name)))
> return b_ptr;
> @@ -630,30 +627,17 @@ int tipc_block_bearer(const char *name)
> * Note: This routine assumes caller holds tipc_net_lock.
> */
>
> -static int bearer_disable(const char *name)
> +static int bearer_disable(struct bearer *b_ptr)
> {
> - struct bearer *b_ptr;
> struct link *l_ptr;
> struct link *temp_l_ptr;
>
> - b_ptr = bearer_find(name);
> - if (!b_ptr) {
> - warn("Attempt to disable unknown bearer <%s>\n", name);
> - return -EINVAL;
> - }
> -
> - info("Disabling bearer <%s>\n", name);
> + info("Disabling bearer <%s>\n", b_ptr->publ.name);
> tipc_disc_stop_link_req(b_ptr->link_req);
> spin_lock_bh(&b_ptr->publ.lock);
> b_ptr->link_req = NULL;
> b_ptr->publ.blocked = 1;
> - if (b_ptr->media->disable_bearer) {
> - spin_unlock_bh(&b_ptr->publ.lock);
> - write_unlock_bh(&tipc_net_lock);
> - b_ptr->media->disable_bearer(&b_ptr->publ);
> - write_lock_bh(&tipc_net_lock);
> - spin_lock_bh(&b_ptr->publ.lock);
> - }
> + b_ptr->media->disable_bearer(&b_ptr->publ);
> list_for_each_entry_safe(l_ptr, temp_l_ptr, &b_ptr->links, link_list) {
> tipc_link_delete(l_ptr);
> }
> @@ -664,10 +648,16 @@ static int bearer_disable(const char *name)
>
> int tipc_disable_bearer(const char *name)
> {
> + struct bearer *b_ptr;
> int res;
>
> write_lock_bh(&tipc_net_lock);
> - res = bearer_disable(name);
> + b_ptr = bearer_find(name);
> + if (b_ptr == NULL) {
> + warn("Attempt to disable unknown bearer <%s>\n", name);
> + res = -EINVAL;
> + } else
> + res = bearer_disable(b_ptr);
> write_unlock_bh(&tipc_net_lock);
> return res;
> }
> @@ -680,13 +670,7 @@ void tipc_bearer_stop(void)
>
> for (i = 0; i < MAX_BEARERS; i++) {
> if (tipc_bearers[i].active)
> - tipc_bearers[i].publ.blocked = 1;
> - }
> - for (i = 0; i < MAX_BEARERS; i++) {
> - if (tipc_bearers[i].active)
> - bearer_disable(tipc_bearers[i].publ.name);
> + bearer_disable(&tipc_bearers[i]);
> }
> media_count = 0;
> }
> -
> -
> --
> 1.7.2.1
>
>
Yes, this looks much better, thank you.
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
^ permalink raw reply
* [PATCH V2 0/7] can: mcp251x: fix and optimize driver
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA
Moin,
this series of patches improves the mcp251x driver. It first fixes the
local_softirq_pending problem. Then the number of SPI transfers is reduced
in order to optimise the driver.
This series has been tested with a mcp2515 on i.MX35.
Changes since V1:
- Fix broken encoding in S-o-b
Please review, test and consider applying.
regards, Marc
---
The following changes since commit cd2638a86c7b90e77ce623c09de2a26177f2a5c1:
Carolyn Wyborny (1):
igb: add check for fiber/serdes devices to igb_set_spd_dplx;
are available in the git repository at:
git://git.pengutronix.de/git/mkl/linux-2.6.git can/mcp251x-for-net-next
Marc Kleine-Budde (4):
can: mcp251x: fix NOHZ local_softirq_pending 08 warning
can: mcp251x: write intf only when needed
can: mcp251x: define helper functions mcp251x_is_2510, mcp251x_is_2515
can: mcp251x: optimize 2515, rx int gets cleared automatically
Sascha Hauer (3):
can: mcp251x: increase rx_errors on overflow, not only rx_over_errors
can: mcp251x: allow to read two registers in one spi transfer
can: mcp251x: read-modify-write eflag only when needed
drivers/net/can/mcp251x.c | 77 +++++++++++++++++++++++++++++++++++----------
1 files changed, 60 insertions(+), 17 deletions(-)
^ permalink raw reply
* [PATCH 1/7] can: mcp251x: fix NOHZ local_softirq_pending 08 warning
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1287139762-23356-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
This patch replaces netif_rx() with netif_rx_ni(), which has to be used
from the threaded interrupt handler, i.e. in process context.
Thanks to Christian Pellegrin for pointing at the right fix:
481a8199142c050b72bff8a1956a49fd0a75bbe0 by Oliver Hartkopp.
Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
drivers/net/can/mcp251x.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index b11a0cb..c06e023 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -451,7 +451,7 @@ static void mcp251x_hw_rx(struct spi_device *spi, int buf_idx)
priv->net->stats.rx_packets++;
priv->net->stats.rx_bytes += frame->can_dlc;
- netif_rx(skb);
+ netif_rx_ni(skb);
}
static void mcp251x_hw_sleep(struct spi_device *spi)
@@ -676,7 +676,7 @@ static void mcp251x_error_skb(struct net_device *net, int can_id, int data1)
if (skb) {
frame->can_id = can_id;
frame->data[1] = data1;
- netif_rx(skb);
+ netif_rx_ni(skb);
} else {
dev_err(&net->dev,
"cannot allocate error skb\n");
--
1.7.0.4
^ permalink raw reply related
* [PATCH 2/7] can: mcp251x: increase rx_errors on overflow, not only rx_over_errors
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1287139762-23356-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
From: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
drivers/net/can/mcp251x.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index c06e023..fdea752 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -816,10 +816,14 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
if (intf & CANINTF_ERRIF) {
/* Handle overflow counters */
if (eflag & (EFLG_RX0OVR | EFLG_RX1OVR)) {
- if (eflag & EFLG_RX0OVR)
+ if (eflag & EFLG_RX0OVR) {
net->stats.rx_over_errors++;
- if (eflag & EFLG_RX1OVR)
+ net->stats.rx_errors++;
+ }
+ if (eflag & EFLG_RX1OVR) {
net->stats.rx_over_errors++;
+ net->stats.rx_errors++;
+ }
can_id |= CAN_ERR_CRTL;
data1 |= CAN_ERR_CRTL_RX_OVERFLOW;
}
--
1.7.0.4
^ permalink raw reply related
* [PATCH 3/7] can: mcp251x: allow to read two registers in one spi transfer
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde,
Uwe Kleine-König
In-Reply-To: <1287139762-23356-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
From: Sascha Hauer <s.hauer@pengutronix.de>
This patch is based on work done earlier by David Jander.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: David Jander <david@protonic.nl>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
drivers/net/can/mcp251x.c | 20 +++++++++++++++++---
1 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index fdea752..9b3466a 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -319,6 +319,20 @@ static u8 mcp251x_read_reg(struct spi_device *spi, uint8_t reg)
return val;
}
+static void mcp251x_read_2regs(struct spi_device *spi, uint8_t reg,
+ uint8_t *v1, uint8_t *v2)
+{
+ struct mcp251x_priv *priv = dev_get_drvdata(&spi->dev);
+
+ priv->spi_tx_buf[0] = INSTRUCTION_READ;
+ priv->spi_tx_buf[1] = reg;
+
+ mcp251x_spi_trans(spi, 4);
+
+ *v1 = priv->spi_rx_buf[2];
+ *v2 = priv->spi_rx_buf[3];
+}
+
static void mcp251x_write_reg(struct spi_device *spi, u8 reg, uint8_t val)
{
struct mcp251x_priv *priv = dev_get_drvdata(&spi->dev);
@@ -754,10 +768,11 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
mutex_lock(&priv->mcp_lock);
while (!priv->force_quit) {
enum can_state new_state;
- u8 intf = mcp251x_read_reg(spi, CANINTF);
- u8 eflag;
+ u8 intf, eflag;
int can_id = 0, data1 = 0;
+ mcp251x_read_2regs(spi, CANINTF, &intf, &eflag);
+
if (intf & CANINTF_RX0IF) {
mcp251x_hw_rx(spi, 0);
/* Free one buffer ASAP */
@@ -770,7 +785,6 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
mcp251x_write_bits(spi, CANINTF, intf, 0x00);
- eflag = mcp251x_read_reg(spi, EFLG);
mcp251x_write_reg(spi, EFLG, 0x00);
/* Update can state */
--
1.7.0.4
^ permalink raw reply related
* [PATCH 4/7] can: mcp251x: read-modify-write eflag only when needed
From: Marc Kleine-Budde @ 2010-10-15 10:49 UTC (permalink / raw)
To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1287139762-23356-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
From: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Use read-modify-write instead of a simple write to change the register
contents, to close the existing race window between the original manual
read and write.
Signed-off-by: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
drivers/net/can/mcp251x.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
index 9b3466a..7e2f951 100644
--- a/drivers/net/can/mcp251x.c
+++ b/drivers/net/can/mcp251x.c
@@ -785,7 +785,8 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
mcp251x_write_bits(spi, CANINTF, intf, 0x00);
- mcp251x_write_reg(spi, EFLG, 0x00);
+ if (eflag)
+ mcp251x_write_bits(spi, EFLG, eflag, 0x00);
/* Update can state */
if (eflag & EFLG_TXBO) {
--
1.7.0.4
^ permalink raw reply related