Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v6] can: kvaser_usb: Add support for Kvaser CAN/USB devices
From: Olivier Sobrie @ 2012-11-22 15:01 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, linux-can
  Cc: netdev, linux-usb, Daniel Berglund
In-Reply-To: <1353481873-3214-1-git-send-email-olivier@sobrie.be>

Hi linux-usb folks,

Is there someone who can help me to fix the following errors?

smatch warnings:

+ drivers/net/can/usb/kvaser_usb.c:431 kvaser_usb_send_simple_msg() error: doing
+dma on the stack ((null))
+ drivers/net/can/usb/kvaser_usb.c:1073 kvaser_usb_set_opt_mode() error: doing
+dma on the stack ((null))
+ drivers/net/can/usb/kvaser_usb.c:1174 kvaser_usb_flush_queue() error: doing
+dma on the stack ((null))
+ drivers/net/can/usb/kvaser_usb.c:1384 kvaser_usb_set_bittiming() error: doing
+dma on the stack ((null))

I assume it's due to the buffer I pass to the function usb_bulk_msg()
which is on the stack and can't be.
Do I just have to kmalloc a buffer and give it to the usb_bulk_msg()
function? That's what I understood by reading
"Documentation/DMA-API-HOWTO.txt" section "What memory is DMA'able?"...
and from commit
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=32ec4576c3fb37316b1d11a04b220527822f3f0d

Thanks,

Olivier

On Wed, Nov 21, 2012 at 08:11:13AM +0100, Olivier Sobrie wrote:
> This driver provides support for several Kvaser CAN/USB devices.
> Such kind of devices supports up to three CAN network interfaces.
> 
> It has been tested with a Kvaser USB Leaf Light (one network interface)
> connected to a pch_can interface.
> The firmware version of the Kvaser device was 2.5.205.
> 
> List of Kvaser devices supported by the driver:
>   - Kvaser Leaf Light
>   - Kvaser Leaf Professional HS
>   - Kvaser Leaf SemiPro HS
>   - Kvaser Leaf Professional LS
>   - Kvaser Leaf Professional SWC
>   - Kvaser Leaf Professional LIN
>   - Kvaser Leaf SemiPro LS
>   - Kvaser Leaf SemiPro SWC
>   - Kvaser Memorator II HS/HS
>   - Kvaser USBcan Professional HS/HS
>   - Kvaser Leaf Light GI
>   - Kvaser Leaf Professional HS (OBD-II connector)
>   - Kvaser Memorator Professional HS/LS
>   - Kvaser Leaf Light "China"
>   - Kvaser BlackBird SemiPro
>   - Kvaser USBcan R
> 
> Signed-off-by: Daniel Berglund <db@kvaser.com>
> Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
> ---
> Hi,
> 
> This version includes the last changes requested by Marc on version 5 of
> the patch.
> 
> Olivier
> 
>  drivers/net/can/usb/Kconfig      |   29 +
>  drivers/net/can/usb/Makefile     |    1 +
>  drivers/net/can/usb/kvaser_usb.c | 1598 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 1628 insertions(+)
>  create mode 100644 drivers/net/can/usb/kvaser_usb.c
> 
> diff --git a/drivers/net/can/usb/Kconfig b/drivers/net/can/usb/Kconfig
> index 0a68768..a4e4bee 100644
> --- a/drivers/net/can/usb/Kconfig
> +++ b/drivers/net/can/usb/Kconfig
> @@ -13,6 +13,35 @@ config CAN_ESD_USB2
>            This driver supports the CAN-USB/2 interface
>            from esd electronic system design gmbh (http://www.esd.eu).
>  
> +config CAN_KVASER_USB
> +	tristate "Kvaser CAN/USB interface"
> +	---help---
> +	  This driver adds support for Kvaser CAN/USB devices like Kvaser
> +	  Leaf Light.
> +
> +	  The driver gives support for the following devices:
> +	    - Kvaser Leaf Light
> +	    - Kvaser Leaf Professional HS
> +	    - Kvaser Leaf SemiPro HS
> +	    - Kvaser Leaf Professional LS
> +	    - Kvaser Leaf Professional SWC
> +	    - Kvaser Leaf Professional LIN
> +	    - Kvaser Leaf SemiPro LS
> +	    - Kvaser Leaf SemiPro SWC
> +	    - Kvaser Memorator II HS/HS
> +	    - Kvaser USBcan Professional HS/HS
> +	    - Kvaser Leaf Light GI
> +	    - Kvaser Leaf Professional HS (OBD-II connector)
> +	    - Kvaser Memorator Professional HS/LS
> +	    - Kvaser Leaf Light "China"
> +	    - Kvaser BlackBird SemiPro
> +	    - Kvaser USBcan R
> +
> +	  If unsure, say N.
> +
> +	  To compile this driver as a module, choose M here: the
> +	  module will be called kvaser_usb.
> +
>  config CAN_PEAK_USB
>  	tristate "PEAK PCAN-USB/USB Pro interfaces"
>  	---help---
> diff --git a/drivers/net/can/usb/Makefile b/drivers/net/can/usb/Makefile
> index da6d1d3..80a2ee4 100644
> --- a/drivers/net/can/usb/Makefile
> +++ b/drivers/net/can/usb/Makefile
> @@ -4,6 +4,7 @@
>  
>  obj-$(CONFIG_CAN_EMS_USB) += ems_usb.o
>  obj-$(CONFIG_CAN_ESD_USB2) += esd_usb2.o
> +obj-$(CONFIG_CAN_KVASER_USB) += kvaser_usb.o
>  obj-$(CONFIG_CAN_PEAK_USB) += peak_usb/
>  
>  ccflags-$(CONFIG_CAN_DEBUG_DEVICES) := -DDEBUG
> diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c
> new file mode 100644
> index 0000000..8807bf8
> --- /dev/null
> +++ b/drivers/net/can/usb/kvaser_usb.c
> @@ -0,0 +1,1598 @@
> +/*
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation version 2.
> + *
> + * Parts of this driver are based on the following:
> + *  - Kvaser linux leaf driver (version 4.78)
> + *  - CAN driver for esd CAN-USB/2
> + *
> + * Copyright (C) 2002-2006 KVASER AB, Sweden. All rights reserved.
> + * Copyright (C) 2010 Matthias Fuchs <matthias.fuchs@esd.eu>, esd gmbh
> + * Copyright (C) 2012 Olivier Sobrie <olivier@sobrie.be>
> + */
> +
> +#include <linux/init.h>
> +#include <linux/completion.h>
> +#include <linux/module.h>
> +#include <linux/netdevice.h>
> +#include <linux/usb.h>
> +
> +#include <linux/can.h>
> +#include <linux/can/dev.h>
> +#include <linux/can/error.h>
> +
> +#define MAX_TX_URBS			16
> +#define MAX_RX_URBS			4
> +#define START_TIMEOUT			1000 /* msecs */
> +#define STOP_TIMEOUT			1000 /* msecs */
> +#define USB_SEND_TIMEOUT		1000 /* msecs */
> +#define USB_RECV_TIMEOUT		1000 /* msecs */
> +#define RX_BUFFER_SIZE			3072
> +#define CAN_USB_CLOCK			8000000
> +#define MAX_NET_DEVICES			3
> +
> +/* Kvaser USB devices */
> +#define KVASER_VENDOR_ID		0x0bfd
> +#define USB_LEAF_DEVEL_PRODUCT_ID	10
> +#define USB_LEAF_LITE_PRODUCT_ID	11
> +#define USB_LEAF_PRO_PRODUCT_ID		12
> +#define USB_LEAF_SPRO_PRODUCT_ID	14
> +#define USB_LEAF_PRO_LS_PRODUCT_ID	15
> +#define USB_LEAF_PRO_SWC_PRODUCT_ID	16
> +#define USB_LEAF_PRO_LIN_PRODUCT_ID	17
> +#define USB_LEAF_SPRO_LS_PRODUCT_ID	18
> +#define USB_LEAF_SPRO_SWC_PRODUCT_ID	19
> +#define USB_MEMO2_DEVEL_PRODUCT_ID	22
> +#define USB_MEMO2_HSHS_PRODUCT_ID	23
> +#define USB_UPRO_HSHS_PRODUCT_ID	24
> +#define USB_LEAF_LITE_GI_PRODUCT_ID	25
> +#define USB_LEAF_PRO_OBDII_PRODUCT_ID	26
> +#define USB_MEMO2_HSLS_PRODUCT_ID	27
> +#define USB_LEAF_LITE_CH_PRODUCT_ID	28
> +#define USB_BLACKBIRD_SPRO_PRODUCT_ID	29
> +#define USB_OEM_MERCURY_PRODUCT_ID	34
> +#define USB_OEM_LEAF_PRODUCT_ID		35
> +#define USB_CAN_R_PRODUCT_ID		39
> +
> +/* USB devices features */
> +#define KVASER_HAS_SILENT_MODE		BIT(0)
> +#define KVASER_HAS_TXRX_ERRORS		BIT(1)
> +
> +/* Message header size */
> +#define MSG_HEADER_LEN			2
> +
> +/* Can message flags */
> +#define MSG_FLAG_ERROR_FRAME		BIT(0)
> +#define MSG_FLAG_OVERRUN		BIT(1)
> +#define MSG_FLAG_NERR			BIT(2)
> +#define MSG_FLAG_WAKEUP			BIT(3)
> +#define MSG_FLAG_REMOTE_FRAME		BIT(4)
> +#define MSG_FLAG_RESERVED		BIT(5)
> +#define MSG_FLAG_TX_ACK			BIT(6)
> +#define MSG_FLAG_TX_REQUEST		BIT(7)
> +
> +/* Can states */
> +#define M16C_STATE_BUS_RESET		BIT(0)
> +#define M16C_STATE_BUS_ERROR		BIT(4)
> +#define M16C_STATE_BUS_PASSIVE		BIT(5)
> +#define M16C_STATE_BUS_OFF		BIT(6)
> +
> +/* Can msg ids */
> +#define CMD_RX_STD_MESSAGE		12
> +#define CMD_TX_STD_MESSAGE		13
> +#define CMD_RX_EXT_MESSAGE		14
> +#define CMD_TX_EXT_MESSAGE		15
> +#define CMD_SET_BUS_PARAMS		16
> +#define CMD_GET_BUS_PARAMS		17
> +#define CMD_GET_BUS_PARAMS_REPLY	18
> +#define CMD_GET_CHIP_STATE		19
> +#define CMD_CHIP_STATE_EVENT		20
> +#define CMD_SET_CTRL_MODE		21
> +#define CMD_GET_CTRL_MODE		22
> +#define CMD_GET_CTRL_MODE_REPLY		23
> +#define CMD_RESET_CHIP			24
> +#define CMD_RESET_CARD			25
> +#define CMD_START_CHIP			26
> +#define CMD_START_CHIP_REPLY		27
> +#define CMD_STOP_CHIP			28
> +#define CMD_STOP_CHIP_REPLY		29
> +#define CMD_GET_CARD_INFO2		32
> +#define CMD_GET_CARD_INFO		34
> +#define CMD_GET_CARD_INFO_REPLY		35
> +#define CMD_GET_SOFTWARE_INFO		38
> +#define CMD_GET_SOFTWARE_INFO_REPLY	39
> +#define CMD_ERROR_EVENT			45
> +#define CMD_FLUSH_QUEUE			48
> +#define CMD_RESET_ERROR_COUNTER		49
> +#define CMD_TX_ACKNOWLEDGE		50
> +#define CMD_CAN_ERROR_EVENT		51
> +#define CMD_USB_THROTTLE		77
> +#define CMD_LOG_MESSAGE			106
> +
> +/* error factors */
> +#define M16C_EF_ACKE			BIT(0)
> +#define M16C_EF_CRCE			BIT(1)
> +#define M16C_EF_FORME			BIT(2)
> +#define M16C_EF_STFE			BIT(3)
> +#define M16C_EF_BITE0			BIT(4)
> +#define M16C_EF_BITE1			BIT(5)
> +#define M16C_EF_RCVE			BIT(6)
> +#define M16C_EF_TRE			BIT(7)
> +
> +/* bittiming parameters */
> +#define KVASER_USB_TSEG1_MIN		1
> +#define KVASER_USB_TSEG1_MAX		16
> +#define KVASER_USB_TSEG2_MIN		1
> +#define KVASER_USB_TSEG2_MAX		8
> +#define KVASER_USB_SJW_MAX		4
> +#define KVASER_USB_BRP_MIN		1
> +#define KVASER_USB_BRP_MAX		64
> +#define KVASER_USB_BRP_INC		1
> +
> +/* ctrl modes */
> +#define KVASER_CTRL_MODE_NORMAL		1
> +#define KVASER_CTRL_MODE_SILENT		2
> +#define KVASER_CTRL_MODE_SELFRECEPTION	3
> +#define KVASER_CTRL_MODE_OFF		4
> +
> +struct kvaser_msg_simple {
> +	u8 tid;
> +	u8 channel;
> +} __packed;
> +
> +struct kvaser_msg_cardinfo {
> +	u8 tid;
> +	u8 nchannels;
> +	__le32 serial_number;
> +	__le32 padding;
> +	__le32 clock_resolution;
> +	__le32 mfgdate;
> +	u8 ean[8];
> +	u8 hw_revision;
> +	u8 usb_hs_mode;
> +	__le16 padding2;
> +} __packed;
> +
> +struct kvaser_msg_cardinfo2 {
> +	u8 tid;
> +	u8 channel;
> +	u8 pcb_id[24];
> +	__le32 oem_unlock_code;
> +} __packed;
> +
> +struct kvaser_msg_softinfo {
> +	u8 tid;
> +	u8 channel;
> +	__le32 sw_options;
> +	__le32 fw_version;
> +	__le16 max_outstanding_tx;
> +	__le16 padding[9];
> +} __packed;
> +
> +struct kvaser_msg_busparams {
> +	u8 tid;
> +	u8 channel;
> +	__le32 bitrate;
> +	u8 tseg1;
> +	u8 tseg2;
> +	u8 sjw;
> +	u8 no_samp;
> +} __packed;
> +
> +struct kvaser_msg_tx_can {
> +	u8 channel;
> +	u8 tid;
> +	u8 msg[14];
> +	u8 padding;
> +	u8 flags;
> +} __packed;
> +
> +struct kvaser_msg_rx_can {
> +	u8 channel;
> +	u8 flag;
> +	__le16 time[3];
> +	u8 msg[14];
> +} __packed;
> +
> +struct kvaser_msg_chip_state_event {
> +	u8 tid;
> +	u8 channel;
> +	__le16 time[3];
> +	u8 tx_errors_count;
> +	u8 rx_errors_count;
> +	u8 status;
> +	u8 padding[3];
> +} __packed;
> +
> +struct kvaser_msg_tx_acknowledge {
> +	u8 channel;
> +	u8 tid;
> +	__le16 time[3];
> +	u8 flags;
> +	u8 time_offset;
> +} __packed;
> +
> +struct kvaser_msg_error_event {
> +	u8 tid;
> +	u8 flags;
> +	__le16 time[3];
> +	u8 channel;
> +	u8 padding;
> +	u8 tx_errors_count;
> +	u8 rx_errors_count;
> +	u8 status;
> +	u8 error_factor;
> +} __packed;
> +
> +struct kvaser_msg_ctrl_mode {
> +	u8 tid;
> +	u8 channel;
> +	u8 ctrl_mode;
> +	u8 padding[3];
> +} __packed;
> +
> +struct kvaser_msg_flush_queue {
> +	u8 tid;
> +	u8 channel;
> +	u8 flags;
> +	u8 padding[3];
> +} __packed;
> +
> +struct kvaser_msg_log_message {
> +	u8 channel;
> +	u8 flags;
> +	__le16 time[3];
> +	u8 dlc;
> +	u8 time_offset;
> +	__le32 id;
> +	u8 data[8];
> +} __packed;
> +
> +struct kvaser_msg {
> +	u8 len;
> +	u8 id;
> +	union	{
> +		struct kvaser_msg_simple simple;
> +		struct kvaser_msg_cardinfo cardinfo;
> +		struct kvaser_msg_cardinfo2 cardinfo2;
> +		struct kvaser_msg_softinfo softinfo;
> +		struct kvaser_msg_busparams busparams;
> +		struct kvaser_msg_tx_can tx_can;
> +		struct kvaser_msg_rx_can rx_can;
> +		struct kvaser_msg_chip_state_event chip_state_event;
> +		struct kvaser_msg_tx_acknowledge tx_acknowledge;
> +		struct kvaser_msg_error_event error_event;
> +		struct kvaser_msg_ctrl_mode ctrl_mode;
> +		struct kvaser_msg_flush_queue flush_queue;
> +		struct kvaser_msg_log_message log_message;
> +	} u;
> +} __packed;
> +
> +struct kvaser_usb_tx_urb_context {
> +	struct kvaser_usb_net_priv *priv;
> +	u32 echo_index;
> +	int dlc;
> +};
> +
> +struct kvaser_usb {
> +	struct usb_device *udev;
> +	struct kvaser_usb_net_priv *nets[MAX_NET_DEVICES];
> +
> +	struct usb_endpoint_descriptor *bulk_in, *bulk_out;
> +	struct usb_anchor rx_submitted;
> +
> +	u32 fw_version;
> +	unsigned int nchannels;
> +
> +	bool rxinitdone;
> +	void *rxbuf[MAX_RX_URBS];
> +	dma_addr_t rxbuf_dma[MAX_RX_URBS];
> +};
> +
> +struct kvaser_usb_net_priv {
> +	struct can_priv can;
> +
> +	atomic_t active_tx_urbs;
> +	struct usb_anchor tx_submitted;
> +	struct kvaser_usb_tx_urb_context tx_contexts[MAX_TX_URBS];
> +
> +	struct completion start_comp, stop_comp;
> +
> +	struct kvaser_usb *dev;
> +	struct net_device *netdev;
> +	int channel;
> +
> +	struct can_berr_counter bec;
> +};
> +
> +static struct usb_device_id kvaser_usb_table[] = {
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_DEVEL_PRODUCT_ID) },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_LITE_PRODUCT_ID) },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_PRO_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS |
> +			       KVASER_HAS_SILENT_MODE },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_SPRO_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS |
> +			       KVASER_HAS_SILENT_MODE },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_PRO_LS_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS |
> +			       KVASER_HAS_SILENT_MODE },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_PRO_SWC_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS |
> +			       KVASER_HAS_SILENT_MODE },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_PRO_LIN_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS |
> +			       KVASER_HAS_SILENT_MODE },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_SPRO_LS_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS |
> +			       KVASER_HAS_SILENT_MODE },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_SPRO_SWC_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS |
> +			       KVASER_HAS_SILENT_MODE },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_MEMO2_DEVEL_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS |
> +			       KVASER_HAS_SILENT_MODE },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_MEMO2_HSHS_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS |
> +			       KVASER_HAS_SILENT_MODE },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_UPRO_HSHS_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_LITE_GI_PRODUCT_ID) },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_PRO_OBDII_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS |
> +			       KVASER_HAS_SILENT_MODE },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_MEMO2_HSLS_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_LEAF_LITE_CH_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_BLACKBIRD_SPRO_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_OEM_MERCURY_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_OEM_LEAF_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS },
> +	{ USB_DEVICE(KVASER_VENDOR_ID, USB_CAN_R_PRODUCT_ID),
> +		.driver_info = KVASER_HAS_TXRX_ERRORS },
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(usb, kvaser_usb_table);
> +
> +static inline int kvaser_usb_send_msg(const struct kvaser_usb *dev,
> +				      struct kvaser_msg *msg)
> +{
> +	int actual_len;
> +
> +	return usb_bulk_msg(dev->udev,
> +			    usb_sndbulkpipe(dev->udev,
> +					dev->bulk_out->bEndpointAddress),
> +			    msg, msg->len, &actual_len,
> +			    USB_SEND_TIMEOUT);
> +}
> +
> +static int kvaser_usb_wait_msg(const struct kvaser_usb *dev, u8 id,
> +			       struct kvaser_msg *msg)
> +{
> +	struct kvaser_msg *tmp;
> +	void *buf;
> +	int actual_len;
> +	int err;
> +	int pos = 0;
> +
> +	buf = kzalloc(RX_BUFFER_SIZE, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	err = usb_bulk_msg(dev->udev,
> +			   usb_rcvbulkpipe(dev->udev,
> +					   dev->bulk_in->bEndpointAddress),
> +			   buf, RX_BUFFER_SIZE, &actual_len,
> +			   USB_RECV_TIMEOUT);
> +	if (err < 0)
> +		goto end;
> +
> +	while (pos <= actual_len - MSG_HEADER_LEN) {
> +		tmp = buf + pos;
> +
> +		if (!tmp->len)
> +			break;
> +
> +		if (pos + tmp->len > actual_len) {
> +			dev_err(dev->udev->dev.parent, "Format error\n");
> +			break;
> +		}
> +
> +		if (tmp->id == id) {
> +			memcpy(msg, tmp, tmp->len);
> +			goto end;
> +		}
> +
> +		pos += tmp->len;
> +	}
> +
> +	err = -EINVAL;
> +
> +end:
> +	kfree(buf);
> +
> +	return err;
> +}
> +
> +static int kvaser_usb_send_simple_msg(const struct kvaser_usb *dev,
> +				      u8 msg_id, int channel)
> +{
> +	struct kvaser_msg msg = {
> +		.len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_simple),
> +		.id = msg_id,
> +		.u.simple.channel = channel,
> +		.u.simple.tid = 0xff,
> +	};
> +
> +	return kvaser_usb_send_msg(dev, &msg);
> +}
> +
> +static int kvaser_usb_get_software_info(struct kvaser_usb *dev)
> +{
> +	struct kvaser_msg msg;
> +	int err;
> +
> +	err = kvaser_usb_send_simple_msg(dev, CMD_GET_SOFTWARE_INFO, 0);
> +	if (err)
> +		return err;
> +
> +	err = kvaser_usb_wait_msg(dev, CMD_GET_SOFTWARE_INFO_REPLY, &msg);
> +	if (err)
> +		return err;
> +
> +	dev->fw_version = le32_to_cpu(msg.u.softinfo.fw_version);
> +
> +	return 0;
> +}
> +
> +static int kvaser_usb_get_card_info(struct kvaser_usb *dev)
> +{
> +	struct kvaser_msg msg;
> +	int err;
> +
> +	err = kvaser_usb_send_simple_msg(dev, CMD_GET_CARD_INFO, 0);
> +	if (err)
> +		return err;
> +
> +	err = kvaser_usb_wait_msg(dev, CMD_GET_CARD_INFO_REPLY, &msg);
> +	if (err)
> +		return err;
> +
> +	dev->nchannels = msg.u.cardinfo.nchannels;
> +
> +	return 0;
> +}
> +
> +static void kvaser_usb_tx_acknowledge(const struct kvaser_usb *dev,
> +				      const struct kvaser_msg *msg)
> +{
> +	struct net_device_stats *stats;
> +	struct kvaser_usb_tx_urb_context *context;
> +	struct kvaser_usb_net_priv *priv;
> +	struct sk_buff *skb;
> +	struct can_frame *cf;
> +	u8 channel = msg->u.tx_acknowledge.channel;
> +	u8 tid = msg->u.tx_acknowledge.tid;
> +
> +	if (channel >= dev->nchannels) {
> +		dev_err(dev->udev->dev.parent,
> +			"Invalid channel number (%d)\n", channel);
> +		return;
> +	}
> +
> +	priv = dev->nets[channel];
> +
> +	if (!netif_device_present(priv->netdev))
> +		return;
> +
> +	stats = &priv->netdev->stats;
> +
> +	context = &priv->tx_contexts[tid % MAX_TX_URBS];
> +
> +	/* Sometimes the state change doesn't come after a bus-off event */
> +	if (priv->can.restart_ms &&
> +	    (priv->can.state >= CAN_STATE_BUS_OFF)) {
> +		skb = alloc_can_err_skb(priv->netdev, &cf);
> +		if (skb) {
> +			cf->can_id |= CAN_ERR_RESTARTED;
> +			netif_rx(skb);
> +
> +			stats->rx_packets++;
> +			stats->rx_bytes += cf->can_dlc;
> +		} else {
> +			netdev_err(priv->netdev,
> +				   "No memory left for err_skb\n");
> +		}
> +
> +		priv->can.can_stats.restarts++;
> +		netif_carrier_on(priv->netdev);
> +
> +		priv->can.state = CAN_STATE_ERROR_ACTIVE;
> +	}
> +
> +	stats->tx_packets++;
> +	stats->tx_bytes += context->dlc;
> +	can_get_echo_skb(priv->netdev, context->echo_index);
> +
> +	context->echo_index = MAX_TX_URBS;
> +	atomic_dec(&priv->active_tx_urbs);
> +
> +	netif_wake_queue(priv->netdev);
> +}
> +
> +static void kvaser_usb_simple_msg_callback(struct urb *urb)
> +{
> +	struct net_device *netdev = urb->context;
> +
> +	kfree(urb->transfer_buffer);
> +
> +	if (urb->status)
> +		netdev_warn(netdev, "urb status received: %d\n",
> +			    urb->status);
> +}
> +
> +static int kvaser_usb_simple_msg_async(struct kvaser_usb_net_priv *priv,
> +				       u8 msg_id)
> +{
> +	struct kvaser_usb *dev = priv->dev;
> +	struct net_device *netdev = priv->netdev;
> +	struct kvaser_msg *msg;
> +	struct urb *urb;
> +	void *buf;
> +	int err;
> +
> +	urb = usb_alloc_urb(0, GFP_ATOMIC);
> +	if (!urb) {
> +		netdev_err(netdev, "No memory left for URBs\n");
> +		return -ENOMEM;
> +	}
> +
> +	buf = kmalloc(sizeof(struct kvaser_msg), GFP_ATOMIC);
> +	if (!buf) {
> +		netdev_err(netdev, "No memory left for USB buffer\n");
> +		usb_free_urb(urb);
> +		return -ENOMEM;
> +	}
> +
> +	msg = (struct kvaser_msg *)buf;
> +	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_simple);
> +	msg->id = msg_id;
> +	msg->u.simple.channel = priv->channel;
> +
> +	usb_fill_bulk_urb(urb, dev->udev,
> +			  usb_sndbulkpipe(dev->udev,
> +					  dev->bulk_out->bEndpointAddress),
> +			  buf, msg->len,
> +			  kvaser_usb_simple_msg_callback, priv);
> +	usb_anchor_urb(urb, &priv->tx_submitted);
> +
> +	err = usb_submit_urb(urb, GFP_ATOMIC);
> +	if (err) {
> +		netdev_err(netdev, "Error transmitting URB\n");
> +		usb_unanchor_urb(urb);
> +		usb_free_urb(urb);
> +		kfree(buf);
> +		return err;
> +	}
> +
> +	usb_free_urb(urb);
> +
> +	return 0;
> +}
> +
> +static void kvaser_usb_unlink_tx_urbs(struct kvaser_usb_net_priv *priv)
> +{
> +	int i;
> +
> +	usb_kill_anchored_urbs(&priv->tx_submitted);
> +	atomic_set(&priv->active_tx_urbs, 0);
> +
> +	for (i = 0; i < MAX_TX_URBS; i++)
> +		priv->tx_contexts[i].echo_index = MAX_TX_URBS;
> +}
> +
> +static void kvaser_usb_rx_error(const struct kvaser_usb *dev,
> +				const struct kvaser_msg *msg)
> +{
> +	struct can_frame *cf;
> +	struct sk_buff *skb;
> +	struct net_device_stats *stats;
> +	struct kvaser_usb_net_priv *priv;
> +	unsigned int new_state;
> +	u8 channel, status, txerr, rxerr, error_factor;
> +
> +	switch (msg->id) {
> +	case CMD_CAN_ERROR_EVENT:
> +		channel = msg->u.error_event.channel;
> +		status =  msg->u.error_event.status;
> +		txerr = msg->u.error_event.tx_errors_count;
> +		rxerr = msg->u.error_event.rx_errors_count;
> +		error_factor = msg->u.error_event.error_factor;
> +		break;
> +	case CMD_LOG_MESSAGE:
> +		channel = msg->u.log_message.channel;
> +		status = msg->u.log_message.data[0];
> +		txerr = msg->u.log_message.data[2];
> +		rxerr = msg->u.log_message.data[3];
> +		error_factor = msg->u.log_message.data[1];
> +		break;
> +	case CMD_CHIP_STATE_EVENT:
> +		channel = msg->u.chip_state_event.channel;
> +		status =  msg->u.chip_state_event.status;
> +		txerr = msg->u.chip_state_event.tx_errors_count;
> +		rxerr = msg->u.chip_state_event.rx_errors_count;
> +		error_factor = 0;
> +		break;
> +	default:
> +		dev_err(dev->udev->dev.parent, "Invalid msg id (%d)\n",
> +			msg->id);
> +		return;
> +	}
> +
> +	if (channel >= dev->nchannels) {
> +		dev_err(dev->udev->dev.parent,
> +			"Invalid channel number (%d)\n", channel);
> +		return;
> +	}
> +
> +	priv = dev->nets[channel];
> +	stats = &priv->netdev->stats;
> +
> +	if (status & M16C_STATE_BUS_RESET) {
> +		kvaser_usb_unlink_tx_urbs(priv);
> +		return;
> +	}
> +
> +	skb = alloc_can_err_skb(priv->netdev, &cf);
> +	if (!skb) {
> +		stats->rx_dropped++;
> +		return;
> +	}
> +
> +	new_state = priv->can.state;
> +
> +	netdev_dbg(priv->netdev, "Error status: 0x%02x\n", status);
> +
> +	if (status & M16C_STATE_BUS_OFF) {
> +		cf->can_id |= CAN_ERR_BUSOFF;
> +
> +		priv->can.can_stats.bus_off++;
> +		if (!priv->can.restart_ms)
> +			kvaser_usb_simple_msg_async(priv, CMD_STOP_CHIP);
> +
> +		netif_carrier_off(priv->netdev);
> +
> +		new_state = CAN_STATE_BUS_OFF;
> +	} else if (status & M16C_STATE_BUS_PASSIVE) {
> +		if (priv->can.state != CAN_STATE_ERROR_PASSIVE) {
> +			cf->can_id |= CAN_ERR_CRTL;
> +
> +			if (txerr || rxerr)
> +				cf->data[1] = (txerr > rxerr)
> +						? CAN_ERR_CRTL_TX_PASSIVE
> +						: CAN_ERR_CRTL_RX_PASSIVE;
> +			else
> +				cf->data[1] = CAN_ERR_CRTL_TX_PASSIVE |
> +					      CAN_ERR_CRTL_RX_PASSIVE;
> +
> +			priv->can.can_stats.error_passive++;
> +		}
> +
> +		new_state = CAN_STATE_ERROR_PASSIVE;
> +	}
> +
> +	if (status == M16C_STATE_BUS_ERROR) {
> +		if ((priv->can.state < CAN_STATE_ERROR_WARNING) &&
> +		    ((txerr >= 96) || (rxerr >= 96))) {
> +			cf->can_id |= CAN_ERR_CRTL;
> +			cf->data[1] = (txerr > rxerr)
> +					? CAN_ERR_CRTL_TX_WARNING
> +					: CAN_ERR_CRTL_RX_WARNING;
> +
> +			priv->can.can_stats.error_warning++;
> +			new_state = CAN_STATE_ERROR_WARNING;
> +		} else if (priv->can.state > CAN_STATE_ERROR_ACTIVE) {
> +			cf->can_id |= CAN_ERR_PROT;
> +			cf->data[2] = CAN_ERR_PROT_ACTIVE;
> +
> +			new_state = CAN_STATE_ERROR_ACTIVE;
> +		}
> +	}
> +
> +	if (!status) {
> +		cf->can_id |= CAN_ERR_PROT;
> +		cf->data[2] = CAN_ERR_PROT_ACTIVE;
> +
> +		new_state = CAN_STATE_ERROR_ACTIVE;
> +	}
> +
> +	if (priv->can.restart_ms &&
> +	    (priv->can.state >= CAN_STATE_BUS_OFF) &&
> +	    (new_state < CAN_STATE_BUS_OFF)) {
> +		cf->can_id |= CAN_ERR_RESTARTED;
> +		netif_carrier_on(priv->netdev);
> +
> +		priv->can.can_stats.restarts++;
> +	}
> +
> +	if (error_factor) {
> +		priv->can.can_stats.bus_error++;
> +		stats->rx_errors++;
> +
> +		cf->can_id |= CAN_ERR_BUSERROR | CAN_ERR_PROT;
> +
> +		if (error_factor & M16C_EF_ACKE)
> +			cf->data[3] |= (CAN_ERR_PROT_LOC_ACK);
> +		if (error_factor & M16C_EF_CRCE)
> +			cf->data[3] |= (CAN_ERR_PROT_LOC_CRC_SEQ |
> +					CAN_ERR_PROT_LOC_CRC_DEL);
> +		if (error_factor & M16C_EF_FORME)
> +			cf->data[2] |= CAN_ERR_PROT_FORM;
> +		if (error_factor & M16C_EF_STFE)
> +			cf->data[2] |= CAN_ERR_PROT_STUFF;
> +		if (error_factor & M16C_EF_BITE0)
> +			cf->data[2] |= CAN_ERR_PROT_BIT0;
> +		if (error_factor & M16C_EF_BITE1)
> +			cf->data[2] |= CAN_ERR_PROT_BIT1;
> +		if (error_factor & M16C_EF_TRE)
> +			cf->data[2] |= CAN_ERR_PROT_TX;
> +	}
> +
> +	cf->data[6] = txerr;
> +	cf->data[7] = rxerr;
> +
> +	priv->bec.txerr = txerr;
> +	priv->bec.rxerr = rxerr;
> +
> +	priv->can.state = new_state;
> +
> +	netif_rx(skb);
> +
> +	stats->rx_packets++;
> +	stats->rx_bytes += cf->can_dlc;
> +}
> +
> +static void kvaser_usb_rx_can_err(const struct kvaser_usb_net_priv *priv,
> +				  const struct kvaser_msg *msg)
> +{
> +	struct can_frame *cf;
> +	struct sk_buff *skb;
> +	struct net_device_stats *stats = &priv->netdev->stats;
> +
> +	if (msg->u.rx_can.flag & (MSG_FLAG_ERROR_FRAME |
> +					 MSG_FLAG_NERR)) {
> +		netdev_err(priv->netdev, "Unknow error (flags: 0x%02x)\n",
> +			   msg->u.rx_can.flag);
> +
> +		stats->rx_errors++;
> +		return;
> +	}
> +
> +	if (msg->u.rx_can.flag & MSG_FLAG_OVERRUN) {
> +		skb = alloc_can_err_skb(priv->netdev, &cf);
> +		if (!skb) {
> +			stats->rx_dropped++;
> +			return;
> +		}
> +
> +		cf->can_id |= CAN_ERR_CRTL;
> +		cf->data[1] = CAN_ERR_CRTL_RX_OVERFLOW;
> +
> +		stats->rx_over_errors++;
> +		stats->rx_errors++;
> +
> +		netif_rx(skb);
> +
> +		stats->rx_packets++;
> +		stats->rx_bytes += cf->can_dlc;
> +	}
> +}
> +
> +static void kvaser_usb_rx_can_msg(const struct kvaser_usb *dev,
> +				  const struct kvaser_msg *msg)
> +{
> +	struct kvaser_usb_net_priv *priv;
> +	struct can_frame *cf;
> +	struct sk_buff *skb;
> +	struct net_device_stats *stats;
> +	u8 channel = msg->u.rx_can.channel;
> +
> +	if (channel >= dev->nchannels) {
> +		dev_err(dev->udev->dev.parent,
> +			"Invalid channel number (%d)\n", channel);
> +		return;
> +	}
> +
> +	priv = dev->nets[channel];
> +	stats = &priv->netdev->stats;
> +
> +	if (msg->u.rx_can.flag & (MSG_FLAG_ERROR_FRAME | MSG_FLAG_NERR |
> +				  MSG_FLAG_OVERRUN)) {
> +		kvaser_usb_rx_can_err(priv, msg);
> +		return;
> +	} else if (msg->u.rx_can.flag & ~MSG_FLAG_REMOTE_FRAME) {
> +		netdev_warn(priv->netdev,
> +			    "Unhandled frame (flags: 0x%02x)",
> +			    msg->u.rx_can.flag);
> +		return;
> +	}
> +
> +	skb = alloc_can_skb(priv->netdev, &cf);
> +	if (!skb) {
> +		stats->tx_dropped++;
> +		return;
> +	}
> +
> +	cf->can_id = ((msg->u.rx_can.msg[0] & 0x1f) << 6) |
> +		     (msg->u.rx_can.msg[1] & 0x3f);
> +	cf->can_dlc = get_can_dlc(msg->u.rx_can.msg[5]);
> +
> +	if (msg->id == CMD_RX_EXT_MESSAGE) {
> +		cf->can_id <<= 18;
> +		cf->can_id |= ((msg->u.rx_can.msg[2] & 0x0f) << 14) |
> +			      ((msg->u.rx_can.msg[3] & 0xff) << 6) |
> +			      (msg->u.rx_can.msg[4] & 0x3f);
> +		cf->can_id |= CAN_EFF_FLAG;
> +	}
> +
> +	if (msg->u.rx_can.flag & MSG_FLAG_REMOTE_FRAME)
> +		cf->can_id |= CAN_RTR_FLAG;
> +	else
> +		memcpy(cf->data, &msg->u.rx_can.msg[6], cf->can_dlc);
> +
> +	netif_rx(skb);
> +
> +	stats->rx_packets++;
> +	stats->rx_bytes += cf->can_dlc;
> +}
> +
> +static void kvaser_usb_start_chip_reply(const struct kvaser_usb *dev,
> +					const struct kvaser_msg *msg)
> +{
> +	struct kvaser_usb_net_priv *priv;
> +	u8 channel = msg->u.simple.channel;
> +
> +	if (channel >= dev->nchannels) {
> +		dev_err(dev->udev->dev.parent,
> +			"Invalid channel number (%d)\n", channel);
> +		return;
> +	}
> +
> +	priv = dev->nets[channel];
> +
> +	if (completion_done(&priv->start_comp) &&
> +	    netif_queue_stopped(priv->netdev)) {
> +		netif_wake_queue(priv->netdev);
> +	} else {
> +		netif_start_queue(priv->netdev);
> +		complete(&priv->start_comp);
> +	}
> +}
> +
> +static void kvaser_usb_stop_chip_reply(const struct kvaser_usb *dev,
> +				       const struct kvaser_msg *msg)
> +{
> +	struct kvaser_usb_net_priv *priv;
> +	u8 channel = msg->u.simple.channel;
> +
> +	if (channel >= dev->nchannels) {
> +		dev_err(dev->udev->dev.parent,
> +			"Invalid channel number (%d)\n", channel);
> +		return;
> +	}
> +
> +	priv = dev->nets[channel];
> +
> +	complete(&priv->stop_comp);
> +}
> +
> +static void kvaser_usb_handle_message(const struct kvaser_usb *dev,
> +				      const struct kvaser_msg *msg)
> +{
> +	switch (msg->id) {
> +	case CMD_START_CHIP_REPLY:
> +		kvaser_usb_start_chip_reply(dev, msg);
> +		break;
> +
> +	case CMD_STOP_CHIP_REPLY:
> +		kvaser_usb_stop_chip_reply(dev, msg);
> +		break;
> +
> +	case CMD_RX_STD_MESSAGE:
> +	case CMD_RX_EXT_MESSAGE:
> +		kvaser_usb_rx_can_msg(dev, msg);
> +		break;
> +
> +	case CMD_CHIP_STATE_EVENT:
> +	case CMD_CAN_ERROR_EVENT:
> +		kvaser_usb_rx_error(dev, msg);
> +		break;
> +
> +	case CMD_LOG_MESSAGE:
> +		if (msg->u.log_message.flags & MSG_FLAG_ERROR_FRAME)
> +			kvaser_usb_rx_error(dev, msg);
> +		break;
> +
> +	case CMD_TX_ACKNOWLEDGE:
> +		kvaser_usb_tx_acknowledge(dev, msg);
> +		break;
> +
> +	default:
> +		dev_warn(dev->udev->dev.parent,
> +			 "Unhandled message (%d)\n", msg->id);
> +		break;
> +	}
> +}
> +
> +static void kvaser_usb_read_bulk_callback(struct urb *urb)
> +{
> +	struct kvaser_usb *dev = urb->context;
> +	struct kvaser_msg *msg;
> +	int pos = 0;
> +	int err, i;
> +
> +	switch (urb->status) {
> +	case 0:
> +		break;
> +	case -ENOENT:
> +	case -ESHUTDOWN:
> +		return;
> +	default:
> +		dev_info(dev->udev->dev.parent, "Rx URB aborted (%d)\n",
> +			 urb->status);
> +		goto resubmit_urb;
> +	}
> +
> +	while (pos <= urb->actual_length - MSG_HEADER_LEN) {
> +		msg = urb->transfer_buffer + pos;
> +
> +		if (!msg->len)
> +			break;
> +
> +		if (pos + msg->len > urb->actual_length) {
> +			dev_err(dev->udev->dev.parent, "Format error\n");
> +			break;
> +		}
> +
> +		kvaser_usb_handle_message(dev, msg);
> +
> +		pos += msg->len;
> +	}
> +
> +resubmit_urb:
> +	usb_fill_bulk_urb(urb, dev->udev,
> +			  usb_rcvbulkpipe(dev->udev,
> +					  dev->bulk_in->bEndpointAddress),
> +			  urb->transfer_buffer, RX_BUFFER_SIZE,
> +			  kvaser_usb_read_bulk_callback, dev);
> +
> +	err = usb_submit_urb(urb, GFP_ATOMIC);
> +	if (err == -ENODEV) {
> +		for (i = 0; i < dev->nchannels; i++) {
> +			if (!dev->nets[i])
> +				continue;
> +
> +			netif_device_detach(dev->nets[i]->netdev);
> +		}
> +	} else if (err) {
> +		dev_err(dev->udev->dev.parent,
> +			"Failed resubmitting read bulk urb: %d\n", err);
> +	}
> +
> +	return;
> +}
> +
> +static int kvaser_usb_setup_rx_urbs(struct kvaser_usb *dev)
> +{
> +	int i, err = 0;
> +
> +	if (dev->rxinitdone)
> +		return 0;
> +
> +	for (i = 0; i < MAX_RX_URBS; i++) {
> +		struct urb *urb = NULL;
> +		u8 *buf = NULL;
> +		dma_addr_t buf_dma;
> +
> +		urb = usb_alloc_urb(0, GFP_KERNEL);
> +		if (!urb) {
> +			dev_warn(dev->udev->dev.parent,
> +				 "No memory left for URBs\n");
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		buf = usb_alloc_coherent(dev->udev, RX_BUFFER_SIZE,
> +					 GFP_KERNEL, &buf_dma);
> +		if (!buf) {
> +			dev_warn(dev->udev->dev.parent,
> +				 "No memory left for USB buffer\n");
> +			usb_free_urb(urb);
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		usb_fill_bulk_urb(urb, dev->udev,
> +				  usb_rcvbulkpipe(dev->udev,
> +					  dev->bulk_in->bEndpointAddress),
> +				  buf, RX_BUFFER_SIZE,
> +				  kvaser_usb_read_bulk_callback,
> +				  dev);
> +		urb->transfer_dma = buf_dma;
> +		urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
> +		usb_anchor_urb(urb, &dev->rx_submitted);
> +
> +		err = usb_submit_urb(urb, GFP_KERNEL);
> +		if (err) {
> +			usb_unanchor_urb(urb);
> +			usb_free_coherent(dev->udev, RX_BUFFER_SIZE, buf,
> +					  buf_dma);
> +			usb_free_urb(urb);
> +			break;
> +		}
> +
> +		dev->rxbuf[i] = buf;
> +		dev->rxbuf_dma[i] = buf_dma;
> +
> +		usb_free_urb(urb);
> +	}
> +
> +	if (i == 0) {
> +		dev_warn(dev->udev->dev.parent,
> +			 "Cannot setup read URBs, error %d\n", err);
> +		return err;
> +	} else if (i < MAX_RX_URBS) {
> +		dev_warn(dev->udev->dev.parent,
> +			 "RX performances may be slow\n");
> +	}
> +
> +	dev->rxinitdone = true;
> +
> +	return 0;
> +}
> +
> +static int kvaser_usb_set_opt_mode(const struct kvaser_usb_net_priv *priv)
> +{
> +	struct kvaser_msg msg = {
> +		.id = CMD_SET_CTRL_MODE,
> +		.len = MSG_HEADER_LEN +
> +		       sizeof(struct kvaser_msg_ctrl_mode),
> +		.u.ctrl_mode.tid = 0xff,
> +		.u.ctrl_mode.channel = priv->channel,
> +	};
> +
> +	if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)
> +		msg.u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_SILENT;
> +	else
> +		msg.u.ctrl_mode.ctrl_mode = KVASER_CTRL_MODE_NORMAL;
> +
> +	return kvaser_usb_send_msg(priv->dev, &msg);
> +}
> +
> +static int kvaser_usb_start_chip(struct kvaser_usb_net_priv *priv)
> +{
> +	int err;
> +
> +	init_completion(&priv->start_comp);
> +
> +	err = kvaser_usb_send_simple_msg(priv->dev, CMD_START_CHIP,
> +					 priv->channel);
> +	if (err)
> +		return err;
> +
> +	if (!wait_for_completion_timeout(&priv->start_comp,
> +					 msecs_to_jiffies(START_TIMEOUT)))
> +		return -ETIMEDOUT;
> +
> +	return 0;
> +}
> +
> +static int kvaser_usb_open(struct net_device *netdev)
> +{
> +	struct kvaser_usb_net_priv *priv = netdev_priv(netdev);
> +	struct kvaser_usb *dev = priv->dev;
> +	int err;
> +
> +	err = open_candev(netdev);
> +	if (err)
> +		return err;
> +
> +	err = kvaser_usb_setup_rx_urbs(dev);
> +	if (err)
> +		goto error;
> +
> +	err = kvaser_usb_set_opt_mode(priv);
> +	if (err)
> +		goto error;
> +
> +	err = kvaser_usb_start_chip(priv);
> +	if (err) {
> +		netdev_warn(netdev, "Cannot start device, error %d\n", err);
> +		goto error;
> +	}
> +
> +	priv->can.state = CAN_STATE_ERROR_ACTIVE;
> +
> +	return 0;
> +
> +error:
> +	close_candev(netdev);
> +	return err;
> +}
> +
> +static void kvaser_usb_unlink_all_urbs(struct kvaser_usb *dev)
> +{
> +	int i;
> +
> +	usb_kill_anchored_urbs(&dev->rx_submitted);
> +
> +	for (i = 0; i < MAX_RX_URBS; i++)
> +		usb_free_coherent(dev->udev, RX_BUFFER_SIZE,
> +				  dev->rxbuf[i],
> +				  dev->rxbuf_dma[i]);
> +
> +	for (i = 0; i < MAX_NET_DEVICES; i++) {
> +		struct kvaser_usb_net_priv *priv = dev->nets[i];
> +
> +		if (priv)
> +			kvaser_usb_unlink_tx_urbs(priv);
> +	}
> +}
> +
> +static int kvaser_usb_stop_chip(struct kvaser_usb_net_priv *priv)
> +{
> +	int err;
> +
> +	init_completion(&priv->stop_comp);
> +
> +	err = kvaser_usb_send_simple_msg(priv->dev, CMD_STOP_CHIP,
> +					 priv->channel);
> +	if (err)
> +		return err;
> +
> +	if (!wait_for_completion_timeout(&priv->stop_comp,
> +					 msecs_to_jiffies(STOP_TIMEOUT)))
> +		return -ETIMEDOUT;
> +
> +	return 0;
> +}
> +
> +static int kvaser_usb_flush_queue(struct kvaser_usb_net_priv *priv)
> +{
> +	struct kvaser_msg msg = {
> +		.id = CMD_FLUSH_QUEUE,
> +		.len = MSG_HEADER_LEN +
> +		       sizeof(struct kvaser_msg_flush_queue),
> +		.u.flush_queue.channel = priv->channel,
> +		.u.flush_queue.flags = 0x00,
> +	};
> +
> +	return kvaser_usb_send_msg(priv->dev, &msg);
> +}
> +
> +static int kvaser_usb_close(struct net_device *netdev)
> +{
> +	struct kvaser_usb_net_priv *priv = netdev_priv(netdev);
> +	struct kvaser_usb *dev = priv->dev;
> +	int err;
> +
> +	netif_stop_queue(netdev);
> +
> +	err = kvaser_usb_flush_queue(priv);
> +	if (err)
> +		netdev_warn(netdev, "Cannot flush queue, error %d\n", err);
> +
> +	if (kvaser_usb_send_simple_msg(dev, CMD_RESET_CHIP, priv->channel))
> +		netdev_warn(netdev, "Cannot reset card, error %d\n", err);
> +
> +	err = kvaser_usb_stop_chip(priv);
> +	if (err)
> +		netdev_warn(netdev, "Cannot stop device, error %d\n", err);
> +
> +	priv->can.state = CAN_STATE_STOPPED;
> +	close_candev(priv->netdev);
> +
> +	return 0;
> +}
> +
> +static void kvaser_usb_write_bulk_callback(struct urb *urb)
> +{
> +	struct kvaser_usb_tx_urb_context *context = urb->context;
> +	struct kvaser_usb_net_priv *priv;
> +	struct net_device *netdev;
> +
> +	if (WARN_ON(!context))
> +		return;
> +
> +	priv = context->priv;
> +	netdev = priv->netdev;
> +
> +	kfree(urb->transfer_buffer);
> +
> +	if (!netif_device_present(netdev))
> +		return;
> +
> +	if (urb->status)
> +		netdev_info(netdev, "Tx URB aborted (%d)\n", urb->status);
> +}
> +
> +static netdev_tx_t kvaser_usb_start_xmit(struct sk_buff *skb,
> +					 struct net_device *netdev)
> +{
> +	struct kvaser_usb_net_priv *priv = netdev_priv(netdev);
> +	struct kvaser_usb *dev = priv->dev;
> +	struct net_device_stats *stats = &netdev->stats;
> +	struct can_frame *cf = (struct can_frame *)skb->data;
> +	struct kvaser_usb_tx_urb_context *context = NULL;
> +	struct urb *urb;
> +	void *buf;
> +	struct kvaser_msg *msg;
> +	int i, err;
> +	int ret = NETDEV_TX_OK;
> +
> +	if (can_dropped_invalid_skb(netdev, skb))
> +		return NETDEV_TX_OK;
> +
> +	urb = usb_alloc_urb(0, GFP_ATOMIC);
> +	if (!urb) {
> +		netdev_err(netdev, "No memory left for URBs\n");
> +		stats->tx_dropped++;
> +		goto nourbmem;
> +	}
> +
> +	buf = kmalloc(sizeof(struct kvaser_msg), GFP_ATOMIC);
> +	if (!buf) {
> +		netdev_err(netdev, "No memory left for USB buffer\n");
> +		stats->tx_dropped++;
> +		goto nobufmem;
> +	}
> +
> +	msg = buf;
> +	msg->len = MSG_HEADER_LEN + sizeof(struct kvaser_msg_tx_can);
> +	msg->u.tx_can.flags = 0;
> +	msg->u.tx_can.channel = priv->channel;
> +
> +	if (cf->can_id & CAN_EFF_FLAG) {
> +		msg->id = CMD_TX_EXT_MESSAGE;
> +		msg->u.tx_can.msg[0] = (cf->can_id >> 24) & 0x1f;
> +		msg->u.tx_can.msg[1] = (cf->can_id >> 18) & 0x3f;
> +		msg->u.tx_can.msg[2] = (cf->can_id >> 14) & 0x0f;
> +		msg->u.tx_can.msg[3] = (cf->can_id >> 6) & 0xff;
> +		msg->u.tx_can.msg[4] = cf->can_id & 0x3f;
> +	} else {
> +		msg->id = CMD_TX_STD_MESSAGE;
> +		msg->u.tx_can.msg[0] = (cf->can_id >> 6) & 0x1f;
> +		msg->u.tx_can.msg[1] = cf->can_id & 0x3f;
> +	}
> +
> +	msg->u.tx_can.msg[5] = cf->can_dlc;
> +	memcpy(&msg->u.tx_can.msg[6], cf->data, cf->can_dlc);
> +
> +	if (cf->can_id & CAN_RTR_FLAG)
> +		msg->u.tx_can.flags |= MSG_FLAG_REMOTE_FRAME;
> +
> +	for (i = 0; i < ARRAY_SIZE(priv->tx_contexts); i++) {
> +		if (priv->tx_contexts[i].echo_index == MAX_TX_URBS) {
> +			context = &priv->tx_contexts[i];
> +			break;
> +		}
> +	}
> +
> +	if (!context) {
> +		netdev_warn(netdev, "cannot find free context\n");
> +		ret =  NETDEV_TX_BUSY;
> +		goto releasebuf;
> +	}
> +
> +	context->priv = priv;
> +	context->echo_index = i;
> +	context->dlc = cf->can_dlc;
> +
> +	msg->u.tx_can.tid = context->echo_index;
> +
> +	usb_fill_bulk_urb(urb, dev->udev,
> +			  usb_sndbulkpipe(dev->udev,
> +					  dev->bulk_out->bEndpointAddress),
> +			  buf, msg->len,
> +			  kvaser_usb_write_bulk_callback, context);
> +	usb_anchor_urb(urb, &priv->tx_submitted);
> +
> +	can_put_echo_skb(skb, netdev, context->echo_index);
> +
> +	atomic_inc(&priv->active_tx_urbs);
> +
> +	if (atomic_read(&priv->active_tx_urbs) >= MAX_TX_URBS)
> +		netif_stop_queue(netdev);
> +
> +	err = usb_submit_urb(urb, GFP_ATOMIC);
> +	if (unlikely(err)) {
> +		can_free_echo_skb(netdev, context->echo_index);
> +
> +		skb = NULL; /* set to NULL to avoid double free in
> +			     * dev_kfree_skb(skb) */
> +
> +		atomic_dec(&priv->active_tx_urbs);
> +		usb_unanchor_urb(urb);
> +
> +		stats->tx_dropped++;
> +
> +		if (err == -ENODEV)
> +			netif_device_detach(netdev);
> +		else
> +			netdev_warn(netdev, "Failed tx_urb %d\n", err);
> +
> +		goto releasebuf;
> +	}
> +
> +	usb_free_urb(urb);
> +
> +	return NETDEV_TX_OK;
> +
> +releasebuf:
> +	kfree(buf);
> +nobufmem:
> +	usb_free_urb(urb);
> +nourbmem:
> +	dev_kfree_skb(skb);
> +	return ret;
> +}
> +
> +static const struct net_device_ops kvaser_usb_netdev_ops = {
> +	.ndo_open = kvaser_usb_open,
> +	.ndo_stop = kvaser_usb_close,
> +	.ndo_start_xmit = kvaser_usb_start_xmit,
> +};
> +
> +static struct can_bittiming_const kvaser_usb_bittiming_const = {
> +	.name = "kvaser_usb",
> +	.tseg1_min = KVASER_USB_TSEG1_MIN,
> +	.tseg1_max = KVASER_USB_TSEG1_MAX,
> +	.tseg2_min = KVASER_USB_TSEG2_MIN,
> +	.tseg2_max = KVASER_USB_TSEG2_MAX,
> +	.sjw_max = KVASER_USB_SJW_MAX,
> +	.brp_min = KVASER_USB_BRP_MIN,
> +	.brp_max = KVASER_USB_BRP_MAX,
> +	.brp_inc = KVASER_USB_BRP_INC,
> +};
> +
> +static int kvaser_usb_set_bittiming(struct net_device *netdev)
> +{
> +	struct kvaser_usb_net_priv *priv = netdev_priv(netdev);
> +	struct can_bittiming *bt = &priv->can.bittiming;
> +	struct kvaser_usb *dev = priv->dev;
> +	struct kvaser_msg msg = {
> +		.id = CMD_SET_BUS_PARAMS,
> +		.len = MSG_HEADER_LEN +
> +		       sizeof(struct kvaser_msg_busparams),
> +		.u.busparams.channel = priv->channel,
> +		.u.busparams.tid = 0xff,
> +		.u.busparams.bitrate = cpu_to_le32(bt->bitrate),
> +		.u.busparams.sjw = bt->sjw,
> +		.u.busparams.tseg1 = bt->prop_seg + bt->phase_seg1,
> +		.u.busparams.tseg2 = bt->phase_seg2,
> +	};
> +
> +	if (priv->can.ctrlmode & CAN_CTRLMODE_3_SAMPLES)
> +		msg.u.busparams.no_samp = 3;
> +	else
> +		msg.u.busparams.no_samp = 1;
> +
> +	return kvaser_usb_send_msg(dev, &msg);
> +}
> +
> +static int kvaser_usb_set_mode(struct net_device *netdev,
> +			       enum can_mode mode)
> +{
> +	struct kvaser_usb_net_priv *priv = netdev_priv(netdev);
> +	int err;
> +
> +	switch (mode) {
> +	case CAN_MODE_START:
> +		err = kvaser_usb_simple_msg_async(priv, CMD_START_CHIP);
> +		if (err)
> +			return err;
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	return 0;
> +}
> +
> +static int kvaser_usb_get_berr_counter(const struct net_device *netdev,
> +				       struct can_berr_counter *bec)
> +{
> +	struct kvaser_usb_net_priv *priv = netdev_priv(netdev);
> +
> +	*bec = priv->bec;
> +
> +	return 0;
> +}
> +
> +static void kvaser_usb_remove_interfaces(struct kvaser_usb *dev)
> +{
> +	int i;
> +
> +	for (i = 0; i < dev->nchannels; i++) {
> +		if (!dev->nets[i])
> +			continue;
> +
> +		unregister_netdev(dev->nets[i]->netdev);
> +	}
> +
> +	kvaser_usb_unlink_all_urbs(dev);
> +
> +	for (i = 0; i < dev->nchannels; i++) {
> +		if (!dev->nets[i])
> +			continue;
> +
> +		free_candev(dev->nets[i]->netdev);
> +	}
> +}
> +
> +static int kvaser_usb_init_one(struct usb_interface *intf,
> +			       const struct usb_device_id *id, int channel)
> +{
> +	struct kvaser_usb *dev = usb_get_intfdata(intf);
> +	struct net_device *netdev;
> +	struct kvaser_usb_net_priv *priv;
> +	int i, err;
> +
> +	netdev = alloc_candev(sizeof(*priv), MAX_TX_URBS);
> +	if (!netdev) {
> +		dev_err(&intf->dev, "Cannot alloc candev\n");
> +		return -ENOMEM;
> +	}
> +
> +	priv = netdev_priv(netdev);
> +
> +	init_completion(&priv->start_comp);
> +	init_completion(&priv->stop_comp);
> +
> +	init_usb_anchor(&priv->tx_submitted);
> +	atomic_set(&priv->active_tx_urbs, 0);
> +
> +	for (i = 0; i < ARRAY_SIZE(priv->tx_contexts); i++)
> +		priv->tx_contexts[i].echo_index = MAX_TX_URBS;
> +
> +	priv->dev = dev;
> +	priv->netdev = netdev;
> +	priv->channel = channel;
> +
> +	priv->can.state = CAN_STATE_STOPPED;
> +	priv->can.clock.freq = CAN_USB_CLOCK;
> +	priv->can.bittiming_const = &kvaser_usb_bittiming_const;
> +	priv->can.do_set_bittiming = kvaser_usb_set_bittiming;
> +	priv->can.do_set_mode = kvaser_usb_set_mode;
> +	if (id->driver_info & KVASER_HAS_TXRX_ERRORS)
> +		priv->can.do_get_berr_counter = kvaser_usb_get_berr_counter;
> +	priv->can.ctrlmode_supported = CAN_CTRLMODE_3_SAMPLES;
> +	if (id->driver_info & KVASER_HAS_SILENT_MODE)
> +		priv->can.ctrlmode_supported |= CAN_CTRLMODE_LISTENONLY;
> +
> +	netdev->flags |= IFF_ECHO;
> +
> +	netdev->netdev_ops = &kvaser_usb_netdev_ops;
> +
> +	SET_NETDEV_DEV(netdev, &intf->dev);
> +
> +	dev->nets[channel] = priv;
> +
> +	err = register_candev(netdev);
> +	if (err) {
> +		dev_err(&intf->dev, "Failed to register can device\n");
> +		free_candev(netdev);
> +		dev->nets[channel] = NULL;
> +		return err;
> +	}
> +
> +	netdev_dbg(netdev, "device registered\n");
> +
> +	return 0;
> +}
> +
> +static void kvaser_usb_get_endpoints(const struct usb_interface *intf,
> +				     struct usb_endpoint_descriptor **in,
> +				     struct usb_endpoint_descriptor **out)
> +{
> +	const struct usb_host_interface *iface_desc;
> +	struct usb_endpoint_descriptor *endpoint;
> +	int i;
> +
> +	iface_desc = &intf->altsetting[0];
> +
> +	for (i = 0; i < iface_desc->desc.bNumEndpoints; ++i) {
> +		endpoint = &iface_desc->endpoint[i].desc;
> +
> +		if (usb_endpoint_is_bulk_in(endpoint))
> +			*in = endpoint;
> +
> +		if (usb_endpoint_is_bulk_out(endpoint))
> +			*out = endpoint;
> +	}
> +}
> +
> +static int kvaser_usb_probe(struct usb_interface *intf,
> +			    const struct usb_device_id *id)
> +{
> +	struct kvaser_usb *dev;
> +	int err = -ENOMEM;
> +	int i;
> +
> +	dev = devm_kzalloc(&intf->dev, sizeof(*dev), GFP_KERNEL);
> +	if (!dev)
> +		return -ENOMEM;
> +
> +	kvaser_usb_get_endpoints(intf, &dev->bulk_in, &dev->bulk_out);
> +	if (!dev->bulk_in || !dev->bulk_out) {
> +		dev_err(&intf->dev, "Cannot get usb endpoint(s)");
> +		return err;
> +	}
> +
> +	dev->udev = interface_to_usbdev(intf);
> +
> +	init_usb_anchor(&dev->rx_submitted);
> +
> +	usb_set_intfdata(intf, dev);
> +
> +	for (i = 0; i < MAX_NET_DEVICES; i++)
> +		kvaser_usb_send_simple_msg(dev, CMD_RESET_CHIP, i);
> +
> +	err = kvaser_usb_get_software_info(dev);
> +	if (err) {
> +		dev_err(&intf->dev,
> +			"Cannot get software infos, error %d\n", err);
> +		return err;
> +	}
> +
> +	err = kvaser_usb_get_card_info(dev);
> +	if (err) {
> +		dev_err(&intf->dev,
> +			"Cannot get card infos, error %d\n", err);
> +		return err;
> +	}
> +
> +	dev_dbg(&intf->dev, "Firmware version: %d.%d.%d\n",
> +		((dev->fw_version >> 24) & 0xff),
> +		((dev->fw_version >> 16) & 0xff),
> +		(dev->fw_version & 0xffff));
> +
> +	for (i = 0; i < dev->nchannels; i++) {
> +		err = kvaser_usb_init_one(intf, id, i);
> +		if (err) {
> +			kvaser_usb_remove_interfaces(dev);
> +			return err;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static void kvaser_usb_disconnect(struct usb_interface *intf)
> +{
> +	struct kvaser_usb *dev = usb_get_intfdata(intf);
> +
> +	usb_set_intfdata(intf, NULL);
> +
> +	if (!dev)
> +		return;
> +
> +	kvaser_usb_remove_interfaces(dev);
> +}
> +
> +static struct usb_driver kvaser_usb_driver = {
> +	.name = "kvaser_usb",
> +	.probe = kvaser_usb_probe,
> +	.disconnect = kvaser_usb_disconnect,
> +	.id_table = kvaser_usb_table,
> +};
> +
> +module_usb_driver(kvaser_usb_driver);
> +
> +MODULE_AUTHOR("Olivier Sobrie <olivier@sobrie.be>");
> +MODULE_DESCRIPTION("CAN driver for Kvaser CAN/USB devices");
> +MODULE_LICENSE("GPL v2");
> -- 
> 1.7.9.5
> 

-- 
Olivier

^ permalink raw reply

* [PATCHv4] virtio-spec: virtio network device RFS support
From: Michael S. Tsirkin @ 2012-11-22 14:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, kvm, virtualization

Add RFS support to virtio network device.
Add a new feature flag VIRTIO_NET_F_RFS for this feature, a new
configuration field max_virtqueue_pairs to detect supported number of
virtqueues as well as a new command VIRTIO_NET_CTRL_RFS to program
packet steering for unidirectional protocols.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

--

Changes from v3:
- rename multiqueue -> rfs this is what we support
- Be more explicit about what driver should do.
- Simplify layout making VQs functionality depend on feature.
- Remove unused commands, only leave in programming # of queues

Changes from v2:
Address Jason's comments on v2:
- Changed STEERING_HOST to STEERING_RX_FOLLOWS_TX:
  this is both clearer and easier to support.
  It does not look like we need a separate steering command
  since host can just watch tx packets as they go.
- Moved RX and TX steering sections near each other.
- Add motivation for other changes in v2

Changes from Jason's rfc:
- reserved vq 3: this makes all rx vqs even and tx vqs odd, which
  looks nicer to me.
- documented packet steering, added a generalized steering programming
  command. Current modes are single queue and host driven multiqueue,
  but I envision support for guest driven multiqueue in the future.
- make default vqs unused when in mq mode - this wastes some memory
  but makes it more efficient to switch between modes as
  we can avoid this causing packet reordering.

Rusty, could you please take a look and comment soon?
If this looks OK to everyone, we can proceed with finalizing the
implementation. Would be nice to try and put it in 3.8.

---

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index d2f0da9..c1fa3e4 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -59,6 +59,7 @@
 \author -608949062 "Rusty Russell,,," 
 \author -385801441 "Cornelia Huck" cornelia.huck@de.ibm.com
 \author 1531152142 "Paolo Bonzini,,," 
+\author 1986246365 "Michael S. Tsirkin" 
 \end_header
 
 \begin_body
@@ -4170,9 +4171,42 @@ ID 1
 \end_layout
 
 \begin_layout Description
-Virtqueues 0:receiveq.
- 1:transmitq.
- 2:controlq
+Virtqueues 0:receiveq
+\change_inserted 1986246365 1352742829
+0
+\change_unchanged
+.
+ 1:transmitq
+\change_inserted 1986246365 1352742832
+0
+\change_deleted 1986246365 1352742947
+.
+ 
+\change_inserted 1986246365 1352742952
+.
+ ....
+ 2N
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1352743187
+N=0 if VIRTIO_NET_F_RFS is not negotiated, otherwise N is indicated by max_
+\emph on
+virtqueue_pairs control
+\emph default
+ field.
+ 
+\end_layout
+
+\end_inset
+
+: receivqN.
+ 2N+1: transmitqN.
+ 2N+
+\change_unchanged
+2:controlq
 \begin_inset Foot
 status open
 
@@ -4343,6 +4377,16 @@ VIRTIO_NET_F_CTRL_VLAN
 
 \begin_layout Description
 VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
+\change_inserted 1986246365 1352742767
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1986246365 1352742808
+VIRTIO_NET_F_RFS(2) Device supports Receive Flow Steering.
+\change_unchanged
+
 \end_layout
 
 \end_deeper
@@ -4355,11 +4399,44 @@ configuration
 \begin_inset space ~
 \end_inset
 
-layout Two configuration fields are currently defined.
+layout 
+\change_deleted 1986246365 1352743300
+Two
+\change_inserted 1986246365 1352743301
+Four
+\change_unchanged
+ configuration fields are currently defined.
  The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
  is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
  Two read-only bits are currently defined for the status field: VIRTIO_NET_S_LIN
 K_UP and VIRTIO_NET_S_ANNOUNCE.
+
+\change_inserted 1986246365 1353595219
+ The following read-only field, 
+\emph on
+max_virtqueue_pairs
+\emph default
+ only exists if VIRTIO_NET_F_RFS is set.
+ This field specifies the maximum number of each of transmit and receive
+ virtqueues (receiveq0..receiveq
+\emph on
+N
+\emph default
+ and transmitq0..transmitq
+\emph on
+N
+\emph default
+ respectively; 
+\emph on
+N
+\emph default
+=
+\emph on
+max_virtqueue_pairs
+\emph default
+) that can be configured once VIRTIO_NET_F_RFS is negotiated.
+
+\change_unchanged
  
 \begin_inset listings
 inline false
@@ -4410,7 +4487,24 @@ Device Initialization
 
 \begin_layout Enumerate
 The initialization routine should identify the receive and transmission
- virtqueues.
+ virtqueues
+\change_inserted 1986246365 1352744077
+, up to N+1 of each kind
+\change_unchanged
+.
+
+\change_inserted 1986246365 1352743942
+ If VIRTIO_NET_F_RFS feature bit is negotiated, 
+\emph on
+N=max_virtqueue_pairs
+\emph default
+, otherwise identify 
+\emph on
+N=0
+\emph default
+.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Enumerate
@@ -4455,7 +4549,11 @@ status
 \end_layout
 
 \begin_layout Enumerate
-The receive virtqueue should be filled with receive buffers.
+The receive virtqueue
+\change_inserted 1986246365 1352743953
+s
+\change_unchanged
+ should be filled with receive buffers.
  This is described in detail below in 
 \begin_inset Quotes eld
 \end_inset
@@ -4550,8 +4648,15 @@ Device Operation
 \end_layout
 
 \begin_layout Standard
-Packets are transmitted by placing them in the transmitq, and buffers for
- incoming packets are placed in the receiveq.
+Packets are transmitted by placing them in the transmitq
+\change_inserted 1986246365 1353593685
+0..transmitqN
+\change_unchanged
+, and buffers for incoming packets are placed in the receiveq
+\change_inserted 1986246365 1353593692
+0..receiveqN
+\change_unchanged
+.
  In each case, the packet itself is preceeded by a header:
 \end_layout
 
@@ -4861,6 +4966,17 @@ If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at least the
 struct virtio_net_hdr
 \family default
 .
+\change_inserted 1986246365 1353594518
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353594638
+If VIRTIO_NET_F_RFS is negotiated, each of the receiveq0...receiveqN that will
+ be used should be populated with receive buffers.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Subsection*
@@ -5293,8 +5409,125 @@ Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control vq.
  
 \end_layout
 
-\begin_layout Enumerate
+\begin_layout Subsection*
+
+\change_inserted 1986246365 1353593879
+Packet Receive Flow Steering
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353594403
+If the driver negotiates the VIRTIO_NET_F_RFS (depends on VIRTIO_NET_F_CTRL_VQ),
+ it can transmit outgoing packets on one of the multiple transmitq0..transmitqN
+ and ask the device to queue incoming packets into one the multiple receiveq0..rec
+eiveqN depending on the packet flow.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353594292
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594178
+
+struct virtio_net_ctrl_rfs {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594212
+
+	u16 virtqueue_pairs;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594172
+
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594172
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594263
+
+#define VIRTIO_NET_CTRL_RFC    1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594273
+
+ #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET        0 
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353594884
+RFS acceleration is disabled by default.
+ Driver enables RFS by executing the VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET command,
+ specifying the number of the last transmit and receive queue that is going
+ to be used; thus out of transmitq0..transmitqn and receiveq0..receiveqn where
+ 
+\emph on
+n=virtqueue
+\emph default
+_pairs will be used.
+ All these virtqueues must have been pre-configured in advance.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353595328
+Programming of the receive flow classificator is implicit.
+ Transmitting a packet of a specific flow on transmitqX will cause incoming
+ packets for this flow to be steered to receiveqX.
+ For uni-directional protocols, or where no packets have been transmitted
+ yet, device will steer a packet to a random queue out of the specified
+ receiveq0..receiveqn.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353595040
+RFS acceleration is disabled by setting 
+\emph on
+virtqueue_pairs = 0
+\emph default
+ (this is the default).
+ Following this, driver should not transmit new packets on virtqueues other
+ than transmitq0 and device will not steer new packets on virtqueues other
+ than receiveq0.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_deleted 1986246365 1353593873
 .
+
+\change_unchanged
  
 \end_layout
 
@@ -6152,13 +6385,7 @@ Virtqueues 0:receiveq(port0).
 status open
 
 \begin_layout Plain Layout
-Ports 
-\change_inserted 1986246365 1347188327
-1
-\change_deleted 1986246365 1347188327
-2
-\change_unchanged
- onwards only if VIRTIO_CONSOLE_F_MULTIPORT is set
+Ports 12 onwards only if VIRTIO_CONSOLE_F_MULTIPORT is set
 \end_layout
 
 \end_inset
@@ -6185,13 +6412,8 @@ VIRTIO_CONSOLE_F_SIZE
 
 \begin_layout Description
 VIRTIO_CONSOLE_F_MULTIPORT(1) Device has support for multiple ports; configurati
-on fields nr_ports and max_nr_ports are valid
-\change_inserted 1986246365 1347188404
-; if this bit is negotiated,
-\change_deleted 1986246365 1347188406
- and
-\change_unchanged
- control virtqueues will be used.
+on fields nr_ports and max_nr_ports are valid; if this bit is negotiated,
+ and control virtqueues will be used.
 \end_layout
 
 \end_deeper
@@ -6260,8 +6482,7 @@ If the VIRTIO_CONSOLE_F_MULTIPORT feature is negotiated, the driver can
  spawn multiple ports, not all of which may be attached to a console.
  Some could be generic ports.
  In this case, the control virtqueues are enabled and according to the max_nr_po
-rts configuration-space value, an appropriate number of virtqueues are
- created.
+rts configuration-space value, an appropriate number of virtqueues are created.
  A control message indicating the driver is ready is sent to the host.
  The host can then send control messages for adding new ports to the device.
  After creating and initializing each port, a VIRTIO_CONSOLE_PORT_READY
@@ -6699,14 +6920,9 @@ The driver constructs an array of addresses of memory pages it has previously
 \end_layout
 
 \begin_layout Enumerate
-If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is 
-\change_inserted 1986246365 1347188540
-negotiated
-\change_deleted 1986246365 1347188542
-set
-\change_unchanged
-, the guest may not use these requested pages until that descriptor in the
- deflateq has been used by the device.
+If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiatedset, the guest
+ may not use these requested pages until that descriptor in the deflateq
+ has been used by the device.
 \end_layout
 
 \begin_layout Enumerate

^ permalink raw reply related

* tap devices not receiving packets from a bridge
From: Peter Lieven @ 2012-11-22 14:29 UTC (permalink / raw)
  To: qemu-devel@nongnu.org, netdev

Hi,

is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
a bridge from sending pakets to a tap device?

My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
which is based on Linux 3.2.33.

I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
the tap does not have any TX packets. RX is working fine. I see the packets coming in at
the physical interface on the host, but they are not forwarded to the tap interface.
The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
bridge) again. Unfortunately it seems not to be reproducible.

Maybe this sounds familiar to someone?

Thank you,
Peter

^ permalink raw reply

* Re: [PATCH V3] xen/netfront: handle compound page fragments on transmit
From: Konrad Rzeszutek Wilk @ 2012-11-22 14:15 UTC (permalink / raw)
  To: David Miller
  Cc: ian.campbell, netdev, edumazet, stefan.bader, xen-devel, linux,
	annie.li
In-Reply-To: <20121121.114957.441218567286373485.davem@davemloft.net>

On Wed, Nov 21, 2012 at 11:49:57AM -0500, David Miller wrote:
> From: Konrad Rzeszutek Wilk <konrad@kernel.org>
> Date: Wed, 21 Nov 2012 10:16:27 -0500
> 
> > On Wed, Nov 21, 2012 at 12:02:16PM +0000, Ian Campbell wrote:
> >> An SKB paged fragment can consist of a compound page with order > 0.
> >> However the netchannel protocol deals only in PAGE_SIZE frames.
> >> 
> >> Handle this in xennet_make_frags by iterating over the frames which
> >> make up the page.
> >> 
> >> This is the netfront equivalent to 6a8ed462f16b for netback.
> >> 
> >> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> >> Cc: netdev@vger.kernel.org
> >> Cc: xen-devel@lists.xen.org
> >> Cc: Eric Dumazet <edumazet@google.com>
> >> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
> > 
> > Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > 
> > David, would you like me to send it to Linus via my tree or are you
> > OK sending him a git pull with this patch (and hopefully some other
> > ones?)
> 
> I'll merge this to Linus via my net tree, thanks.

Great. Thank you for doing it at such short notice and sooo close to the release.

^ permalink raw reply

* [PATCH] 8139cp: enable bql
From: David Woodhouse @ 2012-11-22 13:16 UTC (permalink / raw)
  To: netdev, codel


[-- Attachment #1.1: Type: text/plain, Size: 2137 bytes --]

This adds support for byte queue limits on RTL8139C+

Tested on real hardware.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-By: Dave Täht <dave.taht@bufferbloat.net>
---
dtaht looking over my shoulder and says it seems to be doing the right thing...

--- drivers/net/ethernet/realtek/8139cp.c~	2012-11-21 20:49:47.000000000 +0000
+++ drivers/net/ethernet/realtek/8139cp.c	2012-11-22 13:07:26.119076315 +0000
@@ -648,6 +648,7 @@ static void cp_tx (struct cp_private *cp
 {
 	unsigned tx_head = cp->tx_head;
 	unsigned tx_tail = cp->tx_tail;
+	unsigned bytes_compl = 0, pkts_compl = 0;
 
 	while (tx_tail != tx_head) {
 		struct cp_desc *txd = cp->tx_ring + tx_tail;
@@ -666,6 +667,9 @@ static void cp_tx (struct cp_private *cp
 				 le32_to_cpu(txd->opts1) & 0xffff,
 				 PCI_DMA_TODEVICE);
 
+		bytes_compl += skb->len;
+		pkts_compl++;
+
 		if (status & LastFrag) {
 			if (status & (TxError | TxFIFOUnder)) {
 				netif_dbg(cp, tx_err, cp->dev,
@@ -697,6 +701,7 @@ static void cp_tx (struct cp_private *cp
 
 	cp->tx_tail = tx_tail;
 
+	netdev_completed_queue(cp->dev, pkts_compl, bytes_compl);
 	if (TX_BUFFS_AVAIL(cp) > (MAX_SKB_FRAGS + 1))
 		netif_wake_queue(cp->dev);
 }
@@ -843,6 +848,8 @@ static netdev_tx_t cp_start_xmit (struct
 		wmb();
 	}
 	cp->tx_head = entry;
+
+	netdev_sent_queue(dev, skb->len);
 	netif_dbg(cp, tx_queued, cp->dev, "tx queued, slot %d, skblen %d\n",
 		  entry, skb->len);
 	if (TX_BUFFS_AVAIL(cp) <= (MAX_SKB_FRAGS + 1))
@@ -937,6 +944,8 @@ static void cp_stop_hw (struct cp_privat
 
 	cp->rx_tail = 0;
 	cp->tx_head = cp->tx_tail = 0;
+
+	netdev_reset_queue(cp->dev);
 }
 
 static void cp_reset_hw (struct cp_private *cp)
@@ -981,6 +990,8 @@ static inline void cp_start_hw (struct c
 	cpw32_f(TxRingAddr + 4, (ring_dma >> 16) >> 16);
 
 	cpw8(Cmd, RxOn | TxOn);
+
+	netdev_reset_queue(cp->dev);
 }
 
 static void cp_enable_irq(struct cp_private *cp)


-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation




[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

[-- Attachment #2: Type: text/plain, Size: 140 bytes --]

_______________________________________________
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel

^ permalink raw reply

* [PATCH] bonding: in balance-rr mode, set curr_active_slave only if it is up
From: Michal Kubecek @ 2012-11-22 12:48 UTC (permalink / raw)
  To: netdev; +Cc: Jay Vosburgh, Andy Gospodarek, linux-kernel

If all slaves of a balance-rr bond with ARP monitor are enslaved
with down link state, bond keeps down state even after slaves
go up.

This is caused by bond_enslave() setting curr_active_slave to
first slave not taking into account its link state. As
bond_loadbalance_arp_mon() uses curr_active_slave to identify
whether slave's down->up transition should update bond's link
state, bond stays down even if slaves are up (until first slave
goes from up to down at least once).

Before commit f31c7937 "bonding: start slaves with link down for
ARP monitor", this was masked by slaves always starting in UP
state with ARP monitor (and MII monitor not relying on
curr_active_slave being NULL if there is no slave up).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
 drivers/net/bonding/bond_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5f5b69f..c8bff3e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1838,7 +1838,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		 * anyway (it holds no special properties of the bond device),
 		 * so we can change it without calling change_active_interface()
 		 */
-		if (!bond->curr_active_slave)
+		if (!bond->curr_active_slave && new_slave->link == BOND_LINK_UP)
 			bond->curr_active_slave = new_slave;
 
 		break;
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH] ipv6: adapt connect for repair move
From: Pavel Emelyanov @ 2012-11-22 11:51 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: netdev, criu, linux-kernel, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <1353582838-18876-1-git-send-email-avagin@openvz.org>

On 11/22/2012 03:13 PM, Andrey Vagin wrote:
> This is work the same as for ipv4.
> 
> All other hacks about tcp repair are in common code for ipv4 and ipv6,
> so this patch is enough for repairing ipv6 connections.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: James Morris <jmorris@namei.org>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Patrick McHardy <kaber@trash.net>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Signed-off-by: Andrey Vagin <avagin@openvz.org>

Acked-by: Pavel Emelyanov <xemul@parallels.com>

> ---
>  net/ipv4/tcp_ipv4.c   | 13 +------------
>  net/ipv4/tcp_output.c |  5 +++++
>  net/ipv6/tcp_ipv6.c   |  2 +-
>  3 files changed, 7 insertions(+), 13 deletions(-)

^ permalink raw reply

* [PATCH] ipv6: adapt connect for repair move
From: Andrey Vagin @ 2012-11-22 11:13 UTC (permalink / raw)
  To: netdev
  Cc: criu, linux-kernel, Andrey Vagin, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, Pavel Emelyanov

This is work the same as for ipv4.

All other hacks about tcp repair are in common code for ipv4 and ipv6,
so this patch is enough for repairing ipv6 connections.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 net/ipv4/tcp_ipv4.c   | 13 +------------
 net/ipv4/tcp_output.c |  5 +++++
 net/ipv6/tcp_ipv6.c   |  2 +-
 3 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0c4a643..801eac4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -138,14 +138,6 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
 }
 EXPORT_SYMBOL_GPL(tcp_twsk_unique);
 
-static int tcp_repair_connect(struct sock *sk)
-{
-	tcp_connect_init(sk);
-	tcp_finish_connect(sk, NULL);
-
-	return 0;
-}
-
 /* This will initiate an outgoing connection. */
 int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
@@ -250,10 +242,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 
 	inet->inet_id = tp->write_seq ^ jiffies;
 
-	if (likely(!tp->repair))
-		err = tcp_connect(sk);
-	else
-		err = tcp_repair_connect(sk);
+	err = tcp_connect(sk);
 
 	rt = NULL;
 	if (err)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9aac058..695984f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3000,6 +3000,11 @@ int tcp_connect(struct sock *sk)
 
 	tcp_connect_init(sk);
 
+	if (unlikely(tp->repair)) {
+		tcp_finish_connect(sk, NULL);
+		return 0;
+	}
+
 	buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
 	if (unlikely(buff == NULL))
 		return -ENOBUFS;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 26175bf..4968a53 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -306,7 +306,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	if (err)
 		goto late_failure;
 
-	if (!tp->write_seq)
+	if (!tp->write_seq && likely(!tp->repair))
 		tp->write_seq = secure_tcpv6_sequence_number(np->saddr.s6_addr32,
 							     np->daddr.s6_addr32,
 							     inet->inet_sport,
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH 1/2] netfilter: ipset: Fix range bug in hash:ip,port,net
From: pablo @ 2012-11-22  9:10 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1353575452-3127-1-git-send-email-pablo@netfilter.org>

From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

Due to the missing ininitalization at adding/deleting entries, when
a plain_ip,port,net element was the object, multiple elements were
added/deleted instead. The bug came from the missing dangling
default initialization.

The error-prone default initialization is corrected in all hash:* types.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/ipset/ip_set_hash_ip.c        |    4 ++--
 net/netfilter/ipset/ip_set_hash_ipport.c    |    7 +++----
 net/netfilter/ipset/ip_set_hash_ipportip.c  |    7 +++----
 net/netfilter/ipset/ip_set_hash_ipportnet.c |    7 +++++--
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_ip.c b/net/netfilter/ipset/ip_set_hash_ip.c
index ec3dba5..5c0b785 100644
--- a/net/netfilter/ipset/ip_set_hash_ip.c
+++ b/net/netfilter/ipset/ip_set_hash_ip.c
@@ -173,6 +173,7 @@ hash_ip4_uadt(struct ip_set *set, struct nlattr *tb[],
 		return adtfn(set, &nip, timeout, flags);
 	}
 
+	ip_to = ip;
 	if (tb[IPSET_ATTR_IP_TO]) {
 		ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &ip_to);
 		if (ret)
@@ -185,8 +186,7 @@ hash_ip4_uadt(struct ip_set *set, struct nlattr *tb[],
 		if (!cidr || cidr > 32)
 			return -IPSET_ERR_INVALID_CIDR;
 		ip_set_mask_from_to(ip, ip_to, cidr);
-	} else
-		ip_to = ip;
+	}
 
 	hosts = h->netmask == 32 ? 1 : 2 << (32 - h->netmask - 1);
 
diff --git a/net/netfilter/ipset/ip_set_hash_ipport.c b/net/netfilter/ipset/ip_set_hash_ipport.c
index 0171f75..6283351 100644
--- a/net/netfilter/ipset/ip_set_hash_ipport.c
+++ b/net/netfilter/ipset/ip_set_hash_ipport.c
@@ -162,7 +162,7 @@ hash_ipport4_uadt(struct ip_set *set, struct nlattr *tb[],
 	const struct ip_set_hash *h = set->data;
 	ipset_adtfn adtfn = set->variant->adt[adt];
 	struct hash_ipport4_elem data = { };
-	u32 ip, ip_to = 0, p = 0, port, port_to;
+	u32 ip, ip_to, p = 0, port, port_to;
 	u32 timeout = h->timeout;
 	bool with_ports = false;
 	int ret;
@@ -210,7 +210,7 @@ hash_ipport4_uadt(struct ip_set *set, struct nlattr *tb[],
 		return ip_set_eexist(ret, flags) ? 0 : ret;
 	}
 
-	ip = ntohl(data.ip);
+	ip_to = ip = ntohl(data.ip);
 	if (tb[IPSET_ATTR_IP_TO]) {
 		ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &ip_to);
 		if (ret)
@@ -223,8 +223,7 @@ hash_ipport4_uadt(struct ip_set *set, struct nlattr *tb[],
 		if (!cidr || cidr > 32)
 			return -IPSET_ERR_INVALID_CIDR;
 		ip_set_mask_from_to(ip, ip_to, cidr);
-	} else
-		ip_to = ip;
+	}
 
 	port_to = port = ntohs(data.port);
 	if (with_ports && tb[IPSET_ATTR_PORT_TO]) {
diff --git a/net/netfilter/ipset/ip_set_hash_ipportip.c b/net/netfilter/ipset/ip_set_hash_ipportip.c
index 6344ef5..6a21271 100644
--- a/net/netfilter/ipset/ip_set_hash_ipportip.c
+++ b/net/netfilter/ipset/ip_set_hash_ipportip.c
@@ -166,7 +166,7 @@ hash_ipportip4_uadt(struct ip_set *set, struct nlattr *tb[],
 	const struct ip_set_hash *h = set->data;
 	ipset_adtfn adtfn = set->variant->adt[adt];
 	struct hash_ipportip4_elem data = { };
-	u32 ip, ip_to = 0, p = 0, port, port_to;
+	u32 ip, ip_to, p = 0, port, port_to;
 	u32 timeout = h->timeout;
 	bool with_ports = false;
 	int ret;
@@ -218,7 +218,7 @@ hash_ipportip4_uadt(struct ip_set *set, struct nlattr *tb[],
 		return ip_set_eexist(ret, flags) ? 0 : ret;
 	}
 
-	ip = ntohl(data.ip);
+	ip_to = ip = ntohl(data.ip);
 	if (tb[IPSET_ATTR_IP_TO]) {
 		ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &ip_to);
 		if (ret)
@@ -231,8 +231,7 @@ hash_ipportip4_uadt(struct ip_set *set, struct nlattr *tb[],
 		if (!cidr || cidr > 32)
 			return -IPSET_ERR_INVALID_CIDR;
 		ip_set_mask_from_to(ip, ip_to, cidr);
-	} else
-		ip_to = ip;
+	}
 
 	port_to = port = ntohs(data.port);
 	if (with_ports && tb[IPSET_ATTR_PORT_TO]) {
diff --git a/net/netfilter/ipset/ip_set_hash_ipportnet.c b/net/netfilter/ipset/ip_set_hash_ipportnet.c
index cb71f9a..2d5cd4e 100644
--- a/net/netfilter/ipset/ip_set_hash_ipportnet.c
+++ b/net/netfilter/ipset/ip_set_hash_ipportnet.c
@@ -215,8 +215,8 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[],
 	const struct ip_set_hash *h = set->data;
 	ipset_adtfn adtfn = set->variant->adt[adt];
 	struct hash_ipportnet4_elem data = { .cidr = HOST_MASK - 1 };
-	u32 ip, ip_to = 0, p = 0, port, port_to;
-	u32 ip2_from = 0, ip2_to, ip2_last, ip2;
+	u32 ip, ip_to, p = 0, port, port_to;
+	u32 ip2_from, ip2_to, ip2_last, ip2;
 	u32 timeout = h->timeout;
 	bool with_ports = false;
 	u8 cidr;
@@ -286,6 +286,7 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[],
 		return ip_set_eexist(ret, flags) ? 0 : ret;
 	}
 
+	ip_to = ip;
 	if (tb[IPSET_ATTR_IP_TO]) {
 		ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &ip_to);
 		if (ret)
@@ -306,6 +307,8 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[],
 		if (port > port_to)
 			swap(port, port_to);
 	}
+
+	ip2_to = ip2_from;
 	if (tb[IPSET_ATTR_IP2_TO]) {
 		ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP2_TO], &ip2_to);
 		if (ret)
-- 
1.7.10.4


^ permalink raw reply related

* [PATCH 0/2] netfilter fixes for net
From: pablo @ 2012-11-22  9:10 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Pablo Neira Ayuso <pablo@netfilter.org>

Hi David,

The following patchset contains two Netfilter fixes:

* Fix buffer overflow in the name of the timeout policy object
  in the cttimeout infrastructure, from Florian Westphal.

* Fix a bug in the hash set in case that IP ranges are
  specified, from Jozsef Kadlecsik.

You can pull these changes from:

git://1984.lsi.us.es/nf master

Thanks!

Florian Westphal (1):
  netfilter: cttimeout: fix buffer overflow

Jozsef Kadlecsik (1):
  netfilter: ipset: Fix range bug in hash:ip,port,net

 net/netfilter/ipset/ip_set_hash_ip.c        |    4 ++--
 net/netfilter/ipset/ip_set_hash_ipport.c    |    7 +++----
 net/netfilter/ipset/ip_set_hash_ipportip.c  |    7 +++----
 net/netfilter/ipset/ip_set_hash_ipportnet.c |    7 +++++--
 net/netfilter/nfnetlink_cttimeout.c         |    3 ++-
 5 files changed, 15 insertions(+), 13 deletions(-)

-- 
1.7.10.4


^ permalink raw reply

* Re: [PATCH 1/1] asix: use ramdom hw addr if the one read is not valid
From: Bjørn Mork @ 2012-11-22  8:55 UTC (permalink / raw)
  To: Jean-Christophe PLAGNIOL-VILLARD
  Cc: netdev, linux-usb, linux-arm-kernel, Sergei Shtylyov
In-Reply-To: <50AD36A8.6000702@mvista.com>

Sergei Shtylyov <sshtylyov@mvista.com> writes:
> On 11/21/2012 01:22 PM, Jean-Christophe PLAGNIOL-VILLARD wrote:
>
>> Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
>> Cc: linux-usb@vger.kernel.org
>> Cc: netdev@vger.kernel.org
>> ---
>>  drivers/net/usb/asix_devices.c |   24 +++++++++++++++++++++---
>>  1 file changed, 21 insertions(+), 3 deletions(-)
>
>> diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
>> index 33ab824..7ebec5b 100644
>> --- a/drivers/net/usb/asix_devices.c
>> +++ b/drivers/net/usb/asix_devices.c
>> @@ -225,7 +225,13 @@ static int ax88172_bind(struct usbnet *dev, struct usb_interface *intf)
>>  			   ret);
>>  		goto out;
>>  	}
>> -	memcpy(dev->net->dev_addr, buf, ETH_ALEN);
>> +
>> +	if (is_valid_ether_addr(buf)) {
>> +		memcpy(dev->net->dev_addr, buf, ETH_ALEN);
>> +	} else {
>> +		netdev_info(dev->net, "invalid hw address, using random\n");
>> +		eth_hw_addr_random(dev->net);
>> +	}
>>  
>>  	/* Initialize MII structure */
>>  	dev->mii.dev = dev->net;
[..]
>
>    Repeated thrice, this asks to be put into subroutine...

Yes.  Looking at the driver, this probably goes for most of the three
_bind() functions.  There is a lot of common code there.

But more important wrt the eth_hw_addr_random() change: Does this
actually work with real devices?  The driver implements a
asix_set_mac_address() which writes the address back to the device when
you change it.  I assume there is a reason for doing that.  Why don't
you do it here?


Bjørn

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 9/9] batman-adv: Use packing of 2 for all headers before an ethernet header
From: Sven Eckelmann @ 2012-11-22  8:28 UTC (permalink / raw)
  To: Kevin Curtis
  Cc: fubar-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r@public.gmane.org,
	jitendra.kalsaria-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org,
	rmody-43mecJUBy8ZBDgjK7y7TUQ@public.gmane.org,
	ron.mercer-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org,
	linux-driver-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org,
	lindner_marek-LWAfsSFWpa4@public.gmane.org, David Miller,
	andy-QlMahl40kYEqcZcGjlUOXw@public.gmane.org
In-Reply-To: <E603DC592C92B54A89CEF6B0919A0B1CA7D5A6CC87-uLMF5YoEu3cpvdBnIIuIUrTzKHr1x449ALKb+EK9MtIAvxtiuMwx3w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 629 bytes --]

On Thursday 22 November 2012 08:12:38 Kevin Curtis wrote:
> >> Agree. But we should get this message also to the other guys
> 
> What message would this be?
> Are you advocating a different use of #pragma pack()?

David started this discussion and his statement was:

> The __packed attribute is an abstraction of the actual syntax the 
> compiler uses, if it is supported at all.
> 
> Therefore, you can't just unconditionally use the #pragma, and you 
> would need to use some kind of similar compiler abstraction for it.

So yes, it sounds to me like "don't use #pragma but think about some 
abstraction".

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH 9/9] batman-adv: Use packing of 2 for all headers before an ethernet header
From: Kevin Curtis @ 2012-11-22  8:12 UTC (permalink / raw)
  To: Sven Eckelmann, David Miller
  Cc: fubar-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r@public.gmane.org,
	rmody-43mecJUBy8ZBDgjK7y7TUQ@public.gmane.org,
	ron.mercer-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org,
	linux-driver-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org,
	lindner_marek-LWAfsSFWpa4@public.gmane.org,
	jitendra.kalsaria-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org,
	andy-QlMahl40kYEqcZcGjlUOXw@public.gmane.org
In-Reply-To: <8108710.8oNxxR0rRd-S/pmIDWWJIwhrEaHGRlFQnOel7F/LzPIcbWoRP8EXgk@public.gmane.org>

>> Agree. But we should get this message also to the other guys

What message would this be?
Are you advocating a different use of #pragma pack()?


Kevin Curtis
Linux Development
FarSite Communications Ltd http://www.farsite.com
Winner of The Queen's Award for Enterprise 2009
tel:  +44 1256 330461
fax:  +44 1256 854931



-----Original Message-----
From: Sven Eckelmann [mailto:sven-KaDOiPu9UxWEi8DpZVb4nw@public.gmane.org] 
Sent: 21 November 2012 18:20
To: David Miller
Cc: ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org; netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; b.a.t.m.a.n-ZwoEplunGu2QeL92YiVCHQ@public.gmane.orgh.org; lindner_marek-LWAfsSFWpa4@public.gmane.org; Kevin Curtis; linux-driver-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org; ron.mercer-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org; jitendra.kalsaria-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org; rmody-43mecJUBy8ZBDgjK7y7TUQ@public.gmane.org; andy@greyhouse.net; fubar-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org
Subject: Re: Re: [PATCH 9/9] batman-adv: Use packing of 2 for all headers before an ethernet header

On Wednesday 21 November 2012 12:57:59 David Miller wrote:
> From: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
> Date: Wed, 21 Nov 2012 13:11:59 +0100
> 
> > +#pragma pack(2)
> 
>  ...
> 
> > -} __packed;
> 
> The __packed attribute is an abstraction of the actual syntax the 
> compiler uses, if it is supported at all.
> 
> Therefore, you can't just unconditionally use the #pragma, and you 
> would need to use some kind of similar compiler abstraction for it.
> 
> But to be honest this is really ugly and for very little, if any, 
> gain.

Agree. But we should get this message also to the other guys

$ git grep pragma -- drivers/net
drivers/net/bonding/bond_3ad.h:#pragma pack(1) drivers/net/bonding/bond_3ad.h:#pragma pack() drivers/net/bonding/bond_3ad.h:#pragma pack(8) drivers/net/bonding/bond_3ad.h:#pragma pack() drivers/net/bonding/bond_alb.c:#pragma pack(1) drivers/net/bonding/bond_alb.c:#pragma pack() drivers/net/ethernet/brocade/bna/bfa_defs.h:#pragma pack(1) drivers/net/ethernet/brocade/bna/bfa_defs.h:#pragma pack() drivers/net/ethernet/brocade/bna/bfa_defs_cna.h:#pragma pack(1) drivers/net/ethernet/brocade/bna/bfa_defs_cna.h:#pragma pack() drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h:#pragma pack(1) drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h:#pragma pack() drivers/net/ethernet/brocade/bna/bfi.h:#pragma pack(1) drivers/net/ethernet/brocade/bna/bfi.h:#pragma pack() drivers/net/ethernet/brocade/bna/bfi_cna.h:#pragma pack(1) drivers/net/ethernet/brocade/bna/bfi_cna.h:#pragma pack() drivers/net/ethernet/brocade/bna/bfi_enet.h:#pragma pack(1) drivers/net/ethernet/brocade/bna/bfi_enet.h:#pragma pack() drivers/net/ethernet/brocade/bna/cna.h:#pragma pack(1) drivers/net/ethernet/brocade/bna/cna.h:#pragma pack() drivers/net/ethernet/qlogic/qla3xxx.h:#pragma pack(1) drivers/net/ethernet/qlogic/qla3xxx.h:#pragma pack() drivers/net/wan/farsync.c:#pragma pack(1) drivers/net/wan/farsync.c:#pragma pack()

Kind regards,
	Sven

^ permalink raw reply

* [PATCH 1/1 v2] asix: use ramdom hw addr if the one read is not valid
From: Jean-Christophe PLAGNIOL-VILLARD @ 2012-11-22  7:35 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: netdev, linux-usb, Jean-Christophe PLAGNIOL-VILLARD
In-Reply-To: <1353493362-30418-1-git-send-email-plagnioj@jcrosoft.com>

Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: linux-usb@vger.kernel.org
Cc: netdev@vger.kernel.org
---
v2:

	introduce asix_set_netdev_dev_addr

Best Regards,
J.
 drivers/net/usb/asix_devices.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
index 33ab824..7a6e758 100644
--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -64,6 +64,16 @@ static void asix_status(struct usbnet *dev, struct urb *urb)
 	}
 }
 
+static void asix_set_netdev_dev_addr(struct usbnet *dev, u8 *addr)
+{
+	if (is_valid_ether_addr(addr)) {
+		memcpy(dev->net->dev_addr, addr, ETH_ALEN);
+	} else {
+		netdev_info(dev->net, "invalid hw address, using random\n");
+		eth_hw_addr_random(dev->net);
+	}
+}
+
 /* Get the PHY Identifier from the PHYSID1 & PHYSID2 MII registers */
 static u32 asix_get_phyid(struct usbnet *dev)
 {
@@ -225,7 +235,8 @@ static int ax88172_bind(struct usbnet *dev, struct usb_interface *intf)
 			   ret);
 		goto out;
 	}
-	memcpy(dev->net->dev_addr, buf, ETH_ALEN);
+
+	asix_set_netdev_dev_addr(dev, buf);
 
 	/* Initialize MII structure */
 	dev->mii.dev = dev->net;
@@ -423,7 +434,8 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
 		netdev_dbg(dev->net, "Failed to read MAC address: %d\n", ret);
 		return ret;
 	}
-	memcpy(dev->net->dev_addr, buf, ETH_ALEN);
+
+	asix_set_netdev_dev_addr(dev, buf);
 
 	/* Initialize MII structure */
 	dev->mii.dev = dev->net;
@@ -777,7 +789,8 @@ static int ax88178_bind(struct usbnet *dev, struct usb_interface *intf)
 		netdev_dbg(dev->net, "Failed to read MAC address: %d\n", ret);
 		return ret;
 	}
-	memcpy(dev->net->dev_addr, buf, ETH_ALEN);
+
+	asix_set_netdev_dev_addr(dev, buf);
 
 	/* Initialize MII structure */
 	dev->mii.dev = dev->net;
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH 196/493] net/wireless: remove use of __devinit
From: Hin-Tak Leung @ 2012-11-22  6:13 UTC (permalink / raw)
  To: gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r, Bill Pemberton
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1353349642-3677-196-git-send-email-wfp5p-4Ng6DfrEGID2fBVCVOL8/A@public.gmane.org>



--- On Mon, 19/11/12, Bill Pemberton <wfp5p-4Ng6DfrEGID2fBVCVOL8/A@public.gmane.org> wrote:

> From: Bill Pemberton <wfp5p-4Ng6DfrEGID2fBVCVOL8/A@public.gmane.org>
> Subject: [PATCH 196/493] net/wireless: remove use of __devinit
> To: gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org
> Cc: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>, "Jiri Slaby" <jirislaby-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>, "Nick Kossifidis" <mickflemm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>, "Luis R. Rodriguez" <mcgrof-A+ZNKFmMK5xy9aJCnZT0Uw@public.gmane.org>, "Simon Kelley" <simon-xn1N/tgparsycpQjotevgVpr/1R2p/CL@public.gmane.org>, "Stefano Brivio" <stefano.brivio-hl5o88x/ua9eoWH0uzbU5w@public.gmane.org>, "Stanislav Yakovlev" <stas.yakovlev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>, "Dan Williams" <dcbw-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Christian Lamparter" <chunkeey-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>, "Herton Ronaldo Krzesinski" <herton-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>, "Hin-Tak Leung" <htl10-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>, "Larry Finger" <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>, "Chaoming Li" <chaoming_li-kXabqFNEczNtrwSWzY7KCg@public.gmane.org>, "Luciano Coelho" <coelho-l0cyMroinI0@public.gmane.org>, linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, ath5k-devel-xDcbHBWguxEUs3QNXV6qNA@public.gmane.org, b43-dev-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org, brcm80211-dev-list-dY08KVG/lbpWk0Htik3J/w@public.gmane.org, libertas-dev-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
> Date: Monday, 19 November, 2012, 18:22
> CONFIG_HOTPLUG is going away as an
> option so __devinit is no longer
> needed.
> 
> Signed-off-by: Bill Pemberton <wfp5p-4Ng6DfrEGID2fBVCVOL8/A@public.gmane.org>
> Cc: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
> 
> Cc: Jiri Slaby <jirislaby-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> 
> Cc: Nick Kossifidis <mickflemm-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> 
> Cc: "Luis R. Rodriguez" <mcgrof-A+ZNKFmMK5xy9aJCnZT0Uw@public.gmane.org>
> 
> Cc: Simon Kelley <simon-xn1N/tgparsycpQjotevgVpr/1R2p/CL@public.gmane.org>
> 
> Cc: Stefano Brivio <stefano.brivio-hl5o88x/ua9eoWH0uzbU5w@public.gmane.org>
> 
> Cc: Stanislav Yakovlev <stas.yakovlev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> 
> Cc: Dan Williams <dcbw-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> Cc: Christian Lamparter <chunkeey-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
> 
> Cc: Herton Ronaldo Krzesinski <herton-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> 
> Cc: Hin-Tak Leung <htl10-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
> 
> Cc: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
> 
> Cc: Chaoming Li <chaoming_li-kXabqFNEczNtrwSWzY7KCg@public.gmane.org>
> 
> Cc: Luciano Coelho <coelho-l0cyMroinI0@public.gmane.org> 
> Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> 
> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> 
> Cc: ath5k-devel-xDcbHBWguxEUs3QNXV6qNA@public.gmane.org
> 
> Cc: b43-dev-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
> 
> Cc: brcm80211-dev-list-dY08KVG/lbpWk0Htik3J/w@public.gmane.org
> 
> Cc: libertas-dev-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
> 
> ---

Acked-by: Hin-Tak Leung <htl10-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>

The rtl818x parts.

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/1] netfilter: cttimeout: fix buffer overflow
From: Pablo Neira Ayuso @ 2012-11-21 22:54 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev
In-Reply-To: <1353497858-29404-1-git-send-email-fw@strlen.de>

On Wed, Nov 21, 2012 at 12:37:38PM +0100, Florian Westphal wrote:
> Chen Gang reports:
> the length of nla_data(cda[CTA_TIMEOUT_NAME]) is not limited in server side.
> 
> And indeed, its used to strcpy to a fixed-sized buffer.
> 
> Fortunately, nfnetlink users need CAP_NET_ADMIN.

Good catch, applied thanks.

^ permalink raw reply

* Re: 8139cp: set ring address before enabling receiver
From: Ben Hutchings @ 2012-11-21 21:10 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: David Woodhouse, Jason Wang, David S. Miller, netdev
In-Reply-To: <50AD3C51.9070105@pobox.com>

On Wed, 2012-11-21 at 15:40 -0500, Jeff Garzik wrote:
> On 11/21/2012 03:18 PM, Ben Hutchings wrote:
> > On Wed, 2012-11-21 at 19:51 +0000, David Woodhouse wrote:
> >> On Wed, 2012-11-21 at 13:12 -0500, Jeff Garzik wrote:
> >>>
> >>> What sticks out at me from the commit message?
> >>>
> >>> It was not tested on the famously quirky 8139 hardware at all.
> >>>
> >>> While I have not looked at the 8139C+ data sheet in a while, sometimes
> >>> the hardware _did_ have a strange init order.
> >>>
> >>> As this works in a simulator but fails on real hardware, it seems like
> >>> an obvious regression caused by an untested [on read hardware] patch.
> >>
> >> The data sheet (v1.6, from http://realtek.info/pdf/rtl8139cp.pdf ) says
> >> in §6.33 (C+ Command Register):
> >>   "Enable C+ mode functions in C+CR register first,
> >>   => Enable transmit/receive in Command register (offset 37h),
> >>   => Configure other related registers (ex. Descriptor start address,
> >>      TCR, RCR, ...)."
> >>
> >> I understand the concern expressed in the offending commit message about
> >> DMA happening to invalid addresses, and I'll look at the data sheet
> >> harder to see when the DMA actually starts happening. But it definitely
> >> seems that our current code isn't doing what the data sheet says.
> >>
> >> I wonder if I can find one of these lying around and stick it in a
> >> machine with an IOMMU...
> >
> > You might be able to avoid disaster by doing:
> >
> > 1. Set MAC filter to drop everything
> > 2. Enable RX DMA
> > 3. Set RX DMA ring address
> > 4. Set MAC filter according to current flags & multicast list
> >
> > I'm assuming, knowing nothing about this particular hardware, that the
> > MAC filter register(s) will accept writes before RX DMA is enabled.
> 
> A larger point is that the commit was created to avoid imagined disaster 
> on simulated hardware...
>
>   ...and wound up creating behavior that is (a) contra to the data sheet 
> and (b) breaks real hardware.

I wasn't suggesting anyone should change this again without testing on
real hardware.  But the 'imagined disaster' seems to be an obvious and
real race condition, which the driver is just more likely to win when
racing real hardware than when racing virtual hardware.

(It could be that the hardware pre-fetches DMA descriptors, in which
case this is a 'how did that ever work?' bug.  Alternately, there could
be a hidden enable bit that doesn't get set until the RX DMA ring
address is written, in which case the driver may need a quirk for
emulations that lack that.  An IOMMU should be able to answer these
questions.)

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] 8139cp: set ring address after enabling C+ mode
From: Francois Romieu @ 2012-11-21 20:40 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Jeff Garzik, Jason Wang, David S. Miller, netdev, Hayes Wang,
	gilboad
In-Reply-To: <1353529639.26346.164.camel@shinybook.infradead.org>

David Woodhouse <dwmw2@infradead.org> :
> This fixes (for me) a regression introduced by commit b01af457 ("8139cp:
> set ring address before enabling receiver"). That commit configured the
> descriptor ring addresses earlier in the initialisation sequence, in
> order to avoid the possibility of triggering stray DMA before the
> correct address had been set up.
> 
> Unfortunately, it seems that the hardware will scribble garbage into the
> TxRingAddr registers when we enable "plus mode" Tx in the CpCmd
> register. Observed on a Traverse Geos router board.
> 
> To deal with this, while not reintroducing the problem which led to the
> original commit, we augment cp_start_hw() to write to the CpCmd register
> *first*, then set the descriptor ring addresses, and then finally to
> enable Rx and Tx in the original 8139 Cmd register. The datasheet
> actually indicates that we should enable Tx/Rx in the Cmd register
> *before* configuring the descriptor addresses, but that would appear to
> re-introduce the problem that the offending commit b01af457 was trying
> to solve. And this variant appears to work fine on real hardware.
> 
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> Cc: stable@kernel.org [3.5+]
> 
> ---
> How about this? I'm still somewhat confused about when it actually
> *does* start doing DMA, given what the datasheet says.

Straight to -stable ?

Afaik nobody complained from the original (pre b01af457) problem on
real hardware.

May be someone @realtek (hi Hayes) can give an explanation regarding
the CpCmd, RingAddr, Cmd init sequence and the start of DMA.

-- 
Ueimor

^ permalink raw reply

* Re: 8139cp: set ring address before enabling receiver
From: David Woodhouse @ 2012-11-21 21:00 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Ben Hutchings, Jason Wang, David S. Miller, netdev
In-Reply-To: <50AD3C51.9070105@pobox.com>

[-- Attachment #1: Type: text/plain, Size: 1581 bytes --]

On Wed, 2012-11-21 at 15:40 -0500, Jeff Garzik wrote:
> A larger point is that the commit was created to avoid imagined
> disaster on simulated hardware...

In their defence, I suspect that qemu/kvm is probably now the most
common type of RTL8139 on Linux deployments :)

And since KVM is now capable of supporting an IOMMU, which most *real*
boxes with RTL8139 won't have, it was probably a *real* problem rather
than just an imagined one.

>   ...and wound up creating behavior that is (a) contra to the data
> sheet and (b) breaks real hardware.

And again in their defence... the data sheet does appear to be
suggesting something completely stupid. The patch I just submitted
doesn't do what the data sheet says *either*, although I did at least
test it on real hardware. How many versions of the 8139C+ are there?
Should I be looking for more testing on different revisions?

I had a quick play with the Cfg9346 register. I note that when you set
it to 0x80 it *does* disable both network and bus mastering... and we
set it to the 'Write Enable' value 0xC0 while we're configuring
everything. I wondered if that might perhaps be the thing that made the
original behaviour, and the recommendation in the data sheet, sane.

But it doesn't disable operation when it's in the "Unlock" mode. I tried
setting the driver's value of Cfg9346_Lock to 0xC0 (i.e. leave it
write-enabled at all times), hoping that it would then fail to do any
DMA and prove that the original code was actually safe after all. But
the driver is working fine.

-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply

* Re: 8139cp: set ring address before enabling receiver
From: Jeff Garzik @ 2012-11-21 20:40 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Woodhouse, Jason Wang, David S. Miller, netdev
In-Reply-To: <1353529135.2619.36.camel@bwh-desktop.uk.solarflarecom.com>

On 11/21/2012 03:18 PM, Ben Hutchings wrote:
> On Wed, 2012-11-21 at 19:51 +0000, David Woodhouse wrote:
>> On Wed, 2012-11-21 at 13:12 -0500, Jeff Garzik wrote:
>>>
>>> What sticks out at me from the commit message?
>>>
>>> It was not tested on the famously quirky 8139 hardware at all.
>>>
>>> While I have not looked at the 8139C+ data sheet in a while, sometimes
>>> the hardware _did_ have a strange init order.
>>>
>>> As this works in a simulator but fails on real hardware, it seems like
>>> an obvious regression caused by an untested [on read hardware] patch.
>>
>> The data sheet (v1.6, from http://realtek.info/pdf/rtl8139cp.pdf ) says
>> in §6.33 (C+ Command Register):
>>   "Enable C+ mode functions in C+CR register first,
>>   => Enable transmit/receive in Command register (offset 37h),
>>   => Configure other related registers (ex. Descriptor start address,
>>      TCR, RCR, ...)."
>>
>> I understand the concern expressed in the offending commit message about
>> DMA happening to invalid addresses, and I'll look at the data sheet
>> harder to see when the DMA actually starts happening. But it definitely
>> seems that our current code isn't doing what the data sheet says.
>>
>> I wonder if I can find one of these lying around and stick it in a
>> machine with an IOMMU...
>
> You might be able to avoid disaster by doing:
>
> 1. Set MAC filter to drop everything
> 2. Enable RX DMA
> 3. Set RX DMA ring address
> 4. Set MAC filter according to current flags & multicast list
>
> I'm assuming, knowing nothing about this particular hardware, that the
> MAC filter register(s) will accept writes before RX DMA is enabled.

A larger point is that the commit was created to avoid imagined disaster 
on simulated hardware...

  ...and wound up creating behavior that is (a) contra to the data sheet 
and (b) breaks real hardware.

	Jeff

^ permalink raw reply

* [PATCH 6/6] VSOCK: header and config files.
From: George Zhang @ 2012-11-21 20:40 UTC (permalink / raw)
  To: netdev, linux-kernel, georgezhang, virtualization
  Cc: pv-drivers, gregkh, davem
In-Reply-To: <20121121203715.14395.27632.stgit@promb-2n-dhcp175.eng.vmware.com>

VSOCK header files, Makefiles and Kconfig systems for Linux VSocket module.

Signed-off-by: George Zhang <georgezhang@vmware.com>
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andy King <acking@vmware.com>

---
 include/linux/socket.h              |    4 
 net/Kconfig                         |    1 
 net/Makefile                        |    1 
 net/vmw_vsock/Kconfig               |   14 +
 net/vmw_vsock/Makefile              |    4 
 net/vmw_vsock/notify_qstate.c       |  625 +++++++++++++++++++++++++++++++++++
 net/vmw_vsock/vmci_sockets.h        |  517 +++++++++++++++++++++++++++++
 net/vmw_vsock/vmci_sockets_packet.h |   90 +++++
 net/vmw_vsock/vsock_common.h        |  127 +++++++
 net/vmw_vsock/vsock_packet.h        |  124 +++++++
 net/vmw_vsock/vsock_version.h       |   28 ++
 11 files changed, 1534 insertions(+), 1 deletions(-)
 create mode 100644 net/vmw_vsock/Kconfig
 create mode 100644 net/vmw_vsock/Makefile
 create mode 100644 net/vmw_vsock/notify_qstate.c
 create mode 100644 net/vmw_vsock/vmci_sockets.h
 create mode 100644 net/vmw_vsock/vmci_sockets_packet.h
 create mode 100644 net/vmw_vsock/vsock_common.h
 create mode 100644 net/vmw_vsock/vsock_packet.h
 create mode 100644 net/vmw_vsock/vsock_version.h

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 25d6322..57bc85e 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -195,7 +195,8 @@ struct ucred {
 #define AF_CAIF		37	/* CAIF sockets			*/
 #define AF_ALG		38	/* Algorithm sockets		*/
 #define AF_NFC		39	/* NFC sockets			*/
-#define AF_MAX		40	/* For now.. */
+#define AF_VSOCK	40	/* VMCI sockets			*/
+#define AF_MAX		41	/* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC	AF_UNSPEC
@@ -238,6 +239,7 @@ struct ucred {
 #define PF_CAIF		AF_CAIF
 #define PF_ALG		AF_ALG
 #define PF_NFC		AF_NFC
+#define PF_VSOCK	AF_VSOCK
 #define PF_MAX		AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/net/Kconfig b/net/Kconfig
index 245831b..75b8d5e 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -216,6 +216,7 @@ source "net/dcb/Kconfig"
 source "net/dns_resolver/Kconfig"
 source "net/batman-adv/Kconfig"
 source "net/openvswitch/Kconfig"
+source "net/vmw_vsock/Kconfig"
 
 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index 4f4ee08..cae59f4 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -70,3 +70,4 @@ obj-$(CONFIG_CEPH_LIB)		+= ceph/
 obj-$(CONFIG_BATMAN_ADV)	+= batman-adv/
 obj-$(CONFIG_NFC)		+= nfc/
 obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
+obj-$(CONFIG_VMWARE_VSOCK)	+= vmw_vsock/
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
new file mode 100644
index 0000000..95e2568
--- /dev/null
+++ b/net/vmw_vsock/Kconfig
@@ -0,0 +1,14 @@
+#
+# Vsock protocol
+#
+
+config VMWARE_VSOCK
+	tristate "Virtual Socket protocol"
+	depends on VMWARE_VMCI
+	help
+	  Virtual Socket Protocol is a socket protocol similar to TCP/IP
+	  allowing comunication between Virtual Machines and VMware
+	  hypervisor.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called vsock. If unsure, say N.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
new file mode 100644
index 0000000..4e940fe
--- /dev/null
+++ b/net/vmw_vsock/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_VMWARE_VSOCK) += vmw_vsock.o
+
+vmw_vsock-y += af_vsock.o notify.o notify_qstate.o stats.o util.o \
+	vsock_addr.o
diff --git a/net/vmw_vsock/notify_qstate.c b/net/vmw_vsock/notify_qstate.c
new file mode 100644
index 0000000..5a2f066
--- /dev/null
+++ b/net/vmw_vsock/notify_qstate.c
@@ -0,0 +1,625 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2009-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * notifyQState.c --
+ *
+ * Linux control notifications based on Queuepair state for the VMCI Stream
+ * Sockets protocol.
+ */
+
+#include <linux/types.h>
+
+#include <linux/socket.h>
+
+#include <linux/stddef.h>	/* for NULL */
+#include <net/sock.h>
+
+#include "notify.h"
+#include "af_vsock.h"
+
+#define PKT_FIELD(vsk, field_name) ((vsk)->notify.pkt_q_state.field_name)
+
+/*
+ *
+ * vsock_vmci_notify_waiting_write --
+ *
+ * Determines if the conditions have been met to notify a waiting writer.
+ *
+ * Results: true if a notification should be sent, false otherwise.
+ *
+ * Side effects: None.
+ */
+
+static bool vsock_vmci_notify_waiting_write(struct vsock_vmci_sock *vsk)
+{
+	bool retval;
+	u64 notify_limit;
+
+	if (!PKT_FIELD(vsk, peer_waiting_write))
+		return false;
+
+	/*
+	 * When the sender blocks, we take that as a sign that the sender is
+	 * faster than the receiver. To reduce the transmit rate of the sender,
+	 * we delay the sending of the read notification by decreasing the
+	 * write_notify_window. The notification is delayed until the number of
+	 * bytes used in the queue drops below the write_notify_window.
+	 */
+
+	if (!PKT_FIELD(vsk, peer_waiting_write_detected)) {
+		PKT_FIELD(vsk, peer_waiting_write_detected) = true;
+		if (PKT_FIELD(vsk, write_notify_window) < PAGE_SIZE) {
+			PKT_FIELD(vsk, write_notify_window) =
+			    PKT_FIELD(vsk, write_notify_min_window);
+		} else {
+			PKT_FIELD(vsk, write_notify_window) -= PAGE_SIZE;
+			if (PKT_FIELD(vsk, write_notify_window) <
+			    PKT_FIELD(vsk, write_notify_min_window))
+				PKT_FIELD(vsk, write_notify_window) =
+				    PKT_FIELD(vsk, write_notify_min_window);
+
+		}
+	}
+	notify_limit = vsk->consume_size - PKT_FIELD(vsk, write_notify_window);
+
+	/*
+	 * The notify_limit is used to delay notifications in the case where
+	 * flow control is enabled. Below the test is expressed in terms of
+	 * free space in the queue: if free_space > ConsumeSize -
+	 * write_notify_window then notify An alternate way of expressing this
+	 * is to rewrite the expression to use the data ready in the receive
+	 * queue: if write_notify_window > bufferReady then notify as
+	 * free_space == ConsumeSize - bufferReady.
+	 */
+
+	retval = vmci_qpair_consume_free_space(vsk->qpair) > notify_limit;
+
+	if (retval) {
+		/*
+		 * Once we notify the peer, we reset the detected flag so the
+		 * next wait will again cause a decrease in the window size.
+		 */
+
+		PKT_FIELD(vsk, peer_waiting_write_detected) = false;
+	}
+	return retval;
+}
+
+/*
+ *
+ * vsock_vmci_handle_read --
+ *
+ * Handles an incoming read message.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void
+vsock_vmci_handle_read(struct sock *sk,
+		       struct vsock_packet *pkt,
+		       bool bottom_half,
+		       struct sockaddr_vm *dst, struct sockaddr_vm *src)
+{
+
+	sk->sk_write_space(sk);
+}
+
+/*
+ *
+ * vsock_vmci_handle_wrote --
+ *
+ * Handles an incoming wrote message.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void
+vsock_vmci_handle_wrote(struct sock *sk,
+			struct vsock_packet *pkt,
+			bool bottom_half,
+			struct sockaddr_vm *dst, struct sockaddr_vm *src)
+{
+	sk->sk_data_ready(sk, 0);
+}
+
+/*
+ *
+ * vsock_vmci_block_update_write_window --
+ *
+ * Updates the write window when we are blocking for data.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void vsock_vmci_block_update_write_window(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk;
+
+	vsk = vsock_sk(sk);
+
+	if (PKT_FIELD(vsk, write_notify_window) < vsk->consume_size)
+		PKT_FIELD(vsk, write_notify_window) =
+		    min(PKT_FIELD(vsk, write_notify_window) + PAGE_SIZE,
+			vsk->consume_size);
+
+}
+
+/*
+ *
+ * vsock_vmci_send_read_notification --
+ *
+ * Sends a read notification to this socket's peer.
+ *
+ * Results: >= 0 if the datagram is sent successfully, negative error value
+ * otherwise.
+ *
+ * Side effects: None.
+ */
+
+static int vsock_vmci_send_read_notification(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk;
+	bool sent_read;
+	unsigned int retries;
+	int err;
+
+	vsk = vsock_sk(sk);
+	sent_read = false;
+	retries = 0;
+	err = 0;
+
+	if (vsock_vmci_notify_waiting_write(vsk)) {
+		/*
+		 * Notify the peer that we have read, retrying the send on
+		 * failure up to our maximum value.  XXX For now we just log
+		 * the failure, but later we should schedule a work item to
+		 * handle the resend until it succeeds.  That would require
+		 * keeping track of work items in the vsk and cleaning them up
+		 * upon socket close.
+		 */
+		while (!(vsk->peer_shutdown & RCV_SHUTDOWN) &&
+		       !sent_read && retries < VSOCK_MAX_DGRAM_RESENDS) {
+			err = VSOCK_SEND_READ(sk);
+			if (err >= 0)
+				sent_read = true;
+
+			retries++;
+		}
+
+		if (retries >= VSOCK_MAX_DGRAM_RESENDS && !sent_read)
+			printk
+			    ("%p unable to send read notification to peer.\n",
+			     sk);
+		else
+			PKT_FIELD(vsk, peer_waiting_write) = false;
+
+	}
+	return err;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_socket_init --
+ *
+ * Function that is called after a socket is created and before any notify ops
+ * are used.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void vsock_vmci_notify_pkt_socket_init(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk;
+	vsk = vsock_sk(sk);
+
+	PKT_FIELD(vsk, write_notify_window) = PAGE_SIZE;
+	PKT_FIELD(vsk, write_notify_min_window) = PAGE_SIZE;
+	PKT_FIELD(vsk, peer_waiting_write) = false;
+	PKT_FIELD(vsk, peer_waiting_write_detected) = false;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_socket_destruct --
+ *
+ * Function that is called when the socket is being released.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void vsock_vmci_notify_pkt_socket_destruct(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk;
+	vsk = vsock_sk(sk);
+
+	PKT_FIELD(vsk, write_notify_window) = PAGE_SIZE;
+	PKT_FIELD(vsk, write_notify_min_window) = PAGE_SIZE;
+	PKT_FIELD(vsk, peer_waiting_write) = false;
+	PKT_FIELD(vsk, peer_waiting_write_detected) = false;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_poll_in --
+ *
+ * Called by the poll function to figure out if there is data to read and to
+ * setup future notifications if needed. Only called on sockets that aren't
+ * shutdown for recv.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_poll_in(struct sock *sk,
+			      size_t target, bool *data_ready_now)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	if (vsock_vmci_stream_has_data(vsk)) {
+		*data_ready_now = true;
+	} else {
+		/*
+		 * We can't read right now because there is nothing in the
+		 * queue. Ask for notifications when there is something to
+		 * read.
+		 */
+		if (sk->sk_state == SS_CONNECTED)
+			vsock_vmci_block_update_write_window(sk);
+
+		*data_ready_now = false;
+	}
+
+	return 0;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_poll_out
+ *
+ * Called by the poll function to figure out if there is space to write and to
+ * setup future notifications if needed. Only called on a connected socket that
+ * isn't shutdown for send.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_poll_out(struct sock *sk,
+			       size_t target, bool *space_avail_now)
+{
+	s64 produce_q_free_space;
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	produce_q_free_space = vsock_vmci_stream_has_space(vsk);
+	if (produce_q_free_space > 0) {
+		*space_avail_now = true;
+		return 0;
+	} else if (produce_q_free_space == 0) {
+		/*
+		 * This is a connected socket but we can't currently send data.
+		 * Nothing else to do.
+		 */
+		*space_avail_now = false;
+	}
+
+	return 0;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_recv_init --
+ *
+ * Called at the start of a stream recv call with the socket lock held.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_recv_init(struct sock *sk,
+				size_t target,
+				struct vsock_vmci_recv_notify_data *data)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	data->consume_head = 0;
+	data->produce_tail = 0;
+	data->notify_on_block = false;
+
+	if (PKT_FIELD(vsk, write_notify_min_window) < target + 1) {
+		PKT_FIELD(vsk, write_notify_min_window) = target + 1;
+		if (PKT_FIELD(vsk, write_notify_window) <
+		    PKT_FIELD(vsk, write_notify_min_window)) {
+			/*
+			 * If the current window is smaller than the new
+			 * minimal window size, we need to reevaluate whether
+			 * we need to notify the sender. If the number of ready
+			 * bytes are smaller than the new window, we need to
+			 * send a notification to the sender before we block.
+			 */
+
+			PKT_FIELD(vsk, write_notify_window) =
+			    PKT_FIELD(vsk, write_notify_min_window);
+			data->notify_on_block = true;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_recv_pre_block --
+ *
+ * Called right before a socket is about to block with the socket lock held.
+ * The socket lock may have been released between the entry function and the
+ * preblock call.
+ *
+ * Note: This function may be called multiple times before the post block
+ * function is called.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_recv_pre_block(struct sock *sk,
+				     size_t target,
+				     struct vsock_vmci_recv_notify_data *data)
+{
+	int err = 0;
+
+	vsock_vmci_block_update_write_window(sk);
+
+	if (data->notify_on_block) {
+		err = vsock_vmci_send_read_notification(sk);
+		if (err < 0)
+			return err;
+
+		data->notify_on_block = false;
+	}
+
+	return err;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_recv_post_dequeue --
+ *
+ * Called right after we dequeue / peek data from a socket.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_recv_post_dequeue(struct sock *sk,
+				size_t target,
+				ssize_t copied,
+				bool data_read,
+				struct vsock_vmci_recv_notify_data *data)
+{
+	struct vsock_vmci_sock *vsk;
+	int err;
+	bool was_full = false;
+	u64 free_space;
+
+	vsk = vsock_sk(sk);
+	err = 0;
+
+	if (data_read) {
+		smp_mb();
+
+		free_space = vmci_qpair_consume_free_space(vsk->qpair);
+		was_full = free_space == copied;
+
+		if (was_full)
+			PKT_FIELD(vsk, peer_waiting_write) = true;
+
+		err = vsock_vmci_send_read_notification(sk);
+		if (err < 0)
+			return err;
+
+		/* See the comment in vsock_vmci_notify_pkt_send_post_enqueue */
+		sk->sk_data_ready(sk, 0);
+	}
+
+	return err;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_send_init --
+ *
+ * Called at the start of a stream send call with the socket lock held.
+ *
+ * Results: 0 on success. A negative error code on failure.
+ *
+ * Side effects:
+ */
+
+static int
+vsock_vmci_notify_pkt_send_init(struct sock *sk,
+				struct vsock_vmci_send_notify_data *data)
+{
+	data->consume_head = 0;
+	data->produce_tail = 0;
+
+	return 0;
+}
+
+/*
+ *
+ * vsock_vmci_notifySendPostEnqueue --
+ *
+ * Called right after we enqueue data to a socket.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_send_post_enqueue(struct sock *sk,
+				ssize_t written,
+				struct vsock_vmci_send_notify_data *data)
+{
+	int err = 0;
+	struct vsock_vmci_sock *vsk;
+	bool sent_wrote = false;
+	bool was_empty;
+	int retries = 0;
+
+	vsk = vsock_sk(sk);
+
+	smp_mb();
+
+	was_empty = (vmci_qpair_produce_buf_ready(vsk->qpair) == written);
+	if (was_empty) {
+		while (!(vsk->peer_shutdown & RCV_SHUTDOWN) &&
+		       !sent_wrote && retries < VSOCK_MAX_DGRAM_RESENDS) {
+			err = VSOCK_SEND_WROTE(sk);
+			if (err >= 0)
+				sent_wrote = true;
+
+			retries++;
+		}
+	}
+
+	if (retries >= VSOCK_MAX_DGRAM_RESENDS && !sent_wrote) {
+		printk
+		    ("%p unable to send wrote notification to peer.\n",
+		     sk);
+		return err;
+	}
+
+	return err;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_handle_pkt
+ *
+ * Called when a notify packet is recieved for a socket in the connected state.
+ * Note this might be called from a bottom half.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void
+vsock_vmci_notify_pkt_handle_pkt(struct sock *sk,
+				 struct vsock_packet *pkt,
+				 bool bottom_half,
+				 struct sockaddr_vm *dst,
+				 struct sockaddr_vm *src, bool *pkt_processed)
+{
+	bool processed = false;
+
+	switch (pkt->type) {
+	case VSOCK_PACKET_TYPE_WROTE:
+		vsock_vmci_handle_wrote(sk, pkt, bottom_half, dst, src);
+		processed = true;
+		break;
+	case VSOCK_PACKET_TYPE_READ:
+		vsock_vmci_handle_read(sk, pkt, bottom_half, dst, src);
+		processed = true;
+		break;
+	}
+
+	if (pkt_processed)
+		*pkt_processed = processed;
+
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_process_request
+ *
+ * Called near the end of process request.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void vsock_vmci_notify_pkt_process_request(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	PKT_FIELD(vsk, write_notify_window) = vsk->consume_size;
+	if (vsk->consume_size < PKT_FIELD(vsk, write_notify_min_window))
+		PKT_FIELD(vsk, write_notify_min_window) = vsk->consume_size;
+
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_process_negotiate
+ *
+ * Called near the end of process negotiate.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void vsock_vmci_notify_pkt_process_negotiate(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	PKT_FIELD(vsk, write_notify_window) = vsk->consume_size;
+	if (vsk->consume_size < PKT_FIELD(vsk, write_notify_min_window))
+		PKT_FIELD(vsk, write_notify_min_window) = vsk->consume_size;
+
+}
+
+/* Socket always on control packet based operations. */
+struct vsock_vmci_notify_ops vsock_vmci_notify_pkt_q_state_ops = {
+	vsock_vmci_notify_pkt_socket_init,
+	vsock_vmci_notify_pkt_socket_destruct,
+	vsock_vmci_notify_pkt_poll_in,
+	vsock_vmci_notify_pkt_poll_out,
+	vsock_vmci_notify_pkt_handle_pkt,
+	vsock_vmci_notify_pkt_recv_init,
+	vsock_vmci_notify_pkt_recv_pre_block,
+	NULL,			/* recv_pre_dequeue */
+	vsock_vmci_notify_pkt_recv_post_dequeue,
+	vsock_vmci_notify_pkt_send_init,
+	NULL,			/* send_pre_block */
+	NULL,			/* send_pre_enqueue */
+	vsock_vmci_notify_pkt_send_post_enqueue,
+	vsock_vmci_notify_pkt_process_request,
+	vsock_vmci_notify_pkt_process_negotiate,
+};
diff --git a/net/vmw_vsock/vmci_sockets.h b/net/vmw_vsock/vmci_sockets.h
new file mode 100644
index 0000000..6e6fd98
--- /dev/null
+++ b/net/vmw_vsock/vmci_sockets.h
@@ -0,0 +1,517 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2007-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * vmci_sockets.h --
+ *
+ * VMCI sockets public constants and types.
+ */
+
+#ifndef _VMCI_SOCKETS_H_
+#define _VMCI_SOCKETS_H_
+
+#if !defined(__KERNEL__)
+#include <sys/socket.h>
+#endif
+
+/*
+ * \brief Option name for STREAM socket buffer size.
+ *
+ * Use as the option name in \c setsockopt(3) or \c getsockopt(3) to set or get
+ * an \c unsigned \c long \c long that specifies the size of the buffer
+ * underlying a vSockets STREAM socket.
+ *
+ * \note Value is clamped to the MIN and MAX.
+ *
+ * \see vmci_sock_get_af_value_fd() \see SO_VMCI_BUFFER_MIN_SIZE \see
+ * SO_VMCI_BUFFER_MAX_SIZE
+ *
+ * An example is given below.
+ *
+ * \code int vmciFd; int af = vmci_sock_get_af_value_fd(&vmciFd); unsigned long
+ * long val = 0x1000; int fd = socket(af, SOCK_STREAM, 0); setsockopt(fd, af,
+ * SO_VMCI_BUFFER_SIZE, &val, sizeof val); ... close(fd);
+ * vmci_sock_release_af_value_fd(vmciFd); \endcode
+ */
+
+#define SO_VMCI_BUFFER_SIZE                 0
+
+/*
+ * \brief Option name for STREAM socket minimum buffer size.
+ *
+ * Use as the option name in \c setsockopt(3) or \c getsockopt(3) to set or get
+ * an \c unsigned \c long \c long that specifies the minimum size allowed for
+ * the buffer underlying a vSockets STREAM socket.
+ *
+ * \see vmci_sock_get_af_value_fd() \see SO_VMCI_BUFFER_SIZE \see
+ * SO_VMCI_BUFFER_MAX_SIZE
+ *
+ * An example is given below.
+ *
+ * \code int vmciFd; int af = vmci_sock_get_af_value_fd(&vmciFd); unsigned long
+ * long val = 0x500; int fd = socket(af, SOCK_STREAM, 0); setsockopt(fd, af,
+ * SO_VMCI_BUFFER_MIN_SIZE, &val, sizeof val); ... close(fd);
+ * vmci_sock_release_af_value_fd(vmciFd); \endcode
+ */
+
+#define SO_VMCI_BUFFER_MIN_SIZE             1
+
+/*
+ * \brief Option name for STREAM socket maximum buffer size.
+ *
+ * Use as the option name in \c setsockopt(3) or \c getsockopt(3) to set or get
+ * an unsigned long long that specifies the maximum size allowed for the buffer
+ * underlying a vSockets STREAM socket.
+ *
+ * \see vmci_sock_get_af_value_fd() \see SO_VMCI_BUFFER_SIZE \see
+ * SO_VMCI_BUFFER_MIN_SIZE
+ *
+ * An example is given below.
+ *
+ * \code int vmciFd; int af = vmci_sock_get_af_value_fd(&vmciFd); unsigned long
+ * long val = 0x4000; int fd = socket(af, SOCK_STREAM, 0); setsockopt(fd, af,
+ * SO_VMCI_BUFFER_MAX_SIZE, &val, sizeof val); ... close(fd);
+ * vmci_sock_release_af_value_fd(vmciFd); \endcode
+ */
+
+#define SO_VMCI_BUFFER_MAX_SIZE             2
+
+/*
+ * \brief Option name for socket peer's host-specific VM ID.
+ *
+ * Use as the option name in \c getsockopt(3) to get a host-specific identifier
+ * for the peer endpoint's VM.  The identifier is a signed integer.
+ *
+ * \note Only available for ESX (VMKernel/userworld) endpoints.
+ *
+ * An example is given below.
+ *
+ * \code int vmciFd; int af = vmci_sock_get_af_value_fd(&vmciFd); int id;
+ * socklen_t len = sizeof id; int fd = socket(af, SOCK_DGRAM, 0); getsockopt(fd,
+ * af, SO_VMCI_PEER_HOST_VM_ID, &id, &len); ... close(fd);
+ * vmci_sock_release_af_value_fd(vmciFd); \endcode
+ */
+
+#define SO_VMCI_PEER_HOST_VM_ID             3
+
+/*
+ * \brief Option name for socket's service label.
+ *
+ * Use as the option name in \c setsockopt(3) or \c getsockopt(3) to set or get
+ * the service label for a socket.  The service label is a C-style
+ * NUL-terminated string.
+ *
+ * \note Only available for ESX (VMkernel/userworld) endpoints.
+ */
+
+#define SO_VMCI_SERVICE_LABEL               4
+
+/*
+ * \brief Option name for determining if a socket is trusted.
+ *
+ * Use as the option name in \c getsockopt(3) to determine if a socket is
+ * trusted.  The value is a signed integer.
+ *
+ * An example is given below.
+ *
+ * \code int vmciFd; int af = vmci_sock_get_af_value_fd(&vmciFd); int trusted;
+ * socklen_t len = sizeof trusted; int fd = socket(af, SOCK_DGRAM, 0);
+ * getsockopt(fd, af, SO_VMCI_TRUSTED, &trusted, &len); ... close(fd);
+ * vmci_sock_release_af_value_fd(vmciFd); \endcode
+ */
+
+#define SO_VMCI_TRUSTED                     5
+
+/*
+ * \brief Option name for STREAM socket connection timeout.
+ *
+ * Use as the option name in \c setsockopt(3) or \c getsockopt(3) to set or get
+ * the connection timeout for a STREAM socket.  The value is platform
+ * dependent.  On ESX, Linux and Mac OS, it is a \c struct \c timeval. On
+ * Windows, it is a \c DWORD.
+ *
+ * An example is given below.
+ *
+ * \code int vmciFd; int af = vmci_sock_get_af_value_fd(&vmciFd); struct
+ * timeval t = { 5, 100000 }; int fd = socket(af, SOCK_STREAM, 0);
+ * setsockopt(fd, af, SO_VMCI_CONNECT_TIMEOUT, &t, sizeof t); ... close(fd);
+ * vmci_sock_release_af_value_fd(vmciFd); \endcode
+ */
+
+#define SO_VMCI_CONNECT_TIMEOUT             6
+
+/*
+ * \brief Option name for using non-blocking send/receive.
+ *
+ * Use as the option name for \c setsockopt(3) or \c getsockopt(3) to set or
+ * get the non-blocking transmit/receive flag for a STREAM socket.  This flag
+ * determines whether \c send() and \c recv() can be called in non-blocking
+ * contexts for the given socket.  The value is a signed integer.
+ *
+ * This option is only relevant to kernel endpoints, where descheduling the
+ * thread of execution is not allowed, for example, while holding a spinlock.
+ * It is not to be confused with conventional non-blocking socket operations.
+ *
+ * \note Only available for VMKernel endpoints.
+ *
+ * An example is given below.
+ *
+ * \code int vmciFd; int af = vmci_sock_get_af_value_fd(&vmciFd); int nonblock;
+ * socklen_t len = sizeof nonblock; int fd = socket(af, SOCK_STREAM, 0);
+ * getsockopt(fd, af, SO_VMCI_NONBLOCK_TXRX, &nonblock, &len); ... close(fd);
+ * vmci_sock_release_af_value_fd(vmciFd); \endcode
+ */
+
+#define SO_VMCI_NONBLOCK_TXRX               7
+
+/*
+ * \brief The vSocket equivalent of INADDR_ANY.
+ *
+ * This works for the \c svm_cid field of sockaddr_vm and indicates the context
+ * ID of the current endpoint.
+ *
+ * \see sockaddr_vm
+ *
+ * An example is given below.
+ *
+ * \code int vmciFd; int af = vmci_sock_get_af_value_fd(&vmciFd); struct
+ * sockaddr_vm addr; int fd = socket(af, SOCK_DGRAM, 0); addr.svm_family = af;
+ * addr.svm_cid = VMADDR_CID_ANY; addr.svm_port = 2000; bind(fd, &addr, sizeof
+ * addr); ... close(fd); vmci_sock_release_af_value_fd(vmciFd); \endcode
+ */
+
+#define VMADDR_CID_ANY  ((unsigned int)-1)
+
+/*
+ * \brief Bind to any available port.
+ *
+ * Works for the \c svm_port field of sockaddr_vm.
+ *
+ * \see sockaddr_vm
+ *
+ * An example is given below.
+ *
+ * \code int vmciFd; int af = vmci_sock_get_af_value_fd(&vmciFd); struct
+ * sockaddr_vm addr; int fd = socket(af, SOCK_DGRAM, 0); addr.svm_family = af;
+ * addr.svm_cid = VMADDR_CID_ANY; addr.svm_port = VMADDR_PORT_ANY; bind(fd,
+ * &addr, sizeof addr); ... close(fd); vmci_sock_release_af_value_fd(vmciFd);
+ * \endcode
+ */
+
+#define VMADDR_PORT_ANY ((unsigned int)-1)
+
+/*
+ * \brief Invalid vSockets version.
+ *
+ * \see VMCISock_Version()
+ */
+
+#define VMCI_SOCKETS_INVALID_VERSION ((unsigned int)-1)
+
+/*
+ * \brief The epoch (first) component of the vSockets version.
+ *
+ * A single byte representing the epoch component of the vSockets version.
+ *
+ * \see VMCISock_Version()
+ *
+ * An example is given below.
+ *
+ * \code unsigned int ver = VMCISock_Version(); unsigned char epoch =
+ * VMCI_SOCKETS_VERSION_EPOCH(ver); \endcode
+ */
+
+#define VMCI_SOCKETS_VERSION_EPOCH(_v) (((_v) & 0xFF000000) >> 24)
+
+/*
+ * \brief The major (second) component of the vSockets version.
+ *
+ * A single byte representing the major component of the vSockets version.
+ * Typically changes for every major release of a product.
+ *
+ * \see VMCISock_Version()
+ *
+ * An example is given below.
+ *
+ * \code unsigned int ver = VMCISock_Version(); unsigned char major =
+ * VMCI_SOCKETS_VERSION_MAJOR(ver); \endcode
+ */
+
+#define VMCI_SOCKETS_VERSION_MAJOR(_v) (((_v) & 0x00FF0000) >> 16)
+
+/*
+ * \brief The minor (third) component of the vSockets version.
+ *
+ * Two bytes representing the minor component of the vSockets version.
+ *
+ * \see VMCISock_Version()
+ *
+ * An example is given below.
+ *
+ * \code unsigned int ver = VMCISock_Version(); unsigned short minor =
+ * VMCI_SOCKETS_VERSION_MINOR(ver); \endcode
+ */
+
+#define VMCI_SOCKETS_VERSION_MINOR(_v) (((_v) & 0x0000FFFF))
+
+/** \cond PRIVATE */
+
+/** \endcond */
+
+/*
+ * \brief Address structure for vSockets.
+ *
+ * The address family should be set to whatever vmci_sock_get_af_value_fd()
+ * returns.  The structure members should all align on their natural boundaries
+ * without resorting to compiler packing directives.  The total size of this
+ * structure should be exactly the same as that of \c struct \c sockaddr.
+ *
+ * \see vmci_sock_get_af_value_fd()
+ */
+
+struct sockaddr_vm {
+
+   /** \brief Address family. \see vmci_sock_get_af_value_fd() */
+	sa_family_t svm_family;
+
+   /** \cond PRIVATE */
+	unsigned short svm_reserved1;
+   /** \endcond */
+
+   /** \brief Port.  \see VMADDR_PORT_ANY */
+	unsigned int svm_port;
+
+   /** \brief Context ID.  \see VMADDR_CID_ANY */
+	unsigned int svm_cid;
+
+   /** \cond PRIVATE */
+	unsigned char svm_zero[sizeof(struct sockaddr) -
+			       sizeof(sa_family_t) -
+			       sizeof(unsigned short) -
+			       sizeof(unsigned int) - sizeof(unsigned int)];
+   /** \endcond */
+};
+
+#if defined(linux) && defined(__KERNEL__)
+int vmci_sock_get_local_c_id(void);
+#else
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <unistd.h>
+
+#include <stdio.h>
+
+/** \cond PRIVATE */
+#define VMCI_SOCKETS_DEFAULT_DEVICE      "/dev/vsock"
+#define VMCI_SOCKETS_CLASSIC_ESX_DEVICE  "/vmfs/devices/char/vsock/vsock"
+#define VMCI_SOCKETS_VERSION       1972
+#define VMCI_SOCKETS_GET_AF_VALUE  1976
+#define VMCI_SOCKETS_GET_LOCAL_CID 1977
+/** \endcond */
+
+   /*
+    * VMCISock_Version
+    *
+    * \brief Retrieve the vSockets version.
+    *
+    * Returns the current version of vSockets.  The version is a 32-bit
+    * unsigned integer that consist of three components: the epoch, the major
+    * version, and the minor version.  Use the \c VMCI_SOCKETS_VERSION macros
+    * to extract the components.
+    *
+    * \see VMCI_SOCKETS_VERSION_EPOCH() \see VMCI_SOCKETS_VERSION_MAJOR() \see
+    * VMCI_SOCKETS_VERSION_MINOR()
+    *
+    * \retval VMCI_SOCKETS_INVALID_VERSION Not available. \retval other The
+    * current version.
+    *
+    * An example is given below.
+    *
+    * \code unsigned int ver = VMCISock_Version(); if (ver !=
+    * VMCI_SOCKETS_INVALID_VERSION) { printf("vSockets version=%d.%d.%d\n",
+    * VMCI_SOCKETS_VERSION_EPOCH(ver), VMCI_SOCKETS_VERSION_MAJOR(ver),
+    * VMCI_SOCKETS_VERSION_MINOR(ver)); } \endcode
+    */
+
+static inline unsigned int VMCISock_Version(void)
+{
+	int fd;
+	unsigned int version;
+
+	fd = open(VMCI_SOCKETS_DEFAULT_DEVICE, O_RDWR);
+	if (fd < 0) {
+		fd = open(VMCI_SOCKETS_CLASSIC_ESX_DEVICE, O_RDWR);
+		if (fd < 0)
+			return VMCI_SOCKETS_INVALID_VERSION;
+
+	}
+
+	if (ioctl(fd, VMCI_SOCKETS_VERSION, &version) < 0)
+		version = VMCI_SOCKETS_INVALID_VERSION;
+
+	close(fd);
+	return version;
+}
+
+   /*
+    * vmci_sock_get_af_value_fd
+    *
+    * \brief Retrieve the address family value for vSockets.
+    *
+    * Returns the value to be used for the VMCI Sockets address family. This
+    * value should be used as the domain argument to \c socket(2) (when you
+    * might otherwise use \c AF_INET).  For VMCI Socket-specific options, this
+    * value should also be used for the level argument to \c setsockopt(2)
+    * (when you might otherwise use \c SOL_TCP).
+    *
+    * \see vmci_sock_release_af_value_fd() \see sockaddr_vm
+    *
+    * \param[out] out_fd File descriptor to the VMCI device.  The address
+    * family value is valid until this descriptor is closed.  This parameter is
+    * only valid if the return value is not -1. Call
+    * vmci_sock_release_af_value_fd() to close this descriptor.
+    *
+    * \retval -1 Not available. \retval other The address family value.
+    *
+    * An example is given below.
+    *
+    * \code int vmciFd; int af = vmci_sock_get_af_value_fd(&vmciFd); if (af !=
+    * -1) { int fd = socket(af, SOCK_STREAM, 0); ... close(fd); close(vmciFd); }
+    * \endcode
+    */
+
+static inline int vmci_sock_get_af_value_fd(int *out_fd)
+{
+	int fd;
+	int family;
+
+	fd = open(VMCI_SOCKETS_DEFAULT_DEVICE, O_RDWR);
+	if (fd < 0) {
+		fd = open(VMCI_SOCKETS_CLASSIC_ESX_DEVICE, O_RDWR);
+		if (fd < 0)
+			return -1;
+
+	}
+
+	if (ioctl(fd, VMCI_SOCKETS_GET_AF_VALUE, &family) < 0)
+		family = -1;
+
+	if (family < 0)
+		close(fd);
+	else if (out_fd)
+		*out_fd = fd;
+
+	return family;
+}
+
+   /** \cond PRIVATE */
+   /*
+    * vmci_sock_get_af_value
+    *
+    * \brief Retrieve the address family value for vSockets.
+    *
+    * Returns the value to be used for the VMCI Sockets address family. This
+    * value should be used as the domain argument to \c socket(2) (when you
+    * might otherwise use \c AF_INET).  For VMCI Socket-specific options, this
+    * value should also be used for the level argument to \c setsockopt(2)
+    * (when you might otherwise use \c SOL_TCP).
+    *
+    * \note This function leaves its descriptor to the vsock device open so
+    * that the socket implementation knows that the socket family is still in
+    * use.  This is done because the address family is registered with the
+    * kernel on-demand and a notification is needed to unregister the address
+    * family.  Use of this function is thus discouraged; please use
+    * vmci_sock_get_af_value_fd() instead.
+    *
+    * \see vmci_sock_get_af_value_fd() \see sockaddr_vm
+    *
+    * \retval -1 Not available. \retval other The address family value.
+    *
+    * An example is given below.
+    *
+    * \code int af = vmci_sock_get_af_value(); if (af != -1) { int fd =
+    * socket(af, SOCK_STREAM, 0); ... close(fd); } \endcode
+    */
+
+static inline int vmci_sock_get_af_value(void)
+{
+	return vmci_sock_get_af_value_fd(NULL);
+}
+
+   /** \endcond PRIVATE */
+
+   /*
+    * vmci_sock_release_af_value_fd
+    *
+    * \brief Release the file descriptor obtained when retrieving the address
+    * family value.
+    *
+    * Use this to release the file descriptor obtained by calling
+    * vmci_sock_get_af_value_fd().
+    *
+    * \see vmci_sock_get_af_value_fd()
+    *
+    * \param[in] fd File descriptor to the VMCI device.
+    */
+
+static inline void vmci_sock_release_af_value_fd(int fd)
+{
+	if (fd >= 0)
+		close(fd);
+
+}
+
+   /*
+    * vmci_sock_get_local_c_id
+    *
+    * \brief Retrieve the current context ID.
+    *
+    * \see VMADDR_CID_ANY
+    *
+    * \retval VMADDR_CID_ANY Not available. \retval other The current context
+    * ID.
+    *
+    * An example is given below.
+    *
+    * \code int vmciFd; int af = vmci_sock_get_af_value_fd(&vmciFd); struct
+    * sockaddr_vm addr; addr.svm_family = af; addr.svm_cid =
+    * vmci_sock_get_local_c_id(); vmci_sock_release_af_value_fd(vmciFd);
+    * \endcode
+    */
+
+static inline unsigned int vmci_sock_get_local_c_id(void)
+{
+	int fd;
+	unsigned int context_id;
+
+	fd = open(VMCI_SOCKETS_DEFAULT_DEVICE, O_RDWR);
+	if (fd < 0) {
+		fd = open(VMCI_SOCKETS_CLASSIC_ESX_DEVICE, O_RDWR);
+		if (fd < 0)
+			return VMADDR_CID_ANY;
+
+	}
+
+	if (ioctl(fd, VMCI_SOCKETS_GET_LOCAL_CID, &context_id) < 0)
+		context_id = VMADDR_CID_ANY;
+
+	close(fd);
+	return context_id;
+}
+#endif
+
+#endif
diff --git a/net/vmw_vsock/vmci_sockets_packet.h b/net/vmw_vsock/vmci_sockets_packet.h
new file mode 100644
index 0000000..9e31e30
--- /dev/null
+++ b/net/vmw_vsock/vmci_sockets_packet.h
@@ -0,0 +1,90 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * vmci_sockets_packet.h --
+ *
+ * Definition of VMCI Sockets packet format, constants, and types.
+ */
+
+#ifndef _VMCI_SOCKETS_PACKET_H_
+#define _VMCI_SOCKETS_PACKET_H_
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/vmw_vmci_api.h>
+
+/*
+ * STREAM control packets.
+ */
+
+/* If the packet format changes in a release then this should change too. */
+#define VSOCK_PACKET_VERSION 1
+
+/* The resource ID on which control packets are sent. */
+#define VSOCK_PACKET_RID 1
+
+enum vsock_packet_type {
+	VSOCK_PACKET_TYPE_INVALID = 0,
+	VSOCK_PACKET_TYPE_REQUEST,
+	VSOCK_PACKET_TYPE_NEGOTIATE,
+	VSOCK_PACKET_TYPE_OFFER,
+	VSOCK_PACKET_TYPE_ATTACH,
+	VSOCK_PACKET_TYPE_WROTE,
+	VSOCK_PACKET_TYPE_READ,
+	VSOCK_PACKET_TYPE_RST,
+	VSOCK_PACKET_TYPE_SHUTDOWN,
+	VSOCK_PACKET_TYPE_WAITING_WRITE,
+	VSOCK_PACKET_TYPE_WAITING_READ,
+	VSOCK_PACKET_TYPE_REQUEST2,
+	VSOCK_PACKET_TYPE_NEGOTIATE2,
+	VSOCK_PACKET_TYPE_MAX
+};
+
+typedef u16 vsock_proto_version;
+#define VSOCK_PROTO_INVALID        0
+#define VSOCK_PROTO_PKT_ON_NOTIFY (1 << 0)
+
+#define VSOCK_PROTO_ALL_SUPPORTED (VSOCK_PROTO_PKT_ON_NOTIFY)
+
+struct vsock_waiting_info {
+	u64 generation;
+	u64 offset;
+};
+
+/*
+ * Control packet type for STREAM sockets.  DGRAMs have no control packets nor
+ * special packet header for data packets, they are just raw VMCI DGRAM
+ * messages.  For STREAMs, control packets are sent over the control channel
+ * while data is written and read directly from queue pairs with no packet
+ * format.
+ */
+struct vsock_packet {
+	struct vmci_datagram dg;
+	u8 version;
+	u8 type;
+	vsock_proto_version proto;
+
+	u32 src_port;
+	u32 dst_port;
+	u32 _reserved2;
+	union {
+		u64 size;
+		u64 mode;
+		struct vmci_handle handle;
+		struct vsock_waiting_info wait;
+	} u;
+};
+
+#endif
diff --git a/net/vmw_vsock/vsock_common.h b/net/vmw_vsock/vsock_common.h
new file mode 100644
index 0000000..e902783
--- /dev/null
+++ b/net/vmw_vsock/vsock_common.h
@@ -0,0 +1,127 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2007-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * vsockCommon.h --
+ *
+ * VSockets common constants, types and functions.
+ */
+
+#ifndef _VSOCK_COMMON_H_
+#define _VSOCK_COMMON_H_
+
+/*
+ * vmci_sock_get_af_value_int is defined separately from vmci_sock_get_af_value
+ * because it is used in several different contexts. In particular it is called
+ * from vsock_addr.c which gets compiled into both our kernel modules as well
+ * as the user level vsock library. In the linux kernel we need different
+ * behavior than external kernel modules using VMCI Sockets api inside the
+ * kernel.  FIXME
+ */
+
+#if defined __KERNEL__
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <asm/page.h>
+#else
+/* In userland, just use the normal exported userlevel api. */
+#define vmci_sock_get_af_value_int() vmci_sock_get_af_value()
+#endif
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/vmw_vmci_api.h>
+
+#include "vmci_sockets.h"
+#include "vsock_addr.h"
+
+#ifdef __x86_64__
+#define FMT64 "ll"
+#else
+#define FMT64 "L"
+#endif
+
+#ifndef PAGE_SHIFT
+#if defined __i386__
+#define PAGE_SHIFT 12
+#elif defined __arm__
+#define PAGE_SHIFT 12
+#else
+#error
+#endif
+#endif
+
+#ifndef PAGE_SIZE
+#define PAGE_SIZE (1<<PAGE_SHIFT)
+#endif
+
+#define MAX_UINT32 ((u32)0xffffffff)
+
+#ifndef ESYSNOTREADY
+#define ESYSNOTREADY EOPNOTSUPP
+#endif
+
+#define sockerr()	errno
+#define sockerr2err(_e)	(((_e) > 0) ? -(_e) : (_e))
+#define SS_LISTEN	255
+
+extern vmci_id vmci_get_context_id(void);
+
+/*
+ * Helper function to determine if the given handle points to the local context.
+ * Returns TRUE if the given handle is for the local context, FALSE otherwise.
+ */
+
+static inline bool vsock_vmci_is_local(struct vmci_handle handle)
+{
+	return vmci_get_context_id() == VMCI_HANDLE_TO_CONTEXT_ID(handle);
+}
+
+/*
+ * Helper function to convert from a VMCI error code to a VSock error code.
+ */
+
+static inline s32 vsock_vmci_error_to_vsock_error(s32 vmci_error)
+{
+	int err;
+
+	switch (vmci_error) {
+	case VMCI_ERROR_NO_MEM:
+		err = ENOMEM;
+		break;
+	case VMCI_ERROR_DUPLICATE_ENTRY:
+	case VMCI_ERROR_ALREADY_EXISTS:
+		err = EADDRINUSE;
+		break;
+	case VMCI_ERROR_NO_ACCESS:
+		err = EPERM;
+		break;
+	case VMCI_ERROR_NO_RESOURCES:
+		err = ENOBUFS;
+		break;
+	case VMCI_ERROR_INVALID_RESOURCE:
+		err = EHOSTUNREACH;
+		break;
+	case VMCI_ERROR_MODULE_NOT_LOADED:
+		err = ESYSNOTREADY;
+		break;
+	case VMCI_ERROR_INVALID_ARGS:
+	default:
+		err = EINVAL;
+	}
+
+	return sockerr2err(err);
+}
+
+#endif
diff --git a/net/vmw_vsock/vsock_packet.h b/net/vmw_vsock/vsock_packet.h
new file mode 100644
index 0000000..85b8d4e
--- /dev/null
+++ b/net/vmw_vsock/vsock_packet.h
@@ -0,0 +1,124 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2007-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * vsockPacket.h --
+ *
+ * Packet constants, types and functions.
+ */
+
+#ifndef _VSOCK_PACKET_H_
+#define _VSOCK_PACKET_H_
+
+#include "vmci_sockets_packet.h"
+
+/*
+ *
+ * vsock_packet_init --
+ *
+ * Initialize the given packet.  The packet version is set and the fields are
+ * filled out.  Reserved fields are cleared.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static inline void
+vsock_packet_init(struct vsock_packet *pkt,
+		  struct sockaddr_vm *src,
+		  struct sockaddr_vm *dst,
+		  u8 type,
+		  u64 size,
+		  u64 mode,
+		  struct vsock_waiting_info *wait,
+		  vsock_proto_version proto,
+		  struct vmci_handle handle)
+{
+	/*
+	 * We register the stream control handler as an any cid handle so we
+	 * must always send from a source address of VMADDR_CID_ANY
+	 */
+	pkt->dg.src = VMCI_MAKE_HANDLE(VMADDR_CID_ANY, VSOCK_PACKET_RID);
+	pkt->dg.dst = VMCI_MAKE_HANDLE(dst->svm_cid, VSOCK_PACKET_RID);
+	pkt->dg.payload_size = sizeof *pkt - sizeof pkt->dg;
+	pkt->version = VSOCK_PACKET_VERSION;
+	pkt->type = type;
+	pkt->src_port = src->svm_port;
+	pkt->dst_port = dst->svm_port;
+	memset(&pkt->proto, 0, sizeof pkt->proto);
+	memset(&pkt->_reserved2, 0, sizeof pkt->_reserved2);
+
+	switch (pkt->type) {
+	case VSOCK_PACKET_TYPE_INVALID:
+		pkt->u.size = 0;
+		break;
+
+	case VSOCK_PACKET_TYPE_REQUEST:
+	case VSOCK_PACKET_TYPE_NEGOTIATE:
+		pkt->u.size = size;
+		break;
+
+	case VSOCK_PACKET_TYPE_OFFER:
+	case VSOCK_PACKET_TYPE_ATTACH:
+		pkt->u.handle = handle;
+		break;
+
+	case VSOCK_PACKET_TYPE_WROTE:
+	case VSOCK_PACKET_TYPE_READ:
+	case VSOCK_PACKET_TYPE_RST:
+		pkt->u.size = 0;
+		break;
+
+	case VSOCK_PACKET_TYPE_SHUTDOWN:
+		pkt->u.mode = mode;
+		break;
+
+	case VSOCK_PACKET_TYPE_WAITING_READ:
+	case VSOCK_PACKET_TYPE_WAITING_WRITE:
+		memcpy(&pkt->u.wait, wait, sizeof pkt->u.wait);
+		break;
+
+	case VSOCK_PACKET_TYPE_REQUEST2:
+	case VSOCK_PACKET_TYPE_NEGOTIATE2:
+		pkt->u.size = size;
+		pkt->proto = proto;
+		break;
+	}
+}
+
+/*
+ *
+ * vsock_packet_get_addresses --
+ *
+ * Get the local and remote addresses from the given packet.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static inline void
+vsock_packet_get_addresses(struct vsock_packet *pkt,
+			   struct sockaddr_vm *local,
+			   struct sockaddr_vm *remote)
+{
+	vsock_addr_init(local, VMCI_HANDLE_TO_CONTEXT_ID(pkt->dg.dst),
+			pkt->dst_port);
+	vsock_addr_init(remote, VMCI_HANDLE_TO_CONTEXT_ID(pkt->dg.src),
+			pkt->src_port);
+}
+
+#endif
diff --git a/net/vmw_vsock/vsock_version.h b/net/vmw_vsock/vsock_version.h
new file mode 100644
index 0000000..3a698c7
--- /dev/null
+++ b/net/vmw_vsock/vsock_version.h
@@ -0,0 +1,28 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2011-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * vsock_version.h --
+ *
+ * Version definitions for the Linux vsock driver.
+ */
+
+#ifndef _VSOCK_VERSION_H_
+#define _VSOCK_VERSION_H_
+
+#define VSOCK_DRIVER_VERSION_PARTS	{ 1, 0, 0, 0 }
+#define VSOCK_DRIVER_VERSION_STRING	"1.0.0.0-k"
+
+#endif /* _VSOCK_VERSION_H_ */

^ permalink raw reply related

* [PATCH 5/6] VSOCK: utility functions.
From: George Zhang @ 2012-11-21 20:40 UTC (permalink / raw)
  To: netdev, linux-kernel, georgezhang, virtualization
  Cc: pv-drivers, gregkh, davem
In-Reply-To: <20121121203715.14395.27632.stgit@promb-2n-dhcp175.eng.vmware.com>

VSOCK utility functions for Linux VSocket module.

Signed-off-by: George Zhang <georgezhang@vmware.com>
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andy King <acking@vmware.com>

---
 net/vmw_vsock/util.c |  620 ++++++++++++++++++++++++++++++++++++++++++++++++++
 net/vmw_vsock/util.h |  314 +++++++++++++++++++++++++
 2 files changed, 934 insertions(+), 0 deletions(-)
 create mode 100644 net/vmw_vsock/util.c
 create mode 100644 net/vmw_vsock/util.h

diff --git a/net/vmw_vsock/util.c b/net/vmw_vsock/util.c
new file mode 100644
index 0000000..cd86482
--- /dev/null
+++ b/net/vmw_vsock/util.c
@@ -0,0 +1,620 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2007-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * util.c --
+ *
+ * Utility functions for Linux VSocket module.
+ */
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/socket.h>
+#include <linux/stddef.h>	/* for NULL */
+#include <net/sock.h>
+
+#include "af_vsock.h"
+#include "util.h"
+
+struct list_head vsock_bind_table[VSOCK_HASH_SIZE + 1];
+struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
+
+DEFINE_SPINLOCK(vsock_table_lock);
+
+/*
+ *
+ * vsock_vmci_log_pkt --
+ *
+ * Logs the provided packet.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+void vsock_vmci_log_pkt(char const *function, u32 line,
+			struct vsock_packet *pkt)
+{
+	char buf[256];
+	char *cur = buf;
+	int left = sizeof buf;
+	int written = 0;
+	char *type_strings[] = {
+		[VSOCK_PACKET_TYPE_INVALID] = "INVALID",
+		[VSOCK_PACKET_TYPE_REQUEST] = "REQUEST",
+		[VSOCK_PACKET_TYPE_NEGOTIATE] = "NEGOTIATE",
+		[VSOCK_PACKET_TYPE_OFFER] = "OFFER",
+		[VSOCK_PACKET_TYPE_ATTACH] = "ATTACH",
+		[VSOCK_PACKET_TYPE_WROTE] = "WROTE",
+		[VSOCK_PACKET_TYPE_READ] = "READ",
+		[VSOCK_PACKET_TYPE_RST] = "RST",
+		[VSOCK_PACKET_TYPE_SHUTDOWN] = "SHUTDOWN",
+		[VSOCK_PACKET_TYPE_WAITING_WRITE] = "WAITING_WRITE",
+		[VSOCK_PACKET_TYPE_WAITING_READ] = "WAITING_READ",
+		[VSOCK_PACKET_TYPE_REQUEST2] = "REQUEST2",
+		[VSOCK_PACKET_TYPE_NEGOTIATE2] = "NEGOTIATE2",
+	};
+
+	written = snprintf(cur, left, "PKT: %u:%u -> %u:%u",
+			   VMCI_HANDLE_TO_CONTEXT_ID(pkt->dg.src),
+			   pkt->src_port,
+			   VMCI_HANDLE_TO_CONTEXT_ID(pkt->dg.dst),
+			   pkt->dst_port);
+	if (written >= left)
+		goto error;
+
+	left -= written;
+	cur += written;
+
+	switch (pkt->type) {
+	case VSOCK_PACKET_TYPE_REQUEST:
+	case VSOCK_PACKET_TYPE_NEGOTIATE:
+		written = snprintf(cur, left, ", %s, size = %" FMT64 "u",
+				   type_strings[pkt->type], pkt->u.size);
+		break;
+
+	case VSOCK_PACKET_TYPE_OFFER:
+	case VSOCK_PACKET_TYPE_ATTACH:
+		written = snprintf(cur, left, ", %s, handle = %u:%u",
+				   type_strings[pkt->type],
+				   VMCI_HANDLE_TO_CONTEXT_ID(pkt->u.handle),
+				   VMCI_HANDLE_TO_RESOURCE_ID(pkt->u.handle));
+		break;
+
+	case VSOCK_PACKET_TYPE_WROTE:
+	case VSOCK_PACKET_TYPE_READ:
+	case VSOCK_PACKET_TYPE_RST:
+		written = snprintf(cur, left, ", %s", type_strings[pkt->type]);
+		break;
+	case VSOCK_PACKET_TYPE_SHUTDOWN: {
+		bool recv;
+		bool send;
+
+		recv = pkt->u.mode & RCV_SHUTDOWN;
+		send = pkt->u.mode & SEND_SHUTDOWN;
+		written = snprintf(cur, left, ", %s, mode = %c%c",
+				   type_strings[pkt->type],
+				   recv ? 'R' : ' ', send ? 'S' : ' ');
+	}
+	break;
+
+	case VSOCK_PACKET_TYPE_WAITING_WRITE:
+	case VSOCK_PACKET_TYPE_WAITING_READ:
+		written = snprintf(cur, left,
+			", %s, generation = %" FMT64 "u, offset = %" FMT64 "u",
+			type_strings[pkt->type],
+			pkt->u.wait.generation, pkt->u.wait.offset);
+
+		break;
+
+	case VSOCK_PACKET_TYPE_REQUEST2:
+	case VSOCK_PACKET_TYPE_NEGOTIATE2:
+		written = snprintf(cur, left,
+				   ", %s, size = %" FMT64 "u, proto = %u",
+				   type_strings[pkt->type], pkt->u.size,
+				   pkt->proto);
+		break;
+
+	default:
+		written = snprintf(cur, left, ", unrecognized type");
+	}
+
+	if (written >= left)
+		goto error;
+
+	left -= written;
+	cur += written;
+
+	written = snprintf(cur, left, "  [%s:%u]\n", function, line);
+	if (written >= left)
+		goto error;
+
+	return;
+
+error:
+	pr_err("could not log packet\n");
+}
+
+/*
+ *
+ * vsock_vmci_init_tables --
+ *
+ * Initializes the tables used for socket lookup.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+void vsock_vmci_init_tables(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(vsock_bind_table); i++)
+		INIT_LIST_HEAD(&vsock_bind_table[i]);
+
+	for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
+		INIT_LIST_HEAD(&vsock_connected_table[i]);
+}
+
+/*
+ *
+ * __vsock_vmci_insert_bound --
+ *
+ * Inserts socket into the bound table.
+ *
+ * Note that this assumes any necessary locks are held.
+ *
+ * Results: None.
+ *
+ * Side effects: The reference count for sk is incremented.
+ */
+
+void __vsock_vmci_insert_bound(struct list_head *list, struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	sock_hold(sk);
+	list_add(&vsk->bound_table, list);
+}
+
+/*
+ *
+ * __vsock_vmci_insert_connected --
+ *
+ * Inserts socket into the connected table.
+ *
+ * Note that this assumes any necessary locks are held.
+ *
+ * Results: None.
+ *
+ * Side effects: The reference count for sk is incremented.
+ */
+
+void __vsock_vmci_insert_connected(struct list_head *list, struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	sock_hold(sk);
+	list_add(&vsk->connected_table, list);
+}
+
+/*
+ *
+ * __vsock_vmci_remove_bound --
+ *
+ * Removes socket from the bound table.
+ *
+ * Note that this assumes any necessary locks are held.
+ *
+ * Results: None.
+ *
+ * Side effects: The reference count for sk is decremented.
+ */
+
+void __vsock_vmci_remove_bound(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk;
+
+	vsk = vsock_sk(sk);
+
+	list_del_init(&vsk->bound_table);
+	sock_put(sk);
+}
+
+/*
+ *
+ * __vsock_vmci_remove_connected --
+ *
+ * Removes socket from the connected table.
+ *
+ * Note that this assumes any necessary locks are held.
+ *
+ * Results: None.
+ *
+ * Side effects: The reference count for sk is decremented.
+ */
+
+void __vsock_vmci_remove_connected(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk;
+
+	vsk = vsock_sk(sk);
+
+	list_del_init(&vsk->connected_table);
+	sock_put(sk);
+}
+
+/*
+ *
+ * __vsock_vmci_find_bound_socket --
+ *
+ * Finds the socket corresponding to the provided address in the bound sockets
+ * hash table.
+ *
+ * Note that this assumes any necessary locks are held.
+ *
+ * Results: The sock structure if found, NULL if not found.
+ *
+ * Side effects: None.
+ */
+
+struct sock *__vsock_vmci_find_bound_socket(struct sockaddr_vm *addr)
+{
+	struct vsock_vmci_sock *vsk;
+
+	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
+		if (vsock_addr_equals_addr_any(addr, &vsk->local_addr))
+			return sk_vsock(vsk);
+	}
+
+	return NULL;
+}
+
+/*
+ *
+ * __vsock_vmci_find_connected_socket --
+ *
+ * Finds the socket corresponding to the provided addresses in the connected
+ * sockets hash table.
+ *
+ * Note that this assumes any necessary locks are held.
+ *
+ * Results: The sock structure if found, NULL if not found.
+ *
+ * Side effects: None.
+ */
+
+struct sock *__vsock_vmci_find_connected_socket(struct sockaddr_vm *src,
+						struct sockaddr_vm *dst)
+{
+	struct vsock_vmci_sock *vsk;
+
+	list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
+			    connected_table) {
+		if (vsock_addr_equals_addr(src, &vsk->remote_addr)
+		    && vsock_addr_equals_addr(dst, &vsk->local_addr)) {
+			return sk_vsock(vsk);
+		}
+	}
+
+	return NULL;
+}
+
+/*
+ *
+ * __vsock_vmci_in_bound_table --
+ *
+ * Determines whether the provided socket is in the bound table.
+ *
+ * Results: TRUE is socket is in bound table, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool __vsock_vmci_in_bound_table(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	return !list_empty(&vsk->bound_table);
+}
+
+/*
+ *
+ * __vsock_vmci_in_connected_table --
+ *
+ * Determines whether the provided socket is in the connected table.
+ *
+ * Results: TRUE is socket is in connected table, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool __vsock_vmci_in_connected_table(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	return !list_empty(&vsk->connected_table);
+}
+
+/*
+ *
+ * vsock_vmci_get_pending --
+ *
+ * Retrieves a pending connection that matches the addresses specified in the
+ * provided packet.
+ *
+ * Assumes the socket lock is held for listener.
+ *
+ * Results: Socket of the pending connection on success, NULL if not found.
+ *
+ * Side effects: A reference is held on the socket until the release function
+ * is called.
+ */
+
+struct sock *vsock_vmci_get_pending(struct sock *listener,
+				    struct vsock_packet *pkt)
+{
+	struct vsock_vmci_sock *vlistener;
+	struct vsock_vmci_sock *vpending;
+	struct sock *pending;
+
+	vlistener = vsock_sk(listener);
+
+	list_for_each_entry(vpending, &vlistener->pending_links,
+			    pending_links) {
+		struct sockaddr_vm src;
+		struct sockaddr_vm dst;
+
+		vsock_addr_init(&src, VMCI_HANDLE_TO_CONTEXT_ID(pkt->dg.src),
+				pkt->src_port);
+		vsock_addr_init(&dst, VMCI_HANDLE_TO_CONTEXT_ID(pkt->dg.dst),
+				pkt->dst_port);
+
+		if (vsock_addr_equals_addr(&src, &vpending->remote_addr) &&
+		    vsock_addr_equals_addr(&dst, &vpending->local_addr)) {
+			pending = sk_vsock(vpending);
+			sock_hold(pending);
+			goto found;
+		}
+	}
+
+	pending = NULL;
+found:
+	return pending;
+
+}
+
+/*
+ *
+ * vsock_vmci_release_pending --
+ *
+ * Releases the reference on a socket previously obtained by a call to
+ * vsock_vmci_get_pending().
+ *
+ * Results: None.
+ *
+ * Side effects: The socket may be freed if this was the last reference.
+ */
+
+void vsock_vmci_release_pending(struct sock *pending)
+{
+	sock_put(pending);
+}
+
+/*
+ *
+ * vsock_vmci_add_pending --
+ *
+ * Adds a pending connection on listener's pending list.
+ *
+ * Assumes the socket lock is held for listener. Assumes the socket lock is
+ * held for pending.
+ *
+ * Results: None.
+ *
+ * Side effects: The reference count of the sockets is incremented.
+ */
+
+void vsock_vmci_add_pending(struct sock *listener, struct sock *pending)
+{
+	struct vsock_vmci_sock *vlistener;
+	struct vsock_vmci_sock *vpending;
+
+	vlistener = vsock_sk(listener);
+	vpending = vsock_sk(pending);
+
+	sock_hold(pending);
+	sock_hold(listener);
+	list_add_tail(&vpending->pending_links, &vlistener->pending_links);
+}
+
+/*
+ *
+ * vsock_vmci_remove_pending --
+ *
+ * Removes a pending connection from the listener's pending list.
+ *
+ * Assumes the socket lock is held for listener. Assumes the socket lock is
+ * held for pending.
+ *
+ * Results: None.
+ *
+ * Side effects: The reference count of the sockets is decremented.
+ */
+
+void vsock_vmci_remove_pending(struct sock *listener, struct sock *pending)
+{
+	struct vsock_vmci_sock *vpending = vsock_sk(pending);
+
+	list_del_init(&vpending->pending_links);
+	sock_put(listener);
+	sock_put(pending);
+}
+
+/*
+ *
+ * vsock_vmci_enqueue_accept --
+ *
+ * Enqueues the connected socket on the listening socket's accepting queue.
+ *
+ * Assumes the socket lock is held for listener. Assumes the socket lock is
+ * held for connected.
+ *
+ * Results: None.
+ *
+ * Side effects: The sockets' reference counts are incremented.
+ */
+
+void vsock_vmci_enqueue_accept(struct sock *listener, struct sock *connected)
+{
+	struct vsock_vmci_sock *vlistener;
+	struct vsock_vmci_sock *vconnected;
+
+	vlistener = vsock_sk(listener);
+	vconnected = vsock_sk(connected);
+
+	sock_hold(connected);
+	sock_hold(listener);
+	list_add_tail(&vconnected->accept_queue, &vlistener->accept_queue);
+}
+
+/*
+ *
+ * vsock_vmci_dequeue_accept --
+ *
+ * Dequeues the next connected socket from the listening socket's accept queue.
+ *
+ * Assumes the socket lock is held for listener.
+ *
+ * Note that the caller must call sock_put() on the returned socket once it is
+ * done with the socket.
+ *
+ * Results: The next socket from the queue, or NULL if the queue is empty.
+ *
+ * Side effects: The reference count of the listener is decremented.
+ */
+
+struct sock *vsock_vmci_dequeue_accept(struct sock *listener)
+{
+	struct vsock_vmci_sock *vlistener;
+	struct vsock_vmci_sock *vconnected;
+
+	vlistener = vsock_sk(listener);
+
+	if (list_empty(&vlistener->accept_queue))
+		return NULL;
+
+	vconnected = list_entry(vlistener->accept_queue.next,
+				struct vsock_vmci_sock, accept_queue);
+
+	list_del_init(&vconnected->accept_queue);
+	sock_put(listener);
+	/*
+	 * The caller will need a reference on the connected socket so we let
+	 * it call sock_put().
+	 */
+
+	return sk_vsock(vconnected);
+}
+
+/*
+ *
+ * vsock_vmci_remove_accept --
+ *
+ * Removes a socket from the accept queue of a listening socket.
+ *
+ * Assumes the socket lock is held for listener. Assumes the socket lock is
+ * held for connected.
+ *
+ * Results: None.
+ *
+ * Side effects: The sockets' reference counts are decremented.
+ */
+
+void vsock_vmci_remove_accept(struct sock *listener, struct sock *connected)
+{
+	struct vsock_vmci_sock *vconnected;
+
+	if (!vsock_vmci_in_accept_queue(connected))
+		return;
+
+	vconnected = vsock_sk(connected);
+
+	list_del_init(&vconnected->accept_queue);
+	sock_put(listener);
+	sock_put(connected);
+}
+
+/*
+ *
+ * vsock_vmci_in_accept_queue --
+ *
+ * Determines whether a socket is on an accept queue.
+ *
+ * Assumes the socket lock is held for sk.
+ *
+ * Results: TRUE if the socket is in an accept queue, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool vsock_vmci_in_accept_queue(struct sock *sk)
+{
+	/*
+	 * If our accept queue isn't empty, it means we're linked into some
+	 * listener socket's accept queue.
+	 */
+	return !vsock_vmci_is_accept_queue_empty(sk);
+}
+
+/*
+ *
+ * vsock_vmci_is_accept_queue_empty --
+ *
+ * Determines whether the provided socket's accept queue is empty.
+ *
+ * Assumes the socket lock is held for sk.
+ *
+ * Results: TRUE if the socket's accept queue is empty, FALSE otherwsise.
+ *
+ * Side effects: None.
+ *
+ */
+
+bool vsock_vmci_is_accept_queue_empty(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+	return list_empty(&vsk->accept_queue);
+}
+
+/*
+ *
+ * vsock_vmci_is_pending --
+ *
+ * Determines whether a socket is pending.
+ *
+ * Assumes the socket lock is held for sk.
+ *
+ * Results: TRUE if the socket is pending, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool vsock_vmci_is_pending(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+	return !list_empty(&vsk->pending_links);
+}
diff --git a/net/vmw_vsock/util.h b/net/vmw_vsock/util.h
new file mode 100644
index 0000000..bc8ab7e
--- /dev/null
+++ b/net/vmw_vsock/util.h
@@ -0,0 +1,314 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2007-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * util.h --
+ *
+ * Utility functions for Linux VSocket module.
+ */
+
+#ifndef __UTIL_H__
+#define __UTIL_H__
+
+#include <linux/types.h>
+#include <linux/stddef.h>	/* for NULL */
+#include <net/sock.h>
+#include <linux/spinlock.h>
+
+#include "vsock_common.h"
+#include "vsock_packet.h"
+
+/*
+ * Each bound VSocket is stored in the bind hash table and each connected
+ * VSocket is stored in the connected hash table.
+ *
+ * Unbound sockets are all put on the same list attached to the end of the hash
+ * table (vsock_unbound_sockets).  Bound sockets are added to the hash table in
+ * the bucket that their local address hashes to (vsock_bound_sockets(addr)
+ * represents the list that addr hashes to).
+ *
+ * Specifically, we initialize the vsock_bind_table array to a size of
+ * VSOCK_HASH_SIZE + 1 so that vsock_bind_table[0] through
+ * vsock_bind_table[VSOCK_HASH_SIZE - 1] are for bound sockets and
+ * vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets.  The hash function
+ * mods with VSOCK_HASH_SIZE - 1 to ensure this.
+ */
+#define VSOCK_HASH_SIZE         251
+#define LAST_RESERVED_PORT      1023
+#define MAX_PORT_RETRIES        24
+
+extern struct list_head vsock_bind_table[VSOCK_HASH_SIZE + 1];
+extern struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
+
+extern spinlock_t vsock_table_lock;
+
+#define VSOCK_HASH(addr)        ((addr)->svm_port % (VSOCK_HASH_SIZE - 1))
+#define vsock_bound_sockets(addr) (&vsock_bind_table[VSOCK_HASH(addr)])
+#define vsock_unbound_sockets     (&vsock_bind_table[VSOCK_HASH_SIZE])
+
+/* XXX This can probably be implemented in a better way. */
+#define VSOCK_CONN_HASH(src, dst)				\
+	(((src)->svm_cid ^ (dst)->svm_port) % (VSOCK_HASH_SIZE - 1))
+#define vsock_connected_sockets(src, dst)		\
+	(&vsock_connected_table[VSOCK_CONN_HASH(src, dst)])
+#define vsock_connected_sockets_vsk(vsk)				\
+	vsock_connected_sockets(&(vsk)->remote_addr, &(vsk)->local_addr)
+
+/*
+ * Prototypes.
+ */
+
+void vsock_vmci_log_pkt(char const *function, u32 line,
+			struct vsock_packet *pkt);
+
+void vsock_vmci_init_tables(void);
+void __vsock_vmci_insert_bound(struct list_head *list, struct sock *sk);
+void __vsock_vmci_insert_connected(struct list_head *list, struct sock *sk);
+void __vsock_vmci_remove_bound(struct sock *sk);
+void __vsock_vmci_remove_connected(struct sock *sk);
+struct sock *__vsock_vmci_find_bound_socket(struct sockaddr_vm *addr);
+struct sock *__vsock_vmci_find_connected_socket(struct sockaddr_vm *src,
+						struct sockaddr_vm *dst);
+bool __vsock_vmci_in_bound_table(struct sock *sk);
+bool __vsock_vmci_in_connected_table(struct sock *sk);
+
+struct sock *vsock_vmci_get_pending(struct sock *listener,
+				    struct vsock_packet *pkt);
+void vsock_vmci_release_pending(struct sock *pending);
+void vsock_vmci_add_pending(struct sock *listener, struct sock *pending);
+void vsock_vmci_remove_pending(struct sock *listener, struct sock *pending);
+void vsock_vmci_enqueue_accept(struct sock *listener, struct sock *connected);
+struct sock *vsock_vmci_dequeue_accept(struct sock *listener);
+void vsock_vmci_remove_accept(struct sock *listener, struct sock *connected);
+bool vsock_vmci_in_accept_queue(struct sock *sk);
+bool vsock_vmci_is_accept_queue_empty(struct sock *sk);
+bool vsock_vmci_is_pending(struct sock *sk);
+
+static inline void vsock_vmci_insert_bound(struct list_head *list,
+					   struct sock *sk);
+static inline void vsock_vmci_insert_connected(struct list_head *list,
+					       struct sock *sk);
+static inline void vsock_vmci_remove_bound(struct sock *sk);
+static inline void vsock_vmci_remove_connected(struct sock *sk);
+static inline struct sock *vsock_vmci_find_bound_socket(struct sockaddr_vm
+							*addr);
+static inline struct sock *vsock_vmci_find_connected_socket(struct sockaddr_vm
+							    *src,
+							    struct sockaddr_vm
+							    *dst);
+static inline bool vsock_vmci_in_bound_table(struct sock *sk);
+static inline bool vsock_vmci_in_connected_table(struct sock *sk);
+
+/*
+ *
+ * vsock_vmci_insert_bound --
+ *
+ * Inserts socket into the bound table.
+ *
+ * Note that it is important to invoke the bottom-half versions of the spinlock
+ * functions since these may be called from tasklets.
+ *
+ * Results: None.
+ *
+ * Side effects: vsock_table_lock is acquired and released.
+ */
+
+static inline void
+vsock_vmci_insert_bound(struct list_head *list, struct sock *sk)
+{
+	spin_lock_bh(&vsock_table_lock);
+	__vsock_vmci_insert_bound(list, sk);
+	spin_unlock_bh(&vsock_table_lock);
+}
+
+/*
+ *
+ * vsock_vmci_insert_connected --
+ *
+ * Inserts socket into the connected table.
+ *
+ * Note that it is important to invoke the bottom-half versions of the spinlock
+ * functions since these may be called from tasklets.
+ *
+ * Results: None.
+ *
+ * Side effects: vsock_table_lock is acquired and released.
+ */
+
+static inline void
+vsock_vmci_insert_connected(struct list_head *list, struct sock *sk)
+{
+	spin_lock_bh(&vsock_table_lock);
+	__vsock_vmci_insert_connected(list, sk);
+	spin_unlock_bh(&vsock_table_lock);
+}
+
+/*
+ *
+ * vsock_vmci_remove_bound --
+ *
+ * Removes socket from the bound list.
+ *
+ * Note that it is important to invoke the bottom-half versions of the spinlock
+ * functions since these may be called from tasklets.
+ *
+ * Results: None.
+ *
+ * Side effects: vsock_table_lock is acquired and released.
+ */
+
+static inline void vsock_vmci_remove_bound(struct sock *sk)
+{
+	spin_lock_bh(&vsock_table_lock);
+	__vsock_vmci_remove_bound(sk);
+	spin_unlock_bh(&vsock_table_lock);
+}
+
+/*
+ *
+ * vsock_vmci_remove_connected --
+ *
+ * Removes socket from the connected list.
+ *
+ * Note that it is important to invoke the bottom-half versions of the spinlock
+ * functions since these may be called from tasklets.
+ *
+ * Results: None.
+ *
+ * Side effects: vsock_table_lock is acquired and released.
+ */
+
+static inline void vsock_vmci_remove_connected(struct sock *sk)
+{
+	spin_lock_bh(&vsock_table_lock);
+	__vsock_vmci_remove_connected(sk);
+	spin_unlock_bh(&vsock_table_lock);
+}
+
+/*
+ *
+ * vsock_vmci_find_bound_socket --
+ *
+ * Finds the socket corresponding to the provided address in the bound sockets
+ * hash table.
+ *
+ * Note that it is important to invoke the bottom-half versions of the spinlock
+ * functions since these are called from tasklets.
+ *
+ * Results: The sock structure if found, NULL on failure.
+ *
+ * Side effects: vsock_table_lock is acquired and released. The socket's
+ * reference count is increased.
+ */
+
+static inline struct sock *vsock_vmci_find_bound_socket(struct sockaddr_vm
+							*addr)
+{
+	struct sock *sk;
+
+	spin_lock_bh(&vsock_table_lock);
+	sk = __vsock_vmci_find_bound_socket(addr);
+	if (sk)
+		sock_hold(sk);
+
+	spin_unlock_bh(&vsock_table_lock);
+
+	return sk;
+}
+
+/*
+ *
+ * vsock_vmci_find_connected_socket --
+ *
+ * Finds the socket corresponding to the provided address in the connected
+ * sockets hash table.
+ *
+ * Note that it is important to invoke the bottom-half versions of the spinlock
+ * functions since these are called from tasklets.
+ *
+ * Results: The sock structure if found, NULL on failure.
+ *
+ * Side effects: vsock_table_lock is acquired and released. The socket's
+ * reference count is increased.
+ */
+
+static inline struct sock *vsock_vmci_find_connected_socket(struct sockaddr_vm
+							    *src,
+							    struct sockaddr_vm
+							    *dst)
+{
+	struct sock *sk;
+
+	spin_lock_bh(&vsock_table_lock);
+	sk = __vsock_vmci_find_connected_socket(src, dst);
+	if (sk)
+		sock_hold(sk);
+
+	spin_unlock_bh(&vsock_table_lock);
+
+	return sk;
+}
+
+/*
+ *
+ * vsock_vmci_in_bound_table --
+ *
+ * Determines whether the provided socket is in the bound table.
+ *
+ * Note that it is important to invoke the bottom-half versions of the spinlock
+ * functions since these may be called from tasklets.
+ *
+ * Results: TRUE is socket is in bound table, FALSE otherwise.
+ *
+ * Side effects: vsock_table_lock is acquired and released.
+ */
+
+static inline bool vsock_vmci_in_bound_table(struct sock *sk)
+{
+	bool ret;
+
+	spin_lock_bh(&vsock_table_lock);
+	ret = __vsock_vmci_in_bound_table(sk);
+	spin_unlock_bh(&vsock_table_lock);
+
+	return ret;
+}
+
+/*
+ *
+ * vsock_vmci_in_connected_table --
+ *
+ * Determines whether the provided socket is in the connected table.
+ *
+ * Note that it is important to invoke the bottom-half versions of the spinlock
+ * functions since these may be called from tasklets.
+ *
+ * Results: TRUE is socket is in connected table, FALSE otherwise.
+ *
+ * Side effects: vsock_table_lock is acquired and released.
+ */
+
+static inline bool vsock_vmci_in_connected_table(struct sock *sk)
+{
+	bool ret;
+
+	spin_lock_bh(&vsock_table_lock);
+	ret = __vsock_vmci_in_connected_table(sk);
+	spin_unlock_bh(&vsock_table_lock);
+
+	return ret;
+}
+
+#endif /* __UTIL_H__ */

^ permalink raw reply related

* [PATCH 4/6] VSOCK: statistics implementation.
From: George Zhang @ 2012-11-21 20:40 UTC (permalink / raw)
  To: netdev, linux-kernel, georgezhang, virtualization
  Cc: pv-drivers, gregkh, davem
In-Reply-To: <20121121203715.14395.27632.stgit@promb-2n-dhcp175.eng.vmware.com>

VSOCK stats for VMCI Stream Sockets protocol.

Signed-off-by: George Zhang <georgezhang@vmware.com>
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andy King <acking@vmware.com>

---
 net/vmw_vsock/stats.c |   37 ++++++++
 net/vmw_vsock/stats.h |  217 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 254 insertions(+), 0 deletions(-)
 create mode 100644 net/vmw_vsock/stats.c
 create mode 100644 net/vmw_vsock/stats.h

diff --git a/net/vmw_vsock/stats.c b/net/vmw_vsock/stats.c
new file mode 100644
index 0000000..2d172d5
--- /dev/null
+++ b/net/vmw_vsock/stats.c
@@ -0,0 +1,37 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2009-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * stats.c --
+ *
+ * Linux stats for the VMCI Stream Sockets protocol.
+ */
+
+#include <linux/types.h>
+
+#include <linux/socket.h>
+#include <linux/stddef.h>	/* for NULL */
+#include <net/sock.h>
+
+#include "af_vsock.h"
+#include "stats.h"
+
+#ifdef VSOCK_GATHER_STATISTICS
+u64 vsock_stats_ctl_pkt_count[VSOCK_PACKET_TYPE_MAX];
+u64 vsock_stats_consume_queue_hist[VSOCK_NUM_QUEUE_LEVEL_BUCKETS];
+u64 vsock_stats_produce_queue_hist[VSOCK_NUM_QUEUE_LEVEL_BUCKETS];
+atomic64_t vsock_stats_consume_total;
+atomic64_t vsock_stats_produce_total;
+#endif
diff --git a/net/vmw_vsock/stats.h b/net/vmw_vsock/stats.h
new file mode 100644
index 0000000..9949b22
--- /dev/null
+++ b/net/vmw_vsock/stats.h
@@ -0,0 +1,217 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2009-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * stats.h --
+ *
+ * Stats functions for Linux vsock module.
+ */
+
+#ifndef __STATS_H__
+#define __STATS_H__
+
+#include <linux/types.h>
+
+#include "vsock_common.h"
+#include "vsock_packet.h"
+
+/*
+ * Define VSOCK_GATHER_STATISTICS to turn on statistics gathering. Currently
+ * this consists of 3 types of stats: 1. The number of control datagram
+ * messages sent. 2. The level of queuepair fullness (in 10% buckets) whenever
+ * data is about to be enqueued or dequeued from the queuepair. 3. The total
+ * number of bytes enqueued/dequeued.
+ */
+
+#ifdef VSOCK_GATHER_STATISTICS
+
+#define VSOCK_NUM_QUEUE_LEVEL_BUCKETS 10
+extern u64 vsock_stats_ctl_pkt_count[VSOCK_PACKET_TYPE_MAX];
+extern u64 vsock_stats_consume_queue_hist[VSOCK_NUM_QUEUE_LEVEL_BUCKETS];
+extern u64 vsock_stats_produce_queue_hist[VSOCK_NUM_QUEUE_LEVEL_BUCKETS];
+extern atomic64_t vsock_stats_consume_total;
+extern atomic64_t vsock_stats_produce_total;
+
+#define VSOCK_STATS_STREAM_CONSUME_HIST(vsk)				\
+	vsock_vmci_stats_update_queue_bucket_count((vsk)->qpair,	\
+				(vsk)->consume_size,	\
+				vmci_qpair_consume_buf_ready((vsk)->qpair), \
+				vsock_stats_consume_queue_hist)
+#define VSOCK_STATS_STREAM_PRODUCE_HIST(vsk)				\
+	vsock_vmci_stats_update_queue_bucket_count((vsk)->qpair,	\
+				(vsk)->produce_size,	\
+				vmci_qpair_produce_buf_ready((vsk)->qpair), \
+				vsock_stats_produce_queue_hist)
+#define VSOCK_STATS_CTLPKT_LOG(pkt_type)				\
+	do {								\
+		++vsock_stats_ctl_pkt_count[pkt_type];			\
+	} while (0)
+#define VSOCK_STATS_STREAM_CONSUME(bytes)		\
+	atomic64_add(&vsock_stats_consume_total, bytes)
+#define VSOCK_STATS_STREAM_PRODUCE(bytes)		\
+	atomic64_add(&vsock_stats_produce_total, bytes)
+#define VSOCK_STATS_CTLPKT_DUMP_ALL() vsock_vmci_stats_ctl_pkt_dump_all()
+#define VSOCK_STATS_HIST_DUMP_ALL()   vsock_vmci_stats_hist_dump_all()
+#define VSOCK_STATS_TOTALS_DUMP_ALL() vsock_vmci_stats_totals_dump_all()
+#define VSOCK_STATS_RESET()           vsock_vmci_stats_reset()
+
+/*
+ *
+ * vsock_vmci_stats_update_queue_bucket_count --
+ *
+ * Given a queue, determine how much data is enqueued and add that to the
+ * specified queue level statistic bucket.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static inline void
+vsock_vmci_stats_update_queue_bucket_count(vmci_qpair *qpair,
+					   u64 queue_size,
+					   u64 data_ready,
+					   u64 queue_hist[])
+{
+	u64 bucket = 0;
+	u32 remainder = 0;
+
+	/*
+	 * We can't do 64 / 64 = 64 bit divides on linux because it requires a
+	 * libgcc which is not linked into the kernel module. Since this code
+	 * is only used by developers we just limit the queue_size to be less
+	 * than MAX_UINT for now.
+	 */
+	Div643264(data_ready * 10, queue_size, &bucket, &remainder);
+	++queue_hist[bucket];
+}
+
+/*
+ *
+ * vsock_vmci_stats_ctl_pkt_dump_all --
+ *
+ * Prints all stream control packet counts out to the console using the
+ * appropriate platform logging.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static inline void vsock_vmci_stats_ctl_pkt_dump_all(void)
+{
+	int index;
+
+	BUILD_BUG_ON(VSOCK_PACKET_TYPE_MAX !=
+			ARRAY_SIZE(vsock_stats_ctl_pkt_count));
+
+	for (index = 0; index < ARRAY_SIZE(vsock_stats_ctl_pkt_count);
+	     index++) {
+		pr_info("Control packet: Type = %u, Count = %" FMT64
+		       "u\n", index, vsock_stats_ctl_pkt_count[index]);
+	}
+}
+
+/*
+ *
+ * vsock_vmci_stats_hist_dump_all --
+ *
+ * Prints the produce and consume queue histograms to the console.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static inline void vsock_vmci_stats_hist_dump_all(void)
+{
+	int index;
+
+#define VSOCK_DUMP_HIST(strname, name) do {		    \
+		for (index = 0; index < ARRAY_SIZE(name); index++) {	\
+			printk(strname " Bucket count %u = %"FMT64"u\n", \
+			       index, name[index]);			\
+		}							\
+	} while (0)
+
+	VSOCK_DUMP_HIST("Produce Queue", vsock_stats_produce_queue_hist);
+	VSOCK_DUMP_HIST("Consume Queue", vsock_stats_consume_queue_hist);
+
+#undef VSOCK_DUMP_HIST
+}
+
+/*
+ *
+ * vsock_vmci_stats_totals_dump_all --
+ *
+ * Prints the produce and consume totals.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static inline void vsock_vmci_stats_totals_dump_all(void)
+{
+	pr_info("Produced %" FMT64 "u total bytes\n",
+	       atomic64_read(&vsock_stats_produce_total));
+	pr_info("Consumed %" FMT64 "u total bytes\n",
+	       atomic64_read(&vsock_stats_consume_total));
+}
+
+/*
+ *
+ * vsock_vmci_stats_reset --
+ *
+ * Reset all VSock statistics.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static inline void vsock_vmci_stats_reset(void)
+{
+	int index;
+
+#define VSOCK_RESET_ARRAY(name) do {			   \
+		for (index = 0; index < ARRAY_SIZE(name); index++) {	\
+			name[index] = 0;				\
+		}							\
+	} while (0)
+
+	VSOCK_RESET_ARRAY(vsock_stats_ctl_pkt_count);
+	VSOCK_RESET_ARRAY(vsock_stats_produce_queue_hist);
+	VSOCK_RESET_ARRAY(vsock_stats_consume_queue_hist);
+
+#undef VSOCK_RESET_ARRAY
+
+	atomic64_set(&vsock_stats_consume_total, 0);
+	atomic64_set(&vsock_stats_produce_total, 0);
+}
+
+#else
+#define VSOCK_STATS_STREAM_CONSUME_HIST(vsk)
+#define VSOCK_STATS_STREAM_PRODUCE_HIST(vsk)
+#define VSOCK_STATS_STREAM_PRODUCE(bytes)
+#define VSOCK_STATS_STREAM_CONSUME(bytes)
+#define VSOCK_STATS_CTLPKT_LOG(pkt_type)
+#define VSOCK_STATS_CTLPKT_DUMP_ALL()
+#define VSOCK_STATS_HIST_DUMP_ALL()
+#define VSOCK_STATS_TOTALS_DUMP_ALL()
+#define VSOCK_STATS_RESET()
+#endif
+
+#endif

^ permalink raw reply related

* [PATCH 3/6] VSOCK: notification implementation.
From: George Zhang @ 2012-11-21 20:40 UTC (permalink / raw)
  To: netdev, linux-kernel, georgezhang, virtualization
  Cc: pv-drivers, gregkh, davem
In-Reply-To: <20121121203715.14395.27632.stgit@promb-2n-dhcp175.eng.vmware.com>

VSOCK control notifications for VMCI Stream Sockets protocol.

Signed-off-by: George Zhang <georgezhang@vmware.com>
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andy King <acking@vmware.com>

---
 net/vmw_vsock/notify.c |  983 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/vmw_vsock/notify.h |  130 ++++++
 2 files changed, 1113 insertions(+), 0 deletions(-)
 create mode 100644 net/vmw_vsock/notify.c
 create mode 100644 net/vmw_vsock/notify.h

diff --git a/net/vmw_vsock/notify.c b/net/vmw_vsock/notify.c
new file mode 100644
index 0000000..8504e28
--- /dev/null
+++ b/net/vmw_vsock/notify.c
@@ -0,0 +1,983 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2009-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * notify.c --
+ *
+ * Linux control notifications for the VMCI Stream Sockets protocol.
+ */
+
+#include <linux/types.h>
+
+#include <linux/socket.h>
+#include <linux/stddef.h>	/* for NULL */
+#include <net/sock.h>
+
+#include "notify.h"
+#include "af_vsock.h"
+
+#define PKT_FIELD(vsk, field_name) ((vsk)->notify.pkt.field_name)
+
+#define VSOCK_MAX_DGRAM_RESENDS       10
+
+/*
+ *
+ * vsock_vmci_notify_waiting_write --
+ *
+ * Determines if the conditions have been met to notify a waiting writer.
+ *
+ * Results: true if a notification should be sent, false otherwise.
+ *
+ * Side effects: None.
+ */
+
+static bool vsock_vmci_notify_waiting_write(struct vsock_vmci_sock *vsk)
+{
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+	bool retval;
+	u64 notify_limit;
+
+	if (!PKT_FIELD(vsk, peer_waiting_write))
+		return false;
+
+#ifdef VSOCK_OPTIMIZATION_FLOW_CONTROL
+	/*
+	 * When the sender blocks, we take that as a sign that the sender is
+	 * faster than the receiver. To reduce the transmit rate of the sender,
+	 * we delay the sending of the read notification by decreasing the
+	 * write_notify_window. The notification is delayed until the number of
+	 * bytes used in the queue drops below the write_notify_window.
+	 */
+
+	if (!PKT_FIELD(vsk, peer_waiting_write_detected)) {
+		PKT_FIELD(vsk, peer_waiting_write_detected) = true;
+		if (PKT_FIELD(vsk, write_notify_window) < PAGE_SIZE) {
+			PKT_FIELD(vsk, write_notify_window) =
+			    PKT_FIELD(vsk, write_notify_min_window);
+		} else {
+			PKT_FIELD(vsk, write_notify_window) -= PAGE_SIZE;
+			if (PKT_FIELD(vsk, write_notify_window) <
+			    PKT_FIELD(vsk, write_notify_min_window))
+				PKT_FIELD(vsk, write_notify_window) =
+				    PKT_FIELD(vsk, write_notify_min_window);
+
+		}
+	}
+	notify_limit = vsk->consume_size - PKT_FIELD(vsk, write_notify_window);
+#else
+	notify_limit = 0;
+#endif
+
+	/*
+	 * For now we ignore the wait information and just see if the free
+	 * space exceeds the notify limit.  Note that improving this function
+	 * to be more intelligent will not require a protocol change and will
+	 * retain compatibility between endpoints with mixed versions of this
+	 * function.
+	 *
+	 * The notify_limit is used to delay notifications in the case where
+	 * flow control is enabled. Below the test is expressed in terms of
+	 * free space in the queue: if free_space > ConsumeSize -
+	 * write_notify_window then notify An alternate way of expressing this
+	 * is to rewrite the expression to use the data ready in the receive
+	 * queue: if write_notify_window > bufferReady then notify as
+	 * free_space == ConsumeSize - bufferReady.
+	 */
+	retval = vmci_qpair_consume_free_space(vsk->qpair) > notify_limit;
+#ifdef VSOCK_OPTIMIZATION_FLOW_CONTROL
+	if (retval) {
+		/*
+		 * Once we notify the peer, we reset the detected flag so the
+		 * next wait will again cause a decrease in the window size.
+		 */
+
+		PKT_FIELD(vsk, peer_waiting_write_detected) = false;
+	}
+#endif
+	return retval;
+#else
+	return true;
+#endif
+}
+
+/*
+ *
+ * vsock_vmci_notify_waiting_read --
+ *
+ * Determines if the conditions have been met to notify a waiting reader.
+ *
+ * Results: true if a notification should be sent, false otherwise.
+ *
+ * Side effects: None.
+ */
+
+static bool vsock_vmci_notify_waiting_read(struct vsock_vmci_sock *vsk)
+{
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+	if (!PKT_FIELD(vsk, peer_waiting_read))
+		return false;
+
+	/*
+	 * For now we ignore the wait information and just see if there is any
+	 * data for our peer to read.  Note that improving this function to be
+	 * more intelligent will not require a protocol change and will retain
+	 * compatibility between endpoints with mixed versions of this
+	 * function.
+	 */
+	return vmci_qpair_produce_buf_ready(vsk->qpair) > 0;
+#else
+	return true;
+#endif
+}
+
+/*
+ *
+ * vsock_vmci_handle_waiting_read --
+ *
+ * Handles an incoming waiting read message.
+ *
+ * Results: None.
+ *
+ * Side effects: May send a notification to the peer, may update socket's wait
+ * info structure.
+ */
+
+static void
+vsock_vmci_handle_waiting_read(struct sock *sk,
+			       struct vsock_packet *pkt,
+			       bool bottom_half,
+			       struct sockaddr_vm *dst,
+			       struct sockaddr_vm *src)
+{
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+	struct vsock_vmci_sock *vsk;
+
+	vsk = vsock_sk(sk);
+
+	PKT_FIELD(vsk, peer_waiting_read) = true;
+	memcpy(&PKT_FIELD(vsk, peer_waiting_read_info), &pkt->u.wait,
+	       sizeof PKT_FIELD(vsk, peer_waiting_read_info));
+
+	if (vsock_vmci_notify_waiting_read(vsk)) {
+		bool sent;
+
+		if (bottom_half)
+			sent = VSOCK_SEND_WROTE_BH(dst, src) > 0;
+		else
+			sent = VSOCK_SEND_WROTE(sk) > 0;
+
+		if (sent)
+			PKT_FIELD(vsk, peer_waiting_read) = false;
+
+	}
+#endif
+}
+
+/*
+ *
+ * vsock_vmci_handle_waiting_write --
+ *
+ * Handles an incoming waiting write message.
+ *
+ * Results: None.
+ *
+ * Side effects: May send a notification to the peer, may update socket's wait
+ * info structure.
+ */
+
+static void
+vsock_vmci_handle_waiting_write(struct sock *sk,
+				struct vsock_packet *pkt,
+				bool bottom_half,
+				struct sockaddr_vm *dst,
+				struct sockaddr_vm *src)
+{
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+	struct vsock_vmci_sock *vsk;
+
+	vsk = vsock_sk(sk);
+
+	PKT_FIELD(vsk, peer_waiting_write) = true;
+	memcpy(&PKT_FIELD(vsk, peer_waiting_write_info), &pkt->u.wait,
+	       sizeof PKT_FIELD(vsk, peer_waiting_write_info));
+
+	if (vsock_vmci_notify_waiting_write(vsk)) {
+		bool sent;
+
+		if (bottom_half)
+			sent = VSOCK_SEND_READ_BH(dst, src) > 0;
+		else
+			sent = VSOCK_SEND_READ(sk) > 0;
+
+		if (sent)
+			PKT_FIELD(vsk, peer_waiting_write) = false;
+
+	}
+#endif
+}
+
+/*
+ *
+ * vsock_vmci_handle_read --
+ *
+ * Handles an incoming read message.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void
+vsock_vmci_handle_read(struct sock *sk,
+		       struct vsock_packet *pkt,
+		       bool bottom_half,
+		       struct sockaddr_vm *dst, struct sockaddr_vm *src)
+{
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+	struct vsock_vmci_sock *vsk;
+
+	vsk = vsock_sk(sk);
+	PKT_FIELD(vsk, sent_waiting_write) = false;
+#endif
+
+	sk->sk_write_space(sk);
+}
+
+/*
+ *
+ * vsock_vmci_send_waiting_read --
+ *
+ * Sends a waiting read notification to this socket's peer.
+ *
+ * Results: true if the datagram is sent successfully, false otherwise.
+ *
+ * Side effects: Our peer will notify us when there is data to read from our
+ * consume queue.
+ */
+
+static bool vsock_vmci_send_waiting_read(struct sock *sk, u64 room_needed)
+{
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+	struct vsock_vmci_sock *vsk;
+	struct vsock_waiting_info waiting_info;
+	u64 tail;
+	u64 head;
+	u64 room_left;
+	bool ret;
+
+	vsk = vsock_sk(sk);
+
+	if (PKT_FIELD(vsk, sent_waiting_read))
+		return true;
+
+	if (PKT_FIELD(vsk, write_notify_window) < vsk->consume_size)
+		PKT_FIELD(vsk, write_notify_window) =
+		    min(PKT_FIELD(vsk, write_notify_window) + PAGE_SIZE,
+			vsk->consume_size);
+
+	vmci_qpair_get_consume_indexes(vsk->qpair, &tail, &head);
+	room_left = vsk->consume_size - head;
+	if (room_needed >= room_left) {
+		waiting_info.offset = room_needed - room_left;
+		waiting_info.generation =
+		    PKT_FIELD(vsk, consume_q_generation) + 1;
+	} else {
+		waiting_info.offset = head + room_needed;
+		waiting_info.generation = PKT_FIELD(vsk, consume_q_generation);
+	}
+
+	ret = VSOCK_SEND_WAITING_READ(sk, &waiting_info) > 0;
+	if (ret)
+		PKT_FIELD(vsk, sent_waiting_read) = true;
+
+	return ret;
+#else
+	return true;
+#endif
+}
+
+/*
+ *
+ * vsock_vmci_send_waiting_write --
+ *
+ * Sends a waiting write notification to this socket's peer.
+ *
+ * Results: true if the datagram is sent successfully or does not need to be
+ * sent. false otherwise.
+ *
+ * Side effects: Our peer will notify us when there is room to write in to our
+ * produce queue.
+ */
+
+static bool vsock_vmci_send_waiting_write(struct sock *sk, u64 room_needed)
+{
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+	struct vsock_vmci_sock *vsk;
+	struct vsock_waiting_info waiting_info;
+	u64 tail;
+	u64 head;
+	u64 room_left;
+	bool ret;
+
+	vsk = vsock_sk(sk);
+
+	if (PKT_FIELD(vsk, sent_waiting_write))
+		return true;
+
+	vmci_qpair_get_produce_indexes(vsk->qpair, &tail, &head);
+	room_left = vsk->produce_size - tail;
+	if (room_needed + 1 >= room_left) {
+		/* Wraps around to current generation. */
+		waiting_info.offset = room_needed + 1 - room_left;
+		waiting_info.generation = PKT_FIELD(vsk, produce_q_generation);
+	} else {
+		waiting_info.offset = tail + room_needed + 1;
+		waiting_info.generation =
+		    PKT_FIELD(vsk, produce_q_generation) - 1;
+	}
+
+	ret = VSOCK_SEND_WAITING_WRITE(sk, &waiting_info) > 0;
+	if (ret)
+		PKT_FIELD(vsk, sent_waiting_write) = true;
+
+	return ret;
+#else
+	return true;
+#endif
+}
+
+/*
+ *
+ * vsock_vmci_send_read_notification --
+ *
+ * Sends a read notification to this socket's peer.
+ *
+ * Results: >= 0 if the datagram is sent successfully, negative error value
+ * otherwise.
+ *
+ * Side effects: None.
+ */
+
+static int vsock_vmci_send_read_notification(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk;
+	bool sent_read;
+	unsigned int retries;
+	int err;
+
+	vsk = vsock_sk(sk);
+	sent_read = false;
+	retries = 0;
+	err = 0;
+
+	if (vsock_vmci_notify_waiting_write(vsk)) {
+		/*
+		 * Notify the peer that we have read, retrying the send on
+		 * failure up to our maximum value.  XXX For now we just log
+		 * the failure, but later we should schedule a work item to
+		 * handle the resend until it succeeds.  That would require
+		 * keeping track of work items in the vsk and cleaning them up
+		 * upon socket close.
+		 */
+		while (!(vsk->peer_shutdown & RCV_SHUTDOWN) &&
+		       !sent_read && retries < VSOCK_MAX_DGRAM_RESENDS) {
+			err = VSOCK_SEND_READ(sk);
+			if (err >= 0)
+				sent_read = true;
+
+			retries++;
+		}
+
+		if (retries >= VSOCK_MAX_DGRAM_RESENDS)
+			printk
+			    ("%p unable to send read notify to peer.\n",
+			     sk);
+		else
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+			PKT_FIELD(vsk, peer_waiting_write) = false;
+#endif
+
+	}
+	return err;
+}
+
+/*
+ *
+ * vsock_vmci_handle_wrote --
+ *
+ * Handles an incoming wrote message.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void
+vsock_vmci_handle_wrote(struct sock *sk,
+			struct vsock_packet *pkt,
+			bool bottom_half,
+			struct sockaddr_vm *dst, struct sockaddr_vm *src)
+{
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+	struct vsock_vmci_sock *vsk;
+
+	vsk = vsock_sk(sk);
+	PKT_FIELD(vsk, sent_waiting_read) = false;
+#endif
+
+	sk->sk_data_ready(sk, 0);
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_socket_init --
+ *
+ * Function that is called after a socket is created and before any notify ops
+ * are used.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void vsock_vmci_notify_pkt_socket_init(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk;
+	vsk = vsock_sk(sk);
+
+	PKT_FIELD(vsk, write_notify_window) = PAGE_SIZE;
+	PKT_FIELD(vsk, write_notify_min_window) = PAGE_SIZE;
+	PKT_FIELD(vsk, peer_waiting_read) = false;
+	PKT_FIELD(vsk, peer_waiting_write) = false;
+	PKT_FIELD(vsk, peer_waiting_write_detected) = false;
+	PKT_FIELD(vsk, sent_waiting_read) = false;
+	PKT_FIELD(vsk, sent_waiting_write) = false;
+	PKT_FIELD(vsk, produce_q_generation) = 0;
+	PKT_FIELD(vsk, consume_q_generation) = 0;
+
+	memset(&PKT_FIELD(vsk, peer_waiting_read_info), 0,
+	       sizeof PKT_FIELD(vsk, peer_waiting_read_info));
+	memset(&PKT_FIELD(vsk, peer_waiting_write_info), 0,
+	       sizeof PKT_FIELD(vsk, peer_waiting_write_info));
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_socket_destruct --
+ *
+ * Function that is called when the socket is being released.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void vsock_vmci_notify_pkt_socket_destruct(struct sock *sk)
+{
+	return;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_poll_in --
+ *
+ * Called by the poll function to figure out if there is data to read and to
+ * setup future notifications if needed. Only called on sockets that aren't
+ * shutdown for recv.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_poll_in(struct sock *sk,
+			      size_t target, bool *data_ready_now)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	if (vsock_vmci_stream_has_data(vsk)) {
+		*data_ready_now = true;
+	} else {
+		/*
+		 * We can't read right now because there is nothing in the
+		 * queue. Ask for notifications when there is something to
+		 * read.
+		 */
+		if (sk->sk_state == SS_CONNECTED) {
+			if (!vsock_vmci_send_waiting_read(sk, 1))
+				return -1;
+
+		}
+		*data_ready_now = false;
+	}
+
+	return 0;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_poll_out
+ *
+ * Called by the poll function to figure out if there is space to write and to
+ * setup future notifications if needed. Only called on a connected socket that
+ * isn't shutdown for send.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_poll_out(struct sock *sk,
+			       size_t target, bool *space_avail_now)
+{
+	s64 produce_q_free_space;
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	produce_q_free_space = vsock_vmci_stream_has_space(vsk);
+	if (produce_q_free_space > 0) {
+		*space_avail_now = true;
+		return 0;
+	} else if (produce_q_free_space == 0) {
+		/*
+		 * This is a connected socket but we can't currently send data.
+		 * Notify the peer that we are waiting if the queue is full. We
+		 * only send a waiting write if the queue is full because
+		 * otherwise we end up in an infinite WAITING_WRITE, READ,
+		 * WAITING_WRITE, READ, etc. loop. Treat failing to send the
+		 * notification as a socket error, passing that back through
+		 * the mask.
+		 */
+		if (!vsock_vmci_send_waiting_write(sk, 1))
+			return -1;
+
+		*space_avail_now = false;
+	}
+
+	return 0;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_recv_init --
+ *
+ * Called at the start of a stream recv call with the socket lock held.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_recv_init(struct sock *sk,
+				size_t target,
+				struct vsock_vmci_recv_notify_data *data)
+{
+	struct vsock_vmci_sock *vsk;
+
+	vsk = vsock_sk(sk);
+
+#ifdef VSOCK_OPTIMIZATION_WAITING_NOTIFY
+	data->consume_head = 0;
+	data->produce_tail = 0;
+#ifdef VSOCK_OPTIMIZATION_FLOW_CONTROL
+	data->notify_on_block = false;
+
+	if (PKT_FIELD(vsk, write_notify_min_window) < target + 1) {
+		PKT_FIELD(vsk, write_notify_min_window) = target + 1;
+		if (PKT_FIELD(vsk, write_notify_window) <
+		    PKT_FIELD(vsk, write_notify_min_window)) {
+			/*
+			 * If the current window is smaller than the new
+			 * minimal window size, we need to reevaluate whether
+			 * we need to notify the sender. If the number of ready
+			 * bytes are smaller than the new window, we need to
+			 * send a notification to the sender before we block.
+			 */
+
+			PKT_FIELD(vsk, write_notify_window) =
+			    PKT_FIELD(vsk, write_notify_min_window);
+			data->notify_on_block = true;
+		}
+	}
+#endif
+#endif
+
+	return 0;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_recv_pre_block --
+ *
+ * Called right before a socket is about to block with the socket lock held.
+ * The socket lock may have been released between the entry function and the
+ * preblock call.
+ *
+ * Note: This function may be called multiple times before the post block
+ * function is called.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_recv_pre_block(struct sock *sk,
+				     size_t target,
+				     struct vsock_vmci_recv_notify_data *data)
+{
+	int err = 0;
+
+	/* Notify our peer that we are waiting for data to read. */
+	if (!vsock_vmci_send_waiting_read(sk, target)) {
+		err = -EHOSTUNREACH;
+		return err;
+	}
+#ifdef VSOCK_OPTIMIZATION_FLOW_CONTROL
+	if (data->notify_on_block) {
+		err = vsock_vmci_send_read_notification(sk);
+		if (err < 0)
+			return err;
+
+		data->notify_on_block = false;
+	}
+#endif
+
+	return err;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_recv_pre_dequeue --
+ *
+ * Called right before we dequeue / peek data from a socket.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_recv_pre_dequeue(struct sock *sk,
+				       size_t target,
+				       struct vsock_vmci_recv_notify_data *data)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	/*
+	 * Now consume up to len bytes from the queue.  Note that since we have
+	 * the socket locked we should copy at least ready bytes.
+	 */
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+	vmci_qpair_get_consume_indexes(vsk->qpair,
+				       &data->produce_tail,
+				       &data->consume_head);
+#endif
+
+	return 0;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_recv_post_dequeue --
+ *
+ * Called right after we dequeue / peek data from a socket.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_recv_post_dequeue(struct sock *sk,
+				size_t target,
+				ssize_t copied,
+				bool data_read,
+				struct vsock_vmci_recv_notify_data *data)
+{
+	struct vsock_vmci_sock *vsk;
+	int err;
+
+	vsk = vsock_sk(sk);
+	err = 0;
+
+	if (data_read) {
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+		/*
+		 * Detect a wrap-around to maintain queue generation.  Note
+		 * that this is safe since we hold the socket lock across the
+		 * two queue pair operations.
+		 */
+		if (copied >= vsk->consume_size - data->consume_head)
+			PKT_FIELD(vsk, consume_q_generation)++;
+
+#endif
+
+		err = vsock_vmci_send_read_notification(sk);
+		if (err < 0)
+			return err;
+
+	}
+	return err;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_send_init --
+ *
+ * Called at the start of a stream send call with the socket lock held.
+ *
+ * Results: 0 on success. A negative error code on failure.
+ *
+ * Side effects:
+ */
+
+static int
+vsock_vmci_notify_pkt_send_init(struct sock *sk,
+				struct vsock_vmci_send_notify_data *data)
+{
+#ifdef VSOCK_OPTIMIZATION_WAITING_NOTIFY
+	data->consume_head = 0;
+	data->produce_tail = 0;
+#endif
+
+	return 0;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_send_pre_block --
+ *
+ * Called right before a socket is about to block with the socket lock held.
+ * The socket lock may have been released between the entry function and the
+ * preblock call.
+ *
+ * Note: This function may be called multiple times before the post block
+ * function is called.
+ *
+ * Results. 0 on success. A negative error code on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_send_pre_block(struct sock *sk,
+				     struct vsock_vmci_send_notify_data *data)
+{
+	/* Notify our peer that we are waiting for room to write. */
+	if (!vsock_vmci_send_waiting_write(sk, 1))
+		return -EHOSTUNREACH;
+
+	return 0;
+}
+
+/*
+ *
+ * vsock_vmci_notifySendPreEnqueue --
+ *
+ * Called right before we Enqueue to a socket.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_send_pre_enqueue(struct sock *sk,
+				       struct vsock_vmci_send_notify_data *data)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+	vmci_qpair_get_produce_indexes(vsk->qpair,
+				       &data->produce_tail,
+				       &data->consume_head);
+#endif
+
+	return 0;
+}
+
+/*
+ *
+ * vsock_vmci_notifySendPostEnqueue --
+ *
+ * Called right after we enqueue data to a socket.
+ *
+ * Results: 0 on success. Negative error on failure.
+ *
+ * Side effects: None.
+ */
+
+static int
+vsock_vmci_notify_pkt_send_post_enqueue(struct sock *sk,
+				ssize_t written,
+				struct vsock_vmci_send_notify_data *data)
+{
+	int err = 0;
+	struct vsock_vmci_sock *vsk;
+	bool sent_wrote = false;
+	int retries = 0;
+
+	vsk = vsock_sk(sk);
+
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+	/*
+	 * Detect a wrap-around to maintain queue generation.  Note that this
+	 * is safe since we hold the socket lock across the two queue pair
+	 * operations.
+	 */
+	if (written >= vsk->produce_size - data->produce_tail)
+		PKT_FIELD(vsk, produce_q_generation)++;
+
+#endif
+
+	if (vsock_vmci_notify_waiting_read(vsk)) {
+		/*
+		 * Notify the peer that we have written, retrying the send on
+		 * failure up to our maximum value. See the XXX comment for the
+		 * corresponding piece of code in StreamRecvmsg() for potential
+		 * improvements.
+		 */
+		while (!(vsk->peer_shutdown & RCV_SHUTDOWN) &&
+		       !sent_wrote && retries < VSOCK_MAX_DGRAM_RESENDS) {
+			err = VSOCK_SEND_WROTE(sk);
+			if (err >= 0)
+				sent_wrote = true;
+
+			retries++;
+		}
+
+		if (retries >= VSOCK_MAX_DGRAM_RESENDS) {
+			printk
+			     (" %p unable to send wrote notify to peer.\n",
+			     sk);
+			return err;
+		} else {
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+			PKT_FIELD(vsk, peer_waiting_read) = false;
+#endif
+		}
+	}
+	return err;
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_handle_pkt
+ *
+ * Called when a notify packet is recieved for a socket in the connected state.
+ * Note this might be called from a bottom half.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void
+vsock_vmci_notify_pkt_handle_pkt(struct sock *sk,
+				 struct vsock_packet *pkt,
+				 bool bottom_half,
+				 struct sockaddr_vm *dst,
+				 struct sockaddr_vm *src, bool *pkt_processed)
+{
+	bool processed = false;
+
+	switch (pkt->type) {
+	case VSOCK_PACKET_TYPE_WROTE:
+		vsock_vmci_handle_wrote(sk, pkt, bottom_half, dst, src);
+		processed = true;
+		break;
+	case VSOCK_PACKET_TYPE_READ:
+		vsock_vmci_handle_read(sk, pkt, bottom_half, dst, src);
+		processed = true;
+		break;
+	case VSOCK_PACKET_TYPE_WAITING_WRITE:
+		vsock_vmci_handle_waiting_write(sk, pkt, bottom_half, dst, src);
+		processed = true;
+		break;
+
+	case VSOCK_PACKET_TYPE_WAITING_READ:
+		vsock_vmci_handle_waiting_read(sk, pkt, bottom_half, dst, src);
+		processed = true;
+		break;
+	}
+
+	if (pkt_processed)
+		*pkt_processed = processed;
+
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_process_request
+ *
+ * Called near the end of process request.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void vsock_vmci_notify_pkt_process_request(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	PKT_FIELD(vsk, write_notify_window) = vsk->consume_size;
+	if (vsk->consume_size < PKT_FIELD(vsk, write_notify_min_window))
+		PKT_FIELD(vsk, write_notify_min_window) = vsk->consume_size;
+
+}
+
+/*
+ *
+ * vsock_vmci_notify_pkt_process_negotiate
+ *
+ * Called near the end of process negotiate.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+static void vsock_vmci_notify_pkt_process_negotiate(struct sock *sk)
+{
+	struct vsock_vmci_sock *vsk = vsock_sk(sk);
+
+	PKT_FIELD(vsk, write_notify_window) = vsk->consume_size;
+	if (vsk->consume_size < PKT_FIELD(vsk, write_notify_min_window))
+		PKT_FIELD(vsk, write_notify_min_window) = vsk->consume_size;
+
+}
+
+/* Socket control packet based operations. */
+struct vsock_vmci_notify_ops vsock_vmci_notify_pkt_ops = {
+	vsock_vmci_notify_pkt_socket_init,
+	vsock_vmci_notify_pkt_socket_destruct,
+	vsock_vmci_notify_pkt_poll_in,
+	vsock_vmci_notify_pkt_poll_out,
+	vsock_vmci_notify_pkt_handle_pkt,
+	vsock_vmci_notify_pkt_recv_init,
+	vsock_vmci_notify_pkt_recv_pre_block,
+	vsock_vmci_notify_pkt_recv_pre_dequeue,
+	vsock_vmci_notify_pkt_recv_post_dequeue,
+	vsock_vmci_notify_pkt_send_init,
+	vsock_vmci_notify_pkt_send_pre_block,
+	vsock_vmci_notify_pkt_send_pre_enqueue,
+	vsock_vmci_notify_pkt_send_post_enqueue,
+	vsock_vmci_notify_pkt_process_request,
+	vsock_vmci_notify_pkt_process_negotiate,
+};
diff --git a/net/vmw_vsock/notify.h b/net/vmw_vsock/notify.h
new file mode 100644
index 0000000..70b20ef
--- /dev/null
+++ b/net/vmw_vsock/notify.h
@@ -0,0 +1,130 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2009-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * notify.h --
+ *
+ * Notify functions for Linux VSocket module.
+ */
+
+#ifndef __NOTIFY_H__
+#define __NOTIFY_H__
+
+#include <linux/types.h>
+
+#include "vsock_common.h"
+#include "vsock_packet.h"
+
+/* Comment this out to compare with old protocol. */
+#define VSOCK_OPTIMIZATION_WAITING_NOTIFY 1
+#if defined(VSOCK_OPTIMIZATION_WAITING_NOTIFY)
+/* Comment this out to remove flow control for "new" protocol */
+#define VSOCK_OPTIMIZATION_FLOW_CONTROL 1
+#endif
+
+#define VSOCK_MAX_DGRAM_RESENDS       10
+
+#define NOTIFYCALLRET(vsk, rv, mod_fn, args...)			\
+	do {							\
+		if (vsk->notify_ops &&				\
+		    vsk->notify_ops->mod_fn != NULL)		\
+			rv = (vsk->notify_ops->mod_fn)(args);	\
+		else						\
+			rv = 0;					\
+								\
+	} while (0)
+
+#define NOTIFYCALL(vsk, mod_fn, args...)			\
+	do {							\
+		if (vsk->notify_ops &&				\
+		    vsk->notify_ops->mod_fn != NULL)		\
+			(vsk->notify_ops->mod_fn)(args);	\
+								\
+	} while (0)
+
+struct vsock_vmci_notify_pkt {
+	u64 write_notify_window;
+	u64 write_notify_min_window;
+	bool peer_waiting_read;
+	bool peer_waiting_write;
+	bool peer_waiting_write_detected;
+	bool sent_waiting_read;
+	bool sent_waiting_write;
+	struct vsock_waiting_info peer_waiting_read_info;
+	struct vsock_waiting_info peer_waiting_write_info;
+	u64 produce_q_generation;
+	u64 consume_q_generation;
+};
+
+struct vsock_vmci_notify_pkt_q_state {
+	u64 write_notify_window;
+	u64 write_notify_min_window;
+	bool peer_waiting_write;
+	bool peer_waiting_write_detected;
+};
+
+union vsock_vmci_notify {
+	struct vsock_vmci_notify_pkt pkt;
+	struct vsock_vmci_notify_pkt_q_state pkt_q_state;
+};
+
+struct vsock_vmci_recv_notify_data {
+	u64 consume_head;
+	u64 produce_tail;
+	bool notify_on_block;
+};
+
+struct vsock_vmci_send_notify_data {
+	u64 consume_head;
+	u64 produce_tail;
+};
+
+/* Socket notification callbacks. */
+struct vsock_vmci_notify_ops {
+	void (*socket_init) (struct sock *sk);
+	void (*socket_destruct) (struct sock *sk);
+	int (*poll_in) (struct sock *sk, size_t target,
+			  bool *data_ready_now);
+	int (*poll_out) (struct sock *sk, size_t target,
+			   bool *space_avail_now);
+	void (*handle_notify_pkt) (struct sock *sk, struct vsock_packet *pkt,
+				   bool bottom_half, struct sockaddr_vm *dst,
+				   struct sockaddr_vm *src,
+				   bool *pkt_processed);
+	int (*recv_init) (struct sock *sk, size_t target,
+			  struct vsock_vmci_recv_notify_data *data);
+	int (*recv_pre_block) (struct sock *sk, size_t target,
+			       struct vsock_vmci_recv_notify_data *data);
+	int (*recv_pre_dequeue) (struct sock *sk, size_t target,
+				 struct vsock_vmci_recv_notify_data *data);
+	int (*recv_post_dequeue) (struct sock *sk, size_t target,
+				  ssize_t copied, bool data_read,
+				  struct vsock_vmci_recv_notify_data *data);
+	int (*send_init) (struct sock *sk,
+			  struct vsock_vmci_send_notify_data *data);
+	int (*send_pre_block) (struct sock *sk,
+			       struct vsock_vmci_send_notify_data *data);
+	int (*send_pre_enqueue) (struct sock *sk,
+				 struct vsock_vmci_send_notify_data *data);
+	int (*send_post_enqueue) (struct sock *sk, ssize_t written,
+				  struct vsock_vmci_send_notify_data *data);
+	void (*process_request) (struct sock *sk);
+	void (*process_negotiate) (struct sock *sk);
+};
+
+extern struct vsock_vmci_notify_ops vsock_vmci_notify_pkt_ops;
+extern struct vsock_vmci_notify_ops vsock_vmci_notify_pkt_q_state_ops;
+
+#endif /* __NOTIFY_H__ */

^ permalink raw reply related

* [PATCH 2/6] VSOCK: vsock address implementaion.
From: George Zhang @ 2012-11-21 20:39 UTC (permalink / raw)
  To: netdev, linux-kernel, georgezhang, virtualization
  Cc: pv-drivers, gregkh, davem
In-Reply-To: <20121121203715.14395.27632.stgit@promb-2n-dhcp175.eng.vmware.com>

VSOCK linux address code implementation.

Signed-off-by: George Zhang <georgezhang@vmware.com>
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andy King <acking@vmware.com>

---
 net/vmw_vsock/vsock_addr.c |  246 ++++++++++++++++++++++++++++++++++++++++++++
 net/vmw_vsock/vsock_addr.h |   40 +++++++
 2 files changed, 286 insertions(+), 0 deletions(-)
 create mode 100644 net/vmw_vsock/vsock_addr.c
 create mode 100644 net/vmw_vsock/vsock_addr.h

diff --git a/net/vmw_vsock/vsock_addr.c b/net/vmw_vsock/vsock_addr.c
new file mode 100644
index 0000000..35eeb14
--- /dev/null
+++ b/net/vmw_vsock/vsock_addr.c
@@ -0,0 +1,246 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2007-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * vsockAddr.c --
+ *
+ * VSockets address implementation.
+ */
+
+#include <linux/types.h>
+#include <linux/socket.h>
+#include <linux/stddef.h>	/* for NULL */
+#include <net/sock.h>
+
+#include "vsock_common.h"
+
+/*
+ *
+ * vsock_addr_init --
+ *
+ * Initialize the given address with the given context id and port. This will
+ * clear the address, set the correct family, and add the given values.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+void vsock_addr_init(struct sockaddr_vm *addr, u32 cid, u32 port)
+{
+	memset(addr, 0, sizeof *addr);
+
+	addr->svm_family = AF_VSOCK;
+	addr->svm_cid = cid;
+	addr->svm_port = port;
+}
+
+/*
+ *
+ * vsock_addr_validate --
+ *
+ * Try to validate the given address.  The address must not be null and must
+ * have the correct address family.  Any reserved fields must be zero.
+ *
+ * Results: 0 on success, EFAULT if the address is null, EAFNOSUPPORT if the
+ * address is of the wrong family, and EINVAL if the reserved fields are not
+ * zero.
+ *
+ * Side effects: None.
+ */
+
+int vsock_addr_validate(const struct sockaddr_vm *addr)
+{
+	if (!addr)
+		return -EFAULT;
+
+	if (addr->svm_family != AF_VSOCK)
+		return -EAFNOSUPPORT;
+
+	if (addr->svm_zero[0] != 0)
+		return -EINVAL;
+
+	return 0;
+}
+
+/*
+ *
+ * vsock_addr_bound --
+ *
+ * Determines whether the provided address is bound.
+ *
+ * Results: TRUE if the address structure is bound, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool vsock_addr_bound(const struct sockaddr_vm *addr)
+{
+	return addr->svm_port != VMADDR_PORT_ANY;
+}
+
+/*
+ *
+ * vsock_addr_unbind --
+ *
+ * Unbind the given addresss.
+ *
+ * Results: None.
+ *
+ * Side effects: None.
+ */
+
+void vsock_addr_unbind(struct sockaddr_vm *addr)
+{
+	vsock_addr_init(addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
+}
+
+/*
+ *
+ * vsock_addr_equals_addr --
+ *
+ * Determine if the given addresses are equal.
+ *
+ * Results: TRUE if the addresses are equal, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool vsock_addr_equals_addr(const struct sockaddr_vm *addr,
+			    const struct sockaddr_vm *other)
+{
+	return addr->svm_cid == other->svm_cid &&
+		addr->svm_port == other->svm_port;
+}
+
+/*
+ *
+ * vsock_addr_equals_addr_any --
+ *
+ * Determine if the given addresses are equal. Will accept either an exact
+ * match or one where the rids match and that either the cids match or are set
+ * to VMADDR_CID_ANY.
+ *
+ * Results: TRUE if the addresses are equal, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool vsock_addr_equals_addr_any(const struct sockaddr_vm *addr,
+				const struct sockaddr_vm *other)
+{
+	return (addr->svm_cid == VMADDR_CID_ANY ||
+		other->svm_cid == VMADDR_CID_ANY ||
+		addr->svm_cid == other->svm_cid) &&
+	       addr->svm_port == other->svm_port;
+}
+
+/*
+ *
+ * vsock_addr_equals_handle_port --
+ *
+ * Determines if the given address matches the given handle and port.
+ *
+ * Results: TRUE if the address matches the handle and port, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool vsock_addr_equals_handle_port(const struct sockaddr_vm *addr,
+				   struct vmci_handle handle, u32 port)
+{
+	return addr->svm_cid == VMCI_HANDLE_TO_CONTEXT_ID(handle) &&
+		addr->svm_port == port;
+}
+
+/*
+ *
+ * vsock_addr_cast --
+ *
+ * Try to cast the given generic address to a VM address.  The given length
+ * must match that of a VM address and the address must be valid. The
+ * "out_addr" parameter contains the address if successful.
+ *
+ * Results: 0 on success, EFAULT if the length is too small.  See
+ * vsock_addr_validate() for other possible return codes.
+ *
+ * Side effects: None.
+ */
+
+int vsock_addr_cast(const struct sockaddr *addr,
+		    size_t len, struct sockaddr_vm **out_addr)
+{
+	if (len < sizeof **out_addr)
+		return -EFAULT;
+
+	*out_addr = (struct sockaddr_vm *)addr;
+	return vsock_addr_validate(*out_addr);
+}
+
+/*
+ *
+ * vsock_addr_socket_context_stream --
+ *
+ * Determines whether the provided context id represents a context that
+ * contains a stream socket endpoints.
+ *
+ * Results: TRUE if the context does have socket endpoints, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool vsock_addr_socket_context_stream(u32 cid)
+{
+	static const vmci_id non_socket_contexts[] = {
+		VMCI_HYPERVISOR_CONTEXT_ID,
+		VMCI_WELL_KNOWN_CONTEXT_ID,
+	};
+	int i;
+
+	BUILD_BUG_ON(sizeof cid != sizeof *non_socket_contexts);
+
+	for (i = 0; i < ARRAY_SIZE(non_socket_contexts); i++) {
+		if (cid == non_socket_contexts[i])
+			return false;
+
+	}
+
+	return true;
+}
+
+/*
+ *
+ * vsock_addr_socket_context_dgram --
+ *
+ * Determines whether the provided <context id, resource id> represent a
+ * protected datagram endpoint.
+ *
+ * Results: TRUE if the context does have socket endpoints, FALSE otherwise.
+ *
+ * Side effects: None.
+ */
+
+bool vsock_addr_socket_context_dgram(u32 cid, u32 rid)
+{
+	if (cid == VMCI_HYPERVISOR_CONTEXT_ID) {
+		/*
+		 * Registrations of PBRPC Servers do not modify VMX/Hypervisor
+		 * state and are allowed.
+		 */
+		return rid == VMCI_UNITY_PBRPC_REGISTER;
+	}
+
+	return true;
+}
diff --git a/net/vmw_vsock/vsock_addr.h b/net/vmw_vsock/vsock_addr.h
new file mode 100644
index 0000000..18f023d
--- /dev/null
+++ b/net/vmw_vsock/vsock_addr.h
@@ -0,0 +1,40 @@
+/*
+ * VMware vSockets Driver
+ *
+ * Copyright (C) 2007-2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+/*
+ * vsockAddr.h --
+ *
+ * VSockets address constants, types and functions.
+ */
+
+#ifndef _VSOCK_ADDR_H_
+#define _VSOCK_ADDR_H_
+
+void vsock_addr_init(struct sockaddr_vm *addr, u32 cid, u32 port);
+int vsock_addr_validate(const struct sockaddr_vm *addr);
+bool vsock_addr_bound(const struct sockaddr_vm *addr);
+void vsock_addr_unbind(struct sockaddr_vm *addr);
+bool vsock_addr_equals_addr(const struct sockaddr_vm *addr,
+			    const struct sockaddr_vm *other);
+bool vsock_addr_equals_addr_any(const struct sockaddr_vm *addr,
+				const struct sockaddr_vm *other);
+bool vsock_addr_equals_handle_port(const struct sockaddr_vm *addr,
+				   struct vmci_handle handle, u32 port);
+int vsock_addr_cast(const struct sockaddr *addr, size_t len,
+		    struct sockaddr_vm **out_addr);
+bool vsock_addr_socket_context_stream(u32 cid);
+bool vsock_addr_socket_context_dgram(u32 cid, u32 rid);
+
+#endif

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox