* [PATCH 1/2]
@ 2007-03-06 9:36 Gerrit Renker
0 siblings, 0 replies; 19+ messages in thread
From: Gerrit Renker @ 2007-03-06 9:36 UTC (permalink / raw)
To: dccp
[DCCP]: Correctly split CCID half connections
This fixes a bug caused by a previous patch, which causes DCCP
servers in LISTEN state to not receive packets.
This patch changes the logic so that
* servers in either LISTEN or OPEN state get the RX half connection packets
* clients in OPEN state get the TX half connection packets
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
net/dccp/input.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -258,10 +258,10 @@ int dccp_rcv_established(struct sock *sk
* (only one is active at a time); when moving to bidirectional
* service, this needs to be revised.
*/
- if (dccp_sk(sk)->dccps_role = DCCP_ROLE_SERVER)
- ccid_hc_rx_packet_recv(dp->dccps_hc_rx_ccid, sk, skb);
- else
+ if (dccp_sk(sk)->dccps_role = DCCP_ROLE_CLIENT)
ccid_hc_tx_packet_recv(dp->dccps_hc_tx_ccid, sk, skb);
+ else /* listening or connected server */
+ ccid_hc_rx_packet_recv(dp->dccps_hc_rx_ccid, sk, skb);
return __dccp_rcv_established(sk, skb, dh, len);
discard:
@@ -505,10 +505,10 @@ int dccp_rcv_state_process(struct sock *
goto discard;
/* XXX see the comments in dccp_rcv_established about this */
- if (dccp_sk(sk)->dccps_role = DCCP_ROLE_SERVER)
- ccid_hc_rx_packet_recv(dp->dccps_hc_rx_ccid, sk, skb);
- else
+ if (dccp_sk(sk)->dccps_role = DCCP_ROLE_CLIENT)
ccid_hc_tx_packet_recv(dp->dccps_hc_tx_ccid, sk, skb);
+ else
+ ccid_hc_rx_packet_recv(dp->dccps_hc_rx_ccid, sk, skb);
}
/*
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/2]
2008-12-03 12:29 [PATCH 0/2] fusion reset Bernd Schubert
@ 2008-12-03 12:31 ` Bernd Schubert
0 siblings, 0 replies; 19+ messages in thread
From: Bernd Schubert @ 2008-12-03 12:31 UTC (permalink / raw)
To: linux-scsi
Cc: Sathya Prakash, Eric Moore, James Bottomley, DL-MPT Fusion Linux
On dual port 53C1030 based HBAs such as the LSI22320R, the hard reset handler
will cause DID_SOFT_ERROR for innocent devices on the second port.
Introduce a mpt_SoftResetHandler() which doesn't cause this issue and
slightly improve mpt_HardResetHandler().
This is mostly a backport of the fusion-4.x driver available from LSI.
Signed-off-by: Bernd Schubert <bs@q-leap.de>
drivers/message/fusion/mptbase.c | 211 ++++++++++++++++++++++++----
drivers/message/fusion/mptbase.h | 11 +
drivers/message/fusion/mptctl.c | 7
drivers/message/fusion/mptsas.c | 4
drivers/message/fusion/mptscsih.c | 35 ++--
5 files changed, 218 insertions(+), 50 deletions(-)
Index: linux-2.6/drivers/message/fusion/mptbase.c
===================================================================
--- linux-2.6.orig/drivers/message/fusion/mptbase.c
+++ linux-2.6/drivers/message/fusion/mptbase.c
@@ -5945,7 +5945,7 @@ mpt_timer_expired(unsigned long data)
dcprintk(ioc, printk(MYIOC_s_DEBUG_FMT "mpt_timer_expired! \n", ioc->name));
/* Perform a FW reload */
- if (mpt_HardResetHandler(ioc, NO_SLEEP) < 0)
+ if (mpt_SoftHardResetHandler(ioc, NO_SLEEP) < 0)
printk(MYIOC_s_WARN_FMT "Firmware Reload FAILED!\n", ioc->name);
/* No more processing.
@@ -6319,6 +6319,134 @@ mpt_print_ioc_summary(MPT_ADAPTER *ioc,
/*
* Reset Handling
*/
+
+/**
+ * mpt_SoftResetHandler - Issues a less expensive reset
+ * @ioc: Pointer to MPT_ADAPTER structure
+ * @sleepFlag: Indicates if sleep or schedule must be called.
+
+ *
+ * Returns 0 for SUCCESS or -1 if FAILED.
+ *
+ * Message Unit Reset - instructs the IOC to reset the Reply Post and
+ * Free FIFO's. All the Message Frames on Reply Free FIFO are discarded.
+ * All posted buffers are freed, and event notification is turned off.
+ * IOC doesnt reply to any outstanding request. This will transfer IOC
+ * to READY state.
+ **/
+static int
+mpt_SoftResetHandler(MPT_ADAPTER *ioc, int sleepFlag)
+{
+ int rc;
+ int ii;
+ u8 cb_idx;
+ unsigned long flags;
+ u32 ioc_state;
+ unsigned long time_count;
+
+ dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "SoftResetHandler Entered!\n",
+ ioc->name));
+
+ ioc_state = mpt_GetIocState(ioc, 0) & MPI_IOC_STATE_MASK;
+ if (ioc_state == MPI_IOC_STATE_FAULT ||
+ ioc_state == MPI_IOC_STATE_RESET) {
+ dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT
+ "skipping, either in FAULT or RESET state!\n", ioc->name));
+ return -1;
+ }
+
+ spin_lock_irqsave(&ioc->diagLock, flags);
+ if (ioc->ioc_reset_in_progress) {
+ spin_unlock_irqrestore(&ioc->diagLock, flags);
+ return -1;
+ }
+ ioc->ioc_reset_in_progress = 1;
+ spin_unlock_irqrestore(&ioc->diagLock, flags);
+
+ rc = -1;
+
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx])
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_SETUP_RESET);
+ }
+
+ /* Disable reply interrupts (also blocks FreeQ) */
+ CHIPREG_WRITE32(&ioc->chip->IntMask, 0xFFFFFFFF);
+ ioc->active = 0;
+ time_count = jiffies;
+ rc = SendIocReset(ioc, MPI_FUNCTION_IOC_MESSAGE_UNIT_RESET, sleepFlag);
+ if (rc != 0)
+ goto out;
+
+ /* MPT_IOC_PRE_RESET clears pending requests, but MUR
+ * (MPI_FUNCTION_IOC_MESSAGE_UNIT_RESET) tries to find a DMA request and
+ * will fault the fw if no request is found. So we need to do
+ * MPT_IOC_PRE_RESET after MUR */
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx])
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_PRE_RESET);
+ }
+
+ ioc_state = mpt_GetIocState(ioc, 0) & MPI_IOC_STATE_MASK;
+ if (ioc_state != MPI_IOC_STATE_READY)
+ goto out;
+
+ for (ii = 0; ii < 5; ii++) {
+ /* Get IOC facts! Allow 5 retries */
+ rc = GetIocFacts(ioc, sleepFlag, MPT_HOSTEVENT_IOC_RECOVER);
+ if (rc == 0)
+ break;
+ if (sleepFlag == CAN_SLEEP)
+ msleep(100);
+ else
+ mdelay(100);
+ }
+ if (ii == 5)
+ goto out;
+
+ rc = PrimeIocFifos(ioc);
+ if (rc != 0)
+ goto out;
+
+ rc = SendIocInit(ioc, sleepFlag);
+ if (rc != 0)
+ goto out;
+
+ rc = SendEventNotification(ioc, 1);
+ if (rc != 0)
+ goto out;
+
+ if (ioc->hard_resets < -1)
+ ioc->hard_resets++;
+
+ /*
+ * At this point, we know soft reset succeeded.
+ */
+
+ ioc->active = 1;
+ CHIPREG_WRITE32(&ioc->chip->IntMask, MPI_HIM_DIM);
+
+ out:
+ spin_lock_irqsave(&ioc->diagLock, flags);
+ ioc->ioc_reset_in_progress = 0;
+ ioc->taskmgmt_quiesce_io = 0;
+ ioc->taskmgmt_in_progress = 0;
+ spin_unlock_irqrestore(&ioc->diagLock, flags);
+
+ if (ioc->active) { /* otherwise, hard reset coming */
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx])
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_POST_RESET);
+ }
+ }
+
+ printk(MYIOC_s_INFO_FMT "SoftResetHandler: completed (%d seconds): %s\n",
+ ioc->name, jiffies_to_msecs(jiffies - time_count)/1000,
+ ((rc == 0) ? "SUCCESS" : "FAILED"));
+
+ return rc;
+}
+
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/**
* mpt_HardResetHandler - Generic reset handler
@@ -6340,9 +6468,10 @@ int
mpt_HardResetHandler(MPT_ADAPTER *ioc, int sleepFlag)
{
int rc;
+ u8 cb_idx;
unsigned long flags;
+ unsigned long time_count;
- dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "HardResetHandler Entered!\n", ioc->name));
#ifdef MFCNT
printk(MYIOC_s_INFO_FMT "HardResetHandler Entered!\n", ioc->name);
printk("MF count 0x%x !\n", ioc->mfcnt);
@@ -6352,12 +6481,15 @@ mpt_HardResetHandler(MPT_ADAPTER *ioc, i
* mpt_do_ioc_recovery at any instant in time.
*/
spin_lock_irqsave(&ioc->diagLock, flags);
- if ((ioc->diagPending) || (ioc->alt_ioc && ioc->alt_ioc->diagPending)){
+ if (ioc->ioc_reset_in_progress) {
spin_unlock_irqrestore(&ioc->diagLock, flags);
return 0;
} else {
ioc->diagPending = 1;
}
+ ioc->ioc_reset_in_progress = 1;
+ if (ioc->alt_ioc)
+ ioc->alt_ioc->ioc_reset_in_progress = 1;
spin_unlock_irqrestore(&ioc->diagLock, flags);
/* FIXME: If do_ioc_recovery fails, repeat....
@@ -6368,42 +6500,70 @@ mpt_HardResetHandler(MPT_ADAPTER *ioc, i
* Prevents timeouts occurring during a diagnostic reset...very bad.
* For all other protocol drivers, this is a no-op.
*/
- {
- u8 cb_idx;
- int r = 0;
-
- for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
- if (MptResetHandlers[cb_idx]) {
- dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "Calling IOC reset_setup handler #%d\n",
- ioc->name, cb_idx));
- r += mpt_signal_reset(cb_idx, ioc, MPT_IOC_SETUP_RESET);
- if (ioc->alt_ioc) {
- dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "Calling alt-%s setup reset handler
#%d\n",
- ioc->name, ioc->alt_ioc->name, cb_idx));
- r += mpt_signal_reset(cb_idx, ioc->alt_ioc, MPT_IOC_SETUP_RESET);
- }
- }
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx]) {
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_SETUP_RESET);
+ if (ioc->alt_ioc)
+ mpt_signal_reset(cb_idx, ioc->alt_ioc, MPT_IOC_SETUP_RESET);
}
}
- if ((rc = mpt_do_ioc_recovery(ioc, MPT_HOSTEVENT_IOC_RECOVER, sleepFlag)) != 0) {
- printk(MYIOC_s_WARN_FMT "Cannot recover rc = %d!\n", ioc->name, rc);
+ time_count = jiffies;
+ rc = mpt_do_ioc_recovery(ioc, MPT_HOSTEVENT_IOC_RECOVER, sleepFlag);
+ if (rc != 0) {
+ printk(KERN_WARNING MYNAM ": WARNING - (%d) Cannot recover %s\n",
+ rc, ioc->name);
+ } else {
+ if (ioc->hard_resets < -1)
+ ioc->hard_resets++;
}
- ioc->reload_fw = 0;
- if (ioc->alt_ioc)
- ioc->alt_ioc->reload_fw = 0;
spin_lock_irqsave(&ioc->diagLock, flags);
- ioc->diagPending = 0;
- if (ioc->alt_ioc)
- ioc->alt_ioc->diagPending = 0;
+ ioc->ioc_reset_in_progress = 0;
+ ioc->taskmgmt_quiesce_io = 0;
+ ioc->taskmgmt_in_progress = 0;
+ if (ioc->alt_ioc) {
+ ioc->alt_ioc->ioc_reset_in_progress = 0;
+ ioc->alt_ioc->taskmgmt_quiesce_io = 0;
+ ioc->alt_ioc->taskmgmt_in_progress = 0;
+ }
spin_unlock_irqrestore(&ioc->diagLock, flags);
- dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "HardResetHandler rc = %d!\n", ioc->name, rc));
+ for (cb_idx = MPT_MAX_PROTOCOL_DRIVERS-1; cb_idx; cb_idx--) {
+ if (MptResetHandlers[cb_idx]) {
+ mpt_signal_reset(cb_idx, ioc, MPT_IOC_POST_RESET);
+ if (ioc->alt_ioc)
+ mpt_signal_reset(cb_idx, ioc->alt_ioc, MPT_IOC_POST_RESET);
+ }
+ }
+
+ dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "HardResetHandler: completed (%d seconds): %s\n",
+ ioc->name, jiffies_to_msecs(jiffies - time_count)/1000,
+ ((rc == 0) ? "SUCCESS" : "FAILED")));
+ return rc;
+}
+
+/**
+ * mpt_SoftHardResetHandler - Generic reset handler
+ * @ioc: Pointer to MPT_ADAPTER structure
+ * @sleepFlag: Indicates if sleep or schedule must be called.
+ *
+ * First try to do a soft reset and if this fails, call the
+ * hard-reset-handler
+ */
+int
+mpt_SoftHardResetHandler(MPT_ADAPTER *ioc, int sleepFlag)
+{
+ int rc;
+
+ rc = mpt_SoftResetHandler(ioc, sleepFlag);
+ if (rc)
+ rc = mpt_HardResetHandler(ioc, sleepFlag);
return rc;
}
+
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
static void
EventDescriptionStr(u8 event, u32 evData0, char *evStr)
@@ -7561,6 +7721,7 @@ EXPORT_SYMBOL(mpt_verify_adapter);
EXPORT_SYMBOL(mpt_GetIocState);
EXPORT_SYMBOL(mpt_print_ioc_summary);
EXPORT_SYMBOL(mpt_HardResetHandler);
+EXPORT_SYMBOL(mpt_SoftHardResetHandler);
EXPORT_SYMBOL(mpt_config);
EXPORT_SYMBOL(mpt_findImVolumes);
EXPORT_SYMBOL(mpt_alloc_fw_memory);
Index: linux-2.6/drivers/message/fusion/mptbase.h
===================================================================
--- linux-2.6.orig/drivers/message/fusion/mptbase.h
+++ linux-2.6/drivers/message/fusion/mptbase.h
@@ -701,6 +701,9 @@ typedef struct _MPT_ADAPTER
MPT_SAS_MGMT sas_mgmt;
struct work_struct sas_persist_task;
+ int taskmgmt_in_progress;
+ u8 taskmgmt_quiesce_io;
+
struct work_struct fc_setup_reset_work;
struct list_head fc_rports;
struct work_struct fc_lsc_work;
@@ -709,6 +712,11 @@ typedef struct _MPT_ADAPTER
struct work_struct fc_rescan_work;
char fc_rescan_work_q_name[20];
struct workqueue_struct *fc_rescan_work_q;
+
+ unsigned long hard_resets; /* driver forced bus resets count */
+ unsigned long soft_resets; /* fw/external bus resets count */
+ u8 ioc_reset_in_progress;
+
struct scsi_cmnd **ScsiLookup;
spinlock_t scsi_lookup_lock;
@@ -844,8 +852,6 @@ typedef struct _MPT_SCSI_HOST {
MPT_FRAME_HDR *cmdPtr; /* Ptr to nonOS request */
struct scsi_cmnd *abortSCpnt;
MPT_LOCAL_REPLY localReply; /* internal cmd reply struct */
- unsigned long hard_resets; /* driver forced bus resets count */
- unsigned long soft_resets; /* fw/external bus resets count */
unsigned long timeouts; /* cmd timeouts */
ushort sel_timeout[MPT_MAX_FC_DEVICES];
char *info_kbuf;
@@ -916,6 +922,7 @@ extern int mpt_verify_adapter(int iocid
extern u32 mpt_GetIocState(MPT_ADAPTER *ioc, int cooked);
extern void mpt_print_ioc_summary(MPT_ADAPTER *ioc, char *buf, int *size, int len, int showlan);
extern int mpt_HardResetHandler(MPT_ADAPTER *ioc, int sleepFlag);
+extern int mpt_SoftHardResetHandler(MPT_ADAPTER *ioc, int sleepFlag);
extern int mpt_config(MPT_ADAPTER *ioc, CONFIGPARMS *cfg);
extern int mpt_alloc_fw_memory(MPT_ADAPTER *ioc, int size);
extern void mpt_free_fw_memory(MPT_ADAPTER *ioc);
Index: linux-2.6/drivers/message/fusion/mptscsih.c
===================================================================
--- linux-2.6.orig/drivers/message/fusion/mptscsih.c
+++ linux-2.6/drivers/message/fusion/mptscsih.c
@@ -1605,7 +1605,7 @@ mptscsih_TMHandler(MPT_SCSI_HOST *hd, u8
"TM Handler for type=%x: IOC Not operational (0x%x)!\n",
ioc->name, type, ioc_raw_state);
printk(MYIOC_s_WARN_FMT " Issuing HardReset!!\n", ioc->name);
- if (mpt_HardResetHandler(ioc, CAN_SLEEP) < 0)
+ if (mpt_SoftHardResetHandler(ioc, CAN_SLEEP) < 0)
printk(MYIOC_s_WARN_FMT "TMHandler: HardReset "
"FAILED!!\n", ioc->name);
return FAILED;
@@ -1621,8 +1621,8 @@ mptscsih_TMHandler(MPT_SCSI_HOST *hd, u8
/* Isse the Task Mgmt request.
*/
- if (hd->hard_resets < -1)
- hd->hard_resets++;
+ if (ioc->hard_resets < -1)
+ ioc->hard_resets++;
rc = mptscsih_IssueTaskMgmt(hd, type, channel, id, lun,
ctx2abort, timeout);
@@ -1724,7 +1724,7 @@ mptscsih_IssueTaskMgmt(MPT_SCSI_HOST *hd
ioc, mf));
dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "Calling HardReset! \n",
ioc->name));
- retval = mpt_HardResetHandler(ioc, CAN_SLEEP);
+ retval = mpt_SoftHardResetHandler(ioc, CAN_SLEEP);
dtmprintk(ioc, printk(MYIOC_s_DEBUG_FMT "rc=%d \n",
ioc->name, retval));
goto fail_out;
@@ -1998,11 +1998,12 @@ int
mptscsih_host_reset(struct scsi_cmnd *SCpnt)
{
MPT_SCSI_HOST * hd;
- int retval;
+ int retval, status;
MPT_ADAPTER *ioc;
/* If we can't locate the host to reset, then we failed. */
- if ((hd = shost_priv(SCpnt->device->host)) == NULL){
+ hd = shost_priv(SCpnt->device->host);
+ if (hd == NULL) {
printk(KERN_ERR MYNAM ": host reset: "
"Can't locate host! (sc=%p)\n", SCpnt);
return FAILED;
@@ -2015,21 +2016,23 @@ mptscsih_host_reset(struct scsi_cmnd *SC
/* If our attempts to reset the host failed, then return a failed
* status. The host will be taken off line by the SCSI mid-layer.
*/
- if (mpt_HardResetHandler(ioc, CAN_SLEEP) < 0) {
- retval = FAILED;
- } else {
+ retval = mpt_SoftHardResetHandler(ioc, CAN_SLEEP);
+
+ if (retval < 0)
+ status = FAILED;
+ else {
/* Make sure TM pending is cleared and TM state is set to
* NONE.
*/
- retval = 0;
+ status = SUCCESS;
hd->tmPending = 0;
hd->tmState = TM_STATE_NONE;
}
printk(MYIOC_s_INFO_FMT "host reset: %s (sc=%p)\n",
- ioc->name, ((retval == 0) ? "SUCCESS" : "FAILED" ), SCpnt);
+ ioc->name, ((status == SUCCESS) ? "SUCCESS" : "FAILED"), SCpnt);
- return retval;
+ return status;
}
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
@@ -2218,7 +2221,7 @@ mptscsih_taskmgmt_complete(MPT_ADAPTER *
*/
if (iocstatus == MPI_IOCSTATUS_SCSI_TASK_MGMT_FAILED ||
hd->cmdPtr)
- if (mpt_HardResetHandler(ioc, NO_SLEEP) < 0)
+ if (mpt_SoftHardResetHandler(ioc, NO_SLEEP) < 0)
printk(MYIOC_s_WARN_FMT " Firmware Reload FAILED!!\n", ioc->name);
break;
@@ -2740,8 +2743,8 @@ mptscsih_event_process(MPT_ADAPTER *ioc,
break;
case MPI_EVENT_IOC_BUS_RESET: /* 04 */
case MPI_EVENT_EXT_BUS_RESET: /* 05 */
- if (hd && (ioc->bus_type == SPI) && (hd->soft_resets < -1))
- hd->soft_resets++;
+ if (hd && (ioc->bus_type == SPI) && (ioc->soft_resets < -1))
+ ioc->soft_resets++;
break;
case MPI_EVENT_LOGOUT: /* 09 */
/* FIXME! */
@@ -2979,9 +2982,9 @@ mptscsih_timer_expired(unsigned long dat
*/
} else {
/* Perform a FW reload */
- if (mpt_HardResetHandler(ioc, NO_SLEEP) < 0) {
- printk(MYIOC_s_WARN_FMT "Firmware Reload FAILED!\n", ioc->name);
- }
+ if (mpt_SoftHardResetHandler(ioc, NO_SLEEP) < 0)
+ printk(MYIOC_s_WARN_FMT "Firmware Reload FAILED!\n",
+ ioc->name);
}
} else {
/* This should NEVER happen */
Index: linux-2.6/drivers/message/fusion/mptctl.c
===================================================================
--- linux-2.6.orig/drivers/message/fusion/mptctl.c
+++ linux-2.6/drivers/message/fusion/mptctl.c
@@ -323,7 +323,7 @@ static void mptctl_timeout_expired (MPT_
*/
dctlprintk(ioctl->ioc, printk(MYIOC_s_DEBUG_FMT "Calling HardReset! \n",
ioctl->ioc->name));
- mpt_HardResetHandler(ioctl->ioc, CAN_SLEEP);
+ mpt_SoftHardResetHandler(ioctl->ioc, CAN_SLEEP);
}
return;
@@ -678,6 +678,8 @@ static int mptctl_do_reset(unsigned long
dctlprintk(iocp, printk(MYIOC_s_DEBUG_FMT "mptctl_do_reset called.\n",
iocp->name));
+ /* We can't use the soft reset handler here, since only the hard
+ * reset handler can clear possible fw faults */
if (mpt_HardResetHandler(iocp, CAN_SLEEP) != 0) {
printk (MYIOC_s_ERR_FMT "%s@%d::mptctl_do_reset - reset failed.\n",
iocp->name, __FILE__, __LINE__);
@@ -2465,8 +2467,8 @@ mptctl_hp_hostinfo(unsigned long arg, un
MPT_SCSI_HOST *hd = shost_priv(ioc->sh);
if (hd && (cim_rev == 1)) {
- karg.hard_resets = hd->hard_resets;
- karg.soft_resets = hd->soft_resets;
+ karg.hard_resets = ioc->hard_resets;
+ karg.soft_resets = ioc->soft_resets;
karg.timeouts = hd->timeouts;
}
}
Index: linux-2.6/drivers/message/fusion/mptsas.c
===================================================================
--- linux-2.6.orig/drivers/message/fusion/mptsas.c
+++ linux-2.6/drivers/message/fusion/mptsas.c
@@ -1167,7 +1167,7 @@ static int mptsas_phy_reset(struct sas_p
if (!timeleft) {
/* On timeout reset the board */
mpt_free_msg_frame(ioc, mf);
- mpt_HardResetHandler(ioc, CAN_SLEEP);
+ mpt_SoftHardResetHandler(ioc, CAN_SLEEP);
error = -ETIMEDOUT;
goto out_unlock;
}
@@ -1345,7 +1345,7 @@ static int mptsas_smp_handler(struct Scs
if (!timeleft) {
printk(MYIOC_s_ERR_FMT "%s: smp timeout!\n", ioc->name, __func__);
/* On timeout reset the board */
- mpt_HardResetHandler(ioc, CAN_SLEEP);
+ mpt_SoftHardResetHandler(ioc, CAN_SLEEP);
ret = -ETIMEDOUT;
goto unmap;
}
--
Bernd Schubert
Q-Leap Networks GmbH
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/2]
2009-12-04 21:47 ` Sebastian Kapfer
@ 2009-12-13 22:01 ` Sebastian Kapfer
0 siblings, 0 replies; 19+ messages in thread
From: Sebastian Kapfer @ 2009-12-13 22:01 UTC (permalink / raw)
To: Dmitry Torokhov; +Cc: Linux Input ML
Hi Dmitry,
for the record, I have put together one last time what I consider the
best solution. Sorry if I seem pushy, but I think we have ironed out
all the issues, and waiting longer isn't going to make the code prettier
(or more correct).
This is your last patch, with the desync fixed:
Date: Fri, 4 Dec 2009 21:50:37 +0100
Subject: [PATCH 1/2] reworked again, 9bytes
Input: ALPS - add interleaved protocol support (Dell E6x00 series)
From: Sebastian Kapfer <sebastian_kapfer@gmx.net>
Properly handle version of the protocol where standard PS/2 packets
from trackpoint are stuffed into middle (byte 3-6) of the standard
ALPS packets when both the touchpad and trackpoint are used together.
The patch is based on work done by Matthew Chapman and additional
research done by David Kubicek and Erik Osterholm:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/296610
Many thanks to David Kubicek for his efforts in researching fine points
of this new version of the protocol, especially interaction between pad
and stick in these models.
Signed-off-by: Sebastian Kapfer <sebastian_kapfer@gmx.net>
---
drivers/input/mouse/alps.c | 254 ++++++++++++++++++++++++++++++++++++-----
drivers/input/mouse/alps.h | 1 +
drivers/input/mouse/psmouse.h | 2 +-
3 files changed, 229 insertions(+), 28 deletions(-)
diff --git a/drivers/input/mouse/alps.c b/drivers/input/mouse/alps.c
index a3f492a..71b40ae 100644
--- a/drivers/input/mouse/alps.c
+++ b/drivers/input/mouse/alps.c
@@ -5,6 +5,7 @@
* Copyright (c) 2003-2005 Peter Osterlund <petero2@telia.com>
* Copyright (c) 2004 Dmitry Torokhov <dtor@mail.ru>
* Copyright (c) 2005 Vojtech Pavlik <vojtech@suse.cz>
+ * Copyright (c) 2009 Sebastian Kapfer <sebastian_kapfer@gmx.net>
*
* ALPS detection, tap switching and status querying info is taken from
* tpconfig utility (by C. Scott Ananian and Bruce Kall).
@@ -28,7 +29,6 @@
#define dbg(format, arg...) do {} while (0)
#endif
-
#define ALPS_OLDPROTO 0x01 /* old style input */
#define ALPS_DUALPOINT 0x02 /* touchpad has trackstick */
#define ALPS_PASS 0x04 /* device has a pass-through port */
@@ -37,7 +37,8 @@
#define ALPS_FW_BK_1 0x10 /* front & back buttons present */
#define ALPS_FW_BK_2 0x20 /* front & back buttons present */
#define ALPS_FOUR_BUTTONS 0x40 /* 4 direction button present */
-
+#define ALPS_PS2_INTERLEAVED 0x80 /* 3-byte PS/2 packet interleaved with
+ 6-byte ALPS packet */
static const struct alps_model_info alps_model_data[] = {
{ { 0x32, 0x02, 0x14 }, 0xf8, 0xf8, ALPS_PASS | ALPS_DUALPOINT }, /* Toshiba Salellite Pro M10 */
@@ -58,7 +59,9 @@ static const struct alps_model_info alps_model_data[] = {
{ { 0x20, 0x02, 0x0e }, 0xf8, 0xf8, ALPS_PASS | ALPS_DUALPOINT }, /* XXX */
{ { 0x22, 0x02, 0x0a }, 0xf8, 0xf8, ALPS_PASS | ALPS_DUALPOINT },
{ { 0x22, 0x02, 0x14 }, 0xff, 0xff, ALPS_PASS | ALPS_DUALPOINT }, /* Dell Latitude D600 */
- { { 0x62, 0x02, 0x14 }, 0xcf, 0xcf, ALPS_PASS | ALPS_DUALPOINT }, /* Dell Latitude E6500 */
+ /* Dell Latitude E5500, E6400, E6500, Precision M4400 */
+ { { 0x62, 0x02, 0x14 }, 0xcf, 0xcf,
+ ALPS_PASS | ALPS_DUALPOINT | ALPS_PS2_INTERLEAVED },
{ { 0x73, 0x02, 0x50 }, 0xcf, 0xcf, ALPS_FOUR_BUTTONS }, /* Dell Vostro 1400 */
};
@@ -69,20 +72,56 @@ static const struct alps_model_info alps_model_data[] = {
*/
/*
- * ALPS abolute Mode - new format
+ * PS/2 packet format
+ *
+ * byte 0: 0 0 YSGN XSGN 1 M R L
+ * byte 1: X7 X6 X5 X4 X3 X2 X1 X0
+ * byte 2: Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0
+ *
+ * Note that the device never signals overflow condition.
+ *
+ * ALPS absolute Mode - new format
*
* byte 0: 1 ? ? ? 1 ? ? ?
* byte 1: 0 x6 x5 x4 x3 x2 x1 x0
- * byte 2: 0 x10 x9 x8 x7 ? fin ges
+ * byte 2: 0 x10 x9 x8 x7 ? fin ges
* byte 3: 0 y9 y8 y7 1 M R L
* byte 4: 0 y6 y5 y4 y3 y2 y1 y0
* byte 5: 0 z6 z5 z4 z3 z2 z1 z0
*
+ * Dualpoint device -- interleaved packet format
+ *
+ * byte 0: 1 1 0 0 1 1 1 1
+ * byte 1: 0 x6 x5 x4 x3 x2 x1 x0
+ * byte 2: 0 x10 x9 x8 x7 0 fin ges
+ * byte 3: 0 0 YSGN XSGN 1 1 1 1
+ * byte 4: X7 X6 X5 X4 X3 X2 X1 X0
+ * byte 5: Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0
+ * byte 6: 0 y9 y8 y7 1 m r l
+ * byte 7: 0 y6 y5 y4 y3 y2 y1 y0
+ * byte 8: 0 z6 z5 z4 z3 z2 z1 z0
+ *
+ * CAPITALS = stick, miniscules = touchpad
+ *
* ?'s can have different meanings on different models,
* such as wheel rotation, extra buttons, stick buttons
* on a dualpoint, etc.
*/
+static void alps_report_buttons(struct input_dev *dev,
+ int left, int right, int middle,
+ bool release_only)
+{
+ if (!left || !release_only)
+ input_report_key(dev, BTN_LEFT, left);
+
+ if (!right || !release_only)
+ input_report_key(dev, BTN_RIGHT, right);
+
+ if (!middle || !release_only)
+ input_report_key(dev, BTN_MIDDLE, middle);
+}
+
static void alps_process_packet(struct psmouse *psmouse)
{
struct alps_data *priv = psmouse->private;
@@ -93,18 +132,6 @@ static void alps_process_packet(struct psmouse *psmouse)
int x, y, z, ges, fin, left, right, middle;
int back = 0, forward = 0;
- if ((packet[0] & 0xc8) == 0x08) { /* 3-byte PS/2 packet */
- input_report_key(dev2, BTN_LEFT, packet[0] & 1);
- input_report_key(dev2, BTN_RIGHT, packet[0] & 2);
- input_report_key(dev2, BTN_MIDDLE, packet[0] & 4);
- input_report_rel(dev2, REL_X,
- packet[1] ? packet[1] - ((packet[0] << 4) & 0x100) : 0);
- input_report_rel(dev2, REL_Y,
- packet[2] ? ((packet[0] << 3) & 0x100) - packet[2] : 0);
- input_sync(dev2);
- return;
- }
-
if (model->flags & ALPS_OLDPROTO) {
left = packet[2] & 0x10;
right = packet[2] & 0x08;
@@ -140,18 +167,25 @@ static void alps_process_packet(struct psmouse *psmouse)
input_report_rel(dev2, REL_X, (x > 383 ? (x - 768) : x));
input_report_rel(dev2, REL_Y, -(y > 255 ? (y - 512) : y));
- input_report_key(dev2, BTN_LEFT, left);
- input_report_key(dev2, BTN_RIGHT, right);
- input_report_key(dev2, BTN_MIDDLE, middle);
+ alps_report_buttons(dev2, left, right, middle, false);
- input_sync(dev);
input_sync(dev2);
+ input_sync(dev);
return;
}
- input_report_key(dev, BTN_LEFT, left);
- input_report_key(dev, BTN_RIGHT, right);
- input_report_key(dev, BTN_MIDDLE, middle);
+ alps_report_buttons(dev, left, right, middle, false);
+
+ if (model->flags & ALPS_PS2_INTERLEAVED) {
+ /*
+ * On devices using interleaved packets, when user presses
+ * the same button on both trackpoint and touchpad, the
+ * release for the trackpoint is not reported so we have
+ * send release events to both devices.
+ */
+ alps_report_buttons(dev2, left, right, middle, true);
+ input_sync(dev2);
+ }
/* Convert hardware tap to a reasonable Z value */
if (ges && !fin)
@@ -202,25 +236,188 @@ static void alps_process_packet(struct psmouse *psmouse)
input_sync(dev);
}
+static void alps_report_bare_ps2_packet(struct psmouse *psmouse,
+ unsigned char packet[],
+ bool report_buttons)
+{
+ struct alps_data *priv = psmouse->private;
+ struct input_dev *dev2 = priv->dev2;
+
+ if (report_buttons) {
+ int left = packet[0] & 1;
+ int right = packet[0] & 2;
+ int middle = packet[0] & 4;
+
+
+ if (priv->i->flags & ALPS_PS2_INTERLEAVED) {
+ alps_report_buttons(psmouse->dev,
+ left, right, middle, true);
+ input_sync(psmouse->dev);
+ }
+
+ alps_report_buttons(dev2, left, right, middle, false);
+ }
+
+ input_report_rel(dev2, REL_X,
+ packet[1] ? packet[1] - ((packet[0] << 4) & 0x100) : 0);
+ input_report_rel(dev2, REL_Y,
+ packet[2] ? ((packet[0] << 3) & 0x100) - packet[2] : 0);
+
+ input_sync(dev2);
+}
+
+static bool alps_is_valid_first_byte(const struct alps_model_info *model,
+ unsigned char data)
+{
+ return (data & model->mask0) == model->byte0;
+}
+
+static psmouse_ret_t alps_handle_interleaved_ps2(struct psmouse *psmouse)
+{
+ struct alps_data *priv = psmouse->private;
+
+ if (psmouse->pktcnt < 6)
+ return PSMOUSE_GOOD_DATA;
+
+ if (psmouse->pktcnt == 6) {
+ /*
+ * Start a timer to flush the packet if it ends up last
+ * 6-byte packet in the stream. Timer needs to fire
+ * psmouse core times out itself. 20 ms should be enough
+ * to decide if we are getting more data or not.
+ */
+ mod_timer(&priv->timer, jiffies + msecs_to_jiffies(20));
+ return PSMOUSE_GOOD_DATA;
+ }
+
+ del_timer(&priv->timer);
+
+ if (psmouse->packet[6] & 0x80) {
+
+ /*
+ * Highest bit is set - that means we either had
+ * complete ALPS packet and this is start of the
+ * next packet or we got garbage.
+ */
+
+ if (((psmouse->packet[3] |
+ psmouse->packet[4] |
+ psmouse->packet[5]) & 0x80) ||
+ (!alps_is_valid_first_byte(priv->i, psmouse->packet[6]))) {
+ dbg("refusing packet %x %x %x %x "
+ "(suspected interleaved ps/2)\n",
+ psmouse->packet[3], psmouse->packet[4],
+ psmouse->packet[5], psmouse->packet[6]);
+ return PSMOUSE_BAD_DATA;
+ }
+
+ alps_process_packet(psmouse);
+
+ /* Continue with the next packet */
+ psmouse->packet[0] = psmouse->packet[6];
+ psmouse->pktcnt = 1;
+
+ } else {
+
+ /*
+ * High bit is 0 - that means that we indeed got a PS/2
+ * packet in the middle of ALPS packet.
+ *
+ * There is also possibility that we got 6-byte ALPS
+ * packet followed by 3-byte packet from trackpoint. We
+ * can not distinguish between these 2 scenarios but
+ * becase the latter is unlikely to happen in course of
+ * normal operation (user would need to press all
+ * buttons on the pad and start moving trackpoint
+ * without touching the pad surface) we assume former.
+ * Even if we are wrong the wost thing that would happen
+ * the cursor would jump but we should not get protocol
+ * desynchronization.
+ */
+
+ if (psmouse->pktcnt == 9) {
+ /* Process interleaved PS/2 data */
+ alps_report_bare_ps2_packet(psmouse,
+ &psmouse->packet[3], false);
+
+ /* Continue with the standard ALPS protocol handling */
+ psmouse->packet[3] = psmouse->packet[6];
+ psmouse->packet[4] = psmouse->packet[7];
+ psmouse->packet[5] = psmouse->packet[8];
+ psmouse->pktcnt = 6;
+ alps_process_packet(psmouse);
+ return PSMOUSE_FULL_PACKET;
+ }
+ }
+
+ return PSMOUSE_GOOD_DATA;
+}
+
+static void alps_flush_packet(unsigned long data)
+{
+ struct psmouse *psmouse = (struct psmouse *)data;
+
+ serio_pause_rx(psmouse->ps2dev.serio);
+
+ if (psmouse->pktcnt == 6) {
+
+ /*
+ * We did not any more data in reasonable amount of time.
+ * Validate the last 3 bytes and process as a standard
+ * ALPS packet.
+ */
+ if ((psmouse->packet[3] |
+ psmouse->packet[4] |
+ psmouse->packet[5]) & 0x80) {
+ dbg("refusing packet %x %x %x "
+ "(suspected interleaved ps/2)\n",
+ psmouse->packet[3], psmouse->packet[4],
+ psmouse->packet[5]);
+ } else {
+ alps_process_packet(psmouse);
+ }
+ psmouse->pktcnt = 0;
+ }
+
+ serio_continue_rx(psmouse->ps2dev.serio);
+}
+
static psmouse_ret_t alps_process_byte(struct psmouse *psmouse)
{
struct alps_data *priv = psmouse->private;
+ const struct alps_model_info *model = priv->i;
+
+ dbg ("handle: %x", psmouse->packet[psmouse->pktcnt-1]);
if ((psmouse->packet[0] & 0xc8) == 0x08) { /* PS/2 packet */
if (psmouse->pktcnt == 3) {
- alps_process_packet(psmouse);
+ alps_report_bare_ps2_packet(psmouse, psmouse->packet,
+ true);
return PSMOUSE_FULL_PACKET;
}
return PSMOUSE_GOOD_DATA;
}
- if ((psmouse->packet[0] & priv->i->mask0) != priv->i->byte0)
+ /* Check for PS/2 packet stuffed in the middle of ALPS packet. */
+
+ if ((model->flags & ALPS_PS2_INTERLEAVED) &&
+ psmouse->pktcnt >= 4 && (psmouse->packet[3] & 0x0f) == 0x0f) {
+ return alps_handle_interleaved_ps2(psmouse);
+ }
+
+ if (!alps_is_valid_first_byte(model, psmouse->packet[0])) {
+ dbg("refusing packet[0] = %x (mask0 = %x, byte0 = %x)\n",
+ psmouse->packet[0], model->mask0, model->byte0);
return PSMOUSE_BAD_DATA;
+ }
/* Bytes 2 - 6 should have 0 in the highest bit */
if (psmouse->pktcnt >= 2 && psmouse->pktcnt <= 6 &&
- (psmouse->packet[psmouse->pktcnt - 1] & 0x80))
+ (psmouse->packet[psmouse->pktcnt - 1] & 0x80)) {
+ dbg("refusing packet[%i] = %x\n",
+ psmouse->pktcnt - 1, psmouse->packet[psmouse->pktcnt - 1]);
return PSMOUSE_BAD_DATA;
+ }
if (psmouse->pktcnt == 6) {
alps_process_packet(psmouse);
@@ -459,6 +656,7 @@ static void alps_disconnect(struct psmouse *psmouse)
struct alps_data *priv = psmouse->private;
psmouse_reset(psmouse);
+ del_timer_sync(&priv->timer);
input_unregister_device(priv->dev2);
kfree(priv);
}
@@ -476,6 +674,8 @@ int alps_init(struct psmouse *psmouse)
goto init_fail;
priv->dev2 = dev2;
+ setup_timer(&priv->timer, alps_flush_packet, (unsigned long)psmouse);
+
psmouse->private = priv;
model = alps_get_model(psmouse, &version);
diff --git a/drivers/input/mouse/alps.h b/drivers/input/mouse/alps.h
index bc87936..904ed8b 100644
--- a/drivers/input/mouse/alps.h
+++ b/drivers/input/mouse/alps.h
@@ -23,6 +23,7 @@ struct alps_data {
char phys[32]; /* Phys */
const struct alps_model_info *i;/* Info */
int prev_fin; /* Finger bit from previous packet */
+ struct timer_list timer;
};
#ifdef CONFIG_MOUSE_PS2_ALPS
diff --git a/drivers/input/mouse/psmouse.h b/drivers/input/mouse/psmouse.h
index e053bdd..d4772fe 100644
--- a/drivers/input/mouse/psmouse.h
+++ b/drivers/input/mouse/psmouse.h
@@ -42,7 +42,7 @@ struct psmouse {
struct delayed_work resync_work;
char *vendor;
char *name;
- unsigned char packet[8];
+ unsigned char packet[9];
unsigned char badbyte;
unsigned char pktcnt;
unsigned char pktsize;
--
1.6.3.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config
@ 2013-03-05 13:51 sjur.brandeland
2013-03-06 4:42 ` Rusty Russell
2013-03-06 4:46 ` [FYI] vringh fixes Rusty Russell
0 siblings, 2 replies; 19+ messages in thread
From: sjur.brandeland @ 2013-03-05 13:51 UTC (permalink / raw)
To: Ohad Ben-Cohen, Rusty Russell
Cc: sjur, Linus Walleij, Sjur Brændeland, Erwan Yvin,
virtualization
From: Sjur Brændeland <sjur.brandeland@stericsson.com>
Add wrappers for the host vrings to support loose
coupling between the virtio device and driver.
The functions find_vrhs() and del_vrhs() are added to
struct virtio_config_ops to manage the host vrings.
The function vringh_notify() is added so the guest
can be kicked when buffers are added to the used-ring.
This enables the virtio drivers to manage the virtio rings
without knowledge of how the host vrings are managed.
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
include/linux/virtio_config.h | 13 +++++++++++++
include/linux/vringh.h | 13 +++++++++++++
2 files changed, 26 insertions(+), 0 deletions(-)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 29b9104..88dd5ae 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -51,7 +51,17 @@
* This returns a pointer to the bus name a la pci_name from which
* the caller can then copy.
* @set_vq_affinity: set the affinity for a virtqueue.
+ * @find_vrhs: find the host vrings and instantiate them
+ * vdev: the virtio_device
+ * nhvrs: the number of host vrings to find
+ * hvrs: on success, includes new host vrings
+ * callbacks: array of driver callbacks, for each host vring
+ * include a NULL entry for vqs that do not need a callback
+ * Returns 0 on success or error status
+ * @del_vrhs: free the host vrings found by find_vrhs().
*/
+struct vringh;
+typedef void vrh_callback_t(struct virtio_device *, struct vringh *);
typedef void vq_callback_t(struct virtqueue *);
struct virtio_config_ops {
void (*get)(struct virtio_device *vdev, unsigned offset,
@@ -70,6 +80,9 @@ struct virtio_config_ops {
void (*finalize_features)(struct virtio_device *vdev);
const char *(*bus_name)(struct virtio_device *vdev);
int (*set_vq_affinity)(struct virtqueue *vq, int cpu);
+ int (*find_vrhs)(struct virtio_device *vdev, unsigned nhvrs,
+ struct vringh *vrhs[], vrh_callback_t *callbacks[]);
+ void (*del_vrhs)(struct virtio_device *vdev);
};
/* If driver didn't advertise the feature, it will never appear. */
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index ab41185..8156f51 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -50,6 +50,12 @@ struct vringh {
/* The vring (note: it may contain user pointers!) */
struct vring vring;
+
+ /* The function to call when buffers are available */
+ void (*notify)(struct vringh *);
+
+ /* A pointer for the vringh clients to use. */
+ void *priv;
};
/* The memory the vring can access, and what offset to apply. */
@@ -182,4 +188,11 @@ void vringh_notify_disable_kern(struct vringh *vrh);
int vringh_need_notify_kern(struct vringh *vrh);
+/* Notify the guest about buffers added to the used ring */
+static inline void vringh_notify(struct vringh *vrh)
+{
+ if (vrh->notify)
+ vrh->notify(vrh);
+}
+
#endif /* _LINUX_VRINGH_H */
--
1.7.5.4
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config
2013-03-05 13:51 [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config sjur.brandeland
@ 2013-03-06 4:42 ` Rusty Russell
2013-03-06 10:50 ` Sjur Brændeland
2013-03-06 4:46 ` [FYI] vringh fixes Rusty Russell
1 sibling, 1 reply; 19+ messages in thread
From: Rusty Russell @ 2013-03-06 4:42 UTC (permalink / raw)
To: Ohad Ben-Cohen
Cc: sjur, Linus Walleij, Sjur Brændeland, Erwan Yvin,
virtualization
sjur.brandeland@stericsson.com writes:
> From: Sjur Brændeland <sjur.brandeland@stericsson.com>
>
> Add wrappers for the host vrings to support loose
> coupling between the virtio device and driver.
>
> The functions find_vrhs() and del_vrhs() are added to
> struct virtio_config_ops to manage the host vrings.
> The function vringh_notify() is added so the guest
> can be kicked when buffers are added to the used-ring.
>
> This enables the virtio drivers to manage the virtio rings
> without knowledge of how the host vrings are managed.
Hmm, this is a bit weird.
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 29b9104..88dd5ae 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -51,7 +51,17 @@
> * This returns a pointer to the bus name a la pci_name from which
> * the caller can then copy.
> * @set_vq_affinity: set the affinity for a virtqueue.
> + * @find_vrhs: find the host vrings and instantiate them
> + * vdev: the virtio_device
> + * nhvrs: the number of host vrings to find
> + * hvrs: on success, includes new host vrings
> + * callbacks: array of driver callbacks, for each host vring
> + * include a NULL entry for vqs that do not need a callback
> + * Returns 0 on success or error status
> + * @del_vrhs: free the host vrings found by find_vrhs().
> */
> +struct vringh;
> +typedef void vrh_callback_t(struct virtio_device *, struct vringh *);
> typedef void vq_callback_t(struct virtqueue *);
> struct virtio_config_ops {
> void (*get)(struct virtio_device *vdev, unsigned offset,
> @@ -70,6 +80,9 @@ struct virtio_config_ops {
> void (*finalize_features)(struct virtio_device *vdev);
> const char *(*bus_name)(struct virtio_device *vdev);
> int (*set_vq_affinity)(struct virtqueue *vq, int cpu);
> + int (*find_vrhs)(struct virtio_device *vdev, unsigned nhvrs,
> + struct vringh *vrhs[], vrh_callback_t *callbacks[]);
> + void (*del_vrhs)(struct virtio_device *vdev);
> };
>
> /* If driver didn't advertise the feature, it will never appear. */
It's weird that you conflate the host and guest ring sides in rpmsg, but
that might make sense if they're really bound together. However, in
general they are not: it's normal to be a guest or host, not both.
This implies that you need a struct vringh_config, to put this in.
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> index ab41185..8156f51 100644
> --- a/include/linux/vringh.h
> +++ b/include/linux/vringh.h
> @@ -50,6 +50,12 @@ struct vringh {
>
> /* The vring (note: it may contain user pointers!) */
> struct vring vring;
> +
> + /* The function to call when buffers are available */
> + void (*notify)(struct vringh *);
> +
> + /* A pointer for the vringh clients to use. */
> + void *priv;
> };
Since the caller allocates the vringh, can it not use container_of()
instead of a priv pointer?
Thanks,
Rusty.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 19+ messages in thread
* [FYI] vringh fixes.
2013-03-05 13:51 [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config sjur.brandeland
2013-03-06 4:42 ` Rusty Russell
@ 2013-03-06 4:46 ` Rusty Russell
2013-03-06 4:53 ` [PATCH 1/2] Rusty Russell
2013-03-06 4:57 ` [PATCH 1/3] vringh: host-side implementation of virtio rings (v2) Rusty Russell
1 sibling, 2 replies; 19+ messages in thread
From: Rusty Russell @ 2013-03-06 4:46 UTC (permalink / raw)
To: sjur.brandeland, Ohad Ben-Cohen, mst
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization
Hi all,
More benchmarking using vringh_test, and I finally figured out that my
changes to virtio_ring had no effect because we were actually
bottlenecked on the vringh side...
Here are the changes I've ended up with, for your reading pleasure.
They are all in my pending-rebases git tree, and I'll fold them together
before putting them into virtio-next (I'm waiting for Sjur's notify
changes).
Cheers,
Rusty.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/2]
2013-03-06 4:46 ` [FYI] vringh fixes Rusty Russell
@ 2013-03-06 4:53 ` Rusty Russell
2013-03-06 4:54 ` [PATCH 2/2] tools/virtio: make barriers stronger Rusty Russell
2013-03-06 4:57 ` [PATCH 1/3] vringh: host-side implementation of virtio rings (v2) Rusty Russell
1 sibling, 1 reply; 19+ messages in thread
From: Rusty Russell @ 2013-03-06 4:53 UTC (permalink / raw)
To: mst; +Cc: sjur, Linus Walleij, Erwan Yvin, virtualization, sjur.brandeland
tools/virtio: separate headers more.
This makes them a bit more like the kernel headers, so we can include more
real kernel headers in our tests.
In addition this means that we don't break tools/virtio with the next
patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(Changes since last version: use ACCESS_ONCE() for "userspace" reads.
This semantic is assumed in kernel userspace access, since otherwise
the compiler might decide to re-grab a value after verification,
introducing a security hole).
diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
new file mode 100644
index 0000000..aff61e1
--- /dev/null
+++ b/tools/virtio/asm/barrier.h
@@ -0,0 +1,14 @@
+#if defined(__i386__) || defined(__x86_64__)
+#define barrier() asm volatile("" ::: "memory")
+#define mb() __sync_synchronize()
+
+#define smp_mb() mb()
+# define smp_rmb() barrier()
+# define smp_wmb() barrier()
+/* Weak barriers should be used. If not - it's a bug */
+# define rmb() abort()
+# define wmb() abort()
+#else
+#error Please fill in barrier macros
+#endif
+
diff --git a/tools/virtio/linux/bug.h b/tools/virtio/linux/bug.h
new file mode 100644
index 0000000..fb94f07
--- /dev/null
+++ b/tools/virtio/linux/bug.h
@@ -0,0 +1,10 @@
+#ifndef BUG_H
+#define BUG_H
+
+#define BUG_ON(__BUG_ON_cond) assert(!(__BUG_ON_cond))
+
+#define BUILD_BUG_ON(x)
+
+#define BUG() abort()
+
+#endif /* BUG_H */
diff --git a/tools/virtio/linux/err.h b/tools/virtio/linux/err.h
new file mode 100644
index 0000000..e32eff8
--- /dev/null
+++ b/tools/virtio/linux/err.h
@@ -0,0 +1,26 @@
+#ifndef ERR_H
+#define ERR_H
+#define MAX_ERRNO 4095
+
+#define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)
+
+static inline void * __must_check ERR_PTR(long error)
+{
+ return (void *) error;
+}
+
+static inline long __must_check PTR_ERR(const void *ptr)
+{
+ return (long) ptr;
+}
+
+static inline long __must_check IS_ERR(const void *ptr)
+{
+ return IS_ERR_VALUE((unsigned long)ptr);
+}
+
+static inline long __must_check IS_ERR_OR_NULL(const void *ptr)
+{
+ return !ptr || IS_ERR_VALUE((unsigned long)ptr);
+}
+#endif /* ERR_H */
diff --git a/tools/virtio/linux/export.h b/tools/virtio/linux/export.h
new file mode 100644
index 0000000..7311d32
--- /dev/null
+++ b/tools/virtio/linux/export.h
@@ -0,0 +1,5 @@
+#define EXPORT_SYMBOL(sym)
+#define EXPORT_SYMBOL_GPL(sym)
+#define EXPORT_SYMBOL_GPL_FUTURE(sym)
+#define EXPORT_UNUSED_SYMBOL(sym)
+#define EXPORT_UNUSED_SYMBOL_GPL(sym)
diff --git a/tools/virtio/linux/irqreturn.h b/tools/virtio/linux/irqreturn.h
new file mode 100644
index 0000000..a3c4e7b
--- /dev/null
+++ b/tools/virtio/linux/irqreturn.h
@@ -0,0 +1 @@
+#include "../../../include/linux/irqreturn.h"
diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h
new file mode 100644
index 0000000..fba7059
--- /dev/null
+++ b/tools/virtio/linux/kernel.h
@@ -0,0 +1,112 @@
+#ifndef KERNEL_H
+#define KERNEL_H
+#include <stdbool.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+#include <stdarg.h>
+
+#include <linux/types.h>
+#include <linux/printk.h>
+#include <linux/bug.h>
+#include <errno.h>
+#include <unistd.h>
+#include <asm/barrier.h>
+
+#define CONFIG_SMP
+
+#define PAGE_SIZE getpagesize()
+#define PAGE_MASK (~(PAGE_SIZE-1))
+
+typedef unsigned long long dma_addr_t;
+typedef size_t __kernel_size_t;
+
+struct page {
+ unsigned long long dummy;
+};
+
+/* Physical == Virtual */
+#define virt_to_phys(p) ((unsigned long)p)
+#define phys_to_virt(a) ((void *)(unsigned long)(a))
+/* Page address: Virtual / 4K */
+#define page_to_phys(p) ((dma_addr_t)(unsigned long)(p))
+#define virt_to_page(p) ((struct page *)((unsigned long)p & PAGE_MASK))
+
+#define offset_in_page(p) (((unsigned long)p) % PAGE_SIZE)
+
+#define __printf(a,b) __attribute__((format(printf,a,b)))
+
+typedef enum {
+ GFP_KERNEL,
+ GFP_ATOMIC,
+ __GFP_HIGHMEM,
+ __GFP_HIGH
+} gfp_t;
+
+#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
+
+extern void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+static inline void *kmalloc(size_t s, gfp_t gfp)
+{
+ if (__kmalloc_fake)
+ return __kmalloc_fake;
+ return malloc(s);
+}
+
+static inline void kfree(void *p)
+{
+ if (p >= __kfree_ignore_start && p < __kfree_ignore_end)
+ return;
+ free(p);
+}
+
+static inline void *krealloc(void *p, size_t s, gfp_t gfp)
+{
+ return realloc(p, s);
+}
+
+
+static inline unsigned long __get_free_page(gfp_t gfp)
+{
+ void *p;
+
+ posix_memalign(&p, PAGE_SIZE, PAGE_SIZE);
+ return (unsigned long)p;
+}
+
+static inline void free_page(unsigned long addr)
+{
+ free((void *)addr);
+}
+
+#define container_of(ptr, type, member) ({ \
+ const typeof( ((type *)0)->member ) *__mptr = (ptr); \
+ (type *)( (char *)__mptr - offsetof(type,member) );})
+
+#define uninitialized_var(x) x = x
+
+# ifndef likely
+# define likely(x) (__builtin_expect(!!(x), 1))
+# endif
+# ifndef unlikely
+# define unlikely(x) (__builtin_expect(!!(x), 0))
+# endif
+
+#define pr_err(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#ifdef DEBUG
+#define pr_debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#else
+#define pr_debug(format, ...) do {} while (0)
+#endif
+#define dev_err(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#define dev_warn(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+
+#define min(x, y) ({ \
+ typeof(x) _min1 = (x); \
+ typeof(y) _min2 = (y); \
+ (void) (&_min1 == &_min2); \
+ _min1 < _min2 ? _min1 : _min2; })
+
+#endif /* KERNEL_H */
diff --git a/tools/virtio/linux/module.h b/tools/virtio/linux/module.h
index e69de29..3039a7e 100644
--- a/tools/virtio/linux/module.h
+++ b/tools/virtio/linux/module.h
@@ -0,0 +1 @@
+#include <linux/export.h>
diff --git a/tools/virtio/linux/printk.h b/tools/virtio/linux/printk.h
new file mode 100644
index 0000000..9f2423b
--- /dev/null
+++ b/tools/virtio/linux/printk.h
@@ -0,0 +1,4 @@
+#include "../../../include/linux/kern_levels.h"
+
+#define printk printf
+#define vprintk vprintf
diff --git a/tools/virtio/linux/ratelimit.h b/tools/virtio/linux/ratelimit.h
new file mode 100644
index 0000000..dcce172
--- /dev/null
+++ b/tools/virtio/linux/ratelimit.h
@@ -0,0 +1,4 @@
+#define DEFINE_RATELIMIT_STATE(name, interval_init, burst_init) int name = 0
+
+#define __ratelimit(x) (*(x))
+
diff --git a/tools/virtio/linux/scatterlist.h b/tools/virtio/linux/scatterlist.h
new file mode 100644
index 0000000..b2cf7d0
--- /dev/null
+++ b/tools/virtio/linux/scatterlist.h
@@ -0,0 +1,173 @@
+#ifndef SCATTERLIST_H
+#define SCATTERLIST_H
+#include <linux/kernel.h>
+
+struct scatterlist {
+ unsigned long page_link;
+ unsigned int offset;
+ unsigned int length;
+ dma_addr_t dma_address;
+};
+
+/* Scatterlist helpers, stolen from linux/scatterlist.h */
+#define sg_is_chain(sg) ((sg)->page_link & 0x01)
+#define sg_is_last(sg) ((sg)->page_link & 0x02)
+#define sg_chain_ptr(sg) \
+ ((struct scatterlist *) ((sg)->page_link & ~0x03))
+
+/**
+ * sg_assign_page - Assign a given page to an SG entry
+ * @sg: SG entry
+ * @page: The page
+ *
+ * Description:
+ * Assign page to sg entry. Also see sg_set_page(), the most commonly used
+ * variant.
+ *
+ **/
+static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
+{
+ unsigned long page_link = sg->page_link & 0x3;
+
+ /*
+ * In order for the low bit stealing approach to work, pages
+ * must be aligned at a 32-bit boundary as a minimum.
+ */
+ BUG_ON((unsigned long) page & 0x03);
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+ BUG_ON(sg_is_chain(sg));
+#endif
+ sg->page_link = page_link | (unsigned long) page;
+}
+
+/**
+ * sg_set_page - Set sg entry to point at given page
+ * @sg: SG entry
+ * @page: The page
+ * @len: Length of data
+ * @offset: Offset into page
+ *
+ * Description:
+ * Use this function to set an sg entry pointing at a page, never assign
+ * the page directly. We encode sg table information in the lower bits
+ * of the page pointer. See sg_page() for looking up the page belonging
+ * to an sg entry.
+ *
+ **/
+static inline void sg_set_page(struct scatterlist *sg, struct page *page,
+ unsigned int len, unsigned int offset)
+{
+ sg_assign_page(sg, page);
+ sg->offset = offset;
+ sg->length = len;
+}
+
+static inline struct page *sg_page(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+ BUG_ON(sg_is_chain(sg));
+#endif
+ return (struct page *)((sg)->page_link & ~0x3);
+}
+
+/*
+ * Loop over each sg element, following the pointer to a new list if necessary
+ */
+#define for_each_sg(sglist, sg, nr, __i) \
+ for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg))
+
+/**
+ * sg_chain - Chain two sglists together
+ * @prv: First scatterlist
+ * @prv_nents: Number of entries in prv
+ * @sgl: Second scatterlist
+ *
+ * Description:
+ * Links @prv@ and @sgl@ together, to form a longer scatterlist.
+ *
+ **/
+static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents,
+ struct scatterlist *sgl)
+{
+ /*
+ * offset and length are unused for chain entry. Clear them.
+ */
+ prv[prv_nents - 1].offset = 0;
+ prv[prv_nents - 1].length = 0;
+
+ /*
+ * Set lowest bit to indicate a link pointer, and make sure to clear
+ * the termination bit if it happens to be set.
+ */
+ prv[prv_nents - 1].page_link = ((unsigned long) sgl | 0x01) & ~0x02;
+}
+
+/**
+ * sg_mark_end - Mark the end of the scatterlist
+ * @sg: SG entryScatterlist
+ *
+ * Description:
+ * Marks the passed in sg entry as the termination point for the sg
+ * table. A call to sg_next() on this entry will return NULL.
+ *
+ **/
+static inline void sg_mark_end(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
+ /*
+ * Set termination bit, clear potential chain bit
+ */
+ sg->page_link |= 0x02;
+ sg->page_link &= ~0x01;
+}
+
+static inline struct scatterlist *sg_next(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
+ if (sg_is_last(sg))
+ return NULL;
+
+ sg++;
+ if (unlikely(sg_is_chain(sg)))
+ sg = sg_chain_ptr(sg);
+
+ return sg;
+}
+
+static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
+{
+ memset(sgl, 0, sizeof(*sgl) * nents);
+#ifdef CONFIG_DEBUG_SG
+ {
+ unsigned int i;
+ for (i = 0; i < nents; i++)
+ sgl[i].sg_magic = SG_MAGIC;
+ }
+#endif
+ sg_mark_end(&sgl[nents - 1]);
+}
+
+static inline dma_addr_t sg_phys(struct scatterlist *sg)
+{
+ return page_to_phys(sg_page(sg)) + sg->offset;
+}
+
+static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
+ unsigned int buflen)
+{
+ sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
+}
+
+static inline void sg_init_one(struct scatterlist *sg,
+ const void *buf, unsigned int buflen)
+{
+ sg_init_table(sg, 1);
+ sg_set_buf(sg, buf, buflen);
+}
+#endif /* SCATTERLIST_H */
diff --git a/tools/virtio/linux/types.h b/tools/virtio/linux/types.h
new file mode 100644
index 0000000..f8ebb9a
--- /dev/null
+++ b/tools/virtio/linux/types.h
@@ -0,0 +1,28 @@
+#ifndef TYPES_H
+#define TYPES_H
+#include <stdint.h>
+
+#define __force
+#define __user
+#define __must_check
+#define __cold
+
+typedef uint64_t u64;
+typedef int64_t s64;
+typedef uint32_t u32;
+typedef int32_t s32;
+typedef uint16_t u16;
+typedef int16_t s16;
+typedef uint8_t u8;
+typedef int8_t s8;
+
+typedef uint64_t __u64;
+typedef int64_t __s64;
+typedef uint32_t __u32;
+typedef int32_t __s32;
+typedef uint16_t __u16;
+typedef int16_t __s16;
+typedef uint8_t __u8;
+typedef int8_t __s8;
+
+#endif /* TYPES_H */
diff --git a/tools/virtio/linux/uaccess.h b/tools/virtio/linux/uaccess.h
new file mode 100644
index 0000000..0a578fe
--- /dev/null
+++ b/tools/virtio/linux/uaccess.h
@@ -0,0 +1,50 @@
+#ifndef UACCESS_H
+#define UACCESS_H
+extern void *__user_addr_min, *__user_addr_max;
+
+#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
+
+static inline void __chk_user_ptr(const volatile void *p, size_t size)
+{
+ assert(p >= __user_addr_min && p + size <= __user_addr_max);
+}
+
+#define put_user(x, ptr) \
+({ \
+ typeof(ptr) __pu_ptr = (ptr); \
+ __chk_user_ptr(__pu_ptr, sizeof(*__pu_ptr)); \
+ ACCESS_ONCE(*(__pu_ptr)) = x; \
+ 0; \
+})
+
+#define get_user(x, ptr) \
+({ \
+ typeof(ptr) __pu_ptr = (ptr); \
+ __chk_user_ptr(__pu_ptr, sizeof(*__pu_ptr)); \
+ x = ACCESS_ONCE(*(__pu_ptr)); \
+ 0; \
+})
+
+static void volatile_memcpy(volatile char *to, const volatile char *from,
+ unsigned long n)
+{
+ while (n--)
+ *(to++) = *(from++);
+}
+
+static inline int copy_from_user(void *to, const void __user volatile *from,
+ unsigned long n)
+{
+ __chk_user_ptr(from, n);
+ volatile_memcpy(to, from, n);
+ return 0;
+}
+
+static inline int copy_to_user(void __user volatile *to, const void *from,
+ unsigned long n)
+{
+ __chk_user_ptr(to, n);
+ volatile_memcpy(to, from, n);
+ return 0;
+}
+#endif /* UACCESS_H */
diff --git a/tools/virtio/linux/uio.h b/tools/virtio/linux/uio.h
new file mode 100644
index 0000000..ecbdf92
--- /dev/null
+++ b/tools/virtio/linux/uio.h
@@ -0,0 +1 @@
+#include "../../../include/linux/uio.h"
diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
index 390c4cb..e4af659 100644
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -1,129 +1,7 @@
#ifndef LINUX_VIRTIO_H
#define LINUX_VIRTIO_H
-
-#include <stdbool.h>
-#include <stdlib.h>
-#include <stddef.h>
-#include <stdio.h>
-#include <string.h>
-#include <assert.h>
-
-#include <linux/types.h>
-#include <errno.h>
-
-typedef unsigned long long dma_addr_t;
-
-struct scatterlist {
- unsigned long page_link;
- unsigned int offset;
- unsigned int length;
- dma_addr_t dma_address;
-};
-
-struct page {
- unsigned long long dummy;
-};
-
-#define BUG_ON(__BUG_ON_cond) assert(!(__BUG_ON_cond))
-
-/* Physical == Virtual */
-#define virt_to_phys(p) ((unsigned long)p)
-#define phys_to_virt(a) ((void *)(unsigned long)(a))
-/* Page address: Virtual / 4K */
-#define virt_to_page(p) ((struct page*)((virt_to_phys(p) / 4096) * \
- sizeof(struct page)))
-#define offset_in_page(p) (((unsigned long)p) % 4096)
-#define sg_phys(sg) ((sg->page_link & ~0x3) / sizeof(struct page) * 4096 + \
- sg->offset)
-static inline void sg_mark_end(struct scatterlist *sg)
-{
- /*
- * Set termination bit, clear potential chain bit
- */
- sg->page_link |= 0x02;
- sg->page_link &= ~0x01;
-}
-static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
-{
- memset(sgl, 0, sizeof(*sgl) * nents);
- sg_mark_end(&sgl[nents - 1]);
-}
-static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
-{
- unsigned long page_link = sg->page_link & 0x3;
-
- /*
- * In order for the low bit stealing approach to work, pages
- * must be aligned at a 32-bit boundary as a minimum.
- */
- BUG_ON((unsigned long) page & 0x03);
- sg->page_link = page_link | (unsigned long) page;
-}
-
-static inline void sg_set_page(struct scatterlist *sg, struct page *page,
- unsigned int len, unsigned int offset)
-{
- sg_assign_page(sg, page);
- sg->offset = offset;
- sg->length = len;
-}
-
-static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
- unsigned int buflen)
-{
- sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
-}
-
-static inline void sg_init_one(struct scatterlist *sg, const void *buf, unsigned int buflen)
-{
- sg_init_table(sg, 1);
- sg_set_buf(sg, buf, buflen);
-}
-
-typedef __u16 u16;
-
-typedef enum {
- GFP_KERNEL,
- GFP_ATOMIC,
- __GFP_HIGHMEM,
- __GFP_HIGH
-} gfp_t;
-typedef enum {
- IRQ_NONE,
- IRQ_HANDLED
-} irqreturn_t;
-
-static inline void *kmalloc(size_t s, gfp_t gfp)
-{
- return malloc(s);
-}
-
-static inline void kfree(void *p)
-{
- free(p);
-}
-
-#define container_of(ptr, type, member) ({ \
- const typeof( ((type *)0)->member ) *__mptr = (ptr); \
- (type *)( (char *)__mptr - offsetof(type,member) );})
-
-#define uninitialized_var(x) x = x
-
-# ifndef likely
-# define likely(x) (__builtin_expect(!!(x), 1))
-# endif
-# ifndef unlikely
-# define unlikely(x) (__builtin_expect(!!(x), 0))
-# endif
-
-#define pr_err(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
-#ifdef DEBUG
-#define pr_debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
-#else
-#define pr_debug(format, ...) do {} while (0)
-#endif
-#define dev_err(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
-#define dev_warn(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#include <linux/scatterlist.h>
+#include <linux/kernel.h>
/* TODO: empty stubs for now. Broken but enough for virtio_ring.c */
#define list_add_tail(a, b) do {} while (0)
@@ -133,6 +11,7 @@ static inline void kfree(void *p)
#define BITS_PER_BYTE 8
#define BITS_PER_LONG (sizeof(long) * BITS_PER_BYTE)
#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))
+
/* TODO: Not atomic as it should be:
* we don't use this for anything important. */
static inline void clear_bit(int nr, volatile unsigned long *addr)
@@ -147,10 +26,6 @@ static inline int test_bit(int nr, const volatile unsigned long *addr)
{
return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}
-
-/* The only feature we care to support */
-#define virtio_has_feature(dev, feature) \
- test_bit((feature), (dev)->features)
/* end of stubs */
struct virtio_device {
@@ -170,28 +45,9 @@ struct virtqueue {
void *priv;
};
-#define EXPORT_SYMBOL_GPL(__EXPORT_SYMBOL_GPL_name) \
- void __EXPORT_SYMBOL_GPL##__EXPORT_SYMBOL_GPL_name() { \
-}
#define MODULE_LICENSE(__MODULE_LICENSE_value) \
const char *__MODULE_LICENSE_name = __MODULE_LICENSE_value
-#define CONFIG_SMP
-
-#if defined(__i386__) || defined(__x86_64__)
-#define barrier() asm volatile("" ::: "memory")
-#define mb() __sync_synchronize()
-
-#define smp_mb() mb()
-# define smp_rmb() barrier()
-# define smp_wmb() barrier()
-/* Weak barriers should be used. If not - it's a bug */
-# define rmb() abort()
-# define wmb() abort()
-#else
-#error Please fill in barrier macros
-#endif
-
/* Interfaces exported by virtio_ring. */
int virtqueue_add_buf(struct virtqueue *vq,
struct scatterlist sg[],
diff --git a/tools/virtio/linux/virtio_config.h b/tools/virtio/linux/virtio_config.h
new file mode 100644
index 0000000..5049967
--- /dev/null
+++ b/tools/virtio/linux/virtio_config.h
@@ -0,0 +1,6 @@
+#define VIRTIO_TRANSPORT_F_START 28
+#define VIRTIO_TRANSPORT_F_END 32
+
+#define virtio_has_feature(dev, feature) \
+ test_bit((feature), (dev)->features)
+
diff --git a/tools/virtio/linux/virtio_ring.h b/tools/virtio/linux/virtio_ring.h
new file mode 100644
index 0000000..8949c4e
--- /dev/null
+++ b/tools/virtio/linux/virtio_ring.h
@@ -0,0 +1 @@
+#include "../../../include/linux/virtio_ring.h"
diff --git a/tools/virtio/linux/vringh.h b/tools/virtio/linux/vringh.h
new file mode 100644
index 0000000..9348957
--- /dev/null
+++ b/tools/virtio/linux/vringh.h
@@ -0,0 +1 @@
+#include "../../../include/linux/vringh.h"
diff --git a/tools/virtio/uapi/linux/uio.h b/tools/virtio/uapi/linux/uio.h
new file mode 100644
index 0000000..7230e90
--- /dev/null
+++ b/tools/virtio/uapi/linux/uio.h
@@ -0,0 +1 @@
+#include <sys/uio.h>
diff --git a/tools/virtio/uapi/linux/virtio_config.h b/tools/virtio/uapi/linux/virtio_config.h
new file mode 100644
index 0000000..4c86675
--- /dev/null
+++ b/tools/virtio/uapi/linux/virtio_config.h
@@ -0,0 +1 @@
+#include "../../../../include/uapi/linux/virtio_config.h"
diff --git a/tools/virtio/uapi/linux/virtio_ring.h b/tools/virtio/uapi/linux/virtio_ring.h
new file mode 100644
index 0000000..4d99c78
--- /dev/null
+++ b/tools/virtio/uapi/linux/virtio_ring.h
@@ -0,0 +1,4 @@
+#ifndef VIRTIO_RING_H
+#define VIRTIO_RING_H
+#include "../../../../include/uapi/linux/virtio_ring.h"
+#endif /* VIRTIO_RING_H */
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index faf3f0c..814ae80 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -10,11 +10,15 @@
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
+#include <stdbool.h>
#include <linux/vhost.h>
#include <linux/virtio.h>
#include <linux/virtio_ring.h>
#include "../../drivers/vhost/test.h"
+/* Unused */
+void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+
struct vq_info {
int kick;
int call;
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/2] tools/virtio: make barriers stronger.
2013-03-06 4:53 ` [PATCH 1/2] Rusty Russell
@ 2013-03-06 4:54 ` Rusty Russell
2013-03-06 10:20 ` Michael S. Tsirkin
0 siblings, 1 reply; 19+ messages in thread
From: Rusty Russell @ 2013-03-06 4:54 UTC (permalink / raw)
To: mst; +Cc: sjur, Linus Walleij, Erwan Yvin, virtualization, sjur.brandeland
In the coming vringh_test, we share an mmap with another userspace process
for testing. This requires real barriers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
index aff61e1..7a63693 100644
--- a/tools/virtio/asm/barrier.h
+++ b/tools/virtio/asm/barrier.h
@@ -3,8 +3,8 @@
#define mb() __sync_synchronize()
#define smp_mb() mb()
-# define smp_rmb() barrier()
-# define smp_wmb() barrier()
+# define smp_rmb() mb()
+# define smp_wmb() mb()
/* Weak barriers should be used. If not - it's a bug */
# define rmb() abort()
# define wmb() abort()
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 1/3] vringh: host-side implementation of virtio rings (v2)
2013-03-06 4:46 ` [FYI] vringh fixes Rusty Russell
2013-03-06 4:53 ` [PATCH 1/2] Rusty Russell
@ 2013-03-06 4:57 ` Rusty Russell
2013-03-06 4:59 ` [PATCH 2/3] vringh: don't flag already listening Rusty Russell
2013-03-06 5:02 ` [PATCH 3/3] tools/virtio: add vring_test (v2) Rusty Russell
1 sibling, 2 replies; 19+ messages in thread
From: Rusty Russell @ 2013-03-06 4:57 UTC (permalink / raw)
To: sjur.brandeland, Ohad Ben-Cohen, mst
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization
Getting use of virtio rings correct is tricky, and a recent patch saw
an implementation of in-kernel rings (as separate from userspace).
This abstracts the business of dealing with the virtio ring layout
from the access (userspace or direct); to do this, we use function
pointers, which gcc inlines correctly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Changed:
Use ACCESS_ONCE in kernel ring accessors, to match user accessor
semantics.
diff --git a/drivers/Makefile b/drivers/Makefile
index dce39a9..72d28d3 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -123,7 +123,7 @@ obj-$(CONFIG_PPC_PS3) += ps3/
obj-$(CONFIG_OF) += of/
obj-$(CONFIG_SSB) += ssb/
obj-$(CONFIG_BCMA) += bcma/
-obj-$(CONFIG_VHOST_NET) += vhost/
+obj-$(CONFIG_VHOST_RING) += vhost/
obj-$(CONFIG_VLYNQ) += vlynq/
obj-$(CONFIG_STAGING) += staging/
obj-y += platform/
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index bf24317..85b773a 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,6 +1,7 @@
config VHOST_NET
tristate "Host kernel accelerator for virtio net"
depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP)
+ select VHOST_RING
---help---
This kernel module can be loaded in host kernel to accelerate
guest networking with virtio_net. Not to be confused with virtio_net
@@ -12,3 +13,10 @@ config VHOST_NET
if STAGING
source "drivers/vhost/Kconfig.tcm"
endif
+
+config VHOST_RING
+ tristate
+ ---help---
+ This option is selected by any driver which needs to access
+ the host side of a virtio ring.
+
diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
index 7e3aa28..c3a8cfa 100644
--- a/drivers/vhost/Kconfig.tcm
+++ b/drivers/vhost/Kconfig.tcm
@@ -1,6 +1,7 @@
config TCM_VHOST
tristate "TCM_VHOST fabric module"
depends on TARGET_CORE && EVENTFD && m
+ select VHOST_RING
default n
---help---
Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index a27b053..1d37f5e 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -2,3 +2,5 @@ obj-$(CONFIG_VHOST_NET) += vhost_net.o
vhost_net-y := vhost.o net.o
obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
+
+obj-$(CONFIG_VHOST_RING) += vringh.o
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
new file mode 100644
index 0000000..8234f2b
--- /dev/null
+++ b/drivers/vhost/vringh.c
@@ -0,0 +1,1012 @@
+/*
+ * Helpers for the host side of a virtio ring.
+ *
+ * Since these may be in userspace, we use (inline) accessors.
+ */
+#include <linux/vringh.h>
+#include <linux/virtio_ring.h>
+#include <linux/kernel.h>
+#include <linux/ratelimit.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+#include <linux/export.h>
+
+static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
+{
+ static DEFINE_RATELIMIT_STATE(vringh_rs,
+ DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+ if (__ratelimit(&vringh_rs)) {
+ va_list ap;
+ va_start(ap, fmt);
+ printk(KERN_NOTICE "vringh:");
+ vprintk(fmt, ap);
+ va_end(ap);
+ }
+}
+
+/* Returns vring->num if empty, -ve on error. */
+static inline int __vringh_get_head(const struct vringh *vrh,
+ int (*getu16)(u16 *val, const u16 *p),
+ u16 *last_avail_idx)
+{
+ u16 avail_idx, i, head;
+ int err;
+
+ err = getu16(&avail_idx, &vrh->vring.avail->idx);
+ if (err) {
+ vringh_bad("Failed to access avail idx at %p",
+ &vrh->vring.avail->idx);
+ return err;
+ }
+
+ if (*last_avail_idx == avail_idx)
+ return vrh->vring.num;
+
+ /* Only get avail ring entries after they have been exposed by guest. */
+ virtio_rmb(vrh->weak_barriers);
+
+ i = *last_avail_idx & (vrh->vring.num - 1);
+
+ err = getu16(&head, &vrh->vring.avail->ring[i]);
+ if (err) {
+ vringh_bad("Failed to read head: idx %d address %p",
+ *last_avail_idx, &vrh->vring.avail->ring[i]);
+ return err;
+ }
+
+ if (head >= vrh->vring.num) {
+ vringh_bad("Guest says index %u > %u is available",
+ head, vrh->vring.num);
+ return -EINVAL;
+ }
+
+ (*last_avail_idx)++;
+ return head;
+}
+
+/* Copy some bytes to/from the iovec. Returns num copied. */
+static inline ssize_t vringh_iov_xfer(struct vringh_kiov *iov,
+ void *ptr, size_t len,
+ int (*xfer)(void *addr, void *ptr,
+ size_t len))
+{
+ int err, done = 0;
+
+ while (len && iov->i < iov->used) {
+ size_t partlen;
+
+ partlen = min(iov->iov[iov->i].iov_len - iov->off, len);
+ err = xfer(iov->iov[iov->i].iov_base + iov->off, ptr, partlen);
+ if (err)
+ return err;
+ done += partlen;
+ len -= partlen;
+ ptr += partlen;
+ iov->off += partlen;
+
+ if (iov->off == iov->iov[iov->i].iov_len) {
+ iov->off = 0;
+ iov->i++;
+ }
+ }
+ return done;
+}
+
+/* May reduce *len if range is shorter. */
+static inline bool range_check(struct vringh *vrh, u64 addr, size_t *len,
+ struct vringh_range *range,
+ bool (*getrange)(struct vringh *,
+ u64, struct vringh_range *))
+{
+ if (addr < range->start || addr > range->end_incl) {
+ if (!getrange(vrh, addr, range))
+ return false;
+ }
+ BUG_ON(addr < range->start || addr > range->end_incl);
+
+ /* To end of memory? */
+ if (unlikely(addr + *len == 0)) {
+ if (range->end_incl == -1ULL)
+ return true;
+ goto truncate;
+ }
+
+ /* Otherwise, don't wrap. */
+ if (addr + *len < addr) {
+ vringh_bad("Wrapping descriptor %zu@0x%llx",
+ *len, (unsigned long long)addr);
+ return false;
+ }
+
+ if (unlikely(addr + *len - 1 > range->end_incl))
+ goto truncate;
+ return true;
+
+truncate:
+ *len = range->end_incl + 1 - addr;
+ return true;
+}
+
+static inline bool no_range_check(struct vringh *vrh, u64 addr, size_t *len,
+ struct vringh_range *range,
+ bool (*getrange)(struct vringh *,
+ u64, struct vringh_range *))
+{
+ return true;
+}
+
+/* No reason for this code to be inline. */
+static int move_to_indirect(int *up_next, u16 *i, void *addr,
+ const struct vring_desc *desc,
+ struct vring_desc **descs, int *desc_max)
+{
+ /* Indirect tables can't have indirect. */
+ if (*up_next != -1) {
+ vringh_bad("Multilevel indirect %u->%u", *up_next, *i);
+ return -EINVAL;
+ }
+
+ if (unlikely(desc->len % sizeof(struct vring_desc))) {
+ vringh_bad("Strange indirect len %u", desc->len);
+ return -EINVAL;
+ }
+
+ /* We will check this when we follow it! */
+ if (desc->flags & VRING_DESC_F_NEXT)
+ *up_next = desc->next;
+ else
+ *up_next = -2;
+ *descs = addr;
+ *desc_max = desc->len / sizeof(struct vring_desc);
+
+ /* Now, start at the first indirect. */
+ *i = 0;
+ return 0;
+}
+
+static int resize_iovec(struct vringh_kiov *iov, gfp_t gfp)
+{
+ struct kvec *new;
+ unsigned int flag, new_num = (iov->max_num & ~VRINGH_IOV_ALLOCATED) * 2;
+
+ if (new_num < 8)
+ new_num = 8;
+
+ flag = (iov->max_num & VRINGH_IOV_ALLOCATED);
+ if (flag)
+ new = krealloc(iov->iov, new_num * sizeof(struct iovec), gfp);
+ else {
+ new = kmalloc(new_num * sizeof(struct iovec), gfp);
+ if (new) {
+ memcpy(new, iov->iov,
+ iov->max_num * sizeof(struct iovec));
+ flag = VRINGH_IOV_ALLOCATED;
+ }
+ }
+ if (!new)
+ return -ENOMEM;
+ iov->iov = new;
+ iov->max_num = (new_num | flag);
+ return 0;
+}
+
+static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
+ struct vring_desc **descs, int *desc_max)
+{
+ u16 i = *up_next;
+
+ *up_next = -1;
+ *descs = vrh->vring.desc;
+ *desc_max = vrh->vring.num;
+ return i;
+}
+
+static int slow_copy(struct vringh *vrh, void *dst, const void *src,
+ bool (*rcheck)(struct vringh *vrh, u64 addr, size_t *len,
+ struct vringh_range *range,
+ bool (*getrange)(struct vringh *vrh,
+ u64,
+ struct vringh_range *)),
+ bool (*getrange)(struct vringh *vrh,
+ u64 addr,
+ struct vringh_range *r),
+ struct vringh_range *range,
+ int (*copy)(void *dst, const void *src, size_t len))
+{
+ size_t part, len = sizeof(struct vring_desc);
+
+ do {
+ u64 addr;
+ int err;
+
+ part = len;
+ addr = (u64)(unsigned long)src - range->offset;
+
+ if (!rcheck(vrh, addr, &part, range, getrange))
+ return -EINVAL;
+
+ err = copy(dst, src, part);
+ if (err)
+ return err;
+
+ dst += part;
+ src += part;
+ len -= part;
+ } while (len);
+ return 0;
+}
+
+static inline int
+__vringh_iov(struct vringh *vrh, u16 i,
+ struct vringh_kiov *riov,
+ struct vringh_kiov *wiov,
+ bool (*rcheck)(struct vringh *vrh, u64 addr, size_t *len,
+ struct vringh_range *range,
+ bool (*getrange)(struct vringh *, u64,
+ struct vringh_range *)),
+ bool (*getrange)(struct vringh *, u64, struct vringh_range *),
+ gfp_t gfp,
+ int (*copy)(void *dst, const void *src, size_t len))
+{
+ int err, count = 0, up_next, desc_max;
+ struct vring_desc desc, *descs;
+ struct vringh_range range = { -1ULL, 0 }, slowrange;
+ bool slow = false;
+
+ /* We start traversing vring's descriptor table. */
+ descs = vrh->vring.desc;
+ desc_max = vrh->vring.num;
+ up_next = -1;
+
+ if (riov)
+ riov->i = riov->used = 0;
+ else if (wiov)
+ wiov->i = wiov->used = 0;
+ else
+ /* You must want something! */
+ BUG();
+
+ for (;;) {
+ void *addr;
+ struct vringh_kiov *iov;
+ size_t len;
+
+ if (unlikely(slow))
+ err = slow_copy(vrh, &desc, &descs[i], rcheck, getrange,
+ &slowrange, copy);
+ else
+ err = copy(&desc, &descs[i], sizeof(desc));
+ if (unlikely(err))
+ goto fail;
+
+ if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
+ /* Make sure it's OK, and get offset. */
+ len = desc.len;
+ if (!rcheck(vrh, desc.addr, &len, &range, getrange)) {
+ err = -EINVAL;
+ goto fail;
+ }
+
+ if (unlikely(len != desc.len)) {
+ slow = true;
+ /* We need to save this range to use offset */
+ slowrange = range;
+ }
+
+ addr = (void *)(long)(desc.addr + range.offset);
+ err = move_to_indirect(&up_next, &i, addr, &desc,
+ &descs, &desc_max);
+ if (err)
+ goto fail;
+ continue;
+ }
+
+ if (count++ == vrh->vring.num) {
+ vringh_bad("Descriptor loop in %p", descs);
+ err = -ELOOP;
+ goto fail;
+ }
+
+ if (desc.flags & VRING_DESC_F_WRITE)
+ iov = wiov;
+ else {
+ iov = riov;
+ if (unlikely(wiov && wiov->i)) {
+ vringh_bad("Readable desc %p after writable",
+ &descs[i]);
+ err = -EINVAL;
+ goto fail;
+ }
+ }
+
+ if (!iov) {
+ vringh_bad("Unexpected %s desc",
+ !wiov ? "writable" : "readable");
+ err = -EPROTO;
+ goto fail;
+ }
+
+ again:
+ /* Make sure it's OK, and get offset. */
+ len = desc.len;
+ if (!rcheck(vrh, desc.addr, &len, &range, getrange)) {
+ err = -EINVAL;
+ goto fail;
+ }
+ addr = (void *)(unsigned long)(desc.addr + range.offset);
+
+ if (unlikely(iov->used == (iov->max_num & ~VRINGH_IOV_ALLOCATED))) {
+ err = resize_iovec(iov, gfp);
+ if (err)
+ goto fail;
+ }
+
+ iov->iov[iov->used].iov_base = addr;
+ iov->iov[iov->used].iov_len = len;
+ iov->used++;
+
+ if (unlikely(len != desc.len)) {
+ desc.len -= len;
+ desc.addr += len;
+ goto again;
+ }
+
+ if (desc.flags & VRING_DESC_F_NEXT) {
+ i = desc.next;
+ } else {
+ /* Just in case we need to finish traversing above. */
+ if (unlikely(up_next > 0)) {
+ i = return_from_indirect(vrh, &up_next,
+ &descs, &desc_max);
+ slow = false;
+ } else
+ break;
+ }
+
+ if (i >= desc_max) {
+ vringh_bad("Chained index %u > %u", i, desc_max);
+ err = -EINVAL;
+ goto fail;
+ }
+ }
+
+ return 0;
+
+fail:
+ return err;
+}
+
+static inline int __vringh_complete(struct vringh *vrh,
+ const struct vring_used_elem *used,
+ unsigned int num_used,
+ int (*putu16)(u16 *p, u16 val),
+ int (*putused)(struct vring_used_elem *dst,
+ const struct vring_used_elem
+ *src, unsigned num))
+{
+ struct vring_used *used_ring;
+ int err;
+ u16 used_idx, off;
+
+ used_ring = vrh->vring.used;
+ used_idx = vrh->last_used_idx + vrh->completed;
+
+ off = used_idx % vrh->vring.num;
+
+ /* Compiler knows num_used == 1 sometimes, hence extra check */
+ if (num_used > 1 && unlikely(off + num_used >= vrh->vring.num)) {
+ u16 part = vrh->vring.num - off;
+ err = putused(&used_ring->ring[off], used, part);
+ if (!err)
+ err = putused(&used_ring->ring[0], used + part,
+ num_used - part);
+ } else
+ err = putused(&used_ring->ring[off], used, num_used);
+
+ if (err) {
+ vringh_bad("Failed to write %u used entries %u at %p",
+ num_used, off, &used_ring->ring[off]);
+ return err;
+ }
+
+ /* Make sure buffer is written before we update index. */
+ virtio_wmb(vrh->weak_barriers);
+
+ err = putu16(&vrh->vring.used->idx, used_idx + num_used);
+ if (err) {
+ vringh_bad("Failed to update used index at %p",
+ &vrh->vring.used->idx);
+ return err;
+ }
+
+ vrh->completed += num_used;
+ return 0;
+}
+
+
+static inline int __vringh_need_notify(struct vringh *vrh,
+ int (*getu16)(u16 *val, const u16 *p))
+{
+ bool notify;
+ u16 used_event;
+ int err;
+
+ /* Flush out used index update. This is paired with the
+ * barrier that the Guest executes when enabling
+ * interrupts. */
+ virtio_mb(vrh->weak_barriers);
+
+ /* Old-style, without event indices. */
+ if (!vrh->event_indices) {
+ u16 flags;
+ err = getu16(&flags, &vrh->vring.avail->flags);
+ if (err) {
+ vringh_bad("Failed to get flags at %p",
+ &vrh->vring.avail->flags);
+ return err;
+ }
+ return (!(flags & VRING_AVAIL_F_NO_INTERRUPT));
+ }
+
+ /* Modern: we know when other side wants to know. */
+ err = getu16(&used_event, &vring_used_event(&vrh->vring));
+ if (err) {
+ vringh_bad("Failed to get used event idx at %p",
+ &vring_used_event(&vrh->vring));
+ return err;
+ }
+
+ /* Just in case we added so many that we wrap. */
+ if (unlikely(vrh->completed > 0xffff))
+ notify = true;
+ else
+ notify = vring_need_event(used_event,
+ vrh->last_used_idx + vrh->completed,
+ vrh->last_used_idx);
+
+ vrh->last_used_idx += vrh->completed;
+ vrh->completed = 0;
+ return notify;
+}
+
+static inline bool __vringh_notify_enable(struct vringh *vrh,
+ int (*getu16)(u16 *val, const u16 *p),
+ int (*putu16)(u16 *p, u16 val))
+{
+ u16 avail;
+
+ /* Already enabled? */
+ if (vrh->listening)
+ return false;
+
+ vrh->listening = true;
+
+ if (!vrh->event_indices) {
+ /* Old-school; update flags. */
+ if (putu16(&vrh->vring.used->flags, 0) != 0) {
+ vringh_bad("Clearing used flags %p",
+ &vrh->vring.used->flags);
+ return false;
+ }
+ } else {
+ if (putu16(&vring_avail_event(&vrh->vring),
+ vrh->last_avail_idx) != 0) {
+ vringh_bad("Updating avail event index %p",
+ &vring_avail_event(&vrh->vring));
+ return false;
+ }
+ }
+
+ /* They could have slipped one in as we were doing that: make
+ * sure it's written, then check again. */
+ virtio_mb(vrh->weak_barriers);
+
+ if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
+ vringh_bad("Failed to check avail idx at %p",
+ &vrh->vring.avail->idx);
+ return false;
+ }
+
+ /* This is so unlikely, we just leave notifications enabled. */
+ return avail != vrh->last_avail_idx;
+}
+
+static inline void __vringh_notify_disable(struct vringh *vrh,
+ int (*putu16)(u16 *p, u16 val))
+{
+ /* Already disabled? */
+ if (!vrh->listening)
+ return;
+
+ vrh->listening = false;
+ if (!vrh->event_indices) {
+ /* Old-school; update flags. */
+ if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
+ vringh_bad("Setting used flags %p",
+ &vrh->vring.used->flags);
+ }
+ }
+}
+
+/* Userspace access helpers: in this case, addresses are really userspace. */
+static inline int getu16_user(u16 *val, const u16 *p)
+{
+ return get_user(*val, (__force u16 __user *)p);
+}
+
+static inline int putu16_user(u16 *p, u16 val)
+{
+ return put_user(val, (__force u16 __user *)p);
+}
+
+static inline int copydesc_user(void *dst, const void *src, size_t len)
+{
+ return copy_from_user(dst, (__force void __user *)src, len) ?
+ -EFAULT : 0;
+}
+
+static inline int putused_user(struct vring_used_elem *dst,
+ const struct vring_used_elem *src,
+ unsigned int num)
+{
+ return copy_to_user((__force void __user *)dst, src,
+ sizeof(*dst) * num) ? -EFAULT : 0;
+}
+
+static inline int xfer_from_user(void *src, void *dst, size_t len)
+{
+ return copy_from_user(dst, (__force void __user *)src, len) ?
+ -EFAULT : 0;
+}
+
+static inline int xfer_to_user(void *dst, void *src, size_t len)
+{
+ return copy_to_user((__force void __user *)dst, src, len) ?
+ -EFAULT : 0;
+}
+
+/**
+ * vringh_init_user - initialize a vringh for a userspace vring.
+ * @vrh: the vringh to initialize.
+ * @features: the feature bits for this ring.
+ * @num: the number of elements.
+ * @weak_barriers: true if we only need memory barriers, not I/O.
+ * @desc: the userpace descriptor pointer.
+ * @avail: the userpace avail pointer.
+ * @used: the userpace used pointer.
+ *
+ * Returns an error if num is invalid: you should check pointers
+ * yourself!
+ */
+int vringh_init_user(struct vringh *vrh, u32 features,
+ unsigned int num, bool weak_barriers,
+ struct vring_desc __user *desc,
+ struct vring_avail __user *avail,
+ struct vring_used __user *used)
+{
+ /* Sane power of 2 please! */
+ if (!num || num > 0xffff || (num & (num - 1))) {
+ vringh_bad("Bad ring size %u", num);
+ return -EINVAL;
+ }
+
+ vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
+ vrh->weak_barriers = weak_barriers;
+ vrh->listening = false;
+ vrh->completed = 0;
+ vrh->last_avail_idx = 0;
+ vrh->last_used_idx = 0;
+ vrh->vring.num = num;
+ /* vring expects kernel addresses, but only used via accessors. */
+ vrh->vring.desc = (__force struct vring_desc *)desc;
+ vrh->vring.avail = (__force struct vring_avail *)avail;
+ vrh->vring.used = (__force struct vring_used *)used;
+ return 0;
+}
+EXPORT_SYMBOL(vringh_init_user);
+
+/**
+ * vringh_getdesc_user - get next available descriptor from userspace ring.
+ * @vrh: the userspace vring.
+ * @riov: where to put the readable descriptors (or NULL)
+ * @wiov: where to put the writable descriptors (or NULL)
+ * @getrange: function to call to check ranges.
+ * @head: head index we received, for passing to vringh_complete_user().
+ *
+ * Returns 0 if there was no descriptor, 1 if there was, or -errno.
+ *
+ * Note that on error return, you can tell the difference between an
+ * invalid ring and a single invalid descriptor: in the former case,
+ * *head will be vrh->vring.num. You may be able to ignore an invalid
+ * descriptor, but there's not much you can do with an invalid ring.
+ *
+ * Note that you may need to clean up riov and wiov, even on error!
+ */
+int vringh_getdesc_user(struct vringh *vrh,
+ struct vringh_iov *riov,
+ struct vringh_iov *wiov,
+ bool (*getrange)(struct vringh *vrh,
+ u64 addr, struct vringh_range *r),
+ u16 *head)
+{
+ int err;
+
+ *head = vrh->vring.num;
+ err = __vringh_get_head(vrh, getu16_user, &vrh->last_avail_idx);
+ if (err < 0)
+ return err;
+
+ /* Empty... */
+ if (err == vrh->vring.num)
+ return 0;
+
+ /* We need the layouts to be the indentical for this to work */
+ BUILD_BUG_ON(sizeof(struct vringh_kiov) != sizeof(struct vringh_iov));
+ BUILD_BUG_ON(offsetof(struct vringh_kiov, iov) !=
+ offsetof(struct vringh_iov, iov));
+ BUILD_BUG_ON(offsetof(struct vringh_kiov, i) !=
+ offsetof(struct vringh_iov, i));
+ BUILD_BUG_ON(offsetof(struct vringh_kiov, used) !=
+ offsetof(struct vringh_iov, used));
+ BUILD_BUG_ON(offsetof(struct vringh_kiov, max_num) !=
+ offsetof(struct vringh_iov, max_num));
+ BUILD_BUG_ON(sizeof(struct iovec) != sizeof(struct kvec));
+ BUILD_BUG_ON(offsetof(struct iovec, iov_base) !=
+ offsetof(struct kvec, iov_base));
+ BUILD_BUG_ON(offsetof(struct iovec, iov_len) !=
+ offsetof(struct kvec, iov_len));
+ BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_base)
+ != sizeof(((struct kvec *)NULL)->iov_base));
+ BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_len)
+ != sizeof(((struct kvec *)NULL)->iov_len));
+
+ *head = err;
+ err = __vringh_iov(vrh, *head, (struct vringh_kiov *)riov,
+ (struct vringh_kiov *)wiov,
+ range_check, getrange, GFP_KERNEL, copydesc_user);
+ if (err)
+ return err;
+
+ return 1;
+}
+EXPORT_SYMBOL(vringh_getdesc_user);
+
+/**
+ * vringh_iov_pull_user - copy bytes from vring_iov.
+ * @riov: the riov as passed to vringh_getdesc_user() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
+{
+ return vringh_iov_xfer((struct vringh_kiov *)riov,
+ dst, len, xfer_from_user);
+}
+EXPORT_SYMBOL(vringh_iov_pull_user);
+
+/**
+ * vringh_iov_push_user - copy bytes into vring_iov.
+ * @wiov: the wiov as passed to vringh_getdesc_user() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
+ const void *src, size_t len)
+{
+ return vringh_iov_xfer((struct vringh_kiov *)wiov,
+ (void *)src, len, xfer_to_user);
+}
+EXPORT_SYMBOL(vringh_iov_push_user);
+
+/**
+ * vringh_abandon_user - we've decided not to handle the descriptor(s).
+ * @vrh: the vring.
+ * @num: the number of descriptors to put back (ie. num
+ * vringh_get_user() to undo).
+ *
+ * The next vringh_get_user() will return the old descriptor(s) again.
+ */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num)
+{
+ /* We only update vring_avail_event(vr) when we want to be notified,
+ * so we haven't changed that yet. */
+ vrh->last_avail_idx -= num;
+}
+EXPORT_SYMBOL(vringh_abandon_user);
+
+/**
+ * vringh_complete_user - we've finished with descriptor, publish it.
+ * @vrh: the vring.
+ * @head: the head as filled in by vringh_getdesc_user.
+ * @len: the length of data we have written.
+ *
+ * You should check vringh_need_notify_user() after one or more calls
+ * to this function.
+ */
+int vringh_complete_user(struct vringh *vrh, u16 head, u32 len)
+{
+ struct vring_used_elem used;
+
+ used.id = head;
+ used.len = len;
+ return __vringh_complete(vrh, &used, 1, putu16_user, putused_user);
+}
+EXPORT_SYMBOL(vringh_complete_user);
+
+/**
+ * vringh_complete_multi_user - we've finished with many descriptors.
+ * @vrh: the vring.
+ * @used: the head, length pairs.
+ * @num_used: the number of used elements.
+ *
+ * You should check vringh_need_notify_user() after one or more calls
+ * to this function.
+ */
+int vringh_complete_multi_user(struct vringh *vrh,
+ const struct vring_used_elem used[],
+ unsigned num_used)
+{
+ return __vringh_complete(vrh, used, num_used,
+ putu16_user, putused_user);
+}
+EXPORT_SYMBOL(vringh_complete_multi_user);
+
+/**
+ * vringh_notify_enable_user - we want to know if something changes.
+ * @vrh: the vring.
+ *
+ * This always enables notifications, but returns true if there are
+ * now more buffers available in the vring.
+ */
+bool vringh_notify_enable_user(struct vringh *vrh)
+{
+ return __vringh_notify_enable(vrh, getu16_user, putu16_user);
+}
+EXPORT_SYMBOL(vringh_notify_enable_user);
+
+/**
+ * vringh_notify_disable_user - don't tell us if something changes.
+ * @vrh: the vring.
+ *
+ * This is our normal running state: we disable and then only enable when
+ * we're going to sleep.
+ */
+void vringh_notify_disable_user(struct vringh *vrh)
+{
+ __vringh_notify_disable(vrh, putu16_user);
+}
+EXPORT_SYMBOL(vringh_notify_disable_user);
+
+/**
+ * vringh_need_notify_user - must we tell the other side about used buffers?
+ * @vrh: the vring we've called vringh_complete_user() on.
+ *
+ * Returns -errno or 0 if we don't need to tell the other side, 1 if we do.
+ */
+int vringh_need_notify_user(struct vringh *vrh)
+{
+ return __vringh_need_notify(vrh, getu16_user);
+}
+EXPORT_SYMBOL(vringh_need_notify_user);
+
+/* Kernelspace access helpers. */
+static inline int getu16_kern(u16 *val, const u16 *p)
+{
+ *val = ACCESS_ONCE(*p);
+ return 0;
+}
+
+static inline int putu16_kern(u16 *p, u16 val)
+{
+ ACCESS_ONCE(*p) = val;
+ return 0;
+}
+
+static inline int copydesc_kern(void *dst, const void *src, size_t len)
+{
+ memcpy(dst, src, len);
+ return 0;
+}
+
+static inline int putused_kern(struct vring_used_elem *dst,
+ const struct vring_used_elem *src,
+ unsigned int num)
+{
+ memcpy(dst, src, num * sizeof(*dst));
+ return 0;
+}
+
+static inline int xfer_kern(void *src, void *dst, size_t len)
+{
+ memcpy(dst, src, len);
+ return 0;
+}
+
+/**
+ * vringh_init_kern - initialize a vringh for a kernelspace vring.
+ * @vrh: the vringh to initialize.
+ * @features: the feature bits for this ring.
+ * @num: the number of elements.
+ * @weak_barriers: true if we only need memory barriers, not I/O.
+ * @desc: the userpace descriptor pointer.
+ * @avail: the userpace avail pointer.
+ * @used: the userpace used pointer.
+ *
+ * Returns an error if num is invalid.
+ */
+int vringh_init_kern(struct vringh *vrh, u32 features,
+ unsigned int num, bool weak_barriers,
+ struct vring_desc *desc,
+ struct vring_avail *avail,
+ struct vring_used *used)
+{
+ /* Sane power of 2 please! */
+ if (!num || num > 0xffff || (num & (num - 1))) {
+ vringh_bad("Bad ring size %u", num);
+ return -EINVAL;
+ }
+
+ vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
+ vrh->weak_barriers = weak_barriers;
+ vrh->listening = false;
+ vrh->completed = 0;
+ vrh->last_avail_idx = 0;
+ vrh->last_used_idx = 0;
+ vrh->vring.num = num;
+ vrh->vring.desc = desc;
+ vrh->vring.avail = avail;
+ vrh->vring.used = used;
+ return 0;
+}
+EXPORT_SYMBOL(vringh_init_kern);
+
+/**
+ * vringh_getdesc_kern - get next available descriptor from kernelspace ring.
+ * @vrh: the kernelspace vring.
+ * @riov: where to put the readable descriptors (or NULL)
+ * @wiov: where to put the writable descriptors (or NULL)
+ * @head: head index we received, for passing to vringh_complete_kern().
+ * @gfp: flags for allocating larger riov/wiov.
+ *
+ * Returns 0 if there was no descriptor, 1 if there was, or -errno.
+ *
+ * Note that on error return, you can tell the difference between an
+ * invalid ring and a single invalid descriptor: in the former case,
+ * *head will be vrh->vring.num. You may be able to ignore an invalid
+ * descriptor, but there's not much you can do with an invalid ring.
+ *
+ * Note that you may need to clean up riov and wiov, even on error!
+ */
+int vringh_getdesc_kern(struct vringh *vrh,
+ struct vringh_kiov *riov,
+ struct vringh_kiov *wiov,
+ u16 *head,
+ gfp_t gfp)
+{
+ int err;
+
+ err = __vringh_get_head(vrh, getu16_kern, &vrh->last_avail_idx);
+ if (err < 0)
+ return err;
+
+ /* Empty... */
+ if (err == vrh->vring.num)
+ return 0;
+
+ *head = err;
+ err = __vringh_iov(vrh, *head, riov, wiov, no_range_check, NULL,
+ gfp, copydesc_kern);
+ if (err)
+ return err;
+
+ return 1;
+}
+EXPORT_SYMBOL(vringh_getdesc_kern);
+
+/**
+ * vringh_iov_pull_kern - copy bytes from vring_iov.
+ * @riov: the riov as passed to vringh_getdesc_kern() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_pull_kern(struct vringh_kiov *riov, void *dst, size_t len)
+{
+ return vringh_iov_xfer(riov, dst, len, xfer_kern);
+}
+EXPORT_SYMBOL(vringh_iov_pull_kern);
+
+/**
+ * vringh_iov_push_kern - copy bytes into vring_iov.
+ * @wiov: the wiov as passed to vringh_getdesc_kern() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
+ const void *src, size_t len)
+{
+ return vringh_iov_xfer(wiov, (void *)src, len, xfer_kern);
+}
+EXPORT_SYMBOL(vringh_iov_push_kern);
+
+/**
+ * vringh_abandon_kern - we've decided not to handle the descriptor(s).
+ * @vrh: the vring.
+ * @num: the number of descriptors to put back (ie. num
+ * vringh_get_kern() to undo).
+ *
+ * The next vringh_get_kern() will return the old descriptor(s) again.
+ */
+void vringh_abandon_kern(struct vringh *vrh, unsigned int num)
+{
+ /* We only update vring_avail_event(vr) when we want to be notified,
+ * so we haven't changed that yet. */
+ vrh->last_avail_idx -= num;
+}
+EXPORT_SYMBOL(vringh_abandon_kern);
+
+/**
+ * vringh_complete_kern - we've finished with descriptor, publish it.
+ * @vrh: the vring.
+ * @head: the head as filled in by vringh_getdesc_kern.
+ * @len: the length of data we have written.
+ *
+ * You should check vringh_need_notify_kern() after one or more calls
+ * to this function.
+ */
+int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len)
+{
+ struct vring_used_elem used;
+
+ used.id = head;
+ used.len = len;
+
+ return __vringh_complete(vrh, &used, 1, putu16_kern, putused_kern);
+}
+EXPORT_SYMBOL(vringh_complete_kern);
+
+/**
+ * vringh_notify_enable_kern - we want to know if something changes.
+ * @vrh: the vring.
+ *
+ * This always enables notifications, but returns true if there are
+ * now more buffers available in the vring.
+ */
+bool vringh_notify_enable_kern(struct vringh *vrh)
+{
+ return __vringh_notify_enable(vrh, getu16_kern, putu16_kern);
+}
+EXPORT_SYMBOL(vringh_notify_enable_kern);
+
+/**
+ * vringh_notify_disable_kern - don't tell us if something changes.
+ * @vrh: the vring.
+ *
+ * This is our normal running state: we disable and then only enable when
+ * we're going to sleep.
+ */
+void vringh_notify_disable_kern(struct vringh *vrh)
+{
+ __vringh_notify_disable(vrh, putu16_kern);
+}
+EXPORT_SYMBOL(vringh_notify_disable_kern);
+
+/**
+ * vringh_need_notify_kern - must we tell the other side about used buffers?
+ * @vrh: the vring we've called vringh_complete_kern() on.
+ *
+ * Returns -errno or 0 if we don't need to tell the other side, 1 if we do.
+ */
+int vringh_need_notify_kern(struct vringh *vrh)
+{
+ return __vringh_need_notify(vrh, getu16_kern);
+}
+EXPORT_SYMBOL(vringh_need_notify_kern);
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
new file mode 100644
index 0000000..ab41185
--- /dev/null
+++ b/include/linux/vringh.h
@@ -0,0 +1,185 @@
+/*
+ * Linux host-side vring helpers; for when the kernel needs to access
+ * someone else's vring.
+ *
+ * Copyright IBM Corporation, 2013.
+ * Parts taken from drivers/vhost/vhost.c Copyright 2009 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Written by: Rusty Russell <rusty@rustcorp.com.au>
+ */
+#ifndef _LINUX_VRINGH_H
+#define _LINUX_VRINGH_H
+#include <uapi/linux/virtio_ring.h>
+#include <linux/uio.h>
+#include <linux/slab.h>
+#include <asm/barrier.h>
+
+/* virtio_ring with information needed for host access. */
+struct vringh {
+ /* Guest publishes used event idx (note: we always do). */
+ bool event_indices;
+
+ /* Have we told the other end we want to be notified? */
+ bool listening;
+
+ /* Can we get away with weak barriers? */
+ bool weak_barriers;
+
+ /* Last available index we saw (ie. where we're up to). */
+ u16 last_avail_idx;
+
+ /* Last index we used. */
+ u16 last_used_idx;
+
+ /* How many descriptors we've completed since last need_notify(). */
+ u32 completed;
+
+ /* The vring (note: it may contain user pointers!) */
+ struct vring vring;
+};
+
+/* The memory the vring can access, and what offset to apply. */
+struct vringh_range {
+ u64 start, end_incl;
+ u64 offset;
+};
+
+/* All the information about an iovec. */
+struct vringh_iov {
+ struct iovec *iov;
+ size_t off; /* Within iov[i] */
+ unsigned i, used, max_num;
+};
+
+/* All the information about a kvec. */
+struct vringh_kiov {
+ struct kvec *iov;
+ size_t off; /* Within iov[i] */
+ unsigned i, used, max_num;
+};
+
+/* Flag on max_num to indicate we're kmalloced. */
+#define VRINGH_IOV_ALLOCATED 0x8000000
+
+/* Helpers for userspace vrings. */
+int vringh_init_user(struct vringh *vrh, u32 features,
+ unsigned int num, bool weak_barriers,
+ struct vring_desc __user *desc,
+ struct vring_avail __user *avail,
+ struct vring_used __user *used);
+
+static inline void vringh_iov_init(struct vringh_iov *iov,
+ struct iovec *iovec, unsigned num)
+{
+ iov->used = iov->i = 0;
+ iov->off = 0;
+ iov->max_num = num;
+ iov->iov = iovec;
+}
+
+static inline void vringh_iov_reset(struct vringh_iov *iov)
+{
+ iov->off = 0;
+ iov->i = 0;
+}
+
+static inline void vringh_iov_cleanup(struct vringh_iov *iov)
+{
+ if (iov->max_num & VRINGH_IOV_ALLOCATED)
+ kfree(iov->iov);
+ iov->max_num = iov->used = iov->i = iov->off = 0;
+ iov->iov = NULL;
+}
+
+/* Convert a descriptor into iovecs. */
+int vringh_getdesc_user(struct vringh *vrh,
+ struct vringh_iov *riov,
+ struct vringh_iov *wiov,
+ bool (*getrange)(struct vringh *vrh,
+ u64 addr, struct vringh_range *r),
+ u16 *head);
+
+/* Copy bytes from readable vsg, consuming it (and incrementing wiov->i). */
+ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len);
+
+/* Copy bytes into writable vsg, consuming it (and incrementing wiov->i). */
+ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
+ const void *src, size_t len);
+
+/* Mark a descriptor as used. */
+int vringh_complete_user(struct vringh *vrh, u16 head, u32 len);
+int vringh_complete_multi_user(struct vringh *vrh,
+ const struct vring_used_elem used[],
+ unsigned num_used);
+
+/* Pretend we've never seen descriptor (for easy error handling). */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num);
+
+/* Do we need to fire the eventfd to notify the other side? */
+int vringh_need_notify_user(struct vringh *vrh);
+
+bool vringh_notify_enable_user(struct vringh *vrh);
+void vringh_notify_disable_user(struct vringh *vrh);
+
+/* Helpers for kernelspace vrings. */
+int vringh_init_kern(struct vringh *vrh, u32 features,
+ unsigned int num, bool weak_barriers,
+ struct vring_desc *desc,
+ struct vring_avail *avail,
+ struct vring_used *used);
+
+static inline void vringh_kiov_init(struct vringh_kiov *kiov,
+ struct kvec *kvec, unsigned num)
+{
+ kiov->used = kiov->i = 0;
+ kiov->off = 0;
+ kiov->max_num = num;
+ kiov->iov = kvec;
+}
+
+static inline void vringh_kiov_reset(struct vringh_kiov *kiov)
+{
+ kiov->off = 0;
+ kiov->i = 0;
+}
+
+static inline void vringh_kiov_cleanup(struct vringh_kiov *kiov)
+{
+ if (kiov->max_num & VRINGH_IOV_ALLOCATED)
+ kfree(kiov->iov);
+ kiov->max_num = kiov->used = kiov->i = kiov->off = 0;
+ kiov->iov = NULL;
+}
+
+int vringh_getdesc_kern(struct vringh *vrh,
+ struct vringh_kiov *riov,
+ struct vringh_kiov *wiov,
+ u16 *head,
+ gfp_t gfp);
+
+ssize_t vringh_iov_pull_kern(struct vringh_kiov *riov, void *dst, size_t len);
+ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
+ const void *src, size_t len);
+void vringh_abandon_kern(struct vringh *vrh, unsigned int num);
+int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len);
+
+bool vringh_notify_enable_kern(struct vringh *vrh);
+void vringh_notify_disable_kern(struct vringh *vrh);
+
+int vringh_need_notify_kern(struct vringh *vrh);
+
+#endif /* _LINUX_VRINGH_H */
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/3] vringh: don't flag already listening.
2013-03-06 4:57 ` [PATCH 1/3] vringh: host-side implementation of virtio rings (v2) Rusty Russell
@ 2013-03-06 4:59 ` Rusty Russell
2013-03-06 5:02 ` [PATCH 3/3] tools/virtio: add vring_test (v2) Rusty Russell
1 sibling, 0 replies; 19+ messages in thread
From: Rusty Russell @ 2013-03-06 4:59 UTC (permalink / raw)
To: sjur.brandeland, Ohad Ben-Cohen, mst
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization
This doesn't work for eventidx:
1) vringh_notify_enable_user() gets called because the ring is empty
(vrh->last_avail_idx == vrh->vring.avail->idx).
2) It puts vrh->last_avail_idx into vring_avail_event().
3) It sets ->listening to true.
4) Meanwhile, the other side has done more work, updating vrh->vring.avail->idx.
5) It notices vrh->vring.avail->idx has changed, so returns true to
make us do more work.
But since the value we wrote into vring_avail_event() is past, we
won't get notified. This happens rarely: only once I'd optimized
add_buf could I trigger this in about 1 in 10 runs.
Since it's so rare, just remove the listening flag: it's fine to do
this twice anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 8234f2b..d4413d9 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -476,12 +476,6 @@ static inline bool __vringh_notify_enable(struct vringh *vrh,
{
u16 avail;
- /* Already enabled? */
- if (vrh->listening)
- return false;
-
- vrh->listening = true;
-
if (!vrh->event_indices) {
/* Old-school; update flags. */
if (putu16(&vrh->vring.used->flags, 0) != 0) {
@@ -515,11 +509,6 @@ static inline bool __vringh_notify_enable(struct vringh *vrh,
static inline void __vringh_notify_disable(struct vringh *vrh,
int (*putu16)(u16 *p, u16 val))
{
- /* Already disabled? */
- if (!vrh->listening)
- return;
-
- vrh->listening = false;
if (!vrh->event_indices) {
/* Old-school; update flags. */
if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
@@ -593,7 +582,6 @@ int vringh_init_user(struct vringh *vrh, u32 features,
vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
vrh->weak_barriers = weak_barriers;
- vrh->listening = false;
vrh->completed = 0;
vrh->last_avail_idx = 0;
vrh->last_used_idx = 0;
@@ -853,7 +841,6 @@ int vringh_init_kern(struct vringh *vrh, u32 features,
vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
vrh->weak_barriers = weak_barriers;
- vrh->listening = false;
vrh->completed = 0;
vrh->last_avail_idx = 0;
vrh->last_used_idx = 0;
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index ab41185..44b94cd 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -33,9 +33,6 @@ struct vringh {
/* Guest publishes used event idx (note: we always do). */
bool event_indices;
- /* Have we told the other end we want to be notified? */
- bool listening;
-
/* Can we get away with weak barriers? */
bool weak_barriers;
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 3/3] tools/virtio: add vring_test (v2)
2013-03-06 4:57 ` [PATCH 1/3] vringh: host-side implementation of virtio rings (v2) Rusty Russell
2013-03-06 4:59 ` [PATCH 2/3] vringh: don't flag already listening Rusty Russell
@ 2013-03-06 5:02 ` Rusty Russell
1 sibling, 0 replies; 19+ messages in thread
From: Rusty Russell @ 2013-03-06 5:02 UTC (permalink / raw)
To: sjur.brandeland, Ohad Ben-Cohen, mst
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization
This is mainly to test the drivers/vhost/vringh.c code, but it also
uses the drivers/virtio/virtio_ring.c code for the guest side.
Usage for testing the basic implementation:
./vringh_test
# Test with indirect descriptors
./vringh_test --indirect
# Test with indirect descriptors and event indexex
./vringh_test --indirect --eventidx
You can run a parallel stress test by adding --parallel to any of the
above options.
eg ./vringh_test --parallel:
Using CPUS 0 and 3
Guest: notified 10107974, pinged 107970
Host: notified 108158, pinged 3172148
./vringh_test --indirect --eventidx --parallel:
Using CPUS 0 and 3
Guest: notified 156357, pinged 156251
Host: notified 156251, pinged 78179
Average of 50 times doing ./vringh_test --indirect --eventidx --parallel:
2.840000-3.040000(2.927292)user
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changes:
- build objects separately, just like kernel does.
- slightly better cmdline handling, arbitrary ordering of args
- --fast-vringh for benchmarking virtio_ring improvements.
- -U_FORTIFY_SOURCE to remove bogus Ubuntu fortify warnings.
diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
index b48c432..3187c62 100644
--- a/tools/virtio/Makefile
+++ b/tools/virtio/Makefile
@@ -1,12 +1,14 @@
all: test mod
-test: virtio_test
+test: virtio_test vringh_test
virtio_test: virtio_ring.o virtio_test.o
-CFLAGS += -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD
-vpath %.c ../../drivers/virtio
+vringh_test: vringh_test.o vringh.o virtio_ring.o
+
+CFLAGS += -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE
+vpath %.c ../../drivers/virtio ../../drivers/vhost
mod:
${MAKE} -C `pwd`/../.. M=`pwd`/vhost_test
.PHONY: all test mod clean
clean:
- ${RM} *.o vhost_test/*.o vhost_test/.*.cmd \
+ ${RM} *.o vringh_test virtio_test vhost_test/*.o vhost_test/.*.cmd \
vhost_test/Module.symvers vhost_test/modules.order *.d
-include *.d
diff --git a/tools/virtio/linux/uio.h b/tools/virtio/linux/uio.h
index ecbdf92..cd20f0b 100644
--- a/tools/virtio/linux/uio.h
+++ b/tools/virtio/linux/uio.h
@@ -1 +1,3 @@
+#include <linux/kernel.h>
+
#include "../../../include/linux/uio.h"
diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
new file mode 100644
index 0000000..8816dd4
--- /dev/null
+++ b/tools/virtio/vringh_test.c
@@ -0,0 +1,735 @@
+/* Simple test of virtio code, entirely in userpsace. */
+#define _GNU_SOURCE
+#include <sched.h>
+#include <err.h>
+#include <linux/kernel.h>
+#include <linux/err.h>
+#include <linux/virtio.h>
+#include <linux/vringh.h>
+#include <linux/virtio_ring.h>
+#include <linux/uaccess.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <fcntl.h>
+
+#define USER_MEM (1024*1024)
+void *__user_addr_min, *__user_addr_max;
+void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+static u64 user_addr_offset;
+
+#define RINGSIZE 256
+#define ALIGN 4096
+
+static void never_notify_host(struct virtqueue *vq)
+{
+ abort();
+}
+
+static void never_callback_guest(struct virtqueue *vq)
+{
+ abort();
+}
+
+static bool getrange_iov(struct vringh *vrh, u64 addr, struct vringh_range *r)
+{
+ if (addr < (u64)(unsigned long)__user_addr_min - user_addr_offset)
+ return false;
+ if (addr >= (u64)(unsigned long)__user_addr_max - user_addr_offset)
+ return false;
+
+ r->start = (u64)(unsigned long)__user_addr_min - user_addr_offset;
+ r->end_incl = (u64)(unsigned long)__user_addr_max - 1 - user_addr_offset;
+ r->offset = user_addr_offset;
+ return true;
+}
+
+/* We return single byte ranges. */
+static bool getrange_slow(struct vringh *vrh, u64 addr, struct vringh_range *r)
+{
+ if (addr < (u64)(unsigned long)__user_addr_min - user_addr_offset)
+ return false;
+ if (addr >= (u64)(unsigned long)__user_addr_max - user_addr_offset)
+ return false;
+
+ r->start = addr;
+ r->end_incl = r->start;
+ r->offset = user_addr_offset;
+ return true;
+}
+
+struct guest_virtio_device {
+ struct virtio_device vdev;
+ int to_host_fd;
+ unsigned long notifies;
+};
+
+static void parallel_notify_host(struct virtqueue *vq)
+{
+ struct guest_virtio_device *gvdev;
+
+ gvdev = container_of(vq->vdev, struct guest_virtio_device, vdev);
+ write(gvdev->to_host_fd, "", 1);
+ gvdev->notifies++;
+}
+
+static void no_notify_host(struct virtqueue *vq)
+{
+}
+
+#define NUM_XFERS (10000000)
+
+/* We aim for two "distant" cpus. */
+static void find_cpus(unsigned int *first, unsigned int *last)
+{
+ unsigned int i;
+
+ *first = -1U;
+ *last = 0;
+ for (i = 0; i < 4096; i++) {
+ cpu_set_t set;
+ CPU_ZERO(&set);
+ CPU_SET(i, &set);
+ if (sched_setaffinity(getpid(), sizeof(set), &set) == 0) {
+ if (i < *first)
+ *first = i;
+ if (i > *last)
+ *last = i;
+ }
+ }
+}
+
+/* Opencoded version for fast mode */
+static inline int vringh_get_head(struct vringh *vrh, u16 *head)
+{
+ u16 avail_idx, i;
+ int err;
+
+ err = get_user(avail_idx, &vrh->vring.avail->idx);
+ if (err)
+ return err;
+
+ if (vrh->last_avail_idx == avail_idx)
+ return 0;
+
+ /* Only get avail ring entries after they have been exposed by guest. */
+ virtio_rmb(vrh->weak_barriers);
+
+ i = vrh->last_avail_idx & (vrh->vring.num - 1);
+
+ err = get_user(*head, &vrh->vring.avail->ring[i]);
+ if (err)
+ return err;
+
+ vrh->last_avail_idx++;
+ return 1;
+}
+
+static int parallel_test(unsigned long features,
+ bool (*getrange)(struct vringh *vrh,
+ u64 addr, struct vringh_range *r),
+ bool fast_vringh)
+{
+ void *host_map, *guest_map;
+ int fd, mapsize, to_guest[2], to_host[2];
+ unsigned long xfers = 0, notifies = 0, receives = 0;
+ unsigned int first_cpu, last_cpu;
+ cpu_set_t cpu_set;
+ char buf[128];
+
+ /* Create real file to mmap. */
+ fd = open("/tmp/vringh_test-file", O_RDWR|O_CREAT|O_TRUNC, 0600);
+ if (fd < 0)
+ err(1, "Opening /tmp/vringh_test-file");
+
+ /* Extra room at the end for some data, and indirects */
+ mapsize = vring_size(RINGSIZE, ALIGN)
+ + RINGSIZE * 2 * sizeof(int)
+ + RINGSIZE * 6 * sizeof(struct vring_desc);
+ mapsize = (mapsize + getpagesize() - 1) & ~(getpagesize() - 1);
+ ftruncate(fd, mapsize);
+
+ /* Parent and child use separate addresses, to check our mapping logic! */
+ host_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+ guest_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+
+ pipe(to_guest);
+ pipe(to_host);
+
+ CPU_ZERO(&cpu_set);
+ find_cpus(&first_cpu, &last_cpu);
+ printf("Using CPUS %u and %u\n", first_cpu, last_cpu);
+ fflush(stdout);
+
+ if (fork() != 0) {
+ struct vringh vrh;
+ int status, err, rlen = 0;
+ char rbuf[5];
+
+ /* We are the host: never access guest addresses! */
+ munmap(guest_map, mapsize);
+
+ __user_addr_min = host_map;
+ __user_addr_max = __user_addr_min + mapsize;
+ user_addr_offset = host_map - guest_map;
+ assert(user_addr_offset);
+
+ close(to_guest[0]);
+ close(to_host[1]);
+
+ vring_init(&vrh.vring, RINGSIZE, host_map, ALIGN);
+ vringh_init_user(&vrh, features, RINGSIZE, true,
+ vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
+ CPU_SET(first_cpu, &cpu_set);
+ if (sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set))
+ errx(1, "Could not set affinity to cpu %u", first_cpu);
+
+ while (xfers < NUM_XFERS) {
+ struct iovec host_riov[2], host_wiov[2];
+ struct vringh_iov riov, wiov;
+ u16 head;
+
+ if (fast_vringh) {
+ for (;;) {
+ err = vringh_get_head(&vrh, &head);
+ if (err != 0)
+ break;
+ err = vringh_need_notify_user(&vrh);
+ if (err < 0)
+ errx(1, "vringh_need_notify_user: %i",
+ err);
+ if (err) {
+ write(to_guest[1], "", 1);
+ notifies++;
+ }
+ }
+ if (err != 1)
+ errx(1, "vringh_get_head");
+ goto complete;
+ } else {
+ vringh_iov_init(&riov,
+ host_riov,
+ ARRAY_SIZE(host_riov));
+ vringh_iov_init(&wiov,
+ host_wiov,
+ ARRAY_SIZE(host_wiov));
+
+ err = vringh_getdesc_user(&vrh, &riov, &wiov,
+ getrange, &head);
+ }
+ if (err == 0) {
+ err = vringh_need_notify_user(&vrh);
+ if (err < 0)
+ errx(1, "vringh_need_notify_user: %i",
+ err);
+ if (err) {
+ write(to_guest[1], "", 1);
+ notifies++;
+ }
+
+ if (vringh_notify_enable_user(&vrh))
+ continue;
+
+ /* Swallow all notifies at once. */
+ if (read(to_host[0], buf, sizeof(buf)) < 1)
+ break;
+
+ vringh_notify_disable_user(&vrh);
+ receives++;
+ continue;
+ }
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ /* We simply copy bytes. */
+ if (riov.used) {
+ rlen = vringh_iov_pull_user(&riov, rbuf,
+ sizeof(rbuf));
+ if (rlen != 4)
+ errx(1, "vringh_iov_pull_user: %i",
+ rlen);
+ assert(riov.i == riov.used);
+ } else {
+ err = vringh_iov_push_user(&wiov, rbuf, rlen);
+ if (err != rlen)
+ errx(1, "vringh_iov_push_user: %i",
+ err);
+ assert(wiov.i == wiov.used);
+ }
+ complete:
+ xfers++;
+
+ err = vringh_complete_user(&vrh, head,
+ riov.used ? 0 : rlen);
+ if (err != 0)
+ errx(1, "vringh_complete_user: %i", err);
+ }
+
+ err = vringh_need_notify_user(&vrh);
+ if (err < 0)
+ errx(1, "vringh_need_notify_user: %i", err);
+ if (err) {
+ write(to_guest[1], "", 1);
+ notifies++;
+ }
+ wait(&status);
+ if (!WIFEXITED(status))
+ errx(1, "Child died with signal %i?", WTERMSIG(status));
+ if (WEXITSTATUS(status) != 0)
+ errx(1, "Child exited %i?", WEXITSTATUS(status));
+ printf("Host: notified %lu, pinged %lu\n", notifies, receives);
+ return 0;
+ } else {
+ struct guest_virtio_device gvdev;
+ struct virtqueue *vq;
+ unsigned int *data;
+ struct vring_desc *indirects;
+ unsigned int finished = 0;
+
+ /* We pass sg[]s pointing into here, but we need RINGSIZE+1 */
+ data = guest_map + vring_size(RINGSIZE, ALIGN);
+ indirects = (void *)data + (RINGSIZE + 1) * 2 * sizeof(int);
+
+ /* We are the guest. */
+ munmap(host_map, mapsize);
+
+ close(to_guest[1]);
+ close(to_host[0]);
+
+ gvdev.vdev.features[0] = features;
+ gvdev.to_host_fd = to_host[1];
+
+ CPU_SET(first_cpu, &cpu_set);
+ if (sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set))
+ err(1, "Could not set affinity to cpu %u", first_cpu);
+
+ vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &gvdev.vdev, true,
+ guest_map, fast_vringh ? no_notify_host
+ : parallel_notify_host,
+ never_callback_guest, "guest vq");
+
+ /* Don't kfree indirects. */
+ __kfree_ignore_start = indirects;
+ __kfree_ignore_end = indirects + RINGSIZE * 6;
+
+ while (xfers < NUM_XFERS) {
+ struct scatterlist sg[4];
+ unsigned int num_sg, len;
+ int *dbuf, err;
+ bool output = !(xfers % 2);
+
+ /* Consume bufs. */
+ while ((dbuf = virtqueue_get_buf(vq, &len)) != NULL) {
+ if (len == 4)
+ assert(*dbuf == finished - 1);
+ else
+ assert(*dbuf == finished);
+ finished++;
+ }
+
+ /* Produce a buffer. */
+ dbuf = data + (xfers % (RINGSIZE + 1));
+
+ if (output)
+ *dbuf = xfers;
+ else
+ *dbuf = -1;
+
+ switch ((xfers / sizeof(*dbuf)) % 4) {
+ case 0:
+ /* Nasty three-element sg list. */
+ sg_init_table(sg, num_sg = 3);
+ sg_set_buf(&sg[0], (void *)dbuf, 1);
+ sg_set_buf(&sg[1], (void *)dbuf + 1, 2);
+ sg_set_buf(&sg[2], (void *)dbuf + 3, 1);
+ break;
+ case 1:
+ sg_init_table(sg, num_sg = 2);
+ sg_set_buf(&sg[0], (void *)dbuf, 1);
+ sg_set_buf(&sg[1], (void *)dbuf + 1, 3);
+ break;
+ case 2:
+ sg_init_table(sg, num_sg = 1);
+ sg_set_buf(&sg[0], (void *)dbuf, 4);
+ break;
+ case 3:
+ sg_init_table(sg, num_sg = 4);
+ sg_set_buf(&sg[0], (void *)dbuf, 1);
+ sg_set_buf(&sg[1], (void *)dbuf + 1, 1);
+ sg_set_buf(&sg[2], (void *)dbuf + 2, 1);
+ sg_set_buf(&sg[3], (void *)dbuf + 3, 1);
+ break;
+ }
+
+ /* May allocate an indirect, so force it to allocate
+ * user addr */
+ __kmalloc_fake = indirects + (xfers % RINGSIZE) * 4;
+ if (output)
+ err = virtqueue_add_buf(vq, sg, num_sg, 0, dbuf,
+ GFP_KERNEL);
+ else
+ err = virtqueue_add_buf(vq, sg, 0, num_sg, dbuf,
+ GFP_KERNEL);
+
+ if (err == -ENOSPC) {
+ if (!virtqueue_enable_cb_delayed(vq))
+ continue;
+ /* Swallow all notifies at once. */
+ if (read(to_guest[0], buf, sizeof(buf)) < 1)
+ break;
+
+ receives++;
+ virtqueue_disable_cb(vq);
+ continue;
+ }
+
+ if (err)
+ errx(1, "virtqueue_add_buf: %i", err);
+
+ xfers++;
+ virtqueue_kick(vq);
+ }
+
+ /* Any extra? */
+ while (finished != xfers) {
+ int *dbuf;
+ unsigned int len;
+
+ /* Consume bufs. */
+ dbuf = virtqueue_get_buf(vq, &len);
+ if (dbuf) {
+ if (len == 4)
+ assert(*dbuf == finished - 1);
+ else
+ assert(len == 0);
+ finished++;
+ continue;
+ }
+
+ if (!virtqueue_enable_cb_delayed(vq))
+ continue;
+ if (read(to_guest[0], buf, sizeof(buf)) < 1)
+ break;
+
+ receives++;
+ virtqueue_disable_cb(vq);
+ }
+
+ printf("Guest: notified %lu, pinged %lu\n",
+ gvdev.notifies, receives);
+ return 0;
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ struct virtio_device vdev;
+ struct virtqueue *vq;
+ struct vringh vrh;
+ struct scatterlist guest_sg[RINGSIZE];
+ struct iovec host_riov[2], host_wiov[2];
+ struct vringh_iov riov, wiov;
+ struct vring_used_elem used[RINGSIZE];
+ char buf[28];
+ u16 head;
+ int err;
+ unsigned i;
+ void *ret;
+ bool (*getrange)(struct vringh *vrh, u64 addr, struct vringh_range *r);
+ bool fast_vringh = false, parallel = false;
+
+ getrange = getrange_iov;
+ vdev.features[0] = 0;
+
+ while (argv[1]) {
+ if (strcmp(argv[1], "--indirect") == 0)
+ vdev.features[0] |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
+ else if (strcmp(argv[1], "--eventidx") == 0)
+ vdev.features[0] |= (1 << VIRTIO_RING_F_EVENT_IDX);
+ else if (strcmp(argv[1], "--slow-range") == 0)
+ getrange = getrange_slow;
+ else if (strcmp(argv[1], "--fast-vringh") == 0)
+ fast_vringh = true;
+ else if (strcmp(argv[1], "--parallel") == 0)
+ parallel = true;
+ else
+ errx(1, "Unknown arg %s", argv[1]);
+ argv++;
+ }
+
+ if (parallel)
+ return parallel_test(vdev.features[0], getrange, fast_vringh);
+
+ if (posix_memalign(&__user_addr_min, PAGE_SIZE, USER_MEM) != 0)
+ abort();
+ __user_addr_max = __user_addr_min + USER_MEM;
+ memset(__user_addr_min, 0, vring_size(RINGSIZE, ALIGN));
+
+ /* Set up guest side. */
+ vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &vdev, true,
+ __user_addr_min,
+ never_notify_host, never_callback_guest,
+ "guest vq");
+
+ /* Set up host side. */
+ vring_init(&vrh.vring, RINGSIZE, __user_addr_min, ALIGN);
+ vringh_init_user(&vrh, vdev.features[0], RINGSIZE, true,
+ vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
+
+ /* No descriptor to get yet... */
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+ if (err != 0)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ /* Guest puts in a descriptor. */
+ memcpy(__user_addr_max - 1, "a", 1);
+ sg_init_table(guest_sg, 1);
+ sg_set_buf(&guest_sg[0], __user_addr_max - 1, 1);
+ sg_init_table(guest_sg+1, 1);
+ sg_set_buf(&guest_sg[1], __user_addr_max - 3, 2);
+
+ /* May allocate an indirect, so force it to allocate user addr */
+ __kmalloc_fake = __user_addr_min + vring_size(RINGSIZE, ALIGN);
+ err = virtqueue_add_buf(vq, guest_sg, 1, 1, &err, GFP_KERNEL);
+ if (err)
+ errx(1, "virtqueue_add_buf: %i", err);
+ __kmalloc_fake = NULL;
+
+ /* Host retreives it. */
+ vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov));
+ vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov));
+
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ assert(riov.used == 1);
+ assert(riov.iov[0].iov_base == __user_addr_max - 1);
+ assert(riov.iov[0].iov_len == 1);
+ if (getrange != getrange_slow) {
+ assert(wiov.used == 1);
+ assert(wiov.iov[0].iov_base == __user_addr_max - 3);
+ assert(wiov.iov[0].iov_len == 2);
+ } else {
+ assert(wiov.used == 2);
+ assert(wiov.iov[0].iov_base == __user_addr_max - 3);
+ assert(wiov.iov[0].iov_len == 1);
+ assert(wiov.iov[1].iov_base == __user_addr_max - 2);
+ assert(wiov.iov[1].iov_len == 1);
+ }
+
+ err = vringh_iov_pull_user(&riov, buf, 5);
+ if (err != 1)
+ errx(1, "vringh_iov_pull_user: %i", err);
+ assert(buf[0] == 'a');
+ assert(riov.i == 1);
+ assert(vringh_iov_pull_user(&riov, buf, 5) == 0);
+
+ memcpy(buf, "bcdef", 5);
+ err = vringh_iov_push_user(&wiov, buf, 5);
+ if (err != 2)
+ errx(1, "vringh_iov_push_user: %i", err);
+ assert(memcmp(__user_addr_max - 3, "bc", 2) == 0);
+ assert(wiov.i == wiov.used);
+ assert(vringh_iov_push_user(&wiov, buf, 5) == 0);
+
+ /* Host is done. */
+ err = vringh_complete_user(&vrh, head, err);
+ if (err != 0)
+ errx(1, "vringh_complete_user: %i", err);
+
+ /* Guest should see used token now. */
+ __kfree_ignore_start = __user_addr_min + vring_size(RINGSIZE, ALIGN);
+ __kfree_ignore_end = __kfree_ignore_start + 1;
+ ret = virtqueue_get_buf(vq, &i);
+ if (ret != &err)
+ errx(1, "virtqueue_get_buf: %p", ret);
+ assert(i == 2);
+
+ /* Guest puts in a huge descriptor. */
+ sg_init_table(guest_sg, RINGSIZE);
+ for (i = 0; i < RINGSIZE; i++) {
+ sg_set_buf(&guest_sg[i],
+ __user_addr_max - USER_MEM/4, USER_MEM/4);
+ }
+
+ /* Fill contents with recognisable garbage. */
+ for (i = 0; i < USER_MEM/4; i++)
+ ((char *)__user_addr_max - USER_MEM/4)[i] = i;
+
+ /* This will allocate an indirect, so force it to allocate user addr */
+ __kmalloc_fake = __user_addr_min + vring_size(RINGSIZE, ALIGN);
+ err = virtqueue_add_buf(vq, guest_sg, RINGSIZE, 0, &err, GFP_KERNEL);
+ if (err)
+ errx(1, "virtqueue_add_buf (large): %i", err);
+ __kmalloc_fake = NULL;
+
+ /* Host picks it up (allocates new iov). */
+ vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov));
+ vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov));
+
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ assert(riov.max_num & VRINGH_IOV_ALLOCATED);
+ assert(riov.iov != host_riov);
+ if (getrange != getrange_slow)
+ assert(riov.used == RINGSIZE);
+ else
+ assert(riov.used == RINGSIZE * USER_MEM/4);
+
+ assert(!(wiov.max_num & VRINGH_IOV_ALLOCATED));
+ assert(wiov.used == 0);
+
+ /* Pull data back out (in odd chunks), should be as expected. */
+ for (i = 0; i < RINGSIZE * USER_MEM/4; i += 3) {
+ err = vringh_iov_pull_user(&riov, buf, 3);
+ if (err != 3 && i + err != RINGSIZE * USER_MEM/4)
+ errx(1, "vringh_iov_pull_user large: %i", err);
+ assert(buf[0] == (char)i);
+ assert(err < 2 || buf[1] == (char)(i + 1));
+ assert(err < 3 || buf[2] == (char)(i + 2));
+ }
+ assert(riov.i == riov.used);
+ vringh_iov_cleanup(&riov);
+ vringh_iov_cleanup(&wiov);
+
+ /* Complete using multi interface, just because we can. */
+ used[0].id = head;
+ used[0].len = 0;
+ err = vringh_complete_multi_user(&vrh, used, 1);
+ if (err)
+ errx(1, "vringh_complete_multi_user(1): %i", err);
+
+ /* Free up those descriptors. */
+ ret = virtqueue_get_buf(vq, &i);
+ if (ret != &err)
+ errx(1, "virtqueue_get_buf: %p", ret);
+
+ /* Add lots of descriptors. */
+ sg_init_table(guest_sg, 1);
+ sg_set_buf(&guest_sg[0], __user_addr_max - 1, 1);
+ for (i = 0; i < RINGSIZE; i++) {
+ err = virtqueue_add_buf(vq, guest_sg, 1, 0, &err, GFP_KERNEL);
+ if (err)
+ errx(1, "virtqueue_add_buf (multiple): %i", err);
+ }
+
+ /* Now get many, and consume them all at once. */
+ vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov));
+ vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov));
+
+ for (i = 0; i < RINGSIZE; i++) {
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+ used[i].id = head;
+ used[i].len = 0;
+ }
+ /* Make sure it wraps around ring, to test! */
+ assert(vrh.vring.used->idx % RINGSIZE != 0);
+ err = vringh_complete_multi_user(&vrh, used, RINGSIZE);
+ if (err)
+ errx(1, "vringh_complete_multi_user: %i", err);
+
+ /* Free those buffers. */
+ for (i = 0; i < RINGSIZE; i++) {
+ unsigned len;
+ assert(virtqueue_get_buf(vq, &len) != NULL);
+ }
+
+ /* Test weird (but legal!) indirect. */
+ if (vdev.features[0] & (1 << VIRTIO_RING_F_INDIRECT_DESC)) {
+ char *data = __user_addr_max - USER_MEM/4;
+ struct vring_desc *d = __user_addr_max - USER_MEM/2;
+ struct vring vring;
+
+ /* Force creation of direct, which we modify. */
+ vdev.features[0] &= ~(1 << VIRTIO_RING_F_INDIRECT_DESC);
+ vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &vdev, true,
+ __user_addr_min,
+ never_notify_host,
+ never_callback_guest,
+ "guest vq");
+
+ sg_init_table(guest_sg, 4);
+ sg_set_buf(&guest_sg[0], d, sizeof(*d)*2);
+ sg_set_buf(&guest_sg[1], d + 2, sizeof(*d)*1);
+ sg_set_buf(&guest_sg[2], data + 6, 4);
+ sg_set_buf(&guest_sg[3], d + 3, sizeof(*d)*3);
+
+ err = virtqueue_add_buf(vq, guest_sg, 4, 0, &err, GFP_KERNEL);
+ if (err)
+ errx(1, "virtqueue_add_buf (indirect): %i", err);
+
+ vring_init(&vring, RINGSIZE, __user_addr_min, ALIGN);
+
+ /* They're used in order, but double-check... */
+ assert(vring.desc[0].addr == (unsigned long)d);
+ assert(vring.desc[1].addr == (unsigned long)(d+2));
+ assert(vring.desc[2].addr == (unsigned long)data + 6);
+ assert(vring.desc[3].addr == (unsigned long)(d+3));
+ vring.desc[0].flags |= VRING_DESC_F_INDIRECT;
+ vring.desc[1].flags |= VRING_DESC_F_INDIRECT;
+ vring.desc[3].flags |= VRING_DESC_F_INDIRECT;
+
+ /* First indirect */
+ d[0].addr = (unsigned long)data;
+ d[0].len = 1;
+ d[0].flags = VRING_DESC_F_NEXT;
+ d[0].next = 1;
+ d[1].addr = (unsigned long)data + 1;
+ d[1].len = 2;
+ d[1].flags = 0;
+
+ /* Second indirect */
+ d[2].addr = (unsigned long)data + 3;
+ d[2].len = 3;
+ d[2].flags = 0;
+
+ /* Third indirect */
+ d[3].addr = (unsigned long)data + 10;
+ d[3].len = 5;
+ d[3].flags = VRING_DESC_F_NEXT;
+ d[3].next = 1;
+ d[4].addr = (unsigned long)data + 15;
+ d[4].len = 6;
+ d[4].flags = VRING_DESC_F_NEXT;
+ d[4].next = 2;
+ d[5].addr = (unsigned long)data + 21;
+ d[5].len = 7;
+ d[5].flags = 0;
+
+ /* Host picks it up (allocates new iov). */
+ vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov));
+ vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov));
+
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ if (head != 0)
+ errx(1, "vringh_getdesc_user: head %i not 0", head);
+
+ assert(riov.max_num & VRINGH_IOV_ALLOCATED);
+ if (getrange != getrange_slow)
+ assert(riov.used == 7);
+ else
+ assert(riov.used == 28);
+ err = vringh_iov_pull_user(&riov, buf, 29);
+ assert(err == 28);
+
+ /* Data should be linear. */
+ for (i = 0; i < err; i++)
+ assert(buf[i] == i);
+ vringh_iov_cleanup(&riov);
+ }
+
+ /* Don't leak memory... */
+ vring_del_virtqueue(vq);
+ free(__user_addr_min);
+
+ return 0;
+}
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] tools/virtio: make barriers stronger.
2013-03-06 4:54 ` [PATCH 2/2] tools/virtio: make barriers stronger Rusty Russell
@ 2013-03-06 10:20 ` Michael S. Tsirkin
2013-03-07 3:48 ` Rusty Russell
0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2013-03-06 10:20 UTC (permalink / raw)
To: Rusty Russell
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization, sjur.brandeland
On Wed, Mar 06, 2013 at 03:54:42PM +1100, Rusty Russell wrote:
> In the coming vringh_test, we share an mmap with another userspace process
> for testing. This requires real barriers.
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
> index aff61e1..7a63693 100644
> --- a/tools/virtio/asm/barrier.h
> +++ b/tools/virtio/asm/barrier.h
> @@ -3,8 +3,8 @@
> #define mb() __sync_synchronize()
>
> #define smp_mb() mb()
> -# define smp_rmb() barrier()
> -# define smp_wmb() barrier()
> +# define smp_rmb() mb()
> +# define smp_wmb() mb()
> /* Weak barriers should be used. If not - it's a bug */
> # define rmb() abort()
> # define wmb() abort()
Hmm this seems wrong on x86 which has strong order in hardware.
It should not matter whether the other side is a userspace
process or a kernel thread.
--
MST
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config
2013-03-06 4:42 ` Rusty Russell
@ 2013-03-06 10:50 ` Sjur Brændeland
2013-03-06 12:16 ` Ohad Ben-Cohen
0 siblings, 1 reply; 19+ messages in thread
From: Sjur Brændeland @ 2013-03-06 10:50 UTC (permalink / raw)
To: Rusty Russell; +Cc: Linus Walleij, Erwan Yvin, virtualization
On Wed, Mar 6, 2013 at 5:42 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
...
>> @@ -70,6 +80,9 @@ struct virtio_config_ops {
>> void (*finalize_features)(struct virtio_device *vdev);
>> const char *(*bus_name)(struct virtio_device *vdev);
>> int (*set_vq_affinity)(struct virtqueue *vq, int cpu);
>> + int (*find_vrhs)(struct virtio_device *vdev, unsigned nhvrs,
>> + struct vringh *vrhs[], vrh_callback_t *callbacks[]);
>> + void (*del_vrhs)(struct virtio_device *vdev);
>> };
>>
>> /* If driver didn't advertise the feature, it will never appear. */
>
> It's weird that you conflate the host and guest ring sides in rpmsg, but
> that might make sense if they're really bound together.
The caif_virtio driver is using both host and guest ring sides, host side
rings in RX direction and guest side rings in TX direction. The reason
behind this is to enable zero-copy on the producer sides.
(For details see recent discussion with Ohad "Wrappers for vringh"
or the initial blurb when first introducing caif virtio:
https://lkml.org/lkml/2012/10/31/677)
> ... However, in
> general they are not: it's normal to be a guest or host, not both.
>
> This implies that you need a struct vringh_config, to put this in.
OK, should we move the definition of struct vringh_config into vringh.h
then, and add a vringh_config field in struct virtio_device?
>> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
>> index ab41185..8156f51 100644
>> --- a/include/linux/vringh.h
>> +++ b/include/linux/vringh.h
>> @@ -50,6 +50,12 @@ struct vringh {
>>
>> /* The vring (note: it may contain user pointers!) */
>> struct vring vring;
>> +
>> + /* The function to call when buffers are available */
>> + void (*notify)(struct vringh *);
>> +
>> + /* A pointer for the vringh clients to use. */
>> + void *priv;
>> };
>
> Since the caller allocates the vringh, can it not use container_of()
> instead of a priv pointer?
Sure, no problem. I just need to add a new struct in remoteproc containing
both the vringh and a pointer to the internal ring data.
Regards,
Sjur
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config
2013-03-06 10:50 ` Sjur Brændeland
@ 2013-03-06 12:16 ` Ohad Ben-Cohen
2013-03-06 12:37 ` Sjur Brændeland
0 siblings, 1 reply; 19+ messages in thread
From: Ohad Ben-Cohen @ 2013-03-06 12:16 UTC (permalink / raw)
To: Sjur Brændeland; +Cc: Linus Walleij, Erwan Yvin, virtualization
On Wed, Mar 6, 2013 at 12:50 PM, Sjur Brændeland <sjur@brendeland.net> wrote:
> The caif_virtio driver is using both host and guest ring sides, host side
> rings in RX direction and guest side rings in TX direction. The reason
> behind this is to enable zero-copy on the producer sides.
> (For details see recent discussion with Ohad "Wrappers for vringh"
FWIW, I'm not convinced that host side rings are necessary for
zero-copy - you could always just pass pointers to your huge data
inside buffers posted by guest side rings.
That said, I do believe that mixing guest side and host side rings may
simplify the overall solution in some cases, especially in more
complex multi core scenarios (where several cores all communicate with
each other).
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config
2013-03-06 12:16 ` Ohad Ben-Cohen
@ 2013-03-06 12:37 ` Sjur Brændeland
2013-03-06 23:28 ` Sjur Brændeland
0 siblings, 1 reply; 19+ messages in thread
From: Sjur Brændeland @ 2013-03-06 12:37 UTC (permalink / raw)
To: Ohad Ben-Cohen; +Cc: Linus Walleij, Erwan Yvin, virtualization
On Wed, Mar 6, 2013 at 1:16 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Wed, Mar 6, 2013 at 12:50 PM, Sjur Brændeland <sjur@brendeland.net> wrote:
>> The caif_virtio driver is using both host and guest ring sides, host side
>> rings in RX direction and guest side rings in TX direction. The reason
>> behind this is to enable zero-copy on the producer sides.
>> (For details see recent discussion with Ohad "Wrappers for vringh"
>
> FWIW, I'm not convinced that host side rings are necessary for
> zero-copy - you could always just pass pointers to your huge data
> inside buffers posted by guest side rings.
Well, to do that for CAIF you would have to integrate virtio buffer handling
deep into the lower layer of the modems 3GPP stack so that the 3GPP stack
could take buffers from the available ring.
From the outside point of view, perhaps this sounds doable, but in reality you
wouldn't get any Network Signalling engineers implementing the 3GPP stack
to buy into that.
Another issue is that our modem also support other interface technologies
such as USB and HSI. It simply doesn't make sense to enforce virtio rings
internally if virtio isn't otherwise used on the modem.
Without host side rings we simply wouldn't be able to implement zero-copy
in the RX direction for the STE-modem.
Regards,
Sjur
>
> That said, I do believe that mixing guest side and host side rings may
> simplify the overall solution in some cases, especially in more
> complex multi core scenarios (where several cores all communicate with
> each other).
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config
2013-03-06 12:37 ` Sjur Brændeland
@ 2013-03-06 23:28 ` Sjur Brændeland
0 siblings, 0 replies; 19+ messages in thread
From: Sjur Brændeland @ 2013-03-06 23:28 UTC (permalink / raw)
To: Ohad Ben-Cohen; +Cc: Linus Walleij, Erwan Yvin, virtualization
On Wed, Mar 6, 2013 at 1:37 PM, Sjur Brændeland <sjurbr@gmail.com> wrote:
> On Wed, Mar 6, 2013 at 1:16 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> On Wed, Mar 6, 2013 at 12:50 PM, Sjur Brændeland <sjur@brendeland.net> wrote:
>>> The caif_virtio driver is using both host and guest ring sides, host side
>>> rings in RX direction and guest side rings in TX direction. The reason
>>> behind this is to enable zero-copy on the producer sides.
>>> (For details see recent discussion with Ohad "Wrappers for vringh"
>>
>> FWIW, I'm not convinced that host side rings are necessary for
>> zero-copy - you could always just pass pointers to your huge data
>> inside buffers posted by guest side rings.
When I read this again I realized that I missed your point regarding
passing pointers inside buffers completely. My rambling below
argued against using the buffers provided in the available ring as a
replacement of the modem internal allocator. Sorry for the confusion.
I haven't really spent time thinking through the implication of the alternative
of passing buffer pointers inside buffers posted by guest side rings.
But it seems to me that there would be some issues related to returning
consumed buffers back if you use the rings in this way.
Thanks,
Sjur
> Well, to do that for CAIF you would have to integrate virtio buffer handling
> deep into the lower layer of the modems 3GPP stack so that the 3GPP stack
> could take buffers from the available ring.
>
> From the outside point of view, perhaps this sounds doable, but in reality you
> wouldn't get any Network Signalling engineers implementing the 3GPP stack
> to buy into that.
>
> Another issue is that our modem also support other interface technologies
> such as USB and HSI. It simply doesn't make sense to enforce virtio rings
> internally if virtio isn't otherwise used on the modem.
>
> Without host side rings we simply wouldn't be able to implement zero-copy
> in the RX direction for the STE-modem.
>
> Regards,
> Sjur
>
>>
>> That said, I do believe that mixing guest side and host side rings may
>> simplify the overall solution in some cases, especially in more
>> complex multi core scenarios (where several cores all communicate with
>> each other).
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] tools/virtio: make barriers stronger.
2013-03-06 10:20 ` Michael S. Tsirkin
@ 2013-03-07 3:48 ` Rusty Russell
2013-03-07 9:29 ` Michael S. Tsirkin
0 siblings, 1 reply; 19+ messages in thread
From: Rusty Russell @ 2013-03-07 3:48 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization, sjur.brandeland
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Wed, Mar 06, 2013 at 03:54:42PM +1100, Rusty Russell wrote:
>> In the coming vringh_test, we share an mmap with another userspace process
>> for testing. This requires real barriers.
>>
>> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>>
>> diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
>> index aff61e1..7a63693 100644
>> --- a/tools/virtio/asm/barrier.h
>> +++ b/tools/virtio/asm/barrier.h
>> @@ -3,8 +3,8 @@
>> #define mb() __sync_synchronize()
>>
>> #define smp_mb() mb()
>> -# define smp_rmb() barrier()
>> -# define smp_wmb() barrier()
>> +# define smp_rmb() mb()
>> +# define smp_wmb() mb()
>> /* Weak barriers should be used. If not - it's a bug */
>> # define rmb() abort()
>> # define wmb() abort()
>
> Hmm this seems wrong on x86 which has strong order in hardware.
> It should not matter whether the other side is a userspace
> process or a kernel thread.
Actually, this code is completely generic now, though overkill for x86 smp_wmb():
Interestingly, when I try defining them, 32-bit x86 slows down (it seems
that gcc is using "lock orl $0x0,(%esp)" for __sync_synchronize()).:
On my 32-bit laptop: Intel(R) Core(TM) i5 CPU M 560 @ 2.67GHz
Before:
Wall time:1.660000-1.790000(1.682500)
After:
Wall time:1.930000-3.620000(1.960625)
64 bit it's a win:
On 2.6.32-358.el6.x86_64, gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), Intel(R) Xeon(R) CPU E5620 @ 2.40GHz:
Before:
real 0m2.937000-8.339000(3.123979)s
user 0m2.811000-8.233000(2.954813)s
sys 0m0.052000-0.154000(0.089396)s
After:
real 0m2.559000-2.936000(2.726729)s
user 0m2.397000-2.651000(2.506396)s
sys 0m0.055000-0.152000(0.090667)s
Raw performance doesn't really matter, of course, but it's tempting to
use these asm barriers for __x86_64__, and use __sync_synchronize()
everywhere for everyone else.
Thoughts?
Rusty.
diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
index 7a63693..8de720a 100644
--- a/tools/virtio/asm/barrier.h
+++ b/tools/virtio/asm/barrier.h
@@ -1,11 +1,12 @@
#if defined(__i386__) || defined(__x86_64__)
#define barrier() asm volatile("" ::: "memory")
-#define mb() __sync_synchronize()
-#define smp_mb() mb()
-# define smp_rmb() mb()
-# define smp_wmb() mb()
+#define smp_mb() asm volatile("mfence":::"memory")
+#define smp_rmb() asm volatile("lfence":::"memory")
+#define smp_wmb() asm volatile("sfence" ::: "memory")
+
/* Weak barriers should be used. If not - it's a bug */
+# define mb() abort()
# define rmb() abort()
# define wmb() abort()
#else
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] tools/virtio: make barriers stronger.
2013-03-07 3:48 ` Rusty Russell
@ 2013-03-07 9:29 ` Michael S. Tsirkin
2013-03-07 23:56 ` Rusty Russell
0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2013-03-07 9:29 UTC (permalink / raw)
To: Rusty Russell
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization, sjur.brandeland
On Thu, Mar 07, 2013 at 02:48:24PM +1100, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Wed, Mar 06, 2013 at 03:54:42PM +1100, Rusty Russell wrote:
> >> In the coming vringh_test, we share an mmap with another userspace process
> >> for testing. This requires real barriers.
> >>
> >> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> >>
> >> diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
> >> index aff61e1..7a63693 100644
> >> --- a/tools/virtio/asm/barrier.h
> >> +++ b/tools/virtio/asm/barrier.h
> >> @@ -3,8 +3,8 @@
> >> #define mb() __sync_synchronize()
> >>
> >> #define smp_mb() mb()
> >> -# define smp_rmb() barrier()
> >> -# define smp_wmb() barrier()
> >> +# define smp_rmb() mb()
> >> +# define smp_wmb() mb()
> >> /* Weak barriers should be used. If not - it's a bug */
> >> # define rmb() abort()
> >> # define wmb() abort()
> >
> > Hmm this seems wrong on x86 which has strong order in hardware.
> > It should not matter whether the other side is a userspace
> > process or a kernel thread.
>
> Actually, this code is completely generic now, though overkill for x86 smp_wmb():
>
> Interestingly, when I try defining them, 32-bit x86 slows down (it seems
> that gcc is using "lock orl $0x0,(%esp)" for __sync_synchronize()).:
Well this depends on which arch you are building for.
We saw this in qemu too, see e.g. include/qemu/atomic.h in qemu.
> On my 32-bit laptop: Intel(R) Core(TM) i5 CPU M 560 @ 2.67GHz
>
> Before:
> Wall time:1.660000-1.790000(1.682500)
> After:
> Wall time:1.930000-3.620000(1.960625)
>
> 64 bit it's a win:
> On 2.6.32-358.el6.x86_64, gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), Intel(R) Xeon(R) CPU E5620 @ 2.40GHz:
>
> Before:
> real 0m2.937000-8.339000(3.123979)s
> user 0m2.811000-8.233000(2.954813)s
> sys 0m0.052000-0.154000(0.089396)s
> After:
> real 0m2.559000-2.936000(2.726729)s
> user 0m2.397000-2.651000(2.506396)s
> sys 0m0.055000-0.152000(0.090667)s
>
> Raw performance doesn't really matter, of course, but it's tempting to
> use these asm barriers for __x86_64__, and use __sync_synchronize()
> everywhere for everyone else.
>
> Thoughts?
> Rusty.
For smp_mb, I agree.
> diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
> index 7a63693..8de720a 100644
> --- a/tools/virtio/asm/barrier.h
> +++ b/tools/virtio/asm/barrier.h
> @@ -1,11 +1,12 @@
> #if defined(__i386__) || defined(__x86_64__)
> #define barrier() asm volatile("" ::: "memory")
> -#define mb() __sync_synchronize()
>
> -#define smp_mb() mb()
> -# define smp_rmb() mb()
> -# define smp_wmb() mb()
> +#define smp_mb() asm volatile("mfence":::"memory")
> +#define smp_rmb() asm volatile("lfence":::"memory")
> +#define smp_wmb() asm volatile("sfence" ::: "memory")
> +
Confused. On x86_64, as long as you are not synchronizing with a device
these are not necessary, a compiler barrier will do, unless
there are non-temporal loads/stores, which we don't use.
> /* Weak barriers should be used. If not - it's a bug */
> +# define mb() abort()
> # define rmb() abort()
> # define wmb() abort()
> #else
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] tools/virtio: make barriers stronger.
2013-03-07 9:29 ` Michael S. Tsirkin
@ 2013-03-07 23:56 ` Rusty Russell
0 siblings, 0 replies; 19+ messages in thread
From: Rusty Russell @ 2013-03-07 23:56 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization, sjur.brandeland
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Thu, Mar 07, 2013 at 02:48:24PM +1100, Rusty Russell wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> > On Wed, Mar 06, 2013 at 03:54:42PM +1100, Rusty Russell wrote:
>> >> In the coming vringh_test, we share an mmap with another userspace process
>> >> for testing. This requires real barriers.
>> >>
>> >> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>> >>
>> >> diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
>> >> index aff61e1..7a63693 100644
>> >> --- a/tools/virtio/asm/barrier.h
>> >> +++ b/tools/virtio/asm/barrier.h
>> >> @@ -3,8 +3,8 @@
>> >> #define mb() __sync_synchronize()
>> >>
>> >> #define smp_mb() mb()
>> >> -# define smp_rmb() barrier()
>> >> -# define smp_wmb() barrier()
>> >> +# define smp_rmb() mb()
>> >> +# define smp_wmb() mb()
>> >> /* Weak barriers should be used. If not - it's a bug */
>> >> # define rmb() abort()
>> >> # define wmb() abort()
>> >
>> > Hmm this seems wrong on x86 which has strong order in hardware.
>> > It should not matter whether the other side is a userspace
>> > process or a kernel thread.
>>
>> Actually, this code is completely generic now, though overkill for x86 smp_wmb():
>>
>> Interestingly, when I try defining them, 32-bit x86 slows down (it seems
>> that gcc is using "lock orl $0x0,(%esp)" for __sync_synchronize()).:
>
> Well this depends on which arch you are building for.
> We saw this in qemu too, see e.g. include/qemu/atomic.h in qemu.
Hmm, I thought x86 had load reordering, but it seems that was only some
older chips, which we can consider obsolete.
I learned something... I've dropped the patch.
Thanks,
Rusty.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2013-03-07 23:56 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-05 13:51 [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config sjur.brandeland
2013-03-06 4:42 ` Rusty Russell
2013-03-06 10:50 ` Sjur Brændeland
2013-03-06 12:16 ` Ohad Ben-Cohen
2013-03-06 12:37 ` Sjur Brændeland
2013-03-06 23:28 ` Sjur Brændeland
2013-03-06 4:46 ` [FYI] vringh fixes Rusty Russell
2013-03-06 4:53 ` [PATCH 1/2] Rusty Russell
2013-03-06 4:54 ` [PATCH 2/2] tools/virtio: make barriers stronger Rusty Russell
2013-03-06 10:20 ` Michael S. Tsirkin
2013-03-07 3:48 ` Rusty Russell
2013-03-07 9:29 ` Michael S. Tsirkin
2013-03-07 23:56 ` Rusty Russell
2013-03-06 4:57 ` [PATCH 1/3] vringh: host-side implementation of virtio rings (v2) Rusty Russell
2013-03-06 4:59 ` [PATCH 2/3] vringh: don't flag already listening Rusty Russell
2013-03-06 5:02 ` [PATCH 3/3] tools/virtio: add vring_test (v2) Rusty Russell
-- strict thread matches above, loose matches on Subject: below --
2009-11-29 16:54 [alps] Timing patch, revised again :-) Sebastian Kapfer
2009-12-04 8:49 ` Dmitry Torokhov
2009-12-04 21:47 ` Sebastian Kapfer
2009-12-13 22:01 ` [PATCH 1/2] Sebastian Kapfer
2008-12-03 12:29 [PATCH 0/2] fusion reset Bernd Schubert
2008-12-03 12:31 ` [PATCH 1/2] Bernd Schubert
2007-03-06 9:36 Gerrit Renker
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.