Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.
From: Mike Christie @ 2007-09-05 18:34 UTC (permalink / raw)
  To: Michael Chan
  Cc: davem, netdev, open-iscsi, anilgv, talm, lusinsky, uri,
	SCSI Mailing List
In-Reply-To: <1188599815.5176.12.camel@dell>

Michael Chan wrote:
> diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
> index 6f2c71e..adcfbbc 100644
> --- a/drivers/scsi/Kconfig
> +++ b/drivers/scsi/Kconfig
> @@ -1800,6 +1800,8 @@ config ZFCP
>            called zfcp. If you want to compile it as a module, say M here
>            and read <file:Documentation/kbuild/modules.txt>.
>  
> +source "drivers/scsi/bnx2i/Kconfig"
> +
>  config SCSI_SRP
>  	tristate "SCSI RDMA Protocol helper library"
>  	depends on SCSI && PCI
> diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
> index 86a7ba7..65fe633 100644
> --- a/drivers/scsi/Makefile
> +++ b/drivers/scsi/Makefile
> @@ -133,6 +133,7 @@ obj-$(CONFIG_SCSI_IBMVSCSIS)	+= ibmvscsi/
>  obj-$(CONFIG_SCSI_HPTIOP)	+= hptiop.o
>  obj-$(CONFIG_SCSI_STEX)		+= stex.o
>  obj-$(CONFIG_PS3_ROM)		+= ps3rom.o
> +obj-$(CONFIG_SCSI_BNX2_ISCSI)	+= bnx2i/
>  
>  obj-$(CONFIG_ARM)		+= arm/
>  
> diff --git a/drivers/scsi/bnx2i/57xx_iscsi_constants.h b/drivers/scsi/bnx2i/57xx_iscsi_constants.h
> new file mode 100644
> index 0000000..61f07f9
> --- /dev/null
> +++ b/drivers/scsi/bnx2i/57xx_iscsi_constants.h
> @@ -0,0 +1,212 @@
> +#ifndef __57XX_ISCSI_CONSTANTS_H_
> +#define __57XX_ISCSI_CONSTANTS_H_
> +
> +/**
> +* This file defines HSI constants for the iSCSI flows
> +*/
> +
> +/* iSCSI request op codes */
> +#define ISCSI_OPCODE_NOP_OUT 			(0 | 0x40)
> +#define ISCSI_OPCODE_SCSI_CMD 			(1)
> +#define ISCSI_OPCODE_TMF_REQUEST 		(2 | 0x40)
> +#define ISCSI_OPCODE_LOGIN_REQUEST 		(3 | 0x40)
> +#define ISCSI_OPCODE_TEXT_REQUEST 		(4 | 0x40)
> +#define ISCSI_OPCODE_DATA_OUT 			(5)
> +#define ISCSI_OPCODE_LOGOUT_REQUEST 		(6 | 0x00)
> +#define ISCSI_OPCODE_CLEANUP_REQUEST		(7)


What is ISCSI_OPCODE_CLEANUP_REQUEST?


> +
> +/* iSCSI response/messages op codes */
> +#define ISCSI_OPCODE_NOP_IN 			(0x20)
> +#define ISCSI_OPCODE_SCSI_RESPONSE 		(0x21)
> +#define ISCSI_OPCODE_TMF_RESPONSE 		(0x22)
> +#define ISCSI_OPCODE_LOGIN_RESPONSE 		(0x23)
> +#define ISCSI_OPCODE_TEXT_RESPONSE 		(0x24)
> +#define ISCSI_OPCODE_DATA_IN			(0x25)
> +#define ISCSI_OPCODE_LOGOUT_RESPONSE 		(0x26)
> +#define ISCSI_OPCODE_CLEANUP_RESPONSE 		(0x27)
> +#define ISCSI_OPCODE_R2T			(0x31)
> +#define ISCSI_OPCODE_ASYNC_MSG 			(0x32)
> +#define ISCSI_OPCODE_REJECT 			(0x3f)
> +#define ISCSI_OPCODE_NOPOUT_LOCAL_COMPLETION	(0)


What does the LOCAL_COMPLETION on the nopout mean?


> +
> +/* iSCSI stages */
> +#define ISCSI_STAGE_SECURITY_NEGOTIATION (0)
> +#define ISCSI_STAGE_LOGIN_OPERATIONAL_NEGOTIATION (1)
> +#define ISCSI_STAGE_FULL_FEATURE_PHASE (3)
> +
> +/* iSCSI parameter defaults */
> +#define ISCSI_DEFAULT_HEADER_DIGEST 		(0)
> +#define ISCSI_DEFAULT_DATA_DIGEST 		(0)
> +#define ISCSI_DEFAULT_INITIAL_R2T 		(1)
> +#define ISCSI_DEFAULT_IMMEDIATE_DATA 		(1)
> +#define ISCSI_DEFAULT_MAX_PDU_LENGTH 		(0x2000)
> +#define ISCSI_DEFAULT_FIRST_BURST_LENGTH 	(0x10000)
> +#define ISCSI_DEFAULT_MAX_BURST_LENGTH 		(0x40000)
> +#define ISCSI_DEFAULT_MAX_OUTSTANDING_R2T 	(1)
> +
> +/* iSCSI parameter limits */
> +#define ISCSI_MIN_VAL_MAX_PDU_LENGTH (0x200)
> +#define ISCSI_MAX_VAL_MAX_PDU_LENGTH (0xffffff)
> +#define ISCSI_MIN_VAL_BURST_LENGTH (0x200)
> +#define ISCSI_MAX_VAL_BURST_LENGTH (0xffffff)
> +#define ISCSI_MIN_VAL_MAX_OUTSTANDING_R2T (1)
> +#define ISCSI_MAX_VAL_MAX_OUTSTANDING_R2T (0xff) /* 0x10000 according to RFC */
> +
> +/* SCSI command response codes */
> +#define ISCSI_SCSI_CMD_RESPONSE_CMD_COMPLETED	(0x00)
> +#define ISCSI_SCSI_CMD_RESPONSE_TARGET_FAILURE	(0x01)
> +
> +/* SCSI command status codes */
> +#define ISCSI_SCSI_CMD_STATUS_GOOD 		(0x00)
> +#define ISCSI_SCSI_CMD_STATUS_CHECK_CONDITION 	(0x02)
> +#define ISCSI_SCSI_CMD_STATUS_INTERMIDIATE 	(0x10)
> +
> +/* TMF codes */
> +#define ISCSI_TMF_ABORT_TASK 			(1)
> +#define ISCSI_TMF_LOGICAL_UNIT_RESET 		(5)
> +
> +/* TMF response codes */
> +#define ISCSI_TMF_RESPONSE_FUNCTION_COMPLETE			(0x00)
> +#define ISCSI_TMF_RESPONSE_TASK_DOESNT_EXIST			(0x01)
> +#define ISCSI_TMF_RESPONSE_LUN_DOESNT_EXIST			(0x02)
> +#define ISCSI_TMF_RESPONSE_TASK_STILL_ALLEGIANT			(0x03)
> +#define ISCSI_TMF_RESPONSE_FUNCTION_NOT_SUPPORTED		(0x05)
> +#define ISCSI_TMF_RESPONSE_FUNCTION_AUTHORIZATION_FAILED	(0x06)
> +#define ISCSI_TMF_RESPONSE_FUNCTION_REJECTED			(0xff)
> +
> +/* Logout reason codes */
> +/*#define ISCSI_LOGOUT_REASON_CLOSE_CONNECTION (1) */
> +
> +/* Logout response codes */
> +#define ISCSI_LOGOUT_RESPONSE_CONNECTION_CLOSED (0)
> +#define ISCSI_LOGOUT_RESPONSE_CID_NOT_FOUND (1)
> +#define ISCSI_LOGOUT_RESPONSE_CLEANUP_FAILED (3)
> +
> +/* iSCSI task types */
> +#define ISCSI_TASK_TYPE_READ    (0)
> +#define ISCSI_TASK_TYPE_WRITE   (1)
> +#define ISCSI_TASK_TYPE_MPATH   (2)




All of these iscsi code shoulds be in iscsi_proto.h or should be added 
there.


> +
> +#endif
> diff --git a/drivers/scsi/bnx2i/bnx2i_hwi.c b/drivers/scsi/bnx2i/bnx2i_hwi.c
> new file mode 100644
> index 0000000..b74a93c
> --- /dev/null
> +++ b/drivers/scsi/bnx2i/bnx2i_hwi.c
> @@ -0,0 +1,1977 @@



> + * process iSCSI TMF Response CQE & wake up driver EH thread.
> + */
> +static int bnx2i_process_tmf_resp(struct bnx2i_conn *conn, struct cqe *cqe)
> +{
> +	u32 itt;
> +	struct bnx2i_cmd *cmd;
> +	struct bnx2i_cmd *aborted_cmd;
> +	struct iscsi_tmf_response *tmf_cqe;
> +
> +	tmf_cqe = (struct iscsi_tmf_response *) cqe;
> +	itt = (tmf_cqe->itt & ISCSI_TMF_RESPONSE_INDEX);
> +	cmd = get_cmnd(conn->sess, itt);
> +	if (!cmd) {
> +		printk(KERN_ALERT "tmf_resp: ITT 0x%x is not active\n",
> +				  tmf_cqe->itt);
> +		return 0;
> +	}
> +
> +	bnx2i_update_cmd_sequence(conn->sess, tmf_cqe->exp_cmd_sn,
> +				  tmf_cqe->max_cmd_sn);
> +
> +	aborted_cmd = cmd->tmf_ref_cmd;
> +
> +	if (tmf_cqe->response == ISCSI_TMF_RSP_COMPLETE) {
> +		if (aborted_cmd->scsi_cmd) {
> +			aborted_cmd->scsi_cmd->result = DID_ABORT << 16;
> +			aborted_cmd->scsi_cmd->resid =

should use resid wrappers throughout driver.

> +				aborted_cmd->scsi_cmd->request_bufflen;

Should use bufflen wrappers throught driver.


> +}
> diff --git a/drivers/scsi/bnx2i/bnx2i_init.c b/drivers/scsi/bnx2i/bnx2i_init.c
> new file mode 100644
> index 0000000..837bd95
> --- /dev/null
> +++ b/drivers/scsi/bnx2i/bnx2i_init.c
>
> + * handles SCSI command queued by SCSI-ML, allocates a command structure,
> + *	assigning CMDSN, mapping SG buffers and handing over request to CNIC.
> + */
> +int bnx2i_queuecommand(struct scsi_cmnd *sc,
> +		       void (*done) (struct scsi_cmnd *))
> +{
> +	struct bnx2i_sess *sess;
> +	struct bnx2i_conn *conn;
> +	struct bnx2i_cmd *cmd;
> +	static int old_recovery_state = 0;
> +
> +	sc->scsi_done = done;
> +	sc->result = 0;
> +	sess = iscsi_hostdata(sc->device->host->hostdata);
> +
> +#define iscsi_cmd_win_closed(_sess)	\
> +	((int) (_sess->max_cmdsn - _sess->cmdsn) < 0)
> +
> +	if (iscsi_cmd_win_closed(sess) ||
> +	    test_bit(ADAPTER_STATE_LINK_DOWN, &sess->hba->adapter_state))
> +		goto iscsi_win_closed;
> +
> +	if ((sess->state & BNX2I_SESS_IN_SHUTDOWN) ||
> +		(sess->state & BNX2I_SESS_IN_LOGOUT))
> +		goto dev_not_found;
> +
> +	if (sess->recovery_state) {
> +		if (old_recovery_state != sess->recovery_state)
> +			old_recovery_state = sess->recovery_state;
> +
> +		if (sess->recovery_state & ISCSI_SESS_RECOVERY_FAILED)
> +			goto dev_not_found;
> +		else if (!(sess->recovery_state & ISCSI_SESS_RECOVERY_COMPLETE))
> +			goto iscsi_win_closed;
> +		else
> +			sess->recovery_state = 0;
> +	}
> +
> +	cmd = bnx2i_alloc_cmd(sess);
> +	if (cmd == NULL)
> +		/* should never happen as cmd list size == SHT->can_queue */
> +		goto cmd_not_accepted;
> +
> +	cmd->conn = conn = sess->lead_conn;
> +	cmd->scsi_cmd = sc;
> +	cmd->req.total_data_transfer_length = sc->request_bufflen;
> +	cmd->iscsi_opcode = ISCSI_OPCODE_SCSI_CMD;
> +	cmd->req.cmd_sn = sess->cmdsn++;
> +
> +	bnx2i_iscsi_map_sg_list(cmd);
> +	bnx2i_cpy_scsi_cdb(sc, cmd);
> +
> +	if (sc->sc_data_direction == DMA_TO_DEVICE) {
> +		cmd->req.op_attr = ISCSI_CMD_REQUEST_WRITE;
> +		cmd->req.itt |= (ISCSI_TASK_TYPE_WRITE <<
> +				 ISCSI_CMD_REQUEST_TYPE_SHIFT);
> +		bnx2i_setup_write_cmd_bd_info(cmd);
> +	} else {
> +		cmd->req.op_attr = ISCSI_CMD_REQUEST_READ;
> +		cmd->req.itt |= (ISCSI_TASK_TYPE_READ <<
> +				 ISCSI_CMD_REQUEST_TYPE_SHIFT);
> +	}
> +	cmd->req.num_bds = cmd->bd_tbl->bd_valid;
> +	if (!cmd->bd_tbl->bd_valid) {
> +		cmd->req.bd_list_addr_lo = (u32) sess->hba->mp_bd_dma;
> +		cmd->req.bd_list_addr_hi =
> +			(u32) ((u64) sess->hba->mp_bd_dma >> 32);
> +		cmd->req.num_bds = 1;
> +	}
> +
> +	cmd->cmd_state = ISCSI_CMD_STATE_INITIATED;
> +	sc->SCp.ptr = (char *) cmd;
> +
> +	if (cmd->req.itt != ITT_INVALID_SIGNATURE) {
> +		bnx2i_send_iscsi_scsicmd(conn, cmd);
> +		list_add_tail(&cmd->link, &sess->active_cmds);
> +		sess->num_active_cmds++;
> +	}
> +	return 0;
> +
> +iscsi_win_closed:
> +cmd_not_accepted:
> +	return SCSI_MLQUEUE_HOST_BUSY;
> +
> +dev_not_found:
> +	sc->sense_buffer[0] = 0x70;
> +	sc->sense_buffer[2] = NOT_READY;
> +	sc->sense_buffer[7] = 0x6;
> +	sc->sense_buffer[12] = 0x08;
> +	sc->sense_buffer[13] = 0x00;


do not fake sense and do not face sense and set a host byte of 
DID_NO_CONNECT. DID_NO_CONNECT is fine.


> +	sc->result = (DID_NO_CONNECT << 16);
> +	sc->resid = sc->request_bufflen;
> +	sc->scsi_done(sc);
> +	return 0;
> +}
> +
> +
> +
> +/*
> + * TMF request timeout handler
> + */
> +static void bnx2i_iscsi_tmf_timer(unsigned long data)
> +{
> +	struct bnx2i_cmd *cmd = (struct bnx2i_cmd *) data;
> +
> +	printk(KERN_ALERT "TMF timer: abort failed, cmd 0x%p\n", cmd);
> +	cmd->cmd_state = ISCSI_CMD_STATE_TMF_TIMEOUT;
> +	cmd->conn->sess->recovery_state = ISCSI_SESS_RECOVERY_OPEN_ISCSI;
> +	iscsi_conn_error(cmd->conn->cls_conn, ISCSI_ERR_CONN_FAILED);
> +}
> +
> +
> +/*
> + * initiate command abort process by requesting CNIC to send
> + *	an iSCSI TMF request to target
> + */
> +static int bnx2i_initiate_abort_cmd(struct scsi_cmnd *sc)
> +{
> +	struct bnx2i_cmd *cmd = (struct bnx2i_cmd *) sc->SCp.ptr;
> +	struct bnx2i_cmd *tmf_cmd;
> +	struct Scsi_Host *shost = cmd->scsi_cmd->device->host;
> +	struct bnx2i_conn *conn = cmd->conn;
> +	struct bnx2i_sess *sess;
> +	struct bnx2i_hba *hba;
> +
> +	shost = cmd->scsi_cmd->device->host;
> +	sess = iscsi_hostdata(shost->hostdata);
> +	BUG_ON(shost != sess->host);
> +
> +	if (sess && (is_sess_active(sess)))
> +		hba = sess->hba;
> +	else
> +		return FAILED;
> +
> +	if (test_bit(ADAPTER_STATE_LINK_DOWN, &hba->adapter_state))
> +		return FAILED;
> +
> +	bnx2i_setup_ictx_dump(hba, conn);
> +
> +	if (cmd->scsi_cmd != sc)
> +		/* command already completed to scsi mid-layer */
> +		goto cmd_not_active;
> +
> +	tmf_cmd = bnx2i_alloc_cmd(sess);
> +	if (tmf_cmd == NULL)
> +		goto lack_of_resc;
> +
> +	tmf_cmd->conn = conn = sess->lead_conn;
> +	tmf_cmd->scsi_cmd = NULL;
> +	tmf_cmd->iscsi_opcode = ISCSI_OPCODE_TMF_REQUEST;
> +	tmf_cmd->req.cmd_sn = sess->cmdsn;
> +	tmf_cmd->tmf_ref_itt = cmd->req.itt;
> +	tmf_cmd->tmf_ref_cmd = cmd;
> +	tmf_cmd->tmf_ref_sc = cmd->scsi_cmd;
> +	cmd->cmd_state = ISCSI_CMD_STATE_ABORT_PEND;
> +	tmf_cmd->cmd_state = ISCSI_CMD_STATE_INITIATED;
> +
> +#define BNX2I_TMF_TIMEOUT	10 * HZ
> +	sess->abort_timer.expires = BNX2I_TMF_TIMEOUT + jiffies;
> +	sess->abort_timer.function = bnx2i_iscsi_tmf_timer;
> +	sess->abort_timer.data = (unsigned long)tmf_cmd;
> +	add_timer(&sess->abort_timer);
> +
> +	bnx2i_send_iscsi_tmf(conn, tmf_cmd);
> +
> +	/* update iSCSI context for this conn, wait for CNIC to complete */
> +	wait_event_interruptible(sess->er_wait, (!conn->ep ||
> +		 			 (tmf_cmd->cmd_state ==
> +					  ISCSI_CMD_STATE_FAILED) ||
> +					 (tmf_cmd->cmd_state ==
> +					  ISCSI_CMD_STATE_COMPLETED)));
> +
> +	if (signal_pending(current))
> +		flush_signals(current);
> +
> +	del_timer_sync(&sess->abort_timer);
> +
> +	if (tmf_cmd->cmd_state == ISCSI_CMD_STATE_TMF_TIMEOUT) {
> +		printk(KERN_ALERT "abort: abort failed, cmd 0x%p\n", tmf_cmd);
> +		/* TMF timed out, return error status and let SCSI-ML do
> +		 * session recovery.
> +		 */
> +		list_del_init(&tmf_cmd->link);
> +		bnx2i_free_cmd(sess, tmf_cmd);
> +		return SUCCESS;
> +	}
> +
> +	list_del_init(&tmf_cmd->link);
> +	bnx2i_free_cmd(sess, tmf_cmd);
> +
> +	if ((cmd->scsi_cmd->result & 0xFF0000) == (DID_ABORT << 16)) {
> +		cmd->cmd_state = ISCSI_CMD_STATE_CLEANUP_PEND;
> +		bnx2i_send_cmd_cleanup_req(hba, cmd);
> +		wait_event_interruptible_timeout(sess->er_wait,
> +						 (cmd->cmd_state ==
> +						  ISCSI_CMD_STATE_CLEANUP_CMPL),
> +						 msecs_to_jiffies(
> +						  ISCSI_CMD_CLEANUP_TIMEOUT));
> +
> +		if (signal_pending(current))
> +			flush_signals(current);
> +	} else
> +		cmd->scsi_cmd->result = (DID_ABORT << 16);
> +
> +	cmd->conn->sess->num_active_cmds--;
> +	list_del_init(&cmd->link);
> +	bnx2i_return_failed_command(sess, cmd, DID_ABORT);
> +	bnx2i_free_cmd(sess, cmd);
> +
> +cmd_not_active:
> +	return SUCCESS;
> +
> +lack_of_resc:
> +	return FAILED;
> +}
> +
> +
> +/*
> + * SCSI abort request handler.
> + */
> +int bnx2i_abort(struct scsi_cmnd *sc)
> +{
> +	int reason;
> +	struct bnx2i_cmd *cmd = (struct bnx2i_cmd *) sc->SCp.ptr;
> +
> +	if (unlikely(!cmd)) {
> +		/* command already completed to scsi mid-layer */
> +		printk(KERN_INFO "bnx2i_abort: sc 0x%p, not active\n", sc);
> +		return SUCCESS;
> +	}
> +
> +	reason = bnx2i_initiate_abort_cmd(sc);
> +	return reason;
> +}
> +
> +
> +
> +/*
> + * hardware reset
> + */
> +int bnx2i_reset(struct scsi_cmnd *sc)
> +{
> +	return 0;
> +}


So what is up with this one? It seems like if there is a way to reset 
hardware then you would want it as the scsi eh host reset callout 
instead of dropping the session. We could add some transport level 
recovery callouts for the iscsi specifics.


> +
> +
> +void bnx2i_return_failed_command(struct bnx2i_sess *sess,
> +				 struct bnx2i_cmd *cmd, int err_code)
> +{
> +	struct scsi_cmnd *sc = cmd->scsi_cmd;
> +	sc->result = err_code << 16;
> +	sc->resid = cmd->scsi_cmd->request_bufflen;
> +	cmd->scsi_cmd = NULL;
> +	sess->num_active_cmds--;
> +	sc->SCp.ptr = NULL;
> +	sc->scsi_done(sc);
> +}
> +
> +
> +
> +/*
> + * SCSI host reset handler - iSCSI session recovery
> + */
> +int bnx2i_host_reset(struct scsi_cmnd *sc)
> +{
> +	struct Scsi_Host *shost;
> +	struct bnx2i_sess *sess;
> +	int rc = 0;
> +
> +	shost = sc->device->host;
> +	sess = iscsi_hostdata(shost->hostdata);
> +	printk(KERN_INFO "bnx2i: attempting to reset host, #%d\n",
> +			  sess->host->host_no);
> +
> +	BUG_ON(shost != sess->host);
> +	rc = bnx2i_do_iscsi_sess_recovery(sess, DID_RESET);
> +
> +	return rc;
> +}
> +
> +
> +
> +/**********************************************************************
> + *		open-iscsi interface
> + **********************************************************************/
> +
> +
> +#define get_bnx2_device(_hba, _devc) 	do {				\
> +		if ((_hba->pci_did == PCI_DEVICE_ID_NX2_5706) ||	\
> +			(_hba->pci_did == PCI_DEVICE_ID_NX2_5706S)) {	\
> +			_devc = '6';					\
> +		} else if ((_hba->pci_did == PCI_DEVICE_ID_NX2_5708) ||	\
> +			(_hba->pci_did == PCI_DEVICE_ID_NX2_5708S)) {	\
> +			_devc = '8';					\
> +		} else if ((_hba->pci_did == PCI_DEVICE_ID_NX2_5709) ||	\
> +			(_hba->pci_did == PCI_DEVICE_ID_NX2_5709S)) {	\
> +			_devc = '9';					\
> +		}							\
> +	} while (0)
> +
> +/* from open-iscsi project */
> +/*
> + * iSCSI Session's hostdata organization:
> + *
> + *    *------------------* <== hostdata_session(host->hostdata)
> + *    | ptr to class sess|
> + *    |------------------| <== iscsi_hostdata(host->hostdata)
> + *    | iscsi_session    |
> + *    *------------------*
> + */



I think it is going to be nicer to not allocate the host with the 
session for bnx2i like was done for qla4xxx. If you are going to 
allocate the host with the session then it seems like you should be able 
to intergrate into libiscsi like was done with iscsi_tcp and ib_iser. 
And even if you do not allocate a host per session it seems like you 
should be able to better integreate with libiscsi like was done with 
iscsi_tcp and ib_iser. Right now, it is sort of a mix of qla4xxx's model 
and iscsi_tcp/ib_iser's model which is not going to work well in the 
long run.

You can change libiscsi so that you do not run in the host's workq and 
so data structs are better optimized for you and whatever else is 
needed. I mean feel free to change that code so it better suits your 
model. Your code is probably closer to ib_iser since that module does 
not have to handle operations like responding to r2ts and has to make 
its own connections.

There seem to be some bugs from when you might have modeled bnx2i's 
callouts after a older iscsi_tcp or ib_iser. Those bugs got fixed for 
both modules for free when libiscsi got fixed.


> +
> +#define hostdata_privsize(_sz)	(sizeof(unsigned long) + _sz + \
> +				 _sz % sizeof(unsigned long))
> +
> +#define hostdata_session(_hostdata) (iscsi_ptr(*(unsigned long *)_hostdata))
> +
> +#define session_to_cls(_sess) 	hostdata_session(_sess->host->hostdata)
> +
> +
> +
> +
> +/*
> + * Function: bnx2i_register_xport
> + * Description: this routine will allocate memory for SCSI host template,
> + * 		iSCSI template and registers one instance of NX2 device with
> + *		iSCSI Transport Kernel module.
> + */
> +int bnx2i_register_xport(struct bnx2i_hba *hba)
> +{
> +	void *mem_ptr;
> +	char dev_id = '8';
> +
> +	get_bnx2_device(hba, dev_id);
> +
> +	mem_ptr = kmalloc(sizeof(struct scsi_host_template), GFP_KERNEL);
> +	hba->scsi_template = mem_ptr;
> +	if (hba->scsi_template == NULL) {
> +		printk(KERN_ALERT "bnx2i: failed to alloc memory for sht\n");
> +		return -ENOMEM;
> +	}
> +
> +	mem_ptr = kmalloc(sizeof(struct iscsi_transport), GFP_KERNEL);
> +	hba->iscsi_transport = mem_ptr;
> +	if (hba->iscsi_transport == NULL) {
> +		printk(KERN_ALERT "mem error for iscsi_transport template\n");
> +		goto iscsi_xport_err;
> +	}
> +
> +	mem_ptr = kmalloc(BRCM_ISCSI_XPORT_NAME_SIZE_MAX, GFP_KERNEL);
> +	if (mem_ptr == NULL) {
> +		printk(KERN_ALERT "failed to alloc memory for xport name\n");
> +		goto scsi_name_mem_err;
> +	}
> +
> +	memcpy(hba->scsi_template, (const void *) &bnx2i_host_template,
> +	       sizeof(struct scsi_host_template));
> +	hba->scsi_template->name = mem_ptr;
> +	memcpy((void *) hba->scsi_template->name,
> +	       (const void *) bnx2i_host_template.name,
> +	       strlen(bnx2i_host_template.name) + 1);
> +
> +	mem_ptr = kmalloc(BRCM_ISCSI_XPORT_NAME_SIZE_MAX, GFP_KERNEL);
> +	if (mem_ptr == NULL) {
> +		printk(KERN_ALERT "failed to alloc proc name mem\n");
> +		goto scsi_proc_name_mem_err;
> +	}
> +	hba->scsi_template->proc_name = mem_ptr;
> +
> +	memcpy((void *) hba->iscsi_transport,
> +	       (const void *) &bnx2i_iscsi_transport,
> +	       sizeof(struct iscsi_transport));
> +
> +	hba->iscsi_transport->host_template = hba->scsi_template;
> +
> +	mem_ptr = kmalloc(BRCM_ISCSI_XPORT_NAME_SIZE_MAX, GFP_KERNEL);
> +	if (mem_ptr == NULL) {
> +		printk(KERN_ALERT "mem alloc error, iscsi xport name\n");
> +		goto xport_name_mem_err;
> +	}
> +	hba->iscsi_transport->name = mem_ptr;
> +	sprintf(mem_ptr, "%s%c-%.2x%.2x%.2x", BRCM_ISCSI_XPORT_NAME_PREFIX,
> +			 dev_id, (u8)hba->pci_dev->bus->number,
> +			 hba->pci_devno, (u8)hba->pci_func);
> +
> +	memcpy((void *) hba->scsi_template->proc_name, (const void *) mem_ptr,
> +	       strlen(mem_ptr) + 1);
> +
> +	hba->shost_template = iscsi_register_transport(hba->iscsi_transport);
> +	if (!hba->shost_template) {
> +		printk(KERN_ALERT "bnx2i: xport reg failed, hba 0x%p\n", hba);
> +		goto failed_registration;
> +	}


The transport should be allocated like qla4xxx or iscsi_tcp/ib_iser. 
Userspace then sets everything up based on usersettings like with 
qla4xxx and iscsi_tcp and ib_iser.

I started that patch that I sent to you guys to modify 
scsi_transport_iscsi abd libiscsi for you guys, but have not finished it 
yet. I am trying to finish merging Olafs iscsi_tcp patches. Feel free to 
change iscsid, scsi_transport_iscsi and libiscsi for me.


> +}
> +
> +int bnx2i_sysfs_setup(void)
> +{
> +	int ret;
> +	ret = class_register(&bnx2i_class);
> +
> +	bnx2i_register_port_class_dev(&port_class_dev);
> +	return ret;
> +}
> +
> +void bnx2i_sysfs_cleanup(void)
> +{
> +	class_device_unregister(&port_class_dev);
> +	class_unregister(&bnx2i_class);
> +}

The sysfs bits related to the hba should be use one of the scsi sysfs 
facilities or if they are related to iscsi bits and are generic then 
through the iscsi hba

^ permalink raw reply

* Re: [PATCH] [RFC] allow admin/users to specify rto_min in milliseconds rather than jiffies
From: Rick Jones @ 2007-09-05 17:11 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20070905073801.2f12991b@oldman>

Stephen Hemminger wrote:
> On Tue, 4 Sep 2007 13:20:47 -0700 (PDT)
> Rick Jones <rick.jones2@hp.com> wrote:
> 
> 
>>Build upon David Miller's initial patches to set the per-route rto_min
>>so users can specify the rto_min in the same units (milliseconds) in
>>which they are displayed.  This is desirable because asking users to
>>convert to and from jiffies themselves, when there can be different
>>values of HZ from system to system would be error prone.
>>
>>Signed-off-by: Rick Jones <rick.jones2@hp.com>
>>---
> 
> 
> The api in netlink should be in milliseconds rather than compensating
> in the application (iproute2).

In unrelated work I found myself using tc to setup some WAN emulation. 
There I noticed that one can specify times using a units suffix. So, one 
can say 100ms, 100us, or 100s.

That seems like a nice, friendly interface.  I've not looked yet to see 
if it was converting up in tc and then passing a consistent unit to the 
kernel, or if all the conversion was happening in the kernel.

rick jones

^ permalink raw reply

* Re: [PATCH RFC] iw_cxgb3: Support "iwarp-only" interfaces to avoid 4-tuple conflicts with the host stack.
From: Steve Wise @ 2007-09-05 17:27 UTC (permalink / raw)
  To: Roland Dreier; +Cc: general, ewg, netdev, linux-kernel, sean.hefty
In-Reply-To: <46DECDED.90903@opengridcomputing.com>



Steve Wise wrote:
> 
> 
> Roland Dreier wrote:
>>  > > What's wrong with my suggestion of having the iwarp driver create an
>>  > > "iwX" interface to go with the normal "ethX" interface?  It seems
>>  > > simpler to me, and there's a somewhat similar precedent with how
>>  > > mac80211 devices create both wlan0 and wmaster0 interfaces.
>>  > >  - R.
>>  >  > It seemed much more painful for me to implement. :-)
>>  >  > I'll look into this, but I think for this to be done, the 
>> changes must
>>  > be in the cxgb3 driver, not the rdma driver, because the guts of the
>>  > netdev struct are all private to cxgb3.  Remember that this interface
>>  > needs to still do non TCP traffic (like ARP and UDP)...
>>  >  > Maybe you have something in mind here that I'm not thinking about?
>>
>> No, I was just spouting off.
>>
> 
> At least someone is looking at my patch. ;-)
> 
>> But the whole "create a magic alias" seems kind of unfriendly to the
>> user.  Maybe as you said, the cxgb3 net driver could create the alias
>> for the iw_cxgb3 driver?
> 
> I agree that it is not very user friendly.
> 
> My current patch just utilizes the IP address alias logic in the IP 
> stack.  So when you do 'ifconfig ethxx:blah ipaddr up' it creates a 
> struct in_ifaddr which contains a ptr to the real struct net_device that 
> services this alias.  However, from what I can tell, I cannot just 
> create one of these without binding an address.  So the driver cannot 
> create the alias interface until it knows the ipaddr/netmask/etc.  IE: 
> if you say 'ifconfig ethxx:blah up' it fails...  You must supply an 
> address to get one of these created.
> 
> To have the cxgb3 driver create something like 'iw0', I think it would 
> need to create a full net_device struct.  This makes the change much 
> more complex.  But perhaps its the right thing to do...
> 
> Steve.
> 

Also, I could defer registering the device with the rdma core until the 
alias interface is created by the user.  Thus the T3 device wouldn't be 
available for use until the ethxx:iw interface is created.

And I could log a WARN or INFO message if the iw_cxgb3 module is loaded 
and no ethxx:iw alias exists.  This would help clue in the user...

Steve.



^ permalink raw reply

* [patch 1/1] Fix some Kconfigs on net-2.6.24
From: dlezcano @ 2007-09-05 17:07 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20070905170729.127852231@dyn-9-101-17-26.toulouse-stg.fr.ibm.com>

[-- Attachment #1: net-2.6.24-fix-Kconfig.patch --]
[-- Type: text/plain, Size: 1576 bytes --]

From: Daniel Lezcano <dlezcano@fr.ibm.com>

Three fixes for Kconfigs.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>

---
 drivers/input/misc/Kconfig |    2 +-
 drivers/leds/Kconfig       |    2 +-
 drivers/telephony/Kconfig  |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

Index: net-2.6.24/drivers/input/misc/Kconfig
===================================================================
--- net-2.6.24.orig/drivers/input/misc/Kconfig
+++ net-2.6.24/drivers/input/misc/Kconfig
@@ -152,7 +152,7 @@
 
 config INPUT_YEALINK
 	tristate "Yealink usb-p1k voip phone"
-	depends EXPERIMENTAL
+	depends on EXPERIMENTAL
 	depends on USB_ARCH_HAS_HCD
 	select USB
 	help
Index: net-2.6.24/drivers/leds/Kconfig
===================================================================
--- net-2.6.24.orig/drivers/leds/Kconfig
+++ net-2.6.24/drivers/leds/Kconfig
@@ -83,7 +83,7 @@
 
 config LEDS_H1940
 	tristate "LED Support for iPAQ H1940 device"
-	depends LEDS_CLASS && ARCH_H1940
+	depends on LEDS_CLASS && ARCH_H1940
 	help
 	  This option enables support for the LEDs on the h1940.
 
Index: net-2.6.24/drivers/telephony/Kconfig
===================================================================
--- net-2.6.24.orig/drivers/telephony/Kconfig
+++ net-2.6.24/drivers/telephony/Kconfig
@@ -19,7 +19,7 @@
 
 config PHONE_IXJ
 	tristate "QuickNet Internet LineJack/PhoneJack support"
-	depends ISA || PCI
+	depends on ISA || PCI
 	---help---
 	  Say M if you have a telephony card manufactured by Quicknet
 	  Technologies, Inc.  These include the Internet PhoneJACK and

-- 

^ permalink raw reply

* [patch 0/1] [PATCH] Fix Kconfigs for net-2.6.24
From: dlezcano @ 2007-09-05 17:07 UTC (permalink / raw)
  To: davem; +Cc: netdev

Fixes for 3 typos in Kconfig files

-- 

^ permalink raw reply

* Re: kernel crashes inside MV643xx driver
From: Dale Farnsworth @ 2007-09-05 16:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: gshan, linux-kernel, netdev
In-Reply-To: <20070905082452.4a20387b.akpm@linux-foundation.org>

On Wed, Sep 05, 2007 at 08:24:52AM -0700, Andrew Morton wrote:
> > On Mon, 20 Aug 2007 14:38:57 +0800 gshan <gshan@alcatel-lucent.com> wrote:
> > Hi All,
> > 
> > After I started the NFS server, it crashed:
> > 
> > <3>Badness in local_bh_enable at 
> > /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> > Badness in local_bh_enable at 
> > /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> > Call trace:
> >  [c0005340] check_bug_trap+0xbc/0x11c
> >  [c0005604] ProgramCheckException+0x264/0x2bc
> >  [c0004ac4] ret_from_except_full+0x0/0x4c
> >  [c0022ae4] local_bh_enable+0x18/0x80
> >  [c024648c] skb_copy_bits+0x168/0x3b8
> >  [c024db44] __skb_linearize+0x90/0x150
> >  [c020e8a4] mv643xx_eth_start_xmit+0x4c0/0x5bc
> >  [c025c934] qdisc_restart+0xac/0x2bc
> >  [c024de9c] dev_queue_xmit+0x298/0x34c
> >  [c0269814] ip_finish_output+0x140/0x2b8
> >  [c026a3ac] ip_fragment+0x3cc/0x6e0
> >  [c026bac8] ip_push_pending_frames+0x3dc/0x46c
> >  [c0289ec4] udp_push_pending_frames+0x10c/0x1cc
> >  [c028a7c4] udp_sendpage+0x104/0x188
> >  [c0292fc8] inet_sendpage+0x90/0xb8
> > 
> > I searched the webs and found the similar problems:
> > http://www.mail-archive.com/netdev@vger.kernel.org/msg05199.html
> > http://oss.sgi.com/archives/netdev/2005-09/msg00025.html
> > 
> > Who knew there are fixes for the problem?
> > 
> 
> Well that got a tremendous response, didn't it?
> 
> What do you mean by "crashed"?  The above is a warning and the system
> should have survived.
> 
> Which kernel version is being used?

That is the key question.  From the pathnames, I suspect that gshan is
using a MontaVista version.  I'm still (especially) interested, since
MontaVista pays me.

BTW, I never received the original message on netdev or linux-kernel.
Hmm.  Thanks to Andrew for replying.

-Dale

^ permalink raw reply

* Re: [PATCH] 3c59x: sparse warning fix
From: Joe Perches @ 2007-09-05 16:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: klassert, Jeff Garzik, netdev
In-Reply-To: <20070905152359.26215f89@oldman>

On Wed, 2007-09-05 at 15:23 +0100, Stephen Hemminger wrote:
> --- a/drivers/net/3c59x.c	2007-09-05 15:15:16.000000000 +0100
> +++ b/drivers/net/3c59x.c	2007-09-05 15:16:29.000000000 +0100
> @@ -1122,7 +1122,7 @@ static int __devinit vortex_probe1(struc
>  					   + sizeof(struct boom_tx_desc) * TX_RING_SIZE,
>  					   &vp->rx_ring_dma);
>  	retval = -ENOMEM;
> -	if (vp->rx_ring == 0)
> +	if (!vp->rx_ring)
>  		goto free_region;
>  
>  	vp->tx_ring = (struct boom_tx_desc *)(vp->rx_ring + RX_RING_SIZE);
> @@ -2476,7 +2476,8 @@ boomerang_rx(struct net_device *dev)
>  
>  			/* Check if the packet is long enough to just accept without
>  			   copying to a properly sized skbuff. */
> -			if (pkt_len < rx_copybreak && (skb = dev_alloc_skb(pkt_len + 2)) != 0) {
> +			if (pkt_len < rx_copybreak &&
> +			    (skb = dev_alloc_skb(pkt_len + 2)) ) {
>  				skb_reserve(skb, 2);	/* Align IP on 16 byte boundaries */
>  				pci_dma_sync_single_for_cpu(VORTEX_PCI(vp), dma, PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
>  				/* 'skb_put()' points to the start of sk_buff data area. */

Shouldn't this be:

	if (pkt_len < rx_copybreak) {
		skb = dev_alloc_skb(pkt_len + 2);
		if (!skb) {
			bad_news! (like the refill rx ring buffers block)
		}


^ permalink raw reply

* request for information about the "ath5k" licensing
From: Reyk Floeter @ 2007-09-05 15:48 UTC (permalink / raw)
  To: netdev

----- Forwarded message from Reyk Floeter <reyk@openbsd.org> -----

Date: Wed, 5 Sep 2007 17:18:23 +0200
From: Reyk Floeter <reyk@openbsd.org>
To: linux-kernel@vger.kernel.org, linux-wireless@vger.kernel.org,
	linux-netdev@vger.kernel.org, jirislaby@gmail.com,
	moglen@softwarefreedom.org, mcgrof@gmail.com
Subject: request for information about the "ath5k" licensing
User-Agent: Mutt/1.5.12-2006-07-14

Hello!

I'm the author of the free hardware driver layer for wireless Atheros
devices in OpenBSD, also known as "OpenHAL".

I'm still trying to get an idea about the facts and the latest state
of the incidence that violated the copyright of my code, because I
just returned from vacation.

I'm very disappointed about this and I hope that it was a mistake,
because it is very unfair and malicious against me and the OpenBSD
community. I invested a lot of time to write the code and to make it
work with as many chipsets as possible. And during the last years, the
OpenBSD community helped to test and to improve the driver. I always
liked the idea to port it to other operating systems, but now somebody
harmed these efforts by violating the license.

Some time ago, I got repeated requests to change the license of the
code to GPL or to dual-license it but I always rejected these
requests. I clearly explained my reasons against dual-licensing in the
past. It needed some time, but it had seemed to me that the involved
people had finally accepted my decision. 

I do like to idea to port the free Atheros driver to other operating
systems in addition to OpenBSD, because it is a clear sign against
hardware companies attacking the free software "community" by
releasing binary-only driver objects instead of free code or hardware
documentation. I used to cooperate with the people working on the
madwifi port of "OpenHAL"; we exchanged ideas, bug fixes, and small
code snippets.  They sent me some bug reports and I also looked at
their changes and reported some functional problems. This was possible
because they kept the license in place. 

But now the Linux code is almost ready and somebody wants to cancel
any options to cooperate by locking me out with a prepended GPL and an
invalid copyright on top of it. I hope that this was not caused by the
same people.

Nevertheless, I'm cross-posting this mail to some lists and people
because I don't know the responsible persons. I have too many mails
about this issue in my inbox and it's very hard to filter out the
useful information.

Could you please give me some feedback about the latest state? Please
reply in private, I'm not subscribed to any of the Linux lists and I'm
rather interested in facts than in the usual trolling.

- Has this issue been fixed?

- Is there any repository available with the "ath5k" code using a
modified/"extended" license?

I don't know how to find the relevant bits in the various Linux git
repositories. Sorry, I don't get the structure of it. Are there any
other sources online except this diff on the linux kernel mailing
list?

- Are there any plans to release the "ath5k" code using a
modified/"extended" license?

Please let me know if it is planned to release my code with any other
license than the attached one. I also strongly disagree with the
concept of adding a new copyright and/or a GPL license on top of it -
it is still a derived work and a few stylistic changes, some code
shuffling, and some bug fixes don't allow to change the copyright. I
invested a lot of time and work to implement "OpenHAL" and it was much
more than just porting it. Please remember, it was a very hard task to
make it happen without any support from Atheros :(.

Please keep the following license and copyright notice in all affected
files and any derived work:

/*
 * Copyright (c) 2004, 2005, 2006, 2007 Reyk Floeter <reyk@openbsd.org>
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 */

Thanks!
Reyk

-- 
/* .vantronix|secure systems - (research & development)
 * reyk floeter - friendly known free software engineer
 * reyk@vantronix.net - http://team.vantronix.net/reyk/
 */

----- End forwarded message -----

^ permalink raw reply

* Re: kernel crashes inside MV643xx driver
From: Stephen Hemminger @ 2007-09-05 16:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: gshan, linux-kernel, Dale Farnsworth, netdev
In-Reply-To: <20070905082452.4a20387b.akpm@linux-foundation.org>

On Wed, 5 Sep 2007 08:24:52 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> > On Mon, 20 Aug 2007 14:38:57 +0800 gshan <gshan@alcatel-lucent.com> wrote:
> > Hi All,
> > 
> > After I started the NFS server, it crashed:
> > 
> > <3>Badness in local_bh_enable at 
> > /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> > Badness in local_bh_enable at 
> > /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> > Call trace:
> >  [c0005340] check_bug_trap+0xbc/0x11c
> >  [c0005604] ProgramCheckException+0x264/0x2bc
> >  [c0004ac4] ret_from_except_full+0x0/0x4c
> >  [c0022ae4] local_bh_enable+0x18/0x80
> >  [c024648c] skb_copy_bits+0x168/0x3b8
> >  [c024db44] __skb_linearize+0x90/0x150
> >  [c020e8a4] mv643xx_eth_start_xmit+0x4c0/0x5bc
> >  [c025c934] qdisc_restart+0xac/0x2bc
> >  [c024de9c] dev_queue_xmit+0x298/0x34c
> >  [c0269814] ip_finish_output+0x140/0x2b8
> >  [c026a3ac] ip_fragment+0x3cc/0x6e0
> >  [c026bac8] ip_push_pending_frames+0x3dc/0x46c
> >  [c0289ec4] udp_push_pending_frames+0x10c/0x1cc
> >  [c028a7c4] udp_sendpage+0x104/0x188
> >  [c0292fc8] inet_sendpage+0x90/0xb8
> > 
> > I searched the webs and found the similar problems:
> > http://www.mail-archive.com/netdev@vger.kernel.org/msg05199.html
> > http://oss.sgi.com/archives/netdev/2005-09/msg00025.html
> > 
> > Who knew there are fixes for the problem?
> > 
> 
> Well that got a tremendous response, didn't it?
> 
> What do you mean by "crashed"?  The above is a warning and the system
> should have survived.
> 
> Which kernel version is being used?

The transmit start rework look like it should have already fixed the problem.
The driver was calling spin_lock_irqsave before calling skb_linearize, now
it checks first.

commit c8aaea25e0b069e9572caa74f984e109899c1765
Author: Dale Farnsworth <dale@farnsworth.org>
Date:   Fri Mar 3 10:02:05 2006 -0700

    [PATCH] mv643xx_eth: Refactor tx command queuing code
    
    Simplify and remove redundant code for filling transmit descriptors.
    No changes in features; it's just a code reorganization/cleanup.
    
    Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
    Signed-off-by: Jeff Garzik <jeff@garzik.org>

^ permalink raw reply

* [PATCH] sky2: restore multicast list on resume and other ops
From: Stephen Hemminger @ 2007-09-05 15:56 UTC (permalink / raw)
  To: Timo Weingärtner, Jeff Garzik; +Cc: netdev, akpm
In-Reply-To: <200708312010.38276.timo@tiwe.de>

Need to restore multicast settings on resume and after 'ethtool -r'.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>

--- a/drivers/net/sky2.c	2007-09-05 13:57:22.000000000 +0100
+++ b/drivers/net/sky2.c	2007-09-05 13:57:24.000000000 +0100
@@ -149,6 +149,8 @@ static const char *yukon2_name[] = {
 	"FE",		/* 0xb7 */
 };
 
+static void sky2_set_multicast(struct net_device *dev);
+
 /* Access to external PHY */
 static int gm_phy_write(struct sky2_hw *hw, unsigned port, u16 reg, u16 val)
 {
@@ -2900,8 +2902,10 @@ static int sky2_set_settings(struct net_
 	sky2->autoneg = ecmd->autoneg;
 	sky2->advertising = ecmd->advertising;
 
-	if (netif_running(dev))
+	if (netif_running(dev)) {
 		sky2_phy_reinit(sky2);
+		sky2_set_multicast(dev);
+	}
 
 	return 0;
 }
@@ -2994,6 +2998,7 @@ static int sky2_nway_reset(struct net_de
 		return -EINVAL;
 
 	sky2_phy_reinit(sky2);
+	sky2_set_multicast(dev);
 
 	return 0;
 }
@@ -4171,6 +4176,8 @@ static int sky2_resume(struct pci_dev *p
 				dev_close(dev);
 				goto out;
 			}
+
+			sky2_set_multicast(dev);
 		}
 	}
 

^ permalink raw reply

* [ofa-general] Re: [PATCH RFC] iw_cxgb3: Support "iwarp-only" interfaces to avoid 4-tuple conflicts with the host stack.
From: Steve Wise @ 2007-09-05 15:40 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, ewg, general, linux-kernel
In-Reply-To: <ada642puonr.fsf@cisco.com>



Roland Dreier wrote:
>  > > What's wrong with my suggestion of having the iwarp driver create an
>  > > "iwX" interface to go with the normal "ethX" interface?  It seems
>  > > simpler to me, and there's a somewhat similar precedent with how
>  > > mac80211 devices create both wlan0 and wmaster0 interfaces.
>  > >  - R.
>  > 
>  > It seemed much more painful for me to implement. :-)
>  > 
>  > I'll look into this, but I think for this to be done, the changes must
>  > be in the cxgb3 driver, not the rdma driver, because the guts of the
>  > netdev struct are all private to cxgb3.  Remember that this interface
>  > needs to still do non TCP traffic (like ARP and UDP)...
>  > 
>  > Maybe you have something in mind here that I'm not thinking about?
> 
> No, I was just spouting off.
>

At least someone is looking at my patch. ;-)

> But the whole "create a magic alias" seems kind of unfriendly to the
> user.  Maybe as you said, the cxgb3 net driver could create the alias
> for the iw_cxgb3 driver?

I agree that it is not very user friendly.

My current patch just utilizes the IP address alias logic in the IP 
stack.  So when you do 'ifconfig ethxx:blah ipaddr up' it creates a 
struct in_ifaddr which contains a ptr to the real struct net_device that 
services this alias.  However, from what I can tell, I cannot just 
create one of these without binding an address.  So the driver cannot 
create the alias interface until it knows the ipaddr/netmask/etc.  IE: 
if you say 'ifconfig ethxx:blah up' it fails...  You must supply an 
address to get one of these created.

To have the cxgb3 driver create something like 'iw0', I think it would 
need to create a full net_device struct.  This makes the change much 
more complex.  But perhaps its the right thing to do...

Steve.

^ permalink raw reply

* Re: WiFi and changed MAC addrs
From: Johannes Berg @ 2007-09-05 15:34 UTC (permalink / raw)
  To: Ben Woodard; +Cc: netdev
In-Reply-To: <46DEC94C.9010301@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1254 bytes --]

On Wed, 2007-09-05 at 08:20 -0700, Ben Woodard wrote:

> If a particular piece of hardware will refuse to send frames after the 
> MAC address has been changed, then shouldn't the card's driver refuse to 
> allow a change of the MAC address and return an error in response to the 
> IOCTL that tries to change it?

Yes.

> I can't think of a situation where this 
> wouldn't be the right thing to do. However, before I submit a patch to 
> fix this, does anyone else know of any special cases that I might need 
> to deal with?

That wouldn't make sense now, would it?

> Also is behavior of refusing to send frames with a non-hardware 
> specified MAC address part of the WiFi spec or is it particular to 
> certain WiFi cards? IOW at what level do I need to fix it? Should I just 
> fix it for the Intel cards that I've tested or do I need to fix it for 
> all WiFi cards?

Most wifi cards don't care at all what MAC address you send, in fact
allowing you to send with various MAC addresses. So this is something
intel specific. Oh are we talking about iwlwifi? If so, please report to
their maintainers immediately, they're lazy bastards and don't even
upload the new MAC address to the card even if they could.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply

* Re: kernel crashes inside MV643xx driver
From: Andrew Morton @ 2007-09-05 15:24 UTC (permalink / raw)
  To: gshan; +Cc: linux-kernel, Dale Farnsworth, netdev
In-Reply-To: <46C93701.7050509@alcatel-lucent.com>

> On Mon, 20 Aug 2007 14:38:57 +0800 gshan <gshan@alcatel-lucent.com> wrote:
> Hi All,
> 
> After I started the NFS server, it crashed:
> 
> <3>Badness in local_bh_enable at 
> /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> Badness in local_bh_enable at 
> /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> Call trace:
>  [c0005340] check_bug_trap+0xbc/0x11c
>  [c0005604] ProgramCheckException+0x264/0x2bc
>  [c0004ac4] ret_from_except_full+0x0/0x4c
>  [c0022ae4] local_bh_enable+0x18/0x80
>  [c024648c] skb_copy_bits+0x168/0x3b8
>  [c024db44] __skb_linearize+0x90/0x150
>  [c020e8a4] mv643xx_eth_start_xmit+0x4c0/0x5bc
>  [c025c934] qdisc_restart+0xac/0x2bc
>  [c024de9c] dev_queue_xmit+0x298/0x34c
>  [c0269814] ip_finish_output+0x140/0x2b8
>  [c026a3ac] ip_fragment+0x3cc/0x6e0
>  [c026bac8] ip_push_pending_frames+0x3dc/0x46c
>  [c0289ec4] udp_push_pending_frames+0x10c/0x1cc
>  [c028a7c4] udp_sendpage+0x104/0x188
>  [c0292fc8] inet_sendpage+0x90/0xb8
> 
> I searched the webs and found the similar problems:
> http://www.mail-archive.com/netdev@vger.kernel.org/msg05199.html
> http://oss.sgi.com/archives/netdev/2005-09/msg00025.html
> 
> Who knew there are fixes for the problem?
> 

Well that got a tremendous response, didn't it?

What do you mean by "crashed"?  The above is a warning and the system
should have survived.

Which kernel version is being used?



^ permalink raw reply

* WiFi and changed MAC addrs
From: Ben Woodard @ 2007-09-05 15:20 UTC (permalink / raw)
  To: netdev

I just spent basically a day of my weekend grappling with this problem 
http://wiki.xensource.com/xenwiki/XenWifi
before I understood it well enough to realize that the fact that Xen 
changed the MAC addr for my WiFi card was the root cause of the problem. 
This got me thinking:

If a particular piece of hardware will refuse to send frames after the 
MAC address has been changed, then shouldn't the card's driver refuse to 
allow a change of the MAC address and return an error in response to the 
IOCTL that tries to change it? I can't think of a situation where this 
wouldn't be the right thing to do. However, before I submit a patch to 
fix this, does anyone else know of any special cases that I might need 
to deal with?

Also is behavior of refusing to send frames with a non-hardware 
specified MAC address part of the WiFi spec or is it particular to 
certain WiFi cards? IOW at what level do I need to fix it? Should I just 
fix it for the Intel cards that I've tested or do I need to fix it for 
all WiFi cards?

-ben

^ permalink raw reply

* Re: [PATCH] Remove unneeded pointer newdp from dccp_v4_request_recv_sock() in net/dccp/ipv4.c
From: David Miller @ 2007-09-05 14:58 UTC (permalink / raw)
  To: micah.gruber; +Cc: linux-kernel, netdev
In-Reply-To: <46DE42D9.2040200@gmail.com>

From: Micah Gruber <micah.gruber@gmail.com>
Date: Wed, 05 Sep 2007 13:47:05 +0800

> This trivial patch removes the unneeded pointer newdp, which is never used.
> 
> Signed-off-by: Micah Gruber <micah.gruber@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] Remove unneeded pointer iph from ipcomp6_input() in net/ipv6/ipcomp6.c
From: David Miller @ 2007-09-05 14:57 UTC (permalink / raw)
  To: micah.gruber; +Cc: linux-kernel, netdev
In-Reply-To: <46DE4248.6070604@gmail.com>

From: Micah Gruber <micah.gruber@gmail.com>
Date: Wed, 05 Sep 2007 13:44:40 +0800

> This trivial patch removes the unneeded pointer iph, which is never used.
> 
> Signed-off-by: Micah Gruber < micah.gruber@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH RFC] iw_cxgb3: Support "iwarp-only" interfaces to avoid 4-tuple conflicts with the host stack.
From: Roland Dreier @ 2007-09-05 15:18 UTC (permalink / raw)
  To: Steve Wise; +Cc: general, ewg, netdev, linux-kernel, sean.hefty
In-Reply-To: <46DD762F.80904@opengridcomputing.com>

 > > What's wrong with my suggestion of having the iwarp driver create an
 > > "iwX" interface to go with the normal "ethX" interface?  It seems
 > > simpler to me, and there's a somewhat similar precedent with how
 > > mac80211 devices create both wlan0 and wmaster0 interfaces.
 > >  - R.
 > 
 > It seemed much more painful for me to implement. :-)
 > 
 > I'll look into this, but I think for this to be done, the changes must
 > be in the cxgb3 driver, not the rdma driver, because the guts of the
 > netdev struct are all private to cxgb3.  Remember that this interface
 > needs to still do non TCP traffic (like ARP and UDP)...
 > 
 > Maybe you have something in mind here that I'm not thinking about?

No, I was just spouting off.

But the whole "create a magic alias" seems kind of unfriendly to the
user.  Maybe as you said, the cxgb3 net driver could create the alias
for the iw_cxgb3 driver?

 - R.

^ permalink raw reply

* Re: [PATCH 9/11] cxgb3 - engine microcode update
From: Steve Wise @ 2007-09-05 14:36 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Divy Le Ray, netdev, linux-kernel
In-Reply-To: <20070905143029.GC19543@havoc.gtf.org>



Jeff Garzik wrote:
> On Wed, Sep 05, 2007 at 09:00:04AM -0500, Steve Wise wrote:
>>
>> Jeff Garzik wrote:
>>> Divy Le Ray wrote:
>>>> From: Divy Le Ray <divy@chelsio.com>
>>>>
>>>> Load microcode engine when the interface
>>>> is configured up.
>>>> Bump up version to 1.1.0.
>>>> Allow the driver to be and running with
>>>> older microcode images.
>>>> Allow ethtool to log the microcode version.
>>>>
>>>> Signed-off-by: Divy Le Ray <divy@chelsio.com>
>>> ACK patches 9-14, but dropped, since they do not apply to #upstream 
>>> (probably due to fixes sent into 2.6.23-rc)
>>>
>> Hey Jeff,
>>
>> I think 9-14 still need to be incorporated.  I don't see them in your 
>> upstream branch, and they aren't in linus' tree either.
> 
> I am not the blocker here.  Re-read the email you quoted.
> 
> 	Jeff, still waiting for the patches
> 

Ok.  I didn't get that you were stating that the patches should be 
resent because they're busted :)


Steve.


^ permalink raw reply

* Re: [PATCH 9/11] cxgb3 - engine microcode update
From: Jeff Garzik @ 2007-09-05 14:30 UTC (permalink / raw)
  To: Steve Wise; +Cc: Divy Le Ray, netdev, linux-kernel
In-Reply-To: <46DEB664.2090704@opengridcomputing.com>

On Wed, Sep 05, 2007 at 09:00:04AM -0500, Steve Wise wrote:
> 
> 
> Jeff Garzik wrote:
> >Divy Le Ray wrote:
> >>From: Divy Le Ray <divy@chelsio.com>
> >>
> >>Load microcode engine when the interface
> >>is configured up.
> >>Bump up version to 1.1.0.
> >>Allow the driver to be and running with
> >>older microcode images.
> >>Allow ethtool to log the microcode version.
> >>
> >>Signed-off-by: Divy Le Ray <divy@chelsio.com>
> >
> >ACK patches 9-14, but dropped, since they do not apply to #upstream 
> >(probably due to fixes sent into 2.6.23-rc)
> >
> 
> Hey Jeff,
> 
> I think 9-14 still need to be incorporated.  I don't see them in your 
> upstream branch, and they aren't in linus' tree either.

I am not the blocker here.  Re-read the email you quoted.

	Jeff, still waiting for the patches




^ permalink raw reply

* [PATCH] 3c59x: sparse warning fix
From: Stephen Hemminger @ 2007-09-05 14:23 UTC (permalink / raw)
  To: klassert, Jeff Garzik; +Cc: netdev

--- a/drivers/net/3c59x.c	2007-09-05 15:15:16.000000000 +0100
+++ b/drivers/net/3c59x.c	2007-09-05 15:16:29.000000000 +0100
@@ -1122,7 +1122,7 @@ static int __devinit vortex_probe1(struc
 					   + sizeof(struct boom_tx_desc) * TX_RING_SIZE,
 					   &vp->rx_ring_dma);
 	retval = -ENOMEM;
-	if (vp->rx_ring == 0)
+	if (!vp->rx_ring)
 		goto free_region;
 
 	vp->tx_ring = (struct boom_tx_desc *)(vp->rx_ring + RX_RING_SIZE);
@@ -2476,7 +2476,8 @@ boomerang_rx(struct net_device *dev)
 
 			/* Check if the packet is long enough to just accept without
 			   copying to a properly sized skbuff. */
-			if (pkt_len < rx_copybreak && (skb = dev_alloc_skb(pkt_len + 2)) != 0) {
+			if (pkt_len < rx_copybreak &&
+			    (skb = dev_alloc_skb(pkt_len + 2)) ) {
 				skb_reserve(skb, 2);	/* Align IP on 16 byte boundaries */
 				pci_dma_sync_single_for_cpu(VORTEX_PCI(vp), dma, PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
 				/* 'skb_put()' points to the start of sk_buff data area. */

^ permalink raw reply

* Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
From: jamal @ 2007-09-05 14:21 UTC (permalink / raw)
  To: James Chapman
  Cc: Mandeep Singh Baines, Daniele Venzano, davem, rick.jones2, msb,
	netdev, grundler, robert.olsson, jeff, nhorman
In-Reply-To: <46DEB545.3030509@katalix.com>

On Wed, 2007-05-09 at 14:55 +0100, James Chapman wrote:

> Thanks Jamal. Yes, I'd already read your paper. I think my idea is 
> different to the ideas described in your paper 

I am hoping you can pick from the lessons of what has been tried and
failed and the justification for critiqueing something as "Failed". 
If you have - i feel i accomplished something useful writting the paper.

> and I'm in the process of 
> writing it up as an RFC to post to netdev. 

Please cc me if you want my feedback - I am backlogged by about 3000
messages on netdev.

> Briefly though, the driver's 
> ->poll() holds off doing netif_rx_complete() for 1-2 jiffies and keeps 
> itself in poll_on mode for that time, consuming no quota. 
> The net_rx 
> softirq processing loop is modified to detect the case when the only 
> devices in its poll list are doing no work (consuming no quota). The 
> driver's ->poll() samples jiffies while it is considering when to do the 
> netif_rx_complete() like your Parked state - no timers are used.

Ok, so the difference seems to be you actually poll instead for those
jiffies instead starting a timer in the parked state - is that right?

> If I missed that this approach has already been tried before and 
> rejected, please let me know. I see better throughput and latency in my 
> packet forwarding and LAN setups using it.

If you read the paper: There are no issues with high throughput - NAPI
kicks in.
The challenge to be overcome is at low traffic, if you have a real fast
processor your cpu-cycles-used/bits-processed ratio is high....
If you are polling (softirqs have high prio and depending on the cpu,
there could be a few gazillion within those 1-2 jiffies), then isnt the
end result still a high cpu-cycles used?
Thats what the timer tries to avoid (do nothing until some deffered
point).
If you waste cpu cycles and poll, I can see that (to emphasize: For
FastCPU-LowTraffic scenario), you will end up _not_ having latency
issues i pointed out, but you surely have digressed from the original
goal which is to address the cpu abuse at low traffic (because you abuse
more cpu).

One thing that may be valuable is to show that the timers and polls are
not much different in terms of cpu abuse (It's theoretically not true,
but reality may not match).
The other thing is now hrestimers are on, can you try using a piece of
hardware that can get those kicked and see if you see any useful results
on latency.

cheers,
jamal


^ permalink raw reply

* Re: [PATCH 9/11] cxgb3 - engine microcode update
From: Steve Wise @ 2007-09-05 14:00 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Divy Le Ray, netdev, linux-kernel
In-Reply-To: <46D819CE.5050207@garzik.org>



Jeff Garzik wrote:
> Divy Le Ray wrote:
>> From: Divy Le Ray <divy@chelsio.com>
>>
>> Load microcode engine when the interface
>> is configured up.
>> Bump up version to 1.1.0.
>> Allow the driver to be and running with
>> older microcode images.
>> Allow ethtool to log the microcode version.
>>
>> Signed-off-by: Divy Le Ray <divy@chelsio.com>
> 
> ACK patches 9-14, but dropped, since they do not apply to #upstream 
> (probably due to fixes sent into 2.6.23-rc)
> 

Hey Jeff,

I think 9-14 still need to be incorporated.  I don't see them in your 
upstream branch, and they aren't in linus' tree either.

Steve.


^ permalink raw reply

* Re: [PATCH 2/2]: [NET_SCHED]: Making rate table lookups more flexible.
From: Jesper Dangaard Brouer @ 2007-09-05 13:58 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Jesper Dangaard Brouer, netdev@vger.kernel.org, Stephen Hemminger
In-Reply-To: <46DD86F9.8000902@trash.net>

[-- Attachment #1: Type: text/plain, Size: 3532 bytes --]

On Tue, 2007-09-04 at 18:25 +0200, Patrick McHardy wrote:
> Jesper Dangaard Brouer wrote:
> > On Sun, 2007-09-02 at 23:16 +0200, Patrick McHardy wrote:
> > 
> >>Jesper Dangaard Brouer wrote:
> >>
> >>>On Sun, 2 Sep 2007, Patrick McHardy wrote:
> >>>
> >>>Lets focus on the general case, where the functionality actually is 
> >>>needed right away.
> >>>
> >>>In the general case:
> >>>
> >>>- The rate table needs to be aligned (cell_align=-1).
> >>>  (currently, we miscalculates up to 7 bytes on every lookup)
> >>
> >>We will always do that, thats a consequence of storing the
> >>transmission times for multiples of 8b.
> > 
> > 
> > The issue is that we use the lower boundary for calculating the transmit
> > cost. Thus, a 15 bytes packet only have a transmit cost of 8 bytes.
> 
> I believe this is something that should be fixed anyway,
> its better to overestimate than underestimate to stay
> in control of the queue. 

Well, I have attached a patch that uses the upper boundry instead.
The patch uses the cell_align feature.

The patch is very simple it self, but figure out what happens the rtab
array requires a little illustration:

Illustrating the rate table array:
 Legend description
  rtab[x]   : Array index x of rtab[x]
  xmit_sz   : Transmit size contained in rtab[x] (normal transmit time)
  maps[a-b] : Packet sizes from a to b, will map into rtab[x]

Current/old rate table mapping (cell_log:3):
 rtab[0]:=xmit_sz:0  maps[0-7]
 rtab[1]:=xmit_sz:8  maps[8-15]
 rtab[2]:=xmit_sz:16 maps[16-23]
 rtab[3]:=xmit_sz:24 maps[24-31]
 rtab[4]:=xmit_sz:32 maps[32-39]
 rtab[5]:=xmit_sz:40 maps[40-47]
 rtab[6]:=xmit_sz:48 maps[48-55]

New rate table mapping, with kernel cell_align support.
 rtab[0]:=xmit_sz:8  maps[0-8]
 rtab[1]:=xmit_sz:16 maps[9-16]
 rtab[2]:=xmit_sz:24 maps[17-24]
 rtab[3]:=xmit_sz:32 maps[25-32]
 rtab[4]:=xmit_sz:40 maps[33-40]
 rtab[5]:=xmit_sz:48 maps[41-48]
 rtab[6]:=xmit_sz:56 maps[49-56]

New TC util on a kernel WITHOUT support for cell_align
 rtab[0]:=xmit_sz:8 maps[0-7]
 rtab[1]:=xmit_sz:16 maps[8-15]
 rtab[2]:=xmit_sz:24 maps[16-23]
 rtab[3]:=xmit_sz:32 maps[24-31]
 rtab[4]:=xmit_sz:40 maps[32-39]
 rtab[5]:=xmit_sz:48 maps[40-47]
 rtab[6]:=xmit_sz:56 maps[48-55]

Notice that without the kernel cell_align feature, we are only off by
one byte.  That should be acceptable, when somebody uses a new TC util
on a old kernel. 

> We could additionally make the
> rate tables more finegrained (optionally).

That is actually already possible with the approach used to handle
overflow of the rate table ("TSO" large packet support).  By setting
cell_log=0, and letting the overflow code handle the rest, we get a very
fingrained lookup.


> >>>- The existing tc overhead calc can be made more accurate.
> >>>  (by adding overhead before doing the lookup, instead of the
> >>>   current solution where the rate table is modified with its
> >>>   limited resolution)
> >>
> >>Please demonstrate this with patches (one for the overhead
> >>calculation, one for the cell_align thing), then we can
> >>continue this discussion.
> > 
> > 
> > I have attached a patch for the overhead calculation.

Attached is a patch that uses "the cell_align thing".

> Thanks, I probably won't get to looking into this until
> after the netfilter workshop next week.

Okay, but I'll see you at the workshop, so I might bug you there ;-)

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk

[-- Attachment #2: upperbound_rate_table_aligned.patch --]
[-- Type: text/x-patch, Size: 2156 bytes --]

commit 9a21e8bd56a5f057fc9f605e061c22d264ec27ef
Author: Jesper Dangaard Brouer <hawk@comx.dk>
Date:   Wed Sep 5 15:24:51 2007 +0200

    [IPROUTE2]: Change the rate table calc of transmit cost to use upper bound value.
    
    Patrick McHardy, Cite: 'its better to overestimate than underestimate
    to stay in control of the queue'.
    
    Illustrating the rate table array:
     Legend description
       rtab[x]   : Array index x of rtab[x]
       xmit_sz   : Transmit size contained in rtab[x] (normally transmit time)
       maps[a-b] : Packet sizes from a to b, will map into rtab[x]
    
    Current/old rate table mapping (cell_log:3):
     rtab[0]:=xmit_sz:0  maps[0-7]
     rtab[1]:=xmit_sz:8  maps[8-15]
     rtab[2]:=xmit_sz:16 maps[16-23]
     rtab[3]:=xmit_sz:24 maps[24-31]
     rtab[4]:=xmit_sz:32 maps[32-39]
     rtab[5]:=xmit_sz:40 maps[40-47]
     rtab[6]:=xmit_sz:48 maps[48-55]
    
    New rate table mapping, with kernel cell_align support.
     rtab[0]:=xmit_sz:8  maps[0-8]
     rtab[1]:=xmit_sz:16 maps[9-16]
     rtab[2]:=xmit_sz:24 maps[17-24]
     rtab[3]:=xmit_sz:32 maps[25-32]
     rtab[4]:=xmit_sz:40 maps[33-40]
     rtab[5]:=xmit_sz:48 maps[41-48]
     rtab[6]:=xmit_sz:56 maps[49-56]
    
    New TC util on a kernel WITHOUT support for cell_align
     rtab[0]:=xmit_sz:8 maps[0-7]
     rtab[1]:=xmit_sz:16 maps[8-15]
     rtab[2]:=xmit_sz:24 maps[16-23]
     rtab[3]:=xmit_sz:32 maps[24-31]
     rtab[4]:=xmit_sz:40 maps[32-39]
     rtab[5]:=xmit_sz:48 maps[40-47]
     rtab[6]:=xmit_sz:56 maps[48-55]
    
    Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>

diff --git a/tc/tc_core.c b/tc/tc_core.c
index c713a18..752b07c 100644
--- a/tc/tc_core.c
+++ b/tc/tc_core.c
@@ -84,11 +84,12 @@ int tc_calc_rtable(struct tc_ratespec *r, __u32 *rtab, int cell_log, unsigned mt
 			cell_log++;
 	}
 	for (i=0; i<256; i++) {
-		unsigned sz = (i<<cell_log);
+		unsigned sz = ((i+1)<<cell_log);
 		if (sz < mpu)
 			sz = mpu;
 		rtab[i] = tc_calc_xmittime(bps, sz);
 	}
+	r->cell_align=-1; // Due to the sz calc
 	r->cell_log=cell_log;
 	return cell_log;
 }

[-- Attachment #3: cleanup_tc_calc_rtable_git.patch --]
[-- Type: text/x-patch, Size: 6028 bytes --]

commit 29044ac37e30d9662ad1bb83290a007c492ad7b2
Author: Jesper Dangaard Brouer <hawk@comx.dk>
Date:   Wed Sep 5 10:47:47 2007 +0200

    [IPROUTE2]: Cleanup: tc_calc_rtable().
    
    Change tc_calc_rtable() to take a tc_ratespec struct as an
    argument. (cell_log still needs to be passed on as a parameter,
    because -1 indicate that the cell_log needs to be computed by the
    function.).
    
    Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>

diff --git a/tc/m_police.c b/tc/m_police.c
index 5d2528b..acdfd22 100644
--- a/tc/m_police.c
+++ b/tc/m_police.c
@@ -263,22 +263,20 @@ int act_parse_police(struct action_util *a,int *argc_p, char ***argv_p, int tca_
 	}
 
 	if (p.rate.rate) {
-		if ((Rcell_log = tc_calc_rtable(p.rate.rate, rtab, Rcell_log, mtu, mpu)) < 0) {
+		p.rate.mpu = mpu;
+		if (tc_calc_rtable(&p.rate, rtab, Rcell_log, mtu) < 0) {
 			fprintf(stderr, "TBF: failed to calculate rate table.\n");
 			return -1;
 		}
 		p.burst = tc_calc_xmittime(p.rate.rate, buffer);
-		p.rate.cell_log = Rcell_log;
-		p.rate.mpu = mpu;
 	}
 	p.mtu = mtu;
 	if (p.peakrate.rate) {
-		if ((Pcell_log = tc_calc_rtable(p.peakrate.rate, ptab, Pcell_log, mtu, mpu)) < 0) {
+		p.peakrate.mpu = mpu;
+		if (tc_calc_rtable(&p.peakrate, ptab, Pcell_log, mtu) < 0) {
 			fprintf(stderr, "POLICE: failed to calculate peak rate table.\n");
 			return -1;
 		}
-		p.peakrate.cell_log = Pcell_log;
-		p.peakrate.mpu = mpu;
 	}
 
 	tail = NLMSG_TAIL(n);
diff --git a/tc/q_cbq.c b/tc/q_cbq.c
index f2b4ce8..df98312 100644
--- a/tc/q_cbq.c
+++ b/tc/q_cbq.c
@@ -137,12 +137,11 @@ static int cbq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl
 	if (allot < (avpkt*3)/2)
 		allot = (avpkt*3)/2;
 
-	if ((cell_log = tc_calc_rtable(r.rate, rtab, cell_log, allot, mpu)) < 0) {
+	r.mpu = mpu;
+	if (tc_calc_rtable(&r, rtab, cell_log, allot) < 0) {
 		fprintf(stderr, "CBQ: failed to calculate rate table.\n");
 		return -1;
 	}
-	r.cell_log = cell_log;
-	r.mpu = mpu;
 
 	if (ewma_log < 0)
 		ewma_log = TC_CBQ_DEF_EWMA;
@@ -336,12 +335,11 @@ static int cbq_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
 		unsigned pktsize = wrr.allot;
 		if (wrr.allot < (lss.avpkt*3)/2)
 			wrr.allot = (lss.avpkt*3)/2;
-		if ((cell_log = tc_calc_rtable(r.rate, rtab, cell_log, pktsize, mpu)) < 0) {
+		r.mpu = mpu;
+		if (tc_calc_rtable(&r, rtab, cell_log, pktsize) < 0) {
 			fprintf(stderr, "CBQ: failed to calculate rate table.\n");
 			return -1;
 		}
-		r.cell_log = cell_log;
-		r.mpu = mpu;
 	}
 	if (ewma_log < 0)
 		ewma_log = TC_CBQ_DEF_EWMA;
diff --git a/tc/q_htb.c b/tc/q_htb.c
index b579ebe..cca77fa 100644
--- a/tc/q_htb.c
+++ b/tc/q_htb.c
@@ -212,19 +212,17 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
 	opt.ceil.mpu = mpu;
 	opt.rate.mpu = mpu;
 
-	if ((cell_log = tc_calc_rtable(opt.rate.rate, rtab, cell_log, mtu, mpu)) < 0) {
+	if (tc_calc_rtable(&opt.rate, rtab, cell_log, mtu) < 0) {
 		fprintf(stderr, "htb: failed to calculate rate table.\n");
 		return -1;
 	}
 	opt.buffer = tc_calc_xmittime(opt.rate.rate, buffer);
-	opt.rate.cell_log = cell_log;
 
-	if ((ccell_log = tc_calc_rtable(opt.ceil.rate, ctab, cell_log, mtu, mpu)) < 0) {
+	if (tc_calc_rtable(&opt.ceil, ctab, ccell_log, mtu) < 0) {
 		fprintf(stderr, "htb: failed to calculate ceil rate table.\n");
 		return -1;
 	}
 	opt.cbuffer = tc_calc_xmittime(opt.ceil.rate, cbuffer);
-	opt.ceil.cell_log = ccell_log;
 
 	tail = NLMSG_TAIL(n);
 	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
diff --git a/tc/q_tbf.c b/tc/q_tbf.c
index 1fc05f4..c7b4f0f 100644
--- a/tc/q_tbf.c
+++ b/tc/q_tbf.c
@@ -170,21 +170,20 @@ static int tbf_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl
 		opt.limit = lim;
 	}
 
-	if ((Rcell_log = tc_calc_rtable(opt.rate.rate, rtab, Rcell_log, mtu, mpu)) < 0) {
+	opt.rate.mpu = mpu;
+	if (tc_calc_rtable(&opt.rate, rtab, Rcell_log, mtu) < 0) {
 		fprintf(stderr, "TBF: failed to calculate rate table.\n");
 		return -1;
 	}
 	opt.buffer = tc_calc_xmittime(opt.rate.rate, buffer);
-	opt.rate.cell_log = Rcell_log;
-	opt.rate.mpu = mpu;
+
 	if (opt.peakrate.rate) {
-		if ((Pcell_log = tc_calc_rtable(opt.peakrate.rate, ptab, Pcell_log, mtu, mpu)) < 0) {
+		opt.peakrate.mpu = mpu;
+		if (tc_calc_rtable(&opt.peakrate, ptab, Pcell_log, mtu) < 0) {
 			fprintf(stderr, "TBF: failed to calculate peak rate table.\n");
 			return -1;
 		}
 		opt.mtu = tc_calc_xmittime(opt.peakrate.rate, mtu);
-		opt.peakrate.cell_log = Pcell_log;
-		opt.peakrate.mpu = mpu;
 	}
 
 	tail = NLMSG_TAIL(n);
diff --git a/tc/tc_core.c b/tc/tc_core.c
index 1ab0ba0..c713a18 100644
--- a/tc/tc_core.c
+++ b/tc/tc_core.c
@@ -69,10 +69,11 @@ unsigned tc_calc_xmitsize(unsigned rate, unsigned ticks)
    rtab[pkt_len>>cell_log] = pkt_xmit_time
  */
 
-int tc_calc_rtable(unsigned bps, __u32 *rtab, int cell_log, unsigned mtu,
-		   unsigned mpu)
+int tc_calc_rtable(struct tc_ratespec *r, __u32 *rtab, int cell_log, unsigned mtu)
 {
 	int i;
+	unsigned bps = r->rate;
+	unsigned mpu = r->mpu;
 
 	if (mtu == 0)
 		mtu = 2047;
@@ -88,6 +89,7 @@ int tc_calc_rtable(unsigned bps, __u32 *rtab, int cell_log, unsigned mtu,
 			sz = mpu;
 		rtab[i] = tc_calc_xmittime(bps, sz);
 	}
+	r->cell_log=cell_log;
 	return cell_log;
 }
 
diff --git a/tc/tc_core.h b/tc/tc_core.h
index a139da6..e98a7b4 100644
--- a/tc/tc_core.h
+++ b/tc/tc_core.h
@@ -13,7 +13,7 @@ long tc_core_time2ktime(long time);
 long tc_core_ktime2time(long ktime);
 unsigned tc_calc_xmittime(unsigned rate, unsigned size);
 unsigned tc_calc_xmitsize(unsigned rate, unsigned ticks);
-int tc_calc_rtable(unsigned bps, __u32 *rtab, int cell_log, unsigned mtu, unsigned mpu);
+int tc_calc_rtable(struct tc_ratespec *r, __u32 *rtab, int cell_log, unsigned mtu);
 
 int tc_setup_estimator(unsigned A, unsigned time_const, struct tc_estimator *est);
 

^ permalink raw reply related

* Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
From: James Chapman @ 2007-09-05 13:55 UTC (permalink / raw)
  To: hadi
  Cc: Mandeep Singh Baines, Daniele Venzano, davem, rick.jones2, msb,
	netdev, grundler, robert.olsson, jeff, nhorman
In-Reply-To: <1188995619.4238.5.camel@localhost>

jamal wrote:
> On Wed, 2007-05-09 at 13:03 +0100, James Chapman wrote:
> 
>> I have a patch that solves the high interrupt rate problem by keeping 
>> the driver in polled mode longer. It's written for the latest NAPI 
>> version that DaveM posted recently. I'll try to get some time to write 
>> it up and post it for comment.
> 
> Theres interesting challenges to be tackled with resolving that. 
> Just so you dont reinvent the wheel or to help invent a better one;
> please read:
> http://kernel.org/pub/linux/kernel/people/hadi/docs/UKUUG2005.pdf
> 
> if you have, what are the differences in your approach - and when
> do you find it useful?

Thanks Jamal. Yes, I'd already read your paper. I think my idea is 
different to the ideas described in your paper and I'm in the process of 
writing it up as an RFC to post to netdev. Briefly though, the driver's 
->poll() holds off doing netif_rx_complete() for 1-2 jiffies and keeps 
itself in poll_on mode for that time, consuming no quota. The net_rx 
softirq processing loop is modified to detect the case when the only 
devices in its poll list are doing no work (consuming no quota). The 
driver's ->poll() samples jiffies while it is considering when to do the 
netif_rx_complete() like your Parked state - no timers are used.

If I missed that this approach has already been tried before and 
rejected, please let me know. I see better throughput and latency in my 
packet forwarding and LAN setups using it.

-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development


^ permalink raw reply

* Re: Fwd: That whole "Linux stealing our code" thing
From: David H. Lynch Jr. @ 2007-09-05 13:22 UTC (permalink / raw)
  To: netdev
In-Reply-To: <Pine.LNX.4.64.0709021754250.6772@condmat1.ciencias.uniovi.es>

Igor Sobrado wrote:
>>
>> There is definite value in sharing the ath5k HAL between OpenBSD and 
>> Linux.
>
> Of course. Sharing knowledge and efforts can only improve both the GPL 
> and BSD licensed code. It is important in all cases, but becomes 
> critical when support from manufacturers is limited or even non existent.

Cooperation between OpenBSD developers and Linux developers, wwould be 
wonderful, but this appears to just be the latest of a number of 
disputes that have devolved into legalism and acrimony.

The time wasted fighting over this seems significantly larger than the 
effort need to solve it.

The respective BSD and GPL licensed code is open documentation for the 
programming of the typically closed device.
What is wrong with chosing to rewrite the drivers in contention ?
If the level of bile is sufficiently high it might make sense to do so 
using "clean room" techniques, where one developer uses the source 
licensed driver as the basis for writing documentation
and another developer uses the documentation as the basis for writing a 
new driver. The original author could/should still be credited.

It might even make sense to use projects like this as a means of 
recruiting new driver developers and building their skills - drafting 
prospective kernel developers from kernel-newbies, or asking for
volunteers on the appropriate lists.
Not having looked at the code for either the Linux or BSD atheros 
driver, but having some limited linux network driver experience, I would 
be happy to make an attempt at writing a clean Linux GPL
driver for atheros cards.

Another benefit to this approach is it might cool tempers. Neither the 
GPL not the BSD/ISC Licenses protect the information their authors have 
painstakingly extracted about the hardware.
If both sides recognize that copyright - particularly Open Source 
copyright licenses do not somehow make the ideas and information they 
express proprietary, and that all that is really
in contention is credit and a reduction in labor, then maybe it would be 
easier to get them to agree to modifying licenses.










Dave Lynch 					  	    DLA Systems
Software Development:  				         Embedded Linux
717.627.3770 	       dhlii@dlasys.net 	  http://www.dlasys.net
fax: 1.253.369.9244 			           Cell: 1.717.587.7774
Over 25 years' experience in platforms, languages, and technologies too numerous to list.

"Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction."
Albert Einstein


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox