Netdev List


* Re: [Open-FCoE] [PATCH RFC 2/5] qedf: Add QLogic FastLinQ offload FCoE driver framework.
From: Chad Dupuis @ 2017-01-09 16:45 UTC
  To: Hannes Reinecke
  Cc: martin.petersen, fcoe-devel, netdev, QLogic-Storage-Upstream,
	linux-scsi, yuval.mintz
In-Reply-To: <7df11d77-df7b-af58-2646-e92cdbba2146@suse.de>


On Wed, 28 Dec 2016, 9:00am -0000, Hannes Reinecke wrote:

> On 12/23/2016 08:17 PM, Dupuis, Chad wrote:
> > From: "Dupuis, Chad" <chad.dupuis@cavium.com>
> > 
> > The QLogic FastLinQ Driver for FCoE (qedf) is the FCoE specific module
> > for 41000 Series Converged Network Adapters by QLogic.
> > 
> > This patch consists of following changes:
> >   - MAINTAINERS Makefile and Kconfig changes for qedf
> >   - PCI driver registration
> >   - libfc/fcoe host level initialization
> >   - SCSI host template initialization and callbacks
> >   - Debugfs and log level infrastructure
> >   - Link handling
> >   - Firmware interface structures
> >   - QED core module initialization
> >   - Light L2 interface callbacks
> > 
> > Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
> > Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
> > Signed-off-by: Saurav Kashyap <saurav.kashyap@cavium.com>
> > Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
> > ---
> >  MAINTAINERS                      |    6 +
> >  drivers/scsi/Kconfig             |    1 +
> >  drivers/scsi/qedf/Kconfig        |   11 +
> >  drivers/scsi/qedf/Makefile       |    5 +
> >  drivers/scsi/qedf/qedf.h         |  555 ++++++
> >  drivers/scsi/qedf/qedf_attr.c    |  165 ++
> >  drivers/scsi/qedf/qedf_dbg.c     |  192 +++
> >  drivers/scsi/qedf/qedf_dbg.h     |  153 ++
> >  drivers/scsi/qedf/qedf_debugfs.c |  472 +++++
> >  drivers/scsi/qedf/qedf_main.c    | 3519 ++++++++++++++++++++++++++++++++++++++
> >  drivers/scsi/qedf/qedf_version.h |   15 +
> >  11 files changed, 5094 insertions(+)
> >  create mode 100644 drivers/scsi/qedf/Kconfig
> >  create mode 100644 drivers/scsi/qedf/Makefile
> >  create mode 100644 drivers/scsi/qedf/qedf.h
> >  create mode 100644 drivers/scsi/qedf/qedf_attr.c
> >  create mode 100644 drivers/scsi/qedf/qedf_dbg.c
> >  create mode 100644 drivers/scsi/qedf/qedf_dbg.h
> >  create mode 100644 drivers/scsi/qedf/qedf_debugfs.c
> >  create mode 100644 drivers/scsi/qedf/qedf_main.c
> >  create mode 100644 drivers/scsi/qedf/qedf_version.h
> > 
> [ .. ]
> > +/* Returns true if we have a valid vlan, false otherwise */
> > +static bool qedf_initiate_fipvlan_req(struct qedf_ctx *qedf)
> > +{
> > +	int rc;
> > +
> > +	if (atomic_read(&qedf->link_state) != QEDF_LINK_UP) {
> > +		QEDF_ERR(&(qedf->dbg_ctx), "Link not up.\n");
> > +		return  false;
> > +	}
> > +
> > +	while (qedf->fipvlan_retries--) {
> > +		if (qedf->vlan_id > 0)
> > +			return true;
> Some weird FCoE bridges (most notably HP VirtualConnect) return a VLAN
> ID of '0'. Shouldn't you rather test for '>= 0' here?

Will look into this, but isn't a VLAN ID of 0 invalid?

> 
> [ .. ]
> > +
> > +static void qedf_flogi_resp(struct fc_seq *seq, struct fc_frame *fp,
> > +	void *arg)
> > +{
> > +	struct fc_exch *exch = fc_seq_exch(seq);
> > +	struct fc_lport *lport = exch->lp;
> > +	struct qedf_ctx *qedf = lport_priv(lport);
> > +
> > +	if (!qedf) {
> > +		QEDF_ERR(NULL, "qedf is NULL.\n");
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * If ERR_PTR is set then don't try to stat anything as it will cause
> > +	 * a crash when we access fp.
> > +	 */
> > +	if (fp == ERR_PTR(-FC_EX_TIMEOUT) ||
> > +	    fp == ERR_PTR(-FC_EX_CLOSED)) {
> > +		QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_ELS,
> > +		    "fp has ERR_PTR() set.\n");
> > +		goto skip_stat;
> > +	}
> 
> Please use
> 
> if (IS_ERR(fp)) {
> 
> here instead of checking for individual error codes; if 'fp' has a
> different error value you'll continue with an invalid fp from here on.
>

Will fix up.
 
> [ .. ]
> 
> > +/**
> > + * qedf_xmit - qedf FCoE frame transmit function
> > + *
> > + */
> > +static int qedf_xmit(struct fc_lport *lport, struct fc_frame *fp)
> > +{
> > +	struct fc_lport		*base_lport;
> > +	struct qedf_ctx		*qedf;
> > +	struct ethhdr		*eh;
> > +	struct fcoe_crc_eof	*cp;
> > +	struct sk_buff		*skb;
> > +	struct fc_frame_header	*fh;
> > +	struct fcoe_hdr		*hp;
> > +	u8			sof, eof;
> > +	u32			crc;
> > +	unsigned int		hlen, tlen, elen;
> > +	int			wlen;
> > +	struct fc_stats		*stats;
> > +	struct fc_lport *tmp_lport;
> > +	struct fc_lport *vn_port = NULL;
> > +	struct qedf_rport *fcport;
> > +	int rc;
> > +	u16 vlan_tci = 0;
> > +	unsigned long flags;
> > +
> > +	qedf = (struct qedf_ctx *)lport_priv(lport);
> > +
> > +	fh = fc_frame_header_get(fp);
> > +	skb = fp_skb(fp);
> > +
> > +	/* Filter out traffic to other NPIV ports on the same host */
> > +	if (lport->vport)
> > +		base_lport = shost_priv(vport_to_shost(lport->vport));
> > +	else
> > +		base_lport = lport;
> > +
> > +	/* Flag if the destination is the base port */
> > +	if (base_lport->port_id == ntoh24(fh->fh_d_id)) {
> > +		vn_port = base_lport;
> > +	} else {
> > +		/* Go through the list of vports attached to the base_lport
> > +		 * and see if we have a match with the destination address.
> > +		 */
> > +		list_for_each_entry(tmp_lport, &base_lport->vports, list) {
> > +			if (tmp_lport->port_id == ntoh24(fh->fh_d_id)) {
> > +				vn_port = tmp_lport;
> > +				break;
> > +			}
> > +		}
> > +	}
> > +	if (vn_port && ntoh24(fh->fh_d_id) != FC_FID_FLOGI) {
> > +		struct fc_rport_priv *rdata = NULL;
> > +
> > +		QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_LL2,
> > +		    "Dropping FCoE frame to %06x.\n", ntoh24(fh->fh_d_id));
> > +		kfree_skb(skb);
> > +		rdata = fc_rport_lookup(lport, ntoh24(fh->fh_d_id));
> > +		if (rdata)
> > +			rdata->retries = lport->max_rport_retry_count;
> > +		return -EINVAL;
> > +	}
> > +	/* End NPIV filtering */
> > +
> > +	if (!qedf->ctlr.sel_fcf) {
> > +		kfree_skb(skb);
> > +		return 0;
> > +	}
> > +
> > +	if (!test_bit(QEDF_LL2_STARTED, &qedf->flags)) {
> > +		QEDF_WARN(&(qedf->dbg_ctx), "LL2 not started\n");
> > +		kfree_skb(skb);
> > +		return 0;
> > +	}
> > +
> > +	if (atomic_read(&qedf->link_state) != QEDF_LINK_UP) {
> > +		QEDF_WARN(&(qedf->dbg_ctx), "qedf link down\n");
> > +		kfree_skb(skb);
> > +		return 0;
> > +	}
> > +
> > +	if (unlikely(fh->fh_r_ctl == FC_RCTL_ELS_REQ)) {
> > +		if (fcoe_ctlr_els_send(&qedf->ctlr, lport, skb))
> > +			return 0;
> > +	}
> > +
> > +	/* Check to see if this needs to be sent on an offloaded session */
> > +	spin_lock_irqsave(&qedf->hba_lock, flags);
> > +	fcport = qedf_fcport_lookup(qedf, ntoh24(fh->fh_d_id));
> > +	spin_unlock_irqrestore(&qedf->hba_lock, flags);
> > +
> Really sad, having to take a spinlock here to get to the session.
> Can't you use RCU for rport lookup?
> That would save you the spinlock here ...

Yes, will look into converting this to use RCU

> 
> > +	if (fcport && test_bit(QEDF_RPORT_SESSION_READY, &fcport->flags)) {
> > +		rc = qedf_xmit_l2_frame(fcport, fp);
> > +		/*
> > +		 * If the frame was successfully sent over the middle path
> > +		 * then do not try to also send it over the LL2 path
> > +		 */
> > +		if (rc)
> > +			return 0;
> > +	}
> > +
> > +	sof = fr_sof(fp);
> > +	eof = fr_eof(fp);
> > +
> > +	elen = sizeof(struct ethhdr);
> > +	hlen = sizeof(struct fcoe_hdr);
> > +	tlen = sizeof(struct fcoe_crc_eof);
> > +	wlen = (skb->len - tlen + sizeof(crc)) / FCOE_WORD_TO_BYTE;
> > +
> > +	skb->ip_summed = CHECKSUM_NONE;
> > +	crc = fcoe_fc_crc(fp);
> > +
> > +	/* copy port crc and eof to the skb buff */
> > +	if (skb_is_nonlinear(skb)) {
> > +		skb_frag_t *frag;
> > +
> > +		if (qedf_get_paged_crc_eof(skb, tlen)) {
> > +			kfree_skb(skb);
> > +			return -ENOMEM;
> > +		}
> > +		frag = &skb_shinfo(skb)->frags[skb_shinfo(skb)->nr_frags - 1];
> > +		cp = kmap_atomic(skb_frag_page(frag)) + frag->page_offset;
> > +	} else {
> > +		cp = (struct fcoe_crc_eof *)skb_put(skb, tlen);
> > +	}
> > +
> > +	memset(cp, 0, sizeof(*cp));
> > +	cp->fcoe_eof = eof;
> > +	cp->fcoe_crc32 = cpu_to_le32(~crc);
> > +	if (skb_is_nonlinear(skb)) {
> > +		kunmap_atomic(cp);
> > +		cp = NULL;
> > +	}
> > +
> > +
> > +	/* adjust skb network/transport offsets to match mac/fcoe/port */
> > +	skb_push(skb, elen + hlen);
> > +	skb_reset_mac_header(skb);
> > +	skb_reset_network_header(skb);
> > +	skb->mac_len = elen;
> > +	skb->protocol = htons(ETH_P_FCOE);
> > +
> > +	__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), qedf->vlan_id);
> > +
> > +	/* fill up mac and fcoe headers */
> > +	eh = eth_hdr(skb);
> > +	eh->h_proto = htons(ETH_P_FCOE);
> > +	if (qedf->ctlr.map_dest)
> > +		fc_fcoe_set_mac(eh->h_dest, fh->fh_d_id);
> > +	else
> > +		/* insert GW address */
> > +		ether_addr_copy(eh->h_dest, qedf->ctlr.dest_addr);
> > +
> > +	/* Set the source MAC address */
> > +	fc_fcoe_set_mac(eh->h_source, fh->fh_s_id);
> > +
> > +	hp = (struct fcoe_hdr *)(eh + 1);
> > +	memset(hp, 0, sizeof(*hp));
> > +	if (FC_FCOE_VER)
> > +		FC_FCOE_ENCAPS_VER(hp, FC_FCOE_VER);
> > +	hp->fcoe_sof = sof;
> > +
> > +	/*update tx stats */
> > +	stats = per_cpu_ptr(lport->stats, get_cpu());
> > +	stats->TxFrames++;
> > +	stats->TxWords += wlen;
> > +	put_cpu();
> > +
> > +	/* Get VLAN ID from skb for printing purposes */
> > +	__vlan_hwaccel_get_tag(skb, &vlan_tci);
> > +
> > +	/* send down to lld */
> > +	fr_dev(fp) = lport;
> > +	QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_LL2, "FCoE frame send: "
> > +	    "src=%06x dest=%06x r_ctl=%x type=%x vlan=%04x.\n",
> > +	    ntoh24(fh->fh_s_id), ntoh24(fh->fh_d_id), fh->fh_r_ctl, fh->fh_type,
> > +	    vlan_tci);
> > +	if (qedf_dump_frames)
> > +		print_hex_dump(KERN_WARNING, "fcoe: ", DUMP_PREFIX_OFFSET, 16,
> > +		    1, skb->data, skb->len, false);
> > +	qed_ops->ll2->start_xmit(qedf->cdev, skb);
> > +
> > +	return 0;
> > +}
> > +
> > +static int qedf_alloc_sq(struct qedf_ctx *qedf, struct qedf_rport *fcport)
> > +{
> > +	int rval = 0;
> > +	u32 *pbl;
> > +	dma_addr_t page;
> > +	int num_pages;
> > +
> > +	/* Calculate appropriate queue and PBL sizes */
> > +	fcport->sq_mem_size = SQ_NUM_ENTRIES * sizeof(struct fcoe_wqe);
> > +	fcport->sq_mem_size = ALIGN(fcport->sq_mem_size, QEDF_PAGE_SIZE);
> > +	fcport->sq_pbl_size = (fcport->sq_mem_size / QEDF_PAGE_SIZE) *
> > +	    sizeof(void *);
> > +	fcport->sq_pbl_size = fcport->sq_pbl_size + QEDF_PAGE_SIZE;
> > +
> > +	fcport->sq = dma_alloc_coherent(&qedf->pdev->dev, fcport->sq_mem_size,
> > +	    &fcport->sq_dma, GFP_KERNEL);
> > +	if (!fcport->sq) {
> > +		QEDF_WARN(&(qedf->dbg_ctx), "Could not allocate send "
> > +			   "queue.\n");
> > +		rval = 1;
> > +		goto out;
> > +	}
> > +	memset(fcport->sq, 0, fcport->sq_mem_size);
> > +
> > +	fcport->sq_pbl = dma_alloc_coherent(&qedf->pdev->dev,
> > +	    fcport->sq_pbl_size, &fcport->sq_pbl_dma, GFP_KERNEL);
> > +	if (!fcport->sq_pbl) {
> > +		QEDF_WARN(&(qedf->dbg_ctx), "Could not allocate send "
> > +			   "queue PBL.\n");
> > +		rval = 1;
> > +		goto out_free_sq;
> > +	}
> > +	memset(fcport->sq_pbl, 0, fcport->sq_pbl_size);
> > +
> > +	/* Create PBL */
> > +	num_pages = fcport->sq_mem_size / QEDF_PAGE_SIZE;
> > +	page = fcport->sq_dma;
> > +	pbl = (u32 *)fcport->sq_pbl;
> > +
> > +	while (num_pages--) {
> > +		*pbl = U64_LO(page);
> > +		pbl++;
> > +		*pbl = U64_HI(page);
> > +		pbl++;
> > +		page += QEDF_PAGE_SIZE;
> > +	}
> > +
> > +	return rval;
> > +
> > +out_free_sq:
> > +	dma_free_coherent(&qedf->pdev->dev, fcport->sq_mem_size, fcport->sq,
> > +	    fcport->sq_dma);
> > +out:
> > +	return rval;
> > +}
> > +
> > +static void qedf_free_sq(struct qedf_ctx *qedf, struct qedf_rport *fcport)
> > +{
> > +	if (fcport->sq_pbl)
> > +		dma_free_coherent(&qedf->pdev->dev, fcport->sq_pbl_size,
> > +		    fcport->sq_pbl, fcport->sq_pbl_dma);
> > +	if (fcport->sq)
> > +		dma_free_coherent(&qedf->pdev->dev, fcport->sq_mem_size,
> > +		    fcport->sq, fcport->sq_dma);
> > +}
> > +
> > +/*
> > + * Allocate a cookie into the qedf_ctx rport list.  Assumes the hba lock
> > + * is held on entry.
> > + */
> > +static int qedf_alloc_conn_id(struct qedf_ctx *qedf, struct qedf_rport *fcport)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < QEDF_MAX_SESSIONS; i++) {
> > +		qedf->curr_conn_id++;
> > +		if (qedf->curr_conn_id == QEDF_MAX_SESSIONS)
> > +			qedf->curr_conn_id = 0;
> > +		if (qedf->fcports[qedf->curr_conn_id] == NULL) {
> > +			qedf->fcports[qedf->curr_conn_id] = fcport;
> > +			fcport->conn_id = qedf->curr_conn_id;
> > +			break;
> > +		}
> > +	}
> > +	if (i == QEDF_MAX_SESSIONS)
> > +		return -1;
> > +	else
> > +		return 0;
> > +}
> > +
> Have you looked at the 'sbitmap' code for conn_id allocation?
> Should be giving you the same results, but you won't need to use a
> spinlock ...

I've not.  I'll look into sbitmap for V2.

> 
> 
> > +static int qedf_offload_connection(struct qedf_ctx *qedf,
> > +	struct qedf_rport *fcport)
> > +{
> > +	struct qed_fcoe_params_offload conn_info;
> > +	u32 port_id;
> > +	u8 lport_src_id[3];
> > +	int rval;
> > +	uint16_t total_sqe = (fcport->sq_mem_size / sizeof(struct fcoe_wqe));
> > +
> > +	QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_CONN, "Offloading connection "
> > +		   "portid=%06x.\n", fcport->rdata->ids.port_id);
> > +	rval = qed_ops->acquire_conn(qedf->cdev, &fcport->handle,
> > +	    &fcport->fw_cid, &fcport->p_doorbell);
> > +	if (rval) {
> > +		QEDF_WARN(&(qedf->dbg_ctx), "Could not acquire connection "
> > +			   "for portid=%06x.\n", fcport->rdata->ids.port_id);
> > +		rval = 1; /* For some reason qed returns 0 on failure here */
> > +		goto out;
> > +	}
> > +
> > +	QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_CONN, "portid=%06x "
> > +		   "fw_cid=%08x handle=%d.\n", fcport->rdata->ids.port_id,
> > +		   fcport->fw_cid, fcport->handle);
> > +
> > +	memset(&conn_info, 0, sizeof(struct qed_fcoe_params_offload));
> > +
> > +	/* Fill in the offload connection info */
> > +	conn_info.sq_pbl_addr = fcport->sq_pbl_dma;
> > +
> > +	conn_info.sq_curr_page_addr = (dma_addr_t)(*(u64 *)fcport->sq_pbl);
> > +	conn_info.sq_next_page_addr =
> > +	    (dma_addr_t)(*(u64 *)(fcport->sq_pbl + 8));
> > +
> > +	/* Need to use our FCoE MAC for the offload session */
> > +	port_id = fc_host_port_id(qedf->lport->host);
> > +	lport_src_id[2] = (port_id & 0x000000FF);
> > +	lport_src_id[1] = (port_id & 0x0000FF00) >> 8;
> > +	lport_src_id[0] = (port_id & 0x00FF0000) >> 16;
> > +	fc_fcoe_set_mac(conn_info.src_mac, lport_src_id);
> > +
> > +	ether_addr_copy(conn_info.dst_mac, qedf->ctlr.dest_addr);
> > +
> > +	conn_info.tx_max_fc_pay_len = fcport->rdata->maxframe_size;
> > +	conn_info.e_d_tov_timer_val = qedf->lport->e_d_tov / 20;
> > +	conn_info.rec_tov_timer_val = 3; /* I think this is what E3 was */
> > +	conn_info.rx_max_fc_pay_len = fcport->rdata->maxframe_size;
> > +
> > +	/* Set VLAN data */
> > +	conn_info.vlan_tag = qedf->vlan_id <<
> > +	    FCOE_CONN_OFFLOAD_RAMROD_DATA_VLAN_ID_SHIFT;
> > +	conn_info.vlan_tag |=
> > +	    qedf_default_prio << FCOE_CONN_OFFLOAD_RAMROD_DATA_PRIORITY_SHIFT;
> > +	conn_info.flags |= (FCOE_CONN_OFFLOAD_RAMROD_DATA_B_VLAN_FLAG_MASK <<
> > +	    FCOE_CONN_OFFLOAD_RAMROD_DATA_B_VLAN_FLAG_SHIFT);
> > +
> > +	/* Set host port source id */
> > +	port_id = fc_host_port_id(qedf->lport->host);
> > +	fcport->sid = port_id;
> > +	conn_info.s_id.addr_hi = (port_id & 0x000000FF);
> > +	conn_info.s_id.addr_mid = (port_id & 0x0000FF00) >> 8;
> > +	conn_info.s_id.addr_lo = (port_id & 0x00FF0000) >> 16;
> > +
> > +	conn_info.max_conc_seqs_c3 = fcport->rdata->max_seq;
> > +
> > +	/* Set remote port destination id */
> > +	port_id = fcport->rdata->rport->port_id;
> > +	conn_info.d_id.addr_hi = (port_id & 0x000000FF);
> > +	conn_info.d_id.addr_mid = (port_id & 0x0000FF00) >> 8;
> > +	conn_info.d_id.addr_lo = (port_id & 0x00FF0000) >> 16;
> > +
> > +	conn_info.def_q_idx = 0; /* Default index for send queue? */
> > +
> > +	/* Set FC-TAPE specific flags if needed */
> > +	if (fcport->dev_type == QEDF_RPORT_TYPE_TAPE) {
> > +		QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_CONN,
> > +		    "Enable CONF, REC for portid=%06x.\n",
> > +		    fcport->rdata->ids.port_id);
> > +		conn_info.flags |= 1 <<
> > +		    FCOE_CONN_OFFLOAD_RAMROD_DATA_B_CONF_REQ_SHIFT;
> > +		conn_info.flags |=
> > +		    ((fcport->rdata->sp_features & FC_SP_FT_SEQC) ? 1 : 0) <<
> > +		    FCOE_CONN_OFFLOAD_RAMROD_DATA_B_REC_VALID_SHIFT;
> > +	}
> > +
> > +	rval = qed_ops->offload_conn(qedf->cdev, fcport->handle, &conn_info);
> > +	if (rval) {
> > +		QEDF_WARN(&(qedf->dbg_ctx), "Could not offload connection "
> > +			   "for portid=%06x.\n", fcport->rdata->ids.port_id);
> > +		goto out_free_conn;
> > +	} else
> > +		QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_CONN, "Offload "
> > +			   "succeeded portid=%06x total_sqe=%d.\n",
> > +			   fcport->rdata->ids.port_id, total_sqe);
> > +
> > +	spin_lock_init(&fcport->rport_lock);
> > +	atomic_set(&fcport->free_sqes, total_sqe);
> > +	return 0;
> > +out_free_conn:
> > +	qed_ops->release_conn(qedf->cdev, fcport->handle);
> > +out:
> > +	return rval;
> > +}
> > +
> > +#define QEDF_TERM_BUFF_SIZE		10
> > +static void qedf_upload_connection(struct qedf_ctx *qedf,
> > +	struct qedf_rport *fcport)
> > +{
> > +	void *term_params;
> > +	dma_addr_t term_params_dma;
> > +
> > +	/* Term params need to be a DMA coherent buffer as qed shares the
> > +	 * physical DMA address with the firmware. The buffer may be used in
> > +	 * the receive path so we may eventually have to move this.
> > +	 */
> > +	term_params = dma_alloc_coherent(&qedf->pdev->dev, QEDF_TERM_BUFF_SIZE,
> > +		&term_params_dma, GFP_KERNEL);
> > +
> > +	QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_CONN, "Uploading connection "
> > +		   "port_id=%06x.\n", fcport->rdata->ids.port_id);
> > +
> > +	qed_ops->destroy_conn(qedf->cdev, fcport->handle, term_params_dma);
> > +	qed_ops->release_conn(qedf->cdev, fcport->handle);
> > +
> > +	dma_free_coherent(&qedf->pdev->dev, QEDF_TERM_BUFF_SIZE, term_params,
> > +	    term_params_dma);
> > +}
> > +
> > +static void qedf_cleanup_fcport(struct qedf_ctx *qedf,
> > +	struct qedf_rport *fcport)
> > +{
> > +	QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_CONN, "Clearing conn_id=%u "
> > +		   "for portid=%06x.\n", fcport->conn_id,
> > +		   fcport->rdata->ids.port_id);
> > +
> > +	/* Flush any remaining i/o's before we upload the connection */
> > +	qedf_flush_active_ios(fcport, -1);
> > +
> > +	spin_lock(&qedf->hba_lock);
> > +	qedf->fcports[fcport->conn_id] = NULL;
> > +	fcport->conn_id = -1;
> > +	spin_unlock(&qedf->hba_lock);
> > +
> > +	if (test_and_clear_bit(QEDF_RPORT_SESSION_READY, &fcport->flags))
> > +		qedf_upload_connection(qedf, fcport);
> > +	qedf_free_sq(qedf, fcport);
> > +	fcport->rdata = NULL;
> > +	fcport->qedf = NULL;
> > +}
> > +
> > +/**
> > + * This event_callback is called after successful completion of libfc
> > + * initiated target login. qedf can proceed with initiating the session
> > + * establishment.
> > + */
> > +static void qedf_rport_event_handler(struct fc_lport *lport,
> > +				struct fc_rport_priv *rdata,
> > +				enum fc_rport_event event)
> > +{
> > +	struct qedf_ctx *qedf = lport_priv(lport);
> > +	struct fc_rport *rport = rdata->rport;
> > +	struct fc_rport_libfc_priv *rp;
> > +	struct qedf_rport *fcport;
> > +	u32 port_id;
> > +	int rval;
> > +
> > +	QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_DISC, "event = %d, "
> > +		   "port_id = 0x%x\n", event, rdata->ids.port_id);
> > +
> > +	switch (event) {
> > +	case RPORT_EV_READY:
> > +		if (!rport) {
> > +			QEDF_WARN(&(qedf->dbg_ctx), "rport is NULL.\n");
> > +			break;
> > +		}
> > +
> > +		rp = rport->dd_data;
> > +		fcport = (struct qedf_rport *)&rp[1];
> > +		fcport->qedf = qedf;
> > +
> > +		/*
> > +		 * Don't try to offload the session again. Can happen when we
> > +		 * get an ADISC
> > +		 */
> > +		if (test_bit(QEDF_RPORT_SESSION_READY, &fcport->flags)) {
> > +			QEDF_WARN(&(qedf->dbg_ctx), "Session already "
> > +				   "offloaded, portid=0x%x.\n",
> > +				   rdata->ids.port_id);
> > +			return;
> > +		}
> > +
> > +		/*
> > +		 * Set the connection id to -1 so we know if we ever assigned
> > +		 * one to the fcport.
> > +		 */
> > +		fcport->conn_id = -1;
> > +
> > +		if (rport->port_id == FC_FID_DIR_SERV) {
> > +			/*
> > +			 * qedf_rport structure doesn't exist for
> > +			 * directory server.
> > +			 * We should not come here, as lport will
> > +			 * take care of fabric login
> > +			 */
> > +			QEDF_WARN(&(qedf->dbg_ctx), "rport struct does not "
> > +			    "exist for dir server port_id=%x\n",
> > +			    rdata->ids.port_id);
> > +			break;
> > +		}
> > +
> > +		if (rdata->spp_type != FC_TYPE_FCP) {
> > +			QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_DISC,
> > +			    "Not offloading since spp type isn't FCP\n");
> > +			break;
> > +		}
> > +		if (!(rdata->ids.roles & FC_RPORT_ROLE_FCP_TARGET)) {
> > +			QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_DISC,
> > +			    "Not FCP target so not offloading\n");
> > +			break;
> > +		}
> > +
> > +		spin_lock(&qedf->hba_lock);
> > +		rval = qedf_alloc_conn_id(qedf, fcport);
> > +		spin_unlock(&qedf->hba_lock);
> > +
> > +		if (rval) {
> > +			QEDF_WARN(&(qedf->dbg_ctx), "Could not allocate "
> > +				   "conn_id for port %06x.\n",
> > +				   rdata->ids.port_id);
> > +			break;
> > +		}
> > +
> > +		QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_DISC,
> > +			   "Assigned conn_id=%u to port_id=%06x.\n",
> > +			    fcport->conn_id, rdata->ids.port_id);
> > +
> > +		fcport->rdata = rdata;
> > +		fcport->rport = rport;
> > +
> > +		rval = qedf_alloc_sq(qedf, fcport);
> > +		if (rval) {
> > +			qedf_cleanup_fcport(qedf, fcport);
> > +			break;
> > +		}
> > +
> > +		/* Set device type */
> > +		if (rdata->flags & FC_RP_FLAGS_RETRY &&
> > +		    rdata->ids.roles & FC_RPORT_ROLE_FCP_TARGET &&
> > +		    !(rdata->ids.roles & FC_RPORT_ROLE_FCP_INITIATOR)) {
> > +			fcport->dev_type = QEDF_RPORT_TYPE_TAPE;
> > +			QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_DISC,
> > +			    "portid=%06x is a TAPE device.\n",
> > +			    rdata->ids.port_id);
> > +		} else {
> > +			fcport->dev_type = QEDF_RPORT_TYPE_DISK;
> > +		}
> > +
> > +		rval = qedf_offload_connection(qedf, fcport);
> > +		if (rval) {
> > +			qedf_cleanup_fcport(qedf, fcport);
> > +			break;
> > +		}
> > +
> > +		/*
> > +		 * Set the session ready bit to let everyone know that this
> > +		 * connection is ready for I/O
> > +		 */
> > +		set_bit(QEDF_RPORT_SESSION_READY, &fcport->flags);
> > +		atomic_inc(&qedf->num_offloads);
> > +
> > +		break;
> > +	case RPORT_EV_LOGO:
> > +	case RPORT_EV_FAILED:
> > +	case RPORT_EV_STOP:
> > +		port_id = rdata->ids.port_id;
> > +		if (port_id == FC_FID_DIR_SERV)
> > +			break;
> > +
> > +		if (!rport) {
> > +			QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_DISC,
> > +			    "port_id=%x - rport not created yet.\n", port_id);
> > +			break;
> > +		}
> > +		rp = rport->dd_data;
> > +		/*
> > +		 * Perform session upload. Note that rdata->peers is already
> > +		 * removed from disc->rports list before we get this event.
> > +		 */
> > +		fcport = (struct qedf_rport *)&rp[1];
> > +
> > +		/*
> > +		 * Only free the conn_id if this fcport was initialized with
> > +		 * one.
> > +		 */
> > +		if (fcport->conn_id > -1) {
> > +			set_bit(QEDF_RPORT_UPLOADING_CONNECTION, &fcport->flags);
> > +			qedf_cleanup_fcport(qedf, fcport);
> > +			clear_bit(QEDF_RPORT_UPLOADING_CONNECTION,
> > +			    &fcport->flags);
> > +			atomic_dec(&qedf->num_offloads);
> > +		}
> > +
> > +		break;
> > +
> > +	case RPORT_EV_NONE:
> > +		break;
> > +	}
> > +}
> > +
> > +static void qedf_abort_io(struct fc_lport *lport)
> > +{
> > +	/* NO-OP but need to fill in the template */
> > +}
> > +
> > +static void qedf_fcp_cleanup(struct fc_lport *lport)
> > +{
> > +	/*
> > +	 * NO-OP but need to fill in template to prevent a NULL
> > +	 * function pointer dereference during link down. I/Os
> > +	 * will be flushed when port is uploaded.
> > +	 */
> > +}
> > +
> > +static struct libfc_function_template qedf_lport_template = {
> > +	.frame_send		= qedf_xmit,
> > +	.fcp_abort_io		= qedf_abort_io,
> > +	.fcp_cleanup		= qedf_fcp_cleanup,
> > +	.rport_event_callback	= qedf_rport_event_handler,
> > +	.elsct_send		= qedf_elsct_send,
> > +};
> > +
> > +static void qedf_fcoe_ctlr_setup(struct qedf_ctx *qedf)
> > +{
> > +	fcoe_ctlr_init(&qedf->ctlr, FIP_ST_AUTO);
> > +
> > +	qedf->ctlr.send = qedf_fip_send;
> > +	qedf->ctlr.update_mac = qedf_update_src_mac;
> > +	qedf->ctlr.get_src_addr = qedf_get_src_mac;
> > +	ether_addr_copy(qedf->ctlr.ctl_src_addr, qedf->mac);
> > +}
> > +
> > +static int qedf_lport_setup(struct qedf_ctx *qedf)
> > +{
> > +	struct fc_lport *lport = qedf->lport;
> > +
> > +	lport->link_up = 0;
> > +	lport->max_retry_count = QEDF_FLOGI_RETRY_CNT;
> > +	lport->max_rport_retry_count = QEDF_RPORT_RETRY_CNT;
> > +	lport->service_params = (FCP_SPPF_INIT_FCN | FCP_SPPF_RD_XRDY_DIS |
> > +	    FCP_SPPF_RETRY | FCP_SPPF_CONF_COMPL);
> > +	lport->boot_time = jiffies;
> > +	lport->e_d_tov = 2 * 1000;
> > +	lport->r_a_tov = 10 * 1000;
> > +
> > +	/* Set NPIV support */
> > +	lport->does_npiv = 1;
> > +	fc_host_max_npiv_vports(lport->host) = QEDF_MAX_NPIV;
> > +
> > +	fc_set_wwnn(lport, qedf->wwnn);
> > +	fc_set_wwpn(lport, qedf->wwpn);
> > +
> > +	fcoe_libfc_config(lport, &qedf->ctlr, &qedf_lport_template, 0);
> > +
> > +	/* Allocate the exchange manager */
> > +	fc_exch_mgr_alloc(lport, FC_CLASS_3, qedf->max_scsi_xid + 1,
> > +	    qedf->max_els_xid, NULL);
> > +
> > +	if (fc_lport_init_stats(lport))
> > +		return -ENOMEM;
> > +
> > +	/* Finish lport config */
> > +	fc_lport_config(lport);
> > +
> > +	/* Set max frame size */
> > +	fc_set_mfs(lport, QEDF_MFS);
> > +	fc_host_maxframe_size(lport->host) = lport->mfs;
> > +
> > +	/* Set default dev_loss_tmo based on module parameter */
> > +	fc_host_dev_loss_tmo(lport->host) = qedf_dev_loss_tmo;
> > +
> > +	/* Set symbolic node name */
> > +	snprintf(fc_host_symbolic_name(lport->host), 256,
> > +	    "QLogic %s v%s", QEDF_MODULE_NAME, QEDF_VERSION);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * NPIV functions
> > + */
> > +
> > +static int qedf_vport_libfc_config(struct fc_vport *vport,
> > +	struct fc_lport *lport)
> > +{
> > +	lport->link_up = 0;
> > +	lport->qfull = 0;
> > +	lport->max_retry_count = QEDF_FLOGI_RETRY_CNT;
> > +	lport->max_rport_retry_count = QEDF_RPORT_RETRY_CNT;
> > +	lport->service_params = (FCP_SPPF_INIT_FCN | FCP_SPPF_RD_XRDY_DIS |
> > +	    FCP_SPPF_RETRY | FCP_SPPF_CONF_COMPL);
> > +	lport->boot_time = jiffies;
> > +	lport->e_d_tov = 2 * 1000;
> > +	lport->r_a_tov = 10 * 1000;
> > +	lport->does_npiv = 1; /* Temporary until we add NPIV support */
> > +
> > +	/* Allocate stats for vport */
> > +	if (fc_lport_init_stats(lport))
> > +		return -ENOMEM;
> > +
> > +	/* Finish lport config */
> > +	fc_lport_config(lport);
> > +
> > +	/* offload related configuration */
> > +	lport->crc_offload = 0;
> > +	lport->seq_offload = 0;
> > +	lport->lro_enabled = 0;
> > +	lport->lro_xid = 0;
> > +	lport->lso_max = 0;
> > +
> > +	return 0;
> > +}
> > +
> > +static int qedf_vport_create(struct fc_vport *vport, bool disabled)
> > +{
> > +	struct Scsi_Host *shost = vport_to_shost(vport);
> > +	struct fc_lport *n_port = shost_priv(shost);
> > +	struct fc_lport *vn_port;
> > +	struct qedf_ctx *base_qedf = lport_priv(n_port);
> > +	struct qedf_ctx *vport_qedf;
> > +	int i;
> > +
> > +	char buf[32];
> > +	int rc = 0;
> > +
> > +	rc = fcoe_validate_vport_create(vport);
> > +	if (rc) {
> > +		fcoe_wwn_to_str(vport->port_name, buf, sizeof(buf));
> > +		QEDF_WARN(&(base_qedf->dbg_ctx), "Failed to create vport, "
> > +			   "WWPN (0x%s) already exists.\n", buf);
> > +		goto err1;
> > +	}
> > +
> > +	if (atomic_read(&base_qedf->link_state) != QEDF_LINK_UP) {
> > +		QEDF_WARN(&(base_qedf->dbg_ctx), "Cannot create vport "
> > +			   "because link is not up.\n");
> > +		rc = -EIO;
> > +		goto err1;
> > +	}
> > +
> > +	vn_port = libfc_vport_create(vport, sizeof(struct qedf_ctx));
> > +	if (!vn_port) {
> > +		QEDF_WARN(&(base_qedf->dbg_ctx), "Could not create lport "
> > +			   "for vport.\n");
> > +		rc = -ENOMEM;
> > +		goto err1;
> > +	}
> > +
> > +	fcoe_wwn_to_str(vport->port_name, buf, sizeof(buf));
> > +	QEDF_ERR(&(base_qedf->dbg_ctx), "Creating NPIV port, WWPN=%s.\n",
> > +	    buf);
> > +
> > +	/* Copy some fields from base_qedf */
> > +	vport_qedf = lport_priv(vn_port);
> > +	memcpy(vport_qedf, base_qedf, sizeof(struct qedf_ctx));
> > +
> > +	/* Set qedf data specific to this vport */
> > +	vport_qedf->lport = vn_port;
> > +	/* Use same hba_lock as base_qedf */
> > +	vport_qedf->hba_lock = base_qedf->hba_lock;
> > +	/* Purge any fcport info from base_qedf */
> > +	for (i = 0; i < QEDF_MAX_SESSIONS; i++)
> > +		vport_qedf->fcports[i] = NULL;
> > +	vport_qedf->pdev = base_qedf->pdev;
> > +	vport_qedf->cmd_mgr = base_qedf->cmd_mgr;
> > +	init_completion(&vport_qedf->flogi_compl);
> > +
> > +	rc = qedf_vport_libfc_config(vport, vn_port);
> > +	if (rc) {
> > +		QEDF_ERR(&(base_qedf->dbg_ctx), "Could not allocate memory "
> > +		    "for lport stats.\n");
> > +		goto err2;
> > +	}
> > +
> > +	fc_set_wwnn(vn_port, vport->node_name);
> > +	fc_set_wwpn(vn_port, vport->port_name);
> > +	vport_qedf->wwnn = vn_port->wwnn;
> > +	vport_qedf->wwpn = vn_port->wwpn;
> > +
> > +	vn_port->host->transportt = qedf_fc_vport_transport_template;
> > +	vn_port->host->can_queue = QEDF_MAX_ELS_XID;
> > +	vn_port->host->max_lun = qedf_max_lun;
> > +	vn_port->host->sg_tablesize = QEDF_MAX_BDS_PER_CMD;
> > +	vn_port->host->max_cmd_len = QEDF_MAX_CDB_LEN;
> > +
> > +	rc = scsi_add_host(vn_port->host, &vport->dev);
> > +	if (rc) {
> > +		QEDF_WARN(&(base_qedf->dbg_ctx), "Error adding Scsi_Host.\n");
> > +		goto err2;
> > +	}
> > +
> > +	/* Set default dev_loss_tmo based on module parameter */
> > +	fc_host_dev_loss_tmo(vn_port->host) = qedf_dev_loss_tmo;
> > +
> > +	/* Init libfc stuffs */
> > +	memcpy(&vn_port->tt, &qedf_lport_template,
> > +		sizeof(qedf_lport_template));
> > +	fc_exch_init(vn_port);
> > +	fc_elsct_init(vn_port);
> > +	fc_lport_init(vn_port);
> > +	fc_disc_init(vn_port);
> > +	fc_disc_config(vn_port, vn_port);
> > +
> > +
> > +	/* Allocate the exchange manager */
> > +	shost = vport_to_shost(vport);
> > +	n_port = shost_priv(shost);
> > +	fc_exch_mgr_list_clone(n_port, vn_port);
> > +
> > +	/* Set max frame size */
> > +	fc_set_mfs(vn_port, QEDF_MFS);
> > +
> > +	fc_host_port_type(vn_port->host) = FC_PORTTYPE_UNKNOWN;
> > +
> > +	if (disabled) {
> > +		fc_vport_set_state(vport, FC_VPORT_DISABLED);
> > +	} else {
> > +		vn_port->boot_time = jiffies;
> > +		fc_fabric_login(vn_port);
> > +		fc_vport_setlink(vn_port);
> > +	}
> > +
> > +	QEDF_INFO(&(base_qedf->dbg_ctx), QEDF_LOG_NPIV, "vn_port=%p.\n",
> > +		   vn_port);
> > +
> > +	/* Set up debug context for vport */
> > +	vport_qedf->dbg_ctx.host_no = vn_port->host->host_no;
> > +	vport_qedf->dbg_ctx.pdev = base_qedf->pdev;
> > +
> > +	return 0;
> > +
> > +err2:
> > +	scsi_host_put(vn_port->host);
> > +err1:
> > +	return rc;
> > +}
> > +
> > +static int qedf_vport_destroy(struct fc_vport *vport)
> > +{
> > +	struct Scsi_Host *shost = vport_to_shost(vport);
> > +	struct fc_lport *n_port = shost_priv(shost);
> > +	struct fc_lport *vn_port = vport->dd_data;
> > +
> > +	mutex_lock(&n_port->lp_mutex);
> > +	list_del(&vn_port->list);
> > +	mutex_unlock(&n_port->lp_mutex);
> > +
> > +	fc_fabric_logoff(vn_port);
> > +	fc_lport_destroy(vn_port);
> > +
> > +	/* Detach from scsi-ml */
> > +	fc_remove_host(vn_port->host);
> > +	scsi_remove_host(vn_port->host);
> > +
> > +	/*
> > +	 * Only try to release the exchange manager if the vn_port
> > +	 * configuration is complete.
> > +	 */
> > +	if (vn_port->state == LPORT_ST_READY)
> > +		fc_exch_mgr_free(vn_port);
> > +
> > +	/* Free memory used by statistical counters */
> > +	fc_lport_free_stats(vn_port);
> > +
> > +	/* Release Scsi_Host */
> > +	if (vn_port->host)
> > +		scsi_host_put(vn_port->host);
> > +
> > +	return 0;
> > +}
> > +
> > +static int qedf_vport_disable(struct fc_vport *vport, bool disable)
> > +{
> > +	struct fc_lport *lport = vport->dd_data;
> > +
> > +	if (disable) {
> > +		fc_vport_set_state(vport, FC_VPORT_DISABLED);
> > +		fc_fabric_logoff(lport);
> > +	} else {
> > +		lport->boot_time = jiffies;
> > +		fc_fabric_login(lport);
> > +		fc_vport_setlink(lport);
> > +	}
> > +	return 0;
> > +}
> > +
> > +/*
> > + * During removal we need to wait for all the vports associated with a port
> > + * to be destroyed so we avoid a race condition where libfc is still trying
> > + * to reap vports while the driver remove function has already reaped the
> > + * driver contexts associated with the physical port.
> > + */
> > +static void qedf_wait_for_vport_destroy(struct qedf_ctx *qedf)
> > +{
> > +	struct fc_host_attrs *fc_host = shost_to_fc_host(qedf->lport->host);
> > +
> > +	QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_NPIV,
> > +	    "Entered.\n");
> > +	while (fc_host->npiv_vports_inuse > 0) {
> > +		QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_NPIV,
> > +		    "Waiting for all vports to be reaped.\n");
> > +		msleep(1000);
> > +	}
> > +}
> > +
> > +/**
> > + * qedf_fcoe_reset - Resets the fcoe
> > + *
> > + * @shost: shost the reset is from
> > + *
> > + * Returns: always 0
> > + */
> > +static int qedf_fcoe_reset(struct Scsi_Host *shost)
> > +{
> > +	struct fc_lport *lport = shost_priv(shost);
> > +
> > +	fc_fabric_logoff(lport);
> > +	fc_fabric_login(lport);
> > +	return 0;
> > +}
> > +
> > +static struct fc_host_statistics *qedf_fc_get_host_stats(struct Scsi_Host
> > +	*shost)
> > +{
> > +	struct fc_host_statistics *qedf_stats;
> > +	struct fc_lport *lport = shost_priv(shost);
> > +	struct qedf_ctx *qedf = lport_priv(lport);
> > +	struct qed_fcoe_stats *fw_fcoe_stats;
> > +
> > +	qedf_stats = fc_get_host_stats(shost);
> > +
> > +	/* We don't collect offload stats for specific NPIV ports */
> > +	if (lport->vport)
> > +		goto out;
> > +
> > +	fw_fcoe_stats = kmalloc(sizeof(struct qed_fcoe_stats), GFP_KERNEL);
> > +	if (!fw_fcoe_stats) {
> > +		QEDF_ERR(&(qedf->dbg_ctx), "Could not allocate memory for "
> > +		    "fw_fcoe_stats.\n");
> > +		goto out;
> > +	}
> > +
> > +	/* Query firmware for offload stats */
> > +	qed_ops->get_stats(qedf->cdev, fw_fcoe_stats);
> > +
> > +	/*
> > +	 * The expectation is that we add our offload stats to the stats
> > +	 * being maintained by libfc each time the fc_get_host_stats callback
> > +	 * is invoked. The additions are not carried over for each call to
> > +	 * the fc_get_host_stats callback.
> > +	 */
> > +	qedf_stats->tx_frames += fw_fcoe_stats->fcoe_tx_data_pkt_cnt +
> > +	    fw_fcoe_stats->fcoe_tx_xfer_pkt_cnt +
> > +	    fw_fcoe_stats->fcoe_tx_other_pkt_cnt;
> > +	qedf_stats->rx_frames += fw_fcoe_stats->fcoe_rx_data_pkt_cnt +
> > +	    fw_fcoe_stats->fcoe_rx_xfer_pkt_cnt +
> > +	    fw_fcoe_stats->fcoe_rx_other_pkt_cnt;
> > +	qedf_stats->fcp_input_megabytes += fw_fcoe_stats->fcoe_rx_byte_cnt /
> > +	    1000000;
> > +	qedf_stats->fcp_output_megabytes += fw_fcoe_stats->fcoe_tx_byte_cnt /
> > +	    1000000;
> > +	qedf_stats->rx_words += fw_fcoe_stats->fcoe_rx_byte_cnt / 4;
> > +	qedf_stats->tx_words += fw_fcoe_stats->fcoe_tx_byte_cnt / 4;
> > +	qedf_stats->invalid_crc_count +=
> > +	    fw_fcoe_stats->fcoe_silent_drop_pkt_crc_error_cnt;
> > +	qedf_stats->dumped_frames =
> > +	    fw_fcoe_stats->fcoe_silent_drop_total_pkt_cnt;
> > +	qedf_stats->error_frames +=
> > +	    fw_fcoe_stats->fcoe_silent_drop_total_pkt_cnt;
> > +	qedf_stats->fcp_input_requests += qedf->input_requests;
> > +	qedf_stats->fcp_output_requests += qedf->output_requests;
> > +	qedf_stats->fcp_control_requests += qedf->control_requests;
> > +	qedf_stats->fcp_packet_aborts += qedf->packet_aborts;
> > +	qedf_stats->fcp_frame_alloc_failures += qedf->alloc_failures;
> > +
> > +	kfree(fw_fcoe_stats);
> > +out:
> > +	return qedf_stats;
> > +}
> > +
> > +static struct fc_function_template qedf_fc_transport_fn = {
> > +	.show_host_node_name = 1,
> > +	.show_host_port_name = 1,
> > +	.show_host_supported_classes = 1,
> > +	.show_host_supported_fc4s = 1,
> > +	.show_host_active_fc4s = 1,
> > +	.show_host_maxframe_size = 1,
> > +
> > +	.show_host_port_id = 1,
> > +	.show_host_supported_speeds = 1,
> > +	.get_host_speed = fc_get_host_speed,
> > +	.show_host_speed = 1,
> > +	.show_host_port_type = 1,
> > +	.get_host_port_state = fc_get_host_port_state,
> > +	.show_host_port_state = 1,
> > +	.show_host_symbolic_name = 1,
> > +
> > +	/*
> > +	 * Tell FC transport to allocate enough space to store the backpointer
> > +	 * for the associated qedf_rport struct.
> > +	 */
> > +	.dd_fcrport_size = (sizeof(struct fc_rport_libfc_priv) +
> > +				sizeof(struct qedf_rport)),
> > +	.show_rport_maxframe_size = 1,
> > +	.show_rport_supported_classes = 1,
> > +	.show_host_fabric_name = 1,
> > +	.show_starget_node_name = 1,
> > +	.show_starget_port_name = 1,
> > +	.show_starget_port_id = 1,
> > +	.set_rport_dev_loss_tmo = fc_set_rport_loss_tmo,
> > +	.show_rport_dev_loss_tmo = 1,
> > +	.get_fc_host_stats = qedf_fc_get_host_stats,
> > +	.issue_fc_host_lip = qedf_fcoe_reset,
> > +	.vport_create = qedf_vport_create,
> > +	.vport_delete = qedf_vport_destroy,
> > +	.vport_disable = qedf_vport_disable,
> > +	.bsg_request = fc_lport_bsg_request,
> > +};
> > +
> > +static struct fc_function_template qedf_fc_vport_transport_fn = {
> > +	.show_host_node_name = 1,
> > +	.show_host_port_name = 1,
> > +	.show_host_supported_classes = 1,
> > +	.show_host_supported_fc4s = 1,
> > +	.show_host_active_fc4s = 1,
> > +	.show_host_maxframe_size = 1,
> > +	.show_host_port_id = 1,
> > +	.show_host_supported_speeds = 1,
> > +	.get_host_speed = fc_get_host_speed,
> > +	.show_host_speed = 1,
> > +	.show_host_port_type = 1,
> > +	.get_host_port_state = fc_get_host_port_state,
> > +	.show_host_port_state = 1,
> > +	.show_host_symbolic_name = 1,
> > +	.dd_fcrport_size = (sizeof(struct fc_rport_libfc_priv) +
> > +				sizeof(struct qedf_rport)),
> > +	.show_rport_maxframe_size = 1,
> > +	.show_rport_supported_classes = 1,
> > +	.show_host_fabric_name = 1,
> > +	.show_starget_node_name = 1,
> > +	.show_starget_port_name = 1,
> > +	.show_starget_port_id = 1,
> > +	.set_rport_dev_loss_tmo = fc_set_rport_loss_tmo,
> > +	.show_rport_dev_loss_tmo = 1,
> > +	.get_fc_host_stats = fc_get_host_stats,
> > +	.issue_fc_host_lip = qedf_fcoe_reset,
> > +	.bsg_request = fc_lport_bsg_request,
> > +};
> > +
> > +static bool qedf_fp_has_work(struct qedf_fastpath *fp)
> > +{
> > +	struct qedf_ctx *qedf = fp->qedf;
> > +	struct global_queue *que;
> > +	struct qed_sb_info *sb_info = fp->sb_info;
> > +	struct status_block *sb = sb_info->sb_virt;
> > +	u16 prod_idx;
> > +
> > +	/* Get the pointer to the global CQ this completion is on */
> > +	que = qedf->global_queues[fp->sb_id];
> > +
> > +	rmb();
> > +
> > +	/* Get the current firmware producer index */
> > +	prod_idx = sb->pi_array[QEDF_FCOE_PARAMS_GL_RQ_PI];
> > +
> > +	return (que->cq_prod_idx != prod_idx);
> > +}
> > +
> > +/*
> > + * Interrupt handler code.
> > + */
> > +
> > +/* Process completion queue and copy CQE contents for deferred processing
> > + *
> > + * Return true if we should wake the I/O thread, false if not.
> > + */
> > +static bool qedf_process_completions(struct qedf_fastpath *fp)
> > +{
> > +	struct qedf_ctx *qedf = fp->qedf;
> > +	struct qed_sb_info *sb_info = fp->sb_info;
> > +	struct status_block *sb = sb_info->sb_virt;
> > +	struct global_queue *que;
> > +	u16 prod_idx;
> > +	struct fcoe_cqe *cqe;
> > +	struct qedf_io_work *work;
> > +	unsigned long flags;
> > +	int num_handled = 0;
> > +	unsigned int cpu;
> > +	struct qedf_ioreq *io_req = NULL;
> > +	struct qedf_percpu_iothread_s *iothread;
> > +	u16 xid;
> > +	u16 new_cqes;
> > +	u32 comp_type;
> > +
> > +	/* Get the current firmware producer index */
> > +	prod_idx = sb->pi_array[QEDF_FCOE_PARAMS_GL_RQ_PI];
> > +
> > +	/* Get the pointer to the global CQ this completion is on */
> > +	que = qedf->global_queues[fp->sb_id];
> > +
> > +	/* Calculate the amount of new elements since last processing */
> > +	new_cqes = (prod_idx >= que->cq_prod_idx) ?
> > +	    (prod_idx - que->cq_prod_idx) :
> > +	    0x10000 - que->cq_prod_idx + prod_idx;
> > +
> > +	/* Save producer index */
> > +	que->cq_prod_idx = prod_idx;
> > +
> > +	while (new_cqes) {
> > +		fp->completions++;
> > +		num_handled++;
> > +		cqe = &que->cq[que->cq_cons_idx];
> > +
> > +		comp_type = (cqe->cqe_data >> FCOE_CQE_CQE_TYPE_SHIFT) &
> > +		    FCOE_CQE_CQE_TYPE_MASK;
> > +
> > +		/*
> > +		 * Process unsolicited CQEs directly in the interrupt handler
> > +		 * since we need the fastpath ID
> > +		 */
> > +		if (comp_type == FCOE_UNSOLIC_CQE_TYPE) {
> > +			QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_UNSOL,
> > +			   "Unsolicited CQE.\n");
> > +			qedf_process_unsol_compl(qedf, fp->sb_id, cqe);
> > +			/*
> > +			 * Don't add a work list item.  Increment the
> > +			 * consumer index and move on.
> > +			 */
> > +			goto inc_idx;
> > +		}
> > +
> > +		xid = cqe->cqe_data & FCOE_CQE_TASK_ID_MASK;
> > +		io_req = &qedf->cmd_mgr->cmds[xid];
> > +
> > +		/*
> > +		 * Figure out which percpu thread we should queue this I/O
> > +		 * on.
> > +		 */
> > +		if (!io_req)
> > +			/* If there is no io_req associated with this CQE
> > +			 * just queue it on CPU 0
> > +			 */
> > +			cpu = 0;
> > +		else {
> > +			cpu = io_req->cpu;
> > +			io_req->int_cpu = smp_processor_id();
> > +		}
> > +
> > +		work = mempool_alloc(qedf->io_mempool, GFP_ATOMIC);
> > +		if (!work) {
> > +			QEDF_WARN(&(qedf->dbg_ctx), "Could not allocate "
> > +				   "work for I/O completion.\n");
> > +			continue;
> > +		}
> > +		memset(work, 0, sizeof(struct qedf_io_work));
> > +
> > +		INIT_LIST_HEAD(&work->list);
> > +
> > +		/* Copy contents of CQE for deferred processing */
> > +		memcpy(&work->cqe, cqe, sizeof(struct fcoe_cqe));
> > +
> > +		work->qedf = fp->qedf;
> > +		work->fp = NULL; /* Only used for unsolicited frames */
> > +
> > +		iothread = &per_cpu(qedf_percpu_iothreads, cpu);
> > +		spin_lock_irqsave(&iothread->work_lock, flags);
> > +		list_add_tail(&work->list, &iothread->work_list);
> > +		spin_unlock_irqrestore(&iothread->work_lock, flags);
> > +		wake_up_process(iothread->iothread);
> > +
> > +inc_idx:
> > +		que->cq_cons_idx++;
> > +		if (que->cq_cons_idx == fp->cq_num_entries)
> > +			que->cq_cons_idx = 0;
> > +		new_cqes--;
> > +	}
> > +
> > +	return true;
> > +}
> > +
> > +
> > +/* MSI-X fastpath handler code */
> > +static irqreturn_t qedf_msix_handler(int irq, void *dev_id)
> > +{
> > +	struct qedf_fastpath *fp = dev_id;
> > +
> > +	if (!fp) {
> > +		QEDF_ERR(NULL, "fp is null.\n");
> > +		return IRQ_HANDLED;
> > +	}
> > +	if (!fp->sb_info) {
> > +		QEDF_ERR(NULL, "fp->sb_info is null.\n");
> > +		return IRQ_HANDLED;
> > +	}
> > +
> > +	/*
> > +	 * Disable interrupts for this status block while we process new
> > +	 * completions
> > +	 */
> > +	qed_sb_ack(fp->sb_info, IGU_INT_DISABLE, 0 /*do not update*/);
> > +
> > +	while (1) {
> > +		qedf_process_completions(fp);
> > +
> > +		if (qedf_fp_has_work(fp) == 0) {
> > +			/* Update the sb information */
> > +			qed_sb_update_sb_idx(fp->sb_info);
> > +			rmb();
> > +
> > +			if (qedf_fp_has_work(fp) == 0) {
> > +				/* Re-enable interrupts */
> > +				qed_sb_ack(fp->sb_info, IGU_INT_ENABLE, 1);
> > +				return IRQ_HANDLED;
> > +			}
> > +		}
> > +	}
> > +
> > +	/* Do we ever want to break out of above loop? */
> > +	return IRQ_HANDLED;
> > +}
> > +
> > +/* simd handler for MSI/INTa */
> > +static void qedf_simd_int_handler(void *cookie)
> > +{
> > +	/* Cookie is qedf_ctx struct */
> > +	struct qedf_ctx *qedf = (struct qedf_ctx *)cookie;
> > +
> > +	QEDF_WARN(&(qedf->dbg_ctx), "qedf=%p.\n", qedf);
> > +}
> > +
> > +#define QEDF_SIMD_HANDLER_NUM		0
> > +static void qedf_sync_free_irqs(struct qedf_ctx *qedf)
> > +{
> > +	int i;
> > +
> > +	if (qedf->int_info.msix_cnt) {
> > +		for (i = 0; i < qedf->int_info.used_cnt; i++) {
> > +			synchronize_irq(qedf->int_info.msix[i].vector);
> > +			irq_set_affinity_hint(qedf->int_info.msix[i].vector,
> > +			    NULL);
> > +			irq_set_affinity_notifier(qedf->int_info.msix[i].vector,
> > +			    NULL);
> > +			free_irq(qedf->int_info.msix[i].vector,
> > +			    &qedf->fp_array[i]);
> > +		}
> > +	} else
> > +		qed_ops->common->simd_handler_clean(qedf->cdev,
> > +		    QEDF_SIMD_HANDLER_NUM);
> > +
> > +	qedf->int_info.used_cnt = 0;
> > +	qed_ops->common->set_fp_int(qedf->cdev, 0);
> > +}
> > +
> > +static int qedf_request_msix_irq(struct qedf_ctx *qedf)
> > +{
> > +	int i, rc, cpu;
> > +
> > +	cpu = cpumask_first(cpu_online_mask);
> > +	for (i = 0; i < qedf->num_queues; i++) {
> > +		rc = request_irq(qedf->int_info.msix[i].vector,
> > +		    qedf_msix_handler, 0, "qedf", &qedf->fp_array[i]);
> > +
> > +		if (rc) {
> > +			QEDF_WARN(&(qedf->dbg_ctx), "request_irq failed.\n");
> > +			qedf_sync_free_irqs(qedf);
> > +			return rc;
> > +		}
> > +
> > +		qedf->int_info.used_cnt++;
> > +		rc = irq_set_affinity_hint(qedf->int_info.msix[i].vector,
> > +		    get_cpu_mask(cpu));
> > +		cpu = cpumask_next(cpu, cpu_online_mask);
> > +	 }
> > +
> > +	return 0;
> > +}
> > +
> Please use the irq allocation routines from hch here.

Will do.

> 
> 
> Cheers,
> 
> Hannes
> 

* Re: [Open-FCoE] [PATCH RFC 3/5] qedf: Add offloaded I/O request functions.
From: Chad Dupuis @ 2017-01-09 16:46 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: martin.petersen, fcoe-devel, netdev, QLogic-Storage-Upstream,
	linux-scsi, yuval.mintz
In-Reply-To: <7cfe465a-ad6c-7604-1262-d4dbf4aee525@suse.de>


On Wed, 28 Dec 2016, 9:08am -0000, Hannes Reinecke wrote:

> On 12/23/2016 08:17 PM, Dupuis, Chad wrote:
> > From: "Dupuis, Chad" <chad.dupuis@cavium.com>
> > 
> > This patch adds various I/O request types that are handled in firmware:
> > 
> > - Normal I/O requests
> > - ABTS requests
> > - Cleanup requests
> > - Task management requests
> > 
> > It also contains:
> > 
> > - I/O request initialization
> > - Firmware completion handling
> > 
> > Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
> > Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
> > Signed-off-by: Saurav Kashyap <saurav.kashyap@cavium.com>
> > Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
> > ---
> >  drivers/scsi/qedf/qedf_hsi.h |  427 ++++++++
> >  drivers/scsi/qedf/qedf_io.c  | 2303 ++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 2730 insertions(+)
> >  create mode 100644 drivers/scsi/qedf/qedf_hsi.h
> >  create mode 100644 drivers/scsi/qedf/qedf_io.c
> > 
> [ .. ]
> 
> > +static int qedf_execute_tmf(struct qedf_rport *fcport, struct scsi_cmnd *sc_cmd,
> > +	uint8_t tm_flags)
> > +{
> > +	struct qedf_ioreq *io_req;
> > +	struct qedf_mp_req *tm_req;
> > +	struct fcoe_task_context *task;
> > +	struct fc_frame_header *fc_hdr;
> > +	struct fcp_cmnd *fcp_cmnd;
> > +	struct qedf_ctx *qedf = fcport->qedf;
> > +	int rc = 0;
> > +	uint16_t xid;
> > +	uint32_t sid, did;
> > +	int tmo = 0;
> > +	unsigned long flags;
> > +
> > +	if (!sc_cmd) {
> > +		QEDF_ERR(&(qedf->dbg_ctx), "invalid arg\n");
> > +		return FAILED;
> > +	}
> > +
> > +	if (!(test_bit(QEDF_RPORT_SESSION_READY, &fcport->flags))) {
> > +		QEDF_ERR(&(qedf->dbg_ctx), "fcport not offloaded\n");
> > +		rc = FAILED;
> > +		return FAILED;
> > +	}
> > +
> > +	scsi_block_requests(qedf->lport->host);
> > +
> Typically, EH commands will be executed after the scsi host is stopped
> and no commands are outstanding.
> So there's no point in issuing 'scsi_block_requests()' here.
> 

Will remove.

> Cheers,
> 
> Hannes
> 

* Re: [Open-FCoE] [PATCH RFC 5/5] qedf: Add FIP request handling
From: Chad Dupuis @ 2017-01-09 16:47 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: martin.petersen, fcoe-devel, netdev, QLogic-Storage-Upstream,
	linux-scsi, yuval.mintz
In-Reply-To: <a076b54b-966b-d652-7a0f-6b84aacb0ded@suse.de>


On Wed, 28 Dec 2016, 9:11am -0000, Hannes Reinecke wrote:

> On 12/23/2016 08:17 PM, Dupuis, Chad wrote:
> > From: "Dupuis, Chad" <chad.dupuis@cavium.com>
> > 
> > This patch adds handling for FIP requests and responses that are handled by
> > the driver itself and not by libfcoe.
> > 
> > Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
> > Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
> > Signed-off-by: Saurav Kashyap <saurav.kashyap@cavium.com>
> > Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
> > ---
> >  drivers/scsi/qedf/qedf_fip.c | 267 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 267 insertions(+)
> >  create mode 100644 drivers/scsi/qedf/qedf_fip.c
> > 
> > diff --git a/drivers/scsi/qedf/qedf_fip.c b/drivers/scsi/qedf/qedf_fip.c
> > new file mode 100644
> > index 0000000..4f185c6
> > --- /dev/null
> > +++ b/drivers/scsi/qedf/qedf_fip.c
> > @@ -0,0 +1,267 @@
> > +/*
> > + *  QLogic FCoE Offload Driver
> > + *  Copyright (c) 2016 Cavium Inc.
> > + *
> > + *  This software is available under the terms of the GNU General Public License
> > + *  (GPL) Version 2, available from the file COPYING in the main directory of
> > + *  this source tree.
> > + */
> > +#include <linux/if_ether.h>
> > +#include <linux/if_vlan.h>
> > +#include "qedf.h"
> > +
> > +extern const struct qed_fcoe_ops *qed_ops;
> > +/*
> > + * FIP VLAN functions that will eventually move to libfcoe.
> > + */
> > +
> > +void qedf_fcoe_send_vlan_req(struct qedf_ctx *qedf)
> > +{
> > +	struct sk_buff *skb;
> > +	char *eth_fr;
> > +	int fr_len;
> > +	struct fip_vlan *vlan;
> > +#define MY_FIP_ALL_FCF_MACS        ((__u8[6]) { 1, 0x10, 0x18, 1, 0, 2 })
> > +	static u8 my_fcoe_all_fcfs[ETH_ALEN] = MY_FIP_ALL_FCF_MACS;
> 
> Do you support VN2VN, too?

Not currently, no.

> 
> Cheers,
> 
> Hannes
> 

* Re: [PATCH net-next] net: dsa: select NET_SWITCHDEV
From: Randy Dunlap @ 2017-01-09 16:47 UTC (permalink / raw)
  To: Vivien Didelot, Florian Fainelli, netdev
  Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn, Jiri Pirko
In-Reply-To: <87d1fwurbw.fsf@weeman.i-did-not-set--mail-host-address--so-tickle-me>

On 01/09/17 08:32, Vivien Didelot wrote:
> Hi Randy,
> 
> Randy Dunlap <rdunlap@infradead.org> writes:
> 
>> On 01/08/17 17:18, Florian Fainelli wrote:
>>> On 01/08/2017 03:17 PM, Vivien Didelot wrote:
>>>> DSA wraps SWITCHDEV, thus select it instead of depending on it.
>>>>
>>>> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
>>>
>>> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
>>>
>>
>> but when CONFIG_INET is not enabled, the patch causes this warning:
>>
>> warning: (NET_DSA) selects NET_SWITCHDEV which has unmet direct dependencies (NET && INET)
> 
> Thanks for spotting that! Would that be enough to change this first?
> 
>     diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
>     index 675acbf1502d..c7263b70e72b 100644
>     --- a/net/dsa/Kconfig
>     +++ b/net/dsa/Kconfig
>     @@ -1,6 +1,6 @@
>     config HAVE_NET_DSA
>             def_bool y
>     -       depends on NETDEVICES && !S390
>     +       depends on INET && NETDEVICES && !S390
> 
>     # Drivers must select NET_DSA and the appropriate tagging format

Yes, thanks.

Tested-by: Randy Dunlap <rdunlap@infradead.org>


-- 
~Randy

* [PATCH net] tcp: do not export tcp_peer_is_proven()
From: Eric Dumazet @ 2017-01-09 16:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

After commit 1fb6f159fd21 ("tcp: add tcp_conn_request"),
tcp_peer_is_proven() no longer needs to be exported.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_metrics.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index d46f4d5b1c62edf95791e9d47d966c3bc61e1888..ba8f02d0f283c6eaaf14ed89103adea135093353 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -606,7 +606,6 @@ bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(tcp_peer_is_proven);
 
 void tcp_fetch_timewait_stamp(struct sock *sk, struct dst_entry *dst)
 {

* Re: [PATCH] net: change init_inodecache() return void
From: David Miller @ 2017-01-09 17:05 UTC (permalink / raw)
  To: cugyly; +Cc: netdev, Linyu.Yuan
In-Reply-To: <1483780711-5759-1-git-send-email-cugyly@163.com>

From: yuan linyu <cugyly@163.com>
Date: Sat,  7 Jan 2017 17:18:31 +0800

> From: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
> 
> sock_init() calls it but does not check its return value,
> so change it to return void and add an internal BUG_ON() check.
> 
> Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>

Applied.

* Re: [PATCH] [v2] net: qcom/emac: add ethtool support
From: David Miller @ 2017-01-09 17:08 UTC (permalink / raw)
  To: timur; +Cc: f.fainelli, netdev, alokc
In-Reply-To: <1483738981-31019-1-git-send-email-timur@codeaurora.org>

From: Timur Tabi <timur@codeaurora.org>
Date: Fri,  6 Jan 2017 15:43:01 -0600

> Add support for some ethtool methods: get/set link settings, get/set
> message level, get statistics, get link status, get ring params, get
> pause params, and restart autonegotiation.
> 
> The code to collect the hardware statistics is moved into its own
> function so that it can be used by "get statistics" method.
> 
> Signed-off-by: Timur Tabi <timur@codeaurora.org>
> ---
> 
> Notes:
>     I don't trust my implementation of emac_get_pauseparam.  I feel like
>     I'm missing something.
>     
>     v2: added emac_get_pauseparam and emac_get_ringparam

This doesn't apply cleanly to net-next, please respin.

* net/ipv6: use-after-free in sock_wfree
From: Andrey Konovalov @ 2017-01-09 17:08 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML
  Cc: Dmitry Vyukov, Kostya Serebryany, Eric Dumazet, syzkaller

[-- Attachment #1: Type: text/plain, Size: 5712 bytes --]

Hi!

I've got the following error report while running the syzkaller fuzzer.

On commit a121103c922847ba5010819a3f250f1f7fc84ab8 (4.10-rc3).

A reproducer is attached.

==================================================================
BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
Read of size 8 at addr ffff880062da0060 by task a.out/4140

page:ffffea00018b6800 count:1 mapcount:0 mapping:          (null)
index:0x0 compound_mapcount: 0
flags: 0x100000000008100(slab|head)
raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013
raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000
page dumped because: kasan: bad access detected

CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15
 dump_stack+0x292/0x398 lib/dump_stack.c:51
 describe_address mm/kasan/report.c:262
 kasan_report_error+0x121/0x560 mm/kasan/report.c:370
 kasan_report mm/kasan/report.c:392
 __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413
 sock_flag ./arch/x86/include/asm/bitops.h:324
 sock_wfree+0x118/0x120 net/core/sock.c:1631
 skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655
 skb_release_all+0x15/0x60 net/core/skbuff.c:668
 __kfree_skb+0x15/0x20 net/core/skbuff.c:684
 kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705
 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
 inet_frag_put ./include/net/inet_frag.h:133
 nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617
 ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
 nf_hook_entry_hookfn ./include/linux/netfilter.h:102
 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
 nf_hook ./include/linux/netfilter.h:212
 __ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160
 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
 rawv6_push_pending_frames net/ipv6/raw.c:613
 rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
 sock_sendmsg_nosec net/socket.c:635
 sock_sendmsg+0xca/0x110 net/socket.c:645
 sock_write_iter+0x326/0x620 net/socket.c:848
 new_sync_write fs/read_write.c:499
 __vfs_write+0x483/0x760 fs/read_write.c:512
 vfs_write+0x187/0x530 fs/read_write.c:560
 SYSC_write fs/read_write.c:607
 SyS_write+0xfb/0x230 fs/read_write.c:599
 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
RIP: 0033:0x7ff26e6f5b79
RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79
RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003
RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003

The buggy address belongs to the object at ffff880062da0000
 which belongs to the cache RAWv6 of size 1504
The buggy address ffff880062da0060 is located 96 bytes inside
 of 1504-byte region [ffff880062da0000, ffff880062da05e0)

Freed by task 4113:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 save_stack+0x43/0xd0 mm/kasan/kasan.c:502
 set_track mm/kasan/kasan.c:514
 kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
 slab_free_hook mm/slub.c:1352
 slab_free_freelist_hook mm/slub.c:1374
 slab_free mm/slub.c:2951
 kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973
 sk_prot_free net/core/sock.c:1377
 __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452
 sk_destruct+0x47/0x80 net/core/sock.c:1460
 __sk_free+0x57/0x230 net/core/sock.c:1468
 sk_free+0x23/0x30 net/core/sock.c:1479
 sock_put ./include/net/sock.h:1638
 sk_common_release+0x31e/0x4e0 net/core/sock.c:2782
 rawv6_close+0x54/0x80 net/ipv6/raw.c:1214
 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431
 sock_release+0x8d/0x1e0 net/socket.c:599
 sock_close+0x16/0x20 net/socket.c:1063
 __fput+0x332/0x7f0 fs/file_table.c:208
 ____fput+0x15/0x20 fs/file_table.c:244
 task_work_run+0x19b/0x270 kernel/task_work.c:116
 exit_task_work ./include/linux/task_work.h:21
 do_exit+0x186b/0x2800 kernel/exit.c:839
 do_group_exit+0x149/0x420 kernel/exit.c:943
 SYSC_exit_group kernel/exit.c:954
 SyS_exit_group+0x1d/0x20 kernel/exit.c:952
 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203

Allocated by task 4115:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 save_stack+0x43/0xd0 mm/kasan/kasan.c:502
 set_track mm/kasan/kasan.c:514
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
 slab_post_alloc_hook mm/slab.h:432
 slab_alloc_node mm/slub.c:2708
 slab_alloc mm/slub.c:2716
 kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721
 sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
 sk_alloc+0x105/0x1010 net/core/sock.c:1396
 inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183
 __sock_create+0x4f6/0x880 net/socket.c:1199
 sock_create net/socket.c:1239
 SYSC_socket net/socket.c:1269
 SyS_socket+0xf9/0x230 net/socket.c:1249
 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203

Memory state around the buggy address:
 ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                       ^
 ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

[-- Attachment #2: ipv6-wfree-uaf-poc.c --]
[-- Type: text/x-csrc, Size: 9719 bytes --]

// autogenerated by syzkaller (http://github.com/google/syzkaller)

#ifndef __NR_ioctl
#define __NR_ioctl 16
#endif
#ifndef __NR_mmap
#define __NR_mmap 9
#endif
#ifndef __NR_socket
#define __NR_socket 41
#endif
#ifndef __NR_connect
#define __NR_connect 42
#endif
#ifndef __NR_bind
#define __NR_bind 49
#endif
#ifndef __NR_sendto
#define __NR_sendto 44
#endif
#ifndef __NR_recvfrom
#define __NR_recvfrom 45
#endif
#ifndef __NR_write
#define __NR_write 1
#endif

#define _GNU_SOURCE

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>

#include <linux/capability.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <linux/sched.h>
#include <net/if_arp.h>

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <grp.h>
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

const int kFailStatus = 67;
const int kErrorStatus = 68;
const int kRetryStatus = 69;

__attribute__((noreturn)) void doexit(int status)
{
  syscall(__NR_exit_group, status);
  volatile unsigned i = 0;
  for (i = 0;; i++) {
  }
}

__attribute__((noreturn)) void fail(const char* msg, ...)
{
  int e = errno;
  fflush(stdout);
  va_list args;
  va_start(args, msg);
  vfprintf(stderr, msg, args);
  va_end(args);
  fprintf(stderr, " (errno %d)\n", e);
  doexit(e == ENOMEM ? kRetryStatus : kFailStatus);
}

__attribute__((noreturn)) void exitf(const char* msg, ...)
{
  int e = errno;
  fflush(stdout);
  va_list args;
  va_start(args, msg);
  vfprintf(stderr, msg, args);
  va_end(args);
  fprintf(stderr, " (errno %d)\n", e);
  doexit(kRetryStatus);
}

static int flag_debug;

void debug(const char* msg, ...)
{
  if (!flag_debug)
    return;
  va_list args;
  va_start(args, msg);
  vfprintf(stdout, msg, args);
  va_end(args);
  fflush(stdout);
}

__thread int skip_segv;
__thread jmp_buf segv_env;

static void segv_handler(int sig, siginfo_t* info, void* uctx)
{
  if (__atomic_load_n(&skip_segv, __ATOMIC_RELAXED))
    _longjmp(segv_env, 1);
  doexit(sig);
  for (;;) {
  }
}

static void install_segv_handler()
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = segv_handler;
  sa.sa_flags = SA_NODEFER | SA_SIGINFO;
  sigaction(SIGSEGV, &sa, NULL);
  sigaction(SIGBUS, &sa, NULL);
}

#define NONFAILING(...)                                                \
  {                                                                    \
    __atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST);               \
    if (_setjmp(segv_env) == 0) {                                      \
      __VA_ARGS__;                                                     \
    }                                                                  \
    __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST);               \
  }

static uintptr_t execute_syscall(int nr, uintptr_t a0, uintptr_t a1,
                                 uintptr_t a2, uintptr_t a3,
                                 uintptr_t a4, uintptr_t a5,
                                 uintptr_t a6, uintptr_t a7,
                                 uintptr_t a8)
{
  switch (nr) {
  default:
    return syscall(nr, a0, a1, a2, a3, a4, a5);
  }
}

static void setup_main_process(uint64_t pid, bool enable_tun)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = SIG_IGN;
  syscall(SYS_rt_sigaction, 0x20, &sa, NULL, 8);
  syscall(SYS_rt_sigaction, 0x21, &sa, NULL, 8);
  install_segv_handler();

  char tmpdir_template[] = "./syzkaller.XXXXXX";
  char* tmpdir = mkdtemp(tmpdir_template);
  if (!tmpdir)
    fail("failed to mkdtemp");
  if (chmod(tmpdir, 0777))
    fail("failed to chmod");
  if (chdir(tmpdir))
    fail("failed to chdir");
}

static void loop();

static void sandbox_common()
{
  prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
  setpgrp();
  setsid();

  struct rlimit rlim;
  rlim.rlim_cur = rlim.rlim_max = 128 << 20;
  setrlimit(RLIMIT_AS, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 1 << 20;
  setrlimit(RLIMIT_FSIZE, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 1 << 20;
  setrlimit(RLIMIT_STACK, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 0;
  setrlimit(RLIMIT_CORE, &rlim);

  unshare(CLONE_NEWNS);
  unshare(CLONE_NEWIPC);
  unshare(CLONE_IO);
}

static int do_sandbox_none()
{
  int pid = fork();
  if (pid)
    return pid;
  sandbox_common();
  loop();
  doexit(1);
}

long r[56];
void* thr(void* arg)
{
  switch ((long)arg) {
  case 0:
    r[0] =
        execute_syscall(__NR_mmap, 0x20000000ul, 0xf55000ul, 0x3ul,
                        0x32ul, 0xfffffffffffffffful, 0x0ul, 0, 0, 0);
    break;
  case 1:
    r[1] = execute_syscall(__NR_socket, 0xaul, 0x3ul, 0x2cul, 0, 0, 0,
                           0, 0, 0);
    break;
  case 2:
    NONFAILING(*(uint16_t*)0x20016000 = (uint16_t)0xa);
    NONFAILING(*(uint16_t*)0x20016002 = (uint16_t)0x204e);
    NONFAILING(*(uint32_t*)0x20016004 = (uint32_t)0x0);
    NONFAILING(*(uint64_t*)0x20016008 = (uint64_t)0x0);
    NONFAILING(*(uint64_t*)0x20016010 = (uint64_t)0x0);
    NONFAILING(*(uint32_t*)0x20016018 = (uint32_t)0x0);
    r[8] = execute_syscall(__NR_connect, r[1], 0x20016000ul, 0x20ul, 0,
                           0, 0, 0, 0, 0);
    break;
  case 3:
    NONFAILING(*(uint16_t*)0x20373000 = (uint16_t)0x2);
    NONFAILING(*(uint16_t*)0x20373002 = (uint16_t)0x204e);
    NONFAILING(*(uint8_t*)0x20373004 = (uint8_t)0xc0);
    NONFAILING(*(uint8_t*)0x20373005 = (uint8_t)0xa8);
    NONFAILING(*(uint8_t*)0x20373006 = (uint8_t)0xda);
    NONFAILING(*(uint8_t*)0x20373007 = (uint8_t)0xaa);
    NONFAILING(*(uint8_t*)0x20373008 = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x20373009 = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x2037300a = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x2037300b = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x2037300c = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x2037300d = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x2037300e = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x2037300f = (uint8_t)0x0);
    r[23] = execute_syscall(__NR_bind, 0xfffffffffffffffful,
                            0x20373000ul, 0x10ul, 0, 0, 0, 0, 0, 0);
    break;
  case 4:
    NONFAILING(*(uint16_t*)0x20f4a000 = (uint16_t)0x0);
    NONFAILING(*(uint16_t*)0x20f4a002 = (uint16_t)0x204e);
    NONFAILING(*(uint32_t*)0x20f4a004 = (uint32_t)0x0);
    NONFAILING(*(uint8_t*)0x20f4a008 = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x20f4a009 = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x20f4a00a = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x20f4a00b = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x20f4a00c = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x20f4a00d = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x20f4a00e = (uint8_t)0x0);
    NONFAILING(*(uint8_t*)0x20f4a00f = (uint8_t)0x0);
    r[35] = execute_syscall(__NR_connect, 0xfffffffffffffffful,
                            0x20f4a000ul, 0x10ul, 0, 0, 0, 0, 0, 0);
    break;
  case 5:
    NONFAILING(*(uint16_t*)0x20f4cfe0 = (uint16_t)0xa);
    NONFAILING(*(uint16_t*)0x20f4cfe2 = (uint16_t)0x204e);
    NONFAILING(*(uint32_t*)0x20f4cfe4 = (uint32_t)0x0);
    NONFAILING(*(uint64_t*)0x20f4cfe8 = (uint64_t)0x0);
    NONFAILING(*(uint64_t*)0x20f4cff0 = (uint64_t)0x100000000000000);
    NONFAILING(*(uint32_t*)0x20f4cff8 = (uint32_t)0x5);
    r[42] =
        execute_syscall(__NR_sendto, 0xfffffffffffffffful, 0x20006000ul,
                        0x0ul, 0x0ul, 0x20f4cfe0ul, 0x20ul, 0, 0, 0);
    break;
  case 6:
    r[43] = execute_syscall(__NR_recvfrom, 0xfffffffffffffffful,
                            0x20144f28ul, 0x0ul, 0x10000ul,
                            0x20f4e000ul, 0x0ul, 0, 0, 0);
    break;
  case 7:
    r[44] = execute_syscall(__NR_socket, 0x1ful, 0x5ul, 0x2ul, 0, 0, 0,
                            0, 0, 0);
    break;
  case 8:
    r[45] = execute_syscall(__NR_write, r[1], 0x20aa4fdaul, 0xfffful, 0,
                            0, 0, 0, 0, 0);
    break;
  case 9:
    NONFAILING(*(uint32_t*)0x20f54000 = (uint32_t)0x0);
    NONFAILING(*(uint32_t*)0x20f54004 = (uint32_t)0x0);
    NONFAILING(*(uint64_t*)0x20f54008 = (uint64_t)0x0);
    r[49] =
        execute_syscall(__NR_ioctl, 0xfffffffffffffffful, 0xc010640bul,
                        0x20f54000ul, 0, 0, 0, 0, 0, 0);
    break;
  case 10:
    NONFAILING(*(uint32_t*)0x20f54000 = (uint32_t)0x0);
    NONFAILING(*(uint32_t*)0x20f54004 = (uint32_t)0x0);
    NONFAILING(*(uint64_t*)0x20f54008 = (uint64_t)0xfc51);
    r[53] =
        execute_syscall(__NR_ioctl, 0xfffffffffffffffful, 0xc010640bul,
                        0x20f54000ul, 0, 0, 0, 0, 0, 0);
    break;
  case 11:
    NONFAILING(memcpy((void*)0x20f50fe1, "\x1f\x00\x00\x80\x01\x00\x00"
                                         "\x16\x00\x00\x00\x9a\xc7\x00"
                                         "\x00\x06",
                      16));
    r[55] = execute_syscall(__NR_write, r[1], 0x20f50fe1ul, 0x10ul, 0,
                            0, 0, 0, 0, 0);
    break;
  }
  return 0;
}

void loop()
{
  long i;
  pthread_t th[24];

  memset(r, -1, sizeof(r));
  srand(getpid());
  for (i = 0; i < 12; i++) {
    pthread_create(&th[i], 0, thr, (void*)i);
    usleep(10000);
  }
  for (i = 0; i < 12; i++) {
    pthread_create(&th[12 + i], 0, thr, (void*)i);
    if (rand() % 2)
      usleep(rand() % 10000);
  }
  usleep(100000);
}

int main()
{
  setup_main_process(0, false);
  int pid = do_sandbox_none();
  int status = 0;
  while (waitpid(pid, &status, __WALL) != pid) {
  }
  return 0;
}
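
Decoding aid (a sketch, not part of the original report; it assumes an x86_64 Linux build with glibc headers). The reproducer above passes raw constants. Case 1's socket(0xa, 0x3, 0x2c) is socket(AF_INET6, SOCK_RAW, IPPROTO_FRAGMENT). The stores at 0x20016000 in case 2 line up with struct sockaddr_in6, and the port value 0x204e, stored little-endian, is port 20000 in network order. Judging by the register dumps in this thread, case 11's 16-byte write() from 0x20f50fe1 matches the KASAN report (RDX=0x10, RSI=0x20f50fe1), and case 8's 0xffff-byte write() from 0x20aa4fda matches the RCU stall report (RDX=0xffff, RSI=0x20aa4fda). The helper names below are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* socket(0xa, 0x3, 0x2c) from thr() case 1. */
static int is_raw_ipv6_fragment_socket(long domain, long type, long protocol)
{
    return domain == AF_INET6 && type == SOCK_RAW && protocol == IPPROTO_FRAGMENT;
}

/* Offsets of the NONFAILING() stores at 0x20016000 (thr() case 2)
 * compared against the fields of struct sockaddr_in6. */
static int sockaddr_in6_matches_repro(void)
{
    return offsetof(struct sockaddr_in6, sin6_family) == 0x00 &&   /* 0xa  */
           offsetof(struct sockaddr_in6, sin6_port) == 0x02 &&     /* 0x204e */
           offsetof(struct sockaddr_in6, sin6_flowinfo) == 0x04 && /* 0 */
           offsetof(struct sockaddr_in6, sin6_addr) == 0x08 &&     /* two zero u64 stores: :: */
           offsetof(struct sockaddr_in6, sin6_scope_id) == 0x18;   /* 0 */
}
```

On a little-endian host, ntohs(0x204e) yields 20000, which is the destination port the reproducer connects to.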

* Re: [PATCH] net: ibm: ehea: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-09 17:10 UTC (permalink / raw)
  To: tremyfr; +Cc: dougmill, netdev, linux-kernel
In-Reply-To: <1483807667-18264-1-git-send-email-tremyfr@gmail.com>

From: Philippe Reynes <tremyfr@gmail.com>
Date: Sat,  7 Jan 2017 17:47:47 +0100

> The ethtool API {get|set}_settings is deprecated.
> We move this driver to the new API {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>

Applied.

* Re: [PATCH] net: ibm: emac: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-09 17:10 UTC (permalink / raw)
  To: tremyfr
  Cc: ivan, jarod, mugunthanvnm, felipe.balbi, fw, mpe, netdev,
	linux-kernel
In-Reply-To: <1483824747-6405-1-git-send-email-tremyfr@gmail.com>

From: Philippe Reynes <tremyfr@gmail.com>
Date: Sat,  7 Jan 2017 22:32:27 +0100

> The ethtool API {get|set}_settings is deprecated.
> We move this driver to the new API {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>

Applied.

* Re: [PATCH] net: ibm: ibmveth: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-09 17:10 UTC (permalink / raw)
  To: tremyfr; +Cc: tlfalcon, linux-kernel, paulus, netdev, linuxppc-dev
In-Reply-To: <1483824913-6989-1-git-send-email-tremyfr@gmail.com>

From: Philippe Reynes <tremyfr@gmail.com>
Date: Sat,  7 Jan 2017 22:35:13 +0100

> The ethtool API {get|set}_settings is deprecated.
> We move this driver to the new API {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>

Applied.

* Re: [PATCH] net: ibm: ibmvnic: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-09 17:10 UTC (permalink / raw)
  To: tremyfr
  Cc: tlfalcon, jallen, benh, paulus, mpe, netdev, linuxppc-dev,
	linux-kernel
In-Reply-To: <1483825049-7501-1-git-send-email-tremyfr@gmail.com>

From: Philippe Reynes <tremyfr@gmail.com>
Date: Sat,  7 Jan 2017 22:37:29 +0100

> The ethtool API {get|set}_settings is deprecated.
> We move this driver to the new API {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>

Applied.

* Re: [PATCH] net: intel: e100: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-09 17:10 UTC (permalink / raw)
  To: tremyfr; +Cc: jeffrey.t.kirsher, intel-wired-lan, netdev, linux-kernel
In-Reply-To: <1483827487-20355-1-git-send-email-tremyfr@gmail.com>

From: Philippe Reynes <tremyfr@gmail.com>
Date: Sat,  7 Jan 2017 23:18:07 +0100

> The ethtool API {get|set}_settings is deprecated.
> We move this driver to the new API {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>

Applied.

* Re: [PATCH 0/4] net: ethernet: ti: cpsw: correct common res usage
From: Grygorii Strashko @ 2017-01-09 17:11 UTC (permalink / raw)
  To: Ivan Khoronzhuk, netdev, mugunthanvnm; +Cc: linux-omap, linux-kernel
In-Reply-To: <1483893663-15673-1-git-send-email-ivan.khoronzhuk@linaro.org>

Hi Ivan,

On 01/08/2017 10:40 AM, Ivan Khoronzhuk wrote:
> This series is intended to remove unneeded redundancies related to the
> common resource usage function.
>
> Based on net-next/master
> Tested on am572x idk
>
> Ivan Khoronzhuk (4):
>   net: ethernet: ti: cpsw: remove dual check from common res usage
>     function
>   net: ethernet: ti: cpsw: don't disable interrupts in ndo_open
>   net: ethernet: ti: cpsw: don't duplicate ndev_running
>   net: ethernet: ti: cpsw: don't duplicate common res in rx handler
>

Thanks for the patches; I'll need some time to review them.

>  drivers/net/ethernet/ti/cpsw.c | 57 ++++++++++++++----------------------------
>  1 file changed, 19 insertions(+), 38 deletions(-)
>

-- 
regards,
-grygorii

* Re: net/ipv6: use-after-free in sock_wfree
From: Andrey Konovalov @ 2017-01-09 17:11 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML
  Cc: Dmitry Vyukov, Kostya Serebryany, Eric Dumazet, syzkaller
In-Reply-To: <CAAeHK+yfNdNTkgCbUGbdRBM9bB=2DhGv1ZPCWm44CGL7zD=TLg@mail.gmail.com>

On Mon, Jan 9, 2017 at 6:08 PM, Andrey Konovalov <andreyknvl@google.com> wrote:
> Hi!
>
> I've got the following error report while running the syzkaller fuzzer.
>
> On commit a121103c922847ba5010819a3f250f1f7fc84ab8 (4.10-rc3).
>
> A reproducer is attached.
>
> ==================================================================
> BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
> Read of size 8 at addr ffff880062da0060 by task a.out/4140
>
> page:ffffea00018b6800 count:1 mapcount:0 mapping:          (null)
> index:0x0 compound_mapcount: 0
> flags: 0x100000000008100(slab|head)
> raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013
> raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000
> page dumped because: kasan: bad access detected
>
> CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15
>  dump_stack+0x292/0x398 lib/dump_stack.c:51
>  describe_address mm/kasan/report.c:262
>  kasan_report_error+0x121/0x560 mm/kasan/report.c:370
>  kasan_report mm/kasan/report.c:392
>  __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413
>  sock_flag ./arch/x86/include/asm/bitops.h:324
>  sock_wfree+0x118/0x120 net/core/sock.c:1631
>  skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655
>  skb_release_all+0x15/0x60 net/core/skbuff.c:668
>  __kfree_skb+0x15/0x20 net/core/skbuff.c:684
>  kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705
>  inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
>  inet_frag_put ./include/net/inet_frag.h:133
>  nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617
>  ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
>  nf_hook_entry_hookfn ./include/linux/netfilter.h:102
>  nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
>  nf_hook ./include/linux/netfilter.h:212
>  __ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160
>  ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
>  ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
>  ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
>  rawv6_push_pending_frames net/ipv6/raw.c:613
>  rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927
>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
>  sock_sendmsg_nosec net/socket.c:635
>  sock_sendmsg+0xca/0x110 net/socket.c:645
>  sock_write_iter+0x326/0x620 net/socket.c:848
>  new_sync_write fs/read_write.c:499
>  __vfs_write+0x483/0x760 fs/read_write.c:512
>  vfs_write+0x187/0x530 fs/read_write.c:560
>  SYSC_write fs/read_write.c:607
>  SyS_write+0xfb/0x230 fs/read_write.c:599
>  entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
> RIP: 0033:0x7ff26e6f5b79
> RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79
> RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003
> RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003
>
> The buggy address belongs to the object at ffff880062da0000
>  which belongs to the cache RAWv6 of size 1504
> The buggy address ffff880062da0060 is located 96 bytes inside
>  of 1504-byte region [ffff880062da0000, ffff880062da05e0)
>
> Freed by task 4113:
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:502
>  set_track mm/kasan/kasan.c:514
>  kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
>  slab_free_hook mm/slub.c:1352
>  slab_free_freelist_hook mm/slub.c:1374
>  slab_free mm/slub.c:2951
>  kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973
>  sk_prot_free net/core/sock.c:1377
>  __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452
>  sk_destruct+0x47/0x80 net/core/sock.c:1460
>  __sk_free+0x57/0x230 net/core/sock.c:1468
>  sk_free+0x23/0x30 net/core/sock.c:1479
>  sock_put ./include/net/sock.h:1638
>  sk_common_release+0x31e/0x4e0 net/core/sock.c:2782
>  rawv6_close+0x54/0x80 net/ipv6/raw.c:1214
>  inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
>  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431
>  sock_release+0x8d/0x1e0 net/socket.c:599
>  sock_close+0x16/0x20 net/socket.c:1063
>  __fput+0x332/0x7f0 fs/file_table.c:208
>  ____fput+0x15/0x20 fs/file_table.c:244
>  task_work_run+0x19b/0x270 kernel/task_work.c:116
>  exit_task_work ./include/linux/task_work.h:21
>  do_exit+0x186b/0x2800 kernel/exit.c:839
>  do_group_exit+0x149/0x420 kernel/exit.c:943
>  SYSC_exit_group kernel/exit.c:954
>  SyS_exit_group+0x1d/0x20 kernel/exit.c:952
>  entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
>
> Allocated by task 4115:
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:502
>  set_track mm/kasan/kasan.c:514
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
>  slab_post_alloc_hook mm/slab.h:432
>  slab_alloc_node mm/slub.c:2708
>  slab_alloc mm/slub.c:2716
>  kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721
>  sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
>  sk_alloc+0x105/0x1010 net/core/sock.c:1396
>  inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183
>  __sock_create+0x4f6/0x880 net/socket.c:1199
>  sock_create net/socket.c:1239
>  SYSC_socket net/socket.c:1269
>  SyS_socket+0xf9/0x230 net/socket.c:1249
>  entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
>
> Memory state around the buggy address:
>  ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                        ^
>  ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================
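
Worked check of the numbers in the KASAN report above (a sketch, assuming generic KASAN with 8-byte shadow granules and the usual 16-shadow-bytes-per-row dump; helper names are hypothetical). The bad address is 0x60 = 96 bytes past the start of the RAWv6 object, the object spans 0x5e0 = 1504 bytes, and 96 / 8 = 12, so the '^' marker points at shadow column 12 of the row starting at ffff880062da0000. That byte is "fb", i.e. freed memory, consistent with the use-after-free verdict.

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN: one shadow byte describes one 8-byte granule. */
#define KASAN_GRANULE_SIZE 8u
/* The "Memory state around the buggy address" dump prints 16 shadow bytes per row. */
#define SHADOW_BYTES_PER_ROW 16u

/* Byte offset of the faulting access inside the slab object. */
static uint64_t object_offset(uint64_t bad_addr, uint64_t obj_start)
{
    return bad_addr - obj_start;
}

/* 0-based column of the shadow byte that the '^' marker points at.
 * Each dump row covers 16 * 8 = 128 bytes of memory and starts at a
 * 128-byte-aligned address. */
static unsigned shadow_column(uint64_t bad_addr)
{
    const uint64_t row_span = (uint64_t)KASAN_GRANULE_SIZE * SHADOW_BYTES_PER_ROW;
    return (unsigned)((bad_addr % row_span) / KASAN_GRANULE_SIZE);
}
```

For example, object_offset(0xffff880062da0060, 0xffff880062da0000) gives 96 and shadow_column(0xffff880062da0060) gives 12.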

Sometimes this reproducer leads to another report:

INFO: rcu_sched self-detected stall on CPU
1-...: (1 GPs behind) idle=ead/140000000000001/0 softirq=8122/8123 fqs=6497
(t=26000 jiffies g=3021 c=3020 q=345)
Task dump for CPU 1:
syz-executor    R  running task    18904  3943   3941 0x0000000c
Call Trace:
 <IRQ>
 sched_show_task+0x3fa/0x560 kernel/sched/core.c:5217
 dump_cpu_task+0x71/0x90 kernel/sched/core.c:8822
 rcu_dump_cpu_stacks+0x318/0x35e kernel/rcu/tree.c:1290
 print_cpu_stall+0x39f/0x6e0 kernel/rcu/tree.c:1434
 check_cpu_stall.isra.63+0x702/0xe80 kernel/rcu/tree.c:1502
 __rcu_pending kernel/rcu/tree.c:3469
 rcu_pending kernel/rcu/tree.c:3533
 rcu_check_callbacks+0x27f/0xda0 kernel/rcu/tree.c:2867
 update_process_times+0x30/0x60 kernel/time/timer.c:1612
 tick_sched_handle.isra.18+0xb3/0xe0 kernel/time/tick-sched.c:151
 tick_sched_timer+0x72/0x120 kernel/time/tick-sched.c:1158
 __run_hrtimer kernel/time/hrtimer.c:1238
 __hrtimer_run_queues+0x38c/0xf80 kernel/time/hrtimer.c:1302
 hrtimer_interrupt+0x1ab/0x5c0 kernel/time/hrtimer.c:1336
 local_apic_timer_interrupt+0x6f/0xe0 arch/x86/kernel/apic/apic.c:936
 smp_apic_timer_interrupt+0x71/0xa0 arch/x86/kernel/apic/apic.c:960
 apic_timer_interrupt+0x93/0xa0
RIP: 0010:__sanitizer_cov_trace_pc+0x46/0x60 kernel/kcov.c:93
RSP: 0018:ffff88006ad66a98 EFLAGS: 00000216 ORIG_RAX: ffffffffffffff10
RAX: 0000000000004000 RBX: ffff880068f4e500 RCX: ffffc90000e6c000
RDX: 0000000000004000 RSI: ffffffff83d652b7 RDI: ffff880064f09f51
RBP: ffff88006ad66a98 R08: ffffed000d633ca1 R09: ffffed000d633ca1
R10: 0000000000000001 R11: ffffed000d633ca0 R12: ffff880064f00020
R13: ffff88006ad66c28 R14: 0000000000009f38 R15: dffffc0000000000
 </IRQ>
 _decode_session6+0x8a7/0x13f0 net/ipv6/xfrm6_policy.c:147
 __xfrm_decode_session+0x63/0x100 net/xfrm/xfrm_policy.c:2475
 xfrm_decode_session_reverse ./include/net/xfrm.h:1117
 icmpv6_route_lookup+0x410/0x780 net/ipv6/icmp.c:362
 icmp6_send+0x1611/0x29b0 net/ipv6/icmp.c:515
 icmpv6_send+0x12e/0x260 net/ipv6/ip6_icmp.c:42
 ip6_fragment+0x583/0x3920 net/ipv6/ip6_output.c:864
 ip6_finish_output+0x322/0x960 net/ipv6/ip6_output.c:146
 NF_HOOK_COND ./include/linux/netfilter.h:246
 ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162
 dst_output ./include/net/dst.h:501
 ip6_local_out+0x95/0x170 net/ipv6/output_core.c:172
 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
 rawv6_push_pending_frames net/ipv6/raw.c:613
 rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
 sock_sendmsg_nosec net/socket.c:635
 sock_sendmsg+0xca/0x110 net/socket.c:645
 sock_write_iter+0x326/0x620 net/socket.c:848
 new_sync_write fs/read_write.c:499
 __vfs_write+0x483/0x760 fs/read_write.c:512
 vfs_write+0x187/0x530 fs/read_write.c:560
 SYSC_write fs/read_write.c:607
 SyS_write+0xfb/0x230 fs/read_write.c:599
 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
RIP: 0033:0x4421d9
RSP: 002b:00007f090e289b58 EFLAGS: 00000296 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000004421d9
RDX: 000000000000ffff RSI: 0000000020aa4fda RDI: 0000000000000005
RBP: 00000000006de8c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000700000
R13: ffffffffffffffff R14: 0000000020f4a000 R15: 0000000000000010

* net/atm: warning in alloc_tx/__might_sleep
From: Andrey Konovalov @ 2017-01-09 17:20 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Al Viro
  Cc: Dmitry Vyukov, Kostya Serebryany, Eric Dumazet, syzkaller

[-- Attachment #1: Type: text/plain, Size: 1866 bytes --]

Hi!

I've got the following error report while running the syzkaller fuzzer.

On commit a121103c922847ba5010819a3f250f1f7fc84ab8 (4.10-rc3).

A reproducer is attached.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 4114 at kernel/sched/core.c:7737 __might_sleep+0x149/0x1a0
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<ffffffff813fcb22>] prepare_to_wait+0x182/0x530
Modules linked in:
CPU: 0 PID: 4114 Comm: a.out Not tainted 4.10.0-rc3+ #59
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15
 dump_stack+0x292/0x398 lib/dump_stack.c:51
 __warn+0x19f/0x1e0 kernel/panic.c:547
 warn_slowpath_fmt+0xc5/0x110 kernel/panic.c:562
 __might_sleep+0x149/0x1a0 kernel/sched/core.c:7732
 slab_pre_alloc_hook mm/slab.h:408
 slab_alloc_node mm/slub.c:2634
 kmem_cache_alloc_node+0x14a/0x280 mm/slub.c:2744
 __alloc_skb+0x10f/0x800 net/core/skbuff.c:219
 alloc_skb ./include/linux/skbuff.h:926
 alloc_tx net/atm/common.c:75
 vcc_sendmsg+0x5e8/0x1010 net/atm/common.c:609
 sock_sendmsg_nosec net/socket.c:635
 sock_sendmsg+0xca/0x110 net/socket.c:645
 ___sys_sendmsg+0x9d2/0xae0 net/socket.c:1985
 __sys_sendmsg+0x138/0x320 net/socket.c:2019
 SYSC_sendmsg net/socket.c:2030
 SyS_sendmsg+0x2d/0x50 net/socket.c:2026
 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
RIP: 0033:0x7fcbacfddb79
RSP: 002b:00007ffed8b5a7b8 EFLAGS: 00000206 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffed8b5a950 RCX: 00007fcbacfddb79
RDX: 000000000000c000 RSI: 0000000020002000 RDI: 0000000000000003
RBP: 0000000000400af0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ffed8b5a950 R14: 0000000000000000 R15: 0000000000000000
---[ end trace 9edf2da84d8112da ]---
atm:sigd_send: bad message type 0

[-- Attachment #2: sched-sleep-warn-poc.c --]
[-- Type: text/x-csrc, Size: 5526 bytes --]

// autogenerated by syzkaller (http://github.com/google/syzkaller)

#ifndef __NR_mmap
#define __NR_mmap 9
#endif
#ifndef __NR_socket
#define __NR_socket 41
#endif
#ifndef __NR_ioctl
#define __NR_ioctl 16
#endif
#ifndef __NR_sendmsg
#define __NR_sendmsg 46
#endif

#define _GNU_SOURCE

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>

#include <linux/capability.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <linux/sched.h>
#include <net/if_arp.h>

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <grp.h>
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

const int kFailStatus = 67;
const int kErrorStatus = 68;
const int kRetryStatus = 69;

__attribute__((noreturn)) void doexit(int status)
{
  syscall(__NR_exit_group, status);
  volatile unsigned i;
  for (i = 0;; i++) {
  }
}

__attribute__((noreturn)) void fail(const char* msg, ...)
{
  int e = errno;
  fflush(stdout);
  va_list args;
  va_start(args, msg);
  vfprintf(stderr, msg, args);
  va_end(args);
  fprintf(stderr, " (errno %d)\n", e);
  doexit(e == ENOMEM ? kRetryStatus : kFailStatus);
}

__attribute__((noreturn)) void exitf(const char* msg, ...)
{
  int e = errno;
  fflush(stdout);
  va_list args;
  va_start(args, msg);
  vfprintf(stderr, msg, args);
  va_end(args);
  fprintf(stderr, " (errno %d)\n", e);
  doexit(kRetryStatus);
}

static int flag_debug;

void debug(const char* msg, ...)
{
  if (!flag_debug)
    return;
  va_list args;
  va_start(args, msg);
  vfprintf(stdout, msg, args);
  va_end(args);
  fflush(stdout);
}

__thread int skip_segv;
__thread jmp_buf segv_env;

static void segv_handler(int sig, siginfo_t* info, void* uctx)
{
  if (__atomic_load_n(&skip_segv, __ATOMIC_RELAXED))
    _longjmp(segv_env, 1);
  doexit(sig);
  for (;;) {
  }
}

static void install_segv_handler()
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = segv_handler;
  sa.sa_flags = SA_NODEFER | SA_SIGINFO;
  sigaction(SIGSEGV, &sa, NULL);
  sigaction(SIGBUS, &sa, NULL);
}

#define NONFAILING(...)                                                \
  {                                                                    \
    __atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST);               \
    if (_setjmp(segv_env) == 0) {                                      \
      __VA_ARGS__;                                                     \
    }                                                                  \
    __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST);               \
  }

static uintptr_t execute_syscall(int nr, uintptr_t a0, uintptr_t a1,
                                 uintptr_t a2, uintptr_t a3,
                                 uintptr_t a4, uintptr_t a5,
                                 uintptr_t a6, uintptr_t a7,
                                 uintptr_t a8)
{
  switch (nr) {
  default:
    return syscall(nr, a0, a1, a2, a3, a4, a5);
  }
}

static void setup_main_process(uint64_t pid, bool enable_tun)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = SIG_IGN;
  syscall(SYS_rt_sigaction, 0x20, &sa, NULL, 8);
  syscall(SYS_rt_sigaction, 0x21, &sa, NULL, 8);
  install_segv_handler();

  char tmpdir_template[] = "./syzkaller.XXXXXX";
  char* tmpdir = mkdtemp(tmpdir_template);
  if (!tmpdir)
    fail("failed to mkdtemp");
  if (chmod(tmpdir, 0777))
    fail("failed to chmod");
  if (chdir(tmpdir))
    fail("failed to chdir");
}

static void loop();

static void sandbox_common()
{
  prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
  setpgrp();
  setsid();

  struct rlimit rlim;
  rlim.rlim_cur = rlim.rlim_max = 128 << 20;
  setrlimit(RLIMIT_AS, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 1 << 20;
  setrlimit(RLIMIT_FSIZE, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 1 << 20;
  setrlimit(RLIMIT_STACK, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 0;
  setrlimit(RLIMIT_CORE, &rlim);

  unshare(CLONE_NEWNS);
  unshare(CLONE_NEWIPC);
  unshare(CLONE_IO);
}

static int do_sandbox_none()
{
  int pid = fork();
  if (pid)
    return pid;
  sandbox_common();
  loop();
  doexit(1);
}

long r[11];
void loop()
{
  memset(r, -1, sizeof(r));
  r[0] = execute_syscall(__NR_mmap, 0x20000000ul, 0x4000ul, 0x3ul,
                         0x32ul, 0xfffffffffffffffful, 0x0ul, 0, 0, 0);
  r[1] = execute_syscall(__NR_socket, 0x8ul, 0x0ul, 0x0ul, 0, 0, 0, 0,
                         0, 0);
  r[2] = execute_syscall(__NR_ioctl, r[1], 0x61f0ul, 0x20003000ul, 0, 0,
                         0, 0, 0, 0);
  NONFAILING(*(uint64_t*)0x20002000 = (uint64_t)0x0);
  NONFAILING(*(uint32_t*)0x20002008 = (uint32_t)0x0);
  NONFAILING(*(uint64_t*)0x20002010 = (uint64_t)0x20002000);
  NONFAILING(*(uint64_t*)0x20002018 = (uint64_t)0x2);
  NONFAILING(*(uint64_t*)0x20002020 = (uint64_t)0x0);
  NONFAILING(*(uint64_t*)0x20002028 = (uint64_t)0x0);
  NONFAILING(*(uint32_t*)0x20002030 = (uint32_t)0x0);
  r[10] = execute_syscall(__NR_sendmsg, r[1], 0x20002000ul, 0xc000ul, 0,
                          0, 0, 0, 0, 0);
}
int main()
{
  setup_main_process(0, false);
  int pid = do_sandbox_none();
  int status = 0;
  while (waitpid(pid, &status, __WALL) != pid) {
  }
  return 0;
}
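
Decoding aid for the ATM reproducer (a sketch, not part of the original report; it assumes x86_64 Linux with glibc headers, and the helper name is hypothetical). socket(0x8, 0, 0) is socket(AF_ATMPVC, 0, 0); ioctl 0x61f0 equals _IO('a', 0xf0), which appears to match the kernel's ATMSIGD_CTRL (become the ATM signaling daemon socket). The seven stores at 0x20002000 line up with struct msghdr, with msg_iov pointing back at 0x20002000 and msg_iovlen = 2, so the iovec array overlaps the msghdr itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Offsets of the NONFAILING() stores at 0x20002000 in loop(). */
static const size_t repro_store_offsets[] = {0x00, 0x08, 0x10, 0x18, 0x20, 0x28, 0x30};

/* Returns 1 if the struct msghdr fields sit at exactly those offsets. */
static int msghdr_matches_repro(void)
{
    const size_t field_offsets[] = {
        offsetof(struct msghdr, msg_name),       /* 0x00: set to 0 */
        offsetof(struct msghdr, msg_namelen),    /* 0x08: set to 0 */
        offsetof(struct msghdr, msg_iov),        /* 0x10: set to 0x20002000 */
        offsetof(struct msghdr, msg_iovlen),     /* 0x18: set to 2 */
        offsetof(struct msghdr, msg_control),    /* 0x20: set to 0 */
        offsetof(struct msghdr, msg_controllen), /* 0x28: set to 0 */
        offsetof(struct msghdr, msg_flags),      /* 0x30: set to 0 */
    };
    size_t n = sizeof(field_offsets) / sizeof(field_offsets[0]);
    for (size_t i = 0; i < n; i++)
        if (field_offsets[i] != repro_store_offsets[i])
            return 0;
    return 1;
}
```

The register dump in the warning agrees: RSI=0x20002000 (the msghdr) and RDI=3 (the socket fd) for the sendmsg() call.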

* Re: net/ipv6: use-after-free in sock_wfree
From: Eric Dumazet @ 2017-01-09 17:21 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Dmitry Vyukov,
	Kostya Serebryany, syzkaller
In-Reply-To: <CAAeHK+yiziBBxw3JMqec3c7aLXjk6ddL2Xh1xufA3sp5ja=_TA@mail.gmail.com>

On Mon, Jan 9, 2017 at 9:11 AM, Andrey Konovalov <andreyknvl@google.com> wrote:
> On Mon, Jan 9, 2017 at 6:08 PM, Andrey Konovalov <andreyknvl@google.com> wrote:
>> Hi!
>>
>> I've got the following error report while running the syzkaller fuzzer.
>>
>> On commit a121103c922847ba5010819a3f250f1f7fc84ab8 (4.10-rc3).
>>
>> A reproducer is attached.
>>
>> [...]
>
> Sometimes this reproducer leads to another report:

This looks very similar to the issue fixed in commit 8282f27449bf15548cb82c77b6e04ee0ab827bdc.

Could you try:

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 9948b5ce52dad3a823edede517f17069bd7226dc..986d4ca38832b17703b09e50209ec133885c7276 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -589,6 +589,7 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user)
        hdr = ipv6_hdr(skb);
        fhdr = (struct frag_hdr *)skb_transport_header(skb);

+       skb_orphan(skb);
        fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr,
                     skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr));
        if (fq == NULL) {
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 6b78bab27755b2758f3c8ecd5b9c6d61615af4b6..fdec2a4cc559ab956c58d26535bdf010c0a8964a 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -367,7 +367,6 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key,
        } else if (key->eth.type == htons(ETH_P_IPV6)) {
                enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;

-               skb_orphan(skb);
                memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
                err = nf_ct_frag6_gather(net, skb, user);
                if (err) {
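
The suggested change moves skb_orphan() out of the openvswitch caller and into nf_ct_frag6_gather() itself, so every caller hands the reassembly code an unowned skb. The userspace sketch below models only that refactor pattern (enforcing the precondition in the callee); struct owner, struct buf, buf_orphan() and frag_gather() are hypothetical stand-ins, with buf_orphan() following the skb_orphan() contract of running the owner's destructor and then clearing the owner. It does not try to reproduce the use-after-free itself.

```c
#include <stddef.h>

/* Hypothetical owner: counts buffers still charged to it. */
struct owner { int inflight; };

struct buf {
    struct owner *sk;                    /* owner, or NULL once orphaned */
    void (*destructor)(struct buf *b);   /* releases the owner's charge */
};

/* Destructor for owned buffers: drop this buffer's charge on its owner. */
void owner_release(struct buf *b)
{
    b->sk->inflight--;
}

/* Same contract as skb_orphan(): if there is a destructor, run it,
 * then clear both the destructor and the owner pointer. */
static void buf_orphan(struct buf *b)
{
    if (b->destructor) {
        b->destructor(b);
        b->destructor = NULL;
        b->sk = NULL;
    }
}

#define QLEN 8
static struct buf *frag_queue[QLEN];     /* stand-in for the reassembly queue */
static size_t frag_count;

/* Callee-side orphaning: whatever the caller did, the buffer is unowned
 * before it is queued, so callers no longer need their own buf_orphan(). */
int frag_gather(struct buf *b)
{
    buf_orphan(b);
    if (frag_count == QLEN)
        return -1;                       /* queue full, buffer not queued */
    frag_queue[frag_count++] = b;
    return 0;
}
```

A caller that never orphans its buffer still ends up queueing an unowned one, which is the property the openvswitch hunk relies on when it drops its own skb_orphan() call.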


* Re: [PATCH 3/4] net: ethernet: ti: cpsw: don't duplicate ndev_running
From: Grygorii Strashko @ 2017-01-09 17:25 UTC (permalink / raw)
  To: Ivan Khoronzhuk, netdev, mugunthanvnm; +Cc: linux-omap, linux-kernel
In-Reply-To: <1483893663-15673-4-git-send-email-ivan.khoronzhuk@linaro.org>



On 01/08/2017 10:41 AM, Ivan Khoronzhuk wrote:
> There is no need for an additional variable to track whether an interface
> is running. Simplify the code by removing the redundant variable and
> checking the usage counter instead.
> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>  drivers/net/ethernet/ti/cpsw.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index 40d7fc9..daae87f 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -357,7 +357,6 @@ struct cpsw_slave {
>  	struct phy_device		*phy;
>  	struct net_device		*ndev;
>  	u32				port_vlan;
> -	u32				open_stat;
>  };
>  
>  static inline u32 slave_read(struct cpsw_slave *slave, u32 offset)
> @@ -1241,7 +1240,7 @@ static int cpsw_common_res_usage_state(struct cpsw_common *cpsw)
>  	u32 usage_count = 0;
>  
>  	for (i = 0; i < cpsw->data.slaves; i++)
> -		if (cpsw->slaves[i].open_stat)
> +		if (netif_running(cpsw->slaves[i].ndev))
>  			usage_count++;

Not sure this will work as you expected, but maybe I've missed something :(

The code in static int __dev_open(struct net_device *dev):
..
	set_bit(__LINK_STATE_START, &dev->state);

	if (ops->ndo_validate_addr)
		ret = ops->ndo_validate_addr(dev);

	if (!ret && ops->ndo_open)
		ret = ops->ndo_open(dev);

	netpoll_poll_enable(dev);

	if (ret)
		clear_bit(__LINK_STATE_START, &dev->state);
..

So netif_running(ndev) starts returning true before ops->ndo_open(dev) is called.
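
A minimal userspace model of that ordering, assuming two slave devices and a hypothetical struct ndev whose flag stands in for __LINK_STATE_START. Because the bit is already set when the open callback runs, the first device to open sees a usage count of 1 rather than 0, which is why the patch has to turn the `!cpsw_common_res_usage_state(cpsw)` test into `< 2`.

```c
#include <stdbool.h>

#define NSLAVES 2

/* Hypothetical model: each flag stands in for a slave's __LINK_STATE_START bit. */
struct ndev { bool link_state_start; };
static struct ndev slaves[NSLAVES];
static int first_open_hits;   /* how often the "first opener" branch ran */

/* Counts running slaves the way the patch does, via the state bit. */
static int usage_count(void)
{
    int n = 0;
    for (int i = 0; i < NSLAVES; i++)
        if (slaves[i].link_state_start)
            n++;
    return n;
}

static int ndo_open_model(struct ndev *d)
{
    (void)d;
    /* The opening device already counts itself, so "first" is count < 2. */
    if (usage_count() < 2)
        first_open_hits++;
    return 0;
}

/* Same ordering as __dev_open(): set the bit, call ndo_open(), undo on error. */
int dev_open_model(struct ndev *d)
{
    d->link_state_start = true;
    int ret = ndo_open_model(d);
    if (ret)
        d->link_state_start = false;
    return ret;
}
```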

>  
>  	return usage_count;
> @@ -1502,7 +1501,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
>  		 CPSW_RTL_VERSION(reg));
>  
>  	/* initialize host and slave ports */
> -	if (!cpsw_common_res_usage_state(cpsw))
> +	if (cpsw_common_res_usage_state(cpsw) < 2)

Ah. You've changed the condition here.

I think it might be reasonable to hide this inside cpsw_common_res_usage_state(),
which could then be renamed to something like cpsw_is_running().


>  		cpsw_init_host_port(priv);
>  	for_each_slave(priv, cpsw_slave_open, priv);
>  
> @@ -1513,7 +1512,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
>  		cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan,
>  				  ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
>  
> -	if (!cpsw_common_res_usage_state(cpsw)) {
> +	if (cpsw_common_res_usage_state(cpsw) < 2) {
>  		/* disable priority elevation */
>  		__raw_writel(0, &cpsw->regs->ptype);
>  
> @@ -1556,9 +1555,6 @@ static int cpsw_ndo_open(struct net_device *ndev)
>  	cpdma_ctlr_start(cpsw->dma);
>  	cpsw_intr_enable(cpsw);
>  
> -	if (cpsw->data.dual_emac)
> -		cpsw->slaves[priv->emac_port].open_stat = true;
> -
>  	return 0;
>  
>  err_cleanup:
> @@ -1578,7 +1574,7 @@ static int cpsw_ndo_stop(struct net_device *ndev)
>  	netif_tx_stop_all_queues(priv->ndev);
>  	netif_carrier_off(priv->ndev);
>  
> -	if (cpsw_common_res_usage_state(cpsw) <= 1) {
> +	if (!cpsw_common_res_usage_state(cpsw)) {

And here __LINK_STATE_START will be cleared before ops->ndo_stop(dev) is called.
So, on the one hand, using netif_running(ndev) simplifies the internals of cpsw_common_res_usage_state(),
but on the other hand, to my mind, it makes the places where it's used even more entangled :(
because when cpsw_common_res_usage_state() returns 1 in cpsw_ndo_open(), it means
"no interface is really running yet", but the same value 1 in cpsw_ndo_stop()
means "one interface is still running".
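
To make the asymmetry concrete, here is a self-contained userspace model (hypothetical names throughout; the flag again stands in for __LINK_STATE_START). It records the raw count that the open and stop callbacks would observe, and also sketches one possible shape for the helper suggested earlier in this reply: counting only the other running slaves, which gives the same answer whether the core set or cleared the caller's bit first. That helper is an illustration, not code from the patch.

```c
#include <stdbool.h>

#define NSLAVES 2

/* Hypothetical model: each flag stands in for a slave's __LINK_STATE_START bit. */
struct ndev { bool link_state_start; };
static struct ndev slaves[NSLAVES];
static int seen_in_open, seen_in_stop;   /* raw counts seen by the callbacks */

/* Raw count, as in the patch: includes the caller whenever its bit is set. */
static int usage_count(void)
{
    int n = 0;
    for (int i = 0; i < NSLAVES; i++)
        if (slaves[i].link_state_start)
            n++;
    return n;
}

/* Hypothetical helper: running slaves other than 'self'. Its result does not
 * depend on whether the core set or cleared self's bit before the callback. */
int others_running(const struct ndev *self)
{
    int n = 0;
    for (int i = 0; i < NSLAVES; i++)
        if (&slaves[i] != self && slaves[i].link_state_start)
            n++;
    return n;
}

/* Open path: bit set first, then ndo_open() observes the count. */
void dev_open_model(struct ndev *d)
{
    d->link_state_start = true;
    seen_in_open = usage_count();
}

/* Close path: bit cleared first, then ndo_stop() observes the count. */
void dev_close_model(struct ndev *d)
{
    d->link_state_start = false;
    seen_in_stop = usage_count();
}
```

With the raw count, 1 means "nobody else is up" in open but "someone else is still up" in stop; with others_running(), 0 means "nobody else is up" in both callbacks.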

>  		napi_disable(&cpsw->napi_rx);
>  		napi_disable(&cpsw->napi_tx);
>  		cpts_unregister(cpsw->cpts);
> @@ -1592,8 +1588,6 @@ static int cpsw_ndo_stop(struct net_device *ndev)
>  		cpsw_split_res(ndev);
>  
>  	pm_runtime_put_sync(cpsw->dev);
> -	if (cpsw->data.dual_emac)
> -		cpsw->slaves[priv->emac_port].open_stat = false;
>  	return 0;
>  }
>  
> 

-- 
regards,
-grygorii


* Re: [PATCH net-next v4 2/2] net: stmmac: dwmac-meson8b: make the RGMII TX delay configurable
From: Martin Blumenstingl @ 2017-01-09 17:37 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, devicetree, linux-amlogic, robh+dt, mark.rutland, carlo,
	khilman, peppe.cavallaro, alexandre.torgue, linux-arm-kernel
In-Reply-To: <CAFBinCDZ_mqt8q2rG6zDy8ke5sX6L+4sVYOOhWUdO3apwFiRYw@mail.gmail.com>

Hi David,

On Sun, Dec 18, 2016 at 5:13 PM, Martin Blumenstingl
<martin.blumenstingl@googlemail.com> wrote:
> On Sun, Dec 18, 2016 at 4:49 PM, David Miller <davem@davemloft.net> wrote:
>> From: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
>> Date: Sat, 17 Dec 2016 19:21:19 +0100
>>
>>> Prior to this patch we were using a hardcoded RGMII TX clock delay of
>>> 2ns (= 1/4 cycle of the 125MHz RGMII TX clock). This value works for
>>> many boards, but unfortunately not for all (due to the way the actual
>>> circuit is designed, sometimes because the TX delay is enabled in the
>>> PHY, etc.). Making the TX delay on the MAC side configurable allows us
>>> to support all possible hardware combinations.
>>>
>>> This allows fixing a compatibility issue on some boards, where the
>>> RTL8211F PHY is configured to generate the TX delay. We can now turn
>>> off the TX delay in the MAC, because otherwise we would be applying the
>>> delay twice (which results in non-working TX traffic).
>>>
>>> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
>>> Tested-by: Neil Armstrong <narmstrong@baylibre.com>
>>
>> Is this really the safest thing to do?
>>
>> If you say the existing hard-coded setting of 1/4 cycle works on most
>> boards, and what you're trying to do is override it with an OF
>> property value for boards where the existing setting does not work,
>> then you _must_ use a default value that corresponds to what the
>> existing code does now when you don't see this new OF property.
> it's a bit more complicated in reality: 1/4 cycle works when the TX
> delay of the RTL8211F PHY is turned off (until recently it was always
> enabled for phy-mode RGMII).
>
>> So please retain the current behavior of the 1/4 cycle TX delay
>> setting when you don't see the amlogic,tx-delay-ns property.
>>
>> I really think you risk breaking existing boards by not doing so,
>> unless you can have this patch tested on every such board that exists
>> and I don't think you really can feasibly and rigorously do that.
> there's a patch in my follow-up series which adds the 2ns to the .dts
> for all RGMII based boards: [0] (and I would keep these even if we had
> a default value, just to make it explicit and thus easier to
> understand for other people).
> however, we can add the 2ns default back (I can do this if you want -
> Rob Herring was unhappy with the missing documentation of this default
> value [1] - so note to myself: take care of that as well). but then we
> have to decide when to apply this default value: only when we're in
> RGMII mode or also in any of the RGMII_*ID modes?
>
> please let me know how we should proceed
gentle ping - what is your opinion on this?
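
The arithmetic behind the 1/4-cycle figure, and the fallback David asks for, can be sketched as follows. At 1000 Mbit/s the RGMII TX clock runs at 125 MHz, so one cycle is 8 ns and 2 ns is a quarter cycle. mac_tx_delay_ps() and delay_quarter_cycles() are hypothetical helpers, not the dwmac-meson8b code: when the board provides no delay property, the first one keeps the old 2 ns.

```c
#include <stdbool.h>
#include <stdint.h>

/* At 1000 Mbit/s the RGMII TX clock is 125 MHz: one cycle is 8 ns. */
#define RGMII_TXC_PERIOD_PS 8000u
/* The previously hardcoded MAC-side delay: 2 ns. */
#define DEFAULT_TX_DELAY_PS 2000u

/* Hypothetical helper: MAC-side TX delay in picoseconds. Without a board
 * property, keep the old 2 ns so existing boards behave as before. */
uint32_t mac_tx_delay_ps(bool has_property, uint32_t property_ns)
{
    return has_property ? property_ns * 1000u : DEFAULT_TX_DELAY_PS;
}

/* Total delay expressed in quarter cycles of the TX clock (rounded down). */
uint32_t delay_quarter_cycles(uint32_t total_ps)
{
    return total_ps * 4u / RGMII_TXC_PERIOD_PS;
}
```

For example, delay_quarter_cycles(mac_tx_delay_ps(false, 0)) is 1. Assuming a PHY that adds the same 2 ns on its side, the total becomes 4000 ps and the result is 2, i.e. half a cycle, which is the "delay applied twice" case from the changelog. A board with such a PHY would instead set the property to 0.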


Regards,
Martin


* Re: [PATCH net-next v2] net: dsa: make "label" property optional for dsa2
From: Florian Fainelli @ 2017-01-09 17:42 UTC (permalink / raw)
  To: Jiri Pirko, Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Andrew Lunn,
	Uwe Kleine-König, Andrey Smirnov
In-Reply-To: <20170109160632.GD1862@nanopsycho>

On 01/09/2017 08:06 AM, Jiri Pirko wrote:
> Mon, Jan 09, 2017 at 04:45:33PM CET, vivien.didelot@savoirfairelinux.com wrote:
>> Hi Jiri,
>>
>> Jiri Pirko <jiri@resnulli.us> writes:
>>
>>>> Extra question: shouldn't phys_port_{id,name} be switchdev attributes in
>>>
>>> Again, phys_port_id has nothing to do with switches. Should be removed
>>> from dsa because its use there is incorrect.
>>
>> Florian, since 3a543ef just got in, can it be reverted?
> 
> Yes, please revert it. It is only in net-next.

Maybe the use case should be understood before reverting the change. How do
we actually get the physical port number of an Ethernet switch's per-port
network device? The name is not enough, because there are plenty of
cases where we need to manipulate a physical port number (even if just for
informational purposes).

Should we just amend the existing description of ndo_get_phys_port_id()?
Should we introduce another ndo for that?
-- 
Florian


* Re: [net-next PATCH 1/3] Revert "icmp: avoid allocating large struct on stack"
From: Eric Dumazet @ 2017-01-09 17:42 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netdev, xiyou.wangcong
In-Reply-To: <20170109150404.30215.44512.stgit@firesoul>

On Mon, 2017-01-09 at 16:04 +0100, Jesper Dangaard Brouer wrote:
> This reverts commit 9a99d4a50cb8 ("icmp: avoid allocating large struct
> on stack"), because struct icmp_bxm is not really a large struct, and
> allocating and freeing this small 112-byte struct hurts performance.
> 
> Fixes: 9a99d4a50cb8 ("icmp: avoid allocating large struct on stack")
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>


* Re: [net-next PATCH 1/3] Revert "icmp: avoid allocating large struct on stack"
From: Cong Wang @ 2017-01-09 17:42 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <20170109150404.30215.44512.stgit@firesoul>

On Mon, Jan 9, 2017 at 7:04 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> This reverts commit 9a99d4a50cb8 ("icmp: avoid allocating large struct
> on stack"), because struct icmp_bxm is not really a large struct, and
> allocating and freeing this small 112-byte struct hurts performance.

The original commit fixes a warning about large stack usage: icmp_send()
is deep in the call stack.

Your optimization for a slow path makes no sense to me.


* Re: [net-next PATCH 2/3] net: reduce cycles spend on ICMP replies that gets rate limited
From: Eric Dumazet @ 2017-01-09 17:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netdev, xiyou.wangcong
In-Reply-To: <20170109150409.30215.34612.stgit@firesoul>

On Mon, 2017-01-09 at 16:04 +0100, Jesper Dangaard Brouer wrote:
> This patch splits the global and per-(inet)peer ICMP-reply limiter
> code, and moves the global limit check earlier in the packet
> processing path.  This avoids spending cycles on ICMP replies that
> get limited/suppressed anyhow.
> 
> The global ICMP rate limiter icmp_global_allow() is a good solution;
> it just happens too late in the process.  The kernel goes through the
> full route lookup (return path) for the ICMP message before making
> the rate-limit decision not to send the ICMP reply.
> 
> Details: The kernel's global rate limiter for ICMP messages was added
> in commit 4cdf507d5452 ("icmp: add a global rate limitation").  It is
> a token bucket limiter with a global lock.  It brilliantly avoids
> lock congestion by only updating once 20ms (HZ/50) have elapsed. It
> can then avoid taking the lock when credit is exhausted (when under
> pressure) and the time constraint for a refill is not yet met.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---


Acked-by: Eric Dumazet <edumazet@google.com>
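
The limiter described in the changelog can be sketched in userspace C as below. This is a simplified model of that description, not the kernel source: HZ is assumed to be 1000, an atomic_flag spinlock stands in for the kernel spinlock, ticks are passed in explicitly instead of reading jiffies, and the sysctl_icmp_msgs_per_sec / sysctl_icmp_msgs_burst settings become plain struct fields. The point is the first branch: with no credit left and fewer than HZ/50 ticks since the last refill, the function returns without touching the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define HZ 1000u  /* assumed tick rate: the refill window HZ / 50 is 20 ticks (20 ms) */

struct icmp_bucket {              /* hypothetical userspace stand-in */
    atomic_flag lock;             /* models the kernel spinlock */
    _Atomic uint32_t credit;      /* tokens left */
    _Atomic uint32_t stamp;       /* tick of the last refill */
    uint32_t msgs_per_sec;        /* refill rate */
    uint32_t msgs_burst;          /* bucket capacity */
};

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

static void bucket_lock(atomic_flag *f)
{
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ; /* spin */
}

static void bucket_unlock(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

/* Returns true if one more ICMP reply may be sent at tick 'now'. */
bool global_allow(struct icmp_bucket *b, uint32_t now)
{
    uint32_t delta, incr = 0, credit;
    bool rc = false;

    /* Lockless fast path: bucket empty and refill window not yet reached. */
    if (atomic_load_explicit(&b->credit, memory_order_relaxed) == 0) {
        delta = min_u32(now - atomic_load_explicit(&b->stamp, memory_order_relaxed), HZ);
        if (delta < HZ / 50)
            return false;
    }

    bucket_lock(&b->lock);
    delta = min_u32(now - atomic_load_explicit(&b->stamp, memory_order_relaxed), HZ);
    if (delta >= HZ / 50) {
        incr = b->msgs_per_sec * delta / HZ;   /* tokens earned since last refill */
        if (incr)
            atomic_store_explicit(&b->stamp, now, memory_order_relaxed);
    }
    credit = min_u32(atomic_load_explicit(&b->credit, memory_order_relaxed) + incr,
                     b->msgs_burst);
    if (credit) {
        credit--;                              /* consume one token for this reply */
        rc = true;
    }
    atomic_store_explicit(&b->credit, credit, memory_order_relaxed);
    bucket_unlock(&b->lock);
    return rc;
}
```

What this patch changes is where such a check sits: calling it before the return-path route lookup means a suppressed reply costs only this function, not the lookup as well.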


* Re: [net-next PATCH 0/3] net: optimize ICMP-reply code path
From: Cong Wang @ 2017-01-09 17:43 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <20170109150246.30215.63371.stgit@firesoul>

On Mon, Jan 9, 2017 at 7:03 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> Use case: The specific case where I experienced this bottleneck is
> sending UDP packets to a port with no listener, which obviously results
> in the kernel replying with ICMP Destination Unreachable (type:3), Port
> Unreachable (code:3), which causes the bottleneck.

Why is this a case we should care about for performance?


* Re: [net-next PATCH 3/3] net: for rate-limited ICMP replies save one atomic operation
From: Eric Dumazet @ 2017-01-09 17:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netdev, xiyou.wangcong
In-Reply-To: <20170109150414.30215.63724.stgit@firesoul>

On Mon, 2017-01-09 at 16:04 +0100, Jesper Dangaard Brouer wrote:
> It is possible to avoid the atomic operation in icmp{v6,}_xmit_lock
> by checking the sysctl_icmp_msgs_per_sec ratelimit before these calls,
> as pointed out by Eric Dumazet, but the BH-disabled state must be correct.
> 
> icmp_global_allow() states that it must be called with BH
> disabled.  This protection was provided by the calls to icmp_xmit_lock and
> icmpv6_xmit_lock.  Thus, split out local_bh_disable/enable from these
> functions and maintain it explicitly at the callers.
> 
> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>
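
The reordering in this patch can be modelled with a few counters. Everything here is hypothetical: local_bh_disable_model()/local_bh_enable_model() only track nesting depth, xmit_trylock() stands in for the trylock inside icmp_xmit_lock(), and the allow argument stands in for the icmp_global_allow() result. The caller disables BH once, runs the rate-limit check first, and only then attempts the lock, so a rate-limited reply performs no lock attempt at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int bh_disable_depth;                    /* models local_bh_disable() nesting */
static atomic_flag xmit_lock_flag = ATOMIC_FLAG_INIT;
static int trylock_attempts;                    /* atomic test-and-set attempts on the lock */

static void local_bh_disable_model(void) { bh_disable_depth++; }
static void local_bh_enable_model(void)  { bh_disable_depth--; }

/* Rate-limit check; like icmp_global_allow(), it requires BH disabled. */
static bool global_allow_model(bool allow)
{
    assert(bh_disable_depth > 0);
    return allow;
}

/* After the split, the lock helper no longer touches the BH state. */
static bool xmit_trylock(void)
{
    trylock_attempts++;
    return !atomic_flag_test_and_set_explicit(&xmit_lock_flag, memory_order_acquire);
}

static void xmit_unlock(void)
{
    atomic_flag_clear_explicit(&xmit_lock_flag, memory_order_release);
}

/* Returns true if a reply would be sent. */
bool send_reply(bool allow)
{
    bool sent = false;

    local_bh_disable_model();       /* caller owns the BH-disabled section */
    if (!global_allow_model(allow))
        goto out;                   /* rate limited: lock never attempted */
    if (!xmit_trylock())
        goto out;
    sent = true;                    /* ... build and transmit the reply here ... */
    xmit_unlock();
out:
    local_bh_enable_model();
    return sent;
}
```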

