Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
* Re: Socket Direct Protocol: help (2)
From: Andrea Gozzelino @ 2010-04-23 14:35 UTC (permalink / raw)
  To: Amir Vadai
  Cc: Andrea Gozzelino, Tung, Chien Tin, Steve Wise,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
	pavel-+ZI9xUNit7I@public.gmane.org,
	mingo-X9Un+BFzKDI@public.gmane.org, Eric B Munson
In-Reply-To: <4BCEE8FE.6050805-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>

Hi Amir, 

have you any news about bugs 2027 and 2028 (SDP)?

Take care and keep in touch.
Thamk you very much,
Andrea



On Apr 21, 2010 02:01 PM, Amir Vadai <amirv-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> wrote:

> Hi Andrea,
> 
> I am preparing the fix right now.
> 
> - Amir
> 
> On 04/20/2010 04:53 PM, Andrea Gozzelino wrote:
> > Hi Amir,
> >
> > have you any news about bugs 2027 "SDP not respecting # SGEs as
> > reported
> > from HW" and 2028 "SDP should support fastreg mrs"?
> >
> > When those bugs will be fixed, I will test the NE020 cards
> > performance
> > with SDP protocol and I will compare SDP and TCP.
> >
> > Keep in touch,
> >
> > Andrea Gozzelino
> >
> > INFN - Laboratori Nazionali di Legnaro	(LNL)
> > Viale dell'Universita' 2
> > I-35020 - Legnaro (PD)- ITALIA
> > Tel: +39 049 8068346
> > Fax: +39 049 641925
> > Mail: andrea.gozzelino-PK20h7lG/Rc1GQ1Ptb7lUw@public.gmane.org
> >
> >
> >
> >
> >
> >
> >
> > On Apr 15, 2010 10:38 AM, Amir Vadai <amirv-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> wrote:
> >
> >   
> >> It should be a simple fix and I plan to do soon - just add yourself
> >> as
> >> CC in bugzilla  - that way I won't forget to notify you.
> >>
> >> - amir
> >>
> >> On 04/15/2010 10:07 AM, Andrea Gozzelino wrote:
> >>     
> >>> On Apr 15, 2010 08:24 AM, Amir Vadai <amirv-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> wrote:
> >>>
> >>>   
> >>>       
> >>>> I hope to have a fix next week for the first one.
> >>>>
> >>>> Thanks,
> >>>> Amir
> >>>>
> >>>> On 04/14/2010 09:48 PM, Tung, Chien Tin wrote:
> >>>>     
> >>>>         
> >>>>>> Tung, Chien Tin wrote:
> >>>>>>     
> >>>>>>         
> >>>>>>             
> >>>>>>>> One more thing - Please open a bug regarding the num_sge
> >>>>>>>> limitation at:
> >>>>>>>> https://bugs.openfabrics.org/
> >>>>>>>>
> >>>>>>>>         
> >>>>>>>>             
> >>>>>>>>                 
> >>>>>>> Done, Bug 2027.
> >>>>>>>
> >>>>>>> Chien
> >>>>>>>
> >>>>>>>       
> >>>>>>>           
> >>>>>>>               
> >>>>>> And 2028 opened to request fastreg support.
> >>>>>>
> >>>>>>     
> >>>>>>         
> >>>>>>             
> >>>>> I am open to test fixes for these two bugs.
> >>>>>
> >>>>> Chien
> >>>>>
> >>>>>   
> >>>>>       
> >>>>>           
> >>>>     
> >>>>         
> >>> Hi Amir, 
> >>> Hi Chien,
> >>>
> >>> I understand that the bug 2027 could be solved next week, so I
> >>> will
> >>> test
> >>> SDP protocol performance on NE020 cards.
> >>> Is it correct? 
> >>> If yes, could you point out the code modifies?
> >>>
> >>> Keep in touch and take care.
> >>> Regards,
> >>> Andrea
> >>>
> >>>
> >>> Andrea Gozzelino
> >>>
> >>> INFN - Laboratori Nazionali di Legnaro	(LNL)
> >>> Viale dell'Universita' 2
> >>> I-35020 - Legnaro (PD)- ITALIA
> >>> Tel: +39 049 8068346
> >>> Fax: +39 049 641925
> >>> Mail: andrea.gozzelino-PK20h7lG/Rc1GQ1Ptb7lUw@public.gmane.org			
> >>>
> >>>
> >>>   
> >>>       
> >>     
> >
> > 		
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > linux-rdma" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >   
> 


Andrea Gozzelino

INFN - Laboratori Nazionali di Legnaro	(LNL)
Viale dell'Universita' 2
I-35020 - Legnaro (PD)- ITALIA
Tel: +39 049 8068346
Fax: +39 049 641925
Mail: andrea.gozzelino-PK20h7lG/Rc1GQ1Ptb7lUw@public.gmane.org			

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* SDP bugs 2027 and 2028
From: Andrea Gozzelino @ 2010-04-23 14:35 UTC (permalink / raw)
  To: Amir Vadai
  Cc: Andrea Gozzelino, Tung, Chien Tin, Steve Wise,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
	pavel-+ZI9xUNit7I@public.gmane.org,
	mingo-X9Un+BFzKDI@public.gmane.org, Eric B Munson
In-Reply-To: <4BCEE8FE.6050805-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>

Hi Amir, 

have you any news about bugs 2027 and 2028 (SDP)?

Take care and keep in touch.
Thamk you very much,
Andrea


On Apr 21, 2010 02:01 PM, Amir Vadai <amirv-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> wrote:

> Hi Andrea,
> 
> I am preparing the fix right now.
> 
> - Amir
> 
> On 04/20/2010 04:53 PM, Andrea Gozzelino wrote:
> > Hi Amir,
> >
> > have you any news about bugs 2027 "SDP not respecting # SGEs as
> > reported
> > from HW" and 2028 "SDP should support fastreg mrs"?
> >
> > When those bugs will be fixed, I will test the NE020 cards
> > performance
> > with SDP protocol and I will compare SDP and TCP.
> >
> > Keep in touch,
> >
> > Andrea Gozzelino
> >
> > INFN - Laboratori Nazionali di Legnaro	(LNL)
> > Viale dell'Universita' 2
> > I-35020 - Legnaro (PD)- ITALIA
> > Tel: +39 049 8068346
> > Fax: +39 049 641925
> > Mail: andrea.gozzelino-PK20h7lG/Rc1GQ1Ptb7lUw@public.gmane.org
> >
> >
> >
> >
> >
> >
> >
> > On Apr 15, 2010 10:38 AM, Amir Vadai <amirv-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> wrote:
> >
> >   
> >> It should be a simple fix and I plan to do soon - just add yourself
> >> as
> >> CC in bugzilla  - that way I won't forget to notify you.
> >>
> >> - amir
> >>
> >> On 04/15/2010 10:07 AM, Andrea Gozzelino wrote:
> >>     
> >>> On Apr 15, 2010 08:24 AM, Amir Vadai <amirv-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> wrote:
> >>>
> >>>   
> >>>       
> >>>> I hope to have a fix next week for the first one.
> >>>>
> >>>> Thanks,
> >>>> Amir
> >>>>
> >>>> On 04/14/2010 09:48 PM, Tung, Chien Tin wrote:
> >>>>     
> >>>>         
> >>>>>> Tung, Chien Tin wrote:
> >>>>>>     
> >>>>>>         
> >>>>>>             
> >>>>>>>> One more thing - Please open a bug regarding the num_sge
> >>>>>>>> limitation at:
> >>>>>>>> https://bugs.openfabrics.org/
> >>>>>>>>
> >>>>>>>>         
> >>>>>>>>             
> >>>>>>>>                 
> >>>>>>> Done, Bug 2027.
> >>>>>>>
> >>>>>>> Chien
> >>>>>>>
> >>>>>>>       
> >>>>>>>           
> >>>>>>>               
> >>>>>> And 2028 opened to request fastreg support.
> >>>>>>
> >>>>>>     
> >>>>>>         
> >>>>>>             
> >>>>> I am open to test fixes for these two bugs.
> >>>>>
> >>>>> Chien
> >>>>>
> >>>>>   
> >>>>>       
> >>>>>           
> >>>>     
> >>>>         
> >>> Hi Amir, 
> >>> Hi Chien,
> >>>
> >>> I understand that the bug 2027 could be solved next week, so I
> >>> will
> >>> test
> >>> SDP protocol performance on NE020 cards.
> >>> Is it correct? 
> >>> If yes, could you point out the code modifies?
> >>>
> >>> Keep in touch and take care.
> >>> Regards,
> >>> Andrea
> >>>
> >>>
> >>> Andrea Gozzelino
> >>>
> >>> INFN - Laboratori Nazionali di Legnaro	(LNL)
> >>> Viale dell'Universita' 2
> >>> I-35020 - Legnaro (PD)- ITALIA
> >>> Tel: +39 049 8068346
> >>> Fax: +39 049 641925
> >>> Mail: andrea.gozzelino-PK20h7lG/Rc1GQ1Ptb7lUw@public.gmane.org			
> >>>
> >>>
> >>>   
> >>>       
> >>     
> >
> > 		
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > linux-rdma" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >   
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* SDP bugs 2027 and 2028
From: Andrea Gozzelino @ 2010-04-23 14:34 UTC (permalink / raw)
  To: Amir Vadai
  Cc: Andrea Gozzelino, Tung, Chien Tin, Steve Wise,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	rolandd@cisco.com, peterz@infradead.org, pavel@ucw.cz,
	mingo@elte.hu, Eric B Munson
In-Reply-To: <4BCEE8FE.6050805@mellanox.co.il>

Hi Amir, 

have you any news about bug 2027 and 2028 (SDP)?

Take care and keep in touch.
Thamk you very much,
Andrea


On Apr 21, 2010 02:01 PM, Amir Vadai <amirv@mellanox.co.il> wrote:

> Hi Andrea,
> 
> I am preparing the fix right now.
> 
> - Amir
> 
> On 04/20/2010 04:53 PM, Andrea Gozzelino wrote:
> > Hi Amir,
> >
> > have you any news about bugs 2027 "SDP not respecting # SGEs as
> > reported
> > from HW" and 2028 "SDP should support fastreg mrs"?
> >
> > When those bugs will be fixed, I will test the NE020 cards
> > performance
> > with SDP protocol and I will compare SDP and TCP.
> >
> > Keep in touch,
> >
> > Andrea Gozzelino
> >
> > INFN - Laboratori Nazionali di Legnaro	(LNL)
> > Viale dell'Universita' 2
> > I-35020 - Legnaro (PD)- ITALIA
> > Tel: +39 049 8068346
> > Fax: +39 049 641925
> > Mail: andrea.gozzelino@lnl.infn.it
> >
> >
> >
> >
> >
> >
> >
> > On Apr 15, 2010 10:38 AM, Amir Vadai <amirv@mellanox.co.il> wrote:
> >
> >   
> >> It should be a simple fix and I plan to do soon - just add yourself
> >> as
> >> CC in bugzilla  - that way I won't forget to notify you.
> >>
> >> - amir
> >>
> >> On 04/15/2010 10:07 AM, Andrea Gozzelino wrote:
> >>     
> >>> On Apr 15, 2010 08:24 AM, Amir Vadai <amirv@mellanox.co.il> wrote:
> >>>
> >>>   
> >>>       
> >>>> I hope to have a fix next week for the first one.
> >>>>
> >>>> Thanks,
> >>>> Amir
> >>>>
> >>>> On 04/14/2010 09:48 PM, Tung, Chien Tin wrote:
> >>>>     
> >>>>         
> >>>>>> Tung, Chien Tin wrote:
> >>>>>>     
> >>>>>>         
> >>>>>>             
> >>>>>>>> One more thing - Please open a bug regarding the num_sge
> >>>>>>>> limitation at:
> >>>>>>>> https://bugs.openfabrics.org/
> >>>>>>>>
> >>>>>>>>         
> >>>>>>>>             
> >>>>>>>>                 
> >>>>>>> Done, Bug 2027.
> >>>>>>>
> >>>>>>> Chien
> >>>>>>>
> >>>>>>>       
> >>>>>>>           
> >>>>>>>               
> >>>>>> And 2028 opened to request fastreg support.
> >>>>>>
> >>>>>>     
> >>>>>>         
> >>>>>>             
> >>>>> I am open to test fixes for these two bugs.
> >>>>>
> >>>>> Chien
> >>>>>
> >>>>>   
> >>>>>       
> >>>>>           
> >>>>     
> >>>>         
> >>> Hi Amir, 
> >>> Hi Chien,
> >>>
> >>> I understand that the bug 2027 could be solved next week, so I
> >>> will
> >>> test
> >>> SDP protocol performance on NE020 cards.
> >>> Is it correct? 
> >>> If yes, could you point out the code modifies?
> >>>
> >>> Keep in touch and take care.
> >>> Regards,
> >>> Andrea
> >>>
> >>>
> >>> Andrea Gozzelino
> >>>
> >>> INFN - Laboratori Nazionali di Legnaro	(LNL)
> >>> Viale dell'Universita' 2
> >>> I-35020 - Legnaro (PD)- ITALIA
> >>> Tel: +39 049 8068346
> >>> Fax: +39 049 641925
> >>> Mail: andrea.gozzelino@lnl.infn.it			
> >>>
> >>>
> >>>   
> >>>       
> >>     
> >
> > 		
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > linux-rdma" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >   
> 

^ permalink raw reply

* Re: [PATCH v3 1/2] libibnetdisc: Convert to a multi-smp algorithm
From: Ira Weiny @ 2010-04-23  2:12 UTC (permalink / raw)
  To: Sasha Khapyorsky
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Hal Rosenstock
In-Reply-To: <20100413132531.GI10830@me>

On Tue, 13 Apr 2010 16:25:31 +0300
Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:

> On 13:46 Tue 13 Apr     , Sasha Khapyorsky wrote:
> > 
> > However see some comments and questions below.
> 
> Another thought. What about API like:
> 
> 	ibnd_discover_fabric(cosnt char *ca_name, unsigned port_num,
> 			     struct ibnd_config *cfg);
> 
> So libibnetdisc will be responsible for opening and closing its own port
> and in this way will be fully reenterable?

I think we need this...  But this alone will not fix ibnetdisc...

I found out why "iblinkinfo -S" is hanging for me.  The new algorithm has the
library using both libibmad and libibumad calls simultaneously.

These libraries were not designed to be used this way.  Therefore, I don't
think there is any direct bug in those layers.  However, this is why I thought
it was better for ibnetdisc to sit on top of ibmad and not use the umad layer.

Anyway, I think it is a mistake to pass an ibmad_port construct directly to
this library now.  Here is why.

umad_recv in query_smp.c and ib_resolve_self_via in ibnetdisc.c conflicted.
The underlying call to _do_madrpc was discarding the response MAD which
query_smp.c:umad_recv was expecting.  This would result in a hang forever.[*]
This will not happen unless you use -S because -S causes ibnetdisc.c to call
ib_resolve_self_via to determine the "drslid".  Basically ibnetdisc needs to
open an ibmad_port for any libibmad call and a fd via umad_open_port for the
parallel stuff.[#]

If we change the API to specify the ca name and port then the library can open
2 ports (or as many as it wants) and use them appropriately.  I think this is
the only solution which does not involve fixing libibmad.

So what about something like this:

   int ibnd_discover_fabric(ibnd_fabric_t **fabric,
			    cosnt char *ca_name,  <== could we even default this?
			    struct ibnd_config *cfg);

I don't mind the ibnd_config_t struct but I don't think it should be visible
to the user.  Make it opaque and use "set" functions.  Something like.

ibnd_fabric_t *fabric;
ibnd_config_t cfg;
ib_portid_t * from;

ibnd_set_hops(&cfg, hops);         <== default -1
ibnd_set_port_num(&cfg, port_num); <== default 1
ibnd_set_max_smps(&cfg, max_smps); <== default 2
ibnd_set_from_node(&cfg, from);    <== default NULL

if (ibnd_discover_fabric(&fabric, "foo", &cfg)) {  <== anything not in cfg is
                                                       defaulted here
   fprintf(stderr, "Wow it failed\n");
}

This allows us to change ibnd_config structure any time we want without
affecting the API.  I don't think the "pad" you used is a good idea.

Also since we are breaking the API we might as well return the fabric as a
parameter and have an error code.  But I could go either way on this one.

Ira


[*] query_smp.c probably should have it's own timeout here but we can discuss
later.

[#] What sucks about this is that libibmad already has the functionality to
open the umad port and configure it (50 line function).  Now we will be
duplicating this functionality.  I still maintain that making libibmad thread
safe would be very beneficial, if not necessary.  For example handling RMPP
and redirection is already in libibmad.  Why make people reimplement it on
their own if they want concurrent execution of any kind?  <sigh>


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: opensm with multiple IB subnets
From: Ken Teague @ 2010-04-22 19:47 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4BCF1215.7070601-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

Thank you, Yevgeny.  I've modified my init script accordingly and
restarted opensm.  For anyone else that may find it useful, here is
the "start" portion of my init script:

start () {
    echo -n "Starting opensm: "
    for GUID in `${IBSTAT_BIN} ${IBSTAT_ARG}`
    do
        export OSM_TMP_DIR="/tmp/opensm/${GUID}"
        export OSM_CACHE_DIR="/var/cache/opensm/${GUID}"
        export OSM_LOG_DIR="/var/log/opensm/${GUID}"
        [ -d ${OSM_TMP_DIR} ] || mkdir -p ${OSM_TMP_DIR}
        [ -d ${OSM_CACHE_DIR} ] || mkdir -p ${OSM_CACHE_DIR}
        [ -d ${OSM_LOG_DIR} ] || mkdir ${OSM_LOG_DIR}
        ${OPENSM_BIN} --log_file ${OSM_LOG_DIR}/opensm.log
${OPENSM_ARG} ${GUID} > /dev/null
    done
    if [[ $RETVAL -eq 0 ]]; then
        touch /var/lock/subsys/opensm
        success
    else
        failure
    fi
    echo
}


On Wed, Apr 21, 2010 at 7:56 AM, Yevgeny Kliteynik
<kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> Ken,
>
> On 4/21/2010 3:07 AM, Ken Teague wrote:
>>
>> On Tue, Apr 20, 2010 at 2:13 PM, Ken Teague<kteague-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org>  wrote:
>>>
>>> I have a 17-node cluster and each node has a single IB card that has
>>> 2x IB ports (ib0 and ib1).....
>>
>> After doing a little more research, I confirmed that my understanding
>> of the manual page is correct.  To run opensm for each GUID, I
>> modified my init script to run a for loop based on the information
>> returned from "ibstat -p".
>>
>>
>> I added this near the beginning of the script where the other
>> environment variables are located:
>> <snip>
>> OFA_HOME="/usr/local/sbin"
>> IBSTAT_BIN="${OFA_HOME}/ibstat"
>> IBSTAT_ARG="-p"
>> OPENSM_BIN="${OFA_HOME}/opensm"
>> OPENSM_ARG="-B -g"
>> <snip>
>>
>>
>> I replaced the single line which started opensm with this for loop:
>> for i in `${IBSTAT_BIN} ${IBSTAT_ARG}`
>> do
>>     ${OPENSM_BIN} ${OPENSM_ARG} ${i}
>> done
>> <snip>
>>
>> If anyone has a more elegant way to handle this, I'm open to
>> suggestions.  Many thanks.
>
> OpenSM dumps various files to /var/log and /var/cache/opensm folders.
> When you have more than one OpenSM process, they will all dump the
> same files, which is probably not a good idea.
>
> To change the output directories, set the OSM_TMP_DIR and
> OSM_CACHE_DIR env. variables to some other place.
> In addition, you need to make sure that each SM instance
> prints its log in a different place. You need to do
> something like this:
>
> foreach guid in guid_list
>        export OSM_TMP_DIR=/tmp/osm_dump_dir${guid}
>        export OSM_CACHE_DIR=/tmp/osm_dump_dir${guid}
>        opensm --log_file /tmp/osm_dump_dir${guid}/osm.log -g ${guid} [your
> other options]
>
> -- Yevgeny
>
>> Ken
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/4] ib_core: implement XRC RCV qp's
From: Roland Dreier @ 2010-04-22 18:03 UTC (permalink / raw)
  To: Jack Morgenstein
  Cc: rolandd-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <201002281102.21207.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

So I'm looking at merging this, and I'm wondering about one thing.
Seems like it's just a mistake but I want to make sure I understand
properly:

 > @@ -1078,6 +1079,7 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
 >  		goto err_put;
 >  	}
 >  
 > +	attr.create_flags  = 0;
 >  	attr.event_handler = ib_uverbs_qp_event_handler;

This looks redundant, because this function already sets create_flags to
0 a few lines later.  So I think this line is just a remnant from some
other patch.

But then ib_uverbs_create_xrc_rcv_qp() doesn't set create_flags before
the call to device->create_xrc_rcv_qp() -- which maybe is OK, since that
function is not going to look at create_flags right now, but for the
future we should probably set it to 0, right?

Also it's not 100% clear to me why the low-level driver needs a special
create_xrc_rcv_qp method, rather than having uverbs just call create_qp
with the right parameters.  But I haven't looked throught carefully to
see the differences between eg query_xrc_rcv_qp() vs query_qp() methods.

 - R.
-- 
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: opensm with multiple IB subnets
From: Yevgeny Kliteynik @ 2010-04-22 14:38 UTC (permalink / raw)
  To: Justin Clift; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4BD05BD2.9000600-oNuxUQfTmABg9hUCZPvPmw@public.gmane.org>

On 4/22/2010 5:23 PM, Justin Clift wrote:
> We should really put this on the wiki. :)

Good idea :)

-- Yevgeny
  
>
> On 04/22/2010 12:56 AM, Yevgeny Kliteynik wrote:
> <snip>
>> OpenSM dumps various files to /var/log and /var/cache/opensm folders.
>> When you have more than one OpenSM process, they will all dump the
>> same files, which is probably not a good idea.
>>
>> To change the output directories, set the OSM_TMP_DIR and
>> OSM_CACHE_DIR env. variables to some other place.
>> In addition, you need to make sure that each SM instance
>> prints its log in a different place. You need to do
>> something like this:
>>
>> foreach guid in guid_list
>> export OSM_TMP_DIR=/tmp/osm_dump_dir${guid}
>> export OSM_CACHE_DIR=/tmp/osm_dump_dir${guid}
>> opensm --log_file /tmp/osm_dump_dir${guid}/osm.log -g ${guid} [your
>> other options]
>>
>> -- Yevgeny
> <snip>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: opensm with multiple IB subnets
From: Justin Clift @ 2010-04-22 14:23 UTC (permalink / raw)
  To: Yevgeny Kliteynik; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4BCF1215.7070601-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

We should really put this on the wiki. :)


On 04/22/2010 12:56 AM, Yevgeny Kliteynik wrote:
<snip>
> OpenSM dumps various files to /var/log and /var/cache/opensm folders.
> When you have more than one OpenSM process, they will all dump the
> same files, which is probably not a good idea.
>
> To change the output directories, set the OSM_TMP_DIR and
> OSM_CACHE_DIR env. variables to some other place.
> In addition, you need to make sure that each SM instance
> prints its log in a different place. You need to do
> something like this:
>
> foreach guid in guid_list
> export OSM_TMP_DIR=/tmp/osm_dump_dir${guid}
> export OSM_CACHE_DIR=/tmp/osm_dump_dir${guid}
> opensm --log_file /tmp/osm_dump_dir${guid}/osm.log -g ${guid} [your
> other options]
>
> -- Yevgeny
<snip>

-- 
Salasaga  -  Open Source eLearning IDE
               http://www.salasaga.org
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH] RDMA/nes: make nesadapter->phy_lock usage consistent
From: Tung, Chien Tin @ 2010-04-22 13:50 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <adafx2oo1gc.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>

>By the way, any problem with me merging the following trivial patch for
>2.6.35?

Please do.  Thank you.

Chien

>RDMA/nes: Make unnecessarily global functions static
>
>This allows the compiler to do a bit better; on my x86-64 build:
>
>add/remove: 0/2 grow/shrink: 1/0 up/down: 2288/-2365 (-77)
>function                                     old     new   delta
>nes_init_phy                                 273    2561   +2288
>nes_init_1g_phy                              469       -    -469
>nes_init_2025_phy                           1896       -   -1896
>
>Signed-off-by: Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] ummunotify: Userspace support for MMU notifications V2
From: Eric B Munson @ 2010-04-22 13:38 UTC (permalink / raw)
  To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, rolandd-FYB4Gu1CFyUAvxtiuMwx3w,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ, mingo-X9Un+BFzKDI,
	pavel-+ZI9xUNit7I, jsquyres-FYB4Gu1CFyUAvxtiuMwx3w,
	randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA, Eric B Munson

From: Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>

As discussed in <http://article.gmane.org/gmane.linux.drivers.openib/61925>
and follow-up messages, libraries using RDMA would like to track
precisely when application code changes memory mapping via free(),
munmap(), etc.  Current pure-userspace solutions using malloc hooks
and other tricks are not robust, and the feeling among experts is that
the issue is unfixable without kernel help.

We solve this not by implementing the full API proposed in the email
linked above but rather with a simpler and more generic interface,
which may be useful in other contexts.  Specifically, we implement a
new character device driver, ummunotify, that creates a /dev/ummunotify
node.  A userspace process can open this node read-only and use the fd
as follows:

 1. ioctl() to register/unregister an address range to watch in the
    kernel (cf struct ummunotify_register_ioctl in <linux/ummunotify.h>).

 2. read() to retrieve events generated when a mapping in a watched
    address range is invalidated (cf struct ummunotify_event in
    <linux/ummunotify.h>).  select()/poll()/epoll() and SIGIO are
    handled for this IO.

 3. mmap() one page at offset 0 to map a kernel page that contains a
    generation counter that is incremented each time an event is
    generated.  This allows userspace to have a fast path that checks
    that no events have occurred without a system call.

Thanks to Jason Gunthorpe <jgunthorpe <at> obsidianresearch.com> for
suggestions on the interface design.  Also thanks to Jeff Squyres
<jsquyres <at> cisco.com> for prototyping support for this in Open MPI, which
helped find several bugs during development.

Signed-off-by: Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Eric B Munson <ebmunson-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

---

Changes from V1:
- Update Kbuild to handle test program build properly
- Update documentation to cover questions not addressed in previous
  thread
---
 Documentation/Makefile                  |    3 +-
 Documentation/ummunotify/Makefile       |    7 +
 Documentation/ummunotify/ummunotify.txt |  162 +++++++++
 Documentation/ummunotify/umn-test.c     |  200 +++++++++++
 drivers/char/Kconfig                    |   12 +
 drivers/char/Makefile                   |    1 +
 drivers/char/ummunotify.c               |  567 +++++++++++++++++++++++++++++++
 include/linux/Kbuild                    |    1 +
 include/linux/ummunotify.h              |  121 +++++++
 9 files changed, 1073 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/ummunotify/Makefile
 create mode 100644 Documentation/ummunotify/ummunotify.txt
 create mode 100644 Documentation/ummunotify/umn-test.c
 create mode 100644 drivers/char/ummunotify.c
 create mode 100644 include/linux/ummunotify.h

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6fc7ea1..27ba76a 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -1,3 +1,4 @@
 obj-m := DocBook/ accounting/ auxdisplay/ connector/ \
 	filesystems/ filesystems/configfs/ ia64/ laptops/ networking/ \
-	pcmcia/ spi/ timers/ video4linux/ vm/ watchdog/src/
+	pcmcia/ spi/ timers/ video4linux/ vm/ ummunotify/ \
+	watchdog/src/
diff --git a/Documentation/ummunotify/Makefile b/Documentation/ummunotify/Makefile
new file mode 100644
index 0000000..89f31a0
--- /dev/null
+++ b/Documentation/ummunotify/Makefile
@@ -0,0 +1,7 @@
+# List of programs to build
+hostprogs-y := umn-test
+
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
+
+HOSTCFLAGS_umn-test.o += -I$(objtree)/usr/include
diff --git a/Documentation/ummunotify/ummunotify.txt b/Documentation/ummunotify/ummunotify.txt
new file mode 100644
index 0000000..d6c2ccc
--- /dev/null
+++ b/Documentation/ummunotify/ummunotify.txt
@@ -0,0 +1,162 @@
+UMMUNOTIFY
+
+  Ummunotify relays MMU notifier events to userspace.  This is useful
+  for libraries that need to track the memory mapping of applications;
+  for example, MPI implementations using RDMA want to cache memory
+  registrations for performance, but tracking all possible crazy cases
+  such as when, say, the FORTRAN runtime frees memory is impossible
+  without kernel help.
+
+Basic Model
+
+  A userspace process uses it by opening /dev/ummunotify, which
+  returns a file descriptor.  Interest in address ranges is registered
+  using ioctl() and MMU notifier events are retrieved using read(), as
+  described in more detail below.  Userspace can register multiple
+  address ranges to watch, and can unregister individual ranges.
+
+  Userspace can also mmap() a single read-only page at offset 0 on
+  this file descriptor.  This page contains (at offest 0) a single
+  64-bit generation counter that the kernel increments each time an
+  MMU notifier event occurs.  Userspace can use this to very quickly
+  check if there are any events to retrieve without needing to do a
+  system call.
+
+Control
+
+  To start using ummunotify, a process opens /dev/ummunotify in
+  read-only mode.  This will attach to current->mm because the current
+  consumers of this functionality do all monitoring in the process
+  being monitored.  It is currently not possible to use this device to
+  monitor other processes.  Control from userspace is done via ioctl().
+  An ioctl was chosen because the number of files required to register
+  a new address range in sysfs would be unwieldy and new procfs entries
+  are discouraged.  The defined ioctls are:
+
+    UMMUNOTIFY_EXCHANGE_FEATURES: This ioctl takes a single 32-bit
+      word of feature flags as input, and the kernel updates the
+      features flags word to contain only features requested by
+      userspace and also supported by the kernel.
+
+      This ioctl is only included for forward compatibility; no
+      feature flags are currently defined, and the kernel will simply
+      update any requested feature mask to 0.  The kernel will always
+      default to a feature mask of 0 if this ioctl is not used, so
+      current userspace does not need to perform this ioctl.
+
+    UMMUNOTIFY_REGISTER_REGION: Userspace uses this ioctl to tell the
+      kernel to start delivering events for an address range.  The
+      range is described using struct ummunotify_register_ioctl:
+
+	struct ummunotify_register_ioctl {
+		__u64	start;
+		__u64	end;
+		__u64	user_cookie;
+		__u32	flags;
+		__u32	reserved;
+	};
+
+      start and end give the range of userspace virtual addresses;
+      start is included in the range and end is not, so an example of
+      a 4 KB range would be start=0x1000, end=0x2000.
+
+      user_cookie is an opaque 64-bit quantity that is returned by the
+      kernel in events involving the range, and used by userspace to
+      stop watching the range.  Each registered address range must
+      have a distinct user_cookie.
+
+      It is fine with the kernel if userspace registers multiple
+      overlapping or even duplicate address ranges, as long as a
+      different cookie is used for each registration.
+
+      flags and reserved are included for forward compatibility;
+      userspace should simply set them to 0 for the current interface.
+
+    UMMUNOTIFY_UNREGISTER_REGION: Userspace passes in the 64-bit
+      user_cookie used to register a range to tell the kernel to stop
+      watching an address range.  Once this ioctl completes, the
+      kernel will not deliver any further events for the range that is
+      unregistered.
+
+Events
+
+  When an event occurs that invalidates some of a process's memory
+  mapping in an address range being watched, ummunotify queues an
+  event report for that address range.  If more than one event
+  invalidates parts of the same address range before userspace
+  retrieves the queued report, then further reports for the same range
+  will not be queued -- when userspace does read the queue, only a
+  single report for a given range will be returned.
+
+  If multiple ranges being watched are invalidated by a single event
+  (which is especially likely if userspace registers overlapping
+  ranges), then an event report structure will be queued for each
+  address range registration.
+
+  It is possible, if a large enough number of overlapping ranges are
+  registered and the list of invalidated events is busy enough and
+  ignored long enough, to cause the kernel to run out of memory.
+  Because this situation is unlikely to occur, the event queue size
+  is not bounded in order to avoid dropping events if the queue grows
+  beyond set bounds.
+
+  Userspace retrieves queued events via read() on the ummunotify file
+  descriptor; a buffer that is at least as big as struct
+  ummunotify_event should be used to retrieve event reports, and if a
+  larger buffer is passed to read(), multiple reports will be returned
+  (if available).
+
+  If the ummunotify file descriptor is in blocking mode, a read() call
+  will wait for an event report to be available.  Userspace may also
+  set the ummunotify file descriptor to non-blocking mode and use all
+  standard ways of waiting for data to be available on the ummunotify
+  file descriptor, including epoll/poll()/select() and SIGIO.
+
+  The format of event reports is:
+
+	struct ummunotify_event {
+		__u32	type;
+		__u32	flags;
+		__u64	hint_start;
+		__u64	hint_end;
+		__u64	user_cookie_counter;
+	};
+
+  where the type field is either UMMUNOTIFY_EVENT_TYPE_INVAL or
+  UMMUNOTIFY_EVENT_TYPE_LAST.  Events of type INVAL describe
+  invalidation events as follows: user_cookie_counter contains the
+  cookie passed in when userspace registered the range that the event
+  is for.  hint_start and hint_end contain the start address and end
+  address that were invalidated.
+
+  The flags word contains bit flags, with only UMMUNOTIFY_EVENT_FLAG_HINT
+  defined at the moment.  If HINT is set, then the invalidation event
+  invalidated less than the full address range and the kernel returns
+  the exact range invalidated; if HINT is not sent then hint_start and
+  hint_end are set to the original range registered by userspace.
+  (HINT will not be set if, for example, multiple events invalidated
+  disjoint parts of the range and so a single start/end pair cannot
+  represent the parts of the range that were invalidated)
+
+  If the event type is LAST, then the read operation has emptied the
+  list of invalidated regions, and the flags, hint_start and hint_end
+  fields are not used.  user_cookie_counter holds the value of the
+  kernel's generation counter (see below of more details) when the
+  empty list occurred.
+
+Generation Count
+
+  Userspace may mmap() a page on a ummunotify file descriptor via
+
+	mmap(NULL, sizeof (__u64), PROT_READ, MAP_SHARED, ummunotify_fd, 0);
+
+  to get a read-only mapping of the kernel's 64-bit generation
+  counter.  The kernel will increment this generation counter each
+  time an event report is queued.
+
+  Userspace can use the generation counter as a quick check to avoid
+  system calls; if the value read from the mapped kernel counter is
+  still equal to the value returned in user_cookie_counter for the
+  most recent LAST event retrieved, then no further events have been
+  queued and there is no need to try a read() on the ummunotify file
+  descriptor.
diff --git a/Documentation/ummunotify/umn-test.c b/Documentation/ummunotify/umn-test.c
new file mode 100644
index 0000000..143db2c
--- /dev/null
+++ b/Documentation/ummunotify/umn-test.c
@@ -0,0 +1,200 @@
+/*
+ * Copyright (c) 2009 Cisco Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdint.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <linux/ummunotify.h>
+
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/ioctl.h>
+
+#define UMN_TEST_COOKIE 123
+
+static int		umn_fd;
+static volatile __u64  *umn_counter;
+
+static int umn_init(void)
+{
+	__u32 flags;
+
+	umn_fd = open("/dev/ummunotify", O_RDONLY);
+	if (umn_fd < 0) {
+		perror("open");
+		return 1;
+	}
+
+	if (ioctl(umn_fd, UMMUNOTIFY_EXCHANGE_FEATURES, &flags)) {
+		perror("exchange ioctl");
+		return 1;
+	}
+
+	printf("kernel feature flags: 0x%08x\n", flags);
+
+	umn_counter = mmap(NULL, sizeof *umn_counter, PROT_READ,
+			   MAP_SHARED, umn_fd, 0);
+	if (umn_counter == MAP_FAILED) {
+		perror("mmap");
+		return 1;
+	}
+
+	return 0;
+}
+
+static int umn_register(void *buf, size_t size, __u64 cookie)
+{
+	struct ummunotify_register_ioctl r = {
+		.start		= (unsigned long) buf,
+		.end		= (unsigned long) buf + size,
+		.user_cookie	= cookie,
+	};
+
+	if (ioctl(umn_fd, UMMUNOTIFY_REGISTER_REGION, &r)) {
+		perror("register ioctl");
+		return 1;
+	}
+
+	return 0;
+}
+
+static int umn_unregister(__u64 cookie)
+{
+	if (ioctl(umn_fd, UMMUNOTIFY_UNREGISTER_REGION, &cookie)) {
+		perror("unregister ioctl");
+		return 1;
+	}
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int			page_size;
+	__u64			old_counter;
+	void		       *t;
+	int			got_it;
+
+	if (umn_init())
+		return 1;
+
+	printf("\n");
+
+	old_counter = *umn_counter;
+	if (old_counter != 0) {
+		fprintf(stderr, "counter = %lld (expected 0)\n", old_counter);
+		return 1;
+	}
+
+	page_size = sysconf(_SC_PAGESIZE);
+	t = mmap(NULL, 3 * page_size, PROT_READ,
+		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
+
+	if (umn_register(t, 3 * page_size, UMN_TEST_COOKIE))
+		return 1;
+
+	munmap(t + page_size, page_size);
+
+	old_counter = *umn_counter;
+	if (old_counter != 1) {
+		fprintf(stderr, "counter = %lld (expected 1)\n", old_counter);
+		return 1;
+	}
+
+	got_it = 0;
+	while (1) {
+		struct ummunotify_event	ev;
+		int			len;
+
+		len = read(umn_fd, &ev, sizeof ev);
+		if (len < 0) {
+			perror("read event");
+			return 1;
+		}
+		if (len != sizeof ev) {
+			fprintf(stderr, "Read gave %d bytes (!= event size %zd)\n",
+				len, sizeof ev);
+			return 1;
+		}
+
+		switch (ev.type) {
+		case UMMUNOTIFY_EVENT_TYPE_INVAL:
+			if (got_it) {
+				fprintf(stderr, "Extra invalidate event\n");
+				return 1;
+			}
+			if (ev.user_cookie_counter != UMN_TEST_COOKIE) {
+				fprintf(stderr, "Invalidate event for cookie %lld (expected %d)\n",
+					ev.user_cookie_counter,
+					UMN_TEST_COOKIE);
+				return 1;
+			}
+
+			printf("Invalidate event:\tcookie %lld\n",
+			       ev.user_cookie_counter);
+
+			if (!(ev.flags & UMMUNOTIFY_EVENT_FLAG_HINT)) {
+				fprintf(stderr, "Hint flag not set\n");
+				return 1;
+			}
+
+			if (ev.hint_start != (uintptr_t) t + page_size ||
+			    ev.hint_end != (uintptr_t) t + page_size * 2) {
+				fprintf(stderr, "Got hint %llx..%llx, expected %p..%p\n",
+					ev.hint_start, ev.hint_end,
+					t + page_size, t + page_size * 2);
+				return 1;
+			}
+
+			printf("\t\t\thint %llx...%llx\n",
+			       ev.hint_start, ev.hint_end);
+
+			got_it = 1;
+			break;
+
+		case UMMUNOTIFY_EVENT_TYPE_LAST:
+			if (!got_it) {
+				fprintf(stderr, "Last event without invalidate event\n");
+				return 1;
+			}
+
+			printf("Empty event:\t\tcounter %lld\n",
+			       ev.user_cookie_counter);
+			goto done;
+
+		default:
+			fprintf(stderr, "unknown event type %d\n",
+				ev.type);
+			return 1;
+		}
+	}
+
+done:
+	umn_unregister(123);
+	munmap(t, page_size);
+
+	old_counter = *umn_counter;
+	if (old_counter != 1) {
+		fprintf(stderr, "counter = %lld (expected 1)\n", old_counter);
+		return 1;
+	}
+
+	return 0;
+}
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 3141dd3..cf26019 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -1111,6 +1111,18 @@ config DEVPORT
 	depends on ISA || PCI
 	default y
 
+config UMMUNOTIFY
+       tristate "Userspace MMU notifications"
+       select MMU_NOTIFIER
+       help
+         The ummunotify (userspace MMU notification) driver creates a
+         character device that can be used by userspace libraries to
+         get notifications when an application's memory mapping
+         changed.  This is used, for example, by RDMA libraries to
+         improve the reliability of memory registration caching, since
+         the kernel's MMU notifications can be used to know precisely
+         when to shoot down a cached registration.
+
 source "drivers/s390/char/Kconfig"
 
 endmenu
diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index f957edf..521e5de 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_NSC_GPIO)		+= nsc_gpio.o
 obj-$(CONFIG_CS5535_GPIO)	+= cs5535_gpio.o
 obj-$(CONFIG_GPIO_TB0219)	+= tb0219.o
 obj-$(CONFIG_TELCLOCK)		+= tlclk.o
+obj-$(CONFIG_UMMUNOTIFY)	+= ummunotify.o
 
 obj-$(CONFIG_MWAVE)		+= mwave/
 obj-$(CONFIG_AGP)		+= agp/
diff --git a/drivers/char/ummunotify.c b/drivers/char/ummunotify.c
new file mode 100644
index 0000000..c14df3f
--- /dev/null
+++ b/drivers/char/ummunotify.c
@@ -0,0 +1,567 @@
+/*
+ * Copyright (c) 2009 Cisco Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/mmu_notifier.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+#include <linux/rbtree.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/uaccess.h>
+#include <linux/ummunotify.h>
+
+#include <asm/cacheflush.h>
+
+MODULE_AUTHOR("Roland Dreier");
+MODULE_DESCRIPTION("Userspace MMU notifiers");
+MODULE_LICENSE("GPL v2");
+
+/*
+ * Information about an address range userspace has asked us to watch.
+ *
+ * user_cookie: Opaque cookie given to us when userspace registers the
+ *   address range.
+ *
+ * start, end: Address range; start is inclusive, end is exclusive.
+ *
+ * hint_start, hint_end: If a single MMU notification event
+ *   invalidates the address range, we hold the actual range of
+ *   addresses that were invalidated (and set UMMUNOTIFY_FLAG_HINT).
+ *   If another event hits this range before userspace reads the
+ *   event, we give up and don't try to keep track of which subsets
+ *   got invalidated.
+ *
+ * flags: Holds the INVALID flag for ranges that are on the invalid
+ *   list and/or the HINT flag for ranges where the hint range holds
+ *   good information.
+ *
+ * node: Used to put the range into an rbtree we use to be able to
+ *   scan address ranges in order.
+ *
+ * list: Used to put the range on the invalid list when an MMU
+ *   notification event hits the range.
+ */
+enum {
+	UMMUNOTIFY_FLAG_INVALID	= 1,
+	UMMUNOTIFY_FLAG_HINT	= 2,
+};
+
+struct ummunotify_reg {
+	u64			user_cookie;
+	unsigned long		start;
+	unsigned long		end;
+	unsigned long		hint_start;
+	unsigned long		hint_end;
+	unsigned long		flags;
+	struct rb_node		node;
+	struct list_head	list;
+};
+
+/*
+ * Context attached to each file that userspace opens.
+ *
+ * mmu_notifier: MMU notifier registered for this context.
+ *
+ * mm: mm_struct for process that created the context; we use this to
+ *   hold a reference to the mm to make sure it doesn't go away until
+ *   we're done with it.
+ *
+ * reg_tree: RB tree of address ranges being watched, sorted by start
+ *   address.
+ *
+ * invalid_list: List of address ranges that have been invalidated by
+ *   MMU notification events; as userspace reads events, the address
+ *   range corresponding to the event is removed from the list.
+ *
+ * counter: Page that can be mapped read-only by userspace, which
+ *   holds a generation count that is incremented each time an event
+ *   occurs.
+ *
+ * lock: Spinlock used to protect all context.
+ *
+ * read_wait: Wait queue used to wait for data to become available in
+ *   blocking read()s.
+ *
+ * async_queue: Used to implement fasync().
+ *
+ * need_empty: Set when userspace reads an invalidation event, so that
+ *   read() knows it must generate an "empty" event when userspace
+ *   drains the invalid_list.
+ *
+ * used: Set after userspace does anything with the file, so that the
+ *   "exchange flags" ioctl() knows it's too late to change anything.
+ */
+struct ummunotify_file {
+	struct mmu_notifier	mmu_notifier;
+	struct mm_struct       *mm;
+	struct rb_root		reg_tree;
+	struct list_head	invalid_list;
+	u64		       *counter;
+	spinlock_t		lock;
+	wait_queue_head_t	read_wait;
+	struct fasync_struct   *async_queue;
+	int			need_empty;
+	int			used;
+};
+
+static void ummunotify_handle_notify(struct mmu_notifier *mn,
+				     unsigned long start, unsigned long end)
+{
+	struct ummunotify_file *priv =
+		container_of(mn, struct ummunotify_file, mmu_notifier);
+	struct rb_node *n;
+	struct ummunotify_reg *reg;
+	unsigned long flags;
+	int hit = 0;
+
+	spin_lock_irqsave(&priv->lock, flags);
+
+	for (n = rb_first(&priv->reg_tree); n; n = rb_next(n)) {
+		reg = rb_entry(n, struct ummunotify_reg, node);
+
+		/*
+		 * Ranges overlap if they're not disjoint; and they're
+		 * disjoint if the end of one is before the start of
+		 * the other one.  So if both disjointness comparisons
+		 * fail then the ranges overlap.
+		 *
+		 * Since we keep the tree of regions we're watching
+		 * sorted by start address, we can end this loop as
+		 * soon as we hit a region that starts past the end of
+		 * the range for the event we're handling.
+		 */
+		if (reg->start >= end)
+			break;
+
+		/*
+		 * Just go to the next region if the start of the
+		 * range is after the end of the region -- there
+		 * might still be more overlapping ranges that have a
+		 * greater start.
+		 */
+		if (start >= reg->end)
+			continue;
+
+		hit = 1;
+
+		if (test_and_set_bit(UMMUNOTIFY_FLAG_INVALID, &reg->flags)) {
+			/* Already on invalid list */
+			clear_bit(UMMUNOTIFY_FLAG_HINT, &reg->flags);
+		} else {
+			list_add_tail(&reg->list, &priv->invalid_list);
+			set_bit(UMMUNOTIFY_FLAG_HINT, &reg->flags);
+			reg->hint_start = start;
+			reg->hint_end   = end;
+		}
+	}
+
+	if (hit) {
+		++(*priv->counter);
+		flush_dcache_page(virt_to_page(priv->counter));
+		wake_up_interruptible(&priv->read_wait);
+		kill_fasync(&priv->async_queue, SIGIO, POLL_IN);
+	}
+
+	spin_unlock_irqrestore(&priv->lock, flags);
+}
+
+static void ummunotify_invalidate_page(struct mmu_notifier *mn,
+				       struct mm_struct *mm,
+				       unsigned long addr)
+{
+	ummunotify_handle_notify(mn, addr, addr + PAGE_SIZE);
+}
+
+static void ummunotify_invalidate_range_start(struct mmu_notifier *mn,
+					      struct mm_struct *mm,
+					      unsigned long start,
+					      unsigned long end)
+{
+	ummunotify_handle_notify(mn, start, end);
+}
+
+static const struct mmu_notifier_ops ummunotify_mmu_notifier_ops = {
+	.invalidate_page	= ummunotify_invalidate_page,
+	.invalidate_range_start	= ummunotify_invalidate_range_start,
+};
+
+static int ummunotify_open(struct inode *inode, struct file *filp)
+{
+	struct ummunotify_file *priv;
+	int ret;
+
+	if (filp->f_mode & FMODE_WRITE)
+		return -EINVAL;
+
+	priv = kmalloc(sizeof *priv, GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->counter = (void *) get_zeroed_page(GFP_KERNEL);
+	if (!priv->counter) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	priv->reg_tree = RB_ROOT;
+	INIT_LIST_HEAD(&priv->invalid_list);
+	spin_lock_init(&priv->lock);
+	init_waitqueue_head(&priv->read_wait);
+	priv->async_queue = NULL;
+	priv->need_empty  = 0;
+	priv->used	  = 0;
+
+	priv->mmu_notifier.ops = &ummunotify_mmu_notifier_ops;
+	/*
+	 * Register notifier last, since notifications can occur as
+	 * soon as we register....
+	 */
+	ret = mmu_notifier_register(&priv->mmu_notifier, current->mm);
+	if (ret)
+		goto err_page;
+
+	priv->mm = current->mm;
+	atomic_inc(&priv->mm->mm_count);
+
+	filp->private_data = priv;
+
+	return 0;
+
+err_page:
+	free_page((unsigned long) priv->counter);
+
+err:
+	kfree(priv);
+	return ret;
+}
+
+static int ummunotify_close(struct inode *inode, struct file *filp)
+{
+	struct ummunotify_file *priv = filp->private_data;
+	struct rb_node *n;
+	struct ummunotify_reg *reg;
+
+	mmu_notifier_unregister(&priv->mmu_notifier, priv->mm);
+	mmdrop(priv->mm);
+	free_page((unsigned long) priv->counter);
+
+	for (n = rb_first(&priv->reg_tree); n; n = rb_next(n)) {
+		reg = rb_entry(n, struct ummunotify_reg, node);
+		kfree(reg);
+	}
+
+	kfree(priv);
+
+	return 0;
+}
+
+static bool ummunotify_readable(struct ummunotify_file *priv)
+{
+	return priv->need_empty || !list_empty(&priv->invalid_list);
+}
+
+static ssize_t ummunotify_read(struct file *filp, char __user *buf,
+			       size_t count, loff_t *pos)
+{
+	struct ummunotify_file *priv = filp->private_data;
+	struct ummunotify_reg *reg;
+	ssize_t ret;
+	struct ummunotify_event *events;
+	int max;
+	int n;
+
+	priv->used = 1;
+
+	events = (void *) get_zeroed_page(GFP_KERNEL);
+	if (!events) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	spin_lock_irq(&priv->lock);
+
+	while (!ummunotify_readable(priv)) {
+		spin_unlock_irq(&priv->lock);
+
+		if (filp->f_flags & O_NONBLOCK) {
+			ret = -EAGAIN;
+			goto out;
+		}
+
+		if (wait_event_interruptible(priv->read_wait,
+					     ummunotify_readable(priv))) {
+			ret = -ERESTARTSYS;
+			goto out;
+		}
+
+		spin_lock_irq(&priv->lock);
+	}
+
+	max = min_t(size_t, PAGE_SIZE, count) / sizeof *events;
+
+	for (n = 0; n < max; ++n) {
+		if (list_empty(&priv->invalid_list)) {
+			events[n].type = UMMUNOTIFY_EVENT_TYPE_LAST;
+			events[n].user_cookie_counter = *priv->counter;
+			++n;
+			priv->need_empty = 0;
+			break;
+		}
+
+		reg = list_first_entry(&priv->invalid_list,
+				       struct ummunotify_reg, list);
+
+		events[n].type = UMMUNOTIFY_EVENT_TYPE_INVAL;
+		if (test_bit(UMMUNOTIFY_FLAG_HINT, &reg->flags)) {
+			events[n].flags	     = UMMUNOTIFY_EVENT_FLAG_HINT;
+			events[n].hint_start = max(reg->start, reg->hint_start);
+			events[n].hint_end   = min(reg->end, reg->hint_end);
+		} else {
+			events[n].hint_start = reg->start;
+			events[n].hint_end   = reg->end;
+		}
+		events[n].user_cookie_counter = reg->user_cookie;
+
+		list_del(&reg->list);
+		reg->flags = 0;
+		priv->need_empty = 1;
+	}
+
+	spin_unlock_irq(&priv->lock);
+
+	if (copy_to_user(buf, events, n * sizeof *events))
+		ret = -EFAULT;
+	else
+		ret = n * sizeof *events;
+
+out:
+	free_page((unsigned long) events);
+	return ret;
+}
+
+static unsigned int ummunotify_poll(struct file *filp,
+				    struct poll_table_struct *wait)
+{
+	struct ummunotify_file *priv = filp->private_data;
+
+	poll_wait(filp, &priv->read_wait, wait);
+
+	return ummunotify_readable(priv) ? (POLLIN | POLLRDNORM) : 0;
+}
+
+static long ummunotify_exchange_features(struct ummunotify_file *priv,
+					 __u32 __user *arg)
+{
+	u32 feature_mask;
+
+	if (priv->used)
+		return -EINVAL;
+
+	priv->used = 1;
+
+	if (copy_from_user(&feature_mask, arg, sizeof(feature_mask)))
+		return -EFAULT;
+
+	/* No extensions defined at present. */
+	feature_mask = 0;
+
+	if (copy_to_user(arg, &feature_mask, sizeof(feature_mask)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static long ummunotify_register_region(struct ummunotify_file *priv,
+				       void __user *arg)
+{
+	struct ummunotify_register_ioctl parm;
+	struct ummunotify_reg *reg, *treg;
+	struct rb_node **n = &priv->reg_tree.rb_node;
+	struct rb_node *pn;
+	int ret = 0;
+
+	if (copy_from_user(&parm, arg, sizeof parm))
+		return -EFAULT;
+
+	priv->used = 1;
+
+	reg = kmalloc(sizeof *reg, GFP_KERNEL);
+	if (!reg)
+		return -ENOMEM;
+
+	reg->user_cookie	= parm.user_cookie;
+	reg->start		= parm.start;
+	reg->end		= parm.end;
+	reg->flags		= 0;
+
+	spin_lock_irq(&priv->lock);
+
+	for (pn = rb_first(&priv->reg_tree); pn; pn = rb_next(pn)) {
+		treg = rb_entry(pn, struct ummunotify_reg, node);
+
+		if (treg->user_cookie == parm.user_cookie) {
+			kfree(reg);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	pn = NULL;
+	while (*n) {
+		pn = *n;
+		treg = rb_entry(pn, struct ummunotify_reg, node);
+
+		if (reg->start <= treg->start)
+			n = &pn->rb_left;
+		else
+			n = &pn->rb_right;
+	}
+
+	rb_link_node(&reg->node, pn, n);
+	rb_insert_color(&reg->node, &priv->reg_tree);
+
+out:
+	spin_unlock_irq(&priv->lock);
+
+	return ret;
+}
+
+static long ummunotify_unregister_region(struct ummunotify_file *priv,
+					 __u64 __user *arg)
+{
+	u64 user_cookie;
+	struct rb_node *n;
+	struct ummunotify_reg *reg;
+	int ret = -EINVAL;
+
+	if (copy_from_user(&user_cookie, arg, sizeof(user_cookie)))
+		return -EFAULT;
+
+	spin_lock_irq(&priv->lock);
+
+	for (n = rb_first(&priv->reg_tree); n; n = rb_next(n)) {
+		reg = rb_entry(n, struct ummunotify_reg, node);
+
+		if (reg->user_cookie == user_cookie) {
+			rb_erase(n, &priv->reg_tree);
+			if (test_bit(UMMUNOTIFY_FLAG_INVALID, &reg->flags))
+				list_del(&reg->list);
+			kfree(reg);
+			ret = 0;
+			break;
+		}
+	}
+
+	spin_unlock_irq(&priv->lock);
+
+	return ret;
+}
+
+static long ummunotify_ioctl(struct file *filp, unsigned int cmd,
+			     unsigned long arg)
+{
+	struct ummunotify_file *priv = filp->private_data;
+	void __user *argp = (void __user *) arg;
+
+	switch (cmd) {
+	case UMMUNOTIFY_EXCHANGE_FEATURES:
+		return ummunotify_exchange_features(priv, argp);
+	case UMMUNOTIFY_REGISTER_REGION:
+		return ummunotify_register_region(priv, argp);
+	case UMMUNOTIFY_UNREGISTER_REGION:
+		return ummunotify_unregister_region(priv, argp);
+	default:
+		return -ENOIOCTLCMD;
+	}
+}
+
+static int ummunotify_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	struct ummunotify_file *priv = vma->vm_private_data;
+
+	if (vmf->pgoff != 0)
+		return VM_FAULT_SIGBUS;
+
+	vmf->page = virt_to_page(priv->counter);
+	get_page(vmf->page);
+
+	return 0;
+
+}
+
+static struct vm_operations_struct ummunotify_vm_ops = {
+	.fault		= ummunotify_fault,
+};
+
+static int ummunotify_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	struct ummunotify_file *priv = filp->private_data;
+
+	if (vma->vm_end - vma->vm_start != PAGE_SIZE || vma->vm_pgoff != 0)
+		return -EINVAL;
+
+	vma->vm_ops		= &ummunotify_vm_ops;
+	vma->vm_private_data	= priv;
+
+	return 0;
+}
+
+static int ummunotify_fasync(int fd, struct file *filp, int on)
+{
+	struct ummunotify_file *priv = filp->private_data;
+
+	return fasync_helper(fd, filp, on, &priv->async_queue);
+}
+
+static const struct file_operations ummunotify_fops = {
+	.owner		= THIS_MODULE,
+	.open		= ummunotify_open,
+	.release	= ummunotify_close,
+	.read		= ummunotify_read,
+	.poll		= ummunotify_poll,
+	.unlocked_ioctl	= ummunotify_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= ummunotify_ioctl,
+#endif
+	.mmap		= ummunotify_mmap,
+	.fasync		= ummunotify_fasync,
+};
+
+static struct miscdevice ummunotify_misc = {
+	.minor	= MISC_DYNAMIC_MINOR,
+	.name	= "ummunotify",
+	.fops	= &ummunotify_fops,
+};
+
+static int __init ummunotify_init(void)
+{
+	return misc_register(&ummunotify_misc);
+}
+
+static void __exit ummunotify_cleanup(void)
+{
+	misc_deregister(&ummunotify_misc);
+}
+
+module_init(ummunotify_init);
+module_exit(ummunotify_cleanup);
diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index e2ea0b2..e086b39 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -163,6 +163,7 @@ header-y += tipc_config.h
 header-y += toshiba.h
 header-y += udf_fs_i.h
 header-y += ultrasound.h
+header-y += ummunotify.h
 header-y += un.h
 header-y += utime.h
 header-y += veth.h
diff --git a/include/linux/ummunotify.h b/include/linux/ummunotify.h
new file mode 100644
index 0000000..21b0d03
--- /dev/null
+++ b/include/linux/ummunotify.h
@@ -0,0 +1,121 @@
+/*
+ * Copyright (c) 2009 Cisco Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _LINUX_UMMUNOTIFY_H
+#define _LINUX_UMMUNOTIFY_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/*
+ * Ummunotify relays MMU notifier events to userspace.  A userspace
+ * process uses it by opening /dev/ummunotify, which returns a file
+ * descriptor.  Interest in address ranges is registered using ioctl()
+ * and MMU notifier events are retrieved using read(), as described in
+ * more detail below.
+ *
+ * Userspace can also mmap() a single read-only page at offset 0 on
+ * this file descriptor.  This page contains (at offest 0) a single
+ * 64-bit generation counter that the kernel increments each time an
+ * MMU notifier event occurs.  Userspace can use this to very quickly
+ * check if there are any events to retrieve without needing to do a
+ * system call.
+ */
+
+/*
+ * struct ummunotify_register_ioctl describes an address range from
+ * start to end (including start but not including end) to be
+ * monitored.  user_cookie is an opaque handle that userspace assigns,
+ * and which is used to unregister.  flags and reserved are currently
+ * unused and should be set to 0 for forward compatibility.
+ */
+struct ummunotify_register_ioctl {
+	__u64	start;
+	__u64	end;
+	__u64	user_cookie;
+	__u32	flags;
+	__u32	reserved;
+};
+
+#define UMMUNOTIFY_MAGIC		'U'
+
+/*
+ * Forward compatibility: Userspace passes in a 32-bit feature mask
+ * with feature flags set indicating which extensions it wishes to
+ * use.  The kernel will return a feature mask with the bits of
+ * userspace's mask that the kernel implements; from that point on
+ * both userspace and the kernel should behave as described by the
+ * kernel's feature mask.
+ *
+ * If userspace does not perform a UMMUNOTIFY_EXCHANGE_FEATURES ioctl,
+ * then the kernel will use a feature mask of 0.
+ *
+ * No feature flags are currently defined, so the kernel will always
+ * return a feature mask of 0 at present.
+ */
+#define UMMUNOTIFY_EXCHANGE_FEATURES	_IOWR(UMMUNOTIFY_MAGIC, 1, __u32)
+
+/*
+ * Register interest in an address range; userspace should pass in a
+ * struct ummunotify_register_ioctl describing the region.
+ */
+#define UMMUNOTIFY_REGISTER_REGION	_IOW(UMMUNOTIFY_MAGIC, 2, \
+					     struct ummunotify_register_ioctl)
+/*
+ * Unregister interest in an address range; userspace should pass in
+ * the user_cookie value that was used to register the address range.
+ * No events for the address range will be reported once it is
+ * unregistered.
+ */
+#define UMMUNOTIFY_UNREGISTER_REGION	_IOW(UMMUNOTIFY_MAGIC, 3, __u64)
+
+/*
+ * Invalidation events are returned whenever the kernel changes the
+ * mapping for a monitored address.  These events are retrieved by
+ * read() on the ummunotify file descriptor, which will fill the
+ * read() buffer with struct ummunotify_event.
+ *
+ * If type field is INVAL, then user_cookie_counter holds the
+ * user_cookie for the region being reported; if the HINT flag is set
+ * then hint_start/hint_end hold the start and end of the mapping that
+ * was invalidated.  (If HINT is not set, then multiple events
+ * invalidated parts of the registered range and hint_start/hint_end
+ * and set to the start/end of the whole registered range)
+ *
+ * If type is LAST, then the read operation has emptied the list of
+ * invalidated regions, and user_cookie_counter holds the value of the
+ * kernel's generation counter when the empty list occurred.  The
+ * other fields are not filled in for this event.
+ */
+enum {
+	UMMUNOTIFY_EVENT_TYPE_INVAL	= 0,
+	UMMUNOTIFY_EVENT_TYPE_LAST	= 1,
+};
+
+enum {
+	UMMUNOTIFY_EVENT_FLAG_HINT	= 1 << 0,
+};
+
+struct ummunotify_event {
+	__u32	type;
+	__u32	flags;
+	__u64	hint_start;
+	__u64	hint_end;
+	__u64	user_cookie_counter;
+};
+
+#endif /* _LINUX_UMMUNOTIFY_H */
-- 
1.6.3.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH] libibverbs: Add huge page support to ibv_madvise_range()
From: Alex Vainman @ 2010-04-22  7:35 UTC (permalink / raw)
  To: Roland Dreier
  Cc: alexv-smomgflXvOZWk0Htik3J/w, roland,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, alexr-smomgflXvOZWk0Htik3J/w
In-Reply-To: <ada8wbzi490.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>

Roland Dreier Wrote:
>  > ibv_reg_mr() fails to register a memory region allocated on huge page and not
>  > the default page size. This happens because ibv_madvise_range() aligns memory
>  > region to the default system page size before calling to madvise() which fails
>  > with EINVAL error. madvise() fails because it expects that the start and end
>  > pointer of the memory range be huge page aligned.
> 
> Seems unfortunate.  I wonder if there's a way the kernel madvise could
> help us here?
> 
>  > +/*
>  > + * Get the kernel default huge page size.
>  > + */
>  > +static int get_huge_page_size()
>  > +{
>  > +	int fd;
>  > +	char buf[MEMINFO_SIZE];
>  > +	int mem_file_len;
>  > +	char *p_hpage_val = NULL;
>  > +	char *end_pointer = NULL;
>  > +	char file_name[] = "/proc/meminfo";
>  > +	const char label[] = "Hugepagesize:";
>  > +	int ret_val = 0;
>  > +
>  > +	fd = open(file_name, O_RDONLY);
>  > +	if (fd < 0)
>  > +		return fd;
>  > +
>  > +	mem_file_len = read(fd, buf, sizeof(buf) - 1);
>  > +
>  > +	close(fd);
>  > +	if (mem_file_len < 0)
>  > +		return mem_file_len;
>  > +
>  > +	buf[mem_file_len] = '\0';
>  > +
>  > +	p_hpage_val = strstr(buf, label);
>  > +	if (!p_hpage_val) {
>  > +		errno = EINVAL;
>  > +		return -1;
>  > +	}
>  > +	p_hpage_val += strlen(label);
>  > +
>  > +	errno = 0;
>  > +	ret_val = strtol(p_hpage_val, &end_pointer, 0);
>  > +
>  > +	if (errno != 0)
>  > +		return -1;
>  > +
>  > +	return ret_val * 1024;
>  > +}
> 
> This seems to duplicate but only partially a similar function from
> libhugetlbfs.  Is there any way we can just use that directly?  eg
> libhugetlbfs handles the case where there are multiple huge page sizes
> (and that exists even on mainstream x86 with 2MB and 1GB pages possible
> on the same system).
> 
>  - R.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Hi Roland,

After the patches, which handle madvise failure, are applied(these pathces were submited under the topic:"libibverbs: Undo changes in memory range tree when madvise() fails"), I would like to renew the discussion about this patch, which actually depends on the above patches, since it may cause madvise failure.

>This seems to duplicate but only partially a similar function from
>libhugetlbfs.  Is there any way we can just use that directly?  eg
>libhugetlbfs handles the case where there are multiple huge page sizes
>(and that exists even on mainstream x86 with 2MB and 1GB pages possible
>on the same system).

In order to avoid adding additional dependency to libibverbs, maybe we should just to enhance the get_huge_page_size() so it will support multiple huge page sizes?

-Alex


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH V4 2/2] mlx4/IB: Add support for enhanced atomic operations
From: Roland Dreier @ 2010-04-21 23:39 UTC (permalink / raw)
  To: vlad-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100414142339.GC16346@vlad-laptop>

thanks, applied both these patches.
-- 
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] amso1100: Add missing memset
From: Roland Dreier @ 2010-04-21 23:31 UTC (permalink / raw)
  To: vlad-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100414142015.GA16260@vlad-laptop>

I think this patch is actually not needed.  c2_rnic_query() is only
called for the c2dev->props memory, and c2dev is allocated with
ib_alloc_device, which will always zero it out.
-- 
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] RDMA/nes: make nesadapter->phy_lock usage consistent
From: Roland Dreier @ 2010-04-21 23:00 UTC (permalink / raw)
  To: Chien Tung; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <adak4s0o1ny.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>

By the way, any problem with me merging the following trivial patch for
2.6.35?


RDMA/nes: Make unnecessarily global functions static

This allows the compiler to do a bit better; on my x86-64 build:

add/remove: 0/2 grow/shrink: 1/0 up/down: 2288/-2365 (-77)
function                                     old     new   delta
nes_init_phy                                 273    2561   +2288
nes_init_1g_phy                              469       -    -469
nes_init_2025_phy                           1896       -   -1896

Signed-off-by: Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/nes/nes_hw.c    |    4 ++--
 drivers/infiniband/hw/nes/nes_verbs.c |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
index 8b67207..86acb7d 100644
--- a/drivers/infiniband/hw/nes/nes_hw.c
+++ b/drivers/infiniband/hw/nes/nes_hw.c
@@ -1297,7 +1297,7 @@ int nes_destroy_cqp(struct nes_device *nesdev)
 /**
  * nes_init_1g_phy
  */
-int nes_init_1g_phy(struct nes_device *nesdev, u8 phy_type, u8 phy_index)
+static int nes_init_1g_phy(struct nes_device *nesdev, u8 phy_type, u8 phy_index)
 {
 	u32 counter = 0;
 	u16 phy_data;
@@ -1351,7 +1351,7 @@ int nes_init_1g_phy(struct nes_device *nesdev, u8 phy_type, u8 phy_index)
 /**
  * nes_init_2025_phy
  */
-int nes_init_2025_phy(struct nes_device *nesdev, u8 phy_type, u8 phy_index)
+static int nes_init_2025_phy(struct nes_device *nesdev, u8 phy_type, u8 phy_index)
 {
 	u32 temp_phy_data = 0;
 	u32 temp_phy_data2 = 0;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index e54f312..925e1f2 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -374,7 +374,7 @@ static int alloc_fast_reg_mr(struct nes_device *nesdev, struct nes_pd *nespd,
 /*
  * nes_alloc_fast_reg_mr
  */
-struct ib_mr *nes_alloc_fast_reg_mr(struct ib_pd *ibpd, int max_page_list_len)
+static struct ib_mr *nes_alloc_fast_reg_mr(struct ib_pd *ibpd, int max_page_list_len)
 {
 	struct nes_pd *nespd = to_nespd(ibpd);
 	struct nes_vnic *nesvnic = to_nesvnic(ibpd->device);
-- 
1.7.0.5


-- 
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH] RDMA/nes: make nesadapter->phy_lock usage consistent
From: Roland Dreier @ 2010-04-21 22:56 UTC (permalink / raw)
  To: Chien Tung; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100309215040.GA5912@ctung-MOBL>

actually added a chunk to delete the (now-unused) nesadapter variable
from nes_write_1G_phy_reg to fix a compile warning... no problem tho.
-- 
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] RDMA/nes: make nesadapter->phy_lock usage consistent
From: Roland Dreier @ 2010-04-21 22:46 UTC (permalink / raw)
  To: Chien Tung; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100309215040.GA5912@ctung-MOBL>

thanks, applied.
-- 
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 05/10] iw_cxgb4: Add connection management functions.
From: Roland Dreier @ 2010-04-21 22:41 UTC (permalink / raw)
  To: Steve Wise; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100416182951.22495.60351.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>

Thanks, all this looks pretty clean and small so I added it (as one big
patch).  One tiny issue that we can fix with a follow-up patch:

 > +int c4iw_ep_redirect(void *ctx, struct dst_entry *old, struct dst_entry *new,
 > +		     struct l2t_entry *l2t)
 > +{
 > +	struct c4iw_ep *ep = ctx;
 > +
 > +	if (ep->dst != old)
 > +		return 0;
 > +
 > +	PDBG("%s ep %p redirect to dst %p l2t %p\n", __func__, ep, new,
 > +	     l2t);
 > +	dst_hold(new);
 > +	cxgb4_l2t_release(ep->l2t);
 > +	ep->l2t = l2t;
 > +	dst_release(old);
 > +	ep->dst = new;
 > +	return 1;
 > +}

As far as I can see this function is not called or otherwise referenced
anywhere else (except for a declaration in a header).  Can we drop it?
-- 
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] RDMA/cxgb3: Don't free skbs on NET_XMIT_* indications from LLD.
From: Roland Dreier @ 2010-04-21 22:21 UTC (permalink / raw)
  To: Steve Wise; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100405195956.28814.29311.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>

thanks, applied
-- 
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] RDMA/amso1100: use the dma state API instead of the pci equivalents
From: Roland Dreier @ 2010-04-21 22:18 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW
In-Reply-To: <20100402132901M.fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>

Thanks, applied all three of these conversion patches.
-- 
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 4/4] libibverbs: Undo changes in memory range tree when madvise() fails
From: Roland Dreier @ 2010-04-21 22:12 UTC (permalink / raw)
  To: alexv-smomgflXvOZWk0Htik3J/w
  Cc: roland, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	alexr-smomgflXvOZWk0Htik3J/w
In-Reply-To: <4BAF8C88.80909-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Thanks, looks great, applied.
-- 
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [infiniband-diags] support diffing lids and nodedesc on remoteports in ibnetdiscover
From: Al Chu @ 2010-04-21 18:19 UTC (permalink / raw)
  To: Sasha Khapyorsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1271802640.17987.230.camel-X2zTWyBD0EhliZ7u+bvwcg@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1926 bytes --]

Hey Sasha,

A slight tweak to the patch.  Support diffing lids and node descriptions
on remote ports (previously it diffed only "local" lids and node
descriptions).  Also add appropriate manpage notes.

Al

On Tue, 2010-04-20 at 15:30 -0700, Al Chu wrote:
> Hey Sasha,
> 
> This patch supports diffing node descriptions on remote ports
> (previously diffing of just the "local" node description was supported).
> 
> Al
> 
> email message attachment
> > -------- Forwarded Message --------
> > From: Albert Chu <chu11-i2BcT+NCU+M@public.gmane.org>
> > Subject: [PATCH] support diffing nodedesc on remoteports in
> > ibnetdiscover
> > Date: Tue, 20 Apr 2010 15:09:59 -0700
> > 
> > Signed-off-by: Albert Chu <chu11-i2BcT+NCU+M@public.gmane.org>
> > ---
> >  infiniband-diags/src/ibnetdiscover.c |   11 +++++++++++
> >  1 files changed, 11 insertions(+), 0 deletions(-)
> > 
> > diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c
> > index 57f9625..eeb1b9f 100644
> > --- a/infiniband-diags/src/ibnetdiscover.c
> > +++ b/infiniband-diags/src/ibnetdiscover.c
> > @@ -720,6 +720,17 @@ static void diff_ports(ibnd_node_t * fabric1_node, ibnd_node_t * fabric2_node,
> >  			fabric2_out++;
> >  		}
> >  
> > +		if (data->diff_flags & DIFF_FLAG_PORT_CONNECTION
> > +		    && data->diff_flags & DIFF_FLAG_NODE_DESCRIPTION
> > +		    && fabric1_port && fabric2_port
> > +		    && fabric1_port->remoteport && fabric2_port->remoteport
> > +		    && memcmp(fabric1_port->remoteport->node->nodedesc,
> > +			      fabric2_port->remoteport->node->nodedesc,
> > +			      IB_SMP_DATA_SIZE)) {
> > +			fabric1_out++;
> > +			fabric2_out++;
> > +		}
> > +
> >  		if (fabric1_out) {
> >  			diff_iter_out_header(fabric1_node, data,
> >  					     out_header_flag);
-- 
Albert Chu
chu11-i2BcT+NCU+M@public.gmane.org
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

[-- Attachment #2: 0001-support-diffing-lids-and-nodedesc-on-remoteports-in.patch --]
[-- Type: message/rfc822, Size: 2478 bytes --]

From: Albert Chu <chu11-i2BcT+NCU+M@public.gmane.org>
Subject: [PATCH] support diffing lids and nodedesc on remoteports in ibnetdiscover
Date: Tue, 20 Apr 2010 15:09:59 -0700
Message-ID: <1271873984.17987.245.camel-X2zTWyBD0EhliZ7u+bvwcg@public.gmane.org>


Signed-off-by: Albert Chu <chu11-i2BcT+NCU+M@public.gmane.org>
---
 infiniband-diags/man/ibnetdiscover.8 |    5 ++++-
 infiniband-diags/src/ibnetdiscover.c |   20 ++++++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/infiniband-diags/man/ibnetdiscover.8 b/infiniband-diags/man/ibnetdiscover.8
index 76cfbc8..3beb70b 100644
--- a/infiniband-diags/man/ibnetdiscover.8
+++ b/infiniband-diags/man/ibnetdiscover.8
@@ -71,7 +71,10 @@ are: \fIsw\fR = switches, \fIca\fR = channel adapters, \fIrouter\fR = routers,
 \fIport\fR = port connections, \fIlid\fR = lids, \fInodedesc\fR = node
 descriptions.  Note that \fIport\fR, \fIlid\fR, and \fInodedesc\fR are
 checked only for the node types that are specified (e.g. \fIsw\fR,
-\fIca\fR, \fIrouter\fR).
+\fIca\fR, \fIrouter\fR).  If \fIport\fR is specified alongside \fIlid\fR
+or \fInodedesc\fR, remote port lids and node descriptions will also be compared.
+
+
 .TP
 \fB\-p\fR, \fB\-\-ports\fR
 Obtain a ports report which is a
diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c
index 57f9625..23e6dd4 100644
--- a/infiniband-diags/src/ibnetdiscover.c
+++ b/infiniband-diags/src/ibnetdiscover.c
@@ -720,6 +720,26 @@ static void diff_ports(ibnd_node_t * fabric1_node, ibnd_node_t * fabric2_node,
 			fabric2_out++;
 		}
 
+		if (data->diff_flags & DIFF_FLAG_PORT_CONNECTION
+		    && data->diff_flags & DIFF_FLAG_NODE_DESCRIPTION
+		    && fabric1_port && fabric2_port
+		    && fabric1_port->remoteport && fabric2_port->remoteport
+		    && memcmp(fabric1_port->remoteport->node->nodedesc,
+			      fabric2_port->remoteport->node->nodedesc,
+			      IB_SMP_DATA_SIZE)) {
+			fabric1_out++;
+			fabric2_out++;
+		}
+
+		if (data->diff_flags & DIFF_FLAG_PORT_CONNECTION
+		    && data->diff_flags & DIFF_FLAG_LID
+		    && fabric1_port && fabric2_port
+		    && fabric1_port->remoteport && fabric2_port->remoteport
+		    && fabric1_port->remoteport->base_lid != fabric2_port->remoteport->base_lid) {
+			fabric1_out++;
+			fabric2_out++;
+		}
+
 		if (fabric1_out) {
 			diff_iter_out_header(fabric1_node, data,
 					     out_header_flag);
-- 
1.5.4.5


^ permalink raw reply related

* Re: opensm with multiple IB subnets
From: Yevgeny Kliteynik @ 2010-04-21 14:56 UTC (permalink / raw)
  To: Ken Teague; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <u2q2d0a59b21004201707gecf7f978pa585ada342ccb9b6-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Ken,

On 4/21/2010 3:07 AM, Ken Teague wrote:
> On Tue, Apr 20, 2010 at 2:13 PM, Ken Teague<kteague-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org>  wrote:
>> I have a 17-node cluster and each node has a single IB card that has
>> 2x IB ports (ib0 and ib1).....
>
> After doing a little more research, I confirmed that my understanding
> of the manual page is correct.  To run opensm for each GUID, I
> modified my init script to run a for loop based on the information
> returned from "ibstat -p".
>
>
> I added this near the beginning of the script where the other
> environment variables are located:
> <snip>
> OFA_HOME="/usr/local/sbin"
> IBSTAT_BIN="${OFA_HOME}/ibstat"
> IBSTAT_ARG="-p"
> OPENSM_BIN="${OFA_HOME}/opensm"
> OPENSM_ARG="-B -g"
> <snip>
>
>
> I replaced the single line which started opensm with this for loop:
> for i in `${IBSTAT_BIN} ${IBSTAT_ARG}`
> do
>      ${OPENSM_BIN} ${OPENSM_ARG} ${i}
> done
> <snip>
>
> If anyone has a more elegant way to handle this, I'm open to
> suggestions.  Many thanks.

OpenSM dumps various files to /var/log and /var/cache/opensm folders.
When you have more than one OpenSM process, they will all dump the
same files, which is probably not a good idea.

To change the output directories, set the OSM_TMP_DIR and
OSM_CACHE_DIR env. variables to some other place.
In addition, you need to make sure that each SM instance
prints its log in a different place. You need to do
something like this:

foreach guid in guid_list
	export OSM_TMP_DIR=/tmp/osm_dump_dir${guid}
	export OSM_CACHE_DIR=/tmp/osm_dump_dir${guid}
	opensm --log_file /tmp/osm_dump_dir${guid}/osm.log -g ${guid} [your other options]

-- Yevgeny

> Ken
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: two questions about RDMA-WRITE
From: Ding Dinghua @ 2010-04-21 12:37 UTC (permalink / raw)
  To: Sean Hefty; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4B0EDAC5753E48CD8CE32B6E3EF15949-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>

2010/4/16 Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>:
>>static void jm_cq_comp_handler(struct ib_cq *cq, void *context) {
>>        struct jm_rdma_conn *conn = context;
>>        struct ib_wc wc;
>>        struct jm_send_ctx *send;
>>
>>        /* No idea why it should be called twice. */
>>        printk("cq comp for id %p\n", conn->jc_id);
>>        ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
>>        while (ib_poll_cq(cq, 1, &wc) == 1) {
>>                if (wc.opcode != IB_WC_RDMA_WRITE) {
>>                        printk("completed unknown opcode %d\n", wc.opcode);
>>                        /* continue; */
>>                }
>>                send = (struct jm_send_ctx *)wc.wr_id;
>>                printk("got send=%p\n", send);
>>                printk("completed RDMA_WRITE of IO(%Lu, %u)\n",
>>                       send->s_offset, send->s_size);
>>                send->s_done = wc.status == IB_WC_SUCCESS ? 1 : -EIO;
>>                wake_up_all(&send->s_wait);
>>        }
>>        ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
>
> unrelated to your problem, but this second call to ib_req_notify_cq isn't
> necessary.
>
>>static int jm_rdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event
>>*event) {
> ..
>>        case RDMA_CM_EVENT_DISCONNECTED:
>>                connstate = -ECONNABORTED;
>>                goto connected;
> ..
>>connected:
>>                printk("%pI4:%u (event 0x%x)\n",
>>                       &conn->jc_remoteaddr.sin_addr.s_addr,
>>                       ntohs(conn->jc_remoteaddr.sin_port),
>>                       event->event << 11);
>>                conn->jc_connstate = connstate;
>>                wake_up_all(&conn->jc_connect_wait);
>>                break;
>
> How quickly do you respond to the disconnect event?  The remote side will wait
> until it receives a response or times out, which may be several seconds or
> minutes.
>
Thanks a lot, I think the problem lays here.
> - Sean
>
>



-- 
Ding Dinghua
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Socket Direct Protocol: help (2)
From: Amir Vadai @ 2010-04-21 12:01 UTC (permalink / raw)
  To: Andrea Gozzelino
  Cc: Tung, Chien Tin, Steve Wise,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
	pavel-+ZI9xUNit7I@public.gmane.org,
	mingo-X9Un+BFzKDI@public.gmane.org, Eric B Munson
In-Reply-To: <6490876.1271771620997.SLOX.WebMail.wwwrun-XDIR3SKYeFbgKi2NxijLtw@public.gmane.org>

Hi Andrea,

I am preparing the fix right now.

- Amir

On 04/20/2010 04:53 PM, Andrea Gozzelino wrote:
> Hi Amir,
>
> have you any news about bugs 2027 "SDP not respecting # SGEs as reported
> from HW" and 2028 "SDP should support fastreg mrs"?
>
> When those bugs will be fixed, I will test the NE020 cards performance
> with SDP protocol and I will compare SDP and TCP.
>
> Keep in touch,
>
> Andrea Gozzelino
>
> INFN - Laboratori Nazionali di Legnaro	(LNL)
> Viale dell'Universita' 2
> I-35020 - Legnaro (PD)- ITALIA
> Tel: +39 049 8068346
> Fax: +39 049 641925
> Mail: andrea.gozzelino-PK20h7lG/Rc1GQ1Ptb7lUw@public.gmane.org
>
>
>
>
>
>
>
> On Apr 15, 2010 10:38 AM, Amir Vadai <amirv-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> wrote:
>
>   
>> It should be a simple fix and I plan to do soon - just add yourself as
>> CC in bugzilla  - that way I won't forget to notify you.
>>
>> - amir
>>
>> On 04/15/2010 10:07 AM, Andrea Gozzelino wrote:
>>     
>>> On Apr 15, 2010 08:24 AM, Amir Vadai <amirv-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> wrote:
>>>
>>>   
>>>       
>>>> I hope to have a fix next week for the first one.
>>>>
>>>> Thanks,
>>>> Amir
>>>>
>>>> On 04/14/2010 09:48 PM, Tung, Chien Tin wrote:
>>>>     
>>>>         
>>>>>> Tung, Chien Tin wrote:
>>>>>>     
>>>>>>         
>>>>>>             
>>>>>>>> One more thing - Please open a bug regarding the num_sge
>>>>>>>> limitation at:
>>>>>>>> https://bugs.openfabrics.org/
>>>>>>>>
>>>>>>>>         
>>>>>>>>             
>>>>>>>>                 
>>>>>>> Done, Bug 2027.
>>>>>>>
>>>>>>> Chien
>>>>>>>
>>>>>>>       
>>>>>>>           
>>>>>>>               
>>>>>> And 2028 opened to request fastreg support.
>>>>>>
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> I am open to test fixes for these two bugs.
>>>>>
>>>>> Chien
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>>     
>>>>         
>>> Hi Amir, 
>>> Hi Chien,
>>>
>>> I understand that the bug 2027 could be solved next week, so I will
>>> test
>>> SDP protocol performance on NE020 cards.
>>> Is it correct? 
>>> If yes, could you point out the code modifies?
>>>
>>> Keep in touch and take care.
>>> Regards,
>>> Andrea
>>>
>>>
>>> Andrea Gozzelino
>>>
>>> INFN - Laboratori Nazionali di Legnaro	(LNL)
>>> Viale dell'Universita' 2
>>> I-35020 - Legnaro (PD)- ITALIA
>>> Tel: +39 049 8068346
>>> Fax: +39 049 641925
>>> Mail: andrea.gozzelino-PK20h7lG/Rc1GQ1Ptb7lUw@public.gmane.org			
>>>
>>>
>>>   
>>>       
>>     
>
> 		
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] opensm/osm_sa_path_record.c: Lower max number of hops allowed
From: Line Holen @ 2010-04-21 11:22 UTC (permalink / raw)
  To: sashak-smomgflXvOZWk0Htik3J/w; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Lower max number of hops allowed in a path from 128 to 64.

Signed-off-by: Line Holen <Line.Holen-xsfywfwIY+M@public.gmane.org>

---

diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
index 62102f4..9f508db 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -70,7 +70,7 @@
 #include <opensm/osm_prefix_route.h>
 #include <opensm/osm_ucast_lash.h>
 
-#define MAX_HOPS 128
+#define MAX_HOPS 64
 
 typedef struct osm_pr_item {
 	cl_list_item_t list_item;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox