All of lore.kernel.org
 help / color / mirror / Atom feed
From: Feng Tang <feng.tang@intel.com>
To: lkp@lists.01.org
Subject: Re: [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Date: Thu, 05 Sep 2019 18:48:11 +0800	[thread overview]
Message-ID: <20190905104811.GF5541@shbuild999.sh.intel.com> (raw)
In-Reply-To: <CAKMK7uESbH_0_SFUc+v=BhjW0bv4FbL8Dq2UG1fWcSqyue3wig@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 10668 bytes --]

On Thu, Sep 05, 2019 at 06:37:47PM +0800, Daniel Vetter wrote:
> On Thu, Sep 5, 2019 at 8:58 AM Feng Tang <feng.tang@intel.com> wrote:
> >
> > Hi Vetter,
> >
> > On Wed, Sep 04, 2019 at 01:20:29PM +0200, Daniel Vetter wrote:
> > > On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie <airlied@gmail.com> wrote:
> > > >
> > > > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >
> > > > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang <feng.tang@intel.com> wrote:
> > > > > >
> > > > > > Hi Daniel,
> > > > > >
> > > > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> > > > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> > > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang:
> > > > > > > > >> Thank you for testing. But don't get too excited, because the patch
> > > > > > > > >> simulates a bug that was present in the original mgag200 code. A
> > > > > > > > >> significant number of frames are simply skipped. That is apparently the
> > > > > > > > >> reason why it's faster.
> > > > > > > > >
> > > > > > > > > Thanks for the detailed info, so the original code skips time-consuming
> > > > > > > > > work inside atomic context on purpose. Is there any space to optmise it?
> > > > > > > > > If 2 scheduled update worker are handled at almost same time, can one be
> > > > > > > > > skipped?
> > > > > > > >
> > > > > > > > To my knowledge, there's only one instance of the worker. Re-scheduling
> > > > > > > > the worker before a previous instance started, will not create a second
> > > > > > > > instance. The worker's instance will complete all pending updates. So in
> > > > > > > > some way, skipping workers already happens.
> > > > > > >
> > > > > > > So I think that the most often fbcon update from atomic context is the
> > > > > > > blinking cursor. If you disable that one you should be back to the old
> > > > > > > performance level I think, since just writing to dmesg is from process
> > > > > > > context, so shouldn't change.
> > > > > >
> > > > > > Hmm, then for the old driver, it should also do the most update in
> > > > > > non-atomic context?
> > > > > >
> > > > > > One other thing is, I profiled that updating a 3MB shadow buffer needs
> > > > > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with
> > > > > > the cache setting of DRM shadow buffer? say the orginal code use a
> > > > > > cachable buffer?
> > > > >
> > > > > Hm, that would indicate the write-combining got broken somewhere. This
> > > > > should definitely be faster. Also we shouldn't transfer the hole
> > > > > thing, except when scrolling ...
> > > >
> > > > First rule of fbcon usage, you are always effectively scrolling.
> > > >
> > > > Also these devices might be on a PCIE 1x piece of wet string, not sure
> > > > if the numbers reflect that.
> > >
> > > pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and
> > > overhead not entirely out of the question that 150MB/s is actually the
> > > hw limit. If it's really pcie 1x 1.0, no idea where to check that.
> > > Also might be worth to double-check that the gpu pci bar is listed as
> > > wc in debugfs/x86/pat_memtype_list.
> >
> > Here is some dump of the device info and the pat_memtype_list, while it is
> > running other 0day task:
> 
> Looks all good, I guess Dave is right with this probably only being a
> real slow, real old pcie link, plus maybe some inefficiencies in the
> mapping. Your 150MB/s, was that just the copy, or did you include all
> the setup/map/unmap/teardown too in your measurement in the trace?


Following is the breakdown, the 19240 us is the memory copy time

The drm_fb_helper_dirty_work() calls sequentially 
1. drm_client_buffer_vmap	  (290 us)
2. drm_fb_helper_dirty_blit_real  (19240 us)
3. helper->fb->funcs->dirty()    ---> NULL for mgag200 driver
4. drm_client_buffer_vunmap       (215 us)

Thanks,
Feng


> -Daniel
> 
> >
> > controller info
> > =================
> > 03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
> >         Subsystem: Intel Corporation MGA G200e [Pilot] ServerEngines (SEP1)
> >         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Interrupt: pin A routed to IRQ 16
> >         NUMA node: 0
> >         Region 0: Memory at d0000000 (32-bit, prefetchable) [size=16M]
> >         Region 1: Memory at d1800000 (32-bit, non-prefetchable) [size=16K]
> >         Region 2: Memory at d1000000 (32-bit, non-prefetchable) [size=8M]
> >         Expansion ROM at 000c0000 [disabled] [size=128K]
> >         Capabilities: [dc] Power Management version 2
> >                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> >                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> >         Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
> >                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> >                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
> >                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> >                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >                         MaxPayload 128 bytes, MaxReadReq 128 bytes
> >                 DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
> >                 LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
> >                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
> >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >         Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
> >                 Address: 00000000  Data: 0000
> >         Kernel driver in use: mgag200
> >         Kernel modules: mgag200
> >
> >
> > Related pat setting
> > ===================
> > uncached-minus @ 0xc0000000-0xc0001000
> > uncached-minus @ 0xc0000000-0xd0000000
> > uncached-minus @ 0xc0008000-0xc0009000
> > uncached-minus @ 0xc0009000-0xc000a000
> > uncached-minus @ 0xc0010000-0xc0011000
> > uncached-minus @ 0xc0011000-0xc0012000
> > uncached-minus @ 0xc0012000-0xc0013000
> > uncached-minus @ 0xc0013000-0xc0014000
> > uncached-minus @ 0xc0018000-0xc0019000
> > uncached-minus @ 0xc0019000-0xc001a000
> > uncached-minus @ 0xc001a000-0xc001b000
> > write-combining @ 0xd0000000-0xd0300000
> > write-combining @ 0xd0000000-0xd1000000
> > uncached-minus @ 0xd1800000-0xd1804000
> > uncached-minus @ 0xd1900000-0xd1980000
> > uncached-minus @ 0xd1980000-0xd1981000
> > uncached-minus @ 0xd1a00000-0xd1a80000
> > uncached-minus @ 0xd1a80000-0xd1a81000
> > uncached-minus @ 0xd1f10000-0xd1f11000
> > uncached-minus @ 0xd1f11000-0xd1f12000
> > uncached-minus @ 0xd1f12000-0xd1f13000
> >
> > Host bridge info
> > ================
> > 00:00.0 Host bridge: Intel Corporation Device 7853
> >         Subsystem: Intel Corporation Device 0000
> >         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Interrupt: pin A routed to IRQ 0
> >         NUMA node: 0
> >         Capabilities: [90] Express (v2) Root Port (Slot-), MSI 00
> >                 DevCap: MaxPayload 128 bytes, PhantFunc 0
> >                         ExtTag- RBE+
> >                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> >                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >                         MaxPayload 128 bytes, MaxReadReq 128 bytes
> >                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> >                 LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L0s <512ns, L1 <4us
> >                         ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
> >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                 LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
> >                 RootCtl: ErrCorrectable+ ErrNon-Fatal+ ErrFatal+ PMEIntEna- CRSVisible-
> >                 RootCap: CRSVisible-
> >                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> >                 DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
> >                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
> >                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> >                          Compliance De-emphasis: -6dB
> >                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> >                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >         Capabilities: [e0] Power Management version 3
> >                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> >                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >         Capabilities: [100 v1] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>
> >         Capabilities: [144 v1] Vendor Specific Information: ID=0004 Rev=1 Len=03c <?>
> >         Capabilities: [1d0 v1] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?>
> >         Capabilities: [250 v1] #19
> >         Capabilities: [280 v1] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?>
> >         Capabilities: [298 v1] Vendor Specific Information: ID=0007 Rev=0 Len=024 <?>
> >
> >
> > Thanks,
> > Feng
> >
> >
> > >
> > > -Daniel
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> 
> 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

WARNING: multiple messages have this Message-ID (diff)
From: Feng Tang <feng.tang@intel.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
	"Chen, Rong A" <rong.a.chen@intel.com>, LKP <lkp@01.org>,
	Michel D?nzer <michel@daenzer.net>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Thomas Zimmermann <tzimmermann@suse.de>
Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Date: Thu, 5 Sep 2019 18:48:11 +0800	[thread overview]
Message-ID: <20190905104811.GF5541@shbuild999.sh.intel.com> (raw)
In-Reply-To: <CAKMK7uESbH_0_SFUc+v=BhjW0bv4FbL8Dq2UG1fWcSqyue3wig@mail.gmail.com>

On Thu, Sep 05, 2019 at 06:37:47PM +0800, Daniel Vetter wrote:
> On Thu, Sep 5, 2019 at 8:58 AM Feng Tang <feng.tang@intel.com> wrote:
> >
> > Hi Vetter,
> >
> > On Wed, Sep 04, 2019 at 01:20:29PM +0200, Daniel Vetter wrote:
> > > On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie <airlied@gmail.com> wrote:
> > > >
> > > > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >
> > > > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang <feng.tang@intel.com> wrote:
> > > > > >
> > > > > > Hi Daniel,
> > > > > >
> > > > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> > > > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> > > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang:
> > > > > > > > >> Thank you for testing. But don't get too excited, because the patch
> > > > > > > > >> simulates a bug that was present in the original mgag200 code. A
> > > > > > > > >> significant number of frames are simply skipped. That is apparently the
> > > > > > > > >> reason why it's faster.
> > > > > > > > >
> > > > > > > > > Thanks for the detailed info, so the original code skips time-consuming
> > > > > > > > > work inside atomic context on purpose. Is there any space to optmise it?
> > > > > > > > > If 2 scheduled update worker are handled at almost same time, can one be
> > > > > > > > > skipped?
> > > > > > > >
> > > > > > > > To my knowledge, there's only one instance of the worker. Re-scheduling
> > > > > > > > the worker before a previous instance started, will not create a second
> > > > > > > > instance. The worker's instance will complete all pending updates. So in
> > > > > > > > some way, skipping workers already happens.
> > > > > > >
> > > > > > > So I think that the most often fbcon update from atomic context is the
> > > > > > > blinking cursor. If you disable that one you should be back to the old
> > > > > > > performance level I think, since just writing to dmesg is from process
> > > > > > > context, so shouldn't change.
> > > > > >
> > > > > > Hmm, then for the old driver, it should also do the most update in
> > > > > > non-atomic context?
> > > > > >
> > > > > > One other thing is, I profiled that updating a 3MB shadow buffer needs
> > > > > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with
> > > > > > the cache setting of DRM shadow buffer? say the orginal code use a
> > > > > > cachable buffer?
> > > > >
> > > > > Hm, that would indicate the write-combining got broken somewhere. This
> > > > > should definitely be faster. Also we shouldn't transfer the hole
> > > > > thing, except when scrolling ...
> > > >
> > > > First rule of fbcon usage, you are always effectively scrolling.
> > > >
> > > > Also these devices might be on a PCIE 1x piece of wet string, not sure
> > > > if the numbers reflect that.
> > >
> > > pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and
> > > overhead not entirely out of the question that 150MB/s is actually the
> > > hw limit. If it's really pcie 1x 1.0, no idea where to check that.
> > > Also might be worth to double-check that the gpu pci bar is listed as
> > > wc in debugfs/x86/pat_memtype_list.
> >
> > Here is some dump of the device info and the pat_memtype_list, while it is
> > running other 0day task:
> 
> Looks all good, I guess Dave is right with this probably only being a
> real slow, real old pcie link, plus maybe some inefficiencies in the
> mapping. Your 150MB/s, was that just the copy, or did you include all
> the setup/map/unmap/teardown too in your measurement in the trace?


Following is the breakdown, the 19240 us is the memory copy time

The drm_fb_helper_dirty_work() calls sequentially 
1. drm_client_buffer_vmap	  (290 us)
2. drm_fb_helper_dirty_blit_real  (19240 us)
3. helper->fb->funcs->dirty()    ---> NULL for mgag200 driver
4. drm_client_buffer_vunmap       (215 us)

Thanks,
Feng


> -Daniel
> 
> >
> > controller info
> > =================
> > 03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
> >         Subsystem: Intel Corporation MGA G200e [Pilot] ServerEngines (SEP1)
> >         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Interrupt: pin A routed to IRQ 16
> >         NUMA node: 0
> >         Region 0: Memory at d0000000 (32-bit, prefetchable) [size=16M]
> >         Region 1: Memory at d1800000 (32-bit, non-prefetchable) [size=16K]
> >         Region 2: Memory at d1000000 (32-bit, non-prefetchable) [size=8M]
> >         Expansion ROM at 000c0000 [disabled] [size=128K]
> >         Capabilities: [dc] Power Management version 2
> >                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> >                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> >         Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
> >                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> >                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
> >                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> >                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >                         MaxPayload 128 bytes, MaxReadReq 128 bytes
> >                 DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
> >                 LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
> >                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
> >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >         Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
> >                 Address: 00000000  Data: 0000
> >         Kernel driver in use: mgag200
> >         Kernel modules: mgag200
> >
> >
> > Related pat setting
> > ===================
> > uncached-minus @ 0xc0000000-0xc0001000
> > uncached-minus @ 0xc0000000-0xd0000000
> > uncached-minus @ 0xc0008000-0xc0009000
> > uncached-minus @ 0xc0009000-0xc000a000
> > uncached-minus @ 0xc0010000-0xc0011000
> > uncached-minus @ 0xc0011000-0xc0012000
> > uncached-minus @ 0xc0012000-0xc0013000
> > uncached-minus @ 0xc0013000-0xc0014000
> > uncached-minus @ 0xc0018000-0xc0019000
> > uncached-minus @ 0xc0019000-0xc001a000
> > uncached-minus @ 0xc001a000-0xc001b000
> > write-combining @ 0xd0000000-0xd0300000
> > write-combining @ 0xd0000000-0xd1000000
> > uncached-minus @ 0xd1800000-0xd1804000
> > uncached-minus @ 0xd1900000-0xd1980000
> > uncached-minus @ 0xd1980000-0xd1981000
> > uncached-minus @ 0xd1a00000-0xd1a80000
> > uncached-minus @ 0xd1a80000-0xd1a81000
> > uncached-minus @ 0xd1f10000-0xd1f11000
> > uncached-minus @ 0xd1f11000-0xd1f12000
> > uncached-minus @ 0xd1f12000-0xd1f13000
> >
> > Host bridge info
> > ================
> > 00:00.0 Host bridge: Intel Corporation Device 7853
> >         Subsystem: Intel Corporation Device 0000
> >         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Interrupt: pin A routed to IRQ 0
> >         NUMA node: 0
> >         Capabilities: [90] Express (v2) Root Port (Slot-), MSI 00
> >                 DevCap: MaxPayload 128 bytes, PhantFunc 0
> >                         ExtTag- RBE+
> >                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> >                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >                         MaxPayload 128 bytes, MaxReadReq 128 bytes
> >                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> >                 LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L0s <512ns, L1 <4us
> >                         ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
> >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                 LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
> >                 RootCtl: ErrCorrectable+ ErrNon-Fatal+ ErrFatal+ PMEIntEna- CRSVisible-
> >                 RootCap: CRSVisible-
> >                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> >                 DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
> >                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
> >                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> >                          Compliance De-emphasis: -6dB
> >                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> >                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >         Capabilities: [e0] Power Management version 3
> >                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> >                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >         Capabilities: [100 v1] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>
> >         Capabilities: [144 v1] Vendor Specific Information: ID=0004 Rev=1 Len=03c <?>
> >         Capabilities: [1d0 v1] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?>
> >         Capabilities: [250 v1] #19
> >         Capabilities: [280 v1] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?>
> >         Capabilities: [298 v1] Vendor Specific Information: ID=0007 Rev=0 Len=024 <?>
> >
> >
> > Thanks,
> > Feng
> >
> >
> > >
> > > -Daniel
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> 
> 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

WARNING: multiple messages have this Message-ID (diff)
From: Feng Tang <feng.tang@intel.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	"Chen, Rong A" <rong.a.chen@intel.com>,
	Michel D?nzer <michel@daenzer.net>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Thomas Zimmermann <tzimmermann@suse.de>, LKP <lkp@01.org>
Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Date: Thu, 5 Sep 2019 18:48:11 +0800	[thread overview]
Message-ID: <20190905104811.GF5541@shbuild999.sh.intel.com> (raw)
In-Reply-To: <CAKMK7uESbH_0_SFUc+v=BhjW0bv4FbL8Dq2UG1fWcSqyue3wig@mail.gmail.com>

On Thu, Sep 05, 2019 at 06:37:47PM +0800, Daniel Vetter wrote:
> On Thu, Sep 5, 2019 at 8:58 AM Feng Tang <feng.tang@intel.com> wrote:
> >
> > Hi Vetter,
> >
> > On Wed, Sep 04, 2019 at 01:20:29PM +0200, Daniel Vetter wrote:
> > > On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie <airlied@gmail.com> wrote:
> > > >
> > > > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >
> > > > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang <feng.tang@intel.com> wrote:
> > > > > >
> > > > > > Hi Daniel,
> > > > > >
> > > > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> > > > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> > > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang:
> > > > > > > > >> Thank you for testing. But don't get too excited, because the patch
> > > > > > > > >> simulates a bug that was present in the original mgag200 code. A
> > > > > > > > >> significant number of frames are simply skipped. That is apparently the
> > > > > > > > >> reason why it's faster.
> > > > > > > > >
> > > > > > > > > Thanks for the detailed info, so the original code skips time-consuming
> > > > > > > > > work inside atomic context on purpose. Is there any space to optmise it?
> > > > > > > > > If 2 scheduled update worker are handled at almost same time, can one be
> > > > > > > > > skipped?
> > > > > > > >
> > > > > > > > To my knowledge, there's only one instance of the worker. Re-scheduling
> > > > > > > > the worker before a previous instance started, will not create a second
> > > > > > > > instance. The worker's instance will complete all pending updates. So in
> > > > > > > > some way, skipping workers already happens.
> > > > > > >
> > > > > > > So I think that the most often fbcon update from atomic context is the
> > > > > > > blinking cursor. If you disable that one you should be back to the old
> > > > > > > performance level I think, since just writing to dmesg is from process
> > > > > > > context, so shouldn't change.
> > > > > >
> > > > > > Hmm, then for the old driver, it should also do the most update in
> > > > > > non-atomic context?
> > > > > >
> > > > > > One other thing is, I profiled that updating a 3MB shadow buffer needs
> > > > > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with
> > > > > > the cache setting of DRM shadow buffer? say the orginal code use a
> > > > > > cachable buffer?
> > > > >
> > > > > Hm, that would indicate the write-combining got broken somewhere. This
> > > > > should definitely be faster. Also we shouldn't transfer the hole
> > > > > thing, except when scrolling ...
> > > >
> > > > First rule of fbcon usage, you are always effectively scrolling.
> > > >
> > > > Also these devices might be on a PCIE 1x piece of wet string, not sure
> > > > if the numbers reflect that.
> > >
> > > pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and
> > > overhead not entirely out of the question that 150MB/s is actually the
> > > hw limit. If it's really pcie 1x 1.0, no idea where to check that.
> > > Also might be worth to double-check that the gpu pci bar is listed as
> > > wc in debugfs/x86/pat_memtype_list.
> >
> > Here is some dump of the device info and the pat_memtype_list, while it is
> > running other 0day task:
> 
> Looks all good, I guess Dave is right with this probably only being a
> real slow, real old pcie link, plus maybe some inefficiencies in the
> mapping. Your 150MB/s, was that just the copy, or did you include all
> the setup/map/unmap/teardown too in your measurement in the trace?


Following is the breakdown, the 19240 us is the memory copy time

The drm_fb_helper_dirty_work() calls sequentially 
1. drm_client_buffer_vmap	  (290 us)
2. drm_fb_helper_dirty_blit_real  (19240 us)
3. helper->fb->funcs->dirty()    ---> NULL for mgag200 driver
4. drm_client_buffer_vunmap       (215 us)

Thanks,
Feng


> -Daniel
> 
> >
> > controller info
> > =================
> > 03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
> >         Subsystem: Intel Corporation MGA G200e [Pilot] ServerEngines (SEP1)
> >         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Interrupt: pin A routed to IRQ 16
> >         NUMA node: 0
> >         Region 0: Memory at d0000000 (32-bit, prefetchable) [size=16M]
> >         Region 1: Memory at d1800000 (32-bit, non-prefetchable) [size=16K]
> >         Region 2: Memory at d1000000 (32-bit, non-prefetchable) [size=8M]
> >         Expansion ROM at 000c0000 [disabled] [size=128K]
> >         Capabilities: [dc] Power Management version 2
> >                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> >                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> >         Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
> >                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> >                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
> >                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> >                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >                         MaxPayload 128 bytes, MaxReadReq 128 bytes
> >                 DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
> >                 LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
> >                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
> >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >         Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
> >                 Address: 00000000  Data: 0000
> >         Kernel driver in use: mgag200
> >         Kernel modules: mgag200
> >
> >
> > Related pat setting
> > ===================
> > uncached-minus @ 0xc0000000-0xc0001000
> > uncached-minus @ 0xc0000000-0xd0000000
> > uncached-minus @ 0xc0008000-0xc0009000
> > uncached-minus @ 0xc0009000-0xc000a000
> > uncached-minus @ 0xc0010000-0xc0011000
> > uncached-minus @ 0xc0011000-0xc0012000
> > uncached-minus @ 0xc0012000-0xc0013000
> > uncached-minus @ 0xc0013000-0xc0014000
> > uncached-minus @ 0xc0018000-0xc0019000
> > uncached-minus @ 0xc0019000-0xc001a000
> > uncached-minus @ 0xc001a000-0xc001b000
> > write-combining @ 0xd0000000-0xd0300000
> > write-combining @ 0xd0000000-0xd1000000
> > uncached-minus @ 0xd1800000-0xd1804000
> > uncached-minus @ 0xd1900000-0xd1980000
> > uncached-minus @ 0xd1980000-0xd1981000
> > uncached-minus @ 0xd1a00000-0xd1a80000
> > uncached-minus @ 0xd1a80000-0xd1a81000
> > uncached-minus @ 0xd1f10000-0xd1f11000
> > uncached-minus @ 0xd1f11000-0xd1f12000
> > uncached-minus @ 0xd1f12000-0xd1f13000
> >
> > Host bridge info
> > ================
> > 00:00.0 Host bridge: Intel Corporation Device 7853
> >         Subsystem: Intel Corporation Device 0000
> >         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Interrupt: pin A routed to IRQ 0
> >         NUMA node: 0
> >         Capabilities: [90] Express (v2) Root Port (Slot-), MSI 00
> >                 DevCap: MaxPayload 128 bytes, PhantFunc 0
> >                         ExtTag- RBE+
> >                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> >                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >                         MaxPayload 128 bytes, MaxReadReq 128 bytes
> >                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> >                 LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L0s <512ns, L1 <4us
> >                         ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
> >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                 LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
> >                 RootCtl: ErrCorrectable+ ErrNon-Fatal+ ErrFatal+ PMEIntEna- CRSVisible-
> >                 RootCap: CRSVisible-
> >                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> >                 DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
> >                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
> >                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> >                          Compliance De-emphasis: -6dB
> >                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> >                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >         Capabilities: [e0] Power Management version 3
> >                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> >                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >         Capabilities: [100 v1] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>
> >         Capabilities: [144 v1] Vendor Specific Information: ID=0004 Rev=1 Len=03c <?>
> >         Capabilities: [1d0 v1] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?>
> >         Capabilities: [250 v1] #19
> >         Capabilities: [280 v1] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?>
> >         Capabilities: [298 v1] Vendor Specific Information: ID=0007 Rev=0 Len=024 <?>
> >
> >
> > Thanks,
> > Feng
> >
> >
> > >
> > > -Daniel
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> 
> 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

  reply	other threads:[~2019-09-05 10:48 UTC|newest]

Thread overview: 132+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-29  9:51 [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression kernel test robot
2019-07-29  9:51 ` kernel test robot
2019-07-30 17:50 ` Thomas Zimmermann
2019-07-30 17:50   ` Thomas Zimmermann
2019-07-30 18:12   ` Daniel Vetter
2019-07-30 18:12     ` Daniel Vetter
2019-07-30 18:50     ` Thomas Zimmermann
2019-07-30 18:50       ` Thomas Zimmermann
2019-07-30 18:59       ` Daniel Vetter
2019-07-30 18:59         ` Daniel Vetter
2019-07-30 20:26         ` Dave Airlie
2019-07-30 20:26           ` Dave Airlie
2019-07-31  8:13           ` Daniel Vetter
2019-07-31  8:13             ` Daniel Vetter
2019-07-31  9:25             ` Huang, Ying
2019-07-31  9:25               ` [LKP] " Huang, Ying
2019-07-31 10:12               ` Thomas Zimmermann
2019-07-31 10:12                 ` [LKP] " Thomas Zimmermann
2019-07-31 10:21               ` Michel Dänzer
2019-08-01  6:19                 ` Rong Chen
2019-08-01  6:19                   ` [LKP] " Rong Chen
2019-08-01  8:37                   ` Feng Tang
2019-08-01  8:37                     ` [LKP] " Feng Tang
2019-08-01  9:59                     ` Thomas Zimmermann
2019-08-01  9:59                       ` [LKP] " Thomas Zimmermann
2019-08-01 11:25                       ` Feng Tang
2019-08-01 11:25                         ` [LKP] " Feng Tang
2019-08-01 11:58                         ` Thomas Zimmermann
2019-08-01 11:58                           ` [LKP] " Thomas Zimmermann
2019-08-02  7:11                           ` Rong Chen
2019-08-02  7:11                             ` [LKP] " Rong Chen
2019-08-02  8:23                             ` Thomas Zimmermann
2019-08-02  8:23                               ` [LKP] " Thomas Zimmermann
2019-08-02  9:20                             ` Thomas Zimmermann
2019-08-02  9:20                               ` [LKP] " Thomas Zimmermann
2019-08-01  9:57                   ` Thomas Zimmermann
2019-08-01  9:57                     ` [LKP] " Thomas Zimmermann
2019-08-01 13:30                   ` Michel Dänzer
2019-08-02  8:17                     ` Thomas Zimmermann
2019-08-02  8:17                       ` [LKP] " Thomas Zimmermann
2019-07-31 10:10             ` Thomas Zimmermann
2019-07-31 10:10               ` Thomas Zimmermann
2019-08-02  9:11               ` Daniel Vetter
2019-08-02  9:11                 ` Daniel Vetter
2019-08-02  9:26                 ` Thomas Zimmermann
2019-08-02  9:26                   ` Thomas Zimmermann
2019-08-04 18:39   ` Thomas Zimmermann
2019-08-04 18:39     ` Thomas Zimmermann
2019-08-05  7:02     ` Feng Tang
2019-08-05  7:02       ` Feng Tang
2019-08-05  7:28       ` Rong Chen
2019-08-05 10:25         ` Thomas Zimmermann
2019-08-05 10:25           ` Thomas Zimmermann
2019-08-06 12:59           ` Chen, Rong A
2019-08-06 12:59             ` [LKP] " Chen, Rong A
2019-08-07 10:42             ` Thomas Zimmermann
2019-08-07 10:42               ` [LKP] " Thomas Zimmermann
2019-08-09  8:12               ` Rong Chen
2019-08-09  8:12                 ` [LKP] " Rong Chen
2019-08-12  7:25                 ` Feng Tang
2019-08-12  7:25                   ` [LKP] " Feng Tang
2019-08-13  9:36                   ` Feng Tang
2019-08-13  9:36                     ` [LKP] " Feng Tang
2019-08-13  9:36                     ` Feng Tang
2019-08-16  6:55                     ` Feng Tang
2019-08-16  6:55                       ` [LKP] " Feng Tang
2019-08-22 17:25                     ` Thomas Zimmermann
2019-08-22 17:25                       ` [LKP] " Thomas Zimmermann
2019-08-22 17:25                       ` Thomas Zimmermann
2019-08-22 20:02                       ` Dave Airlie
2019-08-22 20:02                         ` [LKP] " Dave Airlie
2019-08-23  9:54                         ` Thomas Zimmermann
2019-08-23  9:54                           ` [LKP] " Thomas Zimmermann
2019-08-23  9:54                           ` Thomas Zimmermann
2019-08-24  5:16                       ` Feng Tang
2019-08-24  5:16                         ` [LKP] " Feng Tang
2019-08-24  5:16                         ` Feng Tang
2019-08-26 10:50                         ` Thomas Zimmermann
2019-08-26 10:50                           ` [LKP] " Thomas Zimmermann
2019-08-27 12:33                           ` Chen, Rong A
2019-08-27 12:33                             ` [LKP] " Chen, Rong A
2019-08-27 12:33                             ` Chen, Rong A
2019-08-27 17:16                             ` Thomas Zimmermann
2019-08-27 17:16                               ` [LKP] " Thomas Zimmermann
2019-08-28  9:37                               ` Rong Chen
2019-08-28  9:37                                 ` [LKP] " Rong Chen
2019-08-28 10:51                                 ` Thomas Zimmermann
2019-08-28 10:51                                   ` [LKP] " Thomas Zimmermann
2019-09-04  6:27                                   ` Feng Tang
2019-09-04  6:27                                     ` [LKP] " Feng Tang
2019-09-04  6:53                                     ` Thomas Zimmermann
2019-09-04  6:53                                       ` [LKP] " Thomas Zimmermann
2019-09-04  8:11                                       ` Daniel Vetter
2019-09-04  8:11                                         ` [LKP] " Daniel Vetter
2019-09-04  8:35                                         ` Feng Tang
2019-09-04  8:35                                           ` [LKP] " Feng Tang
2019-09-04  8:43                                           ` Thomas Zimmermann
2019-09-04  8:43                                             ` [LKP] " Thomas Zimmermann
2019-09-04 14:30                                             ` Chen, Rong A
2019-09-04 14:30                                               ` [LKP] " Chen, Rong A
2019-09-04  9:17                                           ` Daniel Vetter
2019-09-04  9:17                                             ` [LKP] " Daniel Vetter
2019-09-04 11:15                                             ` Dave Airlie
2019-09-04 11:15                                               ` [LKP] " Dave Airlie
2019-09-04 11:20                                               ` Daniel Vetter
2019-09-04 11:20                                                 ` [LKP] " Daniel Vetter
2019-09-04 11:20                                                 ` Daniel Vetter
2019-09-05  6:59                                                 ` Feng Tang
2019-09-05  6:59                                                   ` [LKP] " Feng Tang
2019-09-05 10:37                                                   ` Daniel Vetter
2019-09-05 10:37                                                     ` [LKP] " Daniel Vetter
2019-09-05 10:48                                                     ` Feng Tang [this message]
2019-09-05 10:48                                                       ` Feng Tang
2019-09-05 10:48                                                       ` Feng Tang
2019-09-09 14:12                                     ` Thomas Zimmermann
2019-09-09 14:12                                       ` [LKP] " Thomas Zimmermann
2019-09-09 14:12                                       ` Thomas Zimmermann
2019-09-16  9:06                                       ` Feng Tang
2019-09-16  9:06                                         ` [LKP] " Feng Tang
2019-09-17  8:48                                         ` Thomas Zimmermann
2019-09-17  8:48                                           ` [LKP] " Thomas Zimmermann
2019-09-17  8:48                                           ` Thomas Zimmermann
2019-08-05 10:22       ` Thomas Zimmermann
2019-08-05 10:22         ` Thomas Zimmermann
2019-08-05 12:52         ` Feng Tang
2019-08-05 12:52           ` Feng Tang
2020-01-06 13:19           ` Thomas Zimmermann
2020-01-06 13:19             ` Thomas Zimmermann
2020-01-08  2:25             ` Rong Chen
2020-01-08  2:28               ` Rong Chen
2020-01-08  5:20               ` Thomas Zimmermann
2020-01-08  5:20                 ` Thomas Zimmermann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190905104811.GF5541@shbuild999.sh.intel.com \
    --to=feng.tang@intel.com \
    --cc=lkp@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.