Bug 20896

Summary: [GM965 KMS] X does not draw until mouse is moved. Probably IRQ problems
Product: DRI
Reporter: Mateusz Kaduk <mateusz.kaduk>
Component: DRM/Intel
Assignee: Jesse Barnes <jbarnes>
Status: CLOSED NOTOURBUG
QA Contact:
Severity: major
Priority: medium
CC: dariush, dino, markuman, mateusz.kaduk
Version: DRI git
Keywords: NEEDINFO
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description | Flags
This includes config.gz for kernel 2.6.30-rc1-vanilla with two patches from mailing list that I used. | none
set proper clock gating & render clock bits | none
Intel regs dumps from ums & kms | none
Xorg.0.log in failing kms case. | none
dmesg in failing kms case. | none
Regdump with 2.6.30-rc3-git5 with jbarnes patch for RENCLK differences on kms failing case. | none
DEBUGFS: Content of /sys/kernel/debug/dri/0/ with kms failing case. | none
DEBUGFS: Content of /sys/kernel/debug/dri/0/ with ums working case. | none
more debug info | none
remove INSTPM setting | none
dmesg with more debug info patch | none
workaround IER clearing in EnterVT | none
Check IER in wait_request | none
fix build | none

Description Mateusz Kaduk 2009-03-26 16:42:37 UTC
Latest working kernel version: Every version with KMS disabled.
Earliest failing kernel version: Same problem with 2.6.29rc8
Distribution: GNU/Linux Debian Sid
Hardware Environment: 00:02.0 VGA compatible controller: Intel Corporation
Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)

Software Environment: Xorg(master) libdrm(master) xf86-intel(master)
mesa(master)
Problem Description: X does not draw elements unless the mouse cursor is moved.
Moreover, things are horribly slow.

Also after booting in KMS enabled mode.
 grep i915 /proc/interrupts 
 29:          1          0   PCI-MSI-edge      i915
After booting in KMS disabled mode.
 grep i915 /proc/interrupts 
 29:       5853       5411   PCI-MSI-edge      i915@pci:0000:00:02.0

dmesg, syslog and Xorg.0.log do not show anything unusual, so they are not
attached.

Steps to reproduce: Boot a KMS-enabled 2.6.29 kernel on GM965 with the software
listed above and run startx or gdm.
Comment 1 Mateusz Kaduk 2009-04-14 12:36:01 UTC
More tests on my GM965:

Booting pci=nomsi i915.modeset=1
$ grep i915 /proc/interrupts 
 16:       6639       6520   IO-APIC-fasteoi   i915, ahci, uhci_hcd:usb5, yenta
Moving or switching windows is unacceptably slow. Running glxgears shows framerates from 32 down to below 3, and you need luck to kill it; only then is it possible to type again.

Booting i915.modeset=1
$ grep i915 /proc/interrupts 
 29:          0          0   PCI-MSI-edge      i915
Things are not rendered until mouse is moved.

Booting with kms disabled by default
$ grep i915 /proc/interrupts 
 29:       5853       5411   PCI-MSI-edge      i915@pci:0000:00:02.0
Everything works

Something differs between the KMS and UMS code paths that causes the problem.
Comment 2 Mateusz Kaduk 2009-04-14 12:42:40 UTC
Created attachment 24794 [details]
This includes config.gz for kernel 2.6.30-rc1-vanilla with two patches from mailing list that I used.
Comment 3 Markus Bergholz 2009-04-22 13:20:56 UTC
I have similar problems.
Sometimes the keyboard hangs while typing; if I move the mouse, it continues. With compositing active it happens very often! At the moment I am trying intel 2.7.0-3.

Kernel 2.6.29.1-4 on Arch Linux.
Intel 945GC (that's what lspci says; Foxconn 945CSX mini-ITX mainboard with Intel Atom 330).
No KMS.

zcat /proc/config.gz | grep MSI
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
CONFIG_MSI_LAPTOP=m


Do you need any more details?
Comment 4 Dariush Forouher 2009-04-27 06:23:52 UTC
Same problem here.  Screen doesn't get updated until I either move the mouse or
press a key. 

System: Dell Latitude D630
Chip: GM965
Platform: AMD64/
Kernel: Current kernel git + pull from dri-intel
OS: Debian Stable/Unstable
Xorg: 7.4
Intel: 2.7.0
Comment 5 Florian Mickler 2009-04-29 12:58:47 UTC
Hm... if your typing also hangs sometimes, maybe it is a KMS-unrelated _regression_ in the kernel, and KMS is just the most affected?

If so, you could file a bug at the kernel bugzilla and have it listed as a regression, which gives this issue more publicity, and maybe someone involved with interrupts has some suggestions...

Comment 6 Florian Mickler 2009-04-29 13:01:32 UTC
But of course first try rc3-git6, as Andreas suggested on the xorg@ list...
Comment 7 Dariush Forouher 2009-04-29 13:45:55 UTC
Hi Florian,

well, running the same kernel without KMS works fine, so I wouldn't call it a regression (at least for non-KMS users). Also typing doesn't hang, it's just that when the screen freezes, I can cause a repaint either with a mouse or keyboard event.

I also tried current git as you suggested (plus the patches from drm-intel git tree): no change, unfortunately.
Comment 8 Mateusz Kaduk 2009-04-29 13:55:07 UTC
(In reply to comment #7)
> Hi Florian,
> 
> well, running the same kernel without KMS works fine, so I wouldn't call it a
> regression (at least for non-KMS users). Also typing doesn't hang, it's just
> that when the screen freezes, I can cause a repaint either with a mouse or
> keyboard event.
> 
> I also tried current git as you suggested (plus the patches from drm-intel git
> tree): no change, unfortunately.
> 

There are no GPU-related changes between git5 and git6, so it should not change anything. The patches from drm-intel are also already in git5.
I tested with git5, and enabling KMS still causes this problem.

IMHO it is a regression, since code has been introduced that causes the bug.
I am really hoping someone will fix it.
Comment 9 Jesse Barnes 2009-05-04 15:29:15 UTC
*** Bug 21155 has been marked as a duplicate of this bug. ***
Comment 10 Jesse Barnes 2009-05-04 15:32:18 UTC
Some questions here:
  1) does this happen even if you're running something with animation?  (e.g. 
     glxgears, some animated web page)
  2) do you see graphics interrupts happening (grep i915 /proc/interrupts) during 
     that time?  you'd probably have to ssh in to confirm that so as not to 
     perturb the display.

If the answer to both is yes, it may be that our block handler isn't running or somehow we're not flushing front buffer rendering to the screen in a timely way...

If the answer to (1) is yes but (2) is no (as is the case in some of the reports here) it may be that we're not enabling user interrupts but are completing rendering and flushing it eventually through the server...
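One way to answer question (2) from an ssh session is to read /proc/interrupts twice and compare the i915 totals. The following is only a minimal sketch, assuming the usual /proc/interrupts layout (an "NN:" label followed by one count per CPU); the helper names `irq_total` and `irq_delta` are made up for illustration.

```python
import time


def irq_total(text, name="i915"):
    """Sum the per-CPU counts on every /proc/interrupts line mentioning `name`."""
    total = 0
    for line in text.splitlines():
        if name not in line:
            continue
        # Fields after the "NN:" label are per-CPU counts up to the first non-number.
        for field in line.split()[1:]:
            if not field.isdigit():
                break
            total += int(field)
    return total


def irq_delta(seconds=5.0, path="/proc/interrupts", name="i915"):
    """Return how many `name` interrupts were counted during `seconds`."""
    with open(path) as f:
        before = irq_total(f.read(), name)
    time.sleep(seconds)
    with open(path) as f:
        after = irq_total(f.read(), name)
    return after - before
```

Running `print(irq_delta())` over ssh while glxgears is on screen would then show whether the i915 count moves at all; a result of 0 during visible animation corresponds to the "(1) yes, (2) no" case above.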
Comment 11 Mateusz Kaduk 2009-05-04 16:20:38 UTC
(In reply to comment #10)
> Some questions here:
>   1) does this happen even if you're running something with animation?  (e.g. 
>      glxgears, some animated web page)
>   2) do you see graphics interrupts happening (grep i915 /proc/interrupts)
> during 
>      that time?  you'd probably have to ssh in to confirm that so as not to 
>      perturb the display.
> 
> If the answer to both is yes, it may be that our block handler isn't running or
> somehow we're not flushing front buffer rendering to the screen in a timely
> way...
> 
> If the answer to (1) is yes but (2) is no (as is the case in some of the
> reports here) it may be that we're not enabling user interrupts but are
> completing rendering and flushing it eventually through the server...
> 

1) Yes

Mouse cursor animation seems not to be affected at all. It animates for 5-6 s, then stops on the last animation.

But glxgears works only if I keep moving the mouse or pressing something on the keyboard.

FPS is higher when I move the mouse more frequently.
Writing this comment is possible, though.

To see animation I also tried playing music in totem.

Moving the mouse is needed here too: rendering stops sooner than the music, which lasts 5-6 s (probably the sound buffer), and then mouse movement is again needed to hear anything.

So not only rendering is affected.

Testing with YouTube, I can watch a video for 4-5 s, then I hear only sound for 2 s more, then there is no sound and a static picture until I move the mouse or press a key again.

2) No

After running glxgears and moving the mouse to keep it animating:
$ cat /proc/interrupts 
           CPU0       CPU1       
  0:     110339     110515   IO-APIC-edge      timer
  1:       1678       1644   IO-APIC-edge      i8042
  8:          1          0   IO-APIC-edge      rtc0
  9:        837        845   IO-APIC-fasteoi   acpi
 12:       3946       3900   IO-APIC-edge      i8042
 14:       3427       3615   IO-APIC-edge      ide0
 16:       6047       6274   IO-APIC-fasteoi   uhci_hcd:usb5, yenta
 17:      19027      18417   IO-APIC-fasteoi   uhci_hcd:usb6, HDA Intel
 18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb7, mmc0
 19:          2          0   IO-APIC-fasteoi   ehci_hcd:usb2
 20:         28         30   IO-APIC-fasteoi   uhci_hcd:usb3
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 22:          2          1   IO-APIC-fasteoi   ehci_hcd:usb1
 29:          0          0   PCI-MSI-edge      i915
 30:       6170       6187   PCI-MSI-edge      ahci
 31:        187        198   PCI-MSI-edge      eth2
 32:       5554       5486   PCI-MSI-edge      iwlagn
NMI:          0          0   Non-maskable interrupts
LOC:      76740      98836   Local timer interrupts
SPU:          0          0   Spurious interrupts
RES:      21083      21734   Rescheduling interrupts
CAL:         45         71   Function call interrupts
TLB:        118        251   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
ERR:          0
MIS:          0

dmesg does not show anything special except
[drm:drm_wait_vblank] *ERROR* failed to acquire vblank counter, -22

Xorg.0.log is also fine.

Switching windows is badly slow and sometimes requires moving the mouse or pressing a key.

Switching to VT and back is possible.
Comment 12 Jesse Barnes 2009-05-04 16:58:31 UTC
Created attachment 25445 [details] [review]
set proper clock gating & render clock bits

We need to do this in KMS mode anyway; hope it helps with this problem.
Comment 13 Mateusz Kaduk 2009-05-04 17:34:16 UTC
Comment on attachment 25445 [details] [review]
set proper clock gating & render clock bits

Does not fix this.
Comment 14 Mateusz Kaduk 2009-05-05 13:15:09 UTC
Created attachment 25490 [details]
Intel regs dumps from ums & kms

Two files contain output from intel_reg_dumper for both cases (kms/ums).
Comment 15 Jesse Barnes 2009-05-05 13:35:07 UTC
Upping severity.
Comment 16 Mateusz Kaduk 2009-05-05 13:38:35 UTC
Created attachment 25495 [details]
Xorg.0.log in failing kms case.
Comment 17 Mateusz Kaduk 2009-05-05 13:39:10 UTC
Created attachment 25496 [details]
dmesg in failing kms case.
Comment 18 Mateusz Kaduk 2009-05-05 13:40:28 UTC
Created attachment 25499 [details]
Regdump with 2.6.30-rc3-git5 with jbarnes patch for RENCLK differences on kms failing case.
Comment 19 Dariush Forouher 2009-05-05 14:17:47 UTC
Found a smoking gun:
Running "vbetool vbestate save" during boot induces the bug.
Disabling this (which probably is unnecessary anyway?) fixes the problem.
Comment 20 Mateusz Kaduk 2009-05-05 14:25:51 UTC
Created attachment 25512 [details]
DEBUGFS: Content of /sys/kernel/debug/dri/0/ with kms failing case.
Comment 21 Mateusz Kaduk 2009-05-05 14:27:00 UTC
Created attachment 25513 [details]
DEBUGFS: Content of /sys/kernel/debug/dri/0/ with ums working case.
Comment 22 Jesse Barnes 2009-05-05 14:31:34 UTC
Dariush, that's interesting.  I'm starting to wonder if PCI state might have changed somehow.  Can you check your /sys/devices/pci0000:00/0000:00:02.0/ PCI files for changes before and after vbetool runs?  I wonder if config space changes somehow (maybe bus mastering or some other feature gets turned off)...  resource0 from before & after might also be interesting to diff...
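A before/after comparison of config space could be scripted roughly as follows. This is a sketch under assumptions: it reads the `config` file that sysfs exposes in the device directory named above, and the helper names `snapshot` and `diff_bytes` are invented here. Reading past the first 64 bytes of config space typically requires root.

```python
def snapshot(path="/sys/devices/pci0000:00/0000:00:02.0/config"):
    """Read the raw PCI config space bytes from sysfs."""
    with open(path, "rb") as f:
        return f.read()


def diff_bytes(before, after):
    """Return (offset, old, new) for every byte that differs between two snapshots."""
    return [
        (offset, old, new)
        for offset, (old, new) in enumerate(zip(before, after))
        if old != new
    ]
```

Usage would be: take `before = snapshot()`, run vbetool, take `after = snapshot()`, then print each tuple from `diff_bytes(before, after)`, for example as `f"{offset:#04x}: {old:02x} -> {new:02x}"`. An empty list means config space is unchanged.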
Comment 23 Jesse Barnes 2009-05-05 14:46:50 UTC
Created attachment 25518 [details] [review]
more debug info

Ok, in the KMS case your IER (interrupt enable register) is reading back as 0, which indicates a problem.  This patch adds some more debug info, please try it out.
Comment 24 Jesse Barnes 2009-05-05 15:00:49 UTC
Created attachment 25519 [details] [review]
remove INSTPM setting

No idea why this was here, but it could cause problems.
Comment 25 Mateusz Kaduk 2009-05-05 15:01:20 UTC
Created attachment 25520 [details]
dmesg with more debug info patch
Comment 26 Dariush Forouher 2009-05-05 15:15:16 UTC
Hi Jesse,

nope, nothing changes as far as I can tell. I wasn't able to open resource0 though (I/O-Error). Do I have to employ some magic to access it?

This is the output of "lspci -vvvxxxx", before & after (no difference)

00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c) (prog-if 00 [VGA controller])
	Subsystem: Dell Latitude D630
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 27
	Region 0: Memory at f6e00000 (64-bit, non-prefetchable) [size=1M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at eff8 [size=8]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Mask- 64bit- Count=1/1 Enable+
		Address: fee0300c  Data: 4179
	Capabilities: [d0] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
		Bridge: PM- B3+
	Kernel driver in use: i915
00: 86 80 02 2a 07 04 90 00 0c 00 00 03 00 00 80 00
10: 04 00 e0 f6 00 00 00 00 0c 00 00 e0 00 00 00 00
20: f9 ef 00 00 00 00 00 00 00 00 00 00 28 10 f9 01
30: 00 00 00 00 90 00 00 00 00 00 00 00 0b 01 00 00
40: 00 00 00 00 48 00 00 00 09 00 0a 91 2c 64 00 30
50: 04 00 30 00 19 00 00 00 00 00 00 00 00 00 80 7f
60: 00 00 00 00 00 00 02 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 05 d0 01 00 0c 30 e0 fe 79 41 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 01 01 01 01 00 00 00 00 4d 01 00 00
d0: 01 00 23 00 00 00 01 01 01 01 00 94 34 00 00 00
e0: 00 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00
f0: 05 02 34 07 ff 00 00 00 90 0f 04 00 00 c8 66 7f
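The hex dump above can be cross-checked against the decoded lspci lines. The sketch below parses two rows of the dump and decodes the standard PCI command register (offset 0x04) and the MSI capability at 0x90; the helper names `parse_dump` and `le16` are made up for illustration.

```python
def parse_dump(rows, size=256):
    """Place 'OFFSET: xx xx ...' rows from an lspci hex dump into a config image."""
    cfg = bytearray(size)
    for row in rows:
        label, _, rest = row.partition(":")
        base = int(label, 16)
        for i, byte in enumerate(rest.split()):
            cfg[base + i] = int(byte, 16)
    return bytes(cfg)


def le16(cfg, offset):
    """Read a little-endian 16-bit value from the config image."""
    return cfg[offset] | (cfg[offset + 1] << 8)


# Two rows copied from the dump above.
ROWS = [
    "00: 86 80 02 2a 07 04 90 00 0c 00 00 03 00 00 80 00",
    "90: 05 d0 01 00 0c 30 e0 fe 79 41 00 00 00 00 00 00",
]

cfg = parse_dump(ROWS)
command = le16(cfg, 0x04)                      # PCI command register
bus_master = bool(command & 0x0004)            # bit 2: bus mastering enabled
intx_disabled = bool(command & 0x0400)         # bit 10: legacy INTx disabled
msi_cap_id = cfg[0x90]                         # 0x05 identifies an MSI capability
msi_enabled = bool(le16(cfg, 0x92) & 0x0001)   # message control bit 0: MSI enable
```

Here `command` is 0x0407, so bus mastering is on and INTx is disabled, and the MSI enable bit is set. That agrees with "BusMaster+", "DisINTx+" and "MSI: ... Enable+" in the decoded output, which supports the observation that nothing in config space was switched off.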
Comment 27 Jesse Barnes 2009-05-05 15:31:09 UTC
Created attachment 25525 [details] [review]
workaround IER clearing in EnterVT

Check if IER was cleared at EnterVT time.  If so, re-enable interrupts so that things work.
Comment 28 Jesse Barnes 2009-05-05 15:46:33 UTC
Created attachment 25526 [details] [review]
Check IER in wait_request

The 2D driver doesn't call EnterVT in KMS mode, so we have to do the check somewhere else.  This makes the kernel check at wait_request time instead.
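The idea of checking IER before waiting can be illustrated with a toy model. This is not the attached patch, which is kernel C code: `FakeDevice`, `ensure_interrupts_enabled`, the log text and the mask value 0x2 are all hypothetical, chosen only to show the control flow.

```python
class FakeDevice:
    """Toy stand-in for the GPU's interrupt enable register (IER)."""

    def __init__(self, wanted_mask):
        self.wanted_mask = wanted_mask  # bits the driver expects to be enabled
        self.ier = wanted_mask          # current register contents


def ensure_interrupts_enabled(dev, log):
    """Re-arm IER if it reads back as 0; return True when a repair was needed."""
    if dev.ier != 0:
        return False
    log.append("something disabled interrupts, re-enabling")
    dev.ier = dev.wanted_mask
    return True


def wait_request(dev, log):
    """Check IER first; a real driver would then sleep until the request completes."""
    return ensure_interrupts_enabled(dev, log)


log = []
dev = FakeDevice(wanted_mask=0x2)  # 0x2 is an arbitrary illustrative mask
dev.ier = 0                        # simulate an outside agent clearing IER
first = wait_request(dev, log)     # repairs IER and logs once
second = wait_request(dev, log)    # nothing left to repair
```

Doing the check on the wait path means the repair happens the next time anything waits for rendering, without relying on EnterVT being called.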
Comment 29 Jesse Barnes 2009-05-05 15:47:36 UTC
Created attachment 25527 [details] [review]
fix build

Oops, this one actually builds.
Comment 30 Dariush Forouher 2009-05-05 15:53:16 UTC
Ah, just wanted to write that "workaround IER clearing in EnterVT" doesn't fix the problem. :)

It's 1 am, will try the other patch tomorrow.

Have a nice day and thanks for your help!
Dariush
Comment 31 Jesse Barnes 2009-05-05 16:04:24 UTC
Ok, sent out the workaround.  But this is really a distro config/BIOS issue so I'm marking NOTOURBUG.
Comment 32 Dariush Forouher 2009-05-08 12:24:23 UTC
Hi Jesse,

yes, the workaround does fix the issue.

[   61.162844] [drm:i915_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling

I've filed a bug report at Debian to not start vbetool when KMS is active.

Thanks again for your help!

Dariush
