Bug 23183

Summary: [G4x] HDMI hotplug interrupts stuck
Product: DRI
Component: DRM/Intel
Version: XOrg git
Hardware: Other
OS: All
Status: CLOSED FIXED
Severity: normal
Priority: high
Reporter: Alberto González <luis6674>
Assignee: Jesse Barnes <jbarnes>
QA Contact:
CC: jbarnes, jonathan, mikko.rantalainen
Whiteboard:
i915 platform:
i915 features:
Attachments (description, flags):
Init outputs before IRQ status (flags: none)
another debug patch (flags: none)
Check hotplug status bits (flags: none)
Handle spurious interrupts (flags: none)
Handle spurious interrupts #2 (flags: none)
IRQ debug patch (flags: none)
dmesg with HDMI stuck interrupts (flags: none)
debug output init (flags: none)
DRM debug log (flags: none)
enable hotplug only for detected outputs (flags: none)

Description Alberto González 2009-08-06 15:13:56 UTC
After upgrading the kernel to 2.6.30 (from 2.6.29) I found a problem with udev using a lot of CPU time. The problem would trigger at random, sometimes right at boot and other times some minutes later. Once triggered, only killing udev would stop it.

It turned out to be a DRM problem where something got stuck and would send events with every interrupt.

The problem is also present on 2.6.31 (~rc4).

Reference LKML thread:
http://lkml.org/lkml/2009/6/28/11

Hardware: Dell Studio desktop, Intel G45 chipset, using the integrated graphics.

An example dmesg with a debug patch applied can be found here:
http://lkml.org/lkml/2009/7/22/313

And a test patch that proved to solve the problem here:
http://lkml.org/lkml/2009/7/22/327
Comment 1 Jesse Barnes 2009-08-31 10:42:14 UTC
Created attachment 29047
Init outputs before IRQ status

Does this patch also work?  In order to avoid the spurious interrupts we're supposed to initialize the PEG band gap to the correct voltage...
Comment 2 Alberto González 2009-08-31 13:03:27 UTC
I just tested Linus' latest git with this latest patch and I still see the problem.

I had nothing in dmesg when the problem triggered since I didn't apply the debug patch provided some time ago. However, I did have an error that might or might not be related to the patch (I can just say that git from about a week ago didn't show this error):

[   23.764976] kbuildsycoca4 used greatest stack depth: 6028 bytes left
[   75.535185] [drm:i915_gem_execbuffer] *ERROR* Object f638f3c0 appears more than once in object list
[   75.602287] [drm:i915_gem_execbuffer] *ERROR* Object f6160060 appears more than once in object list
[   75.617678] [drm:i915_gem_execbuffer] *ERROR* Object f6160060 appears more than once in object list
[   75.671562] [drm:i915_gem_execbuffer] *ERROR* Object f638f420 appears more than once in object list
[   75.818337] [drm:i915_gem_execbuffer] *ERROR* Object f638f480 appears more than once in object list
[   75.881663] [drm:i915_gem_execbuffer] *ERROR* Object f638f4e0 appears more than once in object list
[   75.901713] [drm:i915_gem_execbuffer] *ERROR* Object f638f4e0 appears more than once in object list
[   76.150701] [drm:i915_gem_execbuffer] *ERROR* Object f638f300 appears more than once in object list
[   76.170042] [drm:i915_gem_execbuffer] *ERROR* Object f638f300 appears more than once in object list
[   76.242343] [drm:i915_gem_execbuffer] *ERROR* Object f638f540 appears more than once in object list
[   76.306192] [drm:i915_gem_execbuffer] *ERROR* Object f638f420 appears more than once in object list
[   76.312373] [drm:i915_gem_execbuffer] *ERROR* Object f638f420 appears more than once in object list
[   76.317394] [drm:i915_gem_execbuffer] *ERROR* Object f638f420 appears more than once in object list
[   76.451448] [drm:i915_gem_execbuffer] *ERROR* Object f638f540 appears more than once in object list
[   76.455672] [drm:i915_gem_execbuffer] *ERROR* Object f638f540 appears more than once in object list
[   76.471548] [drm:i915_gem_execbuffer] *ERROR* Object f638f540 appears more than once in object list
[   76.552370] [drm:i915_gem_execbuffer] *ERROR* Object f638f3c0 appears more than once in object list
[   76.567279] [drm:i915_gem_execbuffer] *ERROR* Object f638f3c0 appears more than once in object list
[   76.614682] [drm:i915_gem_execbuffer] *ERROR* Object f638f300 appears more than once in object list
[   76.627649] [drm:i915_gem_execbuffer] *ERROR* Object f638f300 appears more than once in object list
[   76.697832] [drm:i915_gem_execbuffer] *ERROR* Object f638f660 appears more than once in object list
[   76.711030] [drm:i915_gem_execbuffer] *ERROR* Object f638f660 appears more than once in object list
[   76.845556] [drm:i915_gem_execbuffer] *ERROR* Object f638f300 appears more than once in object list
[   76.862304] [drm:i915_gem_execbuffer] *ERROR* Object f638f300 appears more than once in object list
[   77.228811] [drm:i915_gem_execbuffer] *ERROR* Object f638f600 appears more than once in object list
[   77.241402] [drm:i915_gem_execbuffer] *ERROR* Object f638f600 appears more than once in object list
[   77.298431] [drm:i915_gem_execbuffer] *ERROR* Object f638f3c0 appears more than once in object list
[   77.318427] [drm:i915_gem_execbuffer] *ERROR* Object f638f3c0 appears more than once in object list
[   77.446953] [drm:i915_gem_execbuffer] *ERROR* Object f638f5a0 appears more than once in object list
[   77.462289] [drm:i915_gem_execbuffer] *ERROR* Object f638f5a0 appears more than once in object list
[   77.601891] [drm:i915_gem_execbuffer] *ERROR* Object f638f3c0 appears more than once in object list
[   77.614703] [drm:i915_gem_execbuffer] *ERROR* Object f638f3c0 appears more than once in object list
[  187.018764] kio_thumbnail used greatest stack depth: 5644 bytes left

Let me know if I should try something else with this patch or another.
Comment 3 Jesse Barnes 2009-08-31 19:03:15 UTC
Oh well, I guess it's not the PEG band voltage bug then... I'll ping the display guys and see what I can come up with.
Comment 4 Jesse Barnes 2009-09-11 09:10:21 UTC
Created attachment 29419
another debug patch

Hopefully this one allows your monitor to come back?
Comment 5 Alberto González 2009-09-11 13:31:47 UTC
Yes, this patch solves the problem. In fact, the second part of the patch is the same as the test patch that already proved to solve it some time ago, but it was not considered the right fix. I guess the first part of this last patch is what was missing to make the first one a real fix?

Thanks.
Comment 6 Jesse Barnes 2009-09-11 14:34:32 UTC
(In reply to comment #5)
> Yes, this patch solves the problem. In fact, the second part of the patch is
> the same as the test patch that already proved to solve it some time ago, but
> it was not considered the right fix. I guess the first part of this last patch
> is what was missing to make the first one a real fix?

No, this one still isn't right (chipset guys will get back to me soon I hope).  It was a test patch for what sounds like a related issue; I've had one report that although the hack patch solves the stuck interrupt issue it also prevents monitors from syncing again when they're turned off and back on again (all the while attached).  Pretty weird, but possibly related to the hotplug quirks on G45.
Comment 7 Alberto González 2009-09-11 15:02:21 UTC
Ah, ok, I didn't know about that monitor problem and I really can't confirm if it solves that problem. I was just talking about the uevents thing in my previous reply.
Comment 8 Alberto González 2009-09-22 07:17:01 UTC
I pulled from git today (the soon-to-be 2.6.32-rc1) and I can't reproduce the problem anymore. I'll check to make sure I haven't done anything wrong maybe with the .config (though DRI and everything is working fine), but it does look like it is fixed. I do see the [drm:i915_gem_execbuffer] errors posted above, but that seems unrelated to this issue.

Any idea of what could have fixed it? All upcoming distros will ship with kernel 2.6.31, so it would still be nice to know what fixed it and be able to backport it, if possible.
Comment 9 Jesse Barnes 2009-09-28 10:18:09 UTC
Interesting... no I'm not sure what may have fixed it offhand.  Would it be too much trouble to bisect it?  There have been some fixes to somewhat related areas, but nothing that should directly affect the stuck hotplug interrupt afaik...
Comment 10 Alberto González 2009-09-28 14:25:50 UTC
I've just pulled from git again and bad news: I can see the problem again. I don't know why it seemed to be fixed some days ago, but I probably did something wrong. Sorry about the false report.

I'm not sure I'll have the time and knowledge to perform a bisect, but in case I can do it, would it be worth bisecting between .29 and .30 to find the commit that introduced the problem? Or is it not too relevant at this point?
Comment 11 Jesse Barnes 2009-09-28 14:48:09 UTC
No, you don't need to bisect for the bad commit; I think I know what it's related to.  I was hoping you could bisect to find the *good* commit, but it sounds like there isn't one. :p

When I get back from travelling I'll dig through all the hotplug errata (apparently there were many) and see if I can come up with a real patch for this.
Comment 12 Alberto González 2009-11-02 12:59:55 UTC
Small update: I have connected my monitor using the HDMI connector on my card (with an HDMI-to-DVI adapter) and I don't get interrupts anymore. Not sure if this was expected, but thought I should report it just in case.

Previously I was connecting my monitor through VGA.

Let me know if you need any further info.
Comment 13 Alberto González 2009-11-05 07:25:22 UTC
Well, after _days_ of using the computer for many hours with the monitor connected via HDMI, the bug has shown its face again. So using HDMI doesn't completely solve the problem, but it makes it much more difficult to trigger (it just took 5-10 minutes of usage before).
Comment 14 Jesse Barnes 2009-11-05 09:51:33 UTC
Created attachment 30988
Check hotplug status bits

Looks like we were checking the wrong bits in the interrupt handler.  Can you give this patch a try?
Comment 15 Alberto González 2009-11-05 15:25:32 UTC
Unfortunately it doesn't seem to help. First I tested on current git but DRI was not working for some reason, so while I couldn't reproduce the bug I thought it was not a good test. So then I tested the patch on 2.6.31.5 (it applied with a trivial change) and there I could reproduce the bug in a few minutes (using the VGA connector).

I'll try to retest on current git again to be sure (if I find the reason why DRI didn't work).
Comment 16 Alberto González 2009-11-05 15:43:17 UTC
Ok, the DRI problem was a stupid typo in the boot parameters, so now it booted fine and just playing a Tux Racer game made the problem show up (even on HDMI).

So the patch really doesn't help :(
Comment 17 Jesse Barnes 2009-11-05 15:51:15 UTC
Created attachment 31000
Handle spurious interrupts

Ok maybe we need to use both sets; if we get an interrupt on a port but the live bit isn't set we should disable the port.
Comment 18 Jesse Barnes 2009-11-05 15:55:10 UTC
Created attachment 31001
Handle spurious interrupts #2

Oops, last one had live vs. hotplug interrupts in the wrong order.
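
The idea from comments 17 and 18 can be sketched as a small model. This is not the attached patch: the port names and bit positions below are hypothetical placeholders, and the real driver reads the status and live-state values from hardware registers. The sketch only shows the decision rule, that a port which raises a hotplug interrupt while its live bit is clear gets disabled.

```python
# Hypothetical bit positions, for illustration only; they are not taken
# from the i915 register definitions.
PORT_BITS = {"HDMI_B": 1 << 0, "HDMI_C": 1 << 1, "DP_D": 1 << 2}


def ports_to_disable(hotplug_status, live_status):
    """Return ports that raised a hotplug interrupt while not live."""
    spurious = []
    for name, bit in PORT_BITS.items():
        interrupted = bool(hotplug_status & bit)
        live = bool(live_status & bit)
        if interrupted and not live:
            spurious.append(name)
    return spurious


def disable_ports(enable_mask, ports):
    """Clear the enable bit of each listed port and return the new mask."""
    for name in ports:
        enable_mask &= ~PORT_BITS[name]
    return enable_mask


if __name__ == "__main__":
    # HDMI_B and DP_D both interrupted, but only HDMI_B is live.
    spurious = ports_to_disable(0b101, 0b001)
    print(spurious)
    print(bin(disable_ports(0b111, spurious)))
```

In this model a port with a real monitor attached keeps its interrupt enabled, while a port that fires without a live connection is masked off.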
Comment 19 Alberto González 2009-11-06 13:18:32 UTC
Sorry for the bad news, this one didn't help either. I could trigger the interrupt storm within a few minutes of usage via VGA.
Comment 20 Jesse Barnes 2009-11-12 09:20:36 UTC
Created attachment 31133
IRQ debug patch

Can you guys reproduce the problem with this patch applied and attach the output to this bug?  I had some similar data a while back but I lost it, and I need new theories so I want to see the initial problem data again.  Thanks.
Comment 21 Alberto González 2009-11-12 15:20:38 UTC
Created attachment 31146
dmesg with HDMI stuck interrupts

Here is a full dmesg with the patch applied. Let me know if you need further information.
Comment 22 Jesse Barnes 2009-11-16 15:50:27 UTC
I wonder if DP_D is supposed to be enabled on your system at all... Can you try the patchset at https://bugs.freedesktop.org/show_bug.cgi?id=22785?  It may need a refresh, I'll ping the author.
Comment 23 Alberto González 2009-11-16 18:03:25 UTC
Ok, I'll try those patches when the author posts the refreshed ones.

One thing I noticed is that xrandr -q reports this:

VGA1 disconnected (normal left inverted right x axis y axis)
DVI1 connected 1680x1050+0+0 (normal left inverted right x axis y axis) 474mm x 296mm
   1680x1050      60.0*+
   1280x1024      75.0
   1024x768       75.1     60.0
   800x600        75.0     60.3
   640x480        75.0     60.0
   720x400        70.1
DP1 disconnected (normal left inverted right x axis y axis)

But the DVI1 that appears connected is in fact an HDMI one. When I connect through VGA it also reports that I have a DVI output (disconnected in that case) but this computer does not have DVI at all.
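
A quick way to compare what the driver reports against the connectors a machine really has is to parse the `xrandr -q` text. The following is a minimal sketch, assuming the output format shown above; the helper names and the list of physical connector prefixes are illustrative, not part of any tool.

```python
import re

# Matches lines such as "DVI1 connected 1680x1050+0+0 (...)" or
# "VGA1 disconnected (...)"; mode lines start with whitespace and are skipped.
OUTPUT_LINE = re.compile(r"^(\S+) (connected|disconnected)\b")


def parse_outputs(xrandr_text):
    """Map each output name to True (connected) or False (disconnected)."""
    outputs = {}
    for line in xrandr_text.splitlines():
        match = OUTPUT_LINE.match(line)
        if match:
            outputs[match.group(1)] = match.group(2) == "connected"
    return outputs


def unexpected_outputs(outputs, physical_prefixes):
    """List reported outputs whose name matches no physical connector type."""
    return [name for name in outputs
            if not name.startswith(tuple(physical_prefixes))]


SAMPLE = """\
VGA1 disconnected (normal left inverted right x axis y axis)
DVI1 connected 1680x1050+0+0 (normal left inverted right x axis y axis) 474mm x 296mm
   1680x1050      60.0*+
DP1 disconnected (normal left inverted right x axis y axis)
"""

if __name__ == "__main__":
    found = parse_outputs(SAMPLE)
    print(found)
    # This machine only has VGA and HDMI connectors.
    print(unexpected_outputs(found, ["VGA", "HDMI"]))
```

For the sample above, the outputs flagged as unexpected are DVI1 and DP1, which matches the mismatch described in this comment.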
Comment 24 Jesse Barnes 2009-11-16 18:23:58 UTC
Hm ok, well maybe the child device patchset will help after all...
Comment 25 Alberto González 2009-11-20 06:32:59 UTC
I've tested the child device patchset and it did work correctly in detecting my HDMI output as HDMI plus not detecting a nonexistent DP (I posted about it on the bug report).

However, that didn't change the situation regarding the interrupt storm. I could easily reproduce the problem by playing tuxracer :(
Comment 26 Jesse Barnes 2009-11-20 08:53:00 UTC
Do you know which outputs were detected?  I'm thinking if DP_D wasn't created, we should also disable interrupts from that source rather than enabling all of them...
Comment 27 Alberto González 2009-11-20 19:41:14 UTC
I didn't apply the debug patch in my last test, but from my Xorg.0.log:

(II) intel(0): Integrated Graphics Chipset: Intel(R) G45/G43
(--) intel(0): Chipset: "G45/G43"
(II) intel(0): Output VGA1 has no monitor section
(II) intel(0): Output HDMI1 has no monitor section
(II) intel(0): Output VGA1 disconnected
(II) intel(0): Output HDMI1 connected
(II) intel(0): Using exact sizes for initial modes
(II) intel(0): Output HDMI1 using initial mode 1680x1050

Also xrandr reports only a VGA1 disconnected and a HDMI1 connected.

Should I apply the last debug patch and send the logs?

I was also going to try those two previous patches you posted here with the child device ones applied, since I thought that maybe they didn't work just because HDMI was being detected as DVI.
Comment 28 Alberto González 2009-11-20 20:53:47 UTC
I tested the previous patches with the child device ones and it didn't work either.
Comment 29 Jesse Barnes 2009-12-03 12:50:35 UTC
Created attachment 31712
debug output init

Can you apply this patch and attach the output from when you load with drm.debug=6?  I'm hoping the DP output causing problems is ignored; if so I can fix up the hotplug code to handle that case.
Comment 30 Alberto González 2009-12-04 14:47:38 UTC
This patch doesn't apply on top of 2.6.32 and I can't seem to find anything similar in the source code to apply it manually. What should I do to test it?
Comment 31 Jesse Barnes 2009-12-04 16:33:02 UTC
It should apply to Eric's drm-intel-next branch.
Comment 32 Alberto González 2009-12-04 22:58:41 UTC
I'm trying to get something useful but even assuming I built the kernel correctly with the drm-intel-next branch (at least the patch applied and the kernel does work), when I boot with drm.debug=6 and try to get the dmesg I just get this line repeated all the time:

[   40.881495] [drm:i915_add_request], 2242
[   40.886500] [drm:i915_add_request], 2243
...

I tried to get dmesg without starting X, but again it is flooded by this:

[   32.375745] [drm:i915_driver_irq_handler], hotplug event received, stat 0x38200000
[   32.376571] [drm:i915_driver_irq_handler], hotplug event received, stat 0x30200000

Any idea of how to avoid these messages flooding the log so I can get it from the start?
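
The two `stat` values in the log lines above can be compared bit by bit. The sketch below only lists which bit positions are set and which differ; it deliberately does not name the ports behind those bits, because the register layout is not quoted anywhere in this report.

```python
def set_bits(value):
    """Return the positions of all set bits in a 32-bit value, low to high."""
    return [bit for bit in range(32) if value & (1 << bit)]


FIRST = 0x38200000   # "stat" from the first log line
SECOND = 0x30200000  # "stat" from the second log line

if __name__ == "__main__":
    print("first: ", set_bits(FIRST))
    print("second:", set_bits(SECOND))
    print("differ:", set_bits(FIRST ^ SECOND))
```

The first value has bits 21, 27, 28 and 29 set, the second has bits 21, 28 and 29, so the two readings differ only in bit 27.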
Comment 33 Jesse Barnes 2009-12-04 22:59:58 UTC
You could try drm.debug=4 instead, I think that'll dump fewer messages.
Comment 34 Alberto González 2009-12-04 23:12:31 UTC
Created attachment 31761
DRM debug log

Yes, that worked. Here is the log with drm.debug=4.
Comment 35 Jesse Barnes 2009-12-10 14:37:40 UTC
Created attachment 31953
enable hotplug only for detected outputs

I didn't include all the output I wanted, but I'm hoping this is what you were running into.

This patch only enables hotplug detection for outputs we actually initialize, so should minimize the chance of getting interrupts for outputs that don't exist.  I also found a note about DP_D in some recent that I'll check out, it could also be what you're hitting.
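
The approach described here can be modelled roughly as follows. This is a sketch of the idea, not the patch itself: the output names and enable bits are hypothetical, and the real driver sets bits in a hardware register as each output is initialized.

```python
# Hypothetical enable bits, for illustration only; the real values live in
# the driver's register definitions and are not quoted in this report.
HOTPLUG_ENABLE_BITS = {
    "VGA": 1 << 0,
    "HDMI_B": 1 << 1,
    "HDMI_C": 1 << 2,
    "DP_D": 1 << 3,
}

ALL_OUTPUTS_MASK = sum(HOTPLUG_ENABLE_BITS.values())


def hotplug_mask(initialized_outputs):
    """Build an enable mask covering only outputs that were initialized."""
    mask = 0
    for name in initialized_outputs:
        mask |= HOTPLUG_ENABLE_BITS[name]
    return mask


if __name__ == "__main__":
    # Behaviour mentioned in comment 26: every source enabled.
    print(bin(ALL_OUTPUTS_MASK))
    # Behaviour described here: only the outputs found during init.
    print(bin(hotplug_mask(["VGA", "HDMI_B"])))
```

With only VGA and one HDMI port initialized, the DP_D bit stays clear in the mask, so a nonexistent DP output cannot raise hotplug interrupts in this model.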
Comment 36 Alberto González 2009-12-10 16:18:39 UTC
This one looks REALLY good. I've been trying for an hour to reproduce the problem by all means and have been unable to. The problem is 95% reproducible within 5-10 minutes, so I'm almost certain that this patch fixed it. Thanks! :)

Anyway I'll keep testing tomorrow (it's late here) and report back with a 100% definitive answer.
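
The confidence expressed in this comment can be put into rough numbers. The estimate below reads "95% reproducible within 5-10 minutes" as a 0.95 chance of triggering the bug in each 10-minute window on an unfixed kernel, and treats the windows as independent. Both are simplifying assumptions, not measurements from this report.

```python
def chance_of_no_repro(p_per_window, windows):
    """Probability that an unfixed kernel survives every test window."""
    return (1.0 - p_per_window) ** windows


if __name__ == "__main__":
    # One hour of testing is six 10-minute windows.
    print(chance_of_no_repro(0.95, 6))
```

Under those assumptions the chance of an hour without the bug on an unfixed kernel is 0.05 to the sixth power, on the order of one in tens of millions, which is why an hour of clean testing is already strong evidence.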
Comment 37 Alberto González 2009-12-11 01:55:10 UTC
Ok, so I've built the same kernel from drm-intel-next without the patch and there I can easily reproduce the problem by simply running glxgears. On the patched kernel there is no way to reproduce it, so now I'm certain that this patch fixes the problem here.

Thank you for all your effort into solving this issue!

Side note: In case this patch is a candidate for being backported, I wonder if it depends on the other patches that make my outputs get detected correctly. Up to (and including) 2.6.32, 3 outputs are detected here: VGA, DVI and DP, but I just have VGA and HDMI outputs. In drm-intel-next (therefore 2.6.33, I assume), the outputs are detected correctly as VGA and HDMI. Just in case it matters.

If you'd like me to test any backport or if you want me to send any logs from the patched latest kernel, please let me know.
Comment 38 Jesse Barnes 2009-12-11 11:03:56 UTC
Thanks a lot for testing and confirming.

Yeah, it does depend on correct output detection, which is only present in git (so it'll land in 2.6.33).  I'll post it for review now.
Comment 39 Jesse Barnes 2009-12-28 10:43:13 UTC
commit	b01f2c3a4a37d09a47ad73ccbb46d554d21cfeb0

drm/i915: only enable hotplug for detected outputs

Fix on its way upstream.
Comment 40 Alberto González 2010-02-27 07:59:03 UTC
I have just upgraded to 2.6.33 hoping to finally leave this bug behind, but found that it's still there. However, it's probably just because the outputs are not correctly detected.

This computer only has 2 outputs: VGA (not used) and HDMI (used). But this is what "xrandr -q" says:

VGA1 disconnected (normal left inverted right x axis y axis)
HDMI1 connected 1680x1050+0+0 (normal left inverted right x axis y axis) 474mm x 296mm
   1680x1050      60.0*+
   1280x1024      75.0     60.0
   1152x864       75.0
   1024x768       75.1     60.0
   800x600        75.0     60.3
   640x480        75.0     60.0
   720x400        70.1
DP1 disconnected (normal left inverted right x axis y axis)

And it's that detected (and therefore initialized) DP output which causes the trouble (or so is my understanding).

At some point detection worked well with the drm-intel-next branch (detecting only the 2 existing ones), but with 2.6.33 it again detects a non-existent DP.

Any ideas? Should I open a new report for this thing?
Comment 41 Jesse Barnes 2010-02-27 08:51:27 UTC
Yeah, please open a new one.  Would be especially good if you could bisect where things went bad.
Comment 42 Alberto González 2010-02-28 16:15:37 UTC
I guess no need to bisect. I found that the child device patches were reverted by another commit (6207937d4feea000913e8ca23fe20c7744be7847) because they caused trouble for other people. I posted on the relevant report (bug #22785) so I hope that Zhao Yakui can look into another solution.
