Bug 28546

Summary: [965GM] crash at login unless BIOS setup screen is entered
Product: DRI
Reporter: Frank Polte <akasaka030>
Component: DRM/Intel
Assignee: Chris Wilson <chris>
Status: CLOSED WONTFIX
QA Contact:
Severity: blocker
Priority: medium
CC: akasaka030, chris, kenyon
Version: unspecified
Keywords: NEEDINFO
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description | Flags
apport output at working state | none
apport output at not working state | none
screenshot with kompare | none
/intel_reg_dumper_dump_not_working.txt | none
intel_reg_dumper_dump_working.txt | none
the boot screen is working | none
but the login screen is broken | none
after 5 seconds it changes to this | none
normally in a working state the login screen looks like this | none
the newest dmesg file | none
the error_state file | none
all files from the kernel debug folder and the dmesg file packed | none

Description Frank Polte 2010-06-15 02:02:51 UTC
Created attachment 36275 [details]
apport output at working state

On the HP 550, the X server cannot be used unless I first enter the BIOS setup, change nothing, and exit discarding the changes. It may be a BIOS or video BIOS bug, but the problem is the same on every Linux distribution. I'm using Ubuntu Lucid Lynx and made two apport files: one in the working state and one in the non-working state.

In the not working file you see the following failure message starting at line 1677:
++++++++++++++++++++++++++++++++++++++++++++++++++++++
../../intel/intel_bufmgr_gem.c:983: Error setting domain 613: Input/output error
 (WW) intel(0): i830_uxa_prepare_access: bo map failed
 
 Fatal server error:
 Failed to submit batchbuffer: Input/output error
 
 
 Please consult the The X.Org Foundation support 
 	 at http://wiki.x.org
  for help. 
 Please also check the log file at "/var/log/Xorg.0.log" for additional information.
++++++++++++++++++++++++++++++++++++++++++++++++++++++

Another difference can be seen in my attached screenshot.
Comment 1 Frank Polte 2010-06-15 02:03:51 UTC
Created attachment 36276 [details]
apport output at not working state
Comment 2 Frank Polte 2010-06-15 02:04:41 UTC
Created attachment 36277 [details]
screenshot with kompare
Comment 3 Frank Polte 2010-06-15 02:13:38 UTC
If you need more information, a BIOS or VIDEO BIOS dump, just ask.

This bug is very bad, because it brings the notebook to a non-bootable state.

The inteltool from the coreboot project cannot read any Northbridge data, so I don't know how to read out the state of the graphics adapter registers.

I hope you can help me.

Thank you very much.
Comment 4 Jesse Barnes 2010-06-15 08:40:57 UTC
Ouch, so unless you enter BIOS setup it looks like either your VBT (video bios table) or your GPU isn't set up properly so X can't start.

Does the kernel set modes correctly?  I.e. do you see the kernel boot log at startup time?

To get the current graphics registers, you can use intel_reg_dumper from the intel-gpu-tools package.  Having that from a broken boot might be useful.
Comment 5 Frank Polte 2010-06-15 12:45:53 UTC
(In reply to comment #4)
> Ouch, so unless you enter BIOS setup it looks like either your VBT (video bios
> table) or your GPU isn't set up properly so X can't start.
> 

maybe both? 


> Does the kernel set modes correctly?  I.e. do you see the kernel boot log at
> startup time?

I think so. I see everything, even the Ubuntu splash screen and the blinking dots.
Everything is fine, and even the desktop background appears. But when it starts to create windows, it produces garbled output and then crashes.

> 
> To get the current graphics registers, you can use intel_reg_dumper from the
> intel-gpu-tools package.  Having that from a broken boot might be useful.

I will try this as soon as possible.

Thank you very much for the quick answer.
Comment 6 Frank Polte 2010-06-15 13:01:47 UTC
Created attachment 36304 [details]
/intel_reg_dumper_dump_not_working.txt
Comment 7 Frank Polte 2010-06-15 13:02:13 UTC
Created attachment 36305 [details]
intel_reg_dumper_dump_working.txt
Comment 8 Frank Polte 2010-06-15 13:06:38 UTC
I just attached the stdout output of intel_reg_dumper in the working and in the non-working state. The two dumps differ, but I don't know enough to interpret the differences.

I saw that there is an intel_reg_read and an intel_reg_write in that package. Could I read out the working state and write it into the GPU before starting the X server?

Maybe this could be a quick workaround.

I hope this bug can be closed as quickly as possible, because the new Intel driver is so fast and the notebook is so nice...

Thank you very much one more time.
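
To narrow down which registers differ between the two intel_reg_dumper attachments, the dumps can be compared mechanically. This is a minimal sketch, assuming each dump line has the shape `NAME: 0xVALUE (decoded text)`; that line shape and the helper names `parse_dump` and `diff_dumps` are assumptions for illustration, not taken from the attachments or from intel-gpu-tools.

```python
import re

# Assumed line shape: "  REGNAME: 0x0000abcd (optional decoded text)"
REG_LINE = re.compile(r"^\s*(\S+):\s*(0x[0-9a-fA-F]+)")


def parse_dump(text):
    """Map register name -> integer value for every line matching REG_LINE."""
    regs = {}
    for line in text.splitlines():
        match = REG_LINE.match(line)
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs


def diff_dumps(working, broken):
    """Return {name: (working_value, broken_value)} for registers that differ.

    A register missing from one dump is reported with None on that side.
    """
    a, b = parse_dump(working), parse_dump(broken)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Calling `diff_dumps` on the contents of the two text files gives a short list of candidate registers to look at; it says nothing about whether writing those values back with intel_reg_write would be safe.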
Comment 9 Jesse Barnes 2010-07-01 14:00:43 UTC
Ok this sounds different than I was first thinking then.  Can you capture a video or pictures of the failing case?  If the desktop is coming up ok but then it starts failing once GNOME apps start up, it could be a problem related to the gnome display manager...
Comment 10 Jesse Barnes 2010-07-01 14:01:53 UTC
Also, since it sounds like you're using ubuntu, please try the xorg edgers repo packages, including the kernel.  That should help us narrow down this problem.
Comment 11 Frank Polte 2010-07-01 15:26:50 UTC
(In reply to comment #9)
> Ok this sounds different than I was first thinking then.  Can you capture a
> video or pictures of the failing case?  If the desktop is coming up ok but then
> it starts failing once GNOME apps start up, it could be a problem related to
> the gnome display manager...

The same happens with KDE, so I think it is the Xserver.

I will take a video and upload it to youtube ok?
Comment 12 Frank Polte 2010-07-01 15:28:17 UTC
(In reply to comment #10)
> Also, since it sounds like you're using ubuntu, please try the xorg edgers repo
> packages, including the kernel.  That should help us narrow down this problem.

Please explain more exactly, what I should install and which logs I should upload.

Thanks for the help. :)
Comment 13 Jesse Barnes 2010-07-01 15:34:15 UTC
Yeah a youtube upload is fine.  For xorg edgers you'll have to look through the ubuntu websites, I think there's a launchpad project for it.
Comment 14 Jesse Barnes 2010-07-01 15:35:57 UTC
Oh also I don't know what format
Comment 15 Jesse Barnes 2010-07-01 15:36:14 UTC
Err nevermind that last comment.
Comment 16 Chris Wilson 2010-07-02 11:50:40 UTC
(In reply to comment #12)
> (In reply to comment #10)
> > Also, since it sounds like you're using ubuntu, please try the xorg edgers repo
> > packages, including the kernel.  That should help us narrow down this problem.
> 
> Please explain more exactly, what I should install and which logs I should
> upload.

To install the testing packages for Ubuntu,

  add-apt-repository ppa:xorg-edgers
  apt-get update
  apt-get upgrade

To update the kernel, you'll have to specify that by hand. However, if it is booting fine and dies when using X, a good first step will be just updating the user space components (as above).
Comment 17 Frank Polte 2010-07-03 03:46:00 UTC
Created attachment 36710 [details]
the boot screen is working
Comment 18 Frank Polte 2010-07-03 03:47:15 UTC
Created attachment 36711 [details]
but the login screen is broken
Comment 19 Frank Polte 2010-07-03 03:48:01 UTC
Created attachment 36712 [details]
after 5 seconds it changes to this
Comment 20 Frank Polte 2010-07-03 03:49:47 UTC
Created attachment 36713 [details]
normally in a working state the login screen looks like this
Comment 21 Frank Polte 2010-07-03 07:01:37 UTC
Now I installed all the stuff from the ppa, but the problem is still the same.
This is the link to the video. The quality is bad, but you can see the problem anyway.

http://www.youtube.com/watch?v=-APAXPetxeg
Comment 22 Frank Polte 2010-07-04 06:25:11 UTC
With your xorg-edgers packages installed, I cannot create an apport-bug-report with ubuntu-bug, because it is not an "ubuntu package". What log files do you need?
Comment 23 Chris Wilson 2010-07-04 07:02:50 UTC
Does the kernel still detect a GPU hang? [Check dmesg.] If so, then the /sys/kernel/debug/dri/0/i915_error_state is the most interesting file, provided you have also installed a post-2.6.34 kernel.

That failure mode doesn't immediately ring any alarm bells. Though I think it might be partially rendered as a result of disabling acceleration following a GPU hang.

Hmm, the only other point of interest seems to be that FBC differs between the working and non-working configs.
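
The two checks described above (look for a hang in dmesg, then grab the error state) can be sketched as follows. This is only an illustration: the exact wording of the kernel's hang message is not given in this report, so the sketch simply matches "hang" or "hung" case-insensitively, and the function names are hypothetical. The debugfs path is the one named above; reading it normally requires root and a mounted debugfs.

```python
from pathlib import Path

# Path quoted in the comment above; needs root and a mounted debugfs.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def find_hang_lines(dmesg_text):
    """Return the dmesg lines that mention 'hang' or 'hung' (case-insensitive)."""
    hits = []
    for line in dmesg_text.splitlines():
        lowered = line.lower()
        if "hang" in lowered or "hung" in lowered:
            hits.append(line)
    return hits


def read_error_state(path=ERROR_STATE):
    """Return the error state as text, or None if the file cannot be read."""
    try:
        return Path(path).read_text(errors="replace")
    except OSError:
        return None
```

Feeding the saved output of `dmesg` into `find_hang_lines` shows whether a hang was logged at all; if it was, the text returned by `read_error_state` is what should be attached.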
Comment 24 Frank Polte 2010-07-04 13:26:51 UTC
There is an error_state file; I will upload it, along with the dmesg file.

Is this enough? Feel free to ask for any further information you need to solve this bug. ;)
Comment 25 Frank Polte 2010-07-04 13:28:03 UTC
Created attachment 36751 [details]
the newest dmesg file
Comment 26 Frank Polte 2010-07-04 13:29:44 UTC
Created attachment 36752 [details]
the error_state file
Comment 27 Frank Polte 2010-07-04 13:30:46 UTC
Created attachment 36753 [details]
all files from the kernel debug folder and the dmesg file packed
Comment 28 Frank Polte 2010-07-04 23:26:19 UTC
In the dmesg file I found the following lines; maybe they give a hint:

...
[    1.488375] agpgart-intel 0000:00:00.0: Intel 965GME/GLE Chipset
[    1.488817] agpgart-intel 0000:00:00.0: detected 7676K stolen memory
[    1.493287] agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xd0000000
...
[    1.512236] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    1.512242] i915 0000:00:02.0: setting latency timer to 64
[    1.528894]   alloc irq_desc for 44 on node -1
[    1.528898]   alloc kstat_irqs on node -1
[    1.528908] i915 0000:00:02.0: irq 44 for MSI/MSI-X
[    1.528918] [drm] set up 7M of stolen space
[    1.634568] [drm] initialized overlay support
...

What does "stolen memory" mean?
Maybe it is a memory initialization error?

Bye
Frank Polte
Comment 29 Chris Wilson 2010-07-12 14:50:10 UTC
(In reply to comment #28)
> What does "stolen memory" mean?

"Stolen memory" is a region of memory reserved by the BIOS for the sole use of the GPU (i.e. it is not available to the rest of the system). GEM has little use for it, as the emphasis is on dynamic allocation of memory to the GPU, which allows the memory to be returned to the CPU depending upon system usage. A few chipsets do, however, require some stolen memory for pages that must be physically contiguous, framebuffer compression being the major one.

The bug itself, I've stalled upon. I don't have enough information within the error dump to check the programs and samplers being executed. When I get some time I will work upon improving the debugging facilities for i965+. In the meantime, if you can think of anything that might differ on your system between the two conditions, or find another way to reproduce the same bug, that may grant enough insight in order to determine the cause.
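
As a small worked example for the stolen memory figures in the dmesg excerpt from comment #28: agpgart-intel reports 7676K and the drm line reports 7M. The two agree if the second figure is the first expressed in whole MiB, rounded down; that rounding is an assumption here, not something stated in the log. The helper names below are hypothetical.

```python
import re

# Matches the agpgart-intel line quoted in comment #28.
STOLEN = re.compile(r"detected (\d+)K stolen memory")


def stolen_kib(dmesg_text):
    """Return the stolen memory size in KiB from dmesg text, or None if absent."""
    match = STOLEN.search(dmesg_text)
    return int(match.group(1)) if match else None


def whole_mib(kib):
    """Convert KiB to whole MiB, rounding down (assumed to match the drm line)."""
    return kib // 1024
```

With the value from the log, `whole_mib(7676)` evaluates to 7, which is consistent with "set up 7M of stolen space".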
Comment 30 Jesse Barnes 2010-07-15 10:29:26 UTC
Reassigning to Chris so he has another excuse to improve our debug infrastructure even more. ;)
Comment 31 Jesse Barnes 2011-02-22 11:21:02 UTC
Frank, does this issue still occur?
Comment 32 Frank Polte 2011-02-22 14:15:45 UTC
No, I sold the Notebook. So I don't have any problems at all. ;)
