5133 – xorg-x11 - radeon driver crashes at least on x86_64 architecture

Status:	RESOLVED DUPLICATE of bug 4859

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Driver/Radeon
Version:	6.99.99.902 (7.0 RC2)
Hardware:	x86 (IA32) Linux (All)

Importance:	high normal
Assignee:	Xorg Project Team
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	1690

Reported:	2005-11-23 18:33 UTC by Michal Jaegermann
Modified:	2005-11-30 08:45 UTC
CC List:	1 user

See Also:

Attachments
a sample log file from an attempt to start X server with radeon driver (27.14 KB, text/plain) 2005-11-23 18:36 UTC, Michal Jaegermann
a configuration file used (3.11 KB, text/plain) 2005-11-23 18:40 UTC, Michal Jaegermann
X server log after NoDDC option was added (47.15 KB, text/plain) 2005-11-24 07:23 UTC, Michal Jaegermann
log from a working setup with Version 6.8.2 (43.89 KB, text/plain) 2005-11-24 07:24 UTC, Michal Jaegermann
for completeness - log from a server starting with "NoDDC" and no dri loaded (45.78 KB, text/plain) 2005-11-30 05:24 UTC, Michal Jaegermann

Description Michal Jaegermann 2005-11-23 18:33:34 UTC

After switching from 6.8.2 to 6.99.99.902 on an x86_64 machine, every attempt to
start the X server ends with:

Fatal server error:
Caught signal 11.  Server aborting

and I am reduced to the vesa driver, which still works.  This is a regression
from the 6.8.2 radeon driver, which worked on the same hardware without issues.
Comparing old logs from the working setup with the current logs, the crash seems
to happen at the point where monitor detection should take place.

A full sample log and a config file (generated mostly by system-config-display
from Fedora rawhide) are attached.

This bug was first reported as
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173439

Comment 1 Michal Jaegermann 2005-11-23 18:36:32 UTC

Created attachment 3874
a sample log file from an attempt to start X server with radeon driver

Comment 2 Michal Jaegermann 2005-11-23 18:40:14 UTC

Created attachment 3875
a configuration file used

I do not know what "(Secondary)" in 'BoardName' really means.  The file was
generated.  One would think that this should not matter.

Comment 3 Michel Dänzer 2005-11-23 19:28:51 UTC

Does Option "NoDDC" in the radeon Device section work around the problem?

It would be great if you could run the server inside gdb and get a backtrace.
Beware that you can't do this from the same machine.

Comment 4 Michal Jaegermann 2005-11-24 07:21:05 UTC

> Does Option "NoDDC" in the radeon Device section work around the problem?

It definitely changes what happens although results are still problematic.
Visually the whole screen goes blank, a keyboard is dead and after a login
from remote I have an unkillable X server process pegged around 99% CPU.
The only way to restore video and sanity is 'shutdown -r now'.

Below I attach a sample log from a run with the NoDDC option and also, for
comparison, a log from a working setup with 6.8.2.  They do not seem to be
drastically different, but the results are not the same.

OTOH when NoDDC allows me to get far enough I see:

(WW) RADEON: No matching Device section for instance (BusID PCI:1:0:1) found

and that happens even after I added an explicit BusID identifier to a Device
section (either PCI:1:0:1 or PCI:1:0:0), and even with another Device section for
"Videocard1" and a matching Screen section.

The layout of the video devices on the PCI bus is:

-[0000:00]-+-00.0  VIA Technologies, Inc. VT8385 [K8T800 AGP] Host Bridge
           +-01.0-[0000:01]--+-00.0  ATI Technologies Inc R300 AD [Radeon 9500 Pro]
           |                 \-00.1  ATI Technologies Inc R300 AD [Radeon 9500 Pro] (Secondary)

(plus a number of other devices).  The card does have a digital output, but
the only monitor I have hooked up is a small analog one.

> It would be great if you could run the server inside gdb and get a backtrace.
> Beware that you can't do this from the same machine.

I am not really sure how to do something of that sort from another machine.
Does it have to be x86_64 too?

Comment 5 Michal Jaegermann 2005-11-24 07:23:27 UTC

Created attachment 3889
X server log after NoDDC option was added

Comment 6 Michal Jaegermann 2005-11-24 07:24:56 UTC

Created attachment 3890
log from a working setup with Version 6.8.2

Comment 7 Michel Dänzer 2005-11-29 22:03:11 UTC

(In reply to comment #4)
> > Does Option "NoDDC" in the radeon Device section work around the problem?
> 
> It definitely changes what happens although results are still problematic.
> Visually the whole screen goes blank, a keyboard is dead and after a login
> from remote I have an unkillable X server process pegged around 99% CPU.

Could be DRI related, try disabling it.


> OTOH when NoDDC allows me to get far enough I see:
> 
> (WW) RADEON: No matching Device section for instance (BusID PCI:1:0:1) found

You can ignore this, the secondary function is just there for multihead to work
properly with some versions of Windows.


> > It would be great if you could run the server inside gdb and get a backtrace.
> > Beware that you can't do this from the same machine.
> 
> I am not really sure how to do something of that sort from another machine.

The usual way is to ssh in.

> Does it have to be x86_64 too?

No, it doesn't matter what kind of machine it is.

Comment 8 Michal Jaegermann 2005-11-30 05:15:23 UTC

>> I am not really sure how to do something of that sort from another machine.

> The usual way is to ssh in.

Ah, a misunderstanding.  gdb is not run from another machine; it simply must
not be run from a console login when trying to start X.

OK, here is what gdb has to say when "NoDDC" is not in use:

Program received signal SIGSEGV, Segmentation fault.
xf86DoEDID_DDC2 (scrnIndex=0, pBus=0x7fef70) at xf86DDC.c:221
221	    VDIF_Block = 
(gdb) l
216	#ifdef DEBUG
217	    if (!tmp)
218		ErrorF("Cannot interpret EDID block\n");
219	    ErrorF("Sections to follow: %i\n",tmp->no_sections);
220	#endif
221	    VDIF_Block = 
222		VDIFRead(scrnIndex, pBus, EDID1_LEN * (tmp->no_sections + 1));    
223	    tmp->vdif = xf86InterpretVdif(VDIF_Block);
224	
225	    return tmp;
(gdb) bt
#0  xf86DoEDID_DDC2 (scrnIndex=0, pBus=0x7fef70) at xf86DDC.c:221
#1  0x00002aaaab8b409c in RADEONDisplayDDCConnected (pScrn=0x7fc350, 
    DDCType=DDC_VGA, port=0x7f5ff0) at radeon_driver.c:1029
#2  0x00002aaaab8b540f in RADEONQueryConnectedMonitors (pScrn=0x7fc350)
    at radeon_driver.c:2062
#3  0x00002aaaab8c038c in RADEONPreInit (pScrn=0x7fc350, flags=Variable "flags" is not available.
)
    at radeon_driver.c:4871
#4  0x000000000045fce1 in InitOutput (pScreenInfo=0x6cad40, argc=1, 
    argv=0x7fffff8cb5a8) at xf86Init.c:612
#5  0x0000000000432a88 in main (argc=1, argv=0x7fffff8cb5a8, envp=Variable "envp" is not available.
)
    at main.c:372


Sure enough, the line in question dereferences 'tmp', which happens to be NULL,
while the code clearly does not expect that.  This happens on the fourth call
of xf86DoEDID_DDC2, where 'EDID_block' prints as (unsigned char *) 0x7fb050 ""
and 'xf86InterpretEDID()' then indeed returns NULL.  On the three preceding
calls 'EDID_block' is consistently NULL, so the whole function immediately
returns NULL as well.
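
To illustrate the kind of check that seems to be missing, here is a minimal,
self-contained sketch in C.  It is not the actual X server code:
'monitor_sketch', 'interpret_edid_sketch()' and 'do_edid_sketch()' are
hypothetical stand-ins for the monitor structure, 'xf86InterpretEDID()' and
'xf86DoEDID_DDC2()', and the stub only mimics what gdb showed (an empty block
yields NULL).  The VDIF read is left out.  The one point it shows is returning
early when the interpreter gives back NULL, before 'tmp->no_sections' is read.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parsed-monitor structure. */
typedef struct {
    int   no_sections;  /* number of extension sections */
    void *vdif;         /* would hold the interpreted VDIF data */
} monitor_sketch;

/*
 * Hypothetical stand-in for xf86InterpretEDID().  Assumption: an empty
 * block cannot be interpreted, so NULL comes back, as seen in gdb.
 */
static monitor_sketch *interpret_edid_sketch(const unsigned char *block)
{
    if (block[0] == '\0')
        return NULL;
    /* calloc zeroes the structure: no_sections = 0, vdif = NULL */
    return calloc(1, sizeof(monitor_sketch));
}

/* Hypothetical stand-in for xf86DoEDID_DDC2() with the extra guard. */
static monitor_sketch *do_edid_sketch(const unsigned char *block)
{
    monitor_sketch *tmp;

    /* Same early exit the three preceding calls already take. */
    if (block == NULL)
        return NULL;

    tmp = interpret_edid_sketch(block);
    if (tmp == NULL) {
        /* The guard: bail out before tmp->no_sections is touched. */
        fprintf(stderr, "Cannot interpret EDID block\n");
        return NULL;
    }

    /* From here on it is safe to use tmp; the VDIF read would go here. */
    tmp->vdif = NULL;
    return tmp;
}
```

With such a guard, a call like do_edid_sketch((const unsigned char *)"")
returns NULL instead of crashing, the same result the calls with a NULL
'EDID_block' already produce.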

> Could be DRI related, try disabling it.

Good guess.  If I add "NoDDC" and comment out the

       Load  "dri"

line in xorg.conf, then I can start the server using the radeon driver.

Comment 9 Michal Jaegermann 2005-11-30 05:24:03 UTC

Created attachment 3944
for completeness - log from a server starting with "NoDDC" and no dri loaded

Should I open a new bug about DRI, or is that not needed?

Comment 10 Adam Jackson 2005-11-30 16:06:10 UTC

dupe, already fixed.

*** This bug has been marked as a duplicate of 4859 ***

Comment 11 Michal Jaegermann 2005-11-30 16:59:21 UTC

If this is already fixed in 6.8.2, then why does it reappear in 6.99.99.902?
If you know that 'xf86InterpretEDID()' is allowed to return NULL, then the fix
is indeed obvious.

What about my question on a DRI ticket?

Comment 12 Michel Dänzer 2005-12-01 03:45:40 UTC

(In reply to comment #11)
> If this is already fixed in 6.8.2, then why does it reappear in 6.99.99.902?

Can you verify that CVS HEAD doesn't have the fix and reopen this bug if it doesn't?

> What about my question on a DRI ticket?

Isn't there one about that already?
