SGI: Development

how to troubleshoot a "bus error" ? - Page 2

A few comments again, since I don't have the opportunity to send an xcircuit window on the o200 from work to home:

- I realized the .Xdefaults file is actually for the Xw version of xcircuit, but nobody uses that since the xcircuit developer mentioned code problems with Xw, i.e. it is shite.
- With Phase Change you mean the library display/non-display? I've reread the entire post and I think you mean the library display bug being the cause of 3.6.41 not working correctly.

If you now have two versions side-by-side, one without the bug and the other with the bug, do a version diff. That way you can easily check differences. Most likely code changes in canvas refresh or mouse event handling might be the culprit.
:Crimson: :PI: :Indigo: :O2: :Indy: :Indigo2: :Indigo2IMP: :O200: :O2000: :Onyx2:
dexter1 wrote: If you now have two versions side-by-side, one without the bug and the other with the bug, do a version diff. That way you can easily check differences. Most likely code changes in canvas refresh or mouse event handling might be the culprit.

Thanks, Mr Dex. I believe I have the problem semi-isolated now, it's in events.c, but took a few hours off to make it fit the Family ...

Wasn't exactly straightforward, settings are spread over a couple places but not too bad. I'll share after I hunt down the last few recalcitrants.

Maybe someone artistic could do something about those icons ? Drawing is not my strong suit :(

Then we'll get back to the Important Stuff. Can't let vish steal all the glory :P
Okey-dokey, Smokey ... 3.6.40 works, 3.6.41 does not. But importing events.c from 40 into 41, with the addition of one line, does work. Differences are very few :

As marked ... (this is maybe not the best way ever, but ...)

The last area is the one that seems the most suspicious ...

The release notes aren't too much help on this problem but here they are :

XCircuit wrote: posted: July 18, 2006 at 2:40am version: 3.6 revision: 41
2006-07-17 21:06 tim A number of changes: Modified the way XCircuit creates backup files, avoiding spurious timeout errors that occur in Linux. Does not write the backup file multiple times if there is no activity. Added a command "config suspend" that allows scripts to suspend drawing to the screen during read-in of scripted files. This function does not yet cover all cases where drawing occurs, but it catches the major ones. Corrected numerous Tcl command-line commands. Additional options will be written up in the documentation. "object make" now returns a handle to the newly-generated instance ("symbol.tcl" has been modified to no longer attempt to work around the error), and "object make ... -force" will generate a new empty object (previously disallowed), avoiding the necessity of creating some dummy element and then erasing it.


and the newer, might be relevant ? Same subject, the new "suspend" thingy ...

posted: July 19, 2006 at 2:40am version: 3.6 revision: 42
2006-07-18 17:08 tim Added new TCL script "edif.tcl" that handles EDIF 2.0.0 file format reads. Includes some source file bug fixes, such as having the command "parameter make ..." return a TCL_ERROR if an attempt is made to create a duplicate parameter name. Also, more drawing routines removed when in "suspend" mode, which otherwise cause XCircuit to produce a weird Tcl result on an ordinary command (in this case, a "no such variable XCOps(focus)" error was returned from the command "label type"). The flag TCL_NAMESPACE_ONLY was removed from calls to Tcl_Var, which should prevent the above error from occurring even outside of "suspend" mode.
Also: Rewrote the "config suspend" mechanism slightly so that a single call to "config suspend" puts XCircuit in a "temporarily suspended" drawing mode. Any key or button press in the window will return XCircuit to a normal drawing state. This prevents XCircuit from getting hung in suspend mode if, for example, the "read_edif" procedure halts in the middle with an error return value. "config suspend" can be called twice in a row to prevent keystrokes from breaking the suspend state, in case anyone cares.
Here's a question from logic, not programming, but still ...

Code: Select all

/*--------------------------------------------------------------*/
/* Set the name for a new user-defined object and make the   */
/* object.  If "forceempty" is true, we allow creation of a new   */
/* object with no elements (normally would be used only from a   */
/* script, where an object is being constructed automatically).   */
/*--------------------------------------------------------------*/

objinstptr domakeobject(int libnum, char *name, Boolean forceempty)
{
   objectptr *newobj;
   objinstptr *newinst;
   genericptr *ssgen;
   oparamptr ops, newop;
   eparamptr epp, newepp;
   stringpart *sptr;
   XPoint origin;
   short loclibnum = libnum;

   if (libnum == -1) loclibnum = USERLIB - LIBRARY;

   /* make room for new entry in library list */

   xobjs.userlibs[loclibnum].library = (objectptr *)
        realloc(xobjs.userlibs[loclibnum].library,
        (xobjs.userlibs[loclibnum].number + 1) * sizeof(objectptr));

   newobj = xobjs.userlibs[loclibnum].library + xobjs.userlibs[loclibnum].number;

   *newobj = delete_element(areawin->topinstance, areawin->selectlist,
        areawin->selects, NORMAL);

   if (*newobj == NULL) {
      objectptr initobj;

      if (!forceempty) return NULL;

      /* Create a new (empty) object */

      initobj = (objectptr) malloc(sizeof(object));
      initmem(initobj);
      *newobj = initobj;
   }

   invalidate_netlist(topobject);
   xobjs.userlibs[loclibnum].number++;


In the first line after the comment, "forceempty" is declared as a Boolean.

But about seven lines from the bottom, it states "if forceempty returns NULL .."

How can a Boolean return null ? It's either true or false. How can it ever be null ? That's not an option ... when did you stop beating your wife ?

Pretty sure the problem is in this "suspend" thingy. There's only about twenty lines changed in events.c, all involving the new "suspend" function, and events.c is THE file that causes the weird behaviour.
It's not the Boolean which is set to NULL, it's the return value of the function. The function is of type "objinstptr". So if forceempty is false (at this stage in the code), the function returns an objinstptr with a NULL value. Without knowing what type of object "objinstptr" is, and how it's used elsewhere, it's difficult to say what type of impact this may have.
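
To make the distinction concrete, here is a minimal sketch. It is not xcircuit code: `object_t`, `make_object` and the `Boolean` typedef are made-up stand-ins for illustration. The flag is only ever tested for true or false; NULL is what the function hands back to its caller.

```c
#include <assert.h>
#include <stdlib.h>

typedef int Boolean;                      /* stand-in for the X11 Boolean typedef */
typedef struct { int parts; } object_t;   /* hypothetical placeholder type */

/* Returns NULL (a pointer value) when nothing is selected and the caller
 * did not ask for an empty object; otherwise returns a new object.
 * The Boolean itself is never NULL, it only steers which path is taken. */
object_t *make_object(int selected_parts, Boolean forceempty)
{
    object_t *obj;

    assert(selected_parts >= 0);

    if (selected_parts == 0 && !forceempty)
        return NULL;                      /* the function's result is NULL */

    obj = malloc(sizeof *obj);
    if (obj != NULL)
        obj->parts = selected_parts;
    return obj;
}
```

A caller has to check the result before using it, e.g. `if (make_object(0, 0) == NULL)` means "nothing was made", which is the case the `!forceempty` test guards.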
Systems in use:
:Indigo2IMP: - Nitrogen : R10000 195MHz CPU, 384MB RAM, SolidIMPACT Graphics, 36GB 15k HDD & 300GB 10k HDD, 100Mb/s NIC, New/quiet fans, IRIX 6.5.22
:Fuel: - Lithium : R14000 600MHz CPU, 4GB RAM, V10 Graphics, 36GB 15k HDD & 300GB 10k HDD, 1Gb/s NIC, New/quiet fans, IRIX 6.5.30
Other system in storage: :O2: R5000 200MHz, 224MB RAM, 72GB 15k HDD, PSU fan mod, IRIX 6.5.30
Trippynet wrote: It's not the Boolean which is set to NULL, it's the return value of the function.

I will be the first to admit that the "logic" of programming languages often escapes me :P

But still pretty sure it is in events.c. This is the only other change of any significance in that file ....

Code: Select all

   /* copy name into object and check for conflicts */

   strcpy((*newobj)->name, name);
   checkname(*newobj);

   /* generate library instance for this object (bounding box   */
   /* should be default, so don't do calcbbox() on it)      */

   addtoinstlist(loclibnum, *newobj, FALSE);

   /* recompile the user catalog and reset view bounds */

   composelib(loclibnum + LIBRARY);
   centerview(xobjs.libtop[loclibnum + LIBRARY]);

   return *newinst;
}

Which suspiciously generates the library and resets the view bounds ... altho the crash happens exactly when you drop the object but if there's nothing to drop it onto ... ?
Found it (I think) but i need people to test this, since it's late and i have to bike home in the rain.

I've looked at all the data in Hamei's post and the introduction of "suspend" in 3.6.41 struck a nerve a few days ago, because i remembered i got several warnings during compile of the more recent 3.8.78.

I also couldn't build 3.6.4x since it cannot find the function/command "unsetenv" at runtime. Strange, since it did work last week. I have more issues with the 3.6 branch since it wants to include /usr/lib and pollute my linker with o32 libraries :(

Anyway because of the grumpy behavior of my o200 today, i list some "suspend" compiler warnings here:

Code: Select all

cc-1183 c99: WARNING File = events.c, Line = 516
An unsigned integer is being compared to zero.

if (xobjs.suspend >= 0) return;

And many many more. This was from 3.6.41, but 3.8.78 is also littered with those. Interesting to see that exactly the introduction of the suspend field in 3.6.41 in the xobjs struct causes the "phase change".

So what type is xobjs.suspend ? Answer is in xcircuit.h somewhere at the end:

Code: Select all

u_short      new_changes;
char         suspend;        /* suspend graphics updates if TRUE */
short        numlibs;
short        pages;

A char. Erm is it signed or unsigned? After some googling i found in http://unix.derkeiler.com/Newsgroups/co ... /0424.html

Dr. David Kirkby wrote:
> Erik Max Francis wrote:
>
>>"Dr. David Kirkby" wrote:
>>
>>
>>>char foo;
>>>
>>>is asking for trouble, if you hope to put negative numbers in foo.
>>
>>Indeed. The Standard leaves it unspecified whether an unadorned char is
>>signed or unsigned.
>
>
> I've just confirmed that AIX and IRIX declare it unsigned, whereas
> Solaris, Linux, AIX, HP-UX, Tru64, NetBSD, OpenBSD all declare it
> signed. Although I've not had chance to fix the problem yet, I think
> that explains why my program for computing the properties of
> transmission lines
> http://atlc.sourceforge.net/
> works on Solaris, Linux, AIX, HP-UX, Tru64, NetBSD, OpenBSD, but not
> on AIX or IRIX.
>
> At least fixing it should not be too hard - just needs a bit of time.

I don't have an SGI system handy to check the proper flag but both the
MIPSPro C and GNU-C compilers on Irix have a flag to cause all chars not
explicitly declared as unsigned to be signed. I ran into the same
problem when I ported a package from Intel/Linux to Irix.
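
A small sketch of why this bites, assuming (my reading of the warning, not confirmed from the xcircuit source) that a negative value gets stored in the field at some point. The function names are made up for illustration. Signedness is tested by value instead of via CHAR_MIN, because CHAR_MIN may not reflect a compiler flag that changes how plain char behaves.

```c
#include <assert.h>
#include <limits.h>

/* 1 if a plain char can hold negative values with this compiler and
 * these flags, 0 if it behaves as unsigned. */
int plain_char_is_signed(void)
{
    return (char)-1 < 0;
}

/* The same ">= 0" test the compiler warned about, on a plain char. */
int nonneg_as_plain_char(int value)
{
    char c;

    assert(value >= SCHAR_MIN && value <= SCHAR_MAX);
    c = (char)value;          /* -1 becomes 255 where char is unsigned */
    return c >= 0;            /* always true where char is unsigned */
}

/* The same test with the signedness spelled out. */
int nonneg_as_signed_char(int value)
{
    signed char c;

    assert(value >= SCHAR_MIN && value <= SCHAR_MAX);
    c = (signed char)value;
    return c >= 0;
}
```

On a signed-char platform `nonneg_as_plain_char(-1)` is 0; where char is unsigned it is 1, so a test like `if (x >= 0) return;` would always take the early return.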


Wow, a compiler option to make char behave as being signed? Let's see what "man c99" has to say:

Code: Select all

-signed     Causes values of type char to be treated as if they had
            type signed char (which can affect the result of integer
            promotions), but the values of CHAR_MIN and CHAR_MAX are
            not affected.  The default is to treat values of type char
            as if they had type unsigned char.

Thus, does CFLAGS need '-signed' as extra option? Is it that simple?
So if i set that and recompile xcircuit 3.8.78, will i see the library again?



Affirmative. I also see the grid :)
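
For what it's worth, a source-level alternative to the compiler flag would be to spell out the signedness in the declaration. This is only a sketch and not the xcircuit author's fix: `globals_sketch` and `drawing_enabled` are hypothetical names, the struct copies just the four fields shown above, and the "negative means not suspended" convention is my assumption from the `>= 0` test.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the struct in xcircuit.h (only the fields quoted). */
typedef struct {
    unsigned short new_changes;
    signed char    suspend;     /* explicitly signed, so -1 stays -1 everywhere */
    short          numlibs;
    short          pages;
} globals_sketch;

/* Assumed convention: a negative value means "not suspended". */
int drawing_enabled(const globals_sketch *g)
{
    assert(g != NULL);
    return g->suspend < 0;
}
```

With `signed char` the comparison behaves the same on IRIX, Linux and the rest, with or without `-signed` in CFLAGS.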
dexter1 wrote: Found it (I think) but i need people to test this, since it's late and i have to bike home in the rain.

Jeeze, dexter. I'm speechless. Thank you.

I'll try this out in a few minutes.

Also, I had this thought but didn't want to be a distraction, but those compiler warnings should be good clues. I also noticed dozens of the things flashing past and wondered how to capture them ? I've never used < tail > but that should work ? Or is there an easier way ?
hamei wrote: Or is there an easier way ?


I'd just redirect the compiler output to a text file, then you can search through it later for whatever you need, eg.

Code: Select all

cc -v > output.txt 2>&1
directs all output of 'cc -v' (including stderr) to 'output.txt'.
Twitter: @neko_no_ko
IRIX Release 4.0.5 IP12 Version 06151813 System V
Copyright 1987-1992 Silicon Graphics, Inc.
All Rights Reserved.
nekonoko wrote: I'd just redirect the compiler output to a text file,

Too easy :P I'll try that next time, thanks ...

Confirmation 1:

Confirmation 2:


Altho I am not overjoyed with the 3.9.40 .... first, it absolutely refuses to find xpm no matter what I do. Second, I had to go into the Makefile and ditch

Code: Select all

cairo.$(OBJEXT) : CFLAGS += -pedantic -Wall -Wextra
elements.$(OBJEXT) : CFLAGS += -pedantic -Wall -Wextra
events.$(OBJEXT) : CFLAGS += -pedantic -Wall -Wextra
fontfile.$(OBJEXT) : CFLAGS += -pedantic -Wall -Wextra
text.$(OBJEXT) : CFLAGS += -pedantic -Wall -Wextra
utf8encodings.$(OBJEXT) : CFLAGS += -pedantic -Wall -Wextra

which makes me wonder what other gcc-isms may have been missed.

But 3.8.78 seems to be fine, it's the 'stable' build anyway, so maybe for the moment ... don't know. It's easy enough to keep both installed, just change the startup script: one is installed in /basedir/lib/xcircuit-3.8.78 and the other in /basedir/lib/xcircuit-3.9.40

One step for man, one giant leap for Mankind :D