=======================================================================
'My Hairiest Bug' War Stories
[APPENDIX B: Condensed data tables]
[early extended version of paper which appeared in
 Comm. ACM, 40 (4) April 1997]

Marc Eisenstadt
Knowledge Media Institute
The Open University
Milton Keynes MK7 6AA UK
M.Eisenstadt@open.ac.uk
=======================================================================
ASCII documents available from ftp://kmi-ftp.open.ac.uk/pub/bugtales:
  bugtales.1o4  (Main part 1: Abstract, intro, data analysis)
  bugtales.2o4  (Main part 2: Relating the dimensions, legacy, ref's)
  bugtales.3o4  (Appendix A: Selected anecdotes - see also bugdata.txt)
=>bugtales.4o4  (Appendix B: Condensed data tables)
  bugdata.txt   (ASCII raw data from 1st trawl)

PURE ASCII VERSION FOR EMAIL/BBOARD POSTINGS
Be sure to print in monospace font (e.g. Courier 10pt., 6 3/4" margin).
All of the above documents are available via FTP from
kmi-ftp.open.ac.uk (login: anonymous, password: , directory
/pub/bugtales) or by email from M.Eisenstadt@open.ac.uk.
=======================================================================

APPENDIX B: THE CONDENSED DATA

Table B-1. The condensed data, showing only those 36 entries for which
every field could be filled. Entries in the left-hand column are coded
to preserve anonymity. ID "B1a" means BIX informant number 1 supplying
the first of several anecdotes from that informant. ID labels U1-U37
refer to Usenet informants, and A1-A8 refer to AppleLink informants.
Entries in the rightmost column include labels such as {L} and {T} to
show the most plausible mapping to the categories used by Knuth (1983).
Knuth's category labels are: A=algorithm awry; B=blunder; D=data
structure debacle; F=forgotten function; L=language liability;
M=module mismatch; S=surprise; T=typo. The other category labels used
in the table cells are discussed in the body of the paper.
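
For readers who want to process the table mechanically, the coding
scheme just described can be sketched in a few lines of Python. This
is only an illustration: the function names decode_id and decode_knuth
are hypothetical, the ID pattern is inferred from the examples given
above (B1a, U1-U37, A1-A8), and an uncertainty marker such as the "?"
in {D?/S/L} is simply dropped.

```python
import re

# Knuth's category labels, as listed in the table caption.
KNUTH_CATEGORIES = {
    "A": "algorithm awry",
    "B": "blunder",
    "D": "data structure debacle",
    "F": "forgotten function",
    "L": "language liability",
    "M": "module mismatch",
    "S": "surprise",
    "T": "typo",
}

# Informant sources, keyed by the first letter of the ID.
SOURCES = {"B": "BIX", "U": "Usenet", "A": "AppleLink"}


def decode_id(entry_id):
    """Split an ID such as 'B1a' into (source, informant number, suffix).

    The suffix letter distinguishes several anecdotes from the same
    informant; it is None when the informant supplied only one.
    """
    match = re.fullmatch(r"([BUA])(\d+)([a-z]?)", entry_id)
    if match is None:
        raise ValueError("unrecognised ID: %r" % entry_id)
    source, number, suffix = match.groups()
    return SOURCES[source], int(number), suffix or None


def decode_knuth(cell):
    """Expand a 'Knuth sez' cell such as '{A/F}' into category names.

    Assumption: every capital letter in the cell is a category label;
    separators and '?' markers are ignored.
    """
    return [KNUTH_CATEGORIES[letter] for letter in re.findall(r"[A-Z]", cell)]


if __name__ == "__main__":
    print(decode_id("B1a"))
    print(decode_knuth("{D?/S/L}"))
```

Entries with no "Knuth sez" field are left without a mapping; the
sketch does not try to assign one.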
----------------------------------------------------------------------
ID:         B1a
Context:    New commercial software about to be shipped; Quality
            Assurance found crash
Symptom:    Should be a call to OS at specific address, but it's
            missing
Why hard:   mis-directed blame (compiler)
How found:  inspeculation: hand-replicate compiled code; inspection of
            source; call in expert
Root cause: init: undeclared variable 'temp' clashes w. keyword 'temp'
Knuth sez:  {L}
----------------------------------------------------------------------
ID:         B2
Context:    Punched card COBOL programming
Symptom:    executed an 'unreachable' line!
Why hard:   WYSIPIG (What You See Is Probably Illusory, Guv'nor)
How found:  inspeculation: visual inspection (with 15-inch steel
            ruler)
Root cause: lex: '.' was in col 72, hence regarded as a comment!
Knuth sez:  {T}
----------------------------------------------------------------------
ID:         B1b
Context:    IBM Series/1 programming
Symptom:    Console prints "IEW1234 IMMINENT SYSTEM FAILURE"
Why hard:   faulty assumption (of cooperative programmer... turned out
            to be practical or malicious joke)
How found:  expert recognized (after grilling programmer)
Root cause: behav: own program printed this out intentionally, user
            forgot (programmer's behaviour unpredictable... this was a
            practical or malicious joke)
----------------------------------------------------------------------
ID:         B9
Context:    VAX Pascal program for reading/writing file of complex
            records
Symptom:    write OK, but read yields garbage
Why hard:   tools hampered: Heisenbug (bug goes away when debugging
            tools used)
How found:  gather data: step & study, print & peruse
Root cause: init: read parameters should have been declared as VAR
            (i.e.
            pointer rather than value)
Knuth sez:  {L}
----------------------------------------------------------------------
ID:         U1
Context:    15,000 lines of C code; PCs/Unix; does screen writes using
            curses library
Symptom:    Odd chars on screen
Why hard:   tools hampered: Heisenbug
How found:  gather data: wrap & profile
Root cause: mem: free() called multiple times; malloc() buffers
            overrun by \0 at end of string
Knuth sez:  {D}
----------------------------------------------------------------------
ID:         U3
Context:    Fileserver maintenance; C / Ultrix
Symptom:    open file, then try to read it, server claims 'not open'
Why hard:   tools hampered: long run to replicate: multiple flakey
            parts, so tracing/stepping slowed by other failures
How found:  gather data: conditional break & inspect: bkpt on memory
            access (spec. address)
Root cause: mem: array of char maxlength 1024 got overrun, munging
            file pointer structure
Knuth sez:  {D}
----------------------------------------------------------------------
ID:         U6
Context:    Compiler for 8086's running MSDOS
Symptom:    function returned wrong value
Why hard:   faulty model (thought stacks grew down); timing
How found:  gather data: step & study: single-step assembler, observe
            registers
Root cause: mem: address BELOW stack pointer being wiped out by OS
            interrupt handlers; pointer decremented too late in the
            compiled code
----------------------------------------------------------------------
ID:         U9
Context:    set covering code in Fortran (spaghetti)
Symptom:    anomalous test results
Why hard:   spaghetti: other person's code
How found:  gather data: MEM probe: hand-trace & debugger trace, home
            in via "wolf-fence"
Root cause: des.logic: array element was both a status flag & a
            value...
            '0' was ambiguous, and mis-interpreted & therefore
            clobbered
Knuth sez:  {A/F}
----------------------------------------------------------------------
ID:         U10
Context:    PC clone, debugging memory resident ('TSR') programs
Symptom:    crash after 20 minutes, but would not crash when the
            debugger was switched on
Why hard:   tools hampered: a) long run to replicate (w. lotsa
            printout); b) Heisenbug
How found:  inspeculation: 'dedication'/observation
Root cause: mem: bounds overrun; TSR wrote above top of memory into
            program... didn't happen under debugger which occupied
            some of that memory
Knuth sez:  {D}
----------------------------------------------------------------------
ID:         U11
Context:    called foo(1); but in definition of foo(X); assigned X = 2
Symptom:    1 = 2
Why hard:   WYSIPIG semantics
How found:  gather data: print & peruse
Root cause: init: famous FORTRAN prob... redefined 1 to be 2!!!!
Knuth sez:  {L/M}
----------------------------------------------------------------------
ID:         U12
Context:    Artificial Life; 4000 lines of unstructured K&R C code
Symptom:    Crash (segmentation fault) after ~45,000 iterations; 2
            hours
Why hard:   1) spaghetti: other person's code; 2) tools hampered: long
            run to replicate (watchpoints etc.
            slowed down x10); 3) cause/effect chasm
How found:  gather data: wrap & profile (GNU malloc() range-checking);
            trace data flow, print out data structures looking for
            oddball (= a kind of dump & diff)
Root cause: mem: array of shorts (max value 32K) incremented every 1.5
            iterations until > 32K, then this value was used as an
            array index!; bounds checking on array operation would
            have noticed, since 32767+1 -> -32768, ouch negative array
            index
Knuth sez:  {D?/S/L}
----------------------------------------------------------------------
ID:         U14
Context:    IBM kernel development for AIX v3
Symptom:    once every ~20,000 iterations, SIGTRAP killed traced proc
Why hard:   cause/effect chasm: infrequent; tools hampered: long run
            to replicate; Heisenbug
How found:  gather data: step & study (problem went away with new
            compiler)
Root cause: unsolved: it's never been solved (new AIX released, prob
            went away)
----------------------------------------------------------------------
ID:         U15
Context:    Port of large financial planning package from PC to Mac
Symptom:    random wrong answers (only for large models)
Why hard:   tools hampered: long run to replicate
How found:  gather data: print & peruse
Root cause: init: uninitialized variable, on the PC version it is set
            to 0, but on Mac may be set to whatever was in that
            location previously
Knuth sez:  {L/F}
----------------------------------------------------------------------
ID:         U16
Context:    binary i/o package
Symptom:    strings were gibberish
Why hard:   cause/effect chasm: infrequent
How found:  expert recognized cliche & suggested discriminating test
Root cause: lang: compiler (MSC) derived alignment constraints from
            base type rather than full type
Knuth sez:  {L}
----------------------------------------------------------------------
ID:         U17
Context:    cpu-intensive nighttime job doing big citation index
            search
Symptom:    job WITHOUT i/o mysteriously terminated by console
            interrupt
Why hard:   cause/effect chasm: infrequent; dump showed nothing
How found:
            inspeculation: gestation, thinking about logic; realizing
            it wasn't a fluke
Root cause: des.logic: if main acct. idle while bkgnd job has a file
            locked, -> OS kills job (hack to avoid deadlock)
Knuth sez:  {S/A}
----------------------------------------------------------------------
ID:         U18
Context:    VAX-11 FORTRAN code
Symptom:    mysterious behaviour of FORTRAN code
Why hard:   WYSIPIG lex
How found:  expert recognized cliche & suggested discriminating test
Root cause: lex: TAB (1 char) replaced by 8 spaces (8 chars), pushed
            identifier past column 72, so truncated (cf. entry 49)
Knuth sez:  {T}
----------------------------------------------------------------------
ID:         U19
Context:    developing code generator for Ada compiler on PERQ
Symptom:    user complained of crash with stack underflow; other users
            ok
Why hard:   cause/effect chasm: inconsistent, many degrees of freedom
            (HW x compiler x linker x source = 2^^4)
How found:  controlled expts: exhaustively try every combination, only
            happened when compiler was linked on specific machine
Root cause: vendor: hardware fault... after swapping CPU card &
            re-linking compiler, problem vanished
----------------------------------------------------------------------
ID:         U20
Context:    PC clone, editor bug
Symptom:    crash ONLY on 486 executing wrong interrupt number
Why hard:   tools hampered: Heisenbug
How found:  inspeculation: 'inspiration'
Root cause: vendor: int86() stores interrupt, then modifies (ok) BUT
            486 instruction pipeline had ALREADY read the instruction
Knuth sez:  {D/S}
----------------------------------------------------------------------
ID:         U21
Context:    code inherited from others
Symptom:    5 old bugs (new user didn't even know it)
Why hard:   spaghetti
How found:  inspeculation: reformat code & visual inspection
Root cause: des.logic: misc... flaws in logical flow
Knuth sez:  {A}
----------------------------------------------------------------------
ID:         U22a
Context:    Programming an embedded system in PL/M-86
Symptom:    crash..
            process jumped to stack segment of another process
Why hard:   faulty assumption due to 'warning', not 'error' so still
            compiled & linked
How found:  gather data: step & study w. hed debugger
Root cause: init: allocated 1 byte less than needed, e.g.
            char msg[1]={'h', 'i'} should be [2]
Knuth sez:  {D}
----------------------------------------------------------------------
ID:         U22b
Context:    implementing quicksort + print result in C; testing with
            printfs
Symptom:    out of stack space
Why hard:   tools hampered: error clobbered diagnostic tools!!!
How found:  gather data: print & peruse
Root cause: init: own use of 'write' redefined system's 'write'
            without warning, so qsort's output & manual trace's
            printf() recursed endlessly
Knuth sez:  {L}
----------------------------------------------------------------------
ID:         U23
Context:    portable C code with some machine-specific assembler
Symptom:    ran ok EXCEPT on Vax/Ultrix
Why hard:   faulty assumption (thought bug in own code)
How found:  (a) inspeculation: hand simulation; (b) gather data: wrap
            & profile -> dump & diff; step & study
Root cause: vendor: instruction present on older Vaxes only emulated
            on MicroVAX-II, emulation code had a bug in it!
----------------------------------------------------------------------
ID:         U25
Context:    shorthand-to-English translation program
Symptom:    disk system returned wrong sector, but on different
            iterations!
Why hard:   cause/effect chasm: inconsistent; timing-sensitive
            (75 µsec!)
How found:  gather data: wrap & profile; canonicalize (reduce to
            simplest replicable case)
Root cause: des.logic: flip-flop set/reset side effect w. timing
            interaction; read(A) reads A, then read(unknown) continues
            to return A
Knuth sez:  {A;S}
----------------------------------------------------------------------
ID:         U26
Context:    Mac NetHack
Symptom:    misc. bugs
Why hard:   cause/effect chasm: inconsistent
How found:  gather data: 'Heap scramble' (provokes bugs) & 'Mr.
            Bus error' (tailored tease-out)
Root cause: mem: double-indirect references, middle pointer ('handle')
            is owned by Mac OS, trouble if unlocked or invalid handle
            moved
Knuth sez:  {D}
----------------------------------------------------------------------
ID:         U27
Context:    IBM 1401 w. punched cards; 8K core
Symptom:    dud compiler
Why hard:   faulty assumption (mis-directed blame, told 'didn't work')
How found:  inspeculation: book ('anatomy of compiler') + reasoning
            (lo mem + h'ware multiply + multiply SUBR)
Root cause: mem (prog too big): multiply SUBR pushed compiler beyond
            8K... removing punched cards for this SUBR cured problem
            (because this model had hardware multiply)
----------------------------------------------------------------------
ID:         U28a
Context:    Modifying SOS editor under TOPS10
Symptom:    crashed when exiting intra-line alter mode with
Why hard:   cause/effect chasm: intermittent
How found:  gather data: dump & diff
Root cause: des.logic: different instruction for vs, (logic error)
Knuth sez:  {A}
----------------------------------------------------------------------
ID:         U29
Context:    game playing program; asked 'want to continue? (y/n)'
Symptom:    Program worked when user input "y", but only on
            Wednesdays, else always quit!!!
Why hard:   cause/effect chasm
How found:  inspeculation: re-arrange code (didn't help), + ?
Root cause: mem: documentation said 8 bytes needed, but 12 really
            needed, so 6 days a week clobbered mem with blanks, but on
            Wednesday, 'y' luckily matched 9th byte
Knuth sez:  {D}
----------------------------------------------------------------------
ID:         U34
Context:    large office management system
Symptom:    Word Perfect said 'printing', but nothing happened
Why hard:   cause/effect chasm: inconsistent; worked ok on similar
            setup
How found:  1) gather data: wrap & profile; 2) controlled experiments
Root cause: unsolved: never debugged!.. failed precisely with machine
            A & printer B & > 1MB code & not(breakout box) !
Knuth sez:  {S}
----------------------------------------------------------------------
ID:         U35
Context:    Porting graphics code to new DG machine
Symptom:    infinite loop
Why hard:   tools hampered: Heisenbug
How found:  controlled experiment: binary probe; gather data:
            conditional break & inspect
Root cause: vendor: when arctan instruction was on a page boundary, a
            microcode defect caused jump to 0; since a content of 0
            also means 'jump to 0', it resulted in endless loop
----------------------------------------------------------------------
ID:         U36
Context:    TCP/IP network kernel for MS-DOS
Symptom:    Telnet hangs, but only with 1 terminal emulator, and only
            at one slow speed
Why hard:   cause/effect chasm: intermittent; speed-dependent; tools
            hampered: context precluded using debugger
How found:  gather data: print & peruse; step & study
Root cause: des.logic: re-xmit (slow) packet, test 'already?' -> neg
            number; old packet updated where in data stream we were
Knuth sez:  {A}
----------------------------------------------------------------------
ID:         U37
Context:    Porting game 'omega' from Unix to Atari ST
Symptom:    intermittent weirdness
Why hard:   tools hampered: context precluded using debugger
How found:  gather data: wrap & profile
Root cause: mem: program deleted list containing ptrs to other objects
Knuth sez:  {D}
----------------------------------------------------------------------
ID:         A6
Context:    Developing a Mac sound application, requires rapid
            open/close of serial ports
Symptom:    crash after ~3000 iterations
Why hard:   cause/effect chasm: timing problem; intermittent
How found:  gather data: wrap & profile, controlled experiments;
            expert recognized cliche
Root cause: unsolved: device mgr not robust enough to handle rapid
            open/reset/close of serial ports...
            root cause still unknown; used workaround
----------------------------------------------------------------------
ID:         A7
Context:    Ampex: Unix upgraded for real-time stuff
Symptom:    system crash after ~2 hrs
Why hard:   tools hampered: error consumed evidence; long run to
            replicate
How found:  gather data: print & peruse; step & study with hardware
            bus analyzer.
Root cause: vendor: custom-tuned boards -> bad data -> jump to bad
            address
Knuth sez:  {D}
----------------------------------------------------------------------
ID:         A8
Context:    Kids developing Hypercard 2.0 apps
Symptom:    Hypercard card suddenly disappears
Why hard:   WYSIPIG user-action
How found:  inspeculation: lucky observation
Root cause: behav: CMD-Shift-Del kills card
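----------------------------------------------------------------------

The root cause recorded for entry U29 is terse, so here is one
possible reading of it as a small Python simulation. Everything
about the memory layout is an assumption made for illustration: the
table says only that 8 bytes were documented, 12 were really needed,
and that on Wednesday a 'y' matched the 9th byte. The sketch supposes
that a routine wrote a blank-padded, 12-character day name into an
8-byte buffer, and that the user's one-character answer happened to
sit in the byte immediately after that buffer. The names used
(answer_after_overflow, keeps_playing) are hypothetical.

```python
BUFFER_SIZE = 8    # what the documentation said was needed
FIELD_WIDTH = 12   # what the routine really wrote

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def answer_after_overflow(day_name, answer="y"):
    """Return the stored answer after the 12-byte write has happened.

    Assumed layout (not stated in the table): bytes 0-7 are the
    buffer, byte 8 holds the user's answer, bytes 9-11 are unrelated.
    """
    memory = bytearray(b" " * FIELD_WIDTH)
    memory[BUFFER_SIZE] = ord(answer)          # user typed 'y'

    # The routine writes the day name padded with blanks to 12
    # characters, running 4 bytes past the end of the 8-byte buffer.
    field = day_name.ljust(FIELD_WIDTH).encode("ascii")
    memory[0:FIELD_WIDTH] = field

    return chr(memory[BUFFER_SIZE])


def keeps_playing(day_name):
    """True if the program still sees 'y' after the overflow."""
    return answer_after_overflow(day_name) == "y"


if __name__ == "__main__":
    for day in DAYS:
        print(day.ljust(10), "continues" if keeps_playing(day) else "quits")
```

Under these assumptions "Wednesday" is the only day name with nine
letters, so it is the only one whose 9th byte is a 'y' rather than a
padding blank; on every other day the stored answer is overwritten
with a blank and the program quits.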