root / trunk / buildscripts / jslink.pl

Revision 145, 49.6 kB (checked in by alex, 4 years ago)

adding MDA's JavaScript linker. Holy smokes is this thing cool!

1#!/usr/bin/perl -w
2
3=head1 NAME
4
5  jslink.pl - eliminate unused code from a javascript library
6
7=head1 SYNOPSIS
8
9  jslink.pl -pre cat -i myapp.js -l lib.js -o -
10
11Options are:
12
13   -pre cat                  # preprocessor to apply to input files (but not -e).
14   -e 'myfunc(3); foobar;'   # an "anchor" expression.
15   -h index.html             # an "anchor" html file, whose script will be used. unimplemented.
16   -i myapp.js               # an "anchor" script file, which will pull in things from library files. 
17   -l lib.js                 # a library file.
18   -o output.js              # the -l files with unneeded code removed. defaults to '-' (stdout).
19   -debug debug              # default is none (no debugging)
20   -warn functionmatch,instmeth,ambigs,dups                # default is none (no warnings about things that may be acceptable). 'all' is also supported.
21   -dump used,unused,usedby,refs,undefs                    # default is none (no dumping). 'all' is also supported.
22   -print used,filemarker,skipped,sourcelines              # default is 'used,filemarker' (print used code from libraries)
23   -trace symname            # issue debug output every time symname is seen. unimplemented.
24   -tabwidth 4               # set the number of spaces that tabs are interpreted as, if different from default of 8.
25   -nestedassigns 0          # whether to attempt to track assignments of nested function definitions to other symbols.
26
27All output except for -o (from -debug, -warn, or -dump) is sent to STDERR.
28
29=head1 DESCRIPTION
30
31This determines "dead" code based on following the transitive closure of references to
32symbols in one or more "anchor" files.
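
As a rough illustration of what "transitive closure of references" means here, the following minimal sketch walks a hand-written reference graph from one anchor. The graph, the symbol names, and the closure() helper are all hypothetical; the real script builds its graph by parsing the input files.

```perl
use strict;
use warnings;

# Hypothetical reference graph: symbol name => symbols it refers to.
my %refs = (
    'anchor'     => ['lib.start'],
    'lib.start'  => ['lib.helper'],
    'lib.helper' => [],
    'lib.unused' => ['lib.helper'],
);

# Worklist traversal: everything reachable from the anchors counts as used.
sub closure {
    my ($refs, @anchors) = @_;
    my %used;
    my @todo = @anchors;
    while (@todo) {
        my $sym = shift @todo;
        next if $used{$sym}++;                  # already visited
        push @todo, @{ $refs->{$sym} || [] };   # follow outgoing references
    }
    return \%used;
}

my $used = closure(\%refs, 'anchor');
print join(',', sort keys %$used), "\n";   # anchor,lib.helper,lib.start
```

In this sketch 'lib.unused' is never reached from the anchor, so it is the kind of definition that would be dropped from the output.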
33
34It eliminates whole definitions only; it does not do anything even
35approaching full dead code elimination, as might be done within function bodies.
36
37It will eliminate nested functions if they are not used.
38
39It has knowledge of the builtin ECMAScript objects and their method names, as well as many DOM objects and their methods. It assumes that
40any parameter on which a method is called, where the method name matches a builtin method name, is indeed that kind of object. It does not currently
41do any sort of detailed static analysis that might actually prove this.
42
43It also has knowledge of all builtin ECMAScript global functions, and so knows not
44to try to find their definitions. Note that this means that it is up to you to
45determine if you need to provide definitions for missing builtin functions or
46missing builtin object methods (for example, Array.prototype.push for IE 5.0).
47
48
49=head2 Treatment of Top-Level Statements
50
51If it finds any references at all to a symbol in some library file (global data or function),
52it will not only pull in the definitions of those symbols, it will then also include
53*all* of the top-level statements in that file.
54That is because we don't want to try to analyze the necessity of the load-time
55statements; it is all or nothing for them.
56If any of those load-time statements have references to functions in
57the same file, then those functions are pulled in too.
58For this reason, it is better to supply individual smaller javascript files
59(or just have few load-time statements).
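
The all-or-nothing rule can be pictured with a small sketch. It assumes each file has one pseudo-definition for its top-level statements, named with the same 'GLOBAL ' prefix the script uses; the definition list and file names are made up for illustration.

```perl
use strict;
use warnings;

my $GLOBALNAME = 'GLOBAL ';   # same fake-symbol prefix the script uses

# Hypothetical definitions; only 'foo' in a.js has been reached so far.
my @defs = (
    { qualname => "${GLOBALNAME}a.js", filename => 'a.js', used => 0 },
    { qualname => 'foo',               filename => 'a.js', used => 1 },
    { qualname => "${GLOBALNAME}b.js", filename => 'b.js', used => 0 },
    { qualname => 'bar',               filename => 'b.js', used => 0 },
);

# Files that contribute at least one used definition.
my %touched = map { $_->{filename} => 1 } grep { $_->{used} } @defs;

# Mark the top-level pseudo-definition of every touched file as used.
for my $def (@defs) {
    next unless index($def->{qualname}, $GLOBALNAME) == 0;
    $def->{used} = 1 if $touched{ $def->{filename} };
}

my @kept = map { $_->{qualname} } grep { $_->{used} } @defs;
print join('|', @kept), "\n";   # GLOBAL a.js|foo
```

A full implementation would then have to follow the references made by the newly included top-level statements as well, which this sketch leaves out.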
60
61=head1 IMPLEMENTATION
62
63=head2 Unnecessary Limitations
64
65This implementation is based on a crude parser using regular expressions,
66which makes assumptions about indentation in order to identify the beginnings
67and endings of function definitions.
68
69The assumptions about indentation are valid if a pretty-printing
70preprocessor is used. The default preprocessor is just 'cat'.
71(We also have a Rhino-based preprocessor that we hacked together,
72but it is dependent on Rhino patches we have not yet organized.)
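
To make the indentation assumption concrete, here is a minimal sketch of how a nesting level can be derived from leading whitespace. It uses the script's defaults ($PREINDENT of 1, $TABWIDTH of 8); the indent_level() helper itself is hypothetical.

```perl
use strict;
use warnings;

my $PREINDENT = 1;   # spaces per nesting level (the script's default)
my $TABWIDTH  = 8;   # spaces per tab (the script's default)

sub indent_level {
    my ($line) = @_;
    my $tabspaces = ' ' x $TABWIDTH;
    $line =~ s/\t/$tabspaces/g;          # expand tabs to spaces first
    my ($lead) = ($line =~ m/^( *)/);    # leading run of spaces
    return length($lead) / $PREINDENT;
}

print indent_level("function f() {"), "\n";   # 0
print indent_level("  var x = 1;"), "\n";     # 2
print indent_level("\treturn x;"), "\n";      # 8
```

A line at level 0 is treated as global, which is why input that is not consistently indented can confuse the parser.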
73
74Even if the indentation is regular, there are still lots of problems
75with this implementation: not properly skipping literals (String and RegExp),
76not properly parsing all expressions that might be a function call,
77etc.
78
79There are a variety of alternative implementation approaches, any of which
80would be more interesting and probably better, such as:
81
82  - based on a real ECMAScript grammar, or
83  - based on extending some real ECMAScript interpreter (such as SpiderMonkey or
84    Rhino, which do not explicitly use a grammar), or
85  - implemented in ECMAScript itself, such as something based on Narcissus, or
86  - implemented by analyzing the result of a "real" ECMAScript/JScript linker
87    (which would work with a "real" ECMAScript compiler), to see what symbols it pulls in.
88  - perform source translation to some other programming language that has
89    more mature tools.
90
91=head2 Fundamental Limitations
92
93Given the possible uses of eval(), function lookup tables, computed function names,
94dynamically added object methods, redefinition of functions, and so on,
95it is practically impossible to do this job fully automatically and correctly.
96
97These challenges can result in both false negatives and false positives.
98We can for example entirely miss a dependence on some code
99whose entry point is a string that might even come from
100the outside environment.
101
102On the other hand, when applied to an application that has
103a "driver" or "plugin" model, we might end up pulling in all
104available drivers/plugins, just because their entry points show
105up in some global registry table.
106
107We will also tend to pull in code even if the reference
108is protected by a conditional:
109
110   if (typeof someFunc != 'undefined') someFunc()
111
112The intent of the programmer in such code is typically to call
113someFunc() only if the code that defines it has been loaded
114for some other reason (cf. "weak references" in compiled
115languages).
116
117It is not really possible to resolve all of these situations automatically.
118It is necessary to get some guidance from the programmer, such
119as pragmas in the code itself, or by some other external
120configuration.
121
122=head2 Internals
123
124As we parse the input, we make up a list of definition objects, which
125have these keys:
126
127   deftype      # one of: 'ctormeth', 'protometh', 'instmeth', 'globalfunc', 'localfunc', 'assign', 'singleton', 'global'
128   is_global    # whether this is a top-level expression in a file (versus a function definition).
129   actualname   # the string that actually occurred in the source code for the definition, such as 'MyClass.prototype.sort'.
130   justname     # the last identifier name in the dotted name of the definition, such as 'sort'. key for $DEFS_BY_UNQUAL.
131   qualname     # the fully qualified name for the defined symbol, which may be more explicit than appeared in source code. key for $DEFS_BY_QUAL.
132   parentqual   # the value of qualname of the parent definition (in nesting level), if any.
133   protoname    # the name of the prototype class, such as 'MyClass' (if any -- just for deftype 'protometh')
134   aliasto      # the name of another symbol that this is an alias for (as in "Foo.prototype.meth = aliasfunc").
135   level        # the nesting level of the definition
136   filename     # name of the file (or expression) this came from.
137   startline    # zero-based line number this definition starts with.
138   lastline     # zero-based line number of last line of this definition.
139   lines        # array ref of source lines corresponding to startline to lastline.
140   params       # a hash ref with keys which are the parameter names of the defined function.
141   refs         # the references from this definition to other symbols. hash ref from qualnames to [$reftype, $lineno]
142   undefs       # array ref of symbols in refs that are apparently not defined anywhere.
143   usedby       # array ref of other definition objects which refer (directly) to this one. opposite of refs.
144   used         # boolean to indicate whether it is part of the transitive closure (actually it is the loopcount).
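
For orientation, a definition object for a line such as "MyClass.prototype.sort = function(a, b) {" might look roughly like the sketch below. The file name and line number are hypothetical, and the qualname is assumed to equal the name as written; only a subset of the keys is shown.

```perl
use strict;
use warnings;

my $NAMERE = '([\w_]+)';   # same identifier pattern the script uses

my $actualname  = 'MyClass.prototype.sort';
my ($justname)  = ($actualname =~ m/$NAMERE$/);                # last identifier
my ($protoname) = ($actualname =~ m/^$NAMERE\.prototype\./);   # class name

my $def = {
    deftype    => 'protometh',
    actualname => $actualname,
    qualname   => $actualname,          # assumption: already fully qualified
    justname   => $justname,
    protoname  => $protoname,
    level      => 0,
    filename   => 'lib.js',             # hypothetical file name
    startline  => 41,                   # hypothetical, zero-based
    params     => { a => 1, b => 1 },
    refs       => {},
    undefs     => [],
    usedby     => [],
    used       => 0,
};

print "$def->{justname} $def->{protoname}\n";   # sort MyClass
```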
145
146=head1 TODO
147
148=head2 Data Tracking
149
150Track definitions of global data variables, and references to them.
151
152   # global data
153   ^var $NAMERE =
154   # member data
155   ^\ *this.$NAMERE =
156   # local data
157   ^\ *var $NAMERE =
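
One possible way to apply these three patterns is sketched below. The classify_data() helper is hypothetical; it escapes the dot after "this" and requires at least one leading space for local data so that the global and local patterns do not overlap.

```perl
use strict;
use warnings;

my $NAMERE = '([\w_]+)';   # same identifier pattern the script uses

# Classify one line of pretty-printed JavaScript as a data definition.
# Returns (kind, name), or an empty list if nothing matches.
sub classify_data {
    my ($line) = @_;
    return ('global', $1) if $line =~ m/^var $NAMERE =/;
    return ('member', $1) if $line =~ m/^ *this\.$NAMERE =/;
    return ('local',  $1) if $line =~ m/^ +var $NAMERE =/;
    return;
}

for my $line ('var count = 0;', '  this.items = [];', '    var tmp = 1;') {
    my ($kind, $name) = classify_data($line);
    print "$kind $name\n";
}
```

This sketch does not distinguish data from function assignments such as "this.foo = function", so those would have to be filtered out before the data patterns are tried.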
158
159=head2 Aliasing and Scoping
160
161Make sure aliased definitions are not used to resolve references across lexical blocks.
162
163Warn about shadowed variable/function names. Implement 'localfunc' and 'assign' properly
164(see $ANALYZE_NESTED_ASSIGNMENTS).
165
166Track alias definitions (in 'assign' case, LHS inherits all the methods available from RHS).
167Things like "a = b.c.d;".
168
169Track creation of global instances, things like "var foo = new SomeFunc();".
170
171Ultimately, proper tracking of aliases could allow for tighter typing.
172
173=head2 Typing
174
175Right now, if we can't resolve a reference using the fully qualified name, we'll
176match to any definition of a function with the same unqualified name.
177This has the unfortunate consequence of potentially pulling in the wrong code.
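
The fallback can be sketched as follows. The two indexes are hypothetical stand-ins shaped like $DEFS_BY_QUAL and $DEFS_BY_UNQUAL, and resolve() is an illustrative helper rather than a function of the script.

```perl
use strict;
use warnings;

# Hypothetical indexes: by qualified name, and by last identifier.
my %by_qual = (
    'Chart.prototype.render' => { qualname => 'Chart.prototype.render' },
    'Table.prototype.render' => { qualname => 'Table.prototype.render' },
);
my %by_unqual = (
    render => [ @by_qual{ 'Chart.prototype.render', 'Table.prototype.render' } ],
);

# Exact match on the qualified name first; otherwise every definition
# that shares the unqualified name.
sub resolve {
    my ($refname) = @_;
    return ($by_qual{$refname}) if $by_qual{$refname};
    my ($justname) = ($refname =~ m/([\w_]+)$/);
    return unless defined $justname;
    return @{ $by_unqual{$justname} || [] };
}

my @exact = resolve('Chart.prototype.render');
my @loose = resolve('widget.render');
print scalar(@exact), " ", scalar(@loose), "\n";   # 1 2
```

The second lookup returns both definitions of render, so a call on an object of unknown type keeps both alive even though at most one can be the intended target.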
178
179We also currently ignore all references to methods that share a name with
180any DOM or builtin ECMAScript object method. This means we can miss undefined
181symbols.
182
183A proper fix would involve attempts at full data flow analysis to determine
184variable and argument types, and/or capitalize on some declaration mechanism
185(such as commented out types, or reliance on JScript .NET source code prior
186to down translation).
187
188Other benefits would be obtained as well, such as catching such errors as
189calling getElementById on a Window object instead of a Document object....
190
191=head2 Subclass Method Calls
192
193Track calls to prototype functions from within subprototype method bodies (based on
194remembering the set of the subprototype's prototype object).
195
196=head2 Function References
197
198Do something about passing in named function references (no parens). Though
199we win somewhat with "justname" look up on the other side.
200
201Scope locally bound functions such as "OuterFunction.inner".
202
203Extend $QUALRE to handle function calls with parameters in dotted expressions: "foobar(1,2).println()"
204
205Handle (ignore) calls from literals, such as '00000'.substring(2)
206
207=head2 Predefined Symbols
208
209Make complete list of ECMAScript functions, variables, and builtin object methods.
210
211Make complete (or somewhat complete) list of DOM object methods.
212
213=head2 Top-Level Statements
214
215Break up top-level statements into multiple continuous sequences within the file,
216or even per whole statement. This would make for better diagnostics (tracking
217line number for the definition/reference instead of just line -1). It would
218also pave the way for perhaps excluding more blocks from the
219final code.
220
221=head2 Preprocessing
222
223Get Rhino to issue original file line numbers.
224Also maybe get Rhino to mangle/change names, or do other transforms (such as conditional "in").
225
226Exclude pattern matches within quoted strings and regexp literals.
227
228Support regular expressions that identify strings containing references which
229should be considered (rather than ignoring all strings).
230
231Support for -h: allow anchor refs to come from an html page.
232
233Support -D of expressions known to be false or true, to exclude references in:
234
235    if (FALSEEXPR) {...}
236    if (!FALSEEXPR) {} else {...}
237    if (TRUEEXPR) {} else {...}
238
239=head2 Pragmas
240
241Support external declaration of other builtin objects and functions to assume are defined.
242
243Support some special inline comment syntax to indicate that something should be
244considered defined.
245
246=head2 Debugging and Reporting Features
247
248Implement -trace.
249
250Report on line counts in used and unused code.
251
252=head2 Lint-Like Features
253
254Provide warnings on:
255
256   calls to eval
257   computed apply and computed call
258   js file names
259   unknown method on builtin object
260   redefinition of builtin object method
261
262=head1 AUTHOR
263
264Copyright 2004, Mark D. Anderson, mda@discerning.com.
265
266This is free software; you can redistribute it and/or
267modify it under the same terms as Perl itself.
268
269Alternatively, this is licensed under Academic Free License version 1.2.
270
271=cut
272
273use Data::Dumper;
274
275################################################################
276# constants
277
278# including 'this'
279my @KEYWORDS = qw(break else new var case finally return void catch for switch while continue function with this default if throw delete in try do instanceof typeof);
280my %KEYWORDS = map {$_=>1} @KEYWORDS;
281my @BUILTIN_OBJECTS = (qw(Number String Boolean Date RegExp Array Math Object Error Function),
282                       qw(XMLHttpRequest ActiveXObject DOMParser XMLSerializer),
283                       qw(arguments NaN Infinity undefined));
284my %BUILTIN_OBJECTS = map {$_=>1} @BUILTIN_OBJECTS;
285my @BUILTIN_FUNCTIONS = (qw(eval parseInt parseFloat isNaN isFinite decodeURI decodeURIComponent encodeURI encodeURIComponent escape unescape),
286                         qw(ScriptEngineMajorVersion ScriptEngineMinorVersion));
287my @WINDOW_FUNCTIONS = qw(alert blur clearTimeout close focus open print setTimeout);
288my %BUILTIN_FUNCTIONS = map {$_=>1} (@BUILTIN_FUNCTIONS, @WINDOW_FUNCTIONS);
289
290# TODO: Math static methods, rest of methods for RegExp and Date
291my @BUILTIN_METHODS = (
292  qw(toString toLocaleString valueOf hasOwnProperty isPrototypeOf propertyIsEnumerable), # Object
293  qw(apply call), # Function
294  qw(charAt charCodeAt fromCharCode concat indexOf lastIndexOf localeCompare match replace search slice split substring substr toLowerCase toUpperCase toLocaleLowerCase toLocaleUpperCase), # String
295  qw(test match exec), # RegExp
296  qw(concat join push pop reverse shift slice sort splice unshift), # Array
297  qw(toFixed toExponential toPrecision), # Number
298  qw(parse toDateString toTimeString getDate getDay getFullYear getHours getMilliseconds getMinutes getMonth getSeconds getTime getTimezoneOffset getYear), # Date
299  qw(setDate setHours setMilliseconds setMinutes setMonth setSeconds setYear toLocaleTimeString), # more Date
300  qw(caller), # Arguments
301       );
302
303my @DOM_METHODS = (
304    qw(clear createDocument createDocumentFragment createElement createEvent createEventObject createRange createTextNode getElementsByTagName getElementById write), # Document, Document.implementation
305    qw(addEventListener appendChild attachEvent cloneNode createTextRange detachEvent dispatchEvent fireEvent getAttributeNS getAttributeNode hasChildNodes hasAttribute hasAttributes insertBefore removeChild removeEventListener replaceChild scrollIntoView), # Node
306    qw(submit), # Form
307    qw(item), # DOM collections
308    qw(collapse createContextualFragment moveEnd moveStart parentElement select setStartBefore), # Range
309    qw(getPropertyValue setProperty), # Style
310    qw(initEvent preventDefault stopPropagation), # Event
311    qw(serializeToString), # XMLSerializer
312    qw(open send), # XMLHTTP
313    qw(loadXML), # XMLDOM
314    qw(parseFromString), # DOMParser
315);
316# put @WINDOW_FUNCTIONS both in methods and functions, since might be qualified or not
317my %BUILTIN_METHODS = map {$_=>1} (@BUILTIN_METHODS, @DOM_METHODS, @WINDOW_FUNCTIONS);
318my $NAMERE = '([\w_]+)';
319# TODO: need to handle "foobar().baz", but not "switch(a)" 
320my $QUALRE = '([\w_][\w_\.]*)';
321my $GLOBALNAME = 'GLOBAL '; # fake symbol name for top-level statements in some file. trailing space is deliberate so that it cannot match any real symbol.
322
323################################################################
324# internal variables
325my $DEFS_BY_QUAL = {};
326my $DEFS_BY_UNQUAL = {};
327my $DEFS_BY_FNAME = {};
328
329################################################################
330# configuration variables (or at least potentially configurable)
331
332# the presumed indent on input to indicate a whole definition level.
333my $PREINDENT = 1;
334my $TABWIDTH = 8;
335my $TABSPACES;
336my $PREPROCESSOR = 'cat';
337my $RHINOJAR = '/Users/mda/workspaces/rhino1_5R5/js.jar';
338my $JSLINKERDIR = '.';
339
340my $DEBUGOPTS = {debug => 0};
341my $WARNOPTS = {functionmatch => 0, instmeth => 0, ambigs => 0, dups => 0};
342
343my $DUMPOPTS = {used => 0, unused => 0, usedby => 0, undefs => 0, refs => 0};
344my $DUMPINDENT = '   ';
345
346my $SYMSEP = "         ";
347
348my $OUTOPTS = {skipped => 0, sourcelines => 0, used => 1, filemarker => 1};
349
350my $OUTFILE = '-';
351
352my $TRACE_SYMS = {};
353
354my $FIND_UNDEFINED_IN_UNUSED = 1;
355my $ANALYZE_NESTED_ASSIGNMENTS = 1;
356
357################################################################
358# output routines
359
360sub debug {
361    print STDERR 'DEBUG: ', @_, "\n" if $DEBUGOPTS->{debug};
362}
363
364sub info {
365    my $mess = "\n" . join('', @_);
366    $mess =~ s/\n$//;
367    $mess =~ s/\n/\n    /g;
368    print STDERR 'INFO:', $mess, "\n";
369}
370
371# set in parse_file()
372my $CURRENT_FNAME = '';
373
374sub parse_warn {
375    my ($mess, @rest) = @_;
376    my $line = $_;
377    my $lineno = $.;
378    my $fname = $CURRENT_FNAME;
379    print STDERR "PARSE WARNING: At line $fname\:$lineno : '$line'\n        $mess", @rest, "\n";
380}
381
382sub warning {
383    print STDERR "WARNING: ", @_, "\n";
384}
385
386sub trace_sym {
387    my ($sym, @rest) = @_;
388    my $is_trace = 0;
389    if (ref($sym) eq 'ARRAY') {$is_trace = grep{$TRACE_SYMS->{$_}} @$sym}
390    else {$is_trace = $TRACE_SYMS->{$sym};}
391    debug @rest if $is_trace;
392}
393
394################################################################
395
396my $FILEOBJS_BY_NAME = {};
397
398sub usage {
399    my ($mess) = @_;
400    print STDERR "ERROR: $mess\n" if $mess;
401    print STDERR "See 'perldoc $0' for usage\n";
402    exit(1);
403}
404
405sub process_commandline {
406    my $fileobjs = [];
407    my $i = 0;
408    my $expr_count = 0;
409    while (@ARGV) {
410        my $opt = shift @ARGV;
411        unless ($opt) { warning("skipping empty argument"); next; }
412        usage("unexpected non-option $opt") unless $opt =~ m/^-/;
413        my $optarg = shift @ARGV;
414        usage("no option argument following '$opt'") unless defined($optarg) && length($optarg);
415        my $fileobj;
416        if ($opt eq '-e') {
417            $fileobj = {
418                filename => ('expr' . ($expr_count++)),
419                expr => $optarg,
420                is_anchor => 1,
421            };
422        }
423        elsif ($opt eq '-h') {
424            usage("unimplemented: reading script out of an html file");
425        }
426        elsif ($opt eq '-i') {
427            $fileobj = {filename => $optarg, is_anchor => 1};
428        }
429        elsif ($opt eq '-l') {
430            $fileobj = {filename => $optarg};
431        }
432        elsif ($opt eq '-o') {
433            $OUTFILE = $optarg;
434        }
435        elsif ($opt eq '-debug') {
436            $DEBUGOPTS = {map {$_ => 1} split(',',$optarg)};
437        }
438        elsif ($opt eq '-trace') {
439            $TRACE_SYMS->{$optarg} = 1;
440        }
441        elsif ($opt eq '-warn') {
442            if ($optarg eq 'all') {
443                while(my($k,$v) = each %$WARNOPTS) {$WARNOPTS->{$k} = 1;}
444            }
445            else {
446                $WARNOPTS = {map {$_ => 1} split(',',$optarg)};
447            }
448        }
449        elsif ($opt eq '-dump') {
450            if ($optarg eq 'all') {
451                while(my($k,$v) = each %$DUMPOPTS) {$DUMPOPTS->{$k} = 1;}
452            }
453            else {
454                $DUMPOPTS = {map {$_ => 1} split(',',$optarg)};
455            }
456        }
457        elsif ($opt eq '-pre') {
458            $PREPROCESSOR = $optarg;
459        }
460        elsif ($opt eq '-tabwidth') {
461            $TABWIDTH = $optarg;
462        }
463        elsif ($opt eq '-nestedassigns') {
464            $ANALYZE_NESTED_ASSIGNMENTS = (!$optarg || $optarg eq '0') ? 0 : 1;
465        }
466        else {
467            usage("unknown option '$opt'");
468        }
469        if ($fileobj) {
470            push(@$fileobjs, $fileobj) ;
471            $FILEOBJS_BY_NAME->{$fileobj->{filename}} = $fileobj;
472        }
473        $i++;
474    }
475    $TABSPACES = ' 'x$TABWIDTH;
476    return $fileobjs;
477}
478
479# used for 'ctormeth' definition, for example "this.foobar = function ...".
480# It figures out what "this" is from the $parentdef
481sub qualify_def_this {
482    my ($name, $parentdef) = @_;
483    my $pname = $parentdef->{qualname};
484    my $pdeftype = $parentdef->{deftype};
485    if ($pdeftype eq 'globalfunc') {
486        debug("qualifying name '$name' with parent '$pname' of type '$pdeftype'");
487        $name = "$pname\.constructor\.$name";
488    }
489    else {
490        if ($pdeftype eq 'instmeth' || $pdeftype eq 'localfunc') {
491            if ($WARNOPTS->{instmeth}) {
492                parse_warn("instance method definition of '$name' in parent '$pname'; if parent is a singleton, defining a method in the constructor is ok")
493            }
494        }
495        elsif ($pdeftype eq 'singleton') {
496        }
497        else {
498            parse_warn("member function definition of '$name' with parent '$pname', parent deftype '$pdeftype'");
499        }
500    }
501    return $name;
502}
503
504# used when a reference string for a function call starts with 'this.'
505# we try to determine what "this" means based on deftype we are in :
506#   'ctormeth'   - in a "this.foobar = function" (itself inside a global function)
507#   'protometh'  - in a "FooBar.prototype.methname = function"
508#   'instmeth'  -  in a "whatever.methname = function"
509#   'globalfunc' - inside body of global function (constructor)
510sub qualify_ref_this {
511    my ($refname, $def) = @_;
512
513    # this could have been from the body of a constructor, or from the body of another method.
514    my ($restname) = ($refname =~ m/^this\.(.*)/);
515    my $deftype = $def->{deftype};
516
517    my $refqual = undef;
518    my $is_funny = 0;
519
520    if ($deftype eq 'ctormeth' || $deftype eq 'protometh' || $deftype eq 'instmeth' || $deftype eq 'singleton') {
521        ($refqual) = ($def->{qualname} =~ m/(.*)\./);
522        if (!$refqual && $def->{parentqual}) {
523            debug("giving reference '$refname' the parent qualifier of '", $def->{parentqual}, "'");
524            $refqual = $def->{parentqual};
525        }
526    }
527    elsif ($deftype eq 'globalfunc') {
528        $refqual = $def->{qualname} . '.prototype';
529    }
530    else {
531        $is_funny = 1;
532        parse_warn("reference '$refname' in a definition of unknown type '$deftype'");
533    }
534
535    if ($refqual) {
536        debug("qualifying reference '$refname' in a '$deftype' definition as '$refqual' + '.' + '$restname'");
537        $refname = "$refqual\.$restname";
538    }
539    elsif (!$is_funny) {
540        parse_warn("could not figure out what 'this' means in reference, inside definition: ", Dumper($def));
541    }
542    return $refname;
543}
544
545# create and register a definition object.
546sub add_def {
547    my ($qualname, $actualname, $deftype, $fname, $lineno, $level, $parentdef, $protoname, $aliasto) = @_;
548    die "add_def: wrong number args: @_" unless scalar(@_) == 9;
549    # we are currently at the next line, so subtract 1 from $. for zero-based line number.
550    $lineno--;
551
552    my $is_global = 0;
553    if ($qualname eq $GLOBALNAME) {
554        $is_global = 1;
555        $qualname = "$GLOBALNAME$fname";
556    }
557
558    # parse out params, unless type 'assign'
559    my $params = {};
560    if (!$is_global && $deftype ne 'assign') {
561        my ($funcname, $paramstr) =  m/function ([\w_]*?)\s*\((.*?)\)/ ;
562        if (!$funcname) {
563            $funcname = '';
564            ($paramstr) = m/function\s*\((.*?)\)/;
565        }
566        if (defined($paramstr)) {
567            trace_sym($funcname, "funcname=$funcname, paramstr=$paramstr in '$_'");
568            # $paramstr ||= '';
569            # if (!defined($paramstr)) { parse_warn("did not match function params, 1='$1'"); $paramstr = ''}
570            my @parms = split(/\s*,\s*/, $paramstr);
571            $params = {map {$_ => 1} @parms};
572            trace_sym($funcname, "got params '$paramstr', ", Dumper($params));
573        }
574        else {
575            parse_warn("no function params to parse in: ", $_);
576        }
577    }
578   
579    my $parentqual = $parentdef->{qualname};
580    my $justname = undef;
581    if (!$is_global) {
582        ($justname) = ($qualname =~ m/$NAMERE$/);
583        parse_warn("no match to m/$NAMERE\$/ in '$qualname'") unless $justname;
584    }
585    my $def = {
586        qualname => $qualname,
587        is_global => $is_global,
588        actualname => $actualname,
589        justname => $justname,
590        deftype => $deftype,
591        filename => $fname,
592        startline => $lineno,
593        linenos => [],    # used only if $is_global, a list of line numbers
594        params => $params,
595        level => $level,
596        parentqual => $parentqual,
597        protoname => $protoname,
598        aliasto => $aliasto,
599        refs => {},      # all references found in body of this definition. hash from $qualname to [$reftype, $lineno]
600        undefs => [],    # list of keys from refs which are not defined.
601        usedby => [],    # array of other $def's which point to this one.
602        used => 0,
603        added => 0,
604    };
605    my $existing = $DEFS_BY_QUAL->{$qualname};
606    my $do_replace = 1;
607    if ($existing) {
608        # don't warn if either of type 'assign' (and same file?)
609        if ( # $existing->{filename} eq $def->{filename} &&
610            ($existing->{deftype} eq 'assign' || $def->{deftype} eq 'assign')) {
611            # don't replace a non-assign with an assign
612            if ($def->{deftype} eq 'assign') {
613                $do_replace = 0;
614                parse_warn("preventing replacement of existing definition of '$qualname' at ", $existing->{filename}, ":", $existing->{startline},
615                           " with an 'assign' definition") if $WARNOPTS->{dups};
616            }
617            else {
618                parse_warn("allowing replacement of an existing 'assign' definition of '$qualname' at ", $existing->{filename}, ":", $existing->{startline},
619                           " with another") if $WARNOPTS->{dups};
620            }
621
622        }
623        # don't warn if a localfunction
624        elsif ($existing->{deftype} eq 'localfunc') {
625            debug("overriding previous definition of '$qualname' because localfunc");
626        }
627        else {
628            parse_warn("duplicate definition of '$qualname': ", Dumper($existing), Dumper($def)) if $WARNOPTS->{dups};
629        }
630    }
631    $DEFS_BY_QUAL->{$qualname} = $def if $do_replace;
632
633    # if $do_replace of $existing, then we better make sure that $existing knows not to
634    # complain later....
635    $existing->{replacedby} = $def if $do_replace && $existing;
636
637    trace_sym($qualname, "storing def on '$qualname' with justname=$justname") if $justname;
638
639    if ($justname) {
640        my $a = $DEFS_BY_UNQUAL->{$justname};
641        $DEFS_BY_UNQUAL->{$justname} = $a ? [@$a, $def] : [$def];
642    }
643    my $filedefs = $DEFS_BY_FNAME->{$fname};
644    my ($already) = grep {$_->{qualname} eq $qualname} @$filedefs;
645    parse_warn("The file $fname already has a definition for '$qualname' at ", $already->{filename}, ':', $already->{startline})
646        if $WARNOPTS->{dups} && $already && ($already->{deftype} ne 'localfunc' || $def->{deftype} ne 'localfunc') ;
647    push(@$filedefs, $def);
648
649    debug("starting definition of '", def_name($def), "' with type '$deftype' at $fname:$lineno, level $level");
650    return $def;
651}
652
653# used when displaying messages about a definition.
654sub def_name {
655    my ($def) = @_;
656    my $q = $def->{qualname};
657    if ($def->{is_global}) {
658        return "(global expr)";
659        my $sl = $def->{startline};
660        my $ll = $def->{lastline};
661        return "(expr lines $sl-$ll)"; # in $1)";
662    }
663    # return $def->{protoname} . ".prototype.$q" if $def->{protoname};
664    return $q;
665}
666
667# note that this definition $def is referring to qualified symbol $refname.
668# a single definition might refer to some other symbol multiple times; we only record one such case.
669sub add_ref {
670    my ($def, $refname, $reftype, $fname, $lineno) = @_;
671    # we are currently at the next line, so subtract 1 from $. for zero-based line number.
672    $lineno--;
673    my ($startname) = ($refname =~ m/^(\w+)/);
674
675    if ($refname eq $def->{qualname} || $refname eq $def->{actualname}) {
676        debug("skipping recursive reference to '$refname' in ", def_name($def));
677        return;
678    }
679
680    if ($def->{params}->{$refname}) {
681        debug("skipping reference '$refname' in ", def_name($def), " because it is a parameter");
682        return;
683    }
684
685    if (!$startname) {
686        warning("reference name '$refname' does not start with word (reftype=$reftype), at $fname\:$lineno");
687        return;
688    }
689
690    my $qualname = $refname;
691    # my $actualname = $refname;
692    if ($startname eq 'this') {
693        if ($refname eq 'this') {debug("skipping 'this' as function"); return;}
694        $qualname = qualify_ref_this($refname, $def);
695    }
696    elsif ($KEYWORDS{$startname}) {
697        # debug("skipping keyword '$refname' at $fname:$lineno");
698        return;
699    }
700    elsif ($BUILTIN_OBJECTS{$startname}) {
701        # debug("skipping builtin object reference '$refname' at $fname:$lineno");
702        return;
703    }
704    elsif ($BUILTIN_FUNCTIONS{$startname}) {
705        # debug("skipping builtin function reference '$refname' at $fname:$lineno");
706        return;
707    }
708    debug("adding reference '$refname' (qualname='$qualname') of type '$reftype' from ", def_name($def), " at $fname:$lineno : ", $_);
709    # 3rd slot is to hold the definition, once we know it
710    $def->{refs}->{$qualname} = [$reftype, $., undef];
711}
712
713sub is_parameter {
714    my ($currentdef, $actualname) = @_;
715    return ($currentdef && $actualname && $currentdef->{params}->{$actualname});
716}
717
718# parse the provided file (or expression)
719sub parse_file {
720    my ($f, $fileobj) = @_;
721
722    my $fname = $fileobj->{filename};
723    $CURRENT_FNAME = $fname;
724    $DEFS_BY_FNAME->{$fname} = [];
725
726    # the function def we are currently inside of, or the top-level script
727    my $currentdef = add_def($GLOBALNAME, '', 'global', $fname, 0, 0, undef, undef, undef);
728    $fileobj->{globaldef} = $currentdef;
729
730    my $nested = [$currentdef];          # stack of functions being defined.
731    my $in_comment = 0;
732    my $lines = [];
733    while(<$f>) {
734        push(@$lines, "$_");
735        chomp;
736        # debug("parsing $fname:$. : '$_'");
737
738        # convert tabs to equivalent number of spaces
739        s/\t/$TABSPACES/g;
740
741        # a dumb C-comment parser, in case preprocessor didn't exclude them.
742        # check for end of multi-line C comment
743        if ($in_comment) {
744            if (m,\*/,) {
745                $in_comment = 0;
746                s,.*?\*/,,;
747            }
748            else {next;}
749        }
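750        # remove C++ comment (runs before quotes are collapsed, so '//' inside a string literal is cut too)
751        s,//.*$,,;
752        # start of C comment (only recognized at the beginning of a line)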
753        if (m,^\s*/\*,) {
754            if (m,\*/,) {
755                s,/\*.*?\*/,,;
756            }
757            else {
758                $in_comment = 1;
759                next;
760            }
761        }
762
763        # entirely blank line
764        next if m/^\s*$/;
765
766        # collapse string literals that contain no backslash escapes to empty quotes
767        s/"[^\\\"]*"/""/g;
768        s/'[^\\\']*'/''/g;
769
770        # determine indent level.
771        m/^( *)/;
772        my $level = length($1)/$PREINDENT;
773
774        # ugly: assume global definitions are the ones that start at zero indent
775        my $is_global = ($level == 0);
776
777        # maybe done defining function
778        my $lastlevel = $currentdef->{level};
779        my $numnested = scalar(@$nested);
780        my $popped = 0;
781        if ($level <= $lastlevel && ($lastlevel > 0 || $numnested > 1)) {
782            debug("finishing definition of ", def_name($currentdef), " at line $. because $level <= $lastlevel\: '", $_, "'");
783            $popped = 1;
784            $currentdef->{lastline} = $.;
785            pop(@$nested);
786            $currentdef = $nested->[$numnested - 2];
787            die "no nested function definition to pop at: $_" unless $currentdef;
788        }
789
790        my $qualname = undef; # fully qualified
791        my $actualname = undef; # what actually was found in the file
792        my $deftype = undef;
793        my $protoname = undef;
794        my $aliasto = undef;
795
796        if (m/^ *function $NAMERE/ || m/^ *var $NAMERE = function/ ) {
797            $deftype = ($is_global ? 'globalfunc' : 'localfunc');
798            $qualname = $actualname = $1;
799        }
800        elsif (m/^$NAMERE = new function\(/ || m/^ *var $NAMERE = new function\(/) {
801            $deftype = 'singleton';
802            $qualname = $actualname = $1;
803        }
804        elsif (m/^ *this\.$NAMERE = function/) {
805            $deftype = 'ctormeth';
806            $actualname = "this.$1";
807            $qualname = qualify_def_this($1, $currentdef);
808        }
809        # TODO: Foo.prototype.meth = aliasfunc
810        # TODO: var Foo = {methname : function ...
811        elsif (m/^ *$QUALRE\.prototype\.$NAMERE = function/) {
812            $deftype = 'protometh';
813            $protoname = $1;
814            $actualname = $qualname = "$1\.prototype\.$2";
815        }
816        elsif (m/^ *$QUALRE\.$NAMERE = function/) {
817            $actualname = $qualname = "$1\.$2";
818            $deftype = 'instmeth';
819        }
820
821        # starting a function definition - register and continue loop
822        my $wholebody;
823        if ($deftype) {
824            my $parentdef = $currentdef;
825            $currentdef = add_def($qualname, $actualname, $deftype, $fname, $., $level, $parentdef, $protoname, $aliasto);
826            push(@$nested, $currentdef);
827            # if the line ends in an open curly, there is no body on this line to parse; the definition finishes on a later line
828            if (m/\{[\s\n]*$/) {
829                next;
830            }
831            elsif (m/\{(.*)\}[\;\s\n]*$/) {
832                debug("function begins and ends on same line $.: $_");
833                $wholebody = $1;
834                $_ = $wholebody;
835            }
836            else {
837                parse_warn("definition start line does not end with an opening or closing curly brace");
838                next;
839            }
840        }
841
842        # if global expression, and not a new function definition, add the line to the currentdef
843        if (!$popped && !$wholebody && $currentdef->{is_global} && ! m/^[\s\n\r]*$/) {
844            my $linenos = $currentdef->{linenos};
845            push(@$linenos, $. - 1);
846            # debug("pushing expression line ", $. - 1);
847        }
848        else {
849            debug("not in a global expression at $.: ", $popped, $currentdef->{is_global}, ': ', $_);
850        }
851
852        # deal with assignment. This could be both a definition (LHS) and reference (RHS). could also be an alias.
853        if (!$deftype && (m/^ *$QUALRE = / || m/^ *var $NAMERE = /) && ($is_global || $ANALYZE_NESTED_ASSIGNMENTS) ) {
854            $deftype = 'assign';
855            $actualname = $qualname = $1;
856            if (is_parameter($currentdef, $actualname)) {
857                debug("skipping assignment because it is to parameter '$actualname', at line $.");
858            }
859            elsif (@$nested > 1 && is_parameter($nested->[@$nested - 2], $actualname)) {
860                debug("skipping assignment because it is to parameter '$actualname' of parent function, at line $.");
861            }
862            else {
863                # we don't push a function defining context
864                add_def($qualname, $actualname, $deftype, $fname, $., $level, $currentdef, $protoname, $aliasto);
865            }
866        }
867        else {
868            my $line = $_;
869            trace_sym($_, "did not match assignment") for grep {$line =~ m/$_/} keys %$TRACE_SYMS;
870            $_ = $line;
871        }
872
873        # Maybe warn because not starting a function definition, but contains the string 'function'.
874        if (m/function/) {
875            # matches to the string 'function' that are normal and expected:
876            #    closures in function calls: foobar(17, function(a, b) {
877            #    returning a closure:        return function (o) { 
878            #    closure on rhs, in level:   foobar = function (s) {
879            #    matches in string:          foobar("what is this function");
880            #    matches in regexp:          var m = s.match(/function /);
881            #    partial matches:            var s = foobar.functionName(f);
882            parse_warn("ignoring function definition starting here") if $WARNOPTS->{functionmatch};
883        }
884
885        # scan this line for function references, using possibly qualified names
886
887        # find all constructor calls (use of "new")
888        my @ctorcalls = m/new $QUALRE/g;
889        for my $csym (@ctorcalls) {
890            if (is_parameter($currentdef, $csym)) {
891                debug("skipping call to constructor '$csym' because it is a parameter");
892            }
893            else { add_ref($currentdef, $csym, 'construct', $fname, $.); }
894        }
895
896