-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathlexical-analysis.html
More file actions
173 lines (158 loc) · 8.58 KB
/
lexical-analysis.html
File metadata and controls
173 lines (158 loc) · 8.58 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<title>Lexical Analysis — Compiler Programming</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/agogo.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Syntax Analysis" href="syntax-analysis.html" />
<link rel="prev" title="The EeZee Programming Language" href="ez-lang.html" />
</head><body>
<div class="header-wrapper" role="banner">
<div class="header">
<div class="headertitle"><a
href="index.html">Compiler Programming</a></div>
<div class="rel" role="navigation" aria-label="related navigation">
<a href="ez-lang.html" title="The EeZee Programming Language"
accesskey="P">previous</a> |
<a href="syntax-analysis.html" title="Syntax Analysis"
accesskey="N">next</a> |
<a href="genindex.html" title="General Index"
accesskey="I">index</a>
</div>
</div>
</div>
<div class="content-wrapper">
<div class="content">
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<section id="lexical-analysis">
<h1>Lexical Analysis<a class="headerlink" href="#lexical-analysis" title="Permalink to this headline">¶</a></h1>
<p>When compiling a program we need to recognize the words and punctuations that make up the vocabulary of the language.
This part of the compiler is therefore known as “lexical” analysis.</p>
<p>Usually a compiler is given one or more input programs, and the first thing it must do is read the program and
figure out what lexical elements appear in the program.</p>
<p>Typically, these lexical elements are known as tokens. So for example, in the following snippet of code:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s1">'hi'</span><span class="p">)</span>
</pre></div>
</div>
<p>We have a number of lexical elements / tokens:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">print</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">(</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">'hi'</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">)</span></code></p></li>
</ul>
<p>There are many different ways to implement a “lexer” - the name we give to this component of the compiler.</p>
<ul class="simple">
<li><p>We can write this code by hand. This involves scanning the input program character by character and
deciding what tokens appear in the program.</p></li>
<li><p>Or we can specify the lexical elements in a grammar and have a tool generate the code to process the input
program and give us the tokens that appear in the program.</p></li>
</ul>
<p>A lexical analyser can be designed to process input on demand, or it may be designed to translate the entire
input source to a set of tokens at the very beginning.</p>
<section id="considerations">
<h2>Considerations<a class="headerlink" href="#considerations" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li><p>Should comments in the input program be retained as tokens? Usually a lexer will discard comments, but in languages that
allow comments to be retained as documentation, the lexer must not discard them.</p></li>
<li><p>Should end of line markers be retained? Typically lexers drop all intermediate space including line markers,
but if the language syntax depends on line markers then these may need to be retained.</p></li>
<li><p>Should tokens copy the input text, convert them to another form, or retain pointers to the input itself?
Retaining the original form of the lexical token may be important in some cases, for example if the lexer
is used in a code formatter.</p></li>
<li><p>How much can we peek ahead? During later stages of the compiler, depending on the complexity of the language grammar,
it may be necessary to allow the compiler to look ahead one or more tokens without consuming them.</p></li>
<li><p>Ancillary information regarding tokens such a line number, column number in the input source are invaluable for
error reporting.</p></li>
</ul>
</section>
<section id="example-hand-coded-implementation">
<h2>Example Hand-Coded Implementation<a class="headerlink" href="#example-hand-coded-implementation" title="Permalink to this headline">¶</a></h2>
<p>The <a class="reference external" href="https://github.com/CompilerProgramming/ez-lang/tree/main/lexer">Lexer</a> module in the EZ language
implementation contains an example of hand-coded lexical analyser written in Java. This implementation returns tokens
on demand.</p>
<p>Another example is the <a class="reference external" href="https://lua.org/source/5.4/llex.c.html">Lua lexer</a>.</p>
</section>
</section>
<div class="clearer"></div>
</div>
</div>
</div>
</div>
<div class="sidebar">
<h3>Table of Contents</h3>
<p class="caption" role="heading"><span class="caption-text">Preliminaries</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="prelim-impl-lang.html">Compiler Implementation Language</a></li>
<li class="toctree-l1"><a class="reference internal" href="ez-lang.html">The EeZee Programming Language</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Parsing Techniques</span></p>
<ul class="current">
<li class="toctree-l1 current"><a class="current reference internal" href="#">Lexical Analysis</a></li>
<li class="toctree-l1"><a class="reference internal" href="syntax-analysis.html">Syntax Analysis</a></li>
<li class="toctree-l1"><a class="reference internal" href="abstract-syntax-tree.html">Abstract Syntax Tree</a></li>
<li class="toctree-l1"><a class="reference internal" href="type-systems.html">Type Systems</a></li>
<li class="toctree-l1"><a class="reference internal" href="semantic-analysis.html">Semantic Analysis</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Backend Basics</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="intermediate-representations.html">Intermediate Representations</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Learning Resources</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="learning-resources.html">Learning Resources</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Reviews</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="compiler-books.html">Compiler Books</a></li>
</ul>
<div role="search">
<h3 style="margin-top: 1.5em;">Search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<div class="clearer"></div>
</div>
</div>
<div class="footer-wrapper">
<div class="footer">
<div class="left">
<div role="navigation" aria-label="related navigaton">
<a href="ez-lang.html" title="The EeZee Programming Language"
>previous</a> |
<a href="syntax-analysis.html" title="Syntax Analysis"
>next</a> |
<a href="genindex.html" title="General Index"
>index</a>
</div>
<div role="note" aria-label="source link">
<br/>
<a href="_sources/lexical-analysis.rst.txt"
rel="nofollow">Show Source</a>
</div>
</div>
<div class="right">
<div class="footer" role="contentinfo">
© Copyright 2024, Dibyendu Majumdar.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.3.2.
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</body>
</html>