When implementing their first compiler, students are often tempted to do exactly what you are suggesting, for some of the more common syntax errors. Internally, the compiler often has to do something to get itself back on track in an effort to avoid cascading error messages downstream, and throwing in a manufactured semicolon is sometimes a way to do this. So, the thinking goes, why not make the change to the developer’s source code for them?
But the students quickly realize that there are cases where the compiler thinks it’s doing the right thing, when in fact it is altering the original intent of the programmer. A simple “missing character” problem from the compiler’s point of view can sometimes be much more than just a missing character.
For example, the real issue might be a missing block of code. The compiler can’t know the extent of what’s really missing from the code. It can’t know what the developer intended. All it knows is that it thinks there is a missing semicolon. But it doesn’t know that’s the actual issue.
Suppose a C compiler sees this source code:
- double result = 0.0;
- {
- result = 0.0;
- for (int i = 0; i < elements; i++)
- // TODO - Add body of loop to calculate result
- }
- result += 1.0;
The compiler can’t understand the comment. (Some IDEs can detect and flag TODO in comments, but we’re talking about the compiler itself here.) When it reaches the closing brace on line 6 without having seen a statement for the loop body, the compiler concludes that the problem is a missing semicolon, and reports that. Of course, that’s not what’s missing. What’s really missing is a block of code for the body of the loop.
So, what if the compiler simply added the semicolon to the source code, and just issued a warning?
- double result = 0.0;
- {
- result = 0.0;
- for (int i = 0; i < elements; i++);
- // TODO - Add body of loop to calculate result
- }
- result += 1.0;
Suppose the developer ignores the warning (not a good practice, but an all too common one). The altered code now compiles cleanly, without any syntax errors or warnings, because the compiler changed the developer’s source code, thinking it knew what was best. This is clearly not what the developer intended, and the problem is now hidden and silent, because subsequent compilations will be free of errors or warnings on this line.
In fact, the problem is now a bit insidious, because even if the developer comes along and adds the body of the loop as a block of code:
- double result = 0.0;
- {
- result = 0.0;
- for (int i = 0; i < elements; i++);
- {
- // body of loop - do this multiple times
- }
- }
- result += 1.0;
the semicolon added by the compiler will cause the for loop to do nothing and will cause the block of code below it to execute unconditionally just once. This is clearly not what the developer intended, and can be pretty difficult to track down without really staring at the code. In fact, what the compiler has done by changing the developer’s source code in this way is to commit a common human error of placing a semicolon on the control line of the loop.
This is just one example of a situation in which the compiler would make things worse by applying what it believes is simply missing punctuation.
A much better approach is for the compiler to flag the missing semicolon as a syntax error, and let the human developer figure out what the correct fix is. As this example shows, the correct fix is not always just adding a semicolon. And that’s exactly what real compilers do: flag the error.
Compilers should not be in the business of altering the developer’s source code. The human developer still knows better what the code is actually supposed to do. The compiler should flag the error, and let the human deal with fixing the code properly.