Answered: Why don't compilers correct programming error they find automatically?

Home » Coding » Answered: Why don’t compilers correct programming error they find automatically?

Question

When programming, a compiler can generate an error if a minor mistake such as omitting a semicolon occurs. It is necessary for the programmer to make the necessary correction manually. Could this process be improved by having the compiler automatically remedy the issue and alert the programmer of the correction?

Answer #1

Yes, it is possible to improve the process by having the compiler automatically remedy certain errors and alert the programmer of the correction. This is known as “error recovery” in compilers. However, it is not always possible for the compiler to automatically correct the error, as sometimes the error may be a semantic one, which requires human understanding to fix.

Additionally, automatically fixing errors could lead to unintended consequences and make it harder to track down the source of the problem. So, it’s a trade-off between automation and human understanding.

Answer #2

Experience has demonstrated that the initial concept of the DWIM (Do What I Mean) feature, which originated from BBN’s LISP and was eventually incorporated into InterLISP, is not beneficial in the long run. Following the release of Cornell’s PL/C compiler, IBM later incorporated a similar feature into their PL/1 system, though it was not as thorough as the one developed by Cornell. While some may have found it to be beneficial, I personally did not find it particularly useful.

The original Pascal implementation for Unix featured an innovative concept of automatically correcting errors in code. A picture from Sun’s Pascal manual, illustrating this feature in action, is provided below:

A picture from Sun's Pascal manual, illustrating this feature in action.

Unfortunately, some individuals may use the tool as an excuse for laziness, rather than taking advantage of its potential to accelerate development and address errors. When teaching the “Intro to CS” course at UCB, we noticed that students were submitting programs which contained syntax errors that had been corrected by above example instead of being fixed by the student prior to submitting their program to the TA. Consequently, these students would express dissatisfaction when their program was marked down.
It is important to note that UCB drew inspiration from the Cornell PL/C compiler for the IBM 360[1], which, according to Wikipedia, sought to maintain and complete the compilation process.

“The PL/C compiler had the unusual capability of never failing to compile any program, through the use of extensive automatic correction of many syntax errors and by converting any remaining syntax errors to output statements.”

Ultimately, the experiment was unsuccessful due to people’s preferences. Introducing a feature like this in an Integrated Development Environment (IDE) or an editor such as Emacs may be a feasible approach as the source code would be modified. However, this does require users to have a preference for IDEs, and as other answers have suggested, what occurs when the computer’s interpretation differs from the programmer’s intended outcome?

Some have suggested that this is comparable to auto-correct and tools such as Grammarly. To be honest, I have a conflicting relationship with auto-correct, particularly on my mobile devices. As I am dyslexic, these sorts of tools can be of great help to me, but I often find myself having to override the corrections they make.

In my opinion, I believe that auto-correct for programming is similar to the saying “Do What I Mean” and the solution causes more issues than it resolves.

Answer #3

What are the reasons why compilers are not able to automatically fix errors they detect?

It is reasonable to ask whether compilers are able to detect errors; however, this is not necessarily the case. It is unfortunate that the language used to discuss compilers, both in informal and formal contexts, often uses the term “error” to refer to such reports. This term does not accurately reflect the meaning most individuals would associate with it.

In short, compilers do not detect errors automatically as they have no way of identifying them in the first place. The output from the compiler is not an error report, but rather a report on inconsistencies.

We can explore an example of an “inconsistency report” to better understand its form. The following code written in the Java language will serve as a demonstration:

public class App { 
  private static int twice(int value) { 
    return value + value; 
  } 
 
  public static void main(String[] args) { 
    twice(2); 
    twice("Hi"); 
  } 
}

If you try to compile this code, the Java compiler will tell you this:

Main.java:8: error: incompatible types: String cannot be converted to int
twice(“Hi”);

An inquiry arises: is this an incorrect statement? The answer is no. Through the evaluation of `2 + 2` and `”Hi” + “Hi”`, it is evident that Java has no basis to assume the code is wrong.

Nevertheless, the Java compiler will not compile this, and it provides an explanation for this refusal.

The Java Compiler features a model for programs which serves the purpose of rejecting any programs that cannot be understood. While this is a typical feature of compilers, the model is very conservative and may reject valid and functional programs.

The potential design for this model may resemble the following:

F(A) when F : int -> int and A : int = <what it means>

Given a method call F with one argument A, we will define what it means only if F takes an argument of type int and returns an argument of type int and the argument provided has the same type int. Although it is possible that there are valid cases not included in this model, the authors of Java did not take the time to specifically define them. Had they done so, our example would have compiled and operated successfully.

The authors of the TypeScript compiler invested effort to support this case; however, they did not extend this support to all forms of this case. Consequently, the TypeScript model will not accept this, as the authors have not established a definition for it.

function twice(x: number | string): number | string { 
  return x + x;  
} 
 
twice(2); 
twice("Hi");

But will happily accept this one:

function twice(x: number | string): number | string { 
  if (typeof x == "string") { 
    return x + x; 
  } else { 
    return x + x; 
  } 
} 
 
twice(2); 
twice("Hi");

Although the two programs have identical execution semantics, the TypeScript compiler will accept one and reject the other. Consequently, the two programs are essentially the same from an execution standpoint.

It is important to note that TypeScript’s model is limited, and there may be instances where it will reject code intentionally or unintentionally. As a result, it is necessary to understand how to adhere to the compiler’s requirements, which is distinct from writing code without errors.

What could be the possible reasons for compilers to reject valid programs?

The short answer is that it is overly laborious to obtain the desired characteristics via a particular language whilst still accommodating all valid programs a user may create.

For instance, compiling programs could take an extended period of time, and consequently, delivering responsive IDEs for the language could be impeded. Additionally, we may be inclined to encourage users to adhere to a particular coding style, even if this entails them having a less than satisfactory experience due to rejections.

Answer #4

When implementing their first compiler, students may be inclined to take the approach you have suggested for some of the more frequent syntax errors. Internally, the compiler may need to take action in order to prevent an accumulation of error messages further down the line. One of the approaches taken to achieve this is the addition of a manufactured semicolon, therefore, it is beneficial to make the alteration to the developer’s source code.

The students soon become aware that there are instances where the compiler believes it is correctly interpreting the programmer’s code, yet it is in fact altering the original purpose. A seemingly insignificant “missing character” issue from the compiler’s standpoint can sometimes be much more than just a missing character.

For instance, the root cause of the issue may be a missing block of code. It is difficult for the compiler to determine the scope of the code that is absent. Additionally, it is unable to comprehend what the developer was attempting to implement. All that it can ascertain is that it believes there is a missing semicolon, without recognizing if it is the true problem.

As an illustration, consider the following source code that a C compiler might encounter:

double result; 
{ 
	result = 0.0; 
 
	for (int i = 0; i < elements; i++) 
	// TODO - Add body of loop to calculate result 
} 
 
result += 1.0;

The compiler is unable to comprehend the comment. While some integrated development environments (IDEs) are able to identify and mark ‘TODO’ comments, the compiler is separate in this case. The compiler thus assumes that the problem is a missing semicolon on line 6, and alerts the user accordingly. This issue is not the actual cause of the issue; the true culprit is missing block.

What if the compiler automatically included the semicolon in the source code and issued a warning as a notification?

double result = 0.0; 
 
{ 
	result = 0.0; 
 
	for (int i = 0; i < elements; i++); 
	// TODO - Add body of loop to calculate result 
} 
 
result += 1.0;

If the developer disregards the warning (which is not advised, but unfortunately a frequent occurrence), the altered code will compile without any syntax errors or warnings as the compiler has modified the developer’s source code under the assumption that it was the most suitable option. It appears that the developer’s intent was not achieved, resulting in the issue becoming concealed and undetected as there will be no errors or warnings raised upon subsequent compilations.

The issue at hand has become increasingly complex, as even when a developer adds the body of the function as a block of code, the problem still persists.

double result = 0.0; 
 
{ 
	result = 0.0; 
 
	for (int i = 0; i < elements; i++); 
	{ 
		// body of loop - do this multiple times 
	} 
} 
 
result += 1.0;

The semicolon added by the compiler will result in the for loop being non-operational, thus causing the code block beneath it to execute without conditions only once. This is not the desired outcome of the developer, making it troublesome to identify without carefully analyzing the code. In actuality, what the compiler has done by modifying the developer’s source code is to commit a typical human error of inserting a semicolon on the control line of the loop.

This is an illustrative example of a situation in which the compiler would cause further complications by assuming and applying punctuation that is absent.

It is advantageous for the compiler to identify the absent semicolon as a syntax error, allowing the developer to determine the appropriate correction. It should be noted that the proper resolution is not always the addition of a semicolon, as is demonstrated by this example. This is exactly the approach employed by actual compilers – alerting the user to the issue.

Compilers should refrain from making alterations to a developer’s source code as the developer is best equipped to understand the intended function of the code. Instead, the compiler should identify any errors and allow the developer to address the issue correctly.

Answer #5

It is possible to design a compiler that automatically remedies certain types of errors, such as omitting a semicolon, and alerts the programmer of the correction. This is known as an “error-correcting compiler.” However, there are trade-offs to consider when designing such a system. Automatically correcting errors can make the code more difficult to understand and debug, as the programmer may not be aware of the correction made by the compiler. Additionally, the compiler may not always be able to accurately determine the intended behavior, potentially introducing new errors.

It’s also worth noting that some modern compilers have built-in suggestions and auto-fixing feature, but it is ultimately up to the developer to implement the suggested change.