faculty: "FNWI" and publication year: "2010"
| Author||S. Vellinga|
|Title||Identifying behavior changes after PHP language migration using static source-code analysis|
|Supervisors||J. Vinju, J. Hofstede|
|Faculty||Faculty of Science|
|Programme||FNWI MSc Software Engineering|
|Abstract||Migrating the source-code of a PHP 4 program to PHP 5 can change the semantics of that source-code and therefore cause unexpected behavior. The goal of this project is to identify the causes of such changed behavior and to build a tool that locates the source-code subject to these changes using static source-code analysis.|
The key difference in PHP 5 compared to its predecessor is the completely rewritten OOP-model, which offers better support for object-oriented programming. One of the main changes in the new OOP-model is that objects are now passed and assigned by-reference instead of by-value, which leads to many more aliases.
When an object is accessible through multiple paths, modifying it at one location can influence behavior at another location that references the same object. Detecting such changed behavior is difficult because PHP is a dynamically typed programming language, which means type information is not available to a static analysis.
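The effect of the changed assignment semantics can be illustrated with a small model. The sketch below is written in Python rather than PHP and only imitates the two behaviors described above: `assign_by_value` stands in for PHP 4 (assignment copies the object) and `assign_by_handle` for PHP 5 (assignment creates an alias). The `Counter` class and all function names are hypothetical and chosen for illustration only.

```python
import copy


class Counter:
    """Stand-in for a PHP object with one mutable property."""

    def __init__(self):
        self.hits = 0


def assign_by_value(obj):
    # Models PHP 4: assigning an object yields an independent copy.
    return copy.deepcopy(obj)


def assign_by_handle(obj):
    # Models PHP 5: assigning an object yields another name for the same object.
    return obj


def run(assign):
    a = Counter()
    b = assign(a)  # corresponds to `$b = $a;`
    b.hits += 1    # corresponds to `$b->hits++;`
    return a.hits  # what code that only holds `$a` observes afterwards


php4_result = run(assign_by_value)   # the copy was modified, `a` is untouched
php5_result = run(assign_by_handle)  # the alias was modified, `a` changed too
```

The same statements leave `a` unchanged under the by-value model and modify it under the alias model, which is the kind of behavior change the tool tries to locate.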
Because no type information is available, resolving method invocations has to be conservative. If the type of a variable is unknown, all classes which implement the requested method have to be considered, which could result in incorrectly resolved method invocations.
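A minimal sketch of such conservative resolution, in Python, is given below. It is not the thesis tool's implementation: the class table `CLASS_METHODS`, its contents, and the function `resolve_call` are assumptions made for illustration, and inheritance is ignored.

```python
from typing import Dict, Optional, Set

# Hypothetical class table: class name -> names of the methods it declares.
CLASS_METHODS: Dict[str, Set[str]] = {
    "Patient": {"save", "getName"},
    "Report": {"save", "render"},
    "Logger": {"write"},
}


def resolve_call(method: str, receiver_type: Optional[str]) -> Set[str]:
    """Return the classes whose implementation of `method` may be invoked."""
    if receiver_type is not None:
        # Known receiver type: at most one class qualifies.
        declared = CLASS_METHODS.get(receiver_type, set())
        return {receiver_type} if method in declared else set()
    # Unknown receiver type: every class declaring the method is a candidate.
    return {cls for cls, declared in CLASS_METHODS.items() if method in declared}


unknown_receiver = resolve_call("save", None)
known_receiver = resolve_call("save", "Report")
```

With an unknown receiver, the call to `save` resolves to both `Patient` and `Report`, although only one of them runs at run-time. The surplus candidate is what can turn into an incorrectly resolved invocation.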
In addition, PHP’s include mechanism includes files at run-time and allows arbitrary expressions to determine which file is included. In order to resolve function calls, static source-code analysis has to know in which files the functions can reside. Therefore the possible values of the include expressions need to be computed.
When the precise value of an include expression cannot be resolved, file resolution needs to be conservative. This could result in incorrectly resolved file inclusions.
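One way to picture conservative file resolution is sketched below in Python. The representation is an assumption, not the method used in the thesis: an include expression is modeled as a sequence of fragments, where a string is a statically known piece of the path and `None` is a piece whose value could not be computed. The function name and the example file names are hypothetical.

```python
import re
from typing import List, Optional, Sequence


def candidate_files(parts: Sequence[Optional[str]], project_files: List[str]) -> List[str]:
    """Return every project file the include expression could refer to."""
    # Known fragments must match literally; unknown fragments match anything.
    pattern = "".join(".*" if part is None else re.escape(part) for part in parts)
    regex = re.compile(pattern)
    return [path for path in project_files if regex.fullmatch(path)]


files = ["lib/db.php", "lib/mail.php", "modules/report.php"]

# Models `include 'lib/' . $name . '.php';` with `$name` unknown.
partly_known = candidate_files(["lib/", None, ".php"], files)

# Models `include $path;` with `$path` completely unknown.
fully_unknown = candidate_files([None], files)
```

The less of the expression is known, the larger the candidate set becomes. A fully unknown expression matches every file in the project, which is safe but imprecise.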
Identifying source-code whose semantics differ between PHP 4 and PHP 5 depends on the gathered facts described above. Incorrect facts could result in falsely identified source-code, or in overlooked source-code which should have been identified.
This research shows it is possible to automatically identify source-code whose semantics differ between PHP 4 and PHP 5 using static source-code analysis. We have built a tool which was used to analyze 22 unique test cases and to analyze modules from an in-production source-base, developed by The Patient Safety Company, consisting of over 40,000 lines of source-code.
Each test case represented a unique combination of variables which could or could not lead to changed behavior. The tool correctly identified 11 cases which would behave differently in PHP 5 and correctly skipped 6 cases which would not behave differently in PHP 5. However, 3 cases were incorrectly skipped by the tool and 2 cases were incorrectly identified.
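The four counts add up to the 22 test cases and can be read as a confusion matrix. The abstract does not report precision, recall or accuracy; the Python lines below merely derive them from the stated counts, treating "behaves differently in PHP 5" as the positive class.

```python
# Counts taken from the test-case results stated above.
true_positives = 11   # correctly identified
true_negatives = 6    # correctly skipped
false_negatives = 3   # incorrectly skipped
false_positives = 2   # incorrectly identified

total = true_positives + true_negatives + false_negatives + false_positives

# Share of identified cases that really behave differently: 11 / 13.
precision = true_positives / (true_positives + false_positives)

# Share of differently behaving cases that were identified: 11 / 14.
recall = true_positives / (true_positives + false_negatives)

# Share of all cases that were classified correctly: 17 / 22.
accuracy = (true_positives + true_negatives) / total
```

Rounded to three decimals this gives a precision of 0.846, a recall of 0.786 and an accuracy of 0.773 on the test cases.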
Because the tool has to make conservative assumptions, false positives could arise. However, the results on the in-production source-base showed only 68 spots of source-code which could behave differently in PHP 5 compared to PHP 4, which is less than 1% on a source-base of over 40,000 lines of code.
All 68 spots turned out to be false positives, caused by various reasons. False negatives were not tested on the in-production source-base, other than by implementing in it the unique test cases which the tool had correctly identified; the tool correctly identified these in the in-production source-base as well. We nevertheless assume false negatives still exist, since not all of the small test cases which should have been identified were identified. Because the false negatives are caused by common language structures, it is plausible they also occur in the in-production source-base.
Both the false negatives and the false positives were caused by missing facts, so the tool's performance could be improved by implementing methods to extract these missing facts. The author of this thesis believes it is possible to extract these facts, but this is left for further work. Currently, running the tool and fixing the identified spots does not guarantee that no source-code remains whose semantics differ between PHP 4 and PHP 5.
|Document type||Master's thesis|
Use this url to link to this page: http://dare.uva.nl/en/scriptie/341759