faculty: "FNWI" and publication year: "2006"
| Author||Menno Bredenoord|
|Title||How to optimize Rscript comprehensions?|
|Faculty||Faculty of Science|
|Institute/dept.||FNWI: Instituut voor Informatica|
|Programme||FNWI MSc Software Engineering|
|Abstract||Rscript is a small scripting language based on relational calculus. Its main purpose is to analyze and query source code from software. In this thesis we will give an answer to the following two research questions:|
I. How can Rscript comprehensions be optimized?
II. What are the effects of the optimizations?
We will investigate two methods which can be used to optimize Rscript comprehensions. Comprehensions are constructions used for iterating over one or more sets or relations, while matching one or more of their elements to a boolean-valued expression.
The first method considers several algebraic optimization techniques we found in the literature on comprehensions. These are: Qualifier Interchange, Filter Hiding, Product Elimination, Common Subexpression Elimination and Index Introduction.
Next we discuss optimization techniques which are used for optimizing relational databases, and can be used to optimize Rscript. These are: Commuting Selections – Cartesian Product, Commuting Selections – set-difference, Commuting Selections – union, Semantic Query Caching – subsumption, Semantic Query Caching – overlap and Peephole Optimization. For each of the techniques mentioned above it is discussed which transformations they perform, how the technique realizes a performance increase, which problems occur when applying them to Rscript and some examples of use in Rscript.
In order to determine the most efficient algebraic optimization technique, we evaluated the techniques by 1) performing measurements on them by using test cases, 2) checking the probability that the optimization can be applied to a Rscript file and 3) interpreting what the literature claims about them. Finally we concluded that the optimization techniques Qualifier Interchange and Filter Hiding are the most efficient.
The second method is based on the optimization of the set operators by using other data structures for set representation.
The current implementation of Rscript uses a Linear data structure to perform set operations. In order to optimize this, we investigate several data structures, such as: Hash Table, Binary Search Tree, Red-Black Tree, Binomial Tree and Judy Arrays. For each of the data structures we obtained their theoretical performance. For the two theoretically best performing data structures, Hash Table and Red-Black Tree, we performed several measurements on their operators.
The results of these measurements show us that the Red-Black Tree is the best performing data structure.
Finally we give an advice on which method to use best for optimizing Rscript comprehensions. We advice not to focus on algebraic optimization techniques to optimize the Rscript comprehensions, because the applicability of these optimizations is very low and an answer about the performance increase is hard to give. We do advice to change the current data structure and thereby improving the performance of the entire comprehension.
|Document type|| scriptie master|
Use this url to link to this page: http://dare.uva.nl/en/scriptie/411072
Contact us about this recordNotify a colleague
Add to bookbag