ANALYSIS OF SHELLSORT ALGORITHMS

Abstract: Shellsort is a comparison sort that applies insertion sort, at each iteration, to lists of interleaved elements, so that by the last iteration the whole list is almost sorted. The time complexity of Shellsort depends on the method of interleaving (called the increment sequence), which gives rise to the variants of Shellsort. However, the problem of finding an increment sequence that achieves the minimum time complexity of O(n log n) is still open. In this paper, we analyze the performance of variants of Shellsort based on their time complexity. Our measure of time complexity is independent of the machine configuration and considers all the operations of a sorting process. We found that the increment sequence proposed by Sedgewick performs best among the analyzed variants.

I. INTRODUCTION
Shellsort [1] is an in-situ comparison sort algorithm in which, at each iteration, each list of interleaved elements of the list A[0, ..., n] is sorted by insertion sort; the lists of interleaved elements form disjoint sets of elements. The interleaving is reduced in each subsequent iteration until it becomes 1, at which point insertion sort is applied to the whole, now nearly sorted, list A. The sorting algorithm is oblivious to the data [2], and one of its implementations (sorting in non-decreasing order) is: [...] [15]; the time complexity of the algorithm also depends on the increment sequence. Finding the optimal increment sequence that minimizes the time complexity of Shellsort is still an open problem [12]. In addition, the data-oblivious property of the sorting algorithm makes it an attractive solution for systems in which a dataset distributed over multiple nodes of a network needs to be arranged in a certain order.
In this paper we compare the Shellsort variants in terms of their time complexities. Unlike [16]-[18], which compared the Shellsort variants using a parameter based on the number of swaps or the number of comparisons, we define a parameter that includes these factors and also the time consumed in checking the conditions that keep a loop running. We believe that our parameter is closer to the general definition of the theoretical time complexity of an algorithm. The rest of this paper is organized as follows: section II briefly describes the variants of Shellsort. The framework used to compare the Shellsort variants and our findings are discussed in section III. We conclude the paper in section IV, followed by references.
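To make the procedure described at the start of this section concrete, the following is a minimal Java sketch of Shellsort driven by an explicit increment sequence. It is an illustration under stated assumptions, not the listing from the cited implementation: the class name `ShellsortSketch`, the method `sort` and the convention that increments are passed in decreasing order ending with 1 are all choices made here.

```java
import java.util.Arrays;

/** Illustrative sketch only; class and method names are assumptions, not taken from the paper. */
final class ShellsortSketch {

    /**
     * Sorts a in non-decreasing order.
     * Assumption: increments are given in decreasing order and end with 1,
     * so that the final pass is a plain insertion sort over the whole list.
     */
    static void sort(int[] a, int[] increments) {
        for (int h : increments) {
            // One sweep performs insertion sort on each of the h interleaved
            // sublists a[r], a[r + h], a[r + 2h], ... for r = 0 .. h - 1.
            for (int i = h; i < a.length; i++) {
                int key = a[i];
                int j = i;
                // Shift larger elements of the same sublist one step (h positions) to the right.
                while (j >= h && a[j - h] > key) {
                    a[j] = a[j - h];
                    j -= h;
                }
                a[j] = key;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 8, 2, 6, 3, 5, 0};
        sort(a, new int[] {5, 2, 1});
        System.out.println(Arrays.toString(a)); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Passing the increments as a parameter keeps the sorting loop identical across variants, so only the sequence changes from one variant to the next.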

II. SURVEY OF SHELLSORT VARIANTS
At the i-th iteration of Shellsort, list A is subdivided into $h_i$ sublists, each of size $\lfloor n/h_i \rfloor$ [12], and insertion sort is applied to each of these sublists. The time complexity of the sorting algorithm therefore depends on the time complexity of sorting each of the sublists. Moreover, since sorting these sublists leaves the list partially sorted, subsequent iterations are expected to perform fewer swaps than comparisons.
Shell [1] proposed an increment sequence of length $\lfloor \log n \rfloor$. Its worst-case time complexity is proved to be $O(n^2)$ when n is a power of 2. To reduce the time complexity, [3] proposed replacing every even skip length by the next odd number, resulting in a time complexity of $O(n^{3/2})$. [4] also achieved the same worst-case complexity using the increment sequence (...). The problem of moving an element to its rightful place is reduced to the Frobenius problem [19] by [6], where it is derived that, at each iteration, the skip length should not be a linear combination of the skip lengths of the following iterations, which avoids unnecessary relocation of the same element in the list; this resulted in a better time complexity of $O(n^{5/4})$. The time complexity of a 3-tuple increment sequence (h, k, 1) is studied in [7], where the exact time complexities of each of the three iterations are derived. [8] proposed the increment sequences $h_k = 1$, $h_i = 3h_{i+1} + 1$, where $h_1$ is such that $3h_0 \ge n$, and (2, 1), whereas [9] proposed the reverse of the following increment sequence ($3h_i \ge n$), obtained using the Frobenius problem: (...). Using this increment sequence, a time complexity of $O(n^{4/3})$ is achieved. 3-tuple increment sequences for Shellsort are again explored in [10], where the increment sequence $(n^{7/15}, n^{1/5}, 1)$ is proposed, giving a time complexity of $O(n^{23/15})$. [11] proposed the increment sequence $h_i = 2^i - 1$ ($i \ge 1$) until $h_i \ge n$; the achieved time complexity is almost the same as in [3].
[12] analyzed three increment sequences: (1) $(n^{1/3}, 1)$ with time complexity $\Omega(n^{5/3})$; (2) $(n^{1/2}, n^{1/4}, 1)$ with time complexity $O(n^{3/2})$; (3) $(n^{11/16}, n^{7/16}, n^{3/16}, 1)$ with time complexity $\Omega(n^{21/16})$.
The discussed variants did not achieve the lower bound $\Omega(n \log n)$ [20] of a comparison sort algorithm. At the time of writing, this bound has been achieved probabilistically by [2], and [17] proved that achieving the lower bound requires an increment sequence of length $\Theta(\log n)$; such a sequence is yet to be found. Most analyses of the time complexity of the Shellsort variants have considered either the number of swaps or the number of comparisons, except [14] and [21], which additionally measured the actual time taken on a specialized machine. However, while the algorithm runs, associated operations such as the use of counter variables and temporary variables, and the instructions for increment or decrement, add to the time complexity, and their contribution is proportional to the number of times the loop using them runs. In the next section we define a parameter for measuring time complexity that includes these factors, and we use this parameter to compare the performance of the Shellsort variants.
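As a rough illustration of how increment sequences of the kinds surveyed above can be generated, the Java sketch below builds two of them: the recurrence in which each term is three times the previous one plus 1, and the form $2^i - 1$. The class and method names are hypothetical, and the stopping rule (keep only increments strictly smaller than n) is a simplifying assumption made here; the cited papers state their own stopping conditions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch; names and the stopping rule are assumptions, not taken from the cited papers. */
final class IncrementSequences {

    /** 1, 4, 13, 40, ... (each term is 3 * previous + 1), returned largest first. */
    static List<Integer> threeHPlusOne(int n) {
        List<Integer> seq = new ArrayList<>();
        for (long h = 1; h < n; h = 3 * h + 1) {
            seq.add((int) h);
        }
        if (seq.isEmpty()) {
            seq.add(1); // always keep the final increment 1
        }
        Collections.reverse(seq);
        return seq;
    }

    /** 1, 3, 7, 15, ... (2^i - 1 for i >= 1), returned largest first. */
    static List<Integer> powerOfTwoMinusOne(int n) {
        List<Integer> seq = new ArrayList<>();
        for (long p = 2; p - 1 < n; p *= 2) {
            seq.add((int) (p - 1));
        }
        if (seq.isEmpty()) {
            seq.add(1); // always keep the final increment 1
        }
        Collections.reverse(seq);
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(threeHPlusOne(100));      // prints [40, 13, 4, 1]
        System.out.println(powerOfTwoMinusOne(100)); // prints [63, 31, 15, 7, 3, 1]
    }
}
```

Both sequences grow geometrically, so the number of increments for a list of size n is proportional to log n.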

III. COMPARATIVE ANALYSIS OF SHELLSORT VARIANTS
We first define the parameter used for the comparison, and then describe the methodology and the results obtained.

A. Defining Complexity
The time complexity C(B) of an algorithm B consisting of instructions $(I_1, \ldots, I_t)$ that run $(n_1, \ldots, n_t)$ times respectively can be defined as:
$$C(B) = \sum_{i=1}^{t} n_i \qquad (1)$$
We can, therefore, measure C(B) by using a global variable that is initially set to 0 and is incremented by one for each execution of an instruction.
Definition (1) includes the number of comparisons, the number of exchanges and the number of times the associated variables are used. The definition is also independent of the underlying platform used to implement algorithm B.
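One possible way to take such a measurement is sketched below in Java: a global counter is reset to 0 and incremented once for every executed statement of a Shellsort pass. This is a sketch under assumptions, not the instrumentation used in the experiments; the names `CountingShellsort`, `counter` and `measure`, and the choice of which statements count as one instruction, are all made up for illustration.

```java
/** Illustrative sketch; names and the choice of counted statements are assumptions. */
final class CountingShellsort {

    /** Global counter standing in for C(B); reset before every measurement. */
    static long counter = 0;

    static void sort(int[] a, int[] increments) {
        for (int h : increments) {
            counter++;                          // selecting the next increment
            for (int i = h; i < a.length; i++) {
                counter++;                      // loop test and increment of i
                int key = a[i];
                counter++;                      // copy into a temporary variable
                int j = i;
                counter++;                      // initialising the inner index
                while (j >= h && a[j - h] > key) {
                    counter++;                  // successful loop test (comparison)
                    a[j] = a[j - h];
                    counter++;                  // element move
                    j -= h;
                    counter++;                  // index decrement
                }
                counter++;                      // final, failing loop test
                a[j] = key;
                counter++;                      // placing the element
            }
        }
    }

    /** Resets the counter, sorts a in place and returns the measured count. */
    static long measure(int[] a, int[] increments) {
        counter = 0;
        sort(a, increments);
        return counter;
    }

    public static void main(String[] args) {
        int[] increments = {1};
        System.out.println(measure(new int[] {1, 2, 3}, increments)); // prints 11
        System.out.println(measure(new int[] {3, 2, 1}, increments)); // prints 20
    }
}
```

With the single increment 1 the sort is plain insertion sort, and the reversed input costs more than the sorted one because every extra shift adds three counted statements.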

B. Methodology for Comparison
We compare the average-case complexities of the Shellsort variants. We have selected the variants [1], [3]-[5], [7]-[13] and [15]. We have not taken [2], as it is a probabilistic algorithm in which sorting is not guaranteed (though the probability of obtaining a sorted list is very high). Generating the increment sequence of [6] is complex, as it requires checking whether two numbers are co-prime; for large lists it can therefore take more time to generate the increment sequence than to do the actual sorting. We have therefore not considered this variant in our analysis. For a similar reason (the time complexity of genetic algorithms is generally high), [14] is also not considered.
We have implemented the selected Shellsort variants in the Java programming language, as generating random lists of elements is easy in it. Each variant is implemented in a separate class. Objects of these classes are created in the main method and executed. In the implementation we have not counted the time complexity of increment sequence generation, as we are concerned with the time complexity of sorting (section I). To find the time complexity of each Shellsort variant for a given size n of the list to be sorted, 1000 random lists are generated and the average over these 1000 runs is taken. n is varied from 500 to 10000 in steps of 500, that is, 20 values of n are taken. To ensure a fair comparison among the selected variants, the same 1000 lists for a given n are fed into all the variants. Therefore, our analysis has used the same 20 × 1000 = 20000 lists for all the variants.
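The methodology above could be organised roughly as in the following Java sketch: random lists are generated once per n from a fixed seed, every variant sorts its own copy of the same lists, and the counts are averaged. Everything named here (`ComparisonHarness`, `averageCosts`, the seed, the two example sequences) is hypothetical, and the cost counted is a simplified stand-in (one unit per insertion started plus one per shifted element) rather than the full measure of section III-A. The demonstration in `main` uses smaller sizes and fewer runs than the experiments so that it finishes quickly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.IntFunction;

/** Illustrative sketch; names, seed and cost measure are assumptions, not the paper's code. */
final class ComparisonHarness {

    /** Sorts a in place and returns a simplified cost: insertions started plus elements shifted. */
    static long countedSort(int[] a, int[] increments) {
        long count = 0;
        for (int h : increments) {
            for (int i = h; i < a.length; i++) {
                int key = a[i];
                int j = i;
                count++;                       // one unit for starting the insertion of a[i]
                while (j >= h && a[j - h] > key) {
                    a[j] = a[j - h];
                    j -= h;
                    count++;                   // one unit per shifted element
                }
                a[j] = key;
            }
        }
        return count;
    }

    /** Generates the given number of random lists of length n from a fixed seed. */
    static List<int[]> randomLists(int n, int runs, long seed) {
        Random rng = new Random(seed);
        List<int[]> lists = new ArrayList<>();
        for (int r = 0; r < runs; r++) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) {
                a[i] = rng.nextInt();
            }
            lists.add(a);
        }
        return lists;
    }

    /** Average cost of every variant over the same lists of length n. */
    static Map<String, Double> averageCosts(Map<String, IntFunction<int[]>> variants,
                                            int n, int runs, long seed) {
        List<int[]> lists = randomLists(n, runs, seed);
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, IntFunction<int[]>> v : variants.entrySet()) {
            int[] increments = v.getValue().apply(n);
            long total = 0;
            for (int[] list : lists) {
                // clone() so that every variant receives the same unsorted input
                total += countedSort(list.clone(), increments);
            }
            result.put(v.getKey(), (double) total / runs);
        }
        return result;
    }

    /** Example sequence: n/2, n/4, ..., 1 (about log2 n increments). */
    static int[] halving(int n) {
        List<Integer> hs = new ArrayList<>();
        for (int h = n / 2; h >= 1; h /= 2) {
            hs.add(h);
        }
        if (hs.isEmpty()) {
            hs.add(1);
        }
        int[] out = new int[hs.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = hs.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, IntFunction<int[]>> variants = new LinkedHashMap<>();
        variants.put("(2, 1)", size -> new int[] {2, 1});
        variants.put("halving", size -> halving(size));
        for (int n = 500; n <= 2000; n += 500) {
            System.out.println(n + " -> " + averageCosts(variants, n, 20, 42L));
        }
    }
}
```

Fixing the seed makes the lists reproducible, and cloning each list before sorting is what keeps the comparison between variants fair.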

C. Results
We have plotted the obtained average-case time complexity on the vertical axis and the size of the dataset on the horizontal axis. The mapping of the labels used in the plots is given in table I. The variation of time complexity with list size is shown in figure 1. To avoid overflow while measuring the time complexity of a variant, the variable used for the measure is incremented by 0.010 for each run of an observed instruction. So, the time complexities in the figure are scaled down by a factor of 100.
From figure 1, it can be observed that the second increment sequence in [8] performed worst, as it uses only two skip lengths and hence is nearly reduced to insertion sort, which has time complexity $O(n^2)$. [9] performed best among the selected variants since: (1) the possibility that two consecutive skip lengths in the increment sequence are co-prime is high, thereby avoiding unnecessary movement of an element of the list; (2) the number of skip lengths for a given n is close to log n, as suggested by [17]. For similar reasons, [1] and [15] performed nearly as well as [9] (with [15] performing better than [1]).

IV. CONCLUSION
We have compared the average-case time complexity of variants of the Shellsort algorithm and observed that variants that use increment sequences of length close to 1 tend to perform worse than variants whose increment sequences have a length proportional or close to log n. Consistent with this observation, the increment sequence proposed by Sedgewick performed best among the variants that we considered. Therefore, we suggest Sedgewick's increment sequence for practical use.