Sunday, June 2

Java: Remove an element from array

In Java, the common practice to remove an array is to call an Apache common library
newarray = org.apache.commons.lang3.ArrayUtils.remove(oldarray, i)

The source code shows that to remove the ith element, this method forms a new array "newarray" of the size of oldarray.size-1,  copy [0, i-1] of the oldarray into newarray, then copy [i+1, size] into new array.

I have an application that requires to call this function a lot, and I feel the limitation of it.

My scenario is: In a [8000][12000] sparse array (double value), I want to remove rows and columns that are all zero. The 6000 rows and 7000 columns will be removed.
To remove the 6000 rows, this remove() function is called 6000 times.
To remove 7000 columns, I need a loop of the 2000 remaining rows from the previous step to remove each one of them, so this remove() function is called 7000*2000=14,000,000 times.

It takes 4 minutes to complete the task.

Also, because in each remove() operation, a new array is created, and the old array is left for Garbage Collection, the program is using almost 4G of memory during these 4 minutes.

So I decided to create my own ArrayRemove(array, i) function that is using the same array to store the modified array:

    private void ArrayRemove(Object array, int index) {
        int length = java.lang.reflect.Array.getLength(array);
        if ((index < length - 1)  && (index > -1)){
            System.arraycopy(array, index + 1, array, index, length - index - 1);         

        }
        return;
    }


Using this new function, it takes 1.5 minutes to perform the same task, and the peak memory is 500M.

The downside of this function is that: The array is NOT resized. Before process it contains 7000 elements; After process, it still contains 7000 elements. The last element is obsolete, not useful at all, but the array.length is still 7000. Even though the sparse array is still [8000][12000], only the first [2000][5000] are meaningful.

So, you need to remember how many rows and columns are removed, and form a new array :
       removerowcount = RemoveRow(array);
       double[][] temparray = new double[array.length-removerowcount][array[0].length];
       temparray = Arrays.copyOf(array, array.length-removerowcount);
       array = temparray;
     
       removecolumncount = RemoveColumn(array)
       double[][] temparray = new double[array.length][array[0].length-removecolumncount];
       for(int i=0; i < array.length; i++){
              temparray[i] = Arrays.copyOf(array[i], array[0].length-removecolumncount);
       }
       array = temparray;


The best thing is, you only need to form new array twice (the first time is to remove the trailing obsolete rows before starting RemoveColumn), not 14,006,000 times.

Have fun! Make sure to add your own error handling code.

PS: To remove an element from array, the above code works. But to achieve what I intend to (to remove null rows and columns), there is a better way (a Matlab way): Have another 2 arrays (nullrows[] and nullcolumns[]) to keep a record of which rows and columns are null, then remove these rows and columns in a batch.

Labels: ,