Mar 28 2006

Collections, Sets, Lists, and the Deep Existential Question of Equality

Published by Robert Fischer at 7:05 am under To Be Categorized

The Java API is badly in need of editting.

I just bumped into Collection.equals(Object) method, which contains this gem:

Thus, a custom equals method for a collection class that implements neither the List nor Set interface must return false when this collection is compared to any list or set. (By the same logic, it is not possible to write a class that correctly implements both the Set and List interfaces.)

So a list that guaranties uniqueness would have to be a Set or a List, but absolutely cannot be both. At least, not without violating the API.

The intuitive API to me is to have the Collection interface require that the Collection.equals(Object) method to return true iff the argument is a collection whose Iterator returns the same elements in the same order as this.iterator(). This would mean that collections with different add/remove behavior could be equals to eachother, but that’s already true — a synchronized ArrayList is equals to a unsynchronized ArrayList is equals to a typed List is equals to a LinkedList by definition, as long as they return the same elements in the same order.

The real fun is that the note in the Collections class is actually wrong. If you look at Set.equals(Object), it requires that the parameter is equals to this iff they are both sets, and if the parameter has the same size as this and if all the elements of this are contained in the parameter. If you look at List.equals(Object), this is equals to the parameter iff they are both lists, and if they return the same elements in the same order. I see nothing in these definitions that says you can’t satisfy both of them at the same time.

Popularity: 2% [?]

One Response to “Collections, Sets, Lists, and the Deep Existential Question of Equality”

  1. bhurt-awon 28 Mar 2006 at 9:05 am

    I have come to the conclusion that equality testing can not be a member function, i.e. it can not be an operation that the class itself provides. That Java’s mistake was to try to provide Object.equals() as a method.

    Here’s the problem with saying that two collections of objects are equal if they hold the same elements. What happens if one collection holds duplicates of the same element? This is possible with lists but not with sets. So we have a set that holds one (and only one) copy of element x, and a list that holds two copies of element x- other than that, they are identical to each other. Are these two collections the same? What happens if I make the set a list that just happens to contain only one x, and not two?

    And then there’s the performance problem. Testing for membership on an array or a list is O(N), which means testing for set equality based on membership can be O(N2). Your iterator definition at least has the advantage of being O(N).

    And then there is the implied behavior. If I have a set and a list which are equal (currently), and add element x to both, can I rely on them to still be equal? The answer, in this case, is no- x could already be a member of the set (in which case adding a second copy wouldn’t change the set, but would change the list), or the list could add x in such a way as to break the ordering requirements- x could show up in a strange location compared to the set. I could argue that this means that a set and a list can never be equal, because equal operations on both may result in them becoming not equal.

    When there are multiple different, legitimate, tactics the user of the class might want, I can flat out gaurentee that no matter which one you pick, you’ll pick the wrong one. If the best choice is right only 40% of the time, and all the other choices are right less than that, that still means that picking the best choice is wrong most of the time.

    At this point any choice is the wrong one- and the only choice is not to choose. There is no one comparison operation which is right often enough that it can be called the comparison operation. Which is why the class- which can only supply one comparison operator- should not supply any comparison operator. Make the user define what they mean by equal. That way it’s right 100% of the time, and not wrong most of the time.

    In other words, don’t use equals() or compareTo(), use a Comparator.

Trackback URI | Comments RSS

Leave a Reply

Green Web Hosting! This site hosted by DreamHost.