Fossil data and fuzzy sets

By Cathy Willermet

The fossil evidence available to answer questions of human origins is fragmentary, patchily distributed over large geographic areas, and often difficult to date accurately. There is disagreement over both how many groups of humans are represented and how to measure variability within and between these groups. Not surprisingly, interpretations of this evidence vary among paleoanthropologists.

Living species of plants and animals are sometimes difficult to define, since individuals of "good" species are sometimes observed to mate with individuals of other species. Fossil species are even more difficult to define, since we have only parts of a skeleton and cannot observe how the animal behaved when it was alive. Also, no two fossil individuals are exactly alike, just as no two living individuals are exactly alike. So it is often difficult to classify an individual specimen into a fossil species.

What paleoanthropologists try to do is set up a list of features that most of the individuals of a species share (for instance, a large brow ridge or a sagittal keel). But any given specimen may exhibit only some of those features. If the specimen has enough of the features (maybe 70% or so) then it will be placed into that fossil species, even if it does not show all of the features associated with that species. We have to classify fossils this way because we often do not have complete specimens that preserve all of the features in which we are interested.
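The feature-list approach can be sketched in a few lines of Python. This is only an illustration, not an established method: the 0.7 cutoff echoes the "maybe 70% or so" above, the feature names other than the brow ridge and sagittal keel are hypothetical placeholders, and the choice to score only the features that are actually preserved on the specimen is an assumption.

```python
def feature_score(observed, diagnostic):
    """Fraction of the scorable diagnostic features that the specimen shows.

    observed maps feature name -> True/False. A feature that is not
    preserved on the specimen is simply left out of the dict, and
    (by assumption) does not count for or against the specimen.
    """
    scorable = [f for f in diagnostic if f in observed]
    if not scorable:
        return None  # nothing preserved, so nothing to score
    present = sum(1 for f in scorable if observed[f])
    return present / len(scorable)


def belongs(observed, diagnostic, threshold=0.7):
    """Crisp yes/no decision: does the specimen show enough of the features?"""
    score = feature_score(observed, diagnostic)
    return score is not None and score >= threshold


# Hypothetical feature list for one fossil species.
species_features = ["large_brow_ridge", "sagittal_keel",
                    "feature_c", "feature_d", "feature_e"]

# Hypothetical specimen: four features can be scored, one is not preserved.
specimen = {
    "large_brow_ridge": True,
    "sagittal_keel": True,
    "feature_c": False,
    "feature_d": True,
}

score = feature_score(specimen, species_features)
print(score)                                # 0.75
print(belongs(specimen, species_features))  # True
```

Notice that the yes/no answer throws away the score itself. A specimen at 0.75 and a specimen at 1.0 both come out as simply "in the species".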

However, because a) no two individuals are alike, and b) we have to construct the fossil species from what we happen to find, we're never really sure how good our reconstruction of a fossil species is. Think about it: if aliens dug up the human species 100,000 years from now, and found Lucy Lawless (aka Xena), Carrie Fisher, Dan Majerle, and Billy Crystal, they might place them into two groups (tall and short)!

Also, two species can overlap on some variables. For example, Neandertals and modern humans overlap in terms of brain size (the Neandertal average is bigger than our average). If we use just brain size, we can't clearly separate these two groups. Lots of features overlap this way. So there are two things we can do:

Of course the best thing is to do both.

What do you do when a fossil shows lots of features from more than one list? Unfortunately this happens a lot. If a species is in a transition period (linear speciation) or very recently split into two species (branching speciation) then the two fossil species are going to be very similar because they share a very recent common ancestor. Maybe the two groups were still interbreeding sometimes. How can you tell them apart with just the bones?

If you set up your research to require an individual specimen to belong only to one species, then you might have problems with classification. Perhaps you should design your research to allow specimens to belong to more than one species. At this point, you should be thinking, "Wait a minute...I only belong to one species, Homo sapiens. I'm not partially in another species!" Of course. But think about this...

Suppose you were a forensic anthropologist working for the police. The police brought you a skull that was found in the desert. They want to find out who this person was. It is your job to narrow down their missing person search. They want you to figure out, first, if the skull is from a male or a female. Now of course the skull will only belong to a male OR a female. But as you take measurements on the skull, you find that some measurements fall into the male range, and some into the female range. This is normal, as human males and females overlap for many measurements. Let's say that the skull comes out male in 60% of the measurements. What do you do? You could a) tell the police that the skull is from a male, or b) tell the police that it is a male, but likely a small male. Which conveys more information? B, of course.

The same logic can be applied to fossil species. If you have a fossil that shows features of more than one group, you could say a) it's mostly like species A, therefore it IS species A, or b) it's mostly like species A, but it has lots of species B features too, so maybe we should look at these species a bit more carefully, and maybe study the boundary between these two groups more closely.

Putting a specimen into more than one set requires a special kind of math, called fuzzy logic. NOT fuzzy as in ill-thought-out. Fuzzy as opposed to crisp.

Perhaps some definitions are in order. Crisp sets are collections of discrete elements that can be conceptualized as forming a distinct group. Crisp set membership is to a degree of 1 or 0, meaning that an element is either in the set (1) or not (0). Set membership is exclusive, meaning an element is in only one set, unless it is in the intersection between two sets. Objects in this intersection belong 100% to both sets. This is how we normally classify things. Thinking about sets this way is fine for much of the data we encounter. However, interpretations of specimens that fall in the intersection between two sets can be problematic.

For example, let's look at two sets of vehicles, cars and trucks. There are many variables that one can use to classify vehicles, including shape, size of engine, type of shock absorbers, and so on. Two specimens, a sedan and a pickup truck, are easy to place crisply into their respective sets. But what does one do when faced with a specimen like an El Camino? Perhaps this is not too difficult; it shares features with elements of both sets, so it is placed in the intersection of these sets. However, it is the interpretation of this that then becomes problematic. While the sedan is 100% a car and the pickup truck is 100% a truck, in a crisp set the El Camino is both 100% a car and 100% a truck. This interpretation does not reflect the blend of features presented by the El Camino. What we know intuitively, and what we want to communicate, is that the El Camino is partly a car and partly a truck.
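The crisp version of the vehicle example can be written out with ordinary Python sets. This is just a sketch: the set contents and the helper name `crisp_membership` are made up for illustration.

```python
# Crisp sets: every vehicle is either in a set or not in it.
cars = {"sedan", "el_camino"}
trucks = {"pickup", "el_camino"}


def crisp_membership(vehicle, crisp_set):
    """Return 1 if the vehicle is in the set, 0 if it is not."""
    return 1 if vehicle in crisp_set else 0


for vehicle in ["sedan", "pickup", "el_camino"]:
    print(vehicle,
          crisp_membership(vehicle, cars),
          crisp_membership(vehicle, trucks))
# sedan 1 0
# pickup 0 1
# el_camino 1 1

# The El Camino is the only element of the intersection.
print(cars & trucks)  # {'el_camino'}
```

The last row is the problem in miniature: the only way a crisp set can record the El Camino's mixed features is to call it fully a car and fully a truck at the same time.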

The concept of fuzzy sets originates from the observation that the world is not really composed of crisp sets; it is not really black and white, but rather composed of shades of grey. Many variables overlap more than one group, and we want our classification to reflect this. Fuzzy set membership of an element can be to a degree anywhere from 0 to 1. A specimen can be a member of many sets, each to a degree. Most importantly, objects in an intersection between two sets can belong to both sets to a degree.

Revisiting our example, we can set up our two sets as fuzzy sets. The vertical axis represents fuzzy membership in that set (to what degree the specimen belongs in that set); the horizontal axis represents the sets. In this model, the sedan and the pickup truck still sit comfortably within their respective sets. Now, however, the El Camino is partially in both sets, and our interpretation can reflect that the El Camino is partly a car and partly a truck. Depending upon the criteria one uses to define these sets, the El Camino could be 50% a car, 72% a car, and so on.
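Here is the same example as fuzzy sets, again only as a sketch. The 0.72 car membership is the figure mentioned above; the 0.28 truck membership is an assumption (fuzzy memberships do not have to add up to 1). The text does not say how to combine fuzzy sets, so the code uses the standard choice from fuzzy set theory, minimum for intersection and maximum for union, which is likewise an assumption here.

```python
# Fuzzy sets: each vehicle belongs to each set to a degree from 0 to 1.
car = {"sedan": 1.0, "pickup": 0.0, "el_camino": 0.72}
truck = {"sedan": 0.0, "pickup": 1.0, "el_camino": 0.28}  # 0.28 is assumed


def fuzzy_intersection(a, b):
    """Degree to which each element belongs to both sets (minimum rule)."""
    return {k: min(a.get(k, 0.0), b.get(k, 0.0)) for k in a.keys() | b.keys()}


def fuzzy_union(a, b):
    """Degree to which each element belongs to either set (maximum rule)."""
    return {k: max(a.get(k, 0.0), b.get(k, 0.0)) for k in a.keys() | b.keys()}


both = fuzzy_intersection(car, truck)
either = fuzzy_union(car, truck)

print(both["sedan"], both["pickup"], both["el_camino"])  # 0.0 0.0 0.28
print(either["el_camino"])                               # 0.72
```

Compare this with the crisp case, where the El Camino sat in the intersection with a membership of 1. Here the sedan and the pickup are still cleanly outside the intersection, while the El Camino is in it only to a degree.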

So fuzzy sets can apply to fossils too. Or males and females. My dissertation research involves trying to figure out how many groups of humans were living in the world around 100,000 years ago, when we have Neandertals in Europe and the Near East, some other archaic-looking humans in North Africa and Europe, and some more modern-looking humans in the Near East and East/South Africa. One species? Two? Three? I can't wait to find out...