I was going to post a comment with similar points, but instead Ill just add to s...

Dn_Ab · on May 25, 2014

The no free lunch theorem states that, the performance of any two search algorithms are equivalent when averaged across all possible problems. This fails to hold in coevolutionary settings - selecting a champion through self play. In such cases there will be pairs of algorithms where one is demonstrably better than the other for all possible problems; a free lunch by criteria of the NFL theorem [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100... ; also worth checking out: http://www.santafe.edu/media/workingpapers/12-10-017.pdf].

The No Free Lunch Theorem is also one of those limit statements that rarely impinges on reality. We are not interested in all possible functions - the majority of which will be of such complexity as to be indistinguishable from random - only those with exploitable structure. It's much the same reason for why, although kmeans is NP-hard, failing to find a good clustering is very often suggestive of an ill-posed problem with no interesting structure. If kmeans didn't find a good cluster, very possibly a good one does not exist (e.g. Clustering is difficult only when it does not matter; http://arxiv.org/abs/1205.4891)

sampo · on May 25, 2014

> If nothing is known about the objective function

What that "nothing is known" means, in the context of NFL theorems, is that values at each point are random, and independent of values at other points.

For example, if our domain is a 100-by-100 grid, and we fill the grid with random numbers, now we have such a function. And indeed, no optimization function is of any help in trying to find the largest of the 10 000 random numbers.

But this kind of totally random functions pretty much do not exist in the real world. So if out objective function came from the real application (and not from a random number generator), we already know enough about it that we can say that the NFL theorems don't apply to it.

srean · on May 25, 2014

Sure, no particular instance of a function will be so utterly random, but I think you are probably missing the point of the parent comment. All that NFL says is that if you fix an optimization algorithm and average its performance over all possible measurable functions, no algorithm will be better than the other. The average is with respect to the distribution you described. However, if you skew the distribution so that certain class of functions are more likely than others, then an ordering will be imposed over some of the algorithms. Some algorithms will be better than others.

Therefore, trying to design a fully generic optimization algorithm is a fool's errand. On the contrary, a specific objective function will have specific properties and in that case it would be valuable to use specific algorithms suited to its properties. To follow the argument forward, it would be very useful if it was characterized what is exactly the class of objective functions that EA, GA, swarms, ant colon algorithms handle well. Under what assumptions is their specific randomization and parallel evaluation the right thing to do. These are resource intensive procedures, so such a characterization will tell us when is that effort well spent.

One uncharitable but plausible way to read the tail end of your comment is that just because a particular function is not random, it will violate NFL and magically make EA, GA style algorithms appropriate, that is not true. Glad that you were able to reword it before the edit window closed.

Edit: replying here because this thread is becoming nested too deep.

Hi @Dn_Ab I think we may have talked passed each other, so clarifying. Of course a bias / preference will naturally get induced over algorithms when you select objective functions non-uniformly. No magic there. But that is not going to make a particular choice of an algorithm (in this case EA, GA et al) magically appropriate for whatever specific nonuniform distribution over the objective function chosen. The choice either has to be deliberate (in which case we would need to know a measurable description of the class where these algorithms work better) or one has wait to get wildly lucky, the latter is about as productive as playing lottery except that the tickets are pretty expensive when we play EA, GA etc.

@sampo > have very specific mathematically defined meanings here, and those meaning are probably very different than what a casual reader might expect.

Good point, upvoted, now I understand your previous comment better.

@sampo > I also like your "snake oil" metaphor, and I was happy to see your top comment on this topic.

I expected it to be downvoted out of existence given a few snarks that I yielded to.

sampo · on May 25, 2014

> All that NFL says is that if you fix an optimization algorithm and average its performance over all possible measurable functions, no algorithm will be better than the other.

> Therefore, trying to design a fully generic optimization algorithm is a fool's errand.

I don't think we have any factual disagreement. In pure mathematical context, what you say is absolutely true.

I just wish to emphasize, that "all possible measurable functions" and "fully generic optimization algorithm" have very specific mathematical meanings here, and those meaning are probably different than what a casual reader might expect.

In a set of "all possible functions" an overwhelming majority are functions that are indistinguishable from random noise. They are e.g. not continuous, not even remotely like continuous, they have no structure at all whatsoever.

On an intuitive level, everybody understands that it is a fool's errand to try to design an optimization algorithm to find the maximum from an array of random numbers. But when you just say "trying to design a fully generic optimization algorithm", people may not realize that the "fully generic" contains the requirement that it should also work on random noise.

The No Free Lunch Theorem does not say anything about the feasibility of fully generic optimization algorithms, if we restrict the "fully generic" to mean anything that can be expected to appear in any real world application. A lot of people might agree that an algorithm that performs well in any real world problem is "fully generic", even though it may not perform well in all imaginable abstract mathematical settings (which mostly means settings of random noise).

To your last comment: I also agree with you on this front. Also my understanding is that for every or almost every type of problem there are other optimization algorithms that drastically outperform genetic algorithms. I also like your "snake oil" metaphor, and I was happy to see your top comment on this topic.

Dn_Ab · on May 25, 2014

The bias is not magical at all if it is against complexity - it's what underpins occam's razor.