The problem with engine testing is that it is almost impossible to play enough games to get a statistically significant change in the score, at least when you try small changes in the evaluation function.
What if we could measure the quality of every move instead? That would immediately enlarge the sample by a factor of 40 to 60 (roughly the number of moves per game).
What I'm thinking about is the following test:
1. Grab some 100,000 positions from master games.
2. For each position, compute the best move and its value with a strong engine at sufficient time. This will take some weeks, but it has to be done only once.
3. For each position, compute a move with your engine at a shorter time. If it is the same move as the one computed in step 2, score zero minus points. Otherwise, compute the value of your engine's move with the strong engine at sufficient time, remember the result for future tests, and score the difference between the two values as minus points (see the sketch below).
Now try to minimize your minus points.
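To make the procedure concrete, here is a minimal sketch of steps 2 and 3 using the python-chess library. The engine paths, time limits, and the mate_score constant are placeholders of my own, not part of the proposal; the cache of reference evaluations is what keeps repeated test runs cheap.

```python
import chess
import chess.engine

REFERENCE_TIME = 60.0   # "sufficient time" for the strong engine (assumption)
TEST_TIME = 1.0         # shorter time for the engine under test (assumption)

def centipawns(info, turn):
    """Score from the side-to-move's point of view, in centipawns."""
    return info["score"].pov(turn).score(mate_score=100000)

def minus_points(fens, strong_path="./stockfish", test_path="./myengine"):
    strong = chess.engine.SimpleEngine.popen_uci(strong_path)
    test = chess.engine.SimpleEngine.popen_uci(test_path)
    cache = {}   # (fen, move) -> reference value; persist to disk in a real setup
    total = 0
    try:
        for fen in fens:
            board = chess.Board(fen)

            # Step 2: reference move and value, computed once and cached.
            if (fen, None) not in cache:
                info = strong.analyse(board, chess.engine.Limit(time=REFERENCE_TIME))
                cache[(fen, None)] = (info["pv"][0], centipawns(info, board.turn))
            best_move, best_value = cache[(fen, None)]

            # Step 3: the tested engine's move at the short time control.
            played = test.play(board, chess.engine.Limit(time=TEST_TIME)).move
            if played == best_move:
                continue   # same move as the reference: zero minus points

            # Evaluate the tested move with the strong engine; cache for reuse.
            if (fen, played) not in cache:
                info = strong.analyse(board, chess.engine.Limit(time=REFERENCE_TIME),
                                      root_moves=[played])
                cache[(fen, played)] = centipawns(info, board.turn)

            # Minus points = value lost relative to the reference move
            # (clamped at zero to absorb search noise).
            total += max(0, best_value - cache[(fen, played)])
    finally:
        strong.quit()
        test.quit()
    return total
```

Persisting the cache between runs is what gives the amortized cost described next: every move already seen in an earlier test costs nothing to re-score.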
The time spent computing move values with the strong engine is large in the beginning, but asymptotically approaches zero, since more and more of the tested moves will already have a stored value.

Has anybody tried this? Do you think this will help to improve an engine's results in actual gameplay?
Some problems that might arise:
- If engine A plays 40 OK moves and engine B plays 39 excellent moves and one terrible blunder, A might be better at gameplay, but B might score better in the test above.
- The strong engine becomes the mother of all evaluations: your engine starts to imitate its style, disregarding its own strengths and weaknesses.