Daves Software Blog: February 2013

Friday, 8 February 2013

Problems that CUDA can't solve

The answer was obvious: of course there are problems that CUDA isn't good for.

One example is chess. The classic question of whether a move is legal cannot be answered without the move history or the current board state.

For example, the sequence

1.e4 e5 2.Nf3 Nc6 3.Bc4 Qxa1

has an obvious illegal move, Qxa1. There is no real point in making a thread for each move individually, because the state of the board before Qxa1 determines the move's legality. The same is true for many types of state machine, and for things similar to state machines, such as Markov chains.
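To make the dependency concrete, here is a rough Python sketch that replays the sequence above one move at a time. It is a toy, not a real chess engine: the coordinate move format ("e2e4"), the function names, and the decision to check only rooks, bishops and queens for line-of-sight are all my own simplifying assumptions. Pawn, knight and king moves are accepted without checking their geometry.

```python
# Toy legality checker. Illustrative assumptions: moves are given as
# "e2e4"-style coordinates, and only sliding pieces (R, B, Q) are validated.

def initial_board():
    """Return the standard starting position as {square: colour+piece}."""
    board = {}
    back_rank = "RNBQKBNR"
    for i, f in enumerate("abcdefgh"):
        board[f + "1"] = "w" + back_rank[i]
        board[f + "2"] = "wP"
        board[f + "7"] = "bP"
        board[f + "8"] = "b" + back_rank[i]
    return board

def slider_ok(board, kind, src, dst):
    """Check direction and blocking for a rook, bishop or queen move."""
    df = ord(dst[0]) - ord(src[0])   # file difference
    dr = int(dst[1]) - int(src[1])   # rank difference
    straight = (df == 0) != (dr == 0)
    diagonal = df != 0 and abs(df) == abs(dr)
    allowed = {"R": straight, "B": diagonal, "Q": straight or diagonal}[kind]
    if not allowed:
        return False
    step_f = (df > 0) - (df < 0)
    step_r = (dr > 0) - (dr < 0)
    # Every square strictly between src and dst must be empty.
    for k in range(1, max(abs(df), abs(dr))):
        square = chr(ord(src[0]) + k * step_f) + str(int(src[1]) + k * step_r)
        if square in board:
            return False
    return True

def first_illegal(moves):
    """Replay moves in order; return the index of the first illegal one, or None."""
    board = initial_board()
    side = "w"
    for i, move in enumerate(moves):
        src, dst = move[:2], move[2:]
        piece = board.get(src)
        if piece is None or piece[0] != side:
            return i                  # no piece of the side to move on src
        target = board.get(dst)
        if target is not None and target[0] == side:
            return i                  # cannot capture your own piece
        if piece[1] in "RBQ" and not slider_ok(board, piece[1], src, dst):
            return i
        board[dst] = board.pop(src)   # the next move sees this updated state
        side = "b" if side == "w" else "w"
    return None

# 1.e4 e5 2.Nf3 Nc6 3.Bc4 Qxa1 in coordinate form
game = ["e2e4", "e7e5", "g1f3", "b8c6", "f1c4", "d8a1"]
print(first_illegal(game))        # index of Qxa1
print(first_illegal(["f1c4"]))    # Bc4 on move one is blocked by the e2 pawn
```

Note that the same move, f1c4, is rejected on an untouched board but accepted after e2e4 has opened the diagonal. Each iteration of the loop needs the board left behind by the previous one, which is exactly why one thread per move does not help.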

Sunday, 3 February 2013

Dynamic Programming

I think Dynamic Programming (see Wikipedia) would be a very worthwhile technique to learn and put into practice, because so many problems are solvable with it. The problem with coding using these techniques is similar to the problem of applying optimization algorithms to a real-world problem: working out what to use for inputs can be quite confusing, and the best way to learn seems to be to do loads of examples. I still can't whip out a GA with an auto-tuning NN to solve any random problem I encounter (e.g. Rage-style Mega-textures, or optimal rendering).
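As one of those examples, here is a minimal Python sketch of a classic dynamic programming problem, edit distance between two strings. The choice of problem and the function name are mine, purely for illustration. The "working out the inputs" step here is deciding that the subproblem is "distance between the first i characters of a and the first j characters of b".

```python
def edit_distance(a, b):
    """Minimum number of single-character inserts, deletes or substitutions
    needed to turn string a into string b (bottom-up DP, one row at a time)."""
    # prev[j] = distance between the empty prefix of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]  # turning the first i chars of a into "" takes i deletions
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```

Each table entry is built only from entries that were already computed, which is the whole trick: solve the small subproblems once and reuse them.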

Making code parallel

Various algorithms do not seem well suited to parallel architectures, or at least do not use the parallel architecture to its full potential.

This is one reason why the current computer architecture, with a (multi-core) CPU and parallel GPU(s), is particularly elegant. One CPU core can be reserved for things like the operating system's background processes, the user interface, and the ability to halt execution of a process. Also, some processes would be best on the CPU ... right?

In my search for examples, the one I thought would be best suited to the CPU was simulated annealing, because the algorithm is iterative and each iteration depends on the last, so the iterations cannot easily be computed in parallel. I based my assumption on this nice implementation on Google Code. However, a Google search shows that simulated annealing can be optimized for CUDA.

Link

So I was wrong. I am still searching for an example of a program that should not be run in parallel.
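For reference, here is a minimal Python sketch of simulated annealing that shows where the sequential dependency sits and one way around it. The energy function, parameter values and names are illustrative assumptions of mine, and running several independent chains is just one common parallelisation strategy, not necessarily what the linked CUDA work does.

```python
import math
import random

def anneal(energy, start, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """One annealing chain. Each iteration needs x and t from the previous
    one, so the loop body itself cannot be split across threads."""
    rng = random.Random(seed)
    x = best = start
    t = t0
    for _ in range(steps):
        candidate = x + rng.uniform(-1.0, 1.0)
        delta = energy(candidate) - energy(x)
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / t), which shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = candidate
            if energy(x) < energy(best):
                best = x
        t *= cooling
    return best

def energy(x):
    """Toy objective with its minimum at x = 3 (an assumption for the demo)."""
    return (x - 3.0) ** 2

# The chains below share nothing, so each could run on its own thread or
# GPU block; here they simply run one after another.
results = [anneal(energy, start=-10.0, seed=s) for s in range(8)]
best = min(results, key=energy)
print(best, energy(best))
```

So the loop inside one chain stays sequential, but the chains themselves are independent, and that is where the parallel hardware gets used.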