Efficient parallel execution requires massively concurrent algorithms and optimized memory access. In this session we present a programming model that executes the same concurrent algorithm efficiently on both GPUs and CPUs. Similar to OpenMP, it lets the programmer describe concurrency and memory access declaratively, but it hides complexity such as memory transfers between the CPU and the GPU. Compared to OpenMP, our model is more expressive, which enables it to reach performance comparable to OpenCL/CUDA.
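
For readers unfamiliar with the declarative style mentioned above, the following is a minimal sketch in plain OpenMP, the reference point of the comparison. It is not the programming model presented in the session; the function name `saxpy` and its parameters are chosen here purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: plain OpenMP, not the model presented in the session.
   Computes y[i] = a * x[i] + y[i] for all i in [0, n). */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* Declarative concurrency: the pragma only states that the loop
       iterations are independent. The runtime decides how to split them
       across threads. Compilers without OpenMP support ignore the pragma
       and run the loop sequentially with the same result. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

This version targets CPU threads only. In OpenMP 4.0 and later, offloading the same loop to a GPU typically means adding a `target` construct with `map` clauses that state which arrays are copied to and from the device, which is the kind of transfer detail the abstract says the presented model hides.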