Domain-Specific Translator and Optimizer for Massive On-Chip Parallelism