Due to the diffusion of cryptography in real time applications, performances in cipher and decipher operations are nowadays more important than in the past. On the other side, while facing the problem for embedded systems, additional constraints of area and power consumption must be considered. Many optimized software implementations, instruction set extensions and co-processors, were studied in the past with the aim to either increase performances or to keep the cost low. This paper presents a co-processor that aims to be an intermediate solution, suitable for such applications that require a throughput in the Megabit range and where the die size is a bit relaxed as constraint. To achieve this goal, the core is designed to operate at 32 bits and the throughput is guaranteed by a 2 stage pipeline with data forwarding. The obtained results synthesizing our coprocessor by means of the CMOS $0.18$ $μ$m standard cell library show that the throughput reaches 640 Mbit/s while the circuit size is of only 20 K equivalent gates.