Slight further memory optimisations.
Single precision, configurable vector width and memory alignment optimisations.
Improvements in memory utilization and locality.
Full fit and smoother.