Spec: Best practices for raw vs cleaned input data validation?

I think that is a very good idea: have a fail-safe cleaning function and only check the (possibly) cleaned data. There are some ways I know of in which the raw data is “dirty”, but there may be other, unexpected problems that I want to discover. This approach would do that without much additional work or complexity.
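For what it's worth, here is a minimal sketch of how I picture that in Python. Everything in it is an assumption for illustration: the field names `name` and `age`, the two "known" kinds of dirt, and the helper names `clean_record`, `validate_record` and `check` are all hypothetical, not taken from any real dataset or library. The cleaning step only fixes what it recognises and never raises, so anything unexpected passes through unchanged and is caught by the validation that runs afterwards.

```python
from __future__ import annotations

from typing import Any


def clean_record(record: dict[str, Any]) -> dict[str, Any]:
    """Fix the known kinds of dirt; never raise and never drop a record."""
    cleaned = dict(record)  # work on a copy so the raw input stays untouched
    try:
        # Known problem 1 (assumed): stray whitespace around the name.
        if isinstance(cleaned.get("name"), str):
            cleaned["name"] = cleaned["name"].strip()
        # Known problem 2 (assumed): age delivered as a string such as " 36 ".
        if isinstance(cleaned.get("age"), str):
            cleaned["age"] = int(cleaned["age"].strip())
    except (ValueError, TypeError):
        # Fail-safe: leave the value as it is so validation can flag it.
        pass
    return cleaned


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a (possibly) cleaned record."""
    problems = []
    name = record.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name must be a non-empty string")
    age = record.get("age")
    if isinstance(age, bool) or not isinstance(age, int):
        problems.append(f"age is not an integer: {age!r}")
    elif not 0 <= age <= 150:
        problems.append(f"age out of range: {age!r}")
    return problems


def check(records: list[dict[str, Any]]) -> dict[int, list[str]]:
    """Clean each record, validate the result, and report problems by index."""
    report = {}
    for index, raw in enumerate(records):
        problems = validate_record(clean_record(raw))
        if problems:
            report[index] = problems
    return report
```

Calling `check(records)` on a list of raw dicts returns only the records that still have problems after cleaning, keyed by their position in the list. Known dirt is fixed silently, while anything the cleaner did not anticipate shows up in the report.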