Here’s an interesting short paper by Wadler:
To paraphrase, the simplest model of lazy evaluation is given by Church’s original $\lambda$ calculus, and the simplest model of strict evaluation is given by Plotkin’s $\lambda_v$ calculus. $\lambda$ is problematic because it gives a poor notion of the cost of programs, and $\lambda_v$ is problematic because it is not complete. These flaws can be removed at the cost of slightly more complicated models, and the resulting models turn out to be extremely similar. Specifically, the only difference between them is the following law, let x = M in N -> N which holds with laziness and fails for strictness. In other words, the different between strict and lazy languages is exactly what one would expect: in a lazy model unused terms can be ignored, and in a strict model they must be evaluated to make sure they don’t loop forever.