It's interesting that, to me, your example doesn't sound natural at all. It seems the sound was intentionally designed to sound unnatural.

'Natural' is a pretty broad and somewhat subjective concept, so for simplicity let's define 'more natural' as 'inducing less change' and 'less natural' as 'inducing more change'.

I can think of 4 ways to achieve that low-pitched voice: playback speed (aka a sampler's transpose), pitch shifting, frequency shifting, and formant manipulation. Each has its own sound, and each will give different results. Your example sounds like something formant manipulation would do, but there is no way to know for sure without the original voice as a reference.

Which of these 4 techniques will sound more natural? It depends on you and how you implement them. One common approach to achieving very natural results is combining formant manipulation and pitch shifting.
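To make the difference between these techniques a bit more concrete, here is a minimal Python sketch of what each one does to the partials of a voice. It is not an audio processor: it only models frequencies as numbers. The 200 Hz voice, the 3-second duration, and all function names are assumptions made up for illustration.

```python
def semitones_to_ratio(semitones):
    """Convert a transposition in semitones to a frequency ratio."""
    return 2.0 ** (semitones / 12.0)

def scale_partials(partials_hz, ratio):
    """Playback speed and pitch shifting both multiply every partial
    by the same ratio, so harmonic relationships are preserved."""
    return [f * ratio for f in partials_hz]

def shift_partials(partials_hz, offset_hz):
    """Frequency shifting adds the same offset (in Hz) to every partial,
    which generally breaks the harmonic relationships."""
    return [f + offset_hz for f in partials_hz]

def is_harmonic(partials_hz, tol=1e-6):
    """True if every partial is an integer multiple of the lowest one."""
    f0 = partials_hz[0]
    return all(abs(f / f0 - round(f / f0)) < tol for f in partials_hz)

def duration_after_speed_change(duration_s, ratio):
    """Playback speed also changes duration; a pitch shifter is meant
    to keep the duration unchanged."""
    return duration_s / ratio

# Hypothetical voice: 200 Hz fundamental plus four more harmonics.
voice = [200.0 * k for k in range(1, 6)]

ratio = semitones_to_ratio(-12)          # one octave down
scaled = scale_partials(voice, ratio)    # playback speed or pitch shifting
shifted = shift_partials(voice, -50.0)   # frequency shifting by -50 Hz

print(scaled, is_harmonic(scaled))
print(shifted, is_harmonic(shifted))
print(duration_after_speed_change(3.0, ratio))
```

Playback speed and pitch shifting do the same thing to the partials here; the difference is that playback speed also stretches the duration. Formant manipulation is not modelled: it changes the spectral envelope (which partials are loud) rather than where the partials sit, which is why it is often paired with pitch shifting.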