It is rare to have seven data sets from stated choice experiments that are very similar in content and design. This provides an opportunity to examine in detail the empirical evidence within and between data sets across a range of discrete choice estimation methods: multinomial logit (MNL), latent class, scale multinomial logit, mixed logit, and the most general model, generalized mixed multinomial logit, which accounts for both preference and scale heterogeneity. Given the problems associated with data from different countries and time periods, we estimate separate models for each data set, obtaining values of travel time savings that are then updated post estimation to a common dollar for comparative purposes. We also pool all data sets in a scaled MNL model, treating each data set as a set of three separate utility expressions linked to the other data sets through scale heterogeneity. Such pooling is not behaviourally appropriate with MNL, latent class or mixed logit. The main question investigated is whether there is greater synergy in the willingness to pay evidence within a model form across data sets than across model forms within data sets. The evidence suggests relatively greater convergence across the choice models, with the exception of generalized mixed logit, after controlling for data set differences; there is also strong evidence that differences between data sets do matter.