Are most published research findings in empirical software engineering wrong or with exaggerated effect sizes? How to improve?

Authors
M. Jørgensen