Ideally, humane endpoints allow experiments to be terminated early, minimizing an animal's discomfort, distress and pain while ensuring that scientific objectives are reached. Yet the lack of a commonly agreed methodology and the heterogeneity of cut-off values published in the literature remain challenges to the accurate determination and application of humane endpoints. To synthesize and appraise existing humane endpoint definitions for commonly used physiological parameters, we conducted a systematic review of mouse studies of acute and chronic disease models that used body weight, temperature and/or sickness scores for endpoint definition (part 1). In the second part of the study, we applied machine learning algorithms to previously published and unpublished data on weight, temperature and sickness scores from mouse models of sepsis and stroke to assess the usefulness of this approach for parameter selection and endpoint definition across models (part 2). For the systematic review, we searched two electronic databases (MEDLINE/PubMed and Embase). Of 110 retrieved full-text manuscripts, 34 studies were included. We found large intra- and inter-model variance in humane endpoint determination and application due to varying animal models, lack of standardized experimental protocols and heterogeneity of performance metrics (part 1). Machine learning models trained with physiological data and sickness severity score or modified DeSimoni neuroscore identified animals with a high risk of death at an early time point in mouse models of both stroke (male: 93.2% at 72 h post-treatment; female: 93.0% at 48 h post-treatment) and sepsis (96.2% at 24 h post-treatment), thus demonstrating generalizability in endpoint determination across models (part 2).
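To make the idea of data-driven parameter selection and cut-off definition concrete, the following is a minimal sketch and not the method used in the study, whose algorithms are not specified here. It fits a one-parameter threshold rule (a decision stump) to each physiological parameter and picks the parameter whose cut-off best separates animals that died from survivors. The function names, the rule direction ("value at or below cut-off predicts death"), and all data values are hypothetical and purely illustrative.

```python
from typing import Dict, List, Tuple


def best_cutoff(values: List[float], died: List[bool]) -> Tuple[float, float]:
    """Return (cutoff, accuracy) for the rule 'value <= cutoff predicts death'.

    Assumption: lower values (e.g. hypothermia, weight loss) indicate higher risk.
    Accuracy is computed on the same data used to choose the cut-off.
    """
    candidates = sorted(set(values))
    best = (candidates[0], 0.0)
    for cutoff in candidates:
        correct = sum((v <= cutoff) == d for v, d in zip(values, died))
        accuracy = correct / len(values)
        if accuracy > best[1]:
            best = (cutoff, accuracy)
    return best


def select_parameter(
    data: Dict[str, List[float]], died: List[bool]
) -> Tuple[str, float, float]:
    """Return (parameter name, cutoff, accuracy) for the best-separating parameter."""
    results = {name: best_cutoff(vals, died) for name, vals in data.items()}
    name = max(results, key=lambda n: results[n][1])
    return (name, *results[name])


# Made-up toy measurements for eight animals at a single time point.
toy = {
    "temperature_c": [36.8, 36.5, 35.9, 33.1, 32.4, 36.2, 31.8, 34.0],
    "weight_pct_baseline": [98.0, 95.0, 90.0, 88.0, 84.0, 93.0, 86.0, 91.0],
}
died = [False, False, False, True, True, False, True, True]

print(select_parameter(toy, died))  # ('temperature_c', 34.0, 1.0)
```

Because the accuracy here is measured in-sample on a tiny invented dataset, it overstates real performance; any practical use would need held-out data or cross-validation and model-specific validation before a cut-off is adopted as a humane endpoint.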