Background and aims: In 2008, the International Association for the Study of Pain Special Interest Group on Neuropathic Pain (NeuPSIG) proposed a clinical grading system to help identify patients with neuropathic pain (NeP). We previously applied this classification system, along with two NeP screening tools, the painDETECT (PD-Q) and the Leeds Assessment of Neuropathic Symptoms and Signs pain scale (LANSS), to identify NeP in patients with neck/upper limb pain. Both screening tools failed to identify a large proportion of patients with clinically classified NeP. However, a limitation of our study was that a single clinician performed the NeP classification. In 2016, the NeuPSIG grading system was updated with the aim of improving its clinical utility. We were interested in field testing the revised grading system, in particular its application and the agreement in interpreting clinical findings. The primary aim of the current study was to explore the application of the revised NeuPSIG grading system based on patient records and to establish the inter-rater agreement in detecting NeP. A secondary aim was to investigate the level of agreement in detecting NeP between the revised NeuPSIG grading system and the LANSS and PD-Q.
Methods: In this retrospective study, two expert clinicians (a Specialist Pain Medicine Physician and an Advanced Scope Physiotherapist) independently reviewed 152 patient case notes and classified them according to the revised grading system. The consensus of the expert clinicians' clinical classification was used as the "gold standard" to determine the diagnostic accuracy of the two NeP screening tools.
Results: The two clinicians agreed in classifying 117 out of 152 patients (ICC 0.794, 95% CI 0.716-0.850; κ 0.62, 95% CI 0.50-0.73), yielding 77% agreement. Compared with the clinicians' consensus, both LANSS and PD-Q demonstrated limited diagnostic accuracy in detecting NeP (LANSS sensitivity 24%, specificity 97%; PD-Q sensitivity 53%, specificity 67%).
Conclusions: The application of the revised NeP grading system was feasible in our retrospective analysis of patients with neck/upper limb pain. High inter-rater percentage agreement was demonstrated. The hierarchical order of classification may lead to false negative classification. We propose that, in the absence of sensory changes or diagnostic tests in patients with neck/upper limb pain, classification of NeP may be further improved by using a cluster of clinical findings that confirm a relevant nerve lesion/disease, such as reflex and motor changes. The diagnostic accuracy of LANSS and PD-Q in identifying NeP in patients with neck/upper limb pain remains limited. Clinical judgment remains crucial to diagnosing NeP in clinical practice.
Implications: Our observations suggest that, in view of the heterogeneity in patients with neck/upper limb pain, a considerable amount of expertise is required to interpret the revised grading system. While its application was feasible in our clinical setting, it is unclear whether it would be feasible in primary health care settings, where early recognition and timely intervention are often most needed. The use of LANSS and PD-Q in the identification of NeP in patients with neck/upper limb pain remains questionable.
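The percentage agreement reported in the Results follows directly from the stated counts (117 of 152 patients). The sketch below is only an illustration of that arithmetic and of the standard definitions of sensitivity and specificity. The function names are our own, and the underlying 2x2 counts for LANSS and PD-Q are not given in this abstract, so none are assumed here.

```python
def percent_agreement(n_agreed: int, n_total: int) -> float:
    """Proportion of cases on which two raters gave the same classification."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_agreed / n_total


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of reference-standard positives that the screening tool detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of reference-standard negatives that the screening tool rules out."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the Results: the clinicians agreed on 117 of 152 patients.
agreement = percent_agreement(117, 152)
print(f"{agreement:.1%}")  # prints "77.0%"
```

Percentage agreement does not correct for agreement expected by chance, which is why the chance-corrected κ (0.62) is reported alongside it and is lower.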