Case-Based Reasoning for Medical Coding

National Cancer Registry of Luxembourg

Registries are long-term studies meant to provide data on a given phenomenon. The aim of a registry is to enable the analysis of trends in that phenomenon, in order to assess its impact. Cancer registries are one example: they focus on the analysis of cancer and its burden on health care costs and living conditions.
In 2013, the Luxembourg Institute of Health (LIH) was mandated by the Ministry of Health to set up a nationwide population-based cancer registry in Luxembourg. The National Cancer Registry (NCR) is a systematic, continuous, exhaustive and non-redundant collection of data for each new case of cancer (excluding non-melanoma skin cancers). The NCR has been implemented according to international standards, recommendations and classifications. It is a large dataset with a high level of quality, completeness and national coverage.
The aim of the NCR is to provide an objective analysis of cancer evolution in Luxembourg (incidence, prevalence, survival after a cancer diagnosis). It enables health care professionals and public authorities to better assess the quality of health care given to cancer patients. Another goal is to evaluate prevention campaigns and national cancer screening programs (i.e. for breast and colorectal cancers). In the medium term, the registry will serve as a tool to evaluate whether the National Cancer Plan is achieving its objectives.
All new cases of solid tumors (excluding non-melanoma skin cancers) since 2013 and of hematological malignancies since 2014 are recorded in the NCR. The NCR is a multi-source system whose main sources are the hospital-based cancer registries. The LIH has created specific software for data entry, quality checks and data export, called ONCOLIN, and has made it available to Luxembourg’s four hospitals and to the National Radiotherapy Center (Centre François Baclesse). Other sources of data are the medico-administrative databases provided by the “Caisse Nationale de Santé” (national social security) and the “Contrôle Médical de la Sécurité Sociale” (health inspection). Data on the vital status of cancer patients are extracted from death certificates provided by the Ministry of Health. Pathological records from the “Laboratoire National de Santé” (national health laboratory) will be integrated into the NCR as soon as they are available in an electronic and standardized format.
The NCR team provides basic and advanced training courses on how to code and enter data from patient records into a hospital-based cancer registry. These courses are given to the data entry operators working at the hospitals, who are referred to as “Data Managers Cancer”. In addition, one-day workshops are organized ten times a year.
One characteristic that distinguishes the NCR from other population-based cancer registries is the population it covers: the NCR includes not only people living in Luxembourg at the time of cancer diagnosis, but also people living abroad who have been diagnosed and/or treated in Luxembourg. Given the significant number of cross-border workers in Luxembourg and the European directive on cross-border health care, quality of care indicators and health care resource estimates based on NCR data must take this specificity into account.
Specialized clinicians have been involved in the NCR from the very beginning. Seven working groups of clinicians have been established. Among their activities, these groups prepared clinical guidelines for prostate, lung, colorectal and breast cancers in Luxembourg. These guidelines were then submitted for approval and publication to the “Conseil Scientifique dans le Domaine de la Santé”. These working groups and the Scientific Committee of the NCR defined a set of quality of care indicators for different types of cancer (breast, lung, prostate and colorectal cancers). The Scientific Committee is also responsible for validating all results before dissemination and publication.
NCR activities are conducted in close collaboration with hospitals, clinicians, the National Cancer Institute, foundations involved in cancerology, medical and scientific societies, and the Ministry of Health. Besides being a surveillance system, the NCR is recognized as an important actor in the oncology landscape in Luxembourg. Representatives of the NCR participate in several national working groups within the framework of the National Cancer Plan, and in the scientific and technical committee of the National Colorectal Cancer Screening Program.
One of the purposes of the NCR is to provide an infrastructure dedicated to epidemiological and clinical research in oncology. One example of national collaboration is the future partnership with the Integrated Biobank of Luxembourg (IBBL) for the “Plan Cancer Collection” project (PKC project) within the framework of the National Cancer Plan. For this project, tumor specimens collected and stored at the IBBL will be annotated with data extracted from the hospital-based cancer registries and from the NCR, to create a national tumor bank.
By collecting standardized data with a high level of reliability, the NCR will be able to transfer Luxembourg cancer data to European and international organizations, in order to compare Luxembourg’s results with those of other European countries. The NCR is a member of the European Network of Cancer Registries (ENCR), the International Association of Cancer Registries (IACR) and the Group for Cancer Registration and Epidemiology in Latin Language Countries (GRELL). In December 2018, the NCR published its first national report on quality of care indicators for Non-Small Cell Lung Cancer (NSCLC).

International Coding Standards

Collected data play a crucial role in public health. The results of one study can only be compared with those of another if the collected data contain comparable information. The collected features must therefore have the same meaning, and the data must be collected through similar processes (similar sources, surveys, exams, data cleaning and processing). However, these studies are often performed by researchers from different teams, institutions or even countries. A shared global agreement on these aspects is therefore needed, i.e. international standards. The idea of standards is not specific to medical coding. In the natural sciences, such as physics or chemistry, units of measurement have been standardized in the International System of Units (SI) to make it easier to share and compare measurements and results.
For medical coding, these international standards usually define the context in which the data are collected and used, in particular by specifying the terms and vocabulary to use. The information is usually not kept as free text but is instead coded using alphanumerical sequences. One example is the International Classification of Diseases for Oncology (ICD-O), a coding standard used for the registration and analysis of cancer cases. In ICD-O, the topography of the tumor, i.e. the body part in which the tumor originally started developing, is coded as a three-digit sequence preceded by the letter C, with a dot between the second and the third digits. C34.1, for instance, is a valid topography code. There are a little over 300 topography codes, ranging from C00.0 to C80.9. In general, one code may be linked to more than one medical concept. The ICD-O topography code C40.0, for example, covers the long bones of the upper limb, the scapula and their associated joints. The number of codes and the grouping of concepts depend on the intended use of the data, and are chosen so as to facilitate data analysis.
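The topography format described above is regular enough to check mechanically. The Python sketch below validates the shape and range of a topography code and shows how several anatomical terms can share one code. It is a minimal illustration, not part of any registry software. The helper names (`is_well_formed_topography`, `concepts_for_code`) and the small term table are hypothetical, and a real lookup would use the full ICD-O topography list.

```python
import re

# ICD-O topography format: the letter C, two digits, a dot and one digit.
TOPOGRAPHY_PATTERN = re.compile(r"C(\d{2})\.(\d)")

# Hypothetical excerpt of a term-to-code table: several concepts share one code.
TERM_TO_TOPOGRAPHY = {
    "humerus": "C40.0",
    "radius": "C40.0",
    "ulna": "C40.0",
    "scapula": "C40.0",
    "upper lobe, lung": "C34.1",
}


def is_well_formed_topography(code: str) -> bool:
    """Check the shape of a topography code and that it lies in C00.0-C80.9.

    Only the format and range are checked: some codes inside the range are
    not used by ICD-O, so a full check needs the official code list.
    """
    match = TOPOGRAPHY_PATTERN.fullmatch(code)
    if match is None:
        return False
    return int(match.group(1)) <= 80


def concepts_for_code(code: str) -> list[str]:
    """List the terms of the table above that map to the given code."""
    return sorted(term for term, c in TERM_TO_TOPOGRAPHY.items() if c == code)


print(is_well_formed_topography("C34.1"))  # True
print(concepts_for_code("C40.0"))  # ['humerus', 'radius', 'scapula', 'ulna']
```

Grouping terms under one code keeps the classification compact. The cost is granularity: once a record is coded C40.0, the coded data no longer say which bone was affected.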
This can lead to difficulties during the coding process, as the available source data might have been collected with a different purpose in mind. For example, the data collected in a patient record for clinical purposes (e.g. diagnosing and treating a patient) are different from the data collected for a cancer registry (observing cancer occurrences). The granularity might be different and the relevant information is not the same in both cases.
Coding standards provide some solutions and rules to overcome these problems. However, they do not and cannot cover all possible situations. Coding guides have therefore been created to complement these standards. They provide additional guidelines and are not as strict as coding standards. They consist of expert knowledge, domain agreements and guidelines developed through everyday use. Their goal is to ease coding and increase the quality of the coded data, by facilitating the understanding and application of the sometimes very vast and intricate international coding standards. Some of these recommendations are also created and maintained by international organizations and working groups, like the ENCR recommendations for cancer registries.
As an illustrative example, let us consider the case of a male patient who is to be coded for the NCR. In 2013, he suffered from lasting pains in his side and a sudden loss of appetite. On January 12th, 2014, a CT scan of his left kidney revealed nothing out of the ordinary. As the patient’s condition continued to deteriorate, a second scan was made on February 15th, 2014. This time, two suspicious neoplasms were found and the clinicians suspected cancer. Another CT scan made on March 10th, 2014 showed signs of multiple renal adenopathies, which reinforced the suspicion of cancer. On June 2nd, 2014, a renal biopsy was carried out, and the resulting histological findings pointed to a renal cell carcinoma.
To code this case, an operator needs to carefully read all the relevant parts of the patient record. For the NCR, there is a lot of data to collect. Some of it is mandatory and strictly defined by international coding standards. There are also data which have been selected by the various committees of the NCR. These data have been deemed useful for national indicators and measures. It is also possible to have data which are collected for a specific study, over a limited period of time. For example, a study on lung cancer might require more detailed information on the smoking habits of cancer patients than is normally collected for the NCR.
The mandatory data to collect concern the basic information about the cancer, like when (incidence date) and where (topography) it started, how it has been diagnosed and what type of cancer (morphology) it is.
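The mandatory items listed above map naturally onto a small record type. The Python sketch below is a hypothetical structure, not the NCR or ONCOLIN data model. The field names and the `BasisOfDiagnosis` categories are assumptions. The sketch also checks the shape of ICD-O morphology codes, which consist of a four-digit histology code, a slash and a behavior digit. For instance, 8312/3 is renal cell carcinoma, and C64.9 is the kidney topography code.

```python
import re
from dataclasses import dataclass
from datetime import date
from enum import Enum

# ICD-O morphology: four-digit histology code, a slash, one behavior digit.
MORPHOLOGY_PATTERN = re.compile(r"\d{4}/\d")


def morphology_is_well_formed(code: str) -> bool:
    """Check only the shape of an ICD-O morphology code, not its existence."""
    return MORPHOLOGY_PATTERN.fullmatch(code) is not None


class BasisOfDiagnosis(Enum):
    """Illustrative subset of ways a cancer can be diagnosed (hypothetical labels)."""
    CLINICAL = "clinical"
    CLINICAL_INVESTIGATION = "clinical investigation"  # e.g. imaging
    HISTOLOGY_OF_PRIMARY = "histology of primary tumor"


@dataclass(frozen=True)
class RegistryCase:
    """The mandatory items of a registry case (hypothetical structure)."""
    incidence_date: date                  # when the cancer started (ENCR definition)
    topography: str                       # where it started, e.g. "C64.9"
    basis_of_diagnosis: BasisOfDiagnosis  # how it was diagnosed
    morphology: str                       # what type of cancer, e.g. "8312/3"

    def __post_init__(self) -> None:
        # Reject obviously malformed morphology codes when the case is created.
        if not morphology_is_well_formed(self.morphology):
            raise ValueError(f"malformed morphology code: {self.morphology!r}")
```

Making the record immutable (`frozen=True`) and validating it at creation time means that a malformed code is caught when the case is entered, rather than later during data export.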
The incidence date is the date of the event that confirmed the cancer diagnosis. This date is usually not part of the patient record, as it serves no purpose from a treatment point of view. It needs to be determined by the NCR operator using the international definition of the incidence date provided by the ENCR.
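The ENCR definition relies on an order of priority between kinds of events. The sketch below is a simplified, hypothetical version of such a priority rule, not the ENCR text. The event categories, their order and the mapping of the example events onto them are all assumptions chosen for illustration. The rule keeps the events of the highest-priority kind present in a record and returns the earliest of their dates.

```python
from datetime import date

# Hypothetical, simplified priority order (1 = highest priority).
# The real ENCR definition is more detailed; these categories are assumptions.
EVENT_PRIORITY = {
    "histological confirmation": 1,
    "hospital admission": 2,
    "outpatient consultation": 3,
    "other diagnosis": 4,
    "death": 5,
}


def incidence_date(events: list[tuple[str, date]]) -> date:
    """Pick an incidence date from (event_kind, event_date) pairs.

    Keeps only the events of the highest-priority kind present in the record,
    then returns the earliest of their dates. Unknown kinds are ignored.
    """
    ranked = [(EVENT_PRIORITY[kind], day) for kind, day in events
              if kind in EVENT_PRIORITY]
    if not ranked:
        raise ValueError("no event usable to determine the incidence date")
    best = min(priority for priority, _ in ranked)
    return min(day for priority, day in ranked if priority == best)


# Events of the running example related to the suspected cancer
# (the mapping of each event to a kind is an assumption).
example_events = [
    ("other diagnosis", date(2014, 2, 15)),           # CT scan: suspicious neoplasms
    ("other diagnosis", date(2014, 3, 10)),           # CT scan: suspicion reinforced
    ("histological confirmation", date(2014, 6, 2)),  # renal biopsy
]
print(incidence_date(example_events))  # 2014-06-02
```

For the patient described above, this simplified rule returns the biopsy date rather than the date of the first suspicious scan. This illustrates why the incidence date has to be derived by the operator rather than read directly from the record.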

Current Work on Medical Coding

There is a lot of ongoing research in medical coding, notably on the creation and maintenance of coding standards, on coding support and on automated coding.
An important contribution of coding standards is the definition of common vocabulary and semantics. This is an essential element to obtain comparable data. To increase the quality and exhaustiveness of these standards, it can be very useful to include experts from various areas (e.g. different countries, hospitals) and domains (e.g. different specializations). This is usually done through the creation of working groups by international organizations, like the World Health Organization (WHO) or the ENCR.
When applying these standards, operators have to extract the relevant information and code it using the provided rules. However, the textual reports in the medical record may use different terms. Synonyms may be used (e.g. influenza and flu), but so may more specific or more general terms (e.g. viral respiratory infection). The information needed may also be split among several documents, so that some reasoning is needed to reconstruct the information required by the coding standard. This partially explains the slow uptake of more automated coding systems. Both the differences in terminology and the lack of consistent structure are major challenges for such systems.
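The sketch below makes the terminology problem concrete. It normalizes synonyms to a preferred term and walks a small is-a hierarchy. A more specific term found in a report can then be matched to a more general target concept, but not the other way around. The vocabulary tables and function names are hypothetical toy data, and an actual system would rely on a full medical terminology.

```python
# Hypothetical toy vocabulary, for illustration only.
SYNONYMS = {"flu": "influenza"}  # synonym -> preferred term
PARENT = {                       # term -> more general term (is-a)
    "influenza": "viral respiratory infection",
    "viral respiratory infection": "respiratory infection",
}


def normalize(term: str) -> str:
    """Lower-case a term and replace a known synonym by its preferred term."""
    term = term.strip().lower()
    return SYNONYMS.get(term, term)


def ancestors(term: str) -> list[str]:
    """Return the more general terms, from the nearest to the most general."""
    chain = []
    current = normalize(term)
    while current in PARENT:
        current = PARENT[current]
        chain.append(current)
    return chain


def matches(report_term: str, target_term: str) -> bool:
    """True if the report term is the target concept or a more specific one.

    A more general report term (e.g. "viral respiratory infection") is not
    enough to code a more specific target (e.g. "influenza").
    """
    report, target = normalize(report_term), normalize(target_term)
    return report == target or target in ancestors(report)


print(matches("Flu", "viral respiratory infection"))   # True
print(matches("viral respiratory infection", "flu"))   # False
```

The asymmetry in `matches` reflects the coding problem described above. A general term in a report does not carry enough information to justify a specific code.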
To make the content of medical documents more accessible for machines, it would be interesting to structure them more precisely. However, this requires a huge amount of work for all parties involved. In fact, as medicine progresses and new discoveries are made, the items for each document need to be updated to follow the evolution of the field. The previously coded data might need to be updated and recoded using the new standards, which adds another burden for operators. It would also imply a major change for the daily activities of most health practitioners, with new tools and methods to support them in the use of these structured documents.
Another interesting avenue would be to enable machines to parse and use natural language text. This would limit the changes to daily activities, as health professionals could continue to write free-text reports.

Natural Language Processing

Natural Language Processing (NLP) is an area of computer science and linguistics that deals with parsing and exploiting data expressed in natural language [Manning et al., 1999]. The complexity of NLP tasks comes from the richness of the languages used in our society. Most languages are full of synonyms and expressions, allowing for very nuanced descriptions. To properly understand the content of a text, a system needs to know all of these synonyms and expressions. It also needs a firm understanding of the context in which the text was written. Each domain can have its own specific ways and customs of describing and writing.
Despite this complexity, enabling machines to process natural languages opens up very interesting possibilities. Currently, a huge amount of information exists only in textual form, with no specific structure. For example, the World Wide Web contains vast amounts of documents with no coherent structure. Some standards define how to access these documents, how to create links between them and how to present them to a human user (e.g. the Hypertext Transfer Protocol [Fielding et al., 1997], Cascading Style Sheets [Atkins et al., 2019]). Similarly, medical records mostly exist as texts with very little structure. For example, exam reports might have different sections (e.g. observations, conclusions); however, the content of these sections is free text written by a health professional.

Automated Coding

NLP is used in many areas, including medical coding. More recent research has focused on applying machine learning and other artificial intelligence techniques [Shi et al., 2017, Kavuluru et al., 2013a, Kavuluru et al., 2013b, Kavuluru et al., 2015, Pons et al., 2016, Stanfill et al., 2010] to parse and annotate medical documents automatically, minimizing human intervention as much as possible. The hope is to achieve at least human-level performance while benefiting from the speed of computers.
In 2007, a contest was organized at the BioNLP workshop for assigning ICD-9-CM codes to radiology reports [Pestian et al., 2007]. For this task, the organizers provided a dataset of manually annotated, anonymized English radiology reports, split into a training set and a test set. The goal was to assign one or more ICD-9-CM codes to each report. Several designs were proposed [Aronson et al., 2007, Patrick et al., 2007, Crammer et al., 2007], some with very promising performance. Despite this progress, there is still no widespread solution for automatically coding documents.
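Each report may receive several codes, so such systems are usually scored with set-based measures. The sketch below computes a micro-averaged F1 score, which pools true positives, false positives and false negatives over all reports. It is a generic sketch, and the exact scoring rules of the 2007 contest are not described here. The function name `micro_f1` is hypothetical and the example code sets are made up.

```python
def micro_f1(gold: list[set[str]], predicted: list[set[str]]) -> float:
    """Micro-averaged F1 for multi-label coding.

    gold[i] and predicted[i] are the sets of codes of document i. Counts are
    pooled over all documents, then F1 = 2TP / (2TP + FP + FN), which equals
    the harmonic mean of micro precision and micro recall.
    """
    if len(gold) != len(predicted):
        raise ValueError("expected one predicted code set per document")
    tp = fp = fn = 0
    for gold_codes, predicted_codes in zip(gold, predicted):
        tp += len(gold_codes & predicted_codes)   # correct codes
        fp += len(predicted_codes - gold_codes)   # spurious codes
        fn += len(gold_codes - predicted_codes)   # missed codes
    if tp == 0:
        return 0.0  # no correct code at all (also covers the all-empty case)
    return 2 * tp / (2 * tp + fp + fn)


gold = [{"786.2"}, {"486", "786.2"}]
predicted = [{"786.2"}, {"486"}]
print(micro_f1(gold, predicted))  # 0.8
```

Micro-averaging gives frequent codes more weight than rare ones. Macro-averaging (one F1 per code, then the mean) is the usual alternative when rare codes matter as much as common ones.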

Coding Support

Besides automatic annotation, there has also been research on and development of tools that help operators and clinicians use the existing medical terminologies [Noussa-Yao et al., 2015]. When a user wants to assign a medical code to a given patient, the most likely codes are presented, reducing the number of possibilities the user has to consider. The challenge of this approach is the selection of the appropriate, most likely codes. To address this issue, probabilities can be used, as presented in [Lecornu et al., 2009].
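A very simple way to rank candidate codes is to count, in previously coded cases, how often each code was chosen when a given keyword appeared in the record. The sketch below sums these empirical conditional frequencies over the keywords of a new record and returns the top k codes. It is a hypothetical illustration with made-up function names and toy data. It is not the probabilistic model of [Lecornu et al., 2009].

```python
from collections import Counter, defaultdict


def build_index(coded_cases):
    """Count, for each keyword, how often each code was assigned with it.

    coded_cases is an iterable of (keywords, code) pairs from past coding work.
    """
    index = defaultdict(Counter)
    for keywords, code in coded_cases:
        for keyword in keywords:
            index[keyword][code] += 1
    return index


def suggest_codes(index, keywords, k=3):
    """Rank codes by the sum over keywords of the estimated P(code | keyword)."""
    scores = Counter()
    for keyword in keywords:
        code_counts = index.get(keyword)
        if not code_counts:
            continue  # keyword never seen in past cases
        total = sum(code_counts.values())
        for code, count in code_counts.items():
            scores[code] += count / total
    return [code for code, _ in scores.most_common(k)]


# Toy history of past coding decisions (made-up data).
past = [
    ({"kidney", "biopsy"}, "C64.9"),
    ({"kidney"}, "C64.9"),
    ({"kidney", "pelvis"}, "C65.9"),
    ({"lung"}, "C34.1"),
]
index = build_index(past)
print(suggest_codes(index, {"kidney"}, k=2))  # ['C64.9', 'C65.9']
```

The operator still makes the final choice. The ranking only narrows down the list of codes shown, which is the kind of support described above.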
Given the complexity of medical coding and in particular coding for a cancer registry, it is essential to provide support for operators.

Table of contents:

1 French Summary (Résumé français)
1.1 Introduction
1.1.1 Cancer Registries
1.1.2 Coding
1.2 Preliminaries
1.2.1 Explainable Artificial Intelligence
1.2.2 RDFS and SPARQL
1.2.3 Edit Distances
1.2.4 Case-Based Reasoning
1.2.5 Argumentation
1.3 Knowledge Representation
1.3.1 Case Representation
1.3.2 Argument Representation
1.4 Case-Based Reasoning and Argumentation
1.4.1 Argument Types
1.4.2 Example
1.4.3 The Retrieve Step
1.4.4 The Reuse Step
1.4.5 The Revise and Retain Steps
1.5 Evaluation
1.5.1 Method
1.5.2 Results
1.5.3 Discussion
1.6 Conclusion
1.6.1 Medical Coding
1.6.2 Perspectives
2 Introduction
2.1 Public Health
2.2 Oncology
2.3 National Cancer Registry of Luxembourg
2.4 International Coding Standards
2.5 Coding difficulties
2.6 Problem description and goals
3 Medical Coding Assistance
3.1 Current Work on Medical Coding
3.1.1 Natural Language Processing
3.1.2 Automated Coding
3.1.3 Coding Support
3.2 Coding Assistant
3.2.1 Automated Coding for the NCR
3.2.2 Implementation
4 Case-Based Reasoning for Medical Coding
4.1 Explainable AI
4.2 Knowledge Representation and Manipulation
4.2.1 Resource Description Framework
4.2.2 Resource Description Framework Schema
4.2.3 SPARQL Protocol and RDF Query Language
4.3 Semantic Web
4.4 Edit Distance
4.5 Case-Based Reasoning
4.5.1 The 4-R cycle
4.5.2 Knowledge Containers
4.5.3 Case Maintenance
4.6 Other Problem Solving Methods
4.6.1 Rule-Based Reasoning
4.6.2 Preference-based reasoning
4.6.3 Conversational Systems
4.6.4 Recommender systems
4.6.5 Belief Merging
4.6.6 Argumentation
5 Case Acquisition and Representation
5.1 Case Definition
5.2 Case Representation
5.3 Case Authoring
5.3.1 Initial Case Acquisition
5.3.2 Reviewing and Revising New Cases
5.4 Use Case
5.4.1 Asking a Question
5.4.2 Reviewing a Case
6 Case Retrieval and Reuse
6.1 Running Example
6.2 Retrieval
6.2.1 Coding Expert Reasoning
6.2.2 Argument Types
6.2.3 Comparing Source Cases
6.3 Reuse
6.3.1 Reuse by Copy
6.3.2 New Coding Standards
6.4 Use case
6.5 Conclusion
7 Evaluation
7.1 Method
7.1.1 Evaluation Set
7.1.2 Indicators
7.2 Results
7.3 Discussion
7.4 Conclusion
8 Conclusion and Future Work
8.1 Contributions
8.2 Domain Knowledge
8.3 Case Representation
8.4 Argumentation
8.5 Coding Assistant