{"id":1194,"date":"2023-05-24T08:28:43","date_gmt":"2023-05-24T06:28:43","guid":{"rendered":"https:\/\/risc.web-email.at\/fachbeitrag-maschinelle-datenanalyse-mittels-kuenstlicher-intelligenz\/"},"modified":"2026-03-20T14:05:33","modified_gmt":"2026-03-20T13:05:33","slug":"technical-article-machine-data-analysis-using-artificial-intelligence","status":"publish","type":"publication","link":"https:\/\/risc.web-email.at\/en\/technicalarticles\/technical-article-machine-data-analysis-using-artificial-intelligence\/","title":{"rendered":"Machine data analysis using artificial intelligence"},"content":{"rendered":"\n<h2 class=\"wp-block-heading is-style-v2-telegrafico\">A Generic Pipeline for AI-based Data Analysis<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">by DI Dr. Alexander Maletzky<\/h3>\n\n\n<div class=\"wp-block-group-container alignfull \">\n<div class=\"wp-block-group alignfull is-layout-constrained wp-block-group-is-layout-constrained\">\n<p class=\"has-text-align-left\">Nowadays, data is recorded and stored in vast quantities. The goal is often to create a data-based forecasting model that can be used to predict future developments. However, the path from the raw data to the finished model is usually longer than expected. 
RISC Software GmbH developed a generic pipeline for AI-based data analysis.<br><br><\/p>\n\n\n\n<div style=\"height:100px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text has-media-on-the-right is-stacked-on-mobile is-vertically-aligned-center\"><div class=\"wp-block-media-text__content\">\n<p><strong>Table of contents<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The problem<\/li>\n\n\n\n<li>Our solution: A generic pipeline<\/li>\n\n\n\n<li>Example application: Mortality prediction in the intensive care unit<\/li>\n\n\n\n<li>Further information<\/li>\n\n\n\n<li>Author<\/li>\n<\/ul>\n<\/div><figure class=\"wp-block-media-text__media\"><img decoding=\"async\" width=\"1024\" height=\"838\" src=\"https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/AdobeStock_135142241-1024x838.jpg\" alt=\"Artificial Intelligence\" class=\"wp-image-1168 size-full\" srcset=\"https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/AdobeStock_135142241-1024x838.jpg 1024w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/AdobeStock_135142241-300x245.jpg 300w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/AdobeStock_135142241-768x628.jpg 768w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/AdobeStock_135142241-1536x1257.jpg 1536w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/AdobeStock_135142241.jpg 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n<\/div>\n<\/div>\n\n<div class=\"wp-block-group-container alignfull \">\n<div class=\"wp-block-group alignfull is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<h3 class=\"wp-block-heading\">The problem<\/h3>\n\n\n\n<p>State-of-the-art artificial intelligence methods, such as 
neural networks, require high-quality, well-prepared data as input in order to deliver useful results. Reality, however, does not live up to these requirements: data contain outliers and missing values, or are recorded at different (sometimes even irregular) measurement frequencies.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What to do?<\/h3>\n\n\n\n<p>To use the data nevertheless, extensive preparation is necessary. The details depend on the specific data, but the process essentially always involves the same steps:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Importing the raw data, e.g. from relational databases,<\/li>\n\n\n\n<li>Validating and harmonising the data, and<\/li>\n\n\n\n<li>Imputing (&#8220;filling in&#8221;) missing values.<\/li>\n<\/ul>\n\n\n\n<p>For the downstream training of a predictive model, &#8220;organisational&#8221; steps are also necessary, such as partitioning into training and test data. 
Figure 1 schematically depicts the entire process of machine data analysis.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-top is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<figure class=\"wp-block-image size-large is-style-rounded\"><img decoding=\"async\" width=\"1024\" height=\"768\" sizes=\"(max-width: 1024px) 100vw, 1024px\" src=\"https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/iStock-172215613-1024x768.jpg\" alt=\"Digital human silhouette with data\" class=\"wp-image-1144\" srcset=\"https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/iStock-172215613-1024x768.jpg 1024w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/iStock-172215613-300x225.jpg 300w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/iStock-172215613-768x576.jpg 768w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/iStock-172215613-1536x1152.jpg 1536w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/iStock-172215613.jpg 1920w\" \/><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n<div class=\"wp-block-group-container alignfull \">\n<div class=\"wp-block-group alignfull is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<h3 class=\"wp-block-heading\">Our solution: A generic pipeline<\/h3>\n\n\n\n<p>As part of the MC3 project (<a href=\"https:\/\/risc-software.at\/mc3\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/risc-software.at\/mc3\/<\/a>), which deals with data analysis in the medical environment, experts from RISC Software GmbH have developed a generic data pipeline. It covers a large part of the data analysis process; in particular, the aforementioned data preparation is an integral part. 
In addition, the system provides a uniform interface for any machine learning algorithms or model classes, so that the training, application and analysis of a prediction model can also be mapped via the pipeline.<\/p>\n\n\n\n<p>A special focus is also on the increasingly important topic of Explainable AI &#8211; explainable artificial intelligence. Thus, almost any state-of-the-art explanation method, from Layerwise Relevance Propagation to Shapley Values, can be integrated via a simple interface to make model predictions comprehensible to humans. The pipeline is implemented in such a way that it is as reusable as possible. It is modular, which means that individual components can be combined, added and removed as desired. In addition, end users can easily configure the individual steps, such as specifying validation rules, imputation strategies, etc. Even grid search for exploring the parameter space is easily possible. The applicability of the pipeline is thus not limited to medical data.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<figure class=\"wp-block-image size-full is-style-rounded\"><img decoding=\"async\" width=\"604\" height=\"299\" sizes=\"(max-width: 604px) 100vw, 604px\" src=\"https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/Generische-Datenpipeline_RISC_Software_GmbH.png\" alt=\"Generic data pipeline RISC Software GmbH\" class=\"wp-image-1149\" srcset=\"https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/Generische-Datenpipeline_RISC_Software_GmbH.png 604w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/Generische-Datenpipeline_RISC_Software_GmbH-300x149.png 300w\" \/><\/figure>\n\n\n\n<p>Figure 1. Schematic representation of machine data analysis using artificial intelligence. 
The reusable data pipeline includes the components shown in blue, and supports data import and application\/analysis of the trained model.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n<div class=\"wp-block-group-container alignfull \">\n<div class=\"wp-block-group alignfull is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<h3 class=\"wp-block-heading\">Example application: Mortality prediction in the intensive care unit<\/h3>\n\n\n\n<p>As a demonstration, researchers at RISC Software GmbH applied the pipeline to the publicly available MIMIC-III database, which is often used in the literature as a benchmark dataset in the field of (intensive care) medical data analysis. The aim was to predict a patient's probability of death in intensive care based on data from the first twelve hours after admission. Thanks to the pipeline, almost the entire data analysis and modelling process could be reduced to a few simple parameter configurations. The result achieved holds up well against current scientific publications on this topic.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<figure class=\"wp-block-image size-large is-style-rounded\"><img decoding=\"async\" width=\"1024\" height=\"683\" sizes=\"(max-width: 1024px) 100vw, 1024px\" src=\"https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/iStock-872676342-1-1024x683.jpg\" alt=\"Medical Technology Concept. 
Electronical Medical Record\" class=\"wp-image-1155\" srcset=\"https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/iStock-872676342-1-1024x683.jpg 1024w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/iStock-872676342-1-300x200.jpg 300w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/iStock-872676342-1-768x512.jpg 768w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/iStock-872676342-1-1536x1024.jpg 1536w, https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/iStock-872676342-1.jpg 1920w\" \/><\/figure>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n<p>The projects of the Medical Informatics Department are funded by the Upper Austrian government under the strategic economic and research programme &#8220;Innovative Upper Austria 2020&#8221;.<\/p>\n\n\n<div class=\"wp-block-group-container alignfull \">\n<div class=\"wp-block-group alignfull is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<h3 class=\"wp-block-heading\">Further information<\/h3>\n\n\n\n<p>Technology stack: Python 3, with the relevant add-on packages (Pandas, Plotly, scikit-learn, etc.).<\/p>\n\n\n\n<p>MC\u00b3: Medical Cognitive Computing Center, joint research project of Kepler University Hospital Linz \/ MedCampus III, Johannes Kepler University Linz \/ Institute for Machine Learning, and RISC Software GmbH \/ Department of Medical Informatics.<\/p>\n\n\n\n<p>MIMIC-III database: Medical Information Mart for Intensive Care III, public dataset of over 58,000 intensive care patients from a hospital in Boston, MA; <a href=\"https:\/\/mimic.physionet.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/mimic.physionet.org\/<\/a><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" 
style=\"flex-basis:33.33%\"><\/div>\n<\/div>\n\n\n<div class=\"wp-block-group-container alignfull \">\n<div class=\"wp-block-group alignfull is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<h3 class=\"wp-block-heading\">Author<\/h3>\n\n\n<div class=\"contact-person\">\n      <picture>\n      \n      \n      \n      \n      <img decoding=\"async\" data-aos=\"fade-zoom-in\"\n           data-aos-offset=\"0\" class=\"w-full\" width=\"212\" height=\"293\"\n       
    src=\"https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/amaletzk1-removebg-preview-1.png\"\n           alt=\"\">\n    <\/picture>\n    \n\n<h5 class=\"wp-block-heading\">DI Dr. Alexander Maletzky<\/h5>\n\n\n\n<p>Researcher &amp; Developer Unit Medical Informatics<\/p>\n\n  <\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div><\/div>\n<\/div>\n\n<div class=\"wp-block-group-container alignfull \">\n<div class=\"wp-block-group alignwide is-layout-constrained wp-block-group-is-layout-constrained\"><\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>The path from raw data to the finished model is usually further than expected. RISC Software GmbH developed a generic pipeline for AI-based data analysis for this purpose.<\/p>\n","protected":false},"featured_media":1150,"template":"","publication-category":[50,72,74,77,76],"class_list":["post-1194","publication","type-publication","status-publish","has-post-thumbnail","hentry","publication-category-data-science-and-a-i","publication-category-industrie-4-0","publication-category-industry-4-0","publication-category-medical-informatics","publication-category-medizin-informatik"],"acf":[],"portrait_thumb_url":"https:\/\/risc.web-email.at\/app\/uploads\/2023\/06\/Generische-Datenpipeline_RISC_Software_GmbH-360x214.png","_links":{"self":[{"href":"https:\/\/risc.web-email.at\/en\/wp-json\/wp\/v2\/publication\/1194","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/risc.web-email.at\/en\/wp-json\/wp\/v2\/publication"}],"about":[{"href":"https:\/\/risc.web-email.at\/en\/wp-json\/wp\/v2\/types\/publication"}],"version-history":[{"count":10,"href":"https:\/\/risc.web-email.at\/en\/wp-json\/wp\/v2\/publication\/1194\/revisions"}],"predecessor-version":[{"id":36641,"href":"https:\/\/risc.web-email.at\/en\/wp-json\/wp\/v2\/publication\/1194\/revisions\/36641"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/risc.web-email.at\/en\/wp-json\/wp\/v2\/media\/1150"}],"wp:attachment":[{"href":"https:\/\/risc.we
b-email.at\/en\/wp-json\/wp\/v2\/media?parent=1194"}],"wp:term":[{"taxonomy":"publication-category","embeddable":true,"href":"https:\/\/risc.web-email.at\/en\/wp-json\/wp\/v2\/publication-category?post=1194"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}