Data and data management

Terms related to data, including definitions about data warehousing and data management.
  • data scientist - A data scientist is an analytics professional who is responsible for collecting, analyzing and interpreting data to help drive decision-making in an organization.
  • data set - A data set, also spelled 'dataset,' is a collection of related data that's usually organized in a standardized format.
  • data source name (DSN) - A data source name (DSN) is a data structure containing information about a specific database to which an Open Database Connectivity (ODBC) driver needs to connect.
  • data splitting - Data splitting is the practice of dividing data into two or more subsets, such as the training and test sets used in machine learning (see the sketch after this list).
  • data stewardship - Data stewardship is the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner.
  • data streaming - Data streaming is the continuous transfer of data from one or more sources at a steady, high speed for processing into specific outputs.
  • data structure - A data structure is a specialized format for organizing, processing, retrieving and storing data.
  • Data Transfer Project (DTP) - Data Transfer Project (DTP) is an open source initiative to facilitate customer-controlled data transfers between two online services.
  • data virtualization - Data virtualization is an umbrella term used to describe an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data.
  • data warehouse - A data warehouse is a repository of data from an organization's operational systems and other sources that supports analytics applications to help drive business decision-making.
  • data warehouse as a service (DWaaS) - Data warehouse as a service (DWaaS) is an outsourcing model in which a cloud service provider configures and manages the hardware and software resources a data warehouse requires, and the customer provides the data and pays for the managed service.
  • database (DB) - A database is a collection of information that is organized so that it can be easily accessed, managed and updated.
  • database management system (DBMS) - A database management system (DBMS) is a software system for creating and managing databases.
  • database marketing - Database marketing is a systematic approach to the gathering, consolidation and processing of consumer data.
  • database replication - Database replication is the frequent electronic copying of data from a database in one computer or server to a database in another -- so that all users share the same information.
  • DataOps - DataOps is an Agile approach to designing, implementing and maintaining a distributed data architecture that will support a wide range of open source tools and frameworks in production.
  • Db2 - Db2 is a family of database management system (DBMS) products from IBM that serve a number of different operating system (OS) platforms.
  • decision-making process - A decision-making process is a series of steps one or more individuals take to determine the best option or course of action to address a specific problem or situation.
  • deep analytics - Deep analytics is the application of sophisticated data processing techniques to yield information from large and typically multi-source data sets made up of both unstructured and semi-structured data.
  • descriptive analytics - Descriptive analytics is a type of data analytics that looks at past data to give an account of what has happened.
  • digital wallet - In general, a digital wallet is a software application, usually for a smartphone, that serves as an electronic version of a physical wallet.
  • dimension - In data warehousing, a dimension is a collection of reference information that supports a measurable event, such as a customer transaction.
  • dimension table - In data warehousing, a dimension table is a database table that stores attributes describing the facts in a fact table.
  • disambiguation - Disambiguation is the process of determining a word's meaning -- or sense -- within its specific context.
  • disaster recovery (DR) - Disaster recovery (DR) is an organization's ability to respond to and recover from an event that negatively affects business operations.
  • distributed database - A distributed database is a database that consists of two or more files located in different sites either on the same network or on entirely different networks.
  • distributed ledger technology (DLT) - Distributed ledger technology (DLT) is a digital system for recording the transaction of assets in which the transactions and their details are recorded in multiple places at the same time.
  • document - A document is a form of information that might be useful to a user or set of users.
  • Dublin Core - Dublin Core is an international metadata standard formally known as the Dublin Core Metadata Element Set and includes 15 metadata (data that describes data) terms.
  • ebXML (Electronic Business XML) - EbXML (Electronic Business XML or e-business XML) is a project to use the Extensible Markup Language (XML) to standardize the secure exchange of business data.
  • Eclipse (Eclipse Foundation) - Eclipse is a free, Java-based development platform known for its plugins that allow developers to develop and test code written in other programming languages.
  • edge analytics - Edge analytics is an approach to data collection and analysis in which an automated analytical computation is performed on data at a sensor, network switch or other device instead of waiting for the data to be sent back to a centralized data store.
  • empirical analysis - Empirical analysis is an evidence-based approach to the study and interpretation of information.
  • empiricism - Empiricism is a philosophical theory applicable in many disciplines, including science and software development, that human knowledge comes predominantly from experiences gathered through the five senses.
  • encoding and decoding - Encoding converts data into a specified format for transmission or storage, and decoding reverses the process; both are used in many forms of communications, including computing, data communications, programming, digital electronics and human communications (see the sketch after this list).
  • encryption key management - Encryption key management is the practice of generating, organizing, protecting, storing, backing up and distributing encryption keys.
  • enterprise search - Enterprise search is a type of software that lets users find data spread across organizations' internal repositories, such as content management systems, knowledge bases and customer relationship management (CRM) systems.
  • entity - An entity is a single thing with a distinct separate existence.
  • entity relationship diagram (ERD) - An entity relationship diagram (ERD), also known as an 'entity relationship model,' is a graphical representation that depicts relationships among people, objects, places, concepts or events in an information technology (IT) system.
  • Epic Systems - Epic Systems, also known simply as Epic, is one of the largest providers of health information technology, used primarily by large U.S. hospitals and health systems.
  • erasure coding (EC) - Erasure coding (EC) is a method of data protection in which data is broken into fragments, expanded and encoded with redundant data pieces, and stored across a set of different locations or storage media (see the sketch after this list).
  • exabyte (EB) - An exabyte (EB) is a large unit of computer data storage equal to 1 quintillion (10^18) bytes; the closely related binary unit of 2^60 bytes is, strictly speaking, an exbibyte.
  • Excel - Excel is a spreadsheet program from Microsoft and a component of its Office product group for business applications.
  • exponential function - An exponential function is a mathematical function used to calculate the exponential growth or decay of a given set of data (see the worked example after this list).
  • extension - An extension typically refers to a file name extension.
  • facial recognition - Facial recognition is a category of biometric software that maps an individual's facial features to confirm their identity.
  • fact table - In data warehousing, a fact table is a database table in a dimensional model that stores the quantitative measures, or facts, of a business process for analysis.
  • failover - Failover is a backup operational mode in which the functions of a system component are assumed by a secondary component when the primary becomes unavailable.
  • file extension (file format) - In computing, a file extension is a suffix added to the name of a file to indicate the file's layout, in terms of how the data within the file is organized.
  • file synchronization (file sync) - File synchronization (file sync) is a method of keeping files that are stored in several different physical locations up to date.
  • FIX protocol (Financial Information Exchange protocol) - The Financial Information Exchange (FIX) protocol is an open specification intended to streamline electronic communications in the financial securities industry.
  • foreign key - A foreign key is a column or columns of data in one table that refers to the unique data values -- often the primary key data -- in another table (see the sketch after this list).
  • garbage in, garbage out (GIGO) - Garbage in, garbage out, or GIGO, refers to the idea that in any system, the quality of output is determined by the quality of the input.
  • Google BigQuery - Google BigQuery is a cloud-based big data analytics web service for processing very large read-only data sets.
  • GPS coordinates - GPS coordinates are a unique identifier of a precise geographic location on the earth, usually expressed in alphanumeric characters.
  • gradient descent - Gradient descent is an optimization algorithm that refines a machine learning (ML) model's parameters to create a more accurate model (see the sketch after this list).
  • grid computing - Grid computing is a system for connecting a large number of computer nodes into a distributed architecture that delivers the compute resources necessary to solve complex problems.
  • gzip (GNU zip) - Gzip (GNU zip) is a free and open source file compression utility and format based on the DEFLATE algorithm (see the sketch after this list).
  • Hadoop - Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers.
  • Hadoop Distributed File System (HDFS) - The Hadoop Distributed File System (HDFS) is the primary data storage system Hadoop applications use.
  • hashing - Hashing is the process of transforming any given key or string of characters into another value, typically a shorter, fixed-length value that represents the original (see the sketch after this list).
  • health informatics - Health informatics is the practice of acquiring, studying and managing health data and applying medical concepts in conjunction with health information technology systems to help clinicians provide better healthcare.
  • Health IT (health information technology) - Health IT (health information technology) is the area of IT involving the design, development, creation, use and maintenance of information systems for the healthcare industry.
  • heartbeat (computing) - In computing, a heartbeat is a periodic signal that hardware or software generates to indicate that it is still operating normally, or to synchronize parts of a system.
  • heat map (heatmap) - A heat map is a two-dimensional representation of data in which various values are represented by colors.
  • hierarchy - Generally speaking, hierarchy refers to an organizational structure in which items are ranked in a specific manner, usually according to levels of importance.
  • histogram - A histogram is a type of chart that shows the frequency distribution of data points across a continuous range of numerical values (see the sketch after this list).
  • historical data - Historical data, in a broad context, is data collected about past events and circumstances pertaining to a particular subject.
  • IBM IMS (Information Management System) - IBM IMS (Information Management System) is a database and transaction management system that was first introduced by IBM in 1968.
  • ICD-10-CM (Clinical Modification) - The ICD-10-CM (International Classification of Diseases, 10th Revision, Clinical Modification) is a system used by physicians and other healthcare providers to classify and code all diagnoses, symptoms and procedures related to inpatient and outpatient medical care in the United States.
  • IDoc (intermediate document) - IDoc (intermediate document) is a standard data structure used in SAP applications to transfer data to and from SAP system applications and external systems.
  • in-memory analytics - In-memory analytics is an approach to querying data residing in a computer's random access memory (RAM) as opposed to querying data stored on physical drives.
  • in-memory database - An in-memory database is a database that keeps its data primarily in a computer's main memory (RAM) rather than on disk, which streamlines the work involved in processing queries.
  • infographic - An infographic (information graphic) is a representation of information in a graphic format designed to make the data easily understandable at a glance.
  • information - Information is the output that results from analyzing, contextualizing, structuring, interpreting or in other ways processing data.
  • information asset - An information asset is a collection of knowledge or data that is organized, managed and valuable.
  • information assurance (IA) - Information assurance (IA) is the practice of protecting physical and digital information and the systems that support the information.
  • information governance - Information governance is a holistic approach to managing corporate information by implementing processes, roles, controls and metrics that treat information as a valuable business asset.
  • information lifecycle management (ILM) - Information lifecycle management (ILM) is a comprehensive approach to managing an organization's data and associated metadata, starting with its creation and acquisition through when it becomes obsolete and is deleted.
  • IT incident management - IT incident management is a component of IT service management (ITSM) that aims to rapidly restore services to normal following an incident while minimizing adverse effects on the business.
  • Java Database Connectivity (JDBC) - Java Database Connectivity (JDBC) is an API packaged with Java SE that makes it possible to connect from a Java Runtime Environment (JRE) to external, relational database systems.
  • job - In certain computer operating systems, a job is the unit of work that a computer operator -- or a program called a job scheduler -- gives to the OS.
  • job scheduler - A job scheduler is a computer program that enables an enterprise to schedule and, in some cases, monitor computer 'batch' jobs (units of work).
  • key-value pair (KVP) - A key-value pair (KVP) is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data (see the sketch after this list).
  • knowledge base - In general, a knowledge base is a centralized repository of information.
  • knowledge management (KM) - Knowledge management is the process an enterprise uses to gather, organize, share and analyze its knowledge in a way that's easily accessible to employees.
  • laboratory information system (LIS) - A laboratory information system (LIS) is computer software that processes, stores and manages data from patient medical processes and tests.
  • legal health record (LHR) - A legal health record (LHR) refers to documentation about a patient's personal health information that is created by a healthcare organization or provider.
  • Lisp (programming language) - Lisp, an acronym for list processing, is a functional programming language that was designed for the easy manipulation of symbolic data organized into lists.
  • LTO-8 (Linear Tape-Open 8) - LTO-8, or Linear Tape-Open 8, is a tape format from the Linear Tape-Open Consortium released in late 2017.
  • medical scribe - A medical scribe is a professional who specializes in documenting patient encounters in real time under the direction of a physician.
  • metadata - Often referred to as data that describes other data, metadata is structured reference data that helps to sort and identify attributes of the information it describes.
  • Microsoft Azure Data Lake - Microsoft Azure Data Lake is a highly scalable public cloud service that allows developers, scientists, business professionals and other Microsoft customers to gain insight from large, complex data sets.
  • Microsoft MyAnalytics - Microsoft MyAnalytics is a personal analytics application in Office 365 that enables employees to gain insights into how they spend their time at work and how they can work smarter.
  • Microsoft System Center - Microsoft System Center is a suite of software products designed to simplify the deployment, configuration and management of IT infrastructure and virtualized software-defined data centers.
  • middleware - Middleware is software that bridges the gap between applications and operating systems by providing a method for communication and data management.
  • Monte Carlo simulation - A Monte Carlo simulation is a mathematical technique that simulates the range of possible outcomes for an uncertain event (see the sketch after this list).
  • MPP database (massively parallel processing database) - An MPP database is a database optimized for massively parallel processing, in which many operations are performed by many processing units at the same time.
  • multidimensional database (MDB) - A multidimensional database (MDB) is a type of database that is optimized for data warehouse and online analytical processing (OLAP) applications.
  • national identity card - A national identity card is a portable document, typically a plasticized card with digitally embedded information, that is used to verify aspects of a person's identity.
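
Code sketches for selected terms

The short Python sketches below illustrate a few of the terms above. Each is a minimal, hedged illustration: the function names, table names and sample values are assumptions chosen for the example, not canonical implementations.
  • data splitting - A sketch of an 80/20 train/test split using only the standard library; the fraction, seed and record values are arbitrary.

    import random

    def split_data(records, train_fraction=0.8, seed=42):
        # Shuffle a copy so the caller's ordering is untouched, then cut once.
        shuffled = list(records)
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        return shuffled[:cut], shuffled[cut:]

    train, test = split_data(range(100))
    print(len(train), len(test))  # 80 20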
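  • encoding and decoding - A round trip with Python's built-in string methods; the sample string and the choice of UTF-8 are illustrative.

    text = "café"
    encoded = text.encode("utf-8")     # encode: str -> bytes (b'caf\xc3\xa9')
    decoded = encoded.decode("utf-8")  # decode: bytes -> str ('café')
    assert decoded == text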
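  • erasure coding (EC) - A toy single-parity sketch, the simplest form of erasure code: one redundant fragment is computed from two data fragments, and any one lost fragment can be rebuilt from the other two. Production erasure codes such as Reed-Solomon generalize this idea; the fragment values here are arbitrary.

    frag_a = b"\x01\x02\x03"
    frag_b = b"\x0f\x0e\x0d"
    parity = bytes(x ^ y for x, y in zip(frag_a, frag_b))  # redundant piece

    # If frag_a is lost, XORing the survivors reconstructs it.
    rebuilt = bytes(p ^ y for p, y in zip(parity, frag_b))
    assert rebuilt == frag_a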
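  • exponential function - A worked example of continuous exponential growth and decay, n(t) = n0 * e^(rate * t); the starting value, rate and time span are arbitrary.

    import math

    def exponential(n0, rate, t):
        # Growth for rate > 0, decay for rate < 0.
        return n0 * math.exp(rate * t)

    print(round(exponential(1000, 0.05, 10), 1))   # 1648.7 -- 5% growth over 10 periods
    print(round(exponential(1000, -0.05, 10), 1))  # 606.5  -- 5% decay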
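  • foreign key - A sketch using Python's built-in sqlite3 module; the table and column names are illustrative. Note that SQLite checks foreign keys only when the pragma is enabled.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
                "customer_id INTEGER REFERENCES customers(id))")
    con.execute("INSERT INTO customers VALUES (1, 'Ada')")
    con.execute("INSERT INTO orders VALUES (10, 1)")  # OK: customer 1 exists
    try:
        con.execute("INSERT INTO orders VALUES (11, 99)")  # no customer 99
    except sqlite3.IntegrityError as err:
        print(err)  # FOREIGN KEY constraint failed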
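  • gradient descent - A minimal sketch that minimizes f(x) = (x - 3)^2 by stepping against its gradient 2 * (x - 3); the learning rate, step count and starting point are arbitrary.

    def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
        # Each step moves against the gradient, walking downhill toward a minimum.
        x = x0
        for _ in range(steps):
            x -= learning_rate * grad(x)
        return x

    print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))  # converges toward 3.0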
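  • gzip (GNU zip) - A compression round trip with Python's built-in gzip module; the highly repetitive sample input is chosen to compress well.

    import gzip

    data = b"data management " * 1000
    compressed = gzip.compress(data)
    print(len(data), "->", len(compressed))  # e.g. 16000 -> well under 100 bytes
    assert gzip.decompress(compressed) == data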
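  • hashing - A sketch with Python's built-in hashlib; SHA-256 is one common hash function among many, and the input bytes are arbitrary.

    import hashlib

    digest = hashlib.sha256(b"any given key").hexdigest()
    print(digest)       # deterministic: the same input always yields this value
    print(len(digest))  # always 64 hex characters, regardless of input length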
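  • histogram - A text-only sketch that buckets values into unit-wide bins and prints a bar per bin; the sample values are arbitrary.

    from collections import Counter

    values = [1.2, 2.7, 2.9, 3.1, 3.3, 4.8, 5.0]
    bins = Counter(int(v) for v in values)  # bin 1 covers 1-2, bin 2 covers 2-3, ...
    for b in sorted(bins):
        print(f"{b}-{b + 1}: {'#' * bins[b]}")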
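  • key-value pair (KVP) - A sketch using a Python dict, whose entries are key-value pairs; the SKU keys and stock counts are illustrative.

    inventory = {"sku-1001": 42, "sku-1002": 7}  # each entry is one key-value pair
    inventory["sku-1003"] = 15                   # add a new pair
    print(inventory["sku-1001"])                 # look up a value by its key: 42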
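  • Monte Carlo simulation - A classic sketch that estimates pi by sampling random points in the unit square; the sample count is an arbitrary trade-off between accuracy and speed.

    import random

    def estimate_pi(samples=100_000):
        # The fraction of points inside the quarter circle approximates pi / 4.
        inside = sum(
            random.random() ** 2 + random.random() ** 2 <= 1.0
            for _ in range(samples)
        )
        return 4 * inside / samples

    print(estimate_pi())  # ~3.14, varying from run to run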