{ "cells": [ { "cell_type": "markdown", "id": "c5d4df30-761b-4643-b73b-a222fe30e7a6", "metadata": {}, "source": [ "# Motivation" ] }, { "cell_type": "markdown", "id": "f6917cff-aee9-4dc2-a301-8944646da523", "metadata": {}, "source": [ "This episode provides a broad overview of the course and the main motivation for attending it.\n", "\n", ":::{objectives}\n", "- What is big data\n", "- What is the Python programming environment and its ecosystem\n", "- What you will learn during this course\n", "\n", ":::\n", "\n", ":::{instructor-note}\n", "- 20 min teaching/type-along\n", "- 0 min exercising\n", "\n", ":::" ] }, { "cell_type": "markdown", "id": "d7b872bc-22bb-4388-9cf9-a71e6179711b", "metadata": {}, "source": [ "## Big Data" ] }, { "cell_type": "markdown", "id": "06178eac-6734-4ec3-82fa-8bed58281851", "metadata": {}, "source": [ ":::{discussion} How large is your data?\n", "\n", "- How large is the data you are working with?\n", "- Are you experiencing performance bottlenecks when you try to analyse it?\n", "\n", ":::" ] }, { "cell_type": "markdown", "id": "fa634f18-b72e-4700-a3e5-56ed5d3ca7e2", "metadata": {}, "source": [ "“Big data refers to data sets that are too large or complex to be dealt with by traditional data-processing application software. […] Big data analysis challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data source.” (from [Wikipedia](https://en.wikipedia.org/wiki/Big_data))\n", "\n", "“Big data” is a buzzword used heavily in the tech industry, but many scientific research communities are also increasingly adopting high-throughput data production methods that generate very large datasets. One driving force behind this development is the advent of powerful machine learning methods that enable researchers to derive novel scientific insights from large datasets. 
Another is the rapid development of high-performance computing (HPC) hardware, together with software libraries and packages that can use this hardware efficiently.\n", "\n", "This module focuses on high-performance data analytics (HPDA), a subset of high-performance computing concerned with processing large datasets. The data can come either from computer models and simulations or from experiments and observations, and the goal is to preprocess, analyse and visualise it to generate scientific results.\n", "\n", "The video below gives a further introduction to big data." ] }, { "cell_type": "code", "execution_count": 1, "id": "eedea65f-dc03-48e8-b0b0-b13f1245e1da", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "