.. _sciml: Scientific Machine Learning =========================== .. questions:: - What is the SciML Julia package and how to use it? - How can I mix physical pre-knowledge and data to solve modelling problems in Julia? .. instructor-note:: - 35 min teaching - 15 min exercises .. callout:: The code in this lession is written for Julia v1.11.3. A modelling problem ------------------- In this session we will have a look at a physical modelling problem and investigate how Machine Learning can be used in combination with classical modelling techniques. Background on neural networks and scientific machine learning can be found here :download:`download slides `. Consider a spherical object moving in a viscous fluid under the influence of drag along side other forces. We will consider this problem in 2 dimensions. The code below is adapted from the `Missing Physics example `_ included in the SciML documentation. .. code-block:: julia using LinearAlgebra, Statistics, Random using ComponentArrays, Lux, Zygote, Plots using OrdinaryDiffEq, ModelingToolkit, DataDrivenDiffEq, SciMLSensitivity, DataDrivenSparse using Optimization, OptimizationOptimisers, OptimizationOptimJL, LineSearches # random number generator rng = Random.default_rng() function dynamics!(du, u, p, t) m = 1.0 g = 10.0 # u[1] = x' # u[2] = x # u[3] = y' # u[4] = y # drag force, gravity and Lorentz force du[1] = -((u[1]^2 + u[3]^2)^0.5)*u[1] + 2*u[3] du[3] = -((u[1]^2 + u[3]^2)^0.5)*u[3] - g - 2*u[1] # reduce to 1st order ODE system du[2] = u[1] du[4] = u[3] end tspan = (0.0, 1.0) sc = 1.0 deltat = 0.01 u0 = sc * rand(4) inits_g = [rand(4) for ii in range(1,1)] prob = ODEProblem(dynamics!, u0, tspan) prbs = [ODEProblem(dynamics!, ui, tspan) for ui in inits_g] times = Vector(0:deltat:1.0) sols = [Array(solve(prb, Vern7(), abstol = 1e-12, reltol = 1e-12, saveat = deltat)) for prb in prbs] Xs = hcat(sols...) # for axis equal; aspect_ratio = :equal scatter(Xs[2,:], Xs[4,:], alpha = 0.75, color = :green, label = "True Data") .. figure:: img/solutions_1.png :align: center Solution to an initial value problem. The object experiences a drag force which scales as the speed squared, and a constant gravitational pull. To illustrate the learning of missing physics, we have added a Lorentz force experienced by a charged particle under a contant magnetic field pointing in the z-direction. .. code-block:: julia # m = 1.0 # g = 10.0 # drag force, gravity and Lorentz force du[1] = -((u[1]^2 + u[3]^2)^0.5)*u[1] + 2*u[3] du[3] = -((u[1]^2 + u[3]^2)^0.5)*u[3] - g - 2*u[1] First consider an almost black box UDE (Universal Differential Equation) where we model the whole right-hand side of the equation system by a neural network. The model is helped by assumed prior knowledge of homogeneity, that is the forces acting on the object only depend on its velocity, not its position. .. code-block:: julia using LinearAlgebra, Statistics, Random using ComponentArrays, Lux, Zygote, Plots using OrdinaryDiffEq, ModelingToolkit, DataDrivenDiffEq, SciMLSensitivity, DataDrivenSparse using Optimization, OptimizationOptimisers, OptimizationOptimJL, LineSearches # random number generator rng = Random.default_rng() function dynamics!(du, u, p, t) m = 1.0 g = 10.0 # u[1] = x' # u[2] = x # u[3] = y' # u[4] = y # drag force, gravity and Lorentz force du[1] = -((u[1]^2 + u[3]^2)^0.5)*u[1] + 2*u[3] du[3] = -((u[1]^2 + u[3]^2)^0.5)*u[3] - g - 2*u[1] # reduce to 1st order ODE system du[2] = u[1] du[4] = u[3] end tspan = (0.0, 1.0) sc = 1.0 deltat = 0.1 u0 = sc * rand(4) inits_g = [rand(4) for ii in range(1,1)] prob = ODEProblem(dynamics!, u0, tspan) prbs = [ODEProblem(dynamics!, ui, tspan) for ui in inits_g] times = Vector(0:deltat:1.0) sols = [Array(solve(prb, Vern7(), abstol = 1e-12, reltol = 1e-12, saveat = deltat)) for prb in prbs] Xs = hcat(sols...) # for axis equal; aspect_ratio = :equal scatter(Xs[2,:], Xs[4,:], alpha = 0.75, color = :green, label = "True Data") # savefig("solutions_1.png") # define our activation function, radial basis function rbf(x) = exp.(-(x .^ 2)) # the neural network const U = Lux.Chain(Lux.Dense(2, 5, rbf), Lux.Dense(5, 5, rbf), Lux.Dense(5, 5, rbf), Lux.Dense(5, 2)) # Get the initial parameters and state variables of the model p, st = Lux.setup(rng, U) const _st = st function ude_dynamics!(du, u, p, t) û = U(u[[1,3],:], p, _st)[1] # black box du[1] = û[1] du[3] = û[2] # model with more structure ###du[1] = -((u[1]^2 + u[3]^2)^0.5)*u[1] + û[1] ###du[3] = -((u[1]^2 + u[3]^2)^0.5)*u[3] + û[2] # reduce to 1st order ODE system du[2] = u[1] du[4] = u[3] end problems = [ODEProblem(ude_dynamics!, ui, tspan, p) for ui in inits_g] function predict(θ, inits = inits_g, T = times) _probs = [remake(problems[ii], u0 = inits[ii], tspan = (T[1], T[end]), p = θ) for ii in range(1,size(inits)[1])] allsols = [Array(solve(_prob, Vern7(), saveat = T, abstol = 1e-6, reltol = 1e-6, sensealg = QuadratureAdjoint(autojacvec = ReverseDiffVJP(true)))) for _prob in _probs] hcat(allsols...) end function loss(θ) X̂ = predict(θ) mean(abs2, Xs .- X̂) end losses = Float64[] callback = function (state, l) push!(losses, l) if length(losses) % 50 == 0 println("Current loss after $(length(losses)) iterations: $(losses[end])") end return false end adtype = Optimization.AutoZygote() optf = Optimization.OptimizationFunction((w,params) -> loss(w), adtype) optprob = Optimization.OptimizationProblem(optf, ComponentVector{Float64}(p)) # epochs = 250 epochs = 1000 res = Optimization.solve( optprob, LBFGS(linesearch = BackTracking()), callback = callback, maxiters = epochs) println("Final training loss after $(length(losses)) iterations: $(losses[end])") p_trained = res.u plot(1:length(losses), losses[1:end], yaxis = :log10, xaxis = :log10, xlabel = "Iterations", ylabel = "Loss", label = "LBFGS", color = :red) ts = first(times):(mean(diff(times)) / 2):last(times) X̂ = predict(p_trained, inits_g, ts) scatter(Xs[2,:], Xs[4,:], alpha = 0.75, color = :green, label = "True Data") scatter!(X̂[2,:], X̂[4,:], alpha = 0.4, color = :red, label = "Prediction") u_test = rand(4) X̂_test = predict(p_trained, [u_test], ts) prob_test = ODEProblem(dynamics!, u_test, tspan) solution_test = solve(prob_test, Vern7(), abstol = 1e-12, reltol = 1e-12, saveat = deltat) Xs_test = Array(solution_test) scatter!(Xs_test[2,:], Xs_test[4,:], alpha = 0.75, color = :blue, label = "True Data Test") #scatter(Xs_test[2,:], Xs_test[4,:], alpha = 0.75, color = :blue, label = "True Data Test") scatter!(X̂_test[2,:], X̂_test[4,:], alpha = 0.4, color = :yellow, label = "Prediction Test") # savefig("solutions_2.png") At the end of the script, we plot the true data and model prediction on the trajectory that was used as training data, as well as a test trajectory with random initial values. .. figure:: img/solutions_2.png :align: center One training and one test trajectory with their corresponding predictions. The predictions on the training trajectory are accurate but the predictions on the test trajectory are not very good. This is not too unexpected since we are only training the model on a single inital condition. Let's see what happens when we train the model on trajectories from 6 randomly generated initial conditions instead. .. code-block:: julia # inits_g = [rand(4) for ii in range(1,1)] inits_g = [rand(4) for ii in range(1,6)] .. figure:: img/solutions_3.png :align: center Training on 6 trajectories. Prediction on test trajectory is quite good in this case. It takes the black box UDE about 1000 epochs to get a good result. .. figure:: img/loss_bb.png :align: center Training loss on the black box UDE model. Now consider the case where, as an example, the drag force is known but not the rest of the dynamics. This means that the neural network has to learn for intance the Lorentz force from data. A model that uses partial pre-knowledge and learns the rest from data is called a hybrid model. To implement this, we make the following change in the function defining the UDE dynamics: explicitly encode the drag force and let the neural network take care of the rest, that is the other terms on the right-hand side of the system of equations. .. code-block:: julia function ude_dynamics!(du, u, p, t) û = U(u[[1,3],:], p, _st)[1] # black box ###du[1] = û[1] ###du[3] = û[2] # model with more structure du[1] = -((u[1]^2 + u[3]^2)^0.5)*u[1] + û[1] du[3] = -((u[1]^2 + u[3]^2)^0.5)*u[3] + û[2] du[2] = u[1] du[4] = u[3] end In this case we get similar results but much quicker. .. figure:: img/loss_hybrid.png :align: center Training loss on the hybrid model. .. todo:: Not assuming homogeneity In the dynamics example above, what happens if you do not assume homogeneity? In other words, if the forces acting on the object are allowed to depend both the object's velocity and position, as would be the case if for example the magetic field was non-homogeneous. Try running the code under such weaker assumptions. .. solution:: Modifications You can change the definition of the UDE dynamics by letting the neural network that models the dynamics depend on both position and velocity: .. code-block:: julia function ude_dynamics!(du, u, p, t) # û = U(u[[1,3],:], p, _st)[1] û = U(u, p, _st)[1] # U depends on the whole u # black box du[1] = û[1] du[3] = û[2] # model with more structure ###du[1] = -((u[1]^2 + u[3]^2)^0.5)*u[1] + û[1] ###du[3] = -((u[1]^2 + u[3]^2)^0.5)*u[3] + û[2] # reduce to 1st order ODE system du[2] = u[1] du[4] = u[3] end You also have to make sure that the neural network has 4 input nodes rather than just 2: .. code-block:: julia #const U = Lux.Chain(Lux.Dense(2, 5, rbf), Lux.Dense(5, 5, rbf), Lux.Dense(5, 5, rbf), # Lux.Dense(5, 2)) const U = Lux.Chain(Lux.Dense(4, 5, rbf), Lux.Dense(5, 5, rbf), Lux.Dense(5, 5, rbf), Lux.Dense(5, 2)) Typically you need more data to get similar results compared to the case where we assume homogeneity. If you get issues with renaming :math:`U` since you already ran the code with the orignal definition, you can restart the REPL or introduce another neural network :math:`V` and replace :math:`U` where needed. .. todo:: The neural network Experiment with other architecture of the neural network in the above example. How small (in terms of number of parameters) can it be and still perform well? Is the convergence rate slower or faster when the number of parameter increases or decreases? You can also try other activation functions and see how the model performs (tanh, ReLU, GELU, etc.).