# Multiscale modeling


Curator: Jianfeng Lu

Multiscale modeling refers to a style of modeling in which multiple models at different scales are used simultaneously to describe a system. The different models usually focus on different scales of resolution. They sometimes originate from physical laws of a different nature, for example, one from continuum mechanics and one from molecular dynamics. In that case, one speaks of multi-physics modeling, even though the terminology might not be fully accurate.

The need for multiscale modeling usually arises because the available macroscale models are not accurate enough, while the microscale models are not efficient enough or produce far more information than is needed. By combining the two viewpoints, one hopes to arrive at a reasonable compromise between accuracy and efficiency.

The subject of multiscale modeling consists of three closely related components: multiscale analysis, multiscale models and multiscale algorithms. Multiscale analysis tools allow us to understand the relation between models at different scales of resolution. Multiscale models couple together descriptions of a system at different scales. Multiscale algorithms use multiscale ideas to design computational algorithms.

## Traditional approaches to modeling

Traditional approaches to modeling focus on one scale. Macroscale models require constitutive relations which are almost always obtained empirically, by guessing. Making the right guess often requires and represents far-reaching physical insight, as we see from the work of Newton and Landau, for example. It also means that for complex systems, the guessing game can be quite hard and less productive, as we have learned from our experience with modeling complex fluids.

The other extreme is to work with a microscale model, such as the first principles of quantum mechanics. As Dirac declared back in 1929 (Dirac, 1929), the right physical principles for most of what we are interested in are already provided by quantum mechanics; there is no need to look further. There are no empirical parameters in the quantum many-body problem. We simply have to input the atomic numbers of all the participating atoms, and we have a complete model which is sufficient for chemistry, much of physics, materials science, biology, etc. Dirac also recognized the daunting mathematical difficulties of such an approach -- after all, we are dealing with a quantum many-body problem. With each additional particle, the dimensionality of the problem increases by three. For this reason, direct applications of first principles are limited to rather simple systems without much happening at the macroscale.

Take, for example, incompressible fluids. The fundamental laws are simply those of the conservation of mass and momentum, which, after introducing the notion of stress, can be expressed as follows: $\begin{aligned} \nabla \cdot \mathbf{u} &= 0, \\ \rho_0 (\partial_t \mathbf{u} + (\mathbf{u} \cdot \nabla ) \mathbf{u}) &= \nabla \cdot \tau, \end{aligned}$ where $$\mathbf{u}$$ is the velocity field, $$\rho_0$$ is the density of the fluid, which is assumed to be constant, and $$\tau$$ is the stress tensor. To close this system of equations, we need an expression for the stress tensor in terms of the velocity field. Here is where the guessing game begins.

The standard approach is to ask: What is the simplest expression that is consistent with

• symmetry (Galilean, translational/rotational invariance);
• laws of physics, particularly the second law of thermodynamics;
• experimental data.

In this case, Galilean invariance implies that $$\tau$$ does not depend on $$\mathbf{u}$$. The next simplest assumption is that $$\tau$$ is a function (in fact, a linear function) of $$\nabla \mathbf{u}$$. This gives us $\tau = \tau_0 + C \nabla \mathbf{u} = -p \mathbf{I} + \mu (\nabla \mathbf{u} + (\nabla \mathbf{u})^{\mathrm{T}})$ where $$p$$ is the pressure, $$\mu$$ is the dynamic viscosity coefficient, which in this context is the only parameter that carries information specific to the microscopic constituents of the system, and $$\mathbf{I}$$ is the identity tensor. The second law of thermodynamics requires that $$\mu \geq 0$$. Substituting the above expression for $$\tau$$ into the momentum equation, we obtain the celebrated Navier-Stokes equation.

Even though the reasoning that went into the derivation of the Navier-Stokes equation is exceedingly simple, the model itself has proven to be extremely successful in describing virtually all phenomena that we encounter for simple fluids, i.e. fluids that are made up of molecules with a simple molecular structure. Partly for this reason, the same approach has been followed in modeling complex fluids, such as polymeric fluids. Unfortunately the success there is quite limited. The first problem is that simplicity is largely lost. In order to model the complex rheological properties of polymer fluids, one is forced to make more complicated constitutive assumptions with more and more parameters. The second difficulty is that accuracy is not guaranteed. For polymer fluids we are often interested in understanding how the conformation of the polymer interacts with the flow. This kind of information is missing in the kind of empirical approach described above.

A more rigorous approach is to derive the constitutive relation from microscopic models, such as atomistic models, by taking the hydrodynamic limit. This is an example of the multiscale analysis approach. For simple fluids, this will result in the same Navier-Stokes equation we derived earlier, now with a formula for $$\mu$$ in terms of the output from the microscopic model. But for complex fluids, this would result in rather different kinds of models. This is one of the starting points of multiscale modeling.

## Multiscale modeling

In the multiscale approach, one uses a variety of models at different levels of resolution and complexity to study one system. The different models are linked together either analytically or numerically. For example, one may study the mechanical behavior of solids using both the atomistic and continuum models at the same time, with the constitutive relations needed in the continuum model computed from the atomistic model. The hope is that by using such a multi-scale (and multi-physics) approach, one might be able to strike a balance between accuracy (which favors using more detailed and microscopic models) and feasibility (which favors using less detailed, more macroscopic models).

Figure 1: Illustration of the multi-physics hierarchy. It should be noted that this diagram is slightly misleading: quantum mechanics is valid not just at the microscale, it also applies at the macroscale -- only that much simpler models are already quite sufficient at the macroscale.

### Sequential multiscale modeling

In sequential multiscale modeling, one has a macroscale model in which some details of the constitutive relations are precomputed using microscale models. For example, if the macroscale model is the gas dynamics equation, then an equation of state is needed. This equation of state can be precomputed using kinetic theory. When performing molecular dynamics simulations using empirical potentials, one assumes a functional form for the empirical potential; the parameters in the potential are then precomputed using quantum mechanics.

Sequential multiscale modeling is mostly limited to the case when only a few parameters are passed between the macro and microscale models. For this reason, it is also called parameter passing. This does not have to be the case though: It has been demonstrated that constitutive relations which are functions of up to 6 variables can be effectively precomputed if sparse representations are used (Bungartz and Griebel, 2004; Garcia-Cervera, Ren, Lu and E, 2008).

### Concurrent multiscale modeling

In concurrent multiscale modeling, the quantities needed in the macroscale model are computed on-the-fly from the microscale models as the computation proceeds. In this setup, the macro- and micro-scale models are used concurrently. Take again the example of molecular dynamics. If one wants to compute the inter-atomic forces from first principles instead of modeling them empirically, then it is much more efficient to do this on-the-fly. Precomputing the inter-atomic forces as functions of the positions of all the atoms in the system is not practical since there are too many independent variables. On the other hand, in a typical simulation, one only probes an extremely small portion of the potential energy surface. Concurrent coupling allows one to evaluate these forces at the locations where they are needed.
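The contrast between sequential and concurrent coupling can be sketched schematically. Everything here is a stand-in for illustration: `micro_eos` plays the role of an expensive microscale computation (say, an equation of state obtained from kinetic theory), and its formula is arbitrary.

```python
import numpy as np

# micro_eos is a stand-in for an expensive microscale computation of an
# equation of state; the formula itself is an arbitrary illustration.
def micro_eos(rho):
    return rho + 0.1 * rho**2

# Sequential coupling ("parameter passing"): tabulate the constitutive
# relation once, over the whole expected range of states, then
# interpolate during the macroscale computation.
rho_grid = np.linspace(0.5, 2.0, 64)
p_table = micro_eos(rho_grid)

def p_sequential(rho):
    return np.interp(rho, rho_grid, p_table)

# Concurrent coupling: call the microscale model on the fly, only at
# the states the macroscale solver actually visits.
def p_concurrent(rho):
    return micro_eos(rho)
```

The two agree wherever the table is accurate; the trade-off is precomputation over the whole state space (feasible only when the constitutive relation depends on a few variables) versus on-the-fly cost at each query.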

### The two types of multiscale problems

The first type are problems where some interesting events, such as chemical reactions, singularities or defects, are happening locally. In this situation, we need to use a microscale model to resolve the local behavior of these events, and we can use macroscale models elsewhere. The second type are problems for which some constitutive information is missing in the macroscale model, and coupling with the microscale model is required in order to supply this missing information. We refer to the first type as type A problems and the second type as type B problems.

## Examples of multiscale methods

### Car-Parrinello molecular dynamics

To model the classical dynamics of the atoms or nuclei, we need the inter-atomic forces, which arise from the Coulomb interaction between the nuclei and the electrons. This is a type B problem. Knowing the positions of the atoms, we should in principle be able to evaluate the electronic structure and determine the inter-atomic forces. However, precomputing such functions is infeasible due to the large number of degrees of freedom in the problem. Car-Parrinello molecular dynamics (CPMD) (Car and Parrinello, 1985) is a way of performing molecular dynamics with the inter-atomic forces evaluated on-the-fly using electronic structure models such as those from density functional theory.

There are two technical aspects worth mentioning. The first is that the implementation of CPMD is based on an extended Lagrangian framework, in which the wavefunctions for the electrons are treated in the same setting as the positions of the nuclei. In this extended phase space, one can write down a Lagrangian which incorporates both the Hamiltonian for the nuclei and that for the wavefunctions. In this way CPMD looks just like the usual molecular dynamics. The second is the choice of the mass parameter for the wavefunctions. Perhaps the most natural choice would be the mass of an electron. This makes the system stiff, however, since the time scales of the electrons and the nuclei are quite disparate. Since we are only interested in the dynamics of the nuclei, not the electrons, we can choose a value which is much larger than the electron mass, so long as it still gives satisfactory accuracy for the nuclear dynamics.
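The extended-Lagrangian idea can be illustrated with a deliberately simplified toy model. This is not actual CPMD: a single nuclear coordinate `R`, a single fictitious "electronic" coordinate `x` with a small fictitious mass, and a stand-in energy surface `E(x, R)` replace the real electronic structure problem. The point is only that one velocity-Verlet integrator evolves both sets of variables while the extended energy stays (nearly) conserved and `x` adiabatically follows its "ground state" `x = R`.

```python
# Toy extended-Lagrangian dynamics in the spirit of CPMD.  Everything is
# an illustrative stand-in: R is a nuclear coordinate, x a fictitious
# "electronic" coordinate with a small fictitious mass mu, and
# E(x, R) = 0.5*k*(x - R)**2 + 0.5*Omega2*R**2 replaces the electronic
# structure problem (its minimum over x, at x = R, is the "ground state").
M, mu = 1.0, 0.01        # nuclear mass, fictitious electronic mass
k, Omega2 = 50.0, 1.0    # electronic stiffness, nuclear confinement

def forces(x, R):
    fx = -k * (x - R)               # drives x toward its ground state x = R
    fR = k * (x - R) - Omega2 * R   # force on the nucleus
    return fx, fR

def energy(x, vx, R, vR):
    return (0.5 * mu * vx**2 + 0.5 * M * vR**2
            + 0.5 * k * (x - R)**2 + 0.5 * Omega2 * R**2)

# One velocity-Verlet integrator evolves both sets of variables; dt must
# still resolve the fast fictitious frequency sqrt(k/mu).
dt, nsteps = 1e-3, 20000
x, vx, R, vR = 1.0, 0.0, 1.0, 0.0   # x starts at its ground state
fx, fR = forces(x, R)
E0 = energy(x, vx, R, vR)
for _ in range(nsteps):
    vx += 0.5 * dt * fx / mu
    vR += 0.5 * dt * fR / M
    x += dt * vx
    R += dt * vR
    fx, fR = forces(x, R)
    vx += 0.5 * dt * fx / mu
    vR += 0.5 * dt * fR / M

drift = abs(energy(x, vx, R, vR) - E0)   # extended energy is conserved
lag = abs(x - R)                         # x adiabatically follows R
```

A larger fictitious mass would allow a larger time step, at the price of a larger adiabatic lag; that is the trade-off described above.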

### Quantum mechanics - molecular mechanics (QM-MM) methods

When studying chemical reactions involving large molecules, it often happens that the active areas involved in the reaction are rather small. The rest of the molecule serves mainly to provide the environment for the reaction. In this case, it is natural to treat only the reaction zone quantum mechanically, and to treat the rest using a classical description. This is a type A problem. Such a methodology is called the QM-MM (quantum mechanics-molecular mechanics) method (Warshel and Levitt, 1976).

The most important ingredient in such a method is the Hamiltonian which should consist of the quantum mechanical Hamiltonian for the active region, the classical Hamiltonian for the rest as well as the part that represents the interaction between the two regions. Another important ingredient is how one terminates the quantum mechanical region, in particular, the covalent bonds. Many ideas have been proposed, among which we mention the linked atom methods, hybrid orbitals, and the pseudo-bond approach.

### Quasicontinuum (QC) method

The quasicontinuum method (Tadmor, Ortiz and Phillips, 1996; Knap and Ortiz, 2001) is a finite-element-type method for analyzing the mechanical behavior of crystalline solids based on atomistic models. A triangulation of the physical domain is formed using a subset of the atoms, the representative atoms (or rep-atoms). The rep-atoms are selected using an adaptive mesh refinement strategy: in regions where the deformation is smooth, few atoms are selected; in regions where the deformation gradient is large, more atoms are selected; and near defects such as dislocations, typically all the atoms are selected.

The next ingredient in the quasicontinuum method is a simplified way of computing the energy of a trial configuration. This is done based on the following principle: In the smooth region, one uses the Cauchy-Born rule, which is a way of evaluating the continuum stored energy density using the atomistic model. Near defects, one uses the atomistic model. A hybrid of the two approaches is used in between.

In the language used below, the quasicontinuum method can be thought of as an example of domain decomposition methods.

### Macro-micro formulations for polymer fluids

One way of overcoming the difficulties associated with the empirical constitutive relations for modeling polymer fluids is as follows (Bird, Armstrong and Hassager, 1987; Oettinger, 1996). One can model the microscale dynamics of the constituent polymers within the flow, and then evaluate the polymer stress using the microscopic configurations of the polymers. This is referred to as a macro-micro model since the macroscale fluid dynamics affect the dynamics of the polymers by providing the background in which the polymers move, and the microscale polymer configurations affect the flow by supplying the polymer contribution of the stress.

Typically the microscale polymer dynamics is modeled using some coarse-grained empirical models such as the dumbbell model, the bead-and-spring model or the rigid rod model. The polymers obey Brownian dynamics: they are convected, stretched and subject to thermal noise. Their contribution to the stress can be evaluated using an Irving-Kirkwood type of formula, such as the Kramers' expression.
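A rough sketch of the microscale half of such a computation, with the macroscale flow imposed rather than solved for: an ensemble of Hookean dumbbells is evolved by Brownian dynamics in a prescribed shear flow, and the polymer shear stress is evaluated from a Kramers-type formula. All parameter values are illustrative, nondimensionalized so that the equilibrium conformation tensor is the identity.

```python
import numpy as np

# Brownian dynamics of Hookean dumbbells in a *prescribed* simple shear
# flow (in a full macro-micro solver the flow field would be computed
# too).  Nondimensionalized: kT = 1 and the equilibrium conformation
# tensor <Q Q> is the identity; parameter values are illustrative.
rng = np.random.default_rng(0)
lam, gdot = 1.0, 1.0                    # relaxation time, shear rate
kappa = np.array([[0.0, gdot],          # velocity gradient of the flow
                  [0.0, 0.0]])
N, dt, nsteps = 20000, 0.01, 1000       # ensemble size, step, steps

Q = rng.standard_normal((N, 2))         # equilibrium initial ensemble
for _ in range(nsteps):
    drift = Q @ kappa.T - Q / (2 * lam)          # convection + spring
    noise = np.sqrt(dt / lam) * rng.standard_normal((N, 2))
    Q += drift * dt + noise                      # Euler-Maruyama step

# Kramers-type expression for the Hookean polymer stress: tau_p is
# proportional to <Q Q> - I; at steady state <Qx Qy> = lam * gdot.
A = Q.T @ Q / N
tau_xy = A[0, 1]
```

For Hookean springs the steady shear stress is known in closed form (it reproduces the upper-convected Maxwell model), which makes this a convenient sanity check before moving to nonlinear spring laws.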

Even though the polymer model is still empirical, such an approach usually provides a better physical picture than models based on empirical constitutive laws.

Other examples include large eddy simulation of turbulent flows, solvers for stiff ordinary differential equations, techniques such as super-parametrization in climate modeling, and the generalized finite element method, which enriches the finite element space with multiscale trial and test functions.

### Examples of classical multiscale algorithms

Multiscale ideas have also been used extensively in contexts where no multi-physics models are involved.

Multi-grid method (Brandt, 1977). Classically this is a way of solving the system of algebraic equations that arise from discretizing differential equations by simultaneously using different levels of grids. In this way, one can more efficiently eliminate the errors on different scales using different grids. In particular, it is typically much more efficient to eliminate large scale (or smooth) component of the errors using coarse grids.
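The two-grid building block can be sketched concretely for the 1D model problem $$-u'' = f$$ (a standard illustration, not tied to any particular reference): damped Jacobi sweeps remove the oscillatory part of the error on the fine grid, and a coarse-grid correction removes the smooth part that the smoother handles poorly. A full multi-grid method applies this recursively.

```python
import numpy as np

# Two-grid cycle for -u'' = f on (0,1), u(0) = u(1) = 0.
n = 63                       # interior fine-grid points
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
f = np.pi**2 * np.sin(np.pi * x)     # manufactured so u(x) = sin(pi x)
u_exact = np.sin(np.pi * x)

def apply_A(v, h):           # tridiagonal (-1, 2, -1)/h^2 operator
    Av = 2 * v.copy()
    Av[:-1] -= v[1:]
    Av[1:] -= v[:-1]
    return Av / h**2

def jacobi(v, f, h, sweeps, omega=2.0/3.0):   # damped Jacobi smoother
    for _ in range(sweeps):
        v = v + omega * (h**2 / 2) * (f - apply_A(v, h))
    return v

def two_grid(v, f, h):
    v = jacobi(v, f, h, sweeps=3)                         # pre-smooth
    r = f - apply_A(v, h)
    rc = 0.25 * (r[0:-2:2] + 2 * r[1:-1:2] + r[2::2])     # full weighting
    nc, hc = (len(r) - 1) // 2, 2 * h
    Ac = (np.diag(2.0 * np.ones(nc)) - np.diag(np.ones(nc - 1), 1)
          - np.diag(np.ones(nc - 1), -1)) / hc**2
    ec = np.linalg.solve(Ac, rc)                          # coarse solve
    e = np.zeros_like(v)                                  # interpolate
    e[1::2] = ec
    e[::2] = 0.5 * (np.concatenate(([0.0], ec))
                    + np.concatenate((ec, [0.0])))
    return jacobi(v + e, f, h, sweeps=3)                  # post-smooth

v = np.zeros(n)
for _ in range(5):
    v = two_grid(v, f, h)
err = np.max(np.abs(v - u_exact))   # limited only by discretization error
```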

Fast multipole method (Greengard and Rokhlin, 1987). This is a way of summing up long range interaction potentials for a large set of particles. The contribution to the interaction potential is decomposed into components with different scales and these different contributions are evaluated at different levels in a hierarchy of grids.

Adaptive mesh refinement. This is a strategy for choosing the numerical grid or mesh adaptively based on what is known about the current approximation to the numerical solution. Usually one finds a local error indicator from the available numerical solution based on which one modifies the mesh in order to find a better numerical solution.

Domain decomposition (Toselli and Widlund, 2004). The idea is to decompose the whole computational domain into several overlapping or non-overlapping subdomains and to obtain the numerical solution over the whole domain by iterating over the solutions on these subdomains. The domain decomposition method is not limited to multiscale problems, but it can be used for multiscale problems.

Multi-resolution representation. This is a general strategy of decomposing functions or more generally signals into components at different scales. A well-known example is the wavelet representation (Daubechies, 1992).

## General methodologies

Despite the fact that there are already so many different multiscale algorithms, potentially many more will be proposed since multiscale modeling is relevant to so many different applications. Therefore it is natural to ask whether one can develop some general methodologies or guidelines. An analogy can be made with the general methodologies developed for numerically solving differential equations, for example, the finite difference, finite element, finite volume, and spectral methods. These different but also closely related methodologies serve as guidelines for designing numerical methods for specific applications.

Several proposals have been made regarding general methodologies for designing multiscale algorithms. Here we give a brief summary of these attempts.

### Extended multi-grid methods

The traditional multi-grid method is a way of efficiently solving a large system of algebraic equations, which may arise from the discretization of some partial differential equation. In this setting, the effective operators used at each level can all be regarded as approximations of the original operator at that level. In recent years, Brandt has proposed extending the multi-grid method to cases where the effective problems solved at different levels correspond to very different kinds of models (Brandt, 2002). For example, the model used at the finest level might be molecular dynamics or a Monte Carlo model, whereas the effective models used at the coarser levels correspond to some continuum model. Brandt noted that there is no need to have closed-form macroscopic models at the coarse scale, since coupling to the models used at the fine-scale grids automatically provides effective models at the coarse scale. Brandt also noted that one might be able to exploit scale separation to improve the efficiency of the algorithm, by restricting the smoothing operations at fine grid levels to small windows and a few sweeps.

The structure of such an algorithm follows that of the traditional multi-grid method. In a two-level setup, at any macro time step or macro iteration step, the procedure is as follows.

1. Interpolation: An initial condition for the microscale model at the fine grid level is constructed from the current approximation to the macro state variables.
2. Relaxation: The microscale model is used to bring the micro state variables to a local equilibrium at the fine grid level.
3. Restriction: The macro state variables are then extracted from the results of the microscale model.

### The heterogeneous multiscale method (HMM)

In the heterogeneous multiscale method (E and Engquist, 2003), one starts with a preconceived form of the macroscale model, with possibly missing components, and then estimates the needed data from the microscale model.

The general setup is as follows. At the macroscopic level, we have an incomplete macroscale model: $\tag{1} F(U, D) = 0,$

where $$U$$ is the macroscale state variable, $$D$$ denotes the data needed in order for the macroscale model to be complete. For example, for complex fluids, $$U$$ might be the macroscopic velocity field and $$D$$ might be the stress tensor. In addition, we also have a microscopic model: $\tag{2} f(u, d) =0, \quad d= d(U),$

Here the macroscale variable $$U$$ may enter the microscale system via some constraints, and $$d$$ is the data needed in order to set up the microscale model. For example, if the microscale model is the NVT ensemble of molecular dynamics, $$d$$ might be the temperature.

The incomplete macroscale model (1) represents the knowledge we have about the possible form of the effective macroscale model.

The general philosophy of HMM is to solve the incomplete macroscale model by extracting the needed data from the microscale model. The coupling between the macro and micro models is done in such a way that the macro state provides the constraints for setting up the micro model, and the micro model provides the needed constitutive data $$D$$ for the macro model. The two main components of HMM are:

1. A macroscopic solver. Based on whatever knowledge that is available on the possible form of the macroscale model, one selects a suitable macroscale solver. For example, if we are dealing with a variational problem, we may use a finite element method as the macroscale solver.
2. A procedure for estimating the missing macroscale data $$D$$ using the microscale model. This is typically done in two steps:
   1. Constrained microscale simulation: At each point where some macroscale data is needed, perform a series of microscopic simulations which are constrained so that they are consistent with the local value of the macro variable.
   2. Data processing: Use the results from the microscopic simulations to extract the macroscale data needed in the macroscale solver.

HMM has been used on a variety of problems, including stochastic simulation algorithms (SSA) with disparate rates, elliptic partial differential equations with multiscale data, and ordinary differential equations (ODE) with multiple time scales.
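For the ODE case, the HMM structure can be sketched in a few lines. The model below is an illustrative assumption, not an example from the literature: the incomplete macroscale model is $$\mathrm{d}x/\mathrm{d}t = D$$ with $$D = f(x, y)$$ not known in closed form, and the microscale model relaxes the fast variable $$y$$ toward $$x$$ on a time scale $$\varepsilon$$, so the effective macroscale equation is $$\mathrm{d}x/\mathrm{d}t = -x^2$$, with exact solution $$1/(1+t)$$.

```python
# HMM sketch for a stiff slow-fast ODE (illustrative model).
eps = 1e-4

def f(x, y):                  # the "unknown" constitutive relation D = f(x, y)
    return -x * y

def estimate_D(x, y):
    # Constrained microscale simulation: freeze the macro state x and
    # relax the fast variable y over a window of ~10*eps, instead of
    # resolving the eps scale over the whole macro time interval --
    # this is where the computational savings come from.
    dt_micro = eps / 10
    for _ in range(100):
        y += dt_micro * (x - y) / eps
    return f(x, y), y

# Macroscale solver: forward Euler with a step much larger than eps.
dT, nsteps = 0.01, 100
x, y = 1.0, 0.0
for _ in range(nsteps):
    D, y = estimate_D(x, y)   # data estimation from the micro model
    x += dT * D               # macro step using the estimated data

x_exact = 1.0 / (1.0 + dT * nsteps)   # solution of the effective equation
```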

It should be noted that HMM represents a compromise between accuracy and feasibility, since it requires a preconceived form of the macroscale model to begin with. To see why this is necessary, just note that even for the situation when we do know the macroscale model in complete detail, selecting the right algorithm to solve the macroscale model is still often a non-trivial matter. Therefore trying to capture the macroscale behavior without any knowledge about the macroscale model is quite difficult. Of course, the usefulness of HMM depends on how much prior knowledge one has about the macroscale model. In particular, guessing the wrong form of the macroscale model is likely going to lead to wrong results using HMM.

### The equation-free approach

The equation-free approach consists of a set of techniques including coarse bifurcation analysis, projective integrators, the gap-tooth scheme and patch dynamics (Kevrekidis, Gear, Hyman, et al., 2003). Its basic structure is very similar to that of the multi-grid method, except that the equation-free approach often adds a temporal extrapolation step in order to further extend the time scale over which the simulation is carried out. The overall strategy consists of the following steps: lifting (interpolation), evolution (relaxation or equilibration), restriction, and extrapolation.

Take, for example, patch dynamics. The basic idea is to use microscale simulations on patches (which are local spatial-temporal domains) to mimic the macroscale behavior of a system through interpolation in space and extrapolation in time.

Roughly speaking, one might regard HMM as an example of a top-down approach and the equation-free approach as an example of a bottom-up approach. In HMM, the starting point is the macroscale model; the microscale model is used to supply the missing data in the macroscale model. In the equation-free approach, particularly patch dynamics or the gap-tooth scheme, the starting point is the microscale model. Various tricks are then used to entice the microscale simulations on small domains to behave like a full simulation on the whole domain.
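A minimal sketch of a projective integrator, one of the equation-free techniques mentioned above, on an illustrative stiff linear system with a fast mode of rate $$-1000$$ and a slow mode of rate $$-1$$: short bursts of small inner steps damp the fast mode, after which the solution is extrapolated over a large outer step.

```python
import numpy as np

# Projective integration sketch for a stiff linear system.  The inner
# step h resolves the fast scale; after k inner Euler steps the fast
# mode has decayed, and an extrapolation ("projective") step of size M*h
# advances the slow dynamics cheaply.  All parameters are illustrative.
A = np.array([[-1.0, 0.0],
              [0.0, -1000.0]])
u = np.array([1.0, 1.0])
h = 5e-4                 # inner step (stable for the fast mode)
k, M = 8, 60             # inner steps per burst, outer jump = M*h
t = 0.0
while t < 1.0:
    for _ in range(k):   # inner relaxation: fast mode decays
        u = u + h * (A @ u)
    du = h * (A @ u)     # slope estimate from the relaxed state
    u = u + M * du       # projective (extrapolation) step
    t += (k + M) * h

slow_exact = np.exp(-t)  # slow component of the exact solution
```

The choice of `k` and `M` matters: the fast mode must be damped enough during the burst that the extrapolation step does not amplify it.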

## Multiscale analysis

Another important component of multiscale modeling is multiscale analysis. The objective is to derive effective macroscale models from the complicated microscale models. A variety of techniques have been developed for this purpose. These techniques can be broadly divided into two classes: local analysis techniques, such as matched asymptotics, which can be used to analyze localized singularities, boundary layers, internal layers, etc.; and averaging methods, which can be used to extract effective macroscale models by averaging over the small scales.

### Matched asymptotics

Matched asymptotics is a way of extracting the local structure of singularities or sharp transition layers in solutions of differential equations. The idea is to divide the domain of interest into inner and outer regions, and introduce (both dependent and independent) inner variables in the inner region, with the goal that in the new variables, the solutions have $$\mathcal{O}(1)$$ gradients. The inner and outer solutions are then extracted order by order, by solving the leading-order differential equations in the inner and outer regions and matching the approximate solutions over the intermediate region where the inner and outer regions overlap.

A classical example in which matched asymptotics has been used is Prandtl's boundary layer theory in fluid mechanics.
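To make the procedure concrete, here is a standard textbook illustration (not tied to any particular application). Consider $\varepsilon y'' + y' + y = 0, \qquad y(0) = 0, \quad y(1) = 1,$ with $$0 < \varepsilon \ll 1$$. Dropping the $$\varepsilon y''$$ term gives the outer equation $$y' + y = 0$$, and applying the boundary condition at $$x = 1$$ (away from the layer) yields $$y_{\mathrm{out}}(x) = e^{1-x}$$. Near $$x = 0$$ there is a boundary layer of width $$\varepsilon$$; introducing the inner variables $$X = x/\varepsilon$$ and $$Y(X) = y(\varepsilon X)$$, the leading-order inner equation is $$Y'' + Y' = 0$$, so $$Y(X) = A(1 - e^{-X})$$ after imposing $$Y(0) = 0$$. Matching, $\lim_{X \to \infty} Y(X) = \lim_{x \to 0} y_{\mathrm{out}}(x),$ gives $$A = e$$, and the leading-order composite approximation is $y(x) \approx e^{1-x} - e^{1-x/\varepsilon}.$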

### Averaging methods

Averaging methods were developed originally for the analysis of ordinary differential equations with multiple time scales. The main idea is to obtain effective equations for the slow variables over long time scales by averaging over the fast oscillations of the fast variables (Arnold, 1983). Averaging methods can be considered as a special case of the technique of multiple time scale expansions (Bender and Orszag, 1978).

### Homogenization methods

Homogenization methods were first developed for analyzing the effective behavior of partial differential equations with multiscale coefficients representing the material properties of a heterogeneous medium (Bensoussan, Lions and Papanicolaou, 1978; Pavliotis and Stuart, 2008). A typical example is $\nabla \cdot (a^\varepsilon(\mathbf{x}) \nabla u^\varepsilon(\mathbf{x})) = f(\mathbf{x})$ where $$\varepsilon$$ is a small parameter and $$a^\varepsilon$$ represents some multiscale coefficient describing the transport or elastic properties of the medium. In the limit $$\varepsilon \rightarrow 0$$, one would like to replace the above model by $\nabla \cdot (A(\mathbf{x}) \nabla U(\mathbf{x})) = f(\mathbf{x})$ with the requirement that $$U$$ represent the large scale component of $$u^\varepsilon$$. In this case, $$A$$ describes the properties of an effective medium.

Homogenization methods can be applied to many other problems of this type, in which a heterogeneous behavior is approximated at the large scale by a slowly varying or homogeneous behavior.
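In one dimension the effective coefficient is known in closed form: it is the harmonic mean of $$a$$, not the arithmetic mean, and a direct fine-grid solve confirms this. The sketch below uses an illustrative two-phase periodic coefficient; all parameter values are arbitrary.

```python
import numpy as np

# 1D homogenization check: for (a(x/eps) u')' = f with a periodic
# two-phase coefficient (values 1 and 10, equal volume fractions),
# the effective coefficient is the harmonic mean A = 20/11, not the
# arithmetic mean 5.5.
eps = 1.0 / 50
n = 1000                             # fine grid resolving the eps scale
h = 1.0 / n
xi = np.arange(1, n) * h             # interior nodes
rhs = np.ones(n - 1)                 # f(x) = 1

def a(x):
    return np.where((x / eps) % 1.0 < 0.5, 1.0, 10.0)

# Finite-volume discretization of (a u')' = f with u(0) = u(1) = 0.
ai = a((np.arange(n) + 0.5) * h)     # coefficient at cell interfaces
main = -(ai[:-1] + ai[1:]) / h**2
off = ai[1:-1] / h**2
A_mat = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
u = np.linalg.solve(A_mat, rhs)

# Homogenized solutions of (A U')' = 1:  U = (x^2 - x) / (2 A).
A_harm = 1.0 / np.mean(1.0 / np.array([1.0, 10.0]))   # = 20/11
A_arith = np.mean([1.0, 10.0])                        # = 5.5
U_harm = (xi**2 - xi) / (2 * A_harm)
U_arith = (xi**2 - xi) / (2 * A_arith)
```

The fine-grid solution stays within $$\mathcal{O}(\varepsilon)$$ of the harmonic-mean prediction, while the arithmetic mean is off by roughly a factor of three.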

### Hydrodynamic limit

Starting from models of molecular dynamics, one may also derive hydrodynamic macroscopic models for a set of slowly varying quantities. These slowly varying quantities are typically the Goldstone modes of the system. They correspond to long wavelength variations in the problem. For example, the densities of conserved quantities such as mass, momentum and energy densities are Goldstone modes. The equilibrium states of macroscopically homogeneous systems are parametrized by the values of these quantities. When the system varies on a macroscopic scale, these conserved densities also vary, and their dynamics is described by a set of hydrodynamic equations (Spohn, 1991). In this case, locally, the microscopic state of the system is close to some local equilibrium states parametrized by the local values of the conserved densities.

### Renormalization group methods

The renormalization group method is one of the most powerful techniques for studying the effective behavior of a complex system in the space of scales (Wilson and Kogut, 1974). The basic object of interest is a dynamical system for the effective model in which the time parameter is replaced by scale. Therefore this dynamical system describes how the effective model changes as the scale changes.

The renormalization group method has found applications in a variety of problems ranging from quantum field theory, to statistical physics, dynamical systems, polymer physics, etc.

### The Mori-Zwanzig formalism

The Mori-Zwanzig formalism (Zwanzig, 1960; Mori, 1965) is a general strategy for eliminating degrees of freedom in a system. The basic idea can be understood from the following simple example:

Consider a linear system $\begin{aligned} \frac{\mathrm{d} p}{\mathrm{d} t} &= A_{11} p + A_{12} q,\\ \frac{\mathrm{d} q}{\mathrm{d} t} &= A_{21} p + A_{22} q. \end{aligned}$ Our objective is to obtain a closed model for $$p$$, which is the quantity that we are really interested in. To this end, we solve the second equation for $$q$$, viewing $$p$$ as a parameter: $q(t) = e^{A_{22} t} q(0) + \int_0^t e^{A_{22}(t-\tau)} A_{21} p(\tau) \,\mathrm{d}\tau.$ Substituting this into the first equation, we obtain $\tag{3} \begin{aligned} \frac{\mathrm{d} p}{\mathrm{d} t} &= A_{11} p + A_{12}\int_0^t e^{A_{22}(t-\tau)}A_{21}p(\tau) \, \mathrm{d}\tau + A_{12}e^{A_{22}t} q(0)\\ &= A_{11} p + \int_0^t K(t-\tau) p(\tau) \,\mathrm{d}\tau + f(t), \end{aligned}$

where $$K(t) = A_{12}e^{A_{22}t} A_{21}$$ is the memory kernel and $$f(t) = A_{12}e^{A_{22}t} q(0)$$ is the so-called noise term, if we think of $$q(0)$$ as a random variable. Note that it does make sense to think of $$q(0)$$ as a random variable since we are not really interested in the details of $$q$$. Eq. (3) is in the form of a generalized Langevin equation, with a memory term and a noise term.

The main ideas behind this procedure are quite general and can be carried over to general linear or nonlinear models. The procedure allows one to eliminate a subset of degrees of freedom, and obtain a generalized Langevin type of equation for the remaining degrees of freedom. However, in the general case, the generalized Langevin equation can be quite complicated and one needs to resort to additional approximations in order to make it tractable.
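The scalar case of the linear example above can be checked numerically: integrating the reduced equation (3), memory term and all, reproduces $$p(t)$$ from the full system up to discretization error. The coefficient values below are arbitrary illustrative choices.

```python
import numpy as np

# Numerical check of the Mori-Zwanzig reduction for scalar coefficients:
# forward-Euler integration of the reduced equation (3), with the memory
# integral discretized by a Riemann sum, matches p(t) from the full
# two-variable system up to O(dt).
A11, A12, A21, A22 = -1.0, 0.5, 0.3, -2.0
p0, q0 = 1.0, 0.5
dt, nsteps = 1e-3, 2000
t = np.arange(nsteps + 1) * dt

# Full system, forward Euler.
p_full = np.empty(nsteps + 1)
p_full[0], q = p0, q0
for n in range(nsteps):
    p, qn = p_full[n], q
    p_full[n + 1] = p + dt * (A11 * p + A12 * qn)
    q = qn + dt * (A21 * p + A22 * qn)

# Reduced equation: dp/dt = A11 p + int_0^t K(t - s) p(s) ds + f(t),
# with K(t) = A12 exp(A22 t) A21 and f(t) = A12 exp(A22 t) q(0).
K = A12 * np.exp(A22 * t) * A21      # memory kernel
noise = A12 * np.exp(A22 * t) * q0   # "noise" term (q(0) known here)
p_red = np.empty(nsteps + 1)
p_red[0] = p0
for n in range(nsteps):
    mem = dt * np.dot(K[n:0:-1], p_red[:n]) if n > 0 else 0.0
    p_red[n + 1] = p_red[n] + dt * (A11 * p_red[n] + mem + noise[n])

err = np.max(np.abs(p_full - p_red))
```

Note the cost of the reduced model: the memory integral makes each step depend on the entire history, which is why additional approximations of the kernel are needed in practice.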

## Concluding remarks

Even though multiscale modeling has already had great success and holds even greater promises in a wide variety of areas, many challenges remain in order to make it into a solid scientific discipline. These challenges include:

1. Mathematical study of the multiscale models. Are these models well-posed? Are they consistent with the full microscale model?
2. Adaptivity. For type A problems, we need to decide where fine scale models should be used and where macro-scale models are sufficient. This is adaptivity at the level of models. It requires developing new styles of error indicators to guide the refinement algorithms.
3. Effect of noise. It is often the case that microscale models contain noise or fluctuations and conventional macroscale models do not. This does not necessarily mean that noise can be neglected at the macro level. Assessing and accounting for the effect of noise is a major issue that needs to be better understood.