
Discrete Exterior Calculus in Machine Learning


Discrete Exterior Calculus in Machine Learning

Alexander Schier
Born on March 25, 1986 in Cologne

January 31, 2014

Diploma thesis in Computer Science
Supervisor: Prof. Dr. Michael Griebel
Institute for Numerical Simulation
Faculty of Mathematics and Natural Sciences at the Rheinische Friedrich-Wilhelms-Universität Bonn


Declaration

By submitting the diploma thesis, I affirm, in accordance with § 20 paragraph 6 of the diploma examination regulations of March 14, 2003, that I have written the thesis independently, have not used any sources and aids other than those specified, and have marked citations.

Bonn, February 3rd


Contents

1 Introduction
2 Graphs
  Introduction
  Formal definition
  Storage of graphs
  Algorithms for graph problems
  Graphs from points
  The graph Laplace operator
3 Discrete Exterior Calculus (DEC)
  Introduction
  Differential forms
  Discretization
  Storage of objects
  Important operators
  The mesh
  The dual mesh
  Laplace operator
  Sharp and flat operators
  Data structures
  Efficient calculation
  Mesh refinement
  High-dimensional simplex complexes
  Applications: Darcy flow
4 DEC results
  Heat conduction
  Heat conduction on tetrahedron meshes
  Non-uniformly distributed thermal conductivity
  Curvature of the unit sphere
  Darcy flow

5 Machine Learning
  Introduction
  Concepts
  Learning with kernels
  Unsupervised Learning
  Semi-Supervised Learning
  Differences in Laplace Operators
  Results in Machine Learning
  Learning of functions
  Classification of 2D data
  Classification of digits from the MNIST data set
  Recognition of handwritten digits
  Further applications
6 DEC and machine learning
  Introduction
  Applications
  The Laplace kernel
  Equivalence of the functional norm and the Laplace seminorm
  DEC-Laplace vs. Graph-Laplace
  A comparison of the Laplace kernels
  Result of the comparison
  Learning from DEC calculations
7 Summary

1 Introduction

Nowadays machine learning methods can be found in many areas, often without the user being aware of it, apart from the fact that some things just work better. Anyone who receives email no longer has to sort out unwanted advertising themselves, as this task is taken over by an automatic SPAM filter, which saves the user time and nerves. When entering text on a smartphone, intelligent word completions are suggested that fit the context, sometimes even a suitable next word before the first letter is typed. Character recognition is used in many areas: letters no longer have to be sorted by hand at the post office, since postcodes can now be recognized very well by computers, and when books are digitized a large part of the text can be converted automatically. Even in speech recognition, syllables are automatically recognized by computers and assembled into words, so that today it is even possible to dictate entire texts to the computer, which are recognized with good accuracy. All of these problems have one thing in common: to adjust these systems by hand so that they fulfill their function well would be time-consuming and complicated work for a person, possible only on the basis of trial and error. On the other hand, many correct examples of these problems are known. Anyone who has previously sorted advertising out of their emails themselves has many examples of emails that they would like to receive and of those that should be sorted out. Once a book has been digitized, the original scan and the associated text are known, and with handwriting recognition it is no problem to collect handwriting samples from different people. If enough of this data is available, a machine learning method makes it possible to learn the function that describes the data with the help of this sample data. For this purpose, the computer can select the best from a large number of possible functions which are candidates for solving the problem. The quality of such a candidate can easily be calculated by comparing the known solution with the calculated solution. A learning method thus searches a large set of functions for the function which makes as few errors as possible on the known data. The real challenge with such a learning process is not to learn the training data, but to deliver a good result later on previously unknown data, such as a book that has not yet been digitized. A mere evaluation based on the error on the training data is therefore not sufficient; the function

must also be optimized to make few mistakes when working with new data. The problem can easily be imagined by thinking of a function that uses a table with the known data as its solution. Such a function can find every entry from the training data in the table and provide the correct solution, but with such a solution there are no clues as to what the solution looks like on data that are not in the table. It is therefore desirable that the calculation is based on as few features of the data as possible, so that the recognition is robust against insignificant changes in the features. For example, the digit eight should not be captured by its individual pixels, since it can then no longer be correctly recognized if the writing is somewhat skewed. Instead, for example, a feature such as "contains two circles" could be used, with which a function can distinguish the eight from other digits even if it is indistinctly written. The approach of keeping the solution as simple as possible is called Occam's Razor [BEHW87], a metaphor for cutting off unnecessary complexity as if with a razor, so that only the core of the thesis remains. This principle of economy, which was originally a statement about different scientific explanations of the same data, plays a role in the learning of functions, since a machine-learned function also represents an explanation of the learned data. In machine learning, compliance with this principle is implemented by penalizing not only the error but also the complexity of the function that explains the data, whereby simpler functions can be preferred over complex ones, even if a complex function has a slightly smaller error. For example, a function that recognizes the eight using such a simple description will no longer recognize a really smeared digit from the training data, but will allow better recognition of clearly written digits when applied later to unlearned data. The trade-off between how big the error may be during training and how complex the function may become can be balanced with a regularization factor. It is possible to further improve this regularization with the help of so-called semi-supervised learning by adding further data without a known solution to the existing training data. Such data are often available in large quantities, for example as a number of not yet digitized book scans. This data cannot be used to measure the error of the function during learning, but it can be used to better classify the training examples. For example, handwritten digits without a known solution enable a grouping of the digits, so that when learning the training examples with a sample solution, a specific digit can be assigned to a whole group as the solution. Technically, the input data are modeled as high-dimensional points, and commonalities between the points mean that they lie on a common manifold (a shape in space). For example, if an input has three properties during learning, it can be viewed as a point that lies within a cube.

If different data are similar, the corresponding points are close together; if they are different, they have a greater distance. A classification, for example according to different digits, tries to recognize these groups and to separate them from one another. A group defines, for example, the characteristic "the digit is an eight"; if a point lies on this group's side of the boundary to the other groups, it probably represents an eight. An operator that describes the similarity within such a group and enables learning from these points is the Laplace operator. It can be calculated in various ways from the training points and the given points without a training solution, and can be used to regularize the complexity. Models in the computer are also described with points in three-dimensional space, with which, for example, the figures in animated films are calculated. Such models are not only used for representation, they are also used to calculate simulations. Nowadays, before the actual crash test of a car prototype, a simulation can be carried out in a kind of virtual laboratory, with which some technical defects can be identified before the actual test, in order to change the corresponding technical details beforehand. With such simulations fewer tests are required, which saves the automotive company time and money, as fewer real prototypes have to be built. One of the challenges of such a simulation is that the modeling leads to errors. For example, the hood of a car is curved, which means that it cannot be represented exactly in the computer with a finite number of points and triangles, which is why every model can only be an approximation of reality. Especially when modeling physical processes, there are different approaches to deal with this simplification, which differ in where errors occur and how they propagate in further calculations. One approach for such simulations is the Discrete Exterior Calculus (DEC), which allows abstract calculations on objects such as triangles, edges and points. For many operations, only the relationship between the objects is used, without including concrete values such as the position of the points in space. The values are only added after the abstract calculations, which means that the error due to the simplified model only flows into the calculation at this point. The elements of the model take on tasks that seem intuitively suitable. The change in the value between two points is defined on the edge that connects these two points, and in a two-dimensional flow simulation, the flow of water from a triangle over an edge into an adjacent triangle is stored on this edge. The fact that the operators are all defined on the relationships between objects, without numerical values having to be used to apply them, means mathematically that important integral theorems, such as Green's and Stokes' theorems, hold exactly and that no errors occur at this point which would propagate into further calculations.

In addition to solving PDEs in order to carry out physical simulations on models, the DEC also provides tools for machine learning by defining a Laplace operator. The Laplace operator of the DEC has the advantage that it is uniquely determined by the given model and is very well adapted to the data of the model, while other Laplace operators in machine learning are generated from the points, and the quality of the result depends heavily on the type of generation. The combination of the two approaches is that the DEC can be used to calculate a simulation on a 3D model, from which a function of the results is learned, with which it is possible to calculate data on points that were not simulated. For example, if a simulation on an exact model takes too long, the simulation can be calculated on a somewhat coarser model and the result can then be used as training data in machine learning to interpolate the values on the points of the fine model. The advantage of regularization with the DEC-Laplace is that points that are close to each other but are not connected in the model do not influence each other during learning. If one calculates the heat conduction on a bent sheet of metal, in which two opposite ends come close without touching, then the heat is mainly transferred along the sheet, since the heat conduction through the air is significantly less than that of the metal. In a typical learning process, the points close to the two ends of the sheet would have a strong influence on one another, while in the simulation they have only very little influence on one another. If the DEC-Laplace calculated from the model itself is used for regularization, the learning process also uses the shape of the model and thus calculates a function that does not have this problem.

Own contributions

This work builds on the work of Ulrike von Luxburg [VL07] on the Graph-Laplace, Mikhail Belkin [BN04] on semi-supervised learning and Anil N. Hirani [Hir03] on Discrete Exterior Calculus, and extends the topics described there, DEC and semi-supervised learning, to consider a combination of both methods. Own contributions to this work are:

Evaluation of semi-supervised learning with a graph Laplacian generated from the data.

Calculation of example functions with my own implementation of semi-supervised learning.

Comparison of a graph Laplace operator with the Laplace operator of the Discrete Exterior Calculus.

Calculation of different PDEs with my own implementation of a DEC framework.

Consideration of various possibilities to construct graphs and simplex complexes from high-dimensional points.

Use of the DEC Laplace operator for semi-supervised learning of data on meshes.

Structure of the thesis

This thesis is divided into four sections, each of which explains the theory of the methods used and then shows results calculated with my own implementations:

Graphs: Repetition of the most important graph definitions used in the DEC, construction of graphs from (high-dimensional) points and finally the definition of a graph Laplace operator.

Discrete Exterior Calculus: Description of the Discrete Exterior Calculus, definition of efficient data structures and methods for the implementation of the DEC, presentation of possibilities to generate simplex complexes from high-dimensional data, application of the DEC to solving PDEs of various physical problems, and presentation of solutions that are based on an own implementation.

Machine learning: Presentation of semi-supervised learning with the help of kernels and regularization using a Graph-Laplace operator, consideration of the differences between Graph-Laplace and DEC-Laplace operators, and the results of sample calculations with a self-implemented program for learning from training data and data points with no known training value.

DEC and machine learning: Presentation of a Laplace kernel built from the eigenvalues and eigenvectors of a Graph-Laplace operator, comparison of the Graph-Laplace kernel from a generated graph with the DEC-Laplace operator, and learning of data on three-dimensional models with the help of semi-supervised learning with a Laplace regularization.


2 Graphs

2.1 Introduction

Graph theory makes it possible to model relationships between objects, whereby there are no special requirements for the number of objects in a graph. Illustrative examples of sets on which graphs can be used to calculate interesting things are cities, groups of people, websites and books (e.g. who cites whom). In addition to the set of objects there is a set of relationships, to which different requirements can apply depending on the graph. An introduction and elementary algorithms on graphs can be found in [BM76] and [Kön01]. At this point, some basic definitions and algorithms are briefly stated which are relevant for the further parts of this work.

2.2 Formal definition

A graph G = (V, E) has a set of objects V (the nodes of the graph) and a set E of relationships (the edges of the graph) between two objects. The edges are a subset of V × V, whereby further restrictions apply to certain graphs. Different ways of defining the set of edges result in different types of graphs that are used for different purposes. The edge set stores tuples of two nodes [v_i, v_j] with v_i, v_j ∈ V. The node set is an ordered list of the objects that form the nodes of the graph, whereby only the index and possibly the weighting of a node is necessary for graph algorithms.

Complement: For a subset A ⊆ V of the node set V, the complement is defined as Ā := V \ A. In particular, the complement of the empty set is V, and the complement of V is the empty set.

Directed and undirected graphs: A graph with an edge set E = {[v_1, v_2] : v_1, v_2 ∈ V} with [v_1, v_2] ≠ [v_2, v_1], in which the order of the nodes of an edge is stored unambiguously, is called directed. An undirected graph has an edge set E = {{v_1, v_2} : v_1, v_2 ∈ V}, for which edges are given as a set {v_1, v_2}. Such a set defines only a connection between the nodes v_1, v_2, but no direction.

Notation for edges: We write an edge with vertices v_1, v_2 for directed graphs as [v_1, v_2], and for undirected graphs as the set {v_1, v_2}. If it is irrelevant whether the graph is directed, we may also write [v_1, v_2], since the edge can be viewed in either orientation. Let e = [v_1, v_2] ∈ E be an edge and v ∈ V be a node; then we define v ∈ e as the relation that v is an end point of the edge, i.e. e = [v_1, v_2] ∈ E such that v = v_1 or v = v_2. A directed edge e points from v_1 to v_2, and we call v_1 the starting point and v_2 the end point of the edge.

Weights and node degree: A graph with edge weights assigns a weight to each edge. This can be stored, for example, by an expanded set of edges E ⊆ V × V × R or by a function w: E → R. A weight of 0 means that the edge does not exist. Nodes can also be assigned weights, which can e.g. be defined by a mapping V → R. The node degree is defined as the number of edges e that have the node v as one of their endpoints, i.e. degree(v) = |{e ∈ E : there is a w ∈ V with e = {v, w}}|. For a directed graph we define the entry degree of a node v as the number of edges e = [v_1, v_2] with v = v_2, and the exit degree as the number of edges e = [v_1, v_2] with v = v_1.

Paths: Two nodes v_i, v_j are connected by a path if there are edges [v_i, v_{i+1}], ..., [v_{j−1}, v_j] such that the nodes are connected. If the graph is directed, the edges must be traversed according to their orientation. Path(v, w) denotes an (arbitrary) path from v to w. The length of a path is the sum of the edge weights of the edges traversed. If the graph is unweighted, we take the edge weights as 1 and the length corresponds to the number of edges. A path that ends at the same node at which it begins is called a circle. A graph that does not contain any circles is called circle-free. The complete graph of a vertex set V has an edge between every pair of vertices, i.e. {v_1, v_2} ∈ E for all v_1, v_2 ∈ V. A node has a loop (self-edge) if there is an edge e ∈ E with e = (v_1, v_2), v_1 = v_2 ∈ V. If there are no loops in a graph, then E ⊆ {(v_1, v_2) : v_1, v_2 ∈ V, v_1 ≠ v_2}.

Connected components: A subset of nodes C ⊆ V is a connected component (ZHK) of G if all nodes within the ZHK are connected by paths and there is no path

that connects a node outside the ZHK with a node in the ZHK:

∃ Path(v, w) for all v, w ∈ C, and there is no Path(v, w) with v ∈ C, w ∉ C.

Tree: A tree is a connected circle-free graph. Nodes of degree 1 are called leaves; all other nodes are internal nodes. A directed tree is a directed circle-free graph in which exactly one node has entry degree 0 and the entry degree of all other nodes is 1. The node with entry degree 0 is called the root of the tree; all nodes with exit degree 0 are called leaves. Remark: The fact that the graph is connected follows from the fact that it is circle-free and has exactly one root. A minimal spanning tree of an undirected connected graph is a subgraph that forms an undirected tree on the graph's nodes with a minimal sum of the edge weights. A spanning tree thus contains the minimum number of edges that are necessary for the set of nodes to result in a connected graph.

Examples

Undirected graphs: An example of an undirected graph with edge weights is a city network, as shown in Figure 2.1. An edge corresponds to the connection of two cities by a street. It is assumed that the streets are not one-way streets, which would correspond to directed edges.

Directed graphs: An example of a directed graph is the web graph, which models the link structure between different websites. An example of such a link structure can be seen in Figure 2.1, which shows a graph that models the structure of links between different domains. If links from a website to itself are also counted, the graph can contain loops. If each hyperlink is captured individually, one obtains a graph with multiple edges, since a page can link to another page with several links.

Trees: An example of a (directed) tree is the directory structure of a computer. Each folder and each file is in exactly one folder and therefore has entry degree 1, except for the root directory /, which is the root of the tree.
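To make the spanning-tree definition concrete, here is a minimal sketch of computing a minimal spanning tree with SciPy. The adjacency matrix and its weights are illustrative values loosely modeled on the city network of Figure 2.1; they are my own assumption, not data from the thesis.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

# Symmetric adjacency matrix of a small weighted undirected graph;
# a weight of 0 means "no edge", as in the definition above.
W = np.array([
    [0, 20, 20,  0],   # A-City
    [20, 0,  0, 30],   # B-City
    [20, 0,  0, 15],   # C-City
    [0, 30, 15,  0],   # D-City
], dtype=float)

# SciPy returns the MST as a sparse matrix containing only the kept edges.
mst = minimum_spanning_tree(csr_matrix(W))
print(mst.toarray())   # the kept edges with their weights
print(mst.sum())       # total weight, 55.0 for this example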

Fig. 2.1: Left: A road network between cities is an undirected graph whose edge weights are the lengths of the routes. Right: Example of a directed graph: the link structure between different Internet domains.

Fig. 2.2: The Linux directory structure is an example of a tree.

2.3 Storage of graphs

Fig. 2.3: The adjacency matrix for the city graph from Figure 2.1, with node list V = [v_1 = A-City, v_2 = B-City, v_3 = C-City, v_4 = D-City].

The storage of the nodes can be done with a simple list, and each node receives its position in the list as a unique index. Any node weights can be stored in a list with the same index set. Depending on the application, various structures are available for storing the set of edges, including a list of the edges [v_1, v_2, weight], or an implicit storage in which a list of the outgoing and/or incoming edges is stored for each node, E_v = ((v_1, edge weight), (v_2, edge weight), ...). To store a graph with edge weights and without multiple edges, an adjacency matrix A can also be defined, in which the weight of the edge [v_i, v_j] is at position A_ij. If the graph is undirected, the matrix A is symmetric, and if the graph is not weighted, then A_{v_1,v_2} ∈ {0, 1}. We use the adjacency matrix as defined above for the following algorithms, since they can be implemented efficiently with it. An example of such an adjacency matrix for the graph from Figure 2.1 is shown in Figure 2.3.

2.4 Algorithms for graph problems

Depth-/breadth-first search: A depth-first or breadth-first search searches for nodes in a tree, starting from the root. The depth-first search first branches into the subnodes of each node, while the breadth-first search first processes all neighboring nodes.

Dijkstra's algorithm: A greedy algorithm to find all shortest paths in a graph with non-negative edge weights. A formal description can be found in [BM76, page 19ff]. A naive implementation with a list results in a running time of O(|V|² + |E|); with a Fibonacci heap the running time can be reduced to O(|V| log |V| + |E|) [FT87]. An efficient calculation is also possible with the help of dynamic programming [Sni06].
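As an illustration of how the adjacency matrix feeds directly into such algorithms, the following sketch runs Dijkstra's algorithm via SciPy on a small weighted adjacency matrix; the matrix values are made up for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Adjacency matrix as defined above: A[i, j] holds the weight of [v_i, v_j].
A = csr_matrix(np.array([
    [0, 20, 20,  0],
    [20, 0,  0, 30],
    [20, 0,  0, 15],
    [0, 30, 15,  0],
], dtype=float))

# Shortest distances from node 0 to all other nodes.
dist = dijkstra(A, directed=False, indices=0)
print(dist)  # [ 0. 20. 20. 35.]
```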

Bellman-Ford algorithm: The Bellman-Ford algorithm calculates the shortest paths from a starting node and, in contrast to Dijkstra's algorithm, can also deal with negative edge weights. The runtime of the algorithm is O(|V| |E|) to find the paths from one start node to all others. For all shortest paths we accordingly get O(|V|² |E|). A formal description can be found in [Bel56].

Kruskal's algorithm: The Kruskal algorithm calculates a minimal spanning tree as defined in Section 2.2. From the set of edges, the algorithm selects the minimal edge that does not form a circle with the previously chosen edges, until the spanning tree has been constructed (and thus every additional edge would create a circle). The construction is not deterministic, but every result is a minimal spanning tree, i.e. the sum of the edge weights is the same. If all edge lengths are different, the spanning tree is uniquely defined; otherwise there can be different spanning trees in which the sum of the edge weights is the same. The running time is O(|E| log* |V|), where log* is the minimum number of applications of the logarithm required for log(log(... log(n))) ≤ 1 to hold. The formal definition can be found in the original paper [Kru56].

Other interesting graph problems: One of the most popular graph problems is the traveling salesman problem [Flo56]. The task is to determine the shortest round trip through a set of cities that can be visited in any order. The problem can be formulated so that in a complete graph, whose edge weights represent the lengths of the paths, the shortest path from a node to itself is sought which contains every node v ∈ V. The problem is NP-complete [Pap77] and therefore it is not known whether it is solvable in polynomial time. There is a simple approximation that constructs a path that is at most twice as long as the optimal solution. For this purpose, the minimal spanning tree of the points is constructed and then traversed with a depth-first search. Each edge is visited twice and the tour begins and ends at the root. Since the spanning tree minimizes the sum of the edge weights, the path is at most twice as long as the optimal solution.

Pagerank: A prominent graph algorithm is the PageRank algorithm, described by Page, Lawrence, et al. in [PBMW99], which models the World Wide Web as a directed graph. The nodes correspond to individual web pages and the edges to the hyperlinks between different web pages. The weight of an edge from a node v to a node u is calculated from the quotient of the (known) pagerank of the website v and the number of pages

linked from v. The pagerank of the website u can then be calculated as a sum over all v that link to the website u. The calculation of the pagerank from a given graph is a recursive problem which, with an arbitrarily chosen set of node weights as starting condition, converges to a unique solution for the node weights (the pagerank) and the edge weights; in general, the choice of the starting condition only affects the convergence rate, not the solution. Pagerank can be implemented efficiently as repeated matrix-vector multiplication of a stochastic matrix with the current pagerank values [Hav99]. The pagerank is one of the most important factors in the ranking of the search engine Google today.

2.5 Graphs from points

In many cases there is a set of points V, for example from a measurement, but no associated set of edges. In order to work with graph algorithms on such a set, one must first define an edge set. In the following we assume that a distance function d: V × V → R⁺ is given. One can then construct a graph that connects similar points according to this distance measure. There are different approaches as to which points should be connected, such as a minimal spanning tree, the r-graph, the knn-graph and the relative neighborhood graph [Tou80, MVLH08].

Methods for graph construction

The complete graph: A very easy choice is the complete graph, in which the relationships between the nodes are defined only by the similarity function.

The r-graph: The r-graph (sometimes also called ɛ-graph in the literature) has a radius r which defines whether points are considered adjacent: the points v_i, v_j are connected if d(v_i, v_j) ≤ r. Here r can be freely selected or, if necessary, calculated from the data; it would be conceivable, for example, to use a function of the minimum, average or maximum distance between two points as the basis for calculating r.

k nearest neighbors: The knn-graph connects a point with its k nearest neighbors. A variant is the symmetric (mutual) knn-graph, which connects two points only if both have the other in the set of their k nearest neighbors.
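The r-graph and knn-graph constructions described above can be sketched in a few lines of NumPy. The function names and the random test points are my own illustrative choices, not code from the thesis.

```python
import numpy as np
from scipy.spatial.distance import cdist

def r_graph(points, r):
    """Adjacency matrix of the r-graph: connect points with distance <= r."""
    D = cdist(points, points)
    W = (D <= r).astype(float)
    np.fill_diagonal(W, 0.0)  # no loops
    return W

def knn_graph(points, k, mutual=False):
    """Adjacency matrix of the knn-graph (mutual=True gives the
    symmetric/mutual variant described above)."""
    D = cdist(points, points)
    np.fill_diagonal(D, np.inf)            # a point is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :k]     # k nearest neighbors per point
    neighbor = np.zeros_like(D, dtype=bool)
    rows = np.repeat(np.arange(len(points)), k)
    neighbor[rows, idx.ravel()] = True
    # mutual: both must list each other; otherwise: at least one of them
    W = (neighbor & neighbor.T) if mutual else (neighbor | neighbor.T)
    return W.astype(float)

pts = np.random.rand(30, 2)
print(knn_graph(pts, k=3).sum(axis=0))  # node degrees of the knn-graph
```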

Minimal spanning tree: A minimal spanning tree, as defined in Section 2.2, is also a possible graph that can be generated from a point set.

Weighting: The weighting of the edges of such a graph can also be defined in different ways. The simplest possibilities are:

Binary weighting: if two points are connected, the weight is 1, otherwise 0.

The weight is (inversely) proportional to the distance.

The weight is (inversely) proportional to the number of neighbors of a node.

A typical example for the weighting of an edge between two points by their distance is w(x_i, x_j) := exp(−‖x_i − x_j‖² / (2σ²)). Due to the Gaussian bell, weights of edges between points that are close differ significantly more from one another than weights of edges between points that are far apart. The disadvantage of this approach, e.g. in the case of a complete graph, is that the matrix is fully populated. The other options for generating a graph usually result in more sparsely populated matrices, which allow faster calculations. In [VL07], von Luxburg recommends the k-nearest-neighbors graph as the first choice for most applications, among other things because it is less problematic than other graphs when parameters are poorly chosen.

Examples: The graphs generated from the points in Figure 2.4 using these methods are shown in Figures 2.5 and 2.6. For the sake of clarity, the complete graph is not shown.

Fig. 2.4: The points from which a graph is to be created.

Fig. 2.5: Left: A minimal spanning tree of the points. Right: An r-graph of the points.

Fig. 2.6: Left: A knn-graph with k = 3. Right: A symmetric knn-graph with k = 3.

Significance of the graph construction: As the examples show, depending on the construction, graphs arise that differ greatly from one another. The influence of the graph construction on the

results of, for example, cluster algorithms is shown in [VL07, Section 8.1] and examined more closely in [MVLH08]. There it is shown that the choice of the graph can have a great influence on the result. The choice of parameters also has a great influence on the result; especially with the r-graph it can be difficult to find a radius that connects neither too many nor too few points with each other. The knn-graphs are more flexible because the number of neighbors does not depend on the distance in space. With the complete graph and the minimal spanning tree there are no parameters that control connectivity; there, only the calculation of the edge weights influences the result. In the case of a complete graph, the edge weights are the only information about the position of the points. In the example graphs one can see that completely different connected components arise depending on the algorithm with which the graph was generated. The node degrees also differ significantly.

2.6 The graph Laplace operator

The Graph-Laplace is a discretization of the Laplace operator on the points of a graph. This is particularly interesting for graphs that represent meshes (e.g. triangulations of an area), but the operator is defined for all simple graphs. While the Laplace operator is uniquely defined in continuous space, there are many different discretizations of the operator, which have different advantages for different purposes, but also specific disadvantages, see [WMKG07].

The two-dimensional finite difference Laplace: A frequently used discrete Laplace operator for regular grids is the five-point star of the finite difference Laplace operator in two dimensions [Mit]. It looks at the neighboring points on the grid, adds the values of these four points and then subtracts

four times the value of the middle point. That means for a function φ: R × R → R:

(Δφ)(x, y) ≈ φ(x−1, y) + φ(x+1, y) + φ(x, y−1) + φ(x, y+1) − 4 φ(x, y)

Fig. 2.7: The 5-point star of the finite difference Laplace operator.

This discrete Laplace operator works on regular grids and uses exactly these 5 points.

A general graph Laplace operator

We define the non-normalized graph Laplace as in [VL07] and also use the same notation. We will not consider the two normalized graph Laplacians here, but it should be noted that all three graph Laplacians are very similar if the node degrees deviate little from each other. If these deviations are greater, however, there are clear differences [VL07, 8.5]. Note: The graph Laplace L defined there has a negative sign; in contrast to e.g. the finite difference Laplace operator, the values on the diagonal are positive and those of the neighboring points are negative. Let G = (V, E) be a simple graph with non-negative edge weights, and W its adjacency matrix, which stores the weight w_ij of the edge e_ij = [i, j] at position (i, j). Since the graph is undirected, the matrix W is symmetric, i.e. w_ij = w_ji for all i, j ∈ V.

Definition 2.1. The degree matrix D of the node degrees is a diagonal matrix:

D_ij = deg(v_i) if i = j, and 0 otherwise   (2.1)

Definition 2.2. The non-normalized graph Laplace is L := D − W.
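A minimal sketch of Definition 2.2 in NumPy, using the weighted node degrees (the row sums of W) as the diagonal of D, as in [VL07]; the example matrix is made up.

```python
import numpy as np

def graph_laplacian(W):
    """Non-normalized graph Laplacian L = D - W for a symmetric weight
    matrix W. Loops on the diagonal of W cancel out of L (see below)."""
    D = np.diag(W.sum(axis=1))   # weighted node degrees
    return D - W

# Unweighted example: edge weights 1 where nodes are connected.
W = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
L = graph_laplacian(W)
print(L)
print(L.sum(axis=0))  # all row and column sums are 0: [0. 0. 0.]
```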

Comment: Loops do not flow into the Graph-Laplace. The fact that loops have no influence on the Graph-Laplace can easily be seen from the construction of the entries l_ii from w_ii and d_ii:

l_ii := d_ii − w_ii = (Σ_{j=1}^n w_ij) − w_ii = (w_i1 + ... + w_ii + ... + w_in) − w_ii

l_ii = Σ_{j=1, j≠i}^n w_ij   for all i ∈ {1, ..., n}.

Since the entry w_ii cancels, possible loops do not change the matrix L.

Example: Figure 2.8 shows a simple example with 8 nodes and two connected components. We take the edge weights for the unweighted graph as 1.

Fig. 2.8: Example graph for the Graph-Laplace.

The matrices W, D and L := D − W for the graph in Figure 2.8 are shown in equations (2.2) and (2.3).

Properties of the Graph-Laplace

Let L ∈ R^{n×n} be such a Graph-Laplace matrix. Then:

1. All row and column sums of the matrix L are 0.
2. For every vector f ∈ R^n: f^T L f = (1/2) Σ_{i,j=1}^n w_ij (f_i − f_j)².
3. L is symmetric and positive semi-definite.
4. The smallest eigenvalue is 0 and the associated eigenvector is the all-ones vector.
5. The eigenvalues of L are 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n with λ_i ∈ R.

Proof (properties of L):

1) follows from the fact that D_ii is the i-th row/column sum of W.

2) Let f ∈ R^n. Then:

f^T L f = f^T (D − W) f = f^T D f − f^T W f   (2.4)
        = Σ_{i,j=1}^n f_i D_ij f_j − Σ_{i,j=1}^n f_i w_ij f_j   (2.5)
        = Σ_{i=1}^n f_i² D_ii − Σ_{i,j=1}^n f_i w_ij f_j   (2.6)
        = (1/2) (2 Σ_{i=1}^n f_i² D_ii − 2 Σ_{i,j=1}^n f_i w_ij f_j)   (2.7)
        = (1/2) (Σ_{i=1}^n f_i² D_ii + Σ_{j=1}^n f_j² D_jj − 2 Σ_{i,j=1}^n f_i w_ij f_j)   (2.8)
        = (1/2) (Σ_{i,j=1}^n f_i² w_ij + Σ_{i,j=1}^n f_j² w_ij − 2 Σ_{i,j=1}^n f_i w_ij f_j)   (2.9)
        = (1/2) Σ_{i,j=1}^n w_ij (f_i − f_j)²   (2.10)

In (2.6) we use that D_ij = 0 for i ≠ j, since D is a diagonal matrix. After we have multiplied the formula by 2 in (2.7) and divide by 2 outside the brackets to compensate, the sum over D_ii can be split in (2.8) into one over i and one over j, since D is symmetric. In step (2.9) we insert the definition of D_ii in both terms and thus obtain the result (2.10) by the binomial theorem.

3) The symmetry follows directly from the fact that W and D are symmetric. From 2) follows f^T L f ≥ 0, therefore L is positive semi-definite.

4) is true because the row sums of the matrix are 0.

5) follows since the matrix is positive semi-definite (i.e. has only non-negative eigenvalues) and, as shown in 4), has the eigenvalue 0.

The eigenvalues of the Graph-Laplace

From the eigenvalues of the Graph-Laplace some interesting properties of the graph can be read off.

The eigenvalue 0

Definition 2.3. Let A ⊆ {1, ..., n} be a set of node indices. The indicator vector 1_A indicates which indices are contained in A. We denote the i-th entry of the vector by (1_A)_i and define the indicator vector for A as:

(1_A)_i := 1 if i ∈ A, and 0 if i ∉ A

Theorem 2.1. The multiplicity of the eigenvalue 0 corresponds to the number of connected components of the graph, and the indicator vectors 1_{A_i} of the ZHK A_i are eigenvectors for the eigenvalue 0.

Let f be an eigenvector of L with the eigenvalue 0. Then:

0 = f^T L f = (1/2) Σ_{i,j=1}^n w_ij (f_i − f_j)²   (2.11)

Lemma 2.1. In a ZHK A we have f_i = f_j for all indices i, j ∈ A.

Proof: Since all w_ij of the edges are greater than 0, it follows from (2.11) for two directly connected nodes v_i, v_j that f_i = f_j. For a path v_1, v_2, ..., v_n it holds for each segment [v_i, v_j] that f_i = f_j. By definition, all nodes of a connected component are connected to one another by paths, so f_i = f_j for all i, j ∈ A. Note: The opposite direction, reading the ZHK off the eigenvectors, is not directly possible, since the basis of the eigenspace does not necessarily have to consist of the indicator vectors. However, the basis vectors have the form Σ_{i=1}^k a_i 1_{A_i} with a_i ∈ R, which means that they still contain the information about the ZHK, see [VL07, Section 8.2].

Lemma 2.2. The multiplicity of the eigenvalue 0 corresponds to the number of connected components of the graph.

Proof: Without loss of generality, we can assume that the rows of the matrix are grouped according to connected components, since the sorting of the node list can be chosen arbitrarily. This sorting results in a block matrix, since by definition there are no edges between different ZHK, and therefore L_ij = 0 for i ∈ A, j ∉ A. The blocks of the matrix are themselves Graph-Laplace matrices for the subgraphs that contain exactly the corresponding ZHK, and therefore have the eigenvalue 0 with eigenvector 1.

Theorem 2.1 follows from the two lemmas.
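Theorem 2.1 can be checked numerically: counting the (numerically) zero eigenvalues of L recovers the number of connected components. A minimal sketch, with a hand-made two-component example graph:

```python
import numpy as np

# Two connected components: {0, 1} and {2, 3}.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

eigvals, eigvecs = np.linalg.eigh(L)   # L is symmetric, so eigh suffices
n_components = np.sum(np.abs(eigvals) < 1e-10)
print(eigvals)        # approximately [0, 0, 2, 2]
print(n_components)   # 2, the multiplicity of the eigenvalue 0
```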

Formula (2.3) shows the block structure of the Laplace matrix for the graph from Figure 2.8.

Graph cuts

Definition 2.4. The weight between two sets of nodes A and B is defined as

W(A, B) := Σ_{i ∈ A, j ∈ B} w_ij.   (2.12)

One application of the non-normalized Graph-Laplace is to find a graph cut with which a connected graph can be broken down into several clusters. The function

cut(A_1, ..., A_k) := (1/2) Σ_{i=1}^k W(A_i, Ā_i)   (2.13)

defines a measure of the quality of a graph cut by measuring how strong the weight of the edges between the subgraphs is.

Definition 2.5. A graph cut is called minimal if the weight cut(A_1, ..., A_k) is minimal.

Finding a minimal graph cut is possible in polynomial time, but if the number of nodes in A and Ā is to be balanced with respect to a balance function, the problem becomes NP-hard [WW93]. Without such a balance function, however, the optimal solution to the problem is often that a single node is separated from the graph, which is not useful for applications such as clustering. One of the balanced graph cuts from [VL07] is the RatioCut:

RatioCut(A_1, ..., A_k) := (1/2) Σ_{i=1}^k W(A_i, Ā_i) / |A_i| = Σ_{i=1}^k cut(A_i, Ā_i) / |A_i|   (2.14)

With the RatioCut, the number of nodes in the clusters is balanced. An approximation of the optimal cut for two sets of nodes can be calculated with the Cheeger cut [BH09] using the second eigenvalue of the Graph-Laplace. The signs of the entries of the associated eigenvector indicate the clusters.

Example: For the example from Figure 2.8 with the associated Graph-Laplace (2.3) we have:

Eigenvalues: 0, 0, 2, 3, 3, 4, 5, 5
Eigenvector for eigenvalue 2: (0.55, ɛ, 0.45, 0.28, ɛ, 0, 0, 0)

Clustering according to sign: (1, 2, 2, 2, 2, 1, 1, 1)

The values marked with ɛ are negative and close to 0; for the clustering result, unlike values that are exactly 0, they must nevertheless be classified as negative values. The result of the cut can be seen in Figure 2.9.

Fig. 2.9: The graph cut of the example graph, calculated with the second eigenvector.

Relationship of the Graph-Laplace to the finite difference Laplace

If the Graph-Laplace is calculated on a uniform rectangular grid as in Figure 2.7, exactly L = −Δ_fd results. This definition of the graph Laplace operator can thus be seen as a generalization of the finite difference Laplace operator.
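A sketch of the sign-based clustering described above: compute the eigenvector for the second-smallest eigenvalue of L and split the nodes by sign. The example graph (two triangles joined by one bridge edge) is my own illustration; the thesis's example graph from Figure 2.8 is not reproduced here.

```python
import numpy as np

def spectral_bipartition(W):
    """Split a connected graph into two clusters using the signs of the
    eigenvector for the second-smallest eigenvalue of L (cf. Cheeger cut)."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]            # eigenvector of the 2nd eigenvalue
    return (fiedler < 0).astype(int)   # cluster label 0 or 1 per node

# Two triangles joined by a single bridge edge between nodes 2 and 3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(spectral_bipartition(W))  # e.g. [0 0 0 1 1 1] (labels may be swapped)
```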

3 Discrete Exterior Calculus (DEC)

3.1 Introduction

After we have dealt with graphs as a more general form of grids, and have defined with the Graph-Laplace a Laplace operator that can be applied to every graph structure, the question now arises as to which graphs are useful for various calculations. We saw in Section 2.5 that graphs can be generated from a set of points by generating a set of edges from the geometric properties of the points in a suitable manner. Now let us look at the problem from the other side. For many applications a graph is already available, as the points come from a known structure, such as a 3D model. For example, if one wants to solve the heat conduction equation, one usually has a given object on which the heat is to be distributed. This object is already available as a mesh for the numerical calculations. Since the object is already given as a mesh, the edges of the triangulation are present. Intuitively, it makes sense to calculate the values with the help of these edges instead of defining a new grid on which to calculate the values. Using the edges of the model also has advantages over a new grid, as they clearly define the surface: for example, heat is prevented from spreading between points that are geometrically close but far apart on a curved surface. Concepts such as adaptive grids, which calculate more detailed parts of the mesh with a finer grid, are no longer explicitly necessary. The mesh itself defines the grid, and parts of the mesh with a higher resolution therefore also have a higher resolution during the calculation. A general method that does this is the Discrete Exterior Calculus (DEC).

What is the Discrete Exterior Calculus?

The Discrete Exterior Calculus is a method of defining operators on a mesh so that partial differential equations can be solved directly there. The concrete values, such as the coordinates of the points in the mesh and the values on these points, are separated from the relations of the objects used. As a result, the abstract objects can be used for exact calculations, and a discretization error only arises when the calculated objects are applied to discretized values.

One of the strengths of the DEC is that physical conservation quantities and integral theorems are fulfilled exactly, since all quantities and operators used are defined directly on the mesh. While in many other methods the integral theorems asymptotically show an ever smaller error at higher resolution, in the DEC only the accuracy of the calculated quantities depends on the resolution, but not the accuracy of the integral theorems. To implement this, elements such as points, edges, surfaces and tetrahedra are defined as abstract objects in a graph. The numerical values such as area, length, or the value of a function on a point are stored independently and only applied to these objects when required. As a result, operations such as differentiation or the application of a Laplace operator are defined as operations on graphs, which means that they work directly on the mesh used and are therefore exact. In the concrete calculation, these operators then take the form of multiplications of matrices and vectors that represent the relations of the objects, such as adjacency matrices.

Basic work on the topic

One of the most extensive works on the theory of the Discrete Exterior Calculus is the dissertation by Anil N. Hirani [Hir03] with the title Discrete Exterior Calculus, which we follow in the following for the theoretical definitions. For the implementation we follow the works Building Your Own DEC at Home by Elcott and Schröder [ES05] and Discrete Differential Forms for Computational Modeling by Desbrun, Kanso and Tong [DKT08].

3.2 Differential forms

Since the Discrete Exterior Calculus deals with a discretization of differential forms, the general concept of a differential form is introduced here, in order to use it later in discrete form in the DEC. A differential form is an approach to make the integration of functions over (multidimensional) domains independent of coordinates. For example, the value of a line integral is independent of whether the curve over which it is integrated is shifted as a whole in space. An intuitive definition is given in [DKT08]. There f = dF/dx is defined as the derivative of a function F, and with regard to the integration

∫_a^b dF = ∫_a^b f(x) dx = F(b) − F(a)   (3.1)

the integrand f(x) dx is a differential 1-form. The term k-form refers to the number of dimensions over which integration takes place.

The same applies to a curve in more dimensions, such as a function G(x, y, z) in three-dimensional space, for which the 1-form

dG = ∂G/∂x dx + ∂G/∂y dy + ∂G/∂z dz   (3.2)

can be integrated over every curve in R³. Analogously, a 2-form is an integrand which can be integrated over any surface, and in general a k-form can be integrated over a k-dimensional manifold.

Formal definition

The formal definition of a k-form is: Let R^n be the surrounding space and M ⊆ R^n an open subset of the space, a so-called n-manifold. Then at each point x ∈ M the vector space T_x M is spanned by the tangent vectors of the manifold at the point x. A k-form ω^k is then an anti-symmetric tensor field on M which, evaluated at a point x ∈ M, yields a mapping that maps k tangent vectors to a real number:

ω^k: T_x M × ... × T_x M → R   (3.3)

In order to compute with differential forms on discrete objects, the manifold and the differential forms themselves must be stored as discrete values. For this purpose we now define a discretization of these quantities.

3.3 Discretization

In order to calculate on discrete models, it is first defined how a manifold can be discretely approximated by a 3D model, in order to then define differential forms that work on discretized manifolds.

Objects in 3D meshes

The basic objects for storing a manifold in the DEC are the same as in the usual mesh formats of 3D models: a set of points in space (vertices), stored as a list of three-dimensional vectors, where a vertex can be uniquely identified by its index in the list; and ordered lists of discrete objects (points, edges, surfaces, ...), which are defined by the vertices at their corner points.

Typical 3D formats, however, only store the objects of the highest dimension, as these are sufficient to describe a 3D model and contain all other information implicitly. Example: A format that stores triangle meshes has a list of triangles, while the edges can only be reconstructed implicitly from the triangles. The orientation of the surfaces is only given by the order of the vertices in the definition of the triangle. One such format is the Wavefront OBJ format; an example can be seen in Section 3.10.

Objects in the DEC

In contrast to these formats, a DEC implementation requires lists of the objects of all dimensions less than or equal to the dimension of the manifold, and therefore stores a separate list of k-dimensional objects for each dimension k.

Definition 3.1. A vertex is a d-dimensional vector in the space R^d, which is stored as a d-tuple of real numbers. The vertices can be uniquely designated by v_i, where i is the index in the vertex list. Example: (0, 0, 1) ∈ R³ is a vertex in three-dimensional space.

Definition 3.2. A k-simplex is the convex hull of k + 1 vertices. In order for it to be k-dimensional, we further require that the vertices of the simplex are affinely independent. A simplex receives a unique orientation through the sequence of the vertices, which defines a direction of rotation. We denote a k-simplex by σ^k. Note: An oriented simplex is uniquely defined by the sequence of its vertices ordered in the direction of rotation, but all even permutations of this sequence result in the same simplex. Example: σ² = [v_1, v_2, v_3] defines a 2-simplex (triangle) with the corner points v_1, v_2, v_3 and the direction of rotation v_1 → v_2 → v_3 → v_1. For oriented simplices we further define a minus operator, which reverses the orientation. A simple implementation is to reverse the order of the vertex indices in the tuple which defines a simplex:

−σ^k = −[v_1, v_2, ..., v_{k+1}] = [v_{k+1}, v_k, ..., v_1]   (3.4)

For some simplices we use already existing geometric designations:

0-simplex: vertex or point
1-simplex: edge

2-simplex: triangle
3-simplex: tetrahedron

Note: The term vertex is ambiguous. Vertex often stands for a corner point of a simplex, which is a coordinate in space. A vertex as a 0-simplex, on the other hand, represents a point in space as an object that has exactly one vertex (corner point). The 1-tuple which describes a 0-simplex contains exactly the index that the vertex has in the list of vertices. In particular, the list of vertices usually has the same cardinality as the list of 0-simplices, but in general the indices do not coincide (in fact, with some methods of reading a mesh, the indexing of the two lists is completely different).

Example: even permutations of simplices

As an example that even permutations of the order of the vertices of a simplex describe the same orientation, we consider a triangle [v_1, v_2, v_3]. The edges [v_1, v_2], [v_2, v_3] and [v_3, v_1] are visited in one cycle. If one considers the even permutation [v_2, v_3, v_1], in which first the vertices v_1, v_2 and then v_1, v_3 were swapped, one revolution visits the edges [v_2, v_3], [v_3, v_1] and then [v_1, v_2]. One can see that the even permutation has changed at which point of the triangle the cycle begins, but not the direction in which the individual edges are traversed.

Faces

The (k−1)-simplices that lie on the boundary of a k-simplex are called its faces. The origin of the term are the triangles on the boundary of a tetrahedron, but it is used more generally here for objects of lower dimension that lie on the boundary of an object. If no explicit dimension is given, a face of a k-simplex always means a (k−1)-simplex on the boundary of the k-simplex. Since the faces of a simplex are stored with a definite orientation in the list of (k−1)-simplices, a face has an orientation which can be compared with the orientation of the k-simplex. The relative orientation of the faces of a simplex is given by whether the direction of rotation of the stored (k−1)-simplex corresponds to the direction of rotation of the (k−1)-face induced by the k-simplex. Up to even permutations, this also agrees with the definition of the minus operator for simplices, which converts the negatively induced face into a permutation of the stored one.

sign(σ^k, σ^{k−1}) := +1 with the same orientation, −1 otherwise   (3.5)
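The rule that even permutations preserve orientation can be made executable by comparing permutation parities. This is a small illustrative helper of my own, not code from the thesis:

```python
def parity(perm):
    """+1 if perm is an even permutation of its sorted order, -1 if odd
    (computed via the number of inversions)."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return 1 if inv % 2 == 0 else -1

def same_orientation(s1, s2):
    """True if the two vertex tuples describe the same oriented simplex,
    i.e. they differ only by an even permutation."""
    if sorted(s1) != sorted(s2):
        raise ValueError("not the same vertex set")
    return parity(s1) == parity(s2)

print(same_orientation((1, 2, 3), (2, 3, 1)))  # True: cyclic shift is even
print(same_orientation((1, 2, 3), (2, 1, 3)))  # False: one swap is odd
```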

Fig. 3.1: The edges have a negative orientation in relation to a triangle if their direction does not coincide with the direction of rotation of the triangle.

We can compare the orientation of two simplices exactly when they either lie in the same (hyper)plane or have a common face. In the second case, they are oriented in the same way if the common face is oriented positively by one of the simplices and negatively by the other.

Simplex complex

A simplex complex K is a set of simplices for which the following holds: If a k-simplex (with k ≥ 1) is contained in the complex, all its (k−1)-faces are also contained. Two simplices either do not intersect, or they intersect in a common face.

Fig. 3.2: Example of a 2D simplex complex with 3 2-simplices (triangles), 7 1-simplices (edges) and 5 0-simplices (points).

We denote the set of all simplices in the complex with dimension k by K_k. The set of simplices with dimension less than or equal to k is called the k-skeleton of K, and we denote it by K^(k).

Definition 3.3. A k-chain is a map c_k: K_k → R that assigns a value to each simplex. A k-chain can be seen as a linear combination of k-simplices (cf. [Hir03]):

c = Σ_{σ^k ∈ K_k} c(σ^k) σ^k   (3.6)

We denote the space of the k-chains in a complex K by C_k(K).

Fig. 3.3: Example: A 1-chain which assigns the weight 1 to the edges e_0 and e_1 and weights the edge e_2 with 0.

With the help of k-chains it is possible, for example, to integrate only over a part of the discrete manifold, in that the desired part has the coefficients 1 and the rest of the manifold is weighted with 0.

Definition 3.4. A k-cochain is a mapping ω^k: C_k → R which assigns a real number to every k-chain, and thus corresponds to a discrete differential form. We denote the space of the k-cochains in a complex K by Ω^k(K).

Example: A 1-chain c_1 contains the weights for a discrete edge train, which is the discretization of a curve. A 1-cochain maps c_1 to a real number, so that the application of the cochain to c_1 corresponds to the integration of the associated differential form over the curve.

Operators in the DEC

There are two forms of operators that interest us: operators on the objects over which we integrate, for example the boundary operator, which maps a surface onto its boundary; and operators on the differential forms that we integrate, such as the differential operator. In the DEC, operators of the first form map k-chains to l-chains, and those of the second form map k-cochains to l-cochains, for l, k ≤ n. In Section 3.5, some important discrete operators are defined.

3.4 Storage of the objects

In the DEC there are three categories of objects that are stored permanently, and two types that are used when working with the DEC. The following objects are generated from a given mesh and stored permanently: vertices, simplices and operators. Chains (linear combinations of simplices) and cochains (discrete differential forms) are used when calculating with the DEC but are not stored permanently.

Vertices: The vertices are stored as a list of d-dimensional vectors, where d is the dimension of the space in which the vertices lie. Each vertex is uniquely identified by its index in the vertex list, since the set of vertices in a model does not change.

Simplices: For k = 0, ..., n the k-simplices are stored as ordered (k+1)-tuples of vertex indices in a list of length |K_k|. The orientation of the simplex is given by the sequence. A simplex is uniquely defined by its dimension k and its index in the list of k-simplices, and can e.g. be written as a pair (k, i) of dimension and index. Since a change in the order of the vertices only changes the orientation of the simplex (and of its faces), and we can write the simplices with a sign, a simple way of storing is to assume the simplices of the highest dimension as positively oriented and then to generate the faces recursively. If a (k−1)-face has already been added to the list of (k−1)-simplices, it is only stored whether the sign of the generated face is positive or negative in relation to the k-simplex.

Chains and cochains: Since a k-(co)chain is a linear combination of values on k-simplices, it is stored as a vector c_k or ω^k from R^{|K_k|}, where the components of the vector are assigned to the respective simplices of dimension k.
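A minimal sketch of the face generation described above, simplified so that every face is stored exactly once with ascending vertex order as its canonical (positive) orientation, instead of storing relative signs per parent simplex as the thesis's scheme does:

```python
from itertools import combinations

def build_skeleton(top_simplices):
    """Generate the lists of k-simplices for all k from the simplices of
    the highest dimension. Every face is stored once, with ascending
    vertex order taken as its positive orientation."""
    dim = len(top_simplices[0]) - 1
    skeleton = {k: set() for k in range(dim + 1)}
    for simplex in top_simplices:
        for k in range(dim + 1):
            # all (k+1)-element subsets of the vertices are k-faces
            for face in combinations(sorted(simplex), k + 1):
                skeleton[k].add(face)
    return {k: sorted(faces) for k, faces in skeleton.items()}

# The two triangles used as the example in Section 3.5:
K = build_skeleton([(0, 1, 2), (1, 3, 2)])
print(K[1])  # five edges: [(0,1), (0,2), (1,2), (1,3), (2,3)]
print(K[0])  # four 0-simplices: [(0,), (1,), (2,), (3,)]
```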

Fig. 3.4: The boundary operator maps a triangle onto its oriented edges.

The evaluation of a k-cochain on a k-chain is a scalar product when both are stored as vectors:

ω^k(c_k) := ⟨ω^k, c_k⟩ = (ω^k)^T c_k   (3.7)

Operators: A linear operator O which maps a k-(co)chain to an l-(co)chain is stored as an R^{|K_l| × |K_k|} matrix, so that the application to a (co)chain is a matrix-vector multiplication: O: R^{|K_k|} → R^{|K_l|} with O ω_1^k = ω_2^l or O σ_1^k = σ_2^l.

3.5 Important operators

Boundary operator for simplices: The boundary operator for k-simplices, ∂_k: K_k → K_{k−1}, is defined according to [ES05, 3.1.3] as:

∂_k [v_0, ..., v_k] := Σ_{j=0}^k (−1)^j [v_0, ..., v̂_j, ..., v_k]   (3.8)

where v̂_j means that exactly the j-th vertex is removed to obtain a (k−1)-face of the simplex. An example of the boundary mapping can be seen in Figure 3.4. A triangle (2-simplex) [v_0, v_1, v_2] is mapped to the set of edges (1-simplices) on its boundary by

removing one vertex of the triangle for each edge.

Fig. 3.5: The orientation of the middle edge can only coincide with the orientation of one of the two triangles.

Since the resulting edge [v_0, v_2] is oriented against the direction of traversal of the triangle (v_0 → v_1 → v_2 → v_0), it has a negative sign. As described in the definition of a k-simplex, the simplex with a negative sign corresponds to an inversely oriented simplex with the same points and a positive sign.

Note: Negative orientation cannot be avoided. If a simplex σ^{k−1} is the face of two k-simplices σ_1^k, σ_2^k, then it must be negatively oriented with respect to one of the two simplices. Figure 3.5 shows a 2D example in which the inner edge cannot be positively oriented with respect to both triangles. There are two ways of dealing with this in an implementation: Half-edges can be used in triangle meshes, which means that the same edge is stored in two different orientations; however, this approach does not make sense for higher-dimensional meshes. Alternatively, the sign of a k-simplex with respect to a (k+1)-simplex is stored explicitly, and the sign between simplices with a larger dimensional difference can be calculated recursively. This is given in the DEC by the sign in the boundary operator.

The boundary operator for chains: From the linearity of the mapping it follows that the operator is also defined for k-chains [DKT08]:

∂_k: C_k → C_{k−1}   (3.9)

∂_k(c) = ∂_k Σ_{σ^k ∈ K_k} c(σ^k) σ^k = Σ_{σ^k ∈ K_k} c(σ^k) ∂_k σ^k   (3.10)

The boundary operator as a matrix: This linear mapping of k-chains to (k−1)-chains defines a |K_{k−1}| × |K_k| matrix ∂_k. For the matrix the following holds:

∂_k^{ij} = +1, if σ_i^{k−1} is a positively oriented face of σ_j^k
∂_k^{ij} = −1, if σ_i^{k−1} is a negatively oriented face of σ_j^k
∂_k^{ij} = 0 otherwise

The simplices σ_i^k are taken with the orientation as stored in the list of k-simplices.

Example:

F = {[v_0, v_1, v_2], [v_1, v_3, v_2]}
E = {[v_0, v_1], [v_0, v_2], [v_1, v_2], [v_1, v_3], [v_2, v_3]}
V = {[v_0], [v_1], [v_2], [v_3]}

∂_2 (edges × triangles):
[v_0, v_1]:   1    0
[v_0, v_2]:  −1    0
[v_1, v_2]:   1   −1
[v_1, v_3]:   0    1
[v_2, v_3]:   0   −1

∂_1 (vertices × edges):
v_0:  −1  −1   0   0   0
v_1:   1   0  −1  −1   0
v_2:   0   1   1   0  −1
v_3:   0   0   0   1   1

The example of the boundary operator ∂_2, which maps triangles to edges, shows that the inner edge of the 2-chain (1, 1), which consists of both triangles, cancels out. This means that the result contains only the boundary edges.
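The boundary matrix can be written down directly from definition (3.8). The following sketch is my own; it assumes, as in the skeleton sketch above, that stored faces use ascending vertex order as their positive orientation, and it recovers the sign of a generated face from its permutation parity. Applied to the example above it reproduces ∂_2 and the cancellation of the inner edge.

```python
import numpy as np

def parity(seq):
    """+1 if seq is an even permutation of its sorted order, -1 if odd."""
    inv = sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
              if seq[i] > seq[j])
    return 1 if inv % 2 == 0 else -1

def boundary_matrix(k_simplices, faces):
    """|K_{k-1}| x |K_k| matrix of the boundary operator (3.8); 'faces'
    are the stored (k-1)-simplices in ascending vertex order."""
    index = {f: i for i, f in enumerate(faces)}
    B = np.zeros((len(faces), len(k_simplices)))
    for j, s in enumerate(k_simplices):
        for omit in range(len(s)):
            face = s[:omit] + s[omit + 1:]    # remove the omit-th vertex
            key = tuple(sorted(face))         # stored (positive) version
            # (-1)^omit from the definition, times the orientation of the
            # generated face relative to the stored ascending order
            B[index[key], j] = (-1) ** omit * parity(face)
    return B

triangles = [(0, 1, 2), (1, 3, 2)]
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
d2 = boundary_matrix(triangles, edges)
print(d2)
# The inner edge (1, 2) cancels for the 2-chain (1, 1) of both triangles:
print(d2 @ np.array([1.0, 1.0]))   # nonzero only on the boundary edges
```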

The coboundary operator: Analogously, the coboundary operator d^k: Ω^k → Ω^{k+1} is defined, which maps k-cochains to (k+1)-cochains and is the adjoint operator to ∂_{k+1}. Since the scalar product ⟨ω^k, c_k⟩ is a bilinear form, the following holds:

⟨d^k ω^k, c^{k+1}⟩ = ⟨ω^k, ∂_{k+1} c^{k+1}⟩   (3.11)

and therefore the coboundary operator for k-cochains results as the transposed matrix of the boundary operator for (k+1)-simplices: d^k = ∂_{k+1}^T.

Comment: Because the boundary of a k-simplex is a closed cycle of (k−1)-simplices, it has no boundary itself, and ∂_{k−1} ∂_k = 0.

Chain/cochain complex: The boundary and coboundary operators define the chain and cochain complexes:

0 → C_n →(∂_n) C_{n−1} →(∂_{n−1}) ... → C_0 → 0
0 ← Ω^n ←(d^{n−1}) Ω^{n−1} ←(d^{n−2}) ... ← Ω^0 ← 0

where the outermost operators are defined as 0, which is consistent with the operators for differential forms. This can also easily be seen in the definition of the boundary operator matrix.

The differential operator: With the help of the boundary operator we can now use Stokes' theorem to define the differential operator d: Ω^k(K) → Ω^{k+1}(K). Let ∂ be the boundary operator, c a (k+1)-chain and ω a k-cochain. Then the following holds:

∫_{∂c} ω = ω^T (∂c) = (ω^T ∂) c = (∂^T ω)^T c = (dω)^T c = ∫_c dω   (3.12)

This means that the coboundary operator for k-chains is the differential operator for k-cochains. By construction, Stokes' theorem is fulfilled exactly, since it was used to define the operator.
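Using the hypothetical boundary_matrix helper from the previous sketch, the chain-complex property ∂_1 ∂_2 = 0 and the relation d^k = ∂_{k+1}^T can be verified numerically on the running example:

```python
import numpy as np

# boundary_matrix (and parity) as defined in the sketch above.

triangles = [(0, 1, 2), (1, 3, 2)]
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
vertices = [(0,), (1,), (2,), (3,)]

d2 = boundary_matrix(triangles, edges)   # |E| x |F|
d1 = boundary_matrix(edges, vertices)    # |V| x |E|

print(np.allclose(d1 @ d2, 0))   # True: the boundary of a boundary is 0
coboundary0 = d1.T               # d^0 = (boundary_1)^T  on 0-cochains
coboundary1 = d2.T               # d^1 = (boundary_2)^T  on 1-cochains
```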
