The finite element method is used for solving partial differential equations (PDEs). Solutions are achieved either by eliminating the differential equation completely (steady-state problems), or by rendering the PDE into an approximating system of ordinary differential equations, which are then solved using standard techniques such as finite differences. Use of the finite element method in engineering for the analysis of physical systems is commonly known as finite element analysis.
Finite element methods have also been developed to solve integral equations such as the heat transport equation.
The method was introduced by Richard Courant to solve the torsion problem of a cylinder. Courant's contribution was evolutionary, drawing on a large body of earlier results for PDEs developed by Rayleigh, Ritz, and Galerkin. Development of the method began in earnest in the middle to late 1950s for airframe and structural analysis, and picked up a lot of steam at Berkeley in the 1960s for use in civil engineering. The method was provided with a rigorous mathematical foundation in 1973 with the publication of Strang and Fix's An Analysis of the Finite Element Method.
Finite element methods are used in a wide variety of engineering disciplines. In computer graphics, radiosity algorithms are finite element methods.
In solving partial differential equations, the primary challenge is to create an equation which approximates the equation to be studied, but which is stable, meaning that errors in the calculation do not accumulate and render the resulting output meaningless.
See also: Discrete element method, Spectral method
Technical Overview
For this discussion, we assume a basic understanding of differential calculus in several variables. If f is a function, then the notation f_{x} will denote the partial derivative of f with respect to x.
The best way to introduce the subject is to give a simple example (a "model problem"). We shall use the Poisson equation on the torus [0,1)x[0,1). First some notation. Let
 T:=[0,1)x[0,1)={(x,y); 0≤x,y<1}
and let V be the vector space of functions u on the torus T which are twice differentiable over R^{2} in the sense of multivariate calculus, and which are periodic in x and in y, of period 1 in both directions:
 u(x,y)=u(x+1,y)=u(x,y+1).
Define the operator L on V by
 Lf=f_{xx}+f_{yy}
The model problem is then: given some vector g in V, find u in V so that
 u_{xx}+u_{yy}=g in T,
that is, Lu=g.
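As a quick sanity check on these definitions, here is a small sketch (using sympy, purely as an illustrative choice of tool) verifying that a particular periodic u solves the model problem for a matching right-hand side g:

```python
import sympy as sp

x, y = sp.symbols("x y")

# A sample element of V: smooth, and periodic with period 1 in x and y.
u = sp.sin(2 * sp.pi * x) * sp.cos(2 * sp.pi * y)

# The operator L from the text: Lf = f_xx + f_yy.
Lu = sp.diff(u, x, 2) + sp.diff(u, y, 2)

# For this u, Lu = -8*pi**2*u, so u solves Lu = g with g = -8*pi**2*u.
g = -8 * sp.pi**2 * u
print(sp.simplify(Lu - g))  # 0

# Periodicity check: u(x+1, y) agrees with u(x, y).
print(sp.simplify(u.subs(x, x + 1) - u))  # 0
```

The particular u here is an eigenfunction of L, which makes the check easy; any sufficiently smooth periodic function would do.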
Weak formulation
Now let φ be any linear functional on V, that is, φ is a function from V to the scalar field F so that for any t in F and u,v in V:
 φ(tu+v)=tφ(u)+φ(v)
Clearly, if u solves the original problem, i.e.
 Lu=g,
then for every φ in the dual space V^{*},
 φ(Lu)=φ(g)
Conversely, if φ(Lu)=φ(g) for every linear functional φ in V^{*}, then Lu=g. Restating the problem in terms of such identities is known as a weak formulation.
The bilinear form of L, test functions
For each v in V, define the linear functional φ_{v} by φ_{v}(w):=∫_{T}w·v. That is, we are now replacing the original problem with that of finding u in V so that
 φ_{v}(Lu)=φ_{v}(g)
for every v in V. Integrating by parts (the torus has no boundary, so no boundary terms arise), this is equivalent to finding u so that
 ψ(u,v):=-∫_{T}(u_{x}v_{x}+u_{y}v_{y}) = ∫_{T}g·v = φ_{v}(g)
for every v in V. The map ψ is the bilinear form associated with L, and the functions v used in this way are called test functions.
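The integration by parts behind this identity can be checked numerically. The sketch below (numpy assumed; the plain rectangle rule is extremely accurate for smooth periodic integrands) compares ∫_{T}(Lu)·v with -∫_{T}(u_{x}v_{x}+u_{y}v_{y}) for a sample u, taking v = u as the test function:

```python
import numpy as np

# Periodic quadrature grid on the torus T = [0,1) x [0,1).
N = 64
xs = np.arange(N) / N
X, Y = np.meshgrid(xs, xs, indexing="ij")
cell = 1.0 / N**2
w = 2 * np.pi

# Sample u with analytically known partial derivatives; take v = u.
u = np.sin(w * X) * np.sin(w * Y)
ux = w * np.cos(w * X) * np.sin(w * Y)
uy = w * np.sin(w * X) * np.cos(w * Y)
Lu = -2 * w**2 * u              # exact value of u_xx + u_yy for this u
v, vx, vy = u, ux, uy

lhs = np.sum(Lu * v) * cell                  # ∫_T (Lu)·v
rhs = -np.sum(ux * vx + uy * vy) * cell      # -∫_T (u_x v_x + u_y v_y)
print(lhs, rhs)  # both ≈ -2*pi**2; no boundary terms on the torus
```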
We note here that ψ uses only first derivatives, and it would therefore be possible to discuss solutions to the original problem while assuming only first derivatives. Note also that in most cases the solution u will in fact be infinitely differentiable.
A minimal set of test functions
From functional analysis, we know that we do not in fact need to try every possible test function v in V. If E={e_{1},e_{2},...} is a subset of V whose linear span is dense in V (in a suitable topology), then we can use those functions alone as test functions. That is, we are now trying to find a solution to the problem
(*) ψ(u,e_{j})=∫_{T}g · e_{j} for every e_{j} in E.
First discretization step
The set E is still infinite, so as a first discretization step we replace it with a finite subset F_{n}={e_{1},...,e_{n}} and look for a function u_{n} satisfying
(**) ψ(u_{n},e_{j})=∫_{T}g·e_{j} for every e_{j} in F_{n}
This problem is underdetermined, so we take another discretization step.
Second discretization step
We now look for u_{n} of the form u_{n}=∑a_{k}e_{k} and approximate g by g_{n}=∑b_{k}e_{k}, with unknown coefficients a_{k} and given coefficients b_{k}. Substituting into (**) and expanding by bilinearity, we obtain:
(***) ∑_{k}a_{k}ψ(e_{k},e_{j})=∑_{k}b_{k}∫_{T}e_{k}e_{j} for all j=1,...,n.
There is now the question of how to choose a suitable g_{n}, and there are many approaches, depending on how the set E was chosen. If E is chosen to be a Fourier basis, then g_{n} can be obtained as the projection of g onto the linear span of F_{n}, but other approaches are possible.
The desire of course is again to make certain that the resulting u_{n} will converge to a solution of (*).
Matrix version
The astute reader will have noted that the last problem can be formulated as a matrix problem as follows:
 Pa=Qb
where a=(a_{1},...,a_{n}) and b=(b_{1},...,b_{n}) are the coefficient vectors, and the n×n matrices P and Q are given by
 P[j,k]=ψ(e_{k},e_{j})
 Q[j,k]=∫_{T}e_{k}e_{j}
Algorithm
The Finite Element algorithm is thus as follows:
 Compute the Stiffness Matrix P.
 Compute the Mass Matrix Q.
 Approximate g and compute b.
 Solve the matrix problem Pa=Qb for the unknown vector a.
 Possibly convert the vector a back into a solution u (e.g. for viewing with a graphical computer program.)
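The steps above can be sketched end to end on the model problem. The sketch below (numpy assumed) uses a small hand-picked set of trigonometric test functions, assembles P and Q by quadrature, manufactures g from a known solution, and recovers that solution; the basis, grid size, and right-hand side are all illustrative assumptions, not a general-purpose implementation:

```python
import numpy as np

# Quadrature grid on the torus; the rectangle rule is exact here for
# the low-order trigonometric polynomials being integrated.
N = 64
xs = np.arange(N) / N
X, Y = np.meshgrid(xs, xs, indexing="ij")
cell = 1.0 / N**2
w = 2 * np.pi

def basis(kind, m, n):
    """Test function e(x,y) = cos or sin of 2*pi*(m*x + n*y),
    returned together with its exact partials e_x and e_y."""
    phase = w * (m * X + n * Y)
    if kind == "cos":
        e, d = np.cos(phase), -np.sin(phase)
    else:
        e, d = np.sin(phase), np.cos(phase)
    return e, w * m * d, w * n * d

# A tiny hand-picked F_n (an illustrative choice).
F = [basis("cos", 1, 1), basis("cos", 1, -1),
     basis("sin", 1, 1), basis("sin", 1, -1)]
n = len(F)

# Steps 1-2: stiffness P[j,k] = psi(e_k,e_j) = -∫(grad e_k · grad e_j)
# and mass Q[j,k] = ∫ e_k e_j, both computed by quadrature.
P = np.zeros((n, n))
Q = np.zeros((n, n))
for j, (ej, ejx, ejy) in enumerate(F):
    for k, (ek, ekx, eky) in enumerate(F):
        P[j, k] = -np.sum(ekx * ejx + eky * ejy) * cell
        Q[j, k] = np.sum(ek * ej) * cell

# Step 3: manufacture g = L(u_true) for a known u_true, then project g
# onto span(F) by solving Q b = (∫ g e_j)_j for the coefficients b.
u_true = np.sin(w * X) * np.sin(w * Y)
g = -2 * w**2 * u_true
c = np.array([np.sum(g * ej) * cell for ej, _, _ in F])
b = np.linalg.solve(Q, c)

# Step 4: solve the matrix problem P a = Q b.
a = np.linalg.solve(P, Q @ b)

# Step 5: reassemble u_n = sum_k a_k e_k and compare with u_true.
u_n = sum(ak * e for ak, (e, _, _) in zip(a, F))
print(np.max(np.abs(u_n - u_true)))  # ≈ 0: u_true lies in span(F)
```

The recovery is exact (up to rounding) only because sin(2πx)sin(2πy) happens to lie in the span of the four chosen test functions; in general u_n is only an approximation to u.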
Note on the choice of basis
If one chooses a Fourier basis for E, then one gets a so-called spectral method, which from our point of view is closely related to the finite element method. Typically, in the finite element method, the set F_{n} is chosen directly, and is not usually construed to be a subset of some larger E.
Instead, the functions of F_{n} can be chosen to be piecewise linear. A tessellation of the domain T is chosen, which decomposes T into triangles (say), and the functions of F_{n} are those that are linear on each component of the tessellation of T. Each triangle is referred to as an "element."
To obtain better algorithms, one can attempt to vary the primitives of the tessellation; it may be more natural to use rectangular elements, and in some cases curvilinear elements are called for. Conversely, once the elements are chosen, one still has a choice of how to define the test functions on each element. Test functions are usually, but not always, chosen to be piecewise polynomial.
If the test functions are chosen in a cunning way, the matrices P and Q will be structured so that the linear problem Pa=Qb can be solved very quickly. To understand the issue, note that the calculation of Qb apparently requires O(n^2) multiplications and additions. However, if one chooses the Fourier basis, Q is the identity matrix, so nothing needs to be done to compute Qb. The coefficients of b can then be found using the fast Fourier transform of g, which runs in O(n log n) time. The matrix P is diagonal with easily calculated entries, so solving for a is trivial. One would then typically apply an inverse Fourier transform to a, again in O(n log n) time, to obtain u.
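This Fourier-basis observation can be made concrete: on the torus each Fourier mode is an eigenfunction of L, so the solve reduces to one FFT, a pointwise division, and an inverse FFT. A minimal sketch (numpy assumed; g is manufactured from a known solution):

```python
import numpy as np

N = 64
xs = np.arange(N) / N
X, Y = np.meshgrid(xs, xs, indexing="ij")

# Manufacture g = L(u_true) for a known, zero-mean periodic u_true.
u_true = np.sin(2 * np.pi * X) * np.cos(4 * np.pi * Y)
g = -(2 * np.pi)**2 * (1**2 + 2**2) * u_true

# In the Fourier basis, the mode (kx, ky) is an eigenfunction of L with
# eigenvalue -(2*pi)**2 * (kx**2 + ky**2): P is diagonal, Q is the
# identity, so "solving" Pa = Qb is a pointwise division.
k = np.fft.fftfreq(N, d=1.0 / N)           # integer wavenumbers
KX, KY = np.meshgrid(k, k, indexing="ij")
eig = -(2 * np.pi)**2 * (KX**2 + KY**2)
eig[0, 0] = 1.0                            # placeholder; mean mode is free

u_hat = np.fft.fft2(g) / eig
u_hat[0, 0] = 0.0                          # pin the free constant: zero mean
u = np.real(np.fft.ifft2(u_hat))
print(np.max(np.abs(u - u_true)))  # ≈ 0
```

The zero mode must be handled separately because on the torus the solution is only determined up to an additive constant; pinning the mean to zero is one conventional choice.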
If one chooses a finite element basis (using, for instance, the triangular elements discussed above) so that each test function is supported on a small number of elements, then the matrices P and Q will be sparse, that is, most of their entries are zero. The calculation Qb can then be done in O(n) time, and solving for a can be done efficiently using an iterative algorithm (such as the conjugate gradient iteration).
We note that piecewise linear functions are not even once differentiable everywhere; they are, however, piecewise differentiable. It is interesting to study how such solutions converge to the exact solution of the model problem as the number of elements tends to infinity.