In statistics, the method of **maximum likelihood**, pioneered by the geneticist and statistician Sir Ronald A. Fisher, is a method of point estimation that uses, as an estimate of an unobservable population parameter, the member of the parameter space that maximizes the likelihood function. For the moment, let `p` denote the unobservable population parameter to be estimated. Let `X` denote the random variable observed (which in general will not be scalar-valued, but often will be a vector of probabilistically independent scalar-valued random variables). The probability of an observed outcome `X=x` (this is case-sensitive notation!), or the value at (lower-case) `x` of the probability density function of the random variable (capital) `X`, **as a function of `p` with `x` held fixed**, is the **likelihood function**

$$L(p) = P(X = x \mid p).$$

For example, the proportion `p` of voters in a population who will vote "yes" is unobservable, and is to be estimated based on a political opinion poll. A sample of `n` voters is chosen randomly, and it is observed that `x` of those `n` voters will vote "yes". Then the likelihood function is

$$L(p) = \binom{n}{x}\, p^{x} (1-p)^{n-x}.$$

The value of `p` that maximizes `L(p)` is the **maximum-likelihood estimate** of `p`. By finding the root of the first derivative one obtains `x/n` as the maximum-likelihood estimate. In this case, as in many other cases, it is much easier to take the logarithm of the likelihood function before finding the root of the derivative:

$$\log L(p) = \log\binom{n}{x} + x \log p + (n-x)\log(1-p).$$

Taking the logarithm of the likelihood is so common that the term **log-likelihood** is commonplace among statisticians. The log-likelihood is closely related to information entropy.
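The differentiation step for the poll example can be written out explicitly. This is a sketch that assumes the binomial likelihood for `x` "yes" answers among `n` sampled voters, with `0 < x < n` so that the maximum lies strictly inside the interval from 0 to 1:

$$
\begin{aligned}
\log L(p) &= \log\binom{n}{x} + x\log p + (n-x)\log(1-p), \\
\frac{d}{dp}\log L(p) &= \frac{x}{p} - \frac{n-x}{1-p}.
\end{aligned}
$$

Setting the derivative to zero gives $x(1-p) = (n-x)\,p$, hence $p = x/n$. The second derivative, $-x/p^{2} - (n-x)/(1-p)^{2}$, is negative everywhere on the interval, so this root is a maximum rather than a minimum.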

If we replace the lower-case `x` with capital `X`, then we have not the observed value in a particular case but rather a random variable, which, like all random variables, has a probability distribution. The value (lower-case) `x/n` observed in a particular case is an **estimate**; the random variable (capital) `X/n` is an **estimator**. The statistician may take the nature of the probability distribution of the **estimator** to indicate how good the estimator is; in particular, it is desirable that the probability that the estimator is far from the parameter `p` be small. Maximum-likelihood estimators are sometimes better than unbiased estimators. They also have a property called "functional invariance" that unbiased estimators lack: for any function `f`, the maximum-likelihood estimator of `f(p)` is `f(T)`, where `T` is the maximum-likelihood estimator of `p`.
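The distinction between estimate and estimator, and the invariance property, can be illustrated numerically. The following is a minimal sketch for the poll example rather than a definitive implementation. The sample size, the observed count, the true proportion used in the simulation, and all function names are hypothetical choices made for illustration, and the odds `p/(1 - p)` is just one convenient choice of `f`.

```python
import math
import random

def log_likelihood(p, n, x):
    """Binomial log-likelihood in p, omitting the constant log C(n, x)."""
    return x * math.log(p) + (n - x) * math.log(1 - p)

def odds(p):
    """Illustrative choice of f: the odds p / (1 - p)."""
    return p / (1 - p)

def simulate_estimator(p_true, n, trials, seed=0):
    """Return `trials` realizations of the estimator X/n, X ~ Binomial(n, p_true)."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_true for _ in range(n)) / n
            for _ in range(trials)]

# Hypothetical poll: 37 of 100 sampled voters say "yes".
n, x = 100, 37
estimate = x / n  # closed-form maximum-likelihood estimate

# A grid search over p confirms that x/n maximizes the log-likelihood.
p_grid = [i / 1000 for i in range(1, 1000)]
p_best = max(p_grid, key=lambda p: log_likelihood(p, n, x))

# Functional invariance: maximize over the odds theta directly, using
# p = theta / (1 + theta), and compare the result with odds(x/n).
theta_grid = [i / 1000 for i in range(1, 5001)]
theta_best = max(theta_grid, key=lambda t: log_likelihood(t / (1 + t), n, x))

# Distribution of the estimator X/n under an assumed true proportion.
p_true = 0.37
draws = simulate_estimator(p_true, n, trials=2000)
mean_estimate = sum(draws) / len(draws)
far_fraction = sum(abs(d - p_true) > 0.1 for d in draws) / len(draws)

print("estimate x/n:", estimate)
print("grid maximizer over p:", p_best)
print("odds(x/n):", odds(estimate), "grid maximizer over odds:", theta_best)
print("mean of X/n:", mean_estimate, "fraction far from p:", far_fraction)
```

The grid maximizer over the odds agrees with `odds(x/n)` up to the grid spacing, which is what functional invariance predicts. The simulated values of `X/n` show the estimator's distribution concentrating near the assumed `p`.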

However, the bias of maximum-likelihood estimators can be substantial. Consider a case where *n* tickets numbered from 1 through *n* are placed in a box and one is selected at random, giving a value *X*. If *n* is unknown, then the maximum-likelihood estimator of *n* is *X*, even though the expectation of *X* is only (*n* + 1)/2; we can be certain only that *n* is at least *X*, and it is probably more.
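The ticket example can be checked with exact arithmetic. The sketch below assumes that a single ticket drawn from a box of size `n` has likelihood `1/n` when `n` is at least the observed value and 0 otherwise. The function names, the search bound `n_max`, and the specific numbers are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def ticket_likelihood(n, x):
    """Likelihood of box size n given the drawn ticket x: 1/n if n >= x, else 0."""
    return Fraction(1, n) if n >= x else Fraction(0)

def ticket_mle(x, n_max):
    """Maximize the likelihood over candidate box sizes 1..n_max.

    The finite bound n_max is an assumption made so the search terminates.
    """
    return max(range(1, n_max + 1), key=lambda n: ticket_likelihood(n, x))

def expected_estimate(n):
    """Exact expectation of the estimator X when X is uniform on 1..n."""
    return sum(Fraction(k, n) for k in range(1, n + 1))

n_true = 20
print(ticket_mle(7, 100))                   # 7: the estimate equals the observed ticket
print(expected_estimate(n_true))            # 21/2
print(expected_estimate(n_true) - n_true)   # -19/2, the bias of the estimator
```

Because `1/n` decreases as `n` grows, the likelihood is largest at the smallest admissible box size, which is the observed ticket itself. The expectation works out to `(n + 1)/2`, roughly half of `n`, so the estimator underestimates `n` on average.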