How do I learn mathematics for machine learning?
55 Answers
William Chen, MS in Applied Mathematics from Harvard
The math most directly useful for machine learning is:

If you're interested in an accessible introduction to matrix algebra, Coursera is running a course on it right now: Coding the Matrix: Linear Algebra through Computer Science Applications

The applied math most directly useful for machine learning is:


· 256 Upvotes · Answer requested by Siddharth Verma

Abhinav Sharma, I've designed for and built ML systems
Updated Dec 15, 2013 ·

Upvoted by Tudor Achim, phd student working on machine learning
Going through my Machine Learning course last semester, I felt like I
had the most catching up to do with Linear Algebra. I felt key ideas
from LinAlg are harder to remember over time than Probability. I found
myself mostly working with probability distributions, Bayes' rule,
MLEs and MAPs, while the algebra side of it was mostly optimization in
higher dimensions, which in practice meant Matrix calculus.
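
To make the MLE/MAP distinction concrete, here is a minimal sketch for a coin-flip (Bernoulli) model. The Beta(a, b) prior and the counts are illustrative assumptions, not something taken from the course mentioned above.

```python
def bernoulli_mle(heads, flips):
    """Maximum likelihood estimate of P(heads): the observed frequency."""
    return heads / flips


def bernoulli_map(heads, flips, a=2.0, b=2.0):
    """MAP estimate under an assumed Beta(a, b) prior.

    The posterior is Beta(heads + a, flips - heads + b); its mode is
    (heads + a - 1) / (flips + a + b - 2) when both parameters exceed 1.
    """
    return (heads + a - 1) / (flips + a + b - 2)


mle = bernoulli_mle(heads=9, flips=10)
map_ = bernoulli_map(heads=9, flips=10)
print(mle, map_)
```

With only ten flips the prior pulls the MAP estimate back towards 0.5, while the MLE simply reports the observed frequency.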

I discovered that
the Matrix Cookbook was popular with most students for working with
Matrix Calculus as it seems to have a never-ending list of  matrix

As far as brushing up on the rest of your Linear Algebra knowledge is concerned, I highly recommend Strang's lectures/book:

Relevant topics include rank and inversion, SVD, and also
make sure you're very comfortable with eigenvalues and eigenvectors,
amongst other things.

Finally, with Analysis, I don't think ML
requires a formal introduction to Analysis at all. It's important to know
higher dimensional calculus well, especially parts related to
optimization, such as Lagrange multipliers, the primal-dual form, and in
general, the calculus of Matrices, and you should be good to go.
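
As a small worked example of the kind of constrained optimization meant here (the problem is chosen purely for illustration): maximize f(x, y) = xy subject to x + y = 1 using a Lagrange multiplier.

```latex
\begin{aligned}
\mathcal{L}(x, y, \lambda) &= xy - \lambda\,(x + y - 1) \\
\frac{\partial \mathcal{L}}{\partial x} &= y - \lambda = 0, \qquad
\frac{\partial \mathcal{L}}{\partial y} = x - \lambda = 0, \qquad
\frac{\partial \mathcal{L}}{\partial \lambda} = -(x + y - 1) = 0 \\
\Rightarrow \quad x &= y = \lambda = \tfrac{1}{2}, \qquad
f\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{1}{4}
\end{aligned}
```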

I think the approach with Linear Algebra and Calculus is to work your way
through an ML book/course, and stop and look at the relevant math when
necessary, whereas you need a strong foundation in Probability right
from the beginning, and most textbooks on ML tend to talk a lot about
probability while skimming over the mathematical details of LinAlg and Calculus.



Calvin John, Autonomous vehicle researcher; joining UCLA in Fall 2017

Let me first caveat what I’m about to say with this: go to graduate school.

To show you just how super-serious I am about this, I’m even going to
separate this caveat from the rest of the answer with one of the
ultra-cool line breaks.

At this point, I’m assuming that you are still solely considering
graduate school preparation without an undergraduate education. Let’s

My background consists of an undergraduate
BS in mathematics, a minor in physics, and a few years of research
experience that has spanned from charged particle detectors (physics/EE)
to autonomous vehicle system design for collision detection and
evasion. Long story short: I’m far more qualified to answer your question when robotics is emphasized, so that’s what I’m going to do.

Robotics is Multi-Disciplinary

Robotics is a highly multi-disciplinary field. In fact, I’d argue that it could
well be the academic field which encompasses the largest quantity of
distinct domains into its core structure. When we’re talking about
robotics, we’re really talking about

  • Computer science
  • Mathematics
  • Computer engineering
  • Electrical engineering
  • Control engineering
  • Systems engineering
  • Mechanical engineering
  • Physics (mechanics, more specifically)

What’s even more impressive about the above list than its size is the depth of
each field. Aside from control and systems engineering, which are a bit
more specialized and less fundamental than the others, each of the
above domains is extremely broad, indicating that if you were to break
down robotics concepts into a networked graph, it would resemble
something like this:


Needless to say, roboticists ultimately specialize in a much narrower range so
that expertise in a topic can be attained. But that doesn’t change the
fact that to pursue robotics, high breadth and versatility in
engineering and math is a tool whose utility can’t be overstated.

Specific Areas of Research

Regardless of whether you want to pursue a master’s or a Ph.D., you will
ultimately have to carve out a niche for yourself. As I mentioned
above, mastery of all robotics is a hopelessly daunting task; it’s
impossible. Therefore, it’s important that you expose yourself to the
different areas of robotics, and gradually home in on your desired path
according to the topics in which you’re interested and at which you’re

Here’s my breakdown of robotics
research, in increasing order of mathematical abstraction and decreasing
order of hands-on engineering and building:

  1. Sensors. About
    as applied and hands-on as you can get, the domain of sensors works on
    expanding the current technical constraints that robotics hardware
    faces. It’s because of these guys that the iPhone magically gets smaller
    and smaller every year, while also increasing its technological
    capacities. An example of the importance of this domain which is even
    more specific to robotics is radar evading drones. Remember when Osama
    Bin Laden got taken out because we flew a helicopter into Pakistan that
    magically evaded radar? Thanks, sensors.
  2. Nano-robotics. Focusing
    on developing robotic systems on the micro-level, nano-robotics
    explores how robotic agents can be built and implemented on a scale
    sufficiently small that they can be directly inserted into your body.
    Sound scary? It shouldn’t. Nano-robotics has a plethora of game-changing
    medical applications, some of which include legitimately curing cancer
    and preventing aging.
  3. Machine vision. While the ability
    to process and interpret visual information comes very intuitively to
    humans, translating our abilities to an algorithmic environment in this
    matter has proven to be an intimidating process. In fact, I’d argue that
    the largest obstacle facing self-driving cars is machine vision. Just
    take a look at the Tesla driver who died while using Autopilot because his
    car failed to distinguish between the bright sky and an incoming white
    truck. [2]
  4. Robotic learning. When machine learning is
    applied in a robotic context, it basically becomes robotic learning.
    Robotic learning is the overlap between robotics and machine learning;
    it approaches the problem of developing tools for adaptation and
    learning in robotic systems. Very cool field, with a lot of promising
    application, and very well suited for someone interested in machine
    learning and robotics.
  5. Robotic control. This is the area
    in which I’m currently nested. Control represents a mathematical
    approach to modeling the behavior and evolution of a dynamical system
    in relation to inputs, which can be used to affect the system’s output.
    The goal here is to mathematically demonstrate that a certain approach
    for input selection guarantees that the system’s output will quickly
    converge to a stabilized desired range, as illustrated in this kick-a**
    picture. [3]
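
As a toy illustration of the convergence property described in item 5, the sketch below simulates a scalar discrete-time system x[t+1] = a*x[t] + b*u[t] under proportional feedback u = -k*x. The system, the gain k, and every number are hypothetical choices for illustration; real control design involves far more than this.

```python
def simulate(a, b, k, x0, steps):
    """Simulate x[t+1] = a*x[t] + b*u[t] with feedback u[t] = -k*x[t]."""
    xs = [x0]
    for _ in range(steps):
        u = -k * xs[-1]  # control input chosen from the current state
        xs.append(a * xs[-1] + b * u)
    return xs


# Open loop (k = 0): a = 1.2 > 1, so the state keeps growing.
open_loop = simulate(a=1.2, b=1.0, k=0.0, x0=1.0, steps=20)

# Closed loop: a - b*k = 0.5, and |0.5| < 1 makes the state decay to 0.
closed_loop = simulate(a=1.2, b=1.0, k=0.7, x0=1.0, steps=20)

print(abs(open_loop[-1]), abs(closed_loop[-1]))
```

The mathematical part of the job is the argument in the comment: showing that the chosen input rule makes |a - b*k| < 1, which is what guarantees convergence.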

Since you have stated that robotics and machine learning are your interests,
I’m going to assume your interests align with the #3–5 end of the
spectrum. But even when your interests are focused on these two areas,
there is still a massive range of topics and skill sets spanned by these
two very broad domains.

Developing Skills for Robotic Learning

I’m far from an expert in robotic learning and machine learning, but
I’ll do my best to show some helpful tips for pursuing this domain. The
fundamental fields from which machine learning constantly draws, as I
understand it, are the following:

  • Probability
  • Statistics
  • Algorithms
  • Optimization
  • Systems

The last one is a bit more of a stretch in comparison to the others, but
I’ve heard that a high portion of machine learning can actually be
approached from a systems perspective, and that its inception actually
arose from system theory modeling.

For probability and statistics,
both intuition and rigorous technicality will be important. I had a
horrible textbook which provided very little conceptual basis for the
theorems, and mostly included a bunch of isolated problems which were
crudely connected in a very disjointed way. I recommend Introduction to Probability by
Grinstead and Snell, [4] which provides a lot of clear,
well-articulated conceptual explanations which enhance both intuition
and precise reasoning on the subject. It’s also free and available
online, which ya’ know, is always a big plus.

Becoming comfortable with algorithms is
a task which can more easily be achieved in a college setting, but one
which is also very feasible to execute independently. Regarding a
textbook to guide you through key concepts to algorithm theory, I
recommend looking no further than the classic Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein. [5]

Additionally, I would look to two more sources to continually expand algorithmic skills: Project Euler and Topcoder.
Project Euler encompasses a diverse range of mathematical problems for
algorithmic development which will strengthen your mathematical
algorithmic thinking, and your “out of the box,” creativity. Topcoder
provides challenges which will improve your technical programming
skills, and diversify and expand your problem solving breadth.

Of course, once you have a solid background in the above topics, you’ll want to receive a comprehensive introduction to robot learning, for which I’ve been told that Robot Learning by Connell and Mahadevan is a solid choice. [6]

Although robotic learning and robotic control are distinct domains, robotic
learning is intrinsically tied in to concepts from control theory. In
fact, one of the most challenging problems facing the robotic learning
community is that it lacks the rigorous analysis and descriptions that
the control and systems theories possess.

For example, a self-driving car that implements a series of clever robotic
learning algorithms will never be implemented without tools from control
systems. Why? Because without tools from control and systems theory,
you will never get close to demonstrating rigorous, mathematically
demanding qualities such as robustness, safety guarantees, stability,
etc., without which, the government wouldn’t let your self-driving car
see the light of day.

Robotic Control

I think that optimization, control, and systems are all presented and integrated very concisely in Design of Optimal Control Systems by
Bini. [7] This book covers more of each of these topics than machine
learning strictly requires. But a deep
understanding of at least some of the ideas shown in this book will
allow for insights to be drawn between these domains which most others
will likely not be capable of seeing.

Note that
I recommend the above for someone interested in both machine learning
and robotic control. If you’re primarily interested in robotic control,
then your mathematical skills need to be more sophisticated than the
vast majority of other engineers. This is likely the only engineering
discipline in which highly abstract mathematical fields play a
fundamental role. They include

  1. Real analysis
  2. Systems of Differential Equations
  3. Dynamical Systems (similar to 2., but distinct from it)
  4. Advanced Linear Algebra
  5. Advanced Optimization
  6. Basic Topology
  7. Set Theory (more than the basics, but not quite “advanced” set theory)

In other words, your mathematical skills have to be beyond the more applied end of the
spectrum in which things like formalities, proofs, theorems, and rigor
are almost never relevant.

For a comprehensive introduction to real analysis and topology that isn’t esoteric (difficult to find), I recommend Basic Analysis by
Lebl. [8] While the book isn’t intended for studying topology
specifically, it covers nearly all of the fundamentals which are
relevant to control. Note that real analysis is the most important item
in the above list.

Advanced Linear algebra is
the most difficult field for which to find an accessible, engaging
textbook, IMO. The majority of the texts are far too focused on
minute, irrelevant details and burdensome proofs that yield
little insight into the deeper concepts. More importantly,
most textbooks totally fail to connect the ideas to deeper concepts
which are both cool and incredibly useful. After a lot of searching, I
found hope in an unexpected place: online lecture notes. [9] If you
master these notes, and their difficult problems, to the point where you can
comfortably walk through the main concepts with a high school student,
then you’ll be five steps ahead of me.

As for dynamical systems, I’d say that Dynamical Systems by Sternberg does the trick. [10] Until you get to the more theoretical content like stability and invariance,
you really want to focus more on the concepts; the details aren’t
particularly important, surprisingly. You really just need to know what
kind of assumptions you have to make about the system you’re modeling.

Once you’re comfortable with most of the above, you can get your hands dirty with some actual control theory. For this, I recommend Mathematical Control Theory by Sontag. [11]


I have a hunch that’s not what you want to hear, since you didn’t ask
for advice regarding this matter. So I’m sorry if this caveat irks you
in any way, but it’s the best advice I can give, and I think it’s
important for you to hear.

I’m a firm believer
in pragmatic optimism, and while it’s optimistic to believe that
admittance into graduate school, especially in a technical field, is
feasible without an undergraduate degree, it is far from pragmatic.
Without an undergraduate degree, you are immediately excluded from
consideration for all departments at the majority of universities.

I can’t find any specific statistics on this matter, so you’ll have to
choose whether or not to take my word for it. But trust me when I say
that I can currently think of one graduate school that doesn’t
necessitate an undergraduate degree as a strict requirement.

Putting the strict requirements aside, for deeply embedded
multidisciplinary fields like robotics and machine learning, an
undergraduate education is crucial. Although I do think that the ability
to interact with professors; learn with faculty and peers in person;
and receive a curriculum designed by experts on which you are tested in a
competitive environment are all vital assets for initiating the
engineering experience in any field, they are especially true for

Another important distinction regarding your question is whether you are planning for a master’s or a Ph.D.

[1] Pawel Pralat: Graph Theory

[2] Tesla driver killed while using autopilot was watching Harry Potter, witness says

[3] Vehicle stability control systems: An overview of the integrated ...


[5] Introduction to Algorithms, 3rd Edition (MIT Press): Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, ISBN 9780262033848

[6] Robot Learning | J. H. Connell | Springer




[10] Dynamical Systems (Dover Books on Mathematics): Shlomo Sternberg, ISBN 9780486477053



· 34 Upvotes · Answer requested by Abdulmajeed Kabir

Nikita Zhiltsov, Computer science researcher at Kazan University; Textocat, co-founder & CTO
Updated Sep 12, 2014 ·

Upvoted by Yuval Feinstein, Algorithmic Software Engineer in NLP,IR and Machine Learning

A couple of years ago, based on his experience, Bradford Cross
gave a comprehensive list of the best resources on machine learning and
the prerequisites in his blog ("Measuring measures"). Unfortunately, it
appears to be down right now.

Here is the blog post at WebArchive's mirror:

Bradford's lists at Amazon:

  • Analysis [1]
  • Linear Algebra [2]
  • Probability [3]
  • Statistics [4, 5]
  • Optimization [6]
  • Machine learning [7]
  • Feature Selection [8]

I hope Mr. Cross will be able to join the discussion.









UPD 2:
Here is the list of must-read books for theoretical machine learning [1],
which is attributed to Prof. Michael Jordan (UC Berkeley). The sources
are [2] and [3].


[2] Learning About Statistical Learning

[3] AMA: Michael I Jordan • /r/MachineLearning



Osman Baskaya, Research Engineer @ Seven Bridges Genomics
There is a book named "Mathematics for Computer Science". It is also a course at MIT. This is the MIT OCW link of that course:

The course materials are old, by the way. The good news is that you can find the
book (a compilation of all the materials) easily by searching. If I am not
mistaken, the last revised version of this book is dated 6th May, 2012.

You need linear algebra as well. For that, I recommend Gilbert Strang's "Linear Algebra and Its Applications". It may be a little bit tough, but it is a great book.

If you want to dive into the probabilistic approach, you can enroll in the Probabilistic Graphical Models course: I heard that it is a very good course. The textbook of that course looks very useful:



Chomba Bupe, develops machine learning algorithms

The current machine learning (ML) algorithms are based upon mapping functions:

$$F : X \to Y$$

The function $F$ can be anything such as a support vector machine (SVM), a restricted
Boltzmann machine (RBM), a deep neural network (DNN) or anything else
that you can hand engineer yourself. In application areas, $X$ represents the input space while $Y$ represents the output space.

In speech recognition $X$ might be a set of spectrograms while $Y$ a set of identities representing the speakers. In image recognition, $X$ is the raw image pixel space while $Y$ is the categorization consisting of different classes into which $x_i \in X$ can fall.

Each ML model has parameters $w$ that affect the behavior of $F$ and
that we can normally adjust in order to change the behavior of that
function. We can thus write the mapping more conveniently as:

$$\hat{y}_i = F(x_i, w)$$

where $\hat{y}_i \in Y$.

We will focus on supervised ML models, where we have a dataset $T$ of $N$ training input-output pairs in the form:

$$T = \{(x_i, y_i)\}_{i=1}^{N}, \quad x_i \in X,\ y_i \in Y$$

The goal of supervised machine learning is to find the best parameter values $\hat{w}$ that make the function $F$ map the input-output pairs with the least error. So in supervised ML we have three main issues:

  1. A fitness measure: Define a fitness measure that tells us how well the ML model is performing on the training set $T$.
  2. Generalization:
    We can run the same fitness measure on the test set after training is
    complete in order to measure how well the model generalizes to novel
    inputs. This is a very important concept in modern ML.
  3. A learning algorithm to update the weights, $w \to \hat{w}$.

This is where the maths comes in. To understand the underlying maths concepts
you need to understand what ML is trying to solve in the first place.
The aim here is to find solutions to the 3 issues mentioned above, and
maths can help us with that.

1: A fitness measure:

This is normally done by an objective function also known as the loss/cost function:

$$L(\hat{y}_i, y_i)$$

where $\hat{y}_i$ = actual output and $y_i$ = desired output.

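In empirical risk minimization[1] (ERM) the goal is to minimize the overall loss as defined by the risk $R$, the average loss over the $N$ training pairs:

$$R(w) = \frac{1}{N} \sum_{i=1}^{N} L(f(x_i, w), y_i)$$

ERM states that the learning algorithm should choose the hypothesis function $\hat{f}$ such that the empirical risk is minimized. In simple mathematical terms we need to solve:

$$\hat{w} = \arg\min_{w} R(w)$$

where $\hat{f} = f(x, \hat{w})$.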
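
A minimal sketch of ERM in code may help here. The one-parameter model, the squared loss, and the data points are all made up for illustration; they are just one convenient instance of the general recipe above.

```python
def f(x, w):
    """Hypothetical one-parameter model: a line through the origin."""
    return w * x


def squared_loss(y_hat, y):
    return (y_hat - y) ** 2


def empirical_risk(w, data):
    """Average loss of f(., w) over the training pairs."""
    return sum(squared_loss(f(x, w), y) for x, y in data) / len(data)


# Made-up training pairs, roughly following y = 2x.
T = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

# For this particular model and loss, the minimizer has a closed form.
w_hat = sum(x * y for x, y in T) / sum(x * x for x, _ in T)

print(w_hat, empirical_risk(w_hat, T))
```

For most models there is no closed form like this, which is why a learning algorithm is needed.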

2: Generalization:

The above naive ERM can result in the function $\hat{f}$ just memorizing the training examples, which can cause what is called overfitting, that is, fitting the function $F$ to each and every noisy/outlier data point. That is not ideal, thus instead we normally use structural risk minimization[2] (SRM) whereby we add a regularization term $C(w)$ to the risk, thus we get the regularized risk:

$$R_{reg}(w) = R(w) + C(w)$$

Then in SRM we need to solve:

$$\hat{w} = \arg\min_{w} R_{reg}(w)$$

Regularization simply simplifies the weight parameters so that they don't model too
much of the outliers or noise. That is done by penalizing large weight
values in $w$, which are a cause of most overfitting issues. Thus the $L_0$ norm can be used in order to favor a very sparse set of weights whereby most weight values are zero. You can also use $L_1$ or $L_2$ regularization instead, as the $L_0$
norm is hard to optimize. Other weird regularization methods have since
popped up, such as dropout, which is used in learning algorithms for
DNNs, whereby neurons are randomly dropped out and back in during training
so that the overall network becomes robust to noise. Dropout can be
loosely seen as an ensemble method.
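
To see the penalty at work, here is a small sketch with L2 regularization on a one-parameter model f(x, w) = w * x. The penalty weight `lam`, the model, and the data (with one deliberate outlier) are assumptions made for illustration.

```python
def regularized_risk(w, data, lam):
    """Mean squared loss plus an L2 penalty lam * w**2."""
    risk = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return risk + lam * w ** 2


def ridge_weight(data, lam):
    """Closed-form minimizer of regularized_risk for the model f(x, w) = w * x."""
    n = len(data)
    return sum(x * y for x, y in data) / (sum(x * x for x, _ in data) + n * lam)


# Made-up data following y = 2x, except for an outlier in the last pair.
T = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 20.0)]

for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_weight(T, lam))
```

Without a penalty the outlier drags the weight well above 2; a larger `lam` shrinks it, at the cost of also shrinking the fit to the clean points.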

3: A learning algorithm:

Learning in current ML can be viewed as a way to update the weights in order to
find the optimal parameters. ERM and SRM both rely on the
existence of a learning algorithm for weight adjustment. We need an
algorithm to find the weights that solve:

$$\hat{w} = \arg\min_{w} R(w)$$

(or the same problem with the regularized risk in the SRM case). We need a way to update the model such that

$$w \to \hat{w}$$

In current ML systems we just look to the old idea of gradient descent (GD)
from numerical optimization. In GD we simply move down the
steepest slope on the error (risk) surface defined by the risk $R$. That means we can just use the update rule defined by:

$$w_{t+1} = w_t - \alpha \nabla_w R(w_t)$$

where $t$ = step count and $\alpha$ = learning rate.
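
The update rule is short enough to run by hand. Below is a sketch on a toy one-dimensional risk surface; the quadratic risk, the starting point, and the learning rate are arbitrary choices for illustration.

```python
def risk(w):
    """Toy convex risk surface with its minimum at w = 3."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of risk with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, alpha, steps):
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)  # move against the gradient
    return w


w_final = gradient_descent(w0=0.0, alpha=0.1, steps=100)
print(w_final, risk(w_final))
```

Each step shrinks the distance to the minimum by a constant factor here; on a real risk surface the behavior depends heavily on the learning rate.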

Here we assume a convex surface defined by $R$,
but in practice, especially for DNNs, the surface is highly non-convex.
Even so, almost any local minimum is just good enough, plus we can
add momentum to the update rule so that it can escape from local
minima traps easily. Also, the sheer number of parameters makes it harder
for the DNN model to get trapped in a local minimum, as there are
many possible escape routes through the many other dimensions.

In DNNs the gradient computations can become cumbersome even for a modern machine, as the number of gradient steps needed to reach $\hat{w}$
is normally large. Thus we need fast ways to accelerate gradient
computations for layered architectures. The backpropagation (backprop)
algorithm, to be specific, is a way of computing gradients extremely
efficiently in any differentiable computational graph. Backprop uses
the chain rule, starting from the output layer, which is directly connected
to the loss function and hence has derivatives that are easier to evaluate,
and then moving towards the layers far away from the output
layer (ending at the input layer) while chaining the derivatives. It is called backprop because
errors are passed from the back layers towards the front layers, reusing
intermediate derivatives and thereby saving a lot of repeat computations.
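
Here is a minimal sketch of that chain-rule bookkeeping on the smallest network I could think of: one tanh hidden unit and one linear output. The architecture, the loss, and the numbers are assumptions for illustration; the finite-difference check is only there to confirm the hand-derived gradients.

```python
import math


def forward(x, w1, w2):
    """Tiny two-layer network: h = tanh(w1 * x), y_hat = w2 * h."""
    h = math.tanh(w1 * x)
    return h, w2 * h


def loss(y_hat, y):
    return 0.5 * (y_hat - y) ** 2


def backprop(x, y, w1, w2):
    """Gradients of the loss w.r.t. w1 and w2 via the chain rule, output layer first."""
    h, y_hat = forward(x, w1, w2)
    d_yhat = y_hat - y              # dL/dy_hat
    d_w2 = d_yhat * h               # dL/dw2
    d_h = d_yhat * w2               # dL/dh, reused for the earlier layer
    d_w1 = d_h * (1.0 - h * h) * x  # tanh'(z) = 1 - tanh(z)^2
    return d_w1, d_w2


def numerical_grad(x, y, w1, w2, eps=1e-6):
    """Central finite differences, used only to sanity-check backprop."""
    g1 = (loss(forward(x, w1 + eps, w2)[1], y)
          - loss(forward(x, w1 - eps, w2)[1], y)) / (2 * eps)
    g2 = (loss(forward(x, w1, w2 + eps)[1], y)
          - loss(forward(x, w1, w2 - eps)[1], y)) / (2 * eps)
    return g1, g2


analytic = backprop(x=0.5, y=1.0, w1=0.3, w2=-0.8)
numeric = numerical_grad(x=0.5, y=1.0, w1=0.3, w2=-0.8)
print(analytic, numeric)
```

Note how `d_yhat` is computed once and reused for both weights; that reuse is the saving backprop provides, and it grows with the depth of the network.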

GD requires that all training pairs are considered before taking a single
small update step; this is not scalable. Thus in practice we have the
so-called stochastic gradient descent (SGD) that takes a step just after
one example. This is so efficient that it is normally a standard
learning algorithm for DNNs together with backprop. There are mini-batch
variants of SGD which you can consider as being in between SGD and GD:
the mini-batch gradient descent approach uses a small random set of
training examples, known as the batch, to approximate the gradient
via the backprop algorithm. Thus SGD can be seen as the batch variant
with just 1 example in the batch.
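
The relationship between GD, SGD, and the batch variants can be sketched with a single function whose batch size is a parameter. The model f(x, w) = w * x, the noiseless data, and the hyperparameters below are made up for illustration.

```python
import random


def minibatch_sgd(data, batch_size, alpha, epochs, seed=0):
    """Fit f(x, w) = w * x with squared loss using mini-batch SGD.

    batch_size=1 gives plain SGD; batch_size=len(data) gives full-batch GD.
    """
    rng = random.Random(seed)
    w = 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared loss over this batch only.
            g = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= alpha * g
    return w


# Made-up noiseless data generated from y = 2x, so every variant should approach w = 2.
T = [(x / 10.0, 2.0 * x / 10.0) for x in range(1, 21)]

for bs in (1, 5, len(T)):
    print(bs, minibatch_sgd(T, batch_size=bs, alpha=0.05, epochs=200))
```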

So to learn the maths theory behind ML, start from the underlying goals of
ML which we have looked at in this discussion. Of course this was just the
tip of the iceberg, but the best way to see most ML models is that they
are function approximators and we wish to recover those approximations
from input-output training pairs alone, which we call end-to-end

It also helps to visualize ML as just
optimization theory. We have a loss function and all we need is an
algorithm that helps us find the right settings such that the loss is
minimized. In practice SGD+backprop works very well for training modern
ML models.

You also need to try to implement
some of these algorithms yourself from scratch. Try to implement
backprop and SGD for a multi-layer neural network (NN), not a deep one for now,
then try it on the MNIST dataset. You can only learn via practice. Before
implementing, make sure you go through backprop and derive it for
multi-layer NNs and convolutional neural networks (convNets).

Don't be in too much of a hurry though; concepts take time to make sense. In
order to help yourself assimilate the stuff a bit more easily, solve some
problems and try to also explain the systems to others via platforms
like Quora, that way you will start to have more and more confidence in
your understanding of the maths behind ML algorithms.

Hope this helps.


[1]Empirical risk minimization - Wikipedia

[2]Structural risk minimization - Wikipedia


· 12 Upvotes · Answer requested by Pankaj Sharma

Florian Courtial, Software engineer, machine learning lover and java enthusiast.

Some people say that mathematics is useless for a software engineer; machine learning proves them wrong.

Mathematics is the prerequisite for machine learning because machine learning is
math. The computer is only useful to do the calculations.

You'll mainly need to learn calculus, matrix calculation, linear algebra, statistics and graph theory.

Let's take a basic ML algorithm, linear regression.

The goal is to use some data to find a function which takes parameters and
gives an output. The data are used to find the function and test it. In the
future, we will use the function with some parameters and obtain
an approximate output.

Let's say our data are
about planes: as input we have the number of miles travelled by the
plane and its age. As output we have the price of the plane. I don't
normalize the data, to keep things simple.

A sample of our data could be :





Our question is: given the miles travelled by a plane and its age, give a price.

Using linear regression (gradient descent) we will find a vector theta. This
vector has two values, theta[0] and theta[1]. To find an approximate
price we will multiply the miles by theta[0] and the age by theta[1] and
add the two products, which gives an approximate price.

For instance our algorithm could find theta = [2; -10 000] and if we have a
plane 5 years old with 78 000 miles, we can then approximate the price
by computing 78 000 * 2 + 5 * -10 000, so 106 000 dollars.
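
The same computation as a tiny function (the function name is just an illustrative choice; the numbers are the ones from the example above):

```python
def predict_price(miles, age, theta):
    """Linear model from the text: miles * theta[0] + age * theta[1]."""
    return miles * theta[0] + age * theta[1]


theta = [2, -10_000]
price = predict_price(miles=78_000, age=5, theta=theta)
print(price)  # 106000
```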

The hard part is to find good values for theta. To do that you need some maths.

You have a cost function that tells you how good your theta is. This cost
function tests your theta values using your data (which already contain the
price of a plane for given miles and age).

So your goal is to minimize the cost function by adjusting your theta values.

The cost function to minimize is this one :


Using the batch gradient descent algorithm each iteration adjusts the theta values using this formula :

Then you test the theta values with the previous function J(theta) and you'll
see that the cost (i.e. the difference between the predicted value and the
real one) will decrease at each iteration.
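
Here is a sketch of that loop. I am assuming the common squared-error cost J(theta) = 1/(2m) * sum((prediction - price)^2) and the matching batch update; the plane data are invented, and unlike the example above the features are rescaled (miles in units of 100 000, age in decades, price in units of 100 000 dollars) so that a simple fixed learning rate works.

```python
def predict(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))


def cost(theta, X, y):
    """Assumed cost: J(theta) = 1/(2m) * sum of squared prediction errors."""
    m = len(X)
    return sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


def batch_gradient_step(theta, X, y, alpha):
    """One batch update: every theta[j] moves against its partial derivative of J."""
    m = len(X)
    errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
    return [
        theta[j] - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
        for j in range(len(theta))
    ]


# Hypothetical planes: (miles / 100 000, age / 10) -> price / 100 000 dollars.
X = [(0.78, 0.5), (1.20, 1.0), (0.30, 0.2), (2.00, 1.5)]
y = [1.06, 1.40, 0.40, 2.50]

theta = [0.0, 0.0]
costs = [cost(theta, X, y)]
for _ in range(500):
    theta = batch_gradient_step(theta, X, y, alpha=0.1)
    costs.append(cost(theta, X, y))

print(theta, costs[0], costs[-1])
```

Printing `costs` shows the value going down at every iteration, which is the behavior described above.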

As you can see this simple ML algorithm is math. The computer will be useful to compute the previous formulas.



Pankesh Bamotra, ML is recollecting what humans have learnt so far
Mathematics is too vast a subject to be considered for this question. The breadth
and depth of mathematical awareness you require for machine learning 
totally depends on what you are learning in the subject. Keeping this in
mind, let's deal with what you need to know in "mathematics" for 
machine learning.

1. Probability and mathematical statistics 
This is a fundamental requirement for machine learning and so you need
to know it well. When I say probability it's more than what you studied in
high school and almost everything you probably did not pay attention to
during your undergrad. You need to know about random variables, their
distributions, probabilistic convergence, and estimation theory. That
covers a major part of what you need to know here.
Two of my favourite resources are:-
1. Joseph Blitzstein - Harvard Stat 110 lectures
2. Larry Wasserman's book - All of statistics

2. Linear algebra
Linear algebra will pop up every now and then in ML. PCA, SVD, LU
decomposition, QR decomposition, symmetric matrices, orthogonalization,
projections, and matrix operations are needed many times. The good thing
is that there are countless resources available on linear algebra.
My all time favourite is Gilbert Strang's MIT lectures on linear algebra.

3. Optimisation
Although only a few things from optimisation are needed most of the time, a
strong foundational knowledge will go a long way. You need to know
Lagrange multipliers, gradient descent, and primal-dual formulation.
The best resource on this is Boyd and Vandenberghe's course on Convex
optimisation from Stanford.

4. Calculus
I wanted to put this at the top, but I'm putting it last just to
emphasise the fact that only a fundamental knowledge is needed in
terms of calculus. Know about 3-D geometry, integration, and
differentiation and you'll survive. It's the easiest to start with
amongst the topics I've mentioned here. MIT has good lectures on

I think with these 4 tools you'll most likely find ML
easy to understand. Other than these you may find real analysis and
functional analysis relevant too, but they are just formal
generalisations of the topics mentioned before.



Arik Beremzon, Health economics graduate, international politics etc.

From a beginner.

An introductory Linear Algebra course will generally include the following:

  • Vectors
  • Vector Spaces
  • Matrices
  • Inner Product Spaces
  • Orthogonality
  • Projection
  • Linear transformations
  • eigenvectors, eigenvalues
  • change of bases
  • Various decompositions: LU, Polar, SVD.

I also had some geometric algebra, but haven't found that useful so far.

Probability and statistics:

  • probabilities
  • combinations
  • permutations
  • distributions
  • Understanding of hypothesis testing
  • Descriptive statistics: Means, modes, standard deviations, variances etc.
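
If you want to check your understanding of the counting and descriptive-statistics items, Python's standard library is enough. A small sketch on a made-up sample:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

mean = statistics.mean(data)
mode = statistics.mode(data)
variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # population standard deviation

# Ways to choose 2 items out of 5 when order does not matter / does matter.
combinations = math.comb(5, 2)
permutations = math.perm(5, 2)

print(mean, mode, variance, std_dev, combinations, permutations)
```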

If you can get through:


You are good to go.



Justin Rising, MSE in CS, PhD in Statistics
There's a recent CMU course called Computer Science Theory for the Information Age
which includes a lot of the math for machine learning.  There's also a
draft textbook there which is well worth grabbing a copy of.



The machine learning field needs the following mathematics background to understand more things.



Scott Triglia, MS in Machine Learning
If you are truly looking for a one-stop reference, the best that I can
suggest is Chris Bishop's Pattern Recognition and Machine Learning.
Although it is quite difficult to start with, it will cover the
majority of your interests until you are well versed enough in the
subject to be able to read publications and more specific texts.

When in doubt, MIT OpenCourseWare is always a good source -- I believe they
even offer one or two machine learning courses at the graduate level.

Good general reference/tutorial texts:

  • Information Theory, Inference and Learning Algorithms -- MacKay
  • Introduction to Probability Models -- Ross
  • AI: A Modern Approach -- Russell & Norvig
  • Algorithms -- Kleinberg & Tardos


Fluff Miller, PhD Machine Learning & Data Science, University of Manchester
Answered Feb 19, 2013 ·

Upvoted by Yuval Feinstein, Algorithmic Software Engineer in NLP,IR and Machine Learning
ML theory:
Bishop - Pattern Recognition & Machine Learning.   First time I
picked this book up it was pretty daunting, but once you  get a bit of
the maths under your belt, I found that it presents clearer 
explanations than other texts.  I found it really clearly laid out and 
it seems to progress pretty well.  It also covers a lot of stuff. 
Linear Algebra:
Strang videos on linear algebra are excellent, so are the Khan academy
ones.  The Gilbert Strang book doesn't seem to get particularly great
reviews.  On the basis of reviews, I picked up a copy of Howard Anton's
Elementary Linear Algebra which seems to be very highly regarded.  I
would recommend it. I also have David Poole: A Modern
Introduction...which feels a bit more......modern than Anton and I have
tended to use it more.  Doesn't seem to be a particularly well known
book on t'internet, but I find it very clear (more so than Anton).
If you want to practise, then there's Schaum's Outline of Theory and
Problems of Linear Algebra.  (Good for practising but insufficient as a
standalone text to the subject)

If you have the luxury of having
some time before starting on Machine Learning, I would suggest really
focusing on linear algebra in a very hands on way (working through
structured examples) and getting a good understanding of orthogonality,
vector spaces, eigenvectors, transformations.  From my experience,
trying to learn the maths at the same time as learning Machine Learning
was overwhelming and I would have got a lot more out of ML lectures if I
had already got a grasp of the maths.



Daniel McLaury, Ph.D. Student in Mathematics at University of Illinois at Chicago

You will want to know calculus up to vector calculus, a first course in linear
algebra, and a good course in calculus-based statistics that actually
explains what the concepts mean (as opposed to "if you're trying to do
this, you should press the chi-squared-test button" like you see in a
lot of classes.)  A discrete math course would be nice just for
background on notation, although you don't actually need to know any
nontrivial discrete math.



Shehroz Khan, ML Researcher, Postdoc @U of Toronto

The math for ML is no different from what you learn in high school or in
under-grad studies. If you have that mathematics base, most of the time
it is sufficient to understand what's going on in those creepy equations
you see in books and research papers. However, sometimes more than that
is required and you may have to take some advanced courses in
statistics, calculus, linear algebra etc. You may also like to read in
general more about How do I learn machine learning?


· 3 Upvotes · Answer requested by Shuvanon Razik, Francisco Sosa, and 1 more

Yuval Feinstein, Algorithmic Software Engineer in NLP,IR and Machine Learning

Please see How do I learn mathematics for machine learning?
which has some good answers. I believe the Witten et al. book is one
of the most accessible introductions. I guess a basic book on statistics
and probability and another one on Linear Algebra (For example Strang,
4th edition [1]) will take you most of the way there.



Lucian Sasu, Teaching humans and machines since Y2K.
See the list of books from Learning about Machine Learning, 2nd Ed., or Michael Jordan's list: Mike Jordan at Berkeley sent me his list on what people should learn for ML. The...Of course, correlating with these lists does not necessarily imply causation :), but these are good starting points.


· 19 Upvotes · Answer requested by Francisco Sosa

Darshan Hegde, MS in Data Science at NYU. Interested in Deep Learning and Natural Language Processing.
Here is my view on how to learn enough math online for free.

Teach yourself Machine Learning the hard way ! and follow up Teach yourself Machine Learning the hard way ! (Part 2)

It lists many pre-requisites that you need to understand and also some of the advanced stuff in part2.

I hope this helps.



Sukrit Shankar, Computer Vision, Machine Learning & AI Researcher @ University of Cambridge

With regards to mathematics for machine learning, I reckon all of the following skills are important:

(1) Some Basic Mathematical Skills (Linear Algebra, Probability, Optimization)

(2) Knowing how those mathematical skills are exploited for machine learning algorithms

(3) Developing a way to understand mathematics, so that any advanced maths for modern machine learning can be well comprehended.

While one would generally recommend all sorts of linear algebra and
probability books for machine learning, I feel those are not always
worth the time, at least for machine learning. I would recommend the
following texts to read through (perhaps in order), which should cater
to the three points mentioned above.

(a) Pattern Recognition and Machine Learning by Christopher Bishop (Will cater to 1 and 2 above)

(b) Deep Learning book by Goodfellow, Bengio and Courville (Will again strengthen 1 and build on 2)

(c) Understanding Machine Learning by Shai Shalev-Shwartz and Shai Ben-David
(Will advance your skills in 1, strengthen 2, and give an insight
to 3)

(d) Ankur Moitra’s rather short but useful book, Algorithmic Aspects of Machine Learning (Will mainly cater to 3)

(e) Optimization for Machine Learning by Sra, Nowozin and Wright & Off the convex path by Sanjeev Arora and collaborators (Will cater to 3 and advance 1 and 2)

I truly believe that if one can properly understand the above material in machine
learning, one will develop all the maths basics needed for machine
learning, and in a very connected form. Hope this helps!



Halil Lacevic, studied Computer Science at Faculty of Electrical Engineering Sarajevo

I won't say that you “learn” math. I would rather say that you train math.

Say you want to train boxing and your coach is teaching you directs, low
kicks and high kicks. No matter how many times he shows you how to kick,
you can't do it perfectly. You do know that it takes patience, hard work and effort
to finally learn how to punch and you need to keep trying and training.
After so many tries you can finally say that you can actually punch.

What's the point?

Math is the same. Consider direct punches your formulas, low kicks your theories and high kicks your
solutions to problems. No matter how many formulas or theories you
know, no matter how many times you've seen solutions, you just can't do
it perfectly. Why? Because you need to train those formulas, train those theories and knock out those problems with damn good high kicks. And how do you do that?

  • Do as many problems as you can on a daily basis. It is not going to happen overnight, it takes time to train those kicks.

Wanna learn it fast? Better start now!



William Emmanuel Yu, Fell in love with computers at a young age and never looked back.

Brush up on your statistics and probability. This is definitely critical particularly for supervised learning methods.

Some also require a good deal of number theory knowledge especially when discussing SVM, PCA and friends.

If you are planning to take a Ph.D. and move the science further you might
want to narrow your focus to a particular area for your research while
working with your candidate adviser.



This is not an exhaustive list of topics. Best read in this order:

  • Linear Algebra
  • Vector Calculus
  • Statistics and information theory
  • Discrete Math
  • Convex Optimization
  • Probabilistic Graphical models

I believe there is a book : which can help you get a good head start.



I will try to keep this as concise as possible.

Edit: Somebody merged the original question to this question, so the premise becomes irrelevant.

To become a full stack AI/ML engineer, it is imperative that you have a complete grasp of the mathematical foundations
of ML so that you can build upon concepts easily. The basic
mathematical skills required are Linear Algebra, Matrix Algebra,
Probability and some basic Calculus.

Linear Algebra

The best source to study Linear Algebra is Prof. Gilbert Strang’s Linear Algebra book/course. Video Lectures | Linear Algebra | Mathematics | MIT OpenCourseWare
(MIT OCW). There are 34 lectures and believe me, they are completely
worth it as after completing this, linear algebra should not pose any
more problems for you. Solve some exercises/exams if you want to achieve
mastery (recommended).

Matrix Algebra

Matrix algebra is an essential component of deep learning. I personally recommend this (Matrix Cookbook by Kaare Brandt Petersen & Michael Syskind Pedersen):
(PDF). There are 66 pages of pure matrix operations and this is the
absolute “go-to” in case you are stuck trying to understand certain
matrix manipulations that a researcher might have done.

Probability & Statistics

Probability is a very important aspect of understanding ML. Some of the
key probability concepts that you must be aware of include Bayes’
Theorem, distributions, MLE, regression, inference and so on. The best
resource for this is Think Stats (Exploratory Data Analysis in Python) by Allen Downey:
(PDF). This absolute gem of a book is 264 pages long and covers all the
aspects of probability and statistics that you need to understand with
relevant Python code.
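
As a small illustration of two of the concepts named above (MLE and Bayesian inference), here is a minimal sketch that estimates the bias of a coin. It is not taken from Think Stats; the function names, the Beta(2, 2) prior and the counts are assumptions chosen purely for illustration.

```python
def bernoulli_mle(heads: int, flips: int) -> float:
    """Maximum likelihood estimate of P(heads): the observed frequency."""
    return heads / flips


def bernoulli_map(heads: int, flips: int, alpha: float = 2.0, beta: float = 2.0) -> float:
    """MAP estimate under an assumed Beta(alpha, beta) prior.

    The posterior is Beta(heads + alpha, flips - heads + beta). Its mode is
    (heads + alpha - 1) / (flips + alpha + beta - 2) when both posterior
    parameters exceed 1.
    """
    return (heads + alpha - 1) / (flips + alpha + beta - 2)


# Made-up data: 7 heads in 10 flips.
mle = bernoulli_mle(heads=7, flips=10)
map_estimate = bernoulli_map(heads=7, flips=10)
print(mle, map_estimate)
```

The prior pulls the MAP estimate toward 0.5 relative to the MLE, and the two estimates move closer together as the number of flips grows.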


The go-to book for Convex Optimization is Convex Optimization by Stephen Boyd and Lieven Vandenberghe:
(PDF). This is a 730 page book and you need not read it all in one go.
Choose the concept which you need to learn depending on your
requirements and interest and read that part. It is complete and
extremely well written. This book is free as part of the CVX 101 MOOC on

This 263 page book on metaheuristics, Essentials of Metaheuristics by Sean Luke (
(PDF)) talks about gradient based optimization, policy optimization
etc. and it is well written. One can choose to go through this also if

Data science concepts are covered
in the above topics. Other topics can be learnt by googling for sources
easily as and when you encounter them. But complete understanding of the
above should suffice for 95% of all scenarios.

Mastery of the above topics will surely make you a mathematically
strong AI/ML engineer. Now that you have built the foundation, start
dipping your feet into research papers. They are absolutely
essential as these clearly show the standards of AI
researchers/engineers. Firstly, find out the famous papers of AI like
RNN, LSTM, SVM etc. and go through the technical content.

Can you understand the jargon?

Can you understand the mathematics?

Can you implement the mathematics in code now without the help of overly sufficient libraries?

These are the key questions to be answered. Once you can answer “Yes/Mostly Yes” to these 3 questions, you are good to go.

After trying to read these papers dealing with the most popular concepts, try to read the not-so-famous papers. arXiv
is a great site with hundreds of preprints being published everyday by
top researchers and reading the papers from here is like drinking
straight out of the fire-hose. Try to choose a paper which looks fairly
well written and the abstract seems interesting. Then, read that paper
and try to answer those 3 questions again. The same can be done with
papers of top AI conferences like NIPS, AAAI, AAMAS, IJCAI, ICML etc.
You may not be able to fully implement the papers due to data
constraints and other issues, but if you are able to understand even 60%
of the mathematical reasoning, then I can safely say you have completed
your training.

Do not concentrate on learning more and more “packages”.
Concentrate on the concept. While implementing, you will automatically
see that you require “this” package and then you will automatically
learn to use it. Learning the various commands of random packages won’t
help. If you start implementing and writing codes to solve problems or
simulate results from a paper, you will automatically learn about
packages and use them appropriately; they’ll be the least of your
concerns. This is the correct way to maintain “balance” between math and coding.
You can also participate in competitions (e.g. Kaggle or conference
competitions) to improve speed, development and processing skills if you
feel the need to do so.

Alternatively, you can choose to pursue a doctoral degree (like me :P ) in AI/ML to gain a complete in-depth understanding of everything discussed here and more.

(All the links in this answer are working as of 6th July 2017)



Ivo Danihelka, Self-improving programmer
The needed mathematics includes:

They will make your later reading much more pleasant. You will be able to devise your own proofs.

Terence Tao has posted several pieces of math-learning advice on his blog:



Krishna Kumar Sekar, B.S Electronics and Communication & Mathematics, Noorul Islam College of Engineering (2012)

I started writing a GitHub awesome page for that; it may help. It
has topics from basic machine learning maths to advanced and quantum
machine learning.


Thanks and Regards





Prasoon Goyal, Have been working in Machine Learning for a few years


To have a basic mathematical background, you need to have some knowledge of the following mathematical concepts:
- Probability and statistics
- Linear algebra
- Optimization
- Multivariable calculus
- Functional analysis (not essential)
- First-order logic (not essential)
You can find some reasonable material on most of these by searching for
"<topic> lecture notes" on Google. Usually, you'll find good
lecture notes compiled by some professor teaching that course. The first
few results should give you a good set to choose from.

For instance, here’s a list of some lecture notes that I just found:

Probability & Statistics :

Linear algebra :

Optimization :


Matrix Calculus :

You should skim through these, without going into too much detail. You can
come back to studying the topics as and when required while learning ML.


· 37 Upvotes · Answer requested by Jasdeep Rana

Ansup Babu, Working on Uncertainty Quantification

You want to be a real Data Scientist, not one of the fake ones with the skills of an
Analyst and no mathematical intuition or point of view. A real Data
Scientist needs to have a very strong mathematical grounding.

So to learn Mathematics for ML, this should be the order:

  1. Start with Probability (Basic, Conditional, Marginal, etc.)
  2. Mathematical Series and Convergence, Numerical Methods for Analysis
  3. Matrix and Linear Algebra
  4. Bayesian Statistics
  5. Vectors (Most Important)
  6. Calculus
  7. Markov Processes and Chains
  8. Basics of Optimization (Linear/Quadratic)
  9. Advanced Matrix Algebra and Calculus (Gradient, Divergence, Curl, etc.)

This much mathematics will enable you to understand the core ideas behind ML and probabilistic algorithms.

You should pause now and start building certain algorithms from scratch in Python:

1. K-NN is a great starting point: learn it, and code it from scratch.

2. Logistic Regression with Gradient Descent.
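
For the first of these exercises, a from-scratch K-NN classifier can be quite short. The following is only a minimal sketch using the Python standard library; the function name `knn_predict`, the choice of Euclidean distance and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two small clusters.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (5.1, 5.0)))
```

Sorting all distances is the simplest approach and is fine for small data sets; larger ones would normally call for a spatial index or a partial sort.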

Now you can see the parameters and numbers moving in matrix form,
and understand the mathematics of prediction. And if you feel this is
enough, hold your breath: there is more exciting stuff to come. This
will enable you to make a beginning as a “Real Data Scientist”.
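
The second exercise, logistic regression trained with gradient descent, might look roughly like the sketch below. It uses plain Python lists rather than a matrix library so that every update is visible. The learning rate, the epoch count and the centred toy data are assumptions chosen for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, targets, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_samples, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, targets):
            # Prediction error; the log-loss gradient is error * input.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step against the averaged gradient.
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x) -> int:
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5)


# Toy one-feature data (made up, centred on zero): label is 1 for positive values.
X = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

Printing `w` and `b` every few hundred epochs is an easy way to watch the parameters move as the loss decreases.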

Next Start with :-

  1. Stochastic Models and Time Series Analysis
  2. Differential Equations
  3. Dynamic Programming and Optimization Techniques
  4. Fourier and Wavelet Transforms
  5. Random Fields
  6. Basic Knowledge of PDEs
  7. Techniques to solve PDEs using Monte-Carlo , Polynomial Expansions.

These mathematical techniques will help you visualize the model’s working and
how to model and process raw data to create unique models whose
functionality can be tuned. Parameters can be optimized for the problems
and fine tuned with these techniques.

For a Next Level Up:- ( Statistics of Higher Dimensions)

  1. PDEs numerical solution with numerical input/ random input. ( fascinating subject to work on )
  2. Stochastic Differential Equations and Solutions
  3. PCA etc
  4. Dirichlet Processes, Markov Decision Process.
  5. Uncertainty Quantification - Polynomial Chaos, Projections on vector space

I think these are the subjects one must learn to be a good Machine
learning engineer in the 21st century. With a knowledge base like this, one
can connect dots very rapidly and build systems and model of high

( I am not a big fan of Neural nets, forgot to mention here)



Eugene Insall Jr, I'm a mathematician.
Algebra is important in many ways, but you really need to learn some
logic.  I don't mean the babiest of baby things that people say is easy
because they can understand and track and foretell the end of a mystery
novel in a tv series like Foyle's War or CSI or Sherlock Holmes.  I
don't mean the intro to logic course in many philosophy departments.  I
don't mean the Boolean circuits course you may have taken as a freshman
in computer engineering or the simple truth table arguments you did and
eventually turned in to graph theory problems in a second semester or
second year computer science course called ``Discrete Math''.  I don't
mean the simple arguments you went through in your modern algebra course
as a senior in a mathematics department.  But all of those can be
useful, and are pre-cursors or example generators for a beginning course
in Model Theory.  Then you can begin to truly appreciate the NOTION of a
THINKING MACHINE, and what it means to model such a monstrosity.  Then
you can begin to understand how to develop formal languages for solution
of specific problems.  Then you can start understanding why it's really
strange to model THINKING as a neural network, although that is not a
completely useless way to do it.  (Basically, neural networks seem to me
to be ``pattern recognizers'', roughly, basically using fixed-point
iteration in metric spaces to hone in on a pattern or set of patterns of
behaviors of inputting agents.  Please note that I said ``roughly''. 
This is not intended to be a tutorial on neural networks.)

Of course, it helps to have a notion of what it means to define or model
the concept described by the verb ``to learn''.  That, my friends, is
the realm of philosophy and pedagogy, but to apply it requires an
understanding of the notion of a model, and we are back to my main
point:  Take some model theory.  It's not likely to hurt you for more
than a semester, and well...




Dan Dunay, Computer Scientist, Administrator, RPCV, Philosopher

In addition to Martin Thoma's great answer, I'd study up on the "Theory of
Computation". Textbooks abound, but they are expensive. Search on the
web. Wiki has an overview, but it won't make sense until you've
studied a bit. Still, it may show you what you've missed.


· 1 Upvote

Carlin Eng, Data analyst and programmer.

Bayes' Theorem is a fundamental concept of probability that underpins many
extremely important algorithms, from the very basic (e.g., Naive Bayes)
to the quite complicated (e.g., Latent Dirichlet Allocation).
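
To make the theorem concrete, here is a minimal sketch that computes a posterior probability for a binary hypothesis. The function name and the numbers (a 1% prior, 90% true-positive rate, 5% false-positive rate) are hypothetical and chosen only for illustration.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(H | E) via Bayes' theorem for a binary hypothesis H and evidence E."""
    # Total probability of the evidence:
    # P(E) = P(E | H) P(H) + P(E | not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prior, 90% true-positive rate, 5% false-positive rate.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 4))
```

With a small prior, the posterior stays well below the true-positive rate even after positive evidence. Naive Bayes applies the same rule across many features under an independence assumption.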

In linear algebra, a solid understanding of eigenvalues and eigenvectors is
important for topics such as principal component analysis, factor
analysis and other dimensionality reduction tasks.
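
One way to build intuition here is power iteration, which finds the dominant eigenvalue and eigenvector of a matrix. For a covariance matrix, that eigenvector is the first principal component. The sketch below is a bare-bones version under stated assumptions: the 2x2 matrix is made up, and a real PCA implementation would normally rely on a library eigensolver or SVD.

```python
import math


def power_iteration(matrix, iterations=200):
    """Approximate the dominant eigenvalue/eigenvector of a square matrix."""
    n = len(matrix)
    vector = [1.0] * n
    for _ in range(iterations):
        # Multiply by the matrix, then renormalise to unit length.
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient v^T A v (v has unit length) estimates the eigenvalue.
    av = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * w for v, w in zip(vector, av))
    return eigenvalue, vector


# A small symmetric matrix standing in for a covariance matrix (made-up values).
cov = [[2.0, 1.0], [1.0, 3.0]]
value, direction = power_iteration(cov)
print(value, direction)
```

For this matrix the exact dominant eigenvalue is (5 + sqrt(5)) / 2, so the printed value can be checked by hand.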



Rafael Gustavo da Cunha Pereira Pinto, Systems analyst at Petrobras (2006-present)

I would suggest reading as many Linear Algebra books as possible, followed by some probability and statistics texts.

For the first, I suggest Gilbert Strang's "Linear Algebra and Its
Applications", while for the second, "Probability, Random Variables And
Random Signal Principles" by Peebles is a good choice.

A previous answer suggests a Convex Optimization text, which I also
recommend. A good text is "Convex Optimization" by Stephen Boyd, which
is also available for free on the author's website.



I took both Andrew Ng's Machine Learning class and Sebastian Thrun's AI
class. I liked the Machine Learning one more - even though AI class
touches more topics, but it does it in a haphazard way. ML class is
narrower, but more practical and focused. It helps to keep a link to
Khan's linear algebra videos handy.



Shashank Gupta, Learning the nuts and bolts of Machine Learning.

For understanding Machine Learning you need following Mathematics prerequisites :

1. Probability and Statistics : Machine
Learning has deep roots in Statistics. In fact the modern Machine
learning is essentially Statistical learning i.e using stats to find
patterns in data and inferring using them. So Stats and Probability are
bare minimum for ML.

2. Linear Algebra : This
is required because data is represented as matrix in Machine Learning
and essentially all ML algorithms can be seen as Matrix manipulation in
the end so basic understanding of Linear Algebra is required.

3. Optimization : Many
people argue that Machine learning is a fancy name for optimization.
While this is true to certain extent there is more to ML than
optimization. But a large part of it is indeed optimization. In the end
mostly all ML algos come down to some optimization task.

4. Calculus : This
is a very useful tool for ML. Most ML algos rely on Differential
Calculus to find solutions (Gradient Descent, Newton's method, quasi
Newton's method etc.).
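
To see how the two methods named above differ, the sketch below minimises one simple function with both. The function f(x) = exp(x) - 2x, the step size and the iteration counts are assumptions picked so that the exact answer, x = ln 2, is known.

```python
import math


def gradient_descent(grad, x0, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient with a fixed step size."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def newton(grad, hess, x0, steps=10):
    """Newton's method: scale each step by the inverse second derivative."""
    x = x0
    for _ in range(steps):
        x -= grad(x) / hess(x)
    return x


def f_prime(x: float) -> float:
    # First derivative of f(x) = exp(x) - 2x.
    return math.exp(x) - 2.0


def f_second(x: float) -> float:
    # Second derivative of f(x) = exp(x) - 2x.
    return math.exp(x)


gd_solution = gradient_descent(f_prime, x0=0.0)
newton_solution = newton(f_prime, f_second, x0=0.0)
print(gd_solution, newton_solution, math.log(2))
```

Newton's method uses curvature information and needs far fewer iterations here. Gradient descent only needs first derivatives, which is why it scales better to models with many parameters.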

IMO if you master these
topics then you can learn pretty much anything in ML, because all
algorithms are essentially applications of these tools in ML.



Conditional probability, random variables, pdfs etc. Whatever you'd
learn in your undergrad probability course and a bit more.
Some bit of stochastic processes (like markovian processes, etc)
Linear Algebra: Data analysis and machine learning build a lot on these concepts.
Algorithms: not as important, but still important when it comes to optimizing your solution. some graph theory
Basic Linear Programming and recognizing convexity and relaxation.

I guess once you have a fair idea of most of these concepts, then it's
pretty simple to pick up the intuition behind any algorithm and where it
would work, how to improve it, etc.


· 1 Upvote

Nayan Gupta, Connecting the dots of curiosity
It depends on your goals you want to achieve while learning ML.

Goal 1: To understand what ML is, how to apply different algorithms to the task, how to interpret the output, common pitfalls, etc.: You
will need to have a grasp of linear & matrix algebra, probability
and optimization. You don't need to take a deep dive into each one, but
study basic things like eigenvectors, conditional
probability, distributions and Bayes' theorem. Additionally, learn the concepts
of overfitting and cross-validation.

Video Lectures : Machine learning on Coursera (not only the one by Andrew Ng, there are a few others as well)

Books : Machine Learning by Tom Mitchell. It includes the necessary linear algebra and probability too.

Goal 2: Why the current algorithms are designed in a particular way, and how they fundamentally differ from each other:
If you are more interested in the theoretical aspects, like how the kernels
of a support vector machine are defined, how deep learning neural
networks are designed, or how to tweak the existing algorithms to make a
new one, then you might want to extend your mathematics to functional
analysis, topology and advanced optimization.

Resources : Video: Advanced Machine learning, Caltech (Prof. Abu-Mostafa's lectures)
Books : Mining of Massive Datasets by Ullman; any good books on advanced
linear algebra and topology, though they won't connect it to machine learning.



Harsh Prasad, Big Data Analytics Centre at SNU

Mathematics is about doing. Remember the 80/20 Rule : You must study
theory 20% of the time and practice/implement what you learn 80% of the time.

Here is a list of books you could use. You can find accompanying online courses for many of them.

1. Strang's Linear Algebra and its Applications

2. Apostol Calculus - Both the volumes

3. Golub's Matrix Computations

4. Sheldon Ross' Probability

5. Elements of Statistical Learning by Hastie et al

6. Bishop's Pattern Recognition and Machine Learning

7. David Barber's Bayesian Reasoning and Machine Learning

8. Kevin Murphy's Machine learning: a Probabilistic Perspective

9. Wasserman's All of Statistics and Non-parametric Statistics

From Hacker News :

1.) Casella, G. and Berger, R.L. (2001). "Statistical Inference" Duxbury Press.

2.) Ferguson, T. (1996). "A Course in Large Sample Theory" Chapman & Hall/CRC.

3.) Lehmann, E. (2004). "Elements of Large-Sample Theory" Springer.

4.) Gelman, A. et al. (2003). "Bayesian Data Analysis" Chapman & Hall/CRC.

5.) Robert, C. and Casella, G. (2005). "Monte Carlo Statistical Methods" Springer.

6.) Grimmett, G. and Stirzaker, D. (2001). "Probability and Random Processes" Oxford.

7.) Pollard, D. (2001). "A User's Guide to Measure Theoretic Probability" Cambridge.

8.) Bertsimas, D. and Tsitsiklis, J. (1997). "Introduction to Linear Optimization" Athena.

9.) Boyd, S. and Vandenberghe, L. (2004). "Convex Optimization" Cambridge.

10.) Golub, G., and Van Loan, C. (1996). "Matrix Computations" Johns Hopkins.

11.) Cover, T. and Thomas, J. "Elements of Information Theory" Wiley.

12.) Kreyszig, E. (1989). "Introductory Functional Analysis with Applications" Wiley.

Do try implementing as many things as you can. Pick up a project. Talk
to your peers and professors and people, see if you can help them with
what you've learned. Do.



Meenakshi Deenadayalan, Learning ML for a year now

Some algorithms are really sweet: they are available on Wikipedia with formulae, implementation and application.

Some dodge you till you watch two or three YouTube videos (Victor Lavrenko, Bert Huang, Udacity or MIT lectures).

Some are really mischievous: you have to do a lot of research, and they test your
patience and perseverance more than your mathematics!

And there are lots of books you can read.

How to learn a particular algorithm?

  1. First, from the business point of view, learn why to use an algorithm and not any
    counterpart of it, such as why Fuzzy K-Means instead of K-Means.
  2. Secondly, from an analyst's point of view, learn how to use the algorithm to solve some use cases and what it is meant to do.
  3. Last will be the mathematics: the how of the algorithm, and more research to enhance the algorithm and patent it.

P. S. It is normal to not understand in the first go.

P. P. S. And very normal to get totally confused in the second and third.



Michael Arthur Bucko, Machine learning consultant.
For the foundations of data crunching, from Microsoft, 2014, Page on Thank you, Microsoft, for sharing.



Valeryia Shchutskaya, Marketing, InData Labs

I work for a Data Science and AI company called InData Labs and one of
our tech experts has recently prepared a short guide to learn neural
networks, hope it is helpful for you:

A short guide to neural networks. Master them and become famous.


· 1 Upvote

Mathematics is an important part of learning machine learning. What are the necessary topics and
useful resources of mathematics for machine learning?

I am sharing the weightage of the important mathematics topics for machine learning
to clear up any confusion. See the list below and start
preparing accordingly.

35% - Linear Algebra

25% - Probability Theory and Statistics

15% - Multivariate Calculus

15% - Algorithms and Complex Optimizations

10% - Others

I will now go into more depth so that you are entirely
clear on how to start with machine learning or artificial intelligence.

  1. Linear Algebra:
    Topics such as Principal Component Analysis (PCA), Singular Value
    Decomposition (SVD), Eigendecomposition of a matrix, LU Decomposition,
    QR Decomposition/Factorization, Symmetric Matrices, Orthogonalization
    & Orthonormalization, Matrix Operations, Projections, Eigenvalues
    & Eigenvectors, Vector Spaces and Norms are needed for understanding
    the optimization methods used for machine learning. The amazing thing
    about Linear Algebra is that there are so many online resources.
  2. Probability Theory and Statistics: Probability
    Rules & Axioms, Bayes' Theorem, Random Variables, Variance and
    Expectation, Conditional and Joint Distributions, Standard Distributions
    (Bernoulli, Binomial, Multinomial, Uniform and Gaussian), Moment
    Generating Functions, Maximum Likelihood Estimation (MLE), Prior and
    Posterior, Maximum a Posteriori Estimation (MAP) and Sampling Methods.
  3. Multivariate Calculus:
    Topics include Differential and Integral Calculus, Partial Derivatives,
    Vector-Valued Functions, Directional Gradient, Hessian, Jacobian,
    Laplacian and Lagrangian Distribution.
  4. Algorithms and Complex Optimizations:
    Knowledge of data structures (Binary Trees, Hashing, Heap, Stack etc),
    Dynamic Programming, Randomized & Sublinear Algorithm, Graphs,
    Gradient/Stochastic Descents and Primal-Dual methods are needed.
  5. Others:
    This comprises other Math topics not covered in the four major areas
    described above. They include Real and Complex Analysis (Sets and
    Sequences, Topology, Metric Spaces, Single-Valued and Continuous
    Functions, Limits, Cauchy Kernel, Fourier Transforms), Information
    Theory (Entropy, Information Gain), Function Spaces and Manifolds.
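
The information theory terms in the last item (entropy and information gain) are easy to compute by hand, which makes them a good first exercise. The sketch below is a minimal illustration with made-up labels, measuring entropy in bits.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children) -> float:
    """Entropy reduction from splitting `parent` into the `children` groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Made-up labels: a balanced parent node split into two purer groups.
parent = ["yes"] * 4 + ["no"] * 4
left, right = ["yes"] * 3 + ["no"], ["yes"] + ["no"] * 3
gain = information_gain(parent, [left, right])
print(entropy(parent), gain)
```

Decision tree learners typically evaluate candidate splits with a quantity like this and keep the split with the largest gain.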

You are probably looking for the best learning and practice resources
for your weak points, right? Don’t worry, learners: I would also like to
suggest a few good resources.

Thank you. Keep Learning.



Yin Zhu, studied at Hong Kong University of Science and Technology

Especially convex optimization. E.g. gradient based methods for
non-linear optimization (L-BFGS method and conjugate gradient), quadratic
programming, etc.



Tyler Renelle, React & Machine Learning Developer at OCDevel

I made a podcast episode on the math you need for machine learning, and the resources for learning (if you like audio): Machine Learning Guide #8

Leonardo Federico, Where analytics ∩ automation.
I would recommend to take this course: Course Introduction - Amazon Machine Learning

It covers most of the math you need to get started with machine learning.



Ankur Raj, Author of Quasi-Based Hierarchical Clustering algorithm

There are many reasons why the mathematics of Machine Learning is important and I will highlight some of them below:

  • Selecting
    the right algorithm which includes giving considerations to accuracy,
    training time, model complexity, number of parameters and number of
  • Choosing parameter settings and validation strategies.
  • Identifying underfitting and overfitting by understanding the Bias-Variance tradeoff.
  • Estimating the right confidence interval and uncertainty.
  1. Linear
    algebra is a cornerstone because everything in machine learning is a
    vector or a matrix. Dot products, distance, matrix factorization,
    eigenvalues etc. come up all the time. I would recommend Gilbert Strang’s linear algebra course.
  2. Multivariate Calculus:
    Some of the necessary topics include Differential and Integral
    Calculus, Partial Derivatives, Vector-Valued Functions, Directional
    Gradient, Distribution. Differentiation matters because of gradient descent. Again, gradient descent is almost everywhere. Some courses I recommend
  3. Probability Theory and Statistics:
    Machine Learning and Statistics aren't very different fields. Actually,
    someone recently defined Machine Learning as 'doing statistics on a
    Mac'. Some of the fundamental Statistical and Probability Theory needed
    for ML are Combinatorics, Probability Rules & Axioms, Bayes'
    Theorem, Random Variables, Variance and Expectation, Conditional and
    Joint Distributions, Standard Distributions (Bernoulli, Binomial,
    Multinomial, Uniform and Gaussian), Moment Generating Functions, Maximum
    Likelihood Estimation (MLE), Prior and Posterior, Maximum a Posteriori
    Estimation (MAP) and Sampling Methods.
  4. Algorithms and Complex Optimizations:
    This is important for understanding the computational efficiency and
    scalability of our Machine Learning Algorithm and for exploiting
    sparsity in our datasets. Knowledge of data structures (Binary Trees,
    Hashing, Heap, Stack etc), Dynamic Programming, Randomized &
    Sublinear Algorithm, Graphs, Gradient/Stochastic Descents and
    Primal-Dual methods are needed.
    1. Boyd and Vandenberghe's course on Convex optimization from Stanford.

Given all that, ML is not all about maths and, to be frank, when starting out you will hardly spend 5% of your effort doing maths.



Vaibhav Aparimit, studied at Indian School of Business

I wrote a detailed medium post on this. You can read it here Math for Deep Learning is not Merlin’s Enchantment – Vaibhav Aparimit – Medium

First off, I really like your question. You seem to implicitly understand that
math is an essential skill required to grasp the underpinnings of
machine learning .

If your question were about
deep learning, I would say linear algebra covers 95% of cases. For
machine learning you would need to know probability (especially Bayes'
rule and conditional probability), differential calculus and linear
algebra (matrix multiplication, eigenvectors, determinants, Hessians).
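
To make the Bayes' rule and conditional probability part concrete, here is a small worked sketch in plain Python. The helper name `posterior` and all the numbers are made up for illustration; the only thing taken as given is Bayes' rule itself, P(H | E) = P(E | H) P(H) / P(E).

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) for a yes/no hypothesis H given that evidence E was observed.

    prior               -- P(H)
    likelihood          -- P(E | H)
    false_positive_rate -- P(E | not H)
    """
    # Total probability of seeing the evidence, summed over both cases.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Illustrative numbers only: a rare condition (1%), a test that fires
# 95% of the time when it is present and 5% of the time when it is not.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))  # roughly 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior is small. That kind of intuition is what the probability prerequisites are for.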

Hope this helps.



Colorado Reed, Tool Builder, Berkeley CS PhD Student

You may find Metacademy helpful when trying to understand the prereqs for various concepts in machine learning: Concepts - Metacademy



Prashant Sharma, Computer Science major

You must have a sound understanding of at least the following (there might be others which are not there in this list):

  • Linear algebra
  • Calculus
  • Matrix calculus
  • Probability and statistics
  • Optimization - linear programming, convex optimization, non-linear optimization

Some other topics that are useful in specific sub-areas of machine learning are:

  • Basic graph theory
  • Basic algorithms
  • First-order logic


Nikhil Singhal, Artificial Intelligence & Virtual Reality, Jaypee University Anoopshahr (2019)
  1. Linear Algebra
  2. Probability theory and statistics
  3. Multivariate calculus
  4. Algorithms and Complex optimizations
  5. Others: Real and Complex Analysis (Sets and Sequences, Topology,
    Metric Spaces, Single-Valued and Continuous Functions, Limits, Cauchy
    Kernel, Fourier Transforms), Information Theory (Entropy, Information
    Gain), Function Spaces and Manifolds.

To learn them go through


· 2 Upvotes · Answer requested by Shiv Kr

Alex Gilgur, Data Scientist; IT Analyst; occasional lecturer at UCBerkeley's MIDS program.

Numerical Methods; Matrix and Tensor Algebra; Probability and Statistics; Operations Research; occasionally Calculus.


· Answer requested by Shuvanon Razik

Charles H Martin, Calculation Consulting; we predict things
Answered May 19, 2013 ·

Upvoted by Yuval Feinstein, Algorithmic Software Engineer in NLP,IR and Machine Learning
Do this class online:

EE364a: Convex Optimization I


· 11 Upvotes · Answer requested by Shuvanon Razik and Francisco Sosa

As Alex mentions above, Andrew Ng's course on Machine Learning is the best
I have seen so far. He gives an intuitive feel for the concepts,
so it's easy to follow, rather than looking at plain formulae.
If you are looking to refresh or clarify linear algebra concepts after going through the above course, Khan Academy
could be useful. It also has videos on other topics that might be of
interest for machine learning. If you are looking for concepts like PCA
etc., you might not find them there.
Another useful resource that focuses on concepts is video lectures: look under Machine Learning and search only for tutorials.
There is no one-stop shop, as the concepts can go deeper and might require
special treatment. All this is theory, which can clarify concepts.
However, if you are a novice and you want a deep and intuitive feel
for the concepts, then pick one simple problem and implement the solution.


· 1 Upvote

Dan Zhang, Computer Engineering PhD student at UT Austin; interned at Microsoft and Apple

Linear Algebra, Statistics, Discrete Math, Set Theory, etc.


· 1 Upvote

I hope it quenches your thirst.



Sarvesh Dhage, studied at Ses High School and Jr College
Friends, I just came across a very interactive course on Understanding
Machine Learning. This is a completely free video course. You just need
to enroll using your ID and password. I am sharing the link with you:
Understanding Machine Learning with R


· 1 Upvote

