# How do I learn mathematics for machine learning?

- Linear algebra / Matrix Algebra (See How do I learn linear algebra? and How do I learn matrix algebra?)
- Probability Theory (See How do I learn probability?)

If you're interested in an accessible introduction to matrix algebra, Coursera is running a course on it right now: Coding the Matrix: Linear Algebra through Computer Science Applications

The **applied math** most directly useful for machine learning is:

- Statistics (See How do I learn statistics for data science? and What statistics book do you recommend to a wannabe data scientist who is familiar with basic statistics and mathematics?)
- Optimization (See How do I learn optimization?)


Going through my Machine Learning course last semester, I felt like I had the most catching up to do with Linear Algebra. I found key ideas from LinAlg harder to remember over time than Probability. I found myself mostly working with probability distributions, Bayes' rule, MLEs and MAPs, while the algebra side of it was mostly optimization in higher dimensions, i.e., matrix calculus.

I discovered that the Matrix Cookbook was popular with most students for working with matrix calculus, as it seems to have a never-ending list of matrix derivatives:

http://www2.imm.dtu.dk/pubdb/vie...
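The Cookbook's derivative tables are easy to sanity-check numerically. Here is a minimal numpy sketch (the variable names are my own) verifying one classic identity, d(xᵀAx)/dx = (A + Aᵀ)x, against a central-difference gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))   # a general (non-symmetric) matrix
x = rng.normal(size=n)

# The closed-form derivative of the quadratic form x^T A x.
analytic = (A + A.T) @ x

# Central-difference approximation of the same gradient.
eps = 1e-6
numeric = np.zeros(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    numeric[i] = ((x + e) @ A @ (x + e) - (x - e) @ A @ (x - e)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-4)
```

The same finite-difference trick is a cheap way to check any matrix derivative you pull from a table before trusting it in a larger computation.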

As far as brushing up on the rest of your Linear Algebra knowledge is concerned, I highly recommend Strang's lectures/book:

http://ocw.mit.edu/courses/mathe...

Highly relevant topics include rank and inversion and the SVD, and also make sure you're very comfortable with eigenvalues and eigenvectors, amongst other things.
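A quick way to make these topics concrete is to play with them in numpy. A small sketch (the matrix sizes are arbitrary) tying rank, the SVD, and eigendecomposition together:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 3))

# Rank is the number of nonzero singular values.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
assert np.linalg.matrix_rank(M) == np.sum(s > 1e-10)

# The SVD reconstructs the matrix exactly: M = U diag(s) V^T.
assert np.allclose(M, U @ np.diag(s) @ Vt)

# Eigenvalues of the symmetric matrix M^T M are the squared
# singular values of M (eigh returns them in ascending order).
evals, evecs = np.linalg.eigh(M.T @ M)
assert np.allclose(np.sort(evals)[::-1], s ** 2)
```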

Finally, with Analysis: I don't think ML requires a formal introduction to Analysis at all. It's important to know higher-dimensional calculus well, especially the parts related to optimization, such as Lagrange multipliers, the primal-dual form, and, in general, matrix calculus, and you should be good to go.

Overall, I think the case with Linear Algebra and Calculus is to work your way through an ML book/course and stop to look at the relevant math when necessary, whereas you need a strong foundation in Probability right from the beginning; most textbooks on ML tend to talk a lot about probability while skimming over the mathematical details of LinAlg and Calculus.

Let me first caveat what I’m about to say with this: **go to graduate school.**†

To show you just how super-serious I am about this, I’m even going to separate this caveat from the rest of the answer with one of the ultra-cool line breaks.

Alright, at this point, I’m assuming that you are still solely considering graduate school preparation without an undergraduate education. Let’s go.

My background consists of an undergraduate BS in mathematics, a minor in physics, and a few years of research experience spanning from charged particle detectors (physics/EE) to autonomous vehicle system design for collision detection and evasion. Long story short: *I’m far more qualified to answer your question when robotics is emphasized, so that’s what I’m going to do.*

**Robotics is Multi-Disciplinary**

Robotics is a highly multi-disciplinary field. In fact, I’d argue that it could well be the academic field that encompasses the largest number of distinct domains in its core structure. When we’re talking about robotics, we’re really talking about

- Computer science
- Mathematics
- Computer engineering
- Electrical engineering
- Control engineering
- Systems engineering
- Mechanical engineering
- Physics (mechanics, more specifically)

What’s even more impressive about the above list than its size is the depth of each field. Aside from control and systems engineering, which are a bit more specialized and less fundamental than the others, each of the above domains is extremely broad, indicating that if you were to break down robotics concepts into a networked graph, it would resemble something like this:

[1]

Needless to say, roboticists ultimately specialize in a much narrower range so that expertise in a topic can be attained. But that doesn’t change the fact that, in pursuing robotics, breadth and versatility in engineering and math are tools whose utility can’t be overstated.

**Specific Areas of Research**

Now, regardless of whether you want to pursue a master’s or a Ph.D., you will ultimately have to carve out a niche for yourself. As I mentioned above, mastery of all of robotics is a hopelessly daunting task; it’s impossible. Therefore, it’s important that you expose yourself to the different areas of robotics and gradually home in on your desired path according to the topics in which you’re interested and at which you’re talented.

Here’s my breakdown of robotics research, in increasing order of mathematical abstraction and decreasing order of hands-on engineering and building:

1. *Sensors.* About as applied and hands-on as you can get, the domain of sensors works on expanding the current technical constraints that robotics hardware faces. It’s because of these guys that the iPhone magically gets smaller and smaller every year while also increasing its technological capacities. An example of this domain’s importance that is even more specific to robotics is radar-evading drones. Remember when Osama bin Laden got taken out because we flew a helicopter in Afghanistan that magically evaded radar? Thanks, sensors.
2. *Nano-robotics.* Focusing on developing robotic systems at the micro level, nano-robotics explores how robotic agents can be built and implemented on a scale sufficiently small that they can be directly inserted into your body. Sound scary? It shouldn’t. Nano-robotics has a plethora of game-changing medical applications, some of which include legitimately curing cancer and preventing aging.
3. *Machine vision.* While the ability to process and interpret visual information comes very intuitively to humans, translating our abilities to an algorithmic environment in this manner has proven to be an intimidating process. In fact, I’d argue that the largest obstacle facing self-driving cars is machine vision. Just take a look at the self-driving car expert at Tesla who died because his car failed to distinguish between the bright sky and an incoming white truck. [2]
4. *Robotic learning.* When machine learning is applied in a robotic context, it basically becomes robotic learning. Robotic learning is the overlap between robotics and machine learning; it approaches the problem of developing tools for adaptation and learning in robotic systems. Very cool field, with a lot of promising applications, and very well suited for someone interested in machine learning and robotics.
5. *Robotic control.* This is the area in which I’m currently nested. Control represents a mathematical approach to modeling the behavior and evolution of a Dynamical system - Wikipedia in relation to inputs, which can be used to affect the system’s output. The goal here is to mathematically demonstrate that a certain approach to input selection guarantees that the system’s output will quickly converge to a stabilized desired range, as illustrated in this kick-a** picture. [3]

Because you have stated that robotics and machine learning are your interests, I’m going to assume your interests align with the #3–5 end of the spectrum. But even when your interests are honed in on these two areas, there is still a massive range of topics and skill sets spanned by these two very broad domains.

**Developing Skills for Robotic Learning**

Again, I’m far from an expert in robotic learning and machine learning, but I’ll do my best to offer some helpful tips for pursuing this domain. The fundamental fields from which machine learning constantly draws, as I understand it, are the following:

- Probability
- Statistics
- Algorithms
- Optimization
- Systems

The last one is a bit more of a stretch in comparison to the others, but I’ve heard that a high portion of machine learning can actually be approached from a systems perspective, and that its inception actually arose from system-theory modeling.

For **probability and statistics**, both intuition and rigorous technicality will be important. I had a horrible textbook which provided very little conceptual basis for the theorems and mostly included a bunch of isolated problems that were crudely connected in a very disjointed way. I recommend *Introduction to Probability* by Grinstead and Snell, [4] which provides a lot of clear, well-articulated conceptual explanations that enhance both intuition and precise reasoning on the subject. It’s also free and available online, which, ya’ know, is always a big plus.

Becoming comfortable with **algorithms** is a task which can more easily be achieved in a college setting, but one which is also very feasible to execute independently. For a textbook to guide you through the key concepts of algorithm theory, I recommend looking no further than the classic *Introduction to Algorithms* by Cormen, Leiserson, Rivest and Stein. [5]

Additionally, I would look to two further sources to continually expand algorithmic skills: *Project Euler* (Archived Problems - Project Euler) and Topcoder (Deliver Faster through Crowdsourcing). Project Euler encompasses a diverse range of mathematical problems for algorithmic development which will strengthen your mathematical algorithmic thinking and your out-of-the-box creativity. Topcoder provides challenges which will improve your technical programming skills, and diversify and expand your problem-solving breadth.

Of course, once you have a solid background in the above topics, you’ll want to receive a comprehensive introduction to **robot learning**, for which I’ve been told that *Robot Learning *by Connell and Mahadevan is a solid choice. [6]

Although robotic learning and robotic control are distinct domains, robotic learning is intrinsically tied to concepts from control theory. In fact, one of the most challenging problems facing the robotic learning community is that it lacks the rigorous analysis and descriptions that control and systems theory possess.

For example, a self-driving car that implements a series of clever robotic learning algorithms will never be deployed without tools from control systems. Why? Because without tools from control and systems theory, you will never get close to demonstrating rigorous, mathematically demanding qualities such as robustness, safety guarantees, stability, etc., without which the government wouldn’t let your self-driving car see the light of day.

**Robotic Control**

I think that optimization, control, and systems are all presented and integrated very concisely in *Design of Optimal Control Systems* by Bini. [7] This book covers more than the minimal amount of knowledge in these topics that machine learning requires. But a deep understanding of at least some of the ideas shown in this book will allow for insights to be drawn between these domains which most others will likely not be capable of seeing.

Note that I recommend the above for someone interested in both machine learning and robotic control. If you’re primarily interested in robotic control, then your mathematical skills need to be more sophisticated than those of the vast majority of other engineers. This is likely the only engineering discipline in which highly abstract mathematical fields play a fundamental role. They include

- Real analysis
- Systems of Differential Equations
- Dynamical Systems (similar to systems of differential equations, but distinct from them)
- Advanced Linear Algebra
- Advanced Optimization
- Basic Topology
- Set Theory (more than the basics, but not quite “advanced” set theory)

Clearly, your mathematical skills have to extend beyond the more applied end of the spectrum, in which things like formalities, proofs, theorems, and rigor are almost never relevant.

For a comprehensive introduction to **real analysis and topology** that isn’t esoteric (difficult to find), I recommend *Basic Analysis* by Lebl. [8] While the book isn’t intended for studying topology specifically, it covers nearly all of the fundamentals which are relevant to control. Note that real analysis is the most important item in the above list.

**Advanced linear algebra** is the most difficult field for which to find an accessible, engaging textbook, IMO. The majority of the texts are far too focused on minute, irrelevant details and burdensome proofs whose understanding yields little insight into the deeper concepts. More importantly, most textbooks totally fail to connect the ideas to deeper concepts which are both cool and incredibly useful. After a lot of searching, I found hope in an unexpected place: online lecture notes. [9] If you master these notes, and their difficult problems, to the point where you can comfortably walk through the main concepts with a high school student, then you’ll be five steps ahead of me.

As for **dynamical systems**, I’d say that *Dynamical Systems* by Sternberg does the trick. [10] Until you get to the more theoretical content like *stability and invariance*, you really want to focus more on the concepts; the details aren’t particularly important, surprisingly. You really just need to know what kind of assumptions you have to make about the system you’re modeling.

*Once you’re comfortable with most of the above*, you can get your hands dirty with some actual **control theory**. For this, I recommend *Mathematical Control Theory* by Sontag. [11]

†: I have a hunch that’s not what you want to hear, since you didn’t ask for advice regarding this matter. So I’m sorry if this caveat irks you in any way, but it’s the best advice I can give, and I think it’s important for you to hear.

I’m a firm believer in pragmatic optimism, and while it’s optimistic to believe that admittance into graduate school, especially in a technical field, is feasible without an undergraduate degree, it is far from pragmatic. Without an undergraduate degree, you are immediately excluded from consideration by all departments at the majority of universities.

I can’t find any specific statistics on this matter, so you’ll have to choose whether or not to take my word for it. But trust me when I say that I can currently think of only one graduate school that doesn’t list an undergraduate degree as a strict requirement.

Even putting the strict requirements aside, for deeply multidisciplinary fields like robotics and machine learning, an undergraduate education is crucial. Although I think that the ability to interact with professors, learn with faculty and peers in person, and follow a curriculum designed by experts, on which you are tested in a competitive environment, are all vital assets for initiating the engineering experience in any field, this is especially true for robotics.

Another important distinction regarding your question: are you planning for a master’s or a Ph.D.?

[1] Pawel Pralat: Graph Theory

[2] Tesla driver killed while using autopilot was watching Harry Potter, witness says

[3] Vehicle stability control systems: An overview of the integrated ...

[4] https://www.dartmouth.edu/~chanc...

[6] Robot Learning | J. H. Connell | Springer

[7] http://retis.sssup.it/~bini/math...

[8] http://www.jirka.org/ra/realanal...

[9] https://www.math.uh.edu/~climenh...

[10] Dynamical Systems (Dover Books on Mathematics): Shlomo Sternberg: 9780486477053: Amazon.com: Books

[11] http://www.mit.edu/~esontag/FTPD...


- Convex Optimization (Convex Optimization - Boyd and Vandenberghe)
- Linear algebra
- Some rudimentary Calculus (especially use of the Lagrangian)
- Lots of Probability and Statistics

http://courses.washington.edu/cs...

A couple of years ago, based on his experience, Bradford Cross gave a comprehensive list of the best resources on machine learning and the prerequisites in his blog ("Measuring measures"). Unfortunately, it appears to be down right now.

**UPD**: Here is the blog post at the Web Archive's mirror: http://web.archive.org/web/20101...

Bradford's lists at Amazon:

- Analysis [1]
- Linear Algebra [2]
- Probability [3]
- Statistics [4, 5]
- Optimization [6]
- Machine learning [7]
- Feature Selection [8]

I hope Mr. Cross will be able to join the discussion.

[1] http://www.amazon.com/Analysis/l...

[2] http://www.amazon.com/Matrix-Fu/...

[3] http://www.amazon.com/Probabilit...

[4] http://www.amazon.com/Statistics...

[5] http://www.amazon.com/Nonparamet...

[6] http://www.amazon.com/Heuristic-...

[7] http://www.amazon.com/Machine-Le...

[8] http://www.amazon.com/Feature-Se...

**UPD 2**: Here is a list of must-read books for theoretical machine learning [1], which is attributed to Prof. Michael Jordan (UC Berkeley). The sources are [2] and [3].

[1] https://www.goodreads.com/review...

[2] Learning About Statistical Learning

[3] AMA: Michael I Jordan • /r/MachineLearning

"**Mathematics for Computer Science**" is also a course at MIT. This is the MIT OCW link for that course: http://ocw.mit.edu/courses/elect....

The course materials are a bit old, by the way. The good news is that you can easily find the book (a compilation of all the materials) by searching. If I am not wrong, the last revised version of the book is dated **6 May 2012**.

You need linear algebra as well. For this, I recommend Gilbert Strang's **"Linear Algebra and Its Applications"**. It may be a little bit tough, but it is a great book.

If you want to dive into the probabilistic approach, you can enroll in the Probabilistic Graphical Models course: https://www.coursera.org/course/pgm. I heard that it is a very good course. The textbook for that course also looks very useful: http://www.amazon.com/Probabilis...

The current machine learning (ML) algorithms are based upon mapping functions

$F : X \to Y$

The function $F$ can be anything such as a support vector machine (SVM), a restricted Boltzmann machine (RBM), a deep neural network (DNN) or anything else that you can hand-engineer yourself. In application areas, $X$ represents the input space while $Y$ represents the output space.

In speech recognition $X$ might be a set of spectrograms while $Y$ is a set of identities representing the speakers. In image recognition, $X$ is the raw image pixel space while $Y$ is the categorization consisting of the different classes into which $x_i \in X$ can fall.

Each ML model has parameters $w$ that affect the behavior of $F$ and that we can normally adjust in order to change the behavior of that function. We can thus write the mapping more conveniently as:

$\hat{y}_i = f(x_i, w)$

where $\hat{y}_i \in Y$.

We will focus on supervised ML, where we have a dataset $T$ of training input-output pairs in the form:

$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$

The goal of supervised machine learning is to find the best parameter values $\hat{w}$ that make the function $F$ map the input-output pairs with the least error. So in supervised ML we have three main issues:

- Define a fitness measure that tells us how well the ML model is performing on the training set $T$.
- Generalization: We can run the same fitness measure on the test set after training is complete in order to measure how well the model generalizes to novel inputs. This is a very important concept in modern ML.
- A learning algorithm to update the weights, $w \to \hat{w}$.

This is where the maths comes in: to understand the underlying maths concepts you need to understand what ML is trying to solve in the first place. The aim here is to find solutions to those 3 issues mentioned above, and maths can help us with that.

**1: A fitness measure**:

This is normally done by an objective function, also known as the loss/cost function:

$L(\hat{y}_i, y_i)$

where $\hat{y}_i$ = actual output and $y_i$ = desired output.

In empirical risk minimization[1] (ERM) the goal is to minimize the overall loss as defined by the risk $R$:

$R_{emp}(w) = \frac{1}{N}\sum_{i=1}^{N} L(f(x_i, w), y_i)$

ERM states that the learning algorithm should choose the hypothesis function $\hat{f}$ such that the empirical risk is minimized. In simple mathematical terms we need to solve:

$\hat{w} = \arg\min_w R_{emp}(w)$

where $\hat{f} = f(x, \hat{w})$.

**2: Generalization**:

The above naive ERM can result in the function $\hat{f}$ just memorizing the training examples, which can cause what is called overfitting, that is, fitting the function $F$ to each and every noisy/outlier data point. That is not ideal, so instead we normally use structural risk minimization[2] (SRM), whereby we add a regularization term $C(w)$ to the risk, giving the regularized risk:

$R_{stru}(w) = \frac{1}{N}\sum_{i=1}^{N} L(f(x_i, w), y_i) + \lambda C(w)$

$R_{stru}(w) = R_{emp}(w) + \lambda C(w)$

Then in SRM we need to solve:

$\hat{w} = \arg\min_w R_{stru}(w)$

Regularization simply simplifies the weight parameters so that they don't model too much of the outliers or noise. That is done by penalizing large weight values in $w$, which are a cause of most overfitting issues. Thus the L0 norm can be used in order to favor a very sparse set of weights whereby most weight values are zero. You can also use L1 or L2 regularization instead, as the L0 norm is hard to optimize. Other weird regularization methods have since popped up, such as dropout, which is used in learning algorithms for DNNs whereby neurons are randomly dropped out and brought back during training so that the overall network becomes robust to noise; dropout can be loosely seen as an ensemble method.
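To make the regularization idea concrete, here is a minimal sketch, assuming a squared loss with an L2 penalty $\lambda \|w\|^2$ (ridge regression, which has a closed-form solution), showing that a larger $\lambda$ shrinks the weights:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))
w_true = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

def ridge(X, y, lam):
    """Minimize (1/N)||Xw - y||^2 + lam * ||w||^2 in closed form:
    setting the gradient to zero gives (X^T X + N*lam*I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * len(y) * np.eye(n_features), X.T @ y)

# A larger lambda penalizes large weights harder, shrinking the solution.
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.0, 1.0, 100.0)]
assert norms[0] > norms[1] > norms[2]
```

With $\lambda = 0$ this is plain ERM; as $\lambda$ grows, the regularized risk trades training fit for smaller, smoother weights.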

**3: A learning algorithm:**

Learning in current ML can be viewed as a way to update the weights in order to find the optimal parameters. ERM and SRM both rely on the existence of a learning algorithm for weight adjustment. We need an algorithm to find the weights that solve

$\hat{w} = \arg\min_w R_{emp}(w)$

or

$\hat{w} = \arg\min_w R_{stru}(w)$

That is, we need a way to update the model such that $w \to \hat{w}$.

In current ML systems we just look to the old idea of gradient descent (GD) from numerical optimization. In GD we simply move down the steepest slope on the error (risk) surface defined by the risk $R$. That means we can just use the update rule defined by

$w_{t+1} = w_t - \alpha \frac{\partial R}{\partial w_t}$

where $t$ = step count and $\alpha$ = learning rate.
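As a minimal illustration of the update rule, here is a sketch applying it to a toy convex risk, a mean squared error over made-up linear data (all names, sizes and the learning rate are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                      # noiseless targets for the toy risk

w = np.zeros(3)
alpha = 0.1                         # learning rate
for t in range(500):
    # dR/dw for R(w) = mean((X w - y)^2)
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w = w - alpha * grad            # the GD update rule

assert np.allclose(w, w_true, atol=1e-3)
```

On a convex quadratic like this, the iterates converge to the minimizer; the non-convex DNN case discussed next is where the caveats start.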

Here we assume a convex surface defined by $R$, but in practice, especially for DNNs, the surface is highly non-convex. In practice, though, almost any local minimum is good enough, plus we can add momentum to the update rule so that it can escape from local minimum traps easily. Also, the sheer number of parameters makes it harder for the DNN model to get trapped in a local minimum, as there are many possible escape routes through the many other dimensions.

In DNNs the gradient computations can become cumbersome even for a modern machine, as the number of gradient steps needed to hit $\hat{w}$ is normally large. Thus we need fast ways to accelerate gradient computations for layered architectures. The backpropagation (backprop) algorithm, to be specific, is a way of computing gradients extremely efficiently in any differentiable computational graph. Backprop uses the chain rule, starting from the output layer, which is directly connected to the loss function and hence easier to evaluate derivatives for, and then moving towards the layers far from the output layer (toward the input layer) while chaining the derivatives. It is called backprop because errors are passed from the back layers towards the front layers, thereby saving a lot of repeat computations.
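Here is a small hand-rolled sketch of this chaining for a two-layer net with a squared loss, checked against a finite-difference gradient; the layer sizes and data are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=3)          # single input
y = 1.0                         # target
W1 = rng.normal(size=(4, 3))    # hidden-layer weights
w2 = rng.normal(size=4)         # output-layer weights

def forward(W1):
    h = np.tanh(W1 @ x)         # hidden activations
    yhat = w2 @ h               # scalar output
    return h, yhat, (yhat - y) ** 2

# Backward pass: chain rule from the loss back to W1.
h, yhat, loss = forward(W1)
dloss_dyhat = 2 * (yhat - y)
dloss_dh = dloss_dyhat * w2             # through the output layer
dloss_dpre = dloss_dh * (1 - h ** 2)    # through tanh (d tanh = 1 - tanh^2)
dloss_dW1 = np.outer(dloss_dpre, x)     # through the matrix multiply

# Finite-difference check of one entry of the gradient.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W1m = W1.copy(); W1m[0, 0] -= eps
numeric = (forward(W1p)[2] - forward(W1m)[2]) / (2 * eps)
assert abs(numeric - dloss_dW1[0, 0]) < 1e-5
```

Note the "reuse": `dloss_dh` is computed once and then shared by every entry of `dloss_dW1`, which is exactly the repeat computation backprop saves in deeper graphs.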

GD requires that all training pairs are considered before taking a single small update step; this is not scalable. Thus in practice we have so-called stochastic gradient descent (SGD), which takes a step after just one example; this is so efficient that it is normally the standard learning algorithm for DNNs, together with backprop. There are batch variants of SGD, which you can consider as being in between SGD and GD: the minibatch gradient descent approach uses a small random set of training examples, known as the batch, to approximate the gradient field via the backprop algorithm. Thus SGD can be seen as the batch variant with just 1 example in the batch.
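A minibatch version of the same idea might look like this (the batch size, learning rate, and data are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=1000)

w = np.zeros(3)
alpha, batch_size = 0.05, 32
for step in range(2000):
    # Estimate the gradient from a small random batch, not the full set.
    idx = rng.integers(0, len(y), size=batch_size)
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size
    w -= alpha * grad

assert np.allclose(w, w_true, atol=0.05)
```

Setting `batch_size = 1` recovers plain SGD; setting it to `len(y)` recovers full-batch GD, which is the spectrum the paragraph above describes.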

So to learn the maths theory behind ML, start from the underlying goals of ML, which we have looked at in this discussion. Of course this was just the tip of the iceberg, but the best way to see most ML models is that they are function approximators, and we wish to recover those approximations from input-output training pairs alone, which we call end-to-end learning.

It also helps to visualize ML as just optimization theory. We have a loss function, and all we need is an algorithm that helps us find the right settings such that the loss is minimized. In practice SGD+backprop works very well for training modern ML models.

You also need to try to implement some of these algorithms yourself from scratch. Try to implement backprop and SGD for a multi-layer neural network (NN), not a deep one for now, then try it on the MNIST dataset. You can only learn via practice; make sure before implementation you go through backprop and derive it for multi-layer NNs and convolutional neural networks (convNets).

Don't be in too much of a hurry though; concepts take time to make sense. In order to help yourself assimilate the stuff a bit more easily, solve some problems and try to also explain the systems to others via platforms like Quora; that way you will start to have more and more confidence in your understanding of the maths behind ML algorithms.

Hope this helps.

Footnotes

[1] Empirical risk minimization - Wikipedia

[2] Structural risk minimization - Wikipedia


Some people say that mathematics is useless for a software engineer; machine learning proves them wrong.

Mathematics is the prerequisite for machine learning, because machine learning is math. The computer is only useful for doing the computation.

You'll mainly need to learn calculus, matrix calculation, linear and non-linear algebra, statistics and graph theory.

Let's take a basic ML algorithm, linear regression.

The goal is to use some data to find a function which takes parameters and gives an output. The data are used to find the function and test it. In the future, we will use the function with some parameters and obtain an approximate output.

Let's say our data are about planes: as input we have the number of miles travelled by the plane and its age. As output we have the price of the plane. I don't normalize the data, to keep things simple.

A sample of our data could be :

miles;age;price

120000;12;120000

48000;4;1500000

...

Our question is: given the miles travelled by a plane and its age, give a price.

Using linear regression (gradient descent) we will find a vector theta. This vector has two values, theta[0] and theta[1]. To find an approximate price we will multiply the miles by theta[0] and the age by theta[1] to obtain a result, which is an approximate price.

For instance, our algorithm could find theta = [2; -10 000], and if we have a plane 5 years old with 78 000 miles, we can then approximate the price by doing 78 000 * 2 + 5 * -10 000, so 106 000 dollars.

The hard part is to find good values for theta. To do that you need some maths.

You have a cost function that tells you how good your theta is; this cost function tests your theta values using your data (which already give a price for a plane based on the miles and the age).

The cost function to minimize is the standard least-squares one:

$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

where $h_\theta(x) = \theta_0 \cdot miles + \theta_1 \cdot age$ is the predicted price and $m$ is the number of training examples.

Using the batch gradient descent algorithm, each iteration adjusts the theta values using this formula:

$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$

You then test the theta values with the previous function J(theta), and you'll see that the cost (i.e., the difference between the predicted value and the real one) decreases at each iteration.

As you can see, this simple ML algorithm is math. The computer is just there to compute the previous formulas.
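The whole recipe can be sketched in a few lines of numpy. The data rows beyond the two in the sample above are invented, and the features are scaled first, since unscaled mile counts would make gradient descent diverge at any reasonable learning rate:

```python
import numpy as np

# Made-up plane data: columns are miles and age; targets are prices.
X_raw = np.array([[120000.0, 12.0],
                  [48000.0, 4.0],
                  [90000.0, 9.0],
                  [30000.0, 2.0]])
prices = np.array([120000.0, 1500000.0, 400000.0, 1800000.0])

# Feature scaling keeps the gradient steps well conditioned.
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

def cost(theta):
    """Half mean squared error between predicted and actual prices."""
    r = X @ theta - prices
    return (r @ r) / (2 * len(prices))

theta = np.zeros(2)   # theta[0] multiplies miles, theta[1] the age
alpha = 0.1
history = [cost(theta)]
for _ in range(200):
    theta -= alpha * X.T @ (X @ theta - prices) / len(prices)
    history.append(cost(theta))

# The cost never increases between iterations, as described above.
assert all(b <= a for a, b in zip(history, history[1:]))
```

Without an intercept term the two-value theta cannot fit perfectly, but the monotonically decreasing cost is exactly the behaviour the answer describes.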

Mathematics is too vast a subject to be considered whole for this question. The breadth and depth of mathematical awareness you require for machine learning depends entirely on what you are learning in the subject. Keeping this in mind, let's deal with what you need to know in "mathematics" for machine learning.

1. *Probability and mathematical statistics*

This is a fundamental requirement for machine learning, so you need to know it well. When I say probability, it's more than what you studied in high school and almost everything you probably did not pay attention to during your undergrad. You need to know about random variables, their distributions, probabilistic convergence, and estimation theory. That covers a major part of what you need to know here.

Two of my favourite resources are:

1. Joseph Blitzstein - Harvard Stat 110 lectures

2. Larry Wasserman's book - All of statistics

2. *Linear algebra*

Linear algebra will pop up every now and then in ML. PCA, SVD, LU decomposition, QR decomposition, symmetric matrices, orthogonalization, projections, and matrix operations are needed many a time. The good thing is that there are countless resources available on linear algebra.

My all time favourite is Gilbert Strang's MIT lectures on linear algebra.

3. *Optimisation*

Though only a few things from optimisation are needed most of the time, strong foundational knowledge will go a long way. You need to know Lagrange multipliers, gradient descent, and the primal-dual formulation. The best resource on this is Boyd and Vandenberghe's course on Convex Optimisation from Stanford.

4. *Calculus*

I wanted to put this at the top, but I'm putting it last just to emphasise the fact that only fundamental knowledge is needed in terms of calculus. Know about 3-D geometry, integration, and differentiation and you'll survive. It's the easiest to start with amongst the topics I've mentioned here. MIT has good lectures on calculus.

I think with these 4 tools you'll most likely find ML easy to understand. Other than these you may find real analysis and functional analysis relevant too, but they are just formal generalisations of the topics mentioned before.

From a beginner.

An introductory Linear Algebra course will generally include the following:

- Vectors
- Vector Spaces
- Matrices
- Inner Product Spaces
- Orthogonality
- Projection
- Linear transformations
- Eigenvectors, eigenvalues
- Change of bases
- Various decompositions: LU, Polar, SVD

I also had some geometric algebra, but haven't found that useful so far.

Probability and statistics:

- probabilities
- combinations
- permutations
- distributions
- Understanding of hypothesis testing
- Descriptive statistics: Means, modes, standard deviations, variances etc.

If you can get through:

https://www.khanacademy.org/math...

And

https://www.khanacademy.org/math...

You are good to go.

which includes a lot of the math for machine learning. There's also a draft textbook there which is well worth grabbing a copy of.

The machine learning field needs the following mathematics background to understand things more deeply:

- Calculus, and in my view the following reference is very good,
- Linear algebra and matrix calculation, and the following reference is very relevant,
- Statistics and probability, and the following books are very good:
  - All of Statistics: A Concise Course in Statistical Inference (Springer Texts in Statistics) by Larry Wasserman
  - A First Course in Probability (9th Edition) by Sheldon Ross
- Knowledge of optimization, and the following textbook is very good,

If you are truly looking for a one-stop reference, the best that I can suggest is Chris Bishop's Pattern Recognition and Machine Learning (http://www.amazon.com/Pattern-Re...). Although it is quite difficult to start with, it will cover the majority of your interests until you are well versed enough in the subject to be able to read publications and more specific texts.

When in doubt, MIT OpenCourseWare is always a good source -- I believe they even offer one or two machine learning courses at the graduate level.

Good general reference/tutorial texts:

- Information Theory, Inference and Learning Algorithms -- MacKay
- Introduction to Probability Models -- Ross
- AI: A Modern Approach -- Russell & Norvig
- Algorithm Design -- Kleinberg & Tardos

Christopher Bishop - Pattern Recognition & Machine Learning. The first time I picked this book up it was pretty daunting, but once you get a bit of the maths under your belt, it presents clearer explanations than other texts. I found it really clearly laid out, it progresses well, and it covers a lot of material.

Linear Algebra:

Gilbert Strang's videos on linear algebra are excellent, and so are the Khan Academy ones. The Gilbert Strang book doesn't seem to get particularly great reviews. On the basis of reviews, I picked up a copy of Howard Anton's Elementary Linear Algebra, which seems to be very highly regarded. I would recommend it. I also have David Poole's A Modern Introduction, which feels a bit more modern than Anton, and I have tended to use it more. It doesn't seem to be a particularly well-known book online, but I find it very clear (more so than Anton).

If you want to practise, there's Schaum's Outline of Theory and Problems of Linear Algebra (good for practising, but insufficient as a standalone text on the subject).

If you have the luxury of some time before starting on Machine Learning, I would suggest really focusing on linear algebra in a very hands-on way (working through structured examples) and getting a good understanding of orthogonality, vector spaces, eigenvectors, and transformations. From my experience, trying to learn the maths at the same time as learning Machine Learning was overwhelming, and I would have got a lot more out of the ML lectures if I had already grasped the maths.

You'll want to know calculus up to vector calculus, a first course in linear algebra, and a good course in calculus-based statistics that actually explains what the concepts mean (as opposed to "if you're trying to do this, you should press the chi-squared-test button" like you see in a lot of classes). A discrete math course would be nice just for background on notation, although you don't actually need to know any nontrivial discrete math.

Mathematics for ML is no different from what you learn in high school or in undergraduate studies. If you have that mathematics base, most of the time it is sufficient to understand what's going on in those creepy equations you see in books and research papers. However, sometimes more than that is required, and you may have to take some advanced courses in statistics, calculus, linear algebra etc. You may also like to read How do I learn machine learning?


Please see How do I learn mathematics for machine learning?, which has some good answers. I believe the Witten et al. book is one of the most accessible introductions. A basic book on statistics and probability and another on Linear Algebra (for example Strang, 4th edition [1]) will take you most of the way there.

[1] http://math.mit.edu/linearalgebra/


Teach yourself Machine Learning the hard way! and the follow-up Teach yourself Machine Learning the hard way! (Part 2)

They list many prerequisites that you need to understand, plus some advanced material in Part 2.

I hope this helps.

With regards to mathematics for machine learning, I reckon all of the following skills are important:

(1) Some Basic Mathematical Skills (Linear Algebra, Probability, Optimization)

(2) Knowing how those mathematical skills are exploited for machine learning algorithms

(3) Developing a way to understand mathematics, so that any advanced maths for modern machine learning can be well comprehended.

While one would generally recommend all sorts of linear algebra and probability books for machine learning, I feel those are not always worth the time, at least for machine learning. I would recommend the following texts to read through (perhaps in order), which should cater to the three points mentioned above.

(a) Pattern Recognition and Machine Learning by Christopher Bishop (Will cater to 1 and 2 above)

(b) Deep Learning book by Goodfellow, Bengio and Courville (Will again strengthen 1 and build on 2)

(c) Understanding Machine Learning by Shai Shalev-Shwartz and Shai Ben-David (Will advance your skills in 1, strengthen 2, and give an insight into 3)

(d) Ankur Moitra’s rather short but useful book Algorithmic Aspects of Machine Learning (Will mainly cater to 3)

(e) Optimization for Machine Learning by Sra, Nowozin and Wright & Off the convex path by Sanjeev Arora and collaborators (Will cater to 3 and advance 1 and 2)

I truly believe that if one can properly understand the above material, one will develop all the maths basics needed for machine learning, and in a very connected form! Hope this helps!

I won't say that you “learn” math. I would rather say that you **train** math.

Imagine you want to train boxing and your coach is teaching you direct punches, low kicks and high kicks. No matter how many times he shows you how to kick, you can't do it perfectly. You do know that it takes **patience, hard work and effort** to finally learn how to punch, and you need to keep trying and training. After so many tries you can finally say that you can actually punch.

**What's the point?**

Math is the same. Consider direct punches your *formulas*, *low kicks* your theories and *high kicks* your solutions to problems. No matter how many formulas or theories you know, no matter how many times you've seen solutions, you just can't do it perfectly. *Why?* Because you need to train those formulas, train those theories and knock out those problems with a damn good **high kick**. And how do you do that?

- Do as many problems as you can on a daily basis. It is not going to happen overnight, it takes time to train those kicks.

Wanna learn it fast? Better start now!

Brush up on your statistics and probability. This is definitely critical, particularly for supervised learning methods.

Some methods also require a good deal of linear algebra knowledge, especially when discussing SVM, PCA and friends.

Since you are planning to take a Ph.D. and move the science further, you might want to narrow your focus to a particular area for your research while working with your candidate adviser.

This is not an exhaustive list of topics. Best read in this order:

- Linear Algebra
- Vector Calculus
- Statistics and information theory
- Discrete Math
- Convex Optimization
- Probabilistic Graphical models

I believe there is a book, http://www.amazon.com/All-Mathem..., which can help you get a good head start.

I will try to keep this as concise as possible.

*Edit: Somebody merged the original question to this question, so the premise becomes irrelevant.*

To become a full-stack AI/ML engineer, it is imperative that you have a **complete grasp of the mathematical foundations** of ML so that you can build upon concepts easily. The basic mathematical skills required are Linear Algebra, Matrix Algebra, Probability and some basic Calculus.

**Linear Algebra**

The best source to study Linear Algebra is **Prof. Gilbert Strang’s Linear Algebra book/course**: Video Lectures | Linear Algebra | Mathematics | MIT OpenCourseWare (MIT OCW). There are 34 lectures and, believe me, they are completely worth it: after completing them, linear algebra should not pose any more problems for you. Solve some exercises/exams if you want to achieve mastery (recommended).

**Matrix Algebra**

Matrix algebra is an essential component of deep learning. I personally recommend the **Matrix Cookbook by Kaare Brandt Petersen & Michael Syskind Pedersen**: http://www2.imm.dtu.dk/pubdb/vie... (PDF). There are 66 pages of pure matrix operations, and this is the absolute “go-to” in case you are stuck trying to understand certain matrix manipulations that a researcher might have done.
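As a taste of what the Cookbook contains, one of its most-used identities is the gradient of a quadratic form, d/dx (xᵀAx) = (A + Aᵀ)x. A quick NumPy sketch (my own illustration, not from the Cookbook itself) checks the identity against a finite-difference gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)

def f(v):
    return v @ A @ v  # the scalar quadratic form v^T A v

# Matrix Cookbook identity: d/dx (x^T A x) = (A + A^T) x
grad_analytic = (A + A.T) @ x

# Central finite differences as an independent check
eps = 1e-6
grad_numeric = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
```

Checking identities numerically like this is a good habit whenever a derivative from the Cookbook looks surprising.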

**Probability & Statistics**

Understanding probability is a very important aspect of understanding ML. Some of the key probability concepts that you must be aware of include Bayes’ Theorem, distributions, MLE, regression, inference and so on. The best resource for this is **Think Stats (Exploratory Data Analysis in Python) by Allen Downey**: http://greenteapress.com/thinkst... (PDF). This absolute gem of a book is 264 pages long and covers all the aspects of probability and statistics that you need to understand, with relevant Python code.
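To make MLE concrete, here is a minimal sketch (my own toy example, in the spirit of the book's Python-first approach) showing that for Bernoulli data the likelihood is maximised at the sample frequency:

```python
import math

# Ten coin flips, 1 = heads
flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
k, n = sum(flips), len(flips)
p_hat = k / n  # the MLE for a Bernoulli parameter is the sample frequency

def log_likelihood(p):
    # log of p^k * (1 - p)^(n - k)
    return k * math.log(p) + (n - k) * math.log(1 - p)

# Brute-force search over a grid agrees with the closed-form MLE
candidates = [i / 100 for i in range(1, 100)]
best = max(candidates, key=log_likelihood)
```

The same maximise-the-likelihood pattern, with fancier models, is what MAP estimation and much of statistical ML build on.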

**Optimization**

The go-to book for Convex Optimization is **Convex Optimization by Stephen Boyd and Lieven Vandenberghe**: https://web.stanford.edu/~boyd/c... (PDF). This is a 730-page book, and you need not read it all in one go. Choose the concept you need depending on your requirements and interest, and read that part. It is complete and extremely well written. This book is free as part of the CVX 101 MOOC on EdX.

This 263-page book on metaheuristics, **Essentials of Metaheuristics by Sean Luke** (http://cs.gmu.edu/~sean/book/met... (PDF)), talks about gradient-based optimization, policy optimization etc., and it is well written. One can choose to go through this as well if interested.

Data science concepts are covered in the above topics. Other topics can be learnt by googling for sources as and when you encounter them, but a complete understanding of the above should suffice for 95% of all scenarios.

Achieving mastery of the above topics will surely make you a mathematically strong AI/ML engineer. Now that you have built the foundation, start dipping your feet into **research papers**. They are absolutely essential, as they clearly show the standards of AI researchers/engineers. First, find the famous papers of AI, like RNN, LSTM, SVM etc., and go through the technical content.

**Can you understand the jargon?**

**Can you understand the mathematics?**

**Can you implement the mathematics in code without the help of overly sufficient libraries?**

These are the key questions to be answered. Once you can answer “Yes/Mostly Yes” to these 3 questions, you are good to go.

After trying to read these papers dealing with the most popular concepts, try to read the not-so-famous papers. **arXiv** is a great site, with hundreds of preprints being published every day by top researchers; reading the papers from here is like drinking straight out of the fire-hose. Try to choose a paper that looks fairly well written and whose abstract seems interesting. Then read that paper and try to answer those 3 questions again. The same can be done with papers from top AI conferences like NIPS, AAAI, AAMAS, IJCAI, ICML etc. You may not be able to fully implement the papers due to data constraints and other issues, but if you are able to understand even 60% of the mathematical reasoning, then I can safely say you have completed your *training*.

**Do not concentrate on learning more and more “packages”**. Concentrate on the concept. While implementing, you will automatically see that you require “this” package, and then you will automatically learn to use it. Learning the various commands of random packages won’t help. If you start implementing and writing code to solve problems or reproduce results from a paper, you will automatically learn about packages and use them appropriately; they’ll be the least of your concerns. **This is the correct way to maintain “balance” between math and coding.** You can also participate in competitions (e.g. Kaggle or conference competitions) to improve speed, development and processing skills if you feel the need to do so.

Alternatively, you can **choose to pursue a doctoral degree** (like me :P ) in AI/ML to gain a complete in-depth understanding of everything discussed here and more.

*(All the links in this answer are working as of 6th July 2017)*

- Analysis http://www.amazon.com/Introducti...
- Algebra http://www.amazon.com/Introducti...
- Probability http://www.amazon.com/All-Statis...

They will make your later reading much more pleasant. You will be able to devise your own proofs.

Terence Tao has posted much math-learning advice on his blog:

- Solving mathematical problems: http://terrytao.wordpress.com/ca...
- There’s more to mathematics than grades and exams and methods: http://terrytao.wordpress.com/ca...
- There’s more to mathematics than rigour and proofs: http://terrytao.wordpress.com/ca...

I started writing a GitHub awesome page for this; it may help. It covers topics from basic machine learning maths to advanced and quantum machine learning:

krishnakumarsekar/awesome-machine-learning-deep-learning-mathematics

Thanks and Regards

Krishna

krishnakumarsekar/awesome-quantum-machine-learning

A2A.

To have a basic mathematical background, you need to have some knowledge of the following mathematical concepts:

- Probability and statistics

- Linear algebra

- Optimization

- Multivariable calculus

- Functional analysis (not essential)

- First-order logic (not essential)

You can find some reasonable material on most of these by searching for "<topic> lecture notes" on Google. Usually, you'll find good lecture notes compiled by some professor teaching that course. The first few results should give you a good set to choose from.

For instance, here’s a list of some lecture notes that I just found:

Probability & Statistics : http://www2.aueb.gr/users/demos/...

Linear algebra : https://www.math.ku.edu/~lerner/...

Optimization : http://www.ifp.illinois.edu/~ang...

Calculus: https://www.math.wisc.edu/~angen...

Matrix Calculus : http://www.atmos.washington.edu/...

You should skim through these without going into too much detail. You can come back to studying the topics as and when required while learning ML.


If you want to be a real Data Scientist, not one of the fake ones with analyst skills and no mathematical intuition or point of view, you need a very strong mathematical grounding.

To learn mathematics for ML, this should be the order:

- Start with probability (Conditional, Basic, Marginal etc.)
- Mathematical Series and Convergence, Numerical Methods for Analysis
- Matrix and Linear Algebra
- Bayesian Statistics
- Vectors (Most Important)
- Calculus
- Markov Processes and Chains
- Basics of Optimization (Linear/Quadratic)
- Advanced Matrix Algebra and Calculus (Gradient, Divergence, Curl etc.)

This much mathematics will enable you to understand the core ideas behind ML and probabilistic algorithms.

You should pause now and start implementing certain packages from scratch in Python:

1. K-NN is a great starting point: learn it and code it from scratch.

2. Logistic Regression with Gradient Descent.
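As a sketch of the first exercise (the data and names here are my own illustration, not from any particular course), a from-scratch K-NN classifier fits in a dozen lines of plain Python:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    `train` is a list of (point, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Two well-separated clusters
train = [((0, 0), "red"), ((0, 1), "red"), ((1, 0), "red"),
         ((5, 5), "blue"), ((5, 6), "blue"), ((6, 5), "blue")]
```

Once this works, swapping in gradient descent for logistic regression is a natural second step, since it forces you to write down and differentiate a loss by hand.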

By now you can see the parameters and numbers moving in matrix form and understand the mathematics of prediction. And if you feel this is enough, hold your breath: there is more exciting stuff to come. This will make you a beginner on the way to being a “Real Data Scientist”.

Next, start with:

- Stochastic Models and Time Series Analysis
- Differential Equations
- Dynamic Programming and Optimization Techniques
- Fourier and Wavelet Analysis
- Random Fields
- Basic Knowledge of PDEs
- Techniques to solve PDEs using Monte Carlo and Polynomial Expansions

These mathematical techniques will help you visualize a model’s working, and show you how to model and process raw data to create unique models whose functionality can be tuned. Parameters can be optimized for the problem at hand and fine-tuned with these techniques.

For the next level up (statistics in higher dimensions):

- Numerical solution of PDEs with numerical/random input (a fascinating subject to work on)
- Stochastic Differential Equations and their solutions
- PCA etc.
- Dirichlet Processes, Markov Decision Processes
- Uncertainty Quantification: Polynomial Chaos, projections on vector spaces

I think these are subjects one must learn to be a good Machine Learning engineer in the 21st century. With a knowledge base like this, one can connect the dots very rapidly and build systems and models of high accuracy.

(I am not a big fan of neural nets, so I left them out here.)

Algebra is important in many ways, but you really need to learn some logic. I don't mean the babiest of baby things that people say is easy because they can understand, track and foretell the end of a mystery novel in a TV series like Foyle's War or CSI or Sherlock Holmes. I don't mean the intro-to-logic course in many philosophy departments. I don't mean the Boolean circuits course you may have taken as a freshman in computer engineering, or the simple truth-table arguments you did and eventually turned into graph theory problems in a second-semester or second-year computer science course called "Discrete Math". I don't mean the simple arguments you went through in your modern algebra course as a senior in a mathematics department. But all of those can be useful, and are precursors or example generators for a beginning course in Model Theory. Then you can begin to truly appreciate the NOTION of a THINKING MACHINE, and what it means to model such a monstrosity. Then you can begin to understand how to develop formal languages for the solution of specific problems. Then you can start understanding why it's really strange to model THINKING as a neural network, although that is not a completely useless way to do it. (Basically, neural networks seem to me to be "pattern recognizers": roughly, they use fixed-point iteration in metric spaces to home in on a pattern or set of patterns of behaviors of inputting agents. Please note that I said "roughly". This is not intended to be a tutorial on neural networks.)

Of course, it helps to have a notion of what it means to define or model the concept described by the verb "to learn". That, my friends, is the realm of philosophy and pedagogy, but to apply it requires an understanding of the notion of a model, and we are back to my main point: take some model theory. It's not likely to hurt you for more than a semester, and well...

NO PAIN NO GAIN!!

In addition to Martin Thoma's great answer, I'd study up on the "Theory of Computation". Textbooks abound, but they are expensive; search on the web. Wikipedia has an overview, but it won't make sense until you've studied a bit. Still, it may show you what you've missed.


Bayes' Theorem is a fundamental concept of probability that underpins many extremely important algorithms, from the very basic (e.g., Naive Bayes) to the quite complicated (e.g., Latent Dirichlet Allocation).
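A one-function sketch (my own example) of Bayes' theorem in its most common textbook setting, updating a prior belief after a positive test result:

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem:
    P(H|E) = P(E|H) * P(H) / P(E)."""
    p_evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_evidence

# 1% prevalence, 99% sensitivity, 5% false-positive rate:
# a positive result still implies only about a 1-in-6 chance.
p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
```

Naive Bayes applies exactly this update, just with one likelihood factor per feature.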

In linear algebra, a solid understanding of eigenvalues and eigenvectors is important for topics such as principal component analysis, factor analysis and other dimensionality reduction tasks.
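To see why eigenvectors matter here, a small NumPy sketch (with synthetic data of my own) recovers the direction of maximum variance as the top eigenvector of the covariance matrix, which is exactly what PCA computes:

```python
import numpy as np

rng = np.random.default_rng(1)
# 200 points stretched along the (1, 1) direction, plus small noise
t = rng.normal(size=(200, 1))
X = np.hstack([t, t]) + 0.1 * rng.normal(size=(200, 2))

# PCA: eigendecomposition of the covariance of the centred data
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order

top = eigvecs[:, -1]  # principal direction, approximately (1, 1)/sqrt(2)
```

The gap between the largest and smallest eigenvalue is what tells you the data is effectively one-dimensional.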

I would suggest reading as many Linear Algebra books as possible, followed by some probability and statistics texts.

For the first, I suggest Gilbert Strang's "Linear Algebra and Its Applications", while for the second, "Probability, Random Variables And Random Signal Principles" by Peebles is a good choice.

EDIT: A previous answer suggests a Convex Optimization text, which I also recommend. A good choice is "Convex Optimization" by Stephen Boyd, which is also available for free on the author's website.

I took both Andrew Ng's Machine Learning class and Sebastian Thrun's AI class. I liked the Machine Learning one more: even though the AI class touches more topics, it does so in a haphazard way. The ML class is narrower, but more practical and focused. It helps to keep a link to Khan Academy's linear algebra videos handy.

- ML class is running right now - https://www.coursera.org/course/ml
- Linear algebra: http://www.khanacademy.org/math/...
- Place where to find more useful courses: http://www.topfreeclasses.com

For understanding Machine Learning you need the following mathematics prerequisites:

**1. Probability and Statistics:** Machine Learning has deep roots in Statistics. In fact, modern Machine Learning is essentially Statistical Learning, i.e. using stats to find patterns in data and to make inferences with them. So Stats and Probability are the bare minimum for ML.

**2. Linear Algebra:** This is required because data is represented as matrices in Machine Learning, and essentially all ML algorithms can be seen as matrix manipulation in the end, so a basic understanding of Linear Algebra is required.

**3. Optimization:** Many people argue that Machine Learning is a fancy name for optimization. While this is true to a certain extent, there is more to ML than optimization; but a large part of it is indeed optimization, and in the end almost all ML algorithms come down to some optimization task.

**4. Calculus:** This is a very useful tool for ML. Most ML algorithms rely on differential calculus to find solutions (Gradient Descent, Newton's method, quasi-Newton methods etc.).
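As a minimal sketch of that calculus in action (my own toy function, not from the answer above), gradient descent on f(x) = (x - 3)^2 walks to the minimum by repeatedly stepping against the derivative f'(x) = 2(x - 3):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimise a 1-D differentiable function by following -grad."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# f(x) = (x - 3)^2, f'(x) = 2 * (x - 3), minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Newton's method, also mentioned above, would land on x = 3 in a single step here, because for a quadratic the second-order model is exact.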

IMO, if you master these topics then you can learn pretty much anything in ML, because all algorithms are essentially applications of these tools.

Conditional probability, random variables, pdfs etc.: whatever you'd learn in your undergrad probability course, and a bit more.

Some stochastic processes (Markovian processes, etc.)

Linear Algebra: data analysis and machine learning build a lot on these concepts.

Algorithms: not as important, but still useful when it comes to optimizing your solution; some graph theory.

Basic Linear Programming and recognizing convexity and relaxation.

I guess once you have a fair idea of most of these concepts, it's pretty simple to pick up the intuition behind any algorithm: where it would work, how to improve it, etc.


**Goal 1: To understand what ML is, how to apply different algorithms to a task, how to interpret the output, common pitfalls etc.:** You will need a grasp of linear & matrix algebra, probability and optimization. You don't need to take a deep dive into each one, but study basic things like eigenvectors, conditional probability, distributions and Bayes' theorem. Additionally, learn the concepts of overfitting and cross-validation.

Resources

Video Lectures: Machine Learning on Coursera (not only the one by Andrew Ng; there are a few others as well)

Books: Machine Learning by Tom Mitchell. It includes the necessary linear algebra and probability too.

**Goal 2: Why the current algorithms are designed in a particular way, and how they fundamentally differ from each other:** If you are more interested in theoretical aspects, like how the kernels of a support vector machine are defined, how deep learning neural networks are designed, or how to tweak existing algorithms to make a new one, then you might want to extend your mathematics to functional analysis, topology and advanced optimization.

Resources: Video: Learning from Data, Caltech (Prof. Abu-Mostafa's lectures)

Books: Mining of Massive Datasets by Ullman et al., plus any good books on advanced linear algebra or topology, though those won't connect it to machine learning.

Learning mathematics is about doing. Remember the 80/20 Rule: you must study theory 20% of the time and practice/implement what you learn 80% of the time.

Here is a list of books you could use. You can find accompanying online courses for many of them.

1. Strang's Linear Algebra and its Applications

2. Apostol Calculus - Both the volumes

3. Golub's Matrix Computations

4. Sheldon Ross' Probability

5. Elements of Statistical Learning by Hastie et al

6. Bishop's Pattern Recognition and Machine Learning

7. David Barber's Bayesian Reasoning and Machine Learning

8. Kevin Murphy's Machine learning: a Probabilistic Perspective

9. Wasserman's All of Statistics and Non-parametric Statistics

From Hacker News :

1.) Casella, G. and Berger, R.L. (2001). "Statistical Inference" Duxbury Press.

2.) Ferguson, T. (1996). "A Course in Large Sample Theory" Chapman & Hall/CRC.

3.) Lehmann, E. (2004). "Elements of Large-Sample Theory" Springer.

4.) Gelman, A. et al. (2003). "Bayesian Data Analysis" Chapman & Hall/CRC.

5.) Robert, C. and Casella, G. (2005). "Monte Carlo Statistical Methods" Springer.

6.) Grimmett, G. and Stirzaker, D. (2001). "Probability and Random Processes" Oxford.

7.) Pollard, D. (2001). "A User's Guide to Measure Theoretic Probability" Cambridge.

8.) Bertsimas, D. and Tsitsiklis, J. (1997). "Introduction to Linear Optimization" Athena.

9.) Boyd, S. and Vandenberghe, L. (2004). "Convex Optimization" Cambridge.

10.) Golub, G., and Van Loan, C. (1996). "Matrix Computations" Johns Hopkins.

11.) Cover, T. and Thomas, J. "Elements of Information Theory" Wiley.

12.) Kreyszig, E. (1989). "Introductory Functional Analysis with Applications" Wiley.

Please do try implementing as many things as you can. Pick up a project. Talk to your peers and professors and people; see if you can help them with what you've learned. Do.

Some algorithms are really sweet: they are available on Wikipedia with formulae, implementation and applications.

Some dodge you till you watch two or three YouTube videos (Victor Lavrenko, Bert Huang, Udacity or MIT lectures).

Some are really mischievous: you have to do a lot of research, and they test your patience and perseverance more than your mathematics!

And there are lots of books you can read.

**How to learn a particular algorithm?**

- First, from the business point of view, learn why to use an algorithm rather than one of its counterparts. For example, why Fuzzy K-Means instead of K-Means?
- Second, from an analyst's point of view, learn how to use the algorithm to solve some use cases. What is it meant to do?
- Last comes the mathematics: the how of the algorithm, and further research to enhance the algorithm and patent it.

P. S. It is normal to not understand in the first go.

P. P. S. And very normal to get totally confused in the second and third.

Hi,

I work for a Data Science and AI company called InData Labs, and one of our tech experts has recently prepared a short guide to learning neural networks. I hope it is helpful for you:

A short guide to neural networks. Master them and become famous.


Mathematics is an important part of learning machine learning. Which topics are necessary, and what are useful resources?

Here I am sharing the weightage of the important mathematics topics for machine learning, to clear up any confusion. See the list below and prepare accordingly.

35% - *Linear Algebra*

25% - *Probability Theory and Statistics*

15% - *Multivariate Calculus*

15% - *Algorithms and Complex Optimizations*

10% - *Others*

Now I will take the article deeper, so you have total clarity to start machine learning or artificial intelligence.

*Linear Algebra***:** Topics such as Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Eigendecomposition of a matrix, LU Decomposition, QR Decomposition/Factorization, Symmetric Matrices, Orthogonalization & Orthonormalization, Matrix Operations, Projections, Eigenvalues & Eigenvectors, Vector Spaces and Norms are needed for understanding the optimization methods used for machine learning. The amazing thing about Linear Algebra is that there are so many online resources.

*Probability Theory and Statistics***:** Probability Rules & Axioms, Bayes' Theorem, Random Variables, Variance and Expectation, Conditional and Joint Distributions, Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian), Moment Generating Functions, Maximum Likelihood Estimation (MLE), Prior and Posterior, Maximum a Posteriori Estimation (MAP) and Sampling Methods.

*Multivariate Calculus***:** Topics include Differential and Integral Calculus, Partial Derivatives, Vector-Valued Functions, the Directional Gradient, the Hessian, the Jacobian, the Laplacian and the Lagrangian.

*Algorithms and Complex Optimizations***:** Knowledge of data structures (Binary Trees, Hashing, Heap, Stack etc.), Dynamic Programming, Randomized & Sublinear Algorithms, Graphs, Gradient/Stochastic Descents and Primal-Dual methods is needed.

*Others***:** This comprises other Math topics not covered in the four major areas above. They include Real and Complex Analysis (Sets and Sequences, Topology, Metric Spaces, Single-Valued and Continuous Functions, Limits, Cauchy Kernel, Fourier Transforms), Information Theory (Entropy, Information Gain), Function Spaces and Manifolds.

Now you are probably looking for the best resources to practice your weak points, right? Don't worry, learners, I would also like to suggest a few good resources.

**For books:** Programming Collective Intelligence by Toby Segaran, Pattern Recognition and Machine Learning, Artificial Intelligence: A Modern Approach (3rd Edition) by Russell and Norvig, and **other books**.

**For online video tutorials:** Coursera, Kachhua.com, Udemy, ChalkStreet, etc.

Thank you. Keep Learning.

Optimization, especially convex optimization: e.g. gradient-based methods for non-linear optimization (the L-BFGS method and conjugate gradient), quadratic programming, etc.
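For a feel of these methods, SciPy exposes L-BFGS through `scipy.optimize.minimize`; here is a short sketch (my own example, assuming SciPy is available) minimising the classic Rosenbrock test function with its analytic gradient:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# The Rosenbrock function: non-linear, with a global minimum at [1, 1, 1]
x0 = np.zeros(3)
res = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")
```

Supplying the gradient (`jac`) is what lets quasi-Newton methods like L-BFGS build their curvature estimates cheaply.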

I made a podcast episode on the math you need for machine learning, and the resources for learning (if you like audio): Machine Learning Guide #8

It covers most of the math you need to get started with machine learning.

There are many reasons why the mathematics of Machine Learning is important and I will highlight some of them below:

- Selecting the right algorithm, which includes giving consideration to accuracy, training time, model complexity, number of parameters and number of features.
- Choosing parameter settings and validation strategies.
- Identifying underfitting and overfitting by understanding the Bias-Variance tradeoff.
- Estimating the right confidence interval and uncertainty.

- Linear algebra is a cornerstone because everything in machine learning is a vector or a matrix. Dot products, distance, matrix factorization, eigenvalues etc. come up all the time. I would recommend Gilbert Strang’s linear algebra course:

- a YouTube playlist
- the book: Introduction to Linear Algebra
- the course page at MIT OCW

*Multivariate Calculus*: Some of the necessary topics include Differential and Integral Calculus, Partial Derivatives, Vector-Valued Functions and the Directional Gradient. Differentiation matters because of gradient descent, and gradient descent is almost everywhere. Some courses I recommend:

- Introduction to Mathematical Thinking - Stanford University | Coursera
- Convex Optimization
- Massively Multivariable Open Online Calculus Course from the Ohio State University: *the course is a first taste of multivariable calculus, but viewed through the lens of linear algebra*.

*Probability Theory and Statistics*: Machine Learning and Statistics aren't very different fields. Actually, someone recently defined Machine Learning as 'doing statistics on a Mac'. Some of the fundamental Statistics and Probability Theory needed for ML are Combinatorics, Probability Rules & Axioms, Bayes' Theorem, Random Variables, Variance and Expectation, Conditional and Joint Distributions, Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian), Moment Generating Functions, Maximum Likelihood Estimation (MLE), Prior and Posterior, Maximum a Posteriori Estimation (MAP) and Sampling Methods.

- Khan Academy's Linear Algebra, Probability & Statistics, Multivariable Calculus and Optimization.
- Larry Wasserman's book - All of statistics: A Concise Course in Statistical Inference.
- Udacity's Introduction to Statistics.
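As a quick illustration of two items on that list, here is a pure-Python sketch (the numbers are made up for the example) of Bayes' theorem applied to a diagnostic test, and the closed-form MLE for a Bernoulli parameter, which is just the sample mean:

```python
def bayes_posterior(prior, sensitivity, false_positive_rate):
    # P(disease | positive test) via Bayes' theorem:
    # P(D|+) = P(+|D) P(D) / [P(+|D) P(D) + P(+|~D) P(~D)]
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

def bernoulli_mle(samples):
    # The MLE of p for Bernoulli(p) maximizes p**k * (1-p)**(n-k),
    # which works out to p_hat = k / n: the fraction of ones.
    return sum(samples) / len(samples)

# A rare disease (1% prevalence), a 99%-sensitive test, 5% false positives:
print(round(bayes_posterior(0.01, 0.99, 0.05), 3))  # 0.167
print(bernoulli_mle([1, 0, 1, 1]))                  # 0.75
```

The first result is the classic surprise covered in Wasserman's book: even a very accurate test on a rare condition yields a posterior of only about 17%.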

*Algorithms and Complex Optimizations*:

This is important for understanding the computational efficiency and scalability of our machine learning algorithms, and for exploiting sparsity in our datasets. Knowledge of data structures (binary trees, hashing, heaps, stacks etc.), dynamic programming, randomized & sublinear algorithms, graphs, gradient/stochastic descent and primal-dual methods is needed.

- Boyd and Vandenberghe's course on Convex Optimization from Stanford.
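As a small example of "exploiting sparsity", here is a sketch (pure Python, my own toy representation) that stores a mostly-zero feature vector as a dict of nonzero entries, so a dot product costs time proportional to the nonzeros rather than the full dimension:

```python
def sparse_dot(u, v):
    # u and v map feature index -> nonzero value. Iterating over the
    # smaller dict makes the cost O(min(nnz(u), nnz(v))), not O(dimension).
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v[i] for i, val in u.items() if i in v)

# A million-dimensional vector with only three nonzero entries:
u = {0: 2.0, 500_000: 1.5, 999_999: -1.0}
v = {500_000: 4.0, 7: 3.0}
print(sparse_dot(u, v))  # 6.0
```

This is the idea behind sparse matrix formats used for bag-of-words text features, where a dense representation would be hopelessly wasteful.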

Given all that, ML is not all about maths; to be frank, when starting out you will hardly spend 5% of your effort doing maths.

I wrote a detailed medium post on this. You can read it here Math for Deep Learning is not Merlin’s Enchantment – Vaibhav Aparimit – Medium

First off, I really like your question. You seem to implicitly understand that math is an essential skill required to grasp the underpinnings of machine learning.

If your question was about deep learning, I would say linear algebra for 95% of cases. In the case of machine learning you would need to know probability (especially Bayes' rule and conditional probability), differential calculus and linear algebra (matrix multiplication, eigenvectors, determinants, Hessians).

Hope this helps.

You may find Metacademy helpful when trying to understand the prereqs for various concepts in machine learning: Concepts - Metacademy

You must have a sound understanding of at least the following (there might be others which are not there in this list):

- Linear algebra
- Calculus
- Matrix calculus
- Probability and statistics
- Optimization - linear programming, convex optimization, non-linear optimization

Some other topics that are useful in specific sub-areas of machine learning are:

- Basic graph theory
- Basic algorithms
- First-order logic

- **Linear Algebra**
- **Probability theory and statistics**
- **Multivariate calculus**
- **Algorithms and Complex optimizations**
- **Others** - Real and Complex Analysis (Sets and Sequences, Topology, Metric Spaces, Single-Valued and Continuous Functions, Limits, Cauchy Kernel, Fourier Transforms), Information Theory (Entropy, Information Gain), Function Spaces and Manifolds.
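Since Information Theory appears above, here is a minimal pure-Python sketch of Shannon entropy, the quantity behind the information gain criterion used in decision trees:

```python
import math

def entropy(probs):
    # Shannon entropy H = -sum(p * log2(p)) in bits;
    # terms with p == 0 contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))            # 1.0 (a fair coin carries one bit)
print(round(entropy([0.9, 0.1]), 3))  # 0.469 (a biased coin carries less)
```

Information gain for a decision-tree split is just the entropy of the labels before the split minus the weighted entropy of the labels in each child node.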

To learn them, go through:

Numerical Methods; Matrix and Tensor Algebra; Probability and Statistics; Operations Research; occasionally Calculus.

· 11 Upvotes · Answer requested by Shuvanon Razik and Francisco Sosa

As Alex mentions above, Andrew Ng's Course on Machine Learning is the best I have seen so far; he gives an intuitive feel for the concepts, so it's easy to follow rather than looking at plain formulae in mathematics.

If you are looking to refresh or clarify linear algebra concepts after going through the above course, Khan Academy could be useful. It also has videos on other topics that might be of interest for machine learning. If you are looking for concepts like PCA, though, you might not find them there.

Another useful resource focused on concepts is video lectures: Machine Learning - videolectures.net; search only for tutorials.

There is no one-stop shop, as the concepts can go deeper and might require special treatment. All of this is theory, which can clarify the concepts. However, if you are a novice and want a deep and intuitive feel for the concepts, pick one simple problem and implement the solution.

· 1 Upvote

Linear Algebra, Statistics, Discrete Math, Set Theory, etc.

· 1 Upvote

Friends, I just came across a very interactive course on Understanding Machine Learning. It is a completely free video course; you just need to enroll using your id and password. I am sharing the link with you. Do enroll:

Understanding Machine Learning with R - uFaber.com

· 1 Upvote
